# Scatter Plot

- **Type**: **Correlation**
- **Purpose**: A scatter plot is used to visualize the **relationship between two continuous variables** to identify potential correlations.

- **How It Works**:
  - Each point on the scatter plot represents an observation with two values: one for the **x-axis (independent variable)** and one for the **y-axis (dependent variable)**.
  - The overall pattern of the scatter points indicates:
    - **Positive correlation**: As one variable increases, the other also increases.
    - **Negative correlation**: As one variable increases, the other decreases.
    - **No correlation**: No clear relationship between the variables.
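
To put a number on these patterns, the sketch below computes Pearson's correlation coefficient r with NumPy's `np.corrcoef`. It uses synthetic data, invented purely for illustration. Values near +1 indicate a strong positive correlation, values near -1 a strong negative one, and values near 0 no linear trend.

```python
import numpy as np

# Synthetic data, invented for illustration only
rng = np.random.default_rng(seed=0)
x = rng.uniform(0, 10, size=200)
noise = rng.normal(0, 1, size=200)

datasets_by_pattern = {
    "positive": 2 * x + noise,           # y rises with x
    "negative": -2 * x + noise,          # y falls as x rises
    "none": rng.normal(0, 1, size=200),  # y is independent of x
}

# Pearson's r is the off-diagonal entry of the 2x2 correlation matrix
correlations = {
    name: np.corrcoef(x, y)[0, 1] for name, y in datasets_by_pattern.items()
}
for name, r in correlations.items():
    print(f"{name:>8}: r = {r:+.2f}")
```

Keep in mind that r only measures linear trends. A strong curved relationship (for example, y = x² over a range symmetric around 0) can still give r close to 0, which is one reason to look at the scatter plot itself.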

- **Common Use Cases**:
  - Exploring **relationships** between two variables, such as **height vs. weight** or **age vs. income**.
  - Detecting **outliers** or **clusters** in the data.
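
As an illustration of outlier detection, the sketch below draws a height vs. weight scatter plot and highlights points that lie far from a fitted straight line. Everything here is an assumption chosen for the example: the data are synthetic with one deliberately injected outlier, the line is fitted with `np.polyfit`, and the cutoff of 3 residual standard deviations is a simple rule of thumb rather than a general-purpose outlier test.

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic heights (cm) and weights (kg), invented for illustration
rng = np.random.default_rng(seed=1)
height = rng.normal(170, 10, size=100)
weight = 0.9 * height - 85 + rng.normal(0, 5, size=100)
weight[0] = 140  # injected outlier

# Fit a straight line and measure each point's vertical distance from it
slope, intercept = np.polyfit(height, weight, deg=1)
residuals = weight - (slope * height + intercept)
is_outlier = np.abs(residuals) > 3 * residuals.std()  # rule-of-thumb cutoff

plt.scatter(height[~is_outlier], weight[~is_outlier], alpha=0.7, label="typical")
plt.scatter(height[is_outlier], weight[is_outlier], color="red",
            marker="x", s=100, label="flagged")
plt.xlabel("Height (cm)")
plt.ylabel("Weight (kg)")
plt.legend()
plt.show()
```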

## Customization Parameters

### **Matplotlib Customization**

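- **`c`**: Sets the color of the points, either a single color or one numeric value per point.
- **`cmap`**: The colormap that turns numeric `c` values into colors (e.g., `'cool'`, `'viridis'`).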
- **`alpha`**: Controls the transparency of the points (range: 0 to 1).
- **`s`**: Sets the marker size (area in points²), either as a single number or as one value per point.
- **`edgecolors`** (also accepted as `edgecolor`): Sets the color of the border around the points.
- **`marker`**: Specifies the shape of the points (e.g., `'o'` for circles, `'x'` for crosses).
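
A short sketch of these parameters on random data (made up for illustration; the `'viridis'` colormap is an arbitrary choice). It shows two details worth knowing: `c` accepts an array of numbers that `cmap` turns into colors, with a colorbar showing the mapping, and `s` accepts one size per point.

```python
import matplotlib.pyplot as plt
import numpy as np

# Random data, invented for illustration
rng = np.random.default_rng(seed=2)
x = rng.normal(size=50)
y = rng.normal(size=50)
values = x + y                          # numeric values to color by
sizes = rng.uniform(20, 200, size=50)   # one marker area (points^2) per point

points = plt.scatter(
    x, y,
    c=values,            # numbers mapped to colors through cmap
    cmap="viridis",
    s=sizes,             # per-point sizes
    alpha=0.6,           # 0 = fully transparent, 1 = opaque
    edgecolors="black",  # marker outline
    marker="o",          # circles
)
plt.colorbar(points, label="x + y")
plt.show()
```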

### **Seaborn Customization**

- **`hue`**: Colors the points by a categorical variable.
- **`style`**: Changes the shape of the points based on a categorical variable.
- **`palette`**: Defines the color palette used for the `hue` variable.
- **`size`**: Scales the size of points according to a continuous or categorical variable.
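
These Seaborn parameters can be combined in one call. The sketch below uses the Iris data from scikit-learn and turns on `size` together with `hue`, `style`, and `palette`. The column choices, the `'Set2'` palette, and the `sizes` range (smallest and largest marker size) are arbitrary picks for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from sklearn import datasets

iris = datasets.load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df["flower"] = pd.Categorical.from_codes(iris.target, iris.target_names)

ax = sns.scatterplot(
    data=df,
    x="sepal length (cm)",
    y="sepal width (cm)",
    hue="flower",              # color by species
    style="flower",            # marker shape by species
    palette="Set2",            # colors used for the hue levels
    size="petal length (cm)",  # marker size scales with a continuous column
    sizes=(20, 200),           # smallest and largest marker size
    alpha=0.7,
)
ax.set_title("Sepal Length vs. Sepal Width")
plt.show()
```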


```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from sklearn import datasets
```

```python
# Load the Iris dataset into a DataFrame with one column per measurement
iris = datasets.load_iris()
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)

# Integer class label (0, 1 or 2) for each sample
df["type"] = iris.target

# Map each integer label to its species name: 0 -> setosa, 1 -> versicolor, 2 -> virginica
label_names = dict(enumerate(iris.target_names))
df["flower"] = df["type"].map(label_names)
```

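```python
# Petal width of every sample, plotted against its row position in the DataFrame
plt.scatter(
    x=df.index,
    y=df["petal width (cm)"],
    c=df["type"],        # numeric class label used for coloring
    s=100,               # marker area in points^2
    cmap="cool",         # colormap that turns the labels into colors
    alpha=0.7,           # partially transparent so overlaps stay visible
    edgecolors="black",  # outline each marker
    marker="o",          # circular markers
)
plt.title("Petal Width")
plt.xlabel("Index")
plt.ylabel("Petal Width (cm)")
plt.show()
```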
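
Because `plt.scatter` colors by the integer `type` column, this plot has no legend saying which color is which species. One way to add one, sketched below, is `PathCollection.legend_elements()` (available since Matplotlib 3.1), which builds one legend handle per distinct color value. Relabeling those handles with `iris.target_names` assumes the labels are 0, 1, 2 in that order, as they are in scikit-learn's Iris data.

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn import datasets

iris = datasets.load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df["type"] = iris.target

points = plt.scatter(
    x=df.index,
    y=df["petal width (cm)"],
    c=df["type"],
    s=100,
    cmap="cool",
    alpha=0.7,
    edgecolors="black",
)
# One handle per distinct color value (0, 1, 2), in ascending order
handles, _ = points.legend_elements()
plt.legend(handles, list(iris.target_names), title="Species")
plt.title("Petal Width")
plt.xlabel("Index")
plt.ylabel("Petal Width (cm)")
plt.show()
```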

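```python
# Same plot with Seaborn: hue and style both encode the species
sns.scatterplot(
    x=df.index,
    y="petal width (cm)",
    hue="flower",      # color by species
    style="flower",    # marker shape by species
    palette="cool",
    alpha=0.7,
    # size="sepal length (cm)",  # optional: scale markers by another column
    edgecolor="black",
    data=df,
)
plt.title("Petal Width")
plt.xlabel("Index")
plt.ylabel("Petal Width (cm)")
plt.show()
```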
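
Plotting a measurement against the row index mainly shows how the species are ordered in the dataset. To look at the relationship between two continuous variables, put one feature on each axis. The sketch below plots petal length against petal width and, as an extra step, computes Pearson's r separately for each species with pandas' `Series.corr`; the choice of columns is arbitrary.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from sklearn import datasets

iris = datasets.load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df["flower"] = pd.Categorical.from_codes(iris.target, iris.target_names)

sns.scatterplot(
    data=df,
    x="petal length (cm)",
    y="petal width (cm)",
    hue="flower",
    style="flower",
    alpha=0.7,
)
plt.title("Petal Length vs. Petal Width")
plt.show()

# Pearson's r between the two features, computed within each species
per_species_r = {
    name: group["petal length (cm)"].corr(group["petal width (cm)"])
    for name, group in df.groupby("flower", observed=True)
}
print(per_species_r)
```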

```python
# Scatter plot plus a fitted regression line for each species
sns.lmplot(
    data=df,
    x="petal width (cm)",
    y="sepal width (cm)",
    hue="flower",
)
plt.title("Sepal Width vs. Petal Width")
plt.xlabel("Petal Width (cm)")
plt.ylabel("Sepal Width (cm)")
plt.show()
```
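
`sns.lmplot` draws a least-squares regression line, plus a bootstrapped confidence band, for each `hue` level. As a rough check of what those lines represent, the sketch below fits the same straight lines with `np.polyfit`, one per species. It reproduces only the lines, not the confidence bands.

```python
import numpy as np
import pandas as pd
from sklearn import datasets

iris = datasets.load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df["flower"] = pd.Categorical.from_codes(iris.target, iris.target_names)

# Least-squares line per species: sepal width = slope * petal width + intercept
fits = {}
for name, group in df.groupby("flower", observed=True):
    slope, intercept = np.polyfit(
        group["petal width (cm)"], group["sepal width (cm)"], deg=1
    )
    fits[name] = (slope, intercept)
    print(f"{name:>10}: slope = {slope:+.2f}, intercept = {intercept:.2f}")
```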