# Box Plot

- **Type**: **Distribution / Summary Statistics**
- **Purpose**: A box plot is used to visualize the **distribution of a dataset** by showing its **summary statistics**, including the median, quartiles, and potential outliers. It helps in understanding the **spread** and **skewness** of the data.

- **How It Works**:
  - The **box** spans from the first quartile (Q1) to the third quartile (Q3). Its height is the **interquartile range (IQR)**, which contains the middle 50% of the data.
  - The **line inside the box** represents the **median** (50th percentile).
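  - The **whiskers** extend from the box edges to the most extreme data points that lie within 1.5 times the IQR of the box, that is, no lower than Q1 - 1.5 × IQR and no higher than Q3 + 1.5 × IQR.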
  - **Outliers** are displayed as individual points outside the whiskers.
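
The rules above can be sketched in a few lines of NumPy. This is only an illustration: `box_plot_stats` and the `sample` values are hypothetical names and data, and the quartiles use NumPy's default linear interpolation, so other quartile conventions may give slightly different numbers.

```python
import numpy as np


def box_plot_stats(values, whis=1.5):
    """Return the summary statistics a box plot draws (illustrative helper)."""
    data = np.sort(np.asarray(values, dtype=float))
    q1, median, q3 = np.percentile(data, [25, 50, 75])
    iqr = q3 - q1

    # Fences mark the furthest a whisker is allowed to reach.
    lower_fence = q1 - whis * iqr
    upper_fence = q3 + whis * iqr

    # Whiskers stop at the most extreme observations inside the fences.
    inside = data[(data >= lower_fence) & (data <= upper_fence)]
    outliers = data[(data < lower_fence) | (data > upper_fence)]

    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": inside.min(),
        "whisker_high": inside.max(),
        "outliers": outliers,
    }


# Made-up sample with one value far above the rest.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
stats = box_plot_stats(sample)
for name, value in stats.items():
    print(f"{name}: {value}")
```

In this sample the upper whisker stops at 9 rather than at the upper fence, because whiskers end on actual data points. The value 30 lies beyond the fence and is reported as an outlier.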

- **Common Use Cases**:
  - Comparing the distribution of **test scores** across different groups.
  - Visualizing **salary distributions** by department.
  - Summarizing and comparing distributions across multiple categories.

## Customization Parameters

### **Matplotlib Customization**

- **`patch_artist`**: If `True`, fills the box with color.
- **`notch`**: If `True`, draws a notch around the median that indicates an approximate confidence interval for the median.
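- **`vert`**: Controls the orientation (`True` for vertical, which is the default; `False` for horizontal). Recent Matplotlib releases also provide an `orientation` parameter for the same purpose.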
- **`widths`**: Controls the width of the boxes.
- **`whiskerprops`**: Customizes the appearance of the whiskers (e.g., color, linewidth).
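
As a rough illustration of how these parameters combine, the sketch below draws three boxes from synthetic data. The data, group labels, and colors are assumptions made for the example and are not part of the Iris analysis in this notebook.

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data: three normally distributed groups with shifted means.
rng = np.random.default_rng(seed=0)
groups = [rng.normal(loc=mu, scale=1.0, size=100) for mu in (0.0, 1.0, 2.0)]

fig, ax = plt.subplots()
artists = ax.boxplot(
    groups,
    patch_artist=True,  # boxes become filled patches
    notch=True,         # notch around each median
    widths=0.4,         # box width in x-axis units
    whiskerprops={"color": "gray", "linewidth": 1.5},
)

# With patch_artist=True each box can be given its own fill color.
for box, color in zip(artists["boxes"], ["lightblue", "lightgreen", "salmon"]):
    box.set_facecolor(color)

ax.set_xticklabels(["A", "B", "C"])
ax.set_ylabel("Value")
ax.set_title("Customized Matplotlib Box Plot (synthetic data)")
plt.show()
```

`boxplot` returns a dictionary of the drawn artists (`boxes`, `whiskers`, `medians`, and so on), which is what allows the per-box styling after the call.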

### **Seaborn Customization**

- **`hue`**: Colors the boxes based on a categorical variable.
- **`palette`**: Defines the color palette for the boxes.
- **`linewidth`**: Sets the thickness of the box lines.
- **`width`**: Controls the width of the boxes.
- **`fliersize`**: Controls the size of the outliers displayed as points.
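
A minimal sketch of these options on a small synthetic DataFrame follows. The `demo` frame and its `group`, `condition`, and `value` columns are made up for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic long-form data: 3 groups x 2 conditions, 120 rows in total.
rng = np.random.default_rng(seed=0)
demo = pd.DataFrame(
    {
        "group": np.repeat(["A", "B", "C"], 40),
        "condition": np.tile(["control", "treated"], 60),
        "value": rng.normal(loc=0.0, scale=1.0, size=120),
    }
)

ax = sns.boxplot(
    data=demo,
    x="group",
    y="value",
    hue="condition",  # one box per condition within each group
    palette="Set2",   # colors assigned to the hue levels
    linewidth=1.2,    # thickness of the box outlines
    width=0.6,        # total width allotted to each group
    fliersize=3,      # marker size of the outlier points
)
ax.set_title("Customized Seaborn Box Plot (synthetic data)")
plt.show()
```

Because `hue` uses a different column from `x`, each group on the x-axis is split into side-by-side boxes and the legend explains the colors.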



In [1]:
import matplotlib.pyplot as plt
import pandas as pd
from sklearn import datasets
import seaborn as sns

In [None]:
iris = datasets.load_iris()
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
# Store the integer class label (0, 1, or 2) for each sample
df["type"] = iris.target


def map_flower_type(type_value: int) -> str:
    """Map an integer class label to its species name."""
    names = {0: "setosa", 1: "versicolor", 2: "virginica"}
    return names.get(type_value, "Unknown")


df["flower"] = df["type"].apply(map_flower_type)

In [None]:
flower_types = df["flower"].unique()
sepal_length_data = []

for flower_type in flower_types:
    # Extract sepal length data for each flower type
    data_subset = df[df["flower"] == flower_type]["sepal length (cm)"]
    sepal_length_data.append(data_subset)

# Draw one box per flower type in a single call
plt.boxplot(
    sepal_length_data,
    positions=list(range(len(flower_types))),
    notch=False,
    patch_artist=True,
    widths=0.5,
)

plt.title("Box Plot of Sepal Length by Flower Type (Matplotlib)")
plt.xlabel("Flower Type")
plt.ylabel("Sepal Length (cm)")
plt.xticks(range(len(flower_types)), flower_types)
plt.show()

In [None]:
sns.boxplot(
    data=df,
    x="flower",  # column names are resolved against `data`
    y="sepal length (cm)",
    hue="flower",
    palette="cool",
    fill=False,  # outlined boxes; requires seaborn 0.13 or newer
    linewidth=1.5,
    width=0.4,
    legend=True,
)
plt.title('Box Plot of Sepal Length by Flower Type (Seaborn)')
plt.xlabel('Flower Type')
plt.ylabel('Sepal Length (cm)')
plt.show()