# Histogram

- **Type**: **Distribution**
- **Purpose**: A histogram is used to visualize the **distribution of a single continuous variable** by dividing the data into bins and displaying the frequency of observations within each bin.

- **How It Works**:
  - The x-axis represents the **range of values** (bins) for the variable, and the y-axis represents the **frequency** or **count** of observations that fall within each bin.
  - The shape of the histogram gives insight into the **distribution** of the data, such as:
    - **Normal distribution**: A symmetric bell curve.
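    - **Skewed distribution**: One tail is longer than the other. Right-skewed (positive) data trails off toward larger values; left-skewed (negative) data trails off toward smaller values.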
    - **Uniform distribution**: The frequency is relatively constant across bins.

- **Common Use Cases**:
  - Understanding the **distribution** of variables such as **income levels**, **exam scores**, or **ages**.
  - Identifying **skewness**, **kurtosis**, and **outliers** in the data.
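
The binning described under **How It Works** can be sketched without any plotting. This is a minimal illustration using `numpy.histogram`; the sample values are made up for the example and are not taken from the data used later in this notebook.

```python
import numpy as np

# Made-up sample, for illustration only.
values = np.array([1.0, 1.2, 1.9, 2.1, 2.4, 2.5, 3.3, 3.8, 4.0, 4.9])

# Split the range [1, 5] into 4 equal-width bins and count the observations in each.
counts, edges = np.histogram(values, bins=4, range=(1.0, 5.0))

# Each bin is half-open [left, right), except the last one, which also includes its right edge.
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:.1f} to {right:.1f}: {count}")
```

The `counts` array holds the bar heights a histogram would draw, and `edges` holds the bin boundaries along the x-axis.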

## Customization Parameters

### **Matplotlib Customization**

- **`bins`**: Number of equal-width bins (an integer), a sequence of bin edges, or a binning rule name such as `'auto'`.
- **`color`**: Color of the bars in the histogram.
- **`alpha`**: Controls the transparency of the bars (range: 0 to 1).
- **`edgecolor`**: Color of the border around each bar.
- **`density`**: If `True`, bar heights are normalized so that the total bar area equals 1, showing a **probability density** instead of counts.
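
To see the parameters above working together, here is a small sketch. It assumes a synthetic, normally distributed sample (not data from this notebook) and checks the effect of `density=True`: the bar areas, not the bar heights, sum to 1.

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic sample, assumed for illustration only.
rng = np.random.default_rng(seed=0)
sample = rng.normal(loc=5.0, scale=1.5, size=500)

fig, ax = plt.subplots(figsize=(8, 6))
heights, edges, _ = ax.hist(
    sample,
    bins=20,            # 20 equal-width bins
    density=True,       # normalize so the total bar area is 1
    color="steelblue",
    alpha=0.7,          # slightly transparent bars
    edgecolor="black",  # outline each bar
)
ax.set_xlabel("Value")
ax.set_ylabel("Density")

# Area of each bar is height * width; with density=True the areas sum to 1.
total_area = np.sum(heights * np.diff(edges))
print(f"Total area under the bars: {total_area:.3f}")
```

In a notebook the figure renders when the cell finishes; in a script, call `plt.show()` at the end.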

### **Seaborn Customization**

- **`bins`**: Number of bins, a sequence of bin edges, or a binning rule name such as `'auto'`.
- **`kde`**: If `True`, adds a **kernel density estimate (KDE)** line to the histogram for smoother distribution representation.
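- **`color`**: Color of the bars in the histogram. Ignored when `hue` is set, since bar colors then come from the hue palette.
- **`multiple`**: How to draw several histograms when `hue` is used: `'layer'` (overlaid), `'dodge'` (side by side), `'stack'`, or `'fill'` (stacked and scaled to a total height of 1).
- **`stat`**: Statistic shown on the vertical axis: `'count'`, `'frequency'`, `'probability'`, `'percent'`, or `'density'`.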



In [1]:
import matplotlib.pyplot as plt
import pandas as pd
from sklearn import datasets
import seaborn as sns

In [None]:
iris = datasets.load_iris()
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
df["type"] = iris.target  # integer class labels: 0, 1, 2

# Map each integer label to its species name; unmapped labels become "Unknown"
flower_names = {0: "setosa", 1: "versicolor", 2: "virginica"}
df["flower"] = df["type"].map(flower_names).fillna("Unknown")

In [None]:
plt.figure(figsize=(8, 6))
plt.hist(
    df["petal length (cm)"],
    bins=10,
    density=False,
    color="green",
)
plt.title("Petal Length")
plt.xlabel("Petal Length (cm)")
plt.ylabel("Frequency")
plt.show()

In [None]:
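sns.histplot(
    data=df,
    x="petal length (cm)",
    hue="flower",      # one histogram per species; colors come from the hue palette
    bins=10,
    stat="count",
    kde=True,          # overlay a kernel density estimate per species
    multiple="layer",  # draw the species on top of each other
)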

plt.title("Petal Length Distribution")
plt.xlabel("Petal Length (cm)")
plt.ylabel("Count")
plt.show()