# Boxplot versus Violin plot

This notebook generates two synthetic datasets, one normal and one bimodal, and plots each with seaborn as a boxplot and as a violin plot. The goal is to show how a violin plot reveals multi-modal structure that a boxplot hides.

The generated data and both figures are saved to the `results` folder for later reuse.

## Imports and setup
- Import required libraries
- Set a random seed for reproducibility
- Create an output folder for results
- Define file paths for saved outputs

In [1]:
import os
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Reproducibility
rng = np.random.default_rng(42)

# Output directory and file paths
out_dir = "results"
os.makedirs(out_dir, exist_ok=True)
data_path = os.path.join(out_dir, "synthetic_data.csv")
boxplot_path = os.path.join(out_dir, "boxplot.png")
violinplot_path = os.path.join(out_dir, "violinplot.png")

# Seaborn style (set_theme is the current name; sns.set is an older alias)
sns.set_theme(style="whitegrid", context="notebook")

## Generate synthetic data
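- Normal distribution: 500 samples with mean 50 and standard deviation 8
- Bimodal distribution: 500 samples, half from a normal with mean 40 and half from a normal with mean 60 (standard deviation 6 each)
- Combine both into a tidy (long-form) DataFrame with `group` and `value` columns
- Save to CSV for later reuse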

In [2]:
# Parameters
n = 500

# Normal distribution (centered around 50)
normal_vals = rng.normal(loc=50, scale=8, size=n)

# Bimodal distribution (two peaks around 40 and 60)
bimodal_vals = np.concatenate([
    rng.normal(loc=40, scale=6, size=n // 2),
    rng.normal(loc=60, scale=6, size=n // 2)
])

# Combine into a long-form DataFrame
values = np.concatenate([normal_vals, bimodal_vals])
groups = (["Normal (μ=50)"] * normal_vals.size) + (["Bimodal (μ≈40,60)"] * bimodal_vals.size)
df = pd.DataFrame({"group": groups, "value": values})

# Save data
df.to_csv(data_path, index=False)
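
As a quick check on the generated data, the sketch below rebuilds the two samples with the same seed and parameters and prints per-group summary statistics. The names `check_df` and `summary` and the shortened group labels are introduced here for illustration only. Both groups should center near 50, so any difference between the plots comes from the shape of the distributions rather than their location.

```python
import numpy as np
import pandas as pd

# Rebuild the samples with the same seed and parameters as the cell above
rng = np.random.default_rng(42)
n = 500
normal_vals = rng.normal(loc=50, scale=8, size=n)
bimodal_vals = np.concatenate([
    rng.normal(loc=40, scale=6, size=n // 2),
    rng.normal(loc=60, scale=6, size=n // 2),
])

# Separate DataFrame with shortened labels (hypothetical, for this check only)
check_df = pd.DataFrame({
    "group": ["Normal"] * n + ["Bimodal"] * n,
    "value": np.concatenate([normal_vals, bimodal_vals]),
})

# Per-group count, mean, median, and standard deviation
summary = check_df.groupby("group")["value"].agg(["count", "mean", "median", "std"])
print(summary.round(2))
```

The bimodal group should show a larger standard deviation than the normal group even though their means and medians are close.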

## Boxplot
A boxplot summarizes each group by its median, quartiles, and whiskers, so it cannot show whether the data has more than one mode.
- Create and save the boxplot
- Keep the figure object in `fig_box` for potential reuse

In [3]:
fig_box, ax_box = plt.subplots(figsize=(6, 4), constrained_layout=True)
sns.boxplot(data=df, x="group", y="value", ax=ax_box)
ax_box.set_title("Boxplot: Normal vs Bimodal")
ax_box.set_xlabel("Distribution")
ax_box.set_ylabel("Value")
for label in ax_box.get_xticklabels():
    label.set_rotation(15)

fig_box.savefig(boxplot_path, dpi=200, bbox_inches="tight")
plt.close(fig_box)
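
To make the boxplot's summary concrete, here is a minimal sketch of the numbers it draws, assuming the default whisker rule used by seaborn and matplotlib (`whis=1.5`, whiskers reaching the most extreme point within 1.5 × IQR of each quartile). The helper `box_stats` is a hypothetical name, not part of either library.

```python
import numpy as np

def box_stats(values, whis=1.5):
    """Return the quartiles, whisker ends, and outliers a boxplot would draw."""
    values = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    # Fences mark the furthest distance a whisker is allowed to reach
    lo_fence, hi_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = values[(values >= lo_fence) & (values <= hi_fence)]
    outliers = values[(values < lo_fence) | (values > hi_fence)]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside.min(),
        "whisker_high": inside.max(),
        "outliers": np.sort(outliers),
    }

# Small example: 100 lies beyond the upper fence and is reported as an outlier
stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(stats)
```

These few numbers are everything the box and whiskers encode, which is why two modes leave no trace beyond a wider box.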

## Violin plot
A violin plot draws a kernel density estimate (KDE) of each group, so the two peaks of the bimodal data appear as two bulges in the violin.
- Create and save the violin plot with quartile lines (`inner="quartile"`)
- Keep the figure object in `fig_violin` for potential reuse

In [4]:
fig_violin, ax_violin = plt.subplots(figsize=(6, 4), constrained_layout=True)
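# cut=0 stops the density at the observed min/max; bw_adjust < 1 smooths slightly
# less than the default bandwidth (bw_adjust requires seaborn >= 0.13)
sns.violinplot(
    data=df, x="group", y="value",
    inner="quartile", cut=0, bw_adjust=0.9,
    ax=ax_violin,
)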
ax_violin.set_title("Violin plot: Normal vs Bimodal")
ax_violin.set_xlabel("Distribution")
ax_violin.set_ylabel("Value")
for label in ax_violin.get_xticklabels():
    label.set_rotation(15)

fig_violin.savefig(violinplot_path, dpi=200, bbox_inches="tight")
plt.close(fig_violin)
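
The density behind each violin can be approximated by hand. The following is a rough NumPy-only sketch of a Gaussian KDE, assuming a Scott's-rule bandwidth scaled by `bw_adjust`; it is not seaborn's implementation, and `gaussian_kde_1d` is a hypothetical helper. The samples are redrawn with the same seed and parameters as above.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bw_adjust=1.0):
    """Evaluate a Gaussian KDE on `grid` using a Scott's-rule bandwidth."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n = samples.size
    # Scott's rule: bandwidth = std * n^(-1/5), then scaled by bw_adjust
    h = bw_adjust * samples.std(ddof=1) * n ** (-1 / 5)
    z = (grid[:, None] - samples[None, :]) / h
    # Average of Gaussian kernels centered on each sample
    return np.exp(-0.5 * z**2).sum(axis=1) / (n * h * np.sqrt(2 * np.pi))

rng = np.random.default_rng(42)
normal_sample = rng.normal(50, 8, 500)
bimodal_sample = np.concatenate([rng.normal(40, 6, 250), rng.normal(60, 6, 250)])

grid = np.linspace(0, 100, 501)
dx = grid[1] - grid[0]
for name, sample in [("normal", normal_sample), ("bimodal", bimodal_sample)]:
    density = gaussian_kde_1d(sample, grid, bw_adjust=0.9)
    at_40, at_50, at_60 = np.interp([40, 50, 60], grid, density)
    area = density.sum() * dx  # Riemann sum; should be close to 1
    print(f"{name}: f(40)={at_40:.4f} f(50)={at_50:.4f} f(60)={at_60:.4f} area={area:.3f}")
```

For the bimodal sample one would expect the estimated density at 50 to dip below the values at 40 and 60, while the normal sample should peak near 50. That dip is the feature the violin's outline makes visible.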

## Conclusion
- The boxplot reports the median, quartiles, and whiskers. For the bimodal dataset this shows up only as a wider box, which hides the two modes.
- The violin plot reveals the bimodal structure through its density outline, making it better suited for diagnosing multi-modal data.