# Boxplot versus violin plot

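We generate two datasets of 600 points each: a normal distribution (mean 50, standard deviation 12) and a bimodal distribution, an equal mixture of two normals centred at 40 and 60 (standard deviation 6 each). The normal's standard deviation is chosen to roughly match the overall spread of the mixture (√(6² + 10²) ≈ 11.7), so the two boxplots look similar, while the violin plot still reveals the bimodality.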

## Setup
- Import libraries
- Set random seed
- Create output folder and define file paths
- Configure seaborn style

In [1]:
import os
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Reproducibility
rng = np.random.default_rng(0)

# Output paths
out_dir = "results"
os.makedirs(out_dir, exist_ok=True)
data_csv = os.path.join(out_dir, "box_vs_violin_data.csv")
box_png = os.path.join(out_dir, "boxplot.png")
violin_png = os.path.join(out_dir, "violinplot.png")

# Plot style
sns.set_theme(style="whitegrid", context="notebook")

## Generate synthetic data
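- Normal: mean 50, std 12, chosen to roughly match the overall std of the bimodal mixture (≈ 11.7) so the two boxes look similar
- Bimodal: equal mixture of N(40, 6) and N(60, 6), where 6 is the standard deviation of each component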
- Combine into a tidy DataFrame and save to CSV for inspection

In [2]:
n = 600
normal_vals = rng.normal(loc=50, scale=12, size=n)
bimodal_vals = np.concatenate([
    rng.normal(loc=40, scale=6, size=n // 2),
    rng.normal(loc=60, scale=6, size=n // 2)
])

values = np.concatenate([normal_vals, bimodal_vals])
groups = ["Normal (μ=50)"] * n + ["Bimodal (40,60)"] * n
df = pd.DataFrame({"group": groups, "value": values})

# Save data for reuse
df.to_csv(data_csv, index=False)
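
The choice of std 12 for the normal group can be checked against theory before looking at any samples. The following is a minimal sketch, assuming the parameters from the cell above; `mixture_cdf` and `mixture_quantile` are hypothetical helpers introduced for illustration and are not part of the notebook.

```python
from statistics import NormalDist

# Parameters copied from the data-generation cell.
normal = NormalDist(mu=50, sigma=12)
low, high = NormalDist(mu=40, sigma=6), NormalDist(mu=60, sigma=6)

def mixture_cdf(x):
    """CDF of the equal-weight mixture of the two components."""
    return 0.5 * low.cdf(x) + 0.5 * high.cdf(x)

def mixture_quantile(p, lo=0.0, hi=100.0, tol=1e-9):
    """Invert the mixture CDF by bisection (the CDF is monotone)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mixture_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Mixture variance = within-component variance (6^2)
# + variance of the component means around 50 (10^2).
mixture_std = (6**2 + 10**2) ** 0.5

normal_iqr = normal.inv_cdf(0.75) - normal.inv_cdf(0.25)
mixture_iqr = mixture_quantile(0.75) - mixture_quantile(0.25)

print(f"std: normal = 12.00, mixture = {mixture_std:.2f}")
print(f"IQR: normal = {normal_iqr:.2f}, mixture = {mixture_iqr:.2f}")
```

In theory the two standard deviations nearly coincide (12 versus about 11.7), while the interquartile ranges are about 16.2 and 20.0. The bimodal box should therefore be somewhat taller than the normal one, but close enough that the boxplot alone does not suggest a different shape.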

## Summaries (for comparing box heights, i.e. IQRs)
- Compute mean, std, and quartiles per group
- Save to CSV so we can inspect without printing in the notebook

In [3]:
summary = df.groupby("group")["value"].agg(
    mean="mean",
    std="std",
    q1=lambda s: s.quantile(0.25),
    median="median",
    q3=lambda s: s.quantile(0.75)
)
summary.to_csv(os.path.join(out_dir, "summary.csv"))
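
The height of each box is the interquartile range (q3 minus q1), which can be derived directly from the per-group quartiles. The sketch below is self-contained: it regenerates the data with the same seed and draw order as the cells above instead of reading the saved CSV. The `quartiles` frame is a name introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the data exactly as in the generation cell (same seed, same draw order).
rng = np.random.default_rng(0)
n = 600
df = pd.DataFrame({
    "group": ["Normal (μ=50)"] * n + ["Bimodal (40,60)"] * n,
    "value": np.concatenate([
        rng.normal(loc=50, scale=12, size=n),
        rng.normal(loc=40, scale=6, size=n // 2),
        rng.normal(loc=60, scale=6, size=n // 2),
    ]),
})

# One row per group, one column per quartile.
quartiles = df.groupby("group")["value"].quantile([0.25, 0.75]).unstack()
quartiles.columns = ["q1", "q3"]

# The IQR is the height of the box drawn by the boxplot.
quartiles["iqr"] = quartiles["q3"] - quartiles["q1"]
print(quartiles.round(2))
```

Comparing the `iqr` column across groups gives a numeric counterpart to the visual impression that the two boxes are of similar size.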

## Boxplot
A boxplot summarizes each group by its median, quartiles, and whiskers. Because the normal data's std roughly matches the overall spread of the mixture, the two boxes look broadly similar even though the underlying shapes differ. We save the plot to disk.

In [4]:
fig_box, ax_box = plt.subplots(figsize=(6, 4), constrained_layout=True)
sns.boxplot(data=df, x="group", y="value", ax=ax_box)
ax_box.set_title("Boxplot: Normal vs Bimodal")
ax_box.set_xlabel("Distribution")
ax_box.set_ylabel("Value")
ax_box.tick_params(axis="x", labelrotation=10)
fig_box.savefig(box_png, dpi=200, bbox_inches="tight")
plt.close(fig_box)

## Violin plot
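A violin plot draws a kernel density estimate of each group, so the two peaks of the bimodal distribution (near 40 and 60) are visible even when the boxplots look similar. The inner lines mark the quartiles (`inner="quartile"`), and `cut=0` limits each violin to the observed data range. We save the plot to disk.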

In [5]:
fig_violin, ax_violin = plt.subplots(figsize=(6, 4), constrained_layout=True)
sns.violinplot(data=df, x="group", y="value", inner="quartile", cut=0, bw_adjust=0.9, ax=ax_violin)
ax_violin.set_title("Violin plot: Normal vs Bimodal")
ax_violin.set_xlabel("Distribution")
ax_violin.set_ylabel("Value")
ax_violin.tick_params(axis="x", labelrotation=10)
fig_violin.savefig(violin_png, dpi=200, bbox_inches="tight")
plt.close(fig_violin)
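
The outline of a violin is a kernel density estimate, so the bimodality can also be checked numerically. The sketch below hand-rolls a Gaussian KDE in NumPy and evaluates it at 40, 50, and 60. The helper `gaussian_kde_at` and the rule-of-thumb bandwidth are assumptions made for illustration; they are not necessarily what seaborn uses internally, and the data are regenerated with the same seed so the block runs on its own.

```python
import numpy as np

def gaussian_kde_at(samples, points, bandwidth):
    """Evaluate a Gaussian kernel density estimate at the given points."""
    samples = np.asarray(samples, dtype=float)
    points = np.asarray(points, dtype=float)
    # Standardized distance from every evaluation point to every sample.
    z = (points[:, None] - samples[None, :]) / bandwidth
    kernel = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return kernel.mean(axis=1) / bandwidth

# Rebuild the data as in the generation cell (same seed, same draw order).
rng = np.random.default_rng(0)
n = 600
normal_vals = rng.normal(loc=50, scale=12, size=n)
bimodal_vals = np.concatenate([
    rng.normal(loc=40, scale=6, size=n // 2),
    rng.normal(loc=60, scale=6, size=n // 2),
])

points = np.array([40.0, 50.0, 60.0])
densities = {}
for name, vals in [("normal", normal_vals), ("bimodal", bimodal_vals)]:
    # Rule-of-thumb bandwidth (an assumption, chosen for simplicity).
    h = 0.9 * vals.std(ddof=1) * len(vals) ** (-0.2)
    densities[name] = gaussian_kde_at(vals, points, h)
    print(name, np.round(densities[name], 4))
```

For the bimodal group the estimated density at 50 should be lower than at 40 and 60, which is the dip between the two lobes of the violin. A boxplot has no element that can express this dip.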

## Conclusion
- With the normal std (12) close to the overall std of the mixture (≈ 11.7), the two boxplots show nearly the same median and a similar spread.
- The violin plot still reveals the bimodal structure, illustrating why violins can be more informative for multi-modal data.