# Frequentist Inference Case Study - Part B

Author: Joey Carroll. Real-world application using `insurance2.csv`. Question: do smokers have higher average medical charges than non-smokers?

In [None]:
import pandas as pd, numpy as np
from scipy.stats import t, norm
import matplotlib.pyplot as plt

df = pd.read_csv(r'/mnt/data/frequentist_part_a_src/insurance2.csv')
df.head()

## Two-sample Welch t-test
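
For reference, the quantities computed in the cells below can be written out as follows. The notation is an assumption chosen to match the variable names in the code: subscript 1 is the smoker group (`m1`, `s1`, `n1`), subscript 0 is the non-smoker group (`m0`, `s0`, `n0`), and $\nu$ is the Welch-Satterthwaite degrees of freedom (`df_w`). The hypotheses are the one-sided pair implied by the question:

$$
H_0: \mu_1 \le \mu_0 \qquad \text{vs.} \qquad H_1: \mu_1 > \mu_0
$$

$$
t = \frac{\bar{x}_1 - \bar{x}_0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_0^2}{n_0}}},
\qquad
\nu = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_0^2}{n_0}\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_0^2/n_0\right)^2}{n_0 - 1}}
$$

The one-sided p-value is the upper-tail probability $P(T_\nu \ge t)$, where $T_\nu$ follows a Student t distribution with $\nu$ degrees of freedom.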

In [None]:
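# Normalize the smoker labels once, then split charges into smokers (g1) and non-smokers (g0)
smoker = df['smoker'].astype(str).str.strip().str.lower()
g1 = df.loc[smoker.isin(['yes', 'y', '1', 'true']), 'charges'].astype(float)
g0 = df.loc[smoker.isin(['no', 'n', '0', 'false']), 'charges'].astype(float)
# Group sizes and sample means
len(g1), len(g0), float(np.mean(g1)), float(np.mean(g0))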

In [None]:
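# Sample mean, sample standard deviation (ddof=1), and size for each group
m1, s1, n1 = float(np.mean(g1)), float(np.std(g1, ddof=1)), len(g1)
m0, s0, n0 = float(np.mean(g0)), float(np.std(g0, ddof=1)), len(g0)
# Welch standard error of the difference in means (no equal-variance assumption)
se = np.sqrt(s1**2/n1 + s0**2/n0)
t_stat = (m1 - m0) / se
# Welch-Satterthwaite approximate degrees of freedom
df_w = (s1**2/n1 + s0**2/n0)**2 / ((s1**2/n1)**2/(n1-1) + (s0**2/n0)**2/(n0-1))
# One-sided p-value for H1: mean(smokers) > mean(non-smokers).
# t.sf is the survival function; it keeps precision in the far upper tail,
# where 1 - t.cdf(...) rounds to 0.
p_val = t.sf(t_stat, df_w)
(t_stat, df_w, p_val)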
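
As a sanity check on the hand-rolled calculation, the same one-sided Welch test can be compared against `scipy.stats.ttest_ind` with `equal_var=False` and `alternative='greater'` (the `alternative` argument requires SciPy 1.6 or later). The sketch below is self-contained and runs on synthetic stand-in data, not the insurance data; the helper `welch_one_sided` and the `*_demo` arrays are hypothetical names introduced only for illustration.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in samples with unequal variances and sizes.
# These are NOT the insurance data; they only exercise the formulas.
rng = np.random.default_rng(0)
x_demo = rng.normal(loc=10.5, scale=3.0, size=40)
y_demo = rng.normal(loc=10.0, scale=1.0, size=120)


def welch_one_sided(x, y):
    """Manual Welch test of H1: mean(x) > mean(y). Returns (t, dof, p)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    vx = x.var(ddof=1) / len(x)  # squared standard error of mean(x)
    vy = y.var(ddof=1) / len(y)  # squared standard error of mean(y)
    t_stat = (x.mean() - y.mean()) / np.sqrt(vx + vy)
    # Welch-Satterthwaite degrees of freedom
    dof = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    p = stats.t.sf(t_stat, dof)  # upper-tail probability
    return t_stat, dof, p


t_manual, dof_manual, p_manual = welch_one_sided(x_demo, y_demo)

# Library version of the same test
res = stats.ttest_ind(x_demo, y_demo, equal_var=False, alternative='greater')

print("manual:", t_manual, dof_manual, p_manual)
print("scipy: ", res.statistic, res.pvalue)
```

On the real data, the equivalent call would be `stats.ttest_ind(g1, g0, equal_var=False, alternative='greater')`, and its statistic and p-value should agree with `t_stat` and `p_val` up to floating-point error.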

In [None]:
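# Two-sided 95% confidence interval for the difference in means (smokers - non-smokers)
crit = t.ppf(0.975, df_w)  # critical value leaving 2.5% in each tail
ci_low = (m1-m0) - crit*se
ci_high = (m1-m0) + crit*se
(ci_low, ci_high)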
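
The interval above is two-sided, while the test is one-sided. If an interval that matches the one-sided test exactly is wanted, one option is a one-sided lower confidence bound, which uses the 0.95 quantile instead of the 0.975 quantile. The sketch below is a minimal illustration under that assumption; `welch_lower_bound` and the synthetic `*_demo` arrays are hypothetical names and stand-in data, not part of the original analysis.

```python
import numpy as np
from scipy import stats


def welch_lower_bound(x, y, alpha=0.05):
    """One-sided (1 - alpha) lower confidence bound for mean(x) - mean(y)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    vx = x.var(ddof=1) / len(x)
    vy = y.var(ddof=1) / len(y)
    se = np.sqrt(vx + vy)
    # Welch-Satterthwaite degrees of freedom
    dof = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    # All of alpha goes into one tail, so the quantile is 1 - alpha
    return (x.mean() - y.mean()) - stats.t.ppf(1 - alpha, dof) * se


# Synthetic stand-in samples (NOT the insurance data)
rng = np.random.default_rng(1)
x_demo = rng.normal(loc=10.5, scale=3.0, size=40)
y_demo = rng.normal(loc=10.0, scale=1.0, size=120)

lower = welch_lower_bound(x_demo, y_demo, alpha=0.05)
print("95% one-sided lower bound for the difference:", lower)
```

The lower bound is above 0 exactly when the one-sided Welch p-value is below `alpha`, so the bound and the test always lead to the same decision.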

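**Interpretation (student voice):** The null hypothesis is that smokers' mean charges are no higher than non-smokers'. If the one-sided p-value is below 0.05, I reject it at the 5% level and conclude that smokers have higher average charges. A two-sided 95% CI for the difference (smokers minus non-smokers) that lies entirely above 0 points the same way, and it is the stricter of the two checks, since it corresponds to a one-sided test at the 2.5% level.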