# Lab | Hypothesis Testing

**Objective**

Welcome to the Hypothesis Testing Lab. In this lab we apply hypothesis tests to several datasets and interpret the results.

We cover testing the mean of a single sample (one-sample t-test), comparing two independent groups (two-sample t-test), comparing dependent samples (paired t-test), and comparing means across more than two groups (Analysis of Variance, ANOVA).

For each exercise, prepare your hypotheses, choose a suitable test, and report your conclusion at the stated significance level.
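As a quick orientation, the four tests named above map onto SciPy functions roughly as sketched below. The data here is synthetic and seeded purely for illustration (it is not taken from the lab datasets), and the group sizes and means are arbitrary assumptions.

```python
import numpy as np
import scipy.stats as st

# Synthetic, seeded data for illustration only (not from the lab datasets).
rng = np.random.default_rng(0)
before = rng.normal(loc=50, scale=10, size=30)
after = before + rng.normal(loc=2, scale=5, size=30)  # paired with `before`
group_a = rng.normal(loc=50, scale=10, size=40)
group_b = rng.normal(loc=55, scale=12, size=35)
group_c = rng.normal(loc=60, scale=10, size=45)

results = {
    # H0: the population mean of `before` equals 50
    "one-sample t-test": st.ttest_1samp(before, popmean=50),
    # H0: group_a and group_b have equal means (Welch, unequal variances)
    "two-sample Welch t-test": st.ttest_ind(group_a, group_b, equal_var=False),
    # H0: the mean of the paired differences is zero
    "paired t-test": st.ttest_rel(before, after),
    # H0: all three group means are equal
    "one-way ANOVA": st.f_oneway(group_a, group_b, group_c),
}

for name, res in results.items():
    print(f"{name}: statistic = {res.statistic:.3f}, p = {res.pvalue:.4g}")
```

Each call returns a result with `statistic` and `pvalue` attributes; the p-value is then compared with the chosen significance level.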

**Challenge 1**

In this challenge, we will work with Pokemon data, available here:

- https://raw.githubusercontent.com/data-bootcamp-v4/data/main/pokemon.csv

In [13]:
#libraries
import pandas as pd
import scipy.stats as st
import numpy as np



In [14]:
df = pd.read_csv("https://raw.githubusercontent.com/data-bootcamp-v4/data/main/pokemon.csv")
df

Unnamed: 0,Name,Type 1,Type 2,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
0,Bulbasaur,Grass,Poison,45,49,49,65,65,45,1,False
1,Ivysaur,Grass,Poison,60,62,63,80,80,60,1,False
2,Venusaur,Grass,Poison,80,82,83,100,100,80,1,False
3,Mega Venusaur,Grass,Poison,80,100,123,122,120,80,1,False
4,Charmander,Fire,,39,52,43,60,50,65,1,False
...,...,...,...,...,...,...,...,...,...,...,...
795,Diancie,Rock,Fairy,50,100,150,100,150,50,6,True
796,Mega Diancie,Rock,Fairy,50,160,110,160,110,110,6,True
797,Hoopa Confined,Psychic,Ghost,80,110,60,150,130,70,6,True
798,Hoopa Unbound,Psychic,Dark,80,160,60,170,130,80,6,True


- We posit that Dragon-type Pokemon have, on average, higher HP than Grass-type Pokemon. Choose the proper test and, at the 5% significance level, comment on your findings.

In [15]:
#code here
type_col = "Type 1" if "Type 1" in df.columns else (
    "type" if "type" in df.columns else [c for c in df.columns if "type" in c.lower()][0]
)

# Extract HP for Dragon vs Grass
dragon = df.loc[df[type_col].str.lower() == "dragon", "HP"].dropna().astype(float)
grass  = df.loc[df[type_col].str.lower() == "grass",  "HP"].dropna().astype(float)

print(f"Dragon: n={len(dragon)}, mean={dragon.mean():.2f}, sd={dragon.std(ddof=1):.2f}")
print(f"Grass : n={len(grass)},  mean={grass.mean():.2f},  sd={grass.std(ddof=1):.2f}")

# One-sided Welch's t-test: H0: mu_Dragon <= mu_Grass  vs  H1: mu_Dragon > mu_Grass
try:
    # SciPy >= 1.6 supports 'alternative' in ttest_ind
    res = st.ttest_ind(dragon, grass, equal_var=False, alternative='greater')
    tstat, p_one = res.statistic, res.pvalue
    # compute Welch DOF for CI/effect size if you want
    s1, s2 = dragon.var(ddof=1), grass.var(ddof=1)
    n1, n2 = len(dragon), len(grass)
    se = np.sqrt(s1/n1 + s2/n2)
    dof = (s1/n1 + s2/n2)**2 / ((s1**2)/((n1**2)*(n1-1)) + (s2**2)/((n2**2)*(n2-1)))
except TypeError:
    # Fallback for older SciPy: compute one-sided p manually
    s1, s2 = dragon.var(ddof=1), grass.var(ddof=1)
    n1, n2 = len(dragon), len(grass)
    se = np.sqrt(s1/n1 + s2/n2)
    tstat = (dragon.mean() - grass.mean()) / se
    dof = (s1/n1 + s2/n2)**2 / ((s1**2)/((n1**2)*(n1-1)) + (s2**2)/((n2**2)*(n2-1)))
    p_one = st.t.sf(tstat, dof)  # one-sided tail

print(f"Welch t = {tstat:.3f}, df ≈ {dof:.1f}, one-sided p = {p_one:.4g}")

# (Optional) 95% CI for the mean difference (two-sided)
diff = dragon.mean() - grass.mean()
tcrit = st.t.ppf(0.975, dof)
ci_low, ci_high = diff - tcrit*se, diff + tcrit*se
print(f"Mean difference (Dragon - Grass) = {diff:.2f}  [95% CI: {ci_low:.2f}, {ci_high:.2f}]")

# (Optional) Effect size (Hedges' g)
sp2 = ((n1-1)*s1 + (n2-1)*s2) / (n1+n2-2)            # pooled variance
d = diff / np.sqrt(sp2)                               # Cohen's d
J = 1 - 3/(4*(n1+n2)-9)                               # small-sample correction
g = J * d
print(f"Hedges' g = {g:.3f}")


Dragon: n=32, mean=83.31, sd=23.80
Grass : n=70,  mean=67.27,  sd=19.52
Welch t = 3.335, df ≈ 50.8, one-sided p = 0.0007994
Mean difference (Dragon - Grass) = 16.04  [95% CI: 6.38, 25.70]
Hedges' g = 0.760
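The printed summary statistics are enough to re-derive the test and state the decision explicitly. The sketch below is a minimal check, assuming the rounded means, standard deviations, and sample sizes from the output above; the helper name `welch_from_summary` is hypothetical and not part of the lab.

```python
import numpy as np
import scipy.stats as st

def welch_from_summary(m1, s1, n1, m2, s2, n2):
    """One-sided Welch t-test (H1: mean1 > mean2) from summary statistics."""
    v1, v2 = s1**2 / n1, s2**2 / n2          # squared standard errors per group
    se = np.sqrt(v1 + v2)                    # standard error of the difference
    t = (m1 - m2) / se
    # Welch-Satterthwaite degrees of freedom
    dof = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    p_one = st.t.sf(t, dof)                  # upper-tail probability
    return t, dof, p_one

alpha = 0.05
# Rounded values copied from the printed output: Dragon first, Grass second.
t, dof, p_one = welch_from_summary(83.31, 23.80, 32, 67.27, 19.52, 70)
decision = "reject H0" if p_one < alpha else "fail to reject H0"
print(f"t = {t:.3f}, df = {dof:.1f}, one-sided p = {p_one:.4g} -> {decision}")
```

Because the one-sided p-value is far below 0.05, H0 (mean Dragon HP <= mean Grass HP) is rejected: the data support the claim that Dragon-type Pokemon have higher average HP, with a mean difference of about 16 HP.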


- We posit that Legendary Pokemon have different stats (HP, Attack, Defense, Sp. Atk, Sp. Def, Speed) compared with non-Legendary Pokemon. Choose the proper test and, at the 5% significance level, comment on your findings.


In [16]:
#code here
# Columns
flag_col = "Legendary" if "Legendary" in df.columns else [c for c in df.columns if "legend" in c.lower()][0]
stats_cols = ["HP","Attack","Defense","Sp. Atk","Sp. Def","Speed"]
stats_cols = [c for c in stats_cols if c in df.columns]  # keep only those present

#  Split groups
leg = df[df[flag_col] == True][stats_cols].astype(float)
non = df[df[flag_col] == False][stats_cols].astype(float)

# Per-stat Welch t-tests (two-sided) + effect size
rows = []
for col in stats_cols:
    x, y = leg[col].dropna().values, non[col].dropna().values
    res = st.ttest_ind(x, y, equal_var=False)  # two-sided
    # Hedges' g
    n1, n2 = len(x), len(y)
    s1, s2 = np.var(x, ddof=1), np.var(y, ddof=1)
    sp2 = ((n1-1)*s1 + (n2-1)*s2) / (n1+n2-2)
    d = (x.mean() - y.mean()) / np.sqrt(sp2)
    J = 1 - 3/(4*(n1+n2)-9)  # small-sample correction
    g = J*d
    rows.append({
        "stat": col,
        "mean_legendary": x.mean(),
        "mean_nonlegend": y.mean(),
        "diff(L-N)": x.mean()-y.mean(),
        "t": res.statistic,
        "p_raw": res.pvalue,
        "hedges_g": g
    })

out = pd.DataFrame(rows)

#  Multiple-comparison correction (Holm)
try:
    from statsmodels.stats.multitest import multipletests
    rej, p_holm, _, _ = multipletests(out["p_raw"], method="holm")
    out["p_holm"] = p_holm
    out["reject_0.05"] = rej
except ImportError:
    # statsmodels not installed: simple Holm fallback
    order = np.argsort(out["p_raw"].values)
    m = len(out)
    holm = np.full(m, np.nan)
    for rank, idx in enumerate(order, start=1):
        holm[idx] = (m - rank + 1) * out.loc[idx, "p_raw"]
    # monotone adjustment
    for i in range(1, m):
        holm[order[i]] = max(holm[order[i]], holm[order[i-1]])
    out["p_holm"] = np.clip(holm, 0, 1)
    out["reject_0.05"] = out["p_holm"] < 0.05

# Nicely sorted table
print(out.sort_values("p_holm")[["stat","mean_legendary","mean_nonlegend","diff(L-N)","hedges_g","p_raw","p_holm","reject_0.05"]]
        .to_string(index=False))

#  (Optional) Global multivariate test (MANOVA)
try:
    from statsmodels.multivariate.manova import MANOVA
    # make a temporary df with clean column names for formula
    tmp = df[[flag_col]+stats_cols].dropna().copy()
    tmp = tmp.rename(columns={flag_col:"LegendaryFlag",
                              "Sp. Atk":"Sp_Atk","Sp. Def":"Sp_Def"})
    formula = "HP + Attack + Defense + Sp_Atk + Sp_Def + Speed ~ C(LegendaryFlag)"
    manova = MANOVA.from_formula(formula, data=tmp)
    print("\nMANOVA (Pillai’s trace):")
    print(manova.mv_test())  # look at Pillai/ Wilks p-values
except Exception as e:
    # Report the actual cause (missing statsmodels, unexpected column names, ...)
    print(f"\nMANOVA skipped ({type(e).__name__}: {e}).")


   stat  mean_legendary  mean_nonlegend  diff(L-N)  hedges_g        p_raw       p_holm  reject_0.05
Sp. Atk      122.184615       68.454422  53.730194  1.834685 1.551461e-21 9.308768e-21         True
  Speed      100.184615       65.455782  34.728833  1.262463 1.049016e-18 5.245082e-18         True
 Attack      116.676923       75.669388  41.007535  1.344181 2.520372e-16 1.008149e-15         True
Sp. Def      105.938462       68.892517  37.045945  1.426976 2.294933e-15 6.884798e-15         True
     HP       92.738462       67.182313  25.556149  1.038922 1.002691e-13 2.005382e-13         True
Defense       99.661538       71.559184  28.102355  0.928401 4.826998e-11 4.826998e-11         True

MANOVA (Pillai’s trace):
                   Multivariate linear model
                                                                
----------------------------------------------------------------
       Intercept         Value  Num DF  Den DF   F Value  Pr > F
----------------------------------
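The Holm-adjusted column in the table above can be checked by hand. The following is a minimal pure-Python sketch of the Holm step-down adjustment, applied to the raw p-values copied from the printed table; the function name `holm_adjust` is hypothetical.

```python
def holm_adjust(pvalues):
    """Return Holm step-down adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p is multiplied by m, the next by m - 1, and so on.
        candidate = (m - rank) * pvalues[idx]
        # Enforce monotonicity, then cap at 1.
        running_max = max(running_max, candidate)
        adjusted[idx] = min(1.0, running_max)
    return adjusted

# Raw p-values copied from the printed table.
p_raw = {
    "Sp. Atk": 1.551461e-21,
    "Speed": 1.049016e-18,
    "Attack": 2.520372e-16,
    "Sp. Def": 2.294933e-15,
    "HP": 1.002691e-13,
    "Defense": 4.826998e-11,
}
p_holm = dict(zip(p_raw, holm_adjust(list(p_raw.values()))))

for stat, p in p_holm.items():
    print(f"{stat:8s} p_holm = {p:.6e}  reject at 0.05: {p < 0.05}")
```

All six adjusted p-values stay far below 0.05, so the null hypothesis of equal means is rejected for every stat: Legendary Pokemon differ from non-Legendary Pokemon on each of the six stats, with higher means throughout.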

**Challenge 2**

In this challenge, we will work with California housing data, available here:
- https://raw.githubusercontent.com/data-bootcamp-v4/data/main/california_housing.csv

In [17]:
df = pd.read_csv("https://raw.githubusercontent.com/data-bootcamp-v4/data/main/california_housing.csv")
df.head()

Unnamed: 0,longitude,latitude,housing_median_age,total_rooms,total_bedrooms,population,households,median_income,median_house_value
0,-114.31,34.19,15.0,5612.0,1283.0,1015.0,472.0,1.4936,66900.0
1,-114.47,34.4,19.0,7650.0,1901.0,1129.0,463.0,1.82,80100.0
2,-114.56,33.69,17.0,720.0,174.0,333.0,117.0,1.6509,85700.0
3,-114.57,33.64,14.0,1501.0,337.0,515.0,226.0,3.1917,73400.0
4,-114.57,33.57,20.0,1454.0,326.0,624.0,262.0,1.925,65500.0


**We posit that houses close to either a school or a hospital are more expensive.**

- School coordinates (-118, 34)
- Hospital coordinates (-122, 37)

We consider a house (neighborhood) to be close to a school or hospital if its Euclidean distance to that point, measured in degrees of longitude/latitude, is less than 0.50.

Hint:
- Write a function to calculate the Euclidean distance from each house (neighborhood) to the school and to the hospital.
- Divide your dataset into houses that are close to either the hospital or the school and houses that are far from both.
- Choose the proper test and, at the 5% significance level, comment on your findings.
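The closeness rule from the hint can be illustrated on a handful of points before applying it to the whole dataset. This is a small standard-library sketch, assuming the five coordinate pairs shown in the `df.head()` output above; the helper `is_close` and the extra test point are hypothetical.

```python
import math

SCHOOL = (-118.0, 34.0)      # (longitude, latitude)
HOSPITAL = (-122.0, 37.0)
THRESHOLD = 0.50             # in degrees, per the instructions

def is_close(lon, lat, threshold=THRESHOLD):
    """True if the point is within `threshold` of the school or the hospital."""
    d_school = math.hypot(lon - SCHOOL[0], lat - SCHOOL[1])
    d_hospital = math.hypot(lon - HOSPITAL[0], lat - HOSPITAL[1])
    return min(d_school, d_hospital) < threshold

# (longitude, latitude) of the first five rows shown above.
head_rows = [
    (-114.31, 34.19),
    (-114.47, 34.40),
    (-114.56, 33.69),
    (-114.57, 33.64),
    (-114.57, 33.57),
]
labels = [is_close(lon, lat) for lon, lat in head_rows]
print(labels)

# Hypothetical point near the school, for contrast.
near_school = is_close(-118.2, 34.1)
print(near_school)
```

The first five rows lie more than three degrees east of the school, so none of them is labelled close, while the hypothetical point about 0.22 degrees from the school is.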
 

In [18]:
# 0) Load
df = pd.read_csv("https://raw.githubusercontent.com/data-bootcamp-v4/data/main/california_housing.csv")

# 1) Columns
lon_col = next(c for c in df.columns if "long" in c.lower())     # 'longitude'
lat_col = next(c for c in df.columns if "lat"  in c.lower())     # 'latitude'
price_col = next(c for c in df.columns if "median_house_value" in c.lower())

# 2) Distances (euclidean on lon/lat degrees, as hinted)
school = (-118.0, 34.0)
hospital = (-122.0, 37.0)

def euclid_xy(x1, y1, x2, y2):
    return np.sqrt((x1 - x2)**2 + (y1 - y2)**2)

df["dist_school"]   = euclid_xy(df[lon_col], df[lat_col], school[0],   school[1])
df["dist_hospital"] = euclid_xy(df[lon_col], df[lat_col], hospital[0], hospital[1])
df["dist_min"]      = df[["dist_school","dist_hospital"]].min(axis=1)

# 3) Close vs Far
threshold = 0.50  # per instructions
df["close"] = df["dist_min"] < threshold

# 4) Prepare groups
close_vals = df.loc[df["close"], price_col].dropna().astype(float)
far_vals   = df.loc[~df["close"], price_col].dropna().astype(float)

print(f"Close: n={len(close_vals)}, mean=${close_vals.mean():,.0f}")
print(f"Far  : n={len(far_vals)},   mean=${far_vals.mean():,.0f}")

# 5) Hypothesis test (one-sided Welch t-test)
# H0: mu_close <= mu_far  vs  H1: mu_close > mu_far
try:
    # SciPy >= 1.6 supports 'alternative' in ttest_ind
    res = st.ttest_ind(close_vals, far_vals, equal_var=False, alternative="greater")
    tstat, p_one = res.statistic, res.pvalue
except TypeError:
    # Older SciPy: derive the one-sided p-value from the two-sided Welch result
    res = st.ttest_ind(close_vals, far_vals, equal_var=False)
    tstat = res.statistic
    p_one = res.pvalue / 2 if tstat > 0 else 1 - res.pvalue / 2

# Welch SE & dof (for reporting)
s1, s2 = close_vals.var(ddof=1), far_vals.var(ddof=1)
n1, n2 = len(close_vals), len(far_vals)
se = np.sqrt(s1/n1 + s2/n2)
dof = (s1/n1 + s2/n2)**2 / ((s1**2)/((n1**2)*(n1-1)) + (s2**2)/((n2**2)*(n2-1)))

diff = close_vals.mean() - far_vals.mean()
print(f"Welch t = {tstat:.3f}, df ≈ {dof:.1f}, one-sided p = {p_one:.4g}, diff = ${diff:,.0f}")

# 6) Effect size (Hedges' g)
sp2 = ((n1-1)*s1 + (n2-1)*s2) / (n1+n2-2)
d = diff / np.sqrt(sp2)
J = 1 - 3/(4*(n1+n2)-9)
g = J*d
print(f"Hedges' g = {g:.3f}")

# 7) (Optional) Robustness: one-sided Mann–Whitney U (close > far)
try:
    u_stat, p_mwu = st.mannwhitneyu(close_vals, far_vals, alternative="greater")
    print(f"Mann–Whitney U one-sided p = {p_mwu:.4g}")
except TypeError:
    pass


Close: n=6829, mean=$246,952
Far  : n=10171,   mean=$180,678
Welch t = 37.992, df ≈ 14571.2, one-sided p = 1.503e-301, diff = $66,274
Hedges' g = 0.595
Mann–Whitney U one-sided p = 0
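Both p-values are far below 0.05, so H0 (mean price close <= mean price far) is rejected. The Mann–Whitney p-value printed as exactly 0 is most likely floating-point underflow rather than a true zero. The sketch below shows one way to report such tiny tail probabilities on a log scale; the t statistic and degrees of freedom are the rounded values from the output above, while the z-score of 40 is a hypothetical stand-in, since the lab output does not show the actual Mann–Whitney z.

```python
import numpy as np
import scipy.stats as st

# Rounded values copied from the printed Welch result.
t_reported, dof_reported = 37.992, 14571.2
log_p_t = st.t.logsf(t_reported, dof_reported)   # natural log of the upper tail
log10_p_t = log_p_t / np.log(10)
print(f"Welch one-sided p is about 10^{log10_p_t:.1f}")

# Hypothetical z-score, only to illustrate underflow in a normal tail.
z_hypothetical = 40.0
p_direct = st.norm.sf(z_hypothetical)            # too small for a float
log_p = st.norm.logsf(z_hypothetical)            # still finite on the log scale
print(f"norm.sf({z_hypothetical}) = {p_direct}, log p = {log_p:.1f}")
```

Reporting such results as "p < 1e-300" (or on a log scale) avoids suggesting that the probability is literally zero.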
