# Likelihood, Fisher's Exact Test, and Two-Proportion Z-Test for M7 and M8
This notebook computes:
- **Likelihood %** (Treatment success rate ÷ Control success rate × 100, i.e. the relative success rate expressed as a percentage)
- **Fisher's Exact Test** (exact p-value)
- **Two-Proportion Z-Test** (approximate p-value)

All three are computed from the dataset's **success/failure counts** in the Control and Treatment groups, stratified by experience level (≤ 3.5 vs. > 3.5).
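
Written out, the quantities listed above take the following form. The symbols are notation introduced here for illustration (they do not appear in the dataset): $x_T, n_T$ are the Treatment successes and total, $x_C, n_C$ the Control successes and total, and $\Phi$ is the standard normal CDF. The z-test shown is the pooled-variance version, which matches the computation in the code cell.

$$
\hat p_T = \frac{x_T}{n_T}, \qquad
\hat p_C = \frac{x_C}{n_C}, \qquad
\text{Likelihood \%} = 100 \times \frac{\hat p_T}{\hat p_C}
$$

$$
\hat p = \frac{x_T + x_C}{n_T + n_C}, \qquad
z = \frac{\hat p_T - \hat p_C}{\sqrt{\hat p\,(1 - \hat p)\left(\frac{1}{n_T} + \frac{1}{n_C}\right)}}, \qquad
p_z = 2\,\bigl(1 - \Phi(|z|)\bigr)
$$

For example, with 5 successes out of 6 in Treatment and 2 out of 7 in Control (the M7 low-experience counts in the results table), the Likelihood % is $100 \times \frac{5/6}{2/7} \approx 291.67$.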

In [1]:
import pandas as pd
import numpy as np
from scipy.stats import fisher_exact, norm

# --- PARAMETERS ---
CSV_PATH = "./data/Milestones - timesStandard-prod-Final.csv"  # adjust if needed
EXPERIENCE_THRESHOLD = 3.5

# --- LOAD DATA ---
df = pd.read_csv(CSV_PATH)
df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]

# Build experience column (Control uses control_yoe, Treatment uses alt_yoe)
df['experience'] = np.where(df['group'].str.lower() == 'control', df['control_yoe'], df['alt_yoe'])
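# Assign each row to an experience bucket; rows with missing experience stay NaN
def bucket_experience(x):
    if pd.isna(x):
        return np.nan
    return 'Low (≤3.5)' if x <= EXPERIENCE_THRESHOLD else 'High (>3.5)'

df['experience_bucket'] = df['experience'].apply(bucket_experience)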

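# Helper: count successes and total rows for one group/bucket combination
def get_counts(flag_col, group, bucket):
    # Match the group case-insensitively, consistent with the experience column above
    sub = df[(df['group'].str.lower() == group.lower()) & (df['experience_bucket'] == bucket)]
    n_total = len(sub)
    # A row counts as a success when the flag column equals 'y'
    n_success = (sub[flag_col] == 'y').sum()
    return n_success, n_total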

# Comparisons for M7 and M8
comparisons = [
    ("M7 Low exp", "sound_solution", "Low (≤3.5)"),
    ("M7 High exp", "sound_solution", "High (>3.5)"),
    ("M8 Low exp", "correct", "Low (≤3.5)"),
    ("M8 High exp", "correct", "High (>3.5)")
]

# Results list
results = []
for label, flag_col, bucket in comparisons:
    succ_t, total_t = get_counts(flag_col, "Treatment", bucket)
    succ_c, total_c = get_counts(flag_col, "Control", bucket)
    
    # Fisher's exact test
    odds, p_fisher = fisher_exact([[succ_t, total_t - succ_t], [succ_c, total_c - succ_c]])
    
    # Two-proportion z-test
    p1 = succ_t / total_t if total_t > 0 else np.nan
    p2 = succ_c / total_c if total_c > 0 else np.nan
    p_pool = (succ_t + succ_c) / (total_t + total_c) if (total_t + total_c) > 0 else np.nan
    se = np.sqrt(p_pool * (1 - p_pool) * (1 / total_t + 1 / total_c)) if total_t > 0 and total_c > 0 else np.nan
    z = (p1 - p2) / se if se != 0 else np.nan
    # Two-sided p-value; norm.sf (survival function) is more accurate than 1 - cdf in the tails
    p_z = 2 * norm.sf(abs(z)) if not np.isnan(z) else np.nan
    
    results.append({
        "Comparison": label,
        "Treatment successes": succ_t, "Treatment total": total_t,
        "Control successes": succ_c, "Control total": total_c,
        "Likelihood %": (p1 / p2) * 100 if p2 > 0 else np.nan,
        "Fisher odds ratio": odds, "Fisher p-value": p_fisher,
        "Z-test p-value": p_z
    })

results_df = pd.DataFrame(results)
results_df.round(4)

| | Comparison | Treatment successes | Treatment total | Control successes | Control total | Likelihood % | Fisher odds ratio | Fisher p-value | Z-test p-value |
|---|---|---|---|---|---|---|---|---|---|
| 0 | M7 Low exp | 5 | 6 | 2 | 7 | 291.6667 | 12.5 | 0.1026 | 0.0483 |
| 1 | M7 High exp | 5 | 7 | 3 | 6 | 142.8571 | 2.5 | 0.5921 | 0.4285 |
| 2 | M8 Low exp | 2 | 6 | 1 | 7 | 233.3333 | 3.0 | 0.5594 | 0.4164 |
| 3 | M8 High exp | 3 | 7 | 2 | 6 | 128.5714 | 1.5 | 1.0 | 0.7249 |


## Notes:
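- **Likelihood %**: ratio of the treatment success rate to the control success rate × 100. Values above 100 mean the treatment group succeeded more often than the control group.
- **Fisher's exact test**: preferred for small samples. Its p-value is computed exactly from the hypergeometric distribution rather than from an approximation. The reported odds ratio is the sample odds ratio of the 2×2 table.
- **Two-proportion z-test**: relies on a normal approximation that is unreliable at these sample sizes (6 to 7 per group). In every comparison above it gives a smaller p-value than Fisher's test; for M7 Low exp it falls below 0.05 (0.0483) while Fisher's p-value does not (0.1026).
- If the p-values disagree, trust Fisher's more for small n.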
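
The disagreement between the two tests can be checked on a single row without the CSV. The following is a minimal sketch, assuming only the M7 low-experience counts from the results table (5/6 Treatment, 2/7 Control). The expected-count check and the "below about 5" rule of thumb are additions for illustration, not part of the original analysis.

```python
import numpy as np
from scipy.stats import fisher_exact, norm

# Counts taken from the "M7 Low exp" row of the results table.
succ_t, total_t = 5, 6   # Treatment
succ_c, total_c = 2, 7   # Control

# 2x2 table: rows = groups, columns = (successes, failures)
table = np.array([[succ_t, total_t - succ_t],
                  [succ_c, total_c - succ_c]])

# Expected cell counts under independence: row total * column total / N.
# A common rule of thumb treats the normal approximation as shaky
# when any expected count is below about 5.
expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / table.sum()

# Fisher's exact test (two-sided by default)
_, p_fisher = fisher_exact(table)

# Pooled two-proportion z-test, same formula as the main cell
p_pool = (succ_t + succ_c) / (total_t + total_c)
se = np.sqrt(p_pool * (1 - p_pool) * (1 / total_t + 1 / total_c))
z = (succ_t / total_t - succ_c / total_c) / se
p_z = 2 * norm.sf(abs(z))

print("min expected count:", round(float(expected.min()), 2))
print(f"Fisher p = {p_fisher:.4f}, z-test p = {p_z:.4f}")
```

The two p-values should match the table row (0.1026 and 0.0483), and the smallest expected count is well under 5, which is why the z-test result on its own should not be read as significant here.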