# Success Rates and Mean Times by Experience Threshold (3.5)
This notebook verifies the **success rate** and **mean time** for milestones **M7 (solution)** and **M8 (implement)** by **group** (Control vs Treatment) and by **experience level**.

### Definitions
- **Control experience column**: `Control YOE` (used only for rows in the Control group)
- **Treatment experience column**: `Alt YOE` (used only for rows in the Treatment group)
- **Experience threshold**: Low (≤ 3.5 years), High (> 3.5 years); a value of exactly 3.5 counts as Low
- **M7 success**: `sound_solution == 'y'`
- **M8 success**: `correct == 'y'`
- **M7 time**: `solution` (minutes)
- **M8 time**: `implement` (minutes)
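
To illustrate the definitions above, the sketch below applies the group-specific experience column and the 3.5 threshold to a few hypothetical rows. The toy values are invented, not study data. It uses `np.select` and `pd.cut` as a vectorized stand-in for the row-wise `exp_bucket` helper, and the bucket labels match the ones `summarize` filters on.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows (not study data), using the notebook's normalized column names.
toy = pd.DataFrame({
    "group": ["Control", "Control", "Treatment", "Treatment"],
    "control_yoe": [3.5, 6.0, 1.0, 10.0],
    "alt_yoe": [9.0, 9.0, 3.5, 4.0],
})

# Pick the experience column per row: control_yoe for Control, alt_yoe for Treatment.
toy["experience"] = np.select(
    [toy["group"] == "Control", toy["group"] == "Treatment"],
    [toy["control_yoe"], toy["alt_yoe"]],
    default=np.nan,
)

# Right-closed bins: (-inf, 3.5] is Low and (3.5, inf] is High, so exactly 3.5 is Low.
toy["experience_bucket"] = pd.cut(
    toy["experience"],
    bins=[-np.inf, 3.5, np.inf],
    labels=["Low (≤3.5)", "High (>3.5)"],
    right=True,
)
print(toy[["group", "experience", "experience_bucket"]])
```

Note that the first Treatment row has `control_yoe` of 1.0 but lands in Low only because its `alt_yoe` is 3.5; the Control-side value is ignored for Treatment rows.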

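Success rates use every attempt in a bucket as the denominator, while times are averaged **only among successful attempts**. Each table reports `N total` and `N success` so both denominators are explicit. Rows with a missing experience value, or with a group other than Control or Treatment, fall into no bucket and are excluded from every table.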
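
Because the two denominators differ, a small worked example helps. For one hypothetical bucket of four attempts with three successes (made-up numbers), the success rate uses all four attempts while the mean time uses only the three successful ones:

```python
import pandas as pd

# Hypothetical attempts for a single group/experience bucket (illustrative only).
bucket = pd.DataFrame({
    "sound_solution": ["y", "n", "y", "y"],
    "solution": [10.0, 45.0, 20.0, 30.0],  # minutes
})

n_total = len(bucket)                            # denominator for the success rate
succ = bucket[bucket["sound_solution"] == "y"]   # successful attempts only
n_succ = len(succ)
success_pct = 100.0 * n_succ / n_total
mean_time_success = succ["solution"].mean()      # failed attempts are excluded
mean_time_all = bucket["solution"].mean()        # shown only for contrast

print(n_total, n_succ, success_pct, mean_time_success, mean_time_all)
```

Here the success rate is 75%, and including the failed 45-minute attempt would raise the mean time from 20.0 to 26.25 minutes.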

In [None]:
# Path to the CSV file (change if needed)
CSV_PATH = "Milestones - timesStandard-prod-Final.csv"

import pandas as pd
import numpy as np

# Load data
df = pd.read_csv(CSV_PATH)

# Normalize column names for easier use
df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]

# Quick sanity check of the columns we rely on
needed = ['group', 'solution', 'implement', 'sound_solution', 'correct', 'control_yoe', 'alt_yoe']
missing = [c for c in needed if c not in df.columns]
assert not missing, f"Missing required columns: {missing}"

df.head(3)
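
The loading cell normalizes headers, so `Control YOE` becomes `control_yoe`, but it does not check that the experience and time columns are numeric. If the CSV stores any of them as text, the `<= 3.5` comparison or `.mean()` would fail. A minimal sketch, assuming a hypothetical raw frame with invented values, of coercing those columns with `pd.to_numeric(..., errors="coerce")` and counting what ends up missing:

```python
import pandas as pd

# Hypothetical raw frame mimicking the CSV's header style (values are invented).
raw = pd.DataFrame({
    " Group": ["Control", "Treatment"],
    "Control YOE": ["2", "n/a"],
    "Alt YOE": ["5", "1.5"],
    "solution": ["12.5", "30"],
})

# Same rule as the loading cell: strip, lowercase, spaces -> underscores.
raw.columns = [c.strip().lower().replace(" ", "_") for c in raw.columns]

# Coerce numeric columns; unparseable entries such as "n/a" become NaN instead of raising.
numeric_cols = ["control_yoe", "alt_yoe", "solution"]
for col in numeric_cols:
    raw[col] = pd.to_numeric(raw[col], errors="coerce")

# Count missing values after coercion so dropped rows are visible, not silent.
print(list(raw.columns))
print(raw[numeric_cols].isna().sum().to_dict())
```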

In [None]:
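# Coerce numeric columns; unparseable entries become NaN instead of breaking the comparisons and means below
for col in ['control_yoe', 'alt_yoe', 'solution', 'implement']:
    df[col] = pd.to_numeric(df[col], errors='coerce')

# Normalize labels so variants such as ' control' or 'Y' match the exact comparisons below
df['group'] = df['group'].str.strip().str.title()
for col in ['sound_solution', 'correct']:
    df[col] = df[col].astype(str).str.strip().str.lower()

# Experience: control_yoe for Control rows, alt_yoe for Treatment rows, NaN for any other group
df['experience'] = np.select(
    [df['group'] == 'Control', df['group'] == 'Treatment'],
    [df['control_yoe'], df['alt_yoe']],
    default=np.nan,
)

# Define experience buckets (exactly 3.5 counts as Low)
def exp_bucket(x):
    if pd.isna(x):
        return np.nan
    return 'Low (≤3.5)' if x <= 3.5 else 'High (>3.5)'

df['experience_bucket'] = df['experience'].apply(exp_bucket)

# Rows without a bucket (missing experience or unrecognized group) are excluded from every table
print(f"Rows excluded (no experience bucket): {df['experience_bucket'].isna().sum()}")

# Success rate over all attempts and mean time over successful attempts, by group & experience bucket
def summarize(data, milestone_time_col, success_flag_col):
    rows = []
    for g in ['Control', 'Treatment']:
        for bucket in ['Low (≤3.5)', 'High (>3.5)']:
            sub = data[(data['group'] == g) & (data['experience_bucket'] == bucket)]
            n_total = len(sub)
            succ = sub[sub[success_flag_col] == 'y']
            n_succ = len(succ)
            rate = (n_succ / n_total * 100.0) if n_total > 0 else float('nan')
            mean_time = succ[milestone_time_col].mean()  # NaN if the bucket has no successes
            rows.append({
                'Group': g,
                'Experience': bucket,
                'N total': n_total,
                'N success': n_succ,
                'Success %': rate,
                'Mean time (min)': mean_time
            })
    return pd.DataFrame(rows)

m7_table = summarize(df, 'solution', 'sound_solution')
m8_table = summarize(df, 'implement', 'correct')

m7_table.round(2)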

In [None]:
m8_table.round(2)
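
Because the goal is verification, an independent computation of the same quantities is a useful cross-check on the loop in `summarize`. The sketch below defines a hypothetical `summarize_groupby` helper (not part of the original notebook) built on `groupby` with named aggregation, shown on invented toy rows that already carry `experience_bucket`. It assumes success flags may vary in case or surrounding whitespace and normalizes them before comparing to `'y'`.

```python
import pandas as pd


def summarize_groupby(data, time_col, flag_col):
    """Hypothetical cross-check: success rate over all attempts and mean time over
    successful attempts, per group and experience bucket, without an explicit loop."""
    # Normalize the flag text (assumption: flags may vary in case or whitespace).
    success = data[flag_col].astype(str).str.strip().str.lower().eq("y")
    d = data.assign(
        _success=success,
        # Failed attempts get NaN so that mean() averages successful attempts only.
        _succ_time=data[time_col].where(success),
    )
    out = (
        d.groupby(["group", "experience_bucket"])  # rows with a NaN bucket are dropped
        .agg(
            n_total=("_success", "count"),  # _success has no NaN, so this counts all rows
            n_success=("_success", "sum"),
            mean_time=("_succ_time", "mean"),
        )
        .reset_index()
    )
    out["success_pct"] = 100.0 * out["n_success"] / out["n_total"]
    return out


# Invented toy rows (not study data) that already carry an experience bucket.
toy = pd.DataFrame({
    "group": ["Control", "Control", "Control", "Treatment", "Treatment"],
    "experience_bucket": ["Low (≤3.5)", "Low (≤3.5)", "High (>3.5)", "Low (≤3.5)", "High (>3.5)"],
    "sound_solution": ["y", "n", "y", "y", "n"],
    "solution": [10.0, 50.0, 20.0, 15.0, 40.0],
})

res = summarize_groupby(toy, "solution", "sound_solution")
print(res)
```

On the notebook's own data, `summarize_groupby(df, 'solution', 'sound_solution')` should agree with `m7_table` for every non-empty bucket, provided both normalize the success flags the same way. Buckets with no rows are simply absent from the `groupby` result instead of appearing with an `N total` of 0.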

In [None]:
# Merge M7 and M8 for a compact view
merged = m7_table.merge(m8_table, on=['Group','Experience'], suffixes=(' M7',' M8'))
merged = merged[[
    'Group','Experience',
    'N total M7','N success M7','Success % M7','Mean time (min) M7',
    'N total M8','N success M8','Success % M8','Mean time (min) M8'
]].round(2)
merged
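
`merge` applies the suffixes only to non-key columns that exist in both tables, which is why the column list above can refer to names such as `'N total M7'`. A small sketch on hypothetical one-row tables shows the resulting names. The `validate='one_to_one'` argument is an optional addition, not in the original cell, that makes pandas raise `MergeError` if a `Group`/`Experience` pair is duplicated on either side.

```python
import pandas as pd

# Hypothetical one-row tables with the same layout as m7_table / m8_table (values invented).
m7 = pd.DataFrame({"Group": ["Control"], "Experience": ["Low (≤3.5)"],
                   "N total": [4], "Success %": [75.0]})
m8 = pd.DataFrame({"Group": ["Control"], "Experience": ["Low (≤3.5)"],
                   "N total": [4], "Success %": [50.0]})

# Overlapping non-key columns get the suffixes; validate= guards against duplicate keys.
merged_demo = m7.merge(m8, on=["Group", "Experience"],
                       suffixes=(" M7", " M8"), validate="one_to_one")
print(list(merged_demo.columns))
```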

## Optional: export results to CSV
Uncomment the cell below to write the tables to disk.

In [None]:
# m7_table.round(2).to_csv('m7_by_experience.csv', index=False)
# m8_table.round(2).to_csv('m8_by_experience.csv', index=False)
# merged.to_csv('m7_m8_merged_by_experience.csv', index=False)