# Verify Mean Cumulative Time to Correctly Complete Milestones
This notebook reproduces the mean-time computation for milestones **M1 (problem)**, **M7 (solution)**, and **M8 (implement)** from the provided CSV.

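**Correctness filters** (a row counts toward a milestone only if its flag column equals `'y'`):
- M1: keep rows where `right_problem == 'y'`; average the `problem` column (minutes)
- M7: keep rows where `sound_solution == 'y'`; average the `solution` column (minutes)
- M8: keep rows where `correct == 'y'`; average the `implement` column (minutes)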

The notebook then computes the mean time in minutes for each `group` (Control vs. Treatment) and the percent difference `(Treatment - Control) / Control * 100`. A negative value means the Treatment group was faster on average.
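
As a quick sanity check of the filter-then-average logic and the percent-difference formula, here is a minimal sketch on a toy table. The rows and the `toy` names are invented for illustration and do not come from the real CSV; only the column names `group`, `right_problem`, and `problem` follow the notebook.

```python
import pandas as pd

# Hypothetical toy data, not taken from the real CSV.
toy = pd.DataFrame({
    "group": ["Control", "Control", "Treatment", "Treatment"],
    "right_problem": ["y", "n", "y", "y"],
    "problem": [8.0, 99.0, 5.0, 7.0],
})

# Keep only rows that pass the M1 correctness filter, then average by group.
toy_means = toy[toy["right_problem"] == "y"].groupby("group")["problem"].mean()

control_mean = toy_means["Control"]      # 8.0 (the 'n' row is excluded)
treatment_mean = toy_means["Treatment"]  # (5.0 + 7.0) / 2 = 6.0

# Percent difference relative to Control; negative means Treatment was faster.
toy_pct = (treatment_mean - control_mean) / control_mean * 100
print(toy_pct)  # -25.0
```

Note that the incorrect Control row (99 minutes) has no effect on the Control mean, which is the point of applying the correctness filter before averaging.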

In [None]:
# Parameters
CSV_PATH = r"/mnt/data/Milestones - timesStandard-prod-Final.csv"  # change if your CSV is elsewhere

In [None]:
import pandas as pd

# Load and normalize columns
df = pd.read_csv(CSV_PATH)
df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]

# Filters for correctness
m1_df = df[df['right_problem'] == 'y']     # M1 uses 'problem' minutes
m7_df = df[df['sound_solution'] == 'y']    # M7 uses 'solution' minutes
m8_df = df[df['correct'] == 'y']           # M8 uses 'implement' minutes

# Means by group
m1_means = m1_df.groupby('group')['problem'].mean().rename('M1_mean_minutes')
m7_means = m7_df.groupby('group')['solution'].mean().rename('M7_mean_minutes')
m8_means = m8_df.groupby('group')['implement'].mean().rename('M8_mean_minutes')

results = pd.concat([m1_means, m7_means, m8_means], axis=1)
results.round(2)

In [None]:
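# Build summary with % differences
def pct_diff(control, treatment):
    """Percent change of treatment relative to control; NaN if undefined."""
    # Undefined when either mean is missing or the control mean is zero.
    if pd.isna(control) or pd.isna(treatment) or control == 0:
        return float('nan')
    return (treatment - control) / control * 100.0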

summary_rows = []
for m_col, label in [('M1_mean_minutes', 'M1 (problem)'),
                     ('M7_mean_minutes', 'M7 (solution)'),
                     ('M8_mean_minutes', 'M8 (implement)')]:
    control_val = results.loc['Control', m_col] if 'Control' in results.index else float('nan')
    treatment_val = results.loc['Treatment', m_col] if 'Treatment' in results.index else float('nan')
    summary_rows.append({
        'Milestone': label,
        'Control_mean_min': control_val,
        'Treatment_mean_min': treatment_val,
        '%_Difference_(Treatment_vs_Control)': pct_diff(control_val, treatment_val)
    })

summary = pd.DataFrame(summary_rows)
summary.round(2)

To export either table, call `to_csv` on `results` or `summary`. If your definition of correctness changes, adapt the filter conditions in the second code cell.
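
One possible way to make the filters easier to adapt is to keep them in a single mapping and export the resulting table with `to_csv`. The sketch below is an assumption about how this could be organized, not part of the original notebook: `MILESTONE_FILTERS`, `milestone_means`, and the `demo` rows are hypothetical names and placeholder data.

```python
import io
import pandas as pd

# Hypothetical config: milestone label -> (correctness flag column, time column).
# Edit this mapping if the definition of correctness changes.
MILESTONE_FILTERS = {
    "M1": ("right_problem", "problem"),
    "M7": ("sound_solution", "solution"),
    "M8": ("correct", "implement"),
}

def milestone_means(frame, filters, flag_value="y"):
    """Return mean minutes per group, one column per milestone."""
    columns = []
    for label, (flag_col, time_col) in filters.items():
        # Keep only rows that pass this milestone's correctness filter.
        passed = frame[frame[flag_col] == flag_value]
        columns.append(
            passed.groupby("group")[time_col].mean().rename(f"{label}_mean_minutes")
        )
    return pd.concat(columns, axis=1)

# Placeholder rows for illustration only; not values from the real CSV.
demo = pd.DataFrame({
    "group": ["Control", "Treatment"],
    "right_problem": ["y", "y"],
    "sound_solution": ["y", "n"],
    "correct": ["n", "y"],
    "problem": [4.0, 3.0],
    "solution": [12.0, 9.0],
    "implement": [30.0, 20.0],
})

demo_means = milestone_means(demo, MILESTONE_FILTERS)

# Write to an in-memory buffer here; pass a file path to to_csv to save to disk.
buffer = io.StringIO()
demo_means.round(2).to_csv(buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

A group with no rows passing a milestone's filter shows up as a missing value in that column, which matches how `pct_diff` above treats missing means.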