# Cohort Retention Teaching Notebook (v1.2)

This notebook teaches you how to **explain the cohort retention decision story yourself** using the real project outputs.

## Learning goal
By the end, you should be able to explain:
1. What decision is being supported
2. Why the evidence is trustworthy (Gates A/B/C + QA)
3. What each chart means
4. Which families to prioritize and why (directional, not causal)


## Story structure you should memorize
Use this sequence when presenting:
1. **Decision**: what choice the business needs to make
2. **Trust checks**: prove data quality before charts
3. **Chart 1**: overall retention shape by cohort and month
4. **Chart 2**: net proxy behavior (refund-aware)
5. **Chart 3**: family-level priority ranking
6. **Actions**: 2 plays and how to measure them


## Instructor mode: how to use this notebook
Read this like a lecture, not like a script to memorize line-by-line.

### Teaching cadence (recommended)
- 2 minutes: decision and business context
- 2 minutes: trust checks and gate discipline
- 4 minutes: chart walkthrough (1 minute per chart + transitions)
- 2 minutes: recommendation and guardrails

### Golden rule while speaking
Always separate:
- **Observed fact** (what the data says)
- **Interpretation** (what it might mean)
- **Action** (what we will test next)

That structure keeps your story rigorous and avoids accidental causal claims.


In [None]:
from pathlib import Path
import json
import re
import pandas as pd

# Robust repo root detection for notebook and nbconvert execution contexts.
cwd = Path.cwd().resolve()
REPO_ROOT = None
for root in [cwd] + list(cwd.parents):
    if (root / 'data_processed').exists() and (root / 'docs').exists():
        REPO_ROOT = root
        break
if REPO_ROOT is None:
    raise FileNotFoundError('Could not locate repo root with data_processed/ and docs/.')

DP = REPO_ROOT / 'data_processed'
DOCS = REPO_ROOT / 'docs'
EXPORTS = REPO_ROOT / 'exports'

required = [
    DP / 'customer_month_activity.csv',
    DP / 'orders.csv',
    DP / 'customers.csv',
    DP / 'gate_a.json',
    DP / 'confound_m2_family_all_vs_retail.csv',
    DP / 'chart1_logo_retention_heatmap.csv',
    DP / 'chart2_net_proxy_curves.csv',
    DP / 'chart3_m2_by_family.csv',
    DOCS / 'DRIVER_COVERAGE_REPORT.md',
    DOCS / 'QA_CHECKLIST.md',
    DOCS / 'DECISION_MEMO_1PAGE.md',
    EXPORTS / 'cohort_retention_story.html',
]
missing = [str(p.relative_to(REPO_ROOT)) for p in required if not p.exists()]
if missing:
    raise FileNotFoundError('Missing required artifacts: ' + ', '.join(missing))

cma = pd.read_csv(DP / 'customer_month_activity.csv')
orders = pd.read_csv(DP / 'orders.csv')
customers = pd.read_csv(DP / 'customers.csv')
chart1 = pd.read_csv(DP / 'chart1_logo_retention_heatmap.csv')
chart2 = pd.read_csv(DP / 'chart2_net_proxy_curves.csv')
chart3 = pd.read_csv(DP / 'chart3_m2_by_family.csv')
confound = pd.read_csv(DP / 'confound_m2_family_all_vs_retail.csv')
gate_a = json.loads((DP / 'gate_a.json').read_text(encoding='utf-8'))

# Read as UTF-8 so non-ASCII characters in the report do not raise UnicodeDecodeError.
coverage_text = (DOCS / 'DRIVER_COVERAGE_REPORT.md').read_text(encoding='utf-8')
qa_text = (DOCS / 'QA_CHECKLIST.md').read_text(encoding='utf-8')

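# Gate B coverage figures are parsed from the report text. Fail loudly if the wording changed,
# because later cells format these values as floats and would break on None.
m_gross = re.search(r'mapped to non-Other families: ([0-9]+\.?[0-9]*)%', coverage_text)
m_customer = re.search(r'customers with non-Other first_product_family: ([0-9]+\.?[0-9]*)%', coverage_text)
if m_gross is None or m_customer is None:
    raise ValueError('Could not parse Gate B coverage percentages from DRIVER_COVERAGE_REPORT.md.')
coverage_gross = float(m_gross.group(1))
coverage_customer = float(m_customer.group(1))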

print('Repo root:', REPO_ROOT)
print('Loaded rows -> cma:', len(cma), 'orders:', len(orders), 'customers:', len(customers))
print('Loaded chart tables -> chart1:', len(chart1), 'chart2:', len(chart2), 'chart3:', len(chart3))


## Terminology Masterclass (speak these fluently)
Use this section to build vocabulary precision. In interviews and reviews, terminology clarity is a credibility multiplier.

When you define a term, always include:
1. What it is
2. How it is computed
3. Why it matters
4. What people often misunderstand


In [None]:
terms = pd.DataFrame([
    {
        'term': 'Cohort',
        'plain_english': 'A group of customers bucketed by first valid purchase month.',
        'how_computed': 'cohort_month = month(first_order_ts) for non-guest customers with >=1 valid purchase',
        'why_it_matters': 'Lets us compare retention trajectories fairly across customer start periods.',
        'common_misread': 'Treating cohorts as static segments rather than time-anchored start groups.'
    },
    {
        'term': 'months_since_first (0..6)',
        'plain_english': 'How many months after the cohort month we are measuring.',
        'how_computed': 'Period(M) difference: activity_month - cohort_month, constrained to 0..6',
        'why_it_matters': 'Aligns customer timelines so Month 2 means the same lifecycle stage for everyone.',
        'common_misread': 'Mixing calendar month with lifecycle month.'
    },
    {
        'term': 'Logo retention',
        'plain_english': 'Whether a customer placed any valid purchase in a given month (0/1).',
        'how_computed': 'is_retained_logo = 1 if orders_count_valid > 0 else 0',
        'why_it_matters': 'Captures repeat behavior independent of basket size.',
        'common_misread': 'Assuming retained logo implies strong revenue quality.'
    },
    {
        'term': 'Net retention proxy',
        'plain_english': 'Refund-aware value proxy normalized by month-0 gross baseline.',
        'how_computed': 'sum(net_revenue_proxy_total_t) / sum(gross_revenue_valid_t0), cohort-level',
        'why_it_matters': 'Shows value dynamics beyond pure repeat incidence.',
        'common_misread': 'Treating it as an audited finance profit metric.'
    },
    {
        'term': 'first_product_family (driver)',
        'plain_english': "Primary family inferred from the customer's first valid order.",
        'how_computed': 'Max gross family within first valid order; NonMerch excluded from competition',
        'why_it_matters': 'Creates action-oriented entry-point segmentation for retention tests.',
        'common_misread': 'Assuming it is a permanent preference label.'
    },
    {
        'term': 'is_credit_like',
        'plain_english': 'Order flagged as refund/credit behavior.',
        'how_computed': 'is_cancel_invoice OR order_net_proxy < 0',
        'why_it_matters': 'Prevents mixing credits into positive sale interpretation.',
        'common_misread': 'Ignoring credits and overstating value retention.'
    },
    {
        'term': 'Gate A',
        'plain_english': 'Validity trigger for strict purchase rules.',
        'how_computed': '% valid purchases with net<=0; trigger if >0.5%',
        'why_it_matters': 'Protects cohort definitions from financially inconsistent purchases.',
        'common_misread': 'Assuming default validity is always safe.'
    },
    {
        'term': 'Gate B',
        'plain_english': 'Coverage quality of product-family mapping.',
        'how_computed': '% gross mapped to non-Other + customer non-Other coverage',
        'why_it_matters': 'Ensures driver segmentation is informative, not mostly Other.',
        'common_misread': 'Optimizing for perfect mapping at the cost of rule bloat.'
    },
    {
        'term': 'Gate C',
        'plain_english': 'Sensitivity check for wholesale-like confounding.',
        'how_computed': 'Compare M2 family retention: All vs Retail-only; material if >=5pp & n>=80',
        'why_it_matters': 'Separates true signal from segment-mix distortion risk.',
        'common_misread': 'Treating non-material differences as meaningful.'
    },
])

pd.set_option('display.max_colwidth', 120)
print(terms.to_string(index=False))


### Terminology drill (say this out loud)
- "A cohort is defined by first valid purchase month; months_since_first aligns lifecycle, not calendar."
- "Logo retention measures repeat incidence, while net retention proxy measures refund-aware value dynamics."
- "first_product_family is the frozen entry-point driver for prioritization, not a causal label."
- "Gate A/B/C are controls for validity, mapping coverage, and confound sensitivity before interpretation."
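
The cohort, lifecycle-month, and logo-retention definitions can be made concrete on toy data. This is a minimal sketch, assuming a hypothetical `toy_orders` table in which every row is already a valid, non-guest purchase; the project pipeline's actual column names and validity filters may differ.

```python
import pandas as pd

# Hypothetical toy data: every row is assumed to be a valid, non-guest purchase.
toy_orders = pd.DataFrame({
    'customer_id': ['A', 'A', 'A', 'B', 'B'],
    'order_ts': pd.to_datetime([
        '2023-01-15', '2023-03-02', '2023-03-20', '2023-02-10', '2023-02-25',
    ]),
})

# Cohort = month of each customer's first valid purchase.
first_ts = toy_orders.groupby('customer_id')['order_ts'].transform('min')
toy_orders['cohort_month'] = first_ts.dt.to_period('M')


def month_index(ts):
    """Map timestamps to a running month counter so differences are whole months."""
    return ts.dt.year * 12 + ts.dt.month


# Lifecycle month = whole calendar months between the order month and the cohort month.
toy_orders['months_since_first'] = month_index(toy_orders['order_ts']) - month_index(first_ts)

# Logo retention is incidence only: one order or many in a month both count as 1.
toy_activity = (
    toy_orders.groupby(['customer_id', 'months_since_first'])
    .size()
    .rename('orders_count_valid')
    .reset_index()
)
toy_activity['is_retained_logo'] = (toy_activity['orders_count_valid'] > 0).astype(int)
print(toy_activity.to_string(index=False))
```

Customer A's two March orders collapse into a single retained row at `months_since_first = 2`, which is the difference between incidence and order volume.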


## Noobie One-Pass Map (memorize this first)
If everything feels like too much, reduce the whole project to 5 boxes:

1. **Question**: Which first-product families should we test first for retention?
2. **Trust**: Did Gates A/B/C and QA checks pass?
3. **Pattern**: What do the 3 charts say (overall trend, value trend, family ranking)?
4. **Decision**: Which 2-3 families do we prioritize now?
5. **Action**: Which 2 experiments do we run next, and which guardrails do we monitor?

If you can say one sentence for each box, you can explain the project.


### Fill-in-the-blank speaking template (practice)
Use this template until it becomes automatic:

- "We analyzed cohorts to decide ________."
- "Before interpretation, we validated quality with Gate A/B/C: ________."
- "Chart 1 shows ________."
- "Chart 2 shows ________, which matters because ________."
- "Chart 3 shows top families ________ and weaker families ________."
- "So the immediate plays are ________ and ________, measured by ________, with guardrails ________."
- "This is directional, not causal, because ________."


In [None]:
# Quick story card: one-screen summary you can read before presenting
material_count = int(confound['material_sensitivity'].sum()) if 'material_sensitivity' in confound.columns else 0
best = chart3.sort_values('m2_logo_retention', ascending=False, kind='stable').iloc[0]
weak = chart3.sort_values('m2_logo_retention', ascending=True, kind='stable').iloc[0]

story_card = {
    'Decision': 'Prioritize first_product_family segments for retention tests',
    'Gate A': f"{gate_a['gate_a_pct_valid_nonpositive_net']:.4f}% non-positive valid net (trigger={gate_a['trigger_fired']})",
    'Gate B': f"{coverage_gross:.2f}% gross mapped non-Other; {coverage_customer:.2f}% customers non-Other",
    'Gate C': f"material sensitivity count={material_count}",
    'Top family': f"{best['family_group']} ({best['m2_logo_retention']*100:.1f}%, n={int(best['n_customers'])})",
    'Weak family': f"{weak['family_group']} ({weak['m2_logo_retention']*100:.1f}%, n={int(weak['n_customers'])})",
}
for k, v in story_card.items():
    print(f"- {k}: {v}")


## 1) Decision framing (say this first)
**Decision to support:** Which `first_product_family` groups should we prioritize for retention tests (replenishment nudges + returns mitigation)?

**Primary KPI in decision chart:** M2 logo retention (`months_since_first == 2`)

**Secondary KPI for risk context:** Net retention proxy (refund-aware, directional)


### Lecture note: why this decision framing works
Most analytics stories fail because they start with methods, not decisions.

Here, start with one sentence:
> "We are deciding which first-product-family segments to test first for retention gains."

Then translate business risk:
- If we choose wrong families, experiment budget is wasted.
- If we choose right families, we accelerate learning and retention impact.

This creates immediate relevance before any technical detail.


In [None]:
summary = pd.DataFrame([
    {
        'question': 'Which first_product_family should be prioritized for retention experiments?',
        'primary_kpi': 'M2 logo retention',
        'secondary_kpi': 'net retention proxy',
        'horizon': 'months 0..6',
        'driver': 'first_product_family only',
    }
])
print(summary.to_string(index=False))


## 2) Trust checks before charts
You should always prove these checks pass before talking about conclusions.


### Lecture note: what each trust check protects against
- **Uniqueness checks** prevent accidental double-counting.
- **Full grid check (7 rows/customer)** ensures denominator consistency across months.
- **Month 0 = 100%** validates cohort construction logic.
- **Non-negative sale proxy** prevents credits from being mislabeled as sales.

How to phrase it in a presentation:
> "Before interpretation, we validated structural integrity so the trends are not artifacts of data shape."
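
To see why the full grid protects the denominator, the sketch below fills in a sparse activity table so that every customer has exactly 7 rows (months 0..6). The `sparse` frame and its values are hypothetical, and this is an illustration rather than the project's pipeline code.

```python
import pandas as pd

# Hypothetical sparse activity: only months with at least one valid order are present.
sparse = pd.DataFrame({
    'customer_id': ['A', 'A', 'B'],
    'months_since_first': [0, 2, 0],
    'orders_count_valid': [1, 2, 1],
})

# Build the complete customer x month grid for the fixed 0..6 horizon.
full_index = pd.MultiIndex.from_product(
    [sparse['customer_id'].unique(), range(7)],
    names=['customer_id', 'months_since_first'],
)
grid = (
    sparse.set_index(['customer_id', 'months_since_first'])
    .reindex(full_index, fill_value=0)
    .reset_index()
)
grid['is_retained_logo'] = (grid['orders_count_valid'] > 0).astype(int)

# With the full grid, the mean of the 0/1 flag is retention over all cohort customers.
retention_full = grid.groupby('months_since_first')['is_retained_logo'].mean()

# Without the grid, inactive customers vanish from the denominator.
retention_sparse = (
    sparse.assign(is_retained_logo=1)
    .groupby('months_since_first')['is_retained_logo']
    .mean()
)
print('Month 2, full grid:', retention_full.loc[2])
print('Month 2, sparse table:', retention_sparse.loc[2])
```

In this toy data, Month 2 retention is 0.5 on the full grid but 1.0 on the sparse table, because customer B has no Month 2 row to count against.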


In [None]:
rows_per_customer = cma.groupby('customer_id').size()
month0 = cma[cma['months_since_first'] == 0]
sale_nonnegative = (orders.loc[orders['is_credit_like'] == 0, 'order_net_proxy'] >= 0).all()

checks = pd.DataFrame([
    {'check': 'orders.order_id unique', 'pass': bool(orders['order_id'].is_unique)},
    {'check': 'customers.customer_id unique', 'pass': bool(customers['customer_id'].is_unique)},
    {'check': 'full grid = 7 rows/customer', 'pass': bool((rows_per_customer == 7).all())},
    {'check': 'Month0 logo retention = 100%', 'pass': bool((month0['is_retained_logo'] == 1).all())},
    {'check': 'non-credit order_net_proxy non-negative', 'pass': bool(sale_nonnegative)},
])
print(checks.to_string(index=False))


## 3) Gate receipts (A/B/C)
Use these three lines to establish methodological discipline.


### Lecture note: how to explain Gates A/B/C in plain language
- **Gate A (validity)**: checks whether purchases marked valid are financially coherent.
- **Gate B (coverage)**: checks whether product-family mapping is specific enough to drive decisions.
- **Gate C (confound sensitivity)**: checks if wholesale-like behavior is distorting family comparisons.

When asked "Why should we trust this?", answer with the gates first, charts second.
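
The Gate A and Gate C rules can be sketched as two small functions. The thresholds come from the terminology table (trigger above 0.5%; material at >=5pp with n>=80); the function names, the inputs, and the choice of which sample size `n_customers` refers to are assumptions made for illustration.

```python
# Thresholds quoted in the terminology table; function names are hypothetical.
GATE_A_TRIGGER_PCT = 0.5   # trigger if more than 0.5% of valid purchases have net <= 0
GATE_C_MIN_GAP_PP = 5.0    # material if the All vs Retail-only gap is at least 5pp
GATE_C_MIN_N = 80          # ...and the family has at least 80 customers (assumed meaning of n)


def gate_a(valid_order_nets):
    """Return (pct_nonpositive, trigger_fired) for a list of valid-purchase net values."""
    if not valid_order_nets:
        return 0.0, False
    nonpositive = sum(1 for net in valid_order_nets if net <= 0)
    pct = 100.0 * nonpositive / len(valid_order_nets)
    return pct, pct > GATE_A_TRIGGER_PCT


def gate_c_is_material(m2_all, m2_retail, n_customers):
    """Flag a family when the All vs Retail-only M2 gap is large and the sample is big enough."""
    # Round to avoid floating-point noise right at the 5pp boundary.
    gap_pp = round(abs(m2_all - m2_retail) * 100.0, 6)
    return gap_pp >= GATE_C_MIN_GAP_PP and n_customers >= GATE_C_MIN_N


pct, fired = gate_a([25.0, 40.0, 0.0, 12.5])   # 1 of 4 valid purchases has net <= 0
print(f"Gate A: {pct:.1f}% non-positive, trigger={fired}")
print("Gate C material (n=150):", gate_c_is_material(0.30, 0.22, n_customers=150))
print("Gate C material (n=40):", gate_c_is_material(0.30, 0.22, n_customers=40))
```

The same 8pp gap is flagged at n=150 but not at n=40, which is how the sample-size floor keeps small, noisy families from being reported as material.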


In [None]:
material_count = int(confound['material_sensitivity'].sum()) if 'material_sensitivity' in confound.columns else 0
print(f"Gate A: {gate_a['gate_a_pct_valid_nonpositive_net']:.4f}% valid purchases with non-positive net; trigger={gate_a['trigger_fired']}")
print(f"Gate B: gross coverage non-Other={coverage_gross:.2f}%, customer non-Other={coverage_customer:.2f}%")
print(f"Gate C: material sensitivity count={material_count} (rows={len(confound)})")


## 4) Chart 1 teaching points (logo retention heatmap)
What to say:
- "This chart shows the retention shape across cohorts over months 0..6."
- "Month 0 is mechanically ~100% by cohort definition."
- "Look for faster vs slower decay across cohort rows."


### How to narrate Chart 1 confidently
Speak in three passes:
1. **Orientation**: "Rows are cohorts, columns are months since first purchase."
2. **Pattern**: "Retention decays over time; that is expected."
3. **Implication**: "The question becomes where decay is slower and more recoverable."

Avoid overfitting one cohort row; focus on repeated patterns across cohorts.


In [None]:
heat = chart1.copy()
heat['months_since_first'] = heat['months_since_first'].astype(int)
month2 = heat[heat['months_since_first'] == 2].copy()
month6 = heat[heat['months_since_first'] == 6].copy()
mean_m2 = month2['logo_retention'].mean()
mean_m6 = month6['logo_retention'].mean()
print(f'Average logo retention at Month 2 across cohorts: {mean_m2:.3f}')
print(f'Average logo retention at Month 6 across cohorts: {mean_m6:.3f}')
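# Only state the decay interpretation when the numbers actually support it.
if mean_m6 < mean_m2:
    print('Interpretation: Month 6 is lower than Month 2, consistent with expected retention decay.')
else:
    print('Check: Month 6 is not lower than Month 2; inspect cohort sizes before describing decay.')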


## 5) Chart 2 teaching points (net retention proxy curves)
What to say:
- "These are eligible cohorts only (baseline denominator guard applied)."
- "Net proxy can diverge from logo retention because refunds/credits affect value even when logos retain."


### How to narrate Chart 2 without confusion
Common audience confusion: "Why does logo retention look okay while net proxy drops?"

Teaching answer:
- Logo retention only asks "did they buy?"
- Net proxy asks "what was the value after returns/credits?"

So divergence is not contradictory; it indicates value leakage risk despite repeat behavior.
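
A toy cohort makes the divergence visible. This is a minimal sketch using the ratio from the terminology table (cohort net proxy in month t divided by cohort month-0 gross); the data and the unsuffixed column names are hypothetical.

```python
import pandas as pd

# Hypothetical cohort of 4 customers observed at months 0 and 2.
toy = pd.DataFrame({
    'customer_id': ['A', 'B', 'C', 'D'] * 2,
    'months_since_first': [0] * 4 + [2] * 4,
    'orders_count_valid': [1, 1, 1, 1, 1, 1, 0, 0],
    'gross_revenue_valid': [100.0, 100.0, 100.0, 100.0, 80.0, 80.0, 0.0, 0.0],
    # Customer B buys in month 2 but returns most of the order.
    'net_revenue_proxy_total': [100.0, 100.0, 100.0, 100.0, 80.0, 10.0, 0.0, 0.0],
})
toy['is_retained_logo'] = (toy['orders_count_valid'] > 0).astype(int)

# Denominator: the cohort's month-0 gross baseline.
baseline = toy.loc[toy['months_since_first'] == 0, 'gross_revenue_valid'].sum()

by_month = toy.groupby('months_since_first').agg(
    logo_retention=('is_retained_logo', 'mean'),
    net_total=('net_revenue_proxy_total', 'sum'),
)
by_month['net_retention_proxy'] = by_month['net_total'] / baseline
print(by_month[['logo_retention', 'net_retention_proxy']])
```

Half of this toy cohort still buys in Month 2, yet the net proxy is only 0.225 of the month-0 baseline because one repeat order was mostly refunded.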


In [None]:
m2 = (
    chart2[chart2['months_since_first'] == 2]
    .sort_values('m2_logo_retention', ascending=False, kind='stable')
    .reset_index(drop=True)
)
print('Cohorts used in Chart 2 (ranked by M2 logo retention):')
print(m2[['cohort_month', 'n_customers', 'm2_logo_retention', 'net_retention_proxy']].to_string(index=False))

if len(m2) >= 2:
    top = m2.iloc[0]
    bot = m2.iloc[-1]
    print(f"Top vs bottom M2 logo gap: {(top['m2_logo_retention'] - bot['m2_logo_retention']) * 100:.1f}pp")


## 6) Chart 3 teaching points (M2 retention by first family)
What to say:
- "This is the prioritization chart: which first families have stronger/weaker M2 repeat?"
- "Bars are annotated with n so we can balance signal strength vs sample size."


### How to narrate Chart 3 for decision-making
Use this framing:
- "This is a **prioritization** chart, not a causal attribution chart."
- "We choose where to test first based on M2 signal strength and sample size."
- "Top bars suggest replenishment opportunities; lower bars suggest friction/returns mitigation opportunities."

Always mention `n_customers` when comparing bars.
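
One way to show why `n_customers` matters is to put an uncertainty range around each bar. The sketch below uses a Wilson score interval, which is an illustrative choice and not part of the project outputs; the family names and numbers are hypothetical.

```python
import math


def wilson_interval(retention, n_customers, z=1.96):
    """Approximate 95% Wilson score interval for a retention proportion."""
    if n_customers <= 0:
        raise ValueError('n_customers must be positive')
    denom = 1 + z**2 / n_customers
    center = (retention + z**2 / (2 * n_customers)) / denom
    half_width = (
        z * math.sqrt(retention * (1 - retention) / n_customers + z**2 / (4 * n_customers**2))
    ) / denom
    return center - half_width, center + half_width


# Hypothetical families with the same M2 retention but very different sample sizes.
for family, retention, n in [('Family X', 0.30, 1000), ('Family Y', 0.30, 50)]:
    low, high = wilson_interval(retention, n)
    print(f"{family}: M2={retention:.0%}, n={n}, interval=({low:.3f}, {high:.3f})")
```

Two bars of equal height can carry very different evidence: the interval for the small family is much wider, so its rank is less stable. This remains a descriptive aid and does not make the ranking causal.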


In [None]:
fam = chart3.sort_values(['m2_logo_retention', 'n_customers'], ascending=[False, False], kind='stable').reset_index(drop=True)
print('Family ranking used for action prioritization:')
print(fam.to_string(index=False))

top3 = fam.head(3)
bottom2 = fam.tail(2)
print('\nTop 3 candidate families for retention tests:')
print(top3[['family_group', 'm2_logo_retention', 'n_customers']].to_string(index=False))
print('\nLower-performing families to consider for returns/credit mitigation:')
print(bottom2[['family_group', 'm2_logo_retention', 'n_customers']].to_string(index=False))


## 7) Confounds and limitations (must state aloud)
- Findings are **directional, not causal**.
- `first_product_family` is derived from text mapping and can be imperfect.
- NonMerch (`*_NonMerch`) is excluded from first-family competition by design.
- Gate C compares All vs Retail-only to check wholesale-like sensitivity.


### Objection handling (interview-safe)
If challenged, use these responses:

- **"Is this causal?"**
  No. It is directional segmentation evidence; causal claims require controlled experiments.

- **"Could mapping bias this?"**
  Yes, partially. That is why Gate B coverage is monitored and rules are auditable.

- **"Could wholesale behavior skew this?"**
  We run Gate C All vs Retail-only sensitivity and report material differences.

- **"Why only 0..6 months?"**
  Horizon is frozen by spec for comparability and complete customer-month grids.


In [None]:
qa_pass = {
    'full_grid_pass': 'customer_month_activity has exactly 7 rows/customer for months 0..6 PASS' in qa_text,
    'month0_pass': 'Month0 logo retention ~100% (after exclusions) PASS' in qa_text,
    'credit_alignment_pass': 'is_credit_like applied to orders.financial_status and transactions.kind PASS' in qa_text,
}
print('QA receipt flags:', qa_pass)
print('Confound material sensitivity count:', int(confound['material_sensitivity'].sum()) if 'material_sensitivity' in confound.columns else 'N/A')


## 8) 60-second talk track generator
Run the next cell and practice reading it out loud. Keep this structure in interviews:
1. Decision
2. Trust checks
3. 3 chart takeaways
4. Action plays + guardrails


### Turn the talk track into a repeatable explanation
Practice in this exact order:
1. One-line decision
2. Three gate receipts
3. One line per chart
4. Two test plays
5. One caveat sentence (directional, not causal)

If you can do this in under 60 seconds with confidence, your story is interview-ready.


In [None]:
gate_a_pct = gate_a['gate_a_pct_valid_nonpositive_net']
material_count = int(confound['material_sensitivity'].sum()) if 'material_sensitivity' in confound.columns else 0

best = chart3.sort_values('m2_logo_retention', ascending=False, kind='stable').iloc[0]
worst = chart3.sort_values('m2_logo_retention', ascending=True, kind='stable').iloc[0]
spread_pp = (best['m2_logo_retention'] - worst['m2_logo_retention']) * 100

script_lines = [
    "Decision: prioritize first_product_family cohorts for retention experiments.",
    f"Data trust: Gate A {gate_a_pct:.4f}% (trigger={gate_a['trigger_fired']}), Gate B {coverage_gross:.2f}% mapped, Gate C material={material_count}.",
    "Chart 1: cohort retention decays over months 0..6, as expected.",
    "Chart 2: net proxy curves show value-level divergence beyond logo retention.",
    f"Chart 3: best family {best['family_group']} ({best['m2_logo_retention']*100:.1f}%, n={int(best['n_customers'])}), weakest {worst['family_group']} ({worst['m2_logo_retention']*100:.1f}%, n={int(worst['n_customers'])}), spread {spread_pp:.1f}pp.",
    "Action: run replenishment nudges on top families and returns/credit mitigation on weaker families.",
    "Caveat: directional evidence only, not causal proof."
]
for i, line in enumerate(script_lines, start=1):
    print(f"{i}. {line}")


## Practice checklist for yourself
Before presenting, confirm you can answer:
- Why Month 0 is ~100%
- Why net proxy can move differently from logo retention
- Why family ranking is a prioritization heuristic (not causal truth)
- What experiment you would run first and what guardrail you would watch


## Self-quiz (answer out loud)
1. Why is Month 0 expected to be 100%?
2. What does Gate B protect against?
3. Why can net proxy fall while logo retention stays stable?
4. What makes Chart 3 actionable but non-causal?
5. What are the two immediate plays and their guardrails?

If you can answer these cleanly, you can explain the cohort decision story end-to-end.


## Mini Quiz (self-check)
Try answering without looking back at the notebook.

1. What does `months_since_first = 2` mean in plain English?
2. Why do we require a full 0..6 grid for each cohort customer?
3. What business risk does Gate B protect against?
4. Why can logo retention stay stable while net proxy drops?
5. What is the single driver used for family segmentation in this project?
6. Name the two action plays and one guardrail for each.


### Mini Quiz Answer Key (grade yourself)
1. Month 2 means two lifecycle months after the customer's first valid purchase month.
2. A full grid prevents denominator drift and makes month-to-month comparisons valid.
3. Gate B protects against `Other` dominating and making segmentation too vague for decisions.
4. Logo retention is incidence (did they buy?); net proxy is value after returns/credits, so value can fall while incidence holds.
5. Driver is `first_product_family` only.
6. Plays: replenishment nudges (guardrail: no rise in credits/cancellations), returns/credits mitigation (guardrail: margin/support burden).

Scoring:
- 6/6: ready to present solo
- 4-5/6: review terminology + Chart 2 explanation
- <=3/6: re-read Noobie One-Pass Map and repeat talk-track cell
