# 06_summary_tables_for_readme

## Notebook Purpose
- Load saved metrics and reports from `artifacts/metrics/` and `artifacts/reports/`
- Auto-generate README-ready result summaries (classification, calibration, targeting)
- Output `artifacts/reports/readme_snippets.md` for copy-paste into `README.md`
- Print key highlights (Top1% / Top10% revenue_capture and purchase_rate) for recommended scores

## Context
- Shared inputs/outputs and execution conventions are documented in the project README.

In [1]:
# ============ Common PATH (local only) ============
from pathlib import Path

PROJECT_ROOT = Path(r"C:\Users\seony\Desktop\personal_project\purchase_prediction")

DATA_DIR = PROJECT_ROOT / "data"
RAW_DIR = DATA_DIR / "raw"
PROCESSED_DIR = DATA_DIR / "processed"

ARTIFACTS_DIR = PROJECT_ROOT / "artifacts"
MODELS_DIR = ARTIFACTS_DIR / "models"
PRED_DIR = ARTIFACTS_DIR / "predictions"
REPORTS_DIR = ARTIFACTS_DIR / "reports"
METRICS_DIR = ARTIFACTS_DIR / "metrics"
FIGURES_DIR = ARTIFACTS_DIR / "figures"

for d in [RAW_DIR, PROCESSED_DIR, MODELS_DIR, PRED_DIR, REPORTS_DIR, METRICS_DIR, FIGURES_DIR]:
    d.mkdir(parents=True, exist_ok=True)

print("PROJECT_ROOT:", PROJECT_ROOT)
print("REPORTS_DIR:", REPORTS_DIR)
print("METRICS_DIR:", METRICS_DIR)

PROJECT_ROOT: C:\Users\seony\Desktop\personal_project\purchase_prediction
REPORTS_DIR: C:\Users\seony\Desktop\personal_project\purchase_prediction\artifacts\reports
METRICS_DIR: C:\Users\seony\Desktop\personal_project\purchase_prediction\artifacts\metrics


In [2]:
import numpy as np
import pandas as pd
from pathlib import Path

In [3]:
# ============ Inputs (artifacts) ============
PURCHASE_VALID = METRICS_DIR / "purchase_metrics_valid.csv"
PURCHASE_TEST  = METRICS_DIR / "purchase_metrics_test.csv"
THRESH_SEL     = METRICS_DIR / "threshold_selection.csv"

CAL_VALID = METRICS_DIR / "calibration_metrics_valid.csv"
CAL_TEST  = METRICS_DIR / "calibration_metrics_test.csv"

PRED_TEST_CAL = PRED_DIR / "predictions_test_calibrated.csv"
PRED_VALID_CAL = PRED_DIR / "predictions_valid_calibrated.csv"

ALPHA_BETA_BEST = REPORTS_DIR / "alpha_beta_best.csv"

CURVE_COMPARE = REPORTS_DIR / "revenue_capture_curve_compare.csv"

README_SNIPPETS_OUT = REPORTS_DIR / "readme_snippets.md"

print("Exists:")
for p in [PURCHASE_VALID, PURCHASE_TEST, THRESH_SEL, CAL_VALID, CAL_TEST, PRED_TEST_CAL, ALPHA_BETA_BEST, CURVE_COMPARE]:
    print(" -", p.name, ":", p.exists())

Exists:
 - purchase_metrics_valid.csv : True
 - purchase_metrics_test.csv : True
 - threshold_selection.csv : True
 - calibration_metrics_valid.csv : True
 - calibration_metrics_test.csv : True
 - predictions_test_calibrated.csv : True
 - alpha_beta_best.csv : True
 - revenue_capture_curve_compare.csv : True


In [4]:
# ============ Helpers ============
def _load_csv(path: Path) -> pd.DataFrame:
    if not path.exists():
        raise FileNotFoundError(f"Missing file: {path}")
    return pd.read_csv(path)

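def _fmt(x, nd=4):
    """Format a scalar with `nd` decimals; NaN/None -> "", non-numeric -> str(x)."""
    if pd.isna(x):
        return ""
    try:
        return f"{float(x):.{nd}f}"
    except (TypeError, ValueError):
        # Not convertible to float (e.g. a label string): return it unchanged
        return str(x)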

def _pick_best_model(valid_df: pd.DataFrame, key: str = "pr_auc") -> str:
    # Expect a column called "model"
    if "model" not in valid_df.columns:
        raise ValueError("purchase_metrics_valid.csv must have a 'model' column.")
    if key not in valid_df.columns:
        raise ValueError(f"purchase_metrics_valid.csv missing key column: {key}")
    return str(valid_df.sort_values(key, ascending=False).iloc[0]["model"])

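def _topk_from_compare(compare_df: pd.DataFrame, score: str, frac: float) -> pd.Series:
    """Return the curve row for `score` at `top_frac == frac` (nearest top_frac if no exact match)."""
    s2 = compare_df[compare_df["score"] == score]
    if s2.empty:
        raise ValueError(f"Score '{score}' not found in revenue_capture_curve_compare.csv.")
    # Tolerant comparison: top_frac is a float read back from CSV
    sub = s2[np.isclose(s2["top_frac"], frac)]
    if len(sub) == 0:
        # fallback: row whose top_frac is nearest to the requested fraction
        i = int(np.argmin(np.abs(s2["top_frac"].to_numpy() - frac)))
        return s2.iloc[i]
    return sub.iloc[0]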
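
The lookup rule used by `_topk_from_compare` (exact `top_frac` match if present, otherwise the nearest available fraction) can be sketched on a made-up table. The function name `topk_lookup` and all values below are hypothetical and only illustrate the behavior; they are not taken from `revenue_capture_curve_compare.csv`.

```python
import numpy as np
import pandas as pd


def topk_lookup(compare_df: pd.DataFrame, score: str, frac: float) -> pd.Series:
    """Return the row for `score` whose top_frac is closest to `frac`.

    An exact match has distance 0, so it is always preferred over a neighbor.
    """
    rows = compare_df[compare_df["score"] == score]
    if rows.empty:
        raise ValueError(f"Score '{score}' not found.")
    dist = np.abs(rows["top_frac"].to_numpy() - frac)
    return rows.iloc[int(np.argmin(dist))]


# Hypothetical comparison table (illustrative values only)
toy = pd.DataFrame({
    "score": ["p_cal", "p_cal", "p_cal", "ev_cal", "ev_cal"],
    "top_frac": [0.01, 0.05, 0.10, 0.01, 0.10],
    "revenue_capture@k": [0.40, 0.60, 0.75, 0.45, 0.81],
})

exact = topk_lookup(toy, "p_cal", 0.10)     # an exact row exists
nearest = topk_lookup(toy, "ev_cal", 0.04)  # no 0.04 row: 0.01 is nearer than 0.10
print(exact["revenue_capture@k"], nearest["top_frac"])
```

Because the fallback is silent, a README number labeled "Top1%" could come from a different fraction if the curve file lacks a 0.01 row; checking the returned `top_frac` is a cheap safeguard.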

In [5]:
# ============ Load artifacts ============
purchase_valid = _load_csv(PURCHASE_VALID)
purchase_test  = _load_csv(PURCHASE_TEST)

cal_valid = _load_csv(CAL_VALID)
cal_test  = _load_csv(CAL_TEST)

pred_test = _load_csv(PRED_TEST_CAL)
pred_valid = _load_csv(PRED_VALID_CAL)

curve_compare = _load_csv(CURVE_COMPARE)
ab_best = _load_csv(ALPHA_BETA_BEST)

best_model = _pick_best_model(purchase_valid, key="pr_auc")

print("Best purchase model by VALID PR-AUC:", best_model)
print("Calibration methods in predictions file:", sorted(pred_test["p_cal_method"].astype(str).unique().tolist()))
print("Best alpha/beta (from file):", ab_best.iloc[0].to_dict())

Best purchase model by VALID PR-AUC: catboost
Calibration methods in predictions file: ['isotonic']
Best alpha/beta (from file): {'alpha': 2.0, 'beta': 0.25, 'revenue_capture@1pct_valid': 0.3988299786298735, 'revenue_capture@10pct_valid': 0.7934561960973923, 'purchase_rate@1pct_valid': 0.0484044460380064}


In [6]:
# ============ Build purchase summary table (VALID/TEST) ============
# Expected columns produced by notebook 02
cols_core = ["base_rate", "roc_auc", "pr_auc", "logloss", "brier"]
cols_thr1 = [
    "threshold_f1_max", "threshold_f1_max_precision", "threshold_f1_max_recall", "threshold_f1_max_f1",
    "threshold_f1_max_accuracy", "threshold_f1_max_tp", "threshold_f1_max_fp", "threshold_f1_max_tn",
    "threshold_f1_max_fn", "threshold_f1_max_predicted_positive_rate"
]
cols_thr2 = [
    "threshold_precision_10pct", "threshold_precision_10pct_precision", "threshold_precision_10pct_recall",
    "threshold_precision_10pct_f1", "threshold_precision_10pct_accuracy", "threshold_precision_10pct_tp",
    "threshold_precision_10pct_fp", "threshold_precision_10pct_tn", "threshold_precision_10pct_fn",
    "threshold_precision_10pct_predicted_positive_rate"
]

def _row_for(model_name: str, df: pd.DataFrame) -> pd.Series:
    s = df[df["model"] == model_name]
    if len(s) == 0:
        raise ValueError(f"Model '{model_name}' not found in metrics table.")
    return s.iloc[0]

pv = _row_for(best_model, purchase_valid)
pt = _row_for(best_model, purchase_test)

purchase_summary = pd.DataFrame([
    {"split": "valid", **{k: pv.get(k, np.nan) for k in (cols_core + cols_thr1 + cols_thr2)}},
    {"split": "test",  **{k: pt.get(k, np.nan) for k in (cols_core + cols_thr1 + cols_thr2)}},
])

display(purchase_summary)

| | split | base_rate | roc_auc | pr_auc | logloss | brier | threshold_f1_max | threshold_f1_max_precision | threshold_f1_max_recall | threshold_f1_max_f1 | ... | threshold_precision_10pct | threshold_precision_10pct_precision | threshold_precision_10pct_recall | threshold_precision_10pct_f1 | threshold_precision_10pct_accuracy | threshold_precision_10pct_tp | threshold_precision_10pct_fp | threshold_precision_10pct_tn | threshold_precision_10pct_fn | threshold_precision_10pct_predicted_positive_rate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | valid | 0.002066 | 0.87799 | 0.04684 | 0.413211 | 0.123886 | 0.955583 | 0.104874 | 0.123264 | 0.113328 | ... | 0.954489 | 0.099863 | 0.126736 | 0.111706 | 0.995836 | 73 | 658 | 277613 | 503 | 0.002622 |
| 1 | test | 0.001976 | 0.871106 | 0.040651 | 0.412595 | 0.123964 | 0.955583 | 0.093923 | 0.102 | 0.097795 | ... | 0.954489 | 0.089831 | 0.106 | 0.097248 | 0.996111 | 53 | 537 | 251989 | 447 | 0.002332 |
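
The derived threshold columns can be cross-checked from the confusion counts in the VALID row above (tp=73, fp=658, tn=277613, fn=503). The sketch below assumes the standard definitions of precision, recall, F1, accuracy and predicted-positive rate; the helper name `threshold_metrics` is hypothetical and is not part of notebook 02.

```python
def threshold_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Recompute threshold-level metrics from confusion-matrix counts."""
    n = tp + fp + tn + fn
    pred_pos = tp + fp      # rows flagged positive at this threshold
    actual_pos = tp + fn    # rows that are truly positive
    return {
        "base_rate": actual_pos / n,
        "precision": tp / pred_pos if pred_pos else 0.0,
        "recall": tp / actual_pos if actual_pos else 0.0,
        "f1": 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0,
        "accuracy": (tp + tn) / n,
        "predicted_positive_rate": pred_pos / n,
    }


# Counts copied from the VALID row (threshold_precision_10pct_* columns)
m = threshold_metrics(tp=73, fp=658, tn=277613, fn=503)
for name, value in m.items():
    print(f"{name}: {value:.6f}")
```

With a base rate near 0.2%, accuracy stays above 99% almost regardless of the threshold, so precision, recall and PR-AUC are the more informative columns here.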


In [7]:
# ============ Build calibration summary table (VALID/TEST) ============
# calibration_metrics_{split}.csv is expected to include rows for raw/platt/isotonic
def _cal_block(df: pd.DataFrame, split_name: str) -> pd.DataFrame:
    # Requires at least {method, base_rate, logloss, brier, p_mean}; extra columns (e.g. p_p99, p_max) are ignored
    need = ["method", "base_rate", "logloss", "brier", "p_mean"]
    missing = [c for c in need if c not in df.columns]
    if missing:
        raise ValueError(f"Missing columns in calibration metrics: {missing}")
    out = df.copy()
    out.insert(0, "split", split_name)
    return out[["split", "method", "base_rate", "p_mean", "logloss", "brier"]]

cal_summary = pd.concat([
    _cal_block(cal_valid, "valid"),
    _cal_block(cal_test, "test")
], axis=0, ignore_index=True)

display(cal_summary)

| | split | method | base_rate | p_mean | logloss | brier |
|---|---|---|---|---|---|---|
| 0 | valid | raw | 0.002066 | 0.285296 | 0.413211 | 0.123886 |
| 1 | valid | platt | 0.002066 | 0.002088 | 0.011697 | 0.002009 |
| 2 | valid | isotonic | 0.002066 | 0.002066 | 0.01159 | 0.002003 |
| 3 | test | raw | 0.001976 | 0.284399 | 0.412595 | 0.123964 |
| 4 | test | platt | 0.001976 | 0.002069 | 0.011574 | 0.001931 |
| 5 | test | isotonic | 0.001976 | 0.002043 | 0.011697 | 0.001932 |
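
One way to read the calibrated numbers is against a constant predictor that always outputs the base rate. For a positive rate π, that predictor has Brier score π(1 − π) and log loss −[π ln π + (1 − π) ln(1 − π)]. The sketch below is a reference calculation only: it assumes the reported log loss uses the natural logarithm, and the function names are hypothetical.

```python
import math


def constant_brier(base_rate: float) -> float:
    """Brier score of always predicting `base_rate` on data with that positive rate."""
    return base_rate * (1.0 - base_rate)


def constant_logloss(base_rate: float) -> float:
    """Log loss (natural log) of always predicting `base_rate`."""
    return -(base_rate * math.log(base_rate)
             + (1.0 - base_rate) * math.log(1.0 - base_rate))


valid_base_rate = 0.002066                              # VALID base_rate from the table above
isotonic_brier, isotonic_logloss = 0.002003, 0.011590   # VALID isotonic row

ref_brier = constant_brier(valid_base_rate)
ref_logloss = constant_logloss(valid_base_rate)
print(f"constant predictor: brier={ref_brier:.6f}, logloss={ref_logloss:.6f}")
print("isotonic beats the constant reference:",
      isotonic_brier < ref_brier and isotonic_logloss < ref_logloss)
```

On VALID the reference values come out at roughly 0.00206 (Brier) and 0.0148 (log loss), so the calibrated scores improve on the trivial predictor, while the raw scores (mean 0.285 against a base rate near 0.002) sit far above it on both metrics.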


In [8]:
# ============ Choose recommended scores (objective-driven) ============
# Candidates present in curve_compare (created by notebook 05)
scores = sorted(curve_compare["score"].unique().tolist())
print("Scores available:", scores)

# Objective A: conversion -> maximize purchase_rate@1% on TEST among p-based scores
p_candidates = [s for s in scores if s.startswith("p_")]
conv_best = max(
    p_candidates,
    key=lambda s: float(_topk_from_compare(curve_compare, s, 0.01)["purchase_rate@k"])
)

# Objective B: revenue -> maximize revenue_capture@1% on TEST among EV-cal candidates
ev_candidates = [s for s in scores if s.startswith("ev_cal")]
rev_best = max(
    ev_candidates,
    key=lambda s: float(_topk_from_compare(curve_compare, s, 0.01)["revenue_capture@k"])
)

# Objective C: hybrid -> use tuned
hybrid_best = "tuned" if "tuned" in scores else None

print("Recommended (conversion):", conv_best)
print("Recommended (revenue):", rev_best)
print("Recommended (hybrid):", hybrid_best)

Scores available: ['ev', 'ev_cal', 'ev_cal_isotonic', 'ev_cal_platt', 'p_cal', 'p_cal_isotonic', 'p_cal_platt', 'p_hat', 'tuned']
Recommended (conversion): p_cal
Recommended (revenue): ev_cal_platt
Recommended (hybrid): tuned
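
The selection rule in this cell (filter candidates by name prefix, then take the score with the highest metric at the 1% cut) can be illustrated on a made-up table. The helper `best_by` and every number below are hypothetical; the column names mirror those used in `curve_compare`.

```python
import pandas as pd

# Hypothetical top-1% rows, one per score (illustrative values only)
toy = pd.DataFrame({
    "score": ["p_hat", "p_cal", "ev", "ev_cal", "tuned"],
    "top_frac": [0.01] * 5,
    "purchase_rate@k": [0.050, 0.054, 0.041, 0.045, 0.040],
    "revenue_capture@k": [0.39, 0.40, 0.44, 0.45, 0.43],
})


def best_by(df: pd.DataFrame, prefix: str, metric: str, frac: float = 0.01):
    """Return the score name with the highest `metric` among scores starting with `prefix`."""
    at_frac = (df["top_frac"] - frac).abs() < 1e-9
    cand = df[df["score"].str.startswith(prefix) & at_frac]
    if cand.empty:
        return None  # no candidate matches the prefix at this fraction
    return str(cand.loc[cand[metric].idxmax(), "score"])


conv = best_by(toy, "p_", "purchase_rate@k")        # considers p_hat and p_cal
rev = best_by(toy, "ev_cal", "revenue_capture@k")   # "ev" is excluded by the prefix
print(conv, rev)
```

Returning `None` for an empty candidate set avoids the `ValueError` that `max()` raises on an empty list.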


In [9]:
# ============ Top-1% / Top-10% numbers for README ============
def _kpi_row(score: str, frac: float) -> dict:
    r = _topk_from_compare(curve_compare, score, frac)
    return {
        "score": score,
        "top_frac": float(frac),
        "revenue_capture": float(r["revenue_capture@k"]),
        "purchase_rate": float(r["purchase_rate@k"]),
    }

rows = []
for s in [conv_best, rev_best, hybrid_best]:
    if s is None:
        continue
    rows.append(_kpi_row(s, 0.01))
    rows.append(_kpi_row(s, 0.10))

readme_topk = pd.DataFrame(rows).sort_values(["top_frac", "revenue_capture"], ascending=[True, False])
display(readme_topk)

| | score | top_frac | revenue_capture | purchase_rate |
|---|---|---|---|---|
| 2 | ev_cal_platt | 0.01 | 0.451205 | 0.045041 |
| 4 | tuned | 0.01 | 0.434903 | 0.039905 |
| 0 | p_cal | 0.01 | 0.400094 | 0.053734 |
| 3 | ev_cal_platt | 0.1 | 0.814042 | 0.012449 |
| 5 | tuned | 0.1 | 0.794817 | 0.012963 |
| 1 | p_cal | 0.1 | 0.748856 | 0.013121 |
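
For readers unfamiliar with the two KPIs, here is a minimal sketch of how top-k figures of this kind are commonly computed from row-level predictions. It assumes `revenue_capture@k` is the share of total revenue found in the top fraction ranked by score, and `purchase_rate@k` is the share of purchasers in that fraction. The actual definitions live in notebook 05 and may differ; the column names `purchased` and `revenue` are hypothetical.

```python
import numpy as np
import pandas as pd


def topk_kpis(df: pd.DataFrame, score_col: str, frac: float,
              label_col: str = "purchased", revenue_col: str = "revenue") -> dict:
    """Rank rows by `score_col` (descending) and summarize the top `frac` of them."""
    k = max(1, int(np.ceil(len(df) * frac)))   # at least one row is always targeted
    top = df.nlargest(k, score_col)
    total_revenue = df[revenue_col].sum()
    return {
        "k": k,
        "revenue_capture": float(top[revenue_col].sum() / total_revenue)
        if total_revenue > 0 else float("nan"),
        "purchase_rate": float(top[label_col].mean()),
    }


# Hypothetical toy data: 10 customers, 3 purchasers, total revenue 100
toy = pd.DataFrame({
    "score":     [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05],
    "purchased": [1,   0,   1,   0,   0,   1,   0,   0,   0,   0],
    "revenue":   [50,  0,   30,  0,   0,   20,  0,   0,   0,   0],
})

kpis = topk_kpis(toy, "score", frac=0.2)   # top 2 of 10 rows
print(kpis)
```

The table above shows the trade-off this kind of metric pair exposes: `p_cal` has the highest purchase rate at 1% while `ev_cal_platt` captures the most revenue, since a revenue-weighted score can favor fewer but larger purchases.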


In [10]:
# ============ Generate README snippets (markdown) ============
# Purchase metrics section (compact)
def _purchase_lines(split: str, s: pd.Series) -> str:
    return "\n".join([
        f"- **{split.upper()}** base_rate={_fmt(s['base_rate'],6)}, ROC-AUC={_fmt(s['roc_auc'])}, PR-AUC={_fmt(s['pr_auc'])}, LogLoss={_fmt(s['logloss'])}, Brier={_fmt(s['brier'])}",
        f"  - F1-max thr={_fmt(s['threshold_f1_max'])}: P={_fmt(s['threshold_f1_max_precision'])}, R={_fmt(s['threshold_f1_max_recall'])}, F1={_fmt(s['threshold_f1_max_f1'])}, PPR={_fmt(s['threshold_f1_max_predicted_positive_rate'])}",
        f"  - P>=10% thr={_fmt(s['threshold_precision_10pct'])}: P={_fmt(s['threshold_precision_10pct_precision'])}, R={_fmt(s['threshold_precision_10pct_recall'])}, F1={_fmt(s['threshold_precision_10pct_f1'])}, PPR={_fmt(s['threshold_precision_10pct_predicted_positive_rate'])}",
    ])

# Calibration section
def _cal_lines(split: str, df: pd.DataFrame) -> str:
    rows = []
    for method in ["raw", "platt", "isotonic"]:
        sub = df[(df["split"] == split) & (df["method"] == method)]
        if len(sub) == 0:
            continue
        r = sub.iloc[0]
        rows.append(f"- **{split.upper()} {method}** p_mean={_fmt(r['p_mean'],6)}, logloss={_fmt(r['logloss'],6)}, brier={_fmt(r['brier'],6)}")
    return "\n".join(rows)

# Targeting section
def _topk_line(score: str) -> str:
    r1 = _topk_from_compare(curve_compare, score, 0.01)
    r10 = _topk_from_compare(curve_compare, score, 0.10)
    return (f"- **{score}**: Top1% revenue_capture={_fmt(r1['revenue_capture@k'],4)}, purchase_rate={_fmt(r1['purchase_rate@k'],4)} | "
            f"Top10% revenue_capture={_fmt(r10['revenue_capture@k'],4)}, purchase_rate={_fmt(r10['purchase_rate@k'],4)}")

method_in_file = str(pred_test["p_cal_method"].astype(str).unique()[0])
ab = ab_best.iloc[0].to_dict()
alpha = ab.get("alpha", "")
beta = ab.get("beta", "")

md = []
md.append("# Results summary (auto-generated)\n")
md.append("## Purchase prediction (classification)\n")
md.append(f"Best model by VALID PR-AUC: **{best_model}**\n")
md.append(_purchase_lines("valid", purchase_summary.loc[purchase_summary['split']=='valid'].iloc[0]))
md.append(_purchase_lines("test",  purchase_summary.loc[purchase_summary['split']=='test'].iloc[0]))
md.append("\n## Calibration effect\n")
md.append(f"Best method by VALID logloss: **{method_in_file}**\n")
md.append(_cal_lines("valid", cal_summary))
md.append(_cal_lines("test",  cal_summary))

md.append("\n## Campaign targeting (ranking) — recommendations\n")
md.append(f"- **Conversion-first**: use **{conv_best}** (maximize purchase_rate@1%)")
md.append(f"- **Revenue-first**: use **{rev_best}** (maximize revenue_capture@1%)")
if hybrid_best:
    md.append(f"- **Hybrid**: use **{hybrid_best}** (tuned score), best params on VALID: alpha={alpha}, beta={beta}")
md.append("\nTop-k highlights (TEST):\n")
md.append(_topk_line(conv_best))
md.append(_topk_line(rev_best))
if hybrid_best:
    md.append(_topk_line(hybrid_best))

md.append("\n## Notes / limitations\n")
md.append("- This evaluation uses weekly snapshots with a fixed history window (23d) and label window (7d).")
md.append("- Results reflect one time span (2020-09 to 2021-02). For stronger confidence, extend to longer periods and run rolling backtests (multiple cutoffs) to monitor drift in PR-AUC, calibration (p_mean/logloss), and top-k revenue capture.\n")

md_text = "\n".join(md)

with open(README_SNIPPETS_OUT, "w", encoding="utf-8") as f:
    f.write(md_text)

print("Saved:", README_SNIPPETS_OUT)
print("\n--- README_SNIPPETS (preview) ---\n")
print(md_text[:2000] + ("\n...\n" if len(md_text) > 2000 else ""))

Saved: C:\Users\seony\Desktop\personal_project\purchase_prediction\artifacts\reports\readme_snippets.md

--- README_SNIPPETS (preview) ---

# Results summary (auto-generated)

## Purchase prediction (classification)

Best model by VALID PR-AUC: **catboost**

- **VALID** base_rate=0.002066, ROC-AUC=0.8780, PR-AUC=0.0468, LogLoss=0.4132, Brier=0.1239
  - F1-max thr=0.9556: P=0.1049, R=0.1233, F1=0.1133, PPR=0.0024
  - P>=10% thr=0.9545: P=0.0999, R=0.1267, F1=0.1117, PPR=0.0026
- **TEST** base_rate=0.001976, ROC-AUC=0.8711, PR-AUC=0.0407, LogLoss=0.4126, Brier=0.1240
  - F1-max thr=0.9556: P=0.0939, R=0.1020, F1=0.0978, PPR=0.0021
  - P>=10% thr=0.9545: P=0.0898, R=0.1060, F1=0.0972, PPR=0.0023

## Calibration effect

Best method by VALID logloss: **isotonic**

- **VALID raw** p_mean=0.285296, logloss=0.413211, brier=0.123886
- **VALID platt** p_mean=0.002088, logloss=0.011697, brier=0.002009
- **VALID isotonic** p_mean=0.002066, logloss=0.011590, brier=0.002003
- **TEST raw** p_mean=0.2