# 🎓 Mafia Game — Pre-Game Winner Prediction

This notebook builds a **leak-free, pre-game prediction model** for Mafia games.
We go from **clean per-player rows** to **per-game team probabilities** using:
- Temporal **Elo with decay** (skills evolve over time)
- **Role & side** rolling performance
- **Breaks**/freshness
- **Role-specific history** (experience & win rates on each role)
- **Synergy** (same-team familiarity) & **Enemy familiarity** (cross-team history)
- **Streaks** (win/loss momentum)
- **Meta-eras** (ruleset changes over time)
- **Team aggregation** → opponent deltas
- **LightGBM** (main) and optional **CatBoost** (comparison), with **probability calibration**
- Proper **time-aware evaluation** (holdout = last 15% by time proxy)


## 1) Environment & Imports

If something is missing, install via the first cell. Then import everything we need.


In [1]:
# If needed, uncomment to install packages
# !pip install -q lightgbm catboost scikit-learn optuna matplotlib pandas numpy

import pandas as pd
import numpy as np
from pathlib import Path

# Modeling & metrics
from sklearn.model_selection import GroupKFold
from sklearn.metrics import log_loss, roc_auc_score, brier_score_loss
from sklearn.calibration import CalibratedClassifierCV, calibration_curve
from lightgbm import LGBMClassifier, early_stopping, log_evaluation
from sklearn.base import clone

# Optional
try:
    from catboost import CatBoostClassifier, Pool
    CATBOOST_AVAILABLE = True
except Exception:
    CATBOOST_AVAILABLE = False

import matplotlib.pyplot as plt


## 2) Configuration & Data Load

- **Input:** a cleaned per-player table (one row per player per game).  
  Columns required (min): `id, game_id, player_id, role, team, game_points, team_win`; `place` (seat) is optional.  
- **Assumptions:** each game has **10 players**; exactly **one team wins** (7 winners if citizens, 3 if mafia).  
- **Time proxy:** `id` increases with time.

> Update `DATA_CSV` if your file is in a different location.


In [2]:
DATA_CSV = Path("cleaned/mafia_clean.csv")   # put the CSV next to this notebook or provide an absolute path
OUT_DIR  = Path("cleaned"); OUT_DIR.mkdir(exist_ok=True, parents=True)

df = pd.read_csv(DATA_CSV)
print("Loaded:", df.shape, "columns:", len(df.columns))
assert {'id','game_id','player_id','role','team','game_points','team_win'}.issubset(df.columns), \
    "Missing required columns in the cleaned dataset."

# Basic coercions
df['id'] = pd.to_numeric(df['id'], errors='coerce').astype('int64')
df['game_id'] = pd.to_numeric(df['game_id'], errors='coerce').astype('int64')
df['player_id'] = pd.to_numeric(df['player_id'], errors='coerce').astype('int64')
df['team_win'] = pd.to_numeric(df['team_win'], errors='coerce').astype('int8')
df['team'] = df['team'].astype('category')
df['role'] = df['role'].astype('category')

# Seat/position column is optional: coerce it to a small integer if present
if 'place' in df.columns:
    df['place'] = pd.to_numeric(df['place'], errors='coerce').fillna(0).astype('int16')


Loaded: (802820, 21) columns: 21




## 3) Helper Utilities

Small helpers for quantiles, sanity checks, and expected calibration error (ECE).


In [3]:
def q25(x): return np.nanpercentile(x, 25)
def q75(x): return np.nanpercentile(x, 75)

def sanity_assert_two_rows_per_game(team_tall):
    sizes = team_tall.groupby('game_id').size()
    print("Rows per game distribution:\n", sizes.value_counts().head())
    # Check every game, not merely that some game has two rows
    assert (sizes == 2).all(), "Every game should have exactly 2 rows (one per team)."

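def expected_calibration_error(y_true, y_prob, n_bins=10):
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins+1)
    # clip so that y_prob == 1.0 falls into the last bin instead of being dropped
    inds = np.clip(np.digitize(y_prob, bins) - 1, 0, n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = inds == b
        if mask.sum() == 0:
            continue
        conf = y_prob[mask].mean()   # mean predicted probability in the bin
        acc  = y_true[mask].mean()   # observed positive rate in the bin
        ece += (mask.mean()) * abs(acc - conf)
    return ece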


## 4) Feature Engineering (Player-level)

We compute **pre-game** features only (no leakage):
- **Temporal Elo with decay** (global, by side, by role)
- **Side & role** rolling performance
- **Breaks/freshness** via id gaps
- **Role-specific history** (experience and WR on that role)
- **Same-team synergy** & **Enemy familiarity**
- **Streaks** (win/loss)
- **Meta eras** (bucket by id)


### 4.1 Meta eras & gap features
- `meta_period`: bucket `id` into eras to capture rule changes.
- `gap_id` per player → `gap_id_clipped` (bounded) and `long_break_flag`.
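
A toy run of the gap logic, assuming made-up ids and an illustrative cap of 500 (the notebook itself clips at 5000), might look like this:

```python
import pandas as pd

# Hypothetical rows: player 7 plays three games, player 8 plays one
toy = pd.DataFrame({
    "player_id": [7, 7, 7, 8],
    "id":        [100, 150, 900, 120],   # id is used as a time proxy
}).sort_values(["player_id", "id"])

# Distance to the same player's previous row; the first appearance gets 0
toy["gap_id"] = toy.groupby("player_id")["id"].diff().fillna(0).astype("int64")
toy["gap_id_clipped"] = toy["gap_id"].clip(0, 500)             # illustrative cap
toy["long_break_flag"] = (toy["gap_id"] >= 381).astype("int8")  # same threshold as the notebook
print(toy)
```

Only the jump from id 150 to id 900 is flagged as a long break, and its clipped gap is capped at 500.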


In [4]:
# Meta eras
bins   = [0, 200_000, 400_000, 600_000, 800_000, 1_000_000_000]
labels = [1, 2, 3, 4, 5]
df['meta_period'] = pd.cut(df['id'], bins=bins, labels=labels, include_lowest=True).astype('int8')

# Gap per player (id as time proxy)
df = df.sort_values(['player_id','id']).copy()
df['gap_id'] = df.groupby('player_id')['id'].diff().fillna(0).astype('int64')
df['gap_id_clipped'] = np.clip(df['gap_id'], 0, 5000).astype('int32')
GAP_THRESH = 381  # adjust via quantiles if desired
df['long_break_flag'] = (df['gap_id'] >= GAP_THRESH).astype('int8')

# Restore global order
df = df.sort_values('id').reset_index(drop=True)


### 4.2 Temporal Elo with decay
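We update Elo **after** each game, so every feature reads the rating as it stood before that game. Each update is scaled by `exp(-gap/tau)`, where `gap` is the `id` distance to the player's previous game, so the first game back after a **long break** moves the rating less.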
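
Written out, the update in the cell below reads as follows. The notation is ours, not the notebook's: $\bar{R}$ is a team's mean pre-game global Elo, $S \in \{0, 1\}$ is the team result, and $g$ is the player's `id` gap.

$$
E_{\text{maf}} = \frac{1}{1 + 10^{(\bar{R}_{\text{cit}} - \bar{R}_{\text{maf}})/400}}, \qquad E_{\text{cit}} = 1 - E_{\text{maf}}
$$

$$
\Delta = k \, e^{-g/\tau} \, (S - E), \qquad R \leftarrow R + \Delta
$$

For example, with $k = 24$, $\tau = 300$ and evenly matched teams ($E = 0.5$), a win after a gap of $g = 300$ gives $\Delta = 24 \cdot e^{-1} \cdot 0.5 \approx 4.41$, while a win with $g = 0$ gives $\Delta = 12$. The same $\Delta$ is added to the player's global, side, and role ratings.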


In [None]:
def compute_elos(dfin, init=1500, k=24, tau=300.0):
    d = dfin.sort_values('id').copy()
    elo_global, elo_side, elo_role = {}, {}, {}
    last_seen = {}
    outs = []

    for gid, g in d.groupby('game_id', sort=False):
        cur = g.copy()
        cur['pre_elo']      = [elo_global.get(pid, init) for pid in cur['player_id']]
        cur['pre_elo_side'] = [elo_side.get((pid, team), init) for pid, team in zip(cur['player_id'], cur['team'])]
        cur['pre_elo_role'] = [elo_role.get((pid, role), init) for pid, role in zip(cur['player_id'], cur['role'])]

        maf_mask  = cur['team'].eq('mafia')
        mafia_mu  = cur.loc[maf_mask, 'pre_elo'].mean()
        citizen_mu= cur.loc[~maf_mask, 'pre_elo'].mean()
        exp_mafia = 1.0 / (1.0 + 10 ** ((citizen_mu - mafia_mu)/400))
        mafia_res = int(cur.loc[maf_mask, 'team_win'].iloc[0])

        for _, r in cur.iterrows():
            pid, side, role, rid = int(r['player_id']), r['team'], r['role'], int(r['id'])
            gap = rid - last_seen.get(pid, rid)
            decay = float(np.exp(-max(gap,0)/float(tau)))
            exp = exp_mafia if side=='mafia' else (1-exp_mafia)
            act = mafia_res if side=='mafia' else (1-mafia_res)
            delta = k * decay * (act - exp)

            elo_global[pid] = elo_global.get(pid,  init) + delta
            elo_side[(pid, side)] = elo_side.get((pid, side), init) + delta
            elo_role[(pid, role)] = elo_role.get((pid, role), init) + delta
            last_seen[pid] = rid

        outs.append(cur[['game_id','player_id','pre_elo','pre_elo_side','pre_elo_role']])

    elo_df = pd.concat(outs, ignore_index=True)
    return d.merge(elo_df, on=['game_id','player_id'], how='left')

work_players = compute_elos(df, init=1500, k=24, tau=300.0)


### 4.3 Side-aware rolling win rates
We track recent **team win** rates for each player **on each side** separately (mafia/citizens).
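
A minimal sketch of the leak-free pattern on toy data (ids and results are made up): `shift(1)` removes the current game, and running the rolling window inside `groupby(...).transform` keeps one player's history from spilling into the next player's rows.

```python
import pandas as pd

# Toy data: two players, already ordered by (player_id, id)
toy = pd.DataFrame({
    "player_id": [1, 1, 1, 2, 2],
    "id":        [10, 20, 30, 15, 25],
    "team_win":  [1, 0, 1, 0, 1],
}).sort_values(["player_id", "id"])

# Per player: drop the current game, then average the last 2 earlier results
toy["roll2_win_rate"] = (
    toy.groupby("player_id")["team_win"]
       .transform(lambda s: s.shift(1).rolling(2, min_periods=1).mean())
)
print(toy)
```

Each player's first game has no history and stays `NaN`. Player 2's second game sees only player 2's own first result.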


In [None]:
def add_rolling_stats_side(df, windows=(5,20)):
    d = df.sort_values(['player_id','id']).copy()
    for side in ['mafia','citizens']:
        mask = d['team'].eq(side)
        by_player = d.loc[mask].groupby('player_id')['team_win']
        for w in windows:
            col = f'roll{w}_win_rate_{side}'
            d[col] = 0.0  # rows played on the other side keep 0.0
            # shift and roll inside each player's group so that the window
            # never mixes in another player's results
            d.loc[mask, col] = by_player.transform(
                lambda s: s.shift(1).rolling(w, min_periods=1).mean())
    return d

work_players = add_rolling_stats_side(work_players)


### 4.4 Role-specific history
For each `(player, role)` compute:
- `games_in_role` (prior count)
- `win_rate_role_<role>_last{W}` for W in {5, 20, 50}


In [None]:
def add_role_history_stats(df, windows=(5,20,50)):
    d = df.sort_values(['player_id','role','id']).copy()
    out = []
    for (pid, role), g in d.groupby(['player_id','role'], sort=False):
        g = g.copy()
        past = g['team_win'].shift(1)
        g['games_in_role'] = np.arange(len(g), dtype=np.int32)
        for w in windows:
            g[f'win_rate_role_{role}_last{w}'] = past.rolling(w, min_periods=1).mean()
        out.append(g)
    return pd.concat(out, ignore_index=True).sort_values('id').reset_index(drop=True)

work_players = add_role_history_stats(work_players, windows=(5,20,50))


### 4.5 Same-team synergy
Count prior **same-team co-plays** for all teammate pairs before this game, aggregate per team.
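
The read-then-update order is what keeps this feature pre-game. A small sketch with hypothetical line-ups for one side:

```python
from itertools import combinations

# Hypothetical mafia line-ups for three consecutive games, oldest first
games = [[1, 2, 3], [1, 2, 4], [2, 3, 4]]

pair_counts = {}   # (player_a, player_b) -> number of earlier same-team games
history = []       # per game: pair counts as they stood before that game
for players in games:
    pairs = list(combinations(sorted(players), 2))
    history.append([pair_counts.get(p, 0) for p in pairs])   # read first
    for p in pairs:                                           # then update
        pair_counts[p] = pair_counts.get(p, 0) + 1

print(history)
```

The first game sees only zeros. The team-level mean and max are then taken over each inner list.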


In [None]:
from itertools import combinations

def add_synergy_features(df):
    d = df.copy()
    game_order = (d.groupby('game_id')['id'].max().sort_values().index.tolist())
    pair_counts = {}
    out_rows = []

    for gid in game_order:
        g = d[d['game_id'] == gid]
        for team in ['mafia', 'citizens']:
            players = g.loc[g['team']==team, 'player_id'].dropna().astype(int).tolist()
            vals = [pair_counts.get((a,b,team), 0) for a,b in combinations(sorted(players), 2)] if len(players)>=2 else []
            s_mean = float(np.mean(vals)) if vals else 0.0
            s_max  = float(np.max(vals))  if vals else 0.0
            out_rows.append((gid, team, s_mean, s_max))
        # update after
        for team in ['mafia', 'citizens']:
            players = g.loc[g['team']==team, 'player_id'].dropna().astype(int).tolist()
            if len(players)>=2:
                for a,b in combinations(sorted(players), 2):
                    pair_counts[(a,b,team)] = pair_counts.get((a,b,team),0) + 1

    team_synergy = pd.DataFrame(out_rows, columns=['game_id','team','synergy_mean_team','synergy_max_team'])
    return d.merge(team_synergy, on=['game_id','team'], how='left')

work_players = add_synergy_features(work_players)


### 4.6 Enemy familiarity (cross-team history)
Count how often each player has faced each opponent **before** this game. Aggregate per team.
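
A quick check with made-up ids points to a property worth keeping in mind: the count is stored per unordered pair, so both teams see the same set of values. Their team-level mean and max therefore coincide, and the mafia-minus-citizens delta of these features is always zero.

```python
from itertools import product

# Hypothetical earlier cross-team meetings, keyed by the sorted player pair
faced_counts = {(1, 4): 2, (2, 5): 1}
maf, cit = [1, 2], [4, 5]

def key(a, b):
    """Order-independent key for a pair of players."""
    return tuple(sorted((a, b)))

from_mafia    = sorted(faced_counts.get(key(a, b), 0) for a, b in product(maf, cit))
from_citizens = sorted(faced_counts.get(key(a, b), 0) for a, b in product(cit, maf))
print(from_mafia, from_citizens)
```

If a side-specific signal is wanted, one option (an assumption, not something the notebook does) is to count only earlier meetings in which the player was on the same side as now.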


In [None]:
from itertools import product

def add_enemy_familiarity_features(df):
    d = df.sort_values('id').copy()
    game_order = (d.groupby('game_id')['id'].max().sort_values().index.tolist())
    faced_counts = {}
    out_rows = []

    for gid in game_order:
        g = d[d['game_id'] == gid]
        maf = g[g['team']=='mafia']['player_id'].dropna().astype(int).tolist()
        cit = g[g['team']=='citizens']['player_id'].dropna().astype(int).tolist()

        pairs_maf = [faced_counts.get(tuple(sorted([a,b])), 0) for a,b in product(maf, cit)]
        pairs_cit = [faced_counts.get(tuple(sorted([a,b])), 0) for a,b in product(cit, maf)]

        def stats(vals):
            return (float(np.mean(vals)) if vals else 0.0,
                    float(np.max(vals))  if vals else 0.0)

        maf_mean, maf_max = stats(pairs_maf)
        cit_mean, cit_max = stats(pairs_cit)

        out_rows.append((gid,'mafia',    maf_mean, maf_max))
        out_rows.append((gid,'citizens', cit_mean, cit_max))

        for a,b in product(maf, cit):
            key = tuple(sorted([int(a),int(b)]))
            faced_counts[key] = faced_counts.get(key, 0) + 1

    fam = pd.DataFrame(out_rows, columns=['game_id','team','enemy_fam_mean_team','enemy_fam_max_team'])
    return d.merge(fam, on=['game_id','team'], how='left')

work_players = add_enemy_familiarity_features(work_players)


### 4.7 Win/Loss streaks
Compute **pre-game** consecutive win and loss streak lengths for each player.
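
The same idea in plain Python, for one player's results in chronological order (`pre_game_streaks` is an illustrative helper, not part of the notebook): the streak is recorded before the current result is applied.

```python
def pre_game_streaks(results):
    """Return (win_streak, loss_streak) lists as known before each game."""
    wins, losses = [], []
    cur_w = cur_l = 0
    for r in results:
        wins.append(cur_w)       # state before this game is played
        losses.append(cur_l)
        if r == 1:
            cur_w, cur_l = cur_w + 1, 0
        else:
            cur_l, cur_w = cur_l + 1, 0
    return wins, losses

# One player's results, oldest first
wins, losses = pre_game_streaks([1, 1, 0, 0, 0, 1])
print(wins, losses)
```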


In [None]:
def add_streak_features(df):
    d = df.sort_values(['player_id','id']).copy()
    win_streaks, loss_streaks = [], []

    for pid, g in d.groupby('player_id', sort=False):
        prev = g['team_win'].shift(1).values
        w_stk = np.zeros(len(g), dtype=np.int16)
        l_stk = np.zeros(len(g), dtype=np.int16)
        cur_w = cur_l = 0
        for i, v in enumerate(prev):
            if np.isnan(v):
                cur_w = cur_l = 0
            else:
                if v == 1:
                    cur_w += 1; cur_l = 0
                else:
                    cur_l += 1; cur_w = 0
            w_stk[i] = cur_w
            l_stk[i] = cur_l
        win_streaks.append(pd.Series(w_stk, index=g.index))
        loss_streaks.append(pd.Series(l_stk, index=g.index))

    d['win_streak']  = pd.concat(win_streaks).sort_index()
    d['loss_streak'] = pd.concat(loss_streaks).sort_index()
    return d.sort_values('id').reset_index(drop=True)

work_players = add_streak_features(work_players)


### 4.8 Games played to date (per player)
Cumulative count of past games per player (pre-game). Useful as a general “experience” signal.


In [None]:
def add_games_played_feature(df):
    d = df.sort_values(['player_id','id']).copy()
    # cumcount() is 0 on a player's first row, so it already counts only *prior* appearances
    d['games_played'] = d.groupby('player_id').cumcount().astype('int32')
    return d.sort_values('id').reset_index(drop=True)

work_players = add_games_played_feature(work_players)


## 5) Team-Level Aggregation & Deltas

Aggregate player features to `(game_id, team)` rows. Then create **safe deltas**: `mafia − citizens`.  
We **never** delta the target or `meta_period`.
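
A minimal sketch of the pivot-and-subtract step on two hypothetical games with one team-level feature:

```python
import pandas as pd

# Two made-up games, one row per team
team_rows = pd.DataFrame({
    "game_id":      [1, 1, 2, 2],
    "team":         ["mafia", "citizens", "mafia", "citizens"],
    "pre_elo_mean": [1520.0, 1490.0, 1480.0, 1510.0],
})

# One row per game, one column per side
wide = team_rows.pivot(index="game_id", columns="team", values="pre_elo_mean")
delta = (wide["mafia"] - wide["citizens"]).rename("pre_elo_mean__delta_maf_minus_cit")

# The same game-level delta is attached to both team rows
team_rows = team_rows.merge(delta.reset_index(), on="game_id", how="left")
print(team_rows)
```

Both rows of a game carry the same delta, so the model has to tell the sides apart through the team-only columns.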


In [None]:
def build_team_agg(work_players, add_ratios=False, ratio_eps=1e-3):
    agg_funcs = {}

    def add_agg(col, funcs):
        if col in work_players.columns:
            agg_funcs[col] = funcs

    # Core
    add_agg('pre_elo', ['mean','std','min','max', q25, q75])
    add_agg('pre_elo_side', ['mean'])
    add_agg('pre_elo_role', ['mean'])
    add_agg('gap_id_clipped', ['mean','max'])
    add_agg('long_break_flag', ['sum'])
    add_agg('place', ['mean','std','min','max'])
    add_agg('games_played', ['mean','std','min','max'])  # if present

    # Optional blocks
    add_agg('win_streak', ['mean','max'])
    add_agg('loss_streak', ['mean','max'])
    add_agg('synergy_mean_team', ['mean'])
    add_agg('synergy_max_team',  ['mean'])
    add_agg('enemy_fam_mean_team', ['mean'])
    add_agg('enemy_fam_max_team',  ['mean'])
    add_agg('roll5_win_rate_mafia',  ['mean'])
    add_agg('roll20_win_rate_mafia', ['mean'])
    add_agg('roll5_win_rate_citizens',  ['mean'])
    add_agg('roll20_win_rate_citizens', ['mean'])
    if 'meta_period' in work_players.columns:
        agg_funcs['meta_period'] = ['first']

    base = work_players.groupby(['game_id','team']).agg(agg_funcs)
    base.columns = ['_'.join([str(x) for x in c if x not in (None,)]).replace('<function ','').replace('>','')
                    for c in base.columns]
    base = base.reset_index()

    # --- NEW: meta-period normalization for Elo stats (remove era drift) ---
    # NOTE: the era mean is taken over all games in the era, including games later than the row itself
    if 'meta_period_first' in base.columns:
        elo_cols = [c for c in base.columns if c.startswith('pre_elo_')]
        for col in elo_cols:
            # center within meta-period
            base[f'{col}_norm'] = base[col] - base.groupby('meta_period_first')[col].transform('mean')

    # Role-specific singletons/means
    full_idx = base.set_index(['game_id','team']).index

    def single_role_stat(role, value_col, out_name):
        s = (work_players[work_players['role']==role]
             .groupby(['game_id','team'])[value_col].mean()).reindex(full_idx)
        s.name = out_name; return s

    def mean_role_stat(role, value_col, out_name):
        s = (work_players[work_players['role']==role]
             .groupby(['game_id','team'])[value_col].mean()).reindex(full_idx)
        s.name = out_name; return s

    pieces = [
        single_role_stat('don','pre_elo_role','don_pre_elo_role'),
        single_role_stat('sheriff','pre_elo_role','sheriff_pre_elo_role'),
        single_role_stat('don','place','don_place'),
        single_role_stat('sheriff','place','sheriff_place'),
        mean_role_stat('black','pre_elo_role','black_mean_pre_elo_role'),
        mean_role_stat('red','pre_elo_role','red_mean_pre_elo_role'),
        single_role_stat('don','games_in_role','don_games_in_role'),
        single_role_stat('sheriff','games_in_role','sheriff_games_in_role'),
        mean_role_stat('black','games_in_role','black_mean_games_in_role'),
        mean_role_stat('red','games_in_role','red_mean_games_in_role'),
        single_role_stat('don','win_rate_role_don_last20','don_wr20'),
        single_role_stat('sheriff','win_rate_role_sheriff_last20','sheriff_wr20'),
        mean_role_stat('black','win_rate_role_black_last20','black_mean_wr20'),
        mean_role_stat('red','win_rate_role_red_last20','red_mean_wr20'),
    ]
    role_feats = pd.concat(pieces, axis=1).reset_index()
    team_agg = base.merge(role_feats, on=['game_id','team'], how='left')

    # Label & time proxy
    labels  = work_players.groupby(['game_id','team'])['team_win'].max().rename('team_win_team')
    gmaxid  = work_players.groupby('game_id')['id'].max().rename('game_max_id')
    team_agg = team_agg.merge(labels, on=['game_id','team']).merge(gmaxid, on='game_id')

    # Safe deltas / ratios
    wide = team_agg.pivot(index='game_id', columns='team')
    wide.columns = [f"{a}__{b}" for a,b in wide.columns]
    wide = wide.reset_index()

    def side_cols(side): 
        return [c for c in wide.columns if c.endswith(f"__{side}") and c!='game_id']
    maf_cols = side_cols('mafia')

    delta = pd.DataFrame({'game_id': wide['game_id']})
    # label, era and time proxy are skipped (the time proxy delta would be a constant 0)
    skip_prefixes = ('team_win_team','meta_period','game_max_id')
    for mcol in maf_cols:
        base_name = mcol[:-len("__mafia")]
        if base_name.startswith(skip_prefixes): 
            continue
        ccol = base_name + "__citizens"
        if ccol in wide.columns:
            delta[base_name + "__delta_maf_minus_cit"] = wide[mcol] - wide[ccol]
            if add_ratios:
                delta[base_name + "__ratio_maf_over_cit"] = (wide[mcol] + ratio_eps) / (wide[ccol] + ratio_eps)

    team_tall = team_agg.merge(delta, on='game_id', how='left')

    # --- NEW: a few safe interactions (helps tree models separate regimes) ---
    def safe_mul(a, b): 
        return (team_tall.get(a) if a in team_tall else 0) * (team_tall.get(b) if b in team_tall else 0)

    def safe_diff(a, b): 
        return (team_tall.get(a) if a in team_tall else 0) - (team_tall.get(b) if b in team_tall else 0)

    # Names used below exist after delta creation; if any is missing in your run, it's treated as 0
    team_tall['elo_synergy_product'] = safe_mul('pre_elo_mean__delta_maf_minus_cit',
                                                'synergy_mean_team_mean__delta_maf_minus_cit')
    team_tall['elo_enemy_gap']       = safe_diff('pre_elo_mean__delta_maf_minus_cit',
                                                'enemy_fam_mean_team_mean__delta_maf_minus_cit')
    team_tall['elo_streak_mix']      = safe_mul('pre_elo_mean__delta_maf_minus_cit',
                                                'win_streak_mean__delta_maf_minus_cit')

    return team_tall

team_tall = build_team_agg(work_players, add_ratios=False)  # ratios often redundant
sanity_assert_two_rows_per_game(team_tall)


## 6) Feature Selection

We include **team-only** features and **deltas**. We **exclude** label-like columns.
Then we create **time-aware** train/cal/test splits (70/15/15 by `game_max_id`).
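
A split that matches the 70/15/15 description keeps the three masks disjoint. A toy check, using a stand-in for `game_max_id`:

```python
import numpy as np

time_key = np.arange(1, 101)                 # stand-in for game_max_id
q70, q85 = np.quantile(time_key, [0.70, 0.85])

train = time_key <= q70                      # oldest 70%
cal   = (time_key > q70) & (time_key <= q85) # next 15%
test  = time_key > q85                       # most recent 15%

print(train.sum(), cal.sum(), test.sum())
```

Because every row belongs to exactly one mask, the calibrator is never fitted on rows the base model trained on.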


In [None]:
team_only = [c for c in team_tall.columns if c.startswith((
    'pre_elo_', 'gap_id_clipped_', 'long_break_flag_', 'place_',
    'win_streak_', 'loss_streak_', 'synergy_mean_team_', 'synergy_max_team_',
    'enemy_fam_', 'games_played_', 
    'don_pre_elo_role', 'sheriff_pre_elo_role', 'black_mean_pre_elo_role', 'red_mean_pre_elo_role',
    'don_games_in_role', 'sheriff_games_in_role', 'black_mean_games_in_role', 'red_mean_games_in_role',
    'don_wr20', 'sheriff_wr20', 'black_mean_wr20', 'red_mean_wr20',
    'meta_period_first'
))]
delta_feats = [c for c in team_tall.columns if c.endswith('__delta_maf_minus_cit')]

# NEW: explicitly add our interactions and meta-normalized Elo columns
extra_feats = [c for c in ['elo_synergy_product','elo_enemy_gap','elo_streak_mix']
               if c in team_tall.columns]
meta_norm_feats = [c for c in team_tall.columns if c.endswith('_norm')]

forbidden_tokens = {'team_win','team_win_team'}
USED_FEATS = [c for c in sorted(set(team_only + delta_feats + extra_feats + meta_norm_feats))
              if not any(tok in c for tok in forbidden_tokens)]

X = team_tall[USED_FEATS].fillna(0)
y = team_tall['team_win_team'].astype(int).values
groups = team_tall['game_id'].values
time_key = team_tall['game_max_id'].values

q70, q85 = np.quantile(time_key, [0.70, 0.85])
train_mask = time_key <= q70   # first 70%; must not overlap the calibration slice
cal_mask   = (time_key > q70) & (time_key <= q85)
test_mask  = time_key > q85

print("Shapes | X:", X.shape, "| y:", y.shape)
print("Split sizes | train:", train_mask.sum(), "cal:", cal_mask.sum(), "test:", test_mask.sum())


## 7) Model — LightGBM (calibrated)
Train on train, calibrate on cal (sigmoid), evaluate on holdout (last 15%).


In [None]:
params = dict(
    n_estimators=1500,
    learning_rate=0.01,
    num_leaves=127,
    min_data_in_leaf=60,
    subsample=0.9,
    subsample_freq=1,   # LightGBM applies row subsampling only when the frequency is > 0
    colsample_bytree=0.9,
    reg_lambda=1.0,
    reg_alpha=0.5,
    class_weight='balanced',
    random_state=42,
    n_jobs=-1
)

# Optional inner early stopping on last 10% of train
tr_time = time_key[train_mask]; q90 = np.quantile(tr_time, 0.90)
inner_tr = train_mask & (time_key <= q90)
inner_va = train_mask & (time_key >  q90)

lgb = LGBMClassifier(**params)
lgb.fit(
    X[inner_tr], y[inner_tr],
    eval_set=[(X[inner_va], y[inner_va])],
    eval_metric='logloss',
    callbacks=[early_stopping(100), log_evaluation(0)]
)

# cv="prefit" tells the calibrator that lgb is already fitted, so it can be passed directly.
# (On scikit-learn >= 1.6, where cv="prefit" is deprecated, wrap the model in
# sklearn.frozen.FrozenEstimator and drop the cv argument instead.)

# Sigmoid (Platt) calibration on the calibration slice
calibrated = CalibratedClassifierCV(lgb, cv="prefit", method="sigmoid")
calibrated.fit(X[cal_mask], y[cal_mask])

p_test = calibrated.predict_proba(X[test_mask])[:,1]

print("Holdout (last 15%)")
print("LogLoss:", log_loss(y[test_mask], p_test))
print("ROC-AUC:", roc_auc_score(y[test_mask], p_test))
print("Brier  :", brier_score_loss(y[test_mask], p_test))


## 8) Calibration Diagnostics

Reliability curve and **ECE** (Expected Calibration Error) on holdout.
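
ECE is the bin-weighted average of |observed frequency − mean predicted probability|. A small worked sketch (`ece_binned` is an illustrative stand-alone helper):

```python
import numpy as np

def ece_binned(y_true, y_prob, n_bins=10):
    """Weighted mean absolute gap between accuracy and confidence per bin."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # clip so that a probability of exactly 1.0 lands in the last bin
    idx = np.clip(np.digitize(y_prob, edges) - 1, 0, n_bins - 1)
    total = 0.0
    for b in range(n_bins):
        in_bin = idx == b
        if in_bin.any():
            gap = abs(y_true[in_bin].mean() - y_prob[in_bin].mean())
            total += in_bin.mean() * gap   # weight by the share of rows in the bin
    return total

# Two predictions at 0.25 (one positive) and two at 0.75 (both positive)
toy_ece = ece_binned([0, 1, 1, 1], [0.25, 0.25, 0.75, 0.75])
print(round(toy_ece, 4))
```

Each bin is off by 0.25 and holds half of the rows, so the toy ECE is 0.5·0.25 + 0.5·0.25 = 0.25.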


In [None]:
ece = expected_calibration_error(y[test_mask], p_test, n_bins=10)
print(f"ECE (10 bins): {ece:.4f}")

prob_true, prob_pred = calibration_curve(y[test_mask], p_test, n_bins=10)
plt.figure(figsize=(5,5))
plt.plot(prob_pred, prob_true, 'o-')
plt.plot([0,1],[0,1],'--', alpha=0.6)
plt.xlabel('Predicted probability'); plt.ylabel('Observed frequency')
plt.title('Reliability curve (holdout)')
plt.grid(alpha=0.2); plt.show()


## 9) (Optional) CatBoost + Calibration + Blend

CatBoost is another gradient-boosted tree library; its different tree-building strategy often complements LightGBM. We calibrate its probabilities and optionally blend.


In [None]:
if CATBOOST_AVAILABLE:
    X_train, y_train = X[train_mask], y[train_mask]
    X_cal,   y_cal   = X[cal_mask],   y[cal_mask]
    X_hold,  y_hold  = X[test_mask],  y[test_mask]

    cat_features = []
    if 'meta_period_first' in X.columns:
        cat_features.append(X.columns.get_loc('meta_period_first'))

    train_pool = Pool(X_train, y_train, cat_features=cat_features or None)
    cal_pool   = Pool(X_cal,   y_cal,   cat_features=cat_features or None)
    test_pool  = Pool(X_hold,  y_hold,  cat_features=cat_features or None)

    cat = CatBoostClassifier(
        iterations=2500,
        learning_rate=0.02,
        depth=8,
        l2_leaf_reg=3.0,
        random_seed=42,
        eval_metric='Logloss',
        loss_function='Logloss',
        class_weights=[1.0, 2.3],
        use_best_model=True,
        verbose=200
    )
    cat.fit(train_pool, eval_set=cal_pool)

    # Sigmoid (baseline)
    cat_cal_sig = CalibratedClassifierCV(cat, cv='prefit', method='sigmoid').fit(X[cal_mask], y[cal_mask])
    p_cat_sig = cat_cal_sig.predict_proba(X[test_mask])[:,1]

    # Isotonic (may help CatBoost LogLoss if enough cal data)
    cat_cal_iso = CalibratedClassifierCV(cat, cv='prefit', method='isotonic').fit(X[cal_mask], y[cal_mask])
    p_cat_iso = cat_cal_iso.predict_proba(X[test_mask])[:,1]

    # Pick the better calibrated CatBoost for blending (by LogLoss).
    # NOTE: the choice is made on the holdout, so the metrics printed below are slightly optimistic.
    ll_sig = log_loss(y[test_mask], p_cat_sig)
    ll_iso = log_loss(y[test_mask], p_cat_iso)
    if ll_iso < ll_sig:
        p_cat = p_cat_iso
        print("Using CatBoost isotonic calibration (better LogLoss).")
    else:
        p_cat = p_cat_sig
        print("Using CatBoost sigmoid calibration (better or equal LogLoss).")

    print("\nCatBoost (calibrated) — Holdout")
    print("LogLoss:", log_loss(y[test_mask], p_cat))
    print("ROC-AUC:", roc_auc_score(y[test_mask], p_cat))
    print("Brier  :", brier_score_loss(y[test_mask], p_cat))

    # blend
    w_lgbm = 0.6
    w_cat  = 0.4
    p_blend = w_lgbm * p_test + w_cat * p_cat
    print(f"\nBlend {w_lgbm:.1f}·LGBM + {w_cat:.1f}·Cat — Holdout")

    print("LogLoss:", log_loss(y[test_mask], p_blend))
    print("ROC-AUC:", roc_auc_score(y[test_mask], p_blend))
    print("Brier  :", brier_score_loss(y[test_mask], p_blend))
else:
    print("CatBoost not installed — skipping optional block.")


### (Optional) XGBoost third learner
A slightly different tree engine sometimes helps the ensemble by a small margin.


In [None]:
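try:
    from xgboost import XGBClassifier

    # Define the splits here so this cell also runs when the CatBoost cell was skipped
    X_train, y_train = X[train_mask], y[train_mask]
    X_hold,  y_hold  = X[test_mask],  y[test_mask]

    xgb = XGBClassifier(
        n_estimators=1200, learning_rate=0.015,
        max_depth=7, subsample=0.9, colsample_bytree=0.9,
        reg_lambda=1.0, reg_alpha=0.5,
        objective='binary:logistic', eval_metric='logloss',
        random_state=42, n_jobs=-1
    )
    xgb.fit(X_train, y_train)
    p_xgb = xgb.predict_proba(X_hold)[:,1]   # raw (uncalibrated) probabilities

    # quick blend (adjust weights after checking LogLoss).
    # p_test is the calibrated LightGBM output from section 7; CatBoost joins only if it was run.
    parts = [(0.5, p_test), (0.2, p_xgb)]
    if globals().get('p_cat') is not None:
        parts.append((0.3, p_cat))
    w_sum = sum(w for w, _ in parts)
    p_blend3 = sum(w * p for w, p in parts) / w_sum   # renormalize so the weights sum to 1

    print(f"\n{len(parts)}-way Blend (Holdout)")
    print("LogLoss:", log_loss(y_hold, p_blend3))
    print("ROC-AUC:", roc_auc_score(y_hold, p_blend3))
    print("Brier  :", brier_score_loss(y_hold, p_blend3))
except Exception as e:
    print("XGBoost not available or failed to fit:", e)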


## 10) Summary & Next Steps

- Leak-free **pre-game** predictor, calibrated probabilities.
- Candidate signals: Elo deltas/means, synergy, streaks, role history, freshness (check feature importances to confirm which ones matter).
- Try: different **Elo decay** `tau`, **role-pair synergy**, **CatBoost blending**, or small **Optuna** tuning.


## 11) Quick Hyper-parameter & Blend Tuning (Optuna)

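We tune a handful of high-impact LightGBM parameters and the blend weight
between LGBM and CatBoost, keeping the learning rate and number of estimators fixed for stability.
Note that the objective is scored on the holdout, so the tuned holdout metrics printed at the end are optimistic; treat them as an upper bound rather than an unbiased estimate.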


In [None]:
import optuna
from sklearn.base import clone

X_train, y_train = X[train_mask], y[train_mask]
X_cal,   y_cal   = X[cal_mask],   y[cal_mask]
X_hold,  y_hold  = X[test_mask],  y[test_mask]

def train_lgbm_with_params(params_dict):
    lgb = LGBMClassifier(
        n_estimators=1500,
        learning_rate=0.01,
        num_leaves=int(params_dict['num_leaves']),
        min_data_in_leaf=int(params_dict['min_data_in_leaf']),
        subsample=float(params_dict['subsample']),
        subsample_freq=1,   # without a frequency > 0 LightGBM ignores subsample
        colsample_bytree=float(params_dict['colsample_bytree']),
        reg_lambda=float(params_dict['reg_lambda']),
        reg_alpha=float(params_dict['reg_alpha']),
        class_weight='balanced',
        random_state=42,
        n_jobs=-1
    )
    # inner early-stop on last 10% of train time
    tr_t = time_key[train_mask]; q90 = np.quantile(tr_t, 0.90)
    inner_tr = (time_key <= q90) & train_mask
    inner_va = (time_key >  q90) & train_mask

    lgb.fit(
        X[inner_tr], y[inner_tr],
        eval_set=[(X[inner_va], y[inner_va])],
        eval_metric='logloss',
        callbacks=[early_stopping(100), log_evaluation(0)]
    )
    # Calibrate on cal split (sigmoid)
    # cv='prefit' accepts the already-fitted estimator directly
    cal = CalibratedClassifierCV(lgb, cv='prefit', method='sigmoid')
    cal.fit(X_cal, y_cal)
    p = cal.predict_proba(X_hold)[:,1]
    return p, lgb

# Cache CatBoost predictions from your earlier section (if available)
# If not run yet, train a small Cat here:
if CATBOOST_AVAILABLE:
    try:
        p_cat  # exists
    except NameError:
        train_pool = Pool(X_train, y_train)
        cal_pool   = Pool(X_cal,   y_cal)
        cat = CatBoostClassifier(
            iterations=2000, learning_rate=0.02, depth=8,
            l2_leaf_reg=3.0, random_seed=42,
            eval_metric='Logloss', loss_function='Logloss',
            auto_class_weights='Balanced',
            use_best_model=True, verbose=False
        )
        cat.fit(train_pool, eval_set=cal_pool, verbose=False)
        cat_cal = CalibratedClassifierCV(cat, cv='prefit', method='sigmoid').fit(X_cal, y_cal)
        p_cat = cat_cal.predict_proba(X_hold)[:,1]
else:
    p_cat = None  # tuning will ignore blend weight

def objective(trial: optuna.Trial):
    # LGBM params to tune (narrow, safe ranges)
    params = {
        'num_leaves': trial.suggest_int('num_leaves', 63, 159),
        'min_data_in_leaf': trial.suggest_int('min_data_in_leaf', 40, 140),
        'subsample': trial.suggest_float('subsample', 0.70, 0.95),
        'colsample_bytree': trial.suggest_float('colsample_bytree', 0.75, 1.00),
        'reg_lambda': trial.suggest_float('reg_lambda', 0.0, 3.0),
        'reg_alpha': trial.suggest_float('reg_alpha', 0.0, 3.0),
    }
    p_lgbm, _ = train_lgbm_with_params(params)

    if p_cat is not None:
        w_lgbm = trial.suggest_float('w_lgbm', 0.4, 0.9)
        w_cat  = 1.0 - w_lgbm
        p_blend = w_lgbm * p_lgbm + w_cat * p_cat
        loss = log_loss(y_hold, p_blend)
    else:
        loss = log_loss(y_hold, p_lgbm)

    # report auxiliary metrics
    trial.set_user_attr('AUC', roc_auc_score(y_hold, p_lgbm if p_cat is None else p_blend))
    trial.set_user_attr('Brier', brier_score_loss(y_hold, p_lgbm if p_cat is None else p_blend))
    return loss

study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=50, show_progress_bar=False)

print("Best LogLoss:", study.best_value)
print("Best params:", study.best_params)
print("Best AUC (attr):", study.best_trial.user_attrs.get('AUC'))
print("Best Brier (attr):", study.best_trial.user_attrs.get('Brier'))

# Refit final LGBM with best params and produce final blended predictions
best_params = study.best_params.copy()
w_lgbm = best_params.pop('w_lgbm', 1.0)  # present only if CatBoost available
p_lgbm_final, lgb_final = train_lgbm_with_params(best_params)

if p_cat is not None:
    w_cat = 1.0 - w_lgbm
    p_final = w_lgbm * p_lgbm_final + w_cat * p_cat
    print(f"\nFinal blend weights: LGBM={w_lgbm:.2f}, Cat={w_cat:.2f}")
else:
    p_final = p_lgbm_final

print("\nTuned — Holdout")
print("LogLoss:", log_loss(y_hold, p_final))
print("ROC-AUC:", roc_auc_score(y_hold, p_final))
print("Brier  :", brier_score_loss(y_hold, p_final))


### (Optional) Micro-sweep of Elo decay τ
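Sweep τ from 300 to 500 (the main flow uses τ=300) and compare holdout LogLoss to see which setting works best on this dataset split. The cell is commented out because it rebuilds every feature for each τ.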


In [None]:
# taus = [300, 350, 400, 450, 500]
# results = []
# for t in taus:
#     # --- rebuild features with this tau ---
#     wp = compute_elos(df, init=1500, k=24, tau=float(t))
#     wp = add_rolling_stats_side(wp)
#     wp = add_role_history_stats(wp, windows=(5,20,50))
#     wp = add_synergy_features(wp)
#     wp = add_enemy_familiarity_features(wp)
#     wp = add_streak_features(wp)
#     wp = add_games_played_feature(wp)
#     team_tall = build_team_agg(wp, add_ratios=False)

#     # feature selection (same logic as main flow)
#     team_only = [c for c in team_tall.columns if c.startswith((
#         'pre_elo_', 'gap_id_clipped_', 'long_break_flag_', 'place_',
#         'win_streak_', 'loss_streak_', 'synergy_mean_team_', 'synergy_max_team_',
#         'enemy_fam_', 'games_played_', 
#         'don_pre_elo_role', 'sheriff_pre_elo_role', 'black_mean_pre_elo_role', 'red_mean_pre_elo_role',
#         'don_games_in_role', 'sheriff_games_in_role', 'black_mean_games_in_role', 'red_mean_games_in_role',
#         'don_wr20', 'sheriff_wr20', 'black_mean_wr20', 'red_mean_wr20',
#         'meta_period_first'
#     ))]
#     delta_feats = [c for c in team_tall.columns if c.endswith('__delta_maf_minus_cit')]
#     extra_feats = [c for c in ['elo_synergy_product','elo_enemy_gap','elo_streak_mix'] if c in team_tall.columns]
#     meta_norm_feats = [c for c in team_tall.columns if c.endswith('_norm')]

#     forbidden_tokens = {'team_win','team_win_team'}
#     USED_FEATS = [c for c in sorted(set(team_only + delta_feats + extra_feats + meta_norm_feats))
#                   if not any(tok in c for tok in forbidden_tokens)]

#     X = team_tall[USED_FEATS].fillna(0).values
#     y = team_tall['team_win_team'].astype(int).values
#     time_key = team_tall['game_max_id'].values

#     # time-aware split
#     q70, q85 = np.quantile(time_key, [0.70, 0.85])
#     train_mask = time_key <= q70
#     cal_mask   = (time_key > q70) & (time_key <= q85)
#     test_mask  = time_key > q85

#     # inner early-stop split (last 10% of train)
#     tr_time = time_key[train_mask]
#     q90 = np.quantile(tr_time, 0.90)
#     inner_tr = train_mask & (time_key <= q90)
#     inner_va = train_mask & (time_key >  q90)

#     # model
#     lgb = LGBMClassifier(
#         n_estimators=1500, learning_rate=0.01,
#         num_leaves=127, min_data_in_leaf=60,
#         subsample=0.9, colsample_bytree=0.9,
#         reg_lambda=1.0, reg_alpha=0.5,
#         class_weight='balanced', random_state=42, n_jobs=-1
#     )
#     lgb.fit(
#         X[inner_tr], y[inner_tr],
#         eval_set=[(X[inner_va], y[inner_va])],
#         eval_metric='logloss',
#         callbacks=[early_stopping(100), log_evaluation(0)]
#     )

#     # calibration on cal split — use the *fitted* estimator directly
#     cal = CalibratedClassifierCV(lgb, cv='prefit', method='sigmoid').fit(X[cal_mask], y[cal_mask])
#     p = cal.predict_proba(X[test_mask])[:,1]

#     auc = roc_auc_score(y[test_mask], p)
#     ll  = log_loss(y[test_mask], p)
#     print(f"tau={t}  AUC={auc:.4f}  LogLoss={ll:.4f}")
#     results.append((t, auc, ll))

# # pick best by LogLoss
# best = min(results, key=lambda x: x[2])
# print("Best (by LogLoss):", best)
