
# üèÄ NBA Predictor ‚Äî Team Boosts + **Exact Full-Name NBA Filter**
**Strictly includes only the 30 official NBA franchises** by **full name** (city + nickname), e.g., ‚ÄúOklahoma City Thunder‚Äù, ‚ÄúGolden State Warriors‚Äù.  
This prevents G-League lookalikes from slipping in.

What‚Äôs inside:
- Exact, case-insensitive match on **full team name**
- Manual team boosts 
- Playoff bias, continuity, and last-season finalists bias
- Conference-safe winners + NBA champ from finalists
- Saves CSVs each run


In [1]:
# If needed:
# !pip install nba_api pandas numpy scikit-learn joblib

In [2]:
import os, time, random, datetime, re
import numpy as np
import pandas as pd
from typing import List, Dict
from nba_api.stats.endpoints import leaguedashteamstats, leaguedashplayerstats
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from joblib import dump, load

RATE_LIMIT_SEC = 0.6
RANDOM_STATE = 42


def _sleep():
    time.sleep(RATE_LIMIT_SEC + random.uniform(0, 0.1))


In [3]:
# === Official full team names (30) ===
NBA_FULL_NAMES = [
    "Atlanta Hawks",
    "Boston Celtics",
    "Brooklyn Nets",
    "Charlotte Hornets",
    "Chicago Bulls",
    "Cleveland Cavaliers",
    "Dallas Mavericks",
    "Denver Nuggets",
    "Detroit Pistons",
    "Golden State Warriors",
    "Houston Rockets",
    "Indiana Pacers",
    "LA Clippers",
    "Los Angeles Lakers",
    "Memphis Grizzlies",
    "Miami Heat",
    "Milwaukee Bucks",
    "Minnesota Timberwolves",
    "New Orleans Pelicans",
    "New York Knicks",
    "Oklahoma City Thunder",
    "Orlando Magic",
    "Philadelphia 76ers",
    "Phoenix Suns",
    "Portland Trail Blazers",
    "Sacramento Kings",
    "San Antonio Spurs",
    "Toronto Raptors",
    "Utah Jazz",
    "Washington Wizards",
]
# Map from abbreviation to full name (for fallback if needed)
ABBR_TO_FULL = {
    "ATL": "Atlanta Hawks",
    "BOS": "Boston Celtics",
    "BKN": "Brooklyn Nets",
    "CHA": "Charlotte Hornets",
    "CHI": "Chicago Bulls",
    "CLE": "Cleveland Cavaliers",
    "DAL": "Dallas Mavericks",
    "DEN": "Denver Nuggets",
    "DET": "Detroit Pistons",
    "GSW": "Golden State Warriors",
    "HOU": "Houston Rockets",
    "IND": "Indiana Pacers",
    "LAC": "LA Clippers",
    "LAL": "Los Angeles Lakers",
    "MEM": "Memphis Grizzlies",
    "MIA": "Miami Heat",
    "MIL": "Milwaukee Bucks",
    "MIN": "Minnesota Timberwolves",
    "NOP": "New Orleans Pelicans",
    "NYK": "New York Knicks",
    "OKC": "Oklahoma City Thunder",
    "ORL": "Orlando Magic",
    "PHI": "Philadelphia 76ers",
    "PHX": "Phoenix Suns",
    "POR": "Portland Trail Blazers",
    "SAC": "Sacramento Kings",
    "SAS": "San Antonio Spurs",
    "TOR": "Toronto Raptors",
    "UTA": "Utah Jazz",
    "WAS": "Washington Wizards",
}
TEAM_TO_CONF = {
    "ATL": "East",
    "BOS": "East",
    "BKN": "East",
    "CHA": "East",
    "CHI": "East",
    "CLE": "East",
    "DET": "East",
    "IND": "East",
    "MIA": "East",
    "MIL": "East",
    "NYK": "East",
    "ORL": "East",
    "PHI": "East",
    "TOR": "East",
    "WAS": "East",
    "DAL": "West",
    "DEN": "West",
    "GSW": "West",
    "HOU": "West",
    "LAC": "West",
    "LAL": "West",
    "MEM": "West",
    "MIN": "West",
    "NOP": "West",
    "OKC": "West",
    "PHX": "West",
    "POR": "West",
    "SAC": "West",
    "SAS": "West",
    "UTA": "West",
}


# Normalization helpers for exact (case-insensitive) match
def _norm(s: str) -> str:
    return " ".join(str(s).strip().split()).lower()


NBA_FULL_NAMES_NORM = {_norm(n) for n in NBA_FULL_NAMES}


In [4]:
# Finals/champs maps (training labels)
FINALS = {
    "2014-15": {"E": "CLE", "W": "GSW"},
    "2015-16": {"E": "CLE", "W": "GSW"},
    "2016-17": {"E": "CLE", "W": "GSW"},
    "2017-18": {"E": "CLE", "W": "GSW"},
    "2018-19": {"E": "TOR", "W": "GSW"},
    "2019-20": {"E": "MIA", "W": "LAL"},
    "2020-21": {"E": "MIL", "W": "PHX"},
    "2021-22": {"E": "BOS", "W": "GSW"},
    "2022-23": {"E": "MIA", "W": "DEN"},
    "2023-24": {"E": "BOS", "W": "DAL"},
}
CHAMPIONS = {
    "2014-15": "GSW",
    "2015-16": "CLE",
    "2016-17": "GSW",
    "2017-18": "GSW",
    "2018-19": "TOR",
    "2019-20": "LAL",
    "2020-21": "MIL",
    "2021-22": "GSW",
    "2022-23": "DEN",
    "2023-24": "BOS",
}
TRAIN_SEASONS = list(CHAMPIONS.keys())

MODELS_DIR = "models_v3"
PRED_DIR = "predictions_v3"
os.makedirs(MODELS_DIR, exist_ok=True)
os.makedirs(PRED_DIR, exist_ok=True)


In [5]:
ABBR_TO_NAME_REGEX = {
    "ATL": r"Hawks",
    "BOS": r"Celtics",
    "BKN": r"Nets",
    "CHA": r"Hornets|Bobcats",
    "CHI": r"Bulls",
    "CLE": r"Cavaliers|Cavs",
    "DAL": r"Mavericks|Mavs",
    "DEN": r"Nuggets",
    "DET": r"Pistons",
    "GSW": r"Warriors",
    "HOU": r"Rockets",
    "IND": r"Pacers",
    "LAC": r"Clippers",
    "LAL": r"Lakers",
    "MEM": r"Grizzlies",
    "MIA": r"Heat",
    "MIL": r"Bucks",
    "MIN": r"Timberwolves|Wolves",
    "NOP": r"Pelicans",
    "NYK": r"Knicks",
    "OKC": r"Thunder",
    "ORL": r"Magic",
    "PHI": r"(76ers|Sixers|Seventy[- ]?Sixers)",
    "PHX": r"Suns",
    "POR": r"(Trail\s*Blazers|Blazers)",
    "SAC": r"Kings",
    "SAS": r"Spurs",
    "TOR": r"Raptors",
    "UTA": r"Jazz",
    "WAS": r"Wizards",
}


def match_by_abbr_or_name(df: pd.DataFrame, target_abbr: str) -> pd.Series:
    if "TEAM_ABBREVIATION" in df.columns and target_abbr in set(
        df["TEAM_ABBREVIATION"]
    ):
        return df["TEAM_ABBREVIATION"] == target_abbr
    pat = ABBR_TO_NAME_REGEX.get(target_abbr, target_abbr)
    if "TEAM_NAME" in df.columns:
        return (
            df["TEAM_NAME"]
            .astype(str)
            .str.contains(pat, case=False, regex=True, na=False)
        )
    return pd.Series(False, index=df.index)


In [6]:
def fetch_team_stats(season: str, date_to: str | None = None) -> pd.DataFrame:
    """Fetch and merge base/advanced, then STRICTLY filter to official NBA full names (exact match)."""
    _sleep()
    adv = leaguedashteamstats.LeagueDashTeamStats(
        season=season,
        season_type_all_star="Regular Season",
        measure_type_detailed_defense="Advanced",
        per_mode_detailed="Totals",
        date_to_nullable=date_to,
    ).get_data_frames()[0]
    _sleep()
    base = leaguedashteamstats.LeagueDashTeamStats(
        season=season,
        season_type_all_star="Regular Season",
        measure_type_detailed_defense="Base",
        per_mode_detailed="Totals",
        date_to_nullable=date_to,
    ).get_data_frames()[0]

    adv_keep = [
        "TEAM_ID",
        "TEAM_NAME",
        "TEAM_ABBREVIATION",
        "OFF_RATING",
        "DEF_RATING",
        "NET_RATING",
        "PACE",
        "TS_PCT",
        "EFG_PCT",
    ]
    base_keep = [
        "TEAM_ID",
        "TEAM_NAME",
        "TEAM_ABBREVIATION",
        "W",
        "L",
        "W_PCT",
        "GP",
        "PTS",
        "REB",
        "AST",
        "STL",
        "BLK",
        "TOV",
    ]
    adv_df = adv[[c for c in adv_keep if c in adv.columns]].copy()
    base_df = base[[c for c in base_keep if c in base.columns]].copy()

    # Merge on best available key
    key = (
        "TEAM_ID"
        if "TEAM_ID" in adv_df.columns and "TEAM_ID" in base_df.columns
        else (
            "TEAM_ABBREVIATION"
            if "TEAM_ABBREVIATION" in adv_df.columns
            and "TEAM_ABBREVIATION" in base_df.columns
            else "TEAM_NAME"
        )
    )
    merged = pd.merge(
        base_df,
        adv_df.drop(
            columns=[
                c
                for c in ["TEAM_ID", "TEAM_ABBREVIATION", "TEAM_NAME"]
                if c != key and c in adv_df.columns
            ]
        ),
        on=key,
        how="left",
    )

    # === FULL NAME exact-match filter ===
    # If TEAM_NAME missing/empty, attempt to fill from ABBR
    if "TEAM_NAME" not in merged.columns:
        merged["TEAM_NAME"] = ""
    if "TEAM_ABBREVIATION" in merged.columns and merged["TEAM_NAME"].eq("").any():
        fill = (
            merged.loc[merged["TEAM_NAME"].eq(""), "TEAM_ABBREVIATION"]
            .map(ABBR_TO_FULL)
            .fillna("")
        )
        merged.loc[merged["TEAM_NAME"].eq(""), "TEAM_NAME"] = fill

    norm_names = (
        merged.get("TEAM_NAME", pd.Series("", index=merged.index))
        .astype(str)
        .apply(_norm)
    )
    merged = merged[norm_names.isin(NBA_FULL_NAMES_NORM)].copy()

    # Conference mapping (safe)
    abbr_col = (
        merged["TEAM_ABBREVIATION"]
        if "TEAM_ABBREVIATION" in merged.columns
        else pd.Series("", index=merged.index)
    )
    merged["conference"] = abbr_col.map(TEAM_TO_CONF).fillna("")
    merged["season"] = season

    # Coerce numerics
    for c in merged.columns:
        if c not in [
            "TEAM_ID",
            "TEAM_ABBREVIATION",
            "TEAM_NAME",
            "season",
            "conference",
        ]:
            merged[c] = pd.to_numeric(merged[c], errors="coerce")
    return merged.fillna(0)


In [7]:
def build_training_table_safe(seasons: List[str]) -> pd.DataFrame:
    frames, used = [], []
    for s in seasons:
        df = fetch_team_stats(s)
        champ_abbr = CHAMPIONS.get(s, "")
        finals = FINALS.get(s, {"E": "", "W": ""})
        east_abbr, west_abbr = finals.get("E", ""), finals.get("W", "")
        is_nba_champ = match_by_abbr_or_name(df, champ_abbr)
        is_conf_champ = match_by_abbr_or_name(df, east_abbr) | match_by_abbr_or_name(
            df, west_abbr
        )
        if is_nba_champ.sum() == 0 or is_conf_champ.sum() < 2:
            print(f"[warn] Skip {s}: labels not found")
            continue
        d = df.copy()
        d["is_nba_champ"] = is_nba_champ.values.astype(bool)
        d["is_conf_champ"] = is_conf_champ.values.astype(bool)
        frames.append(d)
        used.append(s)
    if not frames:
        raise ValueError("No labeled seasons.")
    full = pd.concat(frames, ignore_index=True)
    print("Training seasons used:", used)
    return full


In [8]:
def fit_binary_model(train_df: pd.DataFrame, target_col: str):
    feats = [
        c
        for c in train_df.columns
        if c
        not in [
            "TEAM_ID",
            "TEAM_ABBREVIATION",
            "TEAM_NAME",
            "season",
            "conference",
            "is_nba_champ",
            "is_conf_champ",
        ]
    ]
    X = train_df[feats].values
    y = train_df[target_col].astype(int).values
    groups = train_df["season"].values
    pipe = Pipeline(
        [
            ("scaler", StandardScaler()),
            (
                "logreg",
                LogisticRegression(
                    max_iter=5000, class_weight="balanced", random_state=RANDOM_STATE
                ),
            ),
        ]
    )
    logo = LeaveOneGroupOut()
    _acc = []
    for tr, te in logo.split(X, y, groups):
        pipe.fit(X[tr], y[tr])
        p = pipe.predict_proba(X[te])[:, 1]
        pred = np.zeros_like(p)
        pred[np.argsort(-p)[: (2 if target_col == "is_conf_champ" else 1)]] = 1
        _acc.append(accuracy_score(y[te], pred))
    print(f"CV({target_col}) mean acc: {np.mean(_acc):.3f}")
    pipe.fit(X, y)
    return pipe, feats


In [9]:
# --- Bias blocks ---
def _prev_season(s: str) -> str:
    y1, _ = s.split("-")
    a = int(y1)
    return f"{a - 1}-{(a - 1) % 100:02d}"


def _team_minutes_by_team(season: str) -> pd.DataFrame:
    _sleep()
    df = leaguedashplayerstats.LeagueDashPlayerStats(
        season=season, per_mode_detailed="Totals"
    ).get_data_frames()[0]
    return df.rename(
        columns={"TEAM_ABBREVIATION": "TEAM", "PLAYER_ID": "pid", "MIN": "MIN"}
    )[["TEAM", "pid", "MIN"]]


def roster_continuity(prev_season: str, curr_season: str) -> Dict[str, float]:
    try:
        prev = _team_minutes_by_team(prev_season)
        curr = _team_minutes_by_team(curr_season)
    except Exception as e:
        print("[warn] roster continuity fetch failed:", e)
        return {}
    cont = {}
    for team in sorted(set(prev["TEAM"]).union(set(curr["TEAM"]))):
        a = prev[prev["TEAM"] == team]
        b = curr[curr["TEAM"] == team]
        if a.empty or b.empty:
            cont[team] = 0.0
            continue
        prev_total = a["MIN"].sum()
        stayed = a[a["pid"].isin(set(b["pid"]))]["MIN"].sum()
        cont[team] = float(stayed / prev_total) if prev_total > 0 else 0.0
    return cont


def playoff_bias(prev_season: str) -> Dict[str, float]:
    df = fetch_team_stats(prev_season, date_to=None)
    conf = df.get("conference", pd.Series("", index=df.index)).astype(str)
    if conf.eq("").any():
        abbr_col = df.get("TEAM_ABBREVIATION", pd.Series("", index=df.index))
        conf = conf.where(conf.ne(""), abbr_col.map(TEAM_TO_CONF).fillna(""))
    df = df.assign(conference=conf)
    metric = df.get("W_PCT", pd.Series(0, index=df.index)).astype(float)
    if metric.eq(0).all() and "NET_RATING" in df.columns:
        metric = df["NET_RATING"].astype(float)
    bias = {abbr: 1.0 for abbr in ABBR_TO_FULL.keys()}
    for side in ("East", "West"):
        ix = df["conference"].str.upper().eq(side.upper())
        if not ix.any():
            continue
        order = metric[ix].sort_values(ascending=False)
        top10 = order.index[: min(10, len(order))]
        top4 = order.index[: min(4, len(order))]
        abbrs10 = df.loc[top10, "TEAM_ABBREVIATION"].astype(str)
        abbrs4 = df.loc[top4, "TEAM_ABBREVIATION"].astype(str)
        for a in abbrs10:
            bias[a] = bias.get(a, 1.0) * 1.06
        for a in abbrs4:
            bias[a] = bias.get(a, 1.0) * 1.04
    return bias


# === Manual Team Boosts ===
MANUAL_TEAM_BIAS = {
    # West
    "OKC": 1.10,
    "DEN": 1.08,
    "HOU": 1.06,
    "LAL": 1.05,
    "GSW": 1.05,
    # East
    "CLE": 1.08,
    "NYK": 1.08,
    "DET": 1.06,
}


def compute_bias(season_current: str) -> Dict[str, float]:
    prev = _prev_season(season_current)
    bias = playoff_bias(prev)  # top10/top4 bias
    # champ/finalists
    champ = CHAMPIONS.get(prev, "")
    fins = FINALS.get(prev, {"E": "", "W": ""})
    finalists = {fins.get("E", ""), fins.get("W", "")}
    if champ:
        bias[champ] = bias.get(champ, 1.0) * 1.08
    for t in finalists:
        if t and t != champ:
            bias[t] = bias.get(t, 1.0) * 1.05
    # manual boosts
    for abbr, mult in MANUAL_TEAM_BIAS.items():
        if abbr in ABBR_TO_FULL:
            bias[abbr] = bias.get(abbr, 1.0) * float(mult)
    # continuity
    cont = roster_continuity(prev, season_current)
    if cont:
        vals = np.array(list(cont.values()))
        mu = float(np.mean(vals)) if len(vals) > 0 else 0.0
        for team, c in cont.items():
            if team not in bias:
                bias[team] = 1.0
            factor = 1.0 + 0.15 * (c - mu) / 0.35
            factor = float(np.clip(factor, 0.90, 1.15))
            bias[team] *= factor
    return bias


In [10]:
def _contender_mask(
    df: pd.DataFrame,
    k_per_conf: int = 12,
    min_wpct: float = 0.40,
    min_net: float = -1.0,
) -> pd.Series:
    keep = pd.Series(True, index=df.index)
    if "W_PCT" in df.columns:
        keep &= pd.to_numeric(df["W_PCT"], errors="coerce").fillna(0) >= min_wpct
    if "NET_RATING" in df.columns:
        keep &= pd.to_numeric(df["NET_RATING"], errors="coerce").fillna(-999) >= min_net
    conf = df.get("conference", pd.Series("", index=df.index)).astype(str)
    metric = (
        pd.to_numeric(
            df.get("W_PCT", pd.Series(0, index=df.index)), errors="coerce"
        ).fillna(0)
        if "W_PCT" in df.columns
        else pd.to_numeric(
            df.get("NET_RATING", pd.Series(-999, index=df.index)), errors="coerce"
        ).fillna(-999)
    )
    if conf.str.len().gt(0).any():
        keep_topk = pd.Series(False, index=df.index)
        for side in ("East", "West"):
            ix = conf.str.upper().eq(side.upper())
            if ix.any():
                order = metric[ix].sort_values(ascending=False)
                keep_topk.loc[order.index[:k_per_conf]] = True
        keep &= keep_topk
    if not keep.any():
        order = metric.sort_values(ascending=False)
        keep = pd.Series(False, index=df.index)
        keep.loc[order.index[:20]] = True
    for side in ("East", "West"):
        if not keep[conf.str.upper().eq(side.upper())].any():
            ix = conf.str.upper().eq(side.upper())
            if ix.any():
                order = metric[ix].sort_values(ascending=False)
                keep.loc[order.index[:1]] = True
    return keep


In [11]:
def predict_season_probs(
    pipeline: Pipeline,
    feature_cols: List[str],
    season: str,
    date_to: str | None = None,
    bias: Dict[str, float] | None = None,
) -> pd.DataFrame:
    df = fetch_team_stats(season, date_to)
    for c in feature_cols:
        if c not in df.columns:
            df[c] = 0
    X = df[feature_cols].values
    proba = pipeline.predict_proba(X)[:, 1]

    out = pd.DataFrame(index=df.index)
    out["TEAM_ABBREVIATION"] = df.get("TEAM_ABBREVIATION", "")
    out["TEAM_NAME"] = df.get("TEAM_NAME", "")
    conf = df.get("conference", pd.Series("", index=df.index)).astype(str)
    if conf.eq("").any():
        conf = conf.where(
            conf.ne(""), out["TEAM_ABBREVIATION"].map(TEAM_TO_CONF).fillna("")
        )
    out["conference"] = conf
    out["season"] = season

    # Apply bias in logit space
    p = np.clip(proba, 1e-6, 1 - 1e-6)
    base_score = np.log(p / (1 - p))
    if bias is None:
        bias = {}
    bf = np.array(
        [bias.get(str(abbr), 1.0) for abbr in out["TEAM_ABBREVIATION"].astype(str)]
    )
    score = base_score + np.log(np.clip(bf, 1e-3, 1e3))

    # Contender filter
    cand = df.copy().drop(
        columns=[c for c in df.columns if c == "conference"], errors="ignore"
    )
    cand["conference"] = out["conference"].values
    mask = _contender_mask(cand)
    if mask.any():
        out = out.loc[mask].copy()
        score = score[mask.values]

    exp_scores = np.exp(score - score.max())
    out["prob"] = exp_scores / exp_scores.sum()
    out["raw"] = p if p.shape[0] == out.shape[0] else 0.0
    return out.sort_values("prob", ascending=False).reset_index(drop=True)


In [12]:
# Conference-safe winners
EAST_NAME_HINTS = [
    "Hawks",
    "Celtics",
    "Nets",
    "Knicks",
    "76ers",
    "Sixers",
    "Raptors",
    "Bulls",
    "Cavaliers",
    "Pistons",
    "Pacers",
    "Heat",
    "Bucks",
    "Magic",
    "Wizards",
    "Hornets",
    "Bobcats",
]
WEST_NAME_HINTS = [
    "Mavericks",
    "Nuggets",
    "Warriors",
    "Rockets",
    "Clippers",
    "Lakers",
    "Grizzlies",
    "Timberwolves",
    "Pelicans",
    "Thunder",
    "Suns",
    "Trail Blazers",
    "Blazers",
    "Kings",
    "Spurs",
    "Jazz",
    "SuperSonics",
    "Sonics",
]


def _infer_conf_from_name(name: str) -> str:
    if not isinstance(name, str):
        return ""
    for k in EAST_NAME_HINTS:
        if re.search(k, name, flags=re.I):
            return "East"
    for k in WEST_NAME_HINTS:
        if re.search(k, name, flags=re.I):
            return "West"
    return ""


def conference_winners_from_probs(conf_probs: pd.DataFrame):
    cp = conf_probs.copy()
    if "TEAM_ABBREVIATION" not in cp.columns:
        cp["TEAM_ABBREVIATION"] = ""
    if "TEAM_NAME" not in cp.columns:
        cp["TEAM_NAME"] = ""
    if "prob" not in cp.columns:
        if "raw" in cp.columns:
            sc = cp["raw"].to_numpy()
            ex = np.exp(sc - np.nanmax(sc))
            cp["prob"] = (
                ex / np.nansum(ex) if np.nansum(ex) > 0 else 1.0 / max(len(cp), 1)
            )
        else:
            cp["prob"] = 1.0 / max(len(cp), 1)

    conf = cp.get("conference", pd.Series("", index=cp.index)).astype(str)
    conf = conf.where(conf.ne(""), cp["TEAM_ABBREVIATION"].map(TEAM_TO_CONF).fillna(""))
    miss = conf.eq("")
    if miss.any():
        conf.loc[miss] = cp.loc[miss, "TEAM_NAME"].map(_infer_conf_from_name).fillna("")
    cp["conference"] = conf

    east = (
        cp[cp["conference"].str.upper().eq("EAST")]
        .copy()
        .sort_values("prob", ascending=False)
    )
    west = (
        cp[cp["conference"].str.upper().eq("WEST")]
        .copy()
        .sort_values("prob", ascending=False)
    )

    if east.empty:
        east_by_abbr = cp[
            cp["TEAM_ABBREVIATION"].map(TEAM_TO_CONF).fillna("").str.upper().eq("EAST")
        ]
        if not east_by_abbr.empty:
            east = east_by_abbr.sort_values("prob", ascending=False).head(1).copy()
            east["conference"] = "East"
    if west.empty:
        west_by_abbr = cp[
            cp["TEAM_ABBREVIATION"].map(TEAM_TO_CONF).fillna("").str.upper().eq("WEST")
        ]
        if not west_by_abbr.empty:
            west = west_by_abbr.sort_values("prob", ascending=False).head(1).copy()
            west["conference"] = "West"

    if east.empty and not cp.empty:
        east = cp.sort_values("prob", ascending=False).head(1).copy()
        east["conference"] = "East"
    if west.empty and not cp.empty:
        tmp = cp[~cp.index.isin(east.index)] if not east.empty else cp
        if tmp.empty:
            tmp = cp
        west = tmp.sort_values("prob", ascending=False).head(1).copy()
        west["conference"] = "West"

    if east.empty or west.empty:
        raise ValueError(
            "Conference split could not be formed even after fallbacks ‚Äî check input conf_probs."
        )

    return (
        east.reset_index(drop=True),
        west.reset_index(drop=True),
        east.iloc[0],
        west.iloc[0],
    )


In [14]:
# === AUTO‚ÄëUPDATE: Current season ‚Äî Full-name exact filter ===
TARGET_SEASON = "2025-26"  # change if needed
DATE_TO = datetime.datetime.now().strftime("%m/%d/%Y")

conf_path = os.path.join("models_v3", "conf_model_v3.joblib")
nba_path = os.path.join("models_v3", "nba_model_v3.joblib")

if not (os.path.exists(conf_path) and os.path.exists(nba_path)):
    print("Training models...")
    train = build_training_table_safe(list(CHAMPIONS.keys()))
    conf_model, conf_features = fit_binary_model(train, "is_conf_champ")
    nba_model, nba_features = fit_binary_model(train, "is_nba_champ")
    dump((conf_model, conf_features), conf_path)
    dump((nba_model, nba_features), nba_path)
else:
    conf_model, conf_features = load(conf_path)
    nba_model, nba_features = load(nba_path)

bias_map = compute_bias(TARGET_SEASON)

conf_probs = predict_season_probs(
    conf_model, conf_features, TARGET_SEASON, date_to=DATE_TO, bias=bias_map
)
east_tbl, west_tbl, east_winner, west_winner = conference_winners_from_probs(conf_probs)

nba_full = predict_season_probs(
    nba_model, nba_features, TARGET_SEASON, date_to=DATE_TO, bias=bias_map
)


def _row_match(df, abbr=None, name=None):
    if abbr and "TEAM_ABBREVIATION" in df.columns:
        m = df["TEAM_ABBREVIATION"].astype(str).str.upper().eq(str(abbr).upper())
        if m.any():
            return df[m]
    if name and "TEAM_NAME" in df.columns:
        m = df["TEAM_NAME"].astype(str).str.lower().str.contains(str(name).lower())
        if m.any():
            return df[m]
    return df.iloc[0:0]


cand_e = _row_match(
    nba_full, east_winner.get("TEAM_ABBREVIATION"), east_winner.get("TEAM_NAME")
)
cand_w = _row_match(
    nba_full, west_winner.get("TEAM_ABBREVIATION"), west_winner.get("TEAM_NAME")
)
nba_finalists = (
    pd.concat([cand_e, cand_w], axis=0).drop_duplicates().reset_index(drop=True)
)
if len(nba_finalists) < 2:
    add_e = nba_full[nba_full["conference"].astype(str).str.upper().eq("EAST")].head(1)
    add_w = nba_full[nba_full["conference"].astype(str).str.upper().eq("WEST")].head(1)
    nba_finalists = (
        pd.concat([nba_finalists, add_e, add_w], axis=0)
        .drop_duplicates()
        .head(2)
        .reset_index(drop=True)
    )

scores = nba_finalists["prob"].to_numpy()
exp_scores = np.exp(scores - scores.max())
nba_finalists["prob_norm"] = exp_scores / exp_scores.sum()
nba_winner = nba_finalists.sort_values("prob_norm", ascending=False).iloc[0]

stamp = DATE_TO.replace("/", "")
os.makedirs(PRED_DIR, exist_ok=True)
conf_csv = os.path.join(PRED_DIR, f"conf_probs_{TARGET_SEASON}_{stamp}.csv")
nba_csv = os.path.join(PRED_DIR, f"nba_probs_{TARGET_SEASON}_{stamp}.csv")
finals_csv = os.path.join(PRED_DIR, f"finals_{TARGET_SEASON}_{stamp}.csv")
conf_probs.to_csv(conf_csv, index=False)
nba_full.to_csv(nba_csv, index=False)
nba_finalists.to_csv(finals_csv, index=False)

print(f"Saved: {conf_csv}\nSaved: {nba_csv}\nSaved: {finals_csv}")
print("\nTop 10 ‚Äî East (conf champion probs)")
display(east_tbl.head(10))
print("Top 10 ‚Äî West (conf champion probs)")
display(west_tbl.head(10))
print("Top 10 ‚Äî NBA Champion probs (full league, exact-name filter)")
display(nba_full.head(10))

print(
    f"\nü•á Predicted East Champion: {east_winner['TEAM_NAME']} ({east_winner.get('TEAM_ABBREVIATION', '')}) ‚Äî {east_winner['prob']:.2%}"
)
print(
    f"ü•á Predicted West Champion: {west_winner['TEAM_NAME']} ({west_winner.get('TEAM_ABBREVIATION', '')}) ‚Äî {west_winner['prob']:.2%}"
)
print(
    f"\nüèÜ Predicted NBA Champion (from E/W winners only): {nba_winner['TEAM_NAME']} ({nba_winner.get('TEAM_ABBREVIATION', '')}) ‚Äî {nba_winner['prob_norm']:.2%}"
)


Saved: predictions_v3/conf_probs_2025-26_11282025.csv
Saved: predictions_v3/nba_probs_2025-26_11282025.csv
Saved: predictions_v3/finals_2025-26_11282025.csv

Top 10 ‚Äî East (conf champion probs)


Unnamed: 0,TEAM_ABBREVIATION,TEAM_NAME,conference,season,prob,raw
0,,Toronto Raptors,East,2025-26,0.148662,0.0
1,,Detroit Pistons,East,2025-26,0.057013,0.0
2,,Atlanta Hawks,East,2025-26,0.026392,0.0
3,,Miami Heat,East,2025-26,0.023538,0.0
4,,Boston Celtics,East,2025-26,0.007661,0.0
5,,New York Knicks,East,2025-26,0.007563,0.0
6,,Cleveland Cavaliers,East,2025-26,0.005341,0.0
7,,Orlando Magic,East,2025-26,0.001825,0.0
8,,Charlotte Hornets,East,2025-26,0.00176,0.0


Top 10 ‚Äî West (conf champion probs)


Unnamed: 0,TEAM_ABBREVIATION,TEAM_NAME,conference,season,prob,raw
0,,Oklahoma City Thunder,West,2025-26,0.530683,0.0
1,,San Antonio Spurs,West,2025-26,0.064941,0.0
2,,Los Angeles Lakers,West,2025-26,0.049296,0.0
3,,Denver Nuggets,West,2025-26,0.040394,0.0
4,,Minnesota Timberwolves,West,2025-26,0.011567,0.0
5,,Phoenix Suns,West,2025-26,0.008717,0.0
6,,Houston Rockets,West,2025-26,0.008533,0.0
7,,Golden State Warriors,West,2025-26,0.00468,0.0
8,,Portland Trail Blazers,West,2025-26,0.001435,0.0


Top 10 ‚Äî NBA Champion probs (full league, exact-name filter)


Unnamed: 0,TEAM_ABBREVIATION,TEAM_NAME,conference,season,prob,raw
0,,Oklahoma City Thunder,,2025-26,0.237872,0.0
1,,Toronto Raptors,,2025-26,0.169502,0.0
2,,Detroit Pistons,,2025-26,0.143452,0.0
3,,Atlanta Hawks,,2025-26,0.10237,0.0
4,,Los Angeles Lakers,,2025-26,0.088834,0.0
5,,San Antonio Spurs,,2025-26,0.088283,0.0
6,,Denver Nuggets,,2025-26,0.050611,0.0
7,,Miami Heat,,2025-26,0.035989,0.0
8,,Minnesota Timberwolves,,2025-26,0.017008,0.0
9,,Phoenix Suns,,2025-26,0.01631,0.0



ü•á Predicted East Champion: Toronto Raptors () ‚Äî 14.87%
ü•á Predicted West Champion: Oklahoma City Thunder () ‚Äî 53.07%

üèÜ Predicted NBA Champion (from E/W winners only): Oklahoma City Thunder () ‚Äî 51.71%
