# Toy Contract Valuation — Aging Curve (Proxy) → wOBA Projection → WAR → $

**Portfolio demo** of a transparent valuation path for hitters using only season-level features.

Steps:
1. Build a **proxy aging curve** using within-player season-to-season changes (experience index as a surrogate for age).
2. Project **wOBA** from a blend of xwOBA and wOBA, shrunk toward the league mean by PA, plus an aging adjustment.
3. Convert to **wRAA** (with a fixed wOBA scale), then to **WAR** (fixed runs per win), then to **$** (constant $/WAR).
4. Show **risk bands** via empirical residuals (percentiles).

> In production, you would use true ages, park factors, positional adjustments, defense/baserunning, and season-specific run environments.


In [None]:
# Setup
import numpy as np
import pandas as pd
from pathlib import Path

PROC_DIR = Path("data/processed")
SAMPLE_DIR = Path("sample_data")

# Load hitters features (season-level)
hit_path = PROC_DIR / "hitters_features.csv"
if not hit_path.exists():
    alt_path = SAMPLE_DIR / "hitters_features_sample.csv"
    hit_path = alt_path if alt_path.exists() else None

if hit_path and hit_path.exists():
    hit = pd.read_csv(hit_path)
else:
    hit = pd.DataFrame()

# Clean types
if "season" in hit.columns:
    hit["season"] = pd.to_numeric(hit["season"], errors="coerce").astype("Int64")
for c in ["pa","woba","xwoba","avg_ev","k_rate","bb_rate","barrel_rate"]:
    if c in hit.columns:
        hit[c] = pd.to_numeric(hit[c], errors="coerce")

print("Hitters features:", hit.shape)
display(hit.head(3))

## Build a proxy aging curve

We construct an **experience index** per player: 0 for their first observed season in the dataset, 1 for the next, and so on.  
Then compute `ΔwOBA = wOBA(t+1) - wOBA(t)` for consecutive seasons only, and average it by the experience index of the later season (`t+1`) to get an empirical adjustment curve.
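
As a small illustration of the two definitions above, here is a sketch on made-up data (the `toy` frame and its values are hypothetical, and `cumcount`/`shift` are one possible way to express the idea, not necessarily how the cell below does it). One player has a gap season, so that pair is excluded from the deltas.

```python
import pandas as pd

# Hypothetical toy data: player 1 has no 2021 season (a gap).
toy = pd.DataFrame({
    "player_id": [1, 1, 1, 2, 2, 2],
    "season":    [2019, 2020, 2022, 2020, 2021, 2022],
    "woba":      [0.300, 0.320, 0.310, 0.350, 0.340, 0.345],
}).sort_values(["player_id", "season"])

# Experience index: 0 for the first observed season, 1 for the next, ...
toy["exp_idx"] = toy.groupby("player_id").cumcount()

# Previous season and previous wOBA within each player.
prev = toy.groupby("player_id")[["season", "woba"]].shift(1)

# Keep only true consecutive seasons; the 2020 -> 2022 pair becomes NaN.
consecutive = prev["season"] == toy["season"] - 1
toy["delta"] = (toy["woba"] - prev["woba"]).where(consecutive)

# Average delta by the experience index of the later season.
toy_curve = toy.dropna(subset=["delta"]).groupby("exp_idx")["delta"].mean()
print(toy_curve)
```

With only a handful of rows per bucket the averages are noisy, which is why the curve is best read as a rough proxy rather than a true aging effect.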

In [None]:
# Ensure we have minimal columns
req = {"player_id","player_name","season","woba","xwoba","pa"}
missing = req - set(hit.columns)
if missing:
    raise ValueError(f"Missing required columns for this demo: {sorted(missing)}. Build features first.")

# Experience index
hit = hit.dropna(subset=["player_id","season","woba"]).copy()
hit["season"] = hit["season"].astype(int)
hit = hit.sort_values(["player_id","season"])

hit["exp_idx"] = hit.groupby("player_id")["season"].rank(method="first").astype(int) - 1

# Season-to-season deltas
next_map = (hit[["player_id","season","woba"]]
            .rename(columns={"season":"season_t","woba":"woba_t"})
            .merge(hit[["player_id","season","woba","exp_idx"]]
                   .rename(columns={"season":"season_t1","woba":"woba_t1","exp_idx":"exp_idx_t1"}),
                   on="player_id", how="inner"))
# keep true consecutive seasons
next_map = next_map[next_map["season_t1"] == next_map["season_t"] + 1].copy()
next_map["delta"] = next_map["woba_t1"] - next_map["woba_t"]

aging_curve = (next_map.groupby("exp_idx_t1")["delta"]
               .mean()
               .rename("avg_delta")
               .reset_index())
aging_curve

## Project wOBA with shrinkage + aging adjustment

`wOBA_proj = shrink * (α * xwOBA + (1-α) * wOBA) + (1-shrink) * league_wOBA + aging_adj`

- `shrink = PA / (PA + K)`, with `K = 200` by default  
- `α` = 0.6 by default  
- `aging_adj` = cumulative sum of `avg_delta` up to the player's **next** experience index (proxy for aging effects)
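
To make the shrinkage concrete, the sketch below evaluates the formula for two hypothetical hitters with identical rates but different playing time. The function name `project_woba` and all input values are illustrative assumptions; the aging term is left at zero to isolate the effect of PA.

```python
def project_woba(pa, woba, xwoba, league_woba, aging_adj=0.0, k=200.0, alpha=0.6):
    """Evaluate the projection formula above; all inputs are illustrative."""
    shrink = pa / (pa + k)                      # weight on the player's own blend
    blend = alpha * xwoba + (1 - alpha) * woba  # xwOBA/wOBA blend
    return shrink * blend + (1 - shrink) * league_woba + aging_adj

# Same rates, different sample sizes (hypothetical numbers).
full = project_woba(pa=600, woba=0.360, xwoba=0.340, league_woba=0.320)
part = project_woba(pa=100, woba=0.360, xwoba=0.340, league_woba=0.320)
print(round(full, 4), round(part, 4))
```

The 600-PA hitter keeps three quarters of the weight on his own blend (.348) and lands near .341, while the 100-PA hitter keeps only a third and is pulled to roughly .329.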

In [None]:
WOBASCALE = 1.25      # demo constant
RUNS_PER_WIN = 10.0  # demo constant
DOLLARS_PER_WAR = 9_000_000  # demo constant
K_SHRINK = 200.0
ALPHA = 0.6  # weight on xwOBA vs wOBA

league_woba = float(hit["woba"].mean()) if not hit.empty else 0.320

# Cumulative aging adjustment by experience index.
# aging_curve itself is not reassigned, so this cell can be re-run safely.
if "aging_curve" in globals() and not aging_curve.empty:
    aging_cum = aging_curve.set_index("exp_idx_t1")["avg_delta"].sort_index().cumsum()
else:
    aging_cum = pd.Series(dtype=float)

def aging_adjust(exp_next:int) -> float:
    if aging_cum.empty:
        return 0.0
    # clamp to index range
    keys = aging_cum.index.to_list()
    if not keys:
        return 0.0
    exp_next = max(min(exp_next, max(keys)), min(keys))
    return float(aging_cum.loc[exp_next])

def project_row(row):
    pa = float(row.get("pa", 0.0))
    shrink = pa / (pa + K_SHRINK)
    woba = float(row.get("woba", np.nan))
    xwoba = float(row.get("xwoba", np.nan))
    # fall back to observed wOBA alone when xwOBA is missing
    blend = woba if np.isnan(xwoba) else ALPHA * xwoba + (1 - ALPHA) * woba
    # next experience index (proxy for next season)
    exp_next = int(row.get("exp_idx", 0)) + 1
    adj = aging_adjust(exp_next)
    return shrink * blend + (1.0 - shrink) * league_woba + adj

hit["woba_proj"] = hit.apply(project_row, axis=1)

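# Residuals to build empirical risk bands (by exp_idx buckets).
# Forecast error = actual next-season wOBA minus the projection made from season t,
# so a negative p10 means the downside case falls below the projection.
nxt = hit.groupby("player_id")[["season","woba"]].shift(-1)  # hit is sorted by player_id, season
hit["woba_next"] = nxt["woba"].where(nxt["season"] == hit["season"] + 1)  # consecutive seasons only
hit["resid"] = hit["woba_next"] - hit["woba_proj"]
risk = (hit.groupby("exp_idx")["resid"]
        .quantile([0.1, 0.5, 0.9])
        .unstack()
        .rename(columns={0.1:"p10",0.5:"p50",0.9:"p90"})
        .reset_index())
risk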
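
The percentile step can be checked on a tiny made-up example. The residual values below are hypothetical (in wOBA points) and chosen so the quantiles are easy to verify by hand; the `groupby` / `quantile` / `unstack` chain mirrors the one used above.

```python
import pandas as pd

# Hypothetical residuals for two experience buckets.
demo = pd.DataFrame({
    "exp_idx": [0] * 5 + [1] * 5,
    "resid":   [-0.04, -0.02, 0.00, 0.02, 0.04,
                -0.02, -0.01, 0.00, 0.01, 0.02],
})

# One row per bucket, one column per percentile.
bands = (demo.groupby("exp_idx")["resid"]
         .quantile([0.1, 0.5, 0.9])
         .unstack()
         .rename(columns={0.1: "p10", 0.5: "p50", 0.9: "p90"}))
print(bands)
```

Bucket 0 has the wider spread, so its p10 to p90 band is twice as wide as bucket 1's. On real data, buckets with few observations will produce unstable percentiles.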

## Convert to wRAA → WAR → $ with bands
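
Before running the conversion on the full table, it can help to trace the chain for a single hypothetical hitter. The function `value_from_woba`, its lowercase parameter names, and the input numbers are assumptions for illustration; the defaults match the demo constants defined earlier.

```python
def value_from_woba(woba_proj, league_woba, pa,
                    woba_scale=1.25, runs_per_win=10.0, dollars_per_war=9_000_000):
    """Chain wOBA -> wRAA -> WAR -> $ with the demo constants as defaults."""
    wraa = (woba_proj - league_woba) / woba_scale * pa  # runs above average
    war = wraa / runs_per_win                           # runs -> wins
    return wraa, war, war * dollars_per_war             # wins -> dollars

wraa, war, dollars = value_from_woba(woba_proj=0.345, league_woba=0.320, pa=600)

# A hypothetical residual band of -0.030 / +0.030 wOBA shifts WAR by
# (resid / woba_scale) * (pa / runs_per_win), the same scaling as the point estimate.
war_p10 = war + (-0.030 / 1.25) * (600 / 10.0)
war_p90 = war + (0.030 / 1.25) * (600 / 10.0)
print(round(wraa, 2), round(war, 2), round(dollars), round(war_p10, 2), round(war_p90, 2))
```

A 25-point wOBA edge over 600 PA works out to about 12 runs, 1.2 wins, and $10.8M under these constants, with the illustrative band spanning roughly -0.24 to 2.64 wins.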

In [None]:
# Convert to wRAA and WAR
hit["wraa"] = ((hit["woba_proj"] - league_woba) / WOBASCALE) * hit["pa"]
hit["war"]  = hit["wraa"] / RUNS_PER_WIN
hit["$"]    = hit["war"] * DOLLARS_PER_WAR

# Attach rough risk bands using residual percentiles per exp_idx
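# (drop any bands left over from a previous run so the merge does not create p10_x/p10_y)
hit = hit.drop(columns=["p10","p50","p90"], errors="ignore").merge(risk, on="exp_idx", how="left")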
hit["war_p10"] = hit["war"] + (hit["p10"] / WOBASCALE) * (hit["pa"] / RUNS_PER_WIN)
hit["war_p90"] = hit["war"] + (hit["p90"] / WOBASCALE) * (hit["pa"] / RUNS_PER_WIN)
hit["$_p10"]   = hit["war_p10"] * DOLLARS_PER_WAR
hit["$_p90"]   = hit["war_p90"] * DOLLARS_PER_WAR

latest = int(hit["season"].dropna().max())
cols = ["player_name","season","pa","woba","xwoba","woba_proj","war","$","war_p10","war_p90","$_p10","$_p90"]
top = (hit[hit["season"]==latest]
       .sort_values("war", ascending=False)
       .reset_index(drop=True))[cols]
top.head(20)

> **Disclaimers:** This is a toy demo. For production: use true ages, park/league adjustments, positions, playing time projections, defense/baserunning, team run-to-win curves, and market-specific $/WAR.