# Toy Contract Valuation — Aging Curve (Proxy) → wOBA Projection → WAR → $

**Portfolio demo** of a transparent valuation path for hitters using only season-level features.

Steps:
1. Build a **proxy aging curve** from within-player season-to-season changes in wOBA, using an experience index (seasons since first appearance in the dataset) as a surrogate for age.
2. Project **wOBA** from a blend of xwOBA and wOBA, shrunk toward the league mean in proportion to PA, plus an aging adjustment.
3. Convert to **wRAA** (fixed wOBA scale), then to **WAR** (fixed runs per win), then to **$** (constant $/WAR).
4. Show **risk bands** from empirical residual percentiles.

> In production, you would use true ages, park factors, positional adjustments, defense/base-running, and season-specific run environments.


In [None]:
# Setup
import numpy as np
import pandas as pd
from pathlib import Path

PROC_DIR = Path("data/processed")
SAMPLE_DIR = Path("sample_data")

# Load hitters features (season-level)
hit_path = PROC_DIR / "hitters_features.csv"
if not hit_path.exists():
    alt_path = SAMPLE_DIR / "hitters_features_sample.csv"
    hit_path = alt_path if alt_path.exists() else None

if hit_path and hit_path.exists():
    hit = pd.read_csv(hit_path)
else:
    hit = pd.DataFrame()
    
# Clean types
if "season" in hit.columns:
    hit["season"] = pd.to_numeric(hit["season"], errors="coerce").astype("Int64")
for c in ["pa","woba","xwoba","avg_ev","k_rate","bb_rate","barrel_rate"]:
    if c in hit.columns:
        hit[c] = pd.to_numeric(hit[c], errors="coerce")

print("Hitters features:", hit.shape)
display(hit.head(3))

Hitters features: (1322, 11)


Unnamed: 0,player_id,player_name,season,pa,woba,xwoba,avg_ev,max_ev,k_rate,bb_rate,barrel_rate
0,543063,"Kuhl, Chad",2022,10.0,1.15,0.32475,86.963636,101.4,0.1,0.0,
1,592518,"Kuhl, Chad",2022,11.0,0.918182,0.594545,88.609524,110.1,0.0,0.0,
2,669257,"Clevinger, Mike",2022,10.0,0.9,0.621221,86.346667,110.6,0.0,0.1,


## Build a proxy aging curve

We construct an **experience index** per player: 0 for their first observed season in the dataset, 1 for the next, and so on.  
We then compute `ΔwOBA = wOBA(t+1) - wOBA(t)` for consecutive seasons only and average it by the experience index of season `t+1` to get an empirical adjustment curve.
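
As a minimal illustration on made-up numbers (the player IDs and wOBA values below are hypothetical, not taken from the dataset), the experience index and consecutive-season deltas can be sketched with `groupby` and `shift`. This is an alternative to the self-merge used in the notebook cell, not the notebook's own method.

```python
import pandas as pd

# Hypothetical toy data: player 2 has a gap (no 2022 season).
toy = pd.DataFrame({
    "player_id": [1, 1, 1, 2, 2, 2],
    "season":    [2021, 2022, 2023, 2020, 2021, 2023],
    "woba":      [0.300, 0.320, 0.310, 0.350, 0.340, 0.330],
}).sort_values(["player_id", "season"])

# Experience index: 0 for each player's first observed season, then 1, 2, ...
toy["exp_idx"] = toy.groupby("player_id").cumcount()

# Previous season and previous wOBA within each player
prev = toy.groupby("player_id")[["season", "woba"]].shift(1)
consecutive = toy["season"] == prev["season"] + 1

# Delta is defined only for truly consecutive seasons; gaps stay NaN
toy["delta"] = (toy["woba"] - prev["woba"]).where(consecutive)

# Average delta by the experience index of the later season
toy_curve = toy.dropna(subset=["delta"]).groupby("exp_idx")["delta"].mean()
print(toy_curve)
```

Player 2's jump from 2021 to 2023 is excluded because the seasons are not consecutive, so only one delta contributes at experience index 2.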

In [10]:
# Ensure we have minimal columns
req = {"player_id","player_name","season","woba","xwoba","pa"}
missing = req - set(hit.columns)
if missing:
    raise ValueError(f"Missing required columns for this demo: {sorted(missing)}. Build features first.")

# Experience index
hit = hit.dropna(subset=["player_id","season","woba"]).copy()
hit["season"] = hit["season"].astype(int)
hit = hit.sort_values(["player_id","season"])

hit["exp_idx"] = hit.groupby("player_id")["season"].rank(method="first").astype(int) - 1

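# Season-to-season deltas: the self-merge on player_id pairs every season t with
# every season t1 of the same player; the filter below keeps only t1 == t + 1.
next_map = (hit[["player_id","season","woba"]]
            .rename(columns={"season":"season_t","woba":"woba_t"})
            .merge(hit[["player_id","season","woba","exp_idx"]]
                   .rename(columns={"season":"season_t1","woba":"woba_t1","exp_idx":"exp_idx_t1"}),
                   on="player_id", how="inner"))
# Keep only truly consecutive seasons (pairs separated by a gap are dropped)
next_map = next_map[next_map["season_t1"] == next_map["season_t"] + 1].copy()
# Change in wOBA from season t to t+1, keyed by the experience index of season t+1
next_map["delta"] = next_map["woba_t1"] - next_map["woba_t"]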

aging_curve = (next_map.groupby("exp_idx_t1")["delta"]
               .mean()
               .rename("avg_delta")
               .reset_index())
aging_curve

Unnamed: 0,exp_idx_t1,avg_delta
0,1,0.109772
1,2,0.071143
2,3,0.003822
3,4,0.083652
4,5,0.033315
5,6,0.066609
6,7,-0.059688
7,8,0.036191
8,9,0.036406
9,10,-0.016704


## Project wOBA with shrinkage + aging adjustment

`wOBA_proj = shrink * (α * xwOBA + (1-α) * wOBA) + (1-shrink) * league_wOBA + aging_adj`

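- `shrink = PA / (PA + K)`, with `K` = 200 by default, so a hitter with 200 PA gets equal weight on their own blend and on the league mean  
- `α` = 0.6 by default (the weight on xwOBA; wOBA gets `1-α`)  
- `aging_adj` = cumulative sum of `avg_delta` up to the player's **next** experience index, clamped to the range of indices present in the curve (proxy for aging effects)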
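
To make the effect of the shrinkage term concrete, here is a small sketch of the formula above as a standalone function. The function name `project_woba`, the input rates, and the league wOBA of 0.320 are assumptions chosen for illustration; the aging term is left at zero so that only the blend and shrinkage are visible.

```python
def project_woba(pa, woba, xwoba, league_woba, aging_adj=0.0, k=200.0, alpha=0.6):
    """Blend xwOBA and wOBA, shrink toward the league mean by PA, add an aging term."""
    shrink = pa / (pa + k)                          # 0 at 0 PA, 0.5 at PA == k
    blend = alpha * xwoba + (1.0 - alpha) * woba    # weighted mix of the two rates
    return shrink * blend + (1.0 - shrink) * league_woba + aging_adj

# Hypothetical hitter with identical rates but two different sample sizes
small = project_woba(pa=20, woba=0.400, xwoba=0.380, league_woba=0.320)
large = project_woba(pa=600, woba=0.400, xwoba=0.380, league_woba=0.320)
print(round(small, 4), round(large, 4))
```

With 20 PA the projection stays close to the league mean (about 0.326), while with 600 PA it moves most of the way to the hitter's own blend of 0.388 (about 0.371).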

In [11]:
WOBASCALE = 1.25      # demo constant
RUNS_PER_WIN = 10.0  # demo constant
DOLLARS_PER_WAR = 9_000_000  # demo constant
K_SHRINK = 200.0
ALPHA = 0.6  # weight on xwOBA vs wOBA

league_woba = float(hit["woba"].mean()) if not hit.empty else 0.320

# Cumulative aging adjustment by experience index.
# aging_curve itself is left untouched so this cell can be re-run safely.
if "aging_curve" in locals() and not aging_curve.empty:
    aging_cum = aging_curve.set_index("exp_idx_t1")["avg_delta"].sort_index().cumsum()
else:
    aging_cum = pd.Series(dtype=float)

def aging_adjust(exp_next:int) -> float:
    if aging_cum.empty:
        return 0.0
    # clamp to index range
    keys = aging_cum.index.to_list()
    if not keys:
        return 0.0
    exp_next = max(min(exp_next, max(keys)), min(keys))
    return float(aging_cum.loc[exp_next])

def project_row(row):
    pa = float(row.get("pa", 0.0))
    shrink = pa / (pa + K_SHRINK)
    blend = ALPHA * float(row.get("xwoba", np.nan)) + (1-ALPHA) * float(row.get("woba", np.nan))
    # next experience index (proxy for next season)
    exp_next = int(row.get("exp_idx", 0)) + 1
    adj = aging_adjust(exp_next)
    return shrink * blend + (1.0 - shrink) * league_woba + adj

hit["woba_proj"] = hit.apply(project_row, axis=1)

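# Residuals to build empirical risk bands (by exp_idx buckets).
# Each residual is the projection minus the same season's observed wOBA,
# so it is an in-sample spread rather than a next-season forecast error.
hit["resid"] = hit["woba_proj"] - hit["woba"]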
risk = (hit.groupby("exp_idx")["resid"]
        .quantile([0.1, 0.5, 0.9])
        .unstack()
        .rename(columns={0.1:"p10",0.5:"p50",0.9:"p90"})
        .reset_index())
risk

Unnamed: 0,exp_idx,p10,p50,p90
0,0,-0.172483,0.036016,0.247742
1,1,-0.025707,0.172713,0.354668
2,2,0.029155,0.193035,0.360498
3,3,0.160785,0.318078,0.478139
4,4,0.209278,0.37093,0.500961
5,5,0.249457,0.440866,0.572834
6,6,0.081689,0.386834,0.558908
7,7,0.245491,0.428218,0.573244
8,8,0.063989,0.386923,0.545686
9,9,0.058128,0.373385,0.565184
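
The residuals above are measured against the same season that produced the projection. A possible alternative, sketched here on hypothetical data, pairs each projection made from season `t` with the wOBA actually observed in season `t+1`, so that the percentiles describe forecast error. The toy values and the column names `woba_next` and `forecast_err` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical rows: woba_proj is the projection for the following season
toy = pd.DataFrame({
    "player_id": [1, 1, 1, 2, 2],
    "season":    [2021, 2022, 2023, 2022, 2023],
    "woba":      [0.300, 0.330, 0.310, 0.350, 0.340],
    "woba_proj": [0.315, 0.325, 0.318, 0.340, 0.338],
})

# Shift observed wOBA back one season so it lines up with the projecting row
actual_next = toy[["player_id", "season", "woba"]].rename(columns={"woba": "woba_next"})
actual_next["season"] = actual_next["season"] - 1

# Inner merge keeps only rows that have an observed following season
paired = toy.merge(actual_next, on=["player_id", "season"], how="inner")

# Forecast error: observed next-season wOBA minus the projection
paired["forecast_err"] = paired["woba_next"] - paired["woba_proj"]
bands = paired["forecast_err"].quantile([0.1, 0.5, 0.9])
print(paired[["player_id", "season", "woba_proj", "woba_next", "forecast_err"]])
print(bands)
```

Rows from each player's final observed season drop out of `paired` because there is no following season to score them against.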


## Convert to wRAA → WAR → $ with bands

In [12]:
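# Convert to wRAA and WAR.
# wRAA = (wOBA - league wOBA) / wOBA scale * PA, using the observed PA as playing time.
# "war" here is batting runs above average divided by runs per win; it has no
# replacement-level, positional, defensive, or base-running component.
hit["wraa"] = ((hit["woba_proj"] - league_woba) / WOBASCALE) * hit["pa"]
hit["war"]  = hit["wraa"] / RUNS_PER_WIN
hit["$"]    = hit["war"] * DOLLARS_PER_WAR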

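# Attach rough risk bands using residual percentiles per exp_idx.
# The p10/p90 residuals (in wOBA units) are converted to wins with the same
# wOBA scale, PA, and runs-per-win factors as above, then added to the point estimate.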
hit = hit.merge(risk, on="exp_idx", how="left")
hit["war_p10"] = hit["war"] + (hit["p10"] / WOBASCALE) * (hit["pa"] / RUNS_PER_WIN)
hit["war_p90"] = hit["war"] + (hit["p90"] / WOBASCALE) * (hit["pa"] / RUNS_PER_WIN)
hit["$_p10"]   = hit["war_p10"] * DOLLARS_PER_WAR
hit["$_p90"]   = hit["war_p90"] * DOLLARS_PER_WAR

latest = int(hit["season"].dropna().max())
cols = ["player_name","season","pa","woba","xwoba","woba_proj","war","$","war_p10","war_p90","$_p10","$_p90"]
top = (hit[hit["season"]==latest]
       .sort_values("war", ascending=False)
       .reset_index(drop=True))[cols]
top.head(20)

Unnamed: 0,player_name,season,pa,woba,xwoba,woba_proj,war,$,war_p10,war_p90,$_p10,$_p90
0,"Cortes, Nestor",2024,15.0,0.553333,0.484084,0.718804,0.451595,4064357.0,0.521349,1.129816,4692139.0,10168340.0
1,"King, Michael",2024,14.0,0.657143,0.529162,0.722553,0.425689,3831200.0,0.490792,1.058695,4417129.0,9528252.0
2,"King, Michael",2024,13.0,0.538462,0.653856,0.739882,0.413305,3719741.0,0.479853,0.980818,4318677.0,8827358.0
3,"Pfaadt, Brandon",2024,13.0,0.676923,0.419538,0.734682,0.407896,3671066.0,0.474445,0.975409,4270002.0,8778683.0
4,"Sears, JP",2024,13.0,0.507692,0.395179,0.712955,0.3853,3467699.0,0.445753,0.973091,4011776.0,8757819.0
5,"Eflin, Zach",2024,12.0,0.6625,0.639417,0.741026,0.382609,3443485.0,0.444039,0.906468,3996350.0,8158209.0
6,"Eflin, Zach",2024,12.0,0.733333,0.639178,0.725918,0.368106,3312951.0,0.423909,0.910682,3815177.0,8196139.0
7,"King, Michael",2024,12.0,0.329167,0.282178,0.721346,0.363717,3273452.0,0.425146,0.887575,3826316.0,7988175.0
8,"Brown, Hunter",2024,12.0,0.266667,0.298272,0.720477,0.362883,3265948.0,0.424312,0.886741,3818812.0,7980672.0
9,"Fried, Max",2024,12.0,0.566667,0.432277,0.715117,0.357737,3219636.0,0.41354,0.900314,3721861.0,8102824.0


> **Disclaimers:** This is a toy demo. For production: use true ages, park/league adjustments, positions, playing time projections, defense/baserunning, team run-to-win curves, and market-specific $/WAR.