# Toy Contract Valuation: Aging Curve + Risk Bands

This is a demonstration, *not* a production model. It projects a hitter's next 3–5 seasons by:
1. Estimating an **empirical aging curve** from year-over-year changes in league-wide, season-level wOBA.
2. Projecting **wOBA** forward by applying the aging curve to last season's wOBA.
3. Converting projected wOBA to **WAR (toy mapping)**, then to dollar value using a $/WAR rate and a discount rate.

**Prereqs**: Run the pipeline across multiple years (e.g., 2018–2024 league-wide by setting `team: null` in `config.yaml`).
```
python scripts/pull_statcast.py --config config.yaml
python scripts/build_features.py --config config.yaml
```
Then re-run this notebook.

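Before running the cells below, it can help to confirm that the features file actually spans several seasons, because the aging curve is built only from players observed in back-to-back years. A minimal sketch, assuming the `player_id` and `season` columns listed in the first code cell; the helper name `check_coverage` and the `min_pairs` threshold are hypothetical and arbitrary, not part of the pipeline.

```python
import pandas as pd

def check_coverage(df: pd.DataFrame, min_pairs: int = 100) -> dict:
    """Summarize season coverage and count back-to-back player-season pairs.

    Hypothetical helper; `min_pairs` is an illustrative threshold only.
    """
    seasons = sorted(df["season"].unique())
    # A pair is a (player, season t) row whose (player, season t-1) row also exists.
    keys = set(zip(df["player_id"], df["season"]))
    pairs = sum((pid, s - 1) in keys for pid, s in keys)
    return {
        "seasons": seasons,
        "n_seasons": len(seasons),
        "consecutive_pairs": pairs,
        "ok": len(seasons) >= 2 and pairs >= min_pairs,
    }
```

For example, `check_coverage(pd.read_csv('data/processed/hitters_features.csv'))` should report `ok: True` before the aging curve is worth reading.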

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pathlib import Path

hitters_path = Path('data/processed/hitters_features.csv')
assert hitters_path.exists(), 'Expected data/processed/hitters_features.csv. Run build_features.py.'
hit = pd.read_csv(hitters_path)

# Expect columns: player_id, player_name, season, pa, woba, xwoba, avg_ev, k_rate, bb_rate, barrel_rate
# Statcast exports may not include player age, and no bio table is joined here, so we use a proxy:
# each player's season count within this dataset (1, 2, 3, ...). The proxy is left-censored: a veteran
# first observed in the earliest pulled season also gets age_index = 1. In production, join a player bio table.
hit = hit.sort_values(['player_id','season']).reset_index(drop=True)
hit['age_index'] = hit.groupby('player_id').cumcount() + 1  # proxy age index (1,2,3,...) per player career sequence
hit = hit.dropna(subset=['woba'])

# --- Empirical aging curve (proxy): average delta wOBA from t-1 to t by age_index ---
def season_shift(df, id_col='player_id', target='woba', season_col='season'):
    prev = df[[id_col, season_col, target, 'age_index']].copy()
    prev[season_col] = prev[season_col] + 1
    merged = pd.merge(df[[id_col, season_col, target, 'age_index']], prev,
                      on=[id_col, season_col], how='inner', suffixes=('', '_prev'))
    # delta from prev season
    merged['delta'] = merged[target] - merged[f'{target}_prev']
    # aging bucket by prev age_index
    merged['age_bucket'] = merged['age_index_prev']
    return merged

chg = season_shift(hit)
aging = chg.groupby('age_bucket')['delta'].mean().rename('mean_delta').to_frame()
aging['count'] = chg.groupby('age_bucket')['delta'].size()
aging.head()

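Because the career index is only a stand-in for age, a real season age is preferable whenever birth dates are available. The sketch below assumes a hypothetical `bio` DataFrame with `player_id` and `birth_date` columns and computes age as of June 30 of each season, the convention Baseball-Reference uses; substitute whatever cutoff your bio source follows. The function name `add_season_age` is illustrative.

```python
import pandas as pd

def add_season_age(hit: pd.DataFrame, bio: pd.DataFrame) -> pd.DataFrame:
    """Left-join birth dates and compute each player's age on June 30 of the season.

    `bio` is a hypothetical table with columns player_id, birth_date.
    Players without a birth date get NaN.
    """
    out = hit.merge(bio[["player_id", "birth_date"]], on="player_id", how="left")
    birth = pd.to_datetime(out["birth_date"])
    # Age on June 30 = year difference, minus 1 if the birthday falls after June 30.
    out["season_age"] = out["season"] - birth.dt.year - (birth.dt.month > 6).astype(int)
    return out
```

With `season_age` in hand, `season_shift` can bucket by true age instead of `age_index`, which removes the left-censoring problem.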

In [None]:
# Plot the proxy aging curve (delta wOBA by age_index)
fig, ax = plt.subplots(figsize=(8,4))
ax.axhline(0, ls='--')
ax.plot(aging.index, aging['mean_delta'], marker='o')
ax.set_xlabel('Career Age Index (proxy)')
ax.set_ylabel('Δ wOBA from prior season')
ax.set_title('Empirical Aging Curve (Proxy)')
plt.tight_layout(); plt.show()

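The curve above weights every back-to-back pair equally, so a 30-PA call-up moves the mean as much as a full season. A common refinement of this delta method weights each pair by the harmonic mean of the two seasons' PA. A minimal sketch, assuming `pa` is populated and positive in both seasons; the function name is illustrative. Note that weighting does not address survivorship bias: players who decline sharply often never get the second season that would record their drop.

```python
import pandas as pd

def weighted_aging_curve(df: pd.DataFrame, target: str = "woba") -> pd.DataFrame:
    """Year-over-year change in `target`, grouped by the prior season's age_index.

    Each back-to-back pair is weighted by the harmonic mean of its two PA totals
    (assumes pa > 0 in both seasons).
    """
    cols = ["player_id", "season", "age_index", "pa", target]
    prev = df[cols].copy()
    prev["season"] += 1  # shift so season t-1 lines up with season t
    pairs = df[cols].merge(prev, on=["player_id", "season"], suffixes=("", "_prev"))
    pairs["delta"] = pairs[target] - pairs[f"{target}_prev"]
    pairs["w"] = 2 * pairs["pa"] * pairs["pa_prev"] / (pairs["pa"] + pairs["pa_prev"])
    pairs["w_delta"] = pairs["w"] * pairs["delta"]
    g = pairs.groupby("age_index_prev")
    return pd.DataFrame({
        "mean_delta": g["w_delta"].sum() / g["w"].sum(),  # PA-weighted mean change
        "pairs": g.size(),
        "total_weight": g["w"].sum(),
    })
```

Plotting `weighted_aging_curve(hit)['mean_delta']` next to `aging['mean_delta']` shows how much small-sample seasons were driving the unweighted curve.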
## Project a player's next seasons
We start from last season's wOBA and add the mean aging delta for each of the next N seasons, cumulatively. For uncertainty, we draw a ±1σ **risk band** around the projection, with σ estimated from the empirical year-over-year variation in wOBA.

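In symbols, with $w_0$ the last observed wOBA, $a_0$ its age index, $\bar{\Delta}(a)$ the mean change from age index $a$ to $a+1$, and $\sigma$ the standard deviation of yearly residuals, the projection $k$ years ahead and its error variance are as follows, assuming yearly shocks are independent:

$$
\hat{w}_k = w_0 + \sum_{j=0}^{k-1} \bar{\Delta}(a_0 + j),
\qquad
\operatorname{Var}\!\left(w_k - \hat{w}_k\right) \approx k\,\sigma^2
\;\Rightarrow\;
\hat{w}_k \pm \sigma\sqrt{k}.
$$

For example, with $\sigma = 0.030$ the 1σ band is about ±0.030 one year out and about ±0.067 five years out.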
In [None]:
pid = hit['player_id'].value_counts().index[0]  # player with the most seasons in the data
p = hit[hit['player_id'] == pid].sort_values('season')
player = p['player_name'].iloc[-1]  # select by ID, since names need not be unique
last = p.iloc[-1]
start_age = int(last['age_index'])
base_woba = float(last['woba'])

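horizon = 5
fallback = float(aging['mean_delta'].mean())  # used for age indices with no observed pairs
# aging is keyed by the prior season's age_index, so year k ahead uses bucket start_age + k - 1
deltas = [aging['mean_delta'].get(start_age + i, fallback) for i in range(horizon)]
proj = base_woba + np.cumsum(deltas)  # cumulative aging applied to last season's wOBA

# Uncertainty: residuals around each bucket's mean delta. If yearly shocks are independent,
# their variance accumulates, so the ±1σ band widens with sqrt(years ahead).
resid = chg['delta'] - chg['age_bucket'].map(aging['mean_delta'])
resid_std = float(resid.std()) if resid.count() > 1 else 0.015  # 0.015: arbitrary fallback
years_ahead = np.arange(1, horizon + 1)
upper = proj + resid_std * np.sqrt(years_ahead)
lower = proj - resid_std * np.sqrt(years_ahead)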

fig, ax = plt.subplots(figsize=(8,4))
ax.plot(years_ahead, proj, marker='o', label='Projected wOBA')
ax.fill_between(years_ahead, lower, upper, alpha=0.2, label='±1σ band')
ax.set_xticks(years_ahead)
ax.set_xlabel('Years Ahead')
ax.set_ylabel('wOBA (projected)')
ax.set_title(f'{player}: Toy Projection from Last Season (wOBA)')
ax.legend(); plt.tight_layout(); plt.show()

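The band in the cell above is a closed-form ±1σ band. A simulation makes the noise model explicit and yields percentile bands directly. The sketch below assumes yearly shocks are normal with standard deviation `sigma` (an assumption, not something tested here) and reports the 10th, 50th, and 90th percentiles of simulated wOBA paths; the function name and defaults are illustrative.

```python
import numpy as np

def simulate_bands(base, deltas, sigma, n_sims=10_000, pcts=(10, 50, 90), seed=0):
    """Simulate wOBA paths where each year adds the mean aging delta plus N(0, sigma) noise.

    Returns an array of shape (len(pcts), horizon) holding the requested percentiles.
    """
    rng = np.random.default_rng(seed)
    deltas = np.asarray(deltas, dtype=float)
    shocks = rng.normal(0.0, sigma, size=(n_sims, deltas.size))
    # Each row is one path: base plus the running sum of (aging delta + shock).
    paths = base + np.cumsum(deltas + shocks, axis=1)
    return np.percentile(paths, pcts, axis=0)
```

For example, `lo, mid, hi = simulate_bands(base_woba, deltas, resid_std)` followed by `ax.fill_between(years_ahead, lo, hi, alpha=0.2)` draws an 80% band. Swapping the normal draw for resampled empirical residuals would relax the normality assumption.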
## From wOBA to WAR (toy)
We use a rough mapping for illustration only:
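- Convert the gap between projected and league-average wOBA to runs above average per PA, using a linear approximation around the league mean.
- Multiply by an assumed number of PA per season (user input).
- Convert runs to wins with a fixed runs-per-win constant (e.g., 8.5). Because the baseline is league *average*, the result is batting wins above average rather than true WAR, which also credits replacement level, baserunning, fielding, and position.
- Monetize with a dollars-per-WAR rate and a discount rate.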

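For reference, the usual linear-weights form of the first step is wRAA, which divides the wOBA gap by the league's wOBA scale *s* (a season-specific constant, typically around 1.2). With *w̄* the league wOBA, *R* runs per win, *D* dollars per WAR, *r* the discount rate, and *H* the horizon:

$$
\text{RAA}_t = \frac{\hat{w}_t - \bar{w}}{s}\,\text{PA},
\qquad
\text{WAR}_t \approx \frac{\text{RAA}_t}{R},
\qquad
\text{PV} = \sum_{t=1}^{H} \frac{D\,\text{WAR}_t}{(1+r)^{t}}
$$

Worked through once: a .340 projection against a .320 league mean over 600 PA with *s* = 1.2 gives 0.020 / 1.2 × 600 = 10 runs, or about 1.18 wins at *R* = 8.5; at 9M dollars per WAR that is about 10.6M, or roughly 9.8M in present value one year out at *r* = 0.08.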
These constants should be replaced with team-grade values in production.

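Because these constants dominate the answer, a quick sensitivity grid shows how far the valuation moves when they change. A minimal sketch, assuming a list of yearly WAR values such as the `wars` list computed in the next cell; the function name `pv_grid` and any grid values you pass are illustrative, not market estimates.

```python
import pandas as pd

def pv_grid(wars, dollars_per_war_grid, discount_grid):
    """Total present value of yearly WAR over a grid of assumptions.

    Rows: dollars per WAR; columns: discount rate.
    Year t (t = 1..H) is discounted by (1 + r) ** t, matching end-of-year cash flows.
    """
    grid = {}
    for r in discount_grid:
        # Discounted WAR total for this rate; scales linearly with dollars per WAR.
        factor = sum(w / (1 + r) ** t for t, w in enumerate(wars, start=1))
        grid[r] = [d * factor for d in dollars_per_war_grid]
    return pd.DataFrame(grid, index=list(dollars_per_war_grid))
```

For example, `pv_grid(wars, [7e6, 9e6, 11e6], [0.05, 0.08, 0.10])` returns a 3×3 table of total present values.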
In [None]:
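# PA-weighted league mean, so small-sample player-seasons don't skew the baseline
valid = hit['pa'].notna()
league_woba = float(np.average(hit.loc[valid, 'woba'], weights=hit.loc[valid, 'pa']))
runs_per_win = 8.5
dollars_per_war = 9_000_000  # $9M per WAR (toy)
discount = 0.08
pa_per_season = 600
woba_scale = 1.2  # approximate league wOBA scale; varies by season

# Linear (wRAA-style) mapping: runs above average = (wOBA - league wOBA) / wOBA scale * PA.
# With scale 1.2, +0.010 wOBA over 600 PA is about 5 runs.
def woba_to_war(woba, baseline=league_woba, pa=pa_per_season, scale=woba_scale):
    runs = (woba - baseline) / scale * pa
    return runs / runs_per_win  # batting wins above average (no replacement-level credit)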

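wars = [woba_to_war(w) for w in proj]
# Present value of each year's WAR, discounted from the end of year t
values = [war * dollars_per_war / ((1 + discount) ** t) for t, war in enumerate(wars, start=1)]
out = pd.DataFrame({'YearAhead': years_ahead, 'wOBA': proj, 'WAR_toy': wars, 'PV_$': values})
print(f"Total NPV: ${sum(values):,.0f}")
out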