# Yankees 2024 – Rolling xwOBA & Comparable Players

This notebook demonstrates two portfolio-ready analyses:
1. **Rolling xwOBA** for top Yankees hitters by plate appearances in 2024.
2. **Comparable player search** within the same season using cosine similarity on a compact feature set.

**Prereqs**: Run the pipeline first:
```bash
python scripts/pull_statcast.py --config config.yaml
python scripts/build_features.py --config config.yaml
```
Before running the scripts, set `team: NYY` and include `2024` in `years` in `config.yaml`.


In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from pathlib import Path
from src.similarity import top_comps

pd.set_option('display.max_columns', 50)
raw_2024 = Path('data/raw/statcast_NYY_2024.csv')
hitters_path = Path('data/processed/hitters_features.csv')
assert raw_2024.exists(), 'Expected data/raw/statcast_NYY_2024.csv. Run pull_statcast.py.'
assert hitters_path.exists(), 'Expected data/processed/hitters_features.csv. Run build_features.py.'

stat = pd.read_csv(raw_2024, low_memory=False)
hitters = pd.read_csv(hitters_path)
hitters_2024 = hitters[hitters['season'] == 2024].copy()
hitters_2024 = hitters_2024.sort_values('pa', ascending=False)
hitters_2024.head(10)

## Rolling xwOBA (game-by-game)
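Statcast exports are pitch-level, and `estimated_woba_using_speedangle` is missing on most rows. We sort each hitter's rows by **game date** and compute a rolling mean over the available estimates (e.g., a 100-row window), which approximates a plate-appearance window rather than matching it exactly.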
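
The rolling step can be sketched on a toy frame. This is a minimal illustration, not the notebook's exact code: the helper name `rolling_xwoba` is hypothetical and the values are made up. It drops rows without an estimate first, so the window counts only usable rows.

```python
import numpy as np
import pandas as pd

def rolling_xwoba(df, window=100, min_periods=None,
                  col='estimated_woba_using_speedangle'):
    """Rolling mean of `col` over the last `window` rows that have an estimate.

    Hypothetical helper for illustration only.
    """
    if min_periods is None:
        min_periods = max(1, window // 5)
    # Drop rows without an estimate so the window counts only usable rows
    valid = df.dropna(subset=[col]).sort_values('game_date', kind='stable')
    out = valid[['game_date']].copy()
    out['rolling_xwoba'] = valid[col].rolling(window, min_periods=min_periods).mean()
    return out

# Toy data (made up): six rows, two of them without an estimate
toy = pd.DataFrame({
    'game_date': pd.to_datetime(['2024-04-01', '2024-04-01', '2024-04-02',
                                 '2024-04-02', '2024-04-03', '2024-04-03']),
    'estimated_woba_using_speedangle': [0.2, np.nan, 0.4, 0.6, np.nan, 0.8],
})
result = rolling_xwoba(toy, window=2, min_periods=2)
print(result)
```

The first output row is `NaN` because fewer than `min_periods` estimates are available at that point. Later rows average the two most recent estimates and ignore the rows that had none.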

In [None]:
stat['game_date'] = pd.to_datetime(stat['game_date'])
if 'player_name' not in stat.columns and 'batter' in stat.columns:
    # We'll merge names from the hitters table if needed
    name_map = hitters_2024.set_index('player_id')['player_name'].to_dict()
    stat['player_name'] = stat['batter'].map(name_map)

top_names = hitters_2024['player_name'].head(5).tolist()
roll_window = 100  # PA window

fig, ax = plt.subplots(figsize=(10,6))
for name in top_names:
    df = stat[stat['player_name'] == name].copy()
    if df.empty:
        continue
    # Keep only rows with an xwOBA estimate, then sort chronologically
    df = df.dropna(subset=['estimated_woba_using_speedangle']).sort_values('game_date')
    x = df['estimated_woba_using_speedangle']
    # Rolling over rows that have an estimate ~ proxy for a PA window
    roll = x.rolling(roll_window, min_periods=max(10, roll_window // 5)).mean()
    ax.plot(df['game_date'], roll, label=name)

ax.set_title(f'NYY 2024: Rolling xwOBA (window={roll_window} PA)')
ax.set_xlabel('Date')
ax.set_ylabel('Rolling xwOBA')
ax.legend()
plt.tight_layout()
plt.show()

## Comparable Players (same season)
We use cosine similarity over a compact feature vector to find the most similar hitters to a target player in 2024.
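
The implementation of `top_comps` lives in `src/similarity.py` and is not shown here. As a rough sketch of the idea only, cosine similarity on a feature table might look like the following. The helper `cosine_comps`, the feature names, and the toy values are all hypothetical, and z-scoring the columns first is an assumption of this sketch rather than a statement about what `top_comps` does.

```python
import numpy as np
import pandas as pd

def cosine_comps(features, target, k=3):
    """Rank players by cosine similarity to `target`.

    `features`: DataFrame indexed by player, numeric feature columns.
    Hypothetical helper for illustration only.
    """
    # z-score each column so no feature dominates by scale (sketch assumption)
    z = (features - features.mean()) / features.std(ddof=0)
    z = z.fillna(0.0)  # constant columns become NaN after z-scoring
    mat = z.to_numpy(dtype=float)
    norms = np.linalg.norm(mat, axis=1)
    norms[norms == 0] = 1.0  # guard against division by zero
    unit = mat / norms[:, None]
    # Dot product of unit vectors equals cosine similarity
    sims = unit @ unit[features.index.get_loc(target)]
    out = pd.Series(sims, index=features.index, name='cosine_sim')
    return out.drop(target).sort_values(ascending=False).head(k)

# Toy feature table (made-up values and column names)
toy = pd.DataFrame(
    {'k_rate': [0.30, 0.29, 0.15, 0.16],
     'bb_rate': [0.15, 0.14, 0.06, 0.07],
     'barrel_rate': [0.20, 0.19, 0.05, 0.06]},
    index=['Hitter A', 'Hitter B', 'Hitter C', 'Hitter D'],
)
comps_demo = cosine_comps(toy, 'Hitter A', k=3)
print(comps_demo)
```

Cosine similarity compares the direction of two feature vectors, not their magnitude. After z-scoring, a hitter who is above average in the same features as the target scores close to 1, and one with the opposite profile scores close to -1.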

In [None]:
target = hitters_2024['player_name'].iloc[0] if not hitters_2024.empty else 'Aaron Judge'
comps = top_comps(hitters, player_name=target, season=2024, k=10)
comps