# 05 - 2026 Fantasy Predictions

Generate 2026 fantasy point predictions:

1. **Retrain models** on all data (2016-2025)
2. **Predict rate stats** (Fpoints/PA, Fpoints/IP) for 2026 using 2025 features
3. **Apply external projections** (PA/IP/W/L/SV) to get total fantasy points
4. **Generate final rankings**

In [20]:
import sys
import os

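# Set working directory to project root when launched from the notebooks/ folder.
# Compare the final path component so an unrelated parent directory whose name
# happens to contain 'notebooks' does not trigger the chdir.
if os.path.basename(os.getcwd()) == 'notebooks':
    os.chdir('..')
sys.path.insert(0, os.getcwd())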

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import joblib

from sklearn.ensemble import RandomForestRegressor
from xgboost import XGBRegressor
from lightgbm import LGBMRegressor
from sklearn.model_selection import RandomizedSearchCV
from sklearn.metrics import mean_absolute_error

from config.settings import PROCESSED_DATA_DIR, MODELS_DIR, RAW_DATA_DIR, RANDOM_STATE
from config.scoring import PITCHER_SCORING_TEAM

import warnings
warnings.filterwarnings('ignore')

np.random.seed(RANDOM_STATE)
print(f"Project root: {os.getcwd()}")

Project root: /Users/matthewgillies/mlb-fantasy-2026


---
## Part 1: Batter Predictions
---

### 1.1 Load Data & Retrain Model

In [21]:
# Load processed batter data
batters = pd.read_csv(f"{PROCESSED_DATA_DIR}/batters_processed.csv")
print(f"Loaded {len(batters)} batter-seasons")
print(f"Years: {batters['Season'].min()} - {batters['Season'].max()}")

# Feature columns (lag features only - these use previous year data to predict current year)
feature_cols_bat = [c for c in batters.columns if '_lag' in c]
print(f"\nFeatures: {len(feature_cols_bat)}")

Loaded 3578 batter-seasons
Years: 2016 - 2025

Features: 50


In [22]:
# Train on ALL data (2016-2025) for final model
# We use all years since we're predicting 2026, not evaluating
train_df_bat = batters.copy()

X_train_bat = train_df_bat[feature_cols_bat].copy()
y_train_bat = train_df_bat['Fpoints_PA'].copy()

# Fill NaN with median
train_medians_bat = X_train_bat.median()
X_train_bat = X_train_bat.fillna(train_medians_bat)

print(f"Training set: {len(X_train_bat)} rows (all years)")
print(f"Features: {X_train_bat.shape[1]}")

Training set: 3578 rows (all years)
Features: 50
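
The median fill above is computed once on the training frame and reused later for the 2026 feature matrix, so both are imputed with the same values. A minimal sketch of that pattern on toy data follows; the column name `K_pct_lag1` and all values are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy training frame with one missing value (hypothetical feature name and values)
train = pd.DataFrame({"K_pct_lag1": [0.20, 0.25, np.nan, 0.30]})

# Toy prediction frame that also has a gap
new = pd.DataFrame({"K_pct_lag1": [np.nan, 0.22]})

# Medians come from the training data only; NaN is skipped by default
train_medians = train.median()

# The same medians fill both frames, so train and predict inputs are consistent
train_filled = train.fillna(train_medians)
new_filled = new.fillna(train_medians)

print(train_medians)
print(new_filled)
```

One caveat under this approach: a column that is entirely NaN in training has a NaN median and would stay unfilled, which is what the `NaN count` check printed after building the 2026 matrix guards against.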


In [23]:
# Train Random Forest (best performer from evaluation)
# Using similar params to what worked well in evaluation
rf_bat = RandomForestRegressor(
    n_estimators=200,
    max_depth=20,
    min_samples_split=5,
    min_samples_leaf=4,
    random_state=RANDOM_STATE,
    n_jobs=-1
)

print("Training batter model on all data...")
rf_bat.fit(X_train_bat, y_train_bat)
print("Done!")

# Store for later use
batter_model = rf_bat

Training batter model on all data...
Done!


### 1.2 Prepare 2026 Prediction Data

For 2026 predictions, each player's 2025 stats become the `_lag1` features, and the 2025 row's `_lag1` values (2024 stats) become `_lag2`.
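
The lag shift can be illustrated on a toy frame. This is a minimal sketch: the column `wOBA`, its values, and the helper `shift_lags` are hypothetical and are not taken from the project data.

```python
import pandas as pd

# One 2025 row: the current-year value plus its stored lags (all values hypothetical)
row_2025 = pd.DataFrame({
    "Season": [2025],
    "wOBA": [0.350],       # 2025 value
    "wOBA_lag1": [0.340],  # 2024 value
    "wOBA_lag2": [0.330],  # 2023 value
})

def shift_lags(df, base_features):
    """Build next-season lag features by moving every value one year back."""
    out = pd.DataFrame(index=df.index)
    for feat in base_features:
        out[f"{feat}_lag1"] = df[feat]            # current year becomes lag1
        out[f"{feat}_lag2"] = df[f"{feat}_lag1"]  # old lag1 becomes lag2
    return out

X_2026 = shift_lags(row_2025, ["wOBA"])
print(X_2026)
```

The 2023 value drops out of the feature set because the model only uses two lags.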

In [24]:
# Get 2025 player data to use as features for 2026 predictions
# The 2025 row already has lag1 features (which are 2024 data)
# For 2026, we need 2025 data as lag1

# Get base features from 2025 (these become lag1 for 2026)
batters_2025 = batters[batters['Season'] == 2025].copy()
print(f"Players with 2025 data: {len(batters_2025)}")

# The feature columns we need for 2026 prediction are the _lag1 columns
# But we need to populate them with 2025 current-year values

# Get the base feature names (without _lag1 suffix)
base_features = [c.replace('_lag1', '') for c in feature_cols_bat if '_lag1' in c]
print(f"Base features needed: {len(base_features)}")

Players with 2025 data: 376
Base features needed: 25


In [25]:
# Create 2026 prediction dataframe
# Use 2025 actual values as lag1, and 2025 lag1 as lag2

pred_2026_bat = batters_2025[['IDfg', 'Name', 'Team', 'Age', 'PA', 'Fpoints_PA']].copy()
pred_2026_bat = pred_2026_bat.rename(columns={'Fpoints_PA': 'Fpoints_PA_2025', 'PA': 'PA_2025', 'Age': 'Age_2025'})

# Build feature matrix for 2026 prediction
X_2026_bat = pd.DataFrame(index=batters_2025.index)

for feat in base_features:
    # lag1 for 2026 = current value in 2025
    if feat in batters_2025.columns:
        X_2026_bat[f'{feat}_lag1'] = batters_2025[feat].values
    else:
        X_2026_bat[f'{feat}_lag1'] = np.nan
    
    # lag2 for 2026 = lag1 value in 2025 (which was 2024 data)
    lag1_col = f'{feat}_lag1'
    if lag1_col in batters_2025.columns:
        X_2026_bat[f'{feat}_lag2'] = batters_2025[lag1_col].values
    else:
        X_2026_bat[f'{feat}_lag2'] = np.nan

# Ensure column order matches training
X_2026_bat = X_2026_bat[feature_cols_bat]

# Fill missing with training medians
X_2026_bat = X_2026_bat.fillna(train_medians_bat)

print(f"2026 prediction matrix: {X_2026_bat.shape}")
print(f"NaN count: {X_2026_bat.isna().sum().sum()}")

2026 prediction matrix: (376, 50)
NaN count: 0


### 1.3 Generate Rate Predictions

In [26]:
# Predict Fpoints/PA for 2026
pred_2026_bat['Predicted_Fpoints_PA'] = batter_model.predict(X_2026_bat)

# Sort by predicted rate
pred_2026_bat = pred_2026_bat.sort_values('Predicted_Fpoints_PA', ascending=False).reset_index(drop=True)

print("\n=== 2026 Batter Rate Predictions (Fpoints/PA) ===")
print(f"Players: {len(pred_2026_bat)}")
print(f"\nTop 25 by predicted Fpoints/PA:")
display_cols = ['Name', 'Team', 'Age_2025', 'PA_2025', 'Fpoints_PA_2025', 'Predicted_Fpoints_PA']
print(pred_2026_bat[display_cols].head(25).to_string(index=False))


=== 2026 Batter Rate Predictions (Fpoints/PA) ===
Players: 376

Top 25 by predicted Fpoints/PA:
                 Name  Team  Age_2025  PA_2025  Fpoints_PA_2025  Predicted_Fpoints_PA
            Juan Soto   NYM        26      715         0.777622              0.771566
          Aaron Judge   NYY        33      679         0.882180              0.767631
        Shohei Ohtani   LAD        30      727         0.784044              0.709556
         Jose Ramirez   CLE        32      673         0.775632              0.707757
       Bobby Witt Jr.   KCR        25      687         0.671033              0.674176
       Yordan Alvarez   HOU        28      199         0.557789              0.669735
          Kyle Tucker   CHC        28      597         0.703518              0.662984
          Ketel Marte   ARI        31      556         0.705036              0.634366
         Corey Seager   TEX        31      445         0.606742              0.631847
Vladimir Guerrero Jr.   TOR        26      

In [27]:
# Save rate predictions
os.makedirs('predictions', exist_ok=True)
pred_2026_bat.to_csv('predictions/batters_2026_rate_predictions.csv', index=False)
print("Saved to predictions/batters_2026_rate_predictions.csv")

Saved to predictions/batters_2026_rate_predictions.csv


---
## Part 2: Pitcher Predictions
---

### 2.1 Load Data & Retrain Model

In [28]:
# Load processed pitcher data
pitchers = pd.read_csv(f"{PROCESSED_DATA_DIR}/pitchers_processed.csv")
print(f"Loaded {len(pitchers)} pitcher-seasons")
print(f"Years: {pitchers['Season'].min()} - {pitchers['Season'].max()}")

# Feature columns
feature_cols_pit = [c for c in pitchers.columns if '_lag' in c]

# Also include arsenal columns if present
arsenal_cols = [c for c in pitchers.columns if c.startswith(('ff_', 'si_', 'sl_', 'ch_', 'cu_'))]
feature_cols_pit = feature_cols_pit + [c for c in arsenal_cols if c not in feature_cols_pit]

print(f"\nFeatures: {len(feature_cols_pit)}")

Loaded 3985 pitcher-seasons
Years: 2016 - 2025

Features: 119


In [29]:
# Train on ALL data for final model
train_df_pit = pitchers.copy()

X_train_pit = train_df_pit[feature_cols_pit].copy()
y_train_pit = train_df_pit['Fpoints_IP'].copy()

# Fill NaN with median
train_medians_pit = X_train_pit.median()
X_train_pit = X_train_pit.fillna(train_medians_pit)

print(f"Training set: {len(X_train_pit)} rows (all years)")
print(f"Features: {X_train_pit.shape[1]}")

Training set: 3985 rows (all years)
Features: 119


In [30]:
# Train XGBoost (best performer from evaluation)
xgb_pit = XGBRegressor(
    n_estimators=200,
    max_depth=7,
    learning_rate=0.05,
    subsample=0.8,
    colsample_bytree=0.6,
    gamma=5,
    random_state=RANDOM_STATE,
    n_jobs=-1,
    verbosity=0
)

print("Training pitcher model on all data...")
xgb_pit.fit(X_train_pit, y_train_pit)
print("Done!")

# Store for later use
pitcher_model = xgb_pit

Training pitcher model on all data...
Done!


### 2.2 Prepare 2026 Prediction Data

In [31]:
# Get 2025 pitcher data
pitchers_2025 = pitchers[pitchers['Season'] == 2025].copy()
print(f"Pitchers with 2025 data: {len(pitchers_2025)}")

# Get base feature names
base_features_pit = sorted({c.replace('_lag1', '').replace('_lag2', '')
                            for c in feature_cols_pit if '_lag' in c})
print(f"Base features: {len(base_features_pit)}")

Pitchers with 2025 data: 423
Base features: 56


In [32]:
# Create 2026 prediction dataframe
pred_2026_pit = pitchers_2025[['IDfg', 'Name', 'Team', 'Age', 'IP', 'GS', 'G', 
                                'Fpoints_IP', 'W', 'L', 'SV', 'HLD']].copy()
pred_2026_pit = pred_2026_pit.rename(columns={
    'Fpoints_IP': 'Fpoints_IP_2025', 
    'IP': 'IP_2025', 
    'Age': 'Age_2025',
    'W': 'W_2025', 'L': 'L_2025', 'SV': 'SV_2025', 'HLD': 'HLD_2025'
})
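# Heuristic: any pitcher with at least one 2025 start is labelled SP,
# so a reliever who opened a game or made a spot start is tagged SP too
pred_2026_pit['Role'] = pred_2026_pit['GS'].apply(lambda x: 'SP' if x > 0 else 'RP')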

# Build feature matrix
X_2026_pit = pd.DataFrame(index=pitchers_2025.index)

for feat in base_features_pit:
    # lag1 for 2026 = current value in 2025
    if feat in pitchers_2025.columns:
        X_2026_pit[f'{feat}_lag1'] = pitchers_2025[feat].values
    else:
        X_2026_pit[f'{feat}_lag1'] = np.nan
    
    # lag2 for 2026 = lag1 value in 2025
    lag1_col = f'{feat}_lag1'
    if lag1_col in pitchers_2025.columns:
        X_2026_pit[f'{feat}_lag2'] = pitchers_2025[lag1_col].values
    else:
        X_2026_pit[f'{feat}_lag2'] = np.nan

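# Add non-lagged arsenal columns; 2025 values stand in for 2026, which is not yet observed.
# Arsenal columns carrying a _lag suffix were already shifted by the loop above, so they
# are skipped here to avoid overwriting the shifted 2025 values with the 2025 row's own lags.
for col in arsenal_cols:
    if '_lag' in col:
        continue
    if col in pitchers_2025.columns:
        X_2026_pit[col] = pitchers_2025[col].values
    else:
        X_2026_pit[col] = np.nan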

# Ensure column order matches training
X_2026_pit = X_2026_pit.reindex(columns=feature_cols_pit)

# Fill missing with training medians
X_2026_pit = X_2026_pit.fillna(train_medians_pit)

print(f"2026 prediction matrix: {X_2026_pit.shape}")
print(f"NaN count: {X_2026_pit.isna().sum().sum()}")

2026 prediction matrix: (423, 119)
NaN count: 0
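
The `Role` column above comes from `GS > 0`, which tags a pitcher as SP after a single start. One possible alternative is to classify by the share of appearances that were starts. The sketch below is an assumption, not the notebook's method: the helper `assign_role` and the 0.5 threshold are hypothetical.

```python
def assign_role(gs, g, min_start_share=0.5):
    """Label a pitcher SP when at least `min_start_share` of games were starts.

    gs: games started, g: games played. The threshold is a hypothetical
    default and would need tuning against real role data.
    """
    if g <= 0:
        return "RP"  # no appearances: default to RP rather than divide by zero
    return "SP" if gs / g >= min_start_share else "RP"


# A reliever with one opener start stays RP; a full-time starter is SP
print(assign_role(gs=1, g=60))
print(assign_role(gs=30, g=30))
```

With the columns already kept in `pred_2026_pit`, this could be applied row-wise, for example `pred_2026_pit.apply(lambda r: assign_role(r['GS'], r['G']), axis=1)`.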


### 2.3 Generate Rate Predictions

In [33]:
# Predict Fpoints/IP for 2026
pred_2026_pit['Predicted_Fpoints_IP'] = pitcher_model.predict(X_2026_pit)

# Sort by predicted rate
pred_2026_pit = pred_2026_pit.sort_values('Predicted_Fpoints_IP', ascending=False).reset_index(drop=True)

print("\n=== 2026 Pitcher Rate Predictions (Fpoints/IP) ===")
print(f"Pitchers: {len(pred_2026_pit)}")
print(f"\nTop 25 by predicted Fpoints/IP (all):")
display_cols = ['Name', 'Team', 'Role', 'IP_2025', 'Fpoints_IP_2025', 'Predicted_Fpoints_IP']
print(pred_2026_pit[display_cols].head(25).to_string(index=False))


=== 2026 Pitcher Rate Predictions (Fpoints/IP) ===
Pitchers: 423

Top 25 by predicted Fpoints/IP (all):
            Name  Team Role  IP_2025  Fpoints_IP_2025  Predicted_Fpoints_IP
    Mason Miller - - -   RP     61.2         3.196078              2.817536
 Aroldis Chapman   BOS   RP     61.1         3.425532              2.634882
      Edwin Diaz   NYM   RP     66.1         3.242057              2.593045
     Griffin Jax - - -   SP     66.0         2.272727              2.591288
   Shohei Ohtani   LAD   SP     47.0         2.638298              2.516919
  Devin Williams   NYY   RP     62.0         2.258065              2.503018
      Josh Hader   HOU   RP     52.2         3.134100              2.498071
Jeremiah Estrada   SDP   RP     73.0         2.547945              2.488457
 Garrett Crochet   BOS   SP    205.1         2.639200              2.482658
    Tarik Skubal   DET   SP    195.1         2.851358              2.480192
      Cade Smith   CLE   RP     73.2         2.754098      

In [34]:
# Show SP and RP separately
print("\n=== Top 20 Starting Pitchers ===")
sp_preds = pred_2026_pit[pred_2026_pit['Role'] == 'SP']
print(sp_preds[display_cols].head(20).to_string(index=False))

print("\n=== Top 20 Relief Pitchers ===")
rp_preds = pred_2026_pit[pred_2026_pit['Role'] == 'RP']
print(rp_preds[display_cols].head(20).to_string(index=False))


=== Top 20 Starting Pitchers ===
            Name  Team Role  IP_2025  Fpoints_IP_2025  Predicted_Fpoints_IP
     Griffin Jax - - -   SP     66.0         2.272727              2.591288
   Shohei Ohtani   LAD   SP     47.0         2.638298              2.516919
 Garrett Crochet   BOS   SP    205.1         2.639200              2.482658
    Tarik Skubal   DET   SP    195.1         2.851358              2.480192
     Cole Ragans   KCR   SP     61.2         2.362745              2.449911
   Hunter Greene   CIN   SP    107.2         2.673507              2.422286
    Zack Wheeler   PHI   SP    149.2         2.765416              2.414526
    Kyle Bradish   BAL   SP     32.0         2.875000              2.314774
     Dylan Cease   SDP   SP    168.0         1.940476              2.302171
      Chris Sale   ATL   SP    125.2         2.672524              2.287775
    Jacob deGrom   TEX   SP    172.2         2.488966              2.267852
   Logan Gilbert   SEA   SP    131.0         2.526718 

In [35]:
# Save rate predictions
pred_2026_pit.to_csv('predictions/pitchers_2026_rate_predictions.csv', index=False)
print("Saved to predictions/pitchers_2026_rate_predictions.csv")

Saved to predictions/pitchers_2026_rate_predictions.csv


---
## Part 3: Apply External Projections

To convert rate predictions to total fantasy points, we need:
- **Batters**: PA projections
- **Pitchers**: IP, W, L, SV, HLD projections

### Projection Files Used:
- `data/projections/batx_hitters_2026.csv` - BatX batter projections (PA)
- `data/projections/oopsy_pitcher_2026.csv` - OOPSY pitcher projections (IP, W, L, SV, HLD)

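Both files identify players by `PlayerId`, which corresponds to the FanGraphs `IDfg` used in the processed data. Rows with non-numeric IDs (such as minor leaguers) are dropped before merging.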
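
The conversion from rates to totals is simple arithmetic: batters get rate times PA, and pitchers get rate times IP plus team-dependent points. The sketch below reproduces two rows from the printed results. The weights `W = 2` and `L = -2` are inferred from the printed `Proj_Team_Fpoints` values for starters, assuming their projected SV and HLD are zero; the SV and HLD weights shown are placeholders, not the values in `config.scoring`.

```python
# Hypothetical stand-in for PITCHER_SCORING_TEAM. W and L are inferred from the
# printed output; SV and HLD are placeholders because their weights are not shown.
SCORING_TEAM = {"W": 2.0, "L": -2.0, "SV": 0.0, "HLD": 0.0}


def batter_total(rate_per_pa, projected_pa):
    """Total fantasy points = predicted Fpoints/PA x projected PA."""
    return rate_per_pa * projected_pa


def pitcher_total(rate_per_ip, proj_ip, w, l, sv, hld, scoring):
    """Skill points (rate x IP) plus team-dependent points (W/L/SV/HLD)."""
    skill = rate_per_ip * proj_ip
    team = (w * scoring["W"] + l * scoring["L"]
            + sv * scoring["SV"] + hld * scoring["HLD"])
    return skill + team


# Juan Soto: 0.771566 Fpoints/PA x 674.850 projected PA
soto = batter_total(0.771566, 674.850)

# Tarik Skubal: 2.480192 Fpoints/IP x 205.0 IP, projected 15 W and 7 L
# (SV and HLD set to zero here as an assumption)
skubal = pitcher_total(2.480192, 205.0, w=15, l=7, sv=0, hld=0, scoring=SCORING_TEAM)

print(round(soto, 2), round(skubal, 2))
```

Small differences from the notebook output in later decimal places are expected because the printed rates are rounded to six digits.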

---

In [36]:
# Create projections directory
os.makedirs('data/projections', exist_ok=True)

# List projection files (the directory is guaranteed to exist after makedirs)
proj_files = os.listdir('data/projections')
print("Available projection files:")
for f in proj_files:
    print(f"  - {f}")
    
if not proj_files:
    print("  (none found - download from FanGraphs)")

Available projection files:
  - oopsy_pitcher_2026.csv
  - batx_hitters_2026.csv


### 3.1 Load & Apply Batter Projections

In [37]:
# Load batter projections
BATTER_PROJ_FILE = 'data/projections/batx_hitters_2026.csv'

if os.path.exists(BATTER_PROJ_FILE):
    batter_proj = pd.read_csv(BATTER_PROJ_FILE)
    print(f"Loaded batter projections: {len(batter_proj)} players")
    print(f"Columns: {batter_proj.columns.tolist()}")
else:
    print(f"File not found: {BATTER_PROJ_FILE}")
    print("Using 2025 PA as fallback projection")
    batter_proj = None

Loaded batter projections: 667 players
Columns: ['Name', 'Team', 'G', 'PA', 'AB', 'H', '1B', '2B', '3B', 'HR', 'R', 'RBI', 'BB', 'IBB', 'SO', 'HBP', 'SF', 'SH', 'GDP', 'SB', 'CS', 'AVG', 'BB%', 'K%', 'BB/K', 'OBP', 'SLG', 'wOBA', 'OPS', 'ISO', 'Spd', 'BABIP', 'UBR', 'wSB', 'wRC', 'wRAA', 'wRC+', 'BsR', 'Fld', 'Off', 'Def', 'WAR', 'ADP', 'InterSD', 'InterSK', 'IntraSD', 'Vol', 'Skew', 'Dim', 'FPTS', 'FPTS/G', 'SPTS', 'SPTS/G', 'P10', 'P20', 'P30', 'P40', 'P50', 'P60', 'P70', 'P80', 'P90', 'TT10', 'TT20', 'TT30', 'TT40', 'TT50', 'TT60', 'TT70', 'TT80', 'TT90', 'NameASCII', 'PlayerId', 'MLBAMID']


In [38]:
# Merge projections with predictions
if batter_proj is not None:
    # BatX uses 'PlayerId' which maps to IDfg
    id_col = 'PlayerId'
    pa_col = 'PA'
    
    proj_subset = batter_proj[[id_col, pa_col, 'Name']].copy()
    proj_subset.columns = ['IDfg', 'Projected_PA', 'Proj_Name']
    
    # Convert to numeric, coercing non-numeric IDs (like 'sa3063134' for minor leaguers) to NaN
    proj_subset['IDfg'] = pd.to_numeric(proj_subset['IDfg'], errors='coerce')
    
    # Drop rows with non-numeric IDs
    proj_subset = proj_subset.dropna(subset=['IDfg'])
    proj_subset['IDfg'] = proj_subset['IDfg'].astype(int)
    
    print(f"BatX projections with valid IDs: {len(proj_subset)}")
    
    pred_2026_bat = pred_2026_bat.merge(proj_subset[['IDfg', 'Projected_PA']], on='IDfg', how='left')
    
    # Check how many matched
    matched = pred_2026_bat['Projected_PA'].notna().sum()
    print(f"Matched {matched} of {len(pred_2026_bat)} players with BatX projections")
    
    # Fill missing with 2025 PA
    pred_2026_bat['Projected_PA'] = pred_2026_bat['Projected_PA'].fillna(pred_2026_bat['PA_2025'])
else:
    # Use 2025 PA as projection
    pred_2026_bat['Projected_PA'] = pred_2026_bat['PA_2025']

# Calculate total projected fantasy points
pred_2026_bat['Projected_Fpoints'] = pred_2026_bat['Predicted_Fpoints_PA'] * pred_2026_bat['Projected_PA']

# Sort by total
pred_2026_bat = pred_2026_bat.sort_values('Projected_Fpoints', ascending=False).reset_index(drop=True)
pred_2026_bat['Rank'] = range(1, len(pred_2026_bat) + 1)

print("\n=== 2026 Batter Total Projections ===")
display_cols = ['Rank', 'Name', 'Team', 'Projected_PA', 'Predicted_Fpoints_PA', 'Projected_Fpoints']
print(pred_2026_bat[display_cols].head(30).to_string(index=False))

BatX projections with valid IDs: 598
Matched 362 of 376 players with BatX projections

=== 2026 Batter Total Projections ===
 Rank                  Name  Team  Projected_PA  Predicted_Fpoints_PA  Projected_Fpoints
    1             Juan Soto   NYM       674.850              0.771566         520.691083
    2           Aaron Judge   NYY       637.851              0.767631         489.633886
    3         Shohei Ohtani   LAD       659.709              0.709556         468.100622
    4          Jose Ramirez   CLE       649.533              0.707757         459.711472
    5        Bobby Witt Jr.   KCR       659.965              0.674176         444.932372
    6 Vladimir Guerrero Jr.   TOR       665.751              0.627687         417.883511
    7           Kyle Tucker   CHC       619.354              0.662984         410.622040
    8           Ketel Marte   ARI       616.155              0.634366         390.867515
    9      Gunnar Henderson   BAL       657.586              0.589613     

### 3.2 Load & Apply Pitcher Projections

In [39]:
# Load pitcher projections
PITCHER_PROJ_FILE = 'data/projections/oopsy_pitcher_2026.csv'

if os.path.exists(PITCHER_PROJ_FILE):
    pitcher_proj = pd.read_csv(PITCHER_PROJ_FILE)
    print(f"Loaded pitcher projections: {len(pitcher_proj)} players")
    print(f"Columns: {pitcher_proj.columns.tolist()}")
else:
    print(f"File not found: {PITCHER_PROJ_FILE}")
    print("Using 2025 stats as fallback projection")
    pitcher_proj = None

Loaded pitcher projections: 4333 players
Columns: ['Name', 'Team', 'W', 'L', 'QS', 'ERA', 'G', 'GS', 'SV', 'HLD', 'BS', 'IP', 'TBF', 'H', 'R', 'ER', 'HR', 'BB', 'IBB', 'HBP', 'SO', 'K/9', 'BB/9', 'K/BB', 'HR/9', 'K%', 'BB%', 'K-BB%', 'AVG', 'WHIP', 'BABIP', 'LOB%', 'GB%', 'HR/FB', 'FIP', 'WAR', 'RA9-WAR', 'ADP', 'InterSD', 'InterSK', 'IntraSD', 'Vol', 'Skew', 'Dim', 'FPTS', 'FPTS/IP', 'SPTS', 'SPTS/IP', 'P10', 'P20', 'P30', 'P40', 'P50', 'P60', 'P70', 'P80', 'P90', 'TT10', 'TT20', 'TT30', 'TT40', 'TT50', 'TT60', 'TT70', 'TT80', 'TT90', 'NameASCII', 'PlayerId', 'MLBAMID']


In [40]:
# Merge projections with predictions
if pitcher_proj is not None:
    # OOPSY uses 'PlayerId' which maps to IDfg
    id_col = 'PlayerId'
    
    # Required columns from OOPSY
    proj_cols = ['IP', 'W', 'L', 'SV', 'HLD']
    
    proj_subset = pitcher_proj[[id_col] + proj_cols].copy()
    proj_subset.columns = ['IDfg'] + [f'Proj_{c}' for c in proj_cols]
    
    # Convert to numeric, coercing non-numeric IDs to NaN
    proj_subset['IDfg'] = pd.to_numeric(proj_subset['IDfg'], errors='coerce')
    
    # Drop rows with non-numeric IDs
    proj_subset = proj_subset.dropna(subset=['IDfg'])
    proj_subset['IDfg'] = proj_subset['IDfg'].astype(int)
    
    print(f"OOPSY projections with valid IDs: {len(proj_subset)}")
    
    pred_2026_pit = pred_2026_pit.merge(proj_subset, on='IDfg', how='left')
    
    # Check how many matched
    matched = pred_2026_pit['Proj_IP'].notna().sum()
    print(f"Matched {matched} of {len(pred_2026_pit)} players with OOPSY projections")
    
    # Fill missing with 2025 values
    for stat in proj_cols:
        pred_2026_pit[f'Proj_{stat}'] = pred_2026_pit[f'Proj_{stat}'].fillna(pred_2026_pit[f'{stat}_2025'])
else:
    # Use 2025 stats as projection
    pred_2026_pit['Proj_IP'] = pred_2026_pit['IP_2025']
    pred_2026_pit['Proj_W'] = pred_2026_pit['W_2025']
    pred_2026_pit['Proj_L'] = pred_2026_pit['L_2025']
    pred_2026_pit['Proj_SV'] = pred_2026_pit['SV_2025']
    pred_2026_pit['Proj_HLD'] = pred_2026_pit['HLD_2025']

print("Projection columns added")

OOPSY projections with valid IDs: 1184
Matched 418 of 423 players with OOPSY projections
Projection columns added
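
Both merges rely on ID coercion and a left join, so a few players silently fall back to 2025 values. A minimal sketch of how the unmatched players could be listed is shown below, using the `indicator` and `validate` options of `DataFrame.merge`. The toy frames, names, and IDs are hypothetical, and dropping duplicate IDs is an added safeguard rather than something the notebook does.

```python
import pandas as pd

# Toy predictions and projections (all names, IDs, and values hypothetical)
preds = pd.DataFrame({"IDfg": [101, 102, 103], "Name": ["A", "B", "C"]})
proj = pd.DataFrame({
    "PlayerId": ["101", "sa3063134", "103"],  # one non-numeric minor-league style ID
    "IP": [180.0, 40.0, 60.0],
})

# Coerce IDs to numbers; non-numeric IDs become NaN and are dropped
proj["IDfg"] = pd.to_numeric(proj["PlayerId"], errors="coerce")
proj = proj.dropna(subset=["IDfg"]).astype({"IDfg": int})

# Keep one row per ID so the left join cannot duplicate prediction rows
proj = proj.drop_duplicates(subset="IDfg")

merged = preds.merge(
    proj[["IDfg", "IP"]].rename(columns={"IP": "Proj_IP"}),
    on="IDfg",
    how="left",
    validate="m:1",   # raises if the projection side has repeated IDs
    indicator=True,   # adds a _merge column: 'both' or 'left_only'
)

# Players with no projection row, i.e. the ones that would use the 2025 fallback
unmatched = merged.loc[merged["_merge"] == "left_only", "Name"].tolist()
print(unmatched)
```

Printing the unmatched names makes it easier to judge whether the fallback is reasonable for those players, for example when a player has retired or is expected to miss the season.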


In [41]:
# Calculate total projected fantasy points
# Skill-based points
pred_2026_pit['Proj_Skill_Fpoints'] = pred_2026_pit['Predicted_Fpoints_IP'] * pred_2026_pit['Proj_IP']

# Team-based points (W/L/SV/HLD)
pred_2026_pit['Proj_Team_Fpoints'] = (
    pred_2026_pit['Proj_W'] * PITCHER_SCORING_TEAM['W'] +
    pred_2026_pit['Proj_L'] * PITCHER_SCORING_TEAM['L'] +
    pred_2026_pit['Proj_SV'] * PITCHER_SCORING_TEAM['SV'] +
    pred_2026_pit['Proj_HLD'] * PITCHER_SCORING_TEAM['HLD']
)

# Total
pred_2026_pit['Projected_Fpoints'] = pred_2026_pit['Proj_Skill_Fpoints'] + pred_2026_pit['Proj_Team_Fpoints']

# Sort by total
pred_2026_pit = pred_2026_pit.sort_values('Projected_Fpoints', ascending=False).reset_index(drop=True)
pred_2026_pit['Rank'] = range(1, len(pred_2026_pit) + 1)

print("\n=== 2026 Pitcher Total Projections ===")
display_cols = ['Rank', 'Name', 'Team', 'Role', 'Proj_IP', 'Predicted_Fpoints_IP', 
                'Proj_Skill_Fpoints', 'Proj_Team_Fpoints', 'Projected_Fpoints']
print(pred_2026_pit[display_cols].head(30).to_string(index=False))


=== 2026 Pitcher Total Projections ===
 Rank               Name  Team Role  Proj_IP  Predicted_Fpoints_IP  Proj_Skill_Fpoints  Proj_Team_Fpoints  Projected_Fpoints
    1       Tarik Skubal   DET   SP    205.0              2.480192          508.439349               16.0         524.439349
    2    Garrett Crochet   BOS   SP    197.0              2.482658          489.083702               12.0         501.083702
    3      Hunter Greene   CIN   SP    183.0              2.422286          443.278344                8.0         451.278344
    4        Paul Skenes   PIT   SP    198.0              2.166109          428.889646               14.0         442.889646
    5        Dylan Cease   SDP   SP    189.0              2.302171          435.110363                6.0         441.110363
    6        Cole Ragans   KCR   SP    173.0              2.449911          423.834582                8.0         431.834582
    7          Bryan Woo   SEA   SP    200.0              1.971158          394.23158

In [42]:
# Show SP and RP rankings separately
print("\n=== Top 25 Starting Pitchers ===")
sp_final = pred_2026_pit[pred_2026_pit['Role'] == 'SP'].copy()
sp_final['SP_Rank'] = range(1, len(sp_final) + 1)
print(sp_final[['SP_Rank', 'Name', 'Team', 'Proj_IP', 'Proj_W', 'Proj_L', 'Projected_Fpoints']].head(25).to_string(index=False))

print("\n=== Top 25 Relief Pitchers ===")
rp_final = pred_2026_pit[pred_2026_pit['Role'] == 'RP'].copy()
rp_final['RP_Rank'] = range(1, len(rp_final) + 1)
print(rp_final[['RP_Rank', 'Name', 'Team', 'Proj_IP', 'Proj_SV', 'Proj_HLD', 'Projected_Fpoints']].head(25).to_string(index=False))


=== Top 25 Starting Pitchers ===
 SP_Rank               Name Team  Proj_IP  Proj_W  Proj_L  Projected_Fpoints
       1       Tarik Skubal  DET    205.0    15.0     7.0         524.439349
       2    Garrett Crochet  BOS    197.0    14.0     8.0         501.083702
       3      Hunter Greene  CIN    183.0    12.0     8.0         451.278344
       4        Paul Skenes  PIT    198.0    14.0     7.0         442.889646
       5        Dylan Cease  SDP    189.0    13.0    10.0         441.110363
       6        Cole Ragans  KCR    173.0    12.0     8.0         431.834582
       7          Bryan Woo  SEA    200.0    14.0     9.0         404.231582
       8 Cristopher Sanchez  PHI    204.0    15.0     8.0         401.102191
       9       Jacob deGrom  TEX    174.0    11.0     8.0         400.606301
      10         Chris Sale  ATL    166.0    13.0     7.0         391.770696
      11      Logan Gilbert  SEA    167.0    12.0     7.0         384.381408
      12         Logan Webb  SFG    205.0 

---
## Part 4: Save Final Rankings
---

In [43]:
# Save final predictions with totals
pred_2026_bat.to_csv('predictions/batters_2026_final.csv', index=False)
pred_2026_pit.to_csv('predictions/pitchers_2026_final.csv', index=False)

print("Saved final predictions:")
print("  - predictions/batters_2026_final.csv")
print("  - predictions/pitchers_2026_final.csv")

Saved final predictions:
  - predictions/batters_2026_final.csv
  - predictions/pitchers_2026_final.csv


In [44]:
# Create combined overall ranking
batters_ranked = pred_2026_bat[['Name', 'Team', 'Projected_Fpoints']].copy()
batters_ranked['Type'] = 'Batter'

pitchers_ranked = pred_2026_pit[['Name', 'Team', 'Role', 'Projected_Fpoints']].copy()
pitchers_ranked['Type'] = pitchers_ranked['Role']
pitchers_ranked = pitchers_ranked.drop('Role', axis=1)

overall = pd.concat([batters_ranked, pitchers_ranked], ignore_index=True)
overall = overall.sort_values('Projected_Fpoints', ascending=False).reset_index(drop=True)
overall['Overall_Rank'] = range(1, len(overall) + 1)

print("\n=== 2026 Overall Rankings (Top 50) ===")
print(overall[['Overall_Rank', 'Name', 'Team', 'Type', 'Projected_Fpoints']].head(50).to_string(index=False))


=== 2026 Overall Rankings (Top 50) ===
 Overall_Rank                  Name  Team   Type  Projected_Fpoints
            1          Tarik Skubal   DET     SP         524.439349
            2             Juan Soto   NYM Batter         520.691083
            3       Garrett Crochet   BOS     SP         501.083702
            4           Aaron Judge   NYY Batter         489.633886
            5         Shohei Ohtani   LAD Batter         468.100622
            6          Jose Ramirez   CLE Batter         459.711472
            7         Hunter Greene   CIN     SP         451.278344
            8        Bobby Witt Jr.   KCR Batter         444.932372
            9           Paul Skenes   PIT     SP         442.889646
           10           Dylan Cease   SDP     SP         441.110363
           11           Cole Ragans   KCR     SP         431.834582
           12 Vladimir Guerrero Jr.   TOR Batter         417.883511
           13           Kyle Tucker   CHC Batter         410.622040
        

In [45]:
# Save overall rankings
overall.to_csv('predictions/overall_2026_rankings.csv', index=False)
print("\nSaved to predictions/overall_2026_rankings.csv")


Saved to predictions/overall_2026_rankings.csv


---
## Summary

The generated predictions are saved to the `predictions/` folder:

| File | Description |
|------|-------------|
| `batters_2026_rate_predictions.csv` | Predicted Fpoints/PA for every batter with a 2025 season row |
| `pitchers_2026_rate_predictions.csv` | Predicted Fpoints/IP for every pitcher with a 2025 season row |
| `batters_2026_final.csv` | Batter totals with PA projections applied |
| `pitchers_2026_final.csv` | Pitcher totals with IP/W/L/SV/HLD projections applied |
| `overall_2026_rankings.csv` | Combined rankings (batters + pitchers) |

---