# 03 — Baseline Regression (Predict Team Points)

**Target:** `pts`  
**Split:** chronological (first 80% train → last 20% test)  
**Metrics:** MAE and RMSE, each compared against a team season-average baseline

**Plan:**
1. Select features (e.g., `home`, `rest_days`, `fg_pct`, `reb`, `tov`, `opponent_*`)
2. Chronological split
3. Baselines (team season avg) + Linear Regression + Ridge
4. Evaluate & plot Predicted vs Actual


In [None]:
# Setup & data load
import pandas as pd
import numpy as np
from pathlib import Path

from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_absolute_error, mean_squared_error

CLEAN_PATH = Path("../data/processed/team_games_clean.csv")
df = pd.read_csv(CLEAN_PATH, parse_dates=["game_date"])
print("Loaded:", CLEAN_PATH, "| shape:", df.shape)

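# Placeholder feature/target selection (adjust later).
# Caution: if fg_pct, fga, fgm and the opponent_* columns hold same-game box-score
# values, they are only known after the game and largely determine pts.
TARGET = "pts"
CANDIDATE_FEATURES = [
    "home", "rest_days",
    "fg_pct", "fga", "fgm",
    "reb", "tov",
    "opponent_pts", "opponent_fg_pct", "opponent_reb", "opponent_tov",
]

# Warn early if the target or any candidate feature is absent from the CSV
missing = [c for c in [TARGET, *CANDIDATE_FEATURES] if c not in df.columns]
if missing:
    print("Warning: columns not found in data:", missing)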

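# Chronological 80/20 split: sort by date, then cut at the 80% row mark.
# mergesort is stable, so games on the same date keep their file order.
# Note: a row-based cut can place games from one date on both sides of the split.
df = df.sort_values("game_date", kind="mergesort").reset_index(drop=True)
split_idx = int(len(df) * 0.8)
train_df, test_df = df.iloc[:split_idx], df.iloc[split_idx:]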

print(f"Train: {train_df.shape}, Test: {test_df.shape}")
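
Steps 3 and 4 of the plan could be sketched roughly as follows. This is a minimal sketch, not the notebook's final implementation. It assumes a team identifier column named `team` (hypothetical; the actual column name is not shown above) and that the feature columns contain no missing values. The helper names `season_average_baseline`, `regression_report`, and `fit_and_score` are illustrative. The baseline averages each team's points over all training rows, which matches a season average only if the training window covers a single season. Ridge is wrapped in a `StandardScaler` pipeline because its penalty is sensitive to feature scale.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def season_average_baseline(train, test, target="pts", team_col="team"):
    """Predict each test row with its team's mean target over the training rows.

    `team_col` is an assumed column name; adjust it to the real dataset.
    """
    team_means = train.groupby(team_col)[target].mean()
    global_mean = train[target].mean()
    # Teams that never appear in the training window fall back to the overall mean.
    return test[team_col].map(team_means).fillna(global_mean).to_numpy()


def regression_report(y_true, y_pred):
    """Return MAE and RMSE as plain floats."""
    mae = mean_absolute_error(y_true, y_pred)
    rmse = np.sqrt(mean_squared_error(y_true, y_pred))
    return {"MAE": float(mae), "RMSE": float(rmse)}


def fit_and_score(train, test, features, target="pts", alpha=1.0):
    """Fit LinearRegression and Ridge on train, then score both on test."""
    models = {
        "linear": LinearRegression(),
        # Standardize features so the Ridge penalty treats them comparably.
        "ridge": make_pipeline(StandardScaler(), Ridge(alpha=alpha)),
    }
    X_train, y_train = train[features], train[target]
    X_test, y_test = test[features], test[target]
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        results[name] = regression_report(y_test, model.predict(X_test))
    return results
```

With the split above, usage might look like `regression_report(test_df[TARGET], season_average_baseline(train_df, test_df))` for the baseline and `fit_and_score(train_df, test_df, CANDIDATE_FEATURES)` for the two models. Comparing the three MAE/RMSE pairs shows whether the regressions improve on the team average.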
