# Linear Regression Tutorial for Hockey Prediction

This tutorial explains how linear regression works and how to use it for
predicting hockey game outcomes.

## What You'll Learn

1. **Linear Regression Basics** - How the model works
2. **Regularization** - Ridge, Lasso, and ElasticNet
3. **Feature Engineering** - Polynomial features
4. **Interpreting Coefficients** - Understanding what the model learned
5. **Practical Usage** - Training and prediction

---

## 1. Understanding Linear Regression

Linear regression predicts a target value as a **weighted sum of features**:

$$\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n$$

Where:
- $\hat{y}$ = predicted value (e.g., goals scored)
- $\beta_0$ = intercept (baseline prediction)
- $\beta_i$ = coefficient for feature $x_i$
- $x_i$ = feature values (e.g., team ELO, recent form)

### Example
For predicting home goals:
```
home_goals = 2.5 + (0.003 × elo_diff) + (0.5 × recent_form) - (0.2 × away_defense)
```
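
To make the weighted sum concrete, here is a minimal sketch that evaluates the example formula for one game. The feature values are invented for illustration and do not come from real data; the function name `predict_home_goals` is also hypothetical.

```python
# Coefficients taken from the example formula above
intercept = 2.5
coefficients = {
    "elo_diff": 0.003,
    "recent_form": 0.5,
    "away_defense": -0.2,
}

# Hypothetical feature values for a single game (illustrative only)
game = {"elo_diff": 100, "recent_form": 0.6, "away_defense": 1.0}


def predict_home_goals(features):
    """Return the intercept plus the weighted sum of the feature values."""
    weighted_sum = sum(coefficients[name] * value for name, value in features.items())
    return intercept + weighted_sum


home_goals = predict_home_goals(game)
print(f"Predicted home goals: {home_goals:.2f}")
```

Each term contributes independently: here the Elo edge adds 0.3 goals, recent form adds 0.3, and the away defense removes 0.2.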

In [None]:
# Setup
import sys
sys.path.insert(0, '..')

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

from utils.linear_model import (
    LinearRegressionModel,
    LinearGoalPredictor,
    compare_regularization
)

print("Tutorial ready!")

## 2. Regularization Explained

Regularization **penalizes large coefficients** to prevent overfitting.

### Ridge Regression (L2)
- Adds penalty: $\lambda \sum \beta_i^2$
- **Shrinks** coefficients toward zero
- Keeps all features, just reduces their impact
- Use when: All features might be relevant
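
Why Ridge shrinks coefficients without zeroing them can be seen in the simplest case. As a worked sketch, assume a single centered feature $x$, a centered target $y$, no intercept, and samples indexed by $j$ (this indexing is introduced here for illustration). Ridge then minimizes

$$L(\beta) = \sum_j (y_j - \beta x_j)^2 + \lambda \beta^2$$

Setting the derivative to zero gives

$$\frac{dL}{d\beta} = -2 \sum_j x_j (y_j - \beta x_j) + 2 \lambda \beta = 0 \quad\Rightarrow\quad \hat{\beta}_{\text{ridge}} = \frac{\sum_j x_j y_j}{\sum_j x_j^2 + \lambda}$$

With $\lambda = 0$ this is the ordinary least-squares slope. As $\lambda$ grows, the denominator grows and the slope moves toward zero, but it only reaches zero if $\sum_j x_j y_j = 0$.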

### Lasso Regression (L1)
- Adds penalty: $\lambda \sum |\beta_i|$
- **Zeros out** some coefficients entirely
- Performs feature selection
- Use when: You want to identify most important features
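
The reason Lasso produces exact zeros can be seen in the soft-thresholding operator, which is the per-coefficient update used by coordinate-descent Lasso solvers. The sketch below applies it to a made-up vector of unregularized coefficients. It assumes orthonormal features, in which case soft-thresholding gives the Lasso solution for the objective $\frac{1}{2}\lVert y - X\beta \rVert^2 + \lambda \sum |\beta_i|$. The function name `soft_threshold` and the input values are illustrative.

```python
import numpy as np


def soft_threshold(beta, lam):
    """Move each value toward zero by lam; values within lam of zero become 0."""
    return np.sign(beta) * np.maximum(np.abs(beta) - lam, 0.0)


# Illustrative unregularized coefficients: three useful features, two near-noise ones
beta_ols = np.array([0.8, -0.3, 0.2, 0.04, -0.02])

beta_lasso = soft_threshold(beta_ols, lam=0.1)
print(beta_lasso)
print("Non-zero coefficients:", np.count_nonzero(beta_lasso))
```

Every coefficient loses the same absolute amount, so small ones hit zero while large ones survive. Ridge, by contrast, scales coefficients down proportionally and so leaves small ones small but non-zero.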

### ElasticNet (L1 + L2)
- Combines both: $\lambda_1 \sum |\beta_i| + \lambda_2 \sum \beta_i^2$
- Balances Lasso's sparsity with Ridge's stability when features are correlated
- Use when: You want feature selection + coefficient shrinkage
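
The `alpha` and `l1_ratio` arguments used later in this tutorial suggest a scikit-learn-style parameterization, although the internals of `LinearRegressionModel` are not shown here. Under that assumption, the two weights in the formula above would be $\lambda_1 = \text{alpha} \cdot \text{l1\_ratio}$ and $\lambda_2 = 0.5 \cdot \text{alpha} \cdot (1 - \text{l1\_ratio})$. A minimal sketch of the penalty term alone (the function name is hypothetical):

```python
def elastic_net_penalty(coefs, alpha, l1_ratio):
    """Penalty term in scikit-learn's ElasticNet parameterization (assumed here)."""
    l1 = sum(abs(c) for c in coefs)      # sum of absolute values
    l2 = sum(c * c for c in coefs)       # sum of squares
    return alpha * l1_ratio * l1 + 0.5 * alpha * (1.0 - l1_ratio) * l2


coefs = [0.8, -0.3, 0.2]

lasso_like = elastic_net_penalty(coefs, alpha=0.1, l1_ratio=1.0)  # pure L1
ridge_like = elastic_net_penalty(coefs, alpha=0.1, l1_ratio=0.0)  # pure L2
mixed = elastic_net_penalty(coefs, alpha=0.1, l1_ratio=0.5)       # half of each

print(f"L1 only: {lasso_like:.5f}")
print(f"L2 only: {ridge_like:.5f}")
print(f"Mixed:   {mixed:.5f}")
```

Setting `l1_ratio` to 1 or 0 recovers the pure L1 or pure L2 penalty, which matches how the models are configured in section 2.1.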

In [None]:
# Create sample data
np.random.seed(42)
n = 300

# Features
data = pd.DataFrame({
    'elo_diff': np.random.normal(0, 100, n),       # ELO difference
    'home_form': np.random.uniform(0, 1, n),       # Recent form (0-1)
    'away_form': np.random.uniform(0, 1, n),
    'rest_advantage': np.random.choice([-2, -1, 0, 1, 2], n),
    'noise_1': np.random.normal(0, 1, n),          # Irrelevant feature
    'noise_2': np.random.normal(0, 1, n),          # Irrelevant feature
})

# Target (only depends on first 4 features)
data['home_goals'] = (
    2.8 + 
    0.005 * data['elo_diff'] + 
    0.8 * data['home_form'] - 
    0.3 * data['away_form'] + 
    0.2 * data['rest_advantage'] +
    np.random.normal(0, 0.8, n)  # Random noise
).clip(0)

print("Sample data with 4 relevant + 2 noise features:")
data.head()

In [None]:
# Split data
feature_cols = ['elo_diff', 'home_form', 'away_form', 'rest_advantage', 'noise_1', 'noise_2']
X = data[feature_cols]
y = data['home_goals']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(f"Training: {len(X_train)}, Test: {len(X_test)}")

### 2.1 Compare Regularization Types

In [None]:
# Train Ridge (L2)
ridge = LinearRegressionModel(alpha=0.1, l1_ratio=0.0, name='ridge')
ridge.fit(X_train, y_train)

# Train Lasso (L1) 
lasso = LinearRegressionModel(alpha=0.1, l1_ratio=1.0, name='lasso')
lasso.fit(X_train, y_train)

# Train ElasticNet
elastic = LinearRegressionModel(alpha=0.1, l1_ratio=0.5, name='elasticnet')
elastic.fit(X_train, y_train)

print("Models trained!")

In [None]:
# Compare coefficients
coef_comparison = pd.DataFrame({
    'Feature': feature_cols,
    'Ridge': ridge.coef_,
    'Lasso': lasso.coef_,
    'ElasticNet': elastic.coef_,
    'True Coefficient': [0.005, 0.8, -0.3, 0.2, 0, 0]  # Ground-truth weights used to generate the data
})

print("Coefficient Comparison:")
print("(Lasso sets noise features to ~0)")
coef_comparison

In [None]:
# Visualize
fig, ax = plt.subplots(figsize=(12, 5))

x = np.arange(len(feature_cols))
width = 0.2

ax.bar(x - width, coef_comparison['Ridge'], width, label='Ridge', alpha=0.8)
ax.bar(x, coef_comparison['Lasso'], width, label='Lasso', alpha=0.8)
ax.bar(x + width, coef_comparison['ElasticNet'], width, label='ElasticNet', alpha=0.8)

ax.set_xticks(x)
ax.set_xticklabels(feature_cols, rotation=45)
ax.set_ylabel('Coefficient Value')
ax.set_title('Coefficients by Regularization Type\n(Notice: Lasso zeros out noise features)')
ax.legend()
ax.axhline(y=0, color='black', linestyle='-', linewidth=0.5)
plt.tight_layout()
plt.show()

### Key Insight

Notice how **Lasso** sets `noise_1` and `noise_2` coefficients to nearly zero!
This is automatic feature selection - Lasso identifies that these features
don't help predict goals.

## 3. The Alpha Parameter

**Alpha** controls the strength of regularization:
- `alpha = 0`: No regularization (plain linear regression)
- `alpha = 0.01`: Light regularization
- `alpha = 1.0`: Moderate regularization
- `alpha = 10.0`: Strong regularization
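
To see the effect of alpha without relying on the tutorial's wrapper class, the sketch below solves ridge regression in closed form with NumPy and prints the coefficient norm for several alpha values. The synthetic data, the seed, and the helper name `ridge_fit` are assumptions made for illustration. Features and target are centered so that no intercept is needed. The numeric scale of alpha depends on how the loss is normalized, so these values are not directly comparable to the ones listed above.

```python
import numpy as np


def ridge_fit(X, y, alpha):
    """Closed-form ridge solution: solve (X^T X + alpha * I) beta = X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
X -= X.mean(axis=0)                              # center features
true_beta = np.array([0.8, -0.3, 0.2, 0.0])
y = X @ true_beta + rng.normal(scale=0.5, size=200)
y -= y.mean()                                    # center target

alphas = [0.0, 0.01, 1.0, 10.0, 100.0]
norms = [np.linalg.norm(ridge_fit(X, y, a)) for a in alphas]

for a, norm in zip(alphas, norms):
    print(f"alpha={a:>6}: ||beta|| = {norm:.4f}")
```

With `alpha=0.0` the result equals ordinary least squares, and the coefficient norm decreases as alpha increases.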

In [None]:
# Compare different alpha values
alphas = [0.001, 0.01, 0.1, 1.0, 10.0]
results = []

for alpha in alphas:
    model = LinearRegressionModel(alpha=alpha, l1_ratio=1.0)  # Lasso
    model.fit(X_train, y_train)
    metrics = model.evaluate(X_test, y_test)
    n_nonzero = len([c for c in model.coef_ if abs(c) > 0.001])
    
    results.append({
        'alpha': alpha,
        'RMSE': metrics['rmse'],
        'R²': metrics['r2'],
        'Non-zero features': n_nonzero
    })

pd.DataFrame(results)

## 4. Polynomial Features

Sometimes relationships are **non-linear**. Polynomial features add:
- Squared terms: $x^2$
- Interaction terms: $x_1 \times x_2$

This lets linear regression capture non-linear patterns!
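
As a minimal sketch of the idea, the code below builds the squared term by hand with NumPy and fits ordinary least squares on the expanded features. The data is a noise-free quadratic chosen for clarity, and the variable names are illustrative. How `LinearRegressionModel` builds its polynomial features internally is not shown in this tutorial.

```python
import numpy as np

x = np.linspace(-3, 3, 50)
y = 2 + 0.5 * x + 0.3 * x**2          # noise-free quadratic, for clarity

# Design matrix: intercept column, the raw feature, and its square
X_poly = np.column_stack([np.ones_like(x), x, x**2])

# Ordinary least squares on the expanded features
beta, *_ = np.linalg.lstsq(X_poly, y, rcond=None)
print(np.round(beta, 3))              # intercept, linear term, squared term
```

The model is still linear in its coefficients; only the inputs were transformed. With two features, an interaction term would be one more column, `x1 * x2`.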

In [None]:
# Create data with non-linear relationship
np.random.seed(42)
x_simple = np.random.uniform(-3, 3, 100)
y_simple = 2 + 0.5 * x_simple + 0.3 * x_simple**2 + np.random.normal(0, 0.5, 100)

# Linear model (degree=1)
linear_model = LinearRegressionModel(alpha=0.01, poly_degree=1)
linear_model.fit(pd.DataFrame({'x': x_simple}), y_simple)

# Polynomial model (degree=2)
poly_model = LinearRegressionModel(alpha=0.01, poly_degree=2)
poly_model.fit(pd.DataFrame({'x': x_simple}), y_simple)

# Plot
x_plot = np.linspace(-3, 3, 100)
y_linear = linear_model.predict(pd.DataFrame({'x': x_plot}))
y_poly = poly_model.predict(pd.DataFrame({'x': x_plot}))

fig, ax = plt.subplots(figsize=(10, 6))
ax.scatter(x_simple, y_simple, alpha=0.5, label='Data')
ax.plot(x_plot, y_linear, 'r-', linewidth=2, label='Linear (degree=1)')
ax.plot(x_plot, y_poly, 'g-', linewidth=2, label='Polynomial (degree=2)')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_title('Linear vs Polynomial Features')
ax.legend()
plt.show()

# RMSE on the training data (this toy example has no held-out set)
X_simple = pd.DataFrame({'x': x_simple})
print(f"Linear RMSE: {linear_model.evaluate(X_simple, y_simple)['rmse']:.4f}")
print(f"Poly RMSE:   {poly_model.evaluate(X_simple, y_simple)['rmse']:.4f}")

## 5. Interpreting Coefficients

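Linear regression coefficients have a **direct interpretation**: each one is the change in the prediction for a one-unit increase in that feature, holding all other features fixed.

| Coefficient | Meaning |
|-------------|--------------------------------------|
| `+0.5` | 1 unit increase → 0.5 more goals |
| `-0.3` | 1 unit increase → 0.3 fewer goals |
| `0.0` | Feature has no effect on the prediction |

**Important**: Coefficient sizes are only comparable across features when the features are on similar scales, so standardize before ranking features by coefficient magnitude.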
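
The sketch below illustrates why scale matters, using plain NumPy least squares on synthetic, noise-free data. The feature names and true weights mirror the sample data earlier in this tutorial; the helper `ols_coefs` and the seed are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
elo_diff = rng.normal(0, 100, n)      # large-scale feature
home_form = rng.uniform(0, 1, n)      # small-scale feature
y = 2.8 + 0.005 * elo_diff + 0.8 * home_form   # noise-free for clarity


def ols_coefs(features, target):
    """Least-squares slopes; the intercept is fitted but dropped from the result."""
    design = np.column_stack([np.ones(len(target)), features])
    beta, *_ = np.linalg.lstsq(design, target, rcond=None)
    return beta[1:]


X_raw = np.column_stack([elo_diff, home_form])
X_std = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)   # zero mean, unit variance

raw = ols_coefs(X_raw, y)
std = ols_coefs(X_std, y)
print("raw:         ", np.round(raw, 3))
print("standardized:", np.round(std, 3))
```

On raw features, `elo_diff` looks negligible next to `home_form` (0.005 versus 0.8). After standardization each coefficient equals the raw coefficient times the feature's standard deviation, and `elo_diff` turns out to have the larger effect per standard deviation.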

In [None]:
# Train model with scaling
model = LinearRegressionModel(
    alpha=0.1, 
    l1_ratio=0.5,
    scaling='standard'  # Standardize features first
)
model.fit(X_train, y_train)

# Get coefficients
coefs = model.get_coefficients()
print("Coefficients (standardized features):")
print("Larger absolute value = more important\n")
coefs

In [None]:
# Interpret
print("Interpretation:")
print("-" * 50)
for _, row in coefs.iterrows():
    feat = row['feature']
    coef = row['coefficient']
    direction = "increases" if coef > 0 else "decreases"
    
    if abs(coef) < 0.01:
        print(f"{feat}: No effect (coefficient ≈ 0)")
    else:
        print(f"{feat}: 1 std increase {direction} predicted goals by {abs(coef):.3f}")

## 6. Complete Workflow

Here's the full workflow for using linear regression in hockey prediction:

In [None]:
# Step 1: Create predictor
predictor = LinearGoalPredictor(
    alpha=0.1,           # Regularization strength
    l1_ratio=0.5,        # ElasticNet (L1+L2)
    scaling='standard',  # Standardize features
    poly_degree=1        # Linear (no polynomial)
)

print("Created predictor:")
print(predictor)

In [None]:
# Step 2: Prepare data with both home and away goals
full_data = data.copy()
full_data['away_goals'] = (
    2.5 - 
    0.003 * data['elo_diff'] + 
    0.6 * data['away_form'] - 
    0.2 * data['home_form'] -
    0.1 * data['rest_advantage'] +
    np.random.normal(0, 0.7, len(data))
).clip(0)

train_df, test_df = train_test_split(full_data, test_size=0.2, random_state=42)
print(f"Training games: {len(train_df)}")

In [None]:
# Step 3: Train
predictor.fit(train_df)
print("Training complete!")

In [None]:
# Step 4: Evaluate
metrics = predictor.evaluate(test_df)

print("\n" + "="*50)
print("EVALUATION RESULTS")
print("="*50)
print(f"\nHome Goals: RMSE={metrics['home']['rmse']:.3f}, R²={metrics['home']['r2']:.3f}")
print(f"Away Goals: RMSE={metrics['away']['rmse']:.3f}, R²={metrics['away']['r2']:.3f}")
print(f"Combined:   RMSE={metrics['combined']['rmse']:.3f}")
print(f"\nWin Prediction Accuracy: {metrics['win_accuracy']:.1%}")

In [None]:
# Step 5: Make predictions
sample_game = {
    'elo_diff': 50,        # Home team 50 ELO higher
    'home_form': 0.8,      # Home team on hot streak
    'away_form': 0.4,      # Away team struggling
    'rest_advantage': 1,   # Home had 1 more rest day
    'noise_1': 0,
    'noise_2': 0,
}

home_pred, away_pred = predictor.predict_goals(sample_game)

print("\nPrediction for sample game:")
print(f"  Home: {home_pred:.1f} goals")
print(f"  Away: {away_pred:.1f} goals")
print(f"  Predicted winner: {'Home' if home_pred > away_pred else 'Away'}")

In [None]:
# Step 6: Save for later use
predictor.save('../models/saved/tutorial_linear')
print("Model saved!")

## 7. When to Use Linear Regression

### ✅ Good Use Cases
- **Interpretability** matters (understand feature contributions)
- **Limited data** (less prone to overfitting)
- **Baseline model** to compare against complex models
- **Feature selection** with Lasso/ElasticNet
- **Fast training** and inference

### ❌ Limitations
- Assumes linear relationships (unless polynomial features are added)
- Can't capture complex feature interactions automatically
- May underperform vs. tree-based models on complex data

### 🎯 Typical Hockey Features
- ELO/rating differences
- Recent form/win percentages
- Rest days, back-to-back status
- Goals for/against averages
- Power play / Penalty kill percentages
- Home ice advantage

## 8. Quick Reference

### Import
```python
from utils.linear_model import LinearRegressionModel, LinearGoalPredictor
```

### Create Model
```python
model = LinearRegressionModel(
    alpha=0.1,          # Regularization strength
    l1_ratio=0.5,       # 0=Ridge, 0.5=ElasticNet, 1=Lasso
    scaling='standard', # Feature scaling
    poly_degree=1       # 1=linear, 2=quadratic
)
```

### Train & Evaluate
```python
model.fit(X_train, y_train)
metrics = model.evaluate(X_test, y_test)
print(f"RMSE: {metrics['rmse']:.4f}")
```

### Get Coefficients
```python
coefs = model.get_coefficients(top_n=10)
importance = model.get_feature_importance()
```

### Save & Load
```python
model.save('model.pkl')
model = LinearRegressionModel.load('model.pkl')
```

---

## Summary

| Concept | Key Point |
|---------|--------------------|
| **Linear Regression** | Weighted sum of features |
| **Ridge (L2)** | Shrinks all coefficients |
| **Lasso (L1)** | Zeros out unimportant features |
| **ElasticNet** | Combines L1 and L2 |
| **Alpha** | Higher = stronger regularization |
| **Polynomial** | Captures non-linear relationships |
| **Coefficients** | Directly interpretable |

Linear regression is a great starting point for hockey prediction, providing
interpretable results and serving as a strong baseline for comparison with
more complex models like XGBoost.