# Chapter 2.3: Evaluation Metrics

Goal: Calculate and interpret MSE, MAE, RMSE, and R² to understand model performance.

### Topics:
- Computing metrics manually and with sklearn
- Understanding when to use MSE vs MAE
- Interpreting R² as variance explained
- How outliers affect different metrics

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

## Quick Recap

- **MSE**: Average of squared errors - penalizes large errors heavily
- **RMSE**: Square root of MSE - same units as target variable
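- **MAE**: Average of absolute errors - less sensitive to outliers than MSE
- **R²**: Proportion of variance explained (at most 1, higher is better; it can be negative on test data when the model does worse than always predicting the mean)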
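
As a quick illustration of the definitions above, here is a minimal sketch on four made-up values (not taken from the Diamonds data), small enough to check by hand. The array contents are assumptions chosen only to keep the arithmetic simple.

```python
import numpy as np

# Made-up toy values, chosen so the arithmetic is easy to check by hand
y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_pred = np.array([2.0, 5.0, 4.0, 8.0])

errors = y_true - y_pred          # [ 1.,  0., -2., -1.]

mse = (errors ** 2).mean()        # (1 + 0 + 4 + 1) / 4 = 1.5
rmse = np.sqrt(mse)               # sqrt(1.5), about 1.22
mae = np.abs(errors).mean()       # (1 + 0 + 2 + 1) / 4 = 1.0

print(f"MSE:  {mse:.2f}")
print(f"RMSE: {rmse:.2f}")
print(f"MAE:  {mae:.2f}")
```

RMSE is never smaller than MAE. The two are equal only when every error has the same magnitude, so a large gap between them suggests that a few large errors dominate.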

In [None]:
# Load the Diamonds dataset
diamonds = sns.load_dataset('diamonds')
diamonds.head()

In [None]:
# We'll predict price from carat
X = diamonds[['carat']]
y = diamonds['price']

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit the model
model = LinearRegression()
model.fit(X_train, y_train)

# Get predictions
y_pred = model.predict(X_test)

print(f"Model fitted. Test set size: {len(y_test)}")

## Practice

### 1. Fit regression: carat → price, calculate predictions

(Already done above - just verify the predictions look reasonable)

In [None]:
# Look at first 5 actual vs predicted values
comparison = pd.DataFrame({
    'Actual': y_test.values[:5],
    'Predicted': y_pred[:5],
    'Error': y_test.values[:5] - y_pred[:5]
})
comparison

### 2. Calculate MSE by hand: `((y_true - y_pred)**2).mean()`

In [None]:
# Step 1: Calculate the errors (residuals)
errors = y_test - y_pred

# Step 2: Square the errors
squared_errors = errors ** 2

# Step 3: Take the mean
mse_manual = squared_errors.mean()

print(f"MSE (by hand): {mse_manual:.2f}")

### 3. Calculate MAE by hand: `(abs(y_true - y_pred)).mean()`

In [None]:
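# Step 1: Take absolute value of errors (residuals: actual minus predicted)
abs_errors = np.abs(y_test - y_pred)

# Step 2: Take the mean
mae_manual = abs_errors.mean()

print(f"MAE (by hand): {mae_manual:.2f}")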

### 4. Use sklearn functions to verify your calculations

In [None]:
# Calculate MSE, MAE, and R² using sklearn
mse_sklearn = mean_squared_error(y_test, y_pred)
mae_sklearn = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
rmse = np.sqrt(mse_sklearn)

print(f"MSE (sklearn): {mse_sklearn:.2f}")
print(f"MAE (sklearn): {mae_sklearn:.2f}")
print(f"RMSE: {rmse:.2f}")
print(f"R²: {r2:.4f}")

**Question:** Which metric is most interpretable for this problem? Why?

(Write your answer here - hint: RMSE is in the same units as price)
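
The R² value reported by `r2_score` can also be reproduced by hand as 1 minus the ratio of the residual sum of squares to the total sum of squares. The sketch below uses made-up values and a hypothetical helper named `r2_by_hand` (not part of sklearn) to show three reference points: a perfect model, a model that always predicts the mean, and a model that does worse than the mean.

```python
import numpy as np

def r2_by_hand(y_true, y_pred):
    """Hypothetical helper: R² = 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation around the mean
    return 1 - ss_res / ss_tot

# Made-up toy target values
y_true = np.array([1.0, 2.0, 3.0, 4.0])

r2_perfect = r2_by_hand(y_true, y_true)                        # predictions match exactly
r2_mean = r2_by_hand(y_true, np.full(4, y_true.mean()))        # always predict the mean
r2_bad = r2_by_hand(y_true, y_true[::-1])                      # predictions in reverse order

print(f"Perfect model:      {r2_perfect:.1f}")
print(f"Mean-only baseline: {r2_mean:.1f}")
print(f"Reversed model:     {r2_bad:.1f}")
```

The mean-only baseline scores 0 by construction, so R² measures improvement over that baseline. A model whose errors are larger than the baseline's gets a negative score.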

### 5. Add 5 extreme outliers to y_test, recalculate MSE vs MAE - which changed more?

In [None]:
# Create a copy of y_test and turn 5 observations into extreme outliers
y_test_outliers = y_test.copy()

# Shift the first 5 actual prices by $50,000 so their errors become huge
y_test_outliers.iloc[:5] = y_test_outliers.iloc[:5] + 50000

# Recalculate the metrics against the same predictions
mse_outliers = mean_squared_error(y_test_outliers, y_pred)
mae_outliers = mean_absolute_error(y_test_outliers, y_pred)

print("Original metrics:")
print(f"  MSE: {mse_sklearn:,.2f}")
print(f"  MAE: {mae_sklearn:,.2f}")

print("\nWith 5 outliers:")
print(f"  MSE: {mse_outliers:,.2f}")
print(f"  MAE: {mae_outliers:,.2f}")

print("\nPercent change:")
print(f"  MSE changed by: {(mse_outliers - mse_sklearn) / mse_sklearn * 100:.1f}%")
print(f"  MAE changed by: {(mae_outliers - mae_sklearn) / mae_sklearn * 100:.1f}%")

**Your interpretation:** Which metric changed more? Why does this happen?

(Write your answer here)

### 6. Calculate Adjusted R² manually using the formula

Adjusted R² penalizes adding features that don't help:

$$\text{Adjusted } R^2 = 1 - \frac{(1 - R^2)(n - 1)}{n - p - 1}$$

Where:
- n = number of samples
- p = number of features
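
The formula can be wrapped in a small function so it is easy to try with different inputs. This is a minimal sketch: the name `adjusted_r2_score` is a hypothetical helper (sklearn does not provide one under that name), and the input values are made up for illustration.

```python
def adjusted_r2_score(r2, n, p):
    """Hypothetical helper: adjusted R² from R², sample count n, and feature count p."""
    if n - p - 1 <= 0:
        # The denominator must be positive for the formula to be meaningful
        raise ValueError("Need n > p + 1 to compute adjusted R²")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Made-up example: R² of 0.85 from 100 samples and 1 feature
example_adj = adjusted_r2_score(0.85, n=100, p=1)
print(f"Adjusted R²: {example_adj:.6f}")
```

Calling the function with other values of `n` and `p` is a quick way to explore how the penalty behaves.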

In [None]:
# Calculate Adjusted R²
n = len(y_test)  # number of samples
p = X_test.shape[1]  # number of features

# Apply the formula
adjusted_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(f"R²: {r2:.6f}")
print(f"Adjusted R²: {adjusted_r2:.6f}")
print(f"\nDifference: {r2 - adjusted_r2:.6f}")

**Question:** Why is Adjusted R² slightly lower than R²? When would this difference be larger?

(Write your answer here)

## Summary

| Metric | Value | Interpretation |
|--------|-------|----------------|
| MSE | | Sensitive to large errors |
| RMSE | | Typical error size in dollars (weights large errors more) |
| MAE | | Average absolute error in dollars; less sensitive to outliers |
| R² | | Variance explained |