# 📈 Model Evaluation & Metrics: Amazon Sales Regression

## 🎯 Learning Objectives
- Understand key regression evaluation metrics
- Apply cross-validation for robust model assessment
- Interpret results in a business context

---

## 📊 Key Regression Metrics
- **MSE (Mean Squared Error)**: Average squared difference between actual and predicted values
- **RMSE (Root Mean Squared Error)**: Square root of MSE, interpretable in target units
- **MAE (Mean Absolute Error)**: Average absolute difference
- **MAPE (Mean Absolute Percentage Error)**: Average absolute error as a percentage of the actual value; undefined when an actual value is 0
- **R² (R-squared)**: Proportion of variance explained by the model
- **Adjusted R²**: R² penalized for the number of predictors, so adding an uninformative feature can lower it
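
For reference, the definitions above can be written out as formulas. The notation is an assumption made here for illustration: $y_i$ is an actual value, $\hat{y}_i$ its prediction, $\bar{y}$ the mean of the actual values, $n$ the number of observations, and $p$ the number of predictors.

$$
\begin{aligned}
\text{MSE} &= \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \\
\text{RMSE} &= \sqrt{\text{MSE}} \\
\text{MAE} &= \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert \\
\text{MAPE} &= \frac{100}{n}\sum_{i=1}^{n} \left\lvert \frac{y_i - \hat{y}_i}{y_i} \right\rvert \\
R^2 &= 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \\
R^2_{\text{adj}} &= 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}
\end{aligned}
$$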

**Business Interpretation:**
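- Lower MSE, RMSE, MAE, and MAPE mean predictions sit closer to actual values; MSE and RMSE penalize large errors more heavily than MAE
- Higher R²/Adjusted R² means more variance explained; prefer Adjusted R² when comparing models with different numbers of predictors
- Compare RMSE or MAE with average revenue to express the error in business terms (e.g., an RMSE of 12 on average revenue of 300 is about 4%)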


In [None]:
# Example: Evaluate a regression model
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
import numpy as np

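def regression_metrics(y_true, y_pred, n_features=1):
    """Return common regression metrics; n_features is the number of predictors."""
    # Convert to float arrays so lists and pandas Series also work
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mse = mean_squared_error(y_true, y_pred)
    rmse = np.sqrt(mse)
    mae = mean_absolute_error(y_true, y_pred)
    # MAPE is undefined where the actual value is 0, so those rows are skipped
    nonzero = y_true != 0
    mape = np.mean(np.abs((y_true[nonzero] - y_pred[nonzero]) / y_true[nonzero])) * 100
    r2 = r2_score(y_true, y_pred)
    n = len(y_true)
    p = n_features
    # Adjusted R² requires n > p + 1; otherwise the denominator is zero or negative
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1) if n > p + 1 else np.nan
    return {
        'MSE': mse, 'RMSE': rmse, 'MAE': mae, 'MAPE': mape, 'R2': r2, 'Adj_R2': adj_r2
    }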

# Example usage (replace with your model's predictions):
y_true = np.array([100, 200, 300, 400, 500])
y_pred = np.array([110, 190, 310, 390, 480])
metrics = regression_metrics(y_true, y_pred)
print(metrics)

## 🔁 Cross-Validation
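- Splits the data into k folds; each fold serves once as the validation set while the model trains on the remaining folds
- Gives a more reliable estimate of performance on unseen data than a single train/test split, which helps detect overfitting
- Common: K-Fold (e.g., 5-fold, 10-fold)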
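
To make the splitting concrete, here is a minimal sketch of what `KFold` does with ten rows. The toy array and the variable names are assumptions for illustration only; shuffling is left off so the folds are easy to read.

```python
import numpy as np
from sklearn.model_selection import KFold

# Toy data (illustrative only): ten rows with a single feature
X = np.arange(10).reshape(-1, 1)

kf = KFold(n_splits=5)  # No shuffling, so folds are consecutive blocks of rows

# Collect the row indices used for training and validation in each fold
folds = [(train_idx.tolist(), test_idx.tolist()) for train_idx, test_idx in kf.split(X)]

for i, (train_idx, test_idx) in enumerate(folds, start=1):
    print(f"Fold {i}: train={train_idx} validate={test_idx}")
```

Every row lands in exactly one validation fold, so each observation is used for both training and validation across the five rounds.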

**Business Value:**
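- Shows whether the model's accuracy holds up across different subsets of the data, not just one lucky split
- Gives stakeholders an error estimate with a spread (mean ± standard deviation), which is a safer basis for decisions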


In [None]:
import numpy as np
from sklearn.model_selection import cross_val_score, KFold
from sklearn.linear_model import LinearRegression

# Example: 5-fold cross-validation on placeholder data
rng = np.random.default_rng(42)  # Seeded so the output is reproducible
X = rng.random((100, 2))  # Replace with your features
y = rng.random(100)       # Replace with your target
model = LinearRegression()
cv = KFold(n_splits=5, shuffle=True, random_state=42)
# scikit-learn treats higher scores as better, so MSE is returned negated
scores = cross_val_score(model, X, y, cv=cv, scoring='neg_mean_squared_error')
rmse_scores = np.sqrt(-scores)  # Per-fold RMSE
print(f'Cross-validated RMSE: {rmse_scores.mean():.2f} ± {rmse_scores.std():.2f}')

## 📉 Bias-Variance Tradeoff
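- **High Bias**: Underfitting; the model is too simple, so error is high on both training and validation data
- **High Variance**: Overfitting; the model is too complex, so training error is low but validation error is much higher
- **Goal**: Find the level of complexity that minimizes validation error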

**Business Impact:**
- Underfit: Missed opportunities, poor forecasts
- Overfit: Bad business decisions from unreliable predictions
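
One way to see the tradeoff is to compare training and validation error as model complexity grows. The sketch below is illustrative only: the noisy sine data, the polynomial degrees, and the variable names are all assumptions, not part of the Amazon sales dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data (an assumption for illustration): a noisy sine curve
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 1, 40)).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.2, 40)

cv = KFold(n_splits=5, shuffle=True, random_state=0)
results = {}  # degree -> (mean train RMSE, mean validation RMSE)

for degree in (1, 4, 15):
    # Higher polynomial degree = more flexible (more complex) model
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    out = cross_validate(model, X, y, cv=cv,
                         scoring='neg_mean_squared_error',
                         return_train_score=True)
    train_rmse = np.sqrt(-out['train_score']).mean()
    val_rmse = np.sqrt(-out['test_score']).mean()
    results[degree] = (train_rmse, val_rmse)
    print(f"degree={degree:2d}  train RMSE={train_rmse:.3f}  validation RMSE={val_rmse:.3f}")
```

A model with high error on both sets is likely underfitting; a large gap between low training error and higher validation error suggests overfitting.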


## 🏆 Business Interpretation
- Use metrics to communicate model value to stakeholders
- Relate errors to business KPIs (e.g., RMSE as % of average sales)
- Choose models that balance accuracy and interpretability
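
As a rough sketch of relating error to a KPI, RMSE can be expressed as a percentage of average sales. The helper name `rmse_as_pct_of_mean` is hypothetical, and the sample values are the same toy numbers used in the metrics example earlier.

```python
import numpy as np

def rmse_as_pct_of_mean(y_true, y_pred):
    """Hypothetical helper: RMSE expressed as a percentage of the mean actual value."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return rmse / np.mean(y_true) * 100

# Toy sales figures (illustrative only)
sales_actual = np.array([100, 200, 300, 400, 500])
sales_pred = np.array([110, 190, 310, 390, 480])

pct = rmse_as_pct_of_mean(sales_actual, sales_pred)
print(f"RMSE is {pct:.1f}% of average sales")  # prints: RMSE is 4.2% of average sales
```

A statement such as "forecasts are typically off by about 4% of average sales" is usually easier for stakeholders to act on than a raw RMSE value.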

---
**Next:** Advanced topics: regularization, model drift, fine-tuning, and production!
