
## **1️⃣ Mean Squared Error (MSE)**

$$
\text{MSE} = \frac{1}{n} \sum_{i=1}^n (y_i - \hat{y}_i)^2
$$

* Measures the **average squared difference** between actual and predicted values.
* **Squaring** penalizes large errors more heavily (outliers have a big effect).
* **Lower is better**.

📌 **Use when**: You want to heavily penalize large errors.
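A minimal NumPy sketch of the MSE formula, using small made-up arrays (not the dataset used later). The second call shows how one large residual dominates the average once it is squared.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean of the squared residuals (y_i - y_hat_i)^2."""
    residuals = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return np.mean(residuals ** 2)

# Illustrative values only
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 3.0, 8.0])            # residuals: 0.5, 0, -0.5, -1
y_pred_big_miss = np.array([2.5, 5.0, 3.0, 12.0])  # last residual becomes -5

print(mse(y_true, y_pred))           # 0.375
print(mse(y_true, y_pred_big_miss))  # 6.375: the -5 residual contributes 25 of the 25.5 total
```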

---

## **2️⃣ Root Mean Squared Error (RMSE)**

$$
\text{RMSE} = \sqrt{\text{MSE}}
$$

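* The square root of MSE, so it ranks models the same way but is expressed in the **same units as the target variable** (e.g., dollars, meters).
* Easier to interpret than MSE, though still heavily influenced by large errors.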

📌 **Use when**: You want an error measure in the target's original scale.
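The units point is easiest to see with a target measured in dollars. The prices below are hypothetical: MSE comes out in squared dollars, while RMSE is back in dollars.

```python
import numpy as np

# Hypothetical house prices in dollars
y_true = np.array([200_000.0, 350_000.0, 150_000.0])
y_pred = np.array([210_000.0, 330_000.0, 160_000.0])

mse = np.mean((y_true - y_pred) ** 2)  # units: dollars squared
rmse = np.sqrt(mse)                    # units: dollars

print(f"MSE  = {mse:,.0f} dollars^2")  # MSE  = 200,000,000 dollars^2
print(f"RMSE = {rmse:,.0f} dollars")   # RMSE = 14,142 dollars
```

An RMSE of about 14,142 dollars can be read directly as a typical error size; 200,000,000 squared dollars cannot.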

---

## **3️⃣ Mean Absolute Error (MAE)**

$$
\text{MAE} = \frac{1}{n} \sum_{i=1}^n |y_i - \hat{y}_i|
$$

* Takes the **absolute difference** instead of squaring.
* Less sensitive to outliers than MSE.
* Still **lower is better**.

📌 **Use when**: Outliers exist and you want a more robust measure.
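To make the robustness claim concrete, here is a small sketch with made-up numbers: two sets of predictions that differ only in one large miss. MAE grows linearly with that miss, while RMSE (built on squared errors) grows much faster.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean of absolute residuals."""
    return np.mean(np.abs(y_true - y_pred))

def rmse(y_true, y_pred):
    """Square root of the mean squared residual."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

y_true = np.array([10.0, 12.0, 11.0, 13.0, 12.0])
y_pred_clean = np.array([11.0, 11.0, 12.0, 12.0, 13.0])    # every error has size 1
y_pred_outlier = np.array([11.0, 11.0, 12.0, 12.0, 22.0])  # last error grows to 10

for name, pred in [("clean", y_pred_clean), ("outlier", y_pred_outlier)]:
    print(f"{name:8s} MAE={mae(y_true, pred):.2f}  RMSE={rmse(y_true, pred):.2f}")
# clean    MAE=1.00  RMSE=1.00
# outlier  MAE=2.80  RMSE=4.56
```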

---

## **4️⃣ R-squared ($R^2$)**

$$
R^2 = 1 - \frac{\text{SS}_{\text{res}}}{\text{SS}_{\text{tot}}}
$$

Where:

* $\text{SS}_{\text{res}} = \sum (y_i - \hat{y}_i)^2$ (Residual sum of squares)
* $\text{SS}_{\text{tot}} = \sum (y_i - \bar{y})^2$ (Total sum of squares)

**Interpretation:**

* $R^2 = 0.9$ → the model explains **90% of the variance** in the target.
* $R^2 = 0$ → the model does no better than always predicting the mean $\bar{y}$.
* $R^2$ can be **negative** if the model does worse than always predicting the mean.

📌 **Use when**: You want to measure the proportion of variance explained.
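A direct NumPy translation of the $\text{SS}_{\text{res}}$ / $\text{SS}_{\text{tot}}$ definition, applied to three made-up prediction sets that hit the three interpretation cases above. For single-output data like this, `sklearn.metrics.r2_score` should return the same values.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot (assumes y_true is not constant)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot

y_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

perfect = y_true.copy()                          # no error at all
mean_only = np.full_like(y_true, y_true.mean())  # always predict y_bar = 3
reversed_pred = y_true[::-1]                     # 5, 4, 3, 2, 1: worse than the mean

print(r_squared(y_true, perfect))        # 1.0
print(r_squared(y_true, mean_only))      # 0.0
print(r_squared(y_true, reversed_pred))  # -3.0
```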

---

## **5️⃣ Adjusted R-squared**

$$
R^2_{\text{adj}} = 1 - \frac{(1 - R^2)(n - 1)}{n - p - 1}
$$

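Where:

* $n$ = number of observations
* $p$ = number of predictors

**Properties:**

* **Penalizes** adding predictors that do not help: adjusted $R^2$ rises only if a new predictor improves $R^2$ by more than would be expected by chance.
* Useful for **multiple regression**, where plain in-sample $R^2$ never decreases as predictors are added.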

📌 **Use when**: Comparing models with different numbers of predictors.
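A small helper implementing the formula above, with a hypothetical comparison: a model with 10 predictors has a slightly higher raw $R^2$ than one with 2 predictors, yet a lower adjusted $R^2$. The sample size and $R^2$ values are invented for illustration.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p predictors (requires n > p + 1)."""
    if n - p - 1 <= 0:
        raise ValueError("adjusted R^2 needs n > p + 1")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

n = 50  # hypothetical sample size

# Hypothetical comparison: 8 extra predictors nudge R^2 from 0.80 to 0.81
print(f"{adjusted_r2(0.80, n, p=2):.3f}")   # 0.791
print(f"{adjusted_r2(0.81, n, p=10):.3f}")  # 0.761
```

The guard on `n - p - 1` is there because the formula divides by zero (or flips sign) when there are too few observations for the number of predictors.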

---

## **6️⃣ Mean Absolute Percentage Error (MAPE)**

$$
\text{MAPE} = \frac{100}{n} \sum_{i=1}^n \left| \frac{y_i - \hat{y}_i}{y_i} \right|
$$

* Expresses errors as a **percentage** of the actual values.
* Undefined if $y_i = 0$ for any observation, and inflated when actual values are close to zero.

📌 **Use when**: You want an error measure in percentage terms.
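A sketch comparing the manual formula with scikit-learn's `mean_absolute_percentage_error` (available since scikit-learn 0.24), on made-up values where every prediction is off by exactly 10%. Note that scikit-learn returns a **fraction**, not a percentage. The last lines show the zero-target problem for the manual formula; scikit-learn instead divides by a tiny epsilon, so it returns a very large finite value rather than `inf`.

```python
import numpy as np
from sklearn.metrics import mean_absolute_percentage_error

# Illustrative values: every prediction is off by exactly 10%
y_true = np.array([100.0, 200.0, 50.0])
y_pred = np.array([110.0, 180.0, 55.0])

# Manual MAPE, expressed as a percentage
mape_pct = np.mean(np.abs((y_true - y_pred) / y_true)) * 100

# scikit-learn returns a fraction, not a percentage
mape_frac = mean_absolute_percentage_error(y_true, y_pred)

print(f"{mape_pct:.1f}%")  # 10.0%
print(f"{mape_frac:.3f}")  # 0.100

# A zero in y_true makes the manual formula blow up to infinity
y_zero = np.array([0.0, 200.0])
with np.errstate(divide="ignore"):  # silence the divide-by-zero warning
    mape_with_zero = np.mean(np.abs((y_zero - np.array([1.0, 180.0])) / y_zero)) * 100
print(mape_with_zero)  # inf
```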

---

### **Quick Comparison Table**

| Metric | Units                | Outlier Sensitivity | Range   | Higher/Lower Better? |
| ------ | -------------------- | ------------------- | ------- | -------------------- |
| MSE    | Squared target units | High                | ≥ 0     | Lower                |
| RMSE   | Target units         | High                | ≥ 0     | Lower                |
| MAE    | Target units         | Low                 | ≥ 0     | Lower                |
| R²     | None                 | Moderate            | (-∞, 1] | Higher               |
| Adj R² | None                 | Moderate            | (-∞, 1] | Higher               |
| MAPE   | %                    | Moderate            | \[0, ∞) | Lower                |



```python
import pandas as pd
import numpy as np
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# 1️⃣ Load dataset
data = fetch_california_housing(as_frame=True)
df = data.frame

# 2️⃣ Features and target
X = df.drop(columns=["MedHouseVal"])
y = df["MedHouseVal"]

# 3️⃣ Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 4️⃣ Train Linear Regression model
model = LinearRegression()
model.fit(X_train, y_train)

# 5️⃣ Predictions
y_pred = model.predict(X_test)

# 6️⃣ Calculate Metrics
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

# Adjusted R² formula
n = X_test.shape[0]  # number of observations
p = X_test.shape[1]  # number of predictors
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)

# MAPE calculation
mape = np.mean(np.abs((y_test - y_pred) / y_test)) * 100

# 7️⃣ Create results DataFrame
metrics_df = pd.DataFrame({
    "Metric": ["MSE", "RMSE", "MAE", "R²", "Adjusted R²", "MAPE (%)"],
    "Value": [mse, rmse, mae, r2, adj_r2, mape]
})

metrics_df
```

|   | Metric      | Value     |
| - | ----------- | --------- |
| 0 | MSE         | 0.555892  |
| 1 | RMSE        | 0.745581  |
| 2 | MAE         | 0.5332    |
| 3 | R²          | 0.575788  |
| 4 | Adjusted R² | 0.574964  |
| 5 | MAPE (%)    | 31.952187 |
