# Performance Metrics in Linear Regression: R² and Adjusted R²

---

## 1. Why Performance Metrics?

- After building a regression model, we need to evaluate **how good the model is**.  
- Common metrics for linear regression:  
  1. **R squared (R²)**  
  2. **Adjusted R squared (Adjusted R²)**  

---

## 2. R Squared (R²)

### Definition

R² measures how much of the **variation in the dependent variable** is explained by the independent variables:

$$
R^2 = 1 - \frac{\text{SS}_{\text{residual}}}{\text{SS}_{\text{total}}}
$$

Where:  
- $\text{SS}_{\text{residual}}$ = Sum of squared residuals (errors)  
- $\text{SS}_{\text{total}}$ = Total sum of squares  

---

### 2.1 Residuals

- Residual = difference between **true value** and **predicted value**:

$$
\text{Residual for } i\text{th data point: } e_i = y_i - \hat{y}_i
$$

- Sum of squared residuals:

$$
\text{SS}_{\text{residual}} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
$$

---

### 2.2 Total Sum of Squares

- Measures deviation from the mean of the true values:

$$
\text{SS}_{\text{total}} = \sum_{i=1}^{n} (y_i - \bar{y})^2
$$

Where $\bar{y}$ is the mean of all $y_i$ values.
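
The two sums above combine directly into R². The following is a minimal sketch in plain Python; the function name `r_squared` and the sample values are made up for illustration.

```python
def r_squared(y_true, y_pred):
    """Return R^2 = 1 - SS_residual / SS_total."""
    y_mean = sum(y_true) / len(y_true)
    # Sum of squared residuals: squared gaps between true and predicted values.
    ss_residual = sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred))
    # Total sum of squares: squared gaps between true values and their mean.
    ss_total = sum((y - y_mean) ** 2 for y in y_true)
    return 1.0 - ss_residual / ss_total


# Hypothetical data: mean is 6, SS_total = 20, SS_residual = 1.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 6.5, 9.5]
r2 = r_squared(y_true, y_pred)              # approximately 0.95

# Always predicting the mean gives SS_residual = SS_total, so R^2 = 0.
r2_baseline = r_squared(y_true, [6.0] * 4)
```

Here the predictions leave only 1 of the 20 units of total squared variation unexplained, which is why R² comes out near 0.95.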

---

### 2.3 Intuition

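- **R² close to 1:** Model explains most of the variance → Good fit  
- **R² close to 0:** Model does little better than always predicting the mean $\bar{y}$ → Poor fit  
- **R² below 0:** Possible on held-out data or for a model fit without an intercept; the model does worse than predicting the mean  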

**Example:**  
- R² = 0.75 → 75% of the variance is explained by the model  
- R² = 0.90 → 90% of the variance is explained by the model  

---

## 3. Problem with R²

- **Adding more features** (even irrelevant ones) never decreases the training R² of a least-squares fit, and usually **increases it slightly**, because the extra coefficient can fit some of the noise.  
- Example: Adding a "gender of resident" feature to predict house price might slightly increase R², even though it has no real relationship with price.  
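
The effect can be checked on synthetic data. The sketch below fits ordinary least squares via the normal equations in plain Python, first with one informative feature and then with an added random column. All data and helper names (`solve_linear_system`, `fit_ols_r2`) are invented for this illustration; a library such as NumPy or scikit-learn would normally do the fitting.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols_r2(rows, y):
    """Fit least squares with an intercept and return the training R^2."""
    X = [[1.0] + list(row) for row in rows]          # prepend intercept column
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y_i for r, y_i in zip(X, y)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)             # normal equations
    y_pred = [sum(b * v for b, v in zip(beta, r)) for r in X]
    y_mean = sum(y) / len(y)
    ss_res = sum((a - p) ** 2 for a, p in zip(y, y_pred))
    ss_tot = sum((a - y_mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


rng = random.Random(0)
size = [rng.uniform(50, 200) for _ in range(40)]     # informative feature
price = [3.0 * s + rng.gauss(0, 30) for s in size]   # target depends on size only
noise = [rng.random() for _ in range(40)]            # unrelated to price by construction

r2_one = fit_ols_r2([[s] for s in size], price)
r2_two = fit_ols_r2([[s, z] for s, z in zip(size, noise)], price)
print(r2_one, r2_two)   # r2_two is never below r2_one (up to rounding)
```

The second model contains the first as a special case (coefficient zero on the random column), so its residual sum of squares cannot be larger.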

---

## 4. Adjusted R Squared (Adjusted R²)

- **Adjusted R²** compensates for the addition of irrelevant features by **penalizing unnecessary variables**.  

### Formula

$$
R^2_{\text{adj}} = 1 - \left( 1 - R^2 \right) \frac{n-1}{n-p-1}
$$

Where:  
- $n$ → Number of data points  
- $p$ → Number of independent features  
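
As a small sketch of the formula, the helper below (the name `adjusted_r_squared` and the example numbers are assumptions for illustration) shows how the same R² is discounted more heavily as the feature count grows.

```python
def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    if n - p - 1 <= 0:
        raise ValueError("need more data points than features plus one")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# Hypothetical case: R^2 = 0.80 on n = 50 data points.
adj_few = adjusted_r_squared(0.80, n=50, p=2)     # approximately 0.7915
adj_many = adjusted_r_squared(0.80, n=50, p=10)   # approximately 0.7487
```

With R² held fixed, moving from 2 to 10 features lowers the adjusted value, which is the penalty at work.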

---

### 4.1 Intuition

- If a **new feature improves the fit by more than the penalty for adding it** → Adjusted R² **increases**  
- If a **new feature is irrelevant** (or improves the fit too little) → Adjusted R² **decreases**  

**Example (illustrative values):**

| Features Added | R² | Adjusted R² | Interpretation |
|----------------|----|-------------|----------------|
| Size           | 0.75 | 0.75       | Good correlation |
| Size + Bedrooms | 0.80 | 0.78       | Both correlated |
| Size + Bedrooms + Gender | 0.82 | 0.76 | Gender irrelevant → penalized |

- On real data, Adjusted R² is always slightly below R² whenever $p \geq 1$ and $R^2 < 1$; the gap shrinks as $n$ grows.

- Adjusted R² gives a **more reliable measure** of model performance when **multiple features** are used.

---

## 5. Summary

- **R²**: Measures proportion of variance explained; can be misleading with extra features  
- **Adjusted R²**: Corrects R² by penalizing irrelevant features  
- **Higher value (closer to 1)** → Better fit, when comparing models on the same data  
- **Use Adjusted R²** for multiple linear regression with many features  

---

✅ **Key Takeaways:**

1. R² evaluates **model fit**, but can increase with irrelevant features.  
2. Adjusted R² **penalizes unnecessary features**, making it more robust for multiple regression.  
3. Both metrics help in **selecting and evaluating regression models**.


# Error Metrics in Linear Regression: MSE, MAE, RMSE

---

## 1. Introduction

- In linear regression, besides **R² and Adjusted R²**, we can evaluate the **average prediction error** across data points using:  
  1. **Mean Squared Error (MSE)**  
  2. **Mean Absolute Error (MAE)**  
  3. **Root Mean Squared Error (RMSE)**  

- These metrics quantify how far predictions are from true values.

---

## 2. Mean Squared Error (MSE)

### Formula

$$
\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
$$

Where:  
- $y_i$ = true value  
- $\hat{y}_i$ = predicted value  
- $n$ = number of data points  

---

### Intuition

- Measures **average squared deviation** of predictions from true values.  
- Squaring emphasizes larger errors.
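
A short sketch of the MSE formula in plain Python follows; the function name and the toy values are chosen only for illustration. The two prediction sets have the same total absolute error, but the one that concentrates it in a single point gets a larger MSE.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences between true and predicted values."""
    return sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)) / len(y_true)


y_true = [0.0, 0.0, 0.0, 0.0]

# Four errors of size 1 (total absolute error 4).
mse_spread = mean_squared_error(y_true, [1.0, 1.0, 1.0, 1.0])        # 1.0

# One error of size 4 (total absolute error also 4).
mse_concentrated = mean_squared_error(y_true, [0.0, 0.0, 0.0, 4.0])  # 4.0
```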

---

### Advantages

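1. **Differentiable** – the gradient exists everywhere, so gradient descent can be applied directly.  
2. **Convex function** – for linear regression, MSE is convex in the model parameters, so any local minimum is also a **global minimum**.  
3. **Faster convergence** – the gradient is smooth and shrinks as the error approaches zero, which helps gradient-based optimizers settle at the minimum.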
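
To make these properties concrete, here is a minimal gradient descent sketch that minimizes MSE for a one-feature line $\hat{y} = wx + b$. The function name, learning rate, step count, and data are all assumptions picked for this example, not tuned recommendations.

```python
def fit_line_gd(x, y, lr=0.05, steps=2000):
    """Fit y ≈ w * x + b by batch gradient descent on MSE."""
    w, b = 0.0, 0.0
    n = len(x)
    for _ in range(steps):
        # Signed prediction errors (prediction minus true value).
        errors = [w * xi + b - yi for xi, yi in zip(x, y)]
        # Partial derivatives of MSE with respect to w and b.
        grad_w = (2.0 / n) * sum(e * xi for e, xi in zip(errors, x))
        grad_b = (2.0 / n) * sum(errors)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical noise-free data generated from y = 2x + 1.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_line_gd(x, y)   # w approaches 2, b approaches 1
```

Because the loss surface is a convex bowl, the result does not depend on the starting point as long as the learning rate is small enough for the updates to remain stable.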

---

### Disadvantages

1. **Not robust to outliers** – large errors are heavily penalized due to squaring.  
2. **Units are squared** – if target is in dollars, MSE is in dollars², making interpretation harder.  

---

## 3. Mean Absolute Error (MAE)

### Formula

$$
\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|
$$

---

### Advantages

1. **Robust to outliers** – errors are not squared, so extreme points have less influence.  
2. **Same units as target** – easier to interpret.
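
The robustness claim can be illustrated by computing MAE and MSE on the same predictions with and without one extreme point. This is a sketch on made-up numbers; the function names are illustrative.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of absolute differences between true and predicted values."""
    return sum(abs(y - y_hat) for y, y_hat in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average of squared differences between true and predicted values."""
    return sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)) / len(y_true)


y_pred = [11.0, 11.0, 12.0, 12.0, 13.0]
y_clean = [10.0, 12.0, 11.0, 13.0, 12.0]     # every error has size 1
y_outlier = [10.0, 12.0, 11.0, 13.0, 62.0]   # last point is an outlier (error 49)

mae_clean = mean_absolute_error(y_clean, y_pred)      # 1.0
mse_clean = mean_squared_error(y_clean, y_pred)       # 1.0
mae_outlier = mean_absolute_error(y_outlier, y_pred)  # 10.6
mse_outlier = mean_squared_error(y_outlier, y_pred)   # 481.0
```

The single outlier multiplies MAE by about 10 but MSE by several hundred, so a model tuned to minimize MSE would be pulled much harder toward that point.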

---

### Disadvantages

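1. **Non-differentiable at 0** – the gradient is undefined where the error is exactly zero.  
2. **Convergence is slower** – the gradient has the same magnitude for small and large errors, so gradient-based optimization needs shrinking step sizes; it is often handled using **subgradients**.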
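
As a rough sketch of the subgradient idea, the code below fits a single constant prediction `c` by minimizing MAE. It uses `sign(c - y_i)` as the subgradient of `|c - y_i|`, taking 0 at the kink, with a step size that decays as $1/\sqrt{t}$. The schedule, step count, and data are assumptions for illustration only.

```python
def sign(v):
    """Subgradient of |v|: -1, 0, or +1 (0 is a valid choice at v = 0)."""
    return (v > 0) - (v < 0)


def fit_constant_mae(y, steps=5000):
    """Minimize MAE of a constant prediction c by subgradient descent."""
    c = 0.0
    n = len(y)
    for t in range(1, steps + 1):
        # Average subgradient of |c - y_i| over all data points.
        g = sum(sign(c - yi) for yi in y) / n
        # Decaying step size so the iterates stop oscillating around the kink.
        c -= g / t ** 0.5
    return c


y = [1.0, 2.0, 3.0, 4.0, 100.0]   # median 3, mean 22
c = fit_constant_mae(y)           # ends close to the median, 3
```

The minimizer of MAE for a constant prediction is the median, so the outlier at 100 barely moves the result, whereas the MSE minimizer (the mean, 22) is dragged far toward it.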

---

## 4. Root Mean Squared Error (RMSE)

### Formula

$$
\text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}
$$

- Simply the **square root of MSE**.  
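
A minimal sketch of the computation, with an illustrative function name and made-up values:

```python
import math


def root_mean_squared_error(y_true, y_pred):
    """Square root of the mean squared error, in the target's own units."""
    mse = sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


# Every prediction is off by 0.5, so MSE = 0.25 and RMSE = 0.5.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 6.5, 9.5]
rmse = root_mean_squared_error(y_true, y_pred)
```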

---

### Advantages

1. **Same units as target** – easier interpretation than MSE.  
2. **Differentiable** – suitable for gradient-based optimization.  

---

### Disadvantages

1. **Not robust to outliers** – large deviations still heavily penalized.  

---

## 5. Choosing Between Metrics

| Metric | Advantage | Disadvantage | When to Use |
|--------|-----------|-------------|-------------|
| **MSE** | Differentiable, convex, fast convergence | Not robust, units squared | Optimization via gradient descent |
| **MAE** | Robust to outliers, same units | Slower convergence, non-differentiable at 0 | When outliers are present |
| **RMSE** | Same units, differentiable | Not robust to outliers | Reporting error in the target's units when large errors should weigh more |
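
When comparing metrics it can help to compute them side by side on the same predictions. The helper below is a sketch with a made-up name (`regression_metrics`) and made-up data; libraries such as scikit-learn provide equivalent functions.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, MAE, RMSE, and R^2 for one set of predictions."""
    n = len(y_true)
    errors = [y - y_hat for y, y_hat in zip(y_true, y_pred)]
    mse = sum(e ** 2 for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    y_mean = sum(y_true) / n
    ss_total = sum((y - y_mean) ** 2 for y in y_true)
    return {
        "mse": mse,
        "mae": mae,
        "rmse": math.sqrt(mse),
        "r2": 1.0 - (mse * n) / ss_total,   # mse * n equals SS_residual
    }


# Hypothetical predictions with errors of -1, 0, 1, and -2.
metrics = regression_metrics([2.0, 4.0, 6.0, 8.0], [3.0, 4.0, 5.0, 10.0])
# mse = 1.5, mae = 1.0, rmse ≈ 1.2247, r2 ≈ 0.7
```

RMSE is at least as large as MAE on any dataset; a wide gap between the two suggests that a few large errors dominate.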

---

## 6. Summary

- **MSE, MAE, RMSE** measure the average prediction error across data points.  
- **R² and Adjusted R²** measure variance explained by the model.  
- **MSE** → fast, smooth optimization, sensitive to outliers.  
- **MAE** → robust to outliers, slower convergence.  
- **RMSE** → interpretable units, sensitive to outliers.  

> **Interview Tip:**  
> - MSE is preferred for optimization (quadratic loss).  
> - MAE is preferred when robustness to outliers is important.  
> - Compare with R² / Adjusted R² to evaluate overall model fit.
