# **L1 and L2 Norms**

L1 and L2 norms are ways to measure the size or magnitude of a vector. They are commonly used in **machine learning, optimization, and regularization techniques** like Lasso and Ridge regression.

---

## **1. L1 Norm (Manhattan Distance or Taxicab Norm)**
The **L1 norm** of a vector is the **sum of the absolute values** of its components.  

### **Formula**  
For a vector $ x = (x_1, x_2, ..., x_n) $, the L1 norm is:  

$$
\|x\|_1 = |x_1| + |x_2| + \dots + |x_n|
$$

### **Example**  
If $ x = (3, -4) $, then:  

$$
\|x\|_1 = |3| + |-4| = 3 + 4 = 7
$$
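
The same calculation can be checked numerically. This is a minimal sketch that assumes NumPy is available. For a 1-D array, `np.linalg.norm` with `ord=1` returns the L1 norm:

```python
import numpy as np

x = np.array([3, -4])

# L1 norm by definition: sum of the absolute values of the components
l1_manual = np.sum(np.abs(x))

# Same quantity via NumPy (ord=1 selects the L1 norm for 1-D input)
l1_numpy = np.linalg.norm(x, ord=1)

print(l1_manual, l1_numpy)  # 7 7.0
```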

### **Properties**  
✅ Encourages sparsity (used in Lasso regression).  
✅ As a loss (e.g., MAE), less sensitive to outliers than the L2 norm.  
✅ Used in feature selection (can shrink less important coefficients to exactly zero).  
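
One way to see why the L1 penalty produces exact zeros is to look at the one-dimensional case. Minimizing $\frac{1}{2}(w - b)^2 + \lambda |w|$ over $w$ has a closed-form solution called soft-thresholding, $w = \operatorname{sign}(b)\max(|b| - \lambda, 0)$, so any $b$ with $|b| \le \lambda$ is mapped to exactly 0. The sketch below covers only this simplified scalar problem and is not a full Lasso solver. `soft_threshold`, `b`, and `lam` are illustrative names of our own:

```python
import numpy as np

def soft_threshold(b, lam):
    """Elementwise minimizer of 0.5 * (w - b)**2 + lam * |w| (hypothetical helper)."""
    # Shrink every value toward zero by lam; values within [-lam, lam] become exactly 0
    return np.sign(b) * np.maximum(np.abs(b) - lam, 0.0)

# Illustrative unpenalized values and penalty strength
b = np.array([3.0, -0.5, 0.2, -4.0])
lam = 1.0

w = soft_threshold(b, lam)
print(w)                                   # the two small entries are now exactly zero
print("zeros:", np.count_nonzero(w == 0))  # zeros: 2
```

Coordinate-descent solvers for Lasso apply a step of this form to one coefficient at a time. This is where the exact zeros come from.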

---

## **2. L2 Norm (Euclidean Distance)**
The **L2 norm** of a vector is the **square root of the sum of squares** of its components.  

### **Formula**  
For a vector $ x = (x_1, x_2, ..., x_n) $, the L2 norm is:  

$$
\|x\|_2 = \sqrt{x_1^2 + x_2^2 + \dots + x_n^2}
$$

### **Example**  
If $ x = (3, -4) $, then:  

$$
\|x\|_2 = \sqrt{3^2 + (-4)^2} = \sqrt{9 + 16} = \sqrt{25} = 5
$$
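
Here is a minimal NumPy check of this example, assuming NumPy is available. For a 1-D array, `np.linalg.norm` computes the L2 norm by default:

```python
import numpy as np

x = np.array([3, -4])

# L2 norm by definition: square root of the sum of squared components
l2_manual = np.sqrt(np.sum(x ** 2))

# Same quantity via NumPy (the default ord for a 1-D array is the L2 norm)
l2_numpy = np.linalg.norm(x)

print(l2_manual, l2_numpy)  # 5.0 5.0
```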

### **Properties**  
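✅ Used (squared) in Ridge regression to discourage large coefficients.  
✅ Its square penalizes large values more heavily than the L1 norm does (a component of 10 adds 100 instead of 10).  
✅ As a loss (e.g., MSE or RMSE), more sensitive to outliers than the L1 norm.  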
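
Ridge regression penalizes the *squared* L2 norm, $\|w\|_2^2 = \sum_i w_i^2$. Comparing each component's contribution under $|w_i|$ and under $w_i^2$ shows where the extra penalty on large values comes from. The coefficient values below are hypothetical and chosen only for illustration:

```python
import numpy as np

# Hypothetical coefficient values, chosen only for illustration
w = np.array([0.5, 1.0, 2.0, 10.0])

l1_terms = np.abs(w)   # each component's contribution to ||w||_1
l2_sq_terms = w ** 2   # each component's contribution to ||w||_2^2 (the Ridge penalty)

for wi, a, b in zip(w, l1_terms, l2_sq_terms):
    print(f"w_i = {wi:5.1f}   |w_i| = {a:6.2f}   w_i^2 = {b:7.2f}")
```

Below 1 the squared term is actually the smaller penalty, and at exactly 1 the two tie. Above 1 it grows quadratically, which is why the squared L2 penalty pushes hardest on the largest coefficients.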

---

## **Comparison**

| Feature | L1 Norm (Manhattan) | L2 Norm (Euclidean) |
|---------|---------------------|---------------------|
| Formula | $\lvert x_1 \rvert + \lvert x_2 \rvert + \dots + \lvert x_n \rvert$ | $\sqrt{x_1^2 + x_2^2 + \dots + x_n^2}$ |
| Effect as a penalty | Encourages sparsity (exact zeros) | Shrinks all coefficients toward zero |
| Use Case | Lasso regression, feature selection | Ridge regression, stability |
| Sensitivity to outliers (as a loss) | Less sensitive | More sensitive |
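
The sketch below shows the "Effect" row in practice. It fits scikit-learn's `Lasso` (L1 penalty) and `Ridge` (squared L2 penalty) on synthetic data from `make_regression`, where only 3 of the 10 features are informative. The dataset settings and `alpha=1.0` are arbitrary illustrative choices. With them, Lasso is expected to set the uninformative coefficients to exactly zero, while Ridge keeps every coefficient nonzero:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Synthetic data: 10 features, only 3 of which actually influence y
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=1.0, random_state=0)

# alpha=1.0 is an arbitrary, illustrative penalty strength for both models
lasso = Lasso(alpha=1.0).fit(X, y)  # L1 penalty
ridge = Ridge(alpha=1.0).fit(X, y)  # squared L2 penalty

print("Lasso coefficients:", np.round(lasso.coef_, 2))
print("Ridge coefficients:", np.round(ridge.coef_, 2))
print("Exact zeros -> Lasso:", np.sum(lasso.coef_ == 0), " Ridge:", np.sum(ridge.coef_ == 0))
```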

---

## **Why is RMSE More Sensitive to Outliers Than MAE?**  

Root Mean Squared Error (**RMSE**) and Mean Absolute Error (**MAE**) are both commonly used metrics for evaluating the performance of regression models, but **RMSE is more sensitive to outliers**. Here’s why:

### **1. RMSE vs. MAE Formulas**  

- **MAE (Mean Absolute Error):**  
  $$
  MAE = \frac{1}{n} \sum_{i=1}^{n} | y_i - \hat{y}_i |
  $$
  - **Takes absolute errors** and averages them.  
  - **Every unit of error counts the same** (linear impact).  

- **RMSE (Root Mean Squared Error):**  
  $$
  RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}
  $$
  - **Squares the errors before averaging** and then takes the square root.  
  - **Larger errors have a disproportionately higher impact** due to squaring.  

### **2. The Effect of Outliers**  
👉 **Squaring in RMSE amplifies large errors.**  

Let’s say we have two prediction errors: **2 and 10**.  
- **MAE Calculation:**  
  $$
  MAE = \frac{|2| + |10|}{2} = \frac{12}{2} = 6
  $$
- **RMSE Calculation:**  
  $$
  RMSE = \sqrt{\frac{(2^2) + (10^2)}{2}} = \sqrt{\frac{4 + 100}{2}} = \sqrt{52} \approx 7.21
  $$
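
The worked example can be reproduced in a few lines. The sketch below defines hypothetical helpers `mae` and `rmse` that take an array of errors; these are not scikit-learn's functions. It also checks the link to the norms defined earlier: MAE is the L1 norm of the error vector divided by $n$, and RMSE is the L2 norm divided by $\sqrt{n}$.

```python
import numpy as np

def mae(errors):
    """Mean absolute error of an array of prediction errors (hypothetical helper)."""
    return np.mean(np.abs(errors))

def rmse(errors):
    """Root mean squared error of an array of prediction errors (hypothetical helper)."""
    return np.sqrt(np.mean(np.square(errors)))

errors = np.array([2.0, 10.0])  # the two prediction errors from the example
n = len(errors)

print(f"MAE:  {mae(errors):.2f}")   # MAE:  6.00
print(f"RMSE: {rmse(errors):.2f}")  # RMSE: 7.21

# Norm form: MAE = ||e||_1 / n and RMSE = ||e||_2 / sqrt(n)
assert np.isclose(mae(errors), np.linalg.norm(errors, ord=1) / n)
assert np.isclose(rmse(errors), np.linalg.norm(errors) / np.sqrt(n))
```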

### **Key Observation**  
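- In MAE, the error of 10 contributes 10 of the 12 total absolute error (about 83%).  
- In RMSE, its square contributes 100 of the 104 total squared error (about 96%), so the single large error dominates the result.  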

Thus, **RMSE penalises large errors more than MAE**, making it more sensitive to outliers.
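
The gap between the two metrics widens as one error grows. RMSE is never smaller than MAE, and for $n$ errors the ratio RMSE/MAE can approach but never exceed $\sqrt{n}$. The sketch below keeps four hypothetical errors of size 2 fixed and grows a fifth one:

```python
import numpy as np

# Four hypothetical "typical" errors of size 2, plus one error that we grow
base_errors = np.array([2.0, 2.0, 2.0, 2.0])
ratios = []

for big in [2.0, 10.0, 50.0, 100.0]:
    errors = np.append(base_errors, big)
    mae = np.mean(np.abs(errors))
    rmse = np.sqrt(np.mean(errors ** 2))
    ratios.append(rmse / mae)
    print(f"largest error={big:6.1f}  MAE={mae:6.2f}  RMSE={rmse:6.2f}  RMSE/MAE={rmse / mae:.2f}")

# For n = 5 errors, RMSE/MAE always lies between 1 and sqrt(5) (about 2.24)
```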

---

### **3. When to Use MAE vs. RMSE**

| Metric | Sensitivity | Use Case |
|--------|------------|----------|
| **MAE** | Less sensitive to outliers | When errors should count in proportion to their size and a few large errors should not dominate (the best constant prediction under MAE is a median) |
| **RMSE** | More sensitive to outliers | When large errors should be penalised more, e.g., financial forecasting (the best constant prediction under RMSE is the mean) |
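
The median/mean connection can be checked by brute force. For a fixed set of targets, try every constant prediction on a grid and keep the one with the lowest error. The targets and the integer grid below are hypothetical, chosen so that both the median (3) and the mean (22) lie on the grid:

```python
import numpy as np

y = np.array([1.0, 2.0, 3.0, 4.0, 100.0])  # hypothetical targets with one outlier
candidates = np.arange(0.0, 101.0)         # constant predictions 0, 1, ..., 100

# Score every constant prediction under both metrics
mae_scores = [np.mean(np.abs(y - c)) for c in candidates]
rmse_scores = [np.sqrt(np.mean((y - c) ** 2)) for c in candidates]

best_mae = candidates[np.argmin(mae_scores)]
best_rmse = candidates[np.argmin(rmse_scores)]

print(f"Best constant under MAE:  {best_mae}  (median = {np.median(y)})")
print(f"Best constant under RMSE: {best_rmse}  (mean = {np.mean(y)})")
```

The single outlier drags the RMSE-optimal constant from the bulk of the data up to 22. The MAE-optimal constant stays at 3.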

---

## **Python Code to Compare RMSE and MAE**
```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Actual vs Predicted values (with an outlier)
y_true = np.array([3, 5, 7, 9, 100])  # 100 is an outlier
y_pred = np.array([2, 5, 6, 8, 10])

# Compute MAE and RMSE
mae = mean_absolute_error(y_true, y_pred)
rmse = np.sqrt(mean_squared_error(y_true, y_pred))

print(f"MAE: {mae:.2f}")
print(f"RMSE: {rmse:.2f}")
```

### **Expected Output**
```
MAE: 18.60
RMSE: 40.26
```

### **Key Observations**
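- **MAE** stays lower because the outlier's error of 90 (100 vs. 10) enters **linearly**, contributing 90 / 5 = 18 to the average.  
- **RMSE** is more than twice as large because the outlier's error is **squared** (90² = 8100) before averaging, which dominates the mean squared error of 8103 / 5 = 1620.6.  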
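
To make the sensitivity concrete, the sketch below recomputes both metrics on the same arrays with and without the last point. It uses plain NumPy in place of scikit-learn, with hypothetical helpers `mae` and `rmse`. Dropping the outlier leaves errors of 1, 0, 1, 1, so MAE is 0.75 and RMSE is about 0.87. Adding the outlier back raises MAE roughly 25-fold and RMSE roughly 46-fold:

```python
import numpy as np

y_true = np.array([3, 5, 7, 9, 100])  # 100 is the outlier
y_pred = np.array([2, 5, 6, 8, 10])

def mae(t, p):
    """Mean absolute error (hypothetical helper)."""
    return np.mean(np.abs(t - p))

def rmse(t, p):
    """Root mean squared error (hypothetical helper)."""
    return np.sqrt(np.mean((t - p) ** 2))

# Compare the first four points (outlier dropped) against the full data
for label, t, p in [("without outlier", y_true[:4], y_pred[:4]),
                    ("with outlier", y_true, y_pred)]:
    print(f"{label:16s} MAE={mae(t, p):6.2f}  RMSE={rmse(t, p):6.2f}")
```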


---

## **Machine Learning Project Checklist**

1. Frame the problem and look at the big picture
2. Get the data
3. Explore the data to gain insights
4. Prepare the data to better expose the underlying data patterns to machine learning algorithms
5. Explore many different models and shortlist the best ones
6. Fine-tune your models and combine them into a great solution
7. Present your solution
8. Launch, monitor, and maintain your system
   