# 📘 Understanding Regression Models

This notebook provides a **mathematical and conceptual understanding** of various regression techniques, including:

1. **Linear Regression**
2. **Polynomial Regression**
3. **Ridge Regression (L2 Regularization)**
4. **Lasso Regression (L1 Regularization)**
5. **Elastic Net Regression (L1 + L2 Regularization)**

Each section explains **the math, when to use the model, and why it's useful**.

---

## 📌 1. Linear Regression

Linear Regression models the relationship between **input features ($X$)** and **a continuous target variable ($y$)** as a weighted sum of the features plus a bias term $b$ (a straight line when there is a single feature):

$$
y = w_1 x_1 + w_2 x_2 + ... + w_n x_n + b
$$

or in matrix form:

$$
\mathbf{y} = \mathbf{X} \mathbf{w} + b
$$
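As a minimal sketch of the matrix form, each prediction is the dot product of one row of $\mathbf{X}$ with $\mathbf{w}$, plus $b$. The function name `predict` and the toy numbers are illustrative assumptions, and plain Python lists stand in for a matrix library here.

```python
def predict(X, w, b):
    """Return X w + b, computed row by row."""
    return [sum(wj * xj for wj, xj in zip(w, row)) + b for row in X]

# Hypothetical toy inputs: 2 samples, 2 features
X = [[1.0, 2.0],
     [3.0, 4.0]]
w = [0.5, -1.0]
b = 2.0

y_hat = predict(X, w, b)
print(y_hat)  # [0.5, -0.5]
```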

### **Cost Function: Mean Squared Error (MSE)**

To measure how well the model fits the data, we minimize the **Mean Squared Error (MSE):**

$$
MSE = \frac{1}{m} \sum_{i=1}^{m} (y_i - \hat{y}_i)^2
$$

where:
- $y_i$ = actual value
- $\hat{y}_i$ = predicted value
- $m$ = number of samples
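The MSE formula above translates almost directly into code. This is a small sketch with made-up values; the name `mse` is an assumption chosen for illustration.

```python
def mse(y_true, y_pred):
    """Mean squared error: average of the squared residuals over m samples."""
    m = len(y_true)
    return sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)) / m

# Hypothetical toy values: residuals are 1, 0 and -2
y_true = [3.0, 5.0, 7.0]
y_pred = [2.0, 5.0, 9.0]

print(mse(y_true, y_pred))  # (1 + 0 + 4) / 3, about 1.667
```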

### **Gradient Descent Optimization**

To minimize MSE, we update the weights iteratively using **Gradient Descent**, where $\alpha$ is the **learning rate** (step size):

$$
w := w - \alpha \cdot \frac{\partial MSE}{\partial w}
$$

$$
b := b - \alpha \cdot \frac{\partial MSE}{\partial b}
$$
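The update rules can be sketched in plain Python. Differentiating the MSE gives $\frac{\partial MSE}{\partial w_j} = \frac{2}{m} \sum_i (\hat{y}_i - y_i) x_{ij}$ and $\frac{\partial MSE}{\partial b} = \frac{2}{m} \sum_i (\hat{y}_i - y_i)$. The function name, the learning rate, the iteration count, and the toy data (generated from $y = 2x + 1$) are all assumptions for illustration.

```python
def gradient_descent(X, y, alpha=0.05, n_iters=2000):
    """Fit weights w and bias b by batch gradient descent on the MSE."""
    m, n = len(X), len(X[0])
    w, b = [0.0] * n, 0.0
    for _ in range(n_iters):
        # Prediction errors (y_hat_i - y_i) for the current w and b
        errors = [sum(wj * xj for wj, xj in zip(w, row)) + b - yi
                  for row, yi in zip(X, y)]
        # Gradients of the MSE with respect to each weight and the bias
        grad_w = [2.0 / m * sum(e * row[j] for e, row in zip(errors, X))
                  for j in range(n)]
        grad_b = 2.0 / m * sum(errors)
        # Step against the gradient, scaled by the learning rate alpha
        w = [wj - alpha * gj for wj, gj in zip(w, grad_w)]
        b -= alpha * grad_b
    return w, b

# Hypothetical toy data lying exactly on y = 2x + 1
X = [[0.0], [1.0], [2.0], [3.0]]
y = [1.0, 3.0, 5.0, 7.0]

w, b = gradient_descent(X, y)
print(w, b)  # close to [2.0] and 1.0
```

If $\alpha$ is too large the iterates diverge, and if it is too small convergence is slow, so the value usually needs tuning per dataset.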

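Alternatively, we can compute the optimal weights in closed form using the **Normal Equation** (with the bias $b$ absorbed into $\mathbf{w}$ by appending a column of ones to $\mathbf{X}$, and assuming $\mathbf{X}^T \mathbf{X}$ is invertible):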

$$
\mathbf{w} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y}
$$
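For a single feature plus a bias, the Normal Equation reduces to a 2x2 linear system that can be solved by hand. The sketch below assumes $\mathbf{X}$ has the columns $[x, 1]$ so that the bias is part of the solution; the function name and data are illustrative.

```python
def normal_equation_1d(x, y):
    """Solve (X^T X) theta = X^T y for X = [x, 1] (one feature plus bias)."""
    m = len(x)
    sxx = sum(xi * xi for xi in x)
    sx = sum(x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sy = sum(y)
    # X^T X = [[sxx, sx], [sx, m]] and X^T y = [sxy, sy]
    det = sxx * m - sx * sx
    if det == 0:
        raise ValueError("X^T X is singular: no unique solution")
    # Apply the explicit inverse of the 2x2 matrix
    w = (m * sxy - sx * sy) / det
    b = (sxx * sy - sx * sxy) / det
    return w, b

# Hypothetical toy data lying exactly on y = 2x + 1
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]

w, b = normal_equation_1d(x, y)
print(w, b)  # 2.0 1.0
```

With many features, a numerical library's least-squares solver is generally preferable to forming the inverse explicitly.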

### **When to Use Linear Regression?**
✅ Use when:
- Data has a **linear relationship** between features and target.
- **No strong multicollinearity** (i.e., features are not highly correlated).
- Interpretability is important.

❌ Avoid when:
- Data is **nonlinear** (use **Polynomial Regression** instead).
- There are **many correlated features** (use **Ridge or Lasso**).

---

## 📌 2. Polynomial Regression

Linear Regression assumes a **straight-line relationship**, whereas **Polynomial Regression** models **curved relationships** by adding powers of the features as extra inputs.

For a **degree-$d$** polynomial, we create new features:

$$
X_{\text{poly}} = [X, X^2, X^3, ..., X^d]
$$

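Then, we apply **Linear Regression** to the expanded features. The model is still linear in the weights, even though it is nonlinear in $x$: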

$$
y = w_1 x + w_2 x^2 + w_3 x^3 + ... + w_d x^d + b
$$
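The feature transformation can be sketched for a single input feature as follows. The helper name `polynomial_features` is an assumption for illustration; the expanded rows would then be passed to any linear regression fit.

```python
def polynomial_features(x, degree):
    """Map each scalar x_i to the row [x_i, x_i^2, ..., x_i^degree]."""
    return [[xi ** p for p in range(1, degree + 1)] for xi in x]

# Hypothetical toy inputs expanded to degree 3
X_poly = polynomial_features([1.0, 2.0, 3.0], degree=3)
for row in X_poly:
    print(row)
# [1.0, 1.0, 1.0]
# [2.0, 4.0, 8.0]
# [3.0, 9.0, 27.0]
```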

### **When to Use Polynomial Regression?**
✅ Use when:
- The data shows **a curved trend**.
- You want to model **higher-order effects** of a feature (e.g., quadratic or cubic terms).

❌ Avoid when:
- **Overfitting** occurs (use **regularization** like **Ridge or Elastic Net**).
- The fit needs a **high degree**: high-degree polynomials oscillate between data points and extrapolate poorly.

---

## 📌 3. Ridge Regression (L2 Regularization)

Ridge Regression **modifies Linear Regression** by **adding a penalty** on large weights to **reduce overfitting**.

### **Ridge Cost Function (MSE + L2 Penalty)**

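$$
MSE + \lambda \sum_{j=1}^{n} w_j^2
$$

where:
- $\lambda$ is a **regularization parameter** (higher values shrink the weights more). It is distinct from the learning rate $\alpha$ used in Gradient Descent.

### **Gradient Descent Update Rule**

Differentiating the penalty term $\lambda w^2$ with respect to a weight $w$ gives $2 \lambda w$, so the update becomes:

$$
w := w - \alpha \left( \frac{\partial MSE}{\partial w} + 2 \lambda w \right)
$$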
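A minimal sketch of Ridge fitted by gradient descent follows. It assumes the penalty is written as `lam` times the sum of squared weights, so its gradient is `2 * lam * w`; other texts fold constant factors into the regularization parameter differently. The bias is left unpenalized, and the function name, hyperparameters, and toy data are illustrative assumptions.

```python
def ridge_fit(X, y, lam, alpha=0.05, n_iters=2000):
    """Gradient descent on MSE + lam * sum(w_j^2); the bias b is not penalized."""
    m, n = len(X), len(X[0])
    w, b = [0.0] * n, 0.0
    for _ in range(n_iters):
        errors = [sum(wj * xj for wj, xj in zip(w, row)) + b - yi
                  for row, yi in zip(X, y)]
        # MSE gradient plus the L2 penalty gradient 2 * lam * w_j
        grad_w = [2.0 / m * sum(e * row[j] for e, row in zip(errors, X))
                  + 2.0 * lam * w[j]
                  for j in range(n)]
        grad_b = 2.0 / m * sum(errors)
        w = [wj - alpha * gj for wj, gj in zip(w, grad_w)]
        b -= alpha * grad_b
    return w, b

# Hypothetical toy data lying exactly on y = 2x + 1
X = [[0.0], [1.0], [2.0], [3.0]]
y = [1.0, 3.0, 5.0, 7.0]

w_plain, _ = ridge_fit(X, y, lam=0.0)  # no penalty: ordinary least squares
w_ridge, _ = ridge_fit(X, y, lam=1.0)  # the penalty shrinks the slope
print(w_plain[0], w_ridge[0])
```

On this toy data the unpenalized slope converges to about 2, while the penalized slope settles near 10/9, which shows the shrinkage effect of the L2 term.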

### **When to Use Ridge Regression?**
✅ Use when:
- Features are **highly correlated (multicollinearity)**.
- You want to **keep all features** in the model but shrink their weights.

❌ Avoid when:
- Some features should be **completely eliminated** (use **Lasso** instead).

---

## 📌 4. Lasso Regression (L1 Regularization)

Lasso Regression **replaces Ridge's L2 penalty with an L1 penalty**, which can **shrink some weights exactly to zero** and therefore **performs feature selection**.

### **Lasso Cost Function (MSE + L1 Penalty)**

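$$
MSE + \lambda \sum_{j=1}^{n} |w_j|
$$

where $\lambda$ is the **regularization parameter**, distinct from the learning rate $\alpha$ used in the update below.

### **Gradient Descent Update Rule**

The absolute value is not differentiable at zero, so the update uses a subgradient of the penalty:

$$
w := w - \alpha \left( \frac{\partial MSE}{\partial w} + \lambda \cdot \text{sign}(w) \right)
$$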

where **sign($w$)** is:
- $+1$ if $w > 0$
- $-1$ if $w < 0$
- $0$ if $w = 0$
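The sign-based update can be sketched as a single step in Python. To isolate the effect of the penalty, the example pretends the MSE gradient is zero, which is an artificial assumption; the names `sign` and `lasso_step` and the numbers are illustrative. Practical Lasso solvers typically use other methods such as coordinate descent, so this is only a sketch of the rule as written.

```python
def sign(w):
    """Return +1 if w > 0, -1 if w < 0, and 0 if w == 0."""
    return (w > 0) - (w < 0)

def lasso_step(w, grad_mse, lam, alpha):
    """One subgradient step: w_j - alpha * (dMSE/dw_j + lam * sign(w_j))."""
    return [wj - alpha * (gj + lam * sign(wj)) for wj, gj in zip(w, grad_mse)]

# Hypothetical weights; the MSE gradient is set to zero to isolate the L1 term
w = [0.5, -0.2, 0.0]
grad_mse = [0.0, 0.0, 0.0]

w_new = lasso_step(w, grad_mse, lam=1.0, alpha=0.1)
print(w_new)  # each nonzero weight moves 0.1 toward zero; the zero weight stays put
```

Unlike the Ridge penalty, the L1 pull has the same size regardless of how small the weight already is, which is what allows weights to reach zero.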

### **When to Use Lasso Regression?**
✅ Use when:
- **Feature selection** is important (some features should have **zero** weight).
- You want a **simpler model with fewer variables**.

❌ Avoid when:
- You want **all weights shrunk smoothly** without any being set to zero (use **Ridge** instead).

---

## 📌 5. Elastic Net Regression (L1 + L2 Regularization)

Elastic Net **combines the Ridge and Lasso penalties**:

$$
MSE + \lambda \left( (1 - r) \sum_{j=1}^{n} w_j^2 + r \sum_{j=1}^{n} |w_j| \right)
$$

where:
- $\lambda$ is the overall **regularization parameter**.
- $r \in [0, 1]$ is the **L1 ratio**:
  - If $r = 1$, Elastic Net = **Lasso**.
  - If $r = 0$, Elastic Net = **Ridge**.
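The penalty term and its two limiting cases can be checked with a few lines of Python. The function name `elastic_net_penalty` and the parameter name `strength` (the overall regularization parameter in the formula above) are assumptions for illustration, and libraries may scale the two terms differently.

```python
def elastic_net_penalty(w, strength, r):
    """strength * ((1 - r) * sum(w_j^2) + r * sum(|w_j|))."""
    l1 = sum(abs(wj) for wj in w)
    l2 = sum(wj * wj for wj in w)
    return strength * ((1 - r) * l2 + r * l1)

# Hypothetical weights: L1 norm is 7, squared L2 norm is 25
w = [3.0, -4.0]

print(elastic_net_penalty(w, 1.0, 1.0))  # 7.0  (pure Lasso penalty)
print(elastic_net_penalty(w, 1.0, 0.0))  # 25.0 (pure Ridge penalty)
print(elastic_net_penalty(w, 1.0, 0.5))  # 16.0 (equal mix)
```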

### **When to Use Elastic Net?**
✅ Use when:
- **Lasso selects too few features**.
- **Ridge keeps too many weak features**.
- **Multicollinearity exists**, but you also want **sparse features**.

---

## 📌 6. Choosing the Right Regression Model

| Model | When to Use |
|--------|-----------------------------------------------------|
| **Linear Regression** | If the relationship is **linear**. |
| **Polynomial Regression** | If the relationship is **nonlinear** but well described by polynomial terms. |
| **Ridge Regression (L2)** | When **features are correlated**, and you want **smooth weight reduction**. |
| **Lasso Regression (L1)** | When you want **automatic feature selection**. |
| **Elastic Net Regression** | When you need **both feature selection (L1) and weight decay (L2)**. |

---

## 📌 7. Final Thoughts

- **Linear Regression** is simple but **sensitive to multicollinearity**.
- **Polynomial Regression** captures **curved relationships**, but can **overfit**.
- **Ridge Regression (L2)** is great for **high-dimensional data** with **multicollinearity**.
- **Lasso Regression (L1)** performs **feature selection**, making it useful for **sparse models**.
- **Elastic Net** combines **Lasso and Ridge**, balancing both approaches.

Choosing the **right model** depends on your **data structure** and **goals**. 🚀
