# Lasso Regression (L1 Regularization)

Lasso regression is also called **L1 Regularization**.

**Purpose:**

* Lasso is primarily used for **feature selection**.
* It can reduce coefficients of less important features **exactly to zero**, effectively removing them from the model.

---

## Lasso Regression Cost Function

The cost function for Lasso Regression is:

$$
J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \Big(h_\theta(x^{(i)}) - y^{(i)} \Big)^2 + \lambda \sum_{j=1}^{n} |\theta_j|
$$

where:

* $\lambda \geq 0$ is the **regularization parameter** (hyperparameter).
* $|\theta_j|$ is the **L1 penalty term** (magnitude of coefficients).
* $\theta_0$ (intercept) is **not penalized**.
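The cost function maps directly onto NumPy. This is a minimal sketch with some assumptions. `X` is an `(m, n)` feature matrix with no column of ones, and `theta[0]` holds the unpenalized intercept $\theta_0$. The function name `lasso_cost` is hypothetical.

```python
import numpy as np

def lasso_cost(theta, X, y, lam):
    """J(theta) = 1/(2m) * sum((h - y)^2) + lam * sum(|theta_j|) for j >= 1.

    theta[0] is the intercept theta_0 and is NOT penalized.
    X has shape (m, n) and does not include a column of ones.
    """
    m = X.shape[0]
    predictions = theta[0] + X @ theta[1:]          # h_theta(x^(i)) for every row
    mse_term = np.sum((predictions - y) ** 2) / (2 * m)
    l1_term = lam * np.sum(np.abs(theta[1:]))       # L1 penalty skips theta_0
    return mse_term + l1_term

# Tiny hand-checkable example (made-up numbers)
X = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([1.0, 2.0])
theta = np.array([0.5, 0.1, -0.2])
print(lasso_cost(theta, X, y, lam=0.5))  # about 1.31 (1.16 squared-error part + 0.15 penalty)
```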

---

## Intuition Behind Lasso Regression

* **Lambda ($\lambda$) = 0:** Ordinary linear regression (no regularization).
* **Increasing $\lambda$:** Coefficients shrink toward zero.
* **Large enough $\lambda$:** Some coefficients become exactly 0:

  * Features that contribute little to predicting the target are **removed automatically**.
  * Features strongly related to the target keep non-zero (though shrunk) coefficients.
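Why do coefficients reach exactly zero instead of only getting small? Consider the special case of an orthonormal design, which is an assumption here: $\frac{1}{m}X^\top X = I$. Under that assumption, minimizing the cost above gives each Lasso coefficient in closed form. It is a *soft-thresholded* version of its unregularized value $\hat\theta_j$: $\theta_j = \operatorname{sign}(\hat\theta_j)\max(|\hat\theta_j| - \lambda, 0)$. The sketch below applies this rule to hypothetical unregularized coefficients for several values of $\lambda$.

```python
import numpy as np

def soft_threshold(z, lam):
    """S(z, lam) = sign(z) * max(|z| - lam, 0), applied elementwise.

    Equals the Lasso solution only under the orthonormal-design assumption.
    """
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

# Hypothetical unregularized coefficients theta_1..theta_4 (intercept excluded)
theta_ols = np.array([0.65, 0.72, 0.34, 0.12])

for lam in [0.0, 0.1, 0.2, 0.4]:
    print(lam, soft_threshold(theta_ols, lam))
```

At $\lambda = 0$ the coefficients are unchanged. At $\lambda = 0.2$ the weakest coefficient (0.12) is already exactly 0, while the larger ones are only shifted toward zero by 0.2. At $\lambda = 0.4$ the 0.34 coefficient is dropped too.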

### Example

Suppose a model with 4 features:

$$
h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_3 + \theta_4 x_4
$$

Initial coefficients (before Lasso):

$$
\theta_0 = 0.52, \quad \theta_1 = 0.65, \quad \theta_2 = 0.72, \quad \theta_3 = 0.34, \quad \theta_4 = 0.12
$$

After applying Lasso (with suitable $\lambda$):

$$
\theta_0 = 0.50, \quad \theta_1 = 0.55, \quad \theta_2 = 0.68, \quad \theta_3 = 0.14, \quad \theta_4 = 0
$$

* $\theta_4 = 0$ → $x_4$ is **removed**.
* Stronger features ($x_1, x_2$) retain larger coefficients.
* This is why Lasso is **useful for automatic feature selection**.
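The same feature-selection behaviour can be reproduced on synthetic data. This is a minimal sketch with two assumptions. First, it uses cyclic coordinate descent with soft-thresholding as the solver; that is one common choice, and these notes do not prescribe one. Second, the data is made up so that $x_4$ has no effect on $y$. `lasso_coordinate_descent` is a hypothetical helper, not a library function. The sketch does not reproduce the exact numbers in the example above.

```python
import numpy as np

def lasso_coordinate_descent(X, y, lam, n_sweeps=1000, tol=1e-8):
    """Minimize 1/(2m) * ||y - theta_0 - X theta||^2 + lam * ||theta||_1
    by cyclic coordinate descent. theta_0 (intercept) is not penalized."""
    m, n = X.shape
    theta = np.zeros(n)
    theta_0 = y.mean()
    z = (X ** 2).sum(axis=0) / m                  # (1/m) * sum_i x_ij^2 per feature
    for _ in range(n_sweeps):
        max_change = 0.0
        theta_0 = np.mean(y - X @ theta)          # unpenalized intercept update
        for j in range(n):
            # residual with feature j's own contribution added back
            r_j = y - theta_0 - X @ theta + X[:, j] * theta[j]
            rho = X[:, j] @ r_j / m
            # soft-thresholding: |rho| <= lam gives exactly 0
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / z[j]
            max_change = max(max_change, abs(new - theta[j]))
            theta[j] = new
        if max_change < tol:
            break
    return theta_0, theta

# Made-up data: y depends on x1, x2, x3 only; x4 is pure noise
rng = np.random.default_rng(0)
m = 1000
X = rng.standard_normal((m, 4))
true_theta = np.array([2.0, 3.0, 1.0, 0.0])
y = 1.5 + X @ true_theta + 0.1 * rng.standard_normal(m)

theta_0, theta = lasso_coordinate_descent(X, y, lam=0.1)
print("theta_0:", round(theta_0, 3), "theta:", np.round(theta, 3))
print("kept features:", [f"x{j + 1}" for j in np.flatnonzero(theta)])
```

With $\lambda = 0.1$, each of the three informative coefficients comes out roughly $\lambda$ below its true value (2, 3, 1). $\theta_4$ is set exactly to 0, so only $x_1, x_2, x_3$ are kept.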

---

# Elastic Net Regression

Elastic Net is a combination of **Ridge (L2)** and **Lasso (L1)**.

**Purpose:**

1. Reduce overfitting (Ridge effect)
2. Perform feature selection (Lasso effect)

---

## Elastic Net Cost Function

The cost function for Elastic Net is:

$$
J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \Big(h_\theta(x^{(i)}) - y^{(i)} \Big)^2
+ \lambda_1 \sum_{j=1}^{n} \theta_j^2 \quad \text{(Ridge term)}
+ \lambda_2 \sum_{j=1}^{n} |\theta_j| \quad \text{(Lasso term)}
$$

where:

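* $\lambda_1 \geq 0$ weights the L2 penalty, which shrinks all coefficients to **reduce overfitting** (Ridge effect).
* $\lambda_2 \geq 0$ weights the L1 penalty, which can set coefficients exactly to zero for **feature selection** (Lasso effect).
* Setting $\lambda_2 = 0$ recovers Ridge, and setting $\lambda_1 = 0$ recovers Lasso. As in Lasso, $\theta_0$ is not penalized.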
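This cost can also be written as a short NumPy function. This is a minimal sketch with some assumptions. `X` is an `(m, n)` matrix with no column of ones, and `theta[0]` holds the unpenalized intercept $\theta_0$. The name `elastic_net_cost` is hypothetical.

```python
import numpy as np

def elastic_net_cost(theta, X, y, lam1, lam2):
    """J = 1/(2m) * sum((h - y)^2) + lam1 * sum(theta_j^2) + lam2 * sum(|theta_j|), j >= 1."""
    m = X.shape[0]
    predictions = theta[0] + X @ theta[1:]          # h_theta(x^(i)) for every row
    mse_term = np.sum((predictions - y) ** 2) / (2 * m)
    ridge_term = lam1 * np.sum(theta[1:] ** 2)      # L2 penalty, theta_0 excluded
    lasso_term = lam2 * np.sum(np.abs(theta[1:]))   # L1 penalty, theta_0 excluded
    return mse_term + ridge_term + lasso_term

# Tiny hand-checkable example (made-up numbers)
X = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([1.0, 2.0])
theta = np.array([0.5, 0.1, -0.2])
print(elastic_net_cost(theta, X, y, lam1=0.5, lam2=0.5))  # about 1.335 = 1.16 + 0.025 + 0.15
```

Passing `lam1=0` leaves only the Lasso penalty. Passing `lam2=0` leaves only the Ridge-style penalty.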

---

## Summary of Regression Series

| Regression  | Regularization Type | Main Purpose                           | Effect on Coefficients                        |
| ----------- | ------------------- | -------------------------------------- | --------------------------------------------- |
| Ridge       | L2                  | Reduce overfitting                     | Shrinks coefficients toward zero but **not exactly to zero** |
| Lasso       | L1                  | Feature selection                      | Shrinks some coefficients **exactly to zero** |
| Elastic Net | L1 + L2             | Reduce overfitting + feature selection | Combination of both effects                   |
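The last column of the table can be checked by fitting all three models to the same data. This is a minimal sketch with several assumptions:

* It runs cyclic coordinate descent on the Elastic Net cost, where `lam2 = 0` gives Ridge and `lam1 = 0` gives Lasso.
* The data is synthetic, with an irrelevant feature $x_4$.
* The penalty values are illustrative.

`elastic_net_cd` is a hypothetical helper, not a library function.

```python
import numpy as np

def elastic_net_cd(X, y, lam1, lam2, n_sweeps=1000, tol=1e-8):
    """Cyclic coordinate descent for
    1/(2m) * ||y - theta_0 - X theta||^2 + lam1 * sum(theta_j^2) + lam2 * sum(|theta_j|).
    theta_0 is unpenalized. lam2 = 0 gives Ridge, lam1 = 0 gives Lasso."""
    m, n = X.shape
    theta = np.zeros(n)
    theta_0 = y.mean()
    z = (X ** 2).sum(axis=0) / m
    for _ in range(n_sweeps):
        max_change = 0.0
        theta_0 = np.mean(y - X @ theta)          # unpenalized intercept update
        for j in range(n):
            r_j = y - theta_0 - X @ theta + X[:, j] * theta[j]
            rho = X[:, j] @ r_j / m
            # L1 term soft-thresholds rho; L2 term (gradient 2*lam1*theta_j) adds 2*lam1 to the denominator
            new = np.sign(rho) * max(abs(rho) - lam2, 0.0) / (z[j] + 2 * lam1)
            max_change = max(max_change, abs(new - theta[j]))
            theta[j] = new
        if max_change < tol:
            break
    return theta_0, theta

# Made-up data: x4 has no effect on y
rng = np.random.default_rng(0)
m = 1000
X = rng.standard_normal((m, 4))
y = 1.5 + X @ np.array([2.0, 3.0, 1.0, 0.0]) + 0.1 * rng.standard_normal(m)

fits = {}
for name, lam1, lam2 in [("Ridge", 0.1, 0.0), ("Lasso", 0.0, 0.1), ("Elastic Net", 0.02, 0.1)]:
    _, theta = elastic_net_cd(X, y, lam1, lam2)
    fits[name] = theta
    print(f"{name:<12}", np.round(theta, 3))
```

Ridge leaves a small but non-zero weight on $x_4$. Lasso and Elastic Net set it exactly to zero.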

**Key Insight:**

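* These methods add a penalty to linear regression whose strength ($\lambda$) is a **hyperparameter**. Tuning it, for example with cross-validation, trades a little training fit for better **generalization** and **interpretability**.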
* Interview questions often focus on:

  * Why we use Ridge/Lasso/Elastic Net
  * Relationship between **lambda ($\lambda$)** and **coefficients ($\theta$)**
  * Feature selection capabilities

---


