## **1. What is Lasso Regression?**

**Lasso** stands for **Least Absolute Shrinkage and Selection Operator**.
It’s a **linear regression** technique that uses **L1 regularization** to:

* Reduce overfitting.
* Perform **feature selection** by shrinking some coefficients to exactly **zero**.

---

## **2. The L1 Regularization Formula**

In **ordinary least squares (OLS)**, we minimize:

$$
\text{Loss} = \sum_{i=1}^n (y_i - \hat{y}_i)^2
$$

In **Lasso**, we add a penalty term:

$$
\text{Loss} = \sum_{i=1}^n (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^p |\beta_j|
$$

Where:

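* $\hat{y}_i$ = the model's prediction for observation $i$.
* $\lambda \ge 0$ = regularization parameter (controls penalty strength).
* $|\beta_j|$ = absolute value of the $j$-th coefficient.
* The intercept ($\beta_0$) is usually not penalized, which is why the penalty sum starts at $j = 1$.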
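
As a rough sketch, the penalized loss above can be evaluated directly in plain Python. The function name `lasso_loss` and the toy numbers are illustrative assumptions, not part of any library.

```python
def lasso_loss(X, y, beta, intercept, lam):
    """Residual sum of squares plus lam * sum(|beta_j|).

    The intercept contributes to the predictions but is not penalized.
    """
    rss = 0.0
    for row, target in zip(X, y):
        pred = intercept + sum(b * x for b, x in zip(beta, row))
        rss += (target - pred) ** 2
    penalty = lam * sum(abs(b) for b in beta)
    return rss + penalty


# Toy data (assumption): two observations, two features.
X = [[1.0, 2.0], [3.0, 4.0]]
y = [5.0, 6.0]
beta = [1.0, -1.0]

print(lasso_loss(X, y, beta, intercept=0.0, lam=0.0))  # 85.0 (pure OLS loss)
print(lasso_loss(X, y, beta, intercept=0.0, lam=0.5))  # 86.0 (adds 0.5 * (1 + 1))
```

Library implementations may scale the two terms differently. For example, scikit-learn's `Lasso` minimizes $\frac{1}{2n}\sum_i (y_i - \hat{y}_i)^2 + \alpha \sum_j |\beta_j|$, so its `alpha` corresponds to $\lambda = 2n\alpha$ in the formula used here.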

---

## **3. Effect of L1 Penalty**

* **If λ = 0** → Same as OLS (no regularization).
* **If λ is small** → Slight shrinkage; coefficients are reduced but most remain non-zero.
* **If λ is large** → Many coefficients shrink to **exactly zero** (feature elimination). Beyond a data-dependent threshold, every coefficient is zero and only the intercept remains.
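
The three regimes above can be observed with a small experiment. The following is a minimal sketch, assuming synthetic data and a hand-written cyclic coordinate descent solver (one common way to fit Lasso, not necessarily what any particular library does). The names `soft_threshold`, `lasso_cd`, and `lam_max` are hypothetical, and the intercept is omitted for brevity.

```python
import random


def soft_threshold(rho, t):
    """Shrink rho toward zero by t; return exactly 0.0 when |rho| <= t."""
    if rho > t:
        return rho - t
    if rho < -t:
        return rho + t
    return 0.0


def lasso_cd(X, y, lam, n_sweeps=200):
    """Minimize sum((y - X b)^2) + lam * sum(|b|) by cyclic coordinate descent.

    No intercept is fitted (assumption made to keep the sketch short).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta, starting from beta = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            # Correlation of column j with the residual that excludes feature j.
            rho = sum(X[i][j] * resid[i] for i in range(n)) + col_sq[j] * beta[j]
            new_b = soft_threshold(rho, lam / 2.0) / col_sq[j]
            delta = new_b - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_b
    return beta


# Synthetic data (assumption): 8 features, only the first two affect y.
rng = random.Random(0)
n, p = 50, 8
X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [3.0 * row[0] - 2.0 * row[1] + rng.gauss(0.0, 0.5) for row in X]

# Smallest lam at which every coefficient is zero for this objective.
lam_max = 2.0 * max(abs(sum(X[i][j] * y[i] for i in range(n))) for j in range(p))

for frac in (0.0, 0.05, 0.5, 1.0):
    beta = lasso_cd(X, y, frac * lam_max)
    n_zero = sum(1 for b in beta if b == 0.0)
    print(f"lam = {frac:.2f} * lam_max -> {n_zero} of {p} coefficients exactly zero")
```

With λ = 0 the solver approaches the OLS fit and no coefficient is exactly zero; at `lam_max` all of them are. The counts in between depend on the data, so they are not listed here.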

---

## **4. Why Lasso Can Zero Out Coefficients**

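Mathematically, the **absolute value function** is not differentiable at zero, so the L1 penalty has a kink wherever a coefficient equals zero. Geometrically, this corresponds to a diamond-shaped constraint region whose corners lie on the coordinate axes, and the optimum often lands exactly on a corner or edge where some coefficients are zero.
This is **different from Ridge (L2)**, whose smooth penalty shrinks coefficients toward zero but in general does not make them exactly zero.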
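
A one-coefficient version of the problem makes the contrast concrete. As a sketch under simplifying assumptions (a single coefficient `b` and a target value `z`), minimizing $(z - b)^2 + \lambda |b|$ gives a soft-thresholding rule, while minimizing $(z - b)^2 + \lambda b^2$ gives proportional shrinkage. The function names `lasso_1d` and `ridge_1d` are hypothetical.

```python
def lasso_1d(z, lam):
    """Minimizer of (z - b)**2 + lam * |b|: soft-thresholding at lam / 2."""
    shrunk = abs(z) - lam / 2.0
    if shrunk <= 0.0:
        return 0.0  # the kink at zero wins: the coefficient is exactly zero
    return shrunk if z > 0 else -shrunk


def ridge_1d(z, lam):
    """Minimizer of (z - b)**2 + lam * b**2: proportional shrinkage."""
    return z / (1.0 + lam)


for z in (0.3, 2.0, -2.0):
    print(z, lasso_1d(z, lam=1.0), ridge_1d(z, lam=1.0))
```

For `lam = 1.0`, any `z` with `|z| <= 0.5` is mapped to exactly zero by `lasso_1d`, whereas `ridge_1d` only halves it, so the Ridge result is zero only when `z` itself is zero.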

---

## **5. When to Use Lasso**

✅ When you suspect that **many features are irrelevant**.
✅ When you want **automatic feature selection**.
✅ When you have **high-dimensional data** with many more features than observations (p >> n).

---

## **6. Visual Intuition**

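* Think of the L1 penalty as forcing the coefficients to live inside a **diamond-shaped boundary**.
* The diamond's corners lie on the axes, where one or more coefficients are zero. The loss contours often first touch the boundary at a corner, so optimization naturally "sticks" some coefficients at zero.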
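
The diamond picture comes from the constrained form of the problem. For every $\lambda \ge 0$ there is a budget $t \ge 0$ (a symbol introduced here for illustration) such that the penalized loss and the following constrained problem share the same solution:

$$
\min_{\beta_0, \beta} \sum_{i=1}^n (y_i - \hat{y}_i)^2 \quad \text{subject to} \quad \sum_{j=1}^p |\beta_j| \le t
$$

With two coefficients, the feasible set $|\beta_1| + |\beta_2| \le t$ is a diamond with corners at $(\pm t, 0)$ and $(0, \pm t)$. The corresponding Ridge constraint, $\beta_1^2 + \beta_2^2 \le t$, is a disk with no corners, which is why Ridge solutions rarely land exactly on an axis.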

