# 🔧 Workflow of Regularized Regression (Lasso, Ridge, Elastic Net)

### **Step 1: Define the Linear Regression Model**

We start with the standard linear regression equation:

$$
y = X\beta + \epsilon
$$

where:

* $y$ = observed targets
* $X$ = input features (an $n \times p$ matrix)
* $\beta$ = coefficients
* $\epsilon$ = error term
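
As a minimal sketch of the model above, the systematic part $X\beta$ is one dot product per sample. The function name `predict` and the toy numbers are hypothetical, chosen only for illustration, and plain Python lists stand in for a matrix library.

```python
def predict(X, beta):
    """Return X @ beta for a list-of-rows X and a coefficient list beta."""
    return [sum(x_ij * b_j for x_ij, b_j in zip(row, beta)) for row in X]


# Hypothetical toy data: 3 samples, 2 features.
X = [[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]]
beta = [0.5, -1.0]

y_hat = predict(X, beta)
print(y_hat)  # [-1.5, -1.0, 2.5]
```

The observed $y$ differs from these values by the error term $\epsilon$, which the model does not predict.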

---

### **Step 2: Define the Cost Function**

* Standard regression uses **Mean Squared Error (MSE)** between the observed $y_i$ and the predictions $\hat{y}_i = x_i^\top \beta$:

$$
J(\beta) = \frac{1}{n} \sum_{i=1}^n (y_i - \hat{y}_i)^2
$$

* Regularization adds a **penalty term** to control complexity:

1. **Ridge (L2 Regularization):**

   $$
   J(\beta) = \frac{1}{n} \sum_{i=1}^n (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^p \beta_j^2
   $$

   * Shrinks coefficients toward zero but, in general, does not set them exactly to zero.
   * Stabilizes the estimates when features are highly correlated (**multicollinearity**).

2. **Lasso (L1 Regularization):**

   $$
   J(\beta) = \frac{1}{n} \sum_{i=1}^n (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^p |\beta_j|
   $$

   * Can shrink some coefficients exactly to **zero** → feature selection.

3. **Elastic Net (Combination of L1 & L2):**

   $$
   J(\beta) = \frac{1}{n} \sum_{i=1}^n (y_i - \hat{y}_i)^2 + \lambda \Big(\alpha \sum_{j=1}^p |\beta_j| + (1-\alpha) \sum_{j=1}^p \beta_j^2 \Big)
   $$

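   * Balances Ridge and Lasso through the mixing weight $\alpha \in [0, 1]$: $\alpha = 1$ gives pure Lasso and $\alpha = 0$ gives pure Ridge.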
   * Good for **high-dimensional datasets (p >> n)**.
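
The three cost functions differ only in the penalty added to the MSE, which a short sketch can make concrete. The function names below are hypothetical; `lam` and `alpha` stand for $\lambda$ and $\alpha$ as used in the formulas above.

```python
def mse(y, y_hat):
    """Mean squared error between targets and predictions."""
    return sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat)) / len(y)


def l1_penalty(beta):
    return sum(abs(b) for b in beta)


def l2_penalty(beta):
    return sum(b * b for b in beta)


def ridge_cost(y, y_hat, beta, lam):
    return mse(y, y_hat) + lam * l2_penalty(beta)


def lasso_cost(y, y_hat, beta, lam):
    return mse(y, y_hat) + lam * l1_penalty(beta)


def elastic_net_cost(y, y_hat, beta, lam, alpha):
    # alpha weights the L1 part, (1 - alpha) weights the L2 part.
    mix = alpha * l1_penalty(beta) + (1.0 - alpha) * l2_penalty(beta)
    return mse(y, y_hat) + lam * mix
```

With `alpha=1.0` the Elastic Net cost equals the Lasso cost, and with `alpha=0.0` it equals the Ridge cost, matching the formulas above.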

---

### **Step 3: Choose Hyperparameters**

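* **$\lambda$ (regularization strength)**: Controls penalty size. $\lambda = 0$ recovers ordinary least squares; larger values shrink the coefficients more strongly.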
* **$\alpha$ (for Elastic Net only)**: Balances L1 vs L2 penalty.
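
To see what $\lambda$ does, consider a simplified special case: one feature, no intercept, and the Ridge cost from Step 2. Setting the derivative to zero gives the slope $\beta = \sum_i x_i y_i / (\sum_i x_i^2 + n\lambda)$. The sketch below evaluates it on hypothetical toy data; `ridge_slope` is an illustrative name, not a library function.

```python
def ridge_slope(x, y, lam):
    """Minimizer of (1/n) * sum((y_i - b * x_i)^2) + lam * b^2 for one feature."""
    n = len(x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + n * lam)


x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]  # exactly y = 2x, so the unpenalized slope is 2

lambdas = [0.0, 0.1, 1.0, 10.0]
slopes = [ridge_slope(x, y, lam) for lam in lambdas]
for lam, slope in zip(lambdas, slopes):
    print(f"lambda={lam}: slope={slope:.3f}")
```

On this data the slope falls from 2.0 at $\lambda = 0$ to roughly 0.64 at $\lambda = 10$: a larger penalty pulls the coefficient toward zero without reaching it.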

---

### **Step 4: Optimization**

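* Use **Gradient Descent** for Ridge, whose cost is differentiable everywhere (Ridge also has a closed-form solution). The L1 term in Lasso and Elastic Net is not differentiable at zero, so those models typically use specialized solvers such as **Coordinate Descent**.
* For a differentiable cost, iteratively update the coefficients: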

$$
\beta_j \leftarrow \beta_j - \eta \cdot \frac{\partial J}{\partial \beta_j}
$$

where $\eta$ is the learning rate.
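
For the Ridge cost from Step 2, the gradient is $\frac{\partial J}{\partial \beta_j} = -\frac{2}{n}\sum_{i=1}^n x_{ij}(y_i - \hat{y}_i) + 2\lambda\beta_j$. The following is a minimal sketch of the update rule with that gradient, not a production solver. The function name, the default learning rate, the iteration count, and the toy data are all assumptions made for illustration.

```python
def ridge_gradient_descent(X, y, lam, eta=0.05, n_iters=2000):
    """Plain gradient descent on (1/n) * sum((y - X beta)^2) + lam * sum(beta^2)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iters):
        # Residuals y_i - y_hat_i under the current coefficients.
        residuals = [
            yi - sum(xij * bj for xij, bj in zip(row, beta))
            for row, yi in zip(X, y)
        ]
        # Gradient of the MSE term plus gradient of the L2 penalty.
        grad = [
            -2.0 / n * sum(X[i][j] * residuals[i] for i in range(n))
            + 2.0 * lam * beta[j]
            for j in range(p)
        ]
        beta = [bj - eta * gj for bj, gj in zip(beta, grad)]
    return beta


# Hypothetical toy data generated exactly by beta = [1, 2].
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 3.0]

beta_ols = ridge_gradient_descent(X, y, lam=0.0)
beta_ridge = ridge_gradient_descent(X, y, lam=0.5)
```

On this data the unpenalized run approaches $[1, 2]$ and the run with $\lambda = 0.5$ approaches $[0.8, 1.2]$. The learning rate has to be small enough for the iteration to converge; the value used here suits this toy problem only.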

---

### **Step 5: Model Training**

* Fit the model on the training data with the chosen $\lambda$ (and $\alpha$ for Elastic Net).
* Coefficients shrink according to the penalty:

  * Ridge → small but nonzero.
  * Lasso → some zero.
  * Elastic Net → mix.
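
The difference between "small but nonzero" and "exactly zero" can be seen in a sketch of coordinate descent for the Lasso cost from Step 2. Each coordinate update applies a soft-threshold; the threshold is $\lambda/2$ because the cost above has no factor $\tfrac{1}{2}$ in front of the MSE. The function names and the toy design are hypothetical, and the code assumes no feature column is all zeros.

```python
def soft_threshold(a, t):
    """Shrink a toward zero by t, returning exactly 0.0 when |a| <= t."""
    if a > t:
        return a - t
    if a < -t:
        return a + t
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_sweeps=100):
    """Minimize (1/n) * sum((y - X beta)^2) + lam * sum(|beta_j|)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # mean squared value of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam / 2.0) / (z / n)
    return beta


# Toy design with orthogonal columns; feature 0 is strong, feature 1 is weak.
X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y = [3.0 * row[0] + 0.2 * row[1] for row in X]
lam = 1.0

lasso_beta = lasso_coordinate_descent(X, y, lam)

# For comparison: on an orthogonal design like this one (and only then),
# the Ridge minimizer has the closed form beta_j = rho_j / (z_j + lam).
n = len(X)
ridge_beta = []
for j in range(2):
    rho_j = sum(X[i][j] * y[i] for i in range(n)) / n
    z_j = sum(X[i][j] ** 2 for i in range(n)) / n
    ridge_beta.append(rho_j / (z_j + lam))
```

Here Lasso keeps the strong feature at about 2.5 and sets the weak one to exactly 0, while Ridge gives roughly $[1.5, 0.1]$: both coefficients shrink, but neither vanishes.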

---

### **Step 6: Model Validation (Cross-Validation)**

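* Use **k-Fold CV** to tune $\lambda$ (and $\alpha$ for Elastic Net): for each candidate value, train on $k-1$ folds, evaluate on the held-out fold, and average the error over the $k$ folds.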
* Select the value that minimizes validation error.
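
The tuning loop can be sketched in a few lines. To keep the example short it assumes a one-feature Ridge model without intercept, which has a closed-form slope, and it assigns samples to folds by index rather than shuffling. The data, the candidate grid, and all function names are hypothetical.

```python
def fit_ridge_1d(x, y, lam):
    """Closed-form one-feature Ridge slope for the Step 2 cost (no intercept)."""
    n = len(x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + n * lam)


def mse_1d(x, y, slope):
    return sum((yi - slope * xi) ** 2 for xi, yi in zip(x, y)) / len(x)


def k_fold_cv_error(x, y, lam, k=5):
    """Average validation MSE over k folds for one candidate lambda."""
    n = len(x)
    fold_errors = []
    for fold in range(k):
        val_idx = set(range(fold, n, k))  # every k-th sample goes to this fold
        x_train = [x[i] for i in range(n) if i not in val_idx]
        y_train = [y[i] for i in range(n) if i not in val_idx]
        x_val = [x[i] for i in sorted(val_idx)]
        y_val = [y[i] for i in sorted(val_idx)]
        slope = fit_ridge_1d(x_train, y_train, lam)
        fold_errors.append(mse_1d(x_val, y_val, slope))
    return sum(fold_errors) / k


# Hypothetical noisy data scattered around y = 2x.
x = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
y = [1.2, 1.9, 3.3, 3.8, 5.1, 6.2, 6.8, 8.1, 9.2, 9.9]

lambdas = [0.0, 0.01, 0.1, 1.0, 10.0]
cv_errors = {lam: k_fold_cv_error(x, y, lam) for lam in lambdas}
best_lam = min(cv_errors, key=cv_errors.get)
```

For Elastic Net the same loop would run over a grid of $(\lambda, \alpha)$ pairs instead of a single list.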

---

### **Step 7: Prediction**

* Use the final model, with fitted coefficients $\hat{\beta}$, to make predictions:

$$
\hat{y}_{\text{test}} = X_{\text{test}}\hat{\beta}
$$

---

# ⚡ Summary Workflow

1. Define regression model.
2. Add regularization term (L1, L2, or both).
3. Choose hyperparameters ($\lambda$, $\alpha$).
4. Optimize using gradient descent/coordinate descent.
5. Train model → shrink/zero coefficients.
6. Validate via CV and tune parameters.
7. Predict on new data.
