```{contents}
```

# Hyperparameter Tuning in Linear Regression

In **ordinary least squares (OLS) linear regression**, there are **no hyperparameters to tune**: the coefficients are computed directly, in closed form, by minimizing the sum of squared errors.
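
As a minimal sketch, OLS coefficients can be obtained from a single least-squares solve. When `XᵀX` is invertible, this is equivalent to the normal equations `β̂ = (XᵀX)⁻¹Xᵀy`. The synthetic data and variable names are illustrative assumptions. scikit-learn's `LinearRegression` reaches the same answer without any penalty strength to tune.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (illustrative): 3 features, true intercept 4.0
rng = np.random.default_rng(42)
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 4.0 + rng.normal(scale=0.1, size=50)

# Closed-form OLS: add a column of ones for the intercept, then solve least squares
X1 = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(X1, y, rcond=None)

# LinearRegression solves the same unpenalized problem; there is no alpha to choose
model = LinearRegression().fit(X, y)
print(np.allclose(beta[0], model.intercept_), np.allclose(beta[1:], model.coef_))
```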

But when we apply **regularization techniques** (Ridge, Lasso, ElasticNet), hyperparameters come into play.

---

## Hyperparameters in Linear Regression Variants

### Ridge Regression (L2 Regularization)

* **Hyperparameter:** `α` (sometimes called λ).
* Controls the strength of the penalty on the sum of squared coefficients, which discourages large coefficients.

  * `α = 0` → ordinary least squares (no penalty).
  * Large `α` → coefficients shrink towards zero but never exactly zero.
* Effect: reduces variance at the cost of some bias, which helps prevent overfitting (especially with many or highly correlated features).
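
To make the shrinkage concrete, here is a sketch of Ridge's closed-form solution, `w = (XᵀX + αI)⁻¹Xᵀy`, for the no-intercept case. Dropping the intercept is an assumption made only to keep the formula short. The helper `ridge_closed_form` and the synthetic data are hypothetical. The last line cross-checks the result against scikit-learn's `Ridge`, whose objective is `||y − Xw||² + α·||w||²`. As `α` grows, the coefficient norm shrinks, but no coefficient lands exactly on zero, not even the one whose true value is 0.

```python
import numpy as np
from sklearn.linear_model import Ridge

def ridge_closed_form(X, y, alpha):
    """Hypothetical helper: minimize ||y - Xw||^2 + alpha * ||w||^2 (no intercept)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([3.0, -2.0, 0.0, 1.5, 0.5]) + rng.normal(scale=0.5, size=100)

# alpha = 0 reduces to OLS; larger alpha shrinks the whole coefficient vector
for alpha in [0.0, 1.0, 10.0, 100.0, 1000.0]:
    w = ridge_closed_form(X, y, alpha)
    print(f"alpha={alpha:>7}: ||w||={np.linalg.norm(w):.3f}, exact zeros={np.sum(w == 0)}")

# Cross-check against scikit-learn's Ridge (intercept disabled to match the formula)
w_sk = Ridge(alpha=10.0, fit_intercept=False).fit(X, y).coef_
print(np.allclose(w_sk, ridge_closed_form(X, y, 10.0)))
```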

---

### Lasso Regression (L1 Regularization)

* **Hyperparameter:** `α`.
* Penalizes the sum of the absolute values of the coefficients.
* Large `α` → many coefficients become **exactly zero** → feature selection.
* Effect: simpler, more interpretable model.
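
The sparsity effect can be seen by fitting scikit-learn's `Lasso`, whose objective is `(1/(2n))·||y − Xw||² + α·||w||₁`, at increasing `α`. The synthetic data, where only 3 of 10 features matter, is an illustrative assumption. Coordinate descent sets coefficients to exactly `0.0`, so counting exact zeros is meaningful here.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# 10 features, only 3 of which actually influence y (illustrative setup)
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

for alpha in [0.01, 1.0, 10.0, 50.0]:
    coef = Lasso(alpha=alpha).fit(X, y).coef_
    print(f"alpha={alpha:>5}: {np.sum(coef == 0)} of {coef.size} coefficients are exactly zero")
```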

---

### ElasticNet (Combination of L1 & L2)

* **Hyperparameters:**

  * `α` → overall penalty strength.
  * `l1_ratio` → balance between L1 (Lasso) and L2 (Ridge).

    * `l1_ratio = 0` → pure Ridge.
    * `l1_ratio = 1` → pure Lasso.
    * `0 < l1_ratio < 1` → mixture.
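
In scikit-learn's parameterization, the ElasticNet objective is `(1/(2n))·||y − Xw||² + α·l1_ratio·||w||₁ + 0.5·α·(1 − l1_ratio)·||w||²`. The sketch below evaluates that penalty term with a hypothetical helper `enet_penalty` and fits a few `l1_ratio` values at a fixed `α` on synthetic data. The last line checks that `l1_ratio=1.0` reproduces `Lasso`. `l1_ratio=0` does give a pure L2 penalty, but because the loss terms are scaled differently, its `α` is not on the same scale as `Ridge`'s `α`.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso

def enet_penalty(w, alpha, l1_ratio):
    """Hypothetical helper: the penalty term of scikit-learn's ElasticNet objective."""
    l1 = np.sum(np.abs(w))
    l2_sq = np.sum(w ** 2)
    return alpha * l1_ratio * l1 + 0.5 * alpha * (1 - l1_ratio) * l2_sq

X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

# Fixed overall strength alpha; vary the L1/L2 mix
for l1_ratio in [0.1, 0.5, 0.9, 1.0]:
    coef = ElasticNet(alpha=1.0, l1_ratio=l1_ratio).fit(X, y).coef_
    print(f"l1_ratio={l1_ratio}: {np.sum(coef == 0)} zero coefficients, "
          f"penalty={enet_penalty(coef, 1.0, l1_ratio):.2f}")

# l1_ratio = 1.0 is the Lasso model
print(np.allclose(ElasticNet(alpha=1.0, l1_ratio=1.0).fit(X, y).coef_,
                  Lasso(alpha=1.0).fit(X, y).coef_))
```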

---

## Why Is Hyperparameter Tuning Needed?

* If `α` is too small → model behaves like OLS, may **overfit**.
* If `α` is too large → coefficients shrink too much, model may **underfit**.
* Tuning picks the `α` that balances these two failure modes, typically the value with the best cross-validated score.
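
One way to see both failure modes is scikit-learn's `validation_curve`, which reports training and cross-validated scores for each candidate `α`. This sketch uses synthetic data with few samples relative to features, an assumption chosen to provoke overfitting. As `α` grows, the training R² can only fall. The CV R² typically peaks at some intermediate value, and where that peak lands depends on the data.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import validation_curve

# Few samples relative to features, so a nearly unregularized fit tends to overfit
X, y = make_regression(n_samples=60, n_features=40, n_informative=10,
                       noise=20.0, random_state=0)

alphas = np.logspace(-3, 3, 7)
train_scores, cv_scores = validation_curve(
    Ridge(), X, y, param_name="alpha", param_range=alphas, cv=5, scoring="r2")

# Average over the 5 folds for each alpha
for alpha, train_mean, cv_mean in zip(alphas, train_scores.mean(axis=1), cv_scores.mean(axis=1)):
    print(f"alpha={alpha:>8.3f}  train R^2={train_mean:.3f}  CV R^2={cv_mean:.3f}")
```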

---

## How to Tune Hyperparameters?

We use **Cross-Validation (CV)** to find the best values:

1. **Grid Search CV**

   * Try different values of `α` (and `l1_ratio` for ElasticNet).
   * Example: test `α = [0.01, 0.1, 1, 10, 100]`.
   * For each candidate, train on k − 1 folds, score on the held-out fold, and average the scores across folds; pick the candidate with the best average CV score.

2. **Randomized Search CV**

   * Randomly sample hyperparameters from distributions.
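   * Often more efficient than grid search for large or continuous search spaces, because a fixed budget of trials covers more distinct values of each hyperparameter.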

3. **Bayesian Optimization** (advanced)

   * Uses past evaluation results to choose next hyperparameter values intelligently.
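
As a sketch of randomized search for ElasticNet, scikit-learn's `RandomizedSearchCV` can sample `α` from a log-uniform distribution and `l1_ratio` from a uniform one, both provided by `scipy.stats`. The ranges, `n_iter=30`, and the synthetic data are illustrative assumptions, not recommended defaults.

```python
from scipy.stats import loguniform, uniform
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import RandomizedSearchCV

X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

param_distributions = {
    "alpha": loguniform(1e-3, 1e2),   # sample alpha on a log scale
    "l1_ratio": uniform(0.05, 0.95),  # uniform on [0.05, 1.0] (loc, scale)
}

# 30 random (alpha, l1_ratio) pairs, each scored with 5-fold CV
search = RandomizedSearchCV(ElasticNet(max_iter=10_000), param_distributions,
                            n_iter=30, cv=5, scoring="r2", random_state=0)
search.fit(X, y)

print("Best params:", search.best_params_)
print("Best CV R^2:", round(search.best_score_, 3))
```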

---

## 🔹 Example (Python, scikit-learn)

```python
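from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data so the example runs end to end (replace with your own X, y)
X, y = make_regression(n_samples=200, n_features=20, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Scale features first: the penalty acts on coefficient sizes,
# which depend on the scale of each feature
pipe = make_pipeline(StandardScaler(), Ridge())

# make_pipeline names steps after the lowercased class, so alpha is "ridge__alpha"
param_grid = {"ridge__alpha": [0.01, 0.1, 1, 10, 100]}

grid_search = GridSearchCV(pipe, param_grid, cv=5, scoring="r2")
grid_search.fit(X_train, y_train)

print("Best alpha:", grid_search.best_params_["ridge__alpha"])
print("Best CV R^2:", grid_search.best_score_)
print("Test R^2:", grid_search.score(X_test, y_test))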
```
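
For these particular models, scikit-learn also provides estimators with built-in cross-validation over `α`: `RidgeCV`, `LassoCV`, and `ElasticNetCV`. The minimal sketch below uses synthetic data, and the candidate values are illustrative assumptions. By default, `RidgeCV` uses an efficient leave-one-out scheme. `ElasticNetCV` builds its own `α` path for each `l1_ratio` in the list.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNetCV, RidgeCV

X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

# RidgeCV: leave-one-out CV (the default) over the given alphas
ridge_cv = RidgeCV(alphas=[0.01, 0.1, 1, 10, 100]).fit(X, y)
print("Ridge best alpha:", ridge_cv.alpha_)

# ElasticNetCV: 5-fold CV over an automatic alpha path for each l1_ratio
enet_cv = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9, 1.0], cv=5).fit(X, y)
print("ElasticNet best alpha:", enet_cv.alpha_, "best l1_ratio:", enet_cv.l1_ratio_)
```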

---

**Key takeaway:**

* OLS → no hyperparameters.
* Ridge, Lasso, ElasticNet → hyperparameters (`α`, `l1_ratio`).
* Tune them using **cross-validation** to balance bias and variance.

