```{contents}
```


# Hyperparameter Tuning Methods

There are several ways to search for the **best hyperparameters** when tuning machine learning models. Here are the main types:

---

## Types of Hyperparameter Search

### 1. **Manual Search**

* Try parameters by hand based on intuition or domain knowledge.
* Example: test `α = 0.1, 1, 10` for Ridge.
* ✅ Simple, but ❌ inefficient and may miss optimal values.
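
A minimal sketch of the manual approach, assuming a synthetic dataset from `make_regression` (the dataset and the 5-fold setup are illustrative choices, not part of the method itself). Each hand-picked `alpha` is scored with cross-validation so the values can be compared by eye:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# Synthetic data, assumed here only for illustration
X, y = make_regression(n_samples=200, n_features=10, noise=15, random_state=42)

# Hand-picked candidates from the example above
manual_scores = {}
for alpha in [0.1, 1, 10]:
    scores = cross_val_score(Ridge(alpha=alpha), X, y, cv=5, scoring="r2")
    manual_scores[alpha] = scores.mean()
    print(f"alpha={alpha}: mean CV R2 = {manual_scores[alpha]:.4f}")
```

The next value to try is then chosen by looking at the printed scores, which is exactly what makes this approach hard to scale.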

---

### 2. **Grid Search**

* Define a grid of hyperparameter values.
* Try all combinations exhaustively with **cross-validation**.
* Example:

  ```python
  alpha = [0.01, 0.1, 1, 10, 100]
  l1_ratio = [0.1, 0.5, 0.9]
  ```

  → Tests 5 × 3 = 15 combinations.
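* ✅ Systematic, guarantees the best combination within the grid (as measured by the CV score).
* ❌ Expensive if the grid is large: the number of combinations multiplies with every added hyperparameter.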
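
The combination count can be checked with scikit-learn's `ParameterGrid`, which enumerates the same Cartesian product that `GridSearchCV` iterates over. This is a small sketch using the two lists from the example above:

```python
from sklearn.model_selection import ParameterGrid

grid = ParameterGrid({
    "alpha": [0.01, 0.1, 1, 10, 100],
    "l1_ratio": [0.1, 0.5, 0.9],
})

n_combinations = len(grid)
print(n_combinations)  # 15

# Each element is one dict of hyperparameters to evaluate
for params in list(grid)[:3]:
    print(params)
```

With 5-fold cross-validation, these 15 combinations cost 15 × 5 = 75 model fits, which is why large grids become expensive quickly.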

---

### 3. **Random Search**

* Instead of testing all values, randomly sample combinations from given distributions.
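* Example: `alpha ∼ Uniform(0.001, 100)`, or a log-uniform distribution when the range spans several orders of magnitude.
* ✅ Often more efficient than grid search and can cover large spaces, because the number of trials is fixed in advance regardless of how many hyperparameters there are.
* ❌ May miss the exact optimum if unlucky.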
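
A short sketch of sampling from distributions with scikit-learn's `ParameterSampler`, the helper that `RandomizedSearchCV` uses internally. Note one substitution: `alpha` is drawn from `scipy.stats.loguniform` rather than a plain uniform distribution, on the assumption that small and large values should be equally likely to be tried. The `l1_ratio` entry is added only to show a second dimension.

```python
from scipy.stats import loguniform, uniform
from sklearn.model_selection import ParameterSampler

param_distributions = {
    "alpha": loguniform(1e-3, 1e2),   # log-uniform over [0.001, 100]
    "l1_ratio": uniform(0, 1),        # uniform over [0, 1]
}

# Draw 5 random configurations; random_state makes the draw repeatable
samples = list(ParameterSampler(param_distributions, n_iter=5, random_state=42))
for params in samples:
    print(params)
```

Passing the same `param_distributions` dict to `RandomizedSearchCV` would evaluate each sampled configuration with cross-validation.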

---

### 4. **Bayesian Optimization**

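* Uses past evaluation results to fit a probabilistic (surrogate) model of performance as a function of the hyperparameters.
* Chooses the next hyperparameters where that model predicts the most promise, balancing exploration and exploitation.
* ✅ Typically finds good values in fewer evaluations than grid or random search.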
* ❌ More complex, needs specialized libraries (`optuna`, `scikit-optimize`, `hyperopt`).
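
To make the idea concrete, here is a hand-rolled sketch of the loop, assuming a Gaussian process surrogate and the expected-improvement criterion. It is not the implementation used by any of the libraries named above (they may use different surrogates and samplers); the dataset, the search range for `log10(alpha)`, and the helper name `objective` are all illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm
from sklearn.datasets import make_regression
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=200, n_features=10, noise=15, random_state=42)

def objective(log_alpha):
    """Mean 5-fold CV R2 of Ridge with alpha = 10**log_alpha."""
    model = Ridge(alpha=10.0 ** log_alpha)
    return cross_val_score(model, X, y, cv=5, scoring="r2").mean()

rng = np.random.default_rng(0)
candidates = np.linspace(-3, 3, 200).reshape(-1, 1)  # search space in log10(alpha)

# Start with a few random evaluations
tried = list(rng.uniform(-3, 3, size=3))
scores = [objective(a) for a in tried]

for _ in range(7):
    # 1. Fit the surrogate to everything evaluated so far
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-6, normalize_y=True)
    gp.fit(np.array(tried).reshape(-1, 1), np.array(scores))

    # 2. Expected improvement over the best score seen so far
    mu, sigma = gp.predict(candidates, return_std=True)
    sigma = np.maximum(sigma, 1e-12)
    z = (mu - max(scores)) / sigma
    ei = (mu - max(scores)) * norm.cdf(z) + sigma * norm.pdf(z)

    # 3. Evaluate the most promising candidate and record the result
    next_point = float(candidates[np.argmax(ei), 0])
    tried.append(next_point)
    scores.append(objective(next_point))

best_alpha = 10.0 ** tried[int(np.argmax(scores))]
print("best alpha:", best_alpha, "best CV R2:", max(scores))
```

Only 10 evaluations of `objective` are spent in total; each new point is chosen using what the earlier ones revealed.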

---

### 5. **Gradient-Based Optimization** (advanced)

* Uses gradients of the loss with respect to hyperparameters.
* Works mainly for continuous hyperparameters.
* Rare in practice because many hyperparameters (like `max_depth`) are discrete.
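
For a model with a closed-form solution, the hypergradient can be written down directly. The sketch below assumes Ridge regression, where the weights are $w(\alpha) = (X^\top X + \alpha I)^{-1} X^\top y$ and therefore $\partial w / \partial \alpha = -(X^\top X + \alpha I)^{-1} w$. It runs gradient descent on `log(alpha)` against a validation loss. The synthetic data, step size, and function name `val_loss_and_grad` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 120, 10
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + rng.normal(scale=2.0, size=n)
X_tr, y_tr, X_val, y_val = X[:80], y[:80], X[80:], y[80:]

def val_loss_and_grad(log_alpha):
    """Validation MSE of Ridge and its derivative with respect to log(alpha)."""
    alpha = np.exp(log_alpha)
    A = X_tr.T @ X_tr + alpha * np.eye(d)
    w = np.linalg.solve(A, X_tr.T @ y_tr)          # closed-form Ridge weights
    resid = X_val @ w - y_val
    loss = np.mean(resid ** 2)
    dw_dalpha = -np.linalg.solve(A, w)             # implicit differentiation
    dloss_dalpha = (2.0 / len(y_val)) * resid @ (X_val @ dw_dalpha)
    return loss, dloss_dalpha * alpha              # chain rule: d(alpha)/d(log alpha) = alpha

log_alpha = 0.0
history = []
for _ in range(200):
    loss, grad = val_loss_and_grad(log_alpha)
    history.append(loss)
    log_alpha -= 0.1 * grad                        # plain gradient descent step

print("alpha:", np.exp(log_alpha), "validation MSE:", history[-1])
```

Optimizing `log(alpha)` instead of `alpha` keeps the hyperparameter positive without an explicit constraint.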

---

### 6. **Evolutionary / Genetic Algorithms**

* Treat hyperparameters like genes.
* Randomly mutate and crossover values across generations.
* ✅ Can escape local optima.
* ❌ Slower, and the search itself has settings to tune (population size, mutation rate, number of generations).
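
A minimal sketch of the mutate-and-crossover loop using only the standard library. The `fitness` function is a hypothetical stand-in for a cross-validation score (in a real search it would train and score a model), and the population size, mutation rate, and gene ranges are arbitrary illustrative choices.

```python
import random

random.seed(0)

def fitness(genes):
    """Hypothetical stand-in for a CV score; peaks at log_alpha=-1, l1_ratio=0.5."""
    log_alpha, l1_ratio = genes
    return -((log_alpha + 1.0) ** 2) - 4.0 * (l1_ratio - 0.5) ** 2

def random_individual():
    return (random.uniform(-3, 3), random.uniform(0, 1))

def crossover(a, b):
    # Uniform crossover: each gene is copied from one of the two parents
    return tuple(random.choice(pair) for pair in zip(a, b))

def mutate(genes, rate=0.3):
    # Perturb each gene with probability `rate`, clipped to its valid range
    log_alpha, l1_ratio = genes
    if random.random() < rate:
        log_alpha = min(3.0, max(-3.0, log_alpha + random.gauss(0, 0.3)))
    if random.random() < rate:
        l1_ratio = min(1.0, max(0.0, l1_ratio + random.gauss(0, 0.1)))
    return (log_alpha, l1_ratio)

population = [random_individual() for _ in range(20)]
initial_best = max(fitness(ind) for ind in population)

for generation in range(30):
    population.sort(key=fitness, reverse=True)
    parents = population[:5]  # the fittest survive unchanged (elitism)
    children = [mutate(crossover(*random.sample(parents, 2))) for _ in range(15)]
    population = parents + children

best = max(population, key=fitness)
print("best genes:", best, "fitness:", fitness(best))
```

Because the top individuals are carried over unchanged, the best fitness in the population never decreases from one generation to the next.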

---

### 7. **Successive Halving / Hyperband**

* Start with many random hyperparameter sets.
* Train each briefly.
* Discard poorly performing ones early, keep only the best for longer training.
* ✅ Efficient, reduces wasted computation.
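
The steps above can be sketched from scratch as follows. This version assumes the "budget" is the number of training samples a configuration is allowed to see and that two thirds of the configurations are dropped at each rung (`eta = 3`); both are illustrative choices. scikit-learn also provides `HalvingRandomSearchCV` for the same idea.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=600, n_features=10, noise=15, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=42)

rng = np.random.default_rng(0)
configs = list(10.0 ** rng.uniform(-3, 3, size=27))  # 27 random alpha values
budget = 50   # training samples available in the first rung
eta = 3       # keep the top 1/eta configurations at each rung
rungs = []    # (samples used, configurations evaluated) per rung

while len(configs) > 1:
    n = min(budget, len(X_train))
    # Train every surviving configuration on the current budget
    scored = sorted(
        ((Ridge(alpha=a).fit(X_train[:n], y_train[:n]).score(X_val, y_val), a)
         for a in configs),
        reverse=True,
    )
    rungs.append((n, len(configs)))
    # Discard the weakest configurations and give the rest more data
    keep = max(1, len(configs) // eta)
    configs = [a for _, a in scored[:keep]]
    budget *= eta

best_alpha = configs[0]
print("rungs:", rungs)
print("best alpha:", best_alpha)
```

Most configurations are only ever trained on the smallest budget, so the bulk of the compute goes to the few that survive.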

---

**Summary Table**

| Method                         | Strategy                          | Pros                        | Cons                |
| ------------------------------ | --------------------------------- | --------------------------- | ------------------- |
| Manual Search                  | Trial-and-error                   | Simple                      | Not systematic      |
| Grid Search                    | Exhaustive combinations           | Guaranteed best in grid     | Expensive           |
| Random Search                  | Random sampling                   | Efficient, scalable         | No guarantee        |
| Bayesian Optimization          | Probabilistic model-guided search | Fast convergence            | Complex             |
| Gradient-Based                 | Gradient descent on hyperparams   | Precise for continuous vars | Rarely practical    |
| Evolutionary Algorithms        | Mutation + crossover              | Escapes local optima        | Slow                |
| Hyperband / Successive Halving | Early stopping bad configs        | Saves compute               | Needs careful setup |

---

👉 In practice:

* For **small problems** → Grid Search.
* For **large spaces** → Random Search or Hyperband.
* For **expensive models or tight evaluation budgets** → Bayesian Optimization (e.g., `Optuna`).



```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, train_test_split
from sklearn.metrics import r2_score

# Generate a synthetic regression dataset and hold out 20% for testing
X, y = make_regression(n_samples=200, n_features=10, noise=15, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# ---------------- Grid Search ----------------
ridge = Ridge()

# Every value in this list is evaluated with 5-fold cross-validation
param_grid = {'alpha': [0.01, 0.1, 1, 10, 100, 1000]}
grid_search = GridSearchCV(ridge, param_grid, cv=5, scoring='r2')
grid_search.fit(X_train, y_train)

print("GridSearchCV best params:", grid_search.best_params_)
print("GridSearchCV best CV score:", grid_search.best_score_)

# Evaluate on test data
y_pred_grid = grid_search.best_estimator_.predict(X_test)
print("GridSearchCV test R2:", r2_score(y_test, y_pred_grid))

# ---------------- Random Search ----------------
# 100 log-spaced candidates between 1e-3 and 1e3; only n_iter=10 are sampled at random
param_dist = {'alpha': np.logspace(-3, 3, 100)}
random_search = RandomizedSearchCV(ridge, param_dist, n_iter=10, cv=5, scoring='r2', random_state=42)
random_search.fit(X_train, y_train)

print("\nRandomizedSearchCV best params:", random_search.best_params_)
print("RandomizedSearchCV best CV score:", random_search.best_score_)

# Evaluate on test data
y_pred_rand = random_search.best_estimator_.predict(X_test)
print("RandomizedSearchCV test R2:", r2_score(y_test, y_pred_rand))
```


```text
GridSearchCV best params: {'alpha': 0.1}
GridSearchCV best CV score: 0.9907142472562647
GridSearchCV test R2: 0.9934316711441261

RandomizedSearchCV best params: {'alpha': 0.021544346900318846}
RandomizedSearchCV best CV score: 0.9907141636217599
RandomizedSearchCV test R2: 0.9934566226827182
```

