## Hyperparameter Tuning

Grid Search and Randomized Search are two techniques for hyperparameter tuning in machine learning. Both search a predefined space of hyperparameter values for the combination that gives the best cross-validated score: Grid Search tries every combination, while Randomized Search evaluates a fixed number of randomly sampled ones.

**Grid Search**:

1. **What is Grid Search?** Grid Search is a hyperparameter optimization technique that exhaustively searches all possible combinations of hyperparameter values within a predefined grid.

2. **How does it work?** You specify a set of hyperparameters and their possible values in a grid or list. Grid Search then trains and evaluates the model using each combination of hyperparameters through cross-validation.

3. **Pros**:
   - Guarantees that you will find the best combination on the grid, as measured by the cross-validation score. Values between or outside the grid points are never tried.
   - Provides a systematic and structured approach to hyperparameter tuning.

4. **Cons**:
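   - Can be computationally expensive when the search space is large: the number of combinations is the product of the number of values listed for each hyperparameter.
   - May be impractical for models that are slow to train or have many hyperparameters, since every combination is fitted once per cross-validation fold.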

5. **Example**:
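   ```python
   from sklearn.ensemble import RandomForestClassifier
   from sklearn.model_selection import GridSearchCV

   # Assumed for illustration: a random forest classifier; X_train and y_train are your training data
   model = RandomForestClassifier(random_state=42)
   param_grid = {
       'n_estimators': [50, 100, 150],
       'max_depth': [None, 10, 20, 30],
       'min_samples_split': [2, 5, 10],
       'min_samples_leaf': [1, 2, 4]
   }
   # Evaluate every combination in the grid with 5-fold cross-validation
   grid_search = GridSearchCV(estimator=model, param_grid=param_grid, cv=5, scoring='accuracy')
   grid_search.fit(X_train, y_train)
   ```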
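
To get a feel for the cost mentioned in the cons, the size of a grid can be counted before fitting anything. The sketch below uses scikit-learn's `ParameterGrid` on the same grid as the example; the variable names `n_candidates` and `n_fits` are illustrative only.

```python
from sklearn.model_selection import ParameterGrid

param_grid = {
    'n_estimators': [50, 100, 150],
    'max_depth': [None, 10, 20, 30],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4],
}

# ParameterGrid enumerates every combination of the listed values
grid = ParameterGrid(param_grid)
n_candidates = len(grid)      # 3 * 4 * 3 * 3 = 108 combinations
n_fits = n_candidates * 5     # each candidate is fitted once per fold (cv=5)

print(n_candidates, n_fits)   # 108 540
```

By default `GridSearchCV` also refits the best candidate once on the full training set (`refit=True`), so the real total is one fit higher.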

**Randomized Search**:

1. **What is Randomized Search?** Randomized Search is a hyperparameter optimization technique that randomly samples a specified number of hyperparameter combinations from predefined lists or probability distributions.

2. **How does it work?** Instead of exploring all possible combinations like Grid Search, Randomized Search draws a fixed number of combinations at random. Because the number of trials does not grow with the size of the search space, it lets you cover a larger space for the same compute budget.

3. **Pros**:
   - More computationally efficient than Grid Search, especially for large search spaces.
   - Lets you fix the compute budget up front through the number of sampled combinations, and can sample from continuous distributions instead of a few hand-picked values.

4. **Cons**:
   - It's not guaranteed to find the best hyperparameters but often finds good ones in a shorter time.

5. **Example**:
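   ```python
   import numpy as np
   from sklearn.ensemble import RandomForestClassifier
   from sklearn.model_selection import RandomizedSearchCV

   # Assumed for illustration: a random forest classifier; X_train and y_train are your training data
   model = RandomForestClassifier(random_state=42)
   param_dist = {
       'n_estimators': np.arange(50, 151, 10),
       'max_depth': [None] + list(np.arange(10, 31, 5)),
       'min_samples_split': [2, 5, 10],
       'min_samples_leaf': [1, 2, 4]
   }
   # Evaluate 100 randomly chosen combinations; random_state makes the draw reproducible
   random_search = RandomizedSearchCV(estimator=model, param_distributions=param_dist, n_iter=100, cv=5, scoring='accuracy', random_state=42)
   random_search.fit(X_train, y_train)
   ```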
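
The sampling step itself can be inspected without training any model. The following is a minimal sketch using scikit-learn's `ParameterSampler` together with `scipy.stats` distributions; the particular distributions and ranges are assumptions chosen for illustration, not values taken from the example above.

```python
from scipy.stats import loguniform, randint
from sklearn.model_selection import ParameterSampler

# Distributions (objects with an rvs method) are sampled directly;
# plain lists are sampled uniformly.
param_dist = {
    'n_estimators': randint(50, 151),       # integers from 50 to 150
    'max_depth': [None, 10, 20, 30],
    'min_samples_split': randint(2, 11),    # integers from 2 to 10
    'max_features': loguniform(0.1, 1.0),   # floats between 0.1 and 1.0, log scale
}

# Draw 5 candidate settings; a fixed random_state makes the draw repeatable
sampler = ParameterSampler(param_dist, n_iter=5, random_state=42)
for params in sampler:
    print(params)
```

Each printed dictionary is one candidate that a randomized search would fit and score.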

In summary, Grid Search explores all combinations of hyperparameters, while Randomized Search randomly samples combinations. Grid Search is guaranteed to find the best combination on the grid but may be computationally expensive. Randomized Search has a fixed cost and often finds good combinations quickly, though it may miss the best one. The choice between them depends on the computational resources available and the size of your hyperparameter search space.

## Scikit-Learn Built-in Classes for Hyperparameter Tuning

`GridSearchCV` and `RandomizedSearchCV` are the scikit-learn classes that implement these two techniques. Each wraps an estimator, evaluates candidate hyperparameter settings with cross-validation, and exposes the best setting it found. Here, I'll explain the two classes and the meaning of their parameters:

**GridSearchCV**:

- **What it is**: GridSearchCV is a technique that performs an exhaustive search over a specified hyperparameter grid, trying all possible combinations.

- **Parameters**:
   - `estimator`: The machine learning model for which you want to tune hyperparameters.
   - `param_grid`: A dictionary or list of dictionaries specifying the hyperparameter grid to search over. Each key in the dictionary represents a hyperparameter, and the corresponding value is a list of possible values for that hyperparameter.
   - `scoring`: The scoring metric used to evaluate the model's performance.
   - `cv`: The cross-validation strategy: an integer number of folds, or a cross-validation splitter object.
   - `n_jobs`: The number of jobs to run in parallel (set to -1 to use all available CPU cores).
   - `verbose`: Controls the verbosity of the output (higher values provide more detailed output).
   - `return_train_score`: Whether to include training scores in the results.

**RandomizedSearchCV**:

- **What it is**: RandomizedSearchCV is a technique that performs a randomized search over a specified hyperparameter distribution, randomly sampling a specified number of combinations.

- **Parameters**:
   - `estimator`: The machine learning model for which you want to tune hyperparameters.
   - `param_distributions`: A dictionary specifying what to sample from. Each key represents a hyperparameter, and the corresponding value is either a list of values or a distribution (an object with an `rvs` method, such as those in `scipy.stats`) from which values will be sampled.
   - `n_iter`: The number of random parameter combinations to try.
   - `scoring`: The scoring metric used to evaluate the model's performance.
   - `cv`: The cross-validation strategy: an integer number of folds, or a cross-validation splitter object.
   - `n_jobs`: The number of jobs to run in parallel (set to -1 to use all available CPU cores).
   - `verbose`: Controls the verbosity of the output (higher values provide more detailed output).
   - `return_train_score`: Whether to include training scores in the results.

**Meaning of Parameters**:

1. `estimator`: This is the machine learning model that you want to optimize, such as a classifier or regressor.

2. `param_grid` (in GridSearchCV) and `param_distributions` (in RandomizedSearchCV): These parameters specify the hyperparameter search space. You define a dictionary where each key is a hyperparameter name. In `param_grid`, the associated value is a list of values to try; in `param_distributions`, it is either a list or a probability distribution from which values are sampled.

3. `scoring`: This parameter determines the evaluation metric used to assess the model's performance during hyperparameter tuning. Common choices include `'accuracy'` for classification and `'neg_mean_squared_error'` for regression. Scikit-learn treats higher scores as better, which is why error metrics such as MSE are negated.

4. `cv`: It specifies the cross-validation strategy for the hyperparameter search, most often as the number of folds. Cross-validation helps estimate the model's performance on unseen data.

5. `n_jobs`: This parameter controls parallelization. You can set it to -1 to utilize all available CPU cores for faster hyperparameter search.

6. `verbose`: This parameter controls the verbosity of the output during the hyperparameter search. Higher values provide more detailed information about the search process.

7. `return_train_score`: When set to `True`, it includes training scores in the results, providing information on how well the model fits the training data for each hyperparameter combination.

These parameters allow you to tailor the hyperparameter search process to your specific problem and computational resources. GridSearchCV performs an exhaustive search over a specified grid, while RandomizedSearchCV samples randomly from specified distributions. The choice between them depends on the search space size and available resources.
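
To make the parameter descriptions concrete, here is a minimal sketch that passes each of them to `GridSearchCV` and then reads the results back. The estimator (`Ridge`), the dataset, and the four `alpha` values are assumptions chosen to keep the example small.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

X, y = load_diabetes(return_X_y=True)

search = GridSearchCV(
    estimator=Ridge(),
    param_grid={'alpha': [0.01, 0.1, 1.0, 10.0]},
    scoring='neg_mean_squared_error',  # higher is better, so MSE is negated
    cv=5,
    n_jobs=1,                          # set to -1 to use all cores
    verbose=0,
    return_train_score=True,           # adds mean_train_score to cv_results_
)
search.fit(X, y)

results = search.cv_results_
for params, test, train in zip(results['params'],
                               results['mean_test_score'],
                               results['mean_train_score']):
    # Flip the sign to read the scores as ordinary MSE values
    print(params, 'validation MSE: %.1f' % -test, 'train MSE: %.1f' % -train)

print('Best:', search.best_params_, 'CV RMSE: %.2f' % (-search.best_score_) ** 0.5)
```

Comparing the training and validation columns for each candidate is a quick way to spot settings that overfit.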

```python
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split, GridSearchCV, RandomizedSearchCV
from sklearn.linear_model import Ridge
import numpy as np

# Load the diabetes dataset
diabetes = load_diabetes()
X = diabetes.data
y = diabetes.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```


```python
# Define the alpha values for Ridge Regression
# alphas = [1e-04, 1e-03, 1e-02, 1e-01, 1e+00, 1e+01, 1e+02, 1e+03, 1e+04]
alphas = np.logspace(-4, 4, 9)
```


```python
# Create the Ridge Regressor model
ridge_model = Ridge()

# Grid Search for Ridge Regression
param_grid = {'alpha': alphas}
grid_search = GridSearchCV(estimator=ridge_model, param_grid=param_grid,
                           cv=5, scoring='neg_mean_squared_error', verbose=1, n_jobs=-1)
grid_search.fit(X_train, y_train)
```

```
Fitting 5 folds for each of 9 candidates, totalling 45 fits
```


```python
# Randomized Search for Ridge Regression
# Note: np.random.uniform draws on a linear scale, so on average about 99% of
# these 100 candidate alphas are larger than 100. The draw is also unseeded;
# random_state below only controls how RandomizedSearchCV picks from this list.
param_dist = {'alpha': np.random.uniform(1e-4, 1e4, 100)}
random_search = RandomizedSearchCV(estimator=ridge_model, param_distributions=param_dist,
                                   n_iter=100, cv=5, scoring='neg_mean_squared_error',
                                   verbose=1, random_state=42, n_jobs=-1)
random_search.fit(X_train, y_train)
```

```
Fitting 5 folds for each of 100 candidates, totalling 500 fits
```


```python
from sklearn.metrics import mean_squared_error, r2_score

# Get the best hyperparameters and models
best_params_grid = grid_search.best_params_
best_model_grid = grid_search.best_estimator_
best_params_random = random_search.best_params_
best_model_random = random_search.best_estimator_

# Evaluate the best models on the test data
y_pred_grid = best_model_grid.predict(X_test)
y_pred_random = best_model_random.predict(X_test)

# Calculate and print the performance metrics (RMSE and R-squared).
# RMSE is computed with np.sqrt because the squared=False argument of
# mean_squared_error is deprecated in newer scikit-learn releases.
rmse_grid = np.sqrt(mean_squared_error(y_test, y_pred_grid))
r2_grid = r2_score(y_test, y_pred_grid)
rmse_random = np.sqrt(mean_squared_error(y_test, y_pred_random))
r2_random = r2_score(y_test, y_pred_random)

print("Ridge Regression - Grid Search:")
print("Best Alpha:", best_params_grid['alpha'])
print("Root Mean Squared Error (RMSE):", rmse_grid)
print("R-squared (R2):", r2_grid)

print("\nRidge Regression - Randomized Search:")
print("Best Alpha:", best_params_random['alpha'])
print("Root Mean Squared Error (RMSE):", rmse_random)
print("R-squared (R2):", r2_random)
```


```
Ridge Regression - Grid Search:
Best Alpha: 0.1
Root Mean Squared Error (RMSE): 53.446111997699646
R-squared (R2): 0.46085219464119265

Ridge Regression - Randomized Search:
Best Alpha: 92.00346330034739
Root Mean Squared Error (RMSE): 72.2707813049019
R-squared (R2): 0.014172096546057777
```
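
The randomized search scores far worse than the grid search here, and the most likely reason is how its candidates were generated. Drawing `alpha` uniformly between 1e-4 and 1e4 puts only about 1% of the draws below 100, so the small values that the grid search preferred are almost never tried. A common alternative is to sample on a logarithmic scale. The sketch below assumes `scipy.stats.loguniform` for that purpose; it is one possible fix rather than the original notebook's method, and its output is not shown because it was not part of the original run.

```python
import numpy as np
from scipy.stats import loguniform
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Sample alpha log-uniformly so every decade between 1e-4 and 1e4 is equally likely
param_dist = {'alpha': loguniform(1e-4, 1e4)}
random_search = RandomizedSearchCV(estimator=Ridge(), param_distributions=param_dist,
                                   n_iter=100, cv=5, scoring='neg_mean_squared_error',
                                   random_state=42, n_jobs=1)
random_search.fit(X_train, y_train)

# Evaluate the best model found on the held-out test set
y_pred = random_search.best_estimator_.predict(X_test)
print("Best Alpha:", random_search.best_params_['alpha'])
print("Root Mean Squared Error (RMSE):", np.sqrt(mean_squared_error(y_test, y_pred)))
print("R-squared (R2):", r2_score(y_test, y_pred))
```

Because the distribution object is passed directly, `random_state=42` now controls the sampled values themselves, which makes the search reproducible.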
