# Hyperparameter Tuning

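Hyperparameter tuning is the process of selecting the best combination of hyperparameters for a machine learning model. Hyperparameters are the settings fixed before training, such as the number of trees or the maximum tree depth in a random forest.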

**Types:**

1. Grid Search: Exhaustive search over a specified grid of hyperparameters.
2. Random Search: Random sampling of hyperparameters from a specified distribution.
3. Bayesian Optimization: Model the objective function with a probabilistic surrogate and use it to search for the maximum.
4. Gradient-based Optimization: Use gradient descent to find the best hyperparameters. This applies only to hyperparameters that are continuous and differentiable.
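The difference between the first two strategies can be made concrete by counting candidates. The following is a minimal sketch using scikit-learn's `ParameterGrid` and `ParameterSampler` helpers; the grid values and `random_state=0` are illustrative assumptions, not a recommendation.

```python
from sklearn.model_selection import ParameterGrid, ParameterSampler

# Illustrative search space for a random forest (values are assumptions)
param_grid = {
    'n_estimators': [50, 100, 200, 300, 400, 500],
    'max_depth': [4, 5, 6, 7, 8, 9, 10],
    'criterion': ['gini', 'entropy'],
    'bootstrap': [True, False],
}

# Grid search enumerates every combination: 6 * 7 * 2 * 2 candidates
grid_candidates = list(ParameterGrid(param_grid))
print(f"Grid search candidates: {len(grid_candidates)}")

# Random search draws only n_iter candidates from the same space
random_candidates = list(ParameterSampler(param_grid, n_iter=20, random_state=0))
print(f"Random search candidates: {len(random_candidates)}")
```

With 5-fold cross-validation, each candidate costs five model fits, so the number of candidates drives the total runtime.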

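# Cross-Validation

Cross-validation is a technique for estimating how well a model performs on unseen data, that is, how well it generalizes. In k-fold cross-validation the data is split into k folds; the model is trained on k-1 folds and scored on the remaining fold, and this is repeated so that each fold serves once as the validation set.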
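As a minimal sketch of the idea, `cross_val_score` can run 5-fold cross-validation on the iris dataset. The model settings (`n_estimators=50`, `random_state=42`) are assumptions chosen only to keep the example small and repeatable.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Load features and labels
X, y = load_iris(return_X_y=True)

# Illustrative model; the settings here are assumptions
model = RandomForestClassifier(n_estimators=50, random_state=42)

# One accuracy score per fold (cv=5 gives five scores)
scores = cross_val_score(model, X, y, cv=5, scoring='accuracy')
print(f"Fold accuracies: {scores}")
print(f"Mean accuracy: {scores.mean():.3f}")
```

The mean of the fold scores is the quantity that grid search and random search compare across candidates.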

In [12]:
# import libraries
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split, GridSearchCV, RandomizedSearchCV
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

In [13]:
# import data
from sklearn.datasets import load_iris

iris = load_iris()
# X and y
X = iris.data
y = iris.target

In [14]:
%%time
# define the model (GridSearchCV is already imported above)
model = RandomForestClassifier()

# create the parameter grid
param_grid = {
    'n_estimators': [50, 100, 200, 300, 400, 500],
    'max_depth': [4, 5, 6, 7, 8, 9, 10],
    'criterion': ['gini', 'entropy'],
    'bootstrap': [True, False]
}

# set up the grid
grid = GridSearchCV(
    estimator=model,
    param_grid=param_grid,
    cv=5,
    scoring='accuracy',
    verbose=1,
    n_jobs=-1
    )

# fit the model
grid.fit(X, y)

# print the best parameters
print(f"Best Parameters: {grid.best_params_}")

Fitting 5 folds for each of 168 candidates, totalling 840 fits
Best Parameters: {'bootstrap': True, 'criterion': 'gini', 'max_depth': 4, 'n_estimators': 50}
CPU times: total: 3.25 s
Wall time: 2min 53s


In [15]:
%%time
# define the model (RandomizedSearchCV is already imported above)
model = RandomForestClassifier()

# create the parameter grid
param_grid = {
    'n_estimators': [50, 100, 200, 300, 400, 500],
    'max_depth': [4, 5, 6, 7, 8, 9, 10],
    'criterion': ['gini', 'entropy'],
    'bootstrap': [True, False]
}

# set up the randomized search (n_iter=20 samples 20 of the 168 combinations)
grid = RandomizedSearchCV(
    estimator=model,
    param_distributions=param_grid,
    cv=5,
    scoring='accuracy',
    verbose=1,
    n_jobs=-1,
    n_iter=20
    )

# fit the model
grid.fit(X, y)

# print the best parameters
print(f"Best Parameters: {grid.best_params_}")

Fitting 5 folds for each of 20 candidates, totalling 100 fits
Best Parameters: {'n_estimators': 200, 'max_depth': 4, 'criterion': 'gini', 'bootstrap': False}
CPU times: total: 828 ms
Wall time: 21.1 s
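The searches above fit on the full dataset, so the cross-validated score is the only performance estimate available. One common alternative, sketched here under assumptions, is to hold out a test set with `train_test_split`, search on the training portion only, and then score the refitted best model with `accuracy_score`. The smaller search space, `test_size=0.2`, `n_iter=5`, and `random_state=42` are illustrative choices, not values from the runs above.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split

X, y = load_iris(return_X_y=True)

# Hold out 20% of the data; the search never sees it
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Reduced, illustrative search space to keep the run short
param_distributions = {
    'n_estimators': [50, 100],
    'max_depth': [4, 6, 8],
    'criterion': ['gini', 'entropy'],
}

search = RandomizedSearchCV(
    estimator=RandomForestClassifier(random_state=42),
    param_distributions=param_distributions,
    n_iter=5,
    cv=5,
    scoring='accuracy',
    random_state=42,
)
search.fit(X_train, y_train)

# With the default refit=True, predict() uses the best model
# refitted on the whole training set
y_pred = search.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)

print(f"Best Parameters: {search.best_params_}")
print(f"Best CV accuracy: {search.best_score_:.3f}")
print(f"Test accuracy: {test_accuracy:.3f}")
```

Comparing `best_score_` with the test accuracy gives a rough check on whether the selected hyperparameters generalize beyond the folds used to choose them.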
