# Hyperparameter Tuning

Hyperparameter tuning is the process of finding the best combination of hyperparameters for a given model.

**Types:**

- Grid search: Exhaustive search over every combination of the hyperparameter values listed in a grid.
- Random search: Randomly sample combinations of hyperparameters from a given distribution.
- Bayesian Optimization: Build a model of the objective function from past evaluations and use it to choose the next combinations to try.
- Gradient-Based Optimization: Use gradient descent to find the minimum of the objective function.
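
To make the difference between the first two strategies concrete, here is a minimal standard-library sketch. The `search_space` dictionary, the seed, and the budget `n_iter` are illustrative assumptions, and sampling each value independently is a simplification of what a library implementation may do.

```python
import itertools
import random

# Hypothetical search space, similar in shape to a random forest grid.
search_space = {
    "n_estimators": [50, 100, 200, 300, 400, 500],
    "max_depth": [4, 5, 6, 7, 8, 9, 10],
}

# Grid search: enumerate every combination (Cartesian product of the value lists).
names = list(search_space)
grid_candidates = [
    dict(zip(names, values))
    for values in itertools.product(*(search_space[name] for name in names))
]

# Random search: draw a fixed budget of combinations, one value per hyperparameter.
rng = random.Random(0)  # seeded so the draw is repeatable
n_iter = 10  # illustrative budget
random_candidates = [
    {name: rng.choice(values) for name, values in search_space.items()}
    for _ in range(n_iter)
]

print(len(grid_candidates))    # 6 * 7 = 42 combinations
print(len(random_candidates))  # 10 sampled combinations
```

The grid grows multiplicatively with every hyperparameter added, while the random search cost stays at whatever budget is chosen. Because this sketch samples with replacement, the same combination can be drawn more than once.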

# Cross Validation

Cross validation is a technique used to estimate the performance of a model on unseen data. It checks how well the model generalizes to new data.
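
The splitting step behind k-fold cross validation can be sketched in plain Python. The helper `k_fold_indices` below is a hypothetical name written for illustration; it uses contiguous, unshuffled folds, which is a simplification (for classifiers, scikit-learn uses stratified folds when `cv` is an integer).

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    indices = list(range(n_samples))
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


# Each sample lands in the test fold exactly once across the k rounds.
for fold, (train, test) in enumerate(k_fold_indices(n_samples=10, k=5)):
    print(f"fold {fold}: train on {len(train)} samples, test on {test}")
```

The model is trained and scored once per fold, and the k scores are averaged to give the cross-validated estimate.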

In [1]:
# import libraries
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn.model_selection import train_test_split, GridSearchCV

In [2]:
# load dataset
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data
y = iris.target

In [4]:
# define the model
model = RandomForestClassifier()

# create a parameter grid
parm_grid = {
    'n_estimators': [50,100,200,300,400,500],
    'max_depth': [4,5,6,7,8,9,10]
}

# set up the grid search
grid = GridSearchCV(
    estimator=model,
    param_grid=parm_grid,
    cv=5,
    scoring='accuracy',
    verbose=1,
    n_jobs=1
)

# fit the model
grid.fit(X,y)

# print the best parameters
print(f"Best parameters:{grid.best_params_}")

Fitting 5 folds for each of 42 candidates, totalling 210 fits
Best parameters:{'max_depth': 4, 'n_estimators': 50}
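
The search above is fitted on all of `X` and `y`, so the cross-validated score that picks the winner is also the only performance estimate available. One common way to get an independent estimate is to hold out a test set before searching. The following is a minimal sketch under that assumption and is not part of the original run: the 80/20 split, the `random_state` values, and the reduced `small_grid` are illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_iris(return_X_y=True)

# Hold out 20% of the data; the search never sees it.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Deliberately small grid (an assumption) so the sketch runs quickly.
small_grid = {"n_estimators": [50, 100], "max_depth": [4, 6]}

search = GridSearchCV(
    estimator=RandomForestClassifier(random_state=42),
    param_grid=small_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X_train, y_train)

# best_score_ is the mean cross-validated accuracy of the best candidate.
print(f"Best CV accuracy: {search.best_score_:.3f}")

# With the default refit=True, predict() uses the best model refit on all of X_train.
test_accuracy = accuracy_score(y_test, search.predict(X_test))
print(f"Held-out accuracy: {test_accuracy:.3f}")
```

The two numbers will usually be close on a small, clean dataset like iris, but only the held-out accuracy was computed on data that played no part in choosing the hyperparameters.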


In [6]:
from sklearn.model_selection import RandomizedSearchCV
# define the model
model = RandomForestClassifier()

# define the hyperparameter values to sample from
parm_grid = {
    'n_estimators': [50,100,200,300,400,500],
    'max_depth': [4,5,6,7,8,9,10],
    'criterion': ['gini', 'entropy'],
    'bootstrap': [True, False]
}

# set up the randomized search
grid = RandomizedSearchCV(
    estimator=model,
    param_distributions=parm_grid,
    cv=5,
    scoring='accuracy',
    verbose=1,
    n_jobs=1,
    n_iter=20
)

# fit the model
grid.fit(X,y)

# print the best parameters
print(f"Best parameters:{grid.best_params_}")

Fitting 5 folds for each of 20 candidates, totalling 100 fits
Best parameters:{'n_estimators': 300, 'max_depth': 9, 'criterion': 'entropy', 'bootstrap': True}
