# Hyperparameter Tuning
Hyperparameter tuning is the process of selecting the combination of hyperparameters (settings fixed before training, such as the number of trees in a random forest) that optimizes a model's performance.

Types of Hyperparameter Tuning:
1. `Grid Search:` A brute-force approach that tries every combination of the listed hyperparameter values and selects the best one according to a performance metric.
2. `Random Search:` Tries only a random sample of the possible combinations and selects the best of those. It is usually cheaper than grid search because the number of trials is fixed in advance, but it is not guaranteed to find the best combination in the grid.
3. `Bayesian Optimization:` Fits a probabilistic model to the results of earlier trials and uses it to choose which combination to try next, so it typically needs fewer trials than grid or random search.
4. `Gradient-based Optimization:` Uses gradients of the validation metric with respect to the hyperparameters to move toward better values. It only applies when the hyperparameters are continuous and the metric is differentiable with respect to them.
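To make the difference between the first two approaches concrete, the sketch below counts the candidates each one would evaluate. The search space `space` is a small hypothetical example, not the grid used later in this notebook; `ParameterGrid` and `ParameterSampler` are the scikit-learn helpers that enumerate and sample combinations.

```python
from sklearn.model_selection import ParameterGrid, ParameterSampler

# Hypothetical, deliberately small search space for illustration only.
space = {
    "n_estimators": [50, 100, 200],
    "max_depth": [4, 6, 8],
    "criterion": ["gini", "entropy"],
}

# Grid search enumerates every combination: 3 * 3 * 2 = 18 candidates.
grid_candidates = list(ParameterGrid(space))

# Random search draws a fixed number of combinations from the same space.
random_candidates = list(ParameterSampler(space, n_iter=5, random_state=0))

print(len(grid_candidates))    # 18
print(len(random_candidates))  # 5
```

The grid grows multiplicatively with every added hyperparameter, while the cost of random search is set directly by `n_iter`.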

# Cross-Validation
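Cross-validation is a way of estimating a model's performance by splitting the dataset into k folds. The model is trained on k-1 folds and evaluated on the remaining fold, and this is repeated k times so that every fold serves as the test set exactly once. The k scores are then averaged. This gives a more reliable estimate than a single train/test split, which is why the search classes below use it (`cv=5`) to score each hyperparameter combination.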
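As a rough illustration of the idea, the following sketch runs 5-fold cross-validation for a single, fixed model on the iris data. The choice of `StratifiedKFold` with shuffling, `n_estimators=50`, and the `random_state` values are assumptions made for reproducibility, not settings taken from this notebook.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_iris(return_X_y=True)

# One fixed model; no hyperparameter search happens here.
clf = RandomForestClassifier(n_estimators=50, random_state=0)

# 5 folds, each preserving the class proportions of the full dataset.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# One accuracy score per fold.
scores = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")

print(scores)
print(scores.mean())
```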

In [15]:
# import the libraries 
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split, GridSearchCV, RandomizedSearchCV
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

In [16]:
# load the data 
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data
y = iris.target

In [18]:
# define the model 
model = RandomForestClassifier()

# create the parameter grid
param_grid = {
    'n_estimators': [50, 100, 200, 300, 400, 500],
    'max_depth': [4, 5, 6, 7, 8, 9, 10],
    'criterion': ['gini', 'entropy'],
    'bootstrap': [True, False]
}

# set up the grid search with 5-fold cross-validation
grid = GridSearchCV(
    estimator=model,
    param_grid=param_grid,
    cv=5,
    scoring='accuracy',
    verbose=1,
    n_jobs=-1
)

# run the search: every candidate is cross-validated, then the best one is refit on all the data
grid.fit(X, y)

# print the best parameters
print(grid.best_params_)

Fitting 5 folds for each of 168 candidates, totalling 840 fits
{'bootstrap': True, 'criterion': 'gini', 'max_depth': 4, 'n_estimators': 100}
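Beyond `best_params_`, a fitted search object also exposes the cross-validated score of the winning candidate and the per-candidate results. The sketch below shows one way to look at them. It refits a much smaller, hypothetical grid (`small_grid`) so that it runs quickly on its own, and the `random_state` is an added assumption.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

# Hypothetical reduced grid (2 * 2 = 4 candidates) to keep the example fast.
small_grid = {"n_estimators": [50, 100], "max_depth": [4, 6]}

search = GridSearchCV(
    estimator=RandomForestClassifier(random_state=0),
    param_grid=small_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X, y)

# Mean cross-validated accuracy of the best candidate.
print(search.best_params_, round(search.best_score_, 3))

# Mean and standard deviation of the fold scores for every candidate.
results = search.cv_results_
for params, mean, std in zip(
    results["params"], results["mean_test_score"], results["std_test_score"]
):
    print(params, round(mean, 3), round(std, 3))

# With the default refit=True, this is the best model retrained on all of X, y.
best_model = search.best_estimator_
```

Comparing the mean scores against their standard deviations helps judge whether the "best" combination is meaningfully better than its neighbours or simply within noise.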


In [19]:
# set up the randomized search: only n_iter=20 of the 168 combinations are sampled
random_search = RandomizedSearchCV(
    estimator=model,
    param_distributions=param_grid,
    cv=5,
    scoring='accuracy',
    verbose=1,
    n_jobs=-1,
    n_iter=20
)

# run the search
random_search.fit(X, y)

# print the best parameters
print(random_search.best_params_)

Fitting 5 folds for each of 20 candidates, totalling 100 fits
{'n_estimators': 500, 'max_depth': 5, 'criterion': 'entropy', 'bootstrap': True}
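`RandomizedSearchCV` is not limited to lists of values: `param_distributions` can also map a hyperparameter to a distribution to sample from. The sketch below is one possible variant using `scipy.stats.randint`. The ranges mirror the grid above, while `n_iter=5`, `cv=3`, and the `random_state` values are assumptions chosen to keep the example quick and repeatable.

```python
from scipy.stats import randint
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = load_iris(return_X_y=True)

# randint(low, high) samples integers from low up to high - 1.
param_distributions = {
    "n_estimators": randint(50, 501),   # any integer from 50 to 500
    "max_depth": randint(4, 11),        # any integer from 4 to 10
    "criterion": ["gini", "entropy"],   # lists are sampled uniformly
    "bootstrap": [True, False],
}

search = RandomizedSearchCV(
    estimator=RandomForestClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,
    cv=3,
    scoring="accuracy",
    random_state=0,  # makes the sampled candidates reproducible
)
search.fit(X, y)

print(search.best_params_)
```

Sampling from a range lets the search try values that a hand-written list would skip, such as `n_estimators=137`.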
