
# Hyperparameter Tuning 


## Introduction
Hyperparameters are the configuration variables that define how a machine learning model will be trained. They control aspects of the training process, such as the learning
algorithm, the number of epochs, and the learning rate. Hyperparameters are used to optimize the model's performance.

Types:

- Random Search: 
- Grid Search
- Bayesian Optimization
- Gradient-based Opyimization



# Cross validation

Cross-validation is a technique for evaluating ML models by training several ML models on subsets of the available input data and evaluating them on the complementary subset of the data. In k-fold cross-validation, you split the input data into k subsets of data (also known as folds). You then iterate over each fold, treating it as a held out validation set, and train a model on the remaining k-1 folds (also known as training data). You then calculate the evaluation metric for each of the models on their respective held out validation sets and combine the result into a single metric.


In [2]:
# import libraries 
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split ,GridSearchCV
from sklearn.metrics import accuracy_score , confusion_matrix, classification_report


In [3]:
# load the data 
from sklearn.datasets import load_iris
iris = load_iris()
 
X = iris.data
y = iris.target


In [4]:
# define the model 
model = RandomForestClassifier()

# create the parametter Grid
param_grid = {'n_estimators': [50,100,200,300,400,500], 
              'max_depth': [4,5,6,7,8,9,10],
            #   'max_features' : ['auto', 'sqrt', 'log2'],
              'criterion' : ['gini', 'entropy'],
            #   'bootstrap'  : [True,False]           
              }
# initialize the grid search object
grid = GridSearchCV(
    estimator=model,
    param_grid = param_grid,
    cv=5,
    scoring = 'f1',
    verbose=1,
    n_jobs = -1
)
# fith the model 
grid.fit(X, y)

# print the best score
print(f'Best Parametters:{grid.best_params_}')


Fitting 5 folds for each of 84 candidates, totalling 420 fits


 nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
 nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
 nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
 nan nan nan nan nan nan nan nan nan nan nan nan]


Best Parametters:{'criterion': 'gini', 'max_depth': 4, 'n_estimators': 50}


In [5]:

%%time
from sklearn.model_selection import RandomizedSearchCV
model = RandomForestClassifier()
# create the parametter Grid
param_grid = {'n_estimators': [50,100,200,300,400,500], 
              'max_depth': [4,5,6,7,8,9,10],
            #   'max_features' : ['auto', 'sqrt', 'log2'],
              'criterion' : ['gini', 'entropy'],
              'bootstrap'  : [True,False]           
              }
# initialize the grid search object
grid = RandomizedSearchCV(
    estimator=model,
    param_distributions= param_grid,
    cv=5,
    scoring = 'accuracy',
    verbose=1,
    n_jobs = -1,
    n_iter=20
)
# fith the model 
grid.fit(X, y)

# print the best score
print(f'Best Parametters:{grid.best_params_}')


Fitting 5 folds for each of 20 candidates, totalling 100 fits


Best Parametters:{'n_estimators': 200, 'max_depth': 9, 'criterion': 'entropy', 'bootstrap': True}
CPU times: total: 875 ms
Wall time: 37.3 s
