# **Hyperparameter Tuning**:

* Hyperparameter tuning is the process of `finding the best hyperparameters` for a model, i.e. the values that give the best cross-validated score.

* Hyperparameters are the `parameters that are not learned by the model`.

* They are set before the training process begins.

* In this notebook, I will show you how to tune the hyperparameters of a model using scikit-learn's GridSearchCV and RandomizedSearchCV. I will use the Random Forest Classifier model for this purpose.
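To make the difference between hyperparameters and learned parameters concrete, here is a minimal sketch. The values `n_estimators=50` and `max_depth=3` are arbitrary choices for illustration, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# Hyperparameters: chosen by us and passed to the constructor before training.
model = RandomForestClassifier(n_estimators=50, max_depth=3, random_state=0)
print(model.get_params()["n_estimators"])  # the value we set
print(model.get_params()["max_depth"])     # the value we set

# Learned parameters only exist after fit() has seen the data.
print(hasattr(model, "estimators_"))       # False before fitting
model.fit(X, y)
print(len(model.estimators_))              # one fitted tree per n_estimators
print(model.feature_importances_.shape)    # one importance per feature
```

`get_params()` lists everything that a search such as GridSearchCV is allowed to vary, while attributes ending in an underscore are produced by training.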

## **Types of Hyperparameters**:
1. **Model Hyperparameters**: These are the hyperparameters that are `specific to the model` that we are using. For example, the number of trees in a Random Forest model.
2. **Algorithm Hyperparameters**: These are the hyperparameters that are `specific to the algorithm` that we are using. For example, the learning rate in the Gradient Boosting algorithm.
3. **Optimization Hyperparameters**: These are the hyperparameters that are `specific to the optimization algorithm` that we are using. For example, the batch size in the Stochastic Gradient Descent algorithm.


# Types of Hyperparameter Tuning:

1. **Grid Search**: Grid search is the process of `searching for the best hyperparameters` by manually specifying a list of values for each hyperparameter and evaluating every possible combination of those values.
   
2. **Random Search**: Random search is the process of searching for the best hyperparameters by `randomly sampling the hyperparameters` from specified lists or distributions and evaluating only a fixed number of the sampled combinations.

3. **Bayesian Optimization**: Bayesian optimization is the process of searching for the best hyperparameters by `building a probabilistic model` of the objective function and using it to select the next hyperparameters to evaluate.

4. **Gradient-Based Optimization**: Gradient-based optimization is the process of searching for the best hyperparameters by `computing the gradient of the objective function` with respect to the hyperparameters and using it to update the hyperparameters. It only applies to hyperparameters for which such a gradient can be computed, such as continuous ones.
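The difference between the first two strategies can be seen without training anything. The sketch below uses scikit-learn's `ParameterGrid` and `ParameterSampler` helpers to list the candidates each strategy would evaluate. The search space `space` is a small hypothetical one chosen for illustration.

```python
from sklearn.model_selection import ParameterGrid, ParameterSampler

# Hypothetical small search space, for illustration only.
space = {
    "n_estimators": [10, 100, 200],
    "max_depth": [None, 10, 20],
    "criterion": ["gini", "entropy"],
}

# Grid search: every combination is a candidate (3 * 3 * 2 = 18).
grid_candidates = list(ParameterGrid(space))
print(len(grid_candidates))

# Random search: a fixed budget of candidates drawn from the same space.
random_candidates = list(ParameterSampler(space, n_iter=5, random_state=0))
print(len(random_candidates))
for params in random_candidates:
    print(params)
```

The grid grows multiplicatively with every value added, whereas the random search budget stays at `n_iter` no matter how large the space is.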

In [1]:
# import the libraries:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split, GridSearchCV, RandomizedSearchCV, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, classification_report, accuracy_score

In [2]:
# load the dataset from sklearn.datasets ('iris'):

from sklearn.datasets import load_iris
iris = load_iris()
iris.keys()

dict_keys(['data', 'target', 'frame', 'target_names', 'DESCR', 'feature_names', 'filename', 'data_module'])

In [3]:
# use a for loop to show all the keys in the dataset:
for i in iris.keys():
    print(i)

data
target
frame
target_names
DESCR
feature_names
filename
data_module


In [4]:
print("The feature names of iris dataset are: \n ", iris.feature_names)

print("The target names of iris dataset are: \n ", iris.target_names)

The feature names of iris dataset are: 
  ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
The target names of iris dataset are: 
  ['setosa' 'versicolor' 'virginica']


In [5]:
# Specify the data and the target:

X = iris.data
y = iris.target


In [6]:
# call the model:

model = RandomForestClassifier()

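# create a dictionary of parameters:
# note: 'auto' is not a valid max_features value in scikit-learn 1.3 and later;
# every fit that uses it fails and is scored as nan, so None (all features) is used instead.
# (the failed-fit warnings in the recorded outputs below come from a run that still listed 'auto')

param_grid = {'n_estimators': [10, 100, 200, 300, 400, 500],
                'criterion': ['gini', 'entropy'],
                'max_depth': [None, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
                'max_features': [None, 'sqrt', 'log2']}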

# set up the GridSearchCV:

grid = GridSearchCV(estimator=model,
                    param_grid=param_grid,
                    cv=5,
                    scoring='accuracy',
                    verbose=1,
                    n_jobs=-1)

# fit the model:

grid.fit(X, y)

# print the best parameters:

print(f"The best parameters are: {grid.best_params_}")

Fitting 5 folds for each of 396 candidates, totalling 1980 fits


660 fits failed out of a total of 1980.
The score on these train-test partitions for these parameters will be set to nan.
If these failures are not expected, you can try to debug them by setting error_score='raise'.

Below are more details about the failures:
--------------------------------------------------------------------------------
406 fits failed with the following error:
Traceback (most recent call last):
  File "c:\Users\Muhammad Faizan\.conda\envs\python_machinelearning\Lib\site-packages\sklearn\model_selection\_validation.py", line 888, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "c:\Users\Muhammad Faizan\.conda\envs\python_machinelearning\Lib\site-packages\sklearn\base.py", line 1466, in wrapper
    estimator._validate_params()
  File "c:\Users\Muhammad Faizan\.conda\envs\python_machinelearning\Lib\site-packages\sklearn\base.py", line 666, in _validate_params
    validate_parameter_constraints(
  File "c:\Users\Muhammad Faizan\.conda\envs\pyt

The best parameters are: {'criterion': 'gini', 'max_depth': None, 'max_features': 'sqrt', 'n_estimators': 400}
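When a search reports failed fits like the warning above, `cv_results_` shows which candidates were affected, because their `mean_test_score` is nan. In the run above exactly one third of the fits failed (660 of 1980), which is consistent with the share of candidates using `max_features='auto'`, a value that recent scikit-learn releases reject. The sketch below reproduces the situation on a small scale. The grid `demo_grid` and the value `"not_a_valid_option"` are assumptions made up for this demo, so that the failure happens in any scikit-learn version.

```python
import warnings

import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

# "not_a_valid_option" is a deliberately invalid value (assumption for this demo).
demo_grid = {
    "n_estimators": [10, 50],
    "max_features": ["sqrt", "log2", "not_a_valid_option"],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid=demo_grid,
    cv=3,
    scoring="accuracy",
    error_score=np.nan,  # failed fits are scored as nan instead of raising
)

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence the failed-fit warnings for this demo
    search.fit(X, y)

# Candidates whose mean score is nan are the ones whose fits failed.
scores = search.cv_results_["mean_test_score"]
failed = [p for p, s in zip(search.cv_results_["params"], scores) if np.isnan(s)]
for params in failed:
    print("failed:", params)

print("best:", search.best_params_)
```

Setting `error_score='raise'`, as the warning message suggests, stops at the first failing fit and shows the full error, which is usually the quickest way to find the offending value.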


In [7]:
# now let's try RandomizedSearchCV:
# (the search object is named random_search so it does not shadow Python's random module)

random_search = RandomizedSearchCV(estimator=model,
                                   param_distributions=param_grid,
                                   cv=5,
                                   n_iter=10,
                                   scoring='accuracy',
                                   verbose=1,
                                   n_jobs=-1)

# fit the model:

random_search.fit(X, y)

# print the best parameters:
print(f"The best parameters are: {random_search.best_params_}")


Fitting 5 folds for each of 10 candidates, totalling 50 fits


20 fits failed out of a total of 50.
The score on these train-test partitions for these parameters will be set to nan.
If these failures are not expected, you can try to debug them by setting error_score='raise'.

Below are more details about the failures:
--------------------------------------------------------------------------------
16 fits failed with the following error:
Traceback (most recent call last):
  File "c:\Users\Muhammad Faizan\.conda\envs\python_machinelearning\Lib\site-packages\sklearn\model_selection\_validation.py", line 888, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "c:\Users\Muhammad Faizan\.conda\envs\python_machinelearning\Lib\site-packages\sklearn\base.py", line 1466, in wrapper
    estimator._validate_params()
  File "c:\Users\Muhammad Faizan\.conda\envs\python_machinelearning\Lib\site-packages\sklearn\base.py", line 666, in _validate_params
    validate_parameter_constraints(
  File "c:\Users\Muhammad Faizan\.conda\envs\python_

The best parameters are: {'n_estimators': 500, 'max_features': 'log2', 'max_depth': 20, 'criterion': 'entropy'}
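The cell above passes lists to `param_distributions`, so the search samples from the same discrete values as the grid. RandomizedSearchCV also accepts distributions, which lets it draw values that no hand-written list contains. Below is a minimal sketch using `scipy.stats.randint`. The ranges, `n_iter=5` and `cv=3` are illustrative assumptions, not tuned recommendations.

```python
from scipy.stats import randint
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = load_iris(return_X_y=True)

# Illustrative ranges; randint(low, high) draws integers from low to high - 1.
param_distributions = {
    "n_estimators": randint(10, 101),   # integers 10..100
    "max_depth": randint(2, 21),        # integers 2..20
    "criterion": ["gini", "entropy"],   # lists are sampled uniformly
    "max_features": ["sqrt", "log2", None],
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,          # number of sampled candidates
    cv=3,
    scoring="accuracy",
    random_state=0,    # makes the sampled candidates reproducible
)
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))
```

Fixing `random_state` on both the estimator and the search makes repeated runs comparable, which the cells above do not do, so their best parameters can change from run to run.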


---

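* In this notebook I practiced hyperparameter tuning using GridSearchCV and RandomizedSearchCV.
* Both tools look for the hyperparameter combination with the best cross-validated score: GridSearchCV tries every combination in the grid, while RandomizedSearchCV evaluates a fixed number (`n_iter`) of sampled combinations.
* I used the RandomForestClassifier as an example, but these tools can be used with any scikit-learn estimator.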
* I used the Iris dataset from sklearn.datasets.
* I hope this notebook is useful for you.
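As a sketch of the point that these tools are not tied to Random Forest, the example below tunes a support vector classifier inside a `Pipeline` with `StandardScaler`. It also holds out a test set with `train_test_split`, which the cells above do not do, so the tuned model is scored on data the search never saw. The grid values, the split size and `random_state=42` are assumptions chosen for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Hold out a test set that the search never sees.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Scaling lives inside the pipeline, so it is refit on each training fold.
pipe = Pipeline([("scaler", StandardScaler()), ("svc", SVC())])

# Illustrative grid; the "svc__" prefix routes each value to the SVC step.
svc_grid = {"svc__C": [0.1, 1, 10], "svc__kernel": ["linear", "rbf"]}

search = GridSearchCV(pipe, param_grid=svc_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

# predict() uses the best pipeline, refit on the whole training set.
test_accuracy = accuracy_score(y_test, search.predict(X_test))
print(search.best_params_)
print(f"Held-out accuracy: {test_accuracy:.3f}")
```

The cross-validated `best_score_` is used to pick the hyperparameters, so the held-out accuracy is the more honest estimate of how the tuned model behaves on new data.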

---

# About Me:

<img src="https://scontent.flhe6-1.fna.fbcdn.net/v/t39.30808-6/449152277_18043153459857839_8752993961510467418_n.jpg?_nc_cat=108&ccb=1-7&_nc_sid=127cfc&_nc_ohc=6slHzGIxf0EQ7kNvgEeodY9&_nc_ht=scontent.flhe6-1.fna&oh=00_AYCiVUtssn2d_rREDU_FoRbXvszHQImqOjfNEiVq94lfBA&oe=66861B78" width="30%">

**Muhammad Faizan**

3rd Year BS Computer Science student at University of Agriculture, Faisalabad.\
Contact me for queries, collaborations, or corrections.

[Kaggle](https://www.kaggle.com/faizanyousafonly/)\
[Linkedin](https://www.linkedin.com/in/mrfaizanyousaf/)\
[GitHub](https://github.com/faizan-yousaf/)\
Email: faizan6t45@gmail.com or faizanyousaf815@gmail.com \
Phone/WhatsApp: +923065375389