<div style="color:white;display:fill;border-radius:8px;font-size:200%; letter-spacing:1.0px;"><p style="padding: 5px;color:white;text-align:left;"><b><span style='color:#fc6603'>AUTHOR: SOBIA ALAMGIR</span></b></p></div>

<a id="13"></a>
<h1 style="background-color:#435420;font-family:newtimeroman;font-size:300%;text-align:center;border-radius: 15px 50px;color:#FF9900;">Hyperparameter Tuning</h1>


**Hyperparameter tuning** is the process of choosing the settings that are fixed before training (like the maximum depth of a tree or the learning rate) so that the model works as well as possible on new, unseen data.

## **Types:**

- `Grid Search:` Tests all possible combinations of specified hyperparameter values to find the best one, like an exhaustive search.
  
- `Random Search:` Randomly samples a fixed number of hyperparameter combinations from the specified values or ranges. It is much faster on large search spaces and often finds settings close to the best ones.

- `Bayesian Optimization:` Uses probabilistic models (like Gaussian processes) to predict the best hyperparameters by learning from previous trials, aiming to efficiently find the optimal combination.

- `Gradient-Based Optimization:` Treats continuous hyperparameters as differentiable quantities and updates them using gradients of the validation loss (similar to gradient descent for weights) to converge on good values. It is mostly used with neural networks.
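
The difference between the first two strategies can be sketched with the standard library alone. This is only an illustration, not how scikit-learn implements its searches internally: the `params` dictionary mirrors the grid used later in this notebook, and the seed and the budget of 20 are arbitrary choices.

```python
import itertools
import random

# Same search space as the grids used later in this notebook.
params = {
    'n_estimators': [50, 100, 200, 300, 400, 500],
    'max_depth': [4, 5, 6, 7, 8, 9, 10],
    'criterion': ['gini', 'entropy'],
    'bootstrap': [True, False],
}

# Grid search: every combination (Cartesian product) of the listed values.
names = list(params)
grid_candidates = [dict(zip(names, values))
                   for values in itertools.product(*params.values())]

# Random search: a fixed budget of combinations drawn without replacement.
rng = random.Random(42)  # arbitrary seed, only for reproducibility
random_candidates = rng.sample(grid_candidates, k=20)

print(len(grid_candidates))    # 6 * 7 * 2 * 2 = 168
print(len(random_candidates))  # 20
```

Every candidate is then trained and scored once per cross-validation fold, which is why the size of the candidate list drives the total run time.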

## **Key Concepts:**

- **Cross-validation** is a method to assess model performance by dividing the dataset into multiple parts, training on some parts, and validating on the others. The most common type, **k-fold cross-validation**, splits the data into **k** folds and uses each fold as the validation set exactly once. Averaging the **k** scores gives a more reliable evaluation than a single train/validation split and makes overfitting easier to detect.
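
A minimal sketch of 5-fold cross-validation on the iris data, assuming a random forest with 50 trees as the model. The `random_state` values and the choice of `StratifiedKFold` are assumptions made for this example, not settings taken from the cells below.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_iris(return_X_y=True)

# Stratified folds keep the class proportions equal in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
model = RandomForestClassifier(n_estimators=50, random_state=42)

# One accuracy score per fold: train on 4 folds, validate on the held-out fold.
scores = cross_val_score(model, X, y, cv=cv, scoring='accuracy')
print(scores)
print(f'Mean accuracy: {scores.mean():.3f} (std {scores.std():.3f})')
```

The mean of the fold scores is the number that `GridSearchCV` and `RandomizedSearchCV` use to rank each candidate.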

## Step-01 Load Libraries using scikit-learn

In [23]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split,GridSearchCV,RandomizedSearchCV
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import accuracy_score,precision_score,f1_score,recall_score, classification_report
from sklearn.datasets import load_iris

## Step-02 Load Data and Target using scikit-learn

In [7]:
iris = load_iris()
X = iris.data
y = iris.target
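
`train_test_split` is imported above but the searches below are fitted on all of `X` and `y`. If a final, untouched test set is wanted, one option is to hold it out before tuning. The sketch below assumes an 80/20 stratified split and an arbitrary seed; the notebook itself does not do this.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Hold out 20% of the samples; stratify keeps the three classes balanced.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

print(X_train.shape, X_test.shape)  # (120, 4) (30, 4)
```

With such a split, the search would be fitted on `X_train, y_train` and the best model scored once on `X_test, y_test`.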

## Step-03 Hyperparameter tuning using `Grid Search CV`

In [None]:
%%time
# define the model
model = RandomForestClassifier()

# create the parameter Grid
params = {'n_estimators': [50,100,200,300,400,500],
          'max_depth': [4,5,6,7,8,9,10],
          'criterion':['gini','entropy'],
          'bootstrap':[True,False]
          }

# set up the search with `GridSearchCV`
grid = GridSearchCV(
    estimator=model,
    param_grid=params,
    cv = 5,
    scoring='accuracy',
    verbose=1,
    n_jobs=-1
)

# Fit the model
grid.fit(X,y)

# Print the best Parameters
print(f'Best Parameters:{grid.best_params_}')

Fitting 5 folds for each of 168 candidates, totalling 840 fits
Best Parameters:{'bootstrap': True, 'criterion': 'gini', 'max_depth': 4, 'n_estimators': 50}
CPU times: total: 4.8 s
Wall time: 2min 4s
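
Besides `best_params_`, a fitted search also exposes the score of the winning combination and the scores of all candidates. The sketch below uses a deliberately small grid (`small_params`, a name made up for this example) so that it runs in a few seconds; the same attributes are available on the `grid` object fitted above.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

# Deliberately small grid (4 candidates) so the sketch runs quickly.
small_params = {'n_estimators': [50, 100], 'max_depth': [4, 6]}
small_grid = GridSearchCV(
    estimator=RandomForestClassifier(random_state=42),
    param_grid=small_params,
    cv=5,
    scoring='accuracy',
)
small_grid.fit(X, y)

print(f'Best Parameters:{small_grid.best_params_}')
print(f'Best mean CV accuracy: {small_grid.best_score_:.3f}')

# cv_results_ holds one entry per candidate combination.
results = small_grid.cv_results_
for candidate, score in zip(results['params'], results['mean_test_score']):
    print(candidate, round(float(score), 3))

# With the default refit=True, best_estimator_ is retrained on all of X, y.
print(small_grid.best_estimator_.predict(X[:3]))
```

Printing `best_score_` alongside `best_params_` makes it possible to compare two searches by accuracy and not only by run time.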


* **Let's save the fitted `Grid Search CV` object**

In [21]:
import joblib
# Pass the fitted search object itself; the string 'grid' would save only the text "grid"
joblib.dump(grid, 'random_forest_classifier_with_GridSearchCV.pkl')
load_model_with_GridSearchCV = joblib.load('random_forest_classifier_with_GridSearchCV.pkl')

## Step-04 Hyperparameter Tuning using `RandomizedSearchCV`

In [20]:
%%time
# define the model
model = RandomForestClassifier()

# create the parameter Grid
params = {'n_estimators': [50,100,200,300,400,500],
          'max_depth': [4,5,6,7,8,9,10],
          'criterion':['gini','entropy'],
          'bootstrap':[True,False]
          }

# set up the search with `RandomizedSearchCV`
grid_random = RandomizedSearchCV(
    estimator=model,
    param_distributions=params,
    cv = 5,
    scoring='accuracy',
    verbose=1,
    n_jobs=-1,
    n_iter = 20
)

# Fit the model
grid_random.fit(X,y)

# Print the best Parameters
print(f'Best Parameters:{grid_random.best_params_}')

Fitting 5 folds for each of 20 candidates, totalling 100 fits
Best Parameters:{'n_estimators': 100, 'max_depth': 8, 'criterion': 'entropy', 'bootstrap': True}
CPU times: total: 797 ms
Wall time: 14.1 s
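
`RandomizedSearchCV` also accepts distributions instead of fixed lists, so it can draw values from a whole range. The sketch below is one possible setup, assuming `scipy.stats.randint` for the integer ranges; the ranges, `n_iter=5` and the seeds are choices made for this example only.

```python
from scipy.stats import randint
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = load_iris(return_X_y=True)

# randint(low, high) draws integers from low up to high - 1.
param_distributions = {
    'n_estimators': randint(50, 201),  # 50..200
    'max_depth': randint(4, 11),       # 4..10
    'criterion': ['gini', 'entropy'],
    'bootstrap': [True, False],
}

search = RandomizedSearchCV(
    estimator=RandomForestClassifier(random_state=42),
    param_distributions=param_distributions,
    n_iter=5,          # number of sampled combinations
    cv=5,
    scoring='accuracy',
    random_state=42,   # makes the sampled combinations reproducible
)
search.fit(X, y)

print(f'Best Parameters:{search.best_params_}')
print(f'Best mean CV accuracy: {search.best_score_:.3f}')
```

Setting `random_state` on the search fixes which combinations are sampled, so repeated runs evaluate the same candidates.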


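**Insights:**

- `Randomized Search CV` took much less time than `Grid Search CV` (about 14 s versus about 2 min of wall time).
- Randomized Search CV sampled 20 `random` combinations (`n_iter = 20`) and ran 5-fold cross-validation on each, `100 fits in total`.
- Grid Search CV ran 840 fits (168 combinations x 5 folds), because it `executes all possible combinations.`
- The two searches returned different best parameters. Neither cell prints `best_score_`, so the outputs above do not show which result is more accurate.
  
  - ***Grid Search CV is the more `thorough` choice when the grid is small enough to search exhaustively, because it cannot miss the best combination in the grid. Randomized Search CV gives up that guarantee in exchange for speed.***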

* **Let's save the fitted `Randomized Search CV` object**

In [22]:
import joblib
# Pass the fitted search object itself, not the string 'grid_random'
joblib.dump(grid_random, 'random_forest_classifier_with_RandomizedSearchCV.pkl')
load_model_with_random_search = joblib.load('random_forest_classifier_with_RandomizedSearchCV.pkl')
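
A quick way to check that a saved model survives the round trip is to reload it and compare predictions. The sketch below is a self-contained example with a plain random forest and a temporary directory (both chosen for illustration); the same check can be applied to the search objects saved above.

```python
import os
import tempfile

import joblib
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(n_estimators=50, random_state=42).fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'model.pkl')
    # First argument is the fitted object itself, not its name as a string.
    joblib.dump(model, path)
    restored = joblib.load(path)

# The restored model should reproduce the original predictions exactly.
same_predictions = bool((model.predict(X) == restored.predict(X)).all())
print(same_predictions)  # True
```

If a string such as `'grid'` is passed to `joblib.dump` by mistake, `joblib.load` returns that string and calling `predict` on it fails.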

<a id="13"></a>
<h1 style="background-color:#435420;font-family:newtimeroman;font-size:300%;text-align:center;border-radius: 15px 50px;color:#FF9900;">Thanks For Reading My Notebook!</h1>