# <b><p style="background-color: #ff6200; font-family:calibri; color:white; font-size:100%; font-family:Verdana; text-align:center; border-radius:15px 50px;">Task 29-> Hyperparameter Tuning Techniques</p>

# Hyperparameter Tuning
Hyperparameter tuning is the process of choosing the hyperparameters of a machine learning model so that it performs as well as possible on unseen data. Hyperparameters are configuration settings that control the training process, such as the learning rate, the batch size, or the number of layers in a neural network. Unlike model parameters, which are learned during training, hyperparameters must be set before training begins. The goal is to find the combination of hyperparameters that gives the best validation score, for example the highest accuracy or the lowest error. The search can be computationally intensive, and common techniques include grid search, random search, and more sophisticated methods such as Bayesian optimization and evolutionary algorithms. Effective tuning can noticeably improve a model's predictive performance, which makes it an important step in the machine learning pipeline, particularly for complex models and large datasets.
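
As a small illustration of the difference between hyperparameters and learned parameters, the sketch below sets two hyperparameters on a decision tree before training and then reads back what the tree learned. The toy data and the chosen values (`max_depth=3`, `min_samples_leaf=5`) are assumptions made for the example, not settings taken from this notebook.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy data (assumed for illustration): y depends linearly on one feature plus noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 3.0 * X[:, 0] + rng.normal(0, 1, size=200)

# Hyperparameters: chosen by us before training starts.
model = DecisionTreeRegressor(max_depth=3, min_samples_leaf=5, random_state=0)
hyperparams = model.get_params()

# Parameters: the split thresholds and leaf values are learned by fit().
model.fit(X, y)
learned_depth = model.get_depth()
learned_leaves = model.get_n_leaves()

print("max_depth (set by us):", hyperparams["max_depth"])
print("depth and number of leaves (learned):", learned_depth, learned_leaves)
```

The learned tree can never be deeper than `max_depth`, which is one way a hyperparameter constrains what training is allowed to learn.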

## Tasks:
1. [Grid Search](#1)
    -  [Regression Model](#01)
    -  [Classification Model](#11)
2. [Random Search](#2)
    -  [Regression Model](#02)
    -  [Classification Model](#12)
3. [Bayesian Optimization](#3)
    -  [Regression Model](#03)
    -  [Classification Model](#13)

In [1]:
# pip install scikit-optimize

## <span style='color:#ff6200'> Importing Libraries</span>

In [2]:
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split, GridSearchCV, RandomizedSearchCV
from sklearn.tree import DecisionTreeRegressor
from sklearn.tree import DecisionTreeClassifier
from sklearn.pipeline import Pipeline

from skopt import BayesSearchCV
from skopt.space import Real, Categorical, Integer

from sklearn.metrics import mean_squared_error, r2_score
from sklearn.metrics import classification_report

import warnings
warnings.filterwarnings('ignore')

## <span style='color:#ff6200'> Load and Process Regression Dataset</span>

In [3]:
regression_df = pd.read_csv("Student_Performance.csv")
regression_df['Extracurricular Activities'] = regression_df['Extracurricular Activities'].map({'No': 0, 'Yes': 1})
regression_df.head()

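|   | Hours Studied | Previous Scores | Extracurricular Activities | Sleep Hours | Sample Question Papers Practiced | Performance Index |
|---|---|---|---|---|---|---|
| 0 | 7 | 99 | 1 | 9 | 1 | 91.0 |
| 1 | 4 | 82 | 0 | 4 | 2 | 65.0 |
| 2 | 8 | 51 | 1 | 7 | 2 | 45.0 |
| 3 | 5 | 52 | 1 | 5 | 2 | 36.0 |
| 4 | 7 | 75 | 0 | 8 | 5 | 66.0 |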


In [4]:
X_r = regression_df.drop('Performance Index',axis=1)
y_r = regression_df['Performance Index']

X_train_r, X_test_r, y_train_r, y_test_r = train_test_split(X_r, y_r, test_size=0.2)
X_train_r.shape, X_test_r.shape, y_train_r.shape, y_test_r.shape

((8000, 5), (2000, 5), (8000,), (2000,))

## <span style='color:#ff6200'> Load and Process Classification Dataset</span>

In [5]:
classification_df = pd.read_csv("glass.csv")
classification_df.head()

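|   | RI | Na | Mg | Al | Si | K | Ca | Ba | Fe | Type |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.52101 | 13.64 | 4.49 | 1.1 | 71.78 | 0.06 | 8.75 | 0.0 | 0.0 | 1 |
| 1 | 1.51761 | 13.89 | 3.6 | 1.36 | 72.73 | 0.48 | 7.83 | 0.0 | 0.0 | 1 |
| 2 | 1.51618 | 13.53 | 3.55 | 1.54 | 72.99 | 0.39 | 7.78 | 0.0 | 0.0 | 1 |
| 3 | 1.51766 | 13.21 | 3.69 | 1.29 | 72.61 | 0.57 | 8.22 | 0.0 | 0.0 | 1 |
| 4 | 1.51742 | 13.27 | 3.62 | 1.24 | 73.08 | 0.55 | 8.07 | 0.0 | 0.0 | 1 |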


In [6]:
X_c = classification_df.drop('Type',axis=1)
y_c = classification_df['Type']

X_train_c, X_test_c, y_train_c, y_test_c = train_test_split(X_c, y_c, test_size=0.2)
X_train_c.shape, X_test_c.shape, y_train_c.shape, y_test_c.shape

((171, 9), (43, 9), (171,), (43,))

## Some Hyperparameter Tuning Techniques
- Grid Search
- Random Search
- Bayesian Optimization

# <b><span style='color:#ff6200'> Grid Search</span>

Grid Search involves an exhaustive search over a specified parameter grid. It systematically works through multiple combinations of hyperparameters, cross-validating as it goes to determine which set of values provides the best model performance.
## Procedure:

- Define the range of values for each hyperparameter.
- Create a grid of all possible combinations.
- Train and evaluate the model for each combination using cross-validation.
- Select the combination with the best performance.
## Pros:

- Simple to understand and implement.
- Guarantees finding the combination with the best cross-validation score among those in the grid.
## Cons:

- Computationally expensive, especially when dealing with large datasets and many hyperparameters.
- Inefficient as it does not use any information from previous evaluations to decide the next set of parameters to try.
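
The procedure above can be written out by hand to show what `GridSearchCV` automates and why the cost grows quickly. This is a minimal sketch on synthetic data; the dataset, the small grid, and the variable names are assumptions chosen for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import ParameterGrid, cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic data, assumed for illustration only.
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)

# Step 1: define the values to try for each hyperparameter.
param_grid = {"max_depth": [3, 5, None], "min_samples_leaf": [1, 5]}

# Step 2: enumerate every combination (3 * 2 = 6 here).
grid = list(ParameterGrid(param_grid))
n_fits = len(grid) * 5  # 5-fold cross-validation trains 5 models per combination

# Step 3: cross-validate each combination.
results = []
for params in grid:
    model = DecisionTreeRegressor(random_state=0, **params)
    scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_squared_error")
    results.append((scores.mean(), params))

# Step 4: keep the combination with the best (highest) mean score.
best_score, best_params = max(results, key=lambda item: item[0])
print(f"{len(grid)} combinations, {n_fits} fits, best: {best_params}")
```

The number of fits is the product of the list lengths times the number of folds, so each extra hyperparameter multiplies the cost.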

## <span style='color:#fcc36d'> For Regression </span>

In [7]:
simple_model = DecisionTreeRegressor()
simple_model.fit(X_train_r, y_train_r)

In [8]:
y_pred_simple = simple_model.predict(X_test_r)
mse_simple = mean_squared_error(y_test_r, y_pred_simple)
r2_simple = r2_score(y_test_r, y_pred_simple)
print(f"Simple Decision Tree - MSE: {mse_simple}, R2: {r2_simple}")

Simple Decision Tree - MSE: 8.595402777777778, R2: 0.976647045730854


In [9]:
param_grid = {
    'max_depth': [None, 10, 20, 30],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4],
    # 'auto' was removed in scikit-learn 1.3; for regression trees it was equivalent to None
    'max_features': [None, 'sqrt', 'log2']
}

grid_search = GridSearchCV(DecisionTreeRegressor(), param_grid, cv=5, scoring='neg_mean_squared_error')
grid_search.fit(X_train_r, y_train_r)

best_params = grid_search.best_params_
print("Best Parameters for Decision Tree:", best_params)

best_model = grid_search.best_estimator_
y_pred_test = best_model.predict(X_test_r)

mse_test = mean_squared_error(y_test_r, y_pred_test)
r2_test = r2_score(y_test_r, y_pred_test)

print(f"\nBest Decision Tree With Grid Search - MSE: {mse_test}, R2: {r2_test}")

Best Parameters for Decision Tree: {'max_depth': 10, 'max_features': None, 'min_samples_leaf': 4, 'min_samples_split': 10}

Best Decision Tree With Grid Search - MSE: 5.802518915814977, R2: 0.9842350658380764


## <span style='color:#fcc36d'> For Classification </span>

In [10]:
dt = DecisionTreeClassifier()

param_grid = {
    'criterion': ['gini', 'entropy'],
    'splitter': ['best', 'random'],
    'max_depth': [None, 10, 20, 30, 40, 50],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4],
    # 'auto' was removed in scikit-learn 1.3; for classification trees it was equivalent to 'sqrt'
    'max_features': [None, 'sqrt', 'log2']
}

grid_search = GridSearchCV(estimator=dt, param_grid=param_grid, cv=5)

grid_search.fit(X_train_c, y_train_c)

print("\nBest parameters found: ", grid_search.best_params_)

best_dt = grid_search.best_estimator_
y_pred = best_dt.predict(X_test_c)

print(classification_report(y_test_c, y_pred))


Best parameters found:  {'criterion': 'gini', 'max_depth': 10, 'max_features': 'log2', 'min_samples_leaf': 1, 'min_samples_split': 2, 'splitter': 'best'}
              precision    recall  f1-score   support

           1       0.67      0.50      0.57        16
           2       0.64      0.54      0.58        13
           3       0.42      1.00      0.59         5
           5       0.50      1.00      0.67         1
           6       0.00      0.00      0.00         1
           7       1.00      0.86      0.92         7

    accuracy                           0.63        43
   macro avg       0.54      0.65      0.56        43
weighted avg       0.66      0.63      0.62        43



# <b><span style='color:#ff6200'> Random Search</span>

Random Search is another hyperparameter optimization technique where, instead of exhaustively searching all possible combinations of hyperparameters, it randomly samples a specified number of combinations. This can be more efficient than Grid Search, especially when the hyperparameter space is large.

## Procedure:
- Define the range of values for each hyperparameter.
- Specify the number of random combinations to sample.
- Randomly sample combinations of hyperparameters.
- Train and evaluate the model for each random combination using cross-validation.
- Select the combination with the best performance.
## Pros:
- Less computationally expensive: More efficient than Grid Search, especially for large datasets and many hyperparameters.
- Can explore a larger hyperparameter space: Since it samples randomly, it can potentially find better hyperparameters by exploring a wider range.
- Flexible: Does not require a predefined grid, allowing more flexibility in the search space.
## Cons:
- No guarantee of finding the best combination: It may not find the optimal set of hyperparameters since it doesn't exhaustively search all possible combinations.
- Results can vary: Different runs of Random Search can yield different results due to the randomness in sampling.
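
To illustrate two of the points above, the sketch below samples from integer distributions instead of fixed lists and fixes `random_state` so that repeated runs give the same result. The synthetic data, the ranges, and the helper `run_search` are assumptions made for this example.

```python
from scipy.stats import randint
from sklearn.datasets import make_regression
from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeRegressor

# Synthetic data, assumed for illustration only.
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)

# Distributions instead of fixed lists: randint(a, b) draws integers in [a, b).
param_dist = {
    "max_depth": randint(1, 21),
    "min_samples_split": randint(2, 21),
    "min_samples_leaf": randint(1, 21),
}

def run_search(seed):
    """Hypothetical helper: run one random search with the given seed."""
    search = RandomizedSearchCV(
        DecisionTreeRegressor(random_state=0),
        param_distributions=param_dist,
        n_iter=20,  # number of random combinations to evaluate
        cv=3,
        scoring="neg_mean_squared_error",
        random_state=seed,
    )
    return search.fit(X, y)

first = run_search(seed=42)
repeat = run_search(seed=42)
print("Sampled combinations:", len(first.cv_results_["params"]))
print("Best parameters:", first.best_params_)
print("Same result with the same seed:", first.best_params_ == repeat.best_params_)
```

The cost is set by `n_iter` rather than by the size of the search space, which is why widening the ranges does not make the search slower.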

## <span style='color:#fcc36d'> For Regression </span>

In [11]:
dtr = DecisionTreeRegressor()

param_dist = {
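    # 'mse' and 'mae' were renamed to 'squared_error' and 'absolute_error' (old names removed in scikit-learn 1.2)
    'criterion': ['squared_error', 'friedman_mse', 'absolute_error', 'poisson'],
    'splitter': ['best', 'random'],
    'max_depth': [None, 10, 20, 30, 40, 50],
    # 'auto' was removed in scikit-learn 1.3; for regression trees it was equivalent to None
    'max_features': [None, 'sqrt', 'log2']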
}

random_search = RandomizedSearchCV(estimator=dtr, param_distributions=param_dist, n_iter=100, cv=5, random_state=42)

random_search.fit(X_train_r, y_train_r)

print("\nBest parameters found: ", random_search.best_params_)

best_dtr = random_search.best_estimator_
y_pred = best_dtr.predict(X_test_r)

mse = mean_squared_error(y_test_r, y_pred)
r2 = r2_score(y_test_r, y_pred)

print(f"Mean Squared Error: {mse}")
print(f"R^2 Score: {r2}")


Best parameters found:  {'splitter': 'best', 'max_features': None, 'max_depth': 10, 'criterion': 'friedman_mse'}
Mean Squared Error: 6.231533068951505
R^2 Score: 0.9830694720714974


## <span style='color:#fcc36d'> For Classification </span>

In [12]:
dt = DecisionTreeClassifier()

param_dist = {
    'criterion': ['gini', 'entropy'],
    'splitter': ['best', 'random'],
    'max_depth': [None, 10, 20, 30, 40, 50],
    # 'auto' was removed in scikit-learn 1.3; for classification trees it was equivalent to 'sqrt'
    'max_features': [None, 'sqrt', 'log2']
}

random_search = RandomizedSearchCV(estimator=dt, param_distributions=param_dist, n_iter=100, cv=5, random_state=42)

random_search.fit(X_train_c, y_train_c)

print("\nBest parameters found: ", random_search.best_params_)

best_dt = random_search.best_estimator_
y_pred = best_dt.predict(X_test_c)

print(classification_report(y_test_c, y_pred))


Best parameters found:  {'splitter': 'best', 'max_features': 'sqrt', 'max_depth': 10, 'criterion': 'entropy'}
              precision    recall  f1-score   support

           1       0.75      0.75      0.75        16
           2       0.90      0.69      0.78        13
           3       0.38      0.60      0.46         5
           5       0.50      1.00      0.67         1
           6       0.50      1.00      0.67         1
           7       1.00      0.71      0.83         7

    accuracy                           0.72        43
   macro avg       0.67      0.79      0.69        43
weighted avg       0.78      0.72      0.74        43



# <b><span style='color:#ff6200'> Bayesian Optimization</span>

Bayesian Optimization is an efficient alternative to Grid Search for hyperparameter tuning. Instead of an exhaustive search, it uses probabilistic models to select the next set of hyperparameters to try based on past evaluations.

## Procedure:

1. Define the range of values for each hyperparameter.
2. Create a probabilistic model to predict the performance of combinations.
3. Use this model to select the next combination of hyperparameters to evaluate.
4. Train and evaluate the model with the selected combination.
5. Update the probabilistic model based on the evaluation.
6. Repeat steps 3-5 until convergence or a stopping criterion is met.
## Pros:

- More efficient than Grid Search, especially with large datasets and many hyperparameters.
- Utilizes information from previous evaluations to make better decisions on the next hyperparameters to try.
## Cons:

- More complex to understand and implement.
- Requires more advanced libraries and methods (e.g., scikit-optimize, hyperopt, Optuna).
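
The loop described in the procedure can be sketched in a few lines on a one-dimensional toy problem. This sketch assumes a Gaussian process as the probabilistic model and expected improvement as the selection rule; both are common choices but are assumptions here, and `BayesSearchCV` may use different defaults internally. The `objective` function is a hypothetical stand-in for a cross-validated error.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def objective(x):
    # Hypothetical stand-in for a validation error; its minimum is at x = 2.
    return (x - 2.0) ** 2

rng = np.random.default_rng(0)
# Step 1: the search range, discretized into candidate values.
candidates = np.linspace(0.0, 5.0, 201).reshape(-1, 1)

# Seed the model with a few random evaluations.
X_obs = rng.uniform(0.0, 5.0, size=(3, 1))
y_obs = objective(X_obs).ravel()
initial_best = y_obs.min()

for _ in range(8):
    # Steps 2 and 5: (re)fit the probabilistic model on all evaluations so far.
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-5, normalize_y=True)
    gp.fit(X_obs, y_obs)
    mu, sigma = gp.predict(candidates, return_std=True)

    # Step 3: expected improvement (for minimization) picks the next point.
    sigma = np.maximum(sigma, 1e-9)
    best = y_obs.min()
    z = (best - mu) / sigma
    ei = (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)
    x_next = candidates[np.argmax(ei)].reshape(1, 1)

    # Step 4: evaluate the objective at the selected point.
    X_obs = np.vstack([X_obs, x_next])
    y_obs = np.append(y_obs, objective(x_next).ravel())

best_x = X_obs[np.argmin(y_obs), 0]
best_y = y_obs.min()
print(f"Best x found: {best_x:.3f}, objective: {best_y:.4f}")
```

Unlike grid or random search, each new evaluation here depends on all earlier ones, which is where the efficiency gain comes from when a single evaluation is expensive.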

## <span style='color:#fcc36d'> For Regression </span>

In [13]:
param_space = {
    'splitter': Categorical(['best', 'random']),
    'max_depth': Integer(1, 50),
    'min_samples_split': Integer(2, 20),
    'min_samples_leaf': Integer(1, 20),
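    # 'auto' (seen in the recorded output below) was removed in scikit-learn 1.3;
    # for regression trees it was equivalent to None
    'max_features': Categorical([None, 'sqrt', 'log2'])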
}

dt = DecisionTreeRegressor()

opt = BayesSearchCV(estimator=dt, search_spaces=param_space, n_iter=32, cv=5)

opt.fit(X_train_r, y_train_r)

print("Best parameters found: ", opt.best_params_)

best_dt = opt.best_estimator_
y_pred = best_dt.predict(X_test_r)

print("Mean Squared Error: ", mean_squared_error(y_test_r, y_pred))

Best parameters found:  OrderedDict({'max_depth': 29, 'max_features': 'auto', 'min_samples_leaf': 5, 'min_samples_split': 6, 'splitter': 'random'})
Mean Squared Error:  5.6554584903163105


## <span style='color:#fcc36d'> For Classification </span>

In [14]:
param_space = {
    'criterion': Categorical(['gini', 'entropy']),
    'splitter': Categorical(['best', 'random']),
    'max_depth': Integer(1, 50),
    'min_samples_split': Integer(2, 20),
    'min_samples_leaf': Integer(1, 20),
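    # 'auto' (seen in the recorded output below) was removed in scikit-learn 1.3;
    # for classification trees it was equivalent to 'sqrt'
    'max_features': Categorical([None, 'sqrt', 'log2'])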
}

dt = DecisionTreeClassifier()

opt = BayesSearchCV(estimator=dt, search_spaces=param_space, n_iter=32, cv=5)

opt.fit(X_train_c, y_train_c)

print("Best parameters found: ", opt.best_params_)

best_dt = opt.best_estimator_
y_pred = best_dt.predict(X_test_c)

print(classification_report(y_test_c, y_pred))

Best parameters found:  OrderedDict({'criterion': 'gini', 'max_depth': 50, 'max_features': 'auto', 'min_samples_leaf': 1, 'min_samples_split': 8, 'splitter': 'best'})
              precision    recall  f1-score   support

           1       0.65      0.81      0.72        16
           2       0.86      0.46      0.60        13
           3       0.14      0.20      0.17         5
           5       0.33      1.00      0.50         1
           6       1.00      1.00      1.00         1
           7       1.00      0.71      0.83         7

    accuracy                           0.63        43
   macro avg       0.66      0.70      0.64        43
weighted avg       0.71      0.63      0.64        43

