# Tuning Hyperparameters of Machine Learning Model

## Make synthetic dataset

Generate the dataset

In [21]:
!pip install scikit-learn plotly


In [2]:
from sklearn.datasets import make_classification

X, Y = make_classification(n_samples=200, n_classes=2, n_features=10, n_redundant=0, random_state=1)

In [3]:
X.shape, Y.shape

((200, 10), (200,))

## Data split (80/20 ratio)

In [4]:
from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2)

In [5]:
X_train.shape, Y_train.shape

((160, 10), (160,))

In [6]:
X_test.shape, Y_test.shape

((40, 10), (40,))
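
The split above is random, so the shapes stay the same but the selected rows, and every score computed later, change from run to run. A minimal sketch, assuming a repeatable split is wanted: `random_state` fixes the shuffle and `stratify` keeps the class ratio similar in both parts. The seed value 42 is an arbitrary choice and is not part of the original notebook.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Same synthetic dataset as above
X, Y = make_classification(n_samples=200, n_classes=2, n_features=10,
                           n_redundant=0, random_state=1)

# random_state=42 is an arbitrary seed (assumption); stratify=Y preserves class balance
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=42, stratify=Y)

print(X_train.shape, X_test.shape)
print(Y_train.mean(), Y_test.mean())  # fraction of class 1 in each part
```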

### Building a simple machine learning model using Random Forest

In [7]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

rf = RandomForestClassifier(max_features=5, n_estimators=100)

In [8]:
rf.fit(X_train, Y_train)

In [9]:
rf.score(X_test,Y_test)

0.85

In [10]:
Y_pred=rf.predict(X_test)

In [11]:
# accuracy_score expects (y_true, y_pred); accuracy is symmetric, so the value is unchanged
accuracy_score(Y_test, Y_pred)

0.85
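
The two numbers agree because, for classifiers, `score` returns the mean accuracy on the given data. A small sketch with made-up label arrays (hypothetical, not the notebook's data) computes the same quantity by hand:

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical labels for illustration only
y_true = np.array([0, 1, 1, 0, 1, 0, 1, 1])
y_pred = np.array([0, 1, 0, 0, 1, 1, 1, 1])

manual = np.mean(y_true == y_pred)        # fraction of matching labels
library = accuracy_score(y_true, y_pred)  # signature is (y_true, y_pred)

print(manual, library)
```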

In [12]:
Y_pred, Y_test

(array([0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0,
        1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0]),
 array([0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0,
        1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0]))

### Hyperparameter Tuning


Next we tune the hyperparameters of the random forest model. The two hyperparameters tuned here are **max_features** and **n_estimators**.

Note: Some of the code is adapted from scikit-learn.

Firstly, we will import the necessary modules.

The **GridSearchCV** class from scikit-learn will be used to perform the hyperparameter tuning. Once fitted with the default `refit=True`, a GridSearchCV object can be used like the underlying classifier: it exposes ***fit, score*** and ***predict***, and also ***predict_proba, decision_function, transform*** and ***inverse_transform*** whenever the wrapped estimator implements them.

Secondly, we define the variables that GridSearchCV takes as input.

In [13]:
from sklearn.model_selection import GridSearchCV
import numpy as np

max_features_range = np.arange(1,6,1)
n_estimators_range = np.arange(10,210,10)
param_grid = dict(max_features=max_features_range, n_estimators=n_estimators_range)

rf = RandomForestClassifier()

grid = GridSearchCV(estimator=rf, param_grid=param_grid, cv=5)
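
To get a feel for the cost of this search before fitting, the grid can be enumerated with `ParameterGrid` from scikit-learn. This is a rough sketch: it counts only the cross-validation fits and ignores the single extra refit on the full training set.

```python
import numpy as np
from sklearn.model_selection import ParameterGrid

max_features_range = np.arange(1, 6, 1)      # 1, 2, 3, 4, 5
n_estimators_range = np.arange(10, 210, 10)  # 10, 20, ..., 200
param_grid = dict(max_features=max_features_range, n_estimators=n_estimators_range)

n_candidates = len(ParameterGrid(param_grid))  # every combination in the grid
cv = 5
print(n_candidates, "candidates,", n_candidates * cv, "cross-validation fits")
```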

In [14]:
grid.fit(X_train, Y_train)

In [15]:
print("The best parameters are %s with a score of %0.2f"
      % (grid.best_params_, grid.best_score_))

The best parameters are {'max_features': 1, 'n_estimators': 60} with a score of 0.91
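
Note that `best_score_` is the mean cross-validated accuracy over folds of the training data, so it is not a test-set result. A minimal sketch of checking the tuned model on the held-out split follows; the reduced grid and the `random_state=0` seeds are assumptions made here for speed and repeatability, not the notebook's settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, Y = make_classification(n_samples=200, n_classes=2, n_features=10,
                           n_redundant=0, random_state=1)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=0)

# Deliberately small grid (assumption) so the sketch runs quickly
small_grid = {"max_features": [1, 3, 5], "n_estimators": [10, 50]}
grid = GridSearchCV(RandomForestClassifier(random_state=0), small_grid, cv=5)
grid.fit(X_train, Y_train)

# refit=True (the default) retrains the best candidate on all of X_train,
# so grid.score and grid.predict delegate to grid.best_estimator_
test_accuracy = grid.score(X_test, Y_test)
print(grid.best_params_, round(grid.best_score_, 3), round(test_accuracy, 3))
```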


### Dataframe of Grid search parameters and their Accuracy scores

In [16]:
import pandas as pd

# Combine the parameter settings with their mean cross-validated accuracy
grid_results = pd.concat(
    [pd.DataFrame(grid.cv_results_["params"]),
     pd.DataFrame(grid.cv_results_["mean_test_score"], columns=["Accuracy"])],
    axis=1)
grid_results.head()

| | max_features | n_estimators | Accuracy |
|---|---|---|---|
| 0 | 1 | 10 | 0.80625 |
| 1 | 1 | 20 | 0.84375 |
| 2 | 1 | 30 | 0.84375 |
| 3 | 1 | 40 | 0.8875 |
| 4 | 1 | 50 | 0.85 |
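
As an alternative sketch, `cv_results_` is a dict of equal-length arrays and can be passed to `pd.DataFrame` directly, which also keeps the per-candidate standard deviation and rank. The tiny grid, `cv=3`, and the seed below are assumptions chosen only to keep the example fast.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, Y = make_classification(n_samples=200, n_classes=2, n_features=10,
                           n_redundant=0, random_state=1)

# Tiny grid (assumption) to keep the example fast
grid = GridSearchCV(RandomForestClassifier(random_state=0),
                    {"max_features": [1, 5], "n_estimators": [10, 30]}, cv=3)
grid.fit(X, Y)

# One row per candidate; parameter columns are prefixed with "param_"
results = pd.DataFrame(grid.cv_results_)
cols = ["param_max_features", "param_n_estimators",
        "mean_test_score", "std_test_score", "rank_test_score"]
ranked = results[cols].sort_values("rank_test_score")
print(ranked)
```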


### Preparing data for making contour plots

In [17]:
grid_contour = grid_results.groupby(['max_features','n_estimators']).mean()
grid_contour

| max_features | n_estimators | Accuracy |
|---|---|---|
| 1 | 10 | 0.80625 |
| 1 | 20 | 0.84375 |
| 1 | 30 | 0.84375 |
| 1 | 40 | 0.88750 |
| 1 | 50 | 0.85000 |
| ... | ... | ... |
| 5 | 160 | 0.89375 |
| 5 | 170 | 0.88750 |
| 5 | 180 | 0.89375 |
| 5 | 190 | 0.88750 |


### Pivoting the data

In [19]:
grid_reset = grid_contour.reset_index()
grid_reset.columns = ['max_features', 'n_estimators', 'Accuracy']
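# Keyword arguments are required by recent pandas versions (positional arguments were removed in 2.0)
grid_pivot = grid_reset.pivot(index='max_features', columns='n_estimators')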
grid_pivot

Accuracy by max_features (rows) and n_estimators (columns):

| max_features | 10 | 20 | 30 | 40 | 50 | 60 | 70 | 80 | 90 | 100 | 110 | 120 | 130 | 140 | 150 | 160 | 170 | 180 | 190 | 200 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.80625 | 0.84375 | 0.84375 | 0.8875 | 0.85 | 0.90625 | 0.86875 | 0.8625 | 0.8875 | 0.8875 | 0.875 | 0.89375 | 0.8875 | 0.89375 | 0.875 | 0.90625 | 0.86875 | 0.8875 | 0.8875 | 0.8875 |
| 2 | 0.88125 | 0.8625 | 0.88125 | 0.88125 | 0.875 | 0.88125 | 0.89375 | 0.88125 | 0.88125 | 0.9 | 0.89375 | 0.8875 | 0.8875 | 0.89375 | 0.88125 | 0.89375 | 0.875 | 0.89375 | 0.88125 | 0.8875 |
| 3 | 0.8625 | 0.89375 | 0.875 | 0.90625 | 0.9 | 0.8875 | 0.8875 | 0.8875 | 0.8875 | 0.89375 | 0.8875 | 0.9 | 0.89375 | 0.9 | 0.89375 | 0.8875 | 0.8875 | 0.89375 | 0.8875 | 0.8875 |
| 4 | 0.86875 | 0.875 | 0.89375 | 0.9 | 0.8875 | 0.9 | 0.89375 | 0.8875 | 0.9 | 0.89375 | 0.89375 | 0.89375 | 0.89375 | 0.8875 | 0.8875 | 0.89375 | 0.89375 | 0.8875 | 0.89375 | 0.8875 |
| 5 | 0.8875 | 0.9 | 0.8875 | 0.8875 | 0.8875 | 0.8875 | 0.89375 | 0.89375 | 0.8875 | 0.89375 | 0.89375 | 0.8875 | 0.8875 | 0.89375 | 0.89375 | 0.89375 | 0.8875 | 0.89375 | 0.8875 | 0.89375 |
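
The pivot step can also name the value column explicitly, which gives flat column labels instead of a two-level header. The sketch below uses a toy long-format table with made-up accuracies (hypothetical values, not the grid search results) to show the reshaping and how the plot axes fall out of it.

```python
import pandas as pd

# Toy long-format results (hypothetical values, not from the grid search above)
long_df = pd.DataFrame({
    "max_features": [1, 1, 2, 2],
    "n_estimators": [10, 20, 10, 20],
    "Accuracy":     [0.80, 0.84, 0.88, 0.86],
})

# Naming the value column yields flat column labels instead of a MultiIndex
wide = long_df.pivot(index="max_features", columns="n_estimators", values="Accuracy")

x = wide.columns.values  # n_estimators values
y = wide.index.values    # max_features values
z = wide.values          # 2-D array: rows follow y, columns follow x
print(wide)
```

With flat columns, `wide.columns.values` replaces the `columns.levels[1]` lookup that a two-level header requires.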


In [20]:
x = grid_pivot.columns.levels[1].values
y = grid_pivot.index.values
z = grid_pivot.values
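
With the axes and the score matrix extracted, the best grid cell can also be located directly in the array before plotting. A minimal sketch, using small hypothetical arrays shaped like `x`, `y` and `z`:

```python
import numpy as np

# Hypothetical axes and scores, shaped like x, y, z above
x = np.array([10, 20, 30])          # n_estimators
y = np.array([1, 2])                # max_features
z = np.array([[0.80, 0.84, 0.84],
              [0.88, 0.86, 0.90]])  # z[i, j] pairs y[i] with x[j]

# argmax works on the flattened array; unravel_index maps it back to (row, column)
i, j = np.unravel_index(np.argmax(z), z.shape)
print(f"best: max_features={y[i]}, n_estimators={x[j]}, accuracy={z[i, j]}")
```

If several cells tie for the maximum, `np.argmax` returns the first one in row-major order.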

## 2D Contour Plot

In [22]:
import plotly.graph_objects as go

# X and Y axes labels
layout = go.Layout(
    xaxis=go.layout.XAxis(
        title=go.layout.xaxis.Title(text='n_estimators')
    ),
    yaxis=go.layout.YAxis(
        title=go.layout.yaxis.Title(text='max_features')
    )
)

fig = go.Figure(data=[go.Contour(z=z, x=x, y=y)], layout=layout)

fig.update_layout(title='Hyperparameter tuning', autosize=False,
                  width=500, height=500,
                  margin=dict(l=65, r=50, b=65, t=90))

fig.show()

## 3D Surface Plot

In [23]:
import plotly.graph_objects as go

# Axis titles for a 3D figure are set through `scene` below, so the 2D layout is not reused
fig = go.Figure(data=[go.Surface(z=z, y=y, x=x)])
fig.update_layout(title='Hyperparameter tuning',
                  scene = dict(
                    xaxis_title='n_estimators',
                    yaxis_title='max_features',
                    zaxis_title='Accuracy'),
                  autosize=False,
                  width=800, height=800,
                  margin=dict(l=65, r=50, b=65, t=90))
fig.show()