# **GridSearchCV on RandomForestClassifier**

### In this kernel we will be applying GridSearchCV for hyperparameter tuning of a classifier

>*I will be using RandomForestClassifier, but any classifier can be used.*

In [None]:
import pandas as pd
import seaborn as sns
import numpy as np

from pprint import pprint
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, accuracy_score

In [None]:
data = pd.read_csv('../input/data.csv')
data.head()

### Checking columns for NaN

We will check each column for missing values and remove any column that is not required.

In [None]:
data.isnull().sum()

### Dropping Columns

The columns `id` and `Unnamed: 32` will be dropped.

`id` is not required for classification, and `Unnamed: 32` contains NaN values.

In [None]:
df = data.drop(['Unnamed: 32','id'], axis=1)

print("Final Columns in Dataset")
print('='*50)
print(df.isnull().sum())
print('='*50)
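Rather than naming the empty column by hand, columns that are entirely NaN can also be found programmatically. The following is a minimal sketch on a toy frame; the frame `toy` and its values are hypothetical stand-ins for the real data, not taken from it.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real dataset (hypothetical values).
toy = pd.DataFrame({
    "id": [1, 2, 3],
    "diagnosis": ["M", "B", "M"],
    "radius_mean": [17.9, 11.4, 20.5],
    "Unnamed: 32": [np.nan, np.nan, np.nan],
})

# Columns in which every value is missing carry no information.
empty_cols = toy.columns[toy.isnull().all()].tolist()

# Drop the all-NaN columns together with the identifier column.
cleaned = toy.drop(columns=empty_cols + ["id"])

print(empty_cols)
print(cleaned.columns.tolist())
```

This only removes columns that are completely empty; columns with a few missing values would need separate handling.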

### Splitting the data into train and test sets

We convert the target column, diagnosis, to binary values: 1 for 'M' and 0 otherwise.

* X -> attributes or features that will help predict our target column diagnosis
* y -> Target column

In [None]:
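# Features: every column except the target
X = df.drop(columns='diagnosis')
# Target: 1 where diagnosis is 'M', 0 otherwise
y = np.where(df['diagnosis'] == 'M', 1, 0).astype(int)

# Hold out 20% of the rows for testing; random_state makes the split repeatable
X_train, X_test, y_Train, y_Test = train_test_split(X, y, test_size=0.2, random_state=5)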
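When the two classes are not equally frequent, a plain random split can leave the test set with a different class ratio than the training set. One option, not used in this kernel, is the `stratify` argument of `train_test_split`. The sketch below uses synthetic arrays (`X_demo`, `y_demo` are hypothetical names and data) to show that the class ratio is preserved.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data: 100 rows, 4 features, 30% positive class (illustrative only).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 4))
y_demo = np.array([1] * 30 + [0] * 70)

# stratify=y_demo keeps the class proportions the same in both splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=5, stratify=y_demo
)

print("positive share in train:", y_tr.mean())
print("positive share in test :", y_te.mean())
```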

### Classifier RandomForest

I will be using RandomForestClassifier for this demo. You can choose any classifier that needs parameter tuning.

The default parameters are also displayed.

In [None]:
model = RandomForestClassifier()

print("Default Parameters ")
print('='*50)

pprint(model.get_params())

print('='*50)

### Parameters of classifier and their possible values for tuning

*These are based on RandomForestClassifier.*
>I will be applying GridSearchCV to tune bootstrap, n_estimators, criterion, min_samples_leaf and max_features.

You should choose the parameters that suit your own classifier.

In [None]:
bootstrap_v = [True, False]
n_estimators_v = list(range(100,2000,200))
criterion = ['gini', 'entropy']
min_sample_leaf_v = list(range(1,5,2))
max_features_v = ['sqrt', 'log2']


>Building the parameter grid (a dictionary) to pass to GridSearchCV

In [None]:
grid_params  = {
    'bootstrap' : bootstrap_v,
    'n_estimators' : n_estimators_v,
    'criterion' : criterion,
    'min_samples_leaf' : min_sample_leaf_v,
    'max_features' : max_features_v
}

print("Tuning Parameters")
print('='*50)

pprint(grid_params)
print('='*50)
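Grid search tries every combination of the listed values, so it helps to count the combinations before fitting. The sketch below rebuilds the same grid under a separate name (`grid_params_demo`, a hypothetical copy so the snippet runs on its own) and counts candidates with scikit-learn's `ParameterGrid`; the fold count of 3 is an assumption for illustration.

```python
from sklearn.model_selection import ParameterGrid

# Same values as the grid above, repeated here so the snippet is self-contained.
grid_params_demo = {
    'bootstrap': [True, False],
    'n_estimators': list(range(100, 2000, 200)),   # 100, 300, ..., 1900
    'criterion': ['gini', 'entropy'],
    'min_samples_leaf': list(range(1, 5, 2)),      # 1, 3
    'max_features': ['sqrt', 'log2'],
}

# Number of distinct parameter combinations: 2 * 10 * 2 * 2 * 2
n_candidates = len(ParameterGrid(grid_params_demo))

# With k-fold cross-validation every candidate is fitted once per fold.
n_folds = 3  # assumed fold count for this illustration
print("candidates:", n_candidates)          # 160
print("total fits:", n_candidates * n_folds)  # 480
```

Because the cost grows multiplicatively with every added parameter, trimming even one list can shorten the search noticeably.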

## Applying GridSearchCV to the model and fitting it

>We pass our classifier as the estimator.

* estimator = the model to tune with GridSearchCV
* param_grid = the parameter grid to search over
* cv = the number of cross-validation folds
* verbose = how much progress output is printed while fitting

In [None]:
grid_search = GridSearchCV(estimator=model, param_grid=grid_params, cv=3, verbose=1)
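GridSearchCV accepts a few more arguments that are often useful, such as `scoring` (the metric used to rank candidates) and `n_jobs` (how many fits run in parallel). The following is a small sketch on synthetic data with a deliberately tiny grid; `X_demo`, `y_demo`, `small_grid` and `demo_search` are hypothetical names, and the values are chosen only so the example runs quickly.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic binary classification data (illustrative only).
X_demo, y_demo = make_classification(n_samples=200, n_features=10, random_state=0)

# A tiny grid: 2 * 2 = 4 candidates.
small_grid = {'n_estimators': [10, 20], 'min_samples_leaf': [1, 3]}

demo_search = GridSearchCV(
    estimator=RandomForestClassifier(random_state=0),
    param_grid=small_grid,
    cv=3,                # 3-fold cross-validation
    scoring='accuracy',  # metric used to rank the candidates
    n_jobs=1,            # assumption: raise to -1 to use all CPU cores
    verbose=1,           # print a short progress summary
)
demo_search.fit(X_demo, y_demo)

print(demo_search.best_params_)
print("mean CV accuracy of best candidate:", round(demo_search.best_score_, 3))
```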

In [None]:
grid_search.fit(X_train, y_Train)

print('Best Parameters for our classifier')
print('='*50)
print(grid_search.best_params_)
print('='*50)
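Besides `best_params_`, a fitted search stores the score of every candidate in `cv_results_`, which is convenient to inspect as a DataFrame. The sketch below shows one way to do that on synthetic data; the names `demo_search`, `results` and `summary` are hypothetical and the grid is kept tiny on purpose.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic data and a small grid so the example runs quickly.
X_demo, y_demo = make_classification(n_samples=200, n_features=10, random_state=0)
demo_search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={'n_estimators': [10, 20], 'criterion': ['gini', 'entropy']},
    cv=3,
)
demo_search.fit(X_demo, y_demo)

# cv_results_ is a dict of arrays; one row per candidate once in a DataFrame.
results = pd.DataFrame(demo_search.cv_results_)

# Keep the most informative columns and order candidates from best to worst.
summary = (
    results[['params', 'mean_test_score', 'std_test_score', 'rank_test_score']]
    .sort_values('rank_test_score')
    .reset_index(drop=True)
)
print(summary)
```

Looking at `std_test_score` next to `mean_test_score` gives a sense of whether the top candidates really differ or are within fold-to-fold noise.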

## Function that evaluates our classifier on the test set

>It prints the parameters of the classifier, the classification report and the accuracy score

In [None]:
def evaluate(model, X, y):
    """Print the model's parameters, classification report and accuracy on (X, y)."""
    pprint(model.get_params())
    print('=='*50)

    # Predict on the given features and compare with the true labels
    predictions = model.predict(X)
    report = classification_report(y, predictions)
    score = accuracy_score(y_true=y, y_pred=predictions)

    print(report)
    print('=='*50)
    print("{} {:0.2f}%".format("Accuracy Score :: ", score*100))

### Evaluation of our best Estimator Selected from GridSearchCV

In [None]:
evaluate(grid_search.best_estimator_, X_test, y_Test)

### Evaluation of our base Model

In [None]:
model.fit(X_train, y_Train)
evaluate(model, X_test, y_Test)
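A single train/test split with an unseeded RandomForestClassifier gives slightly different accuracies on every run, which makes small differences between two models hard to interpret. One way to get a steadier comparison is to fix `random_state` and use cross-validation. The sketch below does this on synthetic data; `base`, `tuned` and their parameter values are hypothetical examples, not the parameters selected by the search in this kernel.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic binary classification data (illustrative only).
X_demo, y_demo = make_classification(n_samples=300, n_features=10, random_state=0)

# Fixing random_state makes each forest reproducible across runs.
base = RandomForestClassifier(n_estimators=50, random_state=0)
tuned = RandomForestClassifier(
    n_estimators=50, min_samples_leaf=3, max_features='log2', random_state=0
)

# 5-fold cross-validated accuracy for both models on the same folds.
base_scores = cross_val_score(base, X_demo, y_demo, cv=5)
tuned_scores = cross_val_score(tuned, X_demo, y_demo, cv=5)

print("base : {:.3f} +/- {:.3f}".format(base_scores.mean(), base_scores.std()))
print("tuned: {:.3f} +/- {:.3f}".format(tuned_scores.mean(), tuned_scores.std()))
```

If the difference between the two means is smaller than the spread across folds, the tuned model may not be meaningfully better.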

## Conclusion

>Our **base model accuracy was 97.37%**, and after *hyperparameter tuning* the accuracy increased to **98.25% for the tuned model**.

This is an improvement of about 0.9 percentage points on the test set.

*These are the scores from when I ran the kernel. Exact numbers will vary between runs, but parameter tuning improves accuracy in most cases.*