
# Exercise Hyperparameter Tuning
## Problem Statement
* For the digits dataset in `sklearn.datasets`, try the following models and find the best-performing one. Also find the optimal parameters for that classifier.

  * from sklearn import svm
  * from sklearn.ensemble import RandomForestClassifier
  * from sklearn.linear_model import LogisticRegression
  * from sklearn.naive_bayes import GaussianNB
  * from sklearn.naive_bayes import MultinomialNB
  * from sklearn.tree import DecisionTreeClassifier

## Let's load the digits dataset

```python
from sklearn.datasets import load_digits

digits = load_digits()
dir(digits)
```

```
['DESCR', 'data', 'images', 'target', 'target_names']
```

## Understanding the dataset
* `digits.DESCR`: description of the dataset
* `digits.data`: contains 1797 examples. Each image is 8x8 pixels, so each example has 64 features (one per pixel)
* `digits.target`: contains the target value (digit label) for each example, so it contains 1797 y labels
* `digits.target_names`: contains the name of each class; since there are 10 possible classes (digits 0 to 9), it contains only 10 names
* `digits.data` is our independent/input (X) variable
* `digits.target` is our dependent/target (y) variable
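A quick way to confirm these shapes is to inspect the arrays directly. This minimal sketch also checks that `data` is just the flattened `images` array and that the pixel values are non-negative (0 to 16), which matters here because `MultinomialNB` only accepts non-negative features.

```python
import numpy as np
from sklearn.datasets import load_digits

digits = load_digits()

# 1797 samples, each an 8x8 image flattened into 64 pixel features
print(digits.data.shape)      # (1797, 64)
print(digits.images.shape)    # (1797, 8, 8)
print(digits.target.shape)    # (1797,)
print(digits.target_names)    # [0 1 2 3 4 5 6 7 8 9]

# data is the images array reshaped to one row per sample
flat = digits.images.reshape(len(digits.images), -1)
print(np.array_equal(flat, digits.data))  # True

# Pixel intensities are non-negative, as MultinomialNB requires
print(digits.data.min(), digits.data.max())  # 0.0 16.0
```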

## Let's split the dataset

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hold out 20% of the samples as the test set
X_train, X_test, y_train, y_test = train_test_split(digits.data, digits.target, test_size=0.2)
print("len of X_train is %s" % len(X_train))
print("len of X_test is %s" % len(X_test))
print("len of y_train is %s" % len(y_train))
print("len of y_test is %s" % len(y_test))
```

```
len of X_train is 1437
len of X_test is 360
len of y_train is 1437
len of y_test is 360
```
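Because no `random_state` is passed above, the split (and every score computed from it) changes between runs. A minimal sketch of a reproducible variant, assuming a fixed seed of 42 (an arbitrary choice) and stratifying on the labels so that each digit appears in roughly the same proportion in both sets:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

digits = load_digits()

# random_state=42 is an arbitrary seed; stratify keeps class proportions equal across splits
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=42, stratify=digits.target
)

print(len(X_train), len(X_test))  # 1437 360
# Number of test samples per digit class (0 to 9)
print(np.bincount(y_test))
```

The sizes match the original split (20% of 1797 rounds up to 360 test samples); only the assignment of samples becomes fixed.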


Import the required libraries

```python
from sklearn import svm
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.naive_bayes import MultinomialNB
from sklearn import tree
```

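Let's create a dictionary that pairs each model with the hyperparameter grid to search with `GridSearchCV`. An empty `params` dict means the model is evaluated with its default parameters only.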

```python
# Each entry pairs an estimator with the parameter grid GridSearchCV will search
model_params = {
    'SVM': {
        'model': svm.SVC(gamma='auto'),
        'params': {
            'C': [1, 10, 20],
            'kernel': ['rbf', 'linear']
        }
    },
    'RandomForrestClassifier': {
        'model': RandomForestClassifier(),
        'params': {
            'n_estimators': [1, 5, 10]
        }
    },
    'LogisticRegression': {
        # multi_class='auto' is already the default and the argument is
        # deprecated in scikit-learn >= 1.5, so it is omitted here
        'model': LogisticRegression(solver='liblinear'),
        'params': {
            'C': [1, 5, 10]
        }
    },
    'GaussianNB': {
        'model': GaussianNB(),
        'params': {}
    },
    'MultinomialNB': {
        'model': MultinomialNB(),
        'params': {}
    },
    'DecisionTreeClassifier': {
        'model': tree.DecisionTreeClassifier(),
        'params': {
            'criterion': ['gini', 'entropy']
        }
    }
}
```
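`GridSearchCV` tries every combination in a grid, so its cost is the number of combinations times the number of folds. `ParameterGrid` is the scikit-learn helper that expands such grids; this small sketch uses the SVM grid from above and the empty grid used for the Naive Bayes models to show what gets tried.

```python
from sklearn.model_selection import ParameterGrid

svm_grid = {'C': [1, 10, 20], 'kernel': ['rbf', 'linear']}

# Every combination of C and kernel: 3 x 2 = 6 candidates
for params in ParameterGrid(svm_grid):
    print(params)

# An empty grid yields a single candidate: the model's default parameters
print(list(ParameterGrid({})))  # [{}]

# With cv=5 each candidate is fitted once per fold (plus one final refit
# of the best candidate on the full training set, since refit=True by default)
cv = 5
print(len(ParameterGrid(svm_grid)) * cv)  # 30
```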

```python
from sklearn.model_selection import GridSearchCV

scores = []

# Run a 5-fold cross-validated grid search for every model
for model_name, mp in model_params.items():
    clf = GridSearchCV(mp['model'], mp['params'], cv=5, return_train_score=False)
    clf.fit(X_train, y_train)
    scores.append({
        'model_name': model_name,
        'best_score': clf.best_score_,    # mean CV accuracy of the best parameter combination
        'best_params': clf.best_params_
    })

df = pd.DataFrame(scores, columns=['model_name', 'best_score', 'best_params'])
df
```

|   | model_name | best_score | best_params |
|---|---|---|---|
| 0 | SVM | 0.978426 | {'C': 1, 'kernel': 'linear'} |
| 1 | RandomForrestClassifier | 0.942255 | {'n_estimators': 10} |
| 2 | LogisticRegression | 0.958244 | {'C': 1} |
| 3 | GaussianNB | 0.821148 | {} |
| 4 | MultinomialNB | 0.903971 | {} |
| 5 | DecisionTreeClassifier | 0.855967 | {'criterion': 'entropy'} |
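`best_score` is the mean cross-validated accuracy on the training set only; `X_test` and `y_test` are never used above. Because `GridSearchCV` refits the best parameters on the whole training set by default (`refit=True`), the fitted search object can be scored directly on the held-out split. A minimal sketch for the SVM grid, assuming the same 80/20 split with an arbitrary `random_state=0` added for reproducibility:

```python
from sklearn import svm
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, train_test_split

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=0  # arbitrary seed
)

search = GridSearchCV(
    svm.SVC(gamma='auto'),
    {'C': [1, 10, 20], 'kernel': ['rbf', 'linear']},
    cv=5,
)
search.fit(X_train, y_train)

print("best params:", search.best_params_)
print("mean CV accuracy:", round(search.best_score_, 4))

# score() delegates to best_estimator_, which was refit on all of X_train
test_accuracy = search.score(X_test, y_test)
print("test accuracy:", round(test_accuracy, 4))
```

Comparing the test accuracy with `best_score_` gives a rough check that the chosen parameters were not just a lucky fit to the cross-validation folds.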
