#### Model Evaluation

**Evaluation metrics for multi-class classification**

In [None]:
from sklearn import datasets
iris=datasets.load_iris()
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test= train_test_split(iris.data, iris.target, test_size=0.5, random_state=4)

from sklearn.tree import DecisionTreeClassifier
classifier=DecisionTreeClassifier(max_depth=2)
classifier.fit(X_train,Y_train)
Y_pred=classifier.predict(X_test)
print(iris.target_names)

# Confusion matrix: rows are true classes, columns are predicted classes
from sklearn import metrics
from sklearn.metrics import confusion_matrix
cm=confusion_matrix(Y_test,Y_pred)
print(cm)

['setosa' 'versicolor' 'virginica']
[[30  0  0]
 [ 0 19  3]
 [ 0  2 21]]


In [None]:
# Accuracy
print("Accuracy:", metrics.accuracy_score(Y_test, Y_pred))

# Precision for each class (average=None returns one score per class)
print("Precision:", metrics.precision_score(Y_test, Y_pred, average=None))

# Micro-averaged precision: total tp / (tp + fp), pooled over all classes
print("Precision:", metrics.precision_score(Y_test, Y_pred, average='micro'))

# Recall for each class
print("Recall:", metrics.recall_score(Y_test, Y_pred, average=None))

# Micro-averaged recall: total tp / (tp + fn), pooled over all classes
print("Recall:", metrics.recall_score(Y_test, Y_pred, average='micro'))

# F1 score for each class
print("F1 score:", metrics.f1_score(Y_test, Y_pred, average=None))

# Micro-averaged F1 score
print("F1 score:", metrics.f1_score(Y_test, Y_pred, average='micro'))

Accuracy: 0.9333333333333333
Precision: [1.        0.9047619 0.875    ]
Precision: 0.9333333333333333
Recall: [1.         0.86363636 0.91304348]
Recall: 0.9333333333333333
F1 score: [1.         0.88372093 0.89361702]
F1 score: 0.9333333333333333
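
The per-class scores printed above can be recovered directly from the confusion matrix, which may help show what each metric measures. The sketch below hard-codes the matrix from the earlier output and uses only NumPy; the variable names are illustrative and not part of the original notebook.

```python
import numpy as np

# Confusion matrix copied from the output above
# (rows = true classes, columns = predicted classes)
cm = np.array([[30, 0, 0],
               [0, 19, 3],
               [0, 2, 21]])

tp = np.diag(cm)                   # correctly classified samples per class
precision = tp / cm.sum(axis=0)    # column sums = everything predicted as that class
recall = tp / cm.sum(axis=1)       # row sums = everything that truly is that class
f1 = 2 * precision * recall / (precision + recall)
accuracy = tp.sum() / cm.sum()     # pooled over all classes

print("Precision per class:", precision)
print("Recall per class:", recall)
print("F1 per class:", f1)
print("Accuracy:", accuracy)
```

When every sample has exactly one label, as here, micro-averaged precision, recall and F1 all pool the same counts and therefore equal the accuracy, which is why the three `average='micro'` lines print the same value.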


In [None]:
from sklearn.metrics import classification_report
print(classification_report(Y_test,Y_pred, target_names=iris.target_names))

              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        30
  versicolor       0.90      0.86      0.88        22
   virginica       0.88      0.91      0.89        23

    accuracy                           0.93        75
   macro avg       0.93      0.93      0.93        75
weighted avg       0.93      0.93      0.93        75



**Cross validation**

**K-fold cross validation** - as opposed to the usual hold-out method (train-test split)

* Cross-validation is a resampling procedure used to evaluate machine learning models on a limited data sample. The procedure has a single parameter called k that refers to the number of groups that a given data sample is to be split into. 

The general procedure is as follows:

* Shuffle the dataset randomly.
* Split the dataset into k groups.

For each unique group:
* Take the group as the hold-out (test) data set.
* Take the remaining groups as the training data set.
* Fit a model on the training set and evaluate it on the test set.
* Retain the evaluation score and discard the model.

Finally, summarize the skill of the model using the sample of k evaluation scores, typically their mean and standard deviation.
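
The procedure above can be sketched with NumPy alone. This is a minimal illustration, not the notebook's own code: the helper names (`k_fold_indices`, `cross_validate`), the nearest-centroid model and the synthetic two-class data are all assumptions made for the example.

```python
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices and split them into k nearly equal groups."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), k)

def cross_validate(X, y, k, fit, score, seed=0):
    """Return one evaluation score per fold."""
    folds = k_fold_indices(len(X), k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        # All groups except the i-th form the training set
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        model = fit(X[train_idx], y[train_idx])
        # Keep the score; the model itself is discarded after this iteration
        scores.append(score(model, X[test_idx], y[test_idx]))
    return np.array(scores)

# Hypothetical model for the demo: classify by nearest class centroid
def fit_centroids(X, y):
    classes = np.unique(y)
    return classes, np.array([X[y == c].mean(axis=0) for c in classes])

def score_centroids(model, X, y):
    classes, centroids = model
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return float((classes[dists.argmin(axis=1)] == y).mean())

# Synthetic data: two well-separated Gaussian blobs, 50 samples each
rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(5.0, 1.0, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

scores = cross_validate(X, y, k=5, fit=fit_centroids, score=score_centroids)
print("Fold scores:", scores)
print("Mean: %.3f, std: %.3f" % (scores.mean(), scores.std()))
```

In practice scikit-learn's `KFold` and `cross_val_score` in `sklearn.model_selection` cover the same ground; the hand-written loop is only meant to make each step visible.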


**GridSearchCV**

In [None]:
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn import datasets, svm
#https://scikit-learn.org/stable/datasets/index.html
# Load the digit data
digits = datasets.load_digits()
print(digits.DESCR)
print(digits.data.shape)

# View the features of the observation at index 500
print(digits.data[500:501])

# View the target of the observation at index 500
print(digits.target[500:501])

.. _digits_dataset:

Optical recognition of handwritten digits dataset
--------------------------------------------------

**Data Set Characteristics:**

    :Number of Instances: 5620
    :Number of Attributes: 64
    :Attribute Information: 8x8 image of integer pixels in the range 0..16.
    :Missing Attribute Values: None
    :Creator: E. Alpaydin (alpaydin '@' boun.edu.tr)
    :Date: July; 1998

This is a copy of the test set of the UCI ML hand-written digits datasets
https://archive.ics.uci.edu/ml/datasets/Optical+Recognition+of+Handwritten+Digits

The data set contains images of hand-written digits: 10 classes where
each class refers to a digit.

Preprocessing programs made available by NIST were used to extract
normalized bitmaps of handwritten digits from a preprinted form. From a
total of 43 people, 30 contributed to the training set and different 13
to the test set. 32x32 bitmaps are divided into nonoverlapping blocks of
4x4 and the number of on pixels are counted in each blo

In [None]:
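# Create dataset 1: the first 1000 observations, used for the grid search
data1_features = digits.data[:1000]
data1_target = digits.target[:1000]

# Create dataset 2: the remaining observations, held out for the final check
data2_features = digits.data[1000:]
data2_target = digits.target[1000:]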

parameter_candidates = [
  {'C': [1, 10, 100, 1000], 'gamma': [0.001, 0.0001], 'kernel': ['linear','rbf','poly']}
]

# Create a classifier object with the classifier and parameter candidates
clf = GridSearchCV(estimator=svm.SVC(), param_grid=parameter_candidates, 
                   n_jobs=-1, cv=20) # change cv to 10 and see

# Train the classifier on data1's feature and target data
clf.fit(data1_features, data1_target)  

# View the best mean cross-validated accuracy found by the search
print('Best score for data1:', clf.best_score_) 

Best score for data1: 0.984
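
To get a feel for how much work the search above does, the grid can be expanded by hand. The sketch below uses only the standard library and copies the grid and `cv` value from the cell above; the variable names are illustrative. It counts the parameter combinations and the resulting number of model fits.

```python
from itertools import product

# Same grid and fold count as in the GridSearchCV call above
param_grid = {'C': [1, 10, 100, 1000],
              'gamma': [0.001, 0.0001],
              'kernel': ['linear', 'rbf', 'poly']}
cv = 20

# Every combination of one value per parameter is one candidate
names = sorted(param_grid)
candidates = [dict(zip(names, values))
              for values in product(*(param_grid[n] for n in names))]

# Each candidate is fitted once per fold
n_fits = len(candidates) * cv

# SVC ignores gamma for the linear kernel, so some candidates are duplicates
distinct = {
    (p['C'], p['kernel'], None if p['kernel'] == 'linear' else p['gamma'])
    for p in candidates
}

print("Candidates:", len(candidates))
print("Cross-validation fits:", n_fits)
print("Distinct models:", len(distinct))
```

With the default `refit=True`, `GridSearchCV` also fits the best candidate once more on all of `data1` after the search, which is the model exposed as `best_estimator_`.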


In [None]:
# View the best parameters for the model found using grid search
print('Best C:',clf.best_estimator_.C) 
print('Best Kernel:',clf.best_estimator_.kernel)
print('Best Gamma:',clf.best_estimator_.gamma)

Best C: 100
Best Kernel: poly
Best Gamma: 0.0001


In [None]:
# Train a new classifier on data1 with the best parameters, then score it on the unseen data2
best_svc = svm.SVC(C=100, kernel='poly', gamma=0.0001)
best_svc.fit(data1_features, data1_target)
best_svc.score(data2_features, data2_target)

0.9535759096612296