## Multiclass Classification

Some algorithms, such as SGD classifiers, Random Forest classifiers, and naive Bayes classifiers, can handle multiple classes directly.

Others, such as SVMs or basic Logistic Regression, are strictly binary classifiers.

**OvR (one-versus-the-rest strategy)**

One way to build a system that classifies digit images into 10 classes is to train 10 binary classifiers, one for each digit (a 0-detector, a 1-detector, and so on). To classify an image, get the decision score from every classifier for that image and select the class whose classifier outputs the highest score.
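The OvR decision rule can be sketched in plain Python. This is only a toy illustration: the "detectors" below are hypothetical scoring functions on a single number rather than trained binary classifiers, and the names `make_detector` and `predict_ovr` are made up for this sketch.

```python
# Toy OvR sketch: one binary "detector" per class, highest score wins.
# The detectors are hypothetical stand-ins, not trained models.

def make_detector(center):
    """Return a scorer whose output is higher the closer x is to `center`."""
    def score(x):
        return -abs(x - center)
    return score

# One detector per class; the detector for class k responds most strongly near k.
detectors = {label: make_detector(center=label) for label in range(10)}

def predict_ovr(x):
    # Collect every detector's decision score for the same input...
    scores = {label: detector(x) for label, detector in detectors.items()}
    # ...and return the class whose detector gives the highest score.
    return max(scores, key=scores.get)

print(predict_ovr(4.8))  # the detector centered on 5 scores highest
```

With real models, each detector would be a binary classifier trained on "this digit" versus "every other digit", and the score would come from its decision function.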



**OvO (one-versus-one strategy)**

Train a binary classifier for every pair of digits: one to distinguish 0s and 1s, another to distinguish 0s and 2s, another to distinguish 1s and 2s, and so on. If there are n classes, you need to train n × (n − 1) / 2 classifiers. For the 10 MNIST digits, that means 45 classifiers.
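The pair count is easy to check by enumerating the pairs. A minimal sketch using only the standard library (the variable names are illustrative):

```python
from itertools import combinations

digits = range(10)

# Every unordered pair of distinct classes gets its own binary classifier.
pairs = list(combinations(digits, 2))

n = len(digits)
print(len(pairs))            # number of OvO classifiers for 10 classes
print(n * (n - 1) // 2)      # same value from the formula
print(pairs[:3])             # first few pairs: (0, 1), (0, 2), (0, 3)
```

Each OvO classifier only needs the training instances of its two classes, which is the main advantage for algorithms that scale poorly with training set size.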

Scikit-Learn detects when you use a binary classification algorithm for a multiclass classification task, and it automatically runs either OvR or OvO, depending on the algorithm.

In [1]:
from sklearn.svm import SVC
from sklearn.datasets import fetch_openml

mnist = fetch_openml('mnist_784', version=1)
mnist.keys()
X, y = mnist['data'], mnist['target']

In [2]:
X_train, X_test, y_train, y_test = X[:60000], X[60000:], y[:60000], y[60000:]
svm_clf = SVC()
svm_clf.fit(X_train, y_train)

In [3]:
some_image_pxs = mnist['data'].values[0]
some_image = some_image_pxs.reshape(28,-1)

Under the hood, Scikit-Learn used OvO for the SVC above: it trained 45 binary classifiers, got their decision scores for the image, and selected the class that won the most duels.

In [4]:
svm_clf.predict([some_image_pxs])



array(['5'], dtype=object)

In [5]:
scores_per_digits = svm_clf.decision_function([some_image_pxs])



In [6]:
scores_per_digits

array([[ 1.72501977,  2.72809088,  7.2510018 ,  8.3076379 , -0.31087254,
         9.3132482 ,  1.70975103,  2.76765202,  6.23049537,  4.84771048]])

In [7]:
import numpy as np
np.argmax(scores_per_digits), svm_clf.classes_

(5, array(['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'], dtype=object))
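Note that argmax returns a position in the score array, not a label. Here position 5 happens to coincide with the label '5' because the classes are sorted digits; in general the label has to be looked up in classes_. A small sketch, with the scores and labels copied by hand from the outputs above into plain lists so that no trained model is needed:

```python
# Scores and labels copied from the outputs above; plain lists stand in
# for the NumPy arrays returned by svm_clf.
scores = [1.72501977, 2.72809088, 7.2510018, 8.3076379, -0.31087254,
          9.3132482, 1.70975103, 2.76765202, 6.23049537, 4.84771048]
classes = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

# Position of the highest score (what np.argmax computes on a 1-D array).
best_index = max(range(len(scores)), key=scores.__getitem__)

# Translate the position into the actual class label.
predicted_label = classes[best_index]
print(best_index, predicted_label)
```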

If you want to force Scikit-Learn to use OvR or OvO, use the OneVsRestClassifier or OneVsOneClassifier class: create an instance and pass a classifier to its constructor.
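A minimal sketch of the wrapper approach, assuming scikit-learn is installed. To keep it fast, it uses the small 8x8 digits dataset bundled with scikit-learn and only the first 1,000 samples, not the MNIST data from the cells above; the names `X_small`, `y_small`, and `ovr_clf` are chosen for this sketch.

```python
from sklearn.datasets import load_digits
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

# Small built-in digits dataset (8x8 images), used only to keep the sketch quick.
X_small, y_small = load_digits(return_X_y=True)

# Wrapping SVC forces one-versus-the-rest instead of SVC's default OvO.
ovr_clf = OneVsRestClassifier(SVC())
ovr_clf.fit(X_small[:1000], y_small[:1000])

# One fitted binary SVC per class.
print(len(ovr_clf.estimators_))

# Predict the class of one held-out image.
print(ovr_clf.predict(X_small[1000:1001]))
```

OneVsOneClassifier from the same module is used the same way when OvO is wanted instead.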

In [8]:
from sklearn.linear_model import SGDClassifier

sgd_clf = SGDClassifier()
sgd_clf.fit(X_train, y_train)
sgd_clf.predict([some_image_pxs])



array(['5'], dtype='<U1')

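Remember that SGDClassifier accepts multiclass targets directly, so no wrapper class is needed. Internally, Scikit-Learn trains one binary classifier per class (an OvR scheme), which is why decision_function() returns 10 scores per instance.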

In [9]:
sgd_clf.decision_function([some_image_pxs])



array([[ -9661.56965908, -22048.07168665,  -7829.77134425,
          1550.53509737, -27334.55827455,   5483.16990077,
        -24043.27516549, -24215.54666256,  -8891.68540738,
         -8752.76852579]])

In [11]:
# To evaluate the classifier cross_val_score() can be used

from sklearn.model_selection import cross_val_score
cross_val_score(sgd_clf, X_train, y_train, cv=3, scoring="accuracy")

array([0.8798, 0.8793, 0.8676])

A classifier guessing at random would reach about 10% accuracy, so scoring between 86% and 88% on every fold is not bad. Scaling the inputs (standardization or normalization) improves accuracy further.

In [12]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train.astype(np.float64))
cross_val_score(sgd_clf, X_train_scaled, y_train, cv=3, scoring="accuracy")



array([0.90155, 0.8943 , 0.90755])
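What standardization does to a single feature column can be sketched in plain Python: subtract the column mean and divide by the column standard deviation. The pixel values below are hypothetical, and the sketch uses the population standard deviation, which is the estimator StandardScaler uses.

```python
from statistics import fmean, pstdev

# Hypothetical intensities of one pixel across four images.
pixel_column = [0.0, 51.0, 102.0, 255.0]

mean = fmean(pixel_column)
std = pstdev(pixel_column)  # population standard deviation (ddof=0)

# Standardize: zero mean and unit variance for this column.
scaled = [(value - mean) / std for value in pixel_column]

print(scaled)
print(fmean(scaled), pstdev(scaled))  # approximately 0 and 1
```

Gradient-based learners such as SGD tend to converge better when all features share a similar scale, which is consistent with the improved scores above.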

### Error Analysis

In a real project, you would explore the data, prepare it, try out multiple models, shortlist the best ones, fine-tune their hyperparameters using GridSearchCV, and automate as much as possible.

Now assume you have found a promising model and want to improve it. One way to do that is to analyze the types of errors it makes.

In [13]:
# First, get cross-validated predictions; the confusion matrix is built from them
from sklearn.model_selection import cross_val_predict
y_train_pred = cross_val_predict(sgd_clf, X_train_scaled, y_train, cv=3)