<a href="https://colab.research.google.com/github/muntazirabidi/machine_learning_tutorials/blob/main/multiclass_classification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Multiclass Classification

Multiclass classification is the task of classifying instances into one of three or more classes. For example, classifying a set of images as "dog," "cat," "bird," or "fish" would be a multiclass classification problem.

There are several approaches to multiclass classification, including one-versus-all and one-versus-one. In one-versus-all (OvA) classification, also called one-versus-rest (OvR), a separate binary classifier is trained for each class, with that class treated as "positive" and all other classes as "negative." For example, in the "dog," "cat," "bird," "fish" classification problem, four classifiers would be trained: one to distinguish dogs from non-dogs, one to distinguish cats from non-cats, one to distinguish birds from non-birds, and one to distinguish fish from non-fish.

In one-versus-one (OvO) classification, a binary classifier is trained for every pair of classes, so N classes require N × (N − 1) / 2 classifiers. For the "dog," "cat," "bird," "fish" classification problem, this would result in six classifiers being trained: one to distinguish dogs from cats, one to distinguish dogs from birds, one to distinguish dogs from fish, one to distinguish cats from birds, one to distinguish cats from fish, and one to distinguish birds from fish.

There are pros and cons to both approaches. OvA can be more efficient, as it requires training fewer classifiers (one per class), but each classifier is trained on the full training set, and it can be less effective if the "negative" class is highly diverse. OvO requires training more classifiers, but each classifier only has to distinguish between two classes and only sees the instances of those two classes, which can make it more effective.
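To make the classifier counts concrete, here is a minimal sketch that enumerates the binary problems each strategy would create for the four example classes. It uses only the standard library, and the variable names are illustrative.

```python
from itertools import combinations

classes = ["dog", "cat", "bird", "fish"]

# One-versus-all: one binary problem per class (that class vs. everything else).
ova_tasks = [(positive, "not-" + positive) for positive in classes]

# One-versus-one: one binary problem per unordered pair of classes.
ovo_tasks = list(combinations(classes, 2))

print(len(ova_tasks), ova_tasks)
print(len(ovo_tasks), ovo_tasks)
```

The OvO count grows quadratically with the number of classes, while the OvA count grows linearly.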

> Some algorithms (such as Support Vector Machine classifiers) scale poorly with the size of the training set. For these algorithms, OvO is preferred because it is faster to train many classifiers on small training sets than to train a few classifiers on large training sets. For most binary classification algorithms, however, OvR is preferred.
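The trade-off in the note above can be illustrated with a back-of-the-envelope calculation. The sketch below assumes, purely for illustration, that the cost of training one binary classifier grows as `n ** exponent` with the number of training instances `n`, and that the classes are balanced. The function `relative_training_cost` and the exponent values are hypothetical, not measurements of any real library.

```python
def relative_training_cost(n_classes, n_samples, exponent):
    """Return (ovr_cost, ovo_cost) under an assumed cost model of n ** exponent."""
    # OvR: one classifier per class, each trained on the full training set.
    ovr_cost = n_classes * n_samples ** exponent
    # OvO: one classifier per pair, each trained only on the two classes involved
    # (about 2 / n_classes of the data when the classes are balanced).
    n_pairs = n_classes * (n_classes - 1) // 2
    samples_per_pair = 2 * n_samples / n_classes
    ovo_cost = n_pairs * samples_per_pair ** exponent
    return ovr_cost, ovo_cost


for exponent in (1, 2, 3):
    ovr_cost, ovo_cost = relative_training_cost(
        n_classes=10, n_samples=60_000, exponent=exponent
    )
    print(f"exponent={exponent}: OvR/OvO cost ratio = {ovr_cost / ovo_cost:.2f}")
```

Under this assumed model, the advantage of OvO grows quickly as the exponent increases, which is the intuition behind preferring OvO for algorithms that scale poorly with training set size.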


Scikit-Learn detects when you try to use a binary classification algorithm for a multiclass classification task, and it automatically runs OvR or OvO depending on the algorithm. Let's try this with an SVM. We first load the MNIST data and split it into training and test sets.

In [15]:
from sklearn.datasets import fetch_openml
import numpy as np
from IPython.display import set_matplotlib_formats
import matplotlib as mpl
import matplotlib.pyplot as plt 


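# Load MNIST as a pandas DataFrame/Series (as_frame=True) so that .to_numpy() is available
mnist = fetch_openml('mnist_784', version=1, as_frame=True)
X, y = mnist['data'], mnist['target']
y = y.astype(np.uint8)  # the labels are loaded as strings; cast them to integers

# Use the first 60,000 images for training and the remaining 10,000 for testing
X_train, X_test = X.to_numpy()[:60000], X.to_numpy()[60000:]
y_train, y_test = y.to_numpy()[:60000], y.to_numpy()[60000:]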


In [16]:
from sklearn.svm import SVC

svm_clf = SVC()
svm_clf.fit(X_train, y_train)

some_digit = X_train[0]  # first image in the training set
svm_clf.predict([some_digit])

array([5], dtype=uint8)

In [17]:
some_digit_scores = svm_clf.decision_function([some_digit])
some_digit_scores

array([[ 1.72501977,  2.72809088,  7.2510018 ,  8.3076379 , -0.31087254,
         9.3132482 ,  1.70975103,  2.76765202,  6.23049537,  4.84771048]])
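`SVC` returns ten scores here, one per class, even though it trains one-versus-one classifiers internally. A minimal sketch on the small `load_digits` dataset (used here only as a quick stand-in for MNIST) shows how the `decision_function_shape` parameter switches between the aggregated per-class scores and the raw pairwise scores:

```python
from sklearn.datasets import load_digits
from sklearn.svm import SVC

# Small 10-class digits dataset bundled with scikit-learn (not MNIST).
X_small, y_small = load_digits(return_X_y=True)

# Default shape "ovr": one aggregated score per class.
ovr_scores = SVC(decision_function_shape="ovr").fit(X_small, y_small).decision_function(X_small[:1])

# Shape "ovo": one score per pair of classes, 10 * 9 / 2 = 45 columns.
ovo_scores = SVC(decision_function_shape="ovo").fit(X_small, y_small).decision_function(X_small[:1])

print(ovr_scores.shape)
print(ovo_scores.shape)
```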

In [6]:
np.argmax(some_digit_scores)

5

In [7]:
svm_clf.classes_

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=uint8)

In [10]:
svm_clf.classes_[5]

5
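The cells above show that the index returned by `np.argmax` has to be looked up in `classes_` to obtain the label. With MNIST the two coincide because the labels are the integers 0 to 9. A minimal sketch with string labels makes the lookup visible. It uses `LogisticRegression` on the iris dataset as an assumed stand-in, not the `SVC` model trained above.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()
X_iris = iris.data
y_names = iris.target_names[iris.target]  # string labels instead of integers

clf = LogisticRegression(max_iter=1000).fit(X_iris, y_names)

scores = clf.decision_function(X_iris[:1])   # one score per class
best_index = int(np.argmax(scores))          # position of the highest score
predicted_label = clf.classes_[best_index]   # map the position back to a label

print(clf.classes_)
print(best_index, predicted_label)
```

Here `best_index` is only a position in `classes_`; the label itself comes from the lookup.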

In [13]:
from sklearn.multiclass import OneVsRestClassifier
ovr_clf = OneVsRestClassifier(SVC())
ovr_clf.fit(X_train, y_train)
ovr_clf.predict([some_digit])

array([5], dtype=uint8)
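To check how many binary classifiers a wrapper actually trains, you can inspect its `estimators_` attribute. The sketch below compares `OneVsRestClassifier` and `OneVsOneClassifier` on the small `load_digits` dataset, chosen here only because it trains in seconds; the variable names are illustrative.

```python
from sklearn.datasets import load_digits
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import SVC

# Small 10-class digits dataset bundled with scikit-learn (not MNIST).
X_small, y_small = load_digits(return_X_y=True)

ovr_demo = OneVsRestClassifier(SVC()).fit(X_small, y_small)
ovo_demo = OneVsOneClassifier(SVC()).fit(X_small, y_small)

print(len(ovr_demo.estimators_))  # one classifier per class
print(len(ovo_demo.estimators_))  # one classifier per pair of classes
```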

In [18]:

from sklearn.linear_model import SGDClassifier
sgd_clf = SGDClassifier(random_state=42)

sgd_clf.fit(X_train, y_train)
sgd_clf.predict([some_digit])

array([3], dtype=uint8)

In [19]:
# Simply scaling the inputs can increase the accuracy.
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train.astype(np.float64))
cross_val_score(sgd_clf, X_train_scaled, y_train, cv=3, scoring='accuracy')

array([0.8983, 0.891 , 0.9018])
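The cell above fits the scaler on the whole training set before cross-validating, so each validation fold has already influenced the scaling statistics. The effect is usually small, but one possible variant is to put the scaler and the classifier in a pipeline so that scaling is refit inside every fold. This is a sketch on the small `load_digits` dataset (an assumed stand-in for MNIST), not a rerun of the experiment above.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Small 10-class digits dataset bundled with scikit-learn (not MNIST).
X_small, y_small = load_digits(return_X_y=True)

# The scaler is refit on the training part of every fold, so no statistics
# from the held-out part leak into the preprocessing step.
scaled_sgd = make_pipeline(StandardScaler(), SGDClassifier(random_state=42))
pipeline_scores = cross_val_score(scaled_sgd, X_small, y_small, cv=3, scoring="accuracy")

print(pipeline_scores)
```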

## Error Analysis
In an ML project, you would normally start by exploring data-preparation options, trying out multiple models, shortlisting the best ones, fine-tuning their hyperparameters using `GridSearchCV`, and automating as much as possible. Suppose you have now found a promising model and want to improve it. One way to do this is to analyse the types of errors it makes. First, look at the confusion matrix.

In [None]:
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_predict

y_train_pred = cross_val_predict(sgd_clf, X_train_scaled, y_train, cv=3)
conf_mx = confusion_matrix(y_train, y_train_pred)

conf_mx
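Raw counts in a confusion matrix are dominated by the correct predictions on the diagonal. One common way to focus on the errors is to divide each row by the number of instances in that class and then zero out the diagonal. The sketch below does this on a small made-up 3-class matrix; the counts in `conf_mx_demo` are hypothetical and are not the MNIST results.

```python
import numpy as np

# Hypothetical confusion matrix: rows are true classes, columns are predictions.
conf_mx_demo = np.array([
    [50,  2,  3],
    [ 4, 40,  6],
    [ 1,  9, 35],
])

# Convert counts to per-class error rates so large classes do not dominate.
row_sums = conf_mx_demo.sum(axis=1, keepdims=True)
norm_conf_mx = conf_mx_demo / row_sums

# Remove the correct predictions to keep only the errors.
np.fill_diagonal(norm_conf_mx, 0)

# Locate the most frequent confusion (true class, predicted class).
worst_true, worst_pred = np.unravel_index(np.argmax(norm_conf_mx), norm_conf_mx.shape)
print(worst_true, worst_pred, norm_conf_mx[worst_true, worst_pred])
```

The same steps could be applied to `conf_mx` to see which pairs of digits the classifier confuses most often.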