<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Multiclass-Classification" data-toc-modified-id="Multiclass-Classification-1">Multiclass Classification</a></span><ul class="toc-item"><li><span><a href="#Multilabel-Classification" data-toc-modified-id="Multilabel-Classification-1.1">Multilabel Classification</a></span></li></ul></li></ul></div>

# Multiclass Classification
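We've looked at binary classifiers, models that distinguish between two classes. Now we turn to multi-class classifiers, which distinguish between more than two classes. Some algorithms handle multiple classes natively (for example random forests, or Scikit-Learn's `LogisticRegression`), while others are strictly binary at the algorithm level (for example support vector machines, or the linear models trained by `SGDClassifier`). Scikit-Learn estimators built on binary algorithms still accept multi-class targets because they combine several binary classifiers behind the scenes. Let's take a look at how that works.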

Multi-class classifiers can be built from binary classifiers in several ways. For example, if we took the MNIST dataset and wanted a model that classifies the images into 10 classes (0 to 9), we could take two approaches:
- *One-versus-rest (OvR)*: Train 10 separate binary classifiers, one per digit (a '0-detector', a '1-detector', and so on). To classify an input, run it through all 10 classifiers and select the class whose classifier outputs the highest decision score (for example, the '3-detector').
- *One-versus-one (OvO)*: Train a binary classifier for every possible pair of classes: one to distinguish '0' from '1', another for '0' and '2', another for '1' and '2', and so on. If there are $N$ classes, this requires $N \times (N-1) / 2$ binary classifiers. In this example, we would train 45 classifiers and select the class that wins the most 'duels'.
    - An advantage of this approach is that each classifier only needs to be trained on the part of the training set that contains its two classes, rather than on the whole set. For algorithms that scale poorly with the size of the training set, it is faster to train many classifiers on small subsets than a few classifiers on the full set.
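
To make the two decision rules concrete, here is a minimal sketch in plain Python. The scores are made up for illustration, and the duel rule (the class with the higher score wins) is a stand-in for a trained pairwise classifier, not how any particular library decides.

```python
from itertools import combinations

import numpy as np

classes = list(range(10))  # the ten digits 0 to 9

# One-versus-rest: one decision score per class, pick the highest.
# These scores are invented purely for illustration.
ovr_scores = np.array([-2.1, -3.0, -1.2, 1.7, -4.4, 0.3, -2.8, -1.9, -0.6, -3.5])
ovr_prediction = classes[int(np.argmax(ovr_scores))]

# One-versus-one: one classifier per unordered pair of classes.
pairs = list(combinations(classes, 2))
n_classes = len(classes)
assert len(pairs) == n_classes * (n_classes - 1) // 2  # 45 pairs for 10 classes

# Hypothetical duel rule: the class with the higher made-up score wins.
votes = np.zeros(n_classes, dtype=int)
for a, b in pairs:
    winner = a if ovr_scores[a] > ovr_scores[b] else b
    votes[winner] += 1
ovo_prediction = classes[int(np.argmax(votes))]

print(ovr_prediction, len(pairs), ovo_prediction)  # 3 45 3
```

Each of the 45 duels hands out exactly one vote, so the vote counts always sum to the number of pairs.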

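Which strategy is preferable depends on the algorithm. For most binary classification algorithms, one-versus-rest is the usual choice, while algorithms that scale poorly with the size of the training set (such as support vector machines) favor one-versus-one. Scikit-Learn applies a strategy automatically when a binary algorithm is used on a multi-class task, but you can force one yourself: create a `OneVsOneClassifier` or a `OneVsRestClassifier` and pass a classifier (binary or multi-class) to its constructor.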
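
As a rough check of the classifier counts, the sketch below wraps an `SVC` in both strategy classes and counts the fitted binary estimators. It assumes Scikit-Learn's bundled `load_digits` dataset (10 classes) as a small stand-in for MNIST; the variable names are illustrative.

```python
from sklearn.datasets import load_digits
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import SVC

# Small bundled digits dataset with 10 classes (a stand-in for MNIST)
X_digits, y_digits = load_digits(return_X_y=True)

# Wrap the same base classifier in each strategy
ovo_clf = OneVsOneClassifier(SVC()).fit(X_digits, y_digits)
ovr_clf = OneVsRestClassifier(SVC()).fit(X_digits, y_digits)

# Each wrapper stores its fitted binary classifiers in `estimators_`
n_ovo = len(ovo_clf.estimators_)
n_ovr = len(ovr_clf.estimators_)
print(n_ovo, n_ovr)  # 45 10
```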

Let's take a support vector machine classifier as an example. I'm not going to get into support vector machines yet, but SVCs scale poorly with larger training sets, so one-versus-one is preferred for them. We can override this ourselves. For example:

In [53]:
# SVC (one-vs-one)
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# define dataset
X, y = make_classification(n_samples=1000, n_features=10, 
                           n_informative=5, n_redundant=5, n_classes=3, random_state=1)

# define model
# SVC always trains one-vs-one internally; 'ovo' keeps the pairwise decision scores
model = SVC(decision_function_shape='ovo')

# fit model
model.fit(X, y)

# make predictions
model.predict(X)[:10]

array([1, 0, 1, 2, 0, 2, 2, 2, 0, 0])

In [55]:
# SVC (one-vs-rest)

# additional libraries
from sklearn.multiclass import OneVsRestClassifier

# define dataset
X, y = make_classification(n_samples=1000, n_features=10,
                           n_informative=5, n_redundant=5, n_classes=3, random_state=1)

# changing strategy
ovr_svc = OneVsRestClassifier(SVC())

# fit model
ovr_svc.fit(X, y)

# make predictions
ovr_svc.predict(X)[:10]


array([1, 0, 1, 2, 0, 2, 2, 2, 0, 0])

## Multilabel Classification

So far, we've looked at models that assign a single label to each data point. However, sometimes we want a classifier to output several labels for one data point. For example, if we built face recognition software to recognize Alice, Bob, and Charlie, a single-label classifier could only handle pictures that contain one face. A multi-label classifier, given a picture of Alice and Charlie, would output $[1,0,1]$: Alice yes, Bob no, Charlie yes.
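
The indicator format can be sketched with Scikit-Learn's `MultiLabelBinarizer`. The photo contents below are hypothetical and only show how a set of names maps to a 0/1 vector.

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Fix the column order so the output reads [Alice, Bob, Charlie]
people = ["Alice", "Bob", "Charlie"]
mlb = MultiLabelBinarizer(classes=people)

# Hypothetical photos: the first shows Alice and Charlie, the second only Bob
photos = [{"Alice", "Charlie"}, {"Bob"}]
encoded = mlb.fit_transform(photos)
print(encoded)
# [[1 0 1]
#  [0 1 0]]
```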

Take a KNN classifier as an example. Say that we want a classifier that tells us whether a digit image shows a large number (7, 8, or 9), an odd number, or both.

In [58]:
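# data collection
import numpy as np
from sklearn.datasets import fetch_openml

# as_frame=False returns NumPy arrays, so the positional slicing below works
mnist = fetch_openml('mnist_784', version=1, as_frame=False)
X, y = mnist["data"], mnist["target"]
y = y.astype(np.uint8)  # the labels are loaded as strings
X_train, X_test, y_train, y_test = X[:60000], X[60000:], y[:60000], y[60000:]
# assumption: some_digit (used below) is the first image, a 5
some_digit = X[0]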

In [59]:
from sklearn.neighbors import KNeighborsClassifier

y_train_large = (y_train >= 7) # large number
y_train_odd = (y_train % 2 == 1) # odd number
y_multilabel = np.c_[y_train_large, y_train_odd] # creating multi-label

knn_clf = KNeighborsClassifier()
knn_clf.fit(X_train, y_multilabel)

knn_clf.predict([some_digit])

array([[False, False]])

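In this chunk of code, `y_multilabel` holds two target labels for each data point (image): whether the digit is large and whether it is odd. The digit 5 is not large but is odd, so the expected prediction is `[False, True]`. The output shown above gets the first label right but disagrees on the second.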
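
For readers who want to try this without downloading MNIST, here is a smaller sketch of the same idea. It assumes Scikit-Learn's bundled `load_digits` dataset (8x8 images) instead of MNIST, and all variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.neighbors import KNeighborsClassifier

# Bundled 8x8 digits dataset, used here in place of MNIST
X_small, y_small = load_digits(return_X_y=True)

# Two boolean targets per image: "large" (7, 8, 9) and "odd"
y_large = y_small >= 7
y_odd = y_small % 2 == 1
y_two_labels = np.c_[y_large, y_odd]

# KNeighborsClassifier accepts a 2-D target and predicts one value per label
small_knn = KNeighborsClassifier().fit(X_small, y_two_labels)
pred = small_knn.predict(X_small[:1])
print(y_two_labels.shape, pred.shape)  # (1797, 2) (1, 2)
```

The target matrix has one column per label, and each prediction comes back as one row with the same two columns.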

Evaluating a multi-label classifier is similar to evaluating other classifiers. The right metric depends on the problem and what we are looking for, but a common approach is to compute the $F_1$ score for each label and average the results.

In [67]:
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import f1_score

y_train_knn_pred = cross_val_predict(knn_clf, X_train, y_multilabel, cv=3)
f1_score(y_multilabel, y_train_knn_pred, average="macro", zero_division=1)

0.9420401854714066
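
The `average="macro"` argument used above gives every label equal weight. The small sketch below, on made-up labels and predictions, compares it with `average="weighted"`, which weights each label by its support (the number of true positives for that label). Whether weighting is appropriate depends on the problem.

```python
import numpy as np
from sklearn.metrics import f1_score

# Made-up ground truth and predictions for two labels (one column each)
y_true = np.array([[1, 0], [0, 1], [1, 1], [0, 1], [0, 0]])
y_pred = np.array([[1, 0], [0, 0], [1, 1], [0, 1], [0, 1]])

# average=None returns one F1 score per label
per_label = f1_score(y_true, y_pred, average=None)

# Macro: plain mean of the per-label scores
macro = f1_score(y_true, y_pred, average="macro")

# Weighted: mean of the per-label scores weighted by label support
weighted = f1_score(y_true, y_pred, average="weighted")

print(per_label, macro, weighted)
```

Here the first label is predicted perfectly ($F_1 = 1$) and the second has 2 true positives, 1 false positive, and 1 false negative ($F_1 = 2/3$), so the macro average is $5/6$ and the weighted average, with supports 2 and 3, is $0.8$.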