# Chapter 3 - Classification
## MNIST dataset 
### Multilabel Classification System

In some cases, we need our classifier to output **more than one class** per instance.

First, let's fetch the MNIST data:

In [5]:
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

In [2]:
from sklearn.datasets import fetch_openml
mnist = fetch_openml('mnist_784', version=1)
mnist.keys()

dict_keys(['data', 'target', 'feature_names', 'DESCR', 'details', 'categories', 'url'])

Let's create the following variables:
- `X`: the full dataset (all images)
- `y`: the labels for the full dataset
- `X_train`: the training set
- `X_test`: the test set
- `y_train`: the training set labels
- `y_test`: the test set labels

In [3]:
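import numpy as np
X, y = mnist["data"], mnist["target"]
# The labels are loaded as strings, so cast them to integers
y = y.astype(np.uint8)
# The first 60,000 instances form the training set, the rest the test set
X_train, X_test, y_train, y_test = X[:60000], X[60000:], y[:60000], y[60000:]

# Positional indexing assumes X is a NumPy array; recent scikit-learn versions
# return a DataFrame unless fetch_openml is called with as_frame=False
some_digit = X[0]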

Let's look at a simple example, just for illustration purposes.

We create a `y_multilabel` array containing two target labels for each digit image:
- The first indicates whether or not the digit is large (7, 8, or 9).
- The second indicates whether or not it is odd.

Then, we create a `KNeighborsClassifier` and train it using the multiple-targets array.

In [6]:
from sklearn.neighbors import KNeighborsClassifier

y_train_large = (y_train >= 7)
y_train_odd = (y_train % 2 == 1)
y_multilabel = np.c_[y_train_large, y_train_odd]

kn_clf = KNeighborsClassifier()
kn_clf.fit(X_train, y_multilabel)

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
                     metric_params=None, n_jobs=None, n_neighbors=5, p=2,
                     weights='uniform')
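To see what the two-column target array looks like, here is a minimal sketch that applies the same construction to a handful of made-up digit labels. The `toy_*` names and values are hypothetical, chosen only for illustration, and are not taken from MNIST.

```python
import numpy as np

# Hypothetical digit labels, for illustration only
toy_labels = np.array([5, 0, 4, 1, 9, 8], dtype=np.uint8)

toy_large = (toy_labels >= 7)       # True for 7, 8, 9
toy_odd = (toy_labels % 2 == 1)     # True for odd digits

# np.c_ stacks the two 1D boolean arrays as columns: one row per digit
toy_multilabel = np.c_[toy_large, toy_odd]

print(toy_multilabel.shape)  # (6, 2)
print(toy_multilabel)
```

Each row holds the pair (large, odd) for one digit, so the digit 5 becomes `[False, True]` and the digit 9 becomes `[True, True]`.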

Now, we can make a prediction and notice the two output values:

In [7]:
kn_clf.predict([some_digit])

array([[False,  True]])

The prediction is correct: the digit 5 is not large (`False`) and it is odd (`True`).

There are many ways to evaluate a multilabel classifier, and the right metric really depends on the project.

> One approach is to measure the F1 score for each individual label, then simply compute the average score.

In [None]:
from sklearn.model_selection import cross_val_predict
y_train_kn_pred = cross_val_predict(kn_clf, X_train, y_multilabel, cv=3)

In [None]:
from sklearn.metrics import f1_score
f1_score(y_multilabel, y_train_kn_pred, average="macro")

However, this assumes that all labels are equally important, which may not be the case.
If you want to give each label a weight equal to its support (the number of instances that have that target label), simply set `average="weighted"` in the preceding code.
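To make the difference between the two averaging modes concrete, here is a minimal sketch that computes the per-label F1 scores by hand with NumPy. The `y_true` and `y_pred` arrays are small made-up examples, not MNIST results. The two averages should correspond to what `f1_score` returns with `average="macro"` and `average="weighted"`.

```python
import numpy as np

# Hypothetical true and predicted labels: 6 instances, 2 labels each
y_true = np.array([[1, 0],
                   [1, 1],
                   [0, 1],
                   [1, 0],
                   [1, 0],
                   [0, 0]], dtype=bool)
y_pred = np.array([[1, 0],
                   [1, 1],
                   [0, 0],
                   [1, 1],
                   [0, 0],
                   [0, 0]], dtype=bool)

# Per-label counts, computed column by column
tp = (y_true & y_pred).sum(axis=0)     # true positives
fp = (~y_true & y_pred).sum(axis=0)    # false positives
fn = (y_true & ~y_pred).sum(axis=0)    # false negatives

# F1 for each label: 2*TP / (2*TP + FP + FN)
f1_per_label = 2 * tp / (2 * tp + fp + fn)   # 6/7 and 1/2 here

# Support: number of instances that truly have each label
support = y_true.sum(axis=0)                 # 4 and 2 here

macro_f1 = f1_per_label.mean()                          # plain average
weighted_f1 = np.average(f1_per_label, weights=support)  # support-weighted

print(f1_per_label, macro_f1, weighted_f1)
```

In this toy case the better-predicted label is also the more frequent one, so the weighted average comes out higher than the macro average.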