# Multi-Class and Multi-label Classification

In this notebook we address the following questions:

1. If all Scikit-Learn classifiers are capable of multi-class classification, do you need to do anything special with `SVC`?
2. How do you go from multi-class to multi-label? Do you need to wrap the classifier into a `OneVsRestClassifier` meta-estimator?
3. What is the correct way to take an XGBoost classifier and make it return multi-label predictions?

For this, we create two datasets: a multi-class and a multi-label one.

In [48]:
from sklearn.datasets import make_classification
from sklearn.datasets import make_multilabel_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, classification_report
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.multiclass import OneVsRestClassifier

In [20]:
x_mc, y_mc = make_classification(n_samples=500, n_features=20, n_informative=5,
                                 n_redundant=5, n_classes=5)
x_mc_tr, x_mc_val, y_mc_tr, y_mc_val = train_test_split(x_mc, y_mc, test_size=0.2,
                                                       random_state=42, stratify=y_mc)

We denote the multi-label dataset by `x_ml` and `y_ml`.

**Question**: what does it mean to stratify w.r.t. a multi-label y?

In [21]:
x_ml, y_ml = make_multilabel_classification(n_samples=500, n_features=20, n_classes=5,
                                            n_labels=2, allow_unlabeled=False, random_state=42)
x_ml_tr, x_ml_val, y_ml_tr, y_ml_val = train_test_split(x_ml, y_ml, test_size=0.2,
                                                       random_state=42, stratify=y_ml)

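A possible answer to the question above, as a minimal sketch on a hypothetical toy target (not the notebook's data): when `y` is 2D, Scikit-Learn's `train_test_split` (through `StratifiedShuffleSplit`) treats each distinct row, i.e. each label combination, as its own class. Stratification therefore preserves the share of every label *combination* in both splits, and it fails if some combination occurs only once. The arrays `y`, `X`, `y_rare` below are illustrative assumptions.

```python
import numpy as np
from collections import Counter
from sklearn.model_selection import train_test_split

# Hypothetical toy multi-label target with 3 distinct label combinations (rows)
y = np.array([[1, 0]] * 8 + [[0, 1]] * 8 + [[1, 1]] * 4)
X = np.arange(len(y)).reshape(-1, 1)

_, _, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0, stratify=y)

# Each combination keeps its 40% / 40% / 20% share in both splits
test_counts = Counter(map(tuple, y_te.tolist()))
train_counts = Counter(map(tuple, y_tr.tolist()))
print(test_counts)   # expected: two (1, 0) rows, two (0, 1) rows, one (1, 1) row
print(train_counts)  # expected: six (1, 0) rows, six (0, 1) rows, three (1, 1) rows

# A label combination that occurs only once cannot be stratified
y_rare = np.vstack([y, [[0, 0]]])
X_rare = np.arange(len(y_rare)).reshape(-1, 1)
try:
    train_test_split(X_rare, y_rare, test_size=0.25, random_state=0, stratify=y_rare)
    rare_error = None
except ValueError as err:
    rare_error = str(err)
print(rare_error)
```

Preserving the prevalence of each *individual* label instead is a different problem (usually addressed with iterative stratification), which is not part of Scikit-Learn.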
## Multiclass classification with SVC

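Since the datasets are generated at random, we don't bother looking for the best hyper-parameter configuration and apply an SVC with default settings to the multi-class dataset.
Support Vector Classifiers are known to scale poorly with the number of samples: according to the Scikit-Learn documentation, the fit time of `SVC` scales at least quadratically with it. This is why SVCs handle multi-class problems with a One-vs-One scheme: with K classes they train K(K-1)/2 binary classifiers, but each one only sees the samples of two classes, so these many small problems are cheaper than K One-vs-Rest problems on the full dataset.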

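The One-vs-One scheme can be made visible through the `decision_function_shape` parameter of `SVC`. The sketch below regenerates a dataset with the same `make_classification` settings (the fixed `random_state=0` is an assumption, the notebook does not set one). With `decision_function_shape="ovo"` the decision function has one column per pair of classes, while the default `"ovr"` aggregates those pairwise results into one column per class; the predicted labels come from the same pairwise voting either way.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Same generator settings as x_mc / y_mc above, with a fixed seed (assumption)
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           n_redundant=5, n_classes=5, random_state=0)
n_classes = len(np.unique(y))

# 'ovo' exposes the raw pairwise decision values: one column per class pair
ovo = SVC(decision_function_shape="ovo").fit(X, y)
n_pairs = ovo.decision_function(X).shape[1]

# The default 'ovr' shape aggregates the pairwise results into one column per class
ovr = SVC().fit(X, y)
n_cols_ovr = ovr.decision_function(X).shape[1]

# predict() relies on the same one-vs-one voting in both cases
same_predictions = np.array_equal(ovo.predict(X), ovr.predict(X))

print(n_classes, n_pairs, n_cols_ovr)  # 5 10 5
print(same_predictions)                # True
```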
SVC therefore predicts multi-class outputs out of the box, with no wrapper needed.

In [26]:
clf = SVC()
clf.fit(x_mc_tr, y_mc_tr)
pred_mc_val = clf.predict(x_mc_val)

In [27]:
pred_mc_val

array([3, 0, 0, 1, 3, 1, 2, 1, 1, 1, 4, 2, 1, 1, 1, 1, 1, 3, 1, 1, 0, 3,
       1, 1, 3, 0, 3, 1, 1, 1, 2, 1, 1, 0, 2, 2, 0, 4, 4, 1, 2, 3, 4, 2,
       1, 4, 3, 1, 2, 0, 2, 3, 1, 4, 4, 4, 3, 0, 4, 1, 0, 1, 2, 2, 0, 1,
       4, 3, 2, 2, 2, 4, 4, 0, 4, 4, 4, 4, 1, 4, 3, 4, 4, 3, 4, 4, 1, 1,
       1, 0, 0, 2, 1, 1, 4, 3, 3, 3, 0, 3])

In [28]:
print(classification_report(y_mc_val, pred_mc_val))

              precision    recall  f1-score   support

           0       0.86      0.63      0.73        19
           1       0.59      0.90      0.72        21
           2       0.60      0.45      0.51        20
           3       0.82      0.70      0.76        20
           4       0.64      0.70      0.67        20

    accuracy                           0.68       100
   macro avg       0.70      0.68      0.68       100
weighted avg       0.70      0.68      0.68       100



In the report above, the supports of the 5 classes are nearly identical by construction (`make_classification` generates balanced classes by default, and the split is stratified), so the macro and weighted averages practically coincide.

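As a quick check of the previous remark, both averages can be recomputed from the per-class F1 scores and supports printed in the SVC report. This is a sketch on the *rounded* values of the table, so it only approximates what `classification_report` computes from the raw counts.

```python
# Per-class F1 scores and supports copied from the SVC report above (rounded values)
f1 = [0.73, 0.72, 0.51, 0.76, 0.67]
support = [19, 21, 20, 20, 20]

# Macro average: plain mean over the classes
macro_f1 = sum(f1) / len(f1)
# Weighted average: each class weighted by its support
weighted_f1 = sum(f * s for f, s in zip(f1, support)) / sum(support)

print(round(macro_f1, 4), round(weighted_f1, 4))  # 0.678 0.6779
```

With balanced supports the weights are all close to 1/5, which is why both averages round to the same 0.68.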
## Multi-label Classification

If we now run the code below on the multi-label dataset

```py
clf.fit(x_ml_tr, y_ml_tr)
pred_ml_val = clf.predict(x_ml_val)
```

we get the error

```
y should be a 1d array, got an array of shape (400, 5) instead.
```

This shows that `SVC` does not support multi-label targets out of the box. If, however, we wrap the classifier into a `OneVsRestClassifier`, which fits one binary classifier per label column, we obtain a result.

In [34]:
clf = OneVsRestClassifier(SVC())
clf.fit(x_ml_tr, y_ml_tr)
pred_ml_val = clf.predict(x_ml_val)

In [37]:
pred_ml_val[:5]

array([[0, 1, 1, 1, 0],
       [0, 1, 1, 0, 0],
       [1, 0, 0, 0, 0],
       [0, 1, 1, 1, 0],
       [0, 0, 0, 1, 0]])

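To see what `OneVsRestClassifier` does with a multi-label target, the sketch below refits it on the full multi-label dataset (same generator call as above) and compares it with a hand-rolled loop: one `SVC` per column of `Y`, with its decision function thresholded at 0, which is how Scikit-Learn's one-vs-rest wrapper turns the outputs of estimators exposing `decision_function` into 0/1 labels.

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

X, Y = make_multilabel_classification(n_samples=500, n_features=20, n_classes=5,
                                      n_labels=2, allow_unlabeled=False, random_state=42)

ovr = OneVsRestClassifier(SVC()).fit(X, Y)
print(len(ovr.estimators_))  # 5: one binary SVC per label column

# Hand-rolled equivalent: one SVC per label column, decision function thresholded at 0
manual = np.column_stack(
    [SVC().fit(X, Y[:, j]).decision_function(X) > 0 for j in range(Y.shape[1])]
).astype(int)

same = np.array_equal(ovr.predict(X), manual)
print(same)  # True
```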
In [39]:
print(classification_report(y_ml_val, pred_ml_val))

              precision    recall  f1-score   support

           0       0.89      0.84      0.86        37
           1       0.84      0.85      0.85        61
           2       0.88      0.88      0.88        58
           3       0.88      0.86      0.87        49
           4       0.67      0.38      0.48        21

   micro avg       0.86      0.81      0.83       226
   macro avg       0.83      0.76      0.79       226
weighted avg       0.85      0.81      0.83       226
 samples avg       0.88      0.86      0.85       226

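For multi-label targets the report contains a `samples avg` row and no `accuracy` row. The `micro avg` pools every (sample, label) decision, the `macro avg` is the unweighted mean of the per-label scores, and the `samples avg` computes the score for each sample and then averages over samples. A small sketch on hypothetical indicator matrices, using recall so that every denominator is non-zero:

```python
import numpy as np
from sklearn.metrics import recall_score

# Hypothetical toy indicator matrices: 3 samples x 3 labels
y_true = np.array([[1, 1, 1],
                   [0, 0, 1],
                   [1, 0, 0]])
y_pred = np.array([[1, 0, 0],
                   [0, 0, 1],
                   [1, 0, 0]])

scores = {avg: recall_score(y_true, y_pred, average=avg)
          for avg in ("micro", "macro", "samples")}
for avg, value in scores.items():
    print(avg, round(value, 4))
# micro 0.6       -> 3 of the 5 true (sample, label) pairs are recovered
# macro 0.5       -> mean of the per-label recalls 1, 0 and 0.5
# samples 0.7778  -> mean of the per-sample recalls 1/3, 1 and 1
```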


## Random Forests

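In Scikit-Learn, `RandomForestClassifier` supports multi-class targets natively and also accepts multi-output targets, so it should deal with both multi-class and multi-label classification out of the box.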

### Multi-Class Classification

In [56]:
clf = RandomForestClassifier()
clf.fit(x_mc_tr, y_mc_tr)
pred_mc_val = clf.predict(x_mc_val)

In [57]:
pred_mc_val

array([3, 2, 0, 1, 3, 4, 0, 1, 0, 1, 4, 0, 2, 1, 1, 3, 3, 2, 1, 1, 3, 3,
       1, 4, 3, 0, 3, 2, 4, 1, 2, 1, 1, 0, 2, 2, 0, 4, 4, 1, 2, 3, 4, 2,
       1, 4, 3, 1, 2, 3, 2, 3, 1, 4, 4, 3, 3, 0, 4, 1, 0, 1, 2, 2, 0, 0,
       1, 3, 2, 2, 2, 4, 4, 0, 4, 4, 4, 4, 1, 4, 3, 4, 4, 3, 4, 4, 0, 1,
       1, 0, 0, 2, 1, 4, 4, 3, 3, 3, 0, 3])

In [58]:
print(classification_report(y_mc_val, pred_mc_val))

              precision    recall  f1-score   support

           0       0.75      0.63      0.69        19
           1       0.59      0.62      0.60        21
           2       0.65      0.55      0.59        20
           3       0.67      0.70      0.68        20
           4       0.62      0.75      0.68        20

    accuracy                           0.65       100
   macro avg       0.66      0.65      0.65       100
weighted avg       0.65      0.65      0.65       100



### Multi-Label Classification

We can pass the multi-label dataset directly to a Random Forest estimator, without wrapping it into a `OneVsRestClassifier` meta-estimator.

In [59]:
clf.fit(x_ml_tr, y_ml_tr)
pred_ml_val = clf.predict(x_ml_val)

In [61]:
pred_ml_val[:5]

array([[0, 0, 1, 1, 0],
       [0, 1, 1, 0, 0],
       [1, 0, 0, 0, 0],
       [0, 1, 0, 1, 0],
       [0, 0, 0, 1, 0]])

In [63]:
print(classification_report(y_ml_val, pred_ml_val))

              precision    recall  f1-score   support

           0       0.74      0.62      0.68        37
           1       0.82      0.95      0.88        61
           2       0.81      0.93      0.86        58
           3       0.78      0.80      0.79        49
           4       0.83      0.24      0.37        21

   micro avg       0.80      0.79      0.79       226
   macro avg       0.80      0.71      0.72       226
weighted avg       0.80      0.79      0.77       226
 samples avg       0.83      0.85      0.81       226

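Under the hood, `RandomForestClassifier` treats the 2D indicator matrix as a multi-output problem with one binary output per label. The sketch below (same dataset generator, with an arbitrary `random_state=0` for the forest) inspects the fitted attributes that reflect this: `n_outputs_`, and `predict_proba`, which returns a list with one probability array per label.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.ensemble import RandomForestClassifier

X, Y = make_multilabel_classification(n_samples=500, n_features=20, n_classes=5,
                                      n_labels=2, allow_unlabeled=False, random_state=42)
rf = RandomForestClassifier(random_state=0).fit(X, Y)

# One output per label column
print(rf.n_outputs_)               # 5

# predict_proba returns a list: one (n_samples, 2) array per label,
# holding the probabilities of label absent (0) and present (1)
proba = rf.predict_proba(X[:3])
print(len(proba), proba[0].shape)  # 5 (3, 2)
```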


## XGBoost for multi-class classification

In [42]:
import xgboost as xgb

dtrain = xgb.DMatrix(data=x_mc_tr, label=y_mc_tr)
dtest = xgb.DMatrix(data=x_mc_val, label=y_mc_val)

XGBoost has a `multi:softprob` objective which, according to the documentation, is

> same as softmax, but output a vector of ndata * nclass, which can be further reshaped to ndata * nclass matrix. The result contains predicted probability of each data point belonging to each class.

The cell below uses `multi:softmax` instead, which returns the predicted class index of each data point (as a float, hence the `float32` array of predictions).

Note also that the `eval_metric` parameter admits an `mlogloss` option corresponding to the multi-class log-loss. The documentation for this parameter points directly to Scikit-Learn's documentation for the same function.

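The reshaping described in the quote, and the link between `multi:softprob`, `multi:softmax` and `mlogloss`, can be sketched with NumPy and Scikit-Learn alone. The flat probability vector and the true labels below are hypothetical values, not XGBoost output; the argmax of each row is the class that `multi:softmax` would report for the same model, and Scikit-Learn's `log_loss` is the multi-class log-loss.

```python
import numpy as np
from sklearn.metrics import log_loss

n_class = 5
# Hypothetical flat softprob-style output for 3 samples (ndata * nclass values,
# row-major: the 5 class probabilities of sample 0 come first, and so on)
flat = np.array([0.10, 0.60, 0.10, 0.10, 0.10,
                 0.05, 0.05, 0.70, 0.10, 0.10,
                 0.30, 0.10, 0.10, 0.10, 0.40])

probs = flat.reshape(-1, n_class)   # ndata x nclass matrix
pred_class = probs.argmax(axis=1)   # most probable class per data point
print(pred_class)                   # [1 2 4]

# Multi-class log-loss for hypothetical true labels
y_true = np.array([1, 2, 0])
loss = log_loss(y_true, probs, labels=list(range(n_class)))
# Same quantity by hand: mean negative log-probability of the true class
manual = -np.mean(np.log(probs[np.arange(len(y_true)), y_true]))
print(np.isclose(loss, manual))     # True
```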
In [66]:
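params = {'max_depth': 2, 'eta': 1, 'objective': 'multi:softmax',
          'eval_metric': 'mlogloss', 'num_class': 5}
# Only 2 boosting rounds of depth-2 trees: a very small model.
# eval_metric is only reported when an `evals` list is passed to xgb.train.
bst = xgb.train(params, dtrain, num_boost_round=2)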

In [67]:
preds = bst.predict(dtest)

In [68]:
preds

array([0., 0., 0., 1., 3., 1., 2., 3., 0., 1., 4., 2., 2., 2., 2., 3., 3.,
       2., 2., 3., 2., 3., 1., 4., 2., 2., 2., 2., 4., 0., 0., 0., 3., 0.,
       2., 2., 0., 4., 4., 3., 2., 3., 2., 2., 3., 4., 3., 4., 2., 0., 2.,
       3., 0., 0., 4., 3., 3., 0., 4., 3., 0., 2., 2., 2., 0., 0., 3., 3.,
       0., 1., 0., 4., 4., 0., 3., 2., 4., 4., 0., 4., 3., 4., 1., 3., 4.,
       1., 1., 3., 3., 2., 0., 2., 2., 4., 4., 3., 3., 3., 0., 0.],
      dtype=float32)

In [69]:
print(classification_report(y_mc_val, preds))

              precision    recall  f1-score   support

           0       0.48      0.58      0.52        19
           1       0.62      0.24      0.34        21
           2       0.38      0.50      0.43        20
           3       0.48      0.60      0.53        20
           4       0.50      0.45      0.47        20

    accuracy                           0.47       100
   macro avg       0.49      0.47      0.46       100
weighted avg       0.50      0.47      0.46       100



## XGBoost for multi-label classification

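Building an `xgb.DMatrix` (such as `dtrain` above) with a 2D binary indicator matrix of labels, instead of a 1D label vector, throws an error. The (rather cryptic) error message is

> only size-1 arrays can be converted to Python scalars

The `xgb` package, however, provides a Scikit-Learn API and, in particular, the `XGBClassifier` class. In the version used here, this alone does not work with multi-label datasets, but it can be wrapped into a `OneVsRestClassifier` meta-estimator. (XGBoost 1.6 and later add experimental native support for multi-label targets in `XGBClassifier` with `tree_method="hist"`.)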

In [78]:
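# Each estimator inside OneVsRestClassifier solves a binary problem (one label),
# so a binary objective is used rather than multi:softprob with num_class=5
params = {'max_depth': 2, 'learning_rate': 1, 'objective': 'binary:logistic',
          'eval_metric': 'logloss'}
clf = OneVsRestClassifier(xgb.XGBClassifier(**params))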

In [79]:
clf.fit(x_ml_tr, y_ml_tr)
pred_ml_val = clf.predict(x_ml_val)

In [80]:
pred_ml_val[:5]

array([[0, 0, 1, 0, 0],
       [0, 1, 1, 0, 0],
       [1, 0, 0, 0, 0],
       [0, 1, 1, 1, 0],
       [1, 0, 0, 1, 0]])

In [81]:
print(classification_report(y_ml_val, pred_ml_val))

              precision    recall  f1-score   support

           0       0.77      0.65      0.71        37
           1       0.89      0.84      0.86        61
           2       0.85      0.88      0.86        58
           3       0.79      0.78      0.78        49
           4       0.58      0.52      0.55        21

   micro avg       0.81      0.77      0.79       226
   macro avg       0.78      0.73      0.75       226
weighted avg       0.81      0.77      0.79       226
 samples avg       0.84      0.81      0.79       226



