# Multiclass and Multilabel Classification

The sklearn.multiclass module implements meta-estimators that solve multiclass and multilabel classification problems by decomposing them into binary classification problems. Multioutput regression is also supported, through the related sklearn.multioutput module.

## Multiclass Classification

In machine learning, multiclass or multinomial classification is the problem of classifying instances into one of three or more classes (Wikipedia).

In [1]:
import numpy as np

Valid representations of a multiclass target y:

1. A 1d or column vector containing more than two discrete values. An example of a vector y for 4 samples:

In [3]:
y = np.array(['apple', 'pear', 'apple', 'pineapple'])

y

array(['apple', 'pear', 'apple', 'pineapple'], dtype='<U9')

2. Sparse binary matrix of shape (n_samples, n_classes) with a single non-zero element per row, where each column represents one class. An example of a sparse binary matrix y for 3 samples, where the columns, in order, are orange, apple and pear:

In [4]:
from scipy import sparse

In [32]:
row_ind = np.array([0, 1, 2])
col_ind = np.array([1, 2, 0])

In [33]:
y_sparse = sparse.csr_matrix((np.ones(3), (row_ind, col_ind)))

In [34]:
y_sparse

<3x3 sparse matrix of type '<class 'numpy.float64'>'
	with 3 stored elements in Compressed Sparse Row format>

In [35]:
print(y_sparse)

  (0, 1)	1.0
  (1, 2)	1.0
  (2, 0)	1.0


In [36]:
y_sparse.todense()

matrix([[0., 1., 0.],
        [0., 0., 1.],
        [1., 0., 0.]])

In [37]:
np.array([1, 2, 0])

array([1, 2, 0])
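
The two representations carry the same information. As a minimal sketch (the `class_names` array is an assumption that simply encodes the column order stated above), the sparse matrix can be turned back into a vector of class indices or class names:

```python
import numpy as np
from scipy import sparse

# Assumed column order, taken from the text: orange, apple, pear.
class_names = np.array(['orange', 'apple', 'pear'])

row_ind = np.array([0, 1, 2])
col_ind = np.array([1, 2, 0])
y_sparse = sparse.csr_matrix((np.ones(3), (row_ind, col_ind)), shape=(3, 3))

# Each row holds exactly one non-zero entry, so argmax returns its column,
# which is the class index of that sample.
class_idx = y_sparse.toarray().argmax(axis=1)
labels = class_names[class_idx]

print(class_idx)  # [1 2 0]
print(labels)     # ['apple' 'pear' 'orange']
```

The index vector matches the `col_ind` array used to build the matrix.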

In [38]:
from sklearn import datasets
from sklearn.model_selection import train_test_split

X, y = datasets.load_iris(return_X_y=True)

X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.8)

### Fit strategies

#### 1. Decomposition Strategies

##### 1.1. One vs. Rest (OvR)

This strategy fits one classifier per class. Each classifier is trained to separate its class from all the other classes. In addition to its computational efficiency (only n_classes classifiers are needed), one advantage of this approach is its interpretability: since each class is represented by one and only one classifier, it is possible to gain knowledge about the class by inspecting its corresponding classifier. This is the most commonly used strategy and is a fair default choice.

e = k

e: Number of estimators

k: Number of Classes
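
To make the decomposition concrete, here is a rough hand-written sketch of the one-vs-rest idea. It is an illustration only, not the internals of OneVsRestClassifier; the names `binary_models`, `scores` and `y_pred_manual` are made up for this example, and `max_iter=1000` is chosen only to avoid convergence warnings.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression

X, y = datasets.load_iris(return_X_y=True)
classes = np.unique(y)

# One binary problem per class: "this class" (1) vs. "all the others" (0).
binary_models = [
    LogisticRegression(max_iter=1000).fit(X, (y == c).astype(int))
    for c in classes
]

# Stack the per-class scores column by column and, for every sample,
# pick the class whose classifier is the most confident.
scores = np.column_stack([m.decision_function(X) for m in binary_models])
y_pred_manual = classes[scores.argmax(axis=1)]

print(len(binary_models))  # 3, i.e. e = k
print(scores.shape)        # (150, 3)
```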

In [39]:
from sklearn.linear_model import LogisticRegression

In [40]:
from sklearn.multiclass import OneVsRestClassifier

In [41]:
OvR = OneVsRestClassifier(LogisticRegression())

In [42]:
OvR.fit(X_train, y_train)

OneVsRestClassifier(estimator=LogisticRegression(C=1.0, class_weight=None,
                                                 dual=False, fit_intercept=True,
                                                 intercept_scaling=1,
                                                 l1_ratio=None, max_iter=100,
                                                 multi_class='auto',
                                                 n_jobs=None, penalty='l2',
                                                 random_state=None,
                                                 solver='lbfgs', tol=0.0001,
                                                 verbose=0, warm_start=False),
                    n_jobs=None)

In [43]:
OvR.estimators_

[LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                    intercept_scaling=1, l1_ratio=None, max_iter=100,
                    multi_class='auto', n_jobs=None, penalty='l2',
                    random_state=None, solver='lbfgs', tol=0.0001, verbose=0,
                    warm_start=False),
 LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                    intercept_scaling=1, l1_ratio=None, max_iter=100,
                    multi_class='auto', n_jobs=None, penalty='l2',
                    random_state=None, solver='lbfgs', tol=0.0001, verbose=0,
                    warm_start=False),
 LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                    intercept_scaling=1, l1_ratio=None, max_iter=100,
                    multi_class='auto', n_jobs=None, penalty='l2',
                    random_state=None, solver='lbfgs', tol=0.0001, verbose=0,
                    warm_start=False)]

In [44]:
y_pred = OvR.predict(X_test)

In [47]:
from sklearn.metrics import accuracy_score, confusion_matrix

In [46]:
accuracy_score(y_test, y_pred)

0.9666666666666667

In [48]:
confusion_matrix(y_test, y_pred)

array([[ 8,  0,  0],
       [ 0, 11,  1],
       [ 0,  0, 10]])

##### 1.2. One vs. One (OvO)

OneVsOneClassifier constructs one classifier per pair of classes. At prediction time, the class that receives the most votes is selected.

e = k(k-1)/2
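
The count follows from enumerating every unordered pair of classes. A small sketch (the helper `n_ovo_estimators` is hypothetical, and the class names are the three iris species, used only for illustration):

```python
from itertools import combinations


def n_ovo_estimators(k):
    """Number of pairwise classifiers for k classes: k(k-1)/2."""
    return k * (k - 1) // 2


classes = ['setosa', 'versicolor', 'virginica']

# Every unordered pair of classes gets its own binary classifier.
pairs = list(combinations(classes, 2))
print(pairs)

# The number of pairs grows quadratically with the number of classes.
for k in (3, 4, 10):
    print(k, n_ovo_estimators(k))
```

For 3 classes OvO needs 3 estimators, the same as OvR, while for 10 classes it already needs 45.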

In [49]:
from sklearn.multiclass import OneVsOneClassifier

In [50]:
OvO = OneVsOneClassifier(LogisticRegression())

In [51]:
OvO.fit(X_train, y_train)

OneVsOneClassifier(estimator=LogisticRegression(C=1.0, class_weight=None,
                                                dual=False, fit_intercept=True,
                                                intercept_scaling=1,
                                                l1_ratio=None, max_iter=100,
                                                multi_class='auto', n_jobs=None,
                                                penalty='l2', random_state=None,
                                                solver='lbfgs', tol=0.0001,
                                                verbose=0, warm_start=False),
                   n_jobs=None)

In [52]:
OvO.estimators_

(LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                    intercept_scaling=1, l1_ratio=None, max_iter=100,
                    multi_class='auto', n_jobs=None, penalty='l2',
                    random_state=None, solver='lbfgs', tol=0.0001, verbose=0,
                    warm_start=False),
 LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                    intercept_scaling=1, l1_ratio=None, max_iter=100,
                    multi_class='auto', n_jobs=None, penalty='l2',
                    random_state=None, solver='lbfgs', tol=0.0001, verbose=0,
                    warm_start=False),
 LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                    intercept_scaling=1, l1_ratio=None, max_iter=100,
                    multi_class='auto', n_jobs=None, penalty='l2',
                    random_state=None, solver='lbfgs', tol=0.0001, verbose=0,
                    warm_start=False))

In [53]:
y_pred = OvO.predict(X_test)

In [54]:
accuracy_score(y_test, y_pred)

1.0

##### 1.3. Error-Correcting Output-Codes (ECOC)

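Each class is represented by a binary code, and one binary classifier is fitted per bit of the code. The number of estimators therefore depends on code_size: e = int(k * code_size).

0 < code_size < 1: Compressed model, e < k

code_size > 1: Redundant model, e > k. The extra classifiers can compensate for mistakes made by individual classifiers.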
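
As a quick check of how code_size maps to the number of estimators, the sketch below fits an OutputCodeClassifier on the full iris data and inspects it. The variable name `ecoc` and `max_iter=1000` are choices made for this example only.

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OutputCodeClassifier

X, y = datasets.load_iris(return_X_y=True)
k = len(set(y))  # 3 classes

ecoc = OutputCodeClassifier(LogisticRegression(max_iter=1000),
                            code_size=1.4, random_state=0).fit(X, y)

# int(3 * 1.4) = 4 binary classifiers, one per column of the code book.
print(len(ecoc.estimators_))
# The code book has one row per class and one column per estimator.
print(ecoc.code_book_.shape)
```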

In [56]:
from sklearn.multiclass import OutputCodeClassifier

In [57]:
clf = OutputCodeClassifier(LogisticRegression(),
                           code_size=1.4, random_state=0)

In [58]:
clf.fit(X_train, y_train)

OutputCodeClassifier(code_size=1.4,
                     estimator=LogisticRegression(C=1.0, class_weight=None,
                                                  dual=False,
                                                  fit_intercept=True,
                                                  intercept_scaling=1,
                                                  l1_ratio=None, max_iter=100,
                                                  multi_class='auto',
                                                  n_jobs=None, penalty='l2',
                                                  random_state=None,
                                                  solver='lbfgs', tol=0.0001,
                                                  verbose=0, warm_start=False),
                     n_jobs=None, random_state=0)

In [59]:
y_pred = clf.predict(X_test)

In [60]:
y_pred

array([1, 2, 1, 0, 0, 2, 0, 0, 2, 0, 2, 2, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0,
       0, 2, 0, 2, 2, 2, 0, 0])

In [61]:
y_test

array([1, 2, 2, 1, 0, 2, 0, 0, 2, 1, 2, 2, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0,
       0, 2, 1, 2, 2, 2, 1, 0])

In [62]:
from sklearn.metrics import confusion_matrix

In [64]:
confusion_matrix(y_test, y_pred)

array([[8, 0, 0],
       [8, 4, 0],
       [0, 1, 9]])

In [65]:
accuracy_score(y_test, y_pred)

0.7

#### 2. Hierarchy Strategy

Decision trees and ensembles of decision trees handle multiclass problems natively, so no decomposition into binary problems is needed.

In [66]:
from sklearn.tree import DecisionTreeClassifier

In [67]:
clf = DecisionTreeClassifier()

In [68]:
clf.fit(X_train, y_train)

DecisionTreeClassifier(ccp_alpha=0.0, class_weight=None, criterion='gini',
                       max_depth=None, max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, presort='deprecated',
                       random_state=None, splitter='best')

In [69]:
y_pred = clf.predict(X_test)

In [70]:
y_pred

array([1, 2, 2, 1, 0, 2, 0, 0, 2, 1, 2, 2, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0,
       0, 2, 1, 2, 2, 2, 1, 0])
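
The same applies to ensembles of trees. A minimal sketch with RandomForestClassifier, assuming a fixed `random_state=0` for the split and the forest so that the run is repeatable (the original cells do not set one):

```python
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, random_state=0)

# A random forest is an ensemble of decision trees; like a single tree,
# it accepts the three-class target directly.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

print(forest.classes_)                         # the classes seen during fit
print(forest.predict_proba(X_test[:3]).shape)  # one probability per class
```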

## Multilabel Classification and Multioutput Regression

Multilabel classification: the output consists of several categorical variables.

Multioutput regression: the output consists of several continuous variables.

A combination of the two is also possible.

In [250]:
from sklearn.multioutput import MultiOutputClassifier, MultiOutputRegressor
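
A hedged sketch of how these two wrappers can be used: each fits one independent copy of the base estimator per output column. The synthetic data sets and the base estimators (LogisticRegression, Ridge) are assumptions chosen for illustration.

```python
from sklearn.datasets import make_multilabel_classification, make_regression
from sklearn.linear_model import LogisticRegression, Ridge
from sklearn.multioutput import MultiOutputClassifier, MultiOutputRegressor

# Multilabel classification: Y_c has one binary column per label.
X_c, Y_c = make_multilabel_classification(
    n_samples=200, n_classes=3, random_state=0)
multi_clf = MultiOutputClassifier(LogisticRegression(max_iter=1000))
multi_clf.fit(X_c, Y_c)

# Multioutput regression: Y_r has one continuous column per target.
X_r, Y_r = make_regression(
    n_samples=200, n_features=5, n_targets=2, random_state=0)
multi_reg = MultiOutputRegressor(Ridge()).fit(X_r, Y_r)

print(multi_clf.predict(X_c[:2]).shape)  # (2, 3)
print(multi_reg.predict(X_r[:2]).shape)  # (2, 2)
print(len(multi_clf.estimators_), len(multi_reg.estimators_))
```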

In [71]:
from sklearn.base import ClassifierMixin
from sklearn.multioutput import ClassifierChain, RegressorChain
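
Unlike the independent wrappers, a chain feeds the earlier labels to the later models as extra input features. The sketch below, on assumed synthetic data with a fixed label order, shows that each model in the chain sees one more input column than the previous one:

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import ClassifierChain

X, Y = make_multilabel_classification(
    n_samples=200, n_features=20, n_classes=3, random_state=0)

# Fixed order: label 0 first, then label 1, then label 2.
chain = ClassifierChain(LogisticRegression(max_iter=1000), order=[0, 1, 2])
chain.fit(X, Y)

# The i-th model is trained on the 20 original features plus the
# i labels that come before it in the chain.
n_inputs = [est.coef_.shape[1] for est in chain.estimators_]
print(n_inputs)                    # [20, 21, 22]
print(chain.predict(X[:2]).shape)  # (2, 3)
```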