# M/L Commando Course
## Multi* classification
scikit-learn (v0.24) has:
* good support for multiclass problems (now natively supported by all classifiers)
* OK support for multilabel problems
* spotty support for multioutput-multiclass problems (for instance, no metric/scoring support at all!)

This notebook demonstrates the first of these and attempts to work around the limitations of the latter two.

Our current recommendation is to use a different library for the latter two, such as Keras (neural networks), which gives you much more control over how you apply loss functions and metrics.

In [45]:
from math import exp
from random import random, shuffle, choice, randint
import numpy as np
import sklearn
import sklearn.metrics  # imported explicitly so that sklearn.metrics.SCORERS resolves
from matplotlib import pyplot as plt
print('scikit-learn version:', sklearn.__version__)

from sklearn.utils.multiclass import type_of_target

print(sklearn.metrics.SCORERS.keys())

scikit-learn version: 0.23.1
dict_keys(['explained_variance', 'r2', 'max_error', 'neg_median_absolute_error', 'neg_mean_absolute_error', 'neg_mean_squared_error', 'neg_mean_squared_log_error', 'neg_root_mean_squared_error', 'neg_mean_poisson_deviance', 'neg_mean_gamma_deviance', 'accuracy', 'roc_auc', 'roc_auc_ovr', 'roc_auc_ovo', 'roc_auc_ovr_weighted', 'roc_auc_ovo_weighted', 'balanced_accuracy', 'average_precision', 'neg_log_loss', 'neg_brier_score', 'adjusted_rand_score', 'homogeneity_score', 'completeness_score', 'v_measure_score', 'mutual_info_score', 'adjusted_mutual_info_score', 'normalized_mutual_info_score', 'fowlkes_mallows_score', 'precision', 'precision_macro', 'precision_micro', 'precision_samples', 'precision_weighted', 'recall', 'recall_macro', 'recall_micro', 'recall_samples', 'recall_weighted', 'f1', 'f1_macro', 'f1_micro', 'f1_samples', 'f1_weighted', 'jaccard', 'jaccard_macro', 'jaccard_micro', 'jaccard_samples', 'jaccard_weighted'])


## Multiclass - MNIST digit recognition
A multiclass problem is simply a non-binary classification problem.

Typically $y = n$ where $n \in \{\text{class IDs}\}$.

Alternatively, a one-hot encoding can be used:

$y_{\text{one-hot}} = [0, 0, \cdots, y_{C} = 1, \cdots, 0, 0]$, which indicates that the sample belongs to class $C$.

In a regular multiclass problem, each sample can be a member of only one class. (If this is not the case, consider a multilabel approach instead.)
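
The two encodings above can be converted into each other. The following is a minimal NumPy sketch, not part of the original notebook; the helper name `one_hot` and the three example labels are hypothetical.

```python
import numpy as np

def one_hot(labels, n_classes):
    """Return an (n_samples, n_classes) 0/1 matrix with exactly one 1 per row."""
    labels = np.asarray(labels)
    encoded = np.zeros((labels.size, n_classes), dtype=int)
    # Row i gets a 1 in the column given by labels[i]
    encoded[np.arange(labels.size), labels] = 1
    return encoded

y = np.array([3, 0, 9])                 # integer class IDs (illustrative values)
y_one_hot = one_hot(y, n_classes=10)
print(y_one_hot)

# argmax along each row recovers the integer class IDs
y_back = y_one_hot.argmax(axis=1)
print(y_back)
```

Because each row contains exactly one 1, the row sums are all 1; this is what distinguishes a one-hot multiclass target from a multilabel target.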

In [None]:
datafname = "data/mnist_data.npz"

# from keras.datasets import mnist
# mnist_data = mnist.load_data()
# np.savez_compressed(datafname, np.array(mnist_data))
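# The saved nested tuple is stored as an object array, which recent NumPy versions
# only load when allow_pickle=True is passed
mnist_data = np.load(datafname, allow_pickle=True)["arr_0"]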
((tX, ty), (vX, vy)) = mnist_data

print(mnist_data)

print(tX.shape)
print(ty.shape, "<- cardinality=1")
print(np.unique(ty), "<- multiclass classes=10")

for i in range(9):
    plt.subplot(330+1+i)
    plt.imshow(tX[i])
plt.show()

In [None]:
from sklearn.preprocessing import MinMaxScaler

tX = tX.reshape((len(tX), 28*28))
print(tX.shape)
vX = vX.reshape((len(vX), 28*28))
print(ty.shape)

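max_n = 1000  # keep only the first max_n samples so the estimators train quickly

print(np.min(tX), np.max(tX))
sc = MinMaxScaler()
# Scale pixel values to [0, 1], then truncate X to the same max_n rows as y
# (without this truncation, X and y would have different numbers of samples)
tX = sc.fit_transform(tX)[0:max_n, :]
vX = sc.transform(vX)[0:max_n, :]
print(np.min(tX), np.max(tX))
ty = ty[0:max_n]
vy = vy[0:max_n]
print(tX.shape)
print(ty.shape)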

# for i in range(9):
#     plt.subplot(330+1+i)
#     plt.imshow(tX[i].reshape(28,28))
# plt.show()

In [None]:
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
from sklearn.metrics import confusion_matrix, multilabel_confusion_matrix, f1_score

estimators = [LogisticRegression(multi_class="multinomial", max_iter=1000), 
              LogisticRegression(multi_class="ovr", max_iter=1000),
              SGDClassifier(max_iter=1000),
              SVC(),
             ]
for ix,est in enumerate(estimators):
    est.fit(tX,ty)
    print("Estimator #{}".format(ix))
    scs = cross_val_score(est, tX, ty, cv=5, scoring="f1_weighted")
    print(scs, np.mean(scs))
#     print(confusion_matrix(ty, est.predict(tX)))
    print("Test set f1", f1_score(vy, est.predict(vX), average="weighted"))
    print("Test set confusion mx:\n",confusion_matrix(vy, est.predict(vX)))

## Multilabel
Multioutput-multilabel (or multilabel-indicator) problems have a "wide" target, where each element can take a binary value (any number of these can be 1).

$y = [y_0, y_1, y_2, \ldots, y_N]$ where $y_0 \in \{0,1\}$, $y_1 \in \{0,1\}$, etc.

The binary flags show membership of each class, so column 0 represents $Class_0$, column 1 is $Class_1$, etc

It may be easier to think of these not as "classes" at all but as tags (or labels, as the name suggests) that are not mutually exclusive: e.g. films might be tagged (comedy)+(horror), (comedy)+(romance), (action)+(horror), etc.
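
To make the tag view concrete, here is a small sketch that turns per-film tag sets into the binary indicator matrix described above. It is an illustration only: the tag list, its column order, and the four films are made up for this example.

```python
import numpy as np

# Column order of the target matrix (hypothetical tag vocabulary)
tags = ["action", "comedy", "horror", "romance"]

# Each film carries any number of tags, including none
films = [
    {"comedy", "horror"},
    {"comedy", "romance"},
    {"action", "horror"},
    set(),
]

# One row per film, one 0/1 column per tag
y = np.array([[int(tag in film) for tag in tags] for film in films])
print(y)

# Unlike a one-hot multiclass target, row sums are not constrained to 1
labels_per_sample = y.sum(axis=1)
print(labels_per_sample)
```

scikit-learn also ships `MultiLabelBinarizer` for this conversion; the manual version is shown only to make the column semantics explicit.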

In [44]:
from sklearn.datasets import make_multilabel_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import SGDClassifier, LinearRegression, SGDRegressor, LogisticRegression, RidgeClassifier
from sklearn.multioutput import MultiOutputClassifier, ClassifierChain
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

X, y = make_multilabel_classification(n_samples=1000, n_features=10, n_classes=5, n_labels=2,
                                      allow_unlabeled=True,
                                      random_state=666)

print("\nType of target is", type_of_target(y),"\n")

X_train, X_test, y_train, y_test = train_test_split(X,y)
scale = StandardScaler()
X_train = scale.fit_transform(X_train)
X_test = scale.transform(X_test)

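print("Try normal classifier")
# LogisticRegression expects a 1-D target, so fitting on the 2-D multilabel y is expected to raise
est = LogisticRegression()  # SGDClassifier(loss="log")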
try:
    est.fit(X_train, y_train)
except Exception as e:
    print(e)
    
this_scoring = "f1_weighted"
est= MultiOutputClassifier(LogisticRegression())
est.fit(X_train, y_train)
scs = cross_val_score(est, X_train, y_train, cv=5, scoring=this_scoring)
print("\nMOC", scs, np.mean(scs))
print(multilabel_confusion_matrix(y_train, est.predict(X_train)))
    
est = ClassifierChain(LogisticRegression())
print(y_train.shape, y_train.ndim)
est.fit(X_train, y_train)
scs = cross_val_score(est, X_train, y_train, cv=5, scoring=this_scoring)
print("\nCCh", scs, np.mean(scs))
print(multilabel_confusion_matrix(y_train, est.predict(X_train)))

est= OneVsRestClassifier(LogisticRegression())
est.fit(X_train, y_train)
scs = cross_val_score(est, X_train, y_train, cv=5, scoring=this_scoring)
print("\nOvR", scs, np.mean(scs))
print(multilabel_confusion_matrix(y_train, est.predict(X_train)))

#Multi-layer perceptron (Neural Network) has native multilabel support
est= MLPClassifier(max_iter=10000, early_stopping=False)
est.fit(X_train, y_train)
scs = cross_val_score(est, X_train, y_train, cv=5, scoring=this_scoring)
print("\nMLP", scs, np.mean(scs))
print(multilabel_confusion_matrix(y_train, est.predict(X_train)))


MLP [0.73613656 0.70431199 0.72587237 0.73444921 0.73579433] 0.7273128942031183
[[[473   0]
  [  0 277]]

 [[458   0]
  [  0 292]]

 [[485   0]
  [  0 265]]

 [[457   0]
  [  0 293]]

 [[420   0]
  [  0 330]]]


## Multiclass-Multioutput
Multiclass-multioutput problems have a "wide" target, where each element can take a class value (these do not have to be binary).

We have N sets of possible "class groups" $G_0, G_1, \ldots, G_N$.
Each class group is some set of classes: $G_0 = \{C_{0a}, C_{0b}, \ldots\}$; $G_1 = \{C_{1a}, C_{1b}, \ldots\}$.

$y = [y_0, y_1, y_2, \ldots, y_N]$ where $y_0 \in G_0$, $y_1 \in G_1$, etc.
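
The three target layouts discussed in this notebook can be told apart with `type_of_target`, which the notebook already imports. The sketch below uses small made-up arrays (the variable names and values are illustrative only) to show how the shape and value range of `y` determine the reported type.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# 1-D target with more than two distinct values
y_multiclass = np.array([0, 2, 1, 2])

# 2-D target whose entries are all 0 or 1 (binary flags per column)
y_multilabel = np.array([[0, 1, 1],
                         [1, 0, 0],
                         [0, 0, 0]])

# 2-D target whose columns each hold a class ID from their own group
y_multioutput = np.array([[0, 2, 1],
                          [1, 0, 2],
                          [2, 2, 0]])

kinds = {
    "multiclass example": type_of_target(y_multiclass),
    "multilabel example": type_of_target(y_multilabel),
    "multioutput example": type_of_target(y_multioutput),
}
for name, kind in kinds.items():
    print(name, "->", kind)
```

Checking `type_of_target` before fitting is a cheap way to confirm that a hand-built `y` has the layout you intended.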

In [None]:
X=None
n_classes_per_col = 3 # in practice there might be a different number of classes for each column
y_width = 3  # this is the total width of each y entry
X_width = 10 # this is the desired total width of each X entry (total num of features)
av_labels_per_multilabel_iter=2 # controls the density of binary labels per accumulative iteration
features_per_iter= int(X_width/(n_classes_per_col-1))
for seed in range(n_classes_per_col-1):
    tempX, tempy = make_multilabel_classification(n_samples=1000, 
                                        n_features= features_per_iter, 
                                        n_classes=y_width, 
                                        n_labels=av_labels_per_multilabel_iter,
                                        allow_unlabeled=True,
                                        random_state=seed)
    if X is None:
        X=tempX
        y=tempy
    else:
        X = np.concatenate([X,tempX], axis=1)
        y += tempy
        
print(X.shape, y.shape)
print("X training sample\n", X[0:5])
print("y training sample\n", y[0:5])


X_train, X_test, y_train, y_test = train_test_split(X,y)
scale = StandardScaler()
X_train = scale.fit_transform(X_train)
X_test = scale.transform(X_test)

print("Type of target=", type_of_target(y_train))

from sklearn.metrics import f1_score

estimators= [ MultiOutputClassifier(LogisticRegression(multi_class="multinomial")), KNeighborsClassifier() ]

# Have to do a manual evaluation, since sklearn's metrics don't support multioutput-multiclass yet...
for est in estimators:
    print("\n", type(est).__name__)
    est.fit(X_train, y_train)
    y_hats = est.predict(X_test)
    f1s = []
    for c in range(y_test.shape[1]):
        yc = y_test[:,c]
        yhc = y_hats[:,c]
        this_f1 = f1_score(yc, yhc, average="weighted")
        f1s.append(this_f1)
        print("For column {} F1 = {}".format(c, this_f1))
        print(confusion_matrix(yc,yhc))
    print("Per column F1s:", np.round(f1s,2), "Mean F1:", np.round(np.mean(f1s),2))
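
The per-column loop above summarises each output separately. As a complementary view, one could also ask how often an entire row is predicted correctly. The following NumPy sketch is an assumption-laden illustration rather than part of the original evaluation: the helper names `per_column_accuracy` and `exact_match_ratio` are hypothetical, and the arrays are toy data, not results from the models above.

```python
import numpy as np

def per_column_accuracy(y_true, y_pred):
    """Fraction of correct predictions in each output column."""
    return (y_true == y_pred).mean(axis=0)

def exact_match_ratio(y_true, y_pred):
    """Fraction of rows where every output column is predicted correctly."""
    return (y_true == y_pred).all(axis=1).mean()

# Toy multiclass-multioutput targets and predictions (illustrative values)
y_true = np.array([[0, 2, 1],
                   [1, 0, 2],
                   [2, 2, 0],
                   [0, 1, 1]])
y_pred = np.array([[0, 2, 1],
                   [1, 0, 0],
                   [2, 1, 0],
                   [0, 1, 1]])

col_acc = per_column_accuracy(y_true, y_pred)
exact = exact_match_ratio(y_true, y_pred)
print("Per-column accuracy:", col_acc)
print("Exact-match ratio:", exact)
```

The exact-match ratio is much stricter than any per-column average, since a single wrong column makes the whole row count as a miss.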