# MultiLabel Classification

### Dependencies:
#### Python (2.7)
* pip install sklearn
* pip install scikit-multilearn
* pip install future
* pip install python-igraph
* python-graph-tool: Use this tutorial to install it on ubuntu https://zhangkaiyuan.com/2018/03/10/Install-graphtools-on-Ubuntu/

### Sources:
* http://scikit.ml/api/classify.html#ensemble-approaches
* http://scikit.ml/
* https://en.wikipedia.org/wiki/Multi-label_classification
* https://www.analyticsvidhya.com/blog/2017/08/introduction-to-multi-label-classification/
* https://pdfs.semanticscholar.org/6b56/91db1e3a79af5e3c136d2dd322016a687a0b.pdf
* http://cs.nju.edu.cn/zhouzh/zhouzh.files/publication/pr07.pdf

## Generate data
We will generate artificial ,multi-label dataset 

In [2]:
from sklearn.datasets import make_multilabel_classification
from sklearn.model_selection import train_test_split


"""
sparse=True: returns sparse matrix (matrix having large amount of zero elemets)
n_labels: The average number of labels per instance
return_indicator='sparse': returns the sparse binary indicator format.
allow_unlabeled: If True, some instances might not belong to any class.
"""
X, y = make_multilabel_classification(sparse=True, n_labels=20, return_indicator = 'sparse', allow_unlabeled = False)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)

# Techniques:

## 1. Transformation methods:
Problem transformation methods transform a multi-label classification problem in one or more single-label classification problem.

### 1.1 Binary relevance:
This baseline approach, amounts to independently training one binary classifier for each label: Given an unseen sample, the combined model then predicts all labels for this sample for which the respective classifiers predict a positive result.

Suppose we have q labels, the binary relevance method creates q new data sets, one for each label and train single-label classifiers on each new data set.
(Note that this approach will not work well when there's dependencies between the labels). 

![title](img/binary_relevance1.png)
![title](img/binary_relevance2.png)


![title](img/binary_relevance_chart.png)

This method is the most simple and efficient it has one drawback: doesn’t consider labels correlation because it treats every target variable independently.

In [11]:
from skmultilearn.problem_transform import BinaryRelevance
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score


classifier = BinaryRelevance(GaussianNB())

# train:
classifier.fit(X_train, y_train)

# predict:
prediction = classifier.predict(X_test)

# check accuracy:
accuracy_score(y_test, prediction)

0.69999999999999996

### 1.2 Classifier chains:
In this method, each classifier is trained on the output of the previous classifier (when the first one is trained on the input data)


For example, if we have the following dataset X with 4 labels Y:
![title](img/classifier_chains2.png)

We would transform the probelm to the following:
(The white represents the output and the yellow the input)
![title](img/classifier_chains.png)

We can think of classifier chains in the following way:
![title](img/classifier_chains_chart.png)

This method combines the computational efficiency of the Binary Relevance  while still being able to take the label dependencies into account for classification.

In [12]:
from skmultilearn.problem_transform import ClassifierChain

classifier = ClassifierChain(GaussianNB())

# train:
classifier.fit(X_train, y_train)

# predict:
prediction = classifier.predict(X_test)

# check accuracy:
accuracy_score(y_test, prediction)

0.84999999999999998

### 1.3 Label powerset:
In this method, we will transform the problem into a multi-class: this transformation creates one binary classifier for every label combination found in the training set.

For example, the follwoing dataset
![title](img/label_powerset.png)
turn into the follwoing:
![title](img/label_powerset2.png)
(we can see that (x1, x4) and (x3, x6) have the same labels)

In [16]:
from skmultilearn.problem_transform import LabelPowerset

classifier = LabelPowerset(GaussianNB())

# train:
classifier.fit(X_train, y_train)

# predict:
prediction = classifier.predict(X_test)

# check accuracy:
accuracy_score(y_test, prediction)

0.75

## 2. Adapted algorithm
Instead of changing the problems, we could change the algorithm to support multi label classification.

Example of these algorithms are KNN, desicion trees, boosting, neural networks, ets 

### 2.1 Multi Lavel KNN (MLkNN)

MLkNN is derived from the traditional K-nearest neighbor (KNN) algorithm. 
In detail, for each unseen instance, its K nearest neighbors in the training set are firstly identified. After that, based on statistical information gained from the label sets of these neighboring instances (i.e. the number of neighboring instances belonging to each possible class) maximum a posteriori (MAP) principle is utilized to determine the label set for the unseen instance. 


For more information:
http://cs.nju.edu.cn/zhouzh/zhouzh.files/publication/pr07.pdf

In [18]:
from skmultilearn.adapt import MLkNN

classifier = MLkNN(k=20)

# train
classifier.fit(X_train, y_train)

# predict
prediction = classifier.predict(X_test)

# check accuracy
accuracy_score(y_test, prediction)

0.84999999999999998

## 3. Ensemble learning
Ensemble learning is a machine learning paradigm where multiple learners are trained to solve the same problem.

It is often useful to train more than one model for a subset of labels in multi-label classification (for large label spaces - a well-selected smaller label subspace can allow more efficient classification)


As rule of thumb, ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone

An example code for an ensemble of RandomForests under a Label Powerset multi-label classifiers trained for each label subspace - partitioned using fast greedy community detection methods on a label co-occurrence graph looks like this:


In [29]:
from sklearn.ensemble import RandomForestClassifier
from skmultilearn.problem_transform import LabelPowerset
from skmultilearn.cluster import IGraphLabelCooccurenceClusterer
from skmultilearn.ensemble import LabelSpacePartitioningClassifier

# base classifier
base_classifier = RandomForestClassifier()

# setup problem transformation approach with sparse matrices for random forest
problem_transform_classifier = LabelPowerset(classifier=base_classifier,
    require_dense=[False, False])

# partition the label space using fastgreedy community detection
clusterer = IGraphLabelCooccurenceClusterer('fastgreedy', weighted=True, include_self_edges=True)

# setup the ensemble metaclassifier
classifier = LabelSpacePartitioningClassifier(problem_transform_classifier, clusterer)

# train
classifier.fit(X_train, y_train)

# predict
predictions = classifier.predict(X_test)

accuracy_score(y_test, prediction)

0.84999999999999998