# Classification

This notebook discusses multi-label classification methods for the [academia.stackexchange.com](https://academia.stackexchange.com/) data dump.

Multi-label classification methods can be divided into three categories: problem transformation, algorithm adaption, and ensembles.

## Table of Contents
* [Problem Transformation](#problem_transformation)
* [Algorithm Adaption](#algorithm_adaption)
* [Ensembles](#ensembles)

<a id='problem_transformation'/>

## Problem Transformation

Problem transformation methods convert the multi-label problem into single-label problems. The most common variant, Binary Relevance, trains b independent binary classifiers (b = |Labels|), one per label.

**Multioutput Classifier**
`MultiOutputClassifier` wraps a scikit-learn classifier so that it performs Binary Relevance: one copy of the classifier is fitted per label.
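
As a minimal sketch of Binary Relevance with scikit-learn, the snippet below wraps a `LogisticRegression` in `MultiOutputClassifier`. The synthetic data from `make_multilabel_classification` is an assumption that stands in for the notebook's own features and tags.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier

# Synthetic stand-in data (assumption): X is (200, 20), Y is a binary indicator matrix (200, 5).
X, Y = make_multilabel_classification(
    n_samples=200, n_features=20, n_classes=5, random_state=0
)

# One independent copy of the base estimator is fitted per label column.
br = MultiOutputClassifier(LogisticRegression(max_iter=1000))
br.fit(X, Y)

Y_pred = br.predict(X)
print(len(br.estimators_), Y_pred.shape)
```

`br.estimators_` holds the fitted per-label classifiers, so each label can be inspected on its own.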

**DecisionTreeClassifier**
Learns a tree of decision rules from the data and assigns the label according to the resulting rule set.

**KNeighborsClassifier**
Finds the k nearest neighbours of a sample and assigns the label based on the neighbours' labels.

**MLPClassifier**
Uses a multi-layer perceptron to decide whether the label should be assigned or not.

**LinearSVC**
Fits a hyperplane in the feature space that separates positive from negative samples. The label is assigned according to the side of the hyperplane on which a sample lies.

**LogisticRegression**
Estimates the probability that a sample belongs to the label.
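
The base classifiers listed above can all be plugged into the same Binary Relevance wrapper. The following sketch compares them on synthetic data; the data set, the hyperparameters, and the choice of micro-averaged F1 are illustrative assumptions, not the notebook's actual setup.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption).
X, Y = make_multilabel_classification(
    n_samples=300, n_features=20, n_classes=5, random_state=0
)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.3, random_state=0
)

base_estimators = {
    "DecisionTreeClassifier": DecisionTreeClassifier(random_state=0),
    "KNeighborsClassifier": KNeighborsClassifier(n_neighbors=5),
    "MLPClassifier": MLPClassifier(max_iter=500, random_state=0),
    "LinearSVC": LinearSVC(),
    "LogisticRegression": LogisticRegression(max_iter=1000),
}

scores = {}
for name, estimator in base_estimators.items():
    # Each base estimator is cloned once per label by the wrapper.
    model = MultiOutputClassifier(estimator).fit(X_train, Y_train)
    scores[name] = f1_score(Y_test, model.predict(X_test), average="micro")
    print(f"{name}: micro-F1 = {scores[name]:.3f}")
```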

**Classifier Chain**
<cite>[Read et al., 2011][1]</cite>
Links the binary classifiers into a chain: each classifier receives the predictions of the preceding classifiers as additional input features.
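
A small sketch using scikit-learn's `ClassifierChain` may make the chaining visible. The synthetic data and the fixed label order are assumptions chosen for illustration.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import ClassifierChain

# Synthetic stand-in data (assumption): 20 features, 5 labels.
X, Y = make_multilabel_classification(
    n_samples=200, n_features=20, n_classes=5, random_state=0
)

# The classifier at chain position i sees the original features plus the labels
# at positions 0..i-1 (true labels while training, predictions when predicting).
chain = ClassifierChain(LogisticRegression(max_iter=1000), order=[0, 1, 2, 3, 4])
chain.fit(X, Y)

Y_pred = chain.predict(X)

# Number of input columns per chain member: 20 features plus i label inputs.
n_inputs = [est.coef_.shape[1] for est in chain.estimators_]
print(n_inputs)
```

The growing input width per chain member is what distinguishes the chain from plain Binary Relevance, where every classifier sees only the 20 original features.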

**LabelPowerset**
<cite>[Read et al., 2011][1]</cite>
Treats each distinct label combination occurring in the data set as one class and trains a single multi-class classifier on these classes.
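
The transformation can be sketched by hand in a few lines. This is an illustrative re-implementation under assumed synthetic data, not the skmultilearn `LabelPowerset` class; the names `combo_to_class` and `class_to_combo` are made up for this example.

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (assumption).
X, Y = make_multilabel_classification(
    n_samples=200, n_features=20, n_classes=4, random_state=0
)

# Map every distinct label combination (row of Y) to one class id.
combo_to_class = {}
y_class = np.array(
    [combo_to_class.setdefault(tuple(row), len(combo_to_class)) for row in Y]
)
class_to_combo = np.array(list(combo_to_class))  # row i is the label set of class i

# Train a single multi-class classifier on the combination ids.
clf = LogisticRegression(max_iter=1000).fit(X, y_class)

# Translate predicted class ids back into label indicator rows.
Y_pred = class_to_combo[clf.predict(X)]
print(len(combo_to_class), Y_pred.shape)
```

A consequence of this design is that only label combinations seen during training can ever be predicted.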

**ClasswiseClassifier**
This is a self-written classifier that improves on Binary Relevance by adding grid search and undersampling.
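
The notebook's actual `ClasswiseClassifier` is not shown here, so the following is only a guess at the general idea: one binary classifier per label, each tuned by grid search on a randomly undersampled training set. The helper names `undersample` and `fit_classwise`, the parameter grid, and the data are all hypothetical.

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV


def undersample(X, y, rng):
    """Randomly drop majority-class samples until both classes have equal size."""
    pos, neg = np.flatnonzero(y == 1), np.flatnonzero(y == 0)
    n = min(len(pos), len(neg))
    keep = np.concatenate(
        [rng.choice(pos, n, replace=False), rng.choice(neg, n, replace=False)]
    )
    return X[keep], y[keep]


def fit_classwise(X, Y, param_grid, seed=0):
    """Fit one grid-searched binary classifier per label (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    models = []
    for j in range(Y.shape[1]):
        X_j, y_j = undersample(X, Y[:, j], rng)
        search = GridSearchCV(
            LogisticRegression(max_iter=1000), param_grid, cv=3, scoring="f1"
        )
        models.append(search.fit(X_j, y_j).best_estimator_)
    return models


# Synthetic stand-in data (assumption).
X, Y = make_multilabel_classification(
    n_samples=300, n_features=20, n_classes=5, random_state=0
)
models = fit_classwise(X, Y, param_grid={"C": [0.1, 1.0, 10.0]})
Y_pred = np.column_stack([m.predict(X) for m in models])
print(Y_pred.shape)
```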

[1]: https://doi.org/10.1007/s10994-011-5256-5

<a id='algorithm_adaption'/>

## Algorithm Adaption

Classifiers in the algorithm adaption category were specifically designed for multi-label classification tasks. They are usually adaptions of classifiers used for binary classification.

**MLkNN**

> Firstly, for each test instance, its k nearest neighbors in the training set are identified. Then, according to statistical information gained from the label sets of these neighboring instances, i.e. the number of neighboring instances belonging to each possible class, maximum a posteriori (MAP) principle is utilized to determine the label set for the test instance.
<cite>[Zhang & Zhou, 2007][1]</cite>
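
The procedure in the quote can be sketched compactly with NumPy and scikit-learn's `NearestNeighbors`. This is a simplified illustration, not the skmultilearn `MLkNN` implementation; the function name `mlknn_fit_predict`, the smoothing parameter `s`, and the synthetic data are assumptions.

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.neighbors import NearestNeighbors


def mlknn_fit_predict(X_train, Y_train, X_test, k=10, s=1.0):
    """Sketch of ML-kNN: neighbour label counts plus a MAP decision per label."""
    Y_train = np.asarray(Y_train).astype(int)
    n, n_labels = Y_train.shape
    nn = NearestNeighbors(n_neighbors=k).fit(X_train)

    # Smoothed prior probability that each label is present.
    prior1 = (s + Y_train.sum(axis=0)) / (2 * s + n)

    # Called without X, kneighbors() excludes each training point from its own neighbours.
    idx = nn.kneighbors(return_distance=False)
    counts = Y_train[idx].sum(axis=1)  # (n, n_labels): neighbours carrying each label

    # Smoothed likelihoods P(count = j | label present) and P(count = j | label absent).
    like1 = np.empty((k + 1, n_labels))
    like0 = np.empty((k + 1, n_labels))
    for j in range(k + 1):
        like1[j] = ((counts == j) & (Y_train == 1)).sum(axis=0)
        like0[j] = ((counts == j) & (Y_train == 0)).sum(axis=0)
    like1 = (s + like1) / (s * (k + 1) + like1.sum(axis=0))
    like0 = (s + like0) / (s * (k + 1) + like0.sum(axis=0))

    # MAP rule: assign a label if "present" has the larger posterior.
    idx_test = nn.kneighbors(X_test, return_distance=False)
    c_test = Y_train[idx_test].sum(axis=1)  # (n_test, n_labels)
    cols = np.arange(n_labels)
    post1 = prior1 * like1[c_test, cols]
    post0 = (1 - prior1) * like0[c_test, cols]
    return (post1 > post0).astype(int)


# Synthetic stand-in data (assumption): first 200 rows for training, the rest for testing.
X, Y = make_multilabel_classification(
    n_samples=300, n_features=20, n_classes=5, random_state=0
)
Y_pred = mlknn_fit_predict(X[:200], Y[:200], X[200:], k=10)
print(Y_pred.shape)
```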

**MLARAM**

> an extension of fuzzy Adaptive Resonance Associative Map (ARAM) – an Adaptive Resonance Theory (ART) based neural network. It aims at speeding up the classification process in the presence of very large data.
<cite>[F. Benites & E. Sapozhnikova, 2015][2]</cite>

[1]: https://doi.org/10.1016/j.patcog.2006.12.019
[2]: https://doi.org/10.1109/ICDMW.2015.14

<a id='ensembles'/>

## Ensembles

Classifier ensembles train several classifiers that jointly decide which labels are assigned to a sample.

**RAkEL**
> Rakel: randomly breaking the initial set of labels into a number of small-sized labelsets, and employing [Label powerset] to train a corresponding multilabel classifier.
<cite>[Tsoumakas et al., 2011][1]</cite>

*RAkELo*
> Divides the label space in to m subsets of size k, trains a Label Powerset classifier for each subset and assign a label to an instance if more than half of all classifiers (majority) from clusters that contain the label assigned the label to the instance.
<cite>[skmultilearn][2]</cite>

*RAkELd*
> Divides the label space in to equal partitions of size k, trains a Label Powerset classifier per partition and predicts by summing the result of all trained classifiers.
<cite>[skmultilearn][3]</cite>
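
The disjoint variant can be sketched as follows. This is an illustrative re-implementation rather than the skmultilearn `RakelD` class; the helper names, the decision tree base classifier, and the synthetic data are assumptions.

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.tree import DecisionTreeClassifier


def fit_label_powerset(X, Y_sub):
    """Train one multi-class classifier whose classes are the label combinations in Y_sub."""
    combos, y_class = {}, []
    for row in Y_sub:
        y_class.append(combos.setdefault(tuple(row), len(combos)))
    clf = DecisionTreeClassifier(random_state=0).fit(X, y_class)
    return clf, np.array(list(combos))


def rakeld_fit_predict(X_train, Y_train, X_test, k=3, seed=0):
    """Sketch of RAkELd: disjoint random label subsets of size k, one Label Powerset model each."""
    rng = np.random.default_rng(seed)
    n_labels = Y_train.shape[1]
    order = rng.permutation(n_labels)
    subsets = [order[i:i + k] for i in range(0, n_labels, k)]  # last subset may be smaller

    Y_pred = np.zeros((X_test.shape[0], n_labels), dtype=int)
    for subset in subsets:
        clf, class_to_combo = fit_label_powerset(X_train, Y_train[:, subset])
        # Subsets are disjoint, so each label column is filled by exactly one model.
        Y_pred[:, subset] = class_to_combo[clf.predict(X_test)]
    return Y_pred, subsets


# Synthetic stand-in data (assumption): 6 labels split into 2 subsets of size 3.
X, Y = make_multilabel_classification(
    n_samples=300, n_features=20, n_classes=6, random_state=0
)
Y_pred, subsets = rakeld_fit_predict(X[:200], Y[:200], X[200:], k=3)
print(Y_pred.shape, [s.tolist() for s in subsets])
```

RAkELo differs in that the subsets may overlap, so a label covered by several models is decided by a vote among them.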

**MajorityVotingClassifier**
Combines several multi-label classifiers and decides on the classification by majority voting.
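
A minimal sketch of the voting step, assuming three scikit-learn multi-label models as ensemble members. The helper `majority_vote`, the member selection, and the data are illustrative choices, not the notebook's own configuration.

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import ClassifierChain, MultiOutputClassifier
from sklearn.tree import DecisionTreeClassifier


def majority_vote(predictions):
    """Assign a label when more than half of the classifiers predicted it."""
    votes = np.sum(predictions, axis=0)  # shape (n_samples, n_labels)
    return (votes > len(predictions) / 2).astype(int)


# Synthetic stand-in data (assumption).
X, Y = make_multilabel_classification(
    n_samples=200, n_features=20, n_classes=5, random_state=0
)

members = [
    MultiOutputClassifier(LogisticRegression(max_iter=1000)),
    MultiOutputClassifier(DecisionTreeClassifier(random_state=0)),
    ClassifierChain(LogisticRegression(max_iter=1000), random_state=0),
]
predictions = [np.asarray(m.fit(X, Y).predict(X)) for m in members]
Y_pred = majority_vote(predictions)
print(Y_pred.shape)
```

With an even number of members, the strict "more than half" rule used here resolves ties by not assigning the label.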


[1]: https://doi.org/10.1109/TKDE.2010.164
[2]: http://scikit.ml/api/skmultilearn.ensemble.rakelo.html#skmultilearn.ensemble.RakelO
[3]: http://scikit.ml/api/skmultilearn.ensemble.rakeld.html#skmultilearn.ensemble.RakelD