# Problem Transformation

This notebook discusses problem transformation methods for multi-label classification, applied to the [academia.stackexchange.com](https://academia.stackexchange.com/) data dump.

## Table of Contents
* [Data import](#data_import)
* [Methods](#methods)

In [1]:
%load_ext autoreload
%autoreload 2

from joblib import dump, load
from academia_tag_recommender.classifier import Classifier

RANDOM_STATE = 0

<a id='data_import'/>

## Data import

In [5]:
from academia_tag_recommender.test_train_data import get_X_reduced, get_y, get_test_train_data
from academia_tag_recommender.preprocessing_definition import PreprocessingDefinition

preprocessing = PreprocessingDefinition('tfidf', 'basic', 'basic', 'english', '1,1', 'TruncatedSVD')

X = get_X_reduced(preprocessing)
y = get_y()
X_train, X_test, y_train, y_test = get_test_train_data(X, y)

c:\users\monique\masterthesis\academia_tag_recommender\models\dimension_reduction\v=tfidf&t=basic&p=basic&s=english&n=1,1&dim=TruncatedSVD.joblib


<a id='methods'/>

## Methods

Algorithms in scikit-learn that accept multi-label targets directly:
- [DecisionTreeClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html#sklearn.tree.DecisionTreeClassifier)
- [ExtraTreeClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.tree.ExtraTreeClassifier.html#sklearn.tree.ExtraTreeClassifier)
- [KNeighborsClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html#sklearn.neighbors.KNeighborsClassifier)
- [RadiusNeighborsClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.RadiusNeighborsClassifier.html#sklearn.neighbors.RadiusNeighborsClassifier)
- [MLPClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html#sklearn.neural_network.MLPClassifier)
- [RidgeClassifierCV](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.RidgeClassifierCV.html#sklearn.linear_model.RidgeClassifierCV)

Multi-label support can be added to any classifier using:
- [MultiOutputClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.multioutput.MultiOutputClassifier.html#sklearn.multioutput.MultiOutputClassifier)
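
As a minimal sketch of what the wrapper does, the example below fits a single-label estimator on a small label matrix. The toy data and the names `X_toy`, `y_toy`, and `toy_clf` are made up for illustration and are not part of this project.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier

# Hypothetical toy data: 6 samples, 2 features, 2 binary labels.
X_toy = np.array([[0.0, 0.0], [0.1, 0.9], [0.9, 0.1],
                  [1.0, 1.0], [0.2, 0.8], [0.8, 0.2]])
# Label 0 follows feature 0, label 1 follows feature 1.
y_toy = np.array([[0, 0], [0, 1], [1, 0],
                  [1, 1], [0, 1], [1, 0]])

# LogisticRegression handles one target column at a time;
# the wrapper fits one clone of it per label column.
toy_clf = MultiOutputClassifier(LogisticRegression(random_state=0))
toy_clf.fit(X_toy, y_toy)

toy_pred = toy_clf.predict(X_toy)
print(toy_pred.shape)            # one prediction column per label
print(len(toy_clf.estimators_))  # one fitted estimator per label
```

Because every label gets its own independent estimator, correlations between labels are not modelled, and training cost grows roughly linearly with the number of labels.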

scikit-multilearn:
- [Label Powerset](http://scikit.ml/api/skmultilearn.problem_transform.lp.html#skmultilearn.problem_transform.LabelPowerset)
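
The idea behind Label Powerset is to treat every distinct combination of labels as one class of a single multi-class problem. The following dependency-free sketch illustrates only that encoding step; it is not the scikit-multilearn implementation, and the helper names (`fit_label_powerset`, `encode_labels`, `decode_labels`) are hypothetical.

```python
def fit_label_powerset(y_rows):
    """Map each distinct label combination to a class id (order of first appearance)."""
    combo_to_class = {}
    for row in y_rows:
        combo_to_class.setdefault(tuple(row), len(combo_to_class))
    return combo_to_class


def encode_labels(y_rows, combo_to_class):
    """Turn a label indicator matrix into a list of multi-class targets."""
    return [combo_to_class[tuple(row)] for row in y_rows]


def decode_labels(class_ids, combo_to_class):
    """Turn multi-class predictions back into label indicator rows."""
    class_to_combo = {c: combo for combo, c in combo_to_class.items()}
    return [list(class_to_combo[c]) for c in class_ids]


# Hypothetical indicator matrix: 4 samples, 3 labels.
lp_y = [[1, 0, 0], [0, 1, 1], [1, 0, 0], [0, 0, 0]]
lp_mapping = fit_label_powerset(lp_y)
lp_classes = encode_labels(lp_y, lp_mapping)

print(lp_classes)                             # [0, 1, 0, 2]
print(decode_labels(lp_classes, lp_mapping))  # the original rows again
```

Any multi-class classifier can then be trained on the encoded targets. The trade-off is that only label combinations seen during training can ever be predicted, and the number of classes can grow quickly when there are many labels.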

In [6]:
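clf_paths = []  # paths of all classifiers saved by this notebook

def create_classifier(classifier):
    """Fit, score, and save a classifier, then print its training time and evaluation."""
    clf = Classifier(classifier, preprocessing)
    clf.fit(X_train, y_train)
    clf.score(X_test, y_test)
    path = clf.save()
    print(clf.training_time)
    clf.evaluation.print_stats()
    clf_paths.append(path)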

**DecisionTreeClassifier**

In [None]:
from sklearn.tree import DecisionTreeClassifier

create_classifier(DecisionTreeClassifier(random_state=RANDOM_STATE))

**ExtraTreeClassifier**

In [6]:
from sklearn.tree import ExtraTreeClassifier

create_classifier(ExtraTreeClassifier(random_state=RANDOM_STATE))

               Hamming Loss             Accuracy                 Precision                Recall                   F1                       
samples        0.01160854615248754      0.002299128751210068     0.08287955792191029      0.0863201839303001       0.07668573963951505      
micro                                                            0.07781897491821156      0.0804255702822108       0.07910080475314253      
macro                                                            0.019676004676311764     0.0195830566799495       0.019232965302402448     
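
To make the averaging modes in such a table easier to read, here is a small hand-rolled sketch of macro-averaged F1, micro-averaged F1, and Hamming loss on made-up data. It does not reproduce the project's evaluation code; all names and values are illustrative assumptions.

```python
def f1_from_counts(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when there is nothing to count."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def counts_for_label(y_true, y_pred, label):
    """Return (tp, fp, fn) for one label column."""
    pairs = [(t[label], p[label]) for t, p in zip(y_true, y_pred)]
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, fn


# Hypothetical toy data: label 0 is frequent and always right,
# label 1 is rare and always missed.
demo_true = [[1, 0], [1, 0], [1, 0], [0, 1]]
demo_pred = [[1, 0], [1, 0], [1, 0], [0, 0]]
n_labels = len(demo_true[0])

per_label = [counts_for_label(demo_true, demo_pred, j) for j in range(n_labels)]

# Macro: average the per-label F1 scores, so every label weighs the same.
macro_f1 = sum(f1_from_counts(*c) for c in per_label) / n_labels

# Micro: pool the counts over all labels first, so frequent labels dominate.
micro_f1 = f1_from_counts(*(sum(col) for col in zip(*per_label)))

# Hamming loss: fraction of individual label slots predicted wrongly.
wrong = sum(t != p
            for row_t, row_p in zip(demo_true, demo_pred)
            for t, p in zip(row_t, row_p))
hamming = wrong / (len(demo_true) * n_labels)

print(macro_f1, micro_f1, hamming)
```

In this toy case the macro F1 is 0.5 while the micro F1 is 6/7, because the missed rare label counts as much as the frequent one under macro averaging. A macro score well below the micro score can therefore hint that rare tags are predicted poorly.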


**KNeighborsClassifier**

In [None]:
from sklearn.neighbors import KNeighborsClassifier

create_classifier(KNeighborsClassifier())

**RadiusNeighborsClassifier**

In [None]:
from sklearn.neighbors import RadiusNeighborsClassifier

create_classifier(RadiusNeighborsClassifier())

**MLPClassifier**

In [None]:
from sklearn.neural_network import MLPClassifier

create_classifier(MLPClassifier(random_state=RANDOM_STATE))

**RidgeClassifierCV**

*TODO: implementation needs to be adjusted for different prediction format: `[125 278 302 ...  80  64 158]`*

In [14]:
from sklearn.linear_model import RidgeClassifierCV

#create_classifier(RidgeClassifierCV(alphas=[1e-3, 1e-2, 1e-1, 1]))

**MultiOutputClassifier**

In [None]:
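from sklearn.multioutput import MultiOutputClassifier
from sklearn.svm import LinearSVC

# MultiOutputClassifier fits one copy of the base estimator per label.
create_classifier(MultiOutputClassifier(LinearSVC(random_state=RANDOM_STATE)))

In [None]:
from sklearn.linear_model import LogisticRegression

create_classifier(MultiOutputClassifier(LogisticRegression(random_state=RANDOM_STATE)))

In [None]:
from sklearn.naive_bayes import ComplementNB

# Note: the naive Bayes variants expect non-negative features,
# which TruncatedSVD output may not satisfy.
create_classifier(MultiOutputClassifier(ComplementNB()))

In [None]:
from sklearn.naive_bayes import MultinomialNB

create_classifier(MultiOutputClassifier(MultinomialNB()))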