<h2>Multiclass Classification</h2>

In this module, we look at Logistic Regression and SoftMax regression for multiclass classification problems, where each sample belongs to exactly one of several classes.

The data is the red wine quality dataset. Its columns are listed below:

1. fixed acidity
2. volatile acidity
3. citric acid
4. residual sugar
5. chlorides
6. free sulfur dioxide
7. total sulfur dioxide
8. density
9. pH
10. sulphates
11. alcohol
12. quality (score between 0 and 10) <- this is our target

In [54]:
import numpy as np
import pandas as pd

#the delimiter in this data is ';' instead of ','
wine = pd.read_csv('winequality-red.csv',sep=';')
wine.head()

Unnamed: 0,fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,pH,sulphates,alcohol,quality
0,7.4,0.7,0.0,1.9,0.076,11.0,34.0,0.9978,3.51,0.56,9.4,5
1,7.8,0.88,0.0,2.6,0.098,25.0,67.0,0.9968,3.2,0.68,9.8,5
2,7.8,0.76,0.04,2.3,0.092,15.0,54.0,0.997,3.26,0.65,9.8,5
3,11.2,0.28,0.56,1.9,0.075,17.0,60.0,0.998,3.16,0.58,9.8,6
4,7.4,0.7,0.0,1.9,0.076,11.0,34.0,0.9978,3.51,0.56,9.4,5


In [60]:
#all columns are numeric, including the target
wine.dtypes

fixed acidity           float64
volatile acidity        float64
citric acid             float64
residual sugar          float64
chlorides               float64
free sulfur dioxide     float64
total sulfur dioxide    float64
density                 float64
pH                      float64
sulphates               float64
alcohol                 float64
quality                   int64
quality_cat             float64
dtype: object

A quick check on the target shows some rare values:

In [57]:
wine['quality'].value_counts()

5    681
6    638
7    199
4     53
8     18
3     10
Name: quality, dtype: int64

To simplify the analysis, we group quality into three bins so that the labels are more balanced:

1. quality <= 5
2. quality = 6
3. quality >= 7

Recall, we can use pandas.cut() to bin a variable:

In [76]:
wine["quality_cat"] = pd.cut(wine["quality"],              #the column to use for labeling
                             bins=[0, 5, 6, np.inf],  #the ranges of the labels
                             labels=[1, 2, 3]) 
wine['quality_cat'].value_counts()

1    744
2    638
3    217
Name: quality_cat, dtype: int64
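
As a small illustration of how the bin edges above behave, here is a sketch on a toy series (the `scores` values are made up for the example, not taken from the wine data). By default `pd.cut` uses right-closed intervals, so the edges `[0, 5, 6, np.inf]` mean (0, 5], (5, 6] and (6, inf]:

```python
import numpy as np
import pandas as pd

# Toy quality scores covering every value that appears in the wine data
scores = pd.Series([3, 4, 5, 6, 7, 8])

# Intervals are right-closed by default: (0, 5], (5, 6], (6, inf]
binned = pd.cut(scores, bins=[0, 5, 6, np.inf], labels=[1, 2, 3])

print(binned.tolist())   # a score of 5 falls in bin 1, a score of 6 in bin 2
print(binned.dtype)      # the result is a pandas categorical
```

Note that the result is a categorical column, not a plain integer column.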

In [77]:
wine.head()

Unnamed: 0,fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,pH,sulphates,alcohol,quality,quality_cat
0,7.4,0.7,0.0,1.9,0.076,11.0,34.0,0.9978,3.51,0.56,9.4,5,1
1,7.8,0.88,0.0,2.6,0.098,25.0,67.0,0.9968,3.2,0.68,9.8,5,1
2,7.8,0.76,0.04,2.3,0.092,15.0,54.0,0.997,3.26,0.65,9.8,5,1
3,11.2,0.28,0.56,1.9,0.075,17.0,60.0,0.998,3.16,0.58,9.8,6,2
4,7.4,0.7,0.0,1.9,0.076,11.0,34.0,0.9978,3.51,0.56,9.4,5,1


As with any other dataset, we split the data into training and testing sets first. We stratify on the original quality score so that both sets keep a similar quality distribution.

In [78]:
from sklearn.model_selection import StratifiedShuffleSplit

split = StratifiedShuffleSplit(n_splits=1, test_size=0.25, random_state=42)

#split() returns positional indices, so we select rows with iloc
for train_index, test_index in split.split(wine, wine['quality']):
    strat_train_set = wine.iloc[train_index]
    strat_test_set = wine.iloc[test_index]
    
trainX = strat_train_set.iloc[:,:-2]                   #since index -2 is the original quality, we exclude it from trainX
trainY = strat_train_set.iloc[:,-1]                    #and trainY is index -1 which is quality_cat
testX = strat_test_set.iloc[:,:-2]
testY = strat_test_set.iloc[:,-1]
trainX.shape, trainY.shape, testX.shape, testY.shape

((1199, 11), (1199,), (400, 11), (400,))
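
To see what stratification buys us, the sketch below runs the same splitter on hypothetical imbalanced labels (the 60/30/10 class mix and the placeholder features are assumptions made for the example) and compares the class shares in the two parts:

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

# Hypothetical imbalanced labels: 60% class 1, 30% class 2, 10% class 3
y = np.array([1] * 60 + [2] * 30 + [3] * 10)
X = np.arange(len(y)).reshape(-1, 1)   # placeholder features

split = StratifiedShuffleSplit(n_splits=1, test_size=0.25, random_state=42)
train_idx, test_idx = next(split.split(X, y))

# Class shares in each part should stay close to 0.6 / 0.3 / 0.1
train_share = np.bincount(y[train_idx])[1:] / len(train_idx)
test_share = np.bincount(y[test_idx])[1:] / len(test_idx)
print(train_share, test_share)
```

Even the rare class keeps roughly its original share in both parts, which a plain random split does not guarantee.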

All columns are numeric with no missing data. We append a log-transformed copy of each feature to the original features, then standardize all of them.

In [79]:
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import FunctionTransformer

def log_transform(X):           
    log_X = np.log(X + 1.001)                      
    return np.c_[X,log_X]

num_pipeline = Pipeline([
    ('log transform', FunctionTransformer(log_transform, validate=False)),
    ('standardize', StandardScaler())
])

Since there are no categorical columns, we can call fit_transform on num_pipeline directly, without a ColumnTransformer.

In [81]:
trainX_prc = num_pipeline.fit_transform(trainX)
testX_prc = num_pipeline.transform(testX)
trainX_prc.shape, testX_prc.shape

((1199, 22), (400, 22))
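
The column count doubles from 11 to 22 because the log-transformed features are appended to the originals rather than replacing them. A minimal sketch of the same pipeline on random toy data (the arrays `toy_train` and `toy_test` are made up for the example) shows the doubling and confirms that the training output is standardized:

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler

def log_transform(X):
    # Keep the original columns and append their log-transformed copies
    return np.c_[X, np.log(X + 1.001)]

pipe = Pipeline([
    ('log transform', FunctionTransformer(log_transform, validate=False)),
    ('standardize', StandardScaler()),
])

rng = np.random.default_rng(0)
toy_train = rng.uniform(0, 10, size=(50, 3))
toy_test = rng.uniform(0, 10, size=(10, 3))

train_prc = pipe.fit_transform(toy_train)   # learns means and stds from training data
test_prc = pipe.transform(toy_test)         # reuses the training means and stds
print(train_prc.shape, test_prc.shape)
print(train_prc.mean(axis=0).round(6), train_prc.std(axis=0).round(6))
```

Calling `transform` (not `fit_transform`) on the test data keeps test statistics out of the preprocessing step.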

<h3>Logistic Regression</h3>

We will fine-tune models using all three regularization methods. In terms of writing code, nothing changes compared to the binary classification case :)

<h4>L2 Regularization</h4>

In [82]:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import f1_score

from sklearn.model_selection import GridSearchCV

#C is the inverse of the regularization strength: smaller C means stronger regularization
param_grid = [{'C': [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1 , 5, 10, 50, 100]}]

#the default is l2 anyway, but we can specify it so that the code is clear
#logistic regression is trained iteratively; increase max_iter if sklearn warns that it did not converge
logistic = LogisticRegression(penalty='l2', max_iter=5000)

grid_search = GridSearchCV(logistic, param_grid, cv=5, scoring='accuracy', return_train_score=True)

grid_search.fit(trainX_prc,trainY)

GridSearchCV(cv=5, error_score=nan,
             estimator=LogisticRegression(C=1.0, class_weight=None, dual=False,
                                          fit_intercept=True,
                                          intercept_scaling=1, l1_ratio=None,
                                          max_iter=5000, multi_class='auto',
                                          n_jobs=None, penalty='l2',
                                          random_state=None, solver='lbfgs',
                                          tol=0.0001, verbose=0,
                                          warm_start=False),
             iid='deprecated', n_jobs=None,
             param_grid=[{'C': [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5, 10,
                                50, 100]}],
             pre_dispatch='2*n_jobs', refit=True, return_train_score=True,
             scoring='accuracy', verbose=0)

Let's look at the best model

In [83]:
print(grid_search.best_params_)
print(grid_search.best_score_)

{'C': 100}
0.638863319386332
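
Because the search is run with `return_train_score=True`, the per-candidate training and validation scores are also available in `cv_results_`. The sketch below shows one way to tabulate them. It uses synthetic data from `make_classification` and a shorter grid, both of which are assumptions for the example, so the numbers will not match the wine results:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic 3-class data standing in for the processed wine features
X, y = make_classification(n_samples=300, n_features=8, n_informative=4,
                           n_classes=3, random_state=0)

search = GridSearchCV(LogisticRegression(max_iter=5000),
                      [{'C': [0.01, 0.1, 1, 10]}],
                      cv=5, scoring='accuracy', return_train_score=True)
search.fit(X, y)

# One row per candidate value of C
results = pd.DataFrame(search.cv_results_)[
    ['param_C', 'mean_train_score', 'mean_test_score', 'rank_test_score']]
print(results)
```

A large gap between `mean_train_score` and `mean_test_score` for a candidate suggests overfitting at that value of C.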


And apply it on the testing data

In [84]:
best_l2_logistic = grid_search.best_estimator_

best_l2_logistic.score(testX_prc, testY)

0.6325

<h4>L1 Regularization</h4>

This is similar to LASSO: we add the sum of the absolute values of the coefficients to the training objective.

In [85]:
param_grid = [{'C': [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1 , 5, 10, 50, 100]}]

#now we need to set penalty to l1
#also, we need to set solver to 'liblinear' because the default solver (lbfgs) doesn't support l1
#note that liblinear handles more than two classes with a one-vs-rest scheme
logistic = LogisticRegression(penalty='l1', max_iter=5000, solver='liblinear')

grid_search = GridSearchCV(logistic, param_grid, cv=5, scoring='accuracy', return_train_score=True)

grid_search.fit(trainX_prc,trainY)

GridSearchCV(cv=5, error_score=nan,
             estimator=LogisticRegression(C=1.0, class_weight=None, dual=False,
                                          fit_intercept=True,
                                          intercept_scaling=1, l1_ratio=None,
                                          max_iter=5000, multi_class='auto',
                                          n_jobs=None, penalty='l1',
                                          random_state=None, solver='liblinear',
                                          tol=0.0001, verbose=0,
                                          warm_start=False),
             iid='deprecated', n_jobs=None,
             param_grid=[{'C': [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5, 10,
                                50, 100]}],
             pre_dispatch='2*n_jobs', refit=True, return_train_score=True,
             scoring='accuracy', verbose=0)

Best model performance:

In [86]:
print(grid_search.best_params_)
print(grid_search.best_score_)

{'C': 1}
0.6280230125523013


And in the testing data

In [87]:
best_l1_logistic = grid_search.best_estimator_

best_l1_logistic.score(testX_prc, testY)

0.6425
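
One practical consequence of the L1 penalty is that it can push some coefficients to exactly zero, which the L2 penalty generally does not do. The sketch below illustrates this on synthetic data. It is a side example under stated assumptions: the data comes from `make_classification`, the problem is binary for simplicity, and `C=0.05` is an arbitrary strong-regularization value rather than a tuned one:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Binary toy problem: 3 informative features and 17 pure-noise features
X, y = make_classification(n_samples=300, n_features=20, n_informative=3,
                           n_redundant=0, n_classes=2, random_state=0)
X = StandardScaler().fit_transform(X)

l1 = LogisticRegression(penalty='l1', C=0.05, solver='liblinear', max_iter=5000).fit(X, y)
l2 = LogisticRegression(penalty='l2', C=0.05, solver='liblinear', max_iter=5000).fit(X, y)

# Count coefficients that are exactly zero under each penalty
l1_zeros = int(np.sum(l1.coef_ == 0))
l2_zeros = int(np.sum(l2.coef_ == 0))
print(l1_zeros, l2_zeros)
```

The same check can be run on a fitted wine model by inspecting its `coef_` attribute.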

<h4>Elastic Net Regularization</h4>

Similar to the Elastic Net model in regression, we use both L1 and L2 regularization in the same model. Now we need to fine-tune both C and l1_ratio.

In [88]:
#now penalty is changed to elasticnet
#and we need to change solver to saga
#this model is taking a long time to train due to the data size
#so I reduce the number of values in each parameter
logistic = LogisticRegression(penalty='elasticnet', max_iter=5000, solver='saga')

param_grid = [{
    'C': [0.001, 0.01, 0.1, 1, 10, 100],
    'l1_ratio': [0.25, 0.5, 0.75]
}]

grid_search = GridSearchCV(logistic, param_grid, cv=5, scoring='accuracy', return_train_score=True)

grid_search.fit(trainX_prc,trainY)

GridSearchCV(cv=5, error_score=nan,
             estimator=LogisticRegression(C=1.0, class_weight=None, dual=False,
                                          fit_intercept=True,
                                          intercept_scaling=1, l1_ratio=None,
                                          max_iter=5000, multi_class='auto',
                                          n_jobs=None, penalty='elasticnet',
                                          random_state=None, solver='saga',
                                          tol=0.0001, verbose=0,
                                          warm_start=False),
             iid='deprecated', n_jobs=None,
             param_grid=[{'C': [0.001, 0.01, 0.1, 1, 10, 100],
                          'l1_ratio': [0.25, 0.5, 0.75]}],
             pre_dispatch='2*n_jobs', refit=True, return_train_score=True,
             scoring='accuracy', verbose=0)

Best model performance

In [89]:
print(grid_search.best_params_)
print(grid_search.best_score_)

{'C': 0.1, 'l1_ratio': 0.75}
0.641352859135286


And apply to testing data

In [90]:
best_enet_logistic = grid_search.best_estimator_

best_enet_logistic.score(testX_prc, testY)

0.63

<h3>SoftMax Regression</h3>

We still use LogisticRegression. Besides setting multi_class to 'multinomial', everything else is exactly the same as above :) We again try all three regularization methods and fine-tune them.
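
For reference, SoftMax regression computes one linear score per class and turns the scores into probabilities by exponentiating and normalizing them. The sketch below checks this against scikit-learn on synthetic data (not the wine data). It assumes a scikit-learn version in which the default lbfgs solver fits the multinomial model when the target has more than two classes. The helper `softmax` is written for this example:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

def softmax(scores):
    # Subtract the row-wise max for numerical stability before exponentiating
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

X, y = make_classification(n_samples=200, n_features=6, n_informative=4,
                           n_classes=3, random_state=0)

model = LogisticRegression(max_iter=5000).fit(X, y)

scores = model.decision_function(X)   # shape (n_samples, n_classes)
manual = softmax(scores)

# The manual probabilities should match predict_proba
print(np.allclose(manual, model.predict_proba(X)))
```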

<h4>L2 Regularization</h4>

In [91]:
#create new model; multi_class is set on the same model that is passed to the grid search
#(the output below was recorded with multi_class='auto'; with solver='lbfgs' and more than
#two classes, 'auto' already resolves to 'multinomial', so the fitted model is the same)
param_grid = [{'C': [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1 , 5, 10, 50, 100]}]

#the default is l2 anyway, but we can specify it so that the code is clear
#logistic regression is trained iteratively; increase max_iter if sklearn warns that it did not converge
logistic = LogisticRegression(multi_class='multinomial', penalty='l2', max_iter=5000)

grid_search = GridSearchCV(logistic, param_grid, cv=5, scoring='accuracy', return_train_score=True)

grid_search.fit(trainX_prc,trainY)

GridSearchCV(cv=5, error_score=nan,
             estimator=LogisticRegression(C=1.0, class_weight=None, dual=False,
                                          fit_intercept=True,
                                          intercept_scaling=1, l1_ratio=None,
                                          max_iter=5000, multi_class='auto',
                                          n_jobs=None, penalty='l2',
                                          random_state=None, solver='lbfgs',
                                          tol=0.0001, verbose=0,
                                          warm_start=False),
             iid='deprecated', n_jobs=None,
             param_grid=[{'C': [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5, 10,
                                50, 100]}],
             pre_dispatch='2*n_jobs', refit=True, return_train_score=True,
             scoring='accuracy', verbose=0)

Let's look at the best model

In [92]:
print(grid_search.best_params_)
print(grid_search.best_score_)

{'C': 100}
0.638863319386332


And apply it on the testing data

In [93]:
best_l2_logistic = grid_search.best_estimator_

best_l2_logistic.score(testX_prc, testY)

0.6325

<h4>L1 Regularization</h4>

This is similar to LASSO: we add the sum of the absolute values of the coefficients to the training objective.

In [94]:
param_grid = [{'C': [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1 , 5, 10, 50, 100]}]

#now we need to set penalty to l1
#also, we need to set solver to 'saga' because it is the only one that supports both multinomial and l1 regularization
logistic = LogisticRegression(multi_class='multinomial', penalty='l1', max_iter=5000, solver='saga')

grid_search = GridSearchCV(logistic, param_grid, cv=5, scoring='accuracy', return_train_score=True)

grid_search.fit(trainX_prc,trainY)

GridSearchCV(cv=5, error_score=nan,
             estimator=LogisticRegression(C=1.0, class_weight=None, dual=False,
                                          fit_intercept=True,
                                          intercept_scaling=1, l1_ratio=None,
                                          max_iter=5000,
                                          multi_class='multinomial',
                                          n_jobs=None, penalty='l1',
                                          random_state=None, solver='saga',
                                          tol=0.0001, verbose=0,
                                          warm_start=False),
             iid='deprecated', n_jobs=None,
             param_grid=[{'C': [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5, 10,
                                50, 100]}],
             pre_dispatch='2*n_jobs', refit=True, return_train_score=True,
             scoring='accuracy', verbose=0)

Best model performance:

In [95]:
print(grid_search.best_params_)
print(grid_search.best_score_)

{'C': 10}
0.6396931659693166


And in the testing data

In [96]:
best_l1_logistic = grid_search.best_estimator_

best_l1_logistic.score(testX_prc, testY)

0.6375

<h4>Elastic Net Regularization</h4>

Similar to the Elastic Net model in regression, we use both L1 and L2 regularization in the same model. Now we need to fine-tune both C and l1_ratio.

In [97]:
#now penalty is changed to elasticnet
#and we need to change solver to saga
#this model is taking a long time to train due to the data size
#so I reduce the number of values in each parameter
logistic = LogisticRegression(multi_class='multinomial',penalty='elasticnet', max_iter=5000, solver='saga')

param_grid = [{
    'C': [0.001, 0.01, 0.1, 1, 10, 100],
    'l1_ratio': [0.25, 0.5, 0.75]
}]

grid_search = GridSearchCV(logistic, param_grid, cv=5, scoring='accuracy', return_train_score=True)

grid_search.fit(trainX_prc,trainY)

GridSearchCV(cv=5, error_score=nan,
             estimator=LogisticRegression(C=1.0, class_weight=None, dual=False,
                                          fit_intercept=True,
                                          intercept_scaling=1, l1_ratio=None,
                                          max_iter=5000,
                                          multi_class='multinomial',
                                          n_jobs=None, penalty='elasticnet',
                                          random_state=None, solver='saga',
                                          tol=0.0001, verbose=0,
                                          warm_start=False),
             iid='deprecated', n_jobs=None,
             param_grid=[{'C': [0.001, 0.01, 0.1, 1, 10, 100],
                          'l1_ratio': [0.25, 0.5, 0.75]}],
             pre_dispatch='2*n_jobs', refit=True, return_train_score=True,
             scoring='accuracy', verbose=0)

Best model performance

In [98]:
print(grid_search.best_params_)
print(grid_search.best_score_)

{'C': 0.1, 'l1_ratio': 0.75}
0.641352859135286


And apply to testing data

In [99]:
best_enet_logistic = grid_search.best_estimator_

best_enet_logistic.score(testX_prc, testY)

0.63

<h3>Summary of All Models</h3>

|Model|Training CV Accuracy| Testing Accuracy|
|-----|--------------------|-----------------|
|L2 Logistic|0.6389|0.6325|
|L1 Logistic|0.6280|0.6425|
|ENet Logistic|0.6414|0.6300|
|L2 SoftMax|0.6389|0.6325|
|L1 SoftMax|0.6397|0.6375|
|ENet SoftMax|0.6414|0.6300|

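This is a difficult dataset, so all models perform similarly: cross-validated accuracy is around 63% to 64%, and test accuracy falls in the same range. The L2 and ENet Logistic models have identical results to the L2 and ENet SoftMax models. This is expected: with multi_class='auto', as shown in the recorded outputs, the lbfgs and saga solvers already fit a multinomial model when the target has more than two classes, so only the liblinear-based L1 Logistic model is a one-vs-rest fit. The best model on the test data is L1 Logistic, but the improvement is very slight.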
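
Since the three quality bins are not equally frequent, accuracy alone can hide poor performance on the smallest class. A confusion matrix and a macro-averaged F1 score give a per-class view. The sketch below shows the calls on synthetic imbalanced data (the 50/40/10 class weights are an assumption for the example). On the wine data one would pass `testY` and the predictions of a fitted model instead:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, f1_score
from sklearn.model_selection import train_test_split

# Synthetic 3-class data with one rare class
X, y = make_classification(n_samples=600, n_features=8, n_informative=4,
                           n_classes=3, weights=[0.5, 0.4, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          stratify=y, random_state=42)

model = LogisticRegression(max_iter=5000).fit(X_tr, y_tr)
pred = model.predict(X_te)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_te, pred)
# Macro averaging weights every class equally, regardless of its size
macro_f1 = f1_score(y_te, pred, average='macro')
print(cm)
print(macro_f1)
```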