## Import Packages Etc

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import math
from sklearn.model_selection import train_test_split
from sklearn.base import BaseEstimator
from sklearn.base import ClassifierMixin
from sklearn.base import MetaEstimatorMixin
from sklearn.base import clone
from sklearn.utils import check_X_y, check_array
from sklearn.utils.validation import check_is_fitted
from sklearn.metrics import f1_score
from sklearn.metrics import accuracy_score, hamming_loss
from sklearn import metrics
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from imblearn.under_sampling import RandomUnderSampler
from sklearn.model_selection import GridSearchCV

Using TensorFlow backend.


## Task 0: Load the Yeast Dataset

In [2]:
# Write your code here
dataset = pd.read_csv('yeast.csv')
dataset.head(3)
X = dataset[dataset.columns[1:103]]
Y = dataset[dataset.columns[103:117]]
Y.head(3)

Unnamed: 0,Class1,Class2,Class3,Class4,Class5,Class6,Class7,Class8,Class9,Class10,Class11,Class12,Class13,Class14
0,0,0,0,0,0,0,1,1,0,0,0,1,1,0
1,0,0,1,1,0,0,0,0,0,0,0,0,0,0
2,0,1,1,0,0,0,0,0,0,0,0,1,1,0


### Train-Test Split

In [3]:
X_train,X_test,y_train,y_test = train_test_split(X,Y,train_size=0.7) 

## Task 1: Implement the Binary Relevance Algorithm

We pass the base estimator to Binary relevance and append each estimator prediction for the number labels (14 classes for this dataset) in a list.
We will first check the X and Y to be 2D and 1D array respectively using validation. 
We will then fit the ith column of Y and append it in estimator list and will clone it every time to keep a copy of estimator for every ith column, instead of updating it for all 14 labels.
Prediction is made after checking whether the data is fitted or not. 

In [5]:
class BinaryRelevance(BaseEstimator, ClassifierMixin, MetaEstimatorMixin):
    
    """ We are implementing Binary Relevance Classifier. In this for each label a distinct classifier is trained to
    to indicate label relevance."""
    
    """ Constructor for the classifier object 
    """
    
    def __init__(self,estimator):
        
        
        self.estimator = estimator
        self.estimators_ =[]
       
        
    def fit(self, X, y):
        
        """ Fitting the estimator.
        
        Parameters:
        X : (sparse) array-like, shape = [n_samples, n_features]
            Data
        y : (sparse) array-like, shape = [n_samples, ], [n_samples, n_classes]
            Multi-label targets
            
        """
        X,y = check_X_y(X,y, accept_sparse = True, multi_output = True)
        y = check_array(y,accept_sparse=True)
        
        for i in range(y.shape[1]):
            self.estimators_.append(clone(self.estimator).fit(X,y[:,i])) 
    
    
            
    def predict(self,X):
        
        """ Predicts the labels ."""
        pred =[]
                
        check_is_fitted(self,'estimators_')
        for e in self.estimators_ :
            pred.append(e.predict(X))
        arr = np.asarray(pred)
        return arr.T

## Task 2: Implement the Binary Relevance Algorithm with Under-Sampling

We are undersampling the data. This is done to create a balanced dataset by downsizing the majority class, which is done by removing observations at random until the dataset is balanced. 
We traversed the columns of Y, one at a time to find whether the target label is imbalanced or not and under sampled the dataset if the target labels are not balanced. We then fit the under sampled data by appending estimator for each label in a list after checking whether the data is fitted or not and made predictions on the fitted data.

In [6]:
class BinaryRelevanceSampling(BaseEstimator, ClassifierMixin, MetaEstimatorMixin):
    
    def __init__(self,estimator):
        
        self.estimator = estimator
        self.estimators_ =[]
       
        
    def fit(self, X, y):
        
        X,y = check_X_y(X,y, accept_sparse = True, multi_output = True)
        y = check_array(y,accept_sparse=True)
        
        for i in range(y.shape[1]):
            
            unique_elements, target_count = np.unique(y[:,i],return_counts =True)
            label0_ratio = round(target_count[0] / (target_count[1] + target_count[0]), 2)
            label1_ratio = round(target_count[1] / (target_count[1] + target_count[0]), 2)
            label_ratio = (label0_ratio/label1_ratio)

            if label_ratio != 1:
                rus = RandomUnderSampler(random_state=42)
                X_res, y_res = rus.fit_resample(X, y[:,i])
                self.estimators_.append(clone(self.estimator).fit(X,y[:,i])) 
            else:
                self.estimators_.append(clone(self.estimator).fit(X,y[:,i])) 
    
    
            
    def predict(self,X):
        pred =[]
                
        check_is_fitted(self,'estimators_')
        for e in self.estimators_ :
            pred.append(e.predict(X))
        return np.array(pred).T

## Task 3: Compare the Performance of Different Binary Relevance Approaches

Here, we compare the performance of different Binary Relevance Approaches  and used GridSearchCV to find the best performing estimator from DecisionTree, SVC and LogisticRegression for our binary relevance. We passed a dictionary of classifiers with hyper tuned parameters and used hamming_loss scorer for finding out the performance of each classifier.

## Grid Search

#### Without Under Sampling 

In [11]:
from sklearn.metrics import make_scorer
param_grid = [
           
            {'estimator':[DecisionTreeClassifier()], 'estimator__criterion': ['gini', 'entropy'], 'estimator__max_depth':[10, 25, 50], 'estimator__min_samples_split': [2,5,10]},
            {'estimator':[SVC()], 'estimator__C':[0.1,1,10,100,1000], 'estimator__gamma':[1,0.1,0.01,0.001,0.0001], 'estimator__kernel':['rbf','linear']},
            {'estimator':[LogisticRegression()], 'estimator__max_iter':[10000,15000,20000]}
]

best_model = GridSearchCV(BinaryRelevance(param_grid), param_grid, verbose=2, n_jobs=1, scoring=make_scorer(hamming_loss, greater_is_better = False))
best_model.fit(X_train, y_train)

print("Best param set:")
print(best_model.best_params_)
print(best_model.best_score_)

[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


Fitting 3 folds for each of 71 candidates, totalling 213 fits
[CV] estimator=DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
                       max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, presort=False,
                       random_state=None, splitter='best'), estimator__criterion=gini, estimator__max_depth=10, estimator__min_samples_split=2 
[CV]  estimator=DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
                       max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, presort=False,
                       random_state=None, splitter='best'), estimator__

[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    1.4s remaining:    0.0s


[CV]  estimator=DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=10,
                       max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, presort=False,
                       random_state=None, splitter='best'), estimator__criterion=gini, estimator__max_depth=10, estimator__min_samples_split=2, total=   1.5s
[CV] estimator=DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=10,
                       max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, presort=False,
                       random_state=None, splitter='best'), estimator__criterion=gini, estimator__max_depth=10, estimator__

[CV]  estimator=DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=25,
                       max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, presort=False,
                       random_state=None, splitter='best'), estimator__criterion=gini, estimator__max_depth=25, estimator__min_samples_split=2, total=   2.0s
[CV] estimator=DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=25,
                       max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, presort=False,
                       random_state=None, splitter='best'), estimator__criterion=gini, estimator__max_depth=25, estimator__

[CV]  estimator=DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=50,
                       max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, presort=False,
                       random_state=None, splitter='best'), estimator__criterion=gini, estimator__max_depth=50, estimator__min_samples_split=2, total=   2.2s
[CV] estimator=DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=50,
                       max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, presort=False,
                       random_state=None, splitter='best'), estimator__criterion=gini, estimator__max_depth=50, estimator__

[CV]  estimator=DecisionTreeClassifier(class_weight=None, criterion='entropy', max_depth=10,
                       max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, presort=False,
                       random_state=None, splitter='best'), estimator__criterion=entropy, estimator__max_depth=10, estimator__min_samples_split=2, total=   1.9s
[CV] estimator=DecisionTreeClassifier(class_weight=None, criterion='entropy', max_depth=10,
                       max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, presort=False,
                       random_state=None, splitter='best'), estimator__criterion=entropy, estimator__max_depth=10,

[CV]  estimator=DecisionTreeClassifier(class_weight=None, criterion='entropy', max_depth=10,
                       max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=10,
                       min_weight_fraction_leaf=0.0, presort=False,
                       random_state=None, splitter='best'), estimator__criterion=entropy, estimator__max_depth=25, estimator__min_samples_split=2, total=   2.2s
[CV] estimator=DecisionTreeClassifier(class_weight=None, criterion='entropy', max_depth=25,
                       max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, presort=False,
                       random_state=None, splitter='best'), estimator__criterion=entropy, estimator__max_depth=25

[CV]  estimator=DecisionTreeClassifier(class_weight=None, criterion='entropy', max_depth=25,
                       max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=10,
                       min_weight_fraction_leaf=0.0, presort=False,
                       random_state=None, splitter='best'), estimator__criterion=entropy, estimator__max_depth=25, estimator__min_samples_split=10, total=   2.0s
[CV] estimator=DecisionTreeClassifier(class_weight=None, criterion='entropy', max_depth=25,
                       max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=10,
                       min_weight_fraction_leaf=0.0, presort=False,
                       random_state=None, splitter='best'), estimator__criterion=entropy, estimator__max_depth=

[CV]  estimator=DecisionTreeClassifier(class_weight=None, criterion='entropy', max_depth=50,
                       max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=10,
                       min_weight_fraction_leaf=0.0, presort=False,
                       random_state=None, splitter='best'), estimator__criterion=entropy, estimator__max_depth=50, estimator__min_samples_split=10, total=   2.0s
[CV] estimator=DecisionTreeClassifier(class_weight=None, criterion='entropy', max_depth=50,
                       max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=10,
                       min_weight_fraction_leaf=0.0, presort=False,
                       random_state=None, splitter='best'), estimator__criterion=entropy, estimator__max_depth=

[CV]  estimator=SVC(C=0.1, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.1, kernel='linear',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=0.1, estimator__gamma=0.1, estimator__kernel=linear, total=   1.7s
[CV] estimator=SVC(C=0.1, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.1, kernel='linear',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=0.1, estimator__gamma=0.01, estimator__kernel=rbf 
[CV]  estimator=SVC(C=0.1, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.1, kernel='linear',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=0.1, estimator__gamma=0.01, estimator__kernel=rbf, total=   2.0s
[CV] estimator=SVC(C=0.1, cache_siz

[CV]  estimator=SVC(C=0.1, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.001, kernel='linear',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=0.1, estimator__gamma=0.0001, estimator__kernel=rbf, total=   1.8s
[CV] estimator=SVC(C=0.1, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.0001, kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=0.1, estimator__gamma=0.0001, estimator__kernel=rbf 
[CV]  estimator=SVC(C=0.1, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.0001, kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=0.1, estimator__gamma=0.0001, estimator__kernel=rbf, total=   1.8s
[CV] estimator=SVC(C=0.1, cac

[CV]  estimator=SVC(C=1, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.1, kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=1, estimator__gamma=0.1, estimator__kernel=rbf, total=   2.5s
[CV] estimator=SVC(C=1, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.1, kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=1, estimator__gamma=0.1, estimator__kernel=rbf 
[CV]  estimator=SVC(C=1, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.1, kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=1, estimator__gamma=0.1, estimator__kernel=rbf, total=   2.4s
[CV] estimator=SVC(C=1, cache_size=200, class_weight=None, co

[CV]  estimator=SVC(C=1, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.001, kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=1, estimator__gamma=0.001, estimator__kernel=linear, total=   1.8s
[CV] estimator=SVC(C=1, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.001, kernel='linear',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=1, estimator__gamma=0.001, estimator__kernel=linear 
[CV]  estimator=SVC(C=1, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.001, kernel='linear',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=1, estimator__gamma=0.001, estimator__kernel=linear, total=   1.8s
[CV] estimator=SVC(C=1, cache_size

[CV]  estimator=SVC(C=10, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=1, kernel='linear',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=10, estimator__gamma=1, estimator__kernel=linear, total=   3.2s
[CV] estimator=SVC(C=10, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=1, kernel='linear',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=10, estimator__gamma=1, estimator__kernel=linear 
[CV]  estimator=SVC(C=10, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=1, kernel='linear',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=10, estimator__gamma=1, estimator__kernel=linear, total=   2.3s
[CV] estimator=SVC(C=10, cache_size=200, class_we

[CV]  estimator=SVC(C=10, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.01, kernel='linear',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=10, estimator__gamma=0.01, estimator__kernel=linear, total=   2.3s
[CV] estimator=SVC(C=10, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.01, kernel='linear',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=10, estimator__gamma=0.001, estimator__kernel=rbf 
[CV]  estimator=SVC(C=10, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.01, kernel='linear',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=10, estimator__gamma=0.001, estimator__kernel=rbf, total=   2.1s
[CV] estimator=SVC(C=10, cache_size

[CV]  estimator=SVC(C=10, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.0001, kernel='linear',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=100, estimator__gamma=1, estimator__kernel=rbf, total=   3.5s
[CV] estimator=SVC(C=100, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=1, kernel='rbf', max_iter=-1,
    probability=False, random_state=None, shrinking=True, tol=0.001,
    verbose=False), estimator__C=100, estimator__gamma=1, estimator__kernel=rbf 
[CV]  estimator=SVC(C=100, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=1, kernel='rbf', max_iter=-1,
    probability=False, random_state=None, shrinking=True, tol=0.001,
    verbose=False), estimator__C=100, estimator__gamma=1, estimator__kernel=rbf, total=   3.5s
[CV] estimator=SVC(C=100, cache_size=200, class_weight

[CV]  estimator=SVC(C=100, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.01, kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=100, estimator__gamma=0.01, estimator__kernel=rbf, total=   2.7s
[CV] estimator=SVC(C=100, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.01, kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=100, estimator__gamma=0.01, estimator__kernel=rbf 
[CV]  estimator=SVC(C=100, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.01, kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=100, estimator__gamma=0.01, estimator__kernel=rbf, total=   2.6s
[CV] estimator=SVC(C=100, cache_size=200, c

[CV]  estimator=SVC(C=100, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.0001, kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=100, estimator__gamma=0.0001, estimator__kernel=rbf, total=   2.1s
[CV] estimator=SVC(C=100, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.0001, kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=100, estimator__gamma=0.0001, estimator__kernel=linear 
[CV]  estimator=SVC(C=100, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.0001, kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=100, estimator__gamma=0.0001, estimator__kernel=linear, total=   5.4s
[CV] estimator=SVC(C=100,

[CV]  estimator=SVC(C=1000, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.1, kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=1000, estimator__gamma=0.1, estimator__kernel=linear, total= 1.1min
[CV] estimator=SVC(C=1000, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.1, kernel='linear',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=1000, estimator__gamma=0.1, estimator__kernel=linear 
[CV]  estimator=SVC(C=1000, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.1, kernel='linear',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=1000, estimator__gamma=0.1, estimator__kernel=linear, total= 1.6min
[CV] estimator=SVC(C=1000, c

[CV]  estimator=SVC(C=1000, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.001, kernel='linear',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=1000, estimator__gamma=0.001, estimator__kernel=linear, total= 1.6min
[CV] estimator=SVC(C=1000, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.001, kernel='linear',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=1000, estimator__gamma=0.001, estimator__kernel=linear 
[CV]  estimator=SVC(C=1000, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.001, kernel='linear',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=1000, estimator__gamma=0.001, estimator__kernel=linear, total= 1.5min
[CV] estimato



[CV]  estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=10000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=10000, total=   0.2s
[CV] estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=10000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=10000 




[CV]  estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=10000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=10000, total=   0.2s
[CV] estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=10000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=15000 




[CV]  estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=10000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=15000, total=   0.2s
[CV] estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=15000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=15000 




[CV]  estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=15000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=15000, total=   0.3s
[CV] estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=15000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=15000 
[CV]  estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=15000,
                   multi_class='warn', n_jobs=None, penalty='l2',
               




[CV] estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=15000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=20000 
[CV]  estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=15000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=20000, total=   0.2s




[CV] estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=20000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=20000 




[CV]  estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=20000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=20000, total=   0.2s
[CV] estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=20000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=20000 
[CV]  estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=20000,
                   multi_class='warn', n_jobs=None, penalty='l2',
               

[Parallel(n_jobs=1)]: Done 213 out of 213 | elapsed: 28.9min finished



Best param set:
{'estimator': SVC(C=1, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=1, kernel='rbf', max_iter=-1,
    probability=False, random_state=None, shrinking=True, tol=0.001,
    verbose=False), 'estimator__C': 1, 'estimator__gamma': 1, 'estimator__kernel': 'rbf'}
-0.18835008870490832


#### Passing SVM (best model) from Grid Search to Binary Relevance without sampling

In [13]:
y_pred = model_tuned.predict(X_test)

# Evaluating the performnace with SVM
print(("Hamming Loss : %f " ) % (hamming_loss(y_test,y_pred)))

Hamming Loss : 0.191263 


#### With Under Sampling

In [8]:
from sklearn.metrics import make_scorer
param_grid = [
           
            {'estimator':[DecisionTreeClassifier()], 'estimator__criterion': ['gini', 'entropy'], 'estimator__max_depth':[10, 25, 50], 'estimator__min_samples_split': [2,5,10]},
            {'estimator':[SVC()], 'estimator__C':[0.1,1,10,100,1000], 'estimator__gamma':[1,0.1,0.01,0.001,0.0001], 'estimator__kernel':['rbf','linear']},
            {'estimator':[LogisticRegression()], 'estimator__max_iter':[10000,15000,20000]}
]

model_tuned = GridSearchCV(BinaryRelevanceSampling(param_grid), param_grid, verbose=2, n_jobs=1, scoring=make_scorer(hamming_loss, greater_is_better = False))
model_tuned.fit(X_train, y_train)

print("Best param set:")
print(model_tuned.best_params_)
print(model_tuned.best_score_)

[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


Fitting 3 folds for each of 71 candidates, totalling 213 fits
[CV] estimator=DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
                       max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, presort=False,
                       random_state=None, splitter='best'), estimator__criterion=gini, estimator__max_depth=10, estimator__min_samples_split=2 
[CV]  estimator=DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
                       max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, presort=False,
                       random_state=None, splitter='best'), estimator__

[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    1.5s remaining:    0.0s


[CV]  estimator=DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=10,
                       max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, presort=False,
                       random_state=None, splitter='best'), estimator__criterion=gini, estimator__max_depth=10, estimator__min_samples_split=2, total=   1.6s
[CV] estimator=DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=10,
                       max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, presort=False,
                       random_state=None, splitter='best'), estimator__criterion=gini, estimator__max_depth=10, estimator__

[CV]  estimator=DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=25,
                       max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, presort=False,
                       random_state=None, splitter='best'), estimator__criterion=gini, estimator__max_depth=25, estimator__min_samples_split=2, total=   2.4s
[CV] estimator=DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=25,
                       max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, presort=False,
                       random_state=None, splitter='best'), estimator__criterion=gini, estimator__max_depth=25, estimator__

[CV]  estimator=DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=50,
                       max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, presort=False,
                       random_state=None, splitter='best'), estimator__criterion=gini, estimator__max_depth=50, estimator__min_samples_split=2, total=   2.6s
[CV] estimator=DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=50,
                       max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, presort=False,
                       random_state=None, splitter='best'), estimator__criterion=gini, estimator__max_depth=50, estimator__

[CV]  estimator=DecisionTreeClassifier(class_weight=None, criterion='entropy', max_depth=10,
                       max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, presort=False,
                       random_state=None, splitter='best'), estimator__criterion=entropy, estimator__max_depth=10, estimator__min_samples_split=2, total=   2.2s
[CV] estimator=DecisionTreeClassifier(class_weight=None, criterion='entropy', max_depth=10,
                       max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, presort=False,
                       random_state=None, splitter='best'), estimator__criterion=entropy, estimator__max_depth=10,

[CV]  estimator=DecisionTreeClassifier(class_weight=None, criterion='entropy', max_depth=10,
                       max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=10,
                       min_weight_fraction_leaf=0.0, presort=False,
                       random_state=None, splitter='best'), estimator__criterion=entropy, estimator__max_depth=25, estimator__min_samples_split=2, total=   2.4s
[CV] estimator=DecisionTreeClassifier(class_weight=None, criterion='entropy', max_depth=25,
                       max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, presort=False,
                       random_state=None, splitter='best'), estimator__criterion=entropy, estimator__max_depth=25

[CV]  estimator=DecisionTreeClassifier(class_weight=None, criterion='entropy', max_depth=25,
                       max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=10,
                       min_weight_fraction_leaf=0.0, presort=False,
                       random_state=None, splitter='best'), estimator__criterion=entropy, estimator__max_depth=25, estimator__min_samples_split=10, total=   2.0s
[CV] estimator=DecisionTreeClassifier(class_weight=None, criterion='entropy', max_depth=25,
                       max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=10,
                       min_weight_fraction_leaf=0.0, presort=False,
                       random_state=None, splitter='best'), estimator__criterion=entropy, estimator__max_depth=

[CV]  estimator=DecisionTreeClassifier(class_weight=None, criterion='entropy', max_depth=50,
                       max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=10,
                       min_weight_fraction_leaf=0.0, presort=False,
                       random_state=None, splitter='best'), estimator__criterion=entropy, estimator__max_depth=50, estimator__min_samples_split=10, total=   2.0s
[CV] estimator=DecisionTreeClassifier(class_weight=None, criterion='entropy', max_depth=50,
                       max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=10,
                       min_weight_fraction_leaf=0.0, presort=False,
                       random_state=None, splitter='best'), estimator__criterion=entropy, estimator__max_depth=

[CV]  estimator=SVC(C=0.1, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.1, kernel='linear',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=0.1, estimator__gamma=0.1, estimator__kernel=linear, total=   1.7s
[CV] estimator=SVC(C=0.1, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.1, kernel='linear',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=0.1, estimator__gamma=0.01, estimator__kernel=rbf 
[CV]  estimator=SVC(C=0.1, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.1, kernel='linear',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=0.1, estimator__gamma=0.01, estimator__kernel=rbf, total=   2.1s
[CV] estimator=SVC(C=0.1, cache_siz

[CV]  estimator=SVC(C=0.1, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.001, kernel='linear',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=0.1, estimator__gamma=0.0001, estimator__kernel=rbf, total=   1.9s
[CV] estimator=SVC(C=0.1, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.0001, kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=0.1, estimator__gamma=0.0001, estimator__kernel=rbf 
[CV]  estimator=SVC(C=0.1, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.0001, kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=0.1, estimator__gamma=0.0001, estimator__kernel=rbf, total=   1.8s
[CV] estimator=SVC(C=0.1, cac

[CV]  estimator=SVC(C=1, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.1, kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=1, estimator__gamma=0.1, estimator__kernel=rbf, total=   2.4s
[CV] estimator=SVC(C=1, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.1, kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=1, estimator__gamma=0.1, estimator__kernel=rbf 
[CV]  estimator=SVC(C=1, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.1, kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=1, estimator__gamma=0.1, estimator__kernel=rbf, total=   2.4s
[CV] estimator=SVC(C=1, cache_size=200, class_weight=None, co

[CV]  estimator=SVC(C=1, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.001, kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=1, estimator__gamma=0.001, estimator__kernel=linear, total=   1.8s
[CV] estimator=SVC(C=1, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.001, kernel='linear',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=1, estimator__gamma=0.001, estimator__kernel=linear 
[CV]  estimator=SVC(C=1, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.001, kernel='linear',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=1, estimator__gamma=0.001, estimator__kernel=linear, total=   1.8s
[CV] estimator=SVC(C=1, cache_size

[CV]  estimator=SVC(C=10, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=1, kernel='linear',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=10, estimator__gamma=1, estimator__kernel=linear, total=   2.4s
[CV] estimator=SVC(C=10, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=1, kernel='linear',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=10, estimator__gamma=1, estimator__kernel=linear 
[CV]  estimator=SVC(C=10, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=1, kernel='linear',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=10, estimator__gamma=1, estimator__kernel=linear, total=   2.5s
[CV] estimator=SVC(C=10, cache_size=200, class_we

[CV]  estimator=SVC(C=10, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.01, kernel='linear',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=10, estimator__gamma=0.01, estimator__kernel=linear, total=   2.3s
[CV] estimator=SVC(C=10, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.01, kernel='linear',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=10, estimator__gamma=0.001, estimator__kernel=rbf 
[CV]  estimator=SVC(C=10, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.01, kernel='linear',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=10, estimator__gamma=0.001, estimator__kernel=rbf, total=   2.2s
[CV] estimator=SVC(C=10, cache_size

[CV]  estimator=SVC(C=10, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.0001, kernel='linear',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=100, estimator__gamma=1, estimator__kernel=rbf, total=   3.5s
[CV] estimator=SVC(C=100, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=1, kernel='rbf', max_iter=-1,
    probability=False, random_state=None, shrinking=True, tol=0.001,
    verbose=False), estimator__C=100, estimator__gamma=1, estimator__kernel=rbf 
[CV]  estimator=SVC(C=100, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=1, kernel='rbf', max_iter=-1,
    probability=False, random_state=None, shrinking=True, tol=0.001,
    verbose=False), estimator__C=100, estimator__gamma=1, estimator__kernel=rbf, total=   3.5s
[CV] estimator=SVC(C=100, cache_size=200, class_weight

[CV]  estimator=SVC(C=100, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.01, kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=100, estimator__gamma=0.01, estimator__kernel=rbf, total=   3.1s
[CV] estimator=SVC(C=100, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.01, kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=100, estimator__gamma=0.01, estimator__kernel=rbf 
[CV]  estimator=SVC(C=100, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.01, kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=100, estimator__gamma=0.01, estimator__kernel=rbf, total=   2.7s
[CV] estimator=SVC(C=100, cache_size=200, c

[CV]  estimator=SVC(C=100, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.0001, kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=100, estimator__gamma=0.0001, estimator__kernel=rbf, total=   2.2s
[CV] estimator=SVC(C=100, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.0001, kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=100, estimator__gamma=0.0001, estimator__kernel=linear 
[CV]  estimator=SVC(C=100, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.0001, kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=100, estimator__gamma=0.0001, estimator__kernel=linear, total=   5.9s
[CV] estimator=SVC(C=100,

[CV]  estimator=SVC(C=1000, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.1, kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=1000, estimator__gamma=0.1, estimator__kernel=linear, total= 1.1min
[CV] estimator=SVC(C=1000, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.1, kernel='linear',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=1000, estimator__gamma=0.1, estimator__kernel=linear 
[CV]  estimator=SVC(C=1000, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.1, kernel='linear',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=1000, estimator__gamma=0.1, estimator__kernel=linear, total= 1.7min
[CV] estimator=SVC(C=1000, c

[CV]  estimator=SVC(C=1000, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.001, kernel='linear',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=1000, estimator__gamma=0.001, estimator__kernel=linear, total= 1.7min
[CV] estimator=SVC(C=1000, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.001, kernel='linear',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=1000, estimator__gamma=0.001, estimator__kernel=linear 
[CV]  estimator=SVC(C=1000, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.001, kernel='linear',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=1000, estimator__gamma=0.001, estimator__kernel=linear, total= 1.3min
[CV] estimato




[CV] estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=10000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=10000 




[CV]  estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=10000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=10000, total=   0.2s
[CV] estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=10000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=10000 




[CV]  estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=10000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=10000, total=   0.3s
[CV] estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=10000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=15000 




[CV]  estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=10000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=15000, total=   0.2s
[CV] estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=15000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=15000 




[CV]  estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=15000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=15000, total=   0.2s
[CV] estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=15000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=15000 




[CV]  estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=15000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=15000, total=   0.2s
[CV] estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=15000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=20000 




[CV]  estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=15000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=20000, total=   0.2s
[CV] estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=20000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=20000 




[CV]  estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=20000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=20000, total=   0.4s
[CV] estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=20000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=20000 


[Parallel(n_jobs=1)]: Done 213 out of 213 | elapsed: 29.6min finished


[CV]  estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=20000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=20000, total=   0.3s
Best param set:
{'estimator': SVC(C=1, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=1, kernel='rbf', max_iter=-1,
    probability=False, random_state=None, shrinking=True, tol=0.001,
    verbose=False), 'estimator__C': 1, 'estimator__gamma': 1, 'estimator__kernel': 'rbf'}
-0.18835008870490832


#### Passing SVM (best model) from Grid Search to Binary Relevance with sampling

In [10]:
y_pred = model_tuned.predict(X_test)

# Evaluating the performnace with SVM
print(("Hamming Loss : %f " ) % (hamming_loss(y_test,y_pred)))

Hamming Loss : 0.191263 


## Task 4: Implement the Classifier Chains Algorithm

This classifier constructs a chain of classifiers for multi-label
    classification. A classifier chain model generates a chain of binary classifiers each of which predicts the presence or absence of a specific label. 
    For each label a
    classifier is trained. The classifier predicts one label. The first classifier predicts the first label of
    the label vector, whereas the second predicts the
    second label, but the first label is appended to the feature vector.
    

In [14]:
class ClassifierChain(BaseEstimator, ClassifierMixin, MetaEstimatorMixin):
    
    def __init__(self,estimator):
        
        self.estimator = estimator
        self.estimators_ = []
       
        
    def fit(self, X, y):
        
        X,y = check_X_y(X,y, accept_sparse = True, multi_output = True)
        y = check_array(y,accept_sparse=True)
        
        
        for i in range(y.shape[1]):
            
            c = clone(self.estimator)
            
            """Here, all classifiers in the chain from the nodes before
            this one are fitted to the training data, including any
            subsequently predicted and appended labels."""
            
            if (i==0):
                c.fit(X,y[:,0])
            else:
                
                """ if the classifiers aren't the first classifiers in the
                    chain then we use a transformed version of the features, i.e.
                    the previously predicted labels are appended.
                """
                stacked = np.hstack((X, y[:, :i]))
                c.fit(stacked, y[:, i])
                
        
            self.estimators_.append(c)          
            
    def predict(self,X):
        
        check_is_fitted(self,'estimators_')
        
        for i, c in enumerate(self.estimators_):
            
            if i == 0:
                y_pred = (c.predict(X)).reshape(-1, 1)
            else:
                stacked = np.hstack((X, y_pred))
                new_y = c.predict(stacked)
                y_pred = np.hstack((y_pred, new_y.reshape(-1, 1)))
                
        return y_pred

## Task 5: Evaluate the Performance of the Classifier Chains Algorithm

## Grid Search

In [16]:
from sklearn.metrics import make_scorer
param_grid = [
           
            {'estimator':[DecisionTreeClassifier()], 'estimator__criterion': ['gini', 'entropy'], 'estimator__max_depth':[10, 25, 50], 'estimator__min_samples_split': [2,5,10]},
            {'estimator':[SVC()], 'estimator__C':[0.1,1,10,100,1000], 'estimator__gamma':[1,0.1,0.01,0.001,0.0001], 'estimator__kernel':['rbf','linear']},
            {'estimator':[LogisticRegression()], 'estimator__max_iter':[10000,15000,20000]}
]

tuned_model = GridSearchCV(ClassifierChain(param_grid), param_grid, verbose=2, n_jobs=1, scoring=make_scorer(hamming_loss, greater_is_better = False))
tuned_model.fit(X_train, y_train)

print("Best param set:")
print(tuned_model.best_params_)
print(tuned_model.best_score_)

[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


Fitting 3 folds for each of 71 candidates, totalling 213 fits
[CV] estimator=DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
                       max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, presort=False,
                       random_state=None, splitter='best'), estimator__criterion=gini, estimator__max_depth=10, estimator__min_samples_split=2 
[CV]  estimator=DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
                       max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, presort=False,
                       random_state=None, splitter='best'), estimator__

[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    1.2s remaining:    0.0s


[CV]  estimator=DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=10,
                       max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, presort=False,
                       random_state=None, splitter='best'), estimator__criterion=gini, estimator__max_depth=10, estimator__min_samples_split=2, total=   1.2s
[CV] estimator=DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=10,
                       max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, presort=False,
                       random_state=None, splitter='best'), estimator__criterion=gini, estimator__max_depth=10, estimator__

[CV]  estimator=DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=25,
                       max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, presort=False,
                       random_state=None, splitter='best'), estimator__criterion=gini, estimator__max_depth=25, estimator__min_samples_split=2, total=   1.7s
[CV] estimator=DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=25,
                       max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, presort=False,
                       random_state=None, splitter='best'), estimator__criterion=gini, estimator__max_depth=25, estimator__

[CV]  estimator=DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=50,
                       max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, presort=False,
                       random_state=None, splitter='best'), estimator__criterion=gini, estimator__max_depth=50, estimator__min_samples_split=2, total=   1.7s
[CV] estimator=DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=50,
                       max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, presort=False,
                       random_state=None, splitter='best'), estimator__criterion=gini, estimator__max_depth=50, estimator__

[CV]  estimator=DecisionTreeClassifier(class_weight=None, criterion='entropy', max_depth=10,
                       max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, presort=False,
                       random_state=None, splitter='best'), estimator__criterion=entropy, estimator__max_depth=10, estimator__min_samples_split=2, total=   1.4s
[CV] estimator=DecisionTreeClassifier(class_weight=None, criterion='entropy', max_depth=10,
                       max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, presort=False,
                       random_state=None, splitter='best'), estimator__criterion=entropy, estimator__max_depth=10,

[CV]  estimator=DecisionTreeClassifier(class_weight=None, criterion='entropy', max_depth=10,
                       max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=10,
                       min_weight_fraction_leaf=0.0, presort=False,
                       random_state=None, splitter='best'), estimator__criterion=entropy, estimator__max_depth=25, estimator__min_samples_split=2, total=   1.4s
[CV] estimator=DecisionTreeClassifier(class_weight=None, criterion='entropy', max_depth=25,
                       max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, presort=False,
                       random_state=None, splitter='best'), estimator__criterion=entropy, estimator__max_depth=25

[CV]  estimator=DecisionTreeClassifier(class_weight=None, criterion='entropy', max_depth=25,
                       max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=10,
                       min_weight_fraction_leaf=0.0, presort=False,
                       random_state=None, splitter='best'), estimator__criterion=entropy, estimator__max_depth=25, estimator__min_samples_split=10, total=   1.4s
[CV] estimator=DecisionTreeClassifier(class_weight=None, criterion='entropy', max_depth=25,
                       max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=10,
                       min_weight_fraction_leaf=0.0, presort=False,
                       random_state=None, splitter='best'), estimator__criterion=entropy, estimator__max_depth=

[CV]  estimator=DecisionTreeClassifier(class_weight=None, criterion='entropy', max_depth=50,
                       max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=10,
                       min_weight_fraction_leaf=0.0, presort=False,
                       random_state=None, splitter='best'), estimator__criterion=entropy, estimator__max_depth=50, estimator__min_samples_split=10, total=   1.4s
[CV] estimator=DecisionTreeClassifier(class_weight=None, criterion='entropy', max_depth=50,
                       max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=10,
                       min_weight_fraction_leaf=0.0, presort=False,
                       random_state=None, splitter='best'), estimator__criterion=entropy, estimator__max_depth=

[CV]  estimator=SVC(C=0.1, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.1, kernel='linear',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=0.1, estimator__gamma=0.1, estimator__kernel=linear, total=   1.4s
[CV] estimator=SVC(C=0.1, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.1, kernel='linear',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=0.1, estimator__gamma=0.01, estimator__kernel=rbf 
[CV]  estimator=SVC(C=0.1, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.1, kernel='linear',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=0.1, estimator__gamma=0.01, estimator__kernel=rbf, total=   2.1s
[CV] estimator=SVC(C=0.1, cache_siz

[CV]  estimator=SVC(C=0.1, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.001, kernel='linear',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=0.1, estimator__gamma=0.0001, estimator__kernel=rbf, total=   1.9s
[CV] estimator=SVC(C=0.1, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.0001, kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=0.1, estimator__gamma=0.0001, estimator__kernel=rbf 
[CV]  estimator=SVC(C=0.1, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.0001, kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=0.1, estimator__gamma=0.0001, estimator__kernel=rbf, total=   1.9s
[CV] estimator=SVC(C=0.1, cac

[CV]  estimator=SVC(C=1, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.1, kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=1, estimator__gamma=0.1, estimator__kernel=rbf, total=   1.9s
[CV] estimator=SVC(C=1, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.1, kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=1, estimator__gamma=0.1, estimator__kernel=rbf 
[CV]  estimator=SVC(C=1, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.1, kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=1, estimator__gamma=0.1, estimator__kernel=rbf, total=   2.0s
[CV] estimator=SVC(C=1, cache_size=200, class_weight=None, co

[CV]  estimator=SVC(C=1, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.001, kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=1, estimator__gamma=0.001, estimator__kernel=linear, total=   1.2s
[CV] estimator=SVC(C=1, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.001, kernel='linear',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=1, estimator__gamma=0.001, estimator__kernel=linear 
[CV]  estimator=SVC(C=1, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.001, kernel='linear',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=1, estimator__gamma=0.001, estimator__kernel=linear, total=   1.2s
[CV] estimator=SVC(C=1, cache_size

[CV]  estimator=SVC(C=10, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=1, kernel='linear',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=10, estimator__gamma=1, estimator__kernel=linear, total=   1.5s
[CV] estimator=SVC(C=10, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=1, kernel='linear',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=10, estimator__gamma=1, estimator__kernel=linear 
[CV]  estimator=SVC(C=10, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=1, kernel='linear',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=10, estimator__gamma=1, estimator__kernel=linear, total=   1.7s
[CV] estimator=SVC(C=10, cache_size=200, class_we

[CV]  estimator=SVC(C=10, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.01, kernel='linear',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=10, estimator__gamma=0.01, estimator__kernel=linear, total=   1.6s
[CV] estimator=SVC(C=10, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.01, kernel='linear',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=10, estimator__gamma=0.001, estimator__kernel=rbf 
[CV]  estimator=SVC(C=10, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.01, kernel='linear',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=10, estimator__gamma=0.001, estimator__kernel=rbf, total=   2.1s
[CV] estimator=SVC(C=10, cache_size

[CV]  estimator=SVC(C=10, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.0001, kernel='linear',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=100, estimator__gamma=1, estimator__kernel=rbf, total=   3.6s
[CV] estimator=SVC(C=100, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=1, kernel='rbf', max_iter=-1,
    probability=False, random_state=None, shrinking=True, tol=0.001,
    verbose=False), estimator__C=100, estimator__gamma=1, estimator__kernel=rbf 
[CV]  estimator=SVC(C=100, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=1, kernel='rbf', max_iter=-1,
    probability=False, random_state=None, shrinking=True, tol=0.001,
    verbose=False), estimator__C=100, estimator__gamma=1, estimator__kernel=rbf, total=   3.6s
[CV] estimator=SVC(C=100, cache_size=200, class_weight

[CV]  estimator=SVC(C=100, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.01, kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=100, estimator__gamma=0.01, estimator__kernel=rbf, total=   1.5s
[CV] estimator=SVC(C=100, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.01, kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=100, estimator__gamma=0.01, estimator__kernel=rbf 
[CV]  estimator=SVC(C=100, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.01, kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=100, estimator__gamma=0.01, estimator__kernel=rbf, total=   1.5s
[CV] estimator=SVC(C=100, cache_size=200, c

[CV]  estimator=SVC(C=100, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.0001, kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=100, estimator__gamma=0.0001, estimator__kernel=rbf, total=   2.0s
[CV] estimator=SVC(C=100, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.0001, kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=100, estimator__gamma=0.0001, estimator__kernel=linear 
[CV]  estimator=SVC(C=100, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.0001, kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=100, estimator__gamma=0.0001, estimator__kernel=linear, total=   6.6s
[CV] estimator=SVC(C=100,

[CV]  estimator=SVC(C=1000, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.1, kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=1000, estimator__gamma=0.1, estimator__kernel=linear, total= 1.1min
[CV] estimator=SVC(C=1000, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.1, kernel='linear',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=1000, estimator__gamma=0.1, estimator__kernel=linear 
[CV]  estimator=SVC(C=1000, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.1, kernel='linear',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=1000, estimator__gamma=0.1, estimator__kernel=linear, total= 1.1min
[CV] estimator=SVC(C=1000, c

[CV]  estimator=SVC(C=1000, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.001, kernel='linear',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=1000, estimator__gamma=0.001, estimator__kernel=linear, total= 1.1min
[CV] estimator=SVC(C=1000, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.001, kernel='linear',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=1000, estimator__gamma=0.001, estimator__kernel=linear 
[CV]  estimator=SVC(C=1000, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.001, kernel='linear',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), estimator__C=1000, estimator__gamma=0.001, estimator__kernel=linear, total=  56.3s
[CV] estimato



[CV]  estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=100,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=10000, total=   0.2s
[CV] estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=10000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=10000 




[CV]  estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=10000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=10000, total=   0.3s
[CV] estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=10000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=10000 




[CV]  estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=10000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=10000, total=   0.3s
[CV] estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=10000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=15000 




[CV]  estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=10000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=15000, total=   0.3s
[CV] estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=15000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=15000 




[CV]  estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=15000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=15000, total=   0.2s
[CV] estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=15000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=15000 




[CV]  estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=15000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=15000, total=   0.3s
[CV] estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=15000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=20000 




[CV]  estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=15000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=20000, total=   0.3s
[CV] estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=20000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=20000 




[CV]  estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=20000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=20000, total=   0.2s
[CV] estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=20000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=20000 


[Parallel(n_jobs=1)]: Done 213 out of 213 | elapsed: 22.3min finished


[CV]  estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=20000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=20000, total=   0.2s
Best param set:
{'estimator': SVC(C=100, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.01, kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False), 'estimator__C': 100, 'estimator__gamma': 0.01, 'estimator__kernel': 'rbf'}
-0.21035735405930558


### Passing SVM (best model) from Grid Search to Classifier Chain

In [17]:
y_pred = tuned_model.predict(X_test)

# Evaluating the performnace with SVM
print(("Hamming Loss : %f " ) % (hamming_loss(y_test,y_pred)))

Hamming Loss : 0.213991 


####  Binary Relevance performed a little better than Classifier Chain here probably as hamming loss is more in classifier chain because the loss is propagated through label predictions as each predicated label value is stacked in the dataset for next label prediction.


## Task 6: Reflect on the Performance of the Different Models Evaluated



The dataset consists of 103 descriptive features and there are 14 functional classes and a gene can be associated with any number of these classes.
We have split the data into 70% train set and 30% test set.


**Evaluation Measure:**


We considered hamming loss and F1 score as the evaluation measures. Hamming loss is the fraction of labels that are incorrectly predicted. We’ll be using hamming loss as it gives each label equal weight (Multi label classification). F1 score is used for binary data (Binary classes).


**Binary Relevance Performance (Grid Search):**


We are using grid search to select the estimator that has the best performance. The base estimators that we are considering for implementing binary relevance are Logistic Regression, Support Vector Machine (SVM) and Decision tree classifiers. Out of all these estimators, we got SVM the best base estimator.
We observed that the hamming loss for Support Vector Machine is **0.191263.**

This means that Support Vector Machine is the best estimator for the given dataset for multi-label classification. 


**Binary Relevance Performance with undersampling  (Grid Search):**


We checked the target value count in each of the 14 labels to find out which one is imbalanced, if the ratio of target values in a label is not 50%, then we performed under sampling for that label column using RandomUnderSampler which down sampled the majority target value by randomly removing data rows until the dataset is balanced. Then we use grid search to select the estimator that has the best performance. The base estimators that we are considering for implementing binary relevance are Logistic Regression, Support Vector Machine (SVM) and Decision tree classifiers. We observed that the hamming loss  for Support Vector Machine is **0.191263.**

This means that Support Vector Machine is the best estimator for the given dataset for multi-label classification. From this we can conclude that, there was no significant difference in the hamming loss after undersampling the dataset. 


**Classifier Chain Algorithm:**


We are implementing a Classifier Chain Algorithm. The base estimators that we are considering for implementing binary relevance are Logistic Regression, Support Vector Machine (SVM) and Decision tree classifiers.
Then we use grid search to select the estimator that has the best performance. We observed that the hamming loss for Support Vector Machine is **0.213991.** 

This means that Support Vector Machine is the best estimator for the classifier chain algorithm. 



**Conclusion:**


From this we can conclude that SVM is the best estimator for both Binary Relevance and Classifier Chain algorithms, with and without undersampling. 


Binary Relevance performed a little better than Classifier Chain here probably as hamming loss is more in classifier chain because the loss is propagated through label predictions as each predicated label value is stacked in the dataset for next label prediction.

