### Step forward feature selection

Step forward feature selection starts by training a machine learning model on each feature in the dataset individually. It selects, as the starting feature, the one that yields the best-performing model according to an evaluation criterion of our choice.

In the second step, it trains a model for every pair formed by the feature selected in the previous step and one of the remaining features. It keeps the pair that produces the best-performing model.

It continues adding one feature at a time to the features selected in previous steps, until a predetermined stopping criterion is met.

In theory, a model with more features fits the training data at least as well, so the algorithm would keep adding features unless we tell it when to stop. For example, it can stop when the model performance no longer improves by more than a certain threshold or, as implemented in the library we will discuss in this notebook, when a certain number of features has been selected.

The performance metric is chosen by the user. It can be, for example, the ROC-AUC for classification and the R squared for regression.

Step forward feature selection is called a greedy procedure because, at each step, it keeps the feature that gives the best immediate improvement and never revisits earlier choices. Even so, it trains a model for every candidate feature at every step, so it is computationally expensive and, if the feature space is big, it can even become infeasible.
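To get a feel for the cost, the short sketch below counts how many candidate feature subsets each strategy has to score. The function names and the example sizes (24 features, 10 selected) are illustrative assumptions. Each candidate also costs one model fit per cross-validation fold.

```python
from math import comb


def forward_selection_candidates(n_features, k_selected):
    """Subsets scored by step forward selection when picking k_selected features."""
    # Step i (0-based) tries every feature that has not been selected yet.
    return sum(n_features - i for i in range(k_selected))


def exhaustive_candidates(n_features, k_selected):
    """Subsets an exhaustive search scores for all sizes from 1 to k_selected."""
    return sum(comb(n_features, size) for size in range(1, k_selected + 1))


n_features, k_selected = 24, 10  # illustrative sizes, not taken from a real run
print("forward selection:", forward_selection_candidates(n_features, k_selected))
print("exhaustive search:", exhaustive_candidates(n_features, k_selected))
```

The forward search grows roughly linearly with the number of features per step, while the exhaustive search grows combinatorially, which is why the greedy shortcut is used at all.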

There is a special package in Python that implements this type of feature selection: mlxtend.
http://rasbt.github.io/mlxtend/

In the mlxtend implementation of Step Forward Feature Selection, the stopping criterion is a number of features that we set in advance. So the search will finish when we reach the desired number of selected features.

This is somewhat arbitrary: we might select too few features or too many. But by looking at the performance metric returned by the algorithm as it selects the features, we can judge whether more features add value or not.
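One way to read those per-step scores is to look for the smallest subset whose score is within a small tolerance of the best one. The helper below is a minimal sketch of that idea. The function name, the tolerance, and the scores in the example are made up for illustration and are not output from mlxtend.

```python
def smallest_good_subset(scores_by_size, tolerance=0.01):
    """Return the smallest number of features whose score is within
    `tolerance` of the best score seen across all subset sizes."""
    best = max(scores_by_size.values())
    for size in sorted(scores_by_size):
        if scores_by_size[size] >= best - tolerance:
            return size


# Hypothetical cross-validated scores, keyed by number of selected features.
example_scores = {1: 0.71, 2: 0.78, 3: 0.83, 4: 0.845, 5: 0.85, 6: 0.849}
chosen = smallest_good_subset(example_scores, tolerance=0.01)
print(chosen)
```

With a fitted mlxtend selector, a dictionary like `example_scores` could be built from its `subsets_` attribute, for example `{k: v['avg_score'] for k, v in sfs.subsets_.items()}`.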


**Note**
If we wanted to stop the search using another criterion, we would have to code the algorithm ourselves, unfortunately :(
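If we did want a different stopping rule, the procedure is short enough to sketch by hand. The example below is a minimal illustration, not mlxtend's implementation: it assumes a regression target, scores each candidate with ordinary least squares on a single hold-out split (instead of a cross-validated random forest), and stops when the best gain in R squared falls below `min_gain`. The names `r2_holdout`, `forward_select`, and `min_gain` are hypothetical.

```python
import numpy as np


def r2_holdout(X_train, y_train, X_val, y_val):
    """Fit ordinary least squares on the training split, return R^2 on validation."""
    A_train = np.column_stack([np.ones(len(X_train)), X_train])
    A_val = np.column_stack([np.ones(len(X_val)), X_val])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    residual = y_val - A_val @ coef
    total = y_val - y_val.mean()
    return 1.0 - (residual @ residual) / (total @ total)


def forward_select(X_train, y_train, X_val, y_val, min_gain=0.01, max_features=None):
    """Greedy forward selection that stops when the best gain drops below min_gain."""
    n_features = X_train.shape[1]
    max_features = max_features or n_features
    selected, best_score = [], -np.inf
    while len(selected) < max_features:
        candidates = [j for j in range(n_features) if j not in selected]
        # Score every subset formed by the current selection plus one new feature.
        scores = {
            j: r2_holdout(X_train[:, selected + [j]], y_train,
                          X_val[:, selected + [j]], y_val)
            for j in candidates
        }
        best_j = max(scores, key=scores.get)
        if scores[best_j] - best_score < min_gain:
            break  # adding another feature no longer helps enough
        selected.append(best_j)
        best_score = scores[best_j]
    return selected, best_score


# Synthetic data (an assumption for the demo): only columns 0 and 3 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 6))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(scale=0.5, size=400)
selected, score = forward_select(X[:300], y[:300], X[300:], y[300:])
print(selected, round(score, 3))
```

Swapping `r2_holdout` for a cross-validated score of any estimator would bring the sketch closer to what the notebook does with mlxtend, at a higher computational cost.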

Here I will use the Step Forward feature selection algorithm from mlxtend on a classification dataset and on a regression dataset.

In [2]:
import pandas as pd
import numpy as np

from sklearn.model_selection import train_test_split

from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
from sklearn.metrics import (
    roc_auc_score,
    r2_score,
    balanced_accuracy_score,
    top_k_accuracy_score,
    precision_score,
    recall_score,
)

from mlxtend.feature_selection import SequentialFeatureSelector as SFS

## Classification

In [3]:
# load dataset

data = pd.read_csv('C:/Users/RAJENDRA REDDY/Downloads/finalData.csv')
data.shape

(1004, 36)

In [4]:
data.head()

Unnamed: 0,chroma_stft_min,chroma_stft_max,chroma_cqt_min,chroma_cqt_max,chroma_cens_min,chroma_cens_max,melspectogram_min,melspectogram_max,mfcc_min,mfcc_max,...,zero_crossing_rate_min,zero_crossing_rate_max,tempogram_min,tempogram_max,delta_mfcc_min,delta_mfcc_max,mel_to_stft_min,mel_to_stft_max,class,song
0,0.000465,1,0.0155,1,0.0,0.896673,4.63e-06,8115.6733,-179.931,152.82954,...,0.017578,0.510742,-3.41e-16,1,-22.53457,24.518091,0,19.284609,0,Sai Aaye Ghar Mere_shortened.wav
1,0.000995,1,0.055937,1,0.015298,0.7114,4.67e-07,911.08636,-205.9167,153.3341,...,0.044922,0.393555,-2.9e-16,1,-24.84063,25.185534,0,10.810534,0,Sai Baba De Kol_shortened.wav
2,0.002606,1,0.045407,1,0.0,0.748225,1.75e-06,4857.339,-153.78363,138.30722,...,0.023438,0.501953,-3.06e-16,1,-22.603357,29.282093,0,17.607744,0,Sai Baba Humko_shortened.wav
3,0.001447,1,0.041263,1,0.0,0.782758,3.11e-07,3757.0784,-194.9471,146.71315,...,0.020508,0.225586,-3.02e-16,1,-23.918428,24.815857,0,15.681977,0,Sai Baba Ji Kar Do_shortened.wav
4,0.002157,1,0.040596,1,0.001198,0.717803,5.21e-06,4824.614,-189.73987,157.58157,...,0.02002,0.328613,-3.98e-16,1,-19.628017,24.007666,0,16.976337,0,Sai Baba Mujhe Gale Se_shortened.wav


**Important**

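In all feature selection procedures, it is good practice to select the features by examining only the training set. This avoids overfitting: if the test set influences which features are chosen, the performance measured on it is no longer an honest estimate.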

In [6]:
# separate train and test sets

X_train, X_test, y_train, y_test = train_test_split(
    data.drop(labels=['class','song'], axis=1),
    data['class'],
    test_size=0.3,
    random_state=0)

X_train.shape, X_test.shape

((702, 34), (302, 34))

### Remove Correlated features

Step Forward Feature Selection takes a long time to run, so to speed it up we will reduce the feature space by removing correlated features first.

In [6]:
# remove correlated features to reduce the feature space

def correlation(dataset, threshold):
    col_corr = set()  # Set of all the names of correlated columns
    corr_matrix = dataset.corr()
    for i in range(len(corr_matrix.columns)):
        for j in range(i):
            if abs(corr_matrix.iloc[i, j]) > threshold: # we are interested in absolute coeff value
                colname = corr_matrix.columns[i]  # getting the name of column
                col_corr.add(colname)
    return col_corr

corr_features = correlation(X_train, 0.8)
print('correlated features: ', len(set(corr_features)) )

correlated features:  10
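The same selection rule can also be written without the explicit double loop. The sketch below is an alternative formulation under the same assumption as the function above: a column is flagged when its absolute correlation with any earlier column exceeds the threshold. The function name `correlated_columns` and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd


def correlated_columns(df, threshold):
    """Names of columns whose absolute correlation with an earlier column exceeds threshold."""
    corr = df.corr().abs()
    # Keep only the strict upper triangle, so column i is compared
    # with the columns that come before it.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    return {col for col in upper.columns if (upper[col] > threshold).any()}


# Toy data: "a_copy" is almost identical to "a", while "b" is independent.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
toy = pd.DataFrame({
    "a": a,
    "a_copy": a + rng.normal(scale=0.01, size=200),
    "b": rng.normal(size=200),
})
dropped = correlated_columns(toy, 0.8)
print(dropped)
```

As in the loop version, the first column of each correlated group is kept and the later ones are flagged for removal.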


In [7]:
# remove correlated features
# (run the previous cell first so that corr_features is defined)
X_train = X_train.drop(columns=list(corr_features))
X_test = X_test.drop(columns=list(corr_features))

X_train.shape, X_test.shape

### Step Forward Feature Selection

For the Step Forward feature selection algorithm, we are going to use the class SFS from MLXtend:
http://rasbt.github.io/mlxtend/user_guide/feature_selection/SequentialFeatureSelector/

In [8]:
# within the SFS we indicate:

# 1) the algorithm we want to create, in this case RandomForests
# (note that I use few trees to speed things up)

# 2) the stopping criteria: want to select 10 features 

# 3) wheter to perform step forward or step backward

# 4) the evaluation metric: in this case the roc_auc
# 5) the cross-validation

# this is going to take a while, do not despair

sfs = SFS(RandomForestClassifier(n_estimators=10, n_jobs=4, random_state=0), 
           k_features=10, # the more features we want, the longer it will take to run
           forward=True, 
           floating=False, # see the docs for more details in this parameter
           verbose=2, # this indicates how much to print out intermediate steps
           scoring='roc_auc',
           cv=2)

sfs = sfs.fit(np.array(X_train), y_train)

[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
Traceback (most recent call last):
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\model_selection\_validation.py", line 674, in _score
    scores = scorer(estimator, X_test, y_test)
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\metrics\_scorer.py", line 88, in __call__
    *args, **kwargs)
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\metrics\_scorer.py", line 328, in _score
    raise ValueError("{0} format is not supported".format(y_type))
ValueError: multiclass format is not supported

Traceback (most recent call last):
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\model_selection\_validation.py", line 674, in _score
    scores = scorer(estimator, X_test, y_test)
  File "c:\users\rajendra reddy\appdata\lo

Traceback (most recent call last):
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\model_selection\_validation.py", line 674, in _score
    scores = scorer(estimator, X_test, y_test)
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\metrics\_scorer.py", line 88, in __call__
    *args, **kwargs)
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\metrics\_scorer.py", line 328, in _score
    raise ValueError("{0} format is not supported".format(y_type))
ValueError: multiclass format is not supported

Traceback (most recent call last):
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\model_selection\_validation.py", line 674, in _score
    scores = scorer(estimator, X_test, y_test)
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\metrics\_scorer.py", line 

Traceback (most recent call last):
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\model_selection\_validation.py", line 674, in _score
    scores = scorer(estimator, X_test, y_test)
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\metrics\_scorer.py", line 88, in __call__
    *args, **kwargs)
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\metrics\_scorer.py", line 328, in _score
    raise ValueError("{0} format is not supported".format(y_type))
ValueError: multiclass format is not supported

Traceback (most recent call last):
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\model_selection\_validation.py", line 674, in _score
    scores = scorer(estimator, X_test, y_test)
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\metrics\_scorer.py", line 

Traceback (most recent call last):
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\model_selection\_validation.py", line 674, in _score
    scores = scorer(estimator, X_test, y_test)
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\metrics\_scorer.py", line 88, in __call__
    *args, **kwargs)
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\metrics\_scorer.py", line 328, in _score
    raise ValueError("{0} format is not supported".format(y_type))
ValueError: multiclass format is not supported

Traceback (most recent call last):
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\model_selection\_validation.py", line 674, in _score
    scores = scorer(estimator, X_test, y_test)
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\metrics\_scorer.py", line 

Traceback (most recent call last):
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\model_selection\_validation.py", line 674, in _score
    scores = scorer(estimator, X_test, y_test)
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\metrics\_scorer.py", line 88, in __call__
    *args, **kwargs)
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\metrics\_scorer.py", line 328, in _score
    raise ValueError("{0} format is not supported".format(y_type))
ValueError: multiclass format is not supported

Traceback (most recent call last):
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\model_selection\_validation.py", line 674, in _score
    scores = scorer(estimator, X_test, y_test)
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\metrics\_scorer.py", line 

Traceback (most recent call last):
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\model_selection\_validation.py", line 674, in _score
    scores = scorer(estimator, X_test, y_test)
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\metrics\_scorer.py", line 88, in __call__
    *args, **kwargs)
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\metrics\_scorer.py", line 328, in _score
    raise ValueError("{0} format is not supported".format(y_type))
ValueError: multiclass format is not supported

Traceback (most recent call last):
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\model_selection\_validation.py", line 674, in _score
    scores = scorer(estimator, X_test, y_test)
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\metrics\_scorer.py", line 

Traceback (most recent call last):
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\model_selection\_validation.py", line 674, in _score
    scores = scorer(estimator, X_test, y_test)
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\metrics\_scorer.py", line 88, in __call__
    *args, **kwargs)
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\metrics\_scorer.py", line 328, in _score
    raise ValueError("{0} format is not supported".format(y_type))
ValueError: multiclass format is not supported

Traceback (most recent call last):
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\model_selection\_validation.py", line 674, in _score
    scores = scorer(estimator, X_test, y_test)
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\metrics\_scorer.py", line 

Traceback (most recent call last):
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\model_selection\_validation.py", line 674, in _score
    scores = scorer(estimator, X_test, y_test)
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\metrics\_scorer.py", line 88, in __call__
    *args, **kwargs)
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\metrics\_scorer.py", line 328, in _score
    raise ValueError("{0} format is not supported".format(y_type))
ValueError: multiclass format is not supported

Traceback (most recent call last):
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\model_selection\_validation.py", line 674, in _score
    scores = scorer(estimator, X_test, y_test)
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\metrics\_scorer.py", line 

Traceback (most recent call last):
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\model_selection\_validation.py", line 674, in _score
    scores = scorer(estimator, X_test, y_test)
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\metrics\_scorer.py", line 88, in __call__
    *args, **kwargs)
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\metrics\_scorer.py", line 328, in _score
    raise ValueError("{0} format is not supported".format(y_type))
ValueError: multiclass format is not supported

Traceback (most recent call last):
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\model_selection\_validation.py", line 674, in _score
    scores = scorer(estimator, X_test, y_test)
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\metrics\_scorer.py", line 

Traceback (most recent call last):
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\model_selection\_validation.py", line 674, in _score
    scores = scorer(estimator, X_test, y_test)
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\metrics\_scorer.py", line 88, in __call__
    *args, **kwargs)
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\metrics\_scorer.py", line 328, in _score
    raise ValueError("{0} format is not supported".format(y_type))
ValueError: multiclass format is not supported

Traceback (most recent call last):
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\model_selection\_validation.py", line 674, in _score
    scores = scorer(estimator, X_test, y_test)
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\metrics\_scorer.py", line 

Traceback (most recent call last):
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\model_selection\_validation.py", line 674, in _score
    scores = scorer(estimator, X_test, y_test)
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\metrics\_scorer.py", line 88, in __call__
    *args, **kwargs)
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\metrics\_scorer.py", line 328, in _score
    raise ValueError("{0} format is not supported".format(y_type))
ValueError: multiclass format is not supported

Traceback (most recent call last):
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\model_selection\_validation.py", line 674, in _score
    scores = scorer(estimator, X_test, y_test)
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\metrics\_scorer.py", line 

Traceback (most recent call last):
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\model_selection\_validation.py", line 674, in _score
    scores = scorer(estimator, X_test, y_test)
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\metrics\_scorer.py", line 88, in __call__
    *args, **kwargs)
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\metrics\_scorer.py", line 328, in _score
    raise ValueError("{0} format is not supported".format(y_type))
ValueError: multiclass format is not supported

Traceback (most recent call last):
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\model_selection\_validation.py", line 674, in _score
    scores = scorer(estimator, X_test, y_test)
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\metrics\_scorer.py", line 

Traceback (most recent call last):
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\model_selection\_validation.py", line 674, in _score
    scores = scorer(estimator, X_test, y_test)
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\metrics\_scorer.py", line 88, in __call__
    *args, **kwargs)
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\metrics\_scorer.py", line 328, in _score
    raise ValueError("{0} format is not supported".format(y_type))
ValueError: multiclass format is not supported

Traceback (most recent call last):
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\model_selection\_validation.py", line 674, in _score
    scores = scorer(estimator, X_test, y_test)
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\metrics\_scorer.py", line 

Traceback (most recent call last):
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\model_selection\_validation.py", line 674, in _score
    scores = scorer(estimator, X_test, y_test)
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\metrics\_scorer.py", line 88, in __call__
    *args, **kwargs)
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\metrics\_scorer.py", line 328, in _score
    raise ValueError("{0} format is not supported".format(y_type))
ValueError: multiclass format is not supported

Traceback (most recent call last):
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\model_selection\_validation.py", line 674, in _score
    scores = scorer(estimator, X_test, y_test)
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\metrics\_scorer.py", line 

Traceback (most recent call last):
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\model_selection\_validation.py", line 674, in _score
    scores = scorer(estimator, X_test, y_test)
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\metrics\_scorer.py", line 88, in __call__
    *args, **kwargs)
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\metrics\_scorer.py", line 328, in _score
    raise ValueError("{0} format is not supported".format(y_type))
ValueError: multiclass format is not supported

Traceback (most recent call last):
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\model_selection\_validation.py", line 674, in _score
    scores = scorer(estimator, X_test, y_test)
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\metrics\_scorer.py", line 

Traceback (most recent call last):
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\model_selection\_validation.py", line 674, in _score
    scores = scorer(estimator, X_test, y_test)
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\metrics\_scorer.py", line 88, in __call__
    *args, **kwargs)
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\metrics\_scorer.py", line 328, in _score
    raise ValueError("{0} format is not supported".format(y_type))
ValueError: multiclass format is not supported

Traceback (most recent call last):
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\model_selection\_validation.py", line 674, in _score
    scores = scorer(estimator, X_test, y_test)
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\metrics\_scorer.py", line 

Traceback (most recent call last):
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\model_selection\_validation.py", line 674, in _score
    scores = scorer(estimator, X_test, y_test)
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\metrics\_scorer.py", line 88, in __call__
    *args, **kwargs)
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\metrics\_scorer.py", line 328, in _score
    raise ValueError("{0} format is not supported".format(y_type))
ValueError: multiclass format is not supported

Traceback (most recent call last):
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\model_selection\_validation.py", line 674, in _score
    scores = scorer(estimator, X_test, y_test)
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\metrics\_scorer.py", line 

Traceback (most recent call last):
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\model_selection\_validation.py", line 674, in _score
    scores = scorer(estimator, X_test, y_test)
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\metrics\_scorer.py", line 88, in __call__
    *args, **kwargs)
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\metrics\_scorer.py", line 328, in _score
    raise ValueError("{0} format is not supported".format(y_type))
ValueError: multiclass format is not supported

Traceback (most recent call last):
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\model_selection\_validation.py", line 674, in _score
    scores = scorer(estimator, X_test, y_test)
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\metrics\_scorer.py", line 

Traceback (most recent call last):
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\model_selection\_validation.py", line 674, in _score
    scores = scorer(estimator, X_test, y_test)
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\metrics\_scorer.py", line 88, in __call__
    *args, **kwargs)
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\metrics\_scorer.py", line 328, in _score
    raise ValueError("{0} format is not supported".format(y_type))
ValueError: multiclass format is not supported

Traceback (most recent call last):
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\model_selection\_validation.py", line 674, in _score
    scores = scorer(estimator, X_test, y_test)
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\metrics\_scorer.py", line 

Traceback (most recent call last):
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\model_selection\_validation.py", line 674, in _score
    scores = scorer(estimator, X_test, y_test)
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\metrics\_scorer.py", line 88, in __call__
    *args, **kwargs)
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\metrics\_scorer.py", line 328, in _score
    raise ValueError("{0} format is not supported".format(y_type))
ValueError: multiclass format is not supported

Traceback (most recent call last):
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\model_selection\_validation.py", line 674, in _score
    scores = scorer(estimator, X_test, y_test)
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\metrics\_scorer.py", line 

Traceback (most recent call last):
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\model_selection\_validation.py", line 674, in _score
    scores = scorer(estimator, X_test, y_test)
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\metrics\_scorer.py", line 88, in __call__
    *args, **kwargs)
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\metrics\_scorer.py", line 328, in _score
    raise ValueError("{0} format is not supported".format(y_type))
ValueError: multiclass format is not supported

Traceback (most recent call last):
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\model_selection\_validation.py", line 674, in _score
    scores = scorer(estimator, X_test, y_test)
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\metrics\_scorer.py", line 

Traceback (most recent call last):
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\model_selection\_validation.py", line 674, in _score
    scores = scorer(estimator, X_test, y_test)
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\metrics\_scorer.py", line 88, in __call__
    *args, **kwargs)
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\metrics\_scorer.py", line 328, in _score
    raise ValueError("{0} format is not supported".format(y_type))
ValueError: multiclass format is not supported

Traceback (most recent call last):
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\model_selection\_validation.py", line 674, in _score
    scores = scorer(estimator, X_test, y_test)
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\metrics\_scorer.py", line 

Traceback (most recent call last):
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\model_selection\_validation.py", line 674, in _score
    scores = scorer(estimator, X_test, y_test)
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\metrics\_scorer.py", line 88, in __call__
    *args, **kwargs)
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\metrics\_scorer.py", line 328, in _score
    raise ValueError("{0} format is not supported".format(y_type))
ValueError: multiclass format is not supported

Traceback (most recent call last):
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\model_selection\_validation.py", line 674, in _score
    scores = scorer(estimator, X_test, y_test)
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\metrics\_scorer.py", line 

Traceback (most recent call last):
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\model_selection\_validation.py", line 674, in _score
    scores = scorer(estimator, X_test, y_test)
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\metrics\_scorer.py", line 88, in __call__
    *args, **kwargs)
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\metrics\_scorer.py", line 328, in _score
    raise ValueError("{0} format is not supported".format(y_type))
ValueError: multiclass format is not supported

Traceback (most recent call last):
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\model_selection\_validation.py", line 674, in _score
    scores = scorer(estimator, X_test, y_test)
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\metrics\_scorer.py", line 

Traceback (most recent call last):
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\model_selection\_validation.py", line 674, in _score
    scores = scorer(estimator, X_test, y_test)
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\metrics\_scorer.py", line 88, in __call__
    *args, **kwargs)
  File "c:\users\rajendra reddy\appdata\local\programs\python\python37\lib\site-packages\sklearn\metrics\_scorer.py", line 328, in _score
    raise ValueError("{0} format is not supported".format(y_type))
ValueError: multiclass format is not supported

From the output above, we can see that performance begins to plateau after the 8th feature is added: adding the 9th and 10th features did not increase it.

If we set the stopping criterion to more than 10 features, we would get a clearer view of how performance progresses with the number of features.
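
The `ValueError: multiclass format is not supported` tracebacks in the output above are consistent with scikit-learn's binary-only `roc_auc` scorer being applied to a target with more than two classes, so the scores reported during the search may not be reliable. The following is a minimal sketch on synthetic data (the names `X_demo`, `y_demo` and `rf_demo` are illustrative and not part of this notebook) showing that the `roc_auc_ovo` scorer handles a multiclass target. mlxtend's `SequentialFeatureSelector` takes a `scoring` argument; passing this scorer name there is an assumption to verify against the installed version.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# synthetic 3-class problem standing in for the notebook's data (assumption)
X_demo, y_demo = make_classification(
    n_samples=300, n_features=8, n_informative=5, n_classes=3, random_state=0)
rf_demo = RandomForestClassifier(n_estimators=20, random_state=0)

# the plain 'roc_auc' scorer only supports binary targets;
# error_score='raise' surfaces the failure instead of recording NaN scores
try:
    cross_val_score(rf_demo, X_demo, y_demo, scoring='roc_auc', cv=3,
                    error_score='raise')
    binary_scorer_failed = False
except ValueError as err:
    binary_scorer_failed = True
    print('roc_auc failed:', err)

# 'roc_auc_ovo' averages the AUC over all pairs of classes (one-vs-one)
ovo_scores = cross_val_score(rf_demo, X_demo, y_demo, scoring='roc_auc_ovo', cv=3)
print('roc_auc_ovo per fold:', ovo_scores)
```

Using the one-vs-one variant here matches the `multi_class="ovo"` setting used when the selected features are evaluated later in the notebook.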

In [9]:
selected_feat = X_train.columns[list(sfs.k_feature_idx_)]
selected_feat

Index(['chroma_stft_min', 'chroma_stft_max', 'chroma_cqt_min',
       'chroma_cqt_max', 'chroma_cens_min', 'chroma_cens_max',
       'melspectogram_min', 'melspectogram_max', 'mfcc_min', 'mfcc_max'],
      dtype='object')

### Compare performance of feature subsets

In [105]:
# function to train a random forest and report the multiclass ROC-AUC
# (one-vs-one) on the train and test sets

def run_randomForests(X_train, X_test, y_train, y_test):

    rf = RandomForestClassifier(n_estimators=200, random_state=39, max_depth=4)
    rf.fit(X_train, y_train)

    print('Train set')
    pred = rf.predict_proba(X_train)
    print('Random Forests roc-auc: {}'.format(roc_auc_score(y_train, pred, multi_class="ovo")))

    print('Test set')
    pred = rf.predict_proba(X_test)
    print('Random Forests roc-auc: {}'.format(roc_auc_score(y_test, pred, multi_class="ovo")))

# keep only the features chosen by step forward selection
X_train = X_train[selected_feat]
X_test = X_test[selected_feat]
run_randomForests(X_train, X_test, y_train, y_test)

Train set
Random Forests roc-auc: 0.898904577296608
Test set
Random Forests roc-auc: 0.7890227022081862


In [230]:
from sklearn.tree import DecisionTreeClassifier

# note: the score below is computed on the training set
clf = DecisionTreeClassifier(criterion="entropy", max_depth=9)
clf.fit(X_train, y_train)
y_pred = clf.predict_proba(X_train)
print('Decision tree roc-auc: {}'.format(roc_auc_score(y_train, y_pred, multi_class="ovo")))

Decision tree roc-auc: 0.9900751570556199


In [234]:
from sklearn.ensemble import AdaBoostClassifier

clf = AdaBoostClassifier(n_estimators=100)
clf.fit(X_train, y_train)
y_pred = clf.predict_proba(X_train)
print('Ada Boost roc-auc: {}'.format(roc_auc_score(y_train, y_pred, multi_class="ovo")))

Ada Boost roc-auc: 0.8384308611325159


In [233]:
from sklearn.ensemble import GradientBoostingClassifier

clf = GradientBoostingClassifier(n_estimators=100, learning_rate=1.0,
                                 max_depth=1, random_state=0)
# fit once (the original cell fitted the same model twice)
clf.fit(X_train, y_train)
y_pred = clf.predict_proba(X_train)
print('GradientBoostingClassifier roc-auc: {}'.format(roc_auc_score(y_train, y_pred, multi_class="ovo")))

GradientBoostingClassifier roc-auc: 0.9671051986056828


In [245]:
from sklearn.metrics import r2_score
from sklearn.ensemble import GradientBoostingRegressor

# a regressor is fitted here on the class labels, treated as a numeric target;
# least squares is the default loss, so no loss argument is needed
est = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1,
                                max_depth=1, random_state=0)
est.fit(X_train, y_train)
r2_score(y_train, est.predict(X_train))

0.4215925429000902

In [249]:
# required in scikit-learn < 1.0, where this estimator was still experimental
from sklearn.experimental import enable_hist_gradient_boosting
from sklearn.ensemble import HistGradientBoostingClassifier

clf = HistGradientBoostingClassifier(max_iter=100)
clf.fit(X_train, y_train)
y_pred = clf.predict_proba(X_train)
print('HistGradientBoostingClassifier roc-auc: {}'.format(roc_auc_score(y_train, y_pred, multi_class="ovo")))

HistGradientBoostingClassifier roc-auc: 1.0


In [251]:
from sklearn.naive_bayes import GaussianNB
clf = GaussianNB()
clf = clf.fit(X_train,y_train)
y_pred = clf.predict_proba(X_train)
print('Gaussian roc-auc: {}'.format(roc_auc_score(y_train, y_pred,multi_class="ovo")))

Gaussian roc-auc: 0.7688651657742775


In [252]:
from sklearn.ensemble import ExtraTreesClassifier
clf = ExtraTreesClassifier(n_estimators=10, max_depth=None,
     min_samples_split=2, random_state=0)

clf = clf.fit(X_train,y_train)
y_pred = clf.predict_proba(X_train)
print('ExtraTreesClassifier roc-auc: {}'.format(roc_auc_score(y_train, y_pred,multi_class="ovo")))

ExtraTreesClassifier roc-auc: 1.0
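
Several of the scores above are computed on the training set, and a train-set ROC-AUC of 1.0 usually says more about memorization than about generalization. Below is a minimal sketch, on synthetic data and with a hypothetical helper named `train_test_roc_auc` (neither is part of this notebook), of reporting the train and test scores side by side:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB


def train_test_roc_auc(model, X_tr, X_te, y_tr, y_te):
    """Fit `model` and return its (train, test) one-vs-one ROC-AUC."""
    model.fit(X_tr, y_tr)
    train_auc = roc_auc_score(y_tr, model.predict_proba(X_tr), multi_class="ovo")
    test_auc = roc_auc_score(y_te, model.predict_proba(X_te), multi_class="ovo")
    return train_auc, test_auc


# synthetic 3-class data standing in for the notebook's features (assumption)
X_syn, y_syn = make_classification(
    n_samples=400, n_features=10, n_informative=5, n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_syn, y_syn, test_size=0.3, stratify=y_syn, random_state=0)

candidates = [
    ("extra_trees", ExtraTreesClassifier(n_estimators=10, random_state=0)),
    ("gaussian_nb", GaussianNB()),
]

results = {}
for name, model in candidates:
    results[name] = train_test_roc_auc(model, X_tr, X_te, y_tr, y_te)
    print("{}: train={:.3f} test={:.3f}".format(name, *results[name]))
```

A large gap between the two numbers for a model is a hint that its train-set score should not be used to compare it against the others.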


In [253]:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import VotingClassifier

clf1 = LogisticRegression(random_state=1)
clf2 = RandomForestClassifier(n_estimators=50, random_state=1)
clf3 = GaussianNB()

eclf = VotingClassifier(
     estimators=[('lr', clf1), ('rf', clf2), ('gnb', clf3)],
     voting='soft')

params = {'lr__C': [1.0, 100.0], 'rf__n_estimators': [20, 200]}

grid = GridSearchCV(estimator=eclf, param_grid=params, cv=5)
grid = grid.fit(X_train,y_train)
y_pred = grid.predict_proba(X_train)
print('VotingClassifier roc-auc: {}'.format(roc_auc_score(y_train, y_pred, multi_class="ovo")))

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression

VotingClassifier roc-auc: 0.973094602281672


In [263]:
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.ensemble import (ExtraTreesClassifier, GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)

# base learners: a random forest and a scaled extra-trees pipeline
estimators = [
    ('rf', RandomForestClassifier(n_estimators=10, random_state=42)),
    ('et', make_pipeline(StandardScaler(),
                         ExtraTreesClassifier(n_estimators=10, max_depth=None,
                                              min_samples_split=2, random_state=0)))
]
# a gradient boosting model combines the predictions of the base learners
clf = StackingClassifier(
    estimators=estimators,
    final_estimator=GradientBoostingClassifier(n_estimators=100, learning_rate=1.0,
                                               max_depth=1, random_state=0)
)
clf.fit(X_train,y_train)
y_pred = clf.predict_proba(X_train)
print('StackingClassifier roc-auc: {}'.format(roc_auc_score(y_train, y_pred,multi_class="ovo")))

StackingClassifier roc-auc: 0.7620933977455716


In [261]:
from sklearn.svm import SVC
from sklearn.ensemble import BaggingClassifier

clf = BaggingClassifier(base_estimator=SVC(),
                        n_estimators=10, random_state=0)
clf.fit(X_train,y_train)
y_pred = clf.predict_proba(X_train)
print('BaggingClassifier roc-auc: {}'.format(roc_auc_score(y_train, y_pred,multi_class="ovo")))

BaggingClassifier roc-auc: 0.7029083517060815


In [268]:
from sklearn.experimental import enable_hist_gradient_boosting
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.ensemble import BaggingClassifier
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import StackingClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neighbors import NeighborhoodComponentsAnalysis
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

clf = AdaBoostClassifier(n_estimators=100)
clf = clf.fit(X_train,y_train)
y_pred = clf.predict_proba(X_train)
print('Ada Boost roc-auc: {}'.format(roc_auc_score(y_train, y_pred,multi_class="ovo")))


clf = GradientBoostingClassifier(n_estimators=100, learning_rate=1.0,
                                 max_depth=1, random_state=0)
clf = clf.fit(X_train,y_train)
y_pred = clf.predict_proba(X_train)
print('GradientBoostingClassifier roc-auc: {}'.format(roc_auc_score(y_train, y_pred,multi_class="ovo")))


clf = HistGradientBoostingClassifier(max_iter=100)
clf = clf.fit(X_train,y_train)
y_pred = clf.predict_proba(X_train)
print('HistGradientBoostingClassifier roc-auc: {}'.format(roc_auc_score(y_train, y_pred,multi_class="ovo")))


clf = ExtraTreesClassifier(n_estimators=10, max_depth=None,
     min_samples_split=2, random_state=0)

clf = clf.fit(X_train,y_train)
y_pred = clf.predict_proba(X_train)
print('ExtraTreesClassifier roc-auc: {}'.format(roc_auc_score(y_train, y_pred,multi_class="ovo")))

clf1 = LogisticRegression(random_state=1)
clf2 = RandomForestClassifier(n_estimators=50, random_state=1)
clf3 = GaussianNB()

eclf = VotingClassifier(
     estimators=[('lr', clf1), ('rf', clf2), ('gnb', clf3)],
     voting='soft')

params = {'lr__C': [1.0, 100.0], 'rf__n_estimators': [20, 200]}

grid = GridSearchCV(estimator=eclf, param_grid=params, cv=5)
grid = grid.fit(X_train,y_train)
y_pred = grid.predict_proba(X_train)
print('Voting Classifier roc-auc: {}'.format(roc_auc_score(y_train, y_pred,multi_class="ovo")))

clf = DecisionTreeClassifier(criterion="entropy", max_depth=9)
clf = clf.fit(X_train,y_train)
y_pred = clf.predict_proba(X_train)
print('Decision Tree roc-auc: {}'.format(roc_auc_score(y_train, y_pred,multi_class="ovo")))

clf = BaggingClassifier(base_estimator=SVC(),
                        n_estimators=10, random_state=0)
clf.fit(X_train,y_train)
y_pred = clf.predict_proba(X_train)
print('BaggingClassifier roc-auc: {}'.format(roc_auc_score(y_train, y_pred,multi_class="ovo")))

clf = KNeighborsClassifier(n_neighbors = 5)
clf = clf.fit(X_train,y_train)
y_pred = clf.predict_proba(X_train)
print('KNN (5 neighbors) roc-auc: {}'.format(roc_auc_score(y_train, y_pred,multi_class="ovo")))

clf = GaussianNB()
clf = clf.fit(X_train,y_train)
y_pred = clf.predict_proba(X_train)
print('Naive Bayes roc-auc: {}'.format(roc_auc_score(y_train, y_pred,multi_class="ovo")))


# NCA followed by KNN: try 1 to 500 neighbors and keep the best training roc-auc
nca = NeighborhoodComponentsAnalysis(random_state=42)
n = []
for i in range(500):
    knn = KNeighborsClassifier(n_neighbors=i+1)
    clf = Pipeline([('nca', nca), ('knn', knn)])
    clf = clf.fit(X_train,y_train)
    y_pred = clf.predict_proba(X_train)
    n.append(roc_auc_score(y_train, y_pred,multi_class="ovo"))
print('NCA + KNN best roc-auc: {}'.format(max(n)))


clf = MLPClassifier(random_state=1, max_iter=600)
clf = clf.fit(X_train,y_train)
y_pred = clf.predict_proba(X_train)
print('MLPClassifier roc-auc: {}'.format(roc_auc_score(y_train, y_pred,multi_class="ovo")))


Ada Boost roc-auc: 0.8384308611325159
GradientBoostingClassifier roc-auc: 0.9671051986056828
HistGradientBoostingClassifier roc-auc: 1.0
ExtraTreesClassifier roc-auc: 1.0

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression

Voting Classifier roc-auc: 0.973094602281672
Decision Tree roc-auc: 0.9901212504995727
BaggingClassifier roc-auc: 0.7029083517060815
KNN (5 neighbors) roc-auc: 0.8754564170522106
Naive Bayes roc-auc: 0.7688651657742775
NCA + KNN best roc-auc: 1.0
MLPClassifier roc-auc: 0.7234985540569634


In [21]:
# and for comparison, we train random forests using
# all features (except the correlated ones, which we removed already)

run_randomForests(X_train,
                  X_test,
                  y_train, y_test)

Train set
Random Forests roc-auc: 0.90487007632422
Test set
Random Forests roc-auc: 0.7799622283428216


As you can see, in this dataset, 10 features give a performance similar to that obtained using all the variables.
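
For reference, the roc-auc values reported above can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes this for the binary case in plain Python. The function name `pairwise_auc` and the toy data are illustrative assumptions; the notebook itself relies on scikit-learn's `roc_auc_score`, which with `multi_class="ovo"` averages such values over pairs of classes.

```python
def pairwise_auc(y_true, y_score):
    """Binary ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# toy labels and scores (illustrative only, not taken from the dataset)
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
auc_value = pairwise_auc(y_true, y_score)
print('toy roc-auc: {}'.format(auc_value))
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which is why a score of 1.0 on the training set alone says little about how a model generalises.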

## Regression

Let's now repeat the process in the context of regression. Note that the cells below load `finalData.csv`, a table of audio features, rather than the house prices dataset from Kaggle, and treat the `class` column as the continuous target.

In [22]:
# load dataset
data = pd.read_csv('C:/Users/RAJENDRA REDDY/Downloads/finalData.csv')
data.shape

(1004, 36)

In [23]:
# In practice, feature selection should be done after data pre-processing,
# so ideally, all the categorical variables are encoded into numbers,
# and then you can assess how predictive they are of the target

# here, for simplicity, I will use only the numerical variables
# select numerical columns:

numerics = ['int16', 'int32', 'int64', 'float16', 'float32', 'float64']
numerical_vars = list(data.select_dtypes(include=numerics).columns)
data = data[numerical_vars]
data.shape

(1004, 35)

In [25]:
# separate train and test sets

feature_cols = ['chroma_stft_min', 'chroma_stft_max', 'chroma_cqt_min',
       'chroma_cqt_max', 'chroma_cens_min', 'chroma_cens_max',
       'melspectogram_min', 'melspectogram_max', 'mfcc_min', 'mfcc_max',
       'rms_min', 'rms_max', 'spectral_centroid_min', 'spectral_centroid_max',
       'spectral_bandwidth_min', 'spectral_bandwidth_max',
       'spectral_contrast_min', 'spectral_contrast_max',
       'spectral_flatness_min', 'spectral_flatness_max',
       'spectral_rolloff_min', 'spectral_rolloff_max', 'poly_features_min',
       'poly_features_max', 'tonnetz_min', 'tonnetz_max',
       'zero_crossing_rate_min', 'zero_crossing_rate_max', 'tempogram_min',
       'tempogram_max', 'delta_mfcc_min', 'delta_mfcc_max', 'mel_to_stft_min',
       'mel_to_stft_max']

X_train, X_test, y_train, y_test = train_test_split(data[feature_cols],data['class'],test_size=0.3,random_state=0)

X_train.shape, X_test.shape

((702, 34), (302, 34))

### Remove correlated features

In [26]:
# find and remove correlated features

def correlation(dataset, threshold):
    col_corr = set()  # Set of all the names of correlated columns
    corr_matrix = dataset.corr()
    for i in range(len(corr_matrix.columns)):
        for j in range(i):
            if abs(corr_matrix.iloc[i, j]) > threshold: # we are interested in absolute coeff value
                colname = corr_matrix.columns[i]  # getting the name of column
                col_corr.add(colname)
    return col_corr

corr_features = correlation(X_train, 0.8)
print('correlated features: ', len(set(corr_features)) )

correlated features:  10
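
To make the behaviour of the filter concrete, here is a dependency-free sketch of the same logic on a toy table. The helper names `pearson` and `correlated_columns` and the toy values are assumptions made for illustration; the notebook's own `correlation` function works on a pandas DataFrame instead. The point to notice is that, for each correlated pair, it is the later column that gets flagged, so the earlier one is kept.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlated_columns(columns, threshold):
    """Return the names of columns correlated with any earlier column.

    `columns` maps column name -> list of values, in column order.
    """
    names = list(columns)
    flagged = set()
    for i in range(len(names)):
        for j in range(i):
            if abs(pearson(columns[names[i]], columns[names[j]])) > threshold:
                flagged.add(names[i])  # the later column of the pair is flagged
    return flagged


# toy data (hypothetical): 'b' is an exact multiple of 'a',
# 'c' is only weakly correlated with both
toy = {
    'a': [1.0, 2.0, 3.0, 4.0, 5.0],
    'b': [2.0, 4.0, 6.0, 8.0, 10.0],
    'c': [3.0, 1.0, 4.0, 1.0, 5.0],
}
toy_flagged = correlated_columns(toy, 0.8)
print('correlated features: ', toy_flagged)
```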


In [27]:
# remove correlated features
X_train.drop(labels=corr_features, axis=1, inplace=True)
X_test.drop(labels=corr_features, axis=1, inplace=True)

X_train.shape, X_test.shape

((702, 24), (302, 24))

In [28]:
X_train.fillna(0, inplace=True)
X_test.fillna(0, inplace=True)

### Step Forward Feature Selection

In [29]:
# step forward feature selection

sfs = SFS(RandomForestRegressor(n_estimators=10, n_jobs=4, random_state=10), 
           k_features=20, 
           forward=True, 
           floating=False, 
           verbose=2,
           scoring='r2',
           cv=2)

sfs = sfs.fit(np.array(X_train), y_train)

[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    1.1s remaining:    0.0s
[Parallel(n_jobs=1)]: Done  24 out of  24 | elapsed:   13.8s finished

[2021-05-19 00:56:52] Features: 1/20 -- score: 0.0847914556302356
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.4s remaining:    0.0s
[Parallel(n_jobs=1)]: Done  23 out of  23 | elapsed:  1.5min finished

[2021-05-19 00:58:20] Features: 2/20 -- score: 0.0847914556302356
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.2s remaining:    0.0s
[Parallel(n_jobs=1)]: Done  22 out of  22 | elapsed:    8.2s finished

[2021-05-19 00:58:28] Features: 3/20 -- score: 0.0847914556302356
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done  

From the logs above, we see that after ~17 features, adding more features does not really improve performance.
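
Rather than reading the plateau off the logs by eye, one could look for the smallest subset whose score is close to the best one. The sketch below does this on a dictionary shaped like the `subsets_` attribute that mlxtend documents for a fitted selector (keyed by number of features, with an `avg_score` entry). The function name `smallest_sufficient_k`, the tolerance and all the scores are hypothetical and chosen only for illustration.

```python
def smallest_sufficient_k(subsets, tolerance=0.01):
    """Smallest subset size whose average score is within `tolerance` of the best."""
    best = max(info['avg_score'] for info in subsets.values())
    for k in sorted(subsets):
        if subsets[k]['avg_score'] >= best - tolerance:
            return k


# hypothetical scores, shaped like mlxtend's `subsets_` (the values are made up)
toy_subsets = {
    1: {'feature_idx': (3,), 'avg_score': 0.20},
    2: {'feature_idx': (3, 7), 'avg_score': 0.31},
    3: {'feature_idx': (3, 7, 0), 'avg_score': 0.345},
    4: {'feature_idx': (3, 7, 0, 5), 'avg_score': 0.35},
    5: {'feature_idx': (3, 7, 0, 5, 1), 'avg_score': 0.348},
}
chosen_k = smallest_sufficient_k(toy_subsets, tolerance=0.01)
print('smallest sufficient number of features:', chosen_k)
```

With a fitted selector, `sfs.subsets_` would presumably take the place of `toy_subsets`. The tolerance is a judgment call and should reflect how noisy the cross-validated scores are.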

In [30]:
# indices of the selected columns
sfs.k_feature_idx_

(0, 1, 3, 4, 5, 6, 7, 9, 10, 11, 12, 13, 14, 15, 17, 18, 19, 20, 21, 23)

In [31]:
# selected columns
X_train.columns[list(sfs.k_feature_idx_)]

Index(['chroma_stft_min', 'chroma_stft_max', 'chroma_cqt_max',
       'chroma_cens_min', 'chroma_cens_max', 'melspectogram_min',
       'melspectogram_max', 'mfcc_max', 'rms_min', 'spectral_centroid_min',
       'spectral_centroid_max', 'spectral_contrast_min',
       'spectral_contrast_max', 'spectral_flatness_min', 'tonnetz_max',
       'zero_crossing_rate_min', 'tempogram_min', 'tempogram_max',
       'delta_mfcc_min', 'mel_to_stft_min'],
      dtype='object')

### Compare performance of feature subsets

In [42]:
# function to train random forests and evaluate the performance

def run_randomForests(X_train, X_test, y_train, y_test):
    
    rf = RandomForestRegressor(n_estimators=200, random_state=39, max_depth=4)
    rf.fit(X_train, y_train)

    print('Train set')
    pred = rf.predict(X_train)
    print('Random Forests r2: {}'.format(r2_score(y_train, pred)))
    
    print('Test set')
    pred = rf.predict(X_test)
    print('Random Forests r2: {}'.format(r2_score(y_test, pred)))

In [40]:
selected_feat = X_train.columns[list(sfs.k_feature_idx_)]

In [41]:
# evaluate performance of algorithm built
# using selected features

run_randomForests(X_train[selected_feat],
                  X_test[selected_feat],
                  y_train, y_test)

Train set
Random Forests r2: 0.49848488235607
Test set
Random Forests r2: 0.36608657564555447


In [35]:
# and for comparison, we train random forests using
# all features (except the correlated ones, which we removed already)

run_randomForests(X_train,
                  X_test,
                  y_train, y_test)

Train set
Random Forests r2: 0.5077416751276408
Test set
Random Forests r2: 0.35794660293184233


We see that the model with 20 selected features performs about as well as the one with all 24 features: the test score is 0.366 versus 0.358.
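
As a reminder of what the regression score measures, the sketch below computes the r squared by hand as one minus the ratio of the residual sum of squares to the total sum of squares. The function name `r_squared` and the toy numbers are illustrative assumptions; the notebook uses scikit-learn's `r2_score` for the actual evaluation.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# toy targets and predictions (illustrative only, not taken from the dataset)
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
toy_r2 = r_squared(y_true, y_pred)
print('toy r2: {}'.format(toy_r2))
```

A model that always predicts the mean of the target scores 0, and a perfect model scores 1, so test values around 0.36 indicate that both feature sets explain only a modest share of the variance.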

That is all for this lecture. I hope you enjoyed it and see you in the next one!