#### Connectionist Bench (Sonar, Mines vs. Rocks) Data Set

Source:

The data set was contributed to the benchmark collection by Terry Sejnowski, now at the Salk Institute and the University of California at San Diego. The data set was developed in collaboration with R. Paul Gorman of Allied-Signal Aerospace Technology Center.

Data Set Information:

The file "sonar.mines" contains 111 patterns obtained by bouncing sonar signals off a metal cylinder at various angles and under various conditions. The file "sonar.rocks" contains 97 patterns obtained from rocks under similar conditions. The transmitted sonar signal is a frequency-modulated chirp, rising in frequency. The data set contains signals obtained from a variety of different aspect angles, spanning 90 degrees for the cylinder and 180 degrees for the rock.

Each pattern is a set of 60 numbers in the range 0.0 to 1.0. Each number represents the energy within a particular frequency band, integrated over a certain period of time. The integration aperture for higher frequencies occurs later in time, since these frequencies are transmitted later during the chirp.

The label associated with each record contains the letter "R" if the object is a rock and "M" if it is a mine (metal cylinder). The numbers in the labels are in increasing order of aspect angle, but they do not encode the angle directly.


In [1]:
# Import the libraries used throughout the notebook

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [9]:
sonar = pd.read_csv(r'sonar.all-data.csv')

In [18]:
sonar.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,51,52,53,54,55,56,57,58,59,60
0,0.0453,0.0523,0.0843,0.0689,0.1183,0.2583,0.2156,0.3481,0.3337,0.2872,...,0.0084,0.0089,0.0048,0.0094,0.0191,0.014,0.0049,0.0052,0.0044,R
1,0.0262,0.0582,0.1099,0.1083,0.0974,0.228,0.2431,0.3771,0.5598,0.6194,...,0.0232,0.0166,0.0095,0.018,0.0244,0.0316,0.0164,0.0095,0.0078,R
2,0.01,0.0171,0.0623,0.0205,0.0205,0.0368,0.1098,0.1276,0.0598,0.1264,...,0.0121,0.0036,0.015,0.0085,0.0073,0.005,0.0044,0.004,0.0117,R
3,0.0762,0.0666,0.0481,0.0394,0.059,0.0649,0.1209,0.2467,0.3564,0.4459,...,0.0031,0.0054,0.0105,0.011,0.0015,0.0072,0.0048,0.0107,0.0094,R
4,0.0286,0.0453,0.0277,0.0174,0.0384,0.099,0.1201,0.1833,0.2105,0.3039,...,0.0045,0.0014,0.0038,0.0013,0.0089,0.0057,0.0027,0.0051,0.0062,R


METHODOLOGY:

This data set can be used in a number of different ways to test learning
speed, quality of ultimate learning, ability to generalize, or combinations
of these factors.
						
In [1], Gorman and Sejnowski report two series of experiments: an
"aspect-angle independent" series, in which the whole data set is used
without controlling for aspect angle, and an "aspect-angle dependent"
series in which the training and testing sets were carefully controlled to
ensure that each set contained cases from each aspect angle in
appropriate proportions.
						
For the aspect-angle independent experiments the combined set of 208 cases
is divided randomly into 13 disjoint sets with 16 cases in each.  For each
experiment, 12 of these sets are used as training data, while the 13th is
reserved for testing.  The experiment is repeated 13 times so that every
case appears once as part of a test set.  The reported performance is an
average over the entire set of 13 different test sets, each run 10 times.
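
The 13-way division described above can be sketched with scikit-learn's `KFold`. This is only an illustration of the splitting step, not the authors' original procedure: the array `X_demo` is a synthetic stand-in with the same shape as the data set, and the random seed is an arbitrary assumption.

```python
import numpy as np
from sklearn.model_selection import KFold

# Synthetic stand-in for the 208 x 60 sonar matrix (values are random, shape only).
rng = np.random.default_rng(0)
X_demo = rng.random((208, 60))

# 13 disjoint folds; shuffle=True makes the division random.
kfold = KFold(n_splits=13, shuffle=True, random_state=0)

test_sizes = []  # number of cases in each held-out fold
seen = []        # indices of every case that was used for testing
for train_idx, test_idx in kfold.split(X_demo):
    test_sizes.append(len(test_idx))
    seen.extend(test_idx.tolist())

print(test_sizes)      # size of each of the 13 test folds
print(len(set(seen)))  # number of distinct cases that appeared in a test fold
```

Because 208 divides evenly by 13, every fold holds the same number of cases, and each case is tested exactly once across the 13 repetitions.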
						
It was observed that this random division of the sample set led to rather
uneven performance.  A few of the splits gave poor results, presumably
because the test set contains some samples from aspect angles that are
under-represented in the corresponding training set.  This motivated Gorman
and Sejnowski to devise a different set of experiments in which an attempt
was made to balance the training and test sets so that each would have a
representative number of samples from all aspect angles.  Since detailed
aspect angle information was not present in the data base of samples, the
208 samples were first divided into clusters, using a 60-dimensional
Euclidean metric; each of these clusters was then divided between the
104-member training set and the 104-member test set.
						
The actual training and testing samples used for the "aspect angle						
dependent" experiments are marked in the data files.  The reported						
performance is an average over 10 runs with this single division of the						
data set.						
						
A standard back-propagation network was used for all experiments.  The
network had 60 inputs and 2 output units, one indicating a cylinder and the
other a rock.  Experiments were run with no hidden units (direct
connections from each input to each output) and with a single hidden layer
with 2, 3, 6, 12, or 24 units.  Each network was trained by 300 epochs over
the entire training set.
						
The weight-update formulas used in this study were slightly different from
the standard form.  A learning rate of 2.0 and momentum of 0.0 were used.
Errors less than 0.2 were treated as zero.  Initial weights were uniform
random values in the range -0.3 to +0.3.
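
Two of the details above, the weight initialization and the error dead zone, can be illustrated in NumPy. This is a sketch of those two statements only, not a reconstruction of the modified weight-update formulas, which the text does not give. The helper `threshold_errors` is hypothetical, and applying the 0.2 cutoff to the error magnitude is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

n_inputs, n_hidden = 60, 12  # 12 is one of the hidden-layer sizes listed above

# Initial weights: uniform random values in the range -0.3 to +0.3.
w_init = rng.uniform(-0.3, 0.3, size=(n_inputs, n_hidden))

def threshold_errors(errors, tol=0.2):
    """Hypothetical helper: treat errors whose magnitude is below `tol` as zero."""
    errors = np.asarray(errors, dtype=float)
    return np.where(np.abs(errors) < tol, 0.0, errors)

raw_errors = np.array([0.05, -0.15, 0.2, 0.6, -0.9])
clipped = threshold_errors(raw_errors)
print(clipped)  # small errors are zeroed, larger ones pass through unchanged
```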
						
RESULTS: 						
						
For the angle independent experiments, Gorman and Sejnowski report the
following results for networks with different numbers of hidden units:

Hidden   % Right on     Std.   % Right on   Std.
Units    Training set   Dev.   Test Set     Dev.
------   ------------   ----   ----------   ----
0        89.4           2.1    77.1         8.3
2        96.5           0.7    81.9         6.2
3        98.8           0.4    82.0         7.3
6        99.7           0.2    83.5         5.6
12       99.8           0.1    84.7         5.7
24       99.8           0.1    84.5         5.7
						
For the angle-dependent experiments Gorman and Sejnowski report the
following results:

Hidden   % Right on     Std.   % Right on   Std.
Units    Training set   Dev.   Test Set     Dev.
------   ------------   ----   ----------   ----
0        79.3           3.4    73.1         4.8
2        96.2           2.2    85.7         6.3
3        98.1           1.5    87.6         3.0
6        99.4           0.9    89.3         2.4
12       99.8           0.6    90.4         1.8
24       100.0          0.0    89.2         1.4
						
Not surprisingly, the network's performance on the test set was somewhat
better when the aspect angles in the training and test sets were balanced.

Gorman and Sejnowski further report that a nearest neighbor classifier on
the same data gave an 82.7% probability of correct classification.

Three trained human subjects were each tested on 100 signals, chosen at
random from the set of 208 returns used to create this data set.  Their
responses ranged between 88% and 97% correct.  However, they may have been
using information from the raw sonar signal that is not preserved in the
processed data sets presented here.
						
REFERENCES: 						
						
1. Gorman, R. P., and Sejnowski, T. J. (1988).  "Analysis of Hidden Units
in a Layered Network Trained to Classify Sonar Targets" in Neural Networks,
Vol. 1, pp. 75-89.


In [20]:
sonar.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 207 entries, 0 to 206
Data columns (total 61 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   0       207 non-null    float64
 1   1       207 non-null    float64
 2   2       207 non-null    float64
 3   3       207 non-null    float64
 4   4       207 non-null    float64
 5   5       207 non-null    float64
 6   6       207 non-null    float64
 7   7       207 non-null    float64
 8   8       207 non-null    float64
 9   9       207 non-null    float64
 10  10      207 non-null    float64
 11  11      207 non-null    float64
 12  12      207 non-null    float64
 13  13      207 non-null    float64
 14  14      207 non-null    float64
 15  15      207 non-null    float64
 16  16      207 non-null    float64
 17  17      207 non-null    float64
 18  18      207 non-null    float64
 19  19      207 non-null    float64
 20  20      207 non-null    float64
 21  21      207 non-null    float64
 22  22

In [29]:
sonar.groupby(60).size()

60
M    111
R     96
dtype: int64
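
The counts above (96 rocks, 207 rows in total) are one short of the 97 rock patterns and 208 cases in the data set description. One possible cause, which the notebook does not confirm, is that the first record of a header-less file was consumed as the header row by `pd.read_csv`. The sketch below reproduces that effect on a tiny made-up CSV; the values and the name `raw` are purely illustrative, and it does not explain how the column names 0 to 60 arose here.

```python
import io
import pandas as pd

# Tiny synthetic CSV with no header row: 3 records, 2 features plus a label.
raw = "0.02,0.03,R\n0.04,0.05,R\n0.06,0.07,M\n"

# Default behaviour: the first record is interpreted as column names.
with_default = pd.read_csv(io.StringIO(raw))

# header=None keeps every record as data and numbers the columns 0, 1, 2.
with_no_header = pd.read_csv(io.StringIO(raw), header=None)

print(len(with_default), len(with_no_header))
print(with_no_header.groupby(2).size())
```

If this is what happened, reading the file with `header=None` would restore the missing record. It is worth checking the raw file before relying on either reading.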

In [42]:
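# Features: the 60 numeric columns. DataFrame.values is an object array here
# (the label column holds strings), so cast the features to float explicitly.
X = sonar.values[:, 0:-1].astype(float)
# Target: the last column, holding the class label 'R' (rock) or 'M' (mine).
y = sonar.values[:, -1]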


In [43]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X,y, test_size = 0.2, random_state = 7)

In [44]:
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

from sklearn.pipeline import Pipeline
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier, RandomForestClassifier, ExtraTreesClassifier

In [57]:
# Candidate classifiers, all with default hyperparameters
models = {'LR': LogisticRegression(), 'KNN': KNeighborsClassifier(), 'DTC': DecisionTreeClassifier(),
          'svm': SVC(), 'naivebayes': GaussianNB(), 'LDA': LinearDiscriminantAnalysis(),
          'AdaBC': AdaBoostClassifier(), 'GraBoost': GradientBoostingClassifier(),
          'RFC': RandomForestClassifier(), 'ETC': ExtraTreesClassifier()}

# Fit each model on the training split and report its scores on the held-out test split.
# After the loop, `model` still refers to the last estimator fitted (ExtraTreesClassifier).
for name, model in models.items():
    model.fit(X_train, y_train)
    ypred = model.predict(X_test)
    print(f"model name: {model}")
    print("Accuracy: \n", accuracy_score(y_test, ypred))
    print("Classification report: \n", classification_report(y_test, ypred))
    print("Confusion matrix: \n", confusion_matrix(y_test, ypred))
    print("\n\n\n\n\n")
    

model name: LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=100,
                   multi_class='auto', n_jobs=None, penalty='l2',
                   random_state=None, solver='lbfgs', tol=0.0001, verbose=0,
                   warm_start=False)
Accuracy: 
 0.7142857142857143
Classification report: 
               precision    recall  f1-score   support

           M       0.82      0.69      0.75        26
           R       0.60      0.75      0.67        16

    accuracy                           0.71        42
   macro avg       0.71      0.72      0.71        42
weighted avg       0.74      0.71      0.72        42

Confusion matrix: 
 [[18  8]
 [ 4 12]]






model name: KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
                     metric_params=None, n_jobs=None, n_neighbors=5, p=2,
                     weights='uniform')
Accuracy: 
 0.7619047619047619
Clas

As we see from the results, the best-performing models here are ExtraTreesClassifier and RandomForestClassifier.

Next, we will see whether we can improve the accuracy of the above models using GridSearchCV and RandomizedSearchCV.
Because accuracy measured on a single small test split depends heavily on which samples end up in that split, we will also use k-fold cross-validation to obtain a more stable estimate for each model.
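
A minimal sketch of k-fold cross-validation with `cross_val_score` follows. It uses synthetic data from `make_classification` as a stand-in for the sonar features, so the scores it prints say nothing about the real data set. The names `X_demo` and `y_demo` and the choice of five stratified folds are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in with the same shape as the sonar data (207 rows, 60 features).
X_demo, y_demo = make_classification(
    n_samples=207, n_features=60, n_informative=10, random_state=7
)

# Stratified folds keep the class proportions similar in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=7)

# One accuracy value per fold; the mean is less sensitive to a lucky or unlucky split.
scores = cross_val_score(LogisticRegression(max_iter=1000), X_demo, y_demo, cv=cv)

print(scores)
print(scores.mean(), scores.std())
```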

In [59]:
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV


##### GridSearchCV for LogisticRegression

In [63]:
clf = GridSearchCV(LogisticRegression(max_iter=1000), {
    'solver': ['liblinear', 'newton-cg', 'sag', 'saga' , 'lbfgs'],
    'penalty' : ['l1', 'l2', 'elasticnet', 'none'],
    'multi_class' : ['auto', 'ovr', 'multinomial']},
    cv = 5
     )
clf.fit(X_train, y_train)
clf.cv_results_

ValueError: Solver newton-cg supports only 'l2' or 'none' penalties, got l1 penalty.

ValueError: Solver sag supports only 'l2' or 'none' penalties, got l1 penalty.

ValueError: Solver lbfgs supports only 'l2' or 'none' penalties, got l1 penalty.

ValueError: Only 'saga' solver supports elasticnet penalty, got solver=liblinear.

ValueError: Solver newton-cg supports only 'l2' or 'none' penalties, got elasticnet penalty.

ValueError: Solver sag supports only 'l2' or 'none' penalties, got elasticnet penalty.

ValueError: l1_ratio must be between 0 and 1; got (l1_ratio=None)

ValueError: Solver lbfgs supports only 'l2' or 'none' penalties, got elasticnet penalty.

ValueError: penalty='none' is not supported for the liblinear solver

ValueError: Solver newton-cg supports only 'l2' or 'none' penalties, got l1 penalty.

ValueError: Solver sag supports only 'l2' or 'none' penalties, got l1 penalty.

ValueError: Solver lbfgs supports only 'l2' or 'none' penalties, got l1 penalty.

ValueError: 

ValueError: Solver liblinear does not support a multinomial backend.

ValueError: Solver liblinear does not support a multinomial backend.

ValueError: Solver liblinear does not support a multinomial backend.

ValueError: Solver liblinear does not support a multinomial backend.

ValueError: Solver liblinear does not support a multinomial backend.

ValueError: Solver newton-cg supports only 'l2' or 'none' penalties, got l1 penalty.

ValueError: Solver sag supports only 'l2' or 'none' penalties, got l1 penalty.

ValueError: Solver lbfgs supports only 'l2' or 'none' penalties, got l1 penalty.

ValueError: Solver liblinear does not support a multinomial backend.

ValueError: Solver liblinear does not support a multinomial backend.

ValueError: Solver liblinear does not support a multinomial backend.

ValueError: Solver liblinear does not support a multinomial backend.

ValueError: Solver liblinear does not support a multinomial backend.

ValueError: Only 'saga' solver supports elasticnet p



{'mean_fit_time': array([0.0026001 , 0.00020018, 0.00019965, 0.04840341, 0.00020046,
        0.00120006, 0.00599966, 0.00499997, 0.00900078, 0.00479946,
        0.00020022, 0.        , 0.        , 0.        , 0.00019989,
        0.        , 0.02720184, 0.06680403, 0.09740729, 0.01820126,
        0.00260057, 0.00020032, 0.00019979, 0.0524025 , 0.        ,
        0.00140038, 0.00519924, 0.00460062, 0.01160145, 0.00599985,
        0.        , 0.00040035, 0.        , 0.        , 0.00019999,
        0.        , 0.03280091, 0.0824059 , 0.09340606, 0.01800117,
        0.00099993, 0.00019994, 0.00020008, 0.04020281, 0.00020041,
        0.        , 0.01140056, 0.0062006 , 0.0126009 , 0.01040068,
        0.00019994, 0.00020003, 0.        , 0.        , 0.00020008,
        0.00019999, 0.05780449, 0.09600658, 0.11180696, 0.02720213]),
 'std_fit_time': array([4.90193715e-04, 4.00352478e-04, 3.99303436e-04, 9.02517878e-03,
        4.00924683e-04, 4.00018735e-04, 1.07261866e-06, 6.32339410e-04,
     

In [64]:
df = pd.DataFrame(clf.cv_results_)
df

Unnamed: 0,mean_fit_time,std_fit_time,mean_score_time,std_score_time,param_multi_class,param_penalty,param_solver,params,split0_test_score,split1_test_score,split2_test_score,split3_test_score,split4_test_score,mean_test_score,std_test_score,rank_test_score
0,0.0026,0.00049,0.0004,0.00049,auto,l1,liblinear,"{'multi_class': 'auto', 'penalty': 'l1', 'solv...",0.727273,0.727273,0.757576,0.727273,0.69697,0.727273,0.019165,27
1,0.0002,0.0004,0.0,0.0,auto,l1,newton-cg,"{'multi_class': 'auto', 'penalty': 'l1', 'solv...",,,,,,,,34
2,0.0002,0.000399,0.0,0.0,auto,l1,sag,"{'multi_class': 'auto', 'penalty': 'l1', 'solv...",,,,,,,,35
3,0.048403,0.009025,0.0008,0.0004,auto,l1,saga,"{'multi_class': 'auto', 'penalty': 'l1', 'solv...",0.636364,0.727273,0.818182,0.69697,0.666667,0.709091,0.062398,29
4,0.0002,0.000401,0.0,0.0,auto,l1,lbfgs,"{'multi_class': 'auto', 'penalty': 'l1', 'solv...",,,,,,,,37
5,0.0012,0.0004,0.000201,0.000401,auto,l2,liblinear,"{'multi_class': 'auto', 'penalty': 'l2', 'solv...",0.787879,0.787879,0.757576,0.757576,0.757576,0.769697,0.014845,13
6,0.006,1e-06,0.0002,0.0004,auto,l2,newton-cg,"{'multi_class': 'auto', 'penalty': 'l2', 'solv...",0.757576,0.757576,0.848485,0.69697,0.878788,0.787879,0.066391,1
7,0.005,0.000632,0.0004,0.00049,auto,l2,sag,"{'multi_class': 'auto', 'penalty': 'l2', 'solv...",0.757576,0.757576,0.848485,0.69697,0.878788,0.787879,0.066391,1
8,0.009001,0.000632,0.0006,0.00049,auto,l2,saga,"{'multi_class': 'auto', 'penalty': 'l2', 'solv...",0.757576,0.757576,0.848485,0.69697,0.878788,0.787879,0.066391,1
9,0.004799,0.000749,0.0006,0.00049,auto,l2,lbfgs,"{'multi_class': 'auto', 'penalty': 'l2', 'solv...",0.757576,0.757576,0.848485,0.69697,0.878788,0.787879,0.066391,1


In [78]:
df[['param_multi_class','param_penalty', 'param_solver','mean_test_score']].sort_values(by='mean_test_score', ascending=False).dropna().head(10)

Unnamed: 0,param_multi_class,param_penalty,param_solver,mean_test_score
27,ovr,l2,sag,0.787879
8,auto,l2,saga,0.787879
26,ovr,l2,newton-cg,0.787879
29,ovr,l2,lbfgs,0.787879
9,auto,l2,lbfgs,0.787879
28,ovr,l2,saga,0.787879
7,auto,l2,sag,0.787879
6,auto,l2,newton-cg,0.787879
49,multinomial,l2,lbfgs,0.781818
48,multinomial,l2,saga,0.781818


In [79]:
print(clf.best_estimator_)
print(clf.best_params_)
print(clf.best_score_)

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=1000,
                   multi_class='auto', n_jobs=None, penalty='l2',
                   random_state=None, solver='newton-cg', tol=0.0001, verbose=0,
                   warm_start=False)
{'multi_class': 'auto', 'penalty': 'l2', 'solver': 'newton-cg'}
0.7878787878787878


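As we see, when we ran the Logistic Regression model with its default parameters we obtained an accuracy of 71.43% on the held-out test set. With the tuned parameters, the mean 5-fold cross-validated accuracy on the training data is 78.79%. Note that the two figures are measured differently (a single test split versus an average over folds of the training set), so the comparison is only indicative.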
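
Many of the `ValueError` messages above come from solver and penalty combinations that LogisticRegression does not support; those candidates end up with NaN scores. One way to avoid them, sketched below on synthetic data, is to pass `GridSearchCV` a list of dictionaries so that only compatible pairs are tried. The grid shown is an assumption chosen for illustration: it leaves out `'none'`, `'elasticnet'`, and `multi_class`, partly because newer scikit-learn releases have changed or deprecated those options.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the training split (not the sonar data).
X_demo, y_demo = make_classification(
    n_samples=160, n_features=60, n_informative=10, random_state=7
)

# Each dictionary is expanded separately, so only compatible pairs are searched.
param_grid = [
    {"solver": ["liblinear", "saga"], "penalty": ["l1", "l2"]},
    {"solver": ["lbfgs", "newton-cg", "sag"], "penalty": ["l2"]},
]

# error_score="raise" makes an invalid combination fail loudly instead of scoring NaN.
search = GridSearchCV(
    LogisticRegression(max_iter=5000), param_grid, cv=5, error_score="raise"
)
search.fit(X_demo, y_demo)

print(len(search.cv_results_["params"]))  # number of candidates actually evaluated
print(search.best_params_, search.best_score_)
```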



Likewise, we will use GridSearchCV for the rest of the models.

##### GridSearchCV for KNeighborsClassifier

In [82]:
clf = GridSearchCV(KNeighborsClassifier(), {
    'n_neighbors' : [4,5,6,7,8,9,10],
    'weights' : ['uniform', 'distance'],
    'algorithm' :['auto', 'ball_tree', 'kd_tree', 'brute'],
    'leaf_size' : [20,30,40,50]},
    cv = 5
     )
clf.fit(X_train, y_train)
clf.cv_results_

{'mean_fit_time': array([0.00199924, 0.0013998 , 0.00140028, 0.0012002 , 0.00100079,
        0.0014008 , 0.00140004, 0.00100017, 0.00120082, 0.0012002 ,
        0.00100069, 0.00120101, 0.00140028, 0.0012012 , 0.00140066,
        0.0010004 , 0.00160046, 0.00100036, 0.00100131, 0.00119987,
        0.00099983, 0.0010005 , 0.00099964, 0.00139985, 0.00140104,
        0.00099983, 0.00099955, 0.00159998, 0.00080056, 0.00079951,
        0.00140018, 0.00100069, 0.00100074, 0.00099993, 0.00100079,
        0.00100031, 0.00100007, 0.00080009, 0.00099969, 0.00100031,
        0.00120058, 0.00099921, 0.00120049, 0.00100093, 0.00100026,
        0.00139971, 0.00119958, 0.00099959, 0.0015995 , 0.00119996,
        0.001401  , 0.0012001 , 0.00120058, 0.00140066, 0.00120029,
        0.00120049, 0.0012001 , 0.00120049, 0.00140095, 0.00099998,
        0.00120125, 0.00119987, 0.00100026, 0.00100002, 0.00099926,
        0.00159926, 0.00180049, 0.00160131, 0.00159302, 0.00119934,
        0.00100007, 0.00140071,

In [83]:
df = pd.DataFrame(clf.cv_results_)
df

Unnamed: 0,mean_fit_time,std_fit_time,mean_score_time,std_score_time,param_algorithm,param_leaf_size,param_n_neighbors,param_weights,params,split0_test_score,split1_test_score,split2_test_score,split3_test_score,split4_test_score,mean_test_score,std_test_score,rank_test_score
0,0.001999,6.322597e-04,0.002801,7.479339e-04,auto,20,4,uniform,"{'algorithm': 'auto', 'leaf_size': 20, 'n_neig...",0.757576,0.727273,0.787879,0.818182,0.727273,0.763636,0.035339,65
1,0.001400,4.900382e-04,0.001600,4.895523e-04,auto,20,4,distance,"{'algorithm': 'auto', 'leaf_size': 20, 'n_neig...",0.818182,0.757576,0.818182,0.818182,0.727273,0.787879,0.038331,17
2,0.001400,4.903304e-04,0.002600,4.906622e-04,auto,20,5,uniform,"{'algorithm': 'auto', 'leaf_size': 20, 'n_neig...",0.818182,0.727273,0.878788,0.696970,0.757576,0.775758,0.065275,49
3,0.001200,4.000667e-04,0.001200,4.001870e-04,auto,20,5,distance,"{'algorithm': 'auto', 'leaf_size': 20, 'n_neig...",0.818182,0.757576,0.878788,0.727273,0.757576,0.787879,0.054208,17
4,0.001001,9.725608e-07,0.001999,9.818678e-07,auto,20,6,uniform,"{'algorithm': 'auto', 'leaf_size': 20, 'n_neig...",0.727273,0.727273,0.696970,0.666667,0.727273,0.709091,0.024242,145
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
219,0.000800,4.000904e-04,0.001201,4.001388e-04,brute,50,8,distance,"{'algorithm': 'brute', 'leaf_size': 50, 'n_nei...",0.818182,0.757576,0.696970,0.636364,0.727273,0.727273,0.060606,97
220,0.000600,4.899793e-04,0.002200,3.995659e-04,brute,50,9,uniform,"{'algorithm': 'brute', 'leaf_size': 50, 'n_nei...",0.727273,0.606061,0.575758,0.545455,0.636364,0.618182,0.062398,209
221,0.000801,4.002578e-04,0.000999,4.370285e-07,brute,50,9,distance,"{'algorithm': 'brute', 'leaf_size': 50, 'n_nei...",0.787879,0.696970,0.757576,0.636364,0.757576,0.727273,0.054208,97
222,0.000600,4.899793e-04,0.002000,2.780415e-07,brute,50,10,uniform,"{'algorithm': 'brute', 'leaf_size': 50, 'n_nei...",0.696970,0.606061,0.606061,0.575758,0.636364,0.624242,0.041105,193


In [84]:
print(clf.best_estimator_,
clf.best_params_,
clf.best_score_)

KNeighborsClassifier(algorithm='auto', leaf_size=20, metric='minkowski',
                     metric_params=None, n_jobs=None, n_neighbors=6, p=2,
                     weights='distance') {'algorithm': 'auto', 'leaf_size': 20, 'n_neighbors': 6, 'weights': 'distance'} 0.793939393939394


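Previously, the KNN model with default parameters reached an accuracy of 0.7619 on the held-out test set.

With GridSearchCV, the best mean 5-fold cross-validated accuracy on the training data is 0.7939,

obtained with the parameters {'algorithm': 'auto', 'leaf_size': 20, 'n_neighbors': 6, 'weights': 'distance'}. The first figure is a test-set score and the second a cross-validation score, so they are not directly comparable.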
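
To compare like with like, the tuned estimator can be scored on the same held-out test split as the default model. The sketch below shows the pattern on synthetic data; the names `X_demo`, `X_tr`, and `X_te` are illustrative, and the grid is reduced to `n_neighbors` and `weights` because `algorithm` and `leaf_size` affect speed rather than the neighbours that are found.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in with the same shape as the sonar data.
X_demo, y_demo = make_classification(
    n_samples=207, n_features=60, n_informative=10, random_state=7
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=7
)

search = GridSearchCV(
    KNeighborsClassifier(),
    {"n_neighbors": [4, 5, 6, 7, 8, 9, 10], "weights": ["uniform", "distance"]},
    cv=5,
)
search.fit(X_tr, y_tr)  # with the default refit=True, the best model is refit on all of X_tr

cv_score = search.best_score_                            # mean accuracy over the CV folds
test_score = accuracy_score(y_te, search.predict(X_te))  # accuracy on unseen data
print(search.best_params_, cv_score, test_score)
```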


In [101]:
clf = GridSearchCV(DecisionTreeClassifier(),{
    'criterion' : ["gini", "entropy"],
    'splitter' : ["best", "random"],
    'max_features' : ["int","float","auto", "sqrt", "log2"]
    }, cv=8  )

clf.fit(X_train,y_train)
df = pd.DataFrame(clf.cv_results_)
df.head()

ValueError: Invalid value for max_features. Allowed string values are 'auto', 'sqrt' or 'log2'.

ValueError: Invalid value for max_features. Allowed string values are 'auto', 'sqrt' or 'log2'.

ValueError: Invalid value for max_features. Allowed string values are 'auto', 'sqrt' or 'log2'.

ValueError: Invalid value for max_features. Allowed string values are 'auto', 'sqrt' or 'log2'.

ValueError: Invalid value for max_features. Allowed string values are 'auto', 'sqrt' or 'log2'.

ValueError: Invalid value for max_features. Allowed string values are 'auto', 'sqrt' or 'log2'.

ValueError: Invalid value for max_features. Allowed string values are 'auto', 'sqrt' or 'log2'.

ValueError: Invalid value for max_features. Allowed string values are 'auto', 'sqrt' or 'log2'.

ValueError: Invalid value for max_features. Allowed string values are 'auto', 'sqrt' or 'log2'.

ValueError: Invalid value for max_features. Allowed string values are 'auto', 'sqrt' or 'log2'.

ValueError: Invalid value for 

Unnamed: 0,mean_fit_time,std_fit_time,mean_score_time,std_score_time,param_criterion,param_max_features,param_splitter,params,split0_test_score,split1_test_score,split2_test_score,split3_test_score,split4_test_score,split5_test_score,split6_test_score,split7_test_score,mean_test_score,std_test_score,rank_test_score
0,0.001,0.0004993697,0.0,0.0,gini,int,best,"{'criterion': 'gini', 'max_features': 'int', '...",,,,,,,,,,,20
1,0.001,1.079486e-06,0.0,0.0,gini,int,random,"{'criterion': 'gini', 'max_features': 'int', '...",,,,,,,,,,,18
2,0.00075,0.0004332076,0.0,0.0,gini,float,best,"{'criterion': 'gini', 'max_features': 'float',...",,,,,,,,,,,17
3,0.000625,0.0004845052,0.0,0.0,gini,float,random,"{'criterion': 'gini', 'max_features': 'float',...",,,,,,,,,,,16
4,0.001,7.052517e-07,0.000375,0.000484,gini,auto,best,"{'criterion': 'gini', 'max_features': 'auto', ...",0.714286,0.714286,0.761905,0.619048,0.714286,0.95,0.6,0.65,0.715476,0.102519,4


In [102]:
df[['params','mean_test_score']].sort_values(by='mean_test_score', ascending=False).dropna().head(10)

Unnamed: 0,params,mean_test_score
15,"{'criterion': 'entropy', 'max_features': 'auto...",0.727976
17,"{'criterion': 'entropy', 'max_features': 'sqrt...",0.716071
14,"{'criterion': 'entropy', 'max_features': 'auto...",0.715476
4,"{'criterion': 'gini', 'max_features': 'auto', ...",0.715476
6,"{'criterion': 'gini', 'max_features': 'sqrt', ...",0.698214
7,"{'criterion': 'gini', 'max_features': 'sqrt', ...",0.697321
16,"{'criterion': 'entropy', 'max_features': 'sqrt...",0.697024
18,"{'criterion': 'entropy', 'max_features': 'log2...",0.689881
5,"{'criterion': 'gini', 'max_features': 'auto', ...",0.679464
9,"{'criterion': 'gini', 'max_features': 'log2', ...",0.672619
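
The `ValueError` messages above arise because `"int"` and `"float"` were passed as literal strings. In the scikit-learn documentation those words describe the accepted types for `max_features`, so an actual integer (a feature count) or float (a fraction of the features) is expected. The sketch below uses a grid with valid values on synthetic data. The specific values 10 and 0.5 are arbitrary assumptions, and `'auto'` is left out because newer scikit-learn releases no longer accept it for trees.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the training split (not the sonar data).
X_demo, y_demo = make_classification(
    n_samples=160, n_features=60, n_informative=10, random_state=7
)

param_grid = {
    "criterion": ["gini", "entropy"],
    "splitter": ["best", "random"],
    # an int (feature count), a float (fraction), a named rule, or None (all features)
    "max_features": [10, 0.5, "sqrt", "log2", None],
}

# error_score="raise" surfaces any invalid candidate immediately.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=7), param_grid, cv=8, error_score="raise"
)
search.fit(X_demo, y_demo)

print(len(search.cv_results_["params"]))  # 2 * 2 * 5 candidates
print(search.best_params_, search.best_score_)
```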


Likewise, we can run GridSearchCV for each of the remaining models and pick the best-performing one for the problem at hand.
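
When a grid becomes too large to search exhaustively, `RandomizedSearchCV` samples a fixed number of candidates instead. The sketch below applies it to a RandomForestClassifier on synthetic data; the parameter ranges, `n_iter=5`, and the seeds are assumptions chosen to keep the example quick, not tuned recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in for the training split (not the sonar data).
X_demo, y_demo = make_classification(
    n_samples=160, n_features=60, n_informative=10, random_state=7
)

# 3 * 3 * 2 * 3 = 54 possible combinations; only n_iter of them are sampled.
param_distributions = {
    "n_estimators": [25, 50, 100],
    "max_depth": [None, 5, 10],
    "max_features": ["sqrt", "log2"],
    "min_samples_leaf": [1, 2, 4],
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=7),
    param_distributions,
    n_iter=5,        # number of sampled candidates
    cv=5,
    random_state=7,  # makes the sampled candidates reproducible
)
search.fit(X_demo, y_demo)

print(search.best_params_, search.best_score_)
```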

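Next, we save the trained model to disk so that it can be deployed. Note that `model` here is the last estimator fitted in the comparison loop above (the default ExtraTreesClassifier), not one of the tuned estimators.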

In [109]:
import pickle

In [111]:
with open('model_pickle', 'wb') as f:
    pickle.dump(model, f)

In [112]:
with open('model_pickle','rb') as f:
    mp = pickle.load(f)

In [113]:
mp

ExtraTreesClassifier(bootstrap=False, ccp_alpha=0.0, class_weight=None,
                     criterion='gini', max_depth=None, max_features='auto',
                     max_leaf_nodes=None, max_samples=None,
                     min_impurity_decrease=0.0, min_impurity_split=None,
                     min_samples_leaf=1, min_samples_split=2,
                     min_weight_fraction_leaf=0.0, n_estimators=100,
                     n_jobs=None, oob_score=False, random_state=None, verbose=0,
                     warm_start=False)

In [116]:
# Let's create a dummy dataframe to test the predictions of the saved model

dummydf = pd.DataFrame(np.linspace(0,1,300).reshape(5,60))
dummydf.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,50,51,52,53,54,55,56,57,58,59
0,0.0,0.003344,0.006689,0.010033,0.013378,0.016722,0.020067,0.023411,0.026756,0.0301,...,0.167224,0.170569,0.173913,0.177258,0.180602,0.183946,0.187291,0.190635,0.19398,0.197324
1,0.200669,0.204013,0.207358,0.210702,0.214047,0.217391,0.220736,0.22408,0.227425,0.230769,...,0.367893,0.371237,0.374582,0.377926,0.381271,0.384615,0.38796,0.391304,0.394649,0.397993
2,0.401338,0.404682,0.408027,0.411371,0.414716,0.41806,0.421405,0.424749,0.428094,0.431438,...,0.568562,0.571906,0.575251,0.578595,0.58194,0.585284,0.588629,0.591973,0.595318,0.598662
3,0.602007,0.605351,0.608696,0.61204,0.615385,0.618729,0.622074,0.625418,0.628763,0.632107,...,0.769231,0.772575,0.77592,0.779264,0.782609,0.785953,0.789298,0.792642,0.795987,0.799331
4,0.802676,0.80602,0.809365,0.812709,0.816054,0.819398,0.822742,0.826087,0.829431,0.832776,...,0.9699,0.973244,0.976589,0.979933,0.983278,0.986622,0.989967,0.993311,0.996656,1.0


In [117]:
# Now test the restored model on the dummy data and see what it predicts
mp.predict(dummydf)

array(['R', 'M', 'M', 'M', 'M'], dtype=object)

Now let us try the same thing with joblib

In [119]:
import joblib

In [120]:
joblib.dump(model, 'model_joblib')

['model_joblib']

In [123]:
jlmodel = joblib.load('model_joblib')

In [124]:
jlmodel.predict(dummydf)

array(['R', 'M', 'M', 'M', 'M'], dtype=object)

As we can see, the models restored with pickle and with joblib give exactly the same predictions.
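
The equivalence can also be checked programmatically rather than by eye. The sketch below trains a small ExtraTreesClassifier on synthetic data, round-trips it through both libraries, and compares the predictions. The names `clf_demo`, `from_pickle`, and `from_joblib` are illustrative, and a temporary directory is used so that no files are left behind.

```python
import os
import pickle
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

# Small synthetic model standing in for the trained classifier.
X_demo, y_demo = make_classification(n_samples=100, n_features=60, random_state=7)
clf_demo = ExtraTreesClassifier(n_estimators=20, random_state=7).fit(X_demo, y_demo)

# Round trip through pickle (in memory).
from_pickle = pickle.loads(pickle.dumps(clf_demo))

# Round trip through joblib (temporary file, removed afterwards).
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model_joblib")
    joblib.dump(clf_demo, path)
    from_joblib = joblib.load(path)

# Both restored models should predict exactly what the original predicts.
same = np.array_equal(from_pickle.predict(X_demo), from_joblib.predict(X_demo))
print(same)
```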

I hope you like my work. Thank you!