<a href="https://colab.research.google.com/github/hillelMerran/data-science/blob/main/Mine_identification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This project uses a scikit-learn pipeline to tune several models in a single grid search and to compare their scores.

The data set here is the one used by Gorman and Sejnowski in their study
of the classification of sonar signals using a neural network.

The task is to train a network to discriminate between sonar signals bounced
off a metal cylinder and those bounced off a roughly cylindrical rock.

We'll explore different ML models with the same goal.

The dataset contains 208 patterns obtained by bouncing sonar
signals off a metal cylinder (111) and rocks (97) at various angles and under various conditions.

Each pattern is a set of 60 numbers in the range 0.0 to 1.0.  Each number
represents the energy within a particular frequency band, integrated over
a certain period of time.

The label associated with each record contains the letter "R" if the object
is a rock and "M" if it is a mine (metal cylinder).

## Imports

In [1]:
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report
import xgboost as xgb

## Data exploration

In [2]:
url = 'https://raw.githubusercontent.com/hillelMerran/data-science/main/multiple_models/sonar.all-data.csv'
cols = ["freq_{}".format(k) for k in range(1,61)] + ["class"]
df = pd.read_csv(url, names=cols)
print(df.shape)
df.head()

(208, 61)


Unnamed: 0,freq_1,freq_2,freq_3,freq_4,freq_5,freq_6,freq_7,freq_8,freq_9,freq_10,...,freq_52,freq_53,freq_54,freq_55,freq_56,freq_57,freq_58,freq_59,freq_60,class
0,0.02,0.0371,0.0428,0.0207,0.0954,0.0986,0.1539,0.1601,0.3109,0.2111,...,0.0027,0.0065,0.0159,0.0072,0.0167,0.018,0.0084,0.009,0.0032,R
1,0.0453,0.0523,0.0843,0.0689,0.1183,0.2583,0.2156,0.3481,0.3337,0.2872,...,0.0084,0.0089,0.0048,0.0094,0.0191,0.014,0.0049,0.0052,0.0044,R
2,0.0262,0.0582,0.1099,0.1083,0.0974,0.228,0.2431,0.3771,0.5598,0.6194,...,0.0232,0.0166,0.0095,0.018,0.0244,0.0316,0.0164,0.0095,0.0078,R
3,0.01,0.0171,0.0623,0.0205,0.0205,0.0368,0.1098,0.1276,0.0598,0.1264,...,0.0121,0.0036,0.015,0.0085,0.0073,0.005,0.0044,0.004,0.0117,R
4,0.0762,0.0666,0.0481,0.0394,0.059,0.0649,0.1209,0.2467,0.3564,0.4459,...,0.0031,0.0054,0.0105,0.011,0.0015,0.0072,0.0048,0.0107,0.0094,R


In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 208 entries, 0 to 207
Data columns (total 61 columns):
 #   Column   Non-Null Count  Dtype  
---  ------   --------------  -----  
 0   freq_1   208 non-null    float64
 1   freq_2   208 non-null    float64
 2   freq_3   208 non-null    float64
 3   freq_4   208 non-null    float64
 4   freq_5   208 non-null    float64
 5   freq_6   208 non-null    float64
 6   freq_7   208 non-null    float64
 7   freq_8   208 non-null    float64
 8   freq_9   208 non-null    float64
 9   freq_10  208 non-null    float64
 10  freq_11  208 non-null    float64
 11  freq_12  208 non-null    float64
 12  freq_13  208 non-null    float64
 13  freq_14  208 non-null    float64
 14  freq_15  208 non-null    float64
 15  freq_16  208 non-null    float64
 16  freq_17  208 non-null    float64
 17  freq_18  208 non-null    float64
 18  freq_19  208 non-null    float64
 19  freq_20  208 non-null    float64
 20  freq_21  208 non-null    float64
 21  freq_22  208 non

No missing values.

Target field is a string.

In [4]:
df.describe().transpose()

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
freq_1,208.0,0.029164,0.022991,0.0015,0.01335,0.0228,0.03555,0.1371
freq_2,208.0,0.038437,0.03296,0.0006,0.01645,0.0308,0.04795,0.2339
freq_3,208.0,0.043832,0.038428,0.0015,0.01895,0.0343,0.05795,0.3059
freq_4,208.0,0.053892,0.046528,0.0058,0.024375,0.04405,0.0645,0.4264
freq_5,208.0,0.075202,0.055552,0.0067,0.03805,0.0625,0.100275,0.401
freq_6,208.0,0.10457,0.059105,0.0102,0.067025,0.09215,0.134125,0.3823
freq_7,208.0,0.121747,0.061788,0.0033,0.0809,0.10695,0.154,0.3729
freq_8,208.0,0.134799,0.085152,0.0055,0.080425,0.1121,0.1696,0.459
freq_9,208.0,0.178003,0.118387,0.0075,0.097025,0.15225,0.233425,0.6828
freq_10,208.0,0.208259,0.134416,0.0113,0.111275,0.1824,0.2687,0.7106


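The features all lie between 0 and 1 but are not standardized: their means and standard deviations differ widely (compare freq_1 and freq_10).

We'll standardize all the features and encode the target field as an integer (0 for rock, 1 for mine).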

In [5]:
df["class"] = df["class"].map({"R":0, "M":1})
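`Series.map` turns any label that is missing from the dictionary into NaN without raising, so it can be worth checking the encoded column. A minimal sketch on a toy column (the `labels`, `mapping` and `encoded` names are illustrative and not part of the notebook):

```python
import pandas as pd

# Toy stand-in for the "class" column (illustrative values only)
labels = pd.Series(["R", "M", "M", "R", "M"])

mapping = {"R": 0, "M": 1}  # rock -> 0, mine -> 1
encoded = labels.map(mapping)

# map() yields NaN for labels absent from the mapping; fail early if that happens
if encoded.isna().any():
    raise ValueError("unexpected label in the class column")

encoded = encoded.astype(int)
print(encoded.value_counts().sort_index())
```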

## Scaling the data and splitting into train/test sets

In [6]:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()

In [7]:
y = df['class']
X = df.drop('class', axis=1)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)
scaled_X_train = scaler.fit_transform(X_train)
scaled_X_test = scaler.transform(X_test)
print(X_train.shape)
print(X_test.shape)

(166, 60)
(42, 60)
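The `stratify=y` argument keeps the rock/mine proportions roughly equal in the train and test sets. A small sketch of that effect, using synthetic labels with the same class counts as the sonar data (97 rocks, 111 mines); the `_demo` names and the placeholder feature are assumptions made for illustration:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic labels with the sonar class counts: 97 rocks (0) and 111 mines (1)
y_demo = np.array([0] * 97 + [1] * 111)
X_demo = np.arange(len(y_demo)).reshape(-1, 1)  # placeholder feature, values are irrelevant

_, _, y_train_demo, y_test_demo = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# The share of mines should be close in all three sets
print("full :", round(y_demo.mean(), 3))
print("train:", round(y_train_demo.mean(), 3))
print("test :", round(y_test_demo.mean(), 3))
```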


## Creating models from different families

In [8]:
clf1 = RandomForestClassifier(random_state=42)
clf2 = SVC(probability=True, random_state=42)
clf3 = LogisticRegression(random_state=42)
clf4 = DecisionTreeClassifier(random_state=42)
clf5 = KNeighborsClassifier()
clf6 = GaussianNB()
clf7 = GradientBoostingClassifier(random_state=42)
clf8 = xgb.XGBClassifier(random_state=42)

### Defining hyper-parameter grid values

In [9]:
param1 = {}
param1['classifier__n_estimators'] = [10, 50, 100]
param1['classifier__max_depth'] = [5, 10, 20]
param1['classifier__class_weight'] = [None, {0:1,1:5}, {0:1,1:10}, {0:1,1:25}]
param1['classifier'] = [clf1]

param2 = {}
param2['classifier__C'] = [10**-2, 10**-1, 10**0, 10**1, 10**2]
param2['classifier__class_weight'] = [None, {0:1,1:5}, {0:1,1:10}, {0:1,1:25}]
param2['classifier'] = [clf2]

param3 = {}
param3['classifier__C'] = [10**-2, 10**-1, 10**0, 10**1, 10**2]
param3['classifier__solver'] = ['liblinear','saga']
param3['classifier__l1_ratio'] = [0,.25,.5,.75,1]
param3['classifier__penalty'] = ['l1', 'l2', 'elasticnet']
param3['classifier__class_weight'] = [None, {0:1,1:5}, {0:1,1:10}, {0:1,1:25}]
param3['classifier'] = [clf3]

param4 = {}
param4['classifier__max_depth'] = [5,10,25,None]
param4['classifier__min_samples_split'] = [2,5,10]
param4['classifier__class_weight'] = [None, {0:1,1:5}, {0:1,1:10}, {0:1,1:25}]
param4['classifier'] = [clf4]

param5 = {}
param5['classifier__n_neighbors'] = [2,5,10,25,50]
param5['classifier'] = [clf5]

param6 = {}
param6['classifier'] = [clf6]

param7 = {}
param7['classifier__n_estimators'] = [10, 50, 100, 250]
param7['classifier__max_depth'] = [5, 10, 20]
param7['classifier'] = [clf7]

param8 = {}
param8['classifier__booster'] = ['gbtree', 'dart']
param8['classifier__validate_parameters'] = [False]
param8['classifier__max_depth'] = [2,3,5,7,10]
param8['classifier__eta'] = [.01, .05, .1, .2, .3, .5, .9]
param8['classifier__objective'] = ['binary:hinge']
param8['classifier__eval_metric'] = ['auc', 'error']
param8['classifier__tree_method'] = ['exact']
param8['classifier'] = [clf8]
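The LogisticRegression grid crosses every solver with every penalty, including `liblinear` with `elasticnet`, which scikit-learn rejects. One possible way to avoid such combinations is to split the grid into a list of dictionaries. The sketch below only counts combinations with `ParameterGrid`; it does not fit anything, the `'classifier'` key is left out for brevity, and `full_grid` / `split_grid` are hypothetical names. It assumes a scikit-learn version in which `l1_ratio` is only used with the `elasticnet` penalty.

```python
from sklearn.model_selection import ParameterGrid

C_values = [10**-2, 10**-1, 10**0, 10**1, 10**2]
class_weights = [None, {0: 1, 1: 5}, {0: 1, 1: 10}, {0: 1, 1: 25}]

# Single-dict grid as in param3: all solvers x all penalties x all l1_ratio values
full_grid = {
    'classifier__C': C_values,
    'classifier__solver': ['liblinear', 'saga'],
    'classifier__l1_ratio': [0, .25, .5, .75, 1],
    'classifier__penalty': ['l1', 'l2', 'elasticnet'],
    'classifier__class_weight': class_weights,
}

# Hypothetical split grid: l1_ratio is varied only where it is actually used
split_grid = [
    {'classifier__solver': ['liblinear', 'saga'],
     'classifier__penalty': ['l1', 'l2'],
     'classifier__C': C_values,
     'classifier__class_weight': class_weights},
    {'classifier__solver': ['saga'],
     'classifier__penalty': ['elasticnet'],
     'classifier__l1_ratio': [0, .25, .5, .75, 1],
     'classifier__C': C_values,
     'classifier__class_weight': class_weights},
]

n_full = len(ParameterGrid(full_grid))
n_split = len(ParameterGrid(split_grid))
invalid = [p for p in ParameterGrid(full_grid)
           if p['classifier__solver'] == 'liblinear'
           and p['classifier__penalty'] == 'elasticnet']

print(n_full, len(invalid), n_split)  # 600 100 180
```

With `cv=5`, the 100 invalid combinations correspond to 500 failed fits, and the split grid also skips the redundant `l1_ratio` values for the `l1` and `l2` penalties.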

### Defining pipeline to tune all the models at once

In [10]:
pipeline = Pipeline([('classifier', clf1)])
params = [param1, param2, param3, param4, param5, param6, param7, param8]

In [11]:
full_cv_classifier = GridSearchCV(pipeline, params, cv=5, n_jobs=-1, scoring='roc_auc')
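Each dictionary in `params` contains a `'classifier'` entry, which replaces the `classifier` step of the pipeline, while the `classifier__*` entries tune that estimator. A reduced sketch of the same idea with two models on synthetic data (the data from `make_classification` and every `_demo` name are assumptions made for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

# Small synthetic binary problem standing in for the sonar data
X_demo, y_demo = make_classification(n_samples=120, n_features=10, random_state=0)

# The initial estimator is only a placeholder; the grid swaps it out
pipe_demo = Pipeline([('classifier', DecisionTreeClassifier(random_state=0))])
grid_demo = [
    {'classifier': [DecisionTreeClassifier(random_state=0)],
     'classifier__max_depth': [3, 5]},
    {'classifier': [KNeighborsClassifier()],
     'classifier__n_neighbors': [3, 7]},
]

search_demo = GridSearchCV(pipe_demo, grid_demo, cv=3, scoring='roc_auc')
search_demo.fit(X_demo, y_demo)

# One candidate per parameter combination, across both model families
print(len(search_demo.cv_results_['params']))
print(type(search_demo.best_params_['classifier']).__name__)
```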

## Model training
We'll train the models first on the original feature values and then on the scaled ones.

This will highlight which models are affected by unscaled features.
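An alternative to running two separate searches, shown here only as a sketch, is to make the scaler a pipeline step and toggle it with `'passthrough'` in the grid. The scaler is then refit on the training part of every cross-validation fold. This is not what the notebook does; the synthetic data and the `_syn` names are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X_syn, y_syn = make_classification(n_samples=150, n_features=8, random_state=1)
X_syn[:, 0] *= 1000  # inflate one feature so that scaling can matter

pipe_syn = Pipeline([
    ('scaler', StandardScaler()),
    ('classifier', KNeighborsClassifier()),
])
grid_syn = {
    'scaler': [StandardScaler(), 'passthrough'],  # scaled vs. unscaled features
    'classifier__n_neighbors': [3, 5, 11],
}

search_syn = GridSearchCV(pipe_syn, grid_syn, cv=5, scoring='roc_auc')
search_syn.fit(X_syn, y_syn)

print(search_syn.best_params_)
```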

### Training on unscaled features

In [12]:
full_cv_classifier.fit(X_train, y_train)

500 fits failed out of a total of 4310.
The score on these train-test partitions for these parameters will be set to nan.
If these failures are not expected, you can try to debug them by setting error_score='raise'.

Below are more details about the failures:
--------------------------------------------------------------------------------
500 fits failed with the following error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/sklearn/model_selection/_validation.py", line 680, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "/usr/local/lib/python3.7/dist-packages/sklearn/pipeline.py", line 394, in fit
    self._final_estimator.fit(Xt, y, **fit_params_last_step)
  File "/usr/local/lib/python3.7/dist-packages/sklearn/linear_model/_logistic.py", line 1461, in fit
    solver = _check_solver(self.solver, self.penalty, self.dual)
  File "/usr/local/lib/python3.7/dist-packages/sklearn/linear_model/_logistic.py", line 459, in _check_

GridSearchCV(cv=5,
             estimator=Pipeline(steps=[('classifier',
                                        RandomForestClassifier(max_depth=10,
                                                               n_estimators=50,
                                                               random_state=42))]),
             n_jobs=-1,
             param_grid=[{'classifier': [RandomForestClassifier(max_depth=10,
                                                                n_estimators=50,
                                                                random_state=42)],
                          'classifier__class_weight': [None, {0: 1, 1: 5},
                                                       {0: 1, 1: 10},
                                                       {0: 1, 1: 25}],
                          'classifier__max_depth': [5, 10, 20],
                          'classifier__...
                         {'classifier': [XGBClassifier(random_state=42)],
                       

In [13]:
full_cv_classifier.best_params_

{'classifier': RandomForestClassifier(max_depth=10, n_estimators=50, random_state=42),
 'classifier__class_weight': None,
 'classifier__max_depth': 10,
 'classifier__n_estimators': 50}

### Models' scores comparison

In [20]:
results = pd.DataFrame(full_cv_classifier.cv_results_)
# Drop the failed fits (LogisticRegression with solver='liblinear' and penalty='elasticnet'), whose scores are NaN
results = results.dropna(axis=0, subset=['mean_test_score'])
results.head()

Unnamed: 0,mean_fit_time,std_fit_time,mean_score_time,std_score_time,param_classifier,param_classifier__class_weight,param_classifier__max_depth,param_classifier__n_estimators,param_classifier__C,param_classifier__l1_ratio,...,param_classifier__validate_parameters,params,split0_test_score,split1_test_score,split2_test_score,split3_test_score,split4_test_score,mean_test_score,std_test_score,rank_test_score
0,0.028934,0.001044,0.008195,0.002653,"RandomForestClassifier(max_depth=10, n_estimat...",,5,10,,,...,,{'classifier': RandomForestClassifier(max_dept...,0.880208,0.768382,0.933333,0.946296,0.881481,0.88194,0.06274,10
1,0.125194,0.002355,0.014839,0.000682,"RandomForestClassifier(max_depth=10, n_estimat...",,5,50,,,...,,{'classifier': RandomForestClassifier(max_dept...,0.888889,0.845588,0.944444,0.933333,0.951852,0.912821,0.040093,3
2,0.254284,0.01313,0.023098,0.002273,"RandomForestClassifier(max_depth=10, n_estimat...",,5,100,,,...,,{'classifier': RandomForestClassifier(max_dept...,0.871528,0.841912,0.914815,0.944444,0.940741,0.902688,0.040008,6
3,0.030335,0.002231,0.006904,0.000266,"RandomForestClassifier(max_depth=10, n_estimat...",,10,10,,,...,,{'classifier': RandomForestClassifier(max_dept...,0.876736,0.766544,0.9,0.935185,0.896296,0.874952,0.057387,16
4,0.129633,0.008339,0.015339,0.003425,"RandomForestClassifier(max_depth=10, n_estimat...",,10,50,,,...,,{'classifier': RandomForestClassifier(max_dept...,0.887153,0.858456,0.937037,0.935185,0.955556,0.914677,0.036083,1


In [21]:
results.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 762 entries, 0 to 861
Data columns (total 29 columns):
 #   Column                                 Non-Null Count  Dtype  
---  ------                                 --------------  -----  
 0   mean_fit_time                          762 non-null    float64
 1   std_fit_time                           762 non-null    float64
 2   mean_score_time                        762 non-null    float64
 3   std_score_time                         762 non-null    float64
 4   param_classifier                       762 non-null    object 
 5   param_classifier__class_weight         453 non-null    object 
 6   param_classifier__max_depth            224 non-null    object 
 7   param_classifier__n_estimators         48 non-null     object 
 8   param_classifier__C                    520 non-null    object 
 9   param_classifier__l1_ratio             500 non-null    object 
 10  param_classifier__penalty              500 non-null    object 
 11  param_

Strip the constructor arguments from the classifier names so that we can group by model family.

In [24]:
import re
results['param_classifier'] = results['param_classifier'].apply(lambda my_str: re.sub(r"\(.*?\)", "", str(my_str)))
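If the `param_classifier` column still holds the estimator objects rather than strings, the family name can also be read from the object's type, which avoids parsing the repr. A hedged sketch on a toy frame (`toy` and the `family` column are illustrative names):

```python
import pandas as pd
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# Toy stand-in for the 'param_classifier' column of cv_results_
toy = pd.DataFrame({
    'param_classifier': [SVC(probability=True, random_state=42), GaussianNB()]
})

# Class name taken from the object itself instead of a regex on its string form
toy['family'] = toy['param_classifier'].apply(lambda est: type(est).__name__)
print(toy['family'].tolist())  # ['SVC', 'GaussianNB']
```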

In [25]:
indexes = results[['param_classifier','mean_test_score']].groupby('param_classifier')['mean_test_score'].idxmax()

In [26]:
results.loc[indexes][['param_classifier','params','mean_test_score']].sort_values('mean_test_score', ascending=False)

Unnamed: 0,param_classifier,params,mean_test_score
4,RandomForestClassifier,{'classifier': RandomForestClassifier(max_dept...,0.914677
48,SVC,"{'classifier': SVC(probability=True, random_st...",0.886634
373,LogisticRegression,{'classifier': LogisticRegression(random_state...,0.857821
713,GradientBoostingClassifier,{'classifier': GradientBoostingClassifier(rand...,0.853115
704,KNeighborsClassifier,"{'classifier': KNeighborsClassifier(), 'classi...",0.816882
709,GaussianNB,{'classifier': GaussianNB()},0.797963
722,XGBClassifier,"{'classifier': XGBClassifier(random_state=42),...",0.789755
658,DecisionTreeClassifier,{'classifier': DecisionTreeClassifier(random_s...,0.740467
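The best row per family is obtained by grouping on the classifier name and taking the index of the maximum score with `idxmax`. The mechanism on a toy frame, with made-up scores and index labels that are not results from this notebook:

```python
import pandas as pd

# Made-up scores, for illustration only
toy_results = pd.DataFrame(
    {
        'param_classifier': ['SVC', 'SVC', 'GaussianNB',
                             'KNeighborsClassifier', 'KNeighborsClassifier'],
        'mean_test_score': [0.80, 0.85, 0.70, 0.75, 0.72],
    },
    index=[10, 11, 20, 30, 31],
)

# idxmax returns, for each family, the index label of its best-scoring row
best_idx = toy_results.groupby('param_classifier')['mean_test_score'].idxmax()
best = toy_results.loc[best_idx].sort_values('mean_test_score', ascending=False)
print(best)
```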


### Training on scaled features

In [23]:
full_cv_classifier.fit(scaled_X_train, y_train)

500 fits failed out of a total of 4310.
The score on these train-test partitions for these parameters will be set to nan.
If these failures are not expected, you can try to debug them by setting error_score='raise'.

Below are more details about the failures:
--------------------------------------------------------------------------------
500 fits failed with the following error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/sklearn/model_selection/_validation.py", line 680, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "/usr/local/lib/python3.7/dist-packages/sklearn/pipeline.py", line 394, in fit
    self._final_estimator.fit(Xt, y, **fit_params_last_step)
  File "/usr/local/lib/python3.7/dist-packages/sklearn/linear_model/_logistic.py", line 1461, in fit
    solver = _check_solver(self.solver, self.penalty, self.dual)
  File "/usr/local/lib/python3.7/dist-packages/sklearn/linear_model/_logistic.py", line 459, in _check_

GridSearchCV(cv=5,
             estimator=Pipeline(steps=[('classifier',
                                        RandomForestClassifier(max_depth=10,
                                                               n_estimators=50,
                                                               random_state=42))]),
             n_jobs=-1,
             param_grid=[{'classifier': [RandomForestClassifier(max_depth=10,
                                                                n_estimators=50,
                                                                random_state=42)],
                          'classifier__class_weight': [None, {0: 1, 1: 5},
                                                       {0: 1, 1: 10},
                                                       {0: 1, 1: 25}],
                          'classifier__max_depth': [5, 10, 20],
                          'classifier__...
                         {'classifier': [XGBClassifier(random_state=42)],
                       

In [27]:
results_scaled = pd.DataFrame(full_cv_classifier.cv_results_)
results_scaled = results_scaled.dropna(axis=0, subset=['mean_test_score'])
results_scaled['param_classifier'] = results_scaled['param_classifier'].apply(lambda my_str: re.sub(r"\(.*?\)", "", str(my_str)))
indexes_scaled = results_scaled[['param_classifier','mean_test_score']].groupby('param_classifier')['mean_test_score'].idxmax()
results_scaled.loc[indexes_scaled][['param_classifier','params','mean_test_score']].sort_values('mean_test_score', ascending=False)

Unnamed: 0,param_classifier,params,mean_test_score
4,RandomForestClassifier,{'classifier': RandomForestClassifier(max_dept...,0.913983
48,SVC,"{'classifier': SVC(probability=True, random_st...",0.909551
705,KNeighborsClassifier,"{'classifier': KNeighborsClassifier(), 'classi...",0.876702
713,GradientBoostingClassifier,{'classifier': GradientBoostingClassifier(rand...,0.853115
58,LogisticRegression,{'classifier': LogisticRegression(random_state...,0.843554
709,GaussianNB,{'classifier': GaussianNB()},0.797963
722,XGBClassifier,"{'classifier': XGBClassifier(random_state=42),...",0.789755
658,DecisionTreeClassifier,{'classifier': DecisionTreeClassifier(random_s...,0.740467


### Comparing scores with and without scaling the features
KNeighbors reaches a much higher score when the features are scaled beforehand: its best mean ROC AUC rises from 0.817 to 0.877, a gain of about 0.06. This is expected, since KNeighbors classifies by distance to the nearest neighbours, so features with a larger spread dominate the distance unless they are scaled. SVC also improves (0.887 to 0.910).

The LogisticRegression score decreases slightly with the scaled features (0.858 to 0.844), and the best hyper-parameter combination changes completely (C, L1 ratio and consequently also the penalty and the solver).

The tree-based models and GaussianNB are essentially unaffected, and the best model remains the same: RandomForest.
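The sensitivity of distance-based models to feature spread can be illustrated with two features whose standard deviations are those of freq_1 and freq_10 in the `describe()` output. The query point and the two candidate neighbours below are invented for the example:

```python
import numpy as np

# Standard deviations of freq_1 and freq_10, taken from the describe() table
std = np.array([0.022991, 0.134416])

# Invented points: candidate 0 differs from the query only in freq_1,
# candidate 1 only in freq_10
query = np.array([0.03, 0.20])
candidates = np.array([
    [0.09, 0.20],
    [0.03, 0.30],
])

# Euclidean distances on the raw values
raw_dist = np.linalg.norm(candidates - query, axis=1)

# Distances after standardization (the mean cancels out in the difference)
scaled_dist = np.linalg.norm((candidates - query) / std, axis=1)

raw_nearest = int(np.argmin(raw_dist))
scaled_nearest = int(np.argmin(scaled_dist))
print(raw_nearest, scaled_nearest)  # 0 1
```

On the raw values candidate 0 is closer, although it lies more than two standard deviations away along freq_1. After scaling, candidate 1 becomes the nearest neighbour.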



In [28]:
results[['param_classifier','mean_test_score']].groupby('param_classifier').describe()

Statistics of mean_test_score per classifier (unscaled features):
param_classifier,count,mean,std,min,25%,50%,75%,max
DecisionTreeClassifier,48.0,0.660221,0.045966,0.591483,0.627173,0.652745,0.706853,0.740467
GaussianNB,1.0,0.797963,,0.797963,0.797963,0.797963,0.797963,0.797963
GradientBoostingClassifier,12.0,0.794607,0.036767,0.742239,0.777118,0.786383,0.817499,0.853115
KNeighborsClassifier,5.0,0.724721,0.066244,0.664295,0.672567,0.699719,0.770142,0.816882
LogisticRegression,500.0,0.733558,0.117216,0.5,0.680752,0.784989,0.819831,0.857821
RandomForestClassifier,36.0,0.844289,0.049063,0.722606,0.807908,0.84875,0.879331,0.914677
SVC,20.0,0.853369,0.041546,0.735703,0.854999,0.865504,0.870871,0.886634
XGBClassifier,140.0,0.752028,0.025908,0.717712,0.729518,0.757435,0.766275,0.789755


In [29]:
results_scaled[['param_classifier','mean_test_score']].groupby('param_classifier').describe()

Statistics of mean_test_score per classifier (scaled features):
param_classifier,count,mean,std,min,25%,50%,75%,max
DecisionTreeClassifier,48.0,0.660221,0.045966,0.591483,0.627173,0.652745,0.706853,0.740467
GaussianNB,1.0,0.797963,,0.797963,0.797963,0.797963,0.797963,0.797963
GradientBoostingClassifier,12.0,0.794607,0.036767,0.742239,0.777118,0.786383,0.817499,0.853115
KNeighborsClassifier,5.0,0.814354,0.058222,0.728611,0.789107,0.82287,0.85448,0.876702
LogisticRegression,500.0,0.768411,0.089011,0.5,0.768785,0.791068,0.808469,0.843554
RandomForestClassifier,36.0,0.845108,0.048959,0.722606,0.809037,0.850231,0.879561,0.913983
SVC,20.0,0.888124,0.020536,0.847015,0.875842,0.882209,0.909551,0.909551
XGBClassifier,140.0,0.752028,0.025908,0.717712,0.729518,0.757435,0.766275,0.789755


## Inspecting hyper-parameters to avoid overfitting

For our final model, we'll choose between the two best families: RandomForest and SVC.

The RandomForest scores show no significant difference with or without feature scaling.

SVC, however, scores higher on the scaled features, so we'll continue with the scaled results.

In [30]:
svc = results_scaled[['param_classifier','param_classifier__C','param_classifier__class_weight','mean_test_score']][results_scaled['param_classifier'] == 'SVC']
rf = results_scaled[['param_classifier','param_classifier__n_estimators','param_classifier__max_depth','param_classifier__class_weight','mean_test_score']][results_scaled['param_classifier'] == 'RandomForestClassifier']

In [31]:
rf.sort_values('mean_test_score', ascending=False).nlargest(5, columns='mean_test_score')

Unnamed: 0,param_classifier,param_classifier__n_estimators,param_classifier__max_depth,param_classifier__class_weight,mean_test_score
4,RandomForestClassifier,50,10,,0.913983
7,RandomForestClassifier,50,20,,0.913983
1,RandomForestClassifier,50,5,,0.913557
5,RandomForestClassifier,100,10,,0.911004
8,RandomForestClassifier,100,20,,0.911004


The best RandomForest configurations use n_estimators=50 and no class weight; max_depth=10 and max_depth=20 tie for the top score.

In [32]:
svc.sort_values('mean_test_score', ascending=False).nlargest(15, columns='mean_test_score')

Unnamed: 0,param_classifier,param_classifier__C,param_classifier__class_weight,mean_test_score
55,SVC,100.0,"{0: 1, 1: 25}",0.909551
54,SVC,100.0,"{0: 1, 1: 10}",0.909551
53,SVC,100.0,"{0: 1, 1: 5}",0.909551
52,SVC,100.0,,0.909551
51,SVC,10.0,"{0: 1, 1: 25}",0.909551
50,SVC,10.0,"{0: 1, 1: 10}",0.909551
49,SVC,10.0,"{0: 1, 1: 5}",0.909551
48,SVC,10.0,,0.909551
44,SVC,1.0,,0.888145
45,SVC,1.0,"{0: 1, 1: 5}",0.882579


The best SVC scores are obtained with C=100 or C=10, whatever the class weight. We'll keep C=10 for stronger regularization.

## Final evaluation of top 2 models

In [33]:
rf_model = RandomForestClassifier(n_estimators=50, random_state=42)
rf_model.fit(X_train,y_train)
rf_pred = rf_model.predict(X_test)

In [34]:
print(classification_report(y_test,rf_pred))

              precision    recall  f1-score   support

           0       0.85      0.85      0.85        20
           1       0.86      0.86      0.86        22

    accuracy                           0.86        42
   macro avg       0.86      0.86      0.86        42
weighted avg       0.86      0.86      0.86        42



In [35]:
svc_model = SVC(probability=True, C=10, random_state=42)
svc_model.fit(X_train,y_train)
svc_pred = svc_model.predict(X_test)

In [36]:
svc_model_scaled = SVC(probability=True, C=10, random_state=42)
svc_model_scaled.fit(scaled_X_train,y_train)
svc_pred_scaled = svc_model_scaled.predict(scaled_X_test)

In [37]:
print(classification_report(y_test,svc_pred))

              precision    recall  f1-score   support

           0       0.95      0.90      0.92        20
           1       0.91      0.95      0.93        22

    accuracy                           0.93        42
   macro avg       0.93      0.93      0.93        42
weighted avg       0.93      0.93      0.93        42



In [38]:
print(classification_report(y_test,svc_pred_scaled))

              precision    recall  f1-score   support

           0       1.00      0.85      0.92        20
           1       0.88      1.00      0.94        22

    accuracy                           0.93        42
   macro avg       0.94      0.93      0.93        42
weighted avg       0.94      0.93      0.93        42



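On the held-out test set, the best model is now SVC: it reaches 0.93 accuracy with or without scaling, against 0.86 for RandomForest. With only 42 test samples, this ranking should be read with some caution.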
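The grid search ranked the models by ROC AUC, whereas the reports above show precision, recall and accuracy. To compare on the same metric, the test-set ROC AUC can be computed from the predicted probabilities. A minimal sketch on synthetic data, assuming a scaler and an SVC chained with `make_pipeline` (all variable names here are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic binary problem standing in for the sonar data
X_s, y_s = make_classification(n_samples=200, n_features=20, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_s, y_s, test_size=0.2, random_state=42, stratify=y_s
)

# Scaling and the classifier are fitted together on the training set only
model = make_pipeline(StandardScaler(), SVC(probability=True, C=10, random_state=42))
model.fit(X_tr, y_tr)

# ROC AUC needs a score for the positive class, not hard predictions
proba_pos = model.predict_proba(X_te)[:, 1]
auc = roc_auc_score(y_te, proba_pos)
print(round(auc, 3))
```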