# Data Mining/ML - Classification model implementation

* **2020MT10823 - Mohit Kumar Gond** <br>



#  Statlog

The database is made up of the multi-spectral values of the pixels in 3x3 neighbourhoods in a satellite image as well as the classification linked to each neighborhood's central pixel. With the help of the multi-spectral values, this classification is to be predicted. The class of a pixel is encoded as a number in the sample database.<br>

One frame of Landsat MSS imagery consists of four digital images of the same scene in different spectral bands. Two of these are in the visible region (corresponding approximately to green and red regions of the visible spectrum) and two are in the (near) infra-red. Each pixel is a 8-bit binary word, with 0 corresponding to black and 255 to white. The spatial resolution of a pixel is about 80m x 80m. Each image contains 2340 x 3380 such pixels.<br>

The database is a (tiny) sub-area of a scene, consisting of 82 x 100 pixels. Each line of data corresponds to a 3x3 square neighbourhood of pixels completely contained within the 82x100 sub-area. Each line contains the pixel values in the four spectral bands (converted to ASCII) of each of the 9 pixels in the 3x3 neighbourhood and a number indicating the classification label of the central pixel. 

The number is a code for the following classes:<br>
1. red soil<br>
2. cotton crop<br>
3. grey soil<br>
4. damp grey soil<br>
5. soil with vegetation stubble<br>
6. mixture class (all types present)<br>
7. very damp grey soil<br>



With the help of data mining , we are able to predict which class of soil is present at the location. This has widely used in space exploration research, we can identity the type of material present on extra-terrestrial body.

The data consist of 6435 instance. The data set characteristics is of multivariate type.In each line of data the four spectral values for the top-left pixel are given first followed by the four spectral values for the top-middle pixel and then those for the top-right pixel, and so on with the pixels read out in sequence left-to-right and top-to-bottom.<br>
so there are 9 * (no of pixels) * 4 (no. of spectral values) = 36 attributes for each instance of the data set. The attribute consists of integer data type and the range is 0-255<br>
* There are no missing values in the data set.<br>
* All the attributes are of int type, so there is no need for data transformation.<br>


The data consists of 
| Classtype  |   Percentage of instance |
|------------|--------------------------|
| Class1     |   23.822843822843823     |
| Class2     |   10.924630924630925     |
| Class3     |   21.103341103341103     |
| Class4     |   9.728049728049729      |
| Class5     |   10.986790986790987     |
| Class6     |   0.0                    |
| Class7     |   23.434343434343435     |

The data doesnt contain any example of class '6' :  mixture class (all types present), so we wont be able to predict the soil of class 6.<br>
The data consist of outliers,which is present in the attributes ['Column4', 'Column8', 'Column12', 'Column16', 'Column20', 'Column24', 'Column28', 'Column32', 'Column36']. We need to remove these outliers for better model preformance.<br>
The data is not standardized, which can have negative impact on model performance.<br>

Solution to outliers
Outliners can be identified by using Inter-Quartile Range method(IQR). There were of total of 681 outliers in the dataset which was removed.<br>

Solution to non-standardized
Data standarisation is done to ensure every attribute have a similar effect on decision boundary.Z-score method and range method are some of the method used for standarisation.

# Imports




In [1]:
#Data import 
import pandas as pd
import numpy as np

#Data-preprocessing (removing outliners and standardization)
from sklearn.preprocessing import StandardScaler
from numpy import percentile
from sklearn.model_selection import train_test_split

#Hyper-Parameter Tuning(Grid search cv, Randomized search cv)
from kerastuner.tuners import RandomSearch
from tensorflow import keras
from tensorflow.keras import layers
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import RandomizedSearchCV

#Model Impletation(Decision Tree,Random Forest,KNN classifier, Naive Bayes Classifier, SVM  and ANN)
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.naive_bayes import MultinomialNB
from sklearn.naive_bayes import BernoulliNB

from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier



#K-Fold Validation for Model Training
from sklearn.model_selection import KFold

#Evaluation Metrics 
from sklearn.model_selection import cross_val_score
from sklearn.metrics import accuracy_score
from sklearn.metrics import  classification_report,confusion_matrix
from sklearn import metrics

  from kerastuner.tuners import RandomSearch


# Data extraction and  Pre-Processing

There are various steps involved in data pre-processing. These steps generally depend on the quality of data we have.

One of the key processes in data mining is to remove the outliners present in the data. These outliners have a Negative impact on model performance. Outliners can be identified by using Inter-Quartile Range method(IQR).

Data standarisation/Scaling is also important for good results.Data standarisation is done to ensure every attribute have a similar effect on decision boundary.Z-score method and range method are some of the method used for standarisation.

In [3]:
#Data extraction
statlogdata  = pd.read_csv("statlog.csv")
features = statlogdata.columns
names = []
for x in range(36):
 names.append(features[x])

In [4]:
#implementing IQR to remove outliers from the dataset
for name in names:
 Q1 = np.percentile(statlogdata[name],25,interpolation='midpoint')
 Q3 = np.percentile(statlogdata[name],75,interpolation='midpoint')

 IQR = Q3-Q1

 

 upper = Q3+1.5*IQR
 lower = Q1-1.5*IQR
 statlogdata = statlogdata[(statlogdata[name]>=lower)&(statlogdata[name]<=upper)]

Users of the modes 'nearest', 'lower', 'higher', or 'midpoint' are encouraged to review the method they used. (Deprecated NumPy 1.22)
  Q1 = np.percentile(statlogdata[name],25,interpolation='midpoint')
Users of the modes 'nearest', 'lower', 'higher', or 'midpoint' are encouraged to review the method they used. (Deprecated NumPy 1.22)
  Q3 = np.percentile(statlogdata[name],75,interpolation='midpoint')


In [5]:
#implementing standardScalar for standardization
X = statlogdata.drop('Class',axis=1)
y = statlogdata['Class']
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.3)
scal=StandardScaler()
scal.fit(X_train)
X_train=scal.transform(X_train)
X_test=scal.transform(X_test)

# 1) Decision Tree



In [8]:
dtc=DecisionTreeClassifier()

#### Hyper-parameter tuning 

In [9]:
max_features = ["auto", "sqrt", "log2"]
ccp_alpha = [0.1, .01, .001]
max_depth = [5, 6, 7, 8, 9]
criterion = ['gini', 'entropy']

param_grid = {'max_features': max_features,
              'ccp_alpha': ccp_alpha,
              'max_depth' : max_depth,
              'criterion' : criterion
             }

param_grid

{'max_features': ['auto', 'sqrt', 'log2'],
 'ccp_alpha': [0.1, 0.01, 0.001],
 'max_depth': [5, 6, 7, 8, 9],
 'criterion': ['gini', 'entropy']}

{'max_features': ['auto', 'sqrt', 'log2'],
 'ccp_alpha': [0.1, 0.01, 0.001],
 'max_depth': [5, 6, 7, 8, 9],
 'criterion': ['gini', 'entropy']}
 This is the best parameters

In [10]:
dtc_grid = GridSearchCV(estimator = dtc, param_grid = param_grid, cv = 5, verbose = 2)
dtc_grid.fit(X_train, y_train)
best_param = dtc_grid.best_params_
print("Best Parameters for Decision Tree after Grid Search CV: ")
best_param

Fitting 5 folds for each of 90 candidates, totalling 450 fits
[CV] END ccp_alpha=0.1, criterion=gini, max_depth=5, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=gini, max_depth=5, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=gini, max_depth=5, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=gini, max_depth=5, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=gini, max_depth=5, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=gini, max_depth=5, max_features=sqrt; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=gini, max_depth=5, max_features=sqrt; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=gini, max_depth=5, max_features=sqrt; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=gini, max_depth=5, max_features=sqrt; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=gini, max_depth=5, max_features=sqrt; total time=   0.0s
[CV] END ccp_alpha=0.1, crit



[CV] END ccp_alpha=0.1, criterion=gini, max_depth=6, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=gini, max_depth=6, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=gini, max_depth=6, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=gini, max_depth=6, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=gini, max_depth=6, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=gini, max_depth=6, max_features=sqrt; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=gini, max_depth=6, max_features=sqrt; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=gini, max_depth=6, max_features=sqrt; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=gini, max_depth=6, max_features=sqrt; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=gini, max_depth=6, max_features=sqrt; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=gini, max_depth=6, max_features=log2; total time=   0.0s



[CV] END ccp_alpha=0.1, criterion=gini, max_depth=7, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=gini, max_depth=7, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=gini, max_depth=7, max_features=sqrt; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=gini, max_depth=7, max_features=sqrt; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=gini, max_depth=7, max_features=sqrt; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=gini, max_depth=7, max_features=sqrt; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=gini, max_depth=7, max_features=sqrt; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=gini, max_depth=7, max_features=log2; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=gini, max_depth=7, max_features=log2; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=gini, max_depth=7, max_features=log2; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=gini, max_depth=7, max_features=log2; total time=   0.0s



[CV] END ccp_alpha=0.1, criterion=gini, max_depth=8, max_features=sqrt; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=gini, max_depth=8, max_features=sqrt; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=gini, max_depth=8, max_features=sqrt; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=gini, max_depth=8, max_features=sqrt; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=gini, max_depth=8, max_features=log2; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=gini, max_depth=8, max_features=log2; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=gini, max_depth=8, max_features=log2; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=gini, max_depth=8, max_features=log2; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=gini, max_depth=8, max_features=log2; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=gini, max_depth=9, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=gini, max_depth=9, max_features=auto; total time=   0.0s



[CV] END ccp_alpha=0.1, criterion=gini, max_depth=9, max_features=sqrt; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=gini, max_depth=9, max_features=sqrt; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=gini, max_depth=9, max_features=sqrt; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=gini, max_depth=9, max_features=log2; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=gini, max_depth=9, max_features=log2; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=gini, max_depth=9, max_features=log2; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=gini, max_depth=9, max_features=log2; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=gini, max_depth=9, max_features=log2; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=entropy, max_depth=5, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=entropy, max_depth=5, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=entropy, max_depth=5, max_features=auto; total tim



[CV] END ccp_alpha=0.1, criterion=entropy, max_depth=5, max_features=sqrt; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=entropy, max_depth=5, max_features=log2; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=entropy, max_depth=5, max_features=log2; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=entropy, max_depth=5, max_features=log2; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=entropy, max_depth=5, max_features=log2; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=entropy, max_depth=5, max_features=log2; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=entropy, max_depth=6, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=entropy, max_depth=6, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=entropy, max_depth=6, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=entropy, max_depth=6, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=entropy, max_depth=6, max_



[CV] END ccp_alpha=0.1, criterion=entropy, max_depth=6, max_features=log2; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=entropy, max_depth=6, max_features=log2; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=entropy, max_depth=7, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=entropy, max_depth=7, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=entropy, max_depth=7, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=entropy, max_depth=7, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=entropy, max_depth=7, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=entropy, max_depth=7, max_features=sqrt; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=entropy, max_depth=7, max_features=sqrt; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=entropy, max_depth=7, max_features=sqrt; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=entropy, max_depth=7, max_



[CV] END ccp_alpha=0.1, criterion=entropy, max_depth=8, max_features=log2; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=entropy, max_depth=8, max_features=log2; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=entropy, max_depth=9, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=entropy, max_depth=9, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=entropy, max_depth=9, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=entropy, max_depth=9, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=entropy, max_depth=9, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=entropy, max_depth=9, max_features=sqrt; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=entropy, max_depth=9, max_features=sqrt; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=entropy, max_depth=9, max_features=sqrt; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=entropy, max_depth=9, max_



[CV] END ccp_alpha=0.1, criterion=entropy, max_depth=9, max_features=log2; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=entropy, max_depth=9, max_features=log2; total time=   0.0s
[CV] END ccp_alpha=0.1, criterion=entropy, max_depth=9, max_features=log2; total time=   0.0s
[CV] END ccp_alpha=0.01, criterion=gini, max_depth=5, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.01, criterion=gini, max_depth=5, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.01, criterion=gini, max_depth=5, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.01, criterion=gini, max_depth=5, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.01, criterion=gini, max_depth=5, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.01, criterion=gini, max_depth=5, max_features=sqrt; total time=   0.0s
[CV] END ccp_alpha=0.01, criterion=gini, max_depth=5, max_features=sqrt; total time=   0.0s
[CV] END ccp_alpha=0.01, criterion=gini, max_depth=5, max_features=sqrt; t



[CV] END ccp_alpha=0.01, criterion=gini, max_depth=7, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.01, criterion=gini, max_depth=7, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.01, criterion=gini, max_depth=7, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.01, criterion=gini, max_depth=7, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.01, criterion=gini, max_depth=7, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.01, criterion=gini, max_depth=7, max_features=sqrt; total time=   0.0s
[CV] END ccp_alpha=0.01, criterion=gini, max_depth=7, max_features=sqrt; total time=   0.0s
[CV] END ccp_alpha=0.01, criterion=gini, max_depth=7, max_features=sqrt; total time=   0.0s
[CV] END ccp_alpha=0.01, criterion=gini, max_depth=7, max_features=sqrt; total time=   0.0s
[CV] END ccp_alpha=0.01, criterion=gini, max_depth=7, max_features=sqrt; total time=   0.0s
[CV] END ccp_alpha=0.01, criterion=gini, max_depth=7, max_features=log2; total t



[CV] END ccp_alpha=0.01, criterion=gini, max_depth=8, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.01, criterion=gini, max_depth=8, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.01, criterion=gini, max_depth=8, max_features=sqrt; total time=   0.0s
[CV] END ccp_alpha=0.01, criterion=gini, max_depth=8, max_features=sqrt; total time=   0.0s
[CV] END ccp_alpha=0.01, criterion=gini, max_depth=8, max_features=sqrt; total time=   0.0s
[CV] END ccp_alpha=0.01, criterion=gini, max_depth=8, max_features=sqrt; total time=   0.0s
[CV] END ccp_alpha=0.01, criterion=gini, max_depth=8, max_features=sqrt; total time=   0.0s
[CV] END ccp_alpha=0.01, criterion=gini, max_depth=8, max_features=log2; total time=   0.0s
[CV] END ccp_alpha=0.01, criterion=gini, max_depth=8, max_features=log2; total time=   0.0s
[CV] END ccp_alpha=0.01, criterion=gini, max_depth=8, max_features=log2; total time=   0.0s
[CV] END ccp_alpha=0.01, criterion=gini, max_depth=8, max_features=log2; total t



[CV] END ccp_alpha=0.01, criterion=gini, max_depth=9, max_features=sqrt; total time=   0.0s
[CV] END ccp_alpha=0.01, criterion=gini, max_depth=9, max_features=sqrt; total time=   0.0s
[CV] END ccp_alpha=0.01, criterion=gini, max_depth=9, max_features=sqrt; total time=   0.0s
[CV] END ccp_alpha=0.01, criterion=gini, max_depth=9, max_features=sqrt; total time=   0.0s
[CV] END ccp_alpha=0.01, criterion=gini, max_depth=9, max_features=log2; total time=   0.0s
[CV] END ccp_alpha=0.01, criterion=gini, max_depth=9, max_features=log2; total time=   0.0s
[CV] END ccp_alpha=0.01, criterion=gini, max_depth=9, max_features=log2; total time=   0.0s
[CV] END ccp_alpha=0.01, criterion=gini, max_depth=9, max_features=log2; total time=   0.0s
[CV] END ccp_alpha=0.01, criterion=gini, max_depth=9, max_features=log2; total time=   0.0s
[CV] END ccp_alpha=0.01, criterion=entropy, max_depth=5, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.01, criterion=entropy, max_depth=5, max_features=auto; t



[CV] END ccp_alpha=0.01, criterion=entropy, max_depth=5, max_features=sqrt; total time=   0.0s
[CV] END ccp_alpha=0.01, criterion=entropy, max_depth=5, max_features=log2; total time=   0.0s
[CV] END ccp_alpha=0.01, criterion=entropy, max_depth=5, max_features=log2; total time=   0.0s
[CV] END ccp_alpha=0.01, criterion=entropy, max_depth=5, max_features=log2; total time=   0.0s
[CV] END ccp_alpha=0.01, criterion=entropy, max_depth=5, max_features=log2; total time=   0.0s
[CV] END ccp_alpha=0.01, criterion=entropy, max_depth=5, max_features=log2; total time=   0.0s
[CV] END ccp_alpha=0.01, criterion=entropy, max_depth=6, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.01, criterion=entropy, max_depth=6, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.01, criterion=entropy, max_depth=6, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.01, criterion=entropy, max_depth=6, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.01, criterion=entropy, max_de



[CV] END ccp_alpha=0.01, criterion=entropy, max_depth=7, max_features=log2; total time=   0.0s
[CV] END ccp_alpha=0.01, criterion=entropy, max_depth=8, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.01, criterion=entropy, max_depth=8, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.01, criterion=entropy, max_depth=8, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.01, criterion=entropy, max_depth=8, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.01, criterion=entropy, max_depth=8, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.01, criterion=entropy, max_depth=8, max_features=sqrt; total time=   0.0s
[CV] END ccp_alpha=0.01, criterion=entropy, max_depth=8, max_features=sqrt; total time=   0.0s
[CV] END ccp_alpha=0.01, criterion=entropy, max_depth=8, max_features=sqrt; total time=   0.0s
[CV] END ccp_alpha=0.01, criterion=entropy, max_depth=8, max_features=sqrt; total time=   0.0s
[CV] END ccp_alpha=0.01, criterion=entropy, max_de



[CV] END ccp_alpha=0.01, criterion=entropy, max_depth=8, max_features=log2; total time=   0.0s
[CV] END ccp_alpha=0.01, criterion=entropy, max_depth=8, max_features=log2; total time=   0.0s
[CV] END ccp_alpha=0.01, criterion=entropy, max_depth=8, max_features=log2; total time=   0.0s
[CV] END ccp_alpha=0.01, criterion=entropy, max_depth=8, max_features=log2; total time=   0.0s
[CV] END ccp_alpha=0.01, criterion=entropy, max_depth=9, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.01, criterion=entropy, max_depth=9, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.01, criterion=entropy, max_depth=9, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.01, criterion=entropy, max_depth=9, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.01, criterion=entropy, max_depth=9, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.01, criterion=entropy, max_depth=9, max_features=sqrt; total time=   0.0s
[CV] END ccp_alpha=0.01, criterion=entropy, max_de



[CV] END ccp_alpha=0.01, criterion=entropy, max_depth=9, max_features=sqrt; total time=   0.0s
[CV] END ccp_alpha=0.01, criterion=entropy, max_depth=9, max_features=log2; total time=   0.0s
[CV] END ccp_alpha=0.01, criterion=entropy, max_depth=9, max_features=log2; total time=   0.0s
[CV] END ccp_alpha=0.01, criterion=entropy, max_depth=9, max_features=log2; total time=   0.0s
[CV] END ccp_alpha=0.01, criterion=entropy, max_depth=9, max_features=log2; total time=   0.0s
[CV] END ccp_alpha=0.01, criterion=entropy, max_depth=9, max_features=log2; total time=   0.0s
[CV] END ccp_alpha=0.001, criterion=gini, max_depth=5, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.001, criterion=gini, max_depth=5, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.001, criterion=gini, max_depth=5, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.001, criterion=gini, max_depth=5, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.001, criterion=gini, max_depth=5, max



[CV] END ccp_alpha=0.001, criterion=gini, max_depth=5, max_features=log2; total time=   0.0s
[CV] END ccp_alpha=0.001, criterion=gini, max_depth=6, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.001, criterion=gini, max_depth=6, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.001, criterion=gini, max_depth=6, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.001, criterion=gini, max_depth=6, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.001, criterion=gini, max_depth=6, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.001, criterion=gini, max_depth=6, max_features=sqrt; total time=   0.0s
[CV] END ccp_alpha=0.001, criterion=gini, max_depth=6, max_features=sqrt; total time=   0.0s
[CV] END ccp_alpha=0.001, criterion=gini, max_depth=6, max_features=sqrt; total time=   0.0s
[CV] END ccp_alpha=0.001, criterion=gini, max_depth=6, max_features=sqrt; total time=   0.0s
[CV] END ccp_alpha=0.001, criterion=gini, max_depth=6, max_features=sq



[CV] END ccp_alpha=0.001, criterion=gini, max_depth=7, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.001, criterion=gini, max_depth=7, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.001, criterion=gini, max_depth=7, max_features=sqrt; total time=   0.0s
[CV] END ccp_alpha=0.001, criterion=gini, max_depth=7, max_features=sqrt; total time=   0.0s
[CV] END ccp_alpha=0.001, criterion=gini, max_depth=7, max_features=sqrt; total time=   0.0s
[CV] END ccp_alpha=0.001, criterion=gini, max_depth=7, max_features=sqrt; total time=   0.0s
[CV] END ccp_alpha=0.001, criterion=gini, max_depth=7, max_features=sqrt; total time=   0.0s
[CV] END ccp_alpha=0.001, criterion=gini, max_depth=7, max_features=log2; total time=   0.0s
[CV] END ccp_alpha=0.001, criterion=gini, max_depth=7, max_features=log2; total time=   0.0s
[CV] END ccp_alpha=0.001, criterion=gini, max_depth=7, max_features=log2; total time=   0.0s
[CV] END ccp_alpha=0.001, criterion=gini, max_depth=7, max_features=lo



[CV] END ccp_alpha=0.001, criterion=gini, max_depth=8, max_features=sqrt; total time=   0.0s
[CV] END ccp_alpha=0.001, criterion=gini, max_depth=8, max_features=sqrt; total time=   0.0s
[CV] END ccp_alpha=0.001, criterion=gini, max_depth=8, max_features=sqrt; total time=   0.0s
[CV] END ccp_alpha=0.001, criterion=gini, max_depth=8, max_features=log2; total time=   0.0s
[CV] END ccp_alpha=0.001, criterion=gini, max_depth=8, max_features=log2; total time=   0.0s
[CV] END ccp_alpha=0.001, criterion=gini, max_depth=8, max_features=log2; total time=   0.0s
[CV] END ccp_alpha=0.001, criterion=gini, max_depth=8, max_features=log2; total time=   0.0s
[CV] END ccp_alpha=0.001, criterion=gini, max_depth=8, max_features=log2; total time=   0.0s
[CV] END ccp_alpha=0.001, criterion=gini, max_depth=9, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.001, criterion=gini, max_depth=9, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.001, criterion=gini, max_depth=9, max_features=au



[CV] END ccp_alpha=0.001, criterion=entropy, max_depth=5, max_features=log2; total time=   0.0s
[CV] END ccp_alpha=0.001, criterion=entropy, max_depth=5, max_features=log2; total time=   0.0s
[CV] END ccp_alpha=0.001, criterion=entropy, max_depth=5, max_features=log2; total time=   0.0s
[CV] END ccp_alpha=0.001, criterion=entropy, max_depth=5, max_features=log2; total time=   0.0s
[CV] END ccp_alpha=0.001, criterion=entropy, max_depth=6, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.001, criterion=entropy, max_depth=6, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.001, criterion=entropy, max_depth=6, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.001, criterion=entropy, max_depth=6, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.001, criterion=entropy, max_depth=6, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.001, criterion=entropy, max_depth=6, max_features=sqrt; total time=   0.0s
[CV] END ccp_alpha=0.001, criterion=entr



[CV] END ccp_alpha=0.001, criterion=entropy, max_depth=7, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.001, criterion=entropy, max_depth=7, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.001, criterion=entropy, max_depth=7, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.001, criterion=entropy, max_depth=7, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.001, criterion=entropy, max_depth=7, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.001, criterion=entropy, max_depth=7, max_features=sqrt; total time=   0.0s
[CV] END ccp_alpha=0.001, criterion=entropy, max_depth=7, max_features=sqrt; total time=   0.0s
[CV] END ccp_alpha=0.001, criterion=entropy, max_depth=7, max_features=sqrt; total time=   0.0s
[CV] END ccp_alpha=0.001, criterion=entropy, max_depth=7, max_features=sqrt; total time=   0.0s
[CV] END ccp_alpha=0.001, criterion=entropy, max_depth=7, max_features=sqrt; total time=   0.0s
[CV] END ccp_alpha=0.001, criterion=entr



[CV] END ccp_alpha=0.001, criterion=entropy, max_depth=8, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.001, criterion=entropy, max_depth=8, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.001, criterion=entropy, max_depth=8, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.001, criterion=entropy, max_depth=8, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.001, criterion=entropy, max_depth=8, max_features=auto; total time=   0.0s
[CV] END ccp_alpha=0.001, criterion=entropy, max_depth=8, max_features=sqrt; total time=   0.0s
[CV] END ccp_alpha=0.001, criterion=entropy, max_depth=8, max_features=sqrt; total time=   0.0s
[CV] END ccp_alpha=0.001, criterion=entropy, max_depth=8, max_features=sqrt; total time=   0.0s
[CV] END ccp_alpha=0.001, criterion=entropy, max_depth=8, max_features=sqrt; total time=   0.0s
[CV] END ccp_alpha=0.001, criterion=entropy, max_depth=8, max_features=sqrt; total time=   0.0s
[CV] END ccp_alpha=0.001, criterion=entr



[CV] END ccp_alpha=0.001, criterion=entropy, max_depth=9, max_features=log2; total time=   0.0s
[CV] END ccp_alpha=0.001, criterion=entropy, max_depth=9, max_features=log2; total time=   0.0s
[CV] END ccp_alpha=0.001, criterion=entropy, max_depth=9, max_features=log2; total time=   0.0s
Best Parameters for Decision Tree after Grid Search CV: 


{'ccp_alpha': 0.001,
 'criterion': 'gini',
 'max_depth': 9,
 'max_features': 'log2'}

Best Parameters for Decision Tree after Grid Search CV: 
{'ccp_alpha': 0.001,
 'criterion': 'entropy',
 'max_depth': 9,
 'max_features': 'log2'}

In [11]:
print("Accuracy Score: ")
train_accuracy = dtc_grid.score(X_train, y_train)
test_accuracy = dtc_grid.score(X_test, y_test)
print(f"Train Accuracy : {train_accuracy}")
print(f"Test Accuracy : {test_accuracy}")
print()

print("Classification Report: ")
dtc_best = DecisionTreeClassifier(ccp_alpha=best_param['ccp_alpha'], criterion=best_param['criterion'], max_depth=best_param['max_depth'], max_features=best_param['max_features'])
dtc_best.fit(X_train, y_train)
y_predict = dtc_best.predict(X_test)
print(classification_report(y_test, y_predict))

Accuracy Score: 
Train Accuracy : 0.8649118450459399
Test Accuracy : 0.8257093225246092

Classification Report: 
              precision    recall  f1-score   support

           1       0.94      0.92      0.93       474
           2       1.00      0.62      0.77        24
           3       0.86      0.93      0.89       400
           4       0.62      0.46      0.53       189
           5       0.77      0.80      0.78       187
           7       0.82      0.87      0.84       453

    accuracy                           0.84      1727
   macro avg       0.83      0.77      0.79      1727
weighted avg       0.84      0.84      0.84      1727



Accuracy Score: 
Train Accuracy : 0.9057886557886557
Test Accuracy : 0.8391608391608392

Classification Report: 
              precision    recall  f1-score   support

           1       0.93      0.94      0.93       288
           2       0.95      0.93      0.94       138
           3       0.85      0.89      0.87       297
           4       0.58      0.45      0.50       130
           5       0.84      0.76      0.80       150
           7       0.80      0.89      0.84       284

    accuracy                           0.85      1287
   macro avg       0.83      0.81      0.82      1287
weighted avg       0.84      0.85      0.84      1287


In [12]:
# Random Search CV
max_features = ["auto", "sqrt", "log2"] # Number of features considered at each split
ccp_alpha = [0.1, .01, .001]
max_depth = [5, 6, 7, 8, 9] # Max height of Trees
criterion = ['gini', 'entropy']

param_grid = {'max_features': max_features,
              'ccp_alpha': ccp_alpha,
              'max_depth' : max_depth,
              'criterion' : criterion
             }

param_grid

{'max_features': ['auto', 'sqrt', 'log2'],
 'ccp_alpha': [0.1, 0.01, 0.001],
 'max_depth': [5, 6, 7, 8, 9],
 'criterion': ['gini', 'entropy']}

{'max_features': ['auto', 'sqrt', 'log2'],
 'ccp_alpha': [0.1, 0.01, 0.001],
 'max_depth': [5, 6, 7, 8, 9],
 'criterion': ['gini', 'entropy']}

In [13]:
dtc_random = RandomizedSearchCV(estimator = dtc, param_distributions = param_grid, cv = 5)
dtc_random.fit(X_train, y_train)



RandomizedSearchCV(cv=5, estimator=DecisionTreeClassifier(),
                   param_distributions={'ccp_alpha': [0.1, 0.01, 0.001],
                                        'criterion': ['gini', 'entropy'],
                                        'max_depth': [5, 6, 7, 8, 9],
                                        'max_features': ['auto', 'sqrt',
                                                         'log2']})

In [14]:
best_param = dtc_random.best_params_
print("Best Parameters for Decision Tree after Random Search CV: ")
best_param


Best Parameters for Decision Tree after Random Search CV: 


{'max_features': 'log2',
 'max_depth': 9,
 'criterion': 'gini',
 'ccp_alpha': 0.001}

Best Parameters for Decision Tree after Random Search CV: 
{'max_features': 'log2',
 'max_depth': 8,
 'criterion': 'gini',
 'ccp_alpha': 0.001}

In [15]:
print("Accuracy Score: ")
train_accuracy = dtc_random.score(X_train, y_train)
test_accuracy = dtc_random.score(X_test, y_test)
print(f"Train Accuracy : {train_accuracy}")
print(f"Test Accuracy : {test_accuracy}")
print()

print("Classification Report: ")
dtc_best = DecisionTreeClassifier(ccp_alpha=best_param['ccp_alpha'], criterion=best_param['criterion'], max_depth=best_param['max_depth'], max_features=best_param['max_features'])
dtc_best.fit(X_train, y_train)
y_predict = dtc_best.predict(X_test)
print(classification_report(y_test, y_predict))
print()

Accuracy Score: 
Train Accuracy : 0.8862676930717656
Test Accuracy : 0.8575564562825709

Classification Report: 
              precision    recall  f1-score   support

           1       0.94      0.92      0.93       474
           2       0.85      0.71      0.77        24
           3       0.85      0.94      0.89       400
           4       0.65      0.44      0.53       189
           5       0.79      0.75      0.77       187
           7       0.80      0.88      0.84       453

    accuracy                           0.84      1727
   macro avg       0.81      0.77      0.79      1727
weighted avg       0.83      0.84      0.83      1727




Accuracy Score: 
Train Accuracy : 0.8712016574585635
Test Accuracy : 0.8351477449455676

Classification Report: 
              precision    recall  f1-score   support

           1       0.93      0.92      0.92       367
           2       0.58      1.00      0.73        15
           3       0.90      0.86      0.88       104
           4       0.27      0.57      0.37        21
           5       0.67      0.41      0.51        68
           7       0.71      0.71      0.71        68

    accuracy                           0.82       643
   macro avg       0.67      0.74      0.69       643
weighted avg       0.84      0.82      0.83       643

## K-fold validation 

In [17]:
scores=[]
acc = []
from sklearn.model_selection import cross_val_score
kf = KFold(n_splits=10,random_state=None,shuffle=False)
for train_index,test_index in kf.split(X):
    # print(test_index.shape)
    # print(train_index.shape)
    X_train,X_test,y_train,y_test=X.iloc[train_index],X.iloc[test_index],y.iloc[train_index],y.iloc[test_index]
    dtc_best.fit(X_train,y_train)
    y_predict = dtc_best.predict(X_test)
    acc.append(accuracy_score(y_test,y_predict))
    scores.append(dtc_best.score(X_test,y_test))

print(f"Mean Accuracy Score: {np.mean(acc)}")
print()
print("Cross Validation Score: ")
cross_val_score(dtc_best,X,y,cv=10)

Mean Accuracy Score: 0.8230902777777779

Cross Validation Score: 


array([0.85416667, 0.83680556, 0.80381944, 0.84375   , 0.84869565,
       0.77913043, 0.74086957, 0.85043478, 0.8573913 , 0.74086957])

# 2) Random Forest

Random forest classifier is a clasifier based on multiple Decision tree classifiers which are trained on different fraction of training data. In case of classification the majority vote of those decision trees is taken as final output.


#### **sklearn.ensemble.RandomForestClassifier**:
The corresponding **Hyper Parameters** are:
- **n_estimators** : Numbers of decision trees.
- **max_features** : Size of random subsets of features to consider when splitting a node.
- **max_depth** : Maximum depth of the tree.
- **min_samples_split** : The minimum number of samples required to split a node.
- **bootstrap** : Whole dataset will be used when bootstrap=false.

In [18]:
# Random Forest Classifier
rfc = RandomForestClassifier()

Search space for Hyperparameters Tuning is as follows:
> n_estimators = [x for x in range(25, 40, 2)] \
> max_features = ["auto", "sqrt"] \
> max_depth = [5, 6, 7] \
> min_samples_split = [2, 5, 10] \ 
> min_samples_leaf = [1, 2, 4] \
> bootstrap = [True, False] 

## Hyper Parameter Tuning using **Grid Search CV**:

In [19]:
# Grid Search CV
n_estimators = [x for x in range(25, 40, 2)]
max_features = ["auto", "sqrt"]
max_depth = [5, 6, 7] 
min_samples_split = [2, 5, 10] 
min_samples_leaf = [1, 2, 4] 
bootstrap = [True, False]

param_grid = {
    "n_estimators" : n_estimators,
    "max_features" : max_features,
    "max_depth" : max_depth,
    "min_samples_split" : min_samples_split,
    "min_samples_leaf" : min_samples_leaf,
    "bootstrap" : bootstrap
}

param_grid

{'n_estimators': [25, 27, 29, 31, 33, 35, 37, 39],
 'max_features': ['auto', 'sqrt'],
 'max_depth': [5, 6, 7],
 'min_samples_split': [2, 5, 10],
 'min_samples_leaf': [1, 2, 4],
 'bootstrap': [True, False]}

In [20]:
rfc_grid = GridSearchCV(estimator = rfc, param_grid = param_grid, cv = 3, verbose = 2, n_jobs = 4)
rfc_grid.fit(X_train, y_train)

Fitting 3 folds for each of 864 candidates, totalling 2592 fits


  warn(


In [21]:
best_param = rfc_grid.best_params_
print("Best Parameters for Decision Tree after Grid Search CV: ")
best_param

Best Parameters for Decision Tree after Grid Search CV: 


{'bootstrap': False,
 'max_depth': 7,
 'max_features': 'auto',
 'min_samples_leaf': 1,
 'min_samples_split': 2,
 'n_estimators': 37}

#### Model Quality: 

In [22]:
print("Accuracy Score: ")
train_accuracy = rfc_grid.score(X_train, y_train)
test_accuracy = rfc_grid.score(X_test, y_test)
print(f"Train Accuracy : {train_accuracy}")
print(f"Test Accuracy : {test_accuracy}")
print()

rfc_best = RandomForestClassifier(bootstrap=best_param['bootstrap'], max_depth=best_param['max_depth'], max_features=best_param['max_features'], min_samples_leaf=best_param['min_samples_leaf'], min_samples_split=best_param['min_samples_split'], n_estimators=best_param['n_estimators'])
rfc_best.fit(X_train, y_train)
print("Classification Report: ")
y_predict = rfc_best.predict(X_test)
print(classification_report(y_test, y_predict))

Accuracy Score: 
Train Accuracy : 0.9260474995172814
Test Accuracy : 0.8921739130434783



  warn(


Classification Report: 
              precision    recall  f1-score   support

           1       0.93      0.99      0.96       344
           3       0.87      0.83      0.85        96
           4       0.22      0.55      0.32        11
           5       0.95      0.59      0.73        61
           7       0.84      0.68      0.75        63

    accuracy                           0.88       575
   macro avg       0.76      0.73      0.72       575
weighted avg       0.90      0.88      0.88       575



### K-Fold CV on 'Best' model obtained after Grid Search CV :

In [24]:
scores=[]
acc = []
from sklearn.model_selection import cross_val_score
kf = KFold(n_splits=10,random_state=None,shuffle=False)
for train_index,test_index in kf.split(X):
    # print(test_index.shape)
    # print(train_index.shape)
    X_train,X_test,y_train,y_test=X.iloc[train_index],X.iloc[test_index],y.iloc[train_index],y.iloc[test_index]
    rfc_best.fit(X_train,y_train)
    y_predict = rfc_best.predict(X_test)
    acc.append(accuracy_score(y_test,y_predict))
    scores.append(rfc_best.score(X_test,y_test))

print(f"Mean Accuracy Score: {np.mean(acc)}")
print()
print("Cross Validation Score: ")
cross_val_score(rfc_best,X,y,cv=10)

  warn(
  warn(
  warn(
  warn(
  warn(
  warn(
  warn(
  warn(
  warn(
  warn(


Mean Accuracy Score: 0.8632451690821256

Cross Validation Score: 


  warn(
  warn(
  warn(
  warn(
  warn(
  warn(
  warn(
  warn(
  warn(
  warn(


array([0.91319444, 0.88194444, 0.80555556, 0.88368056, 0.89043478,
       0.84347826, 0.81913043, 0.89217391, 0.88521739, 0.84173913])

## Hyper Parameter Tuning using **Random Search CV**:

In [25]:
# Random Search CV:
n_estimators = [x for x in range(25, 40, 2)] # Number of trees
max_features = ["auto", "sqrt"] # Number of features considered at each split
max_depth = [3, 4, 5] # Max height of Trees
min_samples_split = [2, 5, 10] # Minimun number of samples required for splitting a node
min_samples_leaf = [1, 2], 4 # Minimum number of samples required at each leaf node
bootstrap = [True, False] # Method of selecting samples

param_grid = {
    "n_estimators" : n_estimators,
    "max_features" : max_features,
    "max_depth" : max_depth,
    "min_samples_split" : min_samples_split,
    "min_samples_leaf" : min_samples_leaf,
    "bootstrap" : bootstrap
}

param_grid

{'n_estimators': [25, 27, 29, 31, 33, 35, 37, 39],
 'max_features': ['auto', 'sqrt'],
 'max_depth': [3, 4, 5],
 'min_samples_split': [2, 5, 10],
 'min_samples_leaf': ([1, 2], 4),
 'bootstrap': [True, False]}

In [26]:
rfc_random = GridSearchCV(estimator = rfc, param_grid = param_grid, cv = 3, verbose = 2, n_jobs = 4)
rfc_random.fit(X_train, y_train)

Fitting 3 folds for each of 576 candidates, totalling 1728 fits


864 fits failed out of a total of 1728.
The score on these train-test partitions for these parameters will be set to nan.
If these failures are not expected, you can try to debug them by setting error_score='raise'.

Below are more details about the failures:
--------------------------------------------------------------------------------
864 fits failed with the following error:
Traceback (most recent call last):
  File "C:\Users\Mohit\AppData\Local\Programs\Python\Python311\Lib\site-packages\sklearn\model_selection\_validation.py", line 686, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\Mohit\AppData\Local\Programs\Python\Python311\Lib\site-packages\sklearn\ensemble\_forest.py", line 340, in fit
    self._validate_params()
  File "C:\Users\Mohit\AppData\Local\Programs\Python\Python311\Lib\site-packages\sklearn\base.py", line 600, in _validate_params
    validate_parameter_constraints(
  File "C:\Users\Mohit\AppData\Local\Programs\Python\Python31

In [27]:
best_param = rfc_random.best_params_
print("Best Parameters for Decision Tree after Random Search CV: ")
best_param

Best Parameters for Decision Tree after Random Search CV: 


{'bootstrap': False,
 'max_depth': 5,
 'max_features': 'auto',
 'min_samples_leaf': 4,
 'min_samples_split': 2,
 'n_estimators': 37}

#### Model Quality

In [28]:
print("Accuracy Score: ")
train_accuracy = rfc_random.score(X_train, y_train)
test_accuracy = rfc_random.score(X_test, y_test)
print(f"Train Accuracy : {train_accuracy}")
print(f"Test Accuracy : {test_accuracy}")
print()

print("Classification Report: ")
rfc_best = RandomForestClassifier(bootstrap=best_param['bootstrap'], max_depth=best_param['max_depth'], max_features=best_param['max_features'], min_samples_leaf=best_param['min_samples_leaf'], min_samples_split=best_param['min_samples_split'], n_estimators=best_param['n_estimators'])
rfc_best.fit(X_train, y_train)
y_predict = rfc_best.predict(X_test)
print(classification_report(y_test, y_predict))
print()

Accuracy Score: 
Train Accuracy : 0.8798995945163159
Test Accuracy : 0.8678260869565217

Classification Report: 


  warn(


              precision    recall  f1-score   support

           1       0.91      0.99      0.95       344
           2       0.00      0.00      0.00         0
           3       0.87      0.89      0.88        96
           4       0.19      0.36      0.25        11
           5       1.00      0.48      0.64        61
           7       0.83      0.70      0.76        63

    accuracy                           0.87       575
   macro avg       0.63      0.57      0.58       575
weighted avg       0.89      0.87      0.87       575




  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


### K-Fold CV on 'Best' model obtained after Random Search CV :

In [30]:
scores=[]
acc = []
# print(X.shape)
# print(Y.shape)
from sklearn.model_selection import cross_val_score
kf = KFold(n_splits=10,random_state=None,shuffle=False)
for train_index,test_index in kf.split(X):
    # print(test_index.shape)
    # print(train_index.shape)
    X_train,X_test,y_train,y_test=X.iloc[train_index],X.iloc[test_index],y.iloc[train_index],y.iloc[test_index]
    rfc_best.fit(X_train,y_train)
    y_predict = rfc_best.predict(X_test)
    acc.append(accuracy_score(y_test,y_predict))
    scores.append(rfc_best.score(X_test,y_test))

print(f"Mean Accuracy Score: {np.mean(acc)}")
print()
print("Cross Validation Score: ")
cross_val_score(rfc_best,X,y,cv=10)

  warn(
  warn(
  warn(
  warn(
  warn(
  warn(
  warn(
  warn(
  warn(
  warn(


Mean Accuracy Score: 0.8451636473429952

Cross Validation Score: 


  warn(
  warn(
  warn(
  warn(
  warn(
  warn(
  warn(
  warn(
  warn(
  warn(


array([0.90277778, 0.87673611, 0.79340278, 0.87326389, 0.86956522,
       0.82608696, 0.79130435, 0.87826087, 0.87826087, 0.77391304])

# 3) Naive Bayes 

In [31]:
from sklearn.naive_bayes import GaussianNB
gnb= GaussianNB()

HyperParamater tunning with GridSearchCV

In [32]:
params_grid = {'var_smoothing': np.logspace(0,-9, num=100)}
grid = GridSearchCV(gnb,params_grid, refit = True,verbose=3)
grid.fit(X_train,y_train)
grid.best_score_


Fitting 5 folds for each of 100 candidates, totalling 500 fits
[CV 1/5] END .................var_smoothing=1.0;, score=0.729 total time=   0.0s
[CV 2/5] END .................var_smoothing=1.0;, score=0.740 total time=   0.0s
[CV 3/5] END .................var_smoothing=1.0;, score=0.746 total time=   0.0s
[CV 4/5] END .................var_smoothing=1.0;, score=0.676 total time=   0.0s
[CV 5/5] END .................var_smoothing=1.0;, score=0.733 total time=   0.0s
[CV 1/5] END ..var_smoothing=0.8111308307896871;, score=0.749 total time=   0.0s
[CV 2/5] END ..var_smoothing=0.8111308307896871;, score=0.750 total time=   0.0s
[CV 3/5] END ..var_smoothing=0.8111308307896871;, score=0.745 total time=   0.0s
[CV 4/5] END ..var_smoothing=0.8111308307896871;, score=0.681 total time=   0.0s
[CV 5/5] END ..var_smoothing=0.8111308307896871;, score=0.736 total time=   0.0s
[CV 1/5] END ...var_smoothing=0.657933224657568;, score=0.773 total time=   0.0s
[CV 2/5] END ...var_smoothing=0.65793322465756

[CV 3/5] END .var_smoothing=0.01519911082952933;, score=0.752 total time=   0.0s
[CV 4/5] END .var_smoothing=0.01519911082952933;, score=0.693 total time=   0.0s
[CV 5/5] END .var_smoothing=0.01519911082952933;, score=0.805 total time=   0.0s
[CV 1/5] END var_smoothing=0.012328467394420659;, score=0.859 total time=   0.0s
[CV 2/5] END var_smoothing=0.012328467394420659;, score=0.791 total time=   0.0s
[CV 3/5] END var_smoothing=0.012328467394420659;, score=0.752 total time=   0.0s
[CV 4/5] END var_smoothing=0.012328467394420659;, score=0.693 total time=   0.0s
[CV 5/5] END var_smoothing=0.012328467394420659;, score=0.806 total time=   0.0s
[CV 1/5] END ................var_smoothing=0.01;, score=0.859 total time=   0.0s
[CV 2/5] END ................var_smoothing=0.01;, score=0.791 total time=   0.0s
[CV 3/5] END ................var_smoothing=0.01;, score=0.752 total time=   0.0s
[CV 4/5] END ................var_smoothing=0.01;, score=0.692 total time=   0.0s
[CV 5/5] END ...............

[CV 1/5] END var_smoothing=0.0001873817422860383;, score=0.860 total time=   0.0s
[CV 2/5] END var_smoothing=0.0001873817422860383;, score=0.788 total time=   0.0s
[CV 3/5] END var_smoothing=0.0001873817422860383;, score=0.752 total time=   0.0s
[CV 4/5] END var_smoothing=0.0001873817422860383;, score=0.692 total time=   0.0s
[CV 5/5] END var_smoothing=0.0001873817422860383;, score=0.811 total time=   0.0s
[CV 1/5] END var_smoothing=0.0001519911082952933;, score=0.860 total time=   0.0s
[CV 2/5] END var_smoothing=0.0001519911082952933;, score=0.788 total time=   0.0s
[CV 3/5] END var_smoothing=0.0001519911082952933;, score=0.752 total time=   0.0s
[CV 4/5] END var_smoothing=0.0001519911082952933;, score=0.692 total time=   0.0s
[CV 5/5] END var_smoothing=0.0001519911082952933;, score=0.811 total time=   0.0s
[CV 1/5] END var_smoothing=0.0001232846739442066;, score=0.860 total time=   0.0s
[CV 2/5] END var_smoothing=0.0001232846739442066;, score=0.788 total time=   0.0s
[CV 3/5] END var

[CV 1/5] END var_smoothing=2.310129700083158e-06;, score=0.860 total time=   0.0s
[CV 2/5] END var_smoothing=2.310129700083158e-06;, score=0.788 total time=   0.0s
[CV 3/5] END var_smoothing=2.310129700083158e-06;, score=0.752 total time=   0.0s
[CV 4/5] END var_smoothing=2.310129700083158e-06;, score=0.691 total time=   0.0s
[CV 5/5] END var_smoothing=2.310129700083158e-06;, score=0.812 total time=   0.0s
[CV 1/5] END var_smoothing=1.873817422860383e-06;, score=0.860 total time=   0.0s
[CV 2/5] END var_smoothing=1.873817422860383e-06;, score=0.788 total time=   0.0s
[CV 3/5] END var_smoothing=1.873817422860383e-06;, score=0.752 total time=   0.0s
[CV 4/5] END var_smoothing=1.873817422860383e-06;, score=0.691 total time=   0.0s
[CV 5/5] END var_smoothing=1.873817422860383e-06;, score=0.812 total time=   0.0s
[CV 1/5] END var_smoothing=1.519911082952933e-06;, score=0.860 total time=   0.0s
[CV 2/5] END var_smoothing=1.519911082952933e-06;, score=0.788 total time=   0.0s
[CV 3/5] END var

[CV 1/5] END var_smoothing=2.310129700083158e-08;, score=0.860 total time=   0.0s
[CV 2/5] END var_smoothing=2.310129700083158e-08;, score=0.788 total time=   0.0s
[CV 3/5] END var_smoothing=2.310129700083158e-08;, score=0.752 total time=   0.0s
[CV 4/5] END var_smoothing=2.310129700083158e-08;, score=0.691 total time=   0.0s
[CV 5/5] END var_smoothing=2.310129700083158e-08;, score=0.812 total time=   0.0s
[CV 1/5] END var_smoothing=1.873817422860383e-08;, score=0.860 total time=   0.0s
[CV 2/5] END var_smoothing=1.873817422860383e-08;, score=0.788 total time=   0.0s
[CV 3/5] END var_smoothing=1.873817422860383e-08;, score=0.752 total time=   0.0s
[CV 4/5] END var_smoothing=1.873817422860383e-08;, score=0.691 total time=   0.0s
[CV 5/5] END var_smoothing=1.873817422860383e-08;, score=0.812 total time=   0.0s
[CV 1/5] END var_smoothing=1.519911082952933e-08;, score=0.860 total time=   0.0s
[CV 2/5] END var_smoothing=1.519911082952933e-08;, score=0.788 total time=   0.0s
[CV 3/5] END var

0.7806584223975528

0.7823169522730492

In [33]:
grid.best_params_

{'var_smoothing': 0.005336699231206307}

{'var_smoothing': 0.0023101297000831605}

Hyperparameter with RandomizedSearchCV


In [34]:
# random search
from sklearn.model_selection import RandomizedSearchCV
gridr = {'var_smoothing': np.logspace(0,-9, num=100)}
gridr = RandomizedSearchCV(gnb,gridr, refit = True,verbose=3)
gridr=gridr.fit(X_train,y_train)

Fitting 5 folds for each of 10 candidates, totalling 50 fits
[CV 1/5] END var_smoothing=0.0006579332246575676;, score=0.860 total time=   0.0s
[CV 2/5] END var_smoothing=0.0006579332246575676;, score=0.788 total time=   0.0s
[CV 3/5] END var_smoothing=0.0006579332246575676;, score=0.753 total time=   0.0s
[CV 4/5] END var_smoothing=0.0006579332246575676;, score=0.691 total time=   0.0s
[CV 5/5] END var_smoothing=0.0006579332246575676;, score=0.811 total time=   0.0s
[CV 1/5] END var_smoothing=2.310129700083158e-05;, score=0.860 total time=   0.0s
[CV 2/5] END var_smoothing=2.310129700083158e-05;, score=0.788 total time=   0.0s
[CV 3/5] END var_smoothing=2.310129700083158e-05;, score=0.752 total time=   0.0s
[CV 4/5] END var_smoothing=2.310129700083158e-05;, score=0.691 total time=   0.0s
[CV 5/5] END var_smoothing=2.310129700083158e-05;, score=0.812 total time=   0.0s
[CV 1/5] END .var_smoothing=0.43287612810830584;, score=0.822 total time=   0.0s
[CV 2/5] END .var_smoothing=0.43287612

In [35]:
gridr.best_score_

0.7804655587264282

0.7823169522730492

In [36]:
gridr.best_params_

{'var_smoothing': 2.310129700083158e-05}

{'var_smoothing': 5.336699231206302e-09}


In [37]:
gnb= GaussianNB(var_smoothing= 0.0023101297000831605)

In [38]:
kf = KFold(n_splits=3,random_state=None,shuffle=False)
scores=[]
for train_index,test_index in kf.split(X):
    X_train,X_test,y_train,y_test=X.iloc[train_index],X.iloc[test_index],y.iloc[train_index],y.iloc[test_index]
    gnb.fit(X_train,y_train)
    y_pred=gnb.predict(X_test)
    scores.append(accuracy_score(y_test,y_pred)) 
np.mean(scores)  

0.7777198470629126

0.7777198470629126

In [39]:
print(classification_report(y_test,y_pred))

              precision    recall  f1-score   support

           1       0.92      0.80      0.85       547
           2       0.58      0.83      0.68        23
           3       0.90      0.87      0.88       421
           4       0.49      0.67      0.56       219
           5       0.58      0.69      0.63       238
           7       0.84      0.74      0.79       470

    accuracy                           0.77      1918
   macro avg       0.72      0.77      0.73      1918
weighted avg       0.80      0.77      0.78      1918



  precision    recall  f1-score   support

           1       0.92      0.80      0.85       547
           2       0.58      0.83      0.68        23
           3       0.90      0.87      0.88       421
           4       0.49      0.67      0.56       219
           5       0.58      0.69      0.63       238
           7       0.84      0.74      0.79       470

    accuracy                           0.77      1918
   macro avg       0.72      0.77      0.73      1918
weighted avg       0.80      0.77      0.78      1918


In [40]:
print(confusion_matrix(y_test,y_pred))

[[436   1  17   0  93   0]
 [  1  19   0   1   1   1]
 [  3   0 368  47   1   2]
 [  3   0  22 146   5  43]
 [ 30  13   1   9 164  21]
 [  2   0   3  95  20 350]]


[[436   1  17   0  93   0]
 [  1  19   0   1   1   1]
 [  3   0 368  47   1   2]
 [  3   0  22 146   5  43]
 [ 30  13   1   9 164  21]
 [  2   0   3  95  20 350]]

## multinomial

HyperParamater tunning with GridSearchCV

In [41]:
mnb=MultinomialNB()
param_grid = {'alpha': [0.01, 0.1, 1.0, 10.0],'force_alpha' : [True,False],'fit_prior':[True,False],}
grid = GridSearchCV(mnb,param_grid, refit = True,verbose=3)
grid.fit(X_train,y_train)

Fitting 5 folds for each of 16 candidates, totalling 80 fits
[CV 1/5] END alpha=0.01, fit_prior=True, force_alpha=True;, score=0.760 total time=   0.0s
[CV 2/5] END alpha=0.01, fit_prior=True, force_alpha=True;, score=0.817 total time=   0.0s
[CV 3/5] END alpha=0.01, fit_prior=True, force_alpha=True;, score=0.799 total time=   0.0s
[CV 4/5] END alpha=0.01, fit_prior=True, force_alpha=True;, score=0.711 total time=   0.0s
[CV 5/5] END alpha=0.01, fit_prior=True, force_alpha=True;, score=0.653 total time=   0.0s
[CV 1/5] END alpha=0.01, fit_prior=True, force_alpha=False;, score=0.760 total time=   0.0s
[CV 2/5] END alpha=0.01, fit_prior=True, force_alpha=False;, score=0.817 total time=   0.0s
[CV 3/5] END alpha=0.01, fit_prior=True, force_alpha=False;, score=0.799 total time=   0.0s
[CV 4/5] END alpha=0.01, fit_prior=True, force_alpha=False;, score=0.711 total time=   0.0s
[CV 5/5] END alpha=0.01, fit_prior=True, force_alpha=False;, score=0.653 total time=   0.0s
[CV 1/5] END alpha=0.01,

In [42]:
grid.best_score_

0.7484327466318992

0.7484327466318992

In [43]:
grid.best_params_

{'alpha': 10.0, 'fit_prior': True, 'force_alpha': True}

{'alpha': 10.0, 'fit_prior': True, 'force_alpha': True}

In [44]:
mnb=MultinomialNB(alpha= 10.0, fit_prior= True, force_alpha= True)
mnb.fit(X_train, y_train)
y_pred2=mnb.predict(X_test)
y_pred_train=mnb.predict(X_train)
acc1=accuracy_score(y_train,y_pred_train)
print(f"accuracy on train with best param by randomsearch {acc1:0.4f}")
acc=accuracy_score(y_test,y_pred)
print(f"accuracy on test with best param by randomSearch  {acc:0.4f} ")

accuracy on train with best param by randomsearch 0.7719
accuracy on test with best param by randomSearch  0.7732 


accuracy on train with best param by randomsearch 0.7719
accuracy on test with best param by randomSearch  0.7732

In [45]:
kf = KFold(n_splits=3,random_state=None,shuffle=False)
scores=[]
for train_index,test_index in kf.split(X):
    X_train,X_test,y_train,y_test=X.iloc[train_index],X.iloc[test_index],y.iloc[train_index],y.iloc[test_index]
    mnb.fit(X_train,y_train)
    y_pred=mnb.predict(X_test)
    scores.append(accuracy_score(y_test,y_pred))

np.mean(scores)

0.740875912408759

0.740875912408759

In [46]:
print(classification_report(y_test,y_pred))

              precision    recall  f1-score   support

           1       0.97      0.98      0.98       547
           2       0.59      0.70      0.64        23
           3       0.66      0.83      0.73       421
           4       0.21      0.03      0.05       219
           5       0.82      0.70      0.76       238
           7       0.66      0.81      0.73       470

    accuracy                           0.76      1918
   macro avg       0.65      0.67      0.65      1918
weighted avg       0.72      0.76      0.72      1918



 precision    recall  f1-score   support

           1       0.97      0.98      0.98       547
           2       0.59      0.70      0.64        23
           3       0.66      0.83      0.73       421
           4       0.21      0.03      0.05       219
           5       0.82      0.70      0.76       238
           7       0.66      0.81      0.73       470

    accuracy                           0.76      1918
   macro avg       0.65      0.67      0.65      1918
weighted avg       0.72      0.76      0.72      1918


In [47]:
print(confusion_matrix(y_test,y_pred))

[[537   0   4   0   6   0]
 [  0  16   2   0   5   0]
 [  2   0 348   8   2  61]
 [  1   0 102   6  12  98]
 [ 14  11   6   3 167  37]
 [  0   0  68  11  11 380]]


[[537   0   4   0   6   0]
 [  0  16   2   0   5   0]
 [  2   0 348   8   2  61]
 [  1   0 102   6  12  98]
 [ 14  11   6   3 167  37]
 [  0   0  68  11  11 380]]

Mean Accuracy Score: 0.8329537880471006

Cross Validation Score: 
array([0.84627329, 0.85093168, 0.82142857, 0.86490683, 0.86024845,
       0.7962675 , 0.75427683, 0.84292379, 0.88802488, 0.77293935])

# 4) KNN

KNN (K-Nearest Neighbors) is a non-parametric machine learning algorithm used for classification and regression problems. It works by calculating the distance between a test data point and all other data points in the dataset to identify the K nearest neighbors. The majority label or average value of the K nearest neighbors is then used to make a prediction for the test data point. The value of K is an important hyperparameter that affects the performance of the algorithm. KNN is a simple and effective algorithm but can be computationally expensive for large datasets

#### Hyper Parameters of KNN Classifier are:

- n_neighbors : Number of neighbors to use. Can be any positive integer.<br>
- weights : Weight function used in prediction. Can be {'uniform', 'distance'}.<br>
- algorithm : Algorithm used to compute the nearest neighbors. Can be {'auto', 'ball_tree', 'kd_tree', 'brute'}.<br>
- leaf_size : Leaf size passed to BallTree or KDTree. Can be any positive integer.<br>
- p : Power parameter for the Minkowski metric. Can be any positive integer.<br>

So, we will tune all these hyper parameters using Grid Search and Random Search. Our search space for Random Search and Grid Search is :
- 'n_neighbors' : [3,4,5,6,7,8,9,10,11,12,13,14,15]
- 'weights'     : ['uniform', 'distance']
- 'algorithm'   : ['auto', 'ball_tree','kd_tree', 'brute']
- 'p'           : [1]
- 'leaf_size'   : [12,20,30,40]

In [48]:
from sklearn.neighbors import KNeighborsClassifier
neigh = KNeighborsClassifier()

In [49]:
from sklearn.model_selection import GridSearchCV
param_grid = {'n_neighbors' : [3,4,5,6,7,8,9,10,11,12,13,14,15],
              'weights': ['uniform', 'distance'],
              'algorithm' : ['auto', 'ball_tree','kd_tree', 'brute'],
              'p':[1],'leaf_size':[12,20,30,40]}
grid = GridSearchCV(neigh,param_grid, refit = True,verbose=3)
grid=grid.fit(X_train,y_train)

Fitting 5 folds for each of 416 candidates, totalling 2080 fits
[CV 1/5] END algorithm=auto, leaf_size=12, n_neighbors=3, p=1, weights=uniform;, score=0.852 total time=   0.1s
[CV 2/5] END algorithm=auto, leaf_size=12, n_neighbors=3, p=1, weights=uniform;, score=0.829 total time=   0.0s
[CV 3/5] END algorithm=auto, leaf_size=12, n_neighbors=3, p=1, weights=uniform;, score=0.866 total time=   0.0s
[CV 4/5] END algorithm=auto, leaf_size=12, n_neighbors=3, p=1, weights=uniform;, score=0.828 total time=   0.0s
[CV 5/5] END algorithm=auto, leaf_size=12, n_neighbors=3, p=1, weights=uniform;, score=0.803 total time=   0.0s
[CV 1/5] END algorithm=auto, leaf_size=12, n_neighbors=3, p=1, weights=distance;, score=0.850 total time=   0.0s
[CV 2/5] END algorithm=auto, leaf_size=12, n_neighbors=3, p=1, weights=distance;, score=0.828 total time=   0.0s
[CV 3/5] END algorithm=auto, leaf_size=12, n_neighbors=3, p=1, weights=distance;, score=0.863 total time=   0.0s
[CV 4/5] END algorithm=auto, leaf_siz

[CV 4/5] END algorithm=auto, leaf_size=12, n_neighbors=10, p=1, weights=uniform;, score=0.829 total time=   0.0s
[CV 5/5] END algorithm=auto, leaf_size=12, n_neighbors=10, p=1, weights=uniform;, score=0.821 total time=   0.0s
[CV 1/5] END algorithm=auto, leaf_size=12, n_neighbors=10, p=1, weights=distance;, score=0.870 total time=   0.0s
[CV 2/5] END algorithm=auto, leaf_size=12, n_neighbors=10, p=1, weights=distance;, score=0.832 total time=   0.0s
[CV 3/5] END algorithm=auto, leaf_size=12, n_neighbors=10, p=1, weights=distance;, score=0.879 total time=   0.0s
[CV 4/5] END algorithm=auto, leaf_size=12, n_neighbors=10, p=1, weights=distance;, score=0.837 total time=   0.0s
[CV 5/5] END algorithm=auto, leaf_size=12, n_neighbors=10, p=1, weights=distance;, score=0.815 total time=   0.0s
[CV 1/5] END algorithm=auto, leaf_size=12, n_neighbors=11, p=1, weights=uniform;, score=0.872 total time=   0.0s
[CV 2/5] END algorithm=auto, leaf_size=12, n_neighbors=11, p=1, weights=uniform;, score=0.8

[CV 3/5] END algorithm=auto, leaf_size=20, n_neighbors=4, p=1, weights=distance;, score=0.864 total time=   0.0s
[CV 4/5] END algorithm=auto, leaf_size=20, n_neighbors=4, p=1, weights=distance;, score=0.836 total time=   0.0s
[CV 5/5] END algorithm=auto, leaf_size=20, n_neighbors=4, p=1, weights=distance;, score=0.799 total time=   0.0s
[CV 1/5] END algorithm=auto, leaf_size=20, n_neighbors=5, p=1, weights=uniform;, score=0.853 total time=   0.0s
[CV 2/5] END algorithm=auto, leaf_size=20, n_neighbors=5, p=1, weights=uniform;, score=0.836 total time=   0.0s
[CV 3/5] END algorithm=auto, leaf_size=20, n_neighbors=5, p=1, weights=uniform;, score=0.881 total time=   0.0s
[CV 4/5] END algorithm=auto, leaf_size=20, n_neighbors=5, p=1, weights=uniform;, score=0.834 total time=   0.0s
[CV 5/5] END algorithm=auto, leaf_size=20, n_neighbors=5, p=1, weights=uniform;, score=0.806 total time=   0.0s
[CV 1/5] END algorithm=auto, leaf_size=20, n_neighbors=5, p=1, weights=distance;, score=0.857 total t

[CV 1/5] END algorithm=auto, leaf_size=20, n_neighbors=12, p=1, weights=uniform;, score=0.876 total time=   0.0s
[CV 2/5] END algorithm=auto, leaf_size=20, n_neighbors=12, p=1, weights=uniform;, score=0.828 total time=   0.0s
[CV 3/5] END algorithm=auto, leaf_size=20, n_neighbors=12, p=1, weights=uniform;, score=0.885 total time=   0.0s
[CV 4/5] END algorithm=auto, leaf_size=20, n_neighbors=12, p=1, weights=uniform;, score=0.823 total time=   0.0s
[CV 5/5] END algorithm=auto, leaf_size=20, n_neighbors=12, p=1, weights=uniform;, score=0.824 total time=   0.0s
[CV 1/5] END algorithm=auto, leaf_size=20, n_neighbors=12, p=1, weights=distance;, score=0.875 total time=   0.0s
[CV 2/5] END algorithm=auto, leaf_size=20, n_neighbors=12, p=1, weights=distance;, score=0.832 total time=   0.0s
[CV 3/5] END algorithm=auto, leaf_size=20, n_neighbors=12, p=1, weights=distance;, score=0.880 total time=   0.0s
[CV 4/5] END algorithm=auto, leaf_size=20, n_neighbors=12, p=1, weights=distance;, score=0.83

[CV 1/5] END algorithm=auto, leaf_size=30, n_neighbors=6, p=1, weights=distance;, score=0.865 total time=   0.0s
[CV 2/5] END algorithm=auto, leaf_size=30, n_neighbors=6, p=1, weights=distance;, score=0.836 total time=   0.0s
[CV 3/5] END algorithm=auto, leaf_size=30, n_neighbors=6, p=1, weights=distance;, score=0.877 total time=   0.0s
[CV 4/5] END algorithm=auto, leaf_size=30, n_neighbors=6, p=1, weights=distance;, score=0.831 total time=   0.0s
[CV 5/5] END algorithm=auto, leaf_size=30, n_neighbors=6, p=1, weights=distance;, score=0.807 total time=   0.0s
[CV 1/5] END algorithm=auto, leaf_size=30, n_neighbors=7, p=1, weights=uniform;, score=0.861 total time=   0.0s
[CV 2/5] END algorithm=auto, leaf_size=30, n_neighbors=7, p=1, weights=uniform;, score=0.828 total time=   0.0s
[CV 3/5] END algorithm=auto, leaf_size=30, n_neighbors=7, p=1, weights=uniform;, score=0.879 total time=   0.0s
[CV 4/5] END algorithm=auto, leaf_size=30, n_neighbors=7, p=1, weights=uniform;, score=0.834 total 

[CV 4/5] END algorithm=auto, leaf_size=30, n_neighbors=13, p=1, weights=distance;, score=0.833 total time=   0.0s
[CV 5/5] END algorithm=auto, leaf_size=30, n_neighbors=13, p=1, weights=distance;, score=0.833 total time=   0.0s
[CV 1/5] END algorithm=auto, leaf_size=30, n_neighbors=14, p=1, weights=uniform;, score=0.878 total time=   0.0s
[CV 2/5] END algorithm=auto, leaf_size=30, n_neighbors=14, p=1, weights=uniform;, score=0.832 total time=   0.0s
[CV 3/5] END algorithm=auto, leaf_size=30, n_neighbors=14, p=1, weights=uniform;, score=0.884 total time=   0.0s
[CV 4/5] END algorithm=auto, leaf_size=30, n_neighbors=14, p=1, weights=uniform;, score=0.829 total time=   0.0s
[CV 5/5] END algorithm=auto, leaf_size=30, n_neighbors=14, p=1, weights=uniform;, score=0.832 total time=   0.0s
[CV 1/5] END algorithm=auto, leaf_size=30, n_neighbors=14, p=1, weights=distance;, score=0.882 total time=   0.0s
[CV 2/5] END algorithm=auto, leaf_size=30, n_neighbors=14, p=1, weights=distance;, score=0.83

[CV 4/5] END algorithm=auto, leaf_size=40, n_neighbors=8, p=1, weights=uniform;, score=0.829 total time=   0.0s
[CV 5/5] END algorithm=auto, leaf_size=40, n_neighbors=8, p=1, weights=uniform;, score=0.824 total time=   0.0s
[CV 1/5] END algorithm=auto, leaf_size=40, n_neighbors=8, p=1, weights=distance;, score=0.862 total time=   0.0s
[CV 2/5] END algorithm=auto, leaf_size=40, n_neighbors=8, p=1, weights=distance;, score=0.832 total time=   0.0s
[CV 3/5] END algorithm=auto, leaf_size=40, n_neighbors=8, p=1, weights=distance;, score=0.876 total time=   0.0s
[CV 4/5] END algorithm=auto, leaf_size=40, n_neighbors=8, p=1, weights=distance;, score=0.840 total time=   0.0s
[CV 5/5] END algorithm=auto, leaf_size=40, n_neighbors=8, p=1, weights=distance;, score=0.819 total time=   0.0s
[CV 1/5] END algorithm=auto, leaf_size=40, n_neighbors=9, p=1, weights=uniform;, score=0.868 total time=   0.0s
[CV 2/5] END algorithm=auto, leaf_size=40, n_neighbors=9, p=1, weights=uniform;, score=0.825 total 

[CV 5/5] END algorithm=auto, leaf_size=40, n_neighbors=15, p=1, weights=distance;, score=0.837 total time=   0.0s
[CV 1/5] END algorithm=ball_tree, leaf_size=12, n_neighbors=3, p=1, weights=uniform;, score=0.850 total time=   0.0s
[CV 2/5] END algorithm=ball_tree, leaf_size=12, n_neighbors=3, p=1, weights=uniform;, score=0.832 total time=   0.0s
[CV 3/5] END algorithm=ball_tree, leaf_size=12, n_neighbors=3, p=1, weights=uniform;, score=0.866 total time=   0.0s
[CV 4/5] END algorithm=ball_tree, leaf_size=12, n_neighbors=3, p=1, weights=uniform;, score=0.827 total time=   0.0s
[CV 5/5] END algorithm=ball_tree, leaf_size=12, n_neighbors=3, p=1, weights=uniform;, score=0.801 total time=   0.0s
[CV 1/5] END algorithm=ball_tree, leaf_size=12, n_neighbors=3, p=1, weights=distance;, score=0.849 total time=   0.0s
[CV 2/5] END algorithm=ball_tree, leaf_size=12, n_neighbors=3, p=1, weights=distance;, score=0.831 total time=   0.0s
[CV 3/5] END algorithm=ball_tree, leaf_size=12, n_neighbors=3, p=

[CV 5/5] END algorithm=ball_tree, leaf_size=12, n_neighbors=9, p=1, weights=distance;, score=0.823 total time=   0.0s
[CV 1/5] END algorithm=ball_tree, leaf_size=12, n_neighbors=10, p=1, weights=uniform;, score=0.871 total time=   0.0s
[CV 2/5] END algorithm=ball_tree, leaf_size=12, n_neighbors=10, p=1, weights=uniform;, score=0.832 total time=   0.0s
[CV 3/5] END algorithm=ball_tree, leaf_size=12, n_neighbors=10, p=1, weights=uniform;, score=0.885 total time=   0.0s
[CV 4/5] END algorithm=ball_tree, leaf_size=12, n_neighbors=10, p=1, weights=uniform;, score=0.828 total time=   0.0s
[CV 5/5] END algorithm=ball_tree, leaf_size=12, n_neighbors=10, p=1, weights=uniform;, score=0.820 total time=   0.0s
[CV 1/5] END algorithm=ball_tree, leaf_size=12, n_neighbors=10, p=1, weights=distance;, score=0.871 total time=   0.0s
[CV 2/5] END algorithm=ball_tree, leaf_size=12, n_neighbors=10, p=1, weights=distance;, score=0.834 total time=   0.0s
[CV 3/5] END algorithm=ball_tree, leaf_size=12, n_neig

[CV 1/5] END algorithm=ball_tree, leaf_size=20, n_neighbors=4, p=1, weights=uniform;, score=0.858 total time=   0.0s
[CV 2/5] END algorithm=ball_tree, leaf_size=20, n_neighbors=4, p=1, weights=uniform;, score=0.837 total time=   0.0s
[CV 3/5] END algorithm=ball_tree, leaf_size=20, n_neighbors=4, p=1, weights=uniform;, score=0.879 total time=   0.0s
[CV 4/5] END algorithm=ball_tree, leaf_size=20, n_neighbors=4, p=1, weights=uniform;, score=0.825 total time=   0.0s
[CV 5/5] END algorithm=ball_tree, leaf_size=20, n_neighbors=4, p=1, weights=uniform;, score=0.811 total time=   0.0s
[CV 1/5] END algorithm=ball_tree, leaf_size=20, n_neighbors=4, p=1, weights=distance;, score=0.848 total time=   0.0s
[CV 2/5] END algorithm=ball_tree, leaf_size=20, n_neighbors=4, p=1, weights=distance;, score=0.827 total time=   0.0s
[CV 3/5] END algorithm=ball_tree, leaf_size=20, n_neighbors=4, p=1, weights=distance;, score=0.862 total time=   0.0s
[CV 4/5] END algorithm=ball_tree, leaf_size=20, n_neighbors=4

[CV 2/5] END algorithm=ball_tree, leaf_size=20, n_neighbors=11, p=1, weights=uniform;, score=0.828 total time=   0.0s
[CV 3/5] END algorithm=ball_tree, leaf_size=20, n_neighbors=11, p=1, weights=uniform;, score=0.883 total time=   0.0s
[CV 4/5] END algorithm=ball_tree, leaf_size=20, n_neighbors=11, p=1, weights=uniform;, score=0.834 total time=   0.0s
[CV 5/5] END algorithm=ball_tree, leaf_size=20, n_neighbors=11, p=1, weights=uniform;, score=0.825 total time=   0.0s
[CV 1/5] END algorithm=ball_tree, leaf_size=20, n_neighbors=11, p=1, weights=distance;, score=0.872 total time=   0.0s
[CV 2/5] END algorithm=ball_tree, leaf_size=20, n_neighbors=11, p=1, weights=distance;, score=0.829 total time=   0.0s
[CV 3/5] END algorithm=ball_tree, leaf_size=20, n_neighbors=11, p=1, weights=distance;, score=0.880 total time=   0.0s
[CV 4/5] END algorithm=ball_tree, leaf_size=20, n_neighbors=11, p=1, weights=distance;, score=0.837 total time=   0.0s
[CV 5/5] END algorithm=ball_tree, leaf_size=20, n_ne

[CV 2/5] END algorithm=ball_tree, leaf_size=30, n_neighbors=5, p=1, weights=uniform;, score=0.836 total time=   0.0s
[CV 3/5] END algorithm=ball_tree, leaf_size=30, n_neighbors=5, p=1, weights=uniform;, score=0.881 total time=   0.0s
[CV 4/5] END algorithm=ball_tree, leaf_size=30, n_neighbors=5, p=1, weights=uniform;, score=0.829 total time=   0.0s
[CV 5/5] END algorithm=ball_tree, leaf_size=30, n_neighbors=5, p=1, weights=uniform;, score=0.804 total time=   0.0s
[CV 1/5] END algorithm=ball_tree, leaf_size=30, n_neighbors=5, p=1, weights=distance;, score=0.857 total time=   0.0s
[CV 2/5] END algorithm=ball_tree, leaf_size=30, n_neighbors=5, p=1, weights=distance;, score=0.837 total time=   0.0s
[CV 3/5] END algorithm=ball_tree, leaf_size=30, n_neighbors=5, p=1, weights=distance;, score=0.877 total time=   0.0s
[CV 4/5] END algorithm=ball_tree, leaf_size=30, n_neighbors=5, p=1, weights=distance;, score=0.828 total time=   0.0s
[CV 5/5] END algorithm=ball_tree, leaf_size=30, n_neighbors=

[CV 3/5] END algorithm=ball_tree, leaf_size=30, n_neighbors=12, p=1, weights=uniform;, score=0.885 total time=   0.0s
[CV 4/5] END algorithm=ball_tree, leaf_size=30, n_neighbors=12, p=1, weights=uniform;, score=0.824 total time=   0.0s
[CV 5/5] END algorithm=ball_tree, leaf_size=30, n_neighbors=12, p=1, weights=uniform;, score=0.825 total time=   0.0s
[CV 1/5] END algorithm=ball_tree, leaf_size=30, n_neighbors=12, p=1, weights=distance;, score=0.874 total time=   0.0s
[CV 2/5] END algorithm=ball_tree, leaf_size=30, n_neighbors=12, p=1, weights=distance;, score=0.831 total time=   0.0s
[CV 3/5] END algorithm=ball_tree, leaf_size=30, n_neighbors=12, p=1, weights=distance;, score=0.880 total time=   0.0s
[CV 4/5] END algorithm=ball_tree, leaf_size=30, n_neighbors=12, p=1, weights=distance;, score=0.831 total time=   0.0s
[CV 5/5] END algorithm=ball_tree, leaf_size=30, n_neighbors=12, p=1, weights=distance;, score=0.828 total time=   0.0s
[CV 1/5] END algorithm=ball_tree, leaf_size=30, n_n

[CV 3/5] END algorithm=ball_tree, leaf_size=40, n_neighbors=6, p=1, weights=uniform;, score=0.881 total time=   0.0s
[CV 4/5] END algorithm=ball_tree, leaf_size=40, n_neighbors=6, p=1, weights=uniform;, score=0.833 total time=   0.0s
[CV 5/5] END algorithm=ball_tree, leaf_size=40, n_neighbors=6, p=1, weights=uniform;, score=0.815 total time=   0.0s
[CV 1/5] END algorithm=ball_tree, leaf_size=40, n_neighbors=6, p=1, weights=distance;, score=0.865 total time=   0.0s
[CV 2/5] END algorithm=ball_tree, leaf_size=40, n_neighbors=6, p=1, weights=distance;, score=0.836 total time=   0.0s
[CV 3/5] END algorithm=ball_tree, leaf_size=40, n_neighbors=6, p=1, weights=distance;, score=0.877 total time=   0.0s
[CV 4/5] END algorithm=ball_tree, leaf_size=40, n_neighbors=6, p=1, weights=distance;, score=0.832 total time=   0.0s
[CV 5/5] END algorithm=ball_tree, leaf_size=40, n_neighbors=6, p=1, weights=distance;, score=0.810 total time=   0.0s
[CV 1/5] END algorithm=ball_tree, leaf_size=40, n_neighbors

[CV 5/5] END algorithm=ball_tree, leaf_size=40, n_neighbors=13, p=1, weights=uniform;, score=0.828 total time=   0.0s
[CV 1/5] END algorithm=ball_tree, leaf_size=40, n_neighbors=13, p=1, weights=distance;, score=0.884 total time=   0.0s
[CV 2/5] END algorithm=ball_tree, leaf_size=40, n_neighbors=13, p=1, weights=distance;, score=0.837 total time=   0.0s
[CV 3/5] END algorithm=ball_tree, leaf_size=40, n_neighbors=13, p=1, weights=distance;, score=0.879 total time=   0.0s
[CV 4/5] END algorithm=ball_tree, leaf_size=40, n_neighbors=13, p=1, weights=distance;, score=0.834 total time=   0.0s
[CV 5/5] END algorithm=ball_tree, leaf_size=40, n_neighbors=13, p=1, weights=distance;, score=0.833 total time=   0.0s
[CV 1/5] END algorithm=ball_tree, leaf_size=40, n_neighbors=14, p=1, weights=uniform;, score=0.880 total time=   0.0s
[CV 2/5] END algorithm=ball_tree, leaf_size=40, n_neighbors=14, p=1, weights=uniform;, score=0.832 total time=   0.0s
[CV 3/5] END algorithm=ball_tree, leaf_size=40, n_n

[CV 1/5] END algorithm=kd_tree, leaf_size=12, n_neighbors=7, p=1, weights=distance;, score=0.866 total time=   0.0s
[CV 2/5] END algorithm=kd_tree, leaf_size=12, n_neighbors=7, p=1, weights=distance;, score=0.828 total time=   0.0s
[CV 3/5] END algorithm=kd_tree, leaf_size=12, n_neighbors=7, p=1, weights=distance;, score=0.877 total time=   0.0s
[CV 4/5] END algorithm=kd_tree, leaf_size=12, n_neighbors=7, p=1, weights=distance;, score=0.832 total time=   0.0s
[CV 5/5] END algorithm=kd_tree, leaf_size=12, n_neighbors=7, p=1, weights=distance;, score=0.814 total time=   0.0s
[CV 1/5] END algorithm=kd_tree, leaf_size=12, n_neighbors=8, p=1, weights=uniform;, score=0.865 total time=   0.0s
[CV 2/5] END algorithm=kd_tree, leaf_size=12, n_neighbors=8, p=1, weights=uniform;, score=0.827 total time=   0.0s
[CV 3/5] END algorithm=kd_tree, leaf_size=12, n_neighbors=8, p=1, weights=uniform;, score=0.884 total time=   0.0s
[CV 4/5] END algorithm=kd_tree, leaf_size=12, n_neighbors=8, p=1, weights=u

[CV 4/5] END algorithm=kd_tree, leaf_size=12, n_neighbors=14, p=1, weights=distance;, score=0.833 total time=   0.0s
[CV 5/5] END algorithm=kd_tree, leaf_size=12, n_neighbors=14, p=1, weights=distance;, score=0.829 total time=   0.0s
[CV 1/5] END algorithm=kd_tree, leaf_size=12, n_neighbors=15, p=1, weights=uniform;, score=0.880 total time=   0.0s
[CV 2/5] END algorithm=kd_tree, leaf_size=12, n_neighbors=15, p=1, weights=uniform;, score=0.832 total time=   0.0s
[CV 3/5] END algorithm=kd_tree, leaf_size=12, n_neighbors=15, p=1, weights=uniform;, score=0.883 total time=   0.0s
[CV 4/5] END algorithm=kd_tree, leaf_size=12, n_neighbors=15, p=1, weights=uniform;, score=0.833 total time=   0.0s
[CV 5/5] END algorithm=kd_tree, leaf_size=12, n_neighbors=15, p=1, weights=uniform;, score=0.838 total time=   0.0s
[CV 1/5] END algorithm=kd_tree, leaf_size=12, n_neighbors=15, p=1, weights=distance;, score=0.882 total time=   0.0s
[CV 2/5] END algorithm=kd_tree, leaf_size=12, n_neighbors=15, p=1, we

[CV 1/5] END algorithm=kd_tree, leaf_size=20, n_neighbors=9, p=1, weights=uniform;, score=0.870 total time=   0.0s
[CV 2/5] END algorithm=kd_tree, leaf_size=20, n_neighbors=9, p=1, weights=uniform;, score=0.827 total time=   0.0s
[CV 3/5] END algorithm=kd_tree, leaf_size=20, n_neighbors=9, p=1, weights=uniform;, score=0.881 total time=   0.0s
[CV 4/5] END algorithm=kd_tree, leaf_size=20, n_neighbors=9, p=1, weights=uniform;, score=0.840 total time=   0.0s
[CV 5/5] END algorithm=kd_tree, leaf_size=20, n_neighbors=9, p=1, weights=uniform;, score=0.821 total time=   0.0s
[CV 1/5] END algorithm=kd_tree, leaf_size=20, n_neighbors=9, p=1, weights=distance;, score=0.872 total time=   0.0s
[CV 2/5] END algorithm=kd_tree, leaf_size=20, n_neighbors=9, p=1, weights=distance;, score=0.828 total time=   0.0s
[CV 3/5] END algorithm=kd_tree, leaf_size=20, n_neighbors=9, p=1, weights=distance;, score=0.880 total time=   0.0s
[CV 4/5] END algorithm=kd_tree, leaf_size=20, n_neighbors=9, p=1, weights=dis

[CV 3/5] END algorithm=kd_tree, leaf_size=30, n_neighbors=3, p=1, weights=uniform;, score=0.867 total time=   0.0s
[CV 4/5] END algorithm=kd_tree, leaf_size=30, n_neighbors=3, p=1, weights=uniform;, score=0.825 total time=   0.0s
[CV 5/5] END algorithm=kd_tree, leaf_size=30, n_neighbors=3, p=1, weights=uniform;, score=0.801 total time=   0.0s
[CV 1/5] END algorithm=kd_tree, leaf_size=30, n_neighbors=3, p=1, weights=distance;, score=0.850 total time=   0.0s
[CV 2/5] END algorithm=kd_tree, leaf_size=30, n_neighbors=3, p=1, weights=distance;, score=0.827 total time=   0.0s
[CV 3/5] END algorithm=kd_tree, leaf_size=30, n_neighbors=3, p=1, weights=distance;, score=0.864 total time=   0.0s
[CV 4/5] END algorithm=kd_tree, leaf_size=30, n_neighbors=3, p=1, weights=distance;, score=0.829 total time=   0.0s
[CV 5/5] END algorithm=kd_tree, leaf_size=30, n_neighbors=3, p=1, weights=distance;, score=0.803 total time=   0.0s
[CV 1/5] END algorithm=kd_tree, leaf_size=30, n_neighbors=4, p=1, weights=u

[CV 4/5] END algorithm=kd_tree, leaf_size=30, n_neighbors=10, p=1, weights=uniform;, score=0.833 total time=   0.0s
[CV 5/5] END algorithm=kd_tree, leaf_size=30, n_neighbors=10, p=1, weights=uniform;, score=0.819 total time=   0.0s
[CV 1/5] END algorithm=kd_tree, leaf_size=30, n_neighbors=10, p=1, weights=distance;, score=0.872 total time=   0.0s
[CV 2/5] END algorithm=kd_tree, leaf_size=30, n_neighbors=10, p=1, weights=distance;, score=0.833 total time=   0.0s
[CV 3/5] END algorithm=kd_tree, leaf_size=30, n_neighbors=10, p=1, weights=distance;, score=0.880 total time=   0.0s
[CV 4/5] END algorithm=kd_tree, leaf_size=30, n_neighbors=10, p=1, weights=distance;, score=0.840 total time=   0.0s
[CV 5/5] END algorithm=kd_tree, leaf_size=30, n_neighbors=10, p=1, weights=distance;, score=0.814 total time=   0.0s
[CV 1/5] END algorithm=kd_tree, leaf_size=30, n_neighbors=11, p=1, weights=uniform;, score=0.874 total time=   0.0s
[CV 2/5] END algorithm=kd_tree, leaf_size=30, n_neighbors=11, p=1, 

[CV 5/5] END algorithm=kd_tree, leaf_size=40, n_neighbors=4, p=1, weights=uniform;, score=0.811 total time=   0.0s
[CV 1/5] END algorithm=kd_tree, leaf_size=40, n_neighbors=4, p=1, weights=distance;, score=0.848 total time=   0.0s
[CV 2/5] END algorithm=kd_tree, leaf_size=40, n_neighbors=4, p=1, weights=distance;, score=0.828 total time=   0.0s
[CV 3/5] END algorithm=kd_tree, leaf_size=40, n_neighbors=4, p=1, weights=distance;, score=0.864 total time=   0.0s
[CV 4/5] END algorithm=kd_tree, leaf_size=40, n_neighbors=4, p=1, weights=distance;, score=0.834 total time=   0.0s
[CV 5/5] END algorithm=kd_tree, leaf_size=40, n_neighbors=4, p=1, weights=distance;, score=0.798 total time=   0.0s
[CV 1/5] END algorithm=kd_tree, leaf_size=40, n_neighbors=5, p=1, weights=uniform;, score=0.858 total time=   0.0s
[CV 2/5] END algorithm=kd_tree, leaf_size=40, n_neighbors=5, p=1, weights=uniform;, score=0.837 total time=   0.0s
[CV 3/5] END algorithm=kd_tree, leaf_size=40, n_neighbors=5, p=1, weights=u

[CV 3/5] END algorithm=kd_tree, leaf_size=40, n_neighbors=11, p=1, weights=distance;, score=0.881 total time=   0.0s
[CV 4/5] END algorithm=kd_tree, leaf_size=40, n_neighbors=11, p=1, weights=distance;, score=0.836 total time=   0.0s
[CV 5/5] END algorithm=kd_tree, leaf_size=40, n_neighbors=11, p=1, weights=distance;, score=0.825 total time=   0.0s
[CV 1/5] END algorithm=kd_tree, leaf_size=40, n_neighbors=12, p=1, weights=uniform;, score=0.875 total time=   0.0s
[CV 2/5] END algorithm=kd_tree, leaf_size=40, n_neighbors=12, p=1, weights=uniform;, score=0.828 total time=   0.0s
[CV 3/5] END algorithm=kd_tree, leaf_size=40, n_neighbors=12, p=1, weights=uniform;, score=0.885 total time=   0.0s
[CV 4/5] END algorithm=kd_tree, leaf_size=40, n_neighbors=12, p=1, weights=uniform;, score=0.824 total time=   0.0s
[CV 5/5] END algorithm=kd_tree, leaf_size=40, n_neighbors=12, p=1, weights=uniform;, score=0.824 total time=   0.0s
[CV 1/5] END algorithm=kd_tree, leaf_size=40, n_neighbors=12, p=1, we

[CV 5/5] END algorithm=brute, leaf_size=12, n_neighbors=5, p=1, weights=distance;, score=0.808 total time=   0.0s
[CV 1/5] END algorithm=brute, leaf_size=12, n_neighbors=6, p=1, weights=uniform;, score=0.871 total time=   0.0s
[CV 2/5] END algorithm=brute, leaf_size=12, n_neighbors=6, p=1, weights=uniform;, score=0.831 total time=   0.0s
[CV 3/5] END algorithm=brute, leaf_size=12, n_neighbors=6, p=1, weights=uniform;, score=0.883 total time=   0.0s
[CV 4/5] END algorithm=brute, leaf_size=12, n_neighbors=6, p=1, weights=uniform;, score=0.831 total time=   0.0s
[CV 5/5] END algorithm=brute, leaf_size=12, n_neighbors=6, p=1, weights=uniform;, score=0.816 total time=   0.0s
[CV 1/5] END algorithm=brute, leaf_size=12, n_neighbors=6, p=1, weights=distance;, score=0.865 total time=   0.0s
[CV 2/5] END algorithm=brute, leaf_size=12, n_neighbors=6, p=1, weights=distance;, score=0.836 total time=   0.0s
[CV 3/5] END algorithm=brute, leaf_size=12, n_neighbors=6, p=1, weights=distance;, score=0.87

[CV 4/5] END algorithm=brute, leaf_size=12, n_neighbors=13, p=1, weights=uniform;, score=0.831 total time=   0.0s
[CV 5/5] END algorithm=brute, leaf_size=12, n_neighbors=13, p=1, weights=uniform;, score=0.827 total time=   0.0s
[CV 1/5] END algorithm=brute, leaf_size=12, n_neighbors=13, p=1, weights=distance;, score=0.880 total time=   0.0s
[CV 2/5] END algorithm=brute, leaf_size=12, n_neighbors=13, p=1, weights=distance;, score=0.833 total time=   0.0s
[CV 3/5] END algorithm=brute, leaf_size=12, n_neighbors=13, p=1, weights=distance;, score=0.879 total time=   0.0s
[CV 4/5] END algorithm=brute, leaf_size=12, n_neighbors=13, p=1, weights=distance;, score=0.833 total time=   0.0s
[CV 5/5] END algorithm=brute, leaf_size=12, n_neighbors=13, p=1, weights=distance;, score=0.833 total time=   0.0s
[CV 1/5] END algorithm=brute, leaf_size=12, n_neighbors=14, p=1, weights=uniform;, score=0.878 total time=   0.0s
[CV 2/5] END algorithm=brute, leaf_size=12, n_neighbors=14, p=1, weights=uniform;, 

[CV 1/5] END algorithm=brute, leaf_size=20, n_neighbors=7, p=1, weights=distance;, score=0.865 total time=   0.0s
[CV 2/5] END algorithm=brute, leaf_size=20, n_neighbors=7, p=1, weights=distance;, score=0.828 total time=   0.0s
[CV 3/5] END algorithm=brute, leaf_size=20, n_neighbors=7, p=1, weights=distance;, score=0.877 total time=   0.0s
[CV 4/5] END algorithm=brute, leaf_size=20, n_neighbors=7, p=1, weights=distance;, score=0.832 total time=   0.0s
[CV 5/5] END algorithm=brute, leaf_size=20, n_neighbors=7, p=1, weights=distance;, score=0.814 total time=   0.0s
[CV 1/5] END algorithm=brute, leaf_size=20, n_neighbors=8, p=1, weights=uniform;, score=0.863 total time=   0.0s
[CV 2/5] END algorithm=brute, leaf_size=20, n_neighbors=8, p=1, weights=uniform;, score=0.827 total time=   0.0s
[CV 3/5] END algorithm=brute, leaf_size=20, n_neighbors=8, p=1, weights=uniform;, score=0.884 total time=   0.0s
[CV 4/5] END algorithm=brute, leaf_size=20, n_neighbors=8, p=1, weights=uniform;, score=0.8

[CV 4/5] END algorithm=brute, leaf_size=20, n_neighbors=14, p=1, weights=distance;, score=0.833 total time=   0.0s
[CV 5/5] END algorithm=brute, leaf_size=20, n_neighbors=14, p=1, weights=distance;, score=0.831 total time=   0.0s
[CV 1/5] END algorithm=brute, leaf_size=20, n_neighbors=15, p=1, weights=uniform;, score=0.880 total time=   0.0s
[CV 2/5] END algorithm=brute, leaf_size=20, n_neighbors=15, p=1, weights=uniform;, score=0.834 total time=   0.0s
[CV 3/5] END algorithm=brute, leaf_size=20, n_neighbors=15, p=1, weights=uniform;, score=0.881 total time=   0.0s
[CV 4/5] END algorithm=brute, leaf_size=20, n_neighbors=15, p=1, weights=uniform;, score=0.833 total time=   0.0s
[CV 5/5] END algorithm=brute, leaf_size=20, n_neighbors=15, p=1, weights=uniform;, score=0.838 total time=   0.0s
[CV 1/5] END algorithm=brute, leaf_size=20, n_neighbors=15, p=1, weights=distance;, score=0.882 total time=   0.0s
[CV 2/5] END algorithm=brute, leaf_size=20, n_neighbors=15, p=1, weights=distance;, s

[CV 3/5] END algorithm=brute, leaf_size=30, n_neighbors=9, p=1, weights=uniform;, score=0.881 total time=   0.0s
[CV 4/5] END algorithm=brute, leaf_size=30, n_neighbors=9, p=1, weights=uniform;, score=0.841 total time=   0.0s
[CV 5/5] END algorithm=brute, leaf_size=30, n_neighbors=9, p=1, weights=uniform;, score=0.819 total time=   0.0s
[CV 1/5] END algorithm=brute, leaf_size=30, n_neighbors=9, p=1, weights=distance;, score=0.872 total time=   0.0s
[CV 2/5] END algorithm=brute, leaf_size=30, n_neighbors=9, p=1, weights=distance;, score=0.828 total time=   0.0s
[CV 3/5] END algorithm=brute, leaf_size=30, n_neighbors=9, p=1, weights=distance;, score=0.880 total time=   0.0s
[CV 4/5] END algorithm=brute, leaf_size=30, n_neighbors=9, p=1, weights=distance;, score=0.844 total time=   0.0s
[CV 5/5] END algorithm=brute, leaf_size=30, n_neighbors=9, p=1, weights=distance;, score=0.824 total time=   0.0s
[CV 1/5] END algorithm=brute, leaf_size=30, n_neighbors=10, p=1, weights=uniform;, score=0.

[CV 1/5] END algorithm=brute, leaf_size=40, n_neighbors=3, p=1, weights=distance;, score=0.850 total time=   0.0s
[CV 2/5] END algorithm=brute, leaf_size=40, n_neighbors=3, p=1, weights=distance;, score=0.828 total time=   0.0s
[CV 3/5] END algorithm=brute, leaf_size=40, n_neighbors=3, p=1, weights=distance;, score=0.863 total time=   0.0s
[CV 4/5] END algorithm=brute, leaf_size=40, n_neighbors=3, p=1, weights=distance;, score=0.832 total time=   0.0s
[CV 5/5] END algorithm=brute, leaf_size=40, n_neighbors=3, p=1, weights=distance;, score=0.806 total time=   0.0s
[CV 1/5] END algorithm=brute, leaf_size=40, n_neighbors=4, p=1, weights=uniform;, score=0.854 total time=   0.0s
[CV 2/5] END algorithm=brute, leaf_size=40, n_neighbors=4, p=1, weights=uniform;, score=0.838 total time=   0.0s
[CV 3/5] END algorithm=brute, leaf_size=40, n_neighbors=4, p=1, weights=uniform;, score=0.880 total time=   0.0s
[CV 4/5] END algorithm=brute, leaf_size=40, n_neighbors=4, p=1, weights=uniform;, score=0.8

[CV 1/5] END algorithm=brute, leaf_size=40, n_neighbors=11, p=1, weights=uniform;, score=0.872 total time=   0.0s
[CV 2/5] END algorithm=brute, leaf_size=40, n_neighbors=11, p=1, weights=uniform;, score=0.829 total time=   0.0s
[CV 3/5] END algorithm=brute, leaf_size=40, n_neighbors=11, p=1, weights=uniform;, score=0.883 total time=   0.0s
[CV 4/5] END algorithm=brute, leaf_size=40, n_neighbors=11, p=1, weights=uniform;, score=0.834 total time=   0.0s
[CV 5/5] END algorithm=brute, leaf_size=40, n_neighbors=11, p=1, weights=uniform;, score=0.827 total time=   0.0s
[CV 1/5] END algorithm=brute, leaf_size=40, n_neighbors=11, p=1, weights=distance;, score=0.874 total time=   0.0s
[CV 2/5] END algorithm=brute, leaf_size=40, n_neighbors=11, p=1, weights=distance;, score=0.831 total time=   0.0s
[CV 3/5] END algorithm=brute, leaf_size=40, n_neighbors=11, p=1, weights=distance;, score=0.880 total time=   0.0s
[CV 4/5] END algorithm=brute, leaf_size=40, n_neighbors=11, p=1, weights=distance;, s

Best parameter by grid search


In [51]:
grid.best_params_


{'algorithm': 'auto',
 'leaf_size': 12,
 'n_neighbors': 15,
 'p': 1,
 'weights': 'uniform'}

{'algorithm': 'ball_tree',
 'leaf_size': 12,
 'n_neighbors': 15,
 'p': 1,
 'weights': 'uniform'}
 The best score is 0.8645687645687646.

In [52]:
from sklearn.model_selection import RandomizedSearchCV
param_gridR={'n_neighbors' : [3,4,5,6,7,8,9,10,11,12,13,14,15],'weights': ['uniform', 'distance'],'algorithm' : ['auto', 'ball_tree','kd_tree', 'brute']}
gridr = RandomizedSearchCV(neigh,param_grid, refit = True,verbose=3)
gridr=gridr.fit(X_train,y_train)

Fitting 5 folds for each of 10 candidates, totalling 50 fits
[CV 1/5] END algorithm=kd_tree, leaf_size=20, n_neighbors=13, p=1, weights=distance;, score=0.884 total time=   0.1s
[CV 2/5] END algorithm=kd_tree, leaf_size=20, n_neighbors=13, p=1, weights=distance;, score=0.836 total time=   0.0s
[CV 3/5] END algorithm=kd_tree, leaf_size=20, n_neighbors=13, p=1, weights=distance;, score=0.880 total time=   0.0s
[CV 4/5] END algorithm=kd_tree, leaf_size=20, n_neighbors=13, p=1, weights=distance;, score=0.832 total time=   0.0s
[CV 5/5] END algorithm=kd_tree, leaf_size=20, n_neighbors=13, p=1, weights=distance;, score=0.831 total time=   0.0s
[CV 1/5] END algorithm=kd_tree, leaf_size=20, n_neighbors=8, p=1, weights=uniform;, score=0.865 total time=   0.0s
[CV 2/5] END algorithm=kd_tree, leaf_size=20, n_neighbors=8, p=1, weights=uniform;, score=0.827 total time=   0.0s
[CV 3/5] END algorithm=kd_tree, leaf_size=20, n_neighbors=8, p=1, weights=uniform;, score=0.884 total time=   0.0s
[CV 4/5] 

best parameter by random search 

In [53]:
gridr.best_params_

{'weights': 'distance',
 'p': 1,
 'n_neighbors': 13,
 'leaf_size': 20,
 'algorithm': 'kd_tree'}

{'weights': 'uniform',
 'p': 1,
 'n_neighbors': 15,
 'leaf_size': 30,
 'algorithm': 'auto'}
 the best score is 0.8627039627039628

####    K-Fold Validation
K-fold cross-validation is a technique used in machine learning to evaluate the performance of a model by dividing the available data into K subsets or folds. In this technique, the model is trained and evaluated K times, with each of the K folds used as a testing set exactly once, and the remaining K-1 folds used as the training data. The performance of the model is then calculated by averaging the evaluation results from each of the K iterations. K-fold cross-validation helps to mitigate the issue of overfitting by using all available data for both training and testing, and provides a more accurate estimate of the model's performance on unseen data.

The value of nsplits is set to 10,which determines the number of splits
shuffle is set to false 
random_state=none

In [55]:
neigh2 = KNeighborsClassifier()

In [56]:
for train_index,test_index in kf.split(X):
    X_train,X_test,y_train,y_test=X.iloc[train_index],X.iloc[test_index],y.iloc[train_index],y.iloc[test_index]
    neigh2.fit(X_train,y_train)
    y_pred=neigh2.predict(X_test)
    scores.append(accuracy_score(y_test,y_pred))

np.mean(scores)

0.8096976016684044

0.8826728826728827

### accuracy scores, ,matrix

In [58]:
print(classification_report(y_test,y_pred))

              precision    recall  f1-score   support

           1       0.96      0.99      0.97       547
           2       1.00      0.74      0.85        23
           3       0.90      0.93      0.91       421
           4       0.69      0.67      0.68       219
           5       0.91      0.85      0.88       238
           7       0.87      0.87      0.87       470

    accuracy                           0.89      1918
   macro avg       0.89      0.84      0.86      1918
weighted avg       0.89      0.89      0.89      1918



              precision    recall  f1-score   support

           1       0.95      0.99      0.97       556
           2       0.97      0.97      0.97       224
           3       0.89      0.93      0.91       422
           4       0.69      0.68      0.68       219
           5       0.92      0.79      0.85       254
           7       0.87      0.86      0.86       470

    accuracy                           0.89      2145
   macro avg       0.88      0.87      0.87      2145
weighted avg       0.89      0.89      0.89      2145

In [60]:
print(confusion_matrix(y_test,y_pred))

[[540   0   3   2   2   0]
 [  0  17   1   1   2   2]
 [  2   0 390  24   0   5]
 [  4   0  28 147   2  38]
 [ 16   0   0   3 202  17]
 [  0   0  11  36  14 409]]


# 5) Support Vector machine

Support Vector Machines (SVM) is a type of machine learning algorithm where we find the hyperplane that best separates the data points of different classes in a high-dimensional space. The hyperplane with the maximum margin from the nearest data points of different classes is selected as the optimal hyperplane.
If the data is not linearly separable, SVM uses a kernel function to transform the data into a higher-dimensional space where it can be linearly separable.



#### Hyper-Parameter Tuning 

#####    Hyperparameter of SVM:
C: controls the trade-off between maximizing the margin and minimizing the classification error.<br>
gamma : affects the smoothness of the decision boundary.<br>Smaller values gives a smooth boundary.
kernel : The kernel function determines the shape of the decision boundary.<br>
<br>
So, we will tune all these hyper parameters using Grid Search and Random Search. Our search space for Random Search and grid Search�is:<br>
  
  
C : [0.1, 1, 10, 100, 1000]<br>
gamma: [1, 0.1, 0.01, 0.001, 0.0001]<br>
kernel : ['rbf','poly','sigmoid','linear']<br>




In [None]:
# Grid Search CV

param_grid = {'C' : [0.1, 1, 10, 100, 1000],
              'gamma':[1, 0.1, 0.01, 0.001, 0.0001],
              'kernel':['rbf','poly','sigmoid','linear']}

grid = GridSearchCV(SVC(),param_grid, refit = True,verbose=3)

grid.fit(X_train,y_train)

In [None]:
best_param = grid.best_params_

{'C': 10, 'gamma': 0.1, 'kernel': 'rbf'} is the result for the best parameters given by Grid search CV which had a score of 0.908

In [None]:
# random Search CV

param_grid = {'C' : [0.1, 1, 10, 100, 1000],
          'gamma':[1, 0.1, 0.01, 0.001, 0.0001],
          'kernel':['rbf','poly','sigmoid','linear']}

randgrid = RandomizedSearchCV(SVC(),param_grid, refit = True,verbose=3)

randgrid.fit(X_train,y_train)

{'kernel': 'poly', 'gamma': 0.1, 'C': 1} is the result for the best parameters given by random search which had a score of 0.885

####    K-Fold Validation
K-fold cross-validation is a technique used in machine learning to evaluate the performance of a model by dividing the available data into K subsets or folds. In this technique, the model is trained and evaluated K times, with each of the K folds used as a testing set exactly once, and the remaining K-1 folds used as the training data. The performance of the model is then calculated by averaging the evaluation results from each of the K iterations. K-fold cross-validation helps to mitigate the issue of overfitting by using all available data for both training and testing, and provides a more accurate estimate of the model's performance on unseen data.

The value of nsplits is set to 10,which determines the number of splits
shuffle is set to false 
random_state=none 

In [None]:
scores=[]
cross = []
svm = SVC(C = best_param['C'], gamma = best_param['gamma'], kernel = best_param['kernel'])
kf = KFold(n_splits=10,random_state=None,shuffle=False)
for train_index,test_index in kf.split(X):
    X_train,X_test,y_train,y_test=X.iloc[train_index],X.iloc[test_index],y.iloc[train_index],y.iloc[test_index]
    svm.fit(X_train,y_train)
    y_pred=svm.predict(X_test)
    scores.append(accuracy_score(y_test,y_pred))
    cross.append(svm.score(X_test,y_test))
    
np.mean(scores)
np.mean(cross)

0.8726313405797101 is the mean accuracy_score of the model. 
0.8726313405797101 is the neam cross_val_score of the model. 
 precision    recall  f1-score   support

           1       1.00      0.77      0.87      1527
           2       1.00      1.00      1.00        77
           3       1.00      0.93      0.96      1354
           4       1.00      0.98      0.99       623
           5       1.00      0.91      0.95       667
           7       0.75      1.00      0.85      1506

    accuracy                           0.91      5754
   macro avg       0.96      0.93      0.94      5754
weighted avg       0.93      0.91      0.91      5754


# 6) Artificial neural network

Model consists of multiple layers of neurons, including input, hidden, and output layers. The input layer receives the data, the hidden layers process it, and the output layer produces the final result. Each neuron in the hidden layers applies a mathematical function to its inputs, and the output of each neuron is fed into the next layer until the final output is produced.

#####    Hyperparameter of ANN:
num_layers: the number of hidden layers in the neural network <br>
dense : number of neurons in each hidden layer.<br>
Learning rate : a hyper-parameter that controls the weights of our neural network with respect to the loss gradient<br>
<br>
So, we will tune all these hyper parameters using Grid Search and Random Search. Our search space for Random Search and�grid�Search�is:<br>
  
  
num_layers : [2,75]<br>
dense: [32,512] in steps of 32<br>
learning rate : 1e-2, 1e-3, 1e-4 <br>




In [None]:
def build_model(hp):
    model = keras.Sequential()
    for i in range(hp.Int('num_layers', 2, 75)):
        model.add(layers.Dense(units=hp.Int('units_' + str(i),
                                            min_value=32,
                                            max_value=512,
                                            step=32),
                               activation='relu'))
    model.add(layers.Dense(1, activation='softmax'))
    model.compile(
        optimizer=keras.optimizers.Adam(
            hp.Choice('learning_rate', [1e-2, 1e-3, 1e-4])),
        loss=keras.losses.CategoricalCrossentropy(),
        metrics=['accuracy'])
    return model

In [None]:
tuner = RandomSearch(
    build_model,
    objective='accuracy',
    max_trials=5,
    overwrite=True,
    executions_per_trial=3,
    directory='project',
    project_name='Statlog')

In [None]:
tuner.search(X_train, y_train,
             epochs=20,
             validation_data=(X_test, y_test))

In [None]:
nnclassifier  = MLPClassifier(solver='lbfgs',hidden_layer_sizes=(512,352,224,128,448,256,96,128,416,96,96,416,480,128,320,480,480,192,160,128,512,96,448,480,256,64,256,256,416,96,64,128,480),alpha=0.0001, random_state=1)
nnclassifier.max_iter=1000
nnclassifier.fit(X_train,y_train)
y_predict = nnclassifier.predict(X_test)
print(classification_report(y_test,y_predict))

#####    K-fold validation:


In [None]:
scores=[]
cross = []
nn = MLPClassifier()
kf = KFold(n_splits=10,random_state=None,shuffle=False)
for train_index,test_index in kf.split(X):
    X_train,X_test,y_train,y_test=X.iloc[train_index],X.iloc[test_index],y.iloc[train_index],y.iloc[test_index]
    nn.fit(X_train,y_train)
    y_pred=nn.predict(X_test)
    scores.append(accuracy_score(y_test,y_pred))
    cross.append(svm.score(X_test,y_test))
    
np.mean(scores)

0.7805989973242661 is the mean accuracy score.
0.7731453396829691 is the mean cross_val_score.

In [None]:
y_pred= nn.predict(X_test)
print(classification_report(y_test,y_pred))
print(confusion_matrix(y_test,y_pred))

    precision    recall  f1-score   support

           1       0.96      0.38      0.54       455
           2       0.64      0.08      0.14       199
           3       0.00      0.00      0.00       399
           4       0.00      0.00      0.00       196
           5       0.23      0.39      0.29       231
           7       0.32      0.91      0.47       451

    accuracy                           0.36      1931
   macro avg       0.36      0.29      0.24      1931
weighted avg       0.39      0.36      0.29      1931
[[172   9   0  34  95 145]
 [  0  16   0   2 154  27]
 [  0   0   0   0  11 388]
 [  0   0   0   0   3 193]
 [  6   0   0   0  90 135]
 [  1   0   0   0  40 410]]

