# Problem Statement

Dataset: Social advertisement data for a product. It contains users and their data (such as gender, age, and salary) and whether they purchased the product or not.



1. What type of problem is this?
Classification, because we have to predict whether the user will purchase the product or not. There are two classes: purchased or not purchased.

The target column consists of two discrete values.


In [47]:
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler

In [65]:
filename = "social_advertising.csv"

df = pd.read_csv(filename)
df.shape


(400, 5)

In [66]:
df.head()

Unnamed: 0,User ID,Gender,Age,EstimatedSalary,Purchased
0,15624510,Male,19,19000,0
1,15810944,Male,35,20000,0
2,15668575,Female,26,43000,0
3,15603246,Female,27,57000,0
4,15804002,Male,19,76000,0


In [67]:
df.isnull().sum() # count the missing (NaN) values in each column of the DataFrame

User ID            0
Gender             0
Age                0
EstimatedSalary    0
Purchased          0
dtype: int64

In [68]:
df.dtypes

User ID             int64
Gender             object
Age                 int64
EstimatedSalary     int64
Purchased           int64
dtype: object

In [69]:
# the target column is Purchased
# check which values the target takes in the dataset

print(f'Unique values are {df["Purchased"].unique()}')
print(f'Total unique value is {df["Purchased"].nunique()}') # so only two categories are there

Unique values are [0 1]
Total unique value is 2


In [70]:
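# Classical machine learning models need numeric features, so string values must be converted into numbers.
# Deep learning text classifiers accept text as input, but that is a different kind of problem.

# Assign the encoded column back to df. Calling replace(..., inplace=True) on a selected column
# may not modify df itself in recent pandas versions.
df["Gender"] = df["Gender"].map({"Male": 1, "Female": 0})
df.head()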

Unnamed: 0,User ID,Gender,Age,EstimatedSalary,Purchased
0,15624510,1,19,19000,0
1,15810944,1,35,20000,0
2,15668575,0,26,43000,0
3,15603246,0,27,57000,0
4,15804002,1,19,76000,0


In [71]:
# now the features are 
x = df[["Gender", "Age", "EstimatedSalary"]]

# and the target
y = df["Purchased"]

In [72]:
# now splitting the data into train and test sets

# test_size: fraction of the data held out for testing the model (0.2 = 20%)
# random_state: seed for the shuffle, so the same split is produced on every run
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=32)
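
As a small illustration of what `test_size` and `random_state` do, the sketch below splits a synthetic stand-in dataset twice with the same seed and checks that both splits are identical. The toy arrays, the class counts, and the `stratify` argument are assumptions added for illustration; they are not part of the original notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data: 400 rows, 3 features, binary target.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(400, 3))
y_demo = np.array([0] * 260 + [1] * 140)  # hypothetical class counts

def split(seed):
    # stratify keeps the class ratio (roughly) equal in train and test; it is optional.
    return train_test_split(
        X_demo, y_demo, test_size=0.2, random_state=seed, stratify=y_demo
    )

a = split(32)
b = split(32)

print(a[0].shape, a[1].shape)                            # train and test feature shapes
print(all(np.array_equal(p, q) for p, q in zip(a, b)))   # same seed gives the same split
print(a[3].mean(), y_demo.mean())                        # class ratio in test vs full data
```

With `test_size=0.2`, 80 of the 400 rows go to the test set and 320 to the train set.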


In [56]:
# Create the model
knn_model = KNeighborsClassifier(n_neighbors = 3)

# Train the model on the training data
knn_model.fit(x_train, y_train)

# Predict
y_pred = knn_model.predict(x_test)

# Checking accuracy
accuracy = accuracy_score(y_test, y_pred)

print(accuracy)

0.825


In [75]:
for k in range(1, 20, 2):   # odd values of k: 1, 3, ..., 19

    # Create the model
    knn_model = KNeighborsClassifier(n_neighbors = k)

    # Train the model on the training data
    knn_model.fit(x_train, y_train)

    # Predict
    y_pred = knn_model.predict(x_test)

    # Checking accuracy
    accuracy = accuracy_score(y_test, y_pred)

    print(f'For k={k}, \t accuracy is {accuracy}')

For k=1, 	 accuracy is 0.7875
For k=3, 	 accuracy is 0.825
For k=5, 	 accuracy is 0.7875
For k=7, 	 accuracy is 0.775
For k=9, 	 accuracy is 0.825
For k=11, 	 accuracy is 0.8125
For k=13, 	 accuracy is 0.825
For k=15, 	 accuracy is 0.8
For k=17, 	 accuracy is 0.8125
For k=19, 	 accuracy is 0.8125


Now we can see that the features are not on the same scale: Gender and Age have much smaller magnitudes than EstimatedSalary, so the distances KNN computes are dominated by EstimatedSalary.

So the data needs scaling.
ML models often perform better when the features are on a similar scale.
KNN, SVM, neural networks, PCA, and LDA are typically used with normalised data.
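
To make the scale problem concrete, here is a minimal sketch that computes the Euclidean distance (the default metric of `KNeighborsClassifier`) between two users from the `df.head()` output, using only Age and EstimatedSalary. The variable names are illustrative.

```python
import numpy as np

# Two users taken from the df.head() output: (Age, EstimatedSalary).
user_a = np.array([19.0, 19000.0])
user_b = np.array([35.0, 20000.0])

squared_diff = (user_a - user_b) ** 2            # per-feature squared difference
distance = np.sqrt(squared_diff.sum())           # Euclidean distance
salary_share = squared_diff[1] / squared_diff.sum()

print(f"distance = {distance:.2f}")
print(f"share of squared distance coming from salary = {salary_share:.4%}")
```

The 16-year age gap contributes well under 0.1% of the squared distance, so on unscaled data the neighbours are chosen almost entirely by salary.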
 
Different scaling techniques are available:
1. [Standard Scaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html)
1. [Min Max Scaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html) 
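
As a rough sketch of what these two scalers compute, the formulas can be written by hand with NumPy. The values are the EstimatedSalary entries from the `df.head()` output. This is an illustration of the formulas, not the scikit-learn implementation itself.

```python
import numpy as np

# EstimatedSalary values from the df.head() output.
salary = np.array([19000.0, 20000.0, 43000.0, 57000.0, 76000.0])

# Standard scaling: subtract the mean, divide by the population standard deviation.
standardized = (salary - salary.mean()) / salary.std()

# Min-max scaling: map the smallest value to 0 and the largest to 1.
min_max = (salary - salary.min()) / (salary.max() - salary.min())

print(standardized.round(3))
print(min_max.round(3))
```

Standard scaling gives a column with mean 0 and standard deviation 1, while min-max scaling squeezes the column into the range [0, 1].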





In [60]:
scalar = StandardScaler()
x_train = scalar.fit_transform(x_train)       # fit -> learn each column's mean and std from the train data; transform -> scale it
x_test = scalar.transform(x_test)             # reuse the statistics learned from the train data; do not fit again on the test data
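
The difference between `fit_transform` and `transform` can be checked on a toy example. In this sketch the arrays are hypothetical one-column data; `mean_` and `scale_` are the attributes where `StandardScaler` stores the statistics learned during `fit`.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy train/test arrays (hypothetical values, one feature column).
train = np.array([[10.0], [20.0], [30.0]])
test = np.array([[20.0], [40.0]])

scaler = StandardScaler()
train_scaled = scaler.fit_transform(train)   # learns mean_ and scale_ from train only
test_scaled = scaler.transform(test)         # reuses the train statistics

print(scaler.mean_, scaler.scale_)

# The test rows are scaled with the train mean and std, so they need not have mean 0.
manual = (test - train.mean(axis=0)) / train.std(axis=0)
print(np.allclose(test_scaled, manual))
```

Fitting the scaler on the train data only keeps information about the test rows out of the preprocessing step.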


In [61]:
x_train

array([[-1.00000000e+00,  1.60494768e-01,  8.81554170e-02],
       [-1.00000000e+00,  1.43708796e+00, -9.52245711e-01],
       [-1.00000000e+00,  1.92808533e+00,  1.54471700e+00],
       [ 1.00000000e+00, -1.21429789e+00,  2.66509896e-01],
       [ 1.00000000e+00,  1.14248953e+00,  1.17881163e-01],
       [-1.00000000e+00,  1.63348691e+00,  1.12855654e+00],
       [ 1.00000000e+00, -7.23300515e-01,  1.09883080e+00],
       [-1.00000000e+00,  6.51492148e-01,  2.05005469e+00],
       [ 1.00000000e+00,  2.22268376e+00, -1.07114870e+00],
       [-1.00000000e+00,  8.47891100e-01, -1.13060019e+00],
       [ 1.00000000e+00,  1.14248953e+00, -1.24950318e+00],
       [ 1.00000000e+00, -2.32303135e-01, -1.48730915e+00],
       [ 1.00000000e+00, -1.70529527e+00,  4.74590122e-01],
       [ 1.00000000e+00, -1.50889632e+00, -1.54676064e+00],
       [-1.00000000e+00, -2.32303135e-01,  2.07058403e-01],
       [-1.00000000e+00, -8.21499991e-01,  3.85412882e-01],
       [-1.00000000e+00,  6.22952925e-02

In [62]:
# Now train the model again on the scaled data and check the accuracy
# Create the model
knn_model = KNeighborsClassifier(n_neighbors = 3)

# Train the model on the scaled training data
knn_model.fit(x_train, y_train)

# Predict
y_pred = knn_model.predict(x_test)

# Checking accuracy
accuracy = accuracy_score(y_test, y_pred)

print(accuracy) # much higher accuracy

0.9125


### Now we can create a pipeline
Sequentially apply a list of transforms and a final estimator.
https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html

In [73]:
from sklearn.pipeline import Pipeline

# here we are defining the steps for transforming the data and the estimator
pipeline = Pipeline([('scaler', StandardScaler()), ('knn', KNeighborsClassifier(n_neighbors = 3))])
pipeline

In [74]:
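# we don't need to scale the data before passing it to the pipeline; it applies the scaler itself

pipeline.fit(x_train, y_train)
y_pred = pipeline.predict(x_test)   # store the pipeline's predictions so the accuracy below is computed from them
accuracy = accuracy_score(y_test, y_pred)
print(accuracy)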

0.9125
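
One practical benefit of a pipeline is that it can be passed as a single estimator to cross-validation, where the scaler is refit on the training part of every fold. The sketch below assumes synthetic data from `make_classification` as a stand-in for the advertising dataset; `demo_pipeline` and the other names are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (not the advertising dataset).
X_demo, y_demo = make_classification(
    n_samples=400, n_features=3, n_informative=3, n_redundant=0, random_state=0
)

demo_pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("knn", KNeighborsClassifier(n_neighbors=3)),
])

# For each of the 5 folds, the scaler is fit on that fold's training part only.
scores = cross_val_score(demo_pipeline, X_demo, y_demo, cv=5, scoring="accuracy")
print(scores.round(3), scores.mean().round(3))
```

Averaging over several folds usually gives a steadier accuracy estimate than a single 80/20 split.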


In [99]:
## We can search for the best k value using Optuna
# !pip install optuna
# !pip install plotly
# !pip install cufflinks

import optuna # hyperparameter tuning framework

# https://optuna.readthedocs.io/en/stable/reference/index.html

In [100]:
df = pd.read_csv(filename)
df["Gender"] = df["Gender"].map({"Male": 1, "Female": 0})   # assign back instead of replace(..., inplace=True)

x = df[["Gender", "Age", "EstimatedSalary"]]
y = df["Purchased"]

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=32)
scalar = StandardScaler()
x_train = scalar.fit_transform(x_train)
x_test = scalar.transform(x_test)


In [101]:
def objective(trial):
    # Optuna proposes an integer k between 1 and 50 (inclusive) for each trial
    k = trial.suggest_int('k', 1, 50)
    model = KNeighborsClassifier(n_neighbors = k)
    model.fit(x_train, y_train)
    y_pred = model.predict(x_test)
    # Note: every trial is scored on the test split, so the best value is an optimistic estimate
    accuracy = accuracy_score(y_test, y_pred)
    return accuracy

study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=1000)
print(f'Best k = {study.best_params}')
print(f'accuracy = {study.best_value}')

[I 2023-12-29 21:33:27,533] A new study created in memory with name: no-name-9e70ca53-7372-4de0-a1f2-cbcc6fba07d7
[I 2023-12-29 21:33:27,549] Trial 0 finished with value: 0.8875 and parameters: {'k': 1}. Best is trial 0 with value: 0.8875.
[I 2023-12-29 21:33:27,555] Trial 1 finished with value: 0.85 and parameters: {'k': 40}. Best is trial 0 with value: 0.8875.
[I 2023-12-29 21:33:27,559] Trial 2 finished with value: 0.925 and parameters: {'k': 15}. Best is trial 2 with value: 0.925.
[I 2023-12-29 21:33:27,562] Trial 3 finished with value: 0.9 and parameters: {'k': 35}. Best is trial 2 with value: 0.925.
[I 2023-12-29 21:33:27,564] Trial 4 finished with value: 0.925 and parameters: {'k': 21}. Best is trial 2 with value: 0.925.
[I 2023-12-29 21:33:27,566] Trial 5 finished with value: 0.925 and parameters: {'k': 20}. Best is trial 2 with value: 0.925.
[I 2023-12-29 21:33:27,569] Trial 6 finished with value: 0.925 and parameters: {'k': 20}. Best is trial 2 with value: 0.925.
[I 2023-12-2

[I 2023-12-29 21:33:27,859] Trial 64 finished with value: 0.925 and parameters: {'k': 28}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:27,864] Trial 65 finished with value: 0.925 and parameters: {'k': 33}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:27,868] Trial 66 finished with value: 0.9375 and parameters: {'k': 29}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:27,873] Trial 67 finished with value: 0.9125 and parameters: {'k': 26}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:27,878] Trial 68 finished with value: 0.9 and parameters: {'k': 35}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:27,882] Trial 69 finished with value: 0.925 and parameters: {'k': 21}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:27,887] Trial 70 finished with value: 0.9375 and parameters: {'k': 32}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:27,891] Trial 71 finished with value: 0.9375 and parameters: {'k': 30}. Best is trial 8 wi

[I 2023-12-29 21:33:28,174] Trial 129 finished with value: 0.9 and parameters: {'k': 35}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:28,180] Trial 130 finished with value: 0.925 and parameters: {'k': 13}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:28,185] Trial 131 finished with value: 0.9375 and parameters: {'k': 29}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:28,190] Trial 132 finished with value: 0.9375 and parameters: {'k': 30}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:28,195] Trial 133 finished with value: 0.925 and parameters: {'k': 33}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:28,200] Trial 134 finished with value: 0.9375 and parameters: {'k': 31}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:28,205] Trial 135 finished with value: 0.9125 and parameters: {'k': 26}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:28,210] Trial 136 finished with value: 0.925 and parameters: {'k': 28}. Best is tr

[I 2023-12-29 21:33:28,516] Trial 193 finished with value: 0.9375 and parameters: {'k': 29}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:28,523] Trial 194 finished with value: 0.925 and parameters: {'k': 33}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:28,528] Trial 195 finished with value: 0.9375 and parameters: {'k': 30}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:28,535] Trial 196 finished with value: 0.9375 and parameters: {'k': 31}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:28,540] Trial 197 finished with value: 0.9375 and parameters: {'k': 32}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:28,548] Trial 198 finished with value: 0.925 and parameters: {'k': 28}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:28,555] Trial 199 finished with value: 0.9375 and parameters: {'k': 30}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:28,561] Trial 200 finished with value: 0.925 and parameters: {'k': 33}. Best is

[I 2023-12-29 21:33:28,906] Trial 257 finished with value: 0.9375 and parameters: {'k': 29}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:28,912] Trial 258 finished with value: 0.925 and parameters: {'k': 27}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:28,919] Trial 259 finished with value: 0.9375 and parameters: {'k': 32}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:28,925] Trial 260 finished with value: 0.9375 and parameters: {'k': 30}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:28,931] Trial 261 finished with value: 0.9375 and parameters: {'k': 32}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:28,937] Trial 262 finished with value: 0.9375 and parameters: {'k': 30}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:28,943] Trial 263 finished with value: 0.925 and parameters: {'k': 33}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:28,949] Trial 264 finished with value: 0.9375 and parameters: {'k': 29}. Best i

[I 2023-12-29 21:33:29,328] Trial 321 finished with value: 0.9375 and parameters: {'k': 31}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:29,335] Trial 322 finished with value: 0.925 and parameters: {'k': 33}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:29,342] Trial 323 finished with value: 0.9125 and parameters: {'k': 26}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:29,349] Trial 324 finished with value: 0.9375 and parameters: {'k': 29}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:29,356] Trial 325 finished with value: 0.9375 and parameters: {'k': 30}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:29,363] Trial 326 finished with value: 0.9375 and parameters: {'k': 32}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:29,369] Trial 327 finished with value: 0.8625 and parameters: {'k': 45}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:29,377] Trial 328 finished with value: 0.9375 and parameters: {'k': 31}. Best 

[I 2023-12-29 21:33:29,794] Trial 385 finished with value: 0.9375 and parameters: {'k': 31}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:29,801] Trial 386 finished with value: 0.9375 and parameters: {'k': 30}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:29,809] Trial 387 finished with value: 0.9375 and parameters: {'k': 32}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:29,817] Trial 388 finished with value: 0.925 and parameters: {'k': 33}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:29,824] Trial 389 finished with value: 0.9375 and parameters: {'k': 30}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:29,832] Trial 390 finished with value: 0.925 and parameters: {'k': 28}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:29,840] Trial 391 finished with value: 0.9375 and parameters: {'k': 31}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:29,848] Trial 392 finished with value: 0.9125 and parameters: {'k': 5}. Best is

[I 2023-12-29 21:33:30,297] Trial 449 finished with value: 0.9375 and parameters: {'k': 31}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:30,304] Trial 450 finished with value: 0.9375 and parameters: {'k': 30}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:30,313] Trial 451 finished with value: 0.925 and parameters: {'k': 33}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:30,321] Trial 452 finished with value: 0.9375 and parameters: {'k': 32}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:30,329] Trial 453 finished with value: 0.9375 and parameters: {'k': 31}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:30,337] Trial 454 finished with value: 0.925 and parameters: {'k': 28}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:30,346] Trial 455 finished with value: 0.9375 and parameters: {'k': 30}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:30,354] Trial 456 finished with value: 0.9375 and parameters: {'k': 29}. Best i

[I 2023-12-29 21:33:30,820] Trial 513 finished with value: 0.9375 and parameters: {'k': 30}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:30,828] Trial 514 finished with value: 0.925 and parameters: {'k': 17}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:30,836] Trial 515 finished with value: 0.9375 and parameters: {'k': 32}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:30,844] Trial 516 finished with value: 0.9375 and parameters: {'k': 29}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:30,853] Trial 517 finished with value: 0.9375 and parameters: {'k': 31}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:30,860] Trial 518 finished with value: 0.925 and parameters: {'k': 33}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:30,868] Trial 519 finished with value: 0.9 and parameters: {'k': 35}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:30,876] Trial 520 finished with value: 0.9375 and parameters: {'k': 30}. Best is t

[I 2023-12-29 21:33:31,345] Trial 577 finished with value: 0.9375 and parameters: {'k': 32}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:31,353] Trial 578 finished with value: 0.9375 and parameters: {'k': 31}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:31,361] Trial 579 finished with value: 0.925 and parameters: {'k': 28}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:31,369] Trial 580 finished with value: 0.925 and parameters: {'k': 33}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:31,378] Trial 581 finished with value: 0.9375 and parameters: {'k': 29}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:31,386] Trial 582 finished with value: 0.9375 and parameters: {'k': 31}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:31,394] Trial 583 finished with value: 0.9375 and parameters: {'k': 32}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:31,404] Trial 584 finished with value: 0.9375 and parameters: {'k': 30}. Best i

[I 2023-12-29 21:33:31,901] Trial 641 finished with value: 0.9375 and parameters: {'k': 30}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:31,910] Trial 642 finished with value: 0.9375 and parameters: {'k': 32}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:31,919] Trial 643 finished with value: 0.9 and parameters: {'k': 34}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:31,929] Trial 644 finished with value: 0.9375 and parameters: {'k': 29}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:31,937] Trial 645 finished with value: 0.9375 and parameters: {'k': 31}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:31,946] Trial 646 finished with value: 0.9375 and parameters: {'k': 29}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:31,955] Trial 647 finished with value: 0.9375 and parameters: {'k': 31}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:31,964] Trial 648 finished with value: 0.9375 and parameters: {'k': 30}. Best is

[I 2023-12-29 21:33:32,505] Trial 705 finished with value: 0.9375 and parameters: {'k': 32}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:32,515] Trial 706 finished with value: 0.925 and parameters: {'k': 28}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:32,524] Trial 707 finished with value: 0.9375 and parameters: {'k': 29}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:32,534] Trial 708 finished with value: 0.9375 and parameters: {'k': 31}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:32,543] Trial 709 finished with value: 0.9375 and parameters: {'k': 32}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:32,553] Trial 710 finished with value: 0.9375 and parameters: {'k': 30}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:32,562] Trial 711 finished with value: 0.9375 and parameters: {'k': 31}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:32,571] Trial 712 finished with value: 0.925 and parameters: {'k': 33}. Best i

[I 2023-12-29 21:33:33,119] Trial 769 finished with value: 0.9375 and parameters: {'k': 31}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:33,129] Trial 770 finished with value: 0.9375 and parameters: {'k': 30}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:33,139] Trial 771 finished with value: 0.9375 and parameters: {'k': 32}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:33,148] Trial 772 finished with value: 0.9375 and parameters: {'k': 30}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:33,158] Trial 773 finished with value: 0.925 and parameters: {'k': 28}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:33,168] Trial 774 finished with value: 0.8875 and parameters: {'k': 36}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:33,179] Trial 775 finished with value: 0.9375 and parameters: {'k': 32}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:33,188] Trial 776 finished with value: 0.9375 and parameters: {'k': 31}. Best 

[I 2023-12-29 21:33:33,762] Trial 833 finished with value: 0.9375 and parameters: {'k': 30}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:33,772] Trial 834 finished with value: 0.9375 and parameters: {'k': 31}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:33,782] Trial 835 finished with value: 0.9375 and parameters: {'k': 29}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:33,791] Trial 836 finished with value: 0.9375 and parameters: {'k': 32}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:33,801] Trial 837 finished with value: 0.925 and parameters: {'k': 33}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:33,811] Trial 838 finished with value: 0.925 and parameters: {'k': 28}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:33,821] Trial 839 finished with value: 0.9375 and parameters: {'k': 30}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:33,831] Trial 840 finished with value: 0.9375 and parameters: {'k': 31}. Best i

[I 2023-12-29 21:33:34,425] Trial 897 finished with value: 0.9375 and parameters: {'k': 31}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:34,436] Trial 898 finished with value: 0.9375 and parameters: {'k': 30}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:34,447] Trial 899 finished with value: 0.9375 and parameters: {'k': 32}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:34,457] Trial 900 finished with value: 0.9375 and parameters: {'k': 29}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:34,468] Trial 901 finished with value: 0.9125 and parameters: {'k': 26}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:34,478] Trial 902 finished with value: 0.925 and parameters: {'k': 33}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:34,488] Trial 903 finished with value: 0.9375 and parameters: {'k': 31}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:34,499] Trial 904 finished with value: 0.9375 and parameters: {'k': 30}. Best 

[I 2023-12-29 21:33:35,174] Trial 961 finished with value: 0.9375 and parameters: {'k': 29}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:35,186] Trial 962 finished with value: 0.925 and parameters: {'k': 33}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:35,198] Trial 963 finished with value: 0.9375 and parameters: {'k': 31}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:35,211] Trial 964 finished with value: 0.9375 and parameters: {'k': 30}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:35,222] Trial 965 finished with value: 0.9375 and parameters: {'k': 32}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:35,235] Trial 966 finished with value: 0.9375 and parameters: {'k': 29}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:35,246] Trial 967 finished with value: 0.9 and parameters: {'k': 35}. Best is trial 8 with value: 0.9375.
[I 2023-12-29 21:33:35,257] Trial 968 finished with value: 0.925 and parameters: {'k': 28}. Best is t

Best k = {'k': 31}
accuracy = 0.9375
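
The study above scores every candidate k on the test split, so the best accuracy it reports is likely optimistic. A possible alternative, sketched here under stated assumptions, is to pick k by cross-validation on the training rows only and to score the test rows once at the end. This sketch uses scikit-learn's `GridSearchCV` instead of Optuna (a swapped-in technique, since k has only 50 candidate values) and synthetic data from `make_classification`; all variable names are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (not the advertising dataset).
X_demo, y_demo = make_classification(
    n_samples=400, n_features=3, n_informative=3, n_redundant=0, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=32
)

search = GridSearchCV(
    Pipeline([("scaler", StandardScaler()), ("knn", KNeighborsClassifier())]),
    param_grid={"knn__n_neighbors": list(range(1, 51))},  # try every k from 1 to 50
    cv=5,
    scoring="accuracy",
)
search.fit(X_tr, y_tr)                      # only training rows are used to pick k

best_k = search.best_params_["knn__n_neighbors"]
test_accuracy = search.score(X_te, y_te)    # single final check on the held-out rows
print(best_k, round(test_accuracy, 4))
```

The same idea carries over to Optuna by returning a cross-validated score on the training data from the objective function.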


In [102]:
optuna.visualization.plot_slice(study)

ImportError: Tried to import 'plotly' but failed. Please make sure that the package is installed correctly to use this feature. Actual error: No module named 'plotly'.

Summary
1. KNN memorises the training data points; when a new sample arrives, it computes the distance from that sample to every stored point.
2. It then takes the K nearest neighbours, and the prediction is the majority vote over their labels.
3. Scaling is required: if we apply KNN to unscaled data, the accuracy is lower (0.825 vs 0.9125 for k=3 here), and new samples must be scaled with the same fitted scaler before prediction, otherwise they are likely to be misclassified.
4. We can use a pipeline to define the intermediate processing steps together with the final estimator.
5. Hyperparameters can be optimised using Optuna.
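
Points 1 and 2 can be sketched in a few lines of plain Python. This is a minimal illustration of the idea, not how scikit-learn implements it; the function name `knn_predict` and the toy points are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # 1. "Training" is just storing the data; the work happens at prediction time.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # 2. Keep the k closest points.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # 3. Majority vote over their labels.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy, already-scaled points (hypothetical values).
points = [(-1.0, -1.0), (-0.8, -1.2), (-1.1, -0.7), (1.0, 1.1), (0.9, 0.8), (1.2, 1.0)]
labels = [0, 0, 0, 1, 1, 1]

print(knn_predict(points, labels, (-0.9, -0.9)))  # 0
print(knn_predict(points, labels, (1.0, 0.9)))    # 1
```

Using an odd k for a two-class problem avoids tied votes.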
