# K-Nearest Neighbors (KNN)

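* Predictions are made on the basis of the similarity between observations: a new observation is assigned the majority class of its k most similar (nearest) training observations.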

![alt text](https://cambridgecoding.files.wordpress.com/2016/01/knn2.jpg)
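To make the idea concrete, here is a minimal from-scratch sketch of the voting rule. It assumes Euclidean distance and unweighted majority voting; the function name `knn_predict` and the toy data are hypothetical and are not part of the notebook or of scikit-learn.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=5):
    """Classify one observation by majority vote among its k nearest neighbors."""
    # Euclidean distance from x_new to every training observation
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # Indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Most common label among those k neighbors
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]

# Toy data (hypothetical): two well-separated clusters in 2-D
X_toy = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                  [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_toy, y_toy, np.array([1.1, 1.0]), k=3))  # 0
print(knn_predict(X_toy, y_toy, np.array([5.1, 5.0]), k=3))  # 1
```

`KNeighborsClassifier`, used in the rest of the notebook, applies the same idea with faster neighbor search and more options.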

## 1-) Model

In [1]:
import numpy as np
import pandas as pd 
from sklearn.model_selection import train_test_split

In [2]:
diabetes = pd.read_csv("diabetes.csv")
df = diabetes.copy()
df = df.dropna()
y = df["Outcome"]
X = df.drop(['Outcome'], axis=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, 
                                                    test_size=0.30, 
                                                    random_state=42)



In [3]:
from sklearn.neighbors import KNeighborsClassifier

In [4]:
knn = KNeighborsClassifier()
knn_model = knn.fit(X_train, y_train)
knn_model

KNeighborsClassifier()

In [11]:
knn_model.n_neighbors  # default value of n_neighbors is 5

5

## 2-) Prediction

In [5]:
y_pred = knn_model.predict(X_test)
y_pred[0:5]

array([1, 0, 0, 0, 0], dtype=int64)

In [7]:
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report

In [8]:
accuracy_score(y_test, y_pred) # before model tuning

0.6883116883116883

In [9]:
confusion_matrix(y_test, y_pred) # before model tuning

array([[114,  37],
       [ 35,  45]], dtype=int64)

In [10]:
print(classification_report(y_test, y_pred)) # before model tuning

              precision    recall  f1-score   support

           0       0.77      0.75      0.76       151
           1       0.55      0.56      0.56        80

    accuracy                           0.69       231
   macro avg       0.66      0.66      0.66       231
weighted avg       0.69      0.69      0.69       231
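The figures in the report can be reproduced by hand from the confusion matrix printed above. The sketch below copies that matrix and assumes scikit-learn's layout, in which rows are true classes and columns are predicted classes; the variable names are illustrative.

```python
import numpy as np

# Confusion matrix copied from the output above
# (rows: true class, columns: predicted class)
cm = np.array([[114, 37],
               [35, 45]])
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()   # share of all predictions that are correct
precision_1 = tp / (tp + fp)      # of the predicted 1s, how many are truly 1
recall_1 = tp / (tp + fn)         # of the true 1s, how many were found
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)

print(round(accuracy, 4))     # 0.6883
print(round(precision_1, 2))  # 0.55
print(round(recall_1, 2))     # 0.56
print(round(f1_1, 2))         # 0.56
```

These match the accuracy score and the row for class 1 in the classification report.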



## 3-) Model tuning

* In this section, we look for the optimal value of **n_neighbors** using GridSearchCV.

* GridSearchCV: grid search with cross-validation. It fits the model for every candidate value in a parameter grid and scores each one by cross-validation.

* We then build the final model with the optimal **n_neighbors**.

* **n_neighbors** is a hyperparameter: it is not learned from the data, so we have to choose it ourselves, and we want the value that gives the best cross-validated score.

* Instead of relying on intuition to pick this value, we let the grid search find it.

* The default value of **n_neighbors** is 5.
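What the grid search does can be sketched as a plain loop: score every candidate `n_neighbors` with 10-fold cross-validation and keep the one with the highest mean accuracy. The sketch uses synthetic data from `make_classification` as a stand-in (an assumption, so that it runs without `diabetes.csv`), and the names `X_demo`, `y_demo`, `candidate_ks`, and `best_k` are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (assumption): the notebook itself uses diabetes.csv
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=42)

candidate_ks = np.arange(1, 50)
mean_scores = []
for k in candidate_ks:
    model = KNeighborsClassifier(n_neighbors=int(k))
    # 10-fold cross-validated accuracy for this value of k
    scores = cross_val_score(model, X_demo, y_demo, cv=10)
    mean_scores.append(scores.mean())

# Keep the candidate with the highest mean cross-validated accuracy
best_k = int(candidate_ks[int(np.argmax(mean_scores))])
print("best n_neighbors:", best_k)
print("mean CV accuracy:", round(max(mean_scores), 3))
```

`GridSearchCV` wraps this loop, handles grids over several parameters at once, and stores the per-candidate scores in `cv_results_`.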


In [12]:
from sklearn.model_selection import GridSearchCV

In [13]:
knn_params = {"n_neighbors": np.arange(1,50)}

In [14]:
knn1 = KNeighborsClassifier()
knn_cv = GridSearchCV(knn1, knn_params, cv=10)
knn_cv.fit(X_train, y_train)

GridSearchCV(cv=10, estimator=KNeighborsClassifier(),
             param_grid={'n_neighbors': array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
       18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
       35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49])})

In [15]:
knn_cv.best_params_  # optimal value of n_neighbors

{'n_neighbors': 11}

### 3.1-) Tuned Model

In [16]:
knn3 = KNeighborsClassifier(n_neighbors=11)  # best value found by the grid search
knn_tuned = knn3.fit(X_train, y_train)
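Refitting by hand, as in the cell above, is not strictly necessary. With its default `refit=True`, `GridSearchCV` retrains the best configuration on the whole training set and exposes it as `best_estimator_`. The sketch below checks that both routes give the same predictions; it uses synthetic stand-in data (an assumption) and illustrative names such as `search` and `manual`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (assumption): the notebook itself uses diabetes.csv
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo,
                                          test_size=0.30, random_state=42)

search = GridSearchCV(KNeighborsClassifier(),
                      {"n_neighbors": np.arange(1, 50)}, cv=10)
search.fit(X_tr, y_tr)

# Route 1: refit by hand with the winning value, as the notebook does
best_k = int(search.best_params_["n_neighbors"])
manual = KNeighborsClassifier(n_neighbors=best_k).fit(X_tr, y_tr)

# Route 2: reuse the estimator that GridSearchCV already refit
same = np.array_equal(manual.predict(X_te),
                      search.best_estimator_.predict(X_te))
print("identical predictions:", same)
```

Either route works; the manual refit just makes the chosen value explicit in the code.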

In [18]:
y_pred1 = knn_tuned.predict(X_test)
y_pred1[0:10]

array([1, 0, 0, 0, 1, 0, 0, 0, 0, 1], dtype=int64)

In [19]:
accuracy_score(y_test, y_pred1)  # after model tuning

0.7316017316017316

In [20]:
confusion_matrix(y_test, y_pred1)  # after model tuning

array([[123,  28],
       [ 34,  46]], dtype=int64)

In [21]:
print(classification_report(y_test, y_pred1))  # after model tuning

              precision    recall  f1-score   support

           0       0.78      0.81      0.80       151
           1       0.62      0.57      0.60        80

    accuracy                           0.73       231
   macro avg       0.70      0.69      0.70       231
weighted avg       0.73      0.73      0.73       231

