# Develop k-Nearest Neighbors Classifier in Python with Libraries

<font color='green'> 
I implemented the k-Nearest Neighbors classification algorithm in Python with scikit-learn's `KNeighborsClassifier`, using the [diabetes.csv](https://www.kaggle.com/saurabh00007/diabetescsv) dataset.
</font>

#### References:

- [Vahit Keskin - (50 Saat) Python A-Z™: Veri Bilimi ve Machine Learning - 342. K-En Yakın Komşu (KNN) - Teori](https://www.udemy.com/course/python-egitimi/learn/lecture/14622114#overview)

- [Vahit Keskin - (50 Saat) Python A-Z™: Veri Bilimi ve Machine Learning - 343. KNN - Model & Tahmin](https://www.udemy.com/course/python-egitimi/learn/lecture/14622116#overview)

- [Vahit Keskin - (50 Saat) Python A-Z™: Veri Bilimi ve Machine Learning - 344. KNN - Model Tuning](https://www.udemy.com/course/python-egitimi/learn/lecture/14622118#overview)

In [38]:
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import accuracy_score, classification_report

### <font color='blue'> Loading Dataset </font>

In [29]:
diabetes = pd.read_csv("diabetes.csv")
df = diabetes.copy()
df = df.dropna()
df.head()

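| | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age | Outcome |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 | 1 |
| 1 | 1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 | 0 |
| 2 | 8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 | 1 |
| 3 | 1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 | 0 |
| 4 | 0 | 137 | 40 | 35 | 168 | 43.1 | 2.288 | 33 | 1 |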


## Step 1: Model & Prediction

#### <font color='blue'> Preparing the train and test sets to feed the model</font>

In [17]:
y = df["Outcome"]
X = df.drop(['Outcome'], axis=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, 
                                                    test_size=0.30, 
                                                    random_state=42)
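As a rough check on the split sizes, the sketch below applies the same `test_size=0.30` and `random_state=42` to a placeholder array. The 768-row count is an assumption (it is consistent with the 231 test samples reported in the classification report), and the array contents are dummy values, not the diabetes data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Assumption: 768 rows and 8 feature columns, filled with placeholder values.
X_demo = np.zeros((768, 8))
y_demo = np.arange(768) % 2

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.30, random_state=42
)

# scikit-learn rounds the test share up: ceil(0.30 * 768) = 231,
# and the remaining 537 rows go to the training set.
print(len(X_tr), len(X_te))
```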

In [18]:
knn_params = {"n_neighbors": np.arange(1, 50)}
# GridSearchCV will try each candidate k from 1 to 49 and pick the best one

#### <font color='blue'> Building the model and making predictions</font>

In [32]:
knn = KNeighborsClassifier()
knn_model = knn.fit(X_train,y_train)
knn_model 

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
                     metric_params=None, n_jobs=None, n_neighbors=5, p=2,
                     weights='uniform')

The default value of `n_neighbors` is 5, as the output above shows.
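The defaults can also be read programmatically instead of from the printed representation. A minimal sketch, using the estimator's standard `get_params()` method on an unfitted instance (the name `default_params` is only for illustration):

```python
from sklearn.neighbors import KNeighborsClassifier

# An unfitted estimator already exposes its hyperparameters as a dict.
default_params = KNeighborsClassifier().get_params()

print(default_params["n_neighbors"])  # number of neighbors used for voting
print(default_params["weights"])      # how neighbor votes are weighted
```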

In [35]:
y_pred = knn_model.predict(X_test)
y_pred

array([1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0,
       1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1,
       0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0,
       0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0,
       0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1,
       0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0,
       0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0,
       0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0,
       0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0,
       0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1,
       1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0], dtype=int64)

In [36]:
accuracy_score(y_test, y_pred)

0.6883116883116883

In [40]:
print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.77      0.75      0.76       151
           1       0.55      0.56      0.56        80

    accuracy                           0.69       231
   macro avg       0.66      0.66      0.66       231
weighted avg       0.69      0.69      0.69       231



## Step 2: Model Tuning

#### <font color='blue'> Building the model and finding the best parameters with GridSearchCV</font>

In [19]:
knn = KNeighborsClassifier()
knn_cv = GridSearchCV(knn, knn_params, cv=10)  # cv=10: 10-fold cross-validation
knn_cv.fit(X_train,y_train)

GridSearchCV(cv=10, error_score=nan,
             estimator=KNeighborsClassifier(algorithm='auto', leaf_size=30,
                                            metric='minkowski',
                                            metric_params=None, n_jobs=None,
                                            n_neighbors=5, p=2,
                                            weights='uniform'),
             iid='deprecated', n_jobs=None,
             param_grid={'n_neighbors': array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
       18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
       35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49])},
             pre_dispatch='2*n_jobs', refit=True, return_train_score=False,
             scoring=None, verbose=0)

In [21]:
print("Best score: " + str(knn_cv.best_score_))
print("Best parameters: " + str(knn_cv.best_params_))

Best score: 0.748637316561845
Best parameters: {'n_neighbors': 11}
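To see where the best score comes from, the per-candidate results can be inspected through `cv_results_`. The sketch below is an illustration only: it runs the same kind of search on synthetic data from `make_classification` (an assumption, not the diabetes data), and the names `search` and `results` are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Assumption: synthetic stand-in data, not diabetes.csv.
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=0)

search = GridSearchCV(KNeighborsClassifier(), {"n_neighbors": np.arange(1, 50)}, cv=10)
search.fit(X_demo, y_demo)

# One row per candidate k, with the mean and spread of the 10 fold scores.
results = pd.DataFrame(search.cv_results_)[
    ["param_n_neighbors", "mean_test_score", "std_test_score", "rank_test_score"]
]
print(results.sort_values("rank_test_score").head())

# best_score_ is the mean cross-validated accuracy of the top-ranked k.
print(np.isclose(search.best_score_, results["mean_test_score"].max()))

# With refit=True (the default), the best model is already refitted on all data passed to fit.
print(search.best_estimator_.n_neighbors == search.best_params_["n_neighbors"])
```

Note that `best_score_` is a cross-validation average over the training data, so it is not directly comparable to a single test-set accuracy.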


#### <font color='blue'> Retraining the model with the best parameters and computing the test score</font>

In [22]:
knn = KNeighborsClassifier(n_neighbors=11)
knn_tuned = knn.fit(X_train, y_train)

In [26]:
y_pred = knn_tuned.predict(X_test)

In [27]:
accuracy_score(y_test, y_pred)

0.7316017316017316

In [31]:
knn_tuned.score(X_test, y_test)  # equivalent to the predict + accuracy_score steps above

0.7316017316017316
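The two identical numbers above are expected: for classifiers, `score` returns the mean accuracy of `predict` on the given data. A small self-contained sketch of that equivalence, using synthetic data from `make_classification` as a stand-in (an assumption, not the diabetes data; variable names are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumption: synthetic stand-in data, not diabetes.csv.
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.30, random_state=42
)

model = KNeighborsClassifier(n_neighbors=11).fit(X_tr, y_tr)

# Route 1: the estimator's built-in score method.
via_score = model.score(X_te, y_te)
# Route 2: explicit prediction followed by accuracy_score.
via_metric = accuracy_score(y_te, model.predict(X_te))

print(via_score == via_metric)
```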