# Develop k-Nearest Neighbors Classifier in Python From Scratch

<font color='green'> 
I implemented the k-Nearest Neighbors classification algorithm in Python from scratch, using the [iris.csv](https://raw.githubusercontent.com/jbrownlee/Datasets/master/iris.csv) dataset.
</font>

#### References:

- [Develop k-Nearest Neighbors in Python From Scratch](https://machinelearningmastery.com/tutorial-to-implement-k-nearest-neighbors-in-python-from-scratch/)

## Introduction

In [3]:
import pandas as pd
import math 
from sklearn.model_selection import train_test_split

## Loading Dataset

In [4]:
iris = pd.read_csv("iris.csv")

In [5]:
df = iris.copy()

In [6]:
df.head()

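|   | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
|---|-----|-----|-----|-----|-------------|
| 0 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 1 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 2 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 3 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |
| 4 | 5.4 | 3.9 | 1.7 | 0.4 | Iris-setosa |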


## Data Processing

In [7]:
def column_rename(df, column_list):
    # read_csv used the first data row as the header; recover it as a one-row frame
    first_row = pd.DataFrame([list(df.columns)], columns=column_list)
    df = df.copy()
    df.columns = column_list
    df = pd.concat([first_row, df], ignore_index=True)
    # the recovered values are strings, so cast the four feature columns back to float
    feature_cols = column_list[:-1]
    df[feature_cols] = df[feature_cols].astype(float)

    return df
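
The workaround above is needed because `iris.csv` has no header row, so `pd.read_csv` treats the first observation as column names. A minimal sketch of an alternative, assuming the same five-column layout: pass `header=None` and `names=` to `read_csv` so the first line stays in the data. The inline sample below stands in for the file and is for illustration only.

```python
import io
import pandas as pd

column_list = ["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm", "Species"]

# Stand-in for iris.csv: the first three rows of the headerless file.
sample_csv = io.StringIO(
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,3.0,1.4,0.2,Iris-setosa\n"
    "4.7,3.2,1.3,0.2,Iris-setosa\n"
)

# header=None keeps the first line as data; names= supplies the column labels.
sample_df = pd.read_csv(sample_csv, header=None, names=column_list)
print(sample_df.shape)  # (3, 5)
```

With this approach the feature columns are parsed as floats directly, so no string-to-float repair step should be needed.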

## KNN 

In [8]:
def euclidean_distance(row1, row2):
    distance = 0.0
    for i in range(len(row1)-1): # skip the last element, the species label
        distance += (row1[i]-row2[i])**2

    return math.sqrt(distance)
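
As a quick sanity check, here is a small worked example of the distance computation. The two rows are hypothetical (two features plus a label, rather than the four iris features), and the function is restated compactly so the block runs on its own. It also shows that `math.dist`, available from Python 3.8, gives the same value when applied to the feature values only.

```python
import math

def euclidean_distance(row1, row2):
    # Same logic as the function above: the last element (the label) is skipped.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(row1[:-1], row2[:-1])))

# Hypothetical rows: two features followed by a class label.
row_a = [0.0, 0.0, "A"]
row_b = [3.0, 4.0, "B"]

d = euclidean_distance(row_a, row_b)
print(d)  # 5.0, since sqrt(3**2 + 4**2) = 5

# math.dist (Python 3.8+) computes the same quantity on the features alone.
d_builtin = math.dist(row_a[:-1], row_b[:-1])
print(d_builtin)  # 5.0
```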

In [9]:
def get_neighbors(train, test_row, num_neighbors): # test_row is passed in as an argument
    distances_list = list()
    for row in train:
        distance = euclidean_distance(test_row, row)
        distances_list.append((row, distance))

    distances_list.sort(key=lambda x: x[1]) # sort by distance first
    neighbors = list()
    for i in range(num_neighbors): # keep only the k nearest rows in a separate list
        neighbors.append(distances_list[i][0])

    return neighbors
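
The neighbor search can also be sketched with `heapq.nsmallest`, which selects the k closest rows without sorting the full list. The name `get_neighbors_heap` and the toy data are hypothetical and only serve to illustrate the idea; this is an alternative sketch, not the method used in the rest of this notebook.

```python
import heapq
import math

def euclidean_distance(row1, row2):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(row1[:-1], row2[:-1])))

def get_neighbors_heap(train, test_row, num_neighbors):
    # nsmallest returns the num_neighbors rows with the smallest distance,
    # ordered from nearest to farthest.
    return heapq.nsmallest(
        num_neighbors, train, key=lambda row: euclidean_distance(test_row, row)
    )

# Hypothetical toy data: two features followed by a class label.
toy_train = [
    [1.0, 1.0, "A"],
    [1.5, 1.0, "A"],
    [5.0, 5.0, "B"],
    [6.0, 5.0, "B"],
]
query = [1.2, 1.0, None]  # label unknown; the distance ignores it

nearest = get_neighbors_heap(toy_train, query, 2)
print(nearest)  # [[1.0, 1.0, 'A'], [1.5, 1.0, 'A']]
```

One behavioral difference: if `num_neighbors` exceeds the number of training rows, `nsmallest` returns all rows, whereas the index-based loop raises an `IndexError`.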

In [10]:
def predict_classification(train, test_row, num_neighbors):
    neighbors = get_neighbors(train, test_row, num_neighbors)
    output_values = [neighbor[-1] for neighbor in neighbors]
    prediction = max(set(output_values), key=output_values.count) # returns the most frequent label among the neighbors
    return prediction
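
When two labels are equally frequent, `max(set(...))` picks whichever the set happens to yield first, which for strings can differ between runs. A minimal sketch of a deterministic vote using `collections.Counter` follows; the helper name `majority_vote` is hypothetical. It assumes the labels are ordered from nearest to farthest neighbor, which is how `get_neighbors` returns its rows.

```python
from collections import Counter

def majority_vote(labels):
    # most_common(1) returns [(label, count)] for the most frequent label.
    # With equal counts, the label encountered first wins, so when labels
    # are ordered nearest-first a tie goes to the closer neighbor.
    return Counter(labels).most_common(1)[0][0]

# Hypothetical neighbor labels, ordered from nearest to farthest.
clear_winner = majority_vote(["Iris-setosa", "Iris-versicolor", "Iris-setosa"])
tie_winner = majority_vote(["Iris-virginica", "Iris-versicolor"])

print(clear_winner)  # Iris-setosa
print(tie_winner)    # Iris-virginica (tie resolved in favor of the nearest)
```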

## Evaluate 

In [11]:
def accuracy(train, num_neighbors):
    # Note: each row is predicted from the same data it belongs to,
    # so it counts itself (distance 0) among its own neighbors.
    correct = 0
    for row in train:
        prediction = predict_classification(train, row, num_neighbors)
        if row[-1] == prediction:
            correct += 1

    return correct/len(train)

In [12]:
column_list = ["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm", "Species"]
iris_df = column_rename(df,column_list)
iris_v = iris_df.values
print(accuracy(iris_v, 5))

0.9666666666666667
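
The score above is measured on the same rows that serve as the neighbor pool, so every row finds itself at distance 0, which tends to give an optimistic estimate. Below is a minimal sketch of hold-out evaluation, where test rows are kept out of the pool. The helper `holdout_accuracy`, its parameters, and the toy data are all hypothetical. The split uses `random.shuffle` to stay dependency-free, though the `train_test_split` imported at the top could serve the same purpose.

```python
import math
import random
from collections import Counter

def euclidean_distance(row1, row2):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(row1[:-1], row2[:-1])))

def predict_classification(train, test_row, num_neighbors):
    nearest = sorted(train, key=lambda row: euclidean_distance(test_row, row))[:num_neighbors]
    return Counter(row[-1] for row in nearest).most_common(1)[0][0]

def holdout_accuracy(rows, num_neighbors, test_fraction=0.25, seed=0):
    # Shuffle a copy, then hold out the first test_fraction of rows for testing.
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    test, train = shuffled[:n_test], shuffled[n_test:]
    # Test rows are never part of the neighbor pool.
    correct = sum(
        predict_classification(train, row, num_neighbors) == row[-1] for row in test
    )
    return correct / len(test)

# Hypothetical toy data: two well-separated clusters, not the iris dataset.
toy_rows = [[float(x), float(y), "A"] for x in range(3) for y in range(3)]
toy_rows += [[float(x), float(y), "B"] for x in range(10, 13) for y in range(10, 13)]

score = holdout_accuracy(toy_rows, num_neighbors=3)
print(score)  # 1.0 on this toy data, because the clusters do not overlap
```

To try this on the iris data, pass a list of rows such as `iris_v.tolist()`, since `random.shuffle` is not safe to use directly on a 2-D NumPy array.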
