## The k-nearest neighbors (KNN) algorithm is a simple, easy-to-implement supervised machine learning algorithm that can be used to solve both classification and regression problems
#### A classification problem has a discrete value as its output. For example, predicting whether a person is interested in insurance (1) or not (0) based on their age and location is a classification problem: the output is either 0 or 1, with no value in between.
#### The KNN algorithm assumes that similar things exist in close proximity. In other words, similar things are near each other.

##### The KNN Algorithm
    1.  Load the data
    2.  Initialize K to your chosen number of neighbors
    3.  For each example in the data, perform steps 4 and 5
    4.  Calculate the distance between the example and the query point (the feature values we want a prediction for)
    5.  Add the distance and the index of the example to a collection
    6.  Sort the collection of distances and indices from smallest to largest (in ascending order) by distance
    7.  Pick the first K entries from the sorted collection
    8.  Get the labels of the selected K entries
    9.  If regression, return the mean of the K labels
    10. If classification, return the mode of the K labels
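
The steps above can be sketched in plain Python as follows. This is a minimal illustration, not the implementation used later in this notebook: the function name `knn_predict`, its `task` parameter, and the toy data are assumptions made up for the example, and Manhattan distance is used as the distance measure.

```python
import statistics

def knn_predict(examples, labels, query, k, task="classification"):
    """Hypothetical helper that follows the numbered steps above."""
    # Steps 3-5: compute the distance from the query to every example
    # and record it together with the example's index.
    distances = []
    for index, example in enumerate(examples):
        dist = sum(abs(a - b) for a, b in zip(example, query))  # Manhattan distance
        distances.append((dist, index))

    # Steps 6-8: sort by distance, keep the first k entries, collect their labels.
    nearest_labels = [labels[index] for _, index in sorted(distances)[:k]]

    # Steps 9-10: mean for regression, mode for classification.
    if task == "regression":
        return statistics.mean(nearest_labels)
    return statistics.mode(nearest_labels)

# Toy data (made up for illustration): one feature, binary label.
examples = [[1], [2], [3], [10], [11]]
labels = [0, 0, 0, 1, 1]

print(knn_predict(examples, labels, [9], k=3))                     # mode of the 3 nearest labels
print(knn_predict(examples, labels, [9], k=3, task="regression"))  # mean of the 3 nearest labels
```

For the query `[9]`, the three nearest examples are `[10]`, `[11]` and `[3]`, so the classification call returns the mode of `[1, 1, 0]` and the regression call returns their mean.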

In [77]:
# Importing essential libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import statistics

In [78]:
# Reading data into dataframe using pandas read_csv method
# Load the data
df=pd.read_csv("insurance.csv")
df

Unnamed: 0,State,Age,Insurance Interest
0,Bangalore,43,1
1,Hyderabad,29,0
2,Chennai,23,0
3,Mumbai,49,1
4,Chennai,50,1
5,Mumbai,35,0
6,Kolkata,67,1
7,Delhi,89,1
8,Hyderabad,19,0
9,Bangalore,27,0


In [79]:
# Data preprocessing: encode each city name as an integer code
df=df.replace({
    "Bangalore":0,
    "Hyderabad":1,
    "Mumbai":2,
    "Chennai":3,
    "Kolkata":4,
    "Trivendram":5,
    "Delhi":6
})
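
With this encoding, the State code ranges from 0 to 6 while Age ranges from 19 to 89 in the table above, so a distance that adds the two differences directly is driven mostly by Age. One common way to balance the features is min-max scaling. The sketch below is optional and is not part of the original notebook; the array `features` is copied by hand from the table, and a query point would need to be scaled with the same `col_min` and `col_max` before computing distances.

```python
import numpy as np

# Encoded feature rows (State code, Age) copied from the table above.
features = np.array([
    [0, 43], [1, 29], [3, 23], [2, 49], [3, 50],
    [2, 35], [4, 67], [6, 89], [1, 19], [0, 27],
], dtype=float)

# Min-max scaling: map each column to the range [0, 1].
col_min = features.min(axis=0)
col_max = features.max(axis=0)
scaled = (features - col_min) / (col_max - col_min)

print(scaled)
```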

In [80]:
# Getting values from dataframe
data=df.values

# Initialize K to your chosen number of neighbors
k=3

In [81]:
# Function that calculates the Manhattan distance (sum of absolute differences)
# between one sample's features and our query values
def distance(arr1,OurIndependentValues):
    SumOfDistances=0
    for i in range(len(arr1)):
        SumOfDistances+=abs(arr1[i]-OurIndependentValues[i])
    return SumOfDistances
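
The function above sums absolute differences, which is the Manhattan (L1) distance rather than the Euclidean distance. As a small worked example, the sketch below computes the same quantity with NumPy for row 0 of the table (Bangalore encoded as 0, Age 43) against the query (Kolkata encoded as 4, Age 67). The name `manhattan` is a hypothetical helper introduced only for this illustration.

```python
import numpy as np

def manhattan(sample, query):
    # Sum of absolute coordinate differences (L1 distance).
    return np.abs(np.asarray(sample) - np.asarray(query)).sum()

# |0 - 4| + |43 - 67| = 4 + 24 = 28
print(manhattan([0, 43], [4, 67]))  # 28
```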

In [82]:
# Implementing KNN algorithm

def KNN(data,OurIndependentValues,k):
    
    # If there are not enough samples to supply k neighbours, raise an error
    if(k>len(data)):
        raise ValueError(f"K({k}) should not be greater than count of total samples({len(data)})")
    
    # Initializing a list of (distance, index) tuples
    DistanceIndexes=[]
    
    # For each sample in the given dataset, calculate its distance to the query and remember the sample's index
    for index,sample in enumerate(data):
        
        # The last column is the label, so only the features (sample[:-1]) enter the distance
        Distance= distance(sample[:-1], OurIndependentValues)
        
        # Recording the distance and its respective index in the list
        DistanceIndexes.append((Distance,index))
    
    # Sort the recorded list and slice it down to the k nearest neighbours
    # Get the labels of the indexes in the sliced list
    # Because it is a classification problem, use the mode instead of the mean
    return statistics.mode([data[index][-1] for _,index in sorted(DistanceIndexes)[:k]])

In [83]:
KNN(data,[4,67],3)

1
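
To see where this result comes from, the sketch below traces the three nearest neighbours of the query `[4, 67]` (Kolkata, Age 67) by hand. It assumes the encoded rows shown in the table above, copied into a plain list named `rows` for illustration, and uses the same Manhattan distance as the `distance` function.

```python
# Encoded rows (State code, Age, Insurance Interest) copied from the table above.
rows = [
    (0, 43, 1), (1, 29, 0), (3, 23, 0), (2, 49, 1), (3, 50, 1),
    (2, 35, 0), (4, 67, 1), (6, 89, 1), (1, 19, 0), (0, 27, 0),
]
query = (4, 67)

# Manhattan distance from the query to each row, paired with the row index.
ranked = sorted(
    (abs(state - query[0]) + abs(age - query[1]), index)
    for index, (state, age, _) in enumerate(rows)
)

# Print index, distance and label of the 3 nearest rows.
for dist, index in ranked[:3]:
    print(index, dist, rows[index][2])
# 6 0 1
# 4 18 1
# 3 20 1
```

All three neighbours carry the label 1, so the mode is 1. Note that the query coincides with row 6 of the data itself, which is why the smallest distance is 0.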