# k-nearest neighbors (KNN)

The k-nearest neighbors (KNN) algorithm is a simple supervised machine learning algorithm that can be used to solve both classification and regression problems. It's easy to implement and understand, but it has a major drawback: it becomes significantly slower as the size of the data in use grows.

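> It mainly depends on 4 factors:
- the point (the sample being predicted)
- the k value (number of neighbors)
- jamhoriyat (democracy): the majority vote among the neighbors
- rishtydari (kinship): how close the neighbors are to the point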
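
To make these four factors concrete, here is a minimal from-scratch sketch. It is not how scikit-learn implements KNN; the function name `knn_predict`, the toy data, and the use of a Minkowski distance of order `p` are assumptions made for illustration.

```python
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3, p=2):
    """Classify `query` by majority vote among its k nearest training points."""

    # Minkowski distance of order p (p=2 is Euclidean, p=1 is Manhattan).
    def distance(a, b):
        return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

    # Closeness: rank the training points by their distance to the query.
    ranked = sorted(
        zip(train_points, train_labels),
        key=lambda pair: distance(pair[0], query),
    )
    # Majority vote: the most common label among the k closest points wins.
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Made-up toy data: (weight, gender) -> favourite dish
points = [(60, 0), (62, 0), (58, 0), (80, 1), (82, 1), (85, 1)]
labels = ["Samosa", "Samosa", "Samosa", "Biryani", "Biryani", "Biryani"]

print(knn_predict(points, labels, query=(81, 1), k=3))  # prints "Biryani"
```

The three points closest to `(81, 1)` are all labelled Biryani, so the vote is unanimous here; with noisier data the vote is where the choice of k starts to matter.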


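~~~
- k = number of neighbors
- k should be neither too low (sensitive to noise) nor too high (blurs the class boundaries)
- predict the response from the nearest neighbors: the class with the most votes among them wins (nearness is measured with a distance such as the Minkowski distance)
- can also be used for numerical targets (regression)
~~~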
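
One common way to pick a k that is neither too low nor too high is to compare cross-validated accuracy over a range of values. The sketch below is only an illustration: it uses scikit-learn's bundled iris dataset as a stand-in, and the candidate range of 1 to 25 and the 5 folds are arbitrary assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Stand-in dataset (assumption): iris ships with scikit-learn.
X, y = load_iris(return_X_y=True)

scores = {}
for k in range(1, 26):
    # metric="minkowski" with p=2 is the Euclidean distance (the default).
    model = KNeighborsClassifier(n_neighbors=k, metric="minkowski", p=2)
    # Mean accuracy over 5 cross-validation folds for this k.
    scores[k] = cross_val_score(model, X, y, cv=5).mean()

# Keep the k with the highest mean accuracy.
best_k = max(scores, key=scores.get)
print("best k:", best_k, "mean accuracy:", round(scores[best_k], 3))
```

The same loop works with `KNeighborsRegressor` for numerical targets, in which case the default score is R² instead of accuracy.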

KNN accuracy measurement:
1. Jaccard index
2. F1 score
3. log loss
4. some others:
   1. classification accuracy
   2. confusion matrix
   3. area under the curve (AUC)
   4. mean absolute error (for regression)
   5. mean squared error (for regression)
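
A few of these measures, sketched with `sklearn.metrics` on made-up labels. The labels and probabilities below are invented for illustration and are not results from any model in these notes.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, jaccard_score, log_loss

# Invented true and predicted labels for six samples.
y_true = ["Biryani", "Biryani", "Samosa", "Pakora", "Biryani", "Samosa"]
y_pred = ["Biryani", "Samosa", "Samosa", "Biryani", "Biryani", "Samosa"]
labels = ["Biryani", "Pakora", "Samosa"]

# Confusion matrix: rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred, labels=labels)

# Jaccard index per class, averaged with equal weight per class ("macro").
jac = jaccard_score(y_true, y_pred, labels=labels, average="macro")

# Log loss needs class probabilities, one column per class in `labels` order.
y_proba = np.array([
    [0.8, 0.1, 0.1],
    [0.3, 0.2, 0.5],
    [0.2, 0.1, 0.7],
    [0.5, 0.3, 0.2],
    [0.7, 0.2, 0.1],
    [0.1, 0.2, 0.7],
])
loss = log_loss(y_true, y_proba, labels=labels)

print(cm)
print("Jaccard (macro):", round(jac, 3))
print("log loss:", round(loss, 3))
```

With a fitted classifier, the probabilities would come from `model.predict_proba(X_test)` rather than a hand-written array.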

- accuracy_score can be replaced by 
  - precision_score
  - recall_score
  - f1_score
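
A small sketch of swapping these scorers in, on invented labels. With more than two classes, `precision_score`, `recall_score` and `f1_score` need an `average` argument; `"macro"` is just one reasonable choice here, not the only one.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Invented true and predicted labels for six samples (three classes).
y_true = ["Biryani", "Biryani", "Samosa", "Pakora", "Biryani", "Samosa"]
y_pred = ["Biryani", "Samosa", "Samosa", "Biryani", "Biryani", "Samosa"]

# Accuracy: fraction of exact matches (4 of 6 here).
acc = accuracy_score(y_true, y_pred)

# "macro" averages the per-class scores with equal weight per class.
# zero_division=0 scores a class that is never predicted (Pakora) as 0
# instead of emitting a warning.
prec = precision_score(y_true, y_pred, average="macro", zero_division=0)
rec = recall_score(y_true, y_pred, average="macro", zero_division=0)
f1 = f1_score(y_true, y_pred, average="macro", zero_division=0)

print(round(acc, 3), round(prec, 3), round(rec, 3), round(f1, 3))
```

The four numbers differ noticeably on the same predictions, which is why it helps to look at more than accuracy when the classes are imbalanced.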

## pros of KNN
- training phase is fast (the model only stores the training data)
- instance-based learning algorithm
- can be used with non-linear data

## cons of KNN
- testing (prediction) phase is slower
- costly in memory and computation
- not suitable for high-dimensional data

## how to improve
- data wrangling and scaling
- handle missing values
- normalize all features to the same scale (e.g. -1 to 1)
- reduce dimensions to improve performance
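
The missing-value and scaling steps can be chained in front of the classifier with a scikit-learn pipeline. This is a sketch under assumptions: the toy weight/height values are invented, median imputation is only one possible strategy, and dimensionality reduction is left out.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Invented toy data: columns are weight (kg) and height (cm), with two gaps.
X_toy = np.array([
    [60.0, 160.0],
    [62.0, np.nan],
    [58.0, 158.0],
    [80.0, 178.0],
    [85.0, 182.0],
    [np.nan, 180.0],
])
y_toy = np.array(["Samosa", "Samosa", "Samosa", "Biryani", "Biryani", "Biryani"])

pipeline = make_pipeline(
    SimpleImputer(strategy="median"),     # fill missing values per column
    MinMaxScaler(feature_range=(-1, 1)),  # put every feature on the -1..1 scale
    KNeighborsClassifier(n_neighbors=3),  # distances now weigh features equally
)
pipeline.fit(X_toy, y_toy)

print(pipeline.predict([[82.0, 181.0]]))
```

Keeping the steps inside one pipeline means the imputer and scaler are fitted on the training data only and then reused unchanged at prediction time.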

_**Let's get hands-on!**_

In [1]:
import pandas as pd

# dataset path
file_path = 'data/multivarBiryani.csv'

# read the dataset into a DataFrame
df = pd.read_csv(file_path)

# take a first look at the data
df.head()

# encode gender as a number: Male -> 1, Female -> 0
df['gender'] = df['gender'].replace({'Male': 1, 'Female': 0})
df.tail()

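|     | age | height | weight | gender | likeness |
| --- | --- | ------ | ------ | ------ | -------- |
| 240 | 31  | 160.0  | 60.0   | 1      | Pakora   |
| 241 | 26  | 172.0  | 70.0   | 1      | Biryani  |
| 242 | 40  | 178.0  | 80.0   | 1      | Biryani  |
| 243 | 25  | 5.7    | 65.0   | 1      | Biryani  |
| 244 | 33  | 157.0  | 56.0   | 0      | Samosa   |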


In [3]:
# select the input (X) and output (y) variables
X = df[["weight","gender"]]
y = df["likeness"]

In [4]:
X.head()

|     | weight | gender |
| --- | ------ | ------ |
| 0   | 76.0   | 1      |
| 1   | 70.0   | 1      |
| 2   | 80.0   | 1      |
| 3   | 102.0  | 1      |
| 4   | 67.0   | 1      |


In [6]:
y.tail()

240     Pakora
241    Biryani
242    Biryani
243    Biryani
244     Samosa
Name: likeness, dtype: object

In [7]:
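# model and prediction
from sklearn.neighbors import KNeighborsClassifier
model = KNeighborsClassifier(n_neighbors=5)

# train the model on the full dataset
model.fit(X, y)

# predict the output for one new sample (weight=70, gender=1);
# a DataFrame with the training column names avoids a feature-name warning
predicted = model.predict(pd.DataFrame([[70, 1]], columns=["weight", "gender"]))
predicted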

array(['Biryani'], dtype=object)
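
To see where a prediction like this comes from, a fitted model can report which neighbors it used and how the votes were split. The sketch below runs on a small made-up table (not the biryani dataset), so the numbers are illustrative only; `kneighbors` and `predict_proba` are standard `KNeighborsClassifier` methods.

```python
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

# Made-up toy table with the same column names as the notebook's data.
toy = pd.DataFrame({
    "weight": [60, 62, 58, 80, 82, 85],
    "gender": [0, 0, 0, 1, 1, 1],
    "likeness": ["Samosa", "Samosa", "Pakora", "Biryani", "Biryani", "Biryani"],
})

toy_model = KNeighborsClassifier(n_neighbors=3)
toy_model.fit(toy[["weight", "gender"]], toy["likeness"])

query = pd.DataFrame([[79, 1]], columns=["weight", "gender"])

# Distances to, and row positions of, the 3 nearest training samples.
distances, indices = toy_model.kneighbors(query)
# Vote share per class, in the order given by toy_model.classes_.
proba = toy_model.predict_proba(query)

print(toy.iloc[indices[0]])
print(dict(zip(toy_model.classes_, proba[0])))
```

Because `weight` spans a much larger range than `gender`, it dominates the distance unless the features are scaled first.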

In [8]:
model.predict(X)

array(['Biryani', 'Biryani', 'Biryani', 'Biryani', 'Biryani', 'Biryani',
       'Biryani', 'Biryani', 'Samosa', 'Biryani', 'Biryani', 'Biryani',
       'Biryani', 'Biryani', 'Biryani', 'Biryani', 'Biryani', 'Biryani',
       'Biryani', 'Biryani', 'Biryani', 'Biryani', 'Biryani', 'Biryani',
       'Biryani', 'Biryani', 'Biryani', 'Biryani', 'Biryani', 'Biryani',
       'Biryani', 'Biryani', 'Samosa', 'Samosa', 'Pakora', 'Biryani',
       'Samosa', 'Biryani', 'Biryani', 'Samosa', 'Biryani', 'Samosa',
       'Biryani', 'Biryani', 'Biryani', 'Biryani', 'Biryani', 'Biryani',
       'Biryani', 'Biryani', 'Biryani', 'Biryani', 'Biryani', 'Biryani',
       'Biryani', 'Biryani', 'Biryani', 'Biryani', 'Biryani', 'Biryani',
       'Biryani', 'Biryani', 'Biryani', 'Pakora', 'Biryani', 'Biryani',
       'Biryani', 'Biryani', 'Biryani', 'Biryani', 'Biryani', 'Biryani',
       'Biryani', 'Biryani', 'Biryani', 'Biryani', 'Biryani', 'Biryani',
       'Biryani', 'Biryani', 'Biryani', 'Biryani', 'Biryani', ...

In [9]:
# metrics for evaluation
# split the data into train and test sets (80/20)
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
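
Without a seed, `train_test_split` shuffles differently on every run, so the accuracy will vary from run to run. A hedged sketch of a reproducible, class-balanced split follows; it uses the bundled iris data as a stand-in, and the seed value 42 is arbitrary.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Stand-in dataset (assumption): 150 samples, 3 classes of 50 each.
X_demo, y_demo = load_iris(return_X_y=True)

# random_state fixes the shuffle; stratify keeps class proportions equal
# in the train and test parts.
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

print(len(Xd_train), len(Xd_test))  # 120 30
print(np.bincount(yd_test))         # 10 test samples per class
```

Stratifying is most useful when one class is much more common than the others, since a plain random split can leave a rare class almost absent from the test set.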

In [13]:
# create a model (default n_neighbors=5)
model = KNeighborsClassifier()
# fit the model on the training set
model.fit(X_train, y_train)

# predict on the held-out test set
predicted_values = model.predict(X_test)
# check the accuracy score
score = accuracy_score(y_test, predicted_values)
print("the accuracy score of the model is = ", score)

the accuracy score of the model is =  0.7551020408163265
