# K-Nearest Neighbor

# ML algorithms
1. Simple Linear Regression
2. Multiple Linear Regression
3. Polynomial Features (Regression)

4. Lasso Regression
5. Ridge Regression

6. Logistic Regression
    - Binary classification
    - Multi class classification

7. Decision Tree
8. Random Forest
9. SVM
10. K-means clustering
11. K-Nearest Neighbor Algorithm

## 1 K-Nearest Neighbor Algorithm
- K-Nearest Neighbor (K-NN) is a supervised learning technique
- The K-NN algorithm follows one basic rule: similar things are near each other
- It is also called a lazy learner algorithm because it does not learn from the training set immediately
  - During the training phase, the algorithm just stores the dataset
  - When a new data point arrives, it classifies that point into a category using the stored data

## 2 How it works
- It calculates the distance from the new data point to every training data point
- Any distance measure can be used, e.g. Euclidean or Manhattan
- It selects the K nearest data points
- Finally, it assigns the new data point to the class to which the majority of those K data points belong
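
The steps above can be sketched in plain Python. This is only a minimal illustration under stated assumptions, not how scikit-learn implements K-NN: the class `SimpleKNN`, its method names, and the toy points are all hypothetical and do not come from the original notes.

```python
from collections import Counter
import math


def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


class SimpleKNN:
    """Hypothetical, minimal K-NN classifier for illustration only."""

    def __init__(self, k=3, distance=euclidean):
        self.k = k
        self.distance = distance

    def fit(self, points, labels):
        # "Lazy" training: nothing is learned here, the data is only stored.
        self.points = list(points)
        self.labels = list(labels)
        return self

    def predict_one(self, query):
        # Step 1: distance from the query to every stored training point.
        distances = [(self.distance(p, query), label)
                     for p, label in zip(self.points, self.labels)]
        # Step 2: keep the k closest points.
        nearest = sorted(distances, key=lambda pair: pair[0])[:self.k]
        # Step 3: majority vote among the labels of those k points.
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]


# Toy data (made up): two small clusters.
train_points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5),
                (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
train_labels = ["Red", "Red", "Red", "Blue", "Blue", "Blue"]

knn = SimpleKNN(k=3).fit(train_points, train_labels)
print(knn.predict_one((2.0, 2.0)))
print(knn.predict_one((6.0, 6.5)))
```

Passing `distance=manhattan` to the constructor swaps the distance measure without touching the rest of the logic.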

## 3 Scenario
- Suppose you have a dataset with two variables which, when plotted, looks like the one in the following figure.
- Our task is to classify a new data point 'X' into the "Blue" class or the "Red" class.
- Suppose the value of K is 3.
- The KNN algorithm starts by calculating the distance from point X to all the other points.
- It then finds the 3 points with the smallest distance to point X.
- This is shown in the figure below; the three nearest points have been encircled.
- Finally, X is assigned to the class that holds the majority among these three points.
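
The scenario can also be traced numerically. The sketch below uses NumPy and assumes made-up coordinates, since the figure's actual points are not available in these notes; the arrays `points`, `labels`, and `x_new` are hypothetical stand-ins.

```python
import numpy as np

# Hypothetical coordinates standing in for the points in the figure.
points = np.array([
    [1.0, 2.0], [2.0, 3.0], [4.0, 4.0],   # "Red" points
    [6.0, 5.0], [7.0, 7.0], [8.0, 6.0],   # "Blue" points
])
labels = np.array(["Red", "Red", "Red", "Blue", "Blue", "Blue"])
x_new = np.array([5.0, 5.0])  # the new point X
k = 3

# Euclidean distance from X to every point.
distances = np.linalg.norm(points - x_new, axis=1)

# Indices of the k smallest distances (the "encircled" points).
nearest_idx = np.argsort(distances)[:k]

# Count the labels among the k nearest neighbors and take the most frequent.
classes, counts = np.unique(labels[nearest_idx], return_counts=True)
predicted = classes[np.argmax(counts)]

print("nearest points:", points[nearest_idx].tolist())
print("their labels:  ", labels[nearest_idx].tolist())
print("predicted class for X:", predicted)
```

With these particular coordinates, two of the three nearest neighbors are "Blue" and one is "Red", so X is assigned to "Blue".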

## 4 Use case 
- Suppose Abhi, as a hobby, wants to identify the species of some iris flowers that he has found
- He has collected some measurements for each iris:
  - The length and width of the petals
  - The length and width of the sepals, all measured in centimetres
- He also has the measurements of some irises that have previously been identified as belonging to the species
  - setosa
  - versicolor
  - virginica
- The goal is to create a machine learning model that can learn from the measurements of the irises whose species are already known
- so that we can predict the species of the new irises that he has found

In [1]:
# Loading the Dataset

from sklearn.datasets import load_iris

iris=load_iris()

print(dir(iris))

['DESCR', 'data', 'data_module', 'feature_names', 'filename', 'frame', 'target', 'target_names']


In [3]:
# Displaying feature name

from sklearn.datasets import load_iris

iris=load_iris()

print(iris.feature_names)

['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']


In [4]:
# Displaying target name 

from sklearn.datasets import load_iris

iris=load_iris()

print(iris.target_names)

['setosa' 'versicolor' 'virginica']


In [5]:
# Data 

from sklearn.datasets import load_iris

iris=load_iris()

print(iris.data)

[[5.1 3.5 1.4 0.2]
 [4.9 3.  1.4 0.2]
 [4.7 3.2 1.3 0.2]
 [4.6 3.1 1.5 0.2]
 [5.  3.6 1.4 0.2]
 [5.4 3.9 1.7 0.4]
 [4.6 3.4 1.4 0.3]
 [5.  3.4 1.5 0.2]
 [4.4 2.9 1.4 0.2]
 [4.9 3.1 1.5 0.1]
 [5.4 3.7 1.5 0.2]
 [4.8 3.4 1.6 0.2]
 [4.8 3.  1.4 0.1]
 [4.3 3.  1.1 0.1]
 [5.8 4.  1.2 0.2]
 [5.7 4.4 1.5 0.4]
 [5.4 3.9 1.3 0.4]
 [5.1 3.5 1.4 0.3]
 [5.7 3.8 1.7 0.3]
 [5.1 3.8 1.5 0.3]
 [5.4 3.4 1.7 0.2]
 [5.1 3.7 1.5 0.4]
 [4.6 3.6 1.  0.2]
 [5.1 3.3 1.7 0.5]
 [4.8 3.4 1.9 0.2]
 [5.  3.  1.6 0.2]
 [5.  3.4 1.6 0.4]
 [5.2 3.5 1.5 0.2]
 [5.2 3.4 1.4 0.2]
 [4.7 3.2 1.6 0.2]
 [4.8 3.1 1.6 0.2]
 [5.4 3.4 1.5 0.4]
 [5.2 4.1 1.5 0.1]
 [5.5 4.2 1.4 0.2]
 [4.9 3.1 1.5 0.2]
 [5.  3.2 1.2 0.2]
 [5.5 3.5 1.3 0.2]
 [4.9 3.6 1.4 0.1]
 [4.4 3.  1.3 0.2]
 [5.1 3.4 1.5 0.2]
 [5.  3.5 1.3 0.3]
 [4.5 2.3 1.3 0.3]
 [4.4 3.2 1.3 0.2]
 [5.  3.5 1.6 0.6]
 [5.1 3.8 1.9 0.4]
 [4.8 3.  1.4 0.3]
 [5.1 3.8 1.6 0.2]
 [4.6 3.2 1.4 0.2]
 [5.3 3.7 1.5 0.2]
 [5.  3.3 1.4 0.2]
 [7.  3.2 4.7 1.4]
 [6.4 3.2 4.5 1.5]
 [6.9 3.1 4.

In [6]:
# Length of the data


from sklearn.datasets import load_iris

iris=load_iris()

print(len(iris.data))

150


In [8]:
# creating dataframe 


import pandas as pd
from sklearn.datasets import load_iris

iris=load_iris()
df=pd.DataFrame(iris.data,columns=iris.feature_names)

print(df)

     sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
0                  5.1               3.5                1.4               0.2
1                  4.9               3.0                1.4               0.2
2                  4.7               3.2                1.3               0.2
3                  4.6               3.1                1.5               0.2
4                  5.0               3.6                1.4               0.2
..                 ...               ...                ...               ...
145                6.7               3.0                5.2               2.3
146                6.3               2.5                5.0               1.9
147                6.5               3.0                5.2               2.0
148                6.2               3.4                5.4               2.3
149                5.9               3.0                5.1               1.8

[150 rows x 4 columns]


In [15]:
# In real projects we usually load the dataset from a CSV file like this

import pandas as pd 

def load_dataset (p) :
    df1=pd.read_csv(p)
    return df1

def util(p):
    print(p.head())
    print()
    print(p.tail())
    print()
    print(p.info())    
    
f="/Users/apple/Downloads/TV_Final.csv"
df2=load_dataset(f)
util(df2)

print()
print(df2.head())


     Brand     Resolution  Size   Selling Price  Original Price  \
0  TOSHIBA   Ultra HD LED     55          37999           54990   
1     TCL   QLED Ultra HD     55          52999          129990   
2  realme          HD LED     32          13999           17999   
3      Mi          HD LED     32          14999           19999   
4  realme          HD LED     32          12999           21999   

  Operating System  Rating  
0            VIDAA     4.3  
1          Android     4.4  
2          Android     4.3  
3          Android     4.4  
4          Android     4.3  

     Brand    Resolution  Size   Selling Price  Original Price  \
907  SONY    Full HD LED     43          44999           57900   
908  SONY    Full HD LED     40          41499           51900   
909  SONY   Ultra HD LED     65         149990          184990   
910  SONY         HD LED     32          32900           32900   
911  SONY    Full HD LED     43          56900           56900   

    Operating System  Rat

In [16]:
# Adding the target column

import pandas as pd
from sklearn.datasets import load_iris

iris=load_iris()
df=pd.DataFrame(iris.data,columns=iris.feature_names)
df['target']=iris.target

print(df.head())

   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  \
0                5.1               3.5                1.4               0.2   
1                4.9               3.0                1.4               0.2   
2                4.7               3.2                1.3               0.2   
3                4.6               3.1                1.5               0.2   
4                5.0               3.6                1.4               0.2   

   target  
0       0  
1       0  
2       0  
3       0  
4       0  
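
As a possible shortcut, `load_iris` accepts `as_frame=True`, in which case the returned object's `frame` attribute already holds the four feature columns plus a `target` column. The sketch below assumes a scikit-learn version that supports this option (0.23 or later) and is an alternative to, not a replacement for, the cell above.

```python
from sklearn.datasets import load_iris

# as_frame=True asks scikit-learn to return pandas objects.
iris = load_iris(as_frame=True)

# iris.frame contains the feature columns followed by 'target'.
df = iris.frame
print(df.head())

# Number of rows per class label (0, 1, 2).
class_counts = df["target"].value_counts().sort_index()
print(class_counts)
```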


In [18]:
# Filtering the setosa rows (target == 0)

import pandas as pd
from sklearn.datasets import load_iris

iris=load_iris()

df=pd.DataFrame(iris.data,columns=iris.feature_names)
df["target"]=iris.target

print(df[df.target==0].head())

   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  \
0                5.1               3.5                1.4               0.2   
1                4.9               3.0                1.4               0.2   
2                4.7               3.2                1.3               0.2   
3                4.6               3.1                1.5               0.2   
4                5.0               3.6                1.4               0.2   

   target  
0       0  
1       0  
2       0  
3       0  
4       0  


In [19]:
import pandas as pd
from sklearn.datasets import load_iris

iris=load_iris()

df=pd.DataFrame(iris.data,columns=iris.feature_names)
df["target"]=iris.target

print(df[df.target==1].head())

    sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  \
50                7.0               3.2                4.7               1.4   
51                6.4               3.2                4.5               1.5   
52                6.9               3.1                4.9               1.5   
53                5.5               2.3                4.0               1.3   
54                6.5               2.8                4.6               1.5   

    target  
50       1  
51       1  
52       1  
53       1  
54       1  


In [20]:
import pandas as pd
from sklearn.datasets import load_iris

iris=load_iris()

df=pd.DataFrame(iris.data,columns=iris.feature_names)
df["target"]=iris.target

print(df[df.target==2].head())

     sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  \
100                6.3               3.3                6.0               2.5   
101                5.8               2.7                5.1               1.9   
102                7.1               3.0                5.9               2.1   
103                6.3               2.9                5.6               1.8   
104                6.5               3.0                5.8               2.2   

     target  
100       2  
101       2  
102       2  
103       2  
104       2  


In [22]:
import pandas as pd
from sklearn.datasets import load_iris

iris=load_iris()

df=pd.DataFrame(iris.data,columns=iris.feature_names)
df["target"]=iris.target

print(len(df[df.target==0]))

50


In [23]:
import pandas as pd
from sklearn.datasets import load_iris

iris=load_iris()

df=pd.DataFrame(iris.data,columns=iris.feature_names)
df["target"]=iris.target

df['folower_name']=df.target.apply(lambda x: iris.target_names[x])
print(df)

     sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  \
0                  5.1               3.5                1.4               0.2   
1                  4.9               3.0                1.4               0.2   
2                  4.7               3.2                1.3               0.2   
3                  4.6               3.1                1.5               0.2   
4                  5.0               3.6                1.4               0.2   
..                 ...               ...                ...               ...   
145                6.7               3.0                5.2               2.3   
146                6.3               2.5                5.0               1.9   
147                6.5               3.0                5.2               2.0   
148                6.2               3.4                5.4               2.3   
149                5.9               3.0                5.1               1.8   

     target folower_name  


In [26]:
import pandas as pd
from sklearn.datasets import load_iris

iris=load_iris()

df=pd.DataFrame(iris.data,columns=iris.feature_names)
df["target"]=iris.target

df['folower_name']=df.target.apply(lambda x: iris.target_names[x])


setosa_50=df[:50]
print(setosa_50)

    sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  \
0                 5.1               3.5                1.4               0.2   
1                 4.9               3.0                1.4               0.2   
2                 4.7               3.2                1.3               0.2   
3                 4.6               3.1                1.5               0.2   
4                 5.0               3.6                1.4               0.2   
5                 5.4               3.9                1.7               0.4   
6                 4.6               3.4                1.4               0.3   
7                 5.0               3.4                1.5               0.2   
8                 4.4               2.9                1.4               0.2   
9                 4.9               3.1                1.5               0.1   
10                5.4               3.7                1.5               0.2   
11                4.8               3.4 

In [30]:
import pandas as pd
from sklearn.datasets import load_iris

iris=load_iris()

df=pd.DataFrame(iris.data,columns=iris.feature_names)
df["target"]=iris.target

df['folower_name']=df.target.apply(lambda x: iris.target_names[x])


versicolor_50=df[50:100]
print(versicolor_50)

    sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  \
50                7.0               3.2                4.7               1.4   
51                6.4               3.2                4.5               1.5   
52                6.9               3.1                4.9               1.5   
53                5.5               2.3                4.0               1.3   
54                6.5               2.8                4.6               1.5   
55                5.7               2.8                4.5               1.3   
56                6.3               3.3                4.7               1.6   
57                4.9               2.4                3.3               1.0   
58                6.6               2.9                4.6               1.3   
59                5.2               2.7                3.9               1.4   
60                5.0               2.0                3.5               1.0   
61                5.9               3.0 

In [32]:
import pandas as pd
from sklearn.datasets import load_iris

iris=load_iris()

df=pd.DataFrame(iris.data,columns=iris.feature_names)
df["target"]=iris.target

df['folower_name']=df.target.apply(lambda x: iris.target_names[x])


virginica_50=df[100:]
print(virginica_50)

     sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  \
100                6.3               3.3                6.0               2.5   
101                5.8               2.7                5.1               1.9   
102                7.1               3.0                5.9               2.1   
103                6.3               2.9                5.6               1.8   
104                6.5               3.0                5.8               2.2   
105                7.6               3.0                6.6               2.1   
106                4.9               2.5                4.5               1.7   
107                7.3               2.9                6.3               1.8   
108                6.7               2.5                5.8               1.8   
109                7.2               3.6                6.1               2.5   
110                6.5               3.2                5.1               2.0   
111                6.4      

In [38]:
# splitting the dataset into training and test sets

import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import  train_test_split

iris=load_iris()

df=pd.DataFrame(iris.data,columns=iris.feature_names)
df["target"]=iris.target

df['folower_name']=df.target.apply(lambda x: iris.target_names[x])

#print(df.head())

X=df.drop(["folower_name","target"],axis="columns")
y=df.target

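# Note: train_size=0.2 keeps only 20% of the rows (30) for training;
# the remaining 120 rows form the test set.
X_train,X_test,y_train,y_test=train_test_split(X
                                               ,y
                                               ,train_size=0.2
                                               ,random_state=2)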

print("Splitting the dataset")


Splitting the dataset
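
To see what the split produces, it can help to print the shapes of the resulting arrays. The sketch below repeats the same `train_test_split` call and, as a labeled assumption rather than the author's choice, contrasts it with the more common setup that holds out 20% for testing (`test_size=0.2`) and stratifies by class.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X, y = iris.data, iris.target

# Same arguments as the cell above: train_size=0.2 keeps 20% for training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.2, random_state=2)
print(X_train.shape, X_test.shape)

# A commonly used alternative (an assumption, not the author's choice):
# hold out 20% for testing and stratify so class proportions are preserved.
X_train2, X_test2, y_train2, y_test2 = train_test_split(
    X, y, test_size=0.2, random_state=2, stratify=y)
print(X_train2.shape, X_test2.shape)
```

The first split leaves 30 rows for training and 120 for testing, which is why the evaluation later in this notebook reports a support of 120.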


In [40]:
# model creation

import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import  train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris=load_iris()

df=pd.DataFrame(iris.data,columns=iris.feature_names)
df["target"]=iris.target

df['folower_name']=df.target.apply(lambda x: iris.target_names[x])

#print(df.head())

X=df.drop(["folower_name","target"],axis="columns")
y=df.target

X_train,X_test,y_train,y_test=train_test_split(X
                                               ,y
                                               ,train_size=0.2
                                               ,random_state=2)


model=KNeighborsClassifier(n_neighbors=5)

In [42]:
# model training

import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import  train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris=load_iris()

df=pd.DataFrame(iris.data,columns=iris.feature_names)
df["target"]=iris.target

df['folower_name']=df.target.apply(lambda x: iris.target_names[x])

#print(df.head())

X=df.drop(["folower_name","target"],axis="columns")
y=df.target

X_train,X_test,y_train,y_test=train_test_split(X
                                               ,y
                                               ,train_size=0.2
                                               ,random_state=2)


model=KNeighborsClassifier(n_neighbors=5)

model.fit(X_train,y_train)

KNeighborsClassifier parameters after fitting:

Parameter      Value
n_neighbors    5
weights        'uniform'
algorithm      'auto'
leaf_size      30
p              2
metric         'minkowski'
metric_params  None
n_jobs         None


In [43]:
# checking the scores

import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import  train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris=load_iris()

df=pd.DataFrame(iris.data,columns=iris.feature_names)
df["target"]=iris.target

df['folower_name']=df.target.apply(lambda x: iris.target_names[x])

#print(df.head())

X=df.drop(["folower_name","target"],axis="columns")
y=df.target

X_train,X_test,y_train,y_test=train_test_split(X
                                               ,y
                                               ,train_size=0.2
                                               ,random_state=2)


model=KNeighborsClassifier(n_neighbors=5)

model.fit(X_train,y_train)

model.score(X_test,y_test)

0.9
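
For a classifier, `score` returns the mean accuracy on the data it is given, i.e. the fraction of test rows whose predicted label matches the true label. The sketch below checks this by computing the same number by hand and with `accuracy_score`; it uses the NumPy arrays from `load_iris` directly instead of the DataFrame, which is a simplification of the cell above.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, train_size=0.2, random_state=2)

model = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
y_pred = model.predict(X_test)

# score() on a classifier is the mean accuracy on the given data ...
score = model.score(X_test, y_test)
# ... which matches the fraction of correct predictions computed by hand.
manual = (y_pred == y_test).mean()

print(score, manual, accuracy_score(y_test, y_pred))
```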

In [46]:
# making the prediction

import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import  train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris=load_iris()

df=pd.DataFrame(iris.data,columns=iris.feature_names)
df["target"]=iris.target

df['folower_name']=df.target.apply(lambda x: iris.target_names[x])

#print(df.head())

X=df.drop(["folower_name","target"],axis="columns").values
y=df.target.values

X_train,X_test,y_train,y_test=train_test_split(X
                                               ,y
                                               ,train_size=0.2
                                               ,random_state=2)


model=KNeighborsClassifier(n_neighbors=5)

model.fit(X_train,y_train)

model.predict([[4.3,3.0,1.5,0.3]])

array([0])
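
The predicted value is a class index, so it is often more readable to translate it back to a species name through `iris.target_names`. The sketch below also calls `kneighbors` to show which training rows voted for that prediction; the variable names `sample`, `species`, and `neighbor_species` are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, train_size=0.2, random_state=2)

model = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)

sample = [[4.3, 3.0, 1.5, 0.3]]

# predict() returns the class index; target_names maps it to a species name.
pred = model.predict(sample)
species = iris.target_names[pred[0]]
print("predicted species:", species)

# kneighbors() returns the distances to, and the training-set indices of,
# the n_neighbors closest training rows for each query row.
distances, indices = model.kneighbors(sample)
neighbor_species = iris.target_names[y_train[indices[0]]]
print("neighbor distances:", distances[0])
print("neighbor species:  ", neighbor_species)
```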

In [47]:
# predicting the class of every row in the test set

import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import  train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris=load_iris()

df=pd.DataFrame(iris.data,columns=iris.feature_names)
df["target"]=iris.target

df['folower_name']=df.target.apply(lambda x: iris.target_names[x])

#print(df.head())

X=df.drop(["folower_name","target"],axis="columns")
y=df.target

X_train,X_test,y_train,y_test=train_test_split(X
                                               ,y
                                               ,train_size=0.2
                                               ,random_state=2)


model=KNeighborsClassifier(n_neighbors=5)

model.fit(X_train,y_train)

y_pred=model.predict(X_test)
print(y_pred)

[0 0 1 0 0 2 0 2 2 0 0 0 0 0 1 1 0 1 2 1 2 1 2 1 1 0 0 1 0 2 1 0 1 2 1 0 2
 1 1 2 1 1 2 1 0 2 0 1 0 0 0 1 2 2 0 2 2 2 1 0 0 1 1 1 2 1 1 0 1 0 2 1 1 0
 1 1 1 2 0 1 0 1 2 0 1 0 0 0 2 2 0 0 1 2 1 2 1 1 2 0 2 2 2 0 1 0 0 1 2 1 2
 1 1 2 1 1 1 1 1 1]


In [52]:
# Model evaluation: confusion matrix and classification report

import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix

iris=load_iris()

df=pd.DataFrame(iris.data,columns=iris.feature_names)
df["target"]=iris.target

df['folower_name']=df.target.apply(lambda x: iris.target_names[x])

#print(df.head())

X=df.drop(["folower_name","target"],axis="columns")
y=df.target

X_train,X_test,y_train,y_test=train_test_split(X
                                               ,y
                                               ,train_size=0.2
                                               ,random_state=2)


model=KNeighborsClassifier(n_neighbors=5)

model.fit(X_train,y_train)

y_pred=model.predict(X_test)

print(confusion_matrix(y_test,y_pred))
print(classification_report(y_test,y_pred))

[[39  0  0]
 [ 0 38  1]
 [ 0 11 31]]
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        39
           1       0.78      0.97      0.86        39
           2       0.97      0.74      0.84        42

    accuracy                           0.90       120
   macro avg       0.91      0.90      0.90       120
weighted avg       0.92      0.90      0.90       120
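
The numbers in the classification report can be derived directly from the confusion matrix, whose rows are the true classes and whose columns are the predicted classes. The sketch below copies the matrix printed above and recomputes precision, recall, F1, and accuracy with NumPy as a worked check.

```python
import numpy as np

# Confusion matrix copied from the output above
# (rows: true class, columns: predicted class).
cm = np.array([[39, 0, 0],
               [0, 38, 1],
               [0, 11, 31]])

true_positives = np.diag(cm)
precision = true_positives / cm.sum(axis=0)  # column sums: rows predicted as each class
recall = true_positives / cm.sum(axis=1)     # row sums: rows that truly belong to each class
f1 = 2 * precision * recall / (precision + recall)
accuracy = true_positives.sum() / cm.sum()

for cls in range(3):
    print(f"class {cls}: precision={precision[cls]:.2f} "
          f"recall={recall[cls]:.2f} f1={f1[cls]:.2f}")
print(f"accuracy={accuracy:.2f}")
```

For example, 11 virginica rows (class 2) were predicted as versicolor (class 1), which lowers the precision of class 1 to 38/49 and the recall of class 2 to 31/42.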

