## KNN
K-Nearest Neighbors (KNN) is a supervised machine learning algorithm that can be used for both classification and regression, although it is most often applied to classification.

KNN classifies a new data point based on the majority class among its k nearest neighbors in the training data.



#### HOW IT WORKS 

1. Choose K (the number of neighbors).
2. Calculate the distance (e.g., Euclidean) from the new point to every training point.
3. Select the K closest points (the neighbors).
4. Take the most common label among them (classification) or average their values (regression).
5. Assign the result to the new data point.
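
The steps above can be sketched in plain Python. This is a minimal illustration rather than how scikit-learn implements KNN; the function names (`k_nearest`, `knn_classify`, `knn_regress`) and the toy data are made up for the example.

```python
import math
from collections import Counter


def k_nearest(train_X, query, k):
    """Return the indices of the k training points closest to `query`."""
    distances = [
        (math.dist(point, query), index)  # Euclidean distance to each point
        for index, point in enumerate(train_X)
    ]
    distances.sort()  # ascending by distance, ties broken by index
    return [index for _, index in distances[:k]]


def knn_classify(train_X, train_y, query, k=3):
    """Classification: majority vote among the k nearest labels."""
    neighbors = k_nearest(train_X, query, k)
    votes = Counter(train_y[i] for i in neighbors)
    return votes.most_common(1)[0][0]


def knn_regress(train_X, train_y, query, k=3):
    """Regression: mean of the k nearest target values."""
    neighbors = k_nearest(train_X, query, k)
    return sum(train_y[i] for i in neighbors) / k


# Toy data (hypothetical): two features per point, two well-separated groups.
toy_X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
toy_labels = ["No", "No", "No", "Yes", "Yes", "Yes"]
toy_values = [10.0, 12.0, 11.0, 50.0, 52.0, 54.0]

predicted_label = knn_classify(toy_X, toy_labels, (1.2, 1.4), k=3)
predicted_value = knn_regress(toy_X, toy_values, (8.4, 8.6), k=3)
print(predicted_label, predicted_value)
```

Nothing is learned ahead of time: all the work happens at prediction time, which is why KNN is often called a lazy learner.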



## covid_toy dataset

In [7]:
import numpy as np 
import pandas as pd 


In [8]:
df = pd.read_csv("C:\\Users\\HP\\OneDrive\\Desktop\\DATASET\\covid_toy - covid_toy.csv")

In [9]:
df.head(2)

| | age | gender | fever | cough | city | has_covid |
|---|---|---|---|---|---|---|
| 0 | 60 | Male | 103.0 | Mild | Kolkata | No |
| 1 | 27 | Male | 100.0 | Mild | Delhi | Yes |


In [10]:
df.isnull().sum()

age           0
gender        0
fever        10
cough         0
city          0
has_covid     0
dtype: int64

In [13]:
df = df.dropna()

In [14]:
df.isnull().sum()

age          0
gender       0
fever        0
cough        0
city         0
has_covid    0
dtype: int64
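
Dropping the rows with a missing `fever` value removes 10 rows from an already small dataset. One possible alternative is to fill the gaps instead. The sketch below uses a small made-up frame (`toy`), not the notebook's CSV, and fills with the column mean purely as an illustration; in a real workflow the fill value would normally be computed from the training split only.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing fever reading.
toy = pd.DataFrame(
    {"age": [60, 27, 42, 31], "fever": [103.0, 100.0, np.nan, 98.0]}
)

dropped = toy.dropna()  # loses the whole row with the missing value
filled = toy.fillna({"fever": toy["fever"].mean()})  # keeps every row

print(len(dropped), len(filled))
print(filled)
```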

In [15]:
from sklearn.preprocessing import LabelEncoder 

In [16]:
lb = LabelEncoder()

In [17]:
df['gender'] = lb.fit_transform(df['gender'])
df['cough'] = lb.fit_transform(df['cough'])
df['city'] = lb.fit_transform(df['city'])
df['has_covid'] = lb.fit_transform(df['has_covid'])

In [18]:
df.head(2)

| | age | gender | fever | cough | city | has_covid |
|---|---|---|---|---|---|---|
| 0 | 60 | 1 | 103.0 | 0 | 2 | 0 |
| 1 | 27 | 1 | 100.0 | 0 | 1 | 1 |
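
The integer codes in this output follow from how `LabelEncoder` works: it sorts the distinct values and replaces each one with its position in that sorted order. The short sketch below shows this on a made-up list (`genders`), not on the notebook's data.

```python
from sklearn.preprocessing import LabelEncoder

# Illustrative values only; not rows from the notebook's CSV.
genders = ["Male", "Female", "Male"]

encoder = LabelEncoder()
codes = encoder.fit_transform(genders)

print(list(encoder.classes_))  # sorted distinct labels
print(list(codes))             # index of each value within classes_
print(list(encoder.inverse_transform(codes)))  # back to the original strings
```

Because the same `lb` object is refitted on each column, it only remembers the mapping for the last column it saw, so only that column can be decoded with `inverse_transform` afterwards. Keeping one encoder per column avoids this. The scikit-learn documentation also describes `LabelEncoder` as intended for target labels; for feature columns, `OrdinalEncoder` or `OneHotEncoder` may be a better fit, since integer codes impose an arbitrary ordering that a distance-based model like KNN will treat as meaningful.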


In [20]:
x = df.drop(columns = ['has_covid'])
y = df['has_covid']

In [22]:
from sklearn.model_selection import train_test_split

In [23]:
x_train,x_test,y_train,y_test = train_test_split(x,y,test_size=0.2,random_state=42)

In [56]:
from sklearn.neighbors import KNeighborsClassifier  # KNN classifier

In [57]:
knn = KNeighborsClassifier(n_neighbors = 5)  # classify by the 5 nearest neighbors
knn.fit(x_train , y_train)  # fit on the training split

In [27]:
y_pred = knn.predict(x_test) 

In [58]:
from sklearn.metrics import accuracy_score  # accuracy check 


In [29]:
accuracy_score(y_test , y_pred)

0.5

In [30]:
df.shape

(90, 6)
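
With 90 rows and `test_size=0.2`, the test split holds only 18 rows, so a single accuracy figure is noisy. One common way to get a steadier estimate, and to compare values of K, is cross-validation. The sketch below runs on synthetic data from `make_classification` as a stand-in with the same number of rows and feature columns; it is not the notebook's dataset, and the scores it prints say nothing about the real data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in: 90 rows, 5 feature columns (assumption for the demo).
X_demo, y_demo = make_classification(n_samples=90, n_features=5, random_state=42)

scores_by_k = {}
for k in (1, 3, 5, 7, 9):
    model = KNeighborsClassifier(n_neighbors=k)
    # 5-fold cross-validation: each row is used for testing exactly once.
    scores = cross_val_score(model, X_demo, y_demo, cv=5, scoring="accuracy")
    scores_by_k[k] = scores.mean()

best_k = max(scores_by_k, key=scores_by_k.get)
print(scores_by_k)
print("best k on the synthetic data:", best_k)
```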

## social network dataset 

In [34]:
df = pd.read_csv("C:\\Users\\HP\\OneDrive\\Desktop\\DATASET\\Social_Network_Ads - Social_Network_Ads.csv")

In [35]:
df.head(2)

| | User ID | Gender | Age | EstimatedSalary | Purchased |
|---|---|---|---|---|---|
| 0 | 15624510 | Male | 19 | 19000 | 0 |
| 1 | 15810944 | Male | 35 | 20000 | 0 |


In [36]:
df.dropna()  # result is not assigned, so df itself is unchanged

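| | User ID | Gender | Age | EstimatedSalary | Purchased |
|---|---|---|---|---|---|
| 0 | 15624510 | Male | 19 | 19000 | 0 |
| 1 | 15810944 | Male | 35 | 20000 | 0 |
| 2 | 15668575 | Female | 26 | 43000 | 0 |
| 3 | 15603246 | Female | 27 | 57000 | 0 |
| 4 | 15804002 | Male | 19 | 76000 | 0 |
| ... | ... | ... | ... | ... | ... |
| 395 | 15691863 | Female | 46 | 41000 | 1 |
| 396 | 15706071 | Male | 51 | 23000 | 1 |
| 397 | 15654296 | Female | 50 | 20000 | 1 |
| 398 | 15755018 | Male | 36 | 33000 | 0 |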


In [37]:
df.isnull().sum()

User ID            0
Gender             0
Age                0
EstimatedSalary    0
Purchased          0
dtype: int64

In [42]:
from sklearn.preprocessing import LabelEncoder 

In [43]:
lb = LabelEncoder()

In [46]:
df['Gender'] = lb.fit_transform(df['Gender'])


In [47]:
df.head(2)

| | User ID | Gender | Age | EstimatedSalary | Purchased |
|---|---|---|---|---|---|
| 0 | 15624510 | 1 | 19 | 19000 | 0 |
| 1 | 15810944 | 1 | 35 | 20000 | 0 |


In [48]:
x = df.drop(columns = ['Purchased'])  # note: 'User ID' is still included as a feature
y = df['Purchased']

In [49]:
from sklearn.model_selection import train_test_split

In [50]:
x_train,x_test,y_train,y_test = train_test_split(x,y,test_size=0.2,random_state=42)

In [51]:
from sklearn.neighbors import KNeighborsClassifier

In [52]:
knn = KNeighborsClassifier(n_neighbors = 5)
knn.fit(x_train , y_train)

In [53]:
y_pred = knn.predict(x_test)

In [54]:
from sklearn.metrics import accuracy_score 


In [55]:
accuracy_score(y_test , y_pred)

0.725
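
Because `x` keeps every column except `Purchased`, the `User ID` column takes part in the distance computation, and its values are far larger than `Age` or `Gender`. The first half of the sketch below measures how much each column contributes to the squared Euclidean distance between the two rows shown by `df.head(2)`. The second half is a hypothetical remedy, not something the notebook does: drop the identifier and standardize the remaining features in a pipeline. It is fitted on made-up data (`X_demo`, `y_demo`), so it makes no claim about the accuracy on the real dataset.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# The two encoded rows from df.head(2): User ID, Gender, Age, EstimatedSalary.
row_0 = np.array([15624510, 1, 19, 19000], dtype=float)
row_1 = np.array([15810944, 1, 35, 20000], dtype=float)

squared_gaps = (row_0 - row_1) ** 2
share = squared_gaps / squared_gaps.sum()
print(share)  # fraction of the squared distance contributed by each column

# Hypothetical remedy: scale features so no single column dominates.
# The pipeline fits the scaler on whatever data is passed to fit().
scaled_knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))

# Made-up Age / EstimatedSalary data with an arbitrary labeling rule.
rng = np.random.default_rng(0)
ages = rng.integers(18, 60, size=40)
salaries = rng.integers(15000, 150000, size=40)
X_demo = np.column_stack([ages, salaries]).astype(float)
y_demo = (ages > 40).astype(int)

scaled_knn.fit(X_demo, y_demo)
demo_predictions = scaled_knn.predict(X_demo[:3])
print(demo_predictions)
```

For these two rows the `User ID` column accounts for more than 99.9% of the squared distance, so the neighbors are effectively chosen by ID proximity rather than by age or salary.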