# K-Nearest Neighbors (KNN)
The K-Nearest Neighbors (KNN) algorithm is a supervised machine learning algorithm that can be used for both classification and regression tasks. It works by finding the k nearest data points (neighbors) to a given query point and using them to make predictions.

# How the KNN Algorithm Works

# Example Dataset

from sklearn import datasets
import pandas as pd

# Load the Iris dataset and name its four feature columns
iris = datasets.load_iris()
df = pd.DataFrame(data=iris.data, columns=['sepal_length', 'sepal_width', 'petal_length', 'petal_width'])

# Keep only two features so the example is easy to follow
X = df[['sepal_length', 'petal_length']].values  # Features
y = iris.target  # Labels: 0 = setosa, 1 = versicolor, 2 = virginica

iris_df = pd.DataFrame(data=X, columns=['Sepal Length', 'petal_length'])
iris_df['Species'] = y

# Let's take 'Setosa' vs. 'Non-setosa' for binary classification
# (1 = setosa, 0 = non-setosa)
iris_df['Binary Species'] = iris_df['Species'].apply(lambda x: 1 if x == 0 else 0)
iris_df.drop(columns=['Species'], inplace=True)
iris_df.head()

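|   | Sepal Length | petal_length | Binary Species |
|---|--------------|--------------|----------------|
| 0 | 5.1          | 1.4          | 1              |
| 1 | 4.9          | 1.4          | 1              |
| 2 | 4.7          | 1.3          | 1              |
| 3 | 4.6          | 1.5          | 1              |
| 4 | 5.0          | 1.4          | 1              |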


## 1. Data Preparation
KNN needs a labeled dataset in which each data point has features (input variables) and a corresponding label (output variable).

__Example:__ We have a dataset of flowers with two features, `sepal_length` and `petal_length`. The labels encode the species: `1` for setosa and `0` for non-setosa.

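## 2. Choose the Value of `k`
`k` is the number of nearest neighbors you consider when making a prediction. A common rule of thumb is to pick the odd number nearest to $\sqrt{n}$, where $n$ is the number of training points. An odd `k` avoids tied votes in binary classification.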

__For instance,__ let's set `k = 5`, which means we will look at the 5 nearest data points to classify a new point.
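The rule of thumb above can be sketched in a few lines. The helper name `suggest_k` is hypothetical, and the result is only a starting point: in practice `k` is usually tuned on validation data, and this walkthrough keeps `k = 5` for simplicity.

```python
import math

def suggest_k(n_samples):
    """Return the odd integer nearest to sqrt(n_samples) (hypothetical helper)."""
    root = math.sqrt(n_samples)
    # Every value in [2m, 2m + 2) is within 1 of the odd number 2m + 1.
    # When root is exactly an even integer, the larger odd neighbor is chosen.
    return 2 * math.floor(root / 2) + 1

# The Iris dataset has 150 samples, and sqrt(150) is about 12.25.
print(suggest_k(150))  # 13
```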

## 3. Calculate the Distance
To predict the label of a new data point, calculate the distance between this point and every point in the training dataset. Common distance metrics include Euclidean distance (most common), Manhattan distance, etc.

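__Example:__ Suppose the new point to classify is $(X, Y)$ and the training set contains the points $(x_1, y_1), (x_2, y_2), (x_3, y_3), \ldots, (x_n, y_n)$. Both coordinates are features here (sepal length and petal length), not labels.

Now calculate the distance between $(X, Y)$ and each training point. With Euclidean distance, the distance to the $i$-th point is $d_i = \sqrt{(X - x_i)^2 + (Y - y_i)^2}$.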
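As a minimal sketch of this step, the two metrics named above can be written in plain Python. The function names and the query point `(5.0, 1.5)` are assumptions made for illustration; the training points are five rows of the two-feature Iris data.

```python
import math

def euclidean(p, q):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))

# (sepal_length, petal_length) pairs from the training data
train_points = [(5.1, 1.4), (4.9, 1.4), (5.0, 1.4), (6.7, 5.2), (6.3, 5.0)]
query = (5.0, 1.5)  # hypothetical new point

# One distance per training point, in the same order as train_points
distances = [euclidean(query, p) for p in train_points]
for point, dist in zip(train_points, distances):
    print(point, round(dist, 3))
```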

## 4. Sort the Distances
Sort the distances in ascending order and take the first `k` points. These are the `k` nearest neighbors.
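One way to sketch the sorting step is to sort the labeled points by their distance to the query and slice off the first `k`. The query point and `k = 3` are assumptions chosen so that the slice is smaller than the toy dataset; `math.dist` requires Python 3.8 or later.

```python
import math

# ((sepal_length, petal_length), label) pairs; 1 = setosa, 0 = non-setosa
labeled_points = [
    ((5.1, 1.4), 1),
    ((4.9, 1.4), 1),
    ((5.0, 1.4), 1),
    ((6.7, 5.2), 0),
    ((6.3, 5.0), 0),
]
query = (5.0, 1.5)  # hypothetical new point
k = 3

# Sort by Euclidean distance to the query, nearest first
by_distance = sorted(labeled_points, key=lambda item: math.dist(query, item[0]))

# The first k entries are the k nearest neighbors
nearest = by_distance[:k]
print(nearest)
```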

## 5. Vote for the Majority Class
In classification, each of the `k` nearest neighbors has a label. To determine the label for the new point, take a vote among the labels of these `k` neighbors.

__Example:__
- (5.1, 1.4) labeled 1
- (4.9, 1.4) labeled 1
- (5.0, 1.4) labeled 1
- (6.7, 5.2) labeled 0
- (6.3, 5.0) labeled 0

__Voting:__
- `1`: 3 votes
- `0`: 2 votes
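The tally above can be reproduced with `collections.Counter` from the standard library. This is a small sketch that takes the five neighbor labels from the example as given.

```python
from collections import Counter

# Labels of the 5 nearest neighbors from the example
neighbor_labels = [1, 1, 1, 0, 0]

# Count how many neighbors voted for each label
votes = Counter(neighbor_labels)
print(votes[1], votes[0])  # 3 2

# most_common(1) returns the label with the highest count and that count
predicted_label, vote_count = votes.most_common(1)[0]
print(predicted_label)  # 1
```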

### 5.1 Average the neighbors' values
For regression there is no vote. Instead, the prediction is typically the average of the target values of the `k` nearest neighbors.
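A minimal sketch of the regression case, assuming the neighbors have already been found. The target values below are made up for illustration and are not taken from the dataset's labels.

```python
from statistics import mean

# Hypothetical target values of the k = 5 nearest neighbors
neighbor_targets = [1.4, 1.4, 1.3, 1.5, 1.4]

# The regression prediction is the plain (unweighted) average
prediction = mean(neighbor_targets)
print(round(prediction, 2))  # 1.4
```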

## 6. Assign the Label to the New Data Point

The label with the majority of votes is selected as the predicted label.

__Example:__ The predicted label is `1`, because it received 3 of the 5 votes.
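Putting the six steps together, a from-scratch classifier might look like the sketch below. The function name `knn_predict` and the query points are assumptions for illustration; this is not an optimized implementation, and a library such as scikit-learn would normally be used on real data.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=5):
    """Classify `query` by majority vote among its k nearest training points."""
    # Step 3: distance from the query to every training point
    scored = [(math.dist(query, p), label)
              for p, label in zip(train_points, train_labels)]
    # Step 4: sort ascending by distance and keep the k nearest
    scored.sort(key=lambda pair: pair[0])
    k_labels = [label for _, label in scored[:k]]
    # Steps 5 and 6: majority vote decides the predicted label
    return Counter(k_labels).most_common(1)[0][0]

train_points = [(5.1, 1.4), (4.9, 1.4), (5.0, 1.4), (6.7, 5.2), (6.3, 5.0)]
train_labels = [1, 1, 1, 0, 0]

# With k = 5 all five points vote: three 1s against two 0s
print(knn_predict(train_points, train_labels, query=(5.0, 1.5), k=5))  # 1
```

With a smaller `k`, only the closest points vote, so a query near the non-setosa points, such as `(6.5, 5.1)` with `k=1`, would be labeled `0`.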