<a href="https://colab.research.google.com/github/koushik2299/Data-Science/blob/main/KNN.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# K-Nearest Neighbors

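K-Nearest Neighbors (KNN) is a non-parametric machine learning algorithm used for both classification and regression. To predict the outcome for a new data point, it finds the 'k' training points closest to it (typically by Euclidean distance) and takes a majority vote of their labels for classification, or the average of their values for regression.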
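The prediction rule above can be sketched in a few lines of NumPy. This is a minimal brute-force version that assumes Euclidean distance. The `knn_predict` function and the toy dataset are hypothetical illustrations, not part of the notebook. scikit-learn's `KNeighborsClassifier` and `KNeighborsRegressor` implement the same idea with faster neighbour search.

```python
import numpy as np
from collections import Counter


def knn_predict(X_train, y_train, x_query, k=3, task="classification"):
    """Predict the target of one query point with brute-force KNN (hypothetical helper).

    Uses Euclidean distance; majority vote for classification,
    mean of the neighbours' targets for regression.
    """
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    # Euclidean distance from the query to every training point
    distances = np.linalg.norm(X_train - np.asarray(x_query, dtype=float), axis=1)
    # Indices of the k closest training points
    nearest = np.argsort(distances)[:k]
    if task == "classification":
        # Majority vote among the k neighbours' labels
        return Counter(y_train[nearest].tolist()).most_common(1)[0][0]
    # Regression: average of the k neighbours' target values
    return float(y_train[nearest].mean())


# Hypothetical toy data: six 2-D points with a class label and a numeric target
X = [[1.0, 1.0], [1.5, 2.0], [3.0, 4.0], [5.0, 7.0], [3.5, 5.0], [4.5, 5.0]]
labels = [0, 0, 0, 1, 1, 1]
values = [1.0, 1.2, 2.0, 4.0, 3.0, 3.5]

# The 3 nearest neighbours of (4, 5) are rows 4, 5 and 2
pred_label = knn_predict(X, labels, [4.0, 5.0], k=3)
pred_value = knn_predict(X, values, [4.0, 5.0], k=3, task="regression")
print(pred_label)  # 1 (labels 1, 1, 0)
print(pred_value)  # mean of 3.0, 3.5 and 2.0
```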

## Pros

- **Non-Parametric Model**: KNN makes no assumptions about the underlying data distribution, so it can model data whose structure is irregular or unknown in advance.
- **Flexibility with Data Distribution**: Because no distribution has to be specified up front, KNN can be applied to any data for which a meaningful distance metric can be defined.
- **Handles Non-Linear Relationships**: KNN can effectively capture non-linear relationships between the target and independent variables by relying on the proximity between data points.
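To illustrate the non-linear point, the sketch below compares KNN with a linear model on scikit-learn's `make_moons` toy dataset, whose two classes cannot be separated by a straight line. The dataset, `n_neighbors=5`, and the logistic-regression baseline are choices made for this illustration, not part of the original notebook; exact accuracies depend on the random seed.

```python
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Two interleaving half-moons: the true class boundary is curved
X, y = make_moons(n_samples=500, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# A linear baseline can only draw a straight decision boundary
linear_acc = LogisticRegression().fit(X_train, y_train).score(X_test, y_test)

# KNN labels each point by its local neighbourhood, so its boundary can bend
knn_acc = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train).score(X_test, y_test)

print(f"logistic regression accuracy: {linear_acc:.3f}")
print(f"KNN (k=5) accuracy:           {knn_acc:.3f}")
```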

## Cons

- **Computationally Expensive**: A brute-force KNN computes the distance from each query point to every training instance, so prediction becomes slow on large datasets.
- **Lazy Learner**: There is no real training phase; the algorithm defers all computation until prediction time, which can delay each prediction.
- **Sensitivity to the Scale of Data**: Because predictions depend on distances, features with large numeric ranges dominate those with small ranges, so features usually need to be normalized or standardized first.
- **Memory Intensive**: The entire training set must be stored to make predictions, which leads to high memory usage on large datasets.
- **Sensitive to Irrelevant Features**: Every feature contributes equally to the distance, so irrelevant or redundant features can drown out informative ones and degrade performance.
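The scale-sensitivity point can be seen with four hypothetical points whose first feature is an income in dollars and whose second is an age in years. In raw units the income column dominates the Euclidean distance; after standardizing with scikit-learn's `StandardScaler` (fitted on the training points only), both features count. The data and the `nearest_index` helper are illustrative assumptions.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical training points: column 0 is income (dollars), column 1 is age (years)
X_train = np.array([
    [51_000.0, 25.0],
    [50_300.0, 60.0],
    [60_000.0, 62.0],
    [70_000.0, 24.0],
])
query = np.array([[50_200.0, 26.0]])


def nearest_index(X, q):
    """Return the row index of X closest to q by Euclidean distance (hypothetical helper)."""
    return int(np.argmin(np.linalg.norm(X - q, axis=1)))


# Raw units: income gaps of hundreds of dollars swamp an age gap of 34 years
raw_nn = nearest_index(X_train, query)

# Standardized units: fit on the training points, then apply the same transform to the query
scaler = StandardScaler().fit(X_train)
scaled_nn = nearest_index(scaler.transform(X_train), scaler.transform(query))

print("nearest neighbour (raw):   ", raw_nn)     # 1: similar income, very different age
print("nearest neighbour (scaled):", scaled_nn)  # 0: close on both features
```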

## Example Usage

- **Disease Prediction**: Predicting whether a patient has a disease based on their similarity to historical cases.
- **Movie Recommendation**: Recommending movies by finding either similar users or similar movies based on preferences or ratings.
- **Credit Card Transactions**: Detecting fraudulent credit card activity by identifying transactions that are similar to known cases of fraud.


In [7]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split

In [2]:
import pandas as pd

df = pd.read_csv("/content/Classified Data (1).txt")

In [3]:
df.head()

|   | Unnamed: 0 | WTT | PTI | EQW | SBI | LQE | QWG | FDJ | PJF | HQE | NXJ | TARGET CLASS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0.913917 | 1.162073 | 0.567946 | 0.755464 | 0.780862 | 0.352608 | 0.759697 | 0.643798 | 0.879422 | 1.231409 | 1 |
| 1 | 1 | 0.635632 | 1.003722 | 0.535342 | 0.825645 | 0.924109 | 0.64845 | 0.675334 | 1.013546 | 0.621552 | 1.492702 | 0 |
| 2 | 2 | 0.72136 | 1.201493 | 0.92199 | 0.855595 | 1.526629 | 0.720781 | 1.626351 | 1.154483 | 0.957877 | 1.285597 | 0 |
| 3 | 3 | 1.234204 | 1.386726 | 0.653046 | 0.825624 | 1.142504 | 0.875128 | 1.409708 | 1.380003 | 1.522692 | 1.153093 | 1 |
| 4 | 4 | 1.279491 | 0.94975 | 0.62728 | 0.668976 | 1.232537 | 0.703727 | 1.115596 | 0.646691 | 1.463812 | 1.419167 | 1 |


In [5]:
# Drop the unnamed leftover index column; it carries no information about the target
df.drop("Unnamed: 0", axis=1, inplace=True)

In [6]:
df.head()

|   | WTT | PTI | EQW | SBI | LQE | QWG | FDJ | PJF | HQE | NXJ | TARGET CLASS |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.913917 | 1.162073 | 0.567946 | 0.755464 | 0.780862 | 0.352608 | 0.759697 | 0.643798 | 0.879422 | 1.231409 | 1 |
| 1 | 0.635632 | 1.003722 | 0.535342 | 0.825645 | 0.924109 | 0.64845 | 0.675334 | 1.013546 | 0.621552 | 1.492702 | 0 |
| 2 | 0.72136 | 1.201493 | 0.92199 | 0.855595 | 1.526629 | 0.720781 | 1.626351 | 1.154483 | 0.957877 | 1.285597 | 0 |
| 3 | 1.234204 | 1.386726 | 0.653046 | 0.825624 | 1.142504 | 0.875128 | 1.409708 | 1.380003 | 1.522692 | 1.153093 | 1 |
| 4 | 1.279491 | 0.94975 | 0.62728 | 0.668976 | 1.232537 | 0.703727 | 1.115596 | 0.646691 | 1.463812 | 1.419167 | 1 |
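The `df.drop("Unnamed: 0", ...)` step removes a leftover index column after loading. An alternative, sketched below under the assumption that the file's first column is an unnamed index, is to pass `index_col=0` to `pd.read_csv` so that column becomes the DataFrame index and never appears as a feature. The two data rows in `csv_text` are copied from the output above; `csv_text` and `df_sample` are hypothetical names used only for this illustration.

```python
import io

import pandas as pd

# Assumption: the file begins with an unnamed index column (empty first header).
# These two rows are copied from the notebook output; the real file has many more.
csv_text = """,WTT,PTI,EQW,SBI,LQE,QWG,FDJ,PJF,HQE,NXJ,TARGET CLASS
0,0.913917,1.162073,0.567946,0.755464,0.780862,0.352608,0.759697,0.643798,0.879422,1.231409,1
1,0.635632,1.003722,0.535342,0.825645,0.924109,0.64845,0.675334,1.013546,0.621552,1.492702,0
"""

# index_col=0 turns the unnamed first column into the index,
# so there is no "Unnamed: 0" column to drop afterwards
df_sample = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(df_sample.columns.tolist())
```

For the real file, the equivalent call would be `pd.read_csv("/content/Classified Data (1).txt", index_col=0)`, again assuming the same leading index column.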


In [9]:
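X = df.drop('TARGET CLASS', axis=1)  # feature columns
y = df['TARGET CLASS']               # binary class label
# Hold out 30% of the rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=23)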

In [11]:
from sklearn.preprocessing import StandardScaler
std = StandardScaler()
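For reference, `StandardScaler` rescales each feature column $j$ to zero mean and unit variance, using the mean $\mu_j$ and (population) standard deviation $\sigma_j$ estimated from the $n$ rows it is fitted on:

$$
z_{ij} = \frac{x_{ij} - \mu_j}{\sigma_j}, \qquad
\mu_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}, \qquad
\sigma_j = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left(x_{ij} - \mu_j\right)^2}
$$

Since $\mu_j$ and $\sigma_j$ should be estimated from the training split alone, the test split has to be transformed with those same values rather than refitted.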

In [12]:
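# Learn each feature's mean and standard deviation from the training split only...
X_train = std.fit_transform(X_train)
# ...and apply that same transformation to the test split (no refitting)
X_test = std.transform(X_test)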
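The notebook stops after scaling. A minimal sketch of the remaining steps follows, assuming scikit-learn's `KNeighborsClassifier` as the model. Because the data file is not available here, it substitutes a synthetic stand-in from `make_classification` (an assumption; replace it with `X` and `y` from the cells above). It also swaps the manual scaler for a `Pipeline`, which calls `fit_transform` on the data passed to `.fit()` and only `transform` on data passed to `.predict()` or `.score()`, and it chooses `k` by 5-fold cross-validation on the training split. The `knn_model` helper and the range of odd `k` values are illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data (assumption): 1000 rows, 10 features, binary target.
# Replace with X and y from the notebook when the data file is available.
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, random_state=23)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=23)


def knn_model(k):
    """Hypothetical helper: scaler + KNN, so scaling is always fitted on training data only."""
    return make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))


# Mean cross-validated error for each odd k, computed on the training split only
cv_error = {
    k: 1 - cross_val_score(knn_model(k), X_train, y_train, cv=5).mean()
    for k in range(1, 31, 2)
}
best_k = min(cv_error, key=cv_error.get)

# Refit on the whole training split, then evaluate once on the held-out test split
final_model = knn_model(best_k).fit(X_train, y_train)
test_accuracy = final_model.score(X_test, y_test)
print(f"best k = {best_k}, test accuracy = {test_accuracy:.3f}")
print(classification_report(y_test, final_model.predict(X_test)))
```

Selecting `k` on training folds rather than on the test split keeps the final test accuracy an honest estimate, and odd values of `k` avoid tied votes in binary classification.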