# KNN for Regression – Implementation Guidelines  

## Steps  

1. **Import Dataset**  
   - Load the dataset into Python.  

2. **Separate Features and Target**  
   - Features (X): `Gender`, `Height`  
   - Target (Y): `Weight`  

3. **Split Dataset**  
   - Training set: **70%**  
   - Testing set: **30%**  

4. **Apply Linear Regression**  
   - Train the model on the training set.  
   - Predict on the testing set.  

5. **Evaluate Linear Regression Model**  
   - Training score (R², the value returned by `score`; classification accuracy does not apply to regression)  
   - Testing score (R²)  
   - Mean Squared Error (MSE) for testing set  

6. **Apply KNN Regressor**  
   - Use **[Scikit-Learn’s](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsRegressor.html) KNN Regressor** implementation.  
   - Train the model on the training set.  
   - Predict on the testing set.  

7. **Evaluate KNN Model**  
   - Training score (R², the value returned by `score`)  
   - Testing score (R²)  
   - Mean Squared Error (MSE) for testing set  

8. **Compare Models**  
   - Compare **KNN Regressor** and **Linear Regression** results.  
   - Discuss which model performs better.  


In [90]:
import pandas as pd
from sklearn.preprocessing import LabelEncoder
import warnings as wr
wr.filterwarnings('ignore')

In [92]:
df = pd.read_csv('weight-height.csv')
df.head()

|   | Gender | Height | Weight |
|---|--------|-----------|------------|
| 0 | Male | 73.847017 | 241.893563 |
| 1 | Male | 68.781904 | 162.310473 |
| 2 | Male | 74.110105 | 212.740856 |
| 3 | Male | 71.730978 | 220.042470 |
| 4 | Male | 69.881796 | 206.349801 |


In [94]:
encoder = LabelEncoder()
df['Gender'] = encoder.fit_transform(df['Gender'])
df

|   | Gender | Height | Weight |
|---|--------|-----------|------------|
| 0 | 1 | 73.847017 | 241.893563 |
| 1 | 1 | 68.781904 | 162.310473 |
| 2 | 1 | 74.110105 | 212.740856 |
| 3 | 1 | 71.730978 | 220.042470 |
| 4 | 1 | 69.881796 | 206.349801 |
| ... | ... | ... | ... |
| 8550 | 0 | 60.483946 | 110.565497 |
| 8551 | 0 | 63.423372 | 129.921671 |
| 8552 | 0 | 65.584057 | 155.942671 |
| 8553 | 0 | 67.429971 | 151.678405 |
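
To make the encoding explicit, the integer assigned to each label can be read back from the fitted encoder. The sketch below uses a small hypothetical `Gender` column; the `Female` label is an assumption, since only `Male` appears in the output above. `LabelEncoder` stores its classes in sorted order, and each label's position in `classes_` is its code.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-in for the Gender column; "Female" is assumed here.
gender = pd.Series(["Male", "Female", "Male", "Female"], name="Gender")

encoder = LabelEncoder()
codes = encoder.fit_transform(gender)

# classes_ is sorted; a label's index in classes_ is its integer code.
mapping = {label: int(code) for code, label in enumerate(encoder.classes_)}
print(mapping)
print(codes.tolist())
```

Under that assumption, `Male` maps to 1, which matches the encoded rows shown above.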


In [96]:
x = df.drop(columns=['Weight'])  # features: Gender, Height
x.head()

|   | Gender | Height |
|---|--------|-----------|
| 0 | 1 | 73.847017 |
| 1 | 1 | 68.781904 |
| 2 | 1 | 74.110105 |
| 3 | 1 | 71.730978 |
| 4 | 1 | 69.881796 |


In [98]:
y = df[['Weight']]
y.head()

|   | Weight |
|---|------------|
| 0 | 241.893563 |
| 1 | 162.310473 |
| 2 | 212.740856 |
| 3 | 220.042470 |
| 4 | 206.349801 |


In [100]:
from sklearn.model_selection import train_test_split
from sklearn import linear_model
from sklearn.metrics import mean_squared_error
reg = linear_model.LinearRegression()

In [102]:
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=42)  
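
A quick sanity check after splitting is to confirm the 70/30 proportions and that features and target rows still line up. The sketch below runs on synthetic placeholder data (100 made-up rows, not the real dataset); the `_demo` names are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic placeholder data (not the real dataset): 100 rows.
rng = np.random.default_rng(0)
x_demo = pd.DataFrame({
    "Gender": rng.integers(0, 2, size=100),
    "Height": rng.normal(66, 4, size=100),
})
y_demo = pd.DataFrame({"Weight": rng.normal(160, 30, size=100)})

x_tr, x_te, y_tr, y_te = train_test_split(
    x_demo, y_demo, test_size=0.3, random_state=42)

# Sizes of the training and testing portions.
print(len(x_tr), len(x_te))
# Row labels stay aligned between features and target after the split.
print(x_tr.index.equals(y_tr.index), x_te.index.equals(y_te.index))
```

Fixing `random_state` makes the split reproducible, so both models are later evaluated on the same held-out rows.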

In [104]:
reg.fit(x_train, y_train)

In [106]:
y_predRg = reg.predict(x_test)


In [108]:
reg.score(x_train, y_train)

0.8973793060969246

In [110]:
reg.score(x_test, y_test)

0.905911242442266
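
For scikit-learn regressors, `score` returns the coefficient of determination, R² = 1 − SS_res / SS_tot, rather than a classification accuracy. The sketch below checks this by hand on synthetic data (the data and coefficients are made up for illustration).

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

model = LinearRegression().fit(X, y)
pred = model.predict(X)

ss_res = np.sum((y - pred) ** 2)       # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)   # total sum of squares
r2_manual = 1.0 - ss_res / ss_tot

# The two values should agree up to floating-point error.
print(r2_manual, model.score(X, y))
```

Read this way, the scores above say the linear model explains roughly 90% of the variance in `Weight` on both the training and testing sets.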

In [112]:
mse = mean_squared_error(y_test, y_predRg)
print('MSE:', mse)

MSE: 96.83734437830608


In [114]:
from sklearn.neighbors import KNeighborsRegressor
neigh = KNeighborsRegressor(n_neighbors=5)

In [116]:
neigh.fit(x_train, y_train)
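
KNN predicts from distances between rows, so a feature with a larger numeric range (here `Height`, spanning tens of units) can dominate one encoded as 0/1 (`Gender`). The value `n_neighbors=5` is also just the library default. As an optional variation that the notebook does not use, the sketch below standardizes the features and picks `k` by cross-validation. The synthetic data, the candidate `k` values, and the 5-fold setting are all assumptions.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the Gender/Height/Weight data (illustration only).
rng = np.random.default_rng(0)
n = 500
gender = rng.integers(0, 2, size=n)
height = 64 + 5 * gender + rng.normal(scale=3, size=n)
weight = -250 + 6 * height + 15 * gender + rng.normal(scale=10, size=n)
X = np.column_stack([gender, height])

X_train, X_test, y_train, y_test = train_test_split(
    X, weight, test_size=0.3, random_state=42)

# Scaling lives inside the pipeline so it is refit on each CV training fold.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("knn", KNeighborsRegressor()),
])
search = GridSearchCV(
    pipe,
    param_grid={"knn__n_neighbors": [3, 5, 11, 21, 41]},  # assumed candidates
    scoring="neg_mean_squared_error",
    cv=5,
)
search.fit(X_train, y_train)

best_k = search.best_params_["knn__n_neighbors"]
test_r2 = search.best_estimator_.score(X_test, y_test)  # R^2 of the refit pipeline
print("best k:", best_k)
print("test R^2:", test_r2)
```

Whether scaling or a different `k` helps on the real dataset would need to be checked there; the sketch only shows the mechanics.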

In [118]:
ypredKNN = neigh.predict(x_test)
ypredKNN

array([[142.1445657 ],
       [181.38276924],
       [187.07075266],
       ...,
       [101.79654426],
       [192.43702356],
       [145.3632227 ]])

In [120]:
mse = mean_squared_error(y_test, ypredKNN)
print('MSE:', mse)

MSE: 121.33528273624482
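
MSE is expressed in squared units of the target, which makes it hard to read directly. Taking the square root (RMSE) gives a typical error in the same units as `Weight`. The sketch below applies this to the two MSE values reported in this notebook; the variable names are hypothetical.

```python
import math

# Test-set MSE values copied from the notebook outputs.
mse_linear = 96.83734437830608
mse_knn = 121.33528273624482

# RMSE is in the same units as the Weight column.
rmse_linear = math.sqrt(mse_linear)
rmse_knn = math.sqrt(mse_knn)

print(f"Linear Regression RMSE: {rmse_linear:.2f}")
print(f"KNN Regressor RMSE: {rmse_knn:.2f}")
```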


In [122]:
neigh.score(x_train, y_train)

0.9172142723737918

In [176]:
if mean_squared_error(y_test, y_predRg) < mean_squared_error(y_test, ypredKNN):
    print("\nLinear Regression performed better.")
else:
    print("\nKNN Regressor performed better.")


Linear Regression performed better.
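
The comparison above relies on test MSE alone, and the notebook does not show the KNN testing score that step 7 asks for. One way to collect all three metrics for both models side by side is sketched below. The `evaluate` helper is hypothetical and the data is synthetic, so the printed numbers will not match the notebook's.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor


def evaluate(model, X_train, X_test, y_train, y_test):
    """Fit `model` and return train R^2, test R^2, and test MSE."""
    model.fit(X_train, y_train)
    return {
        "train_r2": model.score(X_train, y_train),
        "test_r2": model.score(X_test, y_test),
        "test_mse": mean_squared_error(y_test, model.predict(X_test)),
    }


# Synthetic stand-in for the Gender/Height/Weight data (illustration only).
rng = np.random.default_rng(0)
n = 500
gender = rng.integers(0, 2, size=n)
height = 64 + 5 * gender + rng.normal(scale=3, size=n)
weight = -250 + 6 * height + 15 * gender + rng.normal(scale=10, size=n)
X = np.column_stack([gender, height])

split = train_test_split(X, weight, test_size=0.3, random_state=42)

# One row per model, one column per metric.
results = pd.DataFrame({
    "Linear Regression": evaluate(LinearRegression(), *split),
    "KNN (k=5)": evaluate(KNeighborsRegressor(n_neighbors=5), *split),
}).T
print(results)

best = results["test_mse"].idxmin()
print("Lowest test MSE:", best)
```

Looking at training and testing scores together helps with the discussion in step 8. In the notebook, KNN has the higher training R² (0.917 vs. 0.897) but the higher test MSE (121.3 vs. 96.8), which suggests it fits the training rows more closely without generalizing as well as Linear Regression.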
