# Regression with KNN (K Nearest Neighbors)
- KNN is not limited to classification tasks; it can also be used for regression tasks
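
As a rough sketch of the idea (the notation is an assumption introduced here for illustration, not taken from the notebook): with uniform weights, a KNN regressor predicts the target of a query point $x$ as the mean target of its $K$ closest training samples.

$$
\hat{y}(x) = \frac{1}{K} \sum_{i \in N_K(x)} y_i
$$

Here $N_K(x)$ denotes the indices of the $K$ training samples nearest to $x$, and $y_i$ is the target of training sample $i$. scikit-learn's `KNeighborsRegressor` uses Euclidean distance and uniform weights by default.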

## Sample Dataset
- Height (`tinggi`) and sex (`jk`, short for *jenis kelamin*) serve as the features
- Body weight (`berat`) serves as the target

In [1]:
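import pandas as pd

# Small census sample: tinggi = height, jk = sex ('pria' = male, 'wanita' = female), berat = weight
sensus = {'tinggi': [158, 170, 183, 191, 155, 163, 180, 158, 170],
          'jk': ['pria', 'pria', 'pria', 'pria', 'wanita', 'wanita', 'wanita', 'wanita', 'wanita'],
          'berat': [64, 86, 84, 80, 49, 59, 67, 54, 67]}
sensus_df = pd.DataFrame(sensus)
sensus_df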

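|   | tinggi | jk | berat |
|---|--------|--------|-------|
| 0 | 158 | pria | 64 |
| 1 | 170 | pria | 86 |
| 2 | 183 | pria | 84 |
| 3 | 191 | pria | 80 |
| 4 | 155 | wanita | 49 |
| 5 | 163 | wanita | 59 |
| 6 | 180 | wanita | 67 |
| 7 | 158 | wanita | 54 |
| 8 | 170 | wanita | 67 |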


## Features and Target

In [3]:
import numpy as np
X_train = np.array(sensus_df[['tinggi','jk']])
y_train = np.array(sensus_df['berat'])

print(f'X_train : {X_train}')
print(f'y_train : {y_train}')

X_train : [[158 'pria']
 [170 'pria']
 [183 'pria']
 [191 'pria']
 [155 'wanita']
 [163 'wanita']
 [180 'wanita']
 [158 'wanita']
 [170 'wanita']]
y_train : [64 86 84 80 49 59 67 54 67]


## Preprocess Dataset: Converting Labels to Binary Numeric Values
- Transposing swaps rows and columns: each row becomes a column and each column becomes a row
- `flatten()` converts a 2-dimensional array into a 1-dimensional array
- In this case, 0 represents the `jk` value 'pria' (male) and 1 represents 'wanita' (female)

In [4]:
X_train_transposed = np.transpose(X_train)
print(f'X_train : {X_train}')
print(f'X_train_transposed : {X_train_transposed}')

X_train : [[158 'pria']
 [170 'pria']
 [183 'pria']
 [191 'pria']
 [155 'wanita']
 [163 'wanita']
 [180 'wanita']
 [158 'wanita']
 [170 'wanita']]
X_train_transposed : [[158 170 183 191 155 163 180 158 170]
 ['pria' 'pria' 'pria' 'pria' 'wanita' 'wanita' 'wanita' 'wanita'
  'wanita']]


In [5]:
from sklearn.preprocessing import LabelBinarizer
lb = LabelBinarizer()
jk_binarised = lb.fit_transform(X_train_transposed[1])

print(f'jk : {X_train_transposed[1]}')
print(f'jk_binarised : {jk_binarised}')

jk : ['pria' 'pria' 'pria' 'pria' 'wanita' 'wanita' 'wanita' 'wanita' 'wanita']
jk_binarised : [[0]
 [0]
 [0]
 [0]
 [1]
 [1]
 [1]
 [1]
 [1]]
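
The 0/1 assignment does not have to be taken on faith. The following minimal sketch, which rebuilds the `jk` column by hand so that it runs on its own, reads the mapping back from the fitted binarizer. The name `mapping` is introduced here purely for illustration.

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer

# Same 'jk' values as in the sample dataset
jk = np.array(['pria', 'pria', 'pria', 'pria',
               'wanita', 'wanita', 'wanita', 'wanita', 'wanita'])

lb = LabelBinarizer()
jk_binarised = lb.fit_transform(jk).flatten()

# classes_ holds the labels in sorted order; with two classes,
# the first one is encoded as 0 and the second one as 1
print(lb.classes_)

# Pair every original label with its encoded value (illustrative helper)
mapping = {str(label): int(code) for label, code in zip(jk, jk_binarised)}
print(mapping)
```

Because the classes are sorted alphabetically, 'pria' comes first and receives 0, which matches the statement in the list above.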


In [6]:
jk_binarised = jk_binarised.flatten()
jk_binarised

array([0, 0, 0, 0, 1, 1, 1, 1, 1])

In [7]:
X_train_transposed[1] = jk_binarised
X_train = X_train_transposed.transpose()

print(f'X_train_transposed : {X_train_transposed}')
print(f'X_train : {X_train}')

X_train_transposed : [[158 170 183 191 155 163 180 158 170]
 [0 0 0 0 1 1 1 1 1]]
X_train : [[158 0]
 [170 0]
 [183 0]
 [191 0]
 [155 1]
 [163 1]
 [180 1]
 [158 1]
 [170 1]]
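
The transpose, binarize, transpose-back sequence works, but it leaves `X_train` as an object-dtype array because the column originally held strings. As an alternative sketch (not the approach used in this notebook, and with `jk_encoded` and `X_train_alt` as illustrative names), the column can be encoded directly and stacked next to the heights, which yields a plain integer array:

```python
import numpy as np

tinggi = np.array([158, 170, 183, 191, 155, 163, 180, 158, 170])
jk = np.array(['pria', 'pria', 'pria', 'pria',
               'wanita', 'wanita', 'wanita', 'wanita', 'wanita'])

# Element-wise comparison gives booleans; casting maps 'pria' -> 0, 'wanita' -> 1
jk_encoded = (jk == 'wanita').astype(int)

# Put the two 1-D arrays side by side as columns: shape (9, 2)
X_train_alt = np.column_stack([tinggi, jk_encoded])

print(X_train_alt)
print(X_train_alt.dtype)
```

The values are the same as in `X_train` above; only the dtype differs.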


## Training KNN Regression Model
- A machine learning model used for regression is commonly called a regressor
- A machine learning model used for classification is commonly called a classifier

In [10]:
from sklearn.neighbors import KNeighborsRegressor

K = 3  # number of nearest neighbors averaged for each prediction
model = KNeighborsRegressor(n_neighbors=K)
model.fit(X_train, y_train)

KNeighborsRegressor(n_neighbors=3)
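
To make the regressor/classifier distinction concrete, here is a small sketch that fits both kinds of KNN estimator on the sample data. The classification setup (predicting `jk` from height and weight) and the query values are assumptions chosen only for illustration; they are not part of this notebook's task.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

tinggi = np.array([158, 170, 183, 191, 155, 163, 180, 158, 170])
jk = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1])  # 0 = 'pria', 1 = 'wanita'
jk_label = np.array(['pria'] * 4 + ['wanita'] * 5)
berat = np.array([64, 86, 84, 80, 49, 59, 67, 54, 67])

# Regressor: continuous target (berat) predicted from tinggi and jk
regressor = KNeighborsRegressor(n_neighbors=3)
regressor.fit(np.column_stack([tinggi, jk]), berat)
berat_pred = regressor.predict([[155, 1]])

# Classifier: discrete target (jk label) predicted from tinggi and berat
# (hypothetical query: height 155, weight 70)
classifier = KNeighborsClassifier(n_neighbors=3)
classifier.fit(np.column_stack([tinggi, berat]), jk_label)
jk_pred = classifier.predict([[155, 70]])

print(f'regressor output  : {berat_pred}')  # a number (mean of neighbor targets)
print(f'classifier output : {jk_pred}')     # a class label (majority vote)
```

Both estimators share the same `fit`/`predict` interface; the difference lies in the kind of target they are trained on and return.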

## Predicting Body Weight

In [15]:
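# New sample: tinggi = 155, jk = 1 ('wanita'); reshape(1, -1) makes it one row with two feature columns
X_new = np.array([155, 1]).reshape(1, -1)
X_new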

array([[155,   1]])

In [16]:
y_pred = model.predict(X_new)
y_pred

array([55.66666667])
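
The predicted value can be reproduced by hand. The sketch below is a minimal NumPy re-implementation of uniform-weight KNN regression with Euclidean distance; `knn_regress` is a hypothetical helper written for this illustration, not scikit-learn's implementation, and it may break ties between equally distant neighbors differently.

```python
import numpy as np

X_train = np.array([[158, 0], [170, 0], [183, 0], [191, 0],
                    [155, 1], [163, 1], [180, 1], [158, 1], [170, 1]])
y_train = np.array([64, 86, 84, 80, 49, 59, 67, 54, 67])


def knn_regress(x_new, X, y, k=3):
    """Average the targets of the k training samples closest to x_new."""
    # Euclidean distance from x_new to every training row
    distances = np.sqrt(((X - x_new) ** 2).sum(axis=1))
    # Indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    return y[nearest].mean()


y_manual = knn_regress(np.array([155, 1]), X_train, y_train, k=3)
print(y_manual)
```

For this query the three nearest samples are (155, 1), (158, 1) and (158, 0), with targets 49, 54 and 64, so the mean is 167 / 3, or roughly 55.67.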

## Evaluating the KNN Regression Model

In [17]:
X_test = np.array([[168,0],[180,0],[160,1],[169,1]])
y_test = np.array([65,96,52,67])
print(f'X_test : {X_test}')
print(f'y_test : {y_test}')

X_test : [[168   0]
 [180   0]
 [160   1]
 [169   1]]
y_test : [65 96 52 67]


In [18]:
y_pred = model.predict(X_test)
y_pred

array([70.66666667, 79.        , 59.        , 70.66666667])
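
To see which training samples drive each of these predictions, the fitted model can be queried with `kneighbors`. The sketch below refits the same model so that it runs on its own; `neighbor_targets` and `manual_pred` are illustrative names.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

X_train = np.array([[158, 0], [170, 0], [183, 0], [191, 0],
                    [155, 1], [163, 1], [180, 1], [158, 1], [170, 1]])
y_train = np.array([64, 86, 84, 80, 49, 59, 67, 54, 67])
X_test = np.array([[168, 0], [180, 0], [160, 1], [169, 1]])

model = KNeighborsRegressor(n_neighbors=3).fit(X_train, y_train)

# For each test row: distances to, and row indices of, its 3 nearest training samples
distances, indices = model.kneighbors(X_test)

# Look up the neighbors' targets and average them per test row
neighbor_targets = y_train[indices]          # shape (4, 3)
manual_pred = neighbor_targets.mean(axis=1)

print(neighbor_targets)
print(manual_pred)
```

Averaging each row of `neighbor_targets` gives the same values as `model.predict(X_test)`.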

## Coefficient of Determination ($R^{2}$)

In [19]:
from sklearn.metrics import r2_score
r_squared = r2_score(y_test,y_pred)
print(f'R_Squared : {r_squared}')

R_Squared : 0.6290565226735438
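
The score can be checked against the definition $R^{2} = 1 - \frac{SS_{res}}{SS_{tot}}$, where $SS_{res}$ is the sum of squared differences between the true and predicted values and $SS_{tot}$ is the sum of squared differences between the true values and their mean. The following sketch uses the rounded predictions printed earlier, so the result agrees with `r2_score` only up to rounding; the variable names are illustrative.

```python
import numpy as np

y_test = np.array([65, 96, 52, 67])
# Predictions copied (rounded) from the model output shown earlier
y_pred = np.array([70.66666667, 79.0, 59.0, 70.66666667])

ss_res = ((y_test - y_pred) ** 2).sum()          # residual sum of squares
ss_tot = ((y_test - y_test.mean()) ** 2).sum()   # total sum of squares
r_squared_manual = 1 - ss_res / ss_tot

print(f'SS_res : {ss_res}')
print(f'SS_tot : {ss_tot}')
print(f'R_Squared (manual) : {r_squared_manual}')
```

A value of 1 would mean perfect predictions, while 0 corresponds to a model that does no better than always predicting the mean of `y_test`.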
