<a href="https://colab.research.google.com/github/cornflake15/data-course/blob/mining/data-mining/classification/k-nn.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Classification

## K-Nearest Neighbors

The following code implements the K-NN algorithm using the [scikit-learn](https://scikit-learn.org/stable/auto_examples/neighbors/plot_classification.html#sphx-glr-auto-examples-neighbors-plot-classification-py) library.

### 1. Import the required libraries

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from sklearn import neighbors, datasets

### 2. Load the dataset

Here we use the Iris flower dataset.

![Iris](https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Machine+Learning+R/iris-machinelearning.png)

In [None]:
iris = datasets.load_iris()

Description of the Iris dataset:

In [None]:
print(iris.DESCR)    # print() renders the line breaks in the description
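
Besides the free-text description, the loaded object exposes a few attributes that summarize the dataset. The following is a minimal sketch for inspecting them; the variable name `class_counts` is introduced here only for illustration.

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()

# Number of samples and number of features per sample
print(iris.data.shape)
# Names of the feature columns, in the same order as the columns of iris.data
print(iris.feature_names)
# Names of the classes that iris.target encodes as 0, 1, 2
print(iris.target_names)
# Number of samples in each class
class_counts = np.bincount(iris.target)
print(class_counts)
```

The order of `iris.feature_names` matters later on, because slicing `iris.data[:, :2]` keeps the first two of these columns.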

### 3. Separate the features and labels

In this part we use only the first two features of the Iris dataset (sepal length and sepal width), so the decision regions can be plotted in two dimensions.

In [None]:
X = iris.data[:, :2]
y = iris.target

#### 3.1 Features

In [None]:
print(X[:5])   # Show the first five rows

#### 3.2 Labels

In [None]:
print(y[:5])    # Show the first five labels

### 4. Initialize the hyperparameters

In [None]:
jumlah_neighbors = 15
weights = ['uniform', 'distance']
metrics = ['euclidean', 'manhattan', 'chebyshev', 'minkowski', 
          'wminkowski', 'seuclidean', 'mahalanobis']
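
To make the metric names above more concrete, here is a small sketch that computes three of them by hand with NumPy. The points `a` and `b` and the helper `minkowski` are hypothetical and exist only for this illustration; this is not how scikit-learn computes distances internally.

```python
import numpy as np

# Two hypothetical points in a 2-D feature space
a = np.array([1.0, 2.0])
b = np.array([4.0, 6.0])

diff = np.abs(a - b)                     # per-feature absolute differences: [3, 4]

euclidean = np.sqrt(np.sum(diff ** 2))   # straight-line distance
manhattan = np.sum(diff)                 # sum of the absolute differences
chebyshev = np.max(diff)                 # largest single-feature difference


def minkowski(u, v, p):
    """Minkowski distance of order p (p=1 gives Manhattan, p=2 gives Euclidean)."""
    return np.sum(np.abs(u - v) ** p) ** (1.0 / p)


print(euclidean, manhattan, chebyshev)
print(minkowski(a, b, 1), minkowski(a, b, 2))
```

Some entries in the list, such as `'seuclidean'` and `'mahalanobis'`, need extra parameters before they can be used, and `'wminkowski'` may not be accepted by recent scikit-learn releases, so it is worth checking the documentation of the installed version before selecting them.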

### 5. Modeling with K-NN

In scikit-learn, K-NN is implemented by the `neighbors.KNeighborsClassifier` class.

In [None]:
knn_classifier = neighbors.KNeighborsClassifier(jumlah_neighbors, 
                                                weights=weights[0], # weights = 'uniform'
                                                metric=metrics[0])  # metric = 'euclidean'
knn_classifier.fit(X, y)    # Fit the classifier on all samples (no train/test split yet)
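
To show what the classifier does conceptually, the sketch below implements the basic K-NN idea from scratch: measure the distance from a query point to every training point, take the `k` closest ones, and return the majority label. The function `knn_predict` and the toy arrays are hypothetical names used only here; scikit-learn's own implementation is more elaborate (for example, it can use tree-based search and may break ties differently).

```python
import numpy as np


def knn_predict(X_train, y_train, x_query, k):
    """Predict the label of x_query by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point
    distances = np.sqrt(np.sum((X_train - x_query) ** 2, axis=1))
    # Indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Count the votes per class and return the label with the most votes
    votes = np.bincount(y_train[nearest])
    return int(np.argmax(votes))


# Hypothetical toy data: two well-separated clusters
X_toy = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                  [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

pred_a = knn_predict(X_toy, y_toy, np.array([1.1, 1.0]), k=3)
pred_b = knn_predict(X_toy, y_toy, np.array([4.9, 5.0]), k=3)
print(pred_a, pred_b)
```

Note that there is no real training step in this sketch: the "model" is simply the stored training data, which is why K-NN is often described as a lazy learner.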

In [None]:
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
h = .02
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))

# Predict the class of every point on the grid
Z = knn_classifier.predict(np.c_[xx.ravel(), yy.ravel()])
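
The grid construction in the cell above can be hard to picture, so here is a tiny sketch of the same two steps on a 3 x 2 grid. The names `gx`, `gy`, and `grid_points` are chosen for this example only, so they do not overwrite the notebook's `xx` and `yy`.

```python
import numpy as np

# A tiny grid: 3 values along the x-axis, 2 values along the y-axis
xs = np.arange(0, 3)      # [0, 1, 2]
ys = np.arange(10, 12)    # [10, 11]
gx, gy = np.meshgrid(xs, ys)

# Both arrays have shape (len(ys), len(xs))
print(gx.shape, gy.shape)

# Flatten both arrays and pair them column-wise: one (x, y) row per grid point
grid_points = np.c_[gx.ravel(), gy.ravel()]
print(grid_points)
```

The classifier expects one sample per row, which is why the grid is flattened before calling `predict` and the result is reshaped back to the grid shape before plotting.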

Visualization

In [None]:
cmap_light = ListedColormap(['orange', 'cyan', 'cornflowerblue'])
cmap_bold = ListedColormap(['darkorange', 'c', 'darkblue'])
Z = Z.reshape(xx.shape)
plt.figure()
plt.pcolormesh(xx, yy, Z, shading='auto', cmap=cmap_light)

# Plot also the training points
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=cmap_bold, edgecolor='k', s=20)
plt.xlim(xx.min(), xx.max())
plt.ylim(yy.min(), yy.max())
plt.title('3-Class classification')

plt.show()

What happens if we use `weights='distance'` when constructing `neighbors.KNeighborsClassifier()`? With this setting, closer neighbors carry more weight in the vote than distant ones.

In [None]:
X = iris.data[:, :2]
y = iris.target

knn_classifier = neighbors.KNeighborsClassifier(jumlah_neighbors, 
                                                weights=weights[1], # weights = 'distance'
                                                metric=metrics[0])  # metric = 'euclidean'
knn_classifier.fit(X, y)

x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
Z = knn_classifier.predict(np.c_[xx.ravel(), yy.ravel()])

Visualization

In [None]:
cmap_light = ListedColormap(['orange', 'cyan', 'cornflowerblue'])
cmap_bold = ListedColormap(['darkorange', 'c', 'darkblue'])
Z = Z.reshape(xx.shape)
plt.figure()
plt.pcolormesh(xx, yy, Z, shading='auto', cmap=cmap_light)

# Plot also the training points
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=cmap_bold, edgecolor='k', s=20)
plt.xlim(xx.min(), xx.max())
plt.ylim(yy.min(), yy.max())
plt.title('3-Class classification')

plt.show()
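
On the Iris plots the difference between the two weighting schemes can be subtle. The sketch below uses a hypothetical one-dimensional toy dataset, chosen so that the two settings disagree: one class-0 point lies close to the query and two class-1 points lie farther away. All names in it are illustrative.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical 1-D toy data
X_w = np.array([[0.0], [2.0], [2.1]])
y_w = np.array([0, 1, 1])
query = np.array([[0.5]])

uniform_knn = KNeighborsClassifier(n_neighbors=3, weights='uniform').fit(X_w, y_w)
distance_knn = KNeighborsClassifier(n_neighbors=3, weights='distance').fit(X_w, y_w)

# Uniform: every neighbor counts once, so class 1 wins with 2 votes against 1
uniform_pred = uniform_knn.predict(query)[0]
# Distance: votes are weighted by 1/distance, so class 0 gets 1/0.5 = 2.0
# while class 1 gets 1/1.5 + 1/1.6, which is about 1.29
distance_pred = distance_knn.predict(query)[0]
print(uniform_pred, distance_pred)
```

With uniform weights the majority class among the neighbors wins regardless of distance, while with distance weights a single very close neighbor can outvote several distant ones.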

Explanation of the visualization:

![viz](img/knn-1.png)

### 6. Using all features of the Iris dataset

Previously we used only two features; in this part we use all four.

In [None]:
X = iris.data
y = iris.target
h = .02

#### 6.1 DataFrame format

We load the scikit-learn dataset into the DataFrame data structure from the **pandas** library.

In [None]:
import pandas as pd

X = pd.DataFrame(iris.data, columns=['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)'])
y = pd.DataFrame(iris.target, columns=['Class'])

In [None]:
X.head()    # Show the first five rows

In [None]:
y.head()

As the tables above show, presenting the dataset as a table rather than as a raw array/list makes it easier to read and interpret.
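
As an alternative to typing the column names by hand, they can be taken from the dataset object itself. This is a small sketch of that variation; `X_df` and `y_df` are names used only here so the notebook's `X` and `y` stay untouched.

```python
import pandas as pd
from sklearn import datasets

iris = datasets.load_iris()

# Take the column names from the dataset instead of hard-coding them
X_df = pd.DataFrame(iris.data, columns=iris.feature_names)
y_df = pd.DataFrame(iris.target, columns=['Class'])

print(X_df.columns.tolist())
print(X_df.shape, y_df.shape)
```

This produces the same column labels as the hard-coded list, and it avoids typos when a column is selected by name later.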

#### 6.2 Split the dataset into training and test data

The split is done with the `train_test_split()` function from scikit-learn.

In [None]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X[['sepal length (cm)', 'sepal width (cm)', 
                                                       'petal length (cm)', 'petal width (cm)']],
                                                    y['Class'], random_state=111)
print('X_train shape: {} \n y_train shape: {}'.format(X_train.shape, y_train.shape))
print('X_test shape: {} \n y_test shape: {}'.format(X_test.shape, y_test.shape))
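
The call above does not pass a test size, so scikit-learn falls back to its default of holding out 25% of the samples. The sketch below reproduces that on the raw arrays and also shows, as an optional variation not used in this notebook, an explicit `test_size` together with `stratify` to keep the class proportions similar in both subsets. The variable names are illustrative.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()

# Default split: 25% of the samples go to the test set
X_tr, X_te, y_tr, y_te = train_test_split(iris.data, iris.target, random_state=111)
print(X_tr.shape, X_te.shape)

# Optional variation: explicit test size and a stratified split
X_tr_s, X_te_s, y_tr_s, y_te_s = train_test_split(
    iris.data, iris.target, test_size=0.2, stratify=iris.target, random_state=111)
print(X_tr_s.shape, X_te_s.shape)
```

Fixing `random_state` makes the shuffle reproducible, so repeated runs produce the same split.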

#### 6.3 Train the model

In [None]:
knn_classifier = neighbors.KNeighborsClassifier(n_neighbors=jumlah_neighbors)
knn_classifier.fit(X_train, y_train)

#### 6.4 Predict on the test data

In [None]:
y_pred = knn_classifier.predict(X_test)

#### 6.5 Show the predictions in a table

In [None]:
pd.concat([X_test, y_test, pd.Series(y_pred, name='Predicted', index=X_test.index)],
          ignore_index=False, axis=1).head()

#### 6.6 Accuracy score

In [None]:
print('Accuracy: {}'.format(knn_classifier.score(X_test, y_test)))
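
For a classifier, `score()` returns the mean accuracy, that is, the share of test samples whose predicted label matches the true label. The following self-contained sketch recomputes that number by hand and adds a confusion matrix to show which classes get mixed up. It rebuilds the split and the model under its own names (`clf`, `pred`, and so on) and is meant as an illustration, not as part of the original workflow.

```python
import numpy as np
from sklearn import datasets, neighbors
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X_tr, X_te, y_tr, y_te = train_test_split(iris.data, iris.target, random_state=111)

clf = neighbors.KNeighborsClassifier(n_neighbors=15).fit(X_tr, y_tr)
pred = clf.predict(X_te)

# Accuracy by hand: fraction of test samples predicted correctly
manual_accuracy = np.mean(pred == y_te)
builtin_accuracy = clf.score(X_te, y_te)
print(manual_accuracy, builtin_accuracy)

# Rows are the true classes, columns are the predicted classes
cm = confusion_matrix(y_te, pred, labels=[0, 1, 2])
print(cm)
```

The diagonal of the confusion matrix counts the correct predictions, so its sum divided by the number of test samples equals the accuracy.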