# Chapter 3: Classification

This notebook reproduces the code and explains the theory of **Chapter 3 - Classification** from the book *Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow (2nd Edition)* by Aurélien Géron.

📌 The main focus of this chapter is building a classifier on the MNIST dataset and explaining classification evaluation metrics such as precision, recall, the confusion matrix, and the ROC curve.

---


In [None]:
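from sklearn.datasets import fetch_openml
import numpy as np

# Download MNIST from OpenML: 70,000 images of 28x28 pixels, flattened to 784 features
mnist = fetch_openml('mnist_784', version=1, as_frame=False)
X, y = mnist["data"], mnist["target"]
y = y.astype(np.uint8)  # targets arrive as strings; cast them to integers
X.shape, y.shape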

In [None]:
# The dataset is already ordered as 60,000 training images followed by 10,000 test images
X_train, X_test = X[:60000], X[60000:]
y_train, y_test = y[:60000], y[60000:]

In [None]:
from sklearn.linear_model import SGDClassifier

sgd_clf = SGDClassifier(random_state=42)
sgd_clf.fit(X_train, y_train)

In [None]:
some_digit = X[0]
sgd_clf.predict([some_digit])

In [None]:
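# Binary targets: True for every 5, False for all other digits
y_train_5 = (y_train == 5)
y_test_5 = (y_test == 5)

# Retrain the same classifier as a binary "5 vs. not 5" detector
sgd_clf.fit(X_train, y_train_5)
sgd_clf.predict([some_digit])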

In [None]:
from sklearn.model_selection import cross_val_score

cross_val_score(sgd_clf, X_train, y_train_5, cv=3, scoring="accuracy")

In [None]:
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import confusion_matrix

y_train_pred = cross_val_predict(sgd_clf, X_train, y_train_5, cv=3)
confusion_matrix(y_train_5, y_train_pred)

In [None]:
from sklearn.metrics import precision_score, recall_score, f1_score

print("Precision:", precision_score(y_train_5, y_train_pred))
print("Recall:", recall_score(y_train_5, y_train_pred))
print("F1 Score:", f1_score(y_train_5, y_train_pred))
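
The three scores printed above all come from the four counts in a binary confusion matrix. The following is a minimal sketch of that arithmetic on a tiny set of labels invented for illustration (they are not MNIST results); the names `toy_true` and `toy_pred` are hypothetical.

```python
# Toy labels invented for illustration only (1 = positive class, 0 = negative class)
toy_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

pairs = list(zip(toy_true, toy_pred))
tp = sum(t == 1 and p == 1 for t, p in pairs)  # true positives
fp = sum(t == 0 and p == 1 for t, p in pairs)  # false positives
fn = sum(t == 1 and p == 0 for t, p in pairs)  # false negatives
tn = sum(t == 0 and p == 0 for t, p in pairs)  # true negatives

precision = tp / (tp + fp)  # how many predicted positives are correct
recall = tp / (tp + fn)     # how many actual positives are found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print("TP, FP, FN, TN:", tp, fp, fn, tn)
print("Precision:", precision)
print("Recall:", recall)
print("F1 Score:", f1)
```

Because F1 is a harmonic mean, it stays low unless both precision and recall are high.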

## Multiclass Classification (One-vs-All)

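By default, SGDClassifier handles multiclass classification with the One-vs-All strategy: it trains one binary classifier per class and predicts the class whose classifier outputs the highest score.
We will look at an example of digit classification and its evaluation.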
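
To make the One-vs-All behavior visible, the sketch below inspects the per-class decision scores of an `SGDClassifier`. As an assumption made purely for speed, it uses the small 8x8 digits dataset bundled with scikit-learn (`load_digits`) instead of MNIST; the names `digits_clf`, `X_small`, and `y_small` are hypothetical.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier

# Assumption: the bundled 8x8 digits dataset stands in for MNIST here
X_small, y_small = load_digits(return_X_y=True)

digits_clf = SGDClassifier(random_state=42)
digits_clf.fit(X_small, y_small)

# One score per class: each comes from one "this class vs. the rest" binary classifier
sample_scores = digits_clf.decision_function(X_small[:1])
best_index = np.argmax(sample_scores, axis=1)

print(sample_scores.shape)                 # one row, one column per class
print(digits_clf.classes_[best_index])     # class with the highest score
print(digits_clf.predict(X_small[:1]))     # predict() picks the same class
```

The predicted class is simply the one with the highest decision score, which is why the last two printed values agree.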


In [None]:
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import confusion_matrix

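# Refit on all ten classes so sgd_clf itself is a multiclass model again
sgd_clf.fit(X_train, y_train)
# cross_val_predict trains fresh clones per fold and returns out-of-fold predictions
y_train_pred = cross_val_predict(sgd_clf, X_train, y_train, cv=3)
conf_mx = confusion_matrix(y_train, y_train_pred)  # rows: actual class, columns: predicted class
conf_mx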

In [None]:
import matplotlib.pyplot as plt
import numpy as np

plt.matshow(conf_mx, cmap=plt.cm.gray)
plt.title("Confusion Matrix")
plt.colorbar()
plt.show()

## Multilabel Classification

We can assign more than one label to each instance. For example, we classify:
- whether the digit is large (>= 7)
- whether the digit is odd

Example: digit 5 → [False, True]
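
The two-label encoding described above can be sketched with NumPy alone. The digit values below are chosen only for illustration, and the names `sample_digits` and `sample_labels` are hypothetical.

```python
import numpy as np

# A few digit values chosen for illustration
sample_digits = np.array([0, 3, 5, 7, 8, 9])

is_large = sample_digits >= 7      # first label: large digit
is_odd = sample_digits % 2 == 1    # second label: odd digit

# Stack the two boolean columns side by side: one row per digit, one column per label
sample_labels = np.c_[is_large, is_odd]

for digit, row in zip(sample_digits, sample_labels):
    print(digit, row.tolist())
```

Each row is the target vector a multilabel classifier would learn for that digit; the row for 5 is `[False, True]`, matching the example above.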


In [None]:
from sklearn.neighbors import KNeighborsClassifier

y_train_large = (y_train >= 7)
y_train_odd = (y_train % 2 == 1)
y_multilabel = np.c_[y_train_large, y_train_odd]

knn_clf = KNeighborsClassifier()
knn_clf.fit(X_train, y_multilabel)

knn_clf.predict([some_digit])

## ROC Curve and AUC

For a binary classifier, we can plot the ROC curve and compute the AUC score.
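
As a rough illustration of what the curve and the score measure, the sketch below computes the ROC points and the area under them by hand. The scores and labels are invented for illustration and contain no ties; the helper `roc_point` and all variable names are hypothetical and are not part of scikit-learn.

```python
import numpy as np

# Toy labels and decision scores invented for illustration (no tied scores)
toy_labels = np.array([0, 0, 1, 0, 1, 1])
toy_scores = np.array([0.1, 0.3, 0.35, 0.6, 0.7, 0.9])

def roc_point(threshold):
    """Return (FPR, TPR) when every score >= threshold is predicted positive."""
    predicted = toy_scores >= threshold
    tpr = np.sum(predicted & (toy_labels == 1)) / np.sum(toy_labels == 1)
    fpr = np.sum(predicted & (toy_labels == 0)) / np.sum(toy_labels == 0)
    return fpr, tpr

# Sweep the threshold from the highest score down to the lowest
sweep = np.sort(toy_scores)[::-1]
points = [(0.0, 0.0)] + [roc_point(t) for t in sweep]
toy_fpr, toy_tpr = zip(*points)

# Area under the curve via the trapezoidal rule
toy_auc = sum(
    (toy_fpr[i + 1] - toy_fpr[i]) * (toy_tpr[i + 1] + toy_tpr[i]) / 2
    for i in range(len(toy_fpr) - 1)
)
print("AUC:", toy_auc)
```

Lowering the threshold can only add predicted positives, so both rates rise from 0 to 1; a classifier that ranks every positive above every negative would reach an AUC of 1.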


In [None]:
from sklearn.metrics import roc_curve, roc_auc_score

# Ask for decision scores rather than class predictions, so the threshold can be varied
y_scores = cross_val_predict(sgd_clf, X_train, y_train_5, cv=3, method="decision_function")
fpr, tpr, thresholds = roc_curve(y_train_5, y_scores)

plt.plot(fpr, tpr)
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("ROC Curve")
plt.grid()
plt.show()

roc_auc_score(y_train_5, y_scores)