# 📘 Chapter 3: Classification
This notebook implements and explains the material in Chapter 3 of *Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow*. It introduces classification using the MNIST dataset: 70,000 images of handwritten digits, each 28×28 pixels.

## 📦 Import Libraries & Load Dataset

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_openml

mnist = fetch_openml('mnist_784', version=1, as_frame=False)
X, y = mnist["data"], mnist["target"]
y = y.astype(np.uint8)  # fetch_openml returns the labels as strings; cast them to integers
X.shape, y.shape


## 🔍 Visualize Sample Digits
**Theory**: Visualizing the data is a quick way to understand the task and to check that features and labels line up. Each row of `X` is a flat vector of 784 pixel intensities; reshaping it to 28×28 recovers the original image.

In [None]:
some_digit = X[0]
some_digit_image = some_digit.reshape(28, 28)

plt.imshow(some_digit_image, cmap="binary")
plt.axis("off")
plt.title(f"Label: {y[0]}")
plt.show()
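
To make the reshape step concrete, here is a minimal sketch that uses a synthetic vector instead of a real MNIST row. The names `flat` and `image` are illustrative only. It shows that NumPy fills the grid row by row, so pixel `(r, c)` of the image comes from position `r * 28 + c` of the flat vector.

```python
import numpy as np

# Synthetic stand-in for one MNIST row: 784 values, numbered 0..783.
flat = np.arange(784)

# Reshape into a 28x28 grid; NumPy uses row-major order by default.
image = flat.reshape(28, 28)

# Pixel (r, c) of the image is element r * 28 + c of the flat vector.
r, c = 3, 5
print(image.shape)                    # (28, 28)
print(image[r, c], flat[r * 28 + c])  # both are 89
```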


## 🧮 Training a Binary Classifier
**Theory**: We will train a classifier to detect only the digit 5 (vs. not 5). This is a binary classification problem.

In [None]:
from sklearn.linear_model import SGDClassifier

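# MNIST is already split: the first 60,000 images are the training set, the last 10,000 the test set.
X_train, X_test, y_train, y_test = X[:60000], X[60000:], y[:60000], y[60000:]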
y_train_5 = (y_train == 5)
y_test_5 = (y_test == 5)

sgd_clf = SGDClassifier(random_state=42)
sgd_clf.fit(X_train, y_train_5)
sgd_clf.predict([some_digit])


## 📈 Evaluating Accuracy with Cross-Validation
**Theory**: Accuracy can be misleading on imbalanced datasets such as this one, where only about 10% of the images are 5s. We begin by measuring accuracy with 3-fold cross-validation.

In [None]:
from sklearn.model_selection import cross_val_score

cross_val_score(sgd_clf, X_train, y_train_5, cv=3, scoring="accuracy")
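
For intuition about what `cross_val_score` is doing, the loop below is a rough hand-written equivalent. It is a sketch under stated assumptions, not the library's actual implementation: it uses a small synthetic dataset from `make_classification` (about 10% positives) instead of MNIST so that it runs quickly, and the names `X_demo`, `y_demo`, and `fold_scores` are illustrative.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in for (X_train, y_train_5): roughly 10% positive samples.
X_demo, y_demo = make_classification(
    n_samples=600, n_features=20, weights=[0.9, 0.1], random_state=42
)

base_clf = SGDClassifier(random_state=42)
skfolds = StratifiedKFold(n_splits=3)  # each fold keeps the class ratio

fold_scores = []
for train_idx, test_idx in skfolds.split(X_demo, y_demo):
    clf = clone(base_clf)  # fresh, unfitted copy for every fold
    clf.fit(X_demo[train_idx], y_demo[train_idx])
    y_pred = clf.predict(X_demo[test_idx])
    # Accuracy = fraction of held-out samples predicted correctly.
    fold_scores.append(float(np.mean(y_pred == y_demo[test_idx])))

print(fold_scores)
```

Writing the loop by hand is mainly useful when you need more control than `cross_val_score` offers, for example to compute a custom statistic on each fold.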


## 🤖 Dummy Classifier (Never 5)
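**Theory**: A naive baseline that always predicts "not 5" exposes the limitation of accuracy as a metric. Because only about 10% of the images are 5s, this classifier scores roughly 90% accuracy without learning anything.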

In [None]:
from sklearn.base import BaseEstimator

class Never5Classifier(BaseEstimator):
    """Baseline that ignores its input and always predicts "not 5"."""
    def fit(self, X, y=None):
        return self
    def predict(self, X):
        return np.zeros(len(X), dtype=bool)

never_5_clf = Never5Classifier()
cross_val_score(never_5_clf, X_train, y_train_5, cv=3, scoring="accuracy")
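
The arithmetic behind that baseline can be checked on a toy example. The sketch below uses made-up labels (1,000 samples, 100 of them positive) instead of MNIST, and the variable names are illustrative. An always-negative predictor reaches 90% accuracy while detecting none of the positives.

```python
import numpy as np

# Toy labels: 1,000 samples, 10% positive (mirrors the share of 5s in MNIST).
y_true = np.zeros(1000, dtype=bool)
y_true[:100] = True

# "Never positive" baseline: predicts False for every sample.
y_pred = np.zeros_like(y_true)

accuracy = np.mean(y_pred == y_true)        # 900 correct out of 1,000
detected = np.sum(y_pred & y_true)          # positives actually found
recall = detected / np.sum(y_true)          # 0 out of 100

print(accuracy, recall)
```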


## 🔢 Confusion Matrix and Precision/Recall
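**Theory**: A confusion matrix counts how often each actual class was predicted as each class: rows are actual classes, columns are predicted classes. Precision (the fraction of predicted 5s that really are 5s) and recall (the fraction of actual 5s that are detected) measure performance on imbalanced data better than accuracy does. The F1 score is their harmonic mean.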

In [None]:
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score

y_train_pred = cross_val_predict(sgd_clf, X_train, y_train_5, cv=3)
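# A cell displays only its last expression, so print each result explicitly.
print(confusion_matrix(y_train_5, y_train_pred))
print("Precision:", precision_score(y_train_5, y_train_pred))
print("Recall:   ", recall_score(y_train_5, y_train_pred))
print("F1 score: ", f1_score(y_train_5, y_train_pred))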
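
To show how these scores follow from the four cells of the confusion matrix, here is a small worked example on made-up labels (not MNIST results). It computes the counts with plain NumPy; the names `tp`, `fp`, `fn`, and `tn` are illustrative. For binary labels, scikit-learn lays out the matrix as `[[TN, FP], [FN, TP]]`.

```python
import numpy as np

# Ten toy samples: four actual positives, six actual negatives.
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0], dtype=bool)
y_pred = np.array([1, 1, 1, 0, 1, 0, 0, 0, 0, 0], dtype=bool)

tp = np.sum(y_pred & y_true)     # predicted positive, actually positive -> 3
fp = np.sum(y_pred & ~y_true)    # predicted positive, actually negative -> 1
fn = np.sum(~y_pred & y_true)    # predicted negative, actually positive -> 1
tn = np.sum(~y_pred & ~y_true)   # predicted negative, actually negative -> 5

precision = tp / (tp + fp)       # share of positive predictions that are right
recall = tp / (tp + fn)          # share of actual positives that are found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(np.array([[tn, fp], [fn, tp]]))
print(precision, recall, f1)
```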
