### **Step 1:** Load MNIST Dataset

We’re using the classic MNIST handwritten digits dataset (70,000 grayscale 28x28 images).

- `X`: Features (784-pixel flattened vectors)
- `y`: Labels (strings: '0'–'9')

```python
from sklearn.datasets import fetch_openml

# Download MNIST from OpenML as NumPy arrays (as_frame=False) rather than a DataFrame
mnist = fetch_openml('mnist_784', version=1, as_frame=False)
X, y = mnist.data, mnist.target
```
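Each row of `X` is one image flattened row by row, so reshaping it to 28 × 28 recovers the pixel grid (for example, before plotting it). A minimal sketch, assuming a synthetic stand-in vector instead of a real MNIST row so it runs without downloading the dataset:

```python
import numpy as np

# Hypothetical stand-in for one row of X: 784 values in row-major order
flat_image = np.arange(784, dtype=np.float64)

# Undo the flattening: 28 rows of 28 pixels each
image = flat_image.reshape(28, 28)

# Pixel (row r, column c) lives at flat index r * 28 + c
r, c = 3, 5
print(image.shape, image[r, c] == flat_image[r * 28 + c])  # (28, 28) True
```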

### **Step 2:** Train-Test Split

We stratify the split to keep digit proportions consistent between train and test.

- `X_train`, `y_train`: 60,000 images used to train the multilabel model
- `X_test`, `y_test`: 10,000 images held out for evaluation later


```python
from sklearn.model_selection import train_test_split

# Labels from OpenML are already strings; astype(str) is a harmless safeguard
y = y.astype(str)

# Hold out 10,000 images for testing, preserving digit proportions in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=10000, random_state=42, stratify=y
)
```
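To see what `stratify` does, here is a small sketch on hypothetical imbalanced toy labels (90% `'0'`, 10% `'1'`) rather than MNIST: both splits end up with the same 10% share of the rare class.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced toy labels: 900 '0's and 100 '1's
y_toy = np.array(['0'] * 900 + ['1'] * 100)
X_toy = np.random.default_rng(0).normal(size=(1000, 4))

_, _, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=200, random_state=42, stratify=y_toy
)

# With stratify, both splits keep the 10% share of class '1'
print((y_tr == '1').mean(), (y_te == '1').mean())  # 0.1 0.1
```

Without `stratify`, the test share of `'1'` would only be close to 10% on average, varying from seed to seed.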

### **Step 3:** Define Multilabel Targets

Each digit gets *two* binary labels:
- `Is large?`: `True` if the digit is ≥ 7 (7, 8, 9), else `False`
- `Is odd?`: `True` if the digit is odd (1, 3, 5, 7, 9), else `False`

We stack them column-wise into a 2D boolean array `y_multilabel`:
- Shape: `(n_samples, 2)`
- Example row: `[True, False]` → large but even (the digit 8)

```python
import numpy as np

# Label 1: "Is large?" (True for 7, 8, 9)
y_train_large = (y_train.astype(int) >= 7)

# Label 2: "Is odd?" (True for 1, 3, 5, 7, 9)
y_train_odd = (y_train.astype(int) % 2 == 1)

# Combine into multilabel targets of shape (n_samples, 2)
y_multilabel = np.c_[y_train_large, y_train_odd]
```
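The same construction applied to a few hypothetical string labels makes the mapping concrete: each digit becomes one `[is_large, is_odd]` row.

```python
import numpy as np

# A few hypothetical digit labels, stored as strings like MNIST's targets
digits = np.array(['5', '8', '9', '2'])

is_large = digits.astype(int) >= 7
is_odd = digits.astype(int) % 2 == 1
labels = np.c_[is_large, is_odd]  # boolean array of shape (4, 2)

for d, row in zip(digits, labels):
    print(d, row.tolist())
# 5 [False, True]
# 8 [True, False]
# 9 [True, True]
# 2 [False, False]
```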

### **Step 4:** Train a Multilabel KNN Classifier

We use K-Nearest Neighbors (KNN), which supports multilabel classification natively.
- It finds the k nearest training images once, then takes a separate majority vote among those neighbors for each label.

```python
from sklearn.neighbors import KNeighborsClassifier

knn_clf = KNeighborsClassifier()
knn_clf.fit(X_train, y_multilabel)
```

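The per-label vote can be checked by hand on a tiny hypothetical 1-D dataset (not MNIST): look up the 3 nearest neighbors with `kneighbors`, vote column by column, and compare with `predict`.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Tiny hypothetical 1-D training set with two boolean labels per sample
X_toy = np.array([[0.0], [1.0], [2.0], [10.0], [11.0]])
Y_toy = np.array([[0, 1], [0, 0], [1, 1], [1, 1], [1, 0]], dtype=bool)

knn = KNeighborsClassifier(n_neighbors=3).fit(X_toy, Y_toy)
query = np.array([[1.0]])

# Manual version: find the 3 nearest neighbors once...
_, idx = knn.kneighbors(query)
neighbor_labels = Y_toy[idx[0]]
# ...then take a majority vote independently in each label column
manual = neighbor_labels.sum(axis=0) > neighbor_labels.shape[0] / 2

print(manual.tolist(), knn.predict(query).tolist())  # [False, True] [[False, True]]
```

The neighbors of `1.0` are the samples at `0.0`, `1.0` and `2.0`; one of three is "large" and two of three are "odd", hence `[False, True]`.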


### **Step 5:** Predict Multilabel Outputs

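We pass two training images (`X_train[0]` and `X_train[1]`) to the trained KNN classifier. For each image it returns one row of two booleans, `[is_large, is_odd]`.

How to read a row:
- `[False, True]` → not large, but odd (e.g., the digit 5)

In the run below, the first image is predicted small and even, and the second large and odd.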

```python
some_digit = X_train[0]
some_digit2 = X_train[1]

# predict() expects a 2D array, so wrap each flattened image in a list
print(f"Prediction no.1: {knn_clf.predict([some_digit])}")
print(f"Prediction no.2: {knn_clf.predict([some_digit2])}")
```

```
Prediction no.1: [[False False]]
Prediction no.2: [[ True  True]]
```
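A multilabel prediction narrows an image down to a set of candidate digits rather than a single class. A small hypothetical helper (not part of scikit-learn) decodes any `[is_large, is_odd]` row back into those candidates:

```python
def consistent_digits(is_large, is_odd):
    """Return the digits 0-9 whose (is_large, is_odd) labels match the given pair."""
    return [d for d in range(10)
            if (d >= 7) == bool(is_large) and (d % 2 == 1) == bool(is_odd)]

print(consistent_digits(False, False))  # [0, 2, 4, 6]
print(consistent_digits(True, True))    # [7, 9]
```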


### **Step 6:** Evaluate with F1 Score (Macro Average)

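The F1 score is the harmonic mean of precision and recall. With multilabel targets, it is computed separately for each label and then averaged.

- `average="macro"`: every label gets equal weight, regardless of how often it is positive.
- This keeps a rare label from being drowned out by a common one; use `average="weighted"` instead to weight each label by its support (number of positive instances).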

```python
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import f1_score

# Out-of-fold predictions from 3-fold cross-validation on the training set
# (can take a while: KNN compares each held-out image with every training image)
y_train_knn_pred = cross_val_predict(knn_clf, X_train, y_multilabel, cv=3)

# Macro F1 score (equal weight to both labels)
f1_score(y_multilabel, y_train_knn_pred, average="macro")
```

```
0.9770714128542545
```
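A worked example on hypothetical toy targets shows how the averages relate: label 0 has one true positive, one false positive and one false negative (F1 = 0.5), while label 1 is predicted perfectly (F1 = 1.0).

```python
import numpy as np
from sklearn.metrics import f1_score

# Hypothetical toy targets: 5 samples, 2 labels (columns)
Y_true = np.array([[1, 0], [1, 1], [0, 1], [0, 1], [0, 0]])
Y_pred = np.array([[1, 0], [0, 1], [0, 1], [0, 1], [1, 0]])

per_label = f1_score(Y_true, Y_pred, average=None)        # one F1 per column: 0.5, 1.0
macro = f1_score(Y_true, Y_pred, average="macro")          # (0.5 + 1.0) / 2 = 0.75
weighted = f1_score(Y_true, Y_pred, average="weighted")    # (2*0.5 + 3*1.0) / 5 = 0.8

print(per_label, macro, weighted)
```

Label 0 has support 2 and label 1 has support 3, so the weighted average leans toward the better-predicted label.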

### **Step 7:** Use ClassifierChain

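`ClassifierChain` lets you:
- Train one binary classifier per label, arranged in a chain
- Give each classifier the input features plus the labels of every classifier earlier in the chain (the true labels during training by default, the chain's own predictions at prediction time)

This helps when labels are **correlated**: for example, two of the three large digits (7 and 9) are odd, so knowing "is large" says something about "is odd".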

```python
from sklearn.multioutput import ClassifierChain
from sklearn.svm import SVC

# Kernel SVC training scales poorly with sample count, so use the first 2,000 images only
chain_clf = ClassifierChain(SVC(), random_state=42)
chain_clf.fit(X_train[:2000], y_multilabel[:2000])
```

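The chaining is visible in the fitted sub-models: each one sees one more input column than the previous one. A minimal sketch on hypothetical synthetic data, with `LogisticRegression` swapped in for `SVC` purely to keep it fast:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import ClassifierChain

rng = np.random.default_rng(42)
X_toy = rng.normal(size=(200, 5))

# Hypothetical correlated labels: label 1 copies label 0 about 90% of the time
y0 = (X_toy[:, 0] > 0).astype(int)
y1 = np.where(rng.random(200) < 0.9, y0, 1 - y0)
Y_toy = np.c_[y0, y1]

chain = ClassifierChain(LogisticRegression()).fit(X_toy, Y_toy)

# Model 0 sees the 5 original features; model 1 also sees label 0 as a 6th feature
print([est.n_features_in_ for est in chain.estimators_])  # [5, 6]
print(chain.order_)                                       # [0 1]
print(chain.predict(X_toy[:3]).shape)                     # (3, 2)
```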


### **Step 8:** Predict with Chain Classifier

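`ClassifierChain.predict` returns floats (`0.`/`1.`) rather than booleans, and its decisions can differ from KNN's because:
- It uses a different base model (an RBF-kernel SVC instead of nearest-neighbor voting)
- It was trained on only the first 2,000 training images
- The "is odd" classifier also receives the "is large" prediction as an input feature

In the run below, the two models agree on the first image; on the second, both predict "large" but only KNN predicts "odd".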

```python
print(f"Prediction no.1: {chain_clf.predict([some_digit])}")
print(f"Prediction no.2: {chain_clf.predict([some_digit2])}")
```

```
Prediction no.1: [[0. 0.]]
Prediction no.2: [[1. 0.]]
```
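To compare the two models label by label, cast the chain's float output to booleans first. A sketch using the printed predictions from Steps 5 and 8 as literal stand-ins; with the real objects you would compare `knn_clf.predict(...)` and `chain_clf.predict(...)` on the same images instead:

```python
import numpy as np

# Printed predictions for X_train[0] and X_train[1], copied as literals
knn_preds = np.array([[False, False], [True, True]])
chain_preds = np.array([[0., 0.], [1., 0.]])

# ClassifierChain returns floats; cast to bool before comparing with KNN
same_label = chain_preds.astype(bool) == knn_preds   # per-label agreement
same_row = same_label.all(axis=1)                    # True only if both labels agree

print(same_label.tolist())  # [[True, True], [True, False]]
print(same_row.tolist())    # [True, False]
```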
