### Load the Data and Split It into Training and Test Sets

In [2]:
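from sklearn.datasets import fetch_openml

# Download MNIST as NumPy arrays (70,000 images, 784 pixels each)
mnist = fetch_openml('mnist_784', as_frame=False)
X, y = mnist.data, mnist.target

# The first 60,000 images are the standard training set, the last 10,000 the test set
X_train, X_test, y_train, y_test = X[:60000], X[60000:], y[:60000], y[60000:]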
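
Slicing by position is equivalent to calling `train_test_split` with shuffling turned off. The sketch below checks this on a small toy array that stands in for MNIST (an assumption made so it runs instantly); on the real data the equivalent call would be `train_test_split(X, y, train_size=60000, shuffle=False)`.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy stand-in for MNIST: 10 "images" with 3 features each (illustrative assumption)
X_toy = np.arange(30).reshape(10, 3)
y_toy = np.arange(10).astype(str)

# Manual positional slicing, as in the cell above (first 6 rows train, last 4 test)
X_tr_slice, X_te_slice = X_toy[:6], X_toy[6:]

# shuffle=False keeps the original row order, so the split is purely positional
X_tr, X_te, y_tr, y_te = train_test_split(X_toy, y_toy, train_size=6, shuffle=False)

print(np.array_equal(X_tr, X_tr_slice), np.array_equal(X_te, X_te_slice))  # True True
```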

---

### Step 1: Train a Multiclass Classifier
Use a classifier such as `SGDClassifier`, which handles multiclass problems with a **one-versus-the-rest (OvR)** strategy:
> Calling `fit` trains 10 binary classifiers internally, one per digit (for example, "5" vs. "not 5").

In [3]:
from sklearn.linear_model import SGDClassifier
sgd_clf = SGDClassifier(random_state=42)
sgd_clf.fit(X_train, y_train)

SGDClassifier(random_state=42)
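
The 10 internal classifiers are visible in the fitted model's attributes: a linear `SGDClassifier` stores one weight vector and one intercept per class. The sketch below uses scikit-learn's bundled 8×8 `load_digits` dataset as a stand-in for MNIST (an assumption, so it runs without a download).

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier

# Assumption: the small 8x8 digits dataset bundled with scikit-learn stands in
# for MNIST (1,797 images, 64 pixel features, 10 classes).
X_small, y_small = load_digits(return_X_y=True)

clf = SGDClassifier(random_state=42)
clf.fit(X_small, y_small)

# One row of weights and one intercept per binary "digit vs. rest" classifier
print(clf.classes_)          # [0 1 2 3 4 5 6 7 8 9]
print(clf.coef_.shape)       # (10, 64): 10 classifiers x 64 pixel features
print(clf.intercept_.shape)  # (10,)
```

On the MNIST model above, `sgd_clf.coef_.shape` should likewise be `(10, 784)`.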


### Step 2: Make Predictions
Choose a sample and predict its class:

> Because `fetch_openml` returns the MNIST labels as strings, the prediction is a string from `'0'` to `'9'`.

In [19]:
# input sample
some_digit = X_train[0]

# actual label of this digit
actual_label = y_train[0]

# prediction using the SGD classifier
predicted_label = sgd_clf.predict([some_digit])[0]

# Compare
print(f"Predicted: {predicted_label}, Actual: {actual_label}")
print("✅ Correct!" if predicted_label == actual_label else "❌ Incorrect.")


Predicted: 3, Actual: 5
❌ Incorrect.


Look at the **raw decision scores** for all 10 classes:

> A higher score means more confidence in that class, and the class with the highest score becomes the prediction (here, `3`, the only positive score).

In [6]:
sgd_clf.decision_function([some_digit])

array([[-31893.03095419, -34419.69069632,  -9530.63950739,
          1823.73154031, -22320.14822878,  -1385.80478895,
        -26188.91070951, -16147.51323997,  -4604.35491274,
        -12050.767298  ]])
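
For a linear classifier like this one, `predict` amounts to taking the index of the highest decision score and looking it up in `classes_`. A minimal sketch using the scores printed above; it assumes `classes_` holds the string labels `'0'` to `'9'` in sorted order, as it does for the MNIST targets.

```python
import numpy as np

# Decision scores copied from the output above (one per class, in classes_ order)
scores = np.array([[-31893.03095419, -34419.69069632,  -9530.63950739,
                      1823.73154031, -22320.14822878,  -1385.80478895,
                    -26188.91070951, -16147.51323997,  -4604.35491274,
                    -12050.767298  ]])

# Assumption: classes_ is ['0', '1', ..., '9'], as for the fetch_openml targets
classes = np.array([str(d) for d in range(10)])

best = scores.argmax(axis=1)  # index of the highest score for each sample
print(best)                   # [3]
print(classes[best])          # ['3']
```

In the live notebook, `sgd_clf.classes_[sgd_clf.decision_function([some_digit]).argmax()]` performs the same lookup.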

### Step 3: Evaluate Using Cross-Validation
* Check model performance across 3 folds:
> Accuracy is a reasonable metric here because MNIST is roughly balanced across all 10 digits.

In [8]:
from sklearn.model_selection import cross_val_score
cross_val_score(sgd_clf, X_train, y_train, cv=3, scoring="accuracy")

array([0.87365, 0.85835, 0.8689 ])
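
For a classifier, `cross_val_score(..., cv=3)` uses stratified 3-fold splitting, fitting a fresh copy of the model on two folds and scoring it on the third. The loop below is a sketch of that process, again on `load_digits` as an offline stand-in for MNIST (an assumption), and compares its result with `cross_val_score`.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Assumption: load_digits stands in for MNIST so the sketch runs quickly offline
X_small, y_small = load_digits(return_X_y=True)
model = SGDClassifier(random_state=42)

# Stratified folds keep each digit's proportion roughly equal across folds
skf = StratifiedKFold(n_splits=3)
fold_scores = []
for train_idx, val_idx in skf.split(X_small, y_small):
    fold_model = clone(model)  # fresh, unfitted copy for each fold
    fold_model.fit(X_small[train_idx], y_small[train_idx])
    y_pred = fold_model.predict(X_small[val_idx])
    fold_scores.append(np.mean(y_pred == y_small[val_idx]))  # fold accuracy

builtin = cross_val_score(model, X_small, y_small, cv=3, scoring="accuracy")
print(np.round(fold_scores, 4))
print(np.allclose(fold_scores, builtin))  # True: same folds, same fitting
```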

### Step 4: Improve Accuracy with Feature Scaling
> Standardizing each pixel feature (zero mean, unit variance) helps SGD converge and raises the mean cross-validation accuracy from about 86.7% to about 89.7%.

In [9]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train.astype("float64"))
# Evaluate again using scaled data
cross_val_score(sgd_clf, X_train_scaled, y_train, cv=3, scoring="accuracy")

array([0.8983, 0.891 , 0.9018])
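
In the cell above, the scaler is fit on all of `X_train` before cross-validation, so each validation fold slightly influences the mean and standard deviation used to scale it. A stricter variant (a sketch, not the approach used above) wraps scaling and the classifier in a `Pipeline`, so the scaler is refit on the training folds only. It uses `load_digits` as an offline stand-in for MNIST (an assumption); on MNIST you would pass `X_train.astype("float64")` and `y_train`.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: load_digits stands in for MNIST to keep the sketch small and offline
X_small, y_small = load_digits(return_X_y=True)

# Inside each CV split the pipeline fits StandardScaler on the training folds only,
# then applies that same transformation to the validation fold.
pipeline = make_pipeline(StandardScaler(), SGDClassifier(random_state=42))
scores = cross_val_score(pipeline, X_small, y_small, cv=3, scoring="accuracy")
print(scores.round(4))
```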

### Step 5: Try the One-vs-One (OvO) Strategy with SVC
- **OvO** trains a binary classifier for every pair of classes (0 vs 1, 0 vs 2, ..., 8 vs 9).

> For N classes that is N × (N − 1) / 2 classifiers, so 10 × 9 / 2 = 45 for the 10 digits.
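
The pair count can be checked directly by enumerating the unordered pairs of digit labels, one per OvO binary classifier:

```python
from itertools import combinations

classes = [str(d) for d in range(10)]  # the ten digit labels

# Every unordered pair (a, b) with a < b gets its own binary classifier
pairs = list(combinations(classes, 2))

n = len(classes)
print(len(pairs))            # 45
print(n * (n - 1) // 2)      # 45, the closed-form count
print(pairs[:3], pairs[-1])  # [('0', '1'), ('0', '2'), ('0', '3')] ('8', '9')
```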

In [10]:
from sklearn.multiclass import OneVsOneClassifier
from sklearn.svm import SVC

# SVC uses OvO internally by default, but we explicitly wrap it here
ovo_clf = OneVsOneClassifier(SVC(random_state=42))

# Train on the first 2,000 samples only, since SVC training time grows quickly with dataset size
ovo_clf.fit(X_train[:2000], y_train[:2000])

OneVsOneClassifier(estimator=SVC(random_state=42))


### Step 6: Make a Prediction with the OvO Classifier
> This time, all 45 classifiers vote, and the digit with the most wins is chosen.

In [18]:
# actual label of the digit
actual_label = y_train[0]

# predicted label from OvO classifier
predicted_label = ovo_clf.predict([some_digit])[0]

# Compare
print(f"Predicted: {predicted_label}, Actual: {actual_label}")
print("✅ Correct!" if predicted_label == actual_label else "❌ Incorrect.")

Predicted: 5, Actual: 5
✅ Correct!
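
The voting step can be sketched in plain Python: each pairwise classifier casts one vote for the class it picks, and the class with the most votes wins. The per-class "strengths" below are hypothetical, and each pairwise decision is simulated by picking the stronger class of the pair. Ties here break toward the smaller label as a simplification; scikit-learn's `OneVsOneClassifier` also folds the pairwise confidence scores into its tie-breaking.

```python
from collections import Counter
from itertools import combinations


def ovo_vote(pair_winners):
    """Return the class with the most pairwise wins (ties go to the smallest label).

    pair_winners maps each (a, b) pair to the label its binary classifier chose.
    """
    votes = Counter(pair_winners.values())
    # max() keeps the first maximum it sees, so iterating sorted labels
    # breaks ties toward the smaller label
    return max(sorted(votes), key=lambda label: votes[label])


# Hypothetical per-class strengths; each simulated pairwise classifier
# simply picks the stronger class of its pair (illustrative assumption)
strength = {str(d): s for d, s in enumerate([1, 0, 2, 4, 3, 9, 5, 6, 7, 8])}
pair_winners = {
    (a, b): a if strength[a] > strength[b] else b
    for a, b in combinations(strength, 2)
}

print(len(pair_winners))       # 45 binary decisions
print(ovo_vote(pair_winners))  # 5 (digit '5' wins all 9 of its pairs)
```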


### Step 7: Count How Many Classifiers Were Trained
> Confirms that we’ve trained one classifier per digit pair.

In [12]:
len(ovo_clf.estimators_)

45
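
For comparison, wrapping the same `SVC` in `OneVsRestClassifier` trains only one model per class. The sketch below fits both wrappers on the first 500 samples of `load_digits`, an offline stand-in for MNIST (an assumption); those samples contain all 10 digits.

```python
from sklearn.datasets import load_digits
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import SVC

# Assumption: a 500-sample slice of load_digits stands in for MNIST to keep it fast
X_small, y_small = load_digits(return_X_y=True)
X_small, y_small = X_small[:500], y_small[:500]

ovr = OneVsRestClassifier(SVC(random_state=42)).fit(X_small, y_small)
ovo = OneVsOneClassifier(SVC(random_state=42)).fit(X_small, y_small)

print(len(ovr.estimators_))  # 10: one "digit vs. rest" model per class
print(len(ovo.estimators_))  # 45: one model per pair of digits
```

OvO trains more models, but each sees only the samples of its two classes, which is why it pairs well with algorithms like SVC that scale poorly with training set size.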