**Exercise: train an SVM classifier on the MNIST dataset. Since SVM classifiers are binary classifiers, you will need to use one-versus-all to classify all 10 digits. You may want to tune the hyperparameters using small validation sets to speed up the process. What accuracy can you reach?**

Goal: reach 97% accuracy on the test set (the accuracy the book's author reached in this exercise; we will use it as a benchmark).

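Note: Scikit-Learn's SVM classifiers handle multiclass classification automatically, so there is no need to set up one-versus-all by hand. `LinearSVC` uses one-versus-rest (one-versus-all), while `SVC` actually trains one-versus-one classifiers internally.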
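To see how each class splits a 10-class problem, the sketch below fits both classes and inspects what they learned. It is only an illustration: it assumes Scikit-Learn's small bundled `load_digits` dataset (8x8 images, 64 features) as an offline stand-in for MNIST. It also shows `OneVsRestClassifier` for anyone who wants one-versus-rest with `SVC`.

```python
import warnings

from sklearn.datasets import load_digits
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, LinearSVC

# Assumption: load_digits stands in for MNIST so this runs quickly and offline.
X, y = load_digits(return_X_y=True)
X = StandardScaler().fit_transform(X)

# LinearSVC: one-versus-rest, so one weight vector per class.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence possible convergence/version warnings
    lin = LinearSVC().fit(X, y)
print(lin.coef_.shape)  # (10, 64): 10 binary classifiers over 64 features

# SVC: one-versus-one internally, 10 * 9 / 2 = 45 pairwise classifiers.
ovo = SVC(decision_function_shape="ovo").fit(X, y)
print(ovo.decision_function(X[:1]).shape)  # (1, 45)

# Explicit one-versus-rest wrapper around SVC, if that strategy is preferred.
ovr = OneVsRestClassifier(SVC()).fit(X, y)
print(len(ovr.estimators_))  # 10
```

By default, `SVC` reshapes its one-versus-one votes into one column per class (`decision_function_shape="ovr"`). The underlying training is still pairwise.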

### Get and parse data

In [1]:
from sklearn.datasets import fetch_openml
mnist = fetch_openml('mnist_784', version=1, as_frame=False)


In [2]:
import numpy as np

X = mnist["data"]
y = mnist["target"].astype(np.uint8)

X_train = X[:60000]
y_train = y[:60000]
X_test = X[60000:]
y_test = y[60000:]

In [3]:
np.unique(y_train)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=uint8)

### Train Support Vector Machine

In [4]:
from sklearn.svm import LinearSVC

lin_svc = LinearSVC()

lin_svc.fit(X_train, y_train)



LinearSVC()

Let's evaluate the model. For now we measure accuracy on the training set itself, just to get a rough sense of how the model performs. This estimate is optimistic, because the model has already seen every one of these samples.
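A held-out split shows how far training accuracy can overstate performance on unseen data. The sketch below compares the two. It assumes `load_digits` as a small stand-in for MNIST, and the 25% split and `random_state` are illustrative choices.

```python
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Assumption: load_digits stands in for MNIST; split size is illustrative.
X, y = load_digits(return_X_y=True)
X_tr, X_val, y_tr, y_val = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

model = SVC().fit(X_tr, y_tr)
train_acc = accuracy_score(y_tr, model.predict(X_tr))   # data the model has seen
val_acc = accuracy_score(y_val, model.predict(X_val))   # data the model has not seen
print(f"train: {train_acc:.4f}  validation: {val_acc:.4f}")
```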

In [5]:
y_pred_1 = lin_svc.predict(X_train)

In [6]:
from sklearn.metrics import accuracy_score

accuracy_score(y_train, y_pred_1)

0.8748

About 87% accuracy on this first try.

Next, let's scale the features, since SVMs are very sensitive to feature scales.

In [7]:
# Try on scaled set
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
scaler.fit(X_train)
X_train_scaled = scaler.transform(X_train)
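The scaler should learn its per-pixel mean and standard deviation from the training set only. The same fitted scaler is then reused on any new data via `transform()`, never refitted. The sketch below shows this on `load_digits`, used here as an assumed stand-in for MNIST. It also shows a `Pipeline`, which handles that bookkeeping automatically.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: load_digits stands in for MNIST.
X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=42)

scaler = StandardScaler().fit(X_tr)   # statistics come from training data only
X_tr_scaled = scaler.transform(X_tr)
X_te_scaled = scaler.transform(X_te)  # transform only: no refitting on test data

# Every training column now has mean ~0 (constant pixels simply become 0).
print(np.allclose(X_tr_scaled.mean(axis=0), 0))

# A pipeline fits the scaler inside fit() and only calls transform()
# inside predict()/score(), so test statistics never leak into preprocessing.
clf = make_pipeline(StandardScaler(), SVC()).fit(X_tr, y_tr)
print(clf.score(X_te, y_te))
```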

In [8]:
lin_svc = LinearSVC()
lin_svc.fit(X_train_scaled, y_train)



LinearSVC()

In [9]:
y_pred_2 = lin_svc.predict(X_train_scaled)
accuracy_score(y_train, y_pred_2)

0.9197333333333333

About 92% accuracy: better, but not good enough.

Now let's try the `SVC` class, which uses a Gaussian RBF kernel by default.
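The RBF kernel scores the similarity of two samples as $K(x, x') = \exp(-\gamma \lVert x - x' \rVert^2)$. The sketch below computes it by hand on a few random toy vectors, which are illustrative data and not MNIST. It checks the result against `sklearn.metrics.pairwise.rbf_kernel`. It also computes the value that `SVC`'s default `gamma="scale"` uses, $1 / (n_\text{features} \cdot \operatorname{Var}(X))$.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 5))  # 3 toy samples, 5 features (illustrative data)
B = rng.normal(size=(4, 5))
gamma = 0.1

# Squared Euclidean distance between every row of A and every row of B.
sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
K_manual = np.exp(-gamma * sq_dists)

K_sklearn = rbf_kernel(A, B, gamma=gamma)
print(K_manual.shape, np.allclose(K_manual, K_sklearn))

# The default gamma="scale" in SVC, computed for A as the training data.
gamma_scale = 1.0 / (A.shape[1] * A.var())
print(gamma_scale)
```

A larger `gamma` makes each training sample's influence more local, so the decision boundary becomes more irregular. A smaller `gamma` smooths it out.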

In [10]:
from sklearn.svm import SVC
svc = SVC()
svc.fit(X_train_scaled, y_train)

SVC()

In [11]:
y_pred_3 = svc.predict(X_train_scaled)
accuracy_score(y_train, y_pred_3)

0.9866333333333334

About 98.7% accuracy! Since this is still measured on the training set, part of the jump may be overfitting, but the model looks promising, so let's keep working with it.

### Tuning the parameters

In [23]:
from sklearn.model_selection import GridSearchCV

parameters = {'kernel': ['linear', 'rbf'], 'C': [1, 10], 'gamma': [0.001, 0.1]}
grid_search = GridSearchCV(svc, parameters, cv=3, scoring="accuracy")

In [24]:
# Search on the first 10,000 training samples only, to keep the grid search fast
grid_search.fit(X_train_scaled[:10000], y_train[:10000])

GridSearchCV(cv=3, estimator=SVC(),
             param_grid={'C': [1, 10], 'gamma': [0.001, 0.1],
                         'kernel': ['linear', 'rbf']},
             scoring='accuracy')

In [25]:
grid_search.best_params_

{'C': 10, 'gamma': 0.001, 'kernel': 'rbf'}
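The grid above has 2 × 2 × 2 = 8 candidates, each fitted 3 times with `cv=3`. Since `gamma` is ignored when `kernel="linear"`, the two linear candidates that differ only in `gamma` are redundant. The sketch below counts candidates with `ParameterGrid`. It compares the original grid to a list-of-dicts grid, a form `GridSearchCV` also accepts, that drops the redundant combinations.

```python
from sklearn.model_selection import ParameterGrid

flat_grid = {"kernel": ["linear", "rbf"], "C": [1, 10], "gamma": [0.001, 0.1]}
split_grid = [
    {"kernel": ["linear"], "C": [1, 10]},                     # gamma not used here
    {"kernel": ["rbf"], "C": [1, 10], "gamma": [0.001, 0.1]},
]

n_flat = len(ParameterGrid(flat_grid))    # 2 * 2 * 2 = 8 candidates
n_split = len(ParameterGrid(split_grid))  # 2 + 2 * 2 = 6 candidates
print(n_flat, n_split, "fits with cv=3:", n_flat * 3, "vs", n_split * 3)
```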

In [31]:
svc = SVC(kernel="rbf", gamma=0.001, C=10)
svc.fit(X_train_scaled, y_train)

SVC(C=10, gamma=0.001)

In [32]:
from sklearn.model_selection import cross_val_score
cross_val_score(svc, X_train_scaled[:10000], y_train[:10000], 
                scoring="accuracy", cv=3, verbose=2, n_jobs=-1)

[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=-1)]: Done   3 out of   3 | elapsed:   41.9s finished


array([0.94121176, 0.94449445, 0.93279328])
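The three fold scores are easier to compare with other settings when summarized as a mean and a standard deviation. The values below are copied from the output above, so the mean works out to about 0.9395.

```python
import numpy as np

# Fold scores copied from the cross_val_score output above (10,000-sample subset).
scores = np.array([0.94121176, 0.94449445, 0.93279328])
print(f"{scores.mean():.4f} +/- {scores.std():.4f}")
```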

These scores don't look bad (about 93 to 94% on each fold of the 10,000-sample subset). Let's evaluate on the test set.

### Final test on test set

In [34]:
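# Reuse the scaler fitted on the training set; refitting it on X_test
# (scaler.fit_transform) would leak test-set statistics into preprocessing.
# The accuracy shown below was recorded with fit_transform, so a rerun may differ slightly.
X_test_scaled = scaler.transform(X_test)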

y_pred = svc.predict(X_test_scaled)

accuracy_score(y_test, y_pred)

0.9724

97.24%, definitely not bad. Note that, of the values given to the grid search, it picked the largest C and the smallest gamma. Because both sit at the edge of the grid, a wider search (larger C, smaller gamma) might give more accuracy. A larger C weakens regularization and increases the risk of overfitting, while a smaller gamma makes the decision boundary smoother, so any gain would need to be confirmed with cross-validation. 97% is enough for the sake of this exercise.
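If you want to push further, one option is to sample C and gamma from log-uniform ranges that extend past the edges of the original grid. Scikit-Learn's `RandomizedSearchCV` does this. The sketch below is a swapped-in technique, not the method used above. It assumes `load_digits` as a fast stand-in for MNIST, and the ranges and `n_iter=10` are illustrative rather than tuned. Scaling sits inside a pipeline so each cross-validation fold fits its own scaler on its training portion only.

```python
from scipy.stats import loguniform
from sklearn.datasets import load_digits
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: load_digits stands in for MNIST; ranges and n_iter are illustrative.
X, y = load_digits(return_X_y=True)

pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
param_distributions = {
    "svc__C": loguniform(1e-1, 1e3),       # extends past the original max of 10
    "svc__gamma": loguniform(1e-5, 1e-1),  # extends below the original min of 0.001
}
search = RandomizedSearchCV(
    pipe, param_distributions, n_iter=10, cv=3, scoring="accuracy", random_state=42
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 4))
```

On MNIST itself, the same search could first be run on a subset such as the first 10,000 training samples. Any gain should then be confirmed on the held-out test set only once, at the end.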