# Exercise 9
Train an SVM classifier on the MNIST dataset. Since SVM classifiers are binary classifiers, you will need to use one-versus-all to classify all 10 digits. You may want to tune the hyperparameters using small validation sets to speed up the process. What accuracy can you reach?
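A note on the multiclass strategy. `LinearSVC` trains one-versus-rest by default (`multi_class='ovr'`). Scikit-learn's `SVC`, however, always trains one-versus-one internally (45 pairwise models for 10 classes), and `decision_function_shape='ovr'` only builds one-versus-rest scores from those pairwise models. If you want a literal one-versus-all setup with a kernel SVM, one option is to wrap the binary classifier in `OneVsRestClassifier`. The sketch below uses scikit-learn's bundled `load_digits` data (8x8 images) as a small, offline stand-in for MNIST; that dataset and the hyperparameters are assumptions chosen for illustration.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in data: 1,797 8x8 digit images (10 classes) bundled with scikit-learn
X, y = load_digits(return_X_y=True)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.25, random_state=42)

scaler = StandardScaler().fit(X_tr)  # statistics come from the training split only
X_tr_s, X_val_s = scaler.transform(X_tr), scaler.transform(X_val)

# Explicit one-versus-all: one binary RBF SVC per digit
ovr_clf = OneVsRestClassifier(SVC(kernel="rbf", gamma="auto", C=5))
ovr_clf.fit(X_tr_s, y_tr)
print(len(ovr_clf.estimators_))       # 10 binary models, one per class
print(ovr_clf.score(X_val_s, y_val))  # held-out accuracy

# For comparison: a plain SVC trains 10 * 9 / 2 = 45 one-versus-one models
ovo_clf = SVC(kernel="rbf", gamma="auto", C=5, decision_function_shape="ovo")
ovo_clf.fit(X_tr_s, y_tr)
print(ovo_clf.decision_function(X_val_s).shape[1])  # 45 pairwise scores per sample
```

One-versus-all trains fewer models, but each one sees the whole training set; one-versus-one trains more models on smaller pairwise subsets, which is usually cheaper for kernel SVMs whose training cost grows faster than linearly with the number of samples.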

In [1]:
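from sklearn.datasets import fetch_openml  # fetch_mldata was removed in scikit-learn 0.22
import numpy as np

# MNIST: 70,000 28x28 images; the first 60,000 form the standard training set
mnist = fetch_openml('mnist_784', version=1, as_frame=False)
X, y = mnist['data'], mnist['target'].astype(np.uint8)  # OpenML stores the labels as strings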
X_train, X_test, y_train, y_test = X[:60000], X[60000:], y[:60000], y[60000:]

np.random.seed(42)
# Shuffle the training set; the same permutation for X and y keeps images and labels aligned
shuffle_index = np.random.permutation(60000)
X_train, y_train = X_train[shuffle_index], y_train[shuffle_index]



In [2]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
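# Learn the mean and standard deviation on the training set only,
# then apply that same transformation to the test set
X_train_scaled = scaler.fit_transform(X_train.astype(np.float32))
X_test_scaled = scaler.transform(X_test.astype(np.float32))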
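The scaler should learn its statistics from the training set and reuse them on the test set; refitting it on the test set scales the two sets differently and leaks test information into preprocessing. One way to make this hard to get wrong is a `Pipeline`, which fits the scaler only inside `fit()` and reuses it in `predict()`/`score()`. A minimal sketch, assuming the bundled `load_digits` data as a small stand-in for MNIST:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=42)

# fit() learns scaling statistics from X_tr; score() reuses them on X_te
pipe = make_pipeline(StandardScaler(), LinearSVC(random_state=42))
pipe.fit(X_tr, y_tr)

scaler = pipe.named_steps["standardscaler"]
print(scaler.n_samples_seen_ == len(X_tr))  # True: the scaler never saw X_te
print(pipe.score(X_te, y_te))               # accuracy on the held-out split
```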

**`LinearSVC`**

In [3]:
%%time
from sklearn.svm import LinearSVC

lin_clf = LinearSVC(random_state=42)  # one-versus-rest over the 10 digits by default
lin_clf.fit(X_train_scaled, y_train)



LinearSVC(C=1.0, class_weight=None, dual=True, fit_intercept=True,
     intercept_scaling=1, loss='squared_hinge', max_iter=1000,
     multi_class='ovr', penalty='l2', random_state=42, tol=0.0001,
     verbose=0)

In [4]:
%%time
from sklearn.metrics import accuracy_score

# Accuracy on the training set itself, so an optimistic estimate
y_pred_lin = lin_clf.predict(X_train_scaled)
accuracy_score(y_train, y_pred_lin)

0.9219333333333334

**`SVC`**
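<br>`LinearSVC` can only learn linear decision boundaries between the digits. To get more flexible boundaries we need a kernelized SVM, so let's try an `SVC` with an RBF kernel (the default). To keep training time manageable, we fit it on the first 10,000 training images only.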

In [5]:
%%time
from sklearn.svm import SVC
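# gamma='auto' sets gamma = 1 / n_features (1/784 for MNIST)
svm_clf = SVC(kernel='rbf', gamma='auto', C=5, random_state=42)
# Fit on the first 10,000 (already shuffled) training images only, to save time
svm_clf.fit(X_train_scaled[:10000], y_train[:10000])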

SVC(C=5, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma='auto', kernel='rbf',
  max_iter=-1, probability=False, random_state=42, shrinking=True,
  tol=0.001, verbose=False)
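The RBF kernel compares two images `x` and `x'` through `K(x, x') = exp(-gamma * ||x - x'||^2)`, and with `gamma='auto'` scikit-learn uses `gamma = 1 / n_features`, i.e. 1/784 for MNIST. The sketch below checks that formula against `sklearn.metrics.pairwise.rbf_kernel`; the random vectors are made-up data standing in for flattened images.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(42)
A = rng.normal(size=(5, 784))  # 5 fake "images" with MNIST's 784 features
B = rng.normal(size=(3, 784))

gamma = 1.0 / A.shape[1]  # what gamma='auto' resolves to: 1 / n_features

# Squared Euclidean distance between every row of A and every row of B
sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
K_manual = np.exp(-gamma * sq_dists)

K_sklearn = rbf_kernel(A, B, gamma=gamma)
print(K_manual.shape)                    # (5, 3)
print(np.allclose(K_manual, K_sklearn))  # True
```

A larger `gamma` makes the kernel decay faster with distance, so each training image influences a smaller neighbourhood and the decision boundary becomes more irregular.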

In [6]:
%%time
# Predict on all 60,000 training images, including the 10,000 the model was fitted on
y_pred_svm = svm_clf.predict(X_train_scaled)
accuracy_score(y_train, y_pred_svm)

0.9563

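Results for a few values of `C` (accuracy measured as in the cell above):

| `C` | `gamma` | Accuracy |
|-----|---------|----------|
| 1 | `'auto'` | 0.94615 |
| 2 | `'auto'` | 0.95313 |
| 5 | `'auto'` | 0.9563 |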
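The accuracies above for different values of `C` appear to be measured on `X_train_scaled`, which includes the 10,000 images the model was trained on, so they partly reward fitting the training data. The exercise suggests comparing hyperparameters on a small validation set instead. Below is a sketch of that comparison using the bundled `load_digits` data as a quick, offline stand-in for MNIST (an assumption; its scores will not match MNIST's). The `C` grid and `gamma='auto'` mirror the values tried above.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
# Hold out a small, stratified validation set that the model never trains on
X_tr, X_val, y_tr, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
scaler = StandardScaler().fit(X_tr)
X_tr_s, X_val_s = scaler.transform(X_tr), scaler.transform(X_val)

val_scores = {}
for C in (1, 2, 5):
    clf = SVC(kernel="rbf", gamma="auto", C=C).fit(X_tr_s, y_tr)
    val_scores[C] = clf.score(X_val_s, y_val)  # accuracy on unseen images

best_C = max(val_scores, key=val_scores.get)
print(val_scores, "best C:", best_C)
```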

In [7]:
y_pred_svm_test = svm_clf.predict(X_test_scaled)
accuracy_score(y_test, y_pred_svm_test)

0.9499

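Inspired by https://github.com/ageron/handson-ml/blob/master/05_support_vector_machines.ipynb

That's promising: the RBF `SVC` reaches 0.9499 on the test set, already above the 0.9219 that `LinearSVC` scores on its own training data, even though the `SVC` was trained on six times less data. (The 0.9563 above is optimistic, since it is measured on training images that include the 10,000 the model was fitted on.) Let's tune the hyperparameters with a randomized search using cross-validation. We will run it on a small subset (the first 1,000 training images) just to speed up the process: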

In [8]:
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import reciprocal, uniform

param_distributions = {'gamma': reciprocal(0.001, 0.1), 'C': uniform(1, 10)}
rnd_search_cv = RandomizedSearchCV(svm_clf, param_distributions, n_iter=10, verbose=2, cv=3)
rnd_search_cv.fit(X_train_scaled[:1000], y_train[:1000])

Fitting 3 folds for each of 10 candidates, totalling 30 fits
[CV] C=3.5988934774124326, gamma=0.001513186272679838 ................
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[CV] . C=3.5988934774124326, gamma=0.001513186272679838, total=   0.8s
[CV] C=3.5988934774124326, gamma=0.001513186272679838 ................
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    1.1s remaining:    0.0s
[CV] . C=3.5988934774124326, gamma=0.001513186272679838, total=   0.8s
[CV] C=3.5988934774124326, gamma=0.001513186272679838 ................
[CV] . C=3.5988934774124326, gamma=0.001513186272679838, total=   0.8s
[CV] C=3.055855399572819, gamma=0.06719156480223124 ..................
[CV] ... C=3.055855399572819, gamma=0.06719156480223124, total=   1.0s
[CV] C=3.055855399572819, gamma=0.06719156480223124 ..................
[CV] ... C=3.055855399572819, gamma=0.06719156480223124, total=   1.0s
[CV] C=3.055855399572819, gamma=0.06719156480223124 ..................
[CV] ... C=3.055855399572819, gamma=0.06719156480223124, total=   1.0s
[CV] C=1.107005900704854, gamma=0.001848939794318145 .................
[CV] .. C=1.107005900704854, gamma=0.001848939794318145, total=   0.8s
[CV] C=1.107005900704854, gamma=0.001848939794318145 .................
[CV] .. C=1.107005900704854, gamma=0.001848939794318145, total=   0.8s
[CV] C=1.107005900704854, gamma=0.001848939794318145 .................
[CV] .
[Parallel(n_jobs=1)]: Done  30 out of  30 | elapsed:   40.6s finished


RandomizedSearchCV(cv=3, error_score='raise-deprecating',
          estimator=SVC(C=5, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma='auto', kernel='rbf',
  max_iter=-1, probability=False, random_state=42, shrinking=True,
  tol=0.001, verbose=False),
          fit_params=None, iid='warn', n_iter=10, n_jobs=None,
          param_distributions={'gamma': <scipy.stats._distn_infrastructure.rv_frozen object at 0x12d8912e8>, 'C': <scipy.stats._distn_infrastructure.rv_frozen object at 0x12d891860>},
          pre_dispatch='2*n_jobs', random_state=None, refit=True,
          return_train_score='warn', scoring=None, verbose=2)
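The search samples `gamma` from `reciprocal(0.001, 0.1)`, a log-uniform distribution, so each order of magnitude between 0.001 and 0.1 is about equally likely. `C` comes from `uniform(1, 10)`, which in SciPy's `loc`/`scale` convention covers [1, 11], not [1, 10]. The quick check below draws samples to confirm those ranges; the sample size and seed are arbitrary choices.

```python
import numpy as np
from scipy.stats import reciprocal, uniform

gammas = reciprocal(0.001, 0.1).rvs(size=10_000, random_state=42)
Cs = uniform(1, 10).rvs(size=10_000, random_state=42)

print(gammas.min() >= 0.001, gammas.max() <= 0.1)  # True True
print(Cs.min() >= 1, Cs.max() <= 11)               # True True

# Log-uniform: roughly half the samples fall below the geometric midpoint 0.01
print(np.mean(gammas < 0.01))
# uniform(loc=1, scale=10): roughly 10% of the C samples exceed 10
print(np.mean(Cs > 10))
```

Sampling `gamma` on a log scale is a common choice because its effect on the RBF kernel is multiplicative: moving from 0.001 to 0.01 matters about as much as moving from 0.01 to 0.1.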

In [9]:
rnd_search_cv.best_estimator_

SVC(C=6.335534109540218, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma=0.0011263118134606108,
  kernel='rbf', max_iter=-1, probability=False, random_state=42,
  shrinking=True, tol=0.001, verbose=False)

In [10]:
rnd_search_cv.best_score_

0.863
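`best_score_` is the mean accuracy over the 3 cross-validation folds for the best candidate. Here each validation fold holds only about 333 of the 1,000 images, so the estimate is noisy. The per-candidate scores are stored in `cv_results_`; the sketch below prints them, ranked, for a smaller search on the bundled `load_digits` stand-in (an assumption, as are `n_iter=5` and the pipeline wrapping, which keeps the scaler fitted inside each training fold).

```python
import numpy as np
from scipy.stats import reciprocal, uniform
from sklearn.datasets import load_digits
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

search = RandomizedSearchCV(
    make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    {"svc__gamma": reciprocal(0.001, 0.1), "svc__C": uniform(1, 10)},
    n_iter=5, cv=3, random_state=42,
)
search.fit(X[:500], y[:500])  # small subset, as in the notebook

res = search.cv_results_
# Print candidates from best to worst mean validation accuracy
for i in np.argsort(res["mean_test_score"])[::-1]:
    print(res["params"][i], round(res["mean_test_score"][i], 3),
          "+/-", round(res["std_test_score"][i], 3))

# best_score_ is the highest mean_test_score among the candidates
print(np.isclose(search.best_score_, res["mean_test_score"].max()))  # True
```

The `std_test_score` column is worth a glance: when it is as large as the gaps between candidates, the ranking on such a small subset may not carry over to the full dataset.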

In [11]:
y_pred_search_test = rnd_search_cv.best_estimator_.predict(X_test_scaled)
accuracy_score(y_test, y_pred_search_test)

0.884
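With `refit=True` (the default), `best_estimator_` is refit on the same 1,000 images passed to `rnd_search_cv.fit`, which likely explains why its test accuracy (0.884) is lower than that of the first `SVC` trained on 10,000 images. A natural next step is to keep the tuned `C` and `gamma` and retrain on more data; with the notebook's variables that would mean something like `rnd_search_cv.best_estimator_.fit(X_train_scaled, y_train)`, which can take a long time on all 60,000 images. The sketch below shows the pattern on the bundled `load_digits` stand-in (an assumption, chosen so it runs in seconds); `clone` copies the tuned hyperparameters into a fresh, unfitted estimator.

```python
from scipy.stats import reciprocal, uniform
from sklearn.base import clone
from sklearn.datasets import load_digits
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=42)
scaler = StandardScaler().fit(X_tr)
X_tr_s, X_te_s = scaler.transform(X_tr), scaler.transform(X_te)

# 1) Tune on a small subset to keep the search cheap
search = RandomizedSearchCV(
    SVC(kernel="rbf"),
    {"gamma": reciprocal(0.001, 0.1), "C": uniform(1, 10)},
    n_iter=5, cv=3, random_state=42,
)
search.fit(X_tr_s[:300], y_tr[:300])

# 2) Keep the tuned hyperparameters, retrain on the whole training split
final_clf = clone(search.best_estimator_).fit(X_tr_s, y_tr)

print("small-subset model:", search.best_estimator_.score(X_te_s, y_te))
print("retrained model:   ", final_clf.score(X_te_s, y_te))
```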