In [1]:
from IPython.display import HTML
HTML(open("../style.css", "r").read())

# Digit Recognition using a Support Vector Machine

In [2]:
import gzip
import pickle
import numpy       as np
import sklearn.svm as svm

The function $\texttt{load_data}()$ returns a tuple of the form
$$ (\texttt{X_train}, \texttt{X_test}, \texttt{Y_train}, \texttt{Y_test}) $$
where 
* $\texttt{X_train}$ is a `numpy.ndarray` of shape $(50000, 784)$. Each row is a 784-dimensional vector $\textbf{x}$ containing one input image.
* $\texttt{Y_train}$ is a `numpy.ndarray` of shape $(50000,)$. Its $i$-th entry $y$ is the digit that is supposed to be shown in the $i$-th row of $\texttt{X_train}$.
* $\texttt{X_test}$ is a `numpy.ndarray` of shape $(10000, 784)$ containing the test images, one per row.
* $\texttt{Y_test}$ is a `numpy.ndarray` of shape $(10000,)$ containing the corresponding digit values.

The file also contains a validation set, which is loaded but not used in this notebook.

The images are grey scale pictures of size $28 \times 28$.
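
Since every image is stored as a flat vector of length $784 = 28 \cdot 28$, it has to be reshaped before it can be inspected. The following is a minimal sketch of that step. It assumes that the pixel values lie in the interval $[0, 1]$; the helper name `show_ascii` and the threshold are illustrative and not part of this notebook.

```python
import numpy as np

def show_ascii(x, threshold=0.5):
    """Render a flat 784-vector as 28 lines of text (hypothetical helper)."""
    img = np.reshape(x, (28, 28))  # row-major: 28 rows of 28 pixels each
    rows = [''.join('#' if p > threshold else '.' for p in row) for row in img]
    return '\n'.join(rows)

# Synthetic example: a vertical bar, not a real MNIST digit.
x = np.zeros(784)
x[13::28] = 1.0  # set column 13 in each of the 28 rows
print(show_ascii(x))
```

With the real data, `show_ascii(X_train[0])` would print a rough text rendering of the first training image.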

In [3]:
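def load_data():
    # The pickle file holds a training, a validation, and a test set.
    # The validation set is not used in this notebook.
    with gzip.open('../mnist.pkl.gz', 'rb') as f:
        train, validate, test = pickle.load(f, encoding="latin1")
    # Index 0 holds the images, index 1 the labels.
    # Every image is flattened into a vector of length 784 = 28 * 28.
    X_train = np.array([np.reshape(x, (784, )) for x in train[0]])
    X_test  = np.array([np.reshape(x, (784, )) for x in test [0]])
    Y_train = np.array(train[1])
    Y_test  = np.array(test [1])
    return (X_train, X_test, Y_train, Y_test)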

In [4]:
X_train, X_test, Y_train, Y_test = load_data()

Let us see what we have read:

In [5]:
X_train.shape, X_test.shape, Y_train.shape, Y_test.shape

((50000, 784), (10000, 784), (50000,), (10000,))

We define a support vector machine with a Gaussian kernel. 
The string `rbf` is short for *radial basis function* and is another name for a Gaussian kernel.
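
For reference, the Gaussian kernel of two vectors $\textbf{x}$ and $\textbf{y}$ is 
$$ k(\textbf{x}, \textbf{y}) = \exp\bigl(-\gamma \cdot \|\textbf{x} - \textbf{y}\|^2\bigr). $$
The sketch below evaluates this formula directly with NumPy. The function name `gaussian_kernel` is illustrative; this is not how `sklearn` computes the kernel internally.

```python
import numpy as np

def gaussian_kernel(x, y, gamma):
    """Evaluate exp(-gamma * ||x - y||^2) for two vectors (illustrative helper)."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return np.exp(-gamma * np.dot(diff, diff))

x = np.array([0.0, 0.0])
y = np.array([3.0, 4.0])
k_same = gaussian_kernel(x, x, gamma=0.05)  # identical vectors: exp(0) = 1
k_far  = gaussian_kernel(x, y, gamma=0.05)  # squared distance 25: exp(-1.25)
```

The kernel value is $1$ for identical vectors and decays towards $0$ as the distance grows. A larger value of $\gamma$ makes this decay faster.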

In [6]:
M = svm.SVC(kernel='rbf', gamma=0.05, C=5)

The next cell takes about 7 minutes to execute on my Mac Studio from 2022.

In [7]:
%%time
M.fit(X_train, Y_train)

CPU times: user 6min 40s, sys: 1.74 s, total: 6min 42s
Wall time: 6min 42s


SVC(C=5, gamma=0.05)
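
Since fitting on all 50,000 training images takes several minutes, it can be convenient to try out values for `gamma` and `C` on a random subset first. The following is only a sketch of how such a subset could be drawn. The helper `subsample` and the demo arrays are hypothetical and not part of this notebook.

```python
import numpy as np

def subsample(X, Y, n, seed=0):
    """Return n randomly chosen rows of X together with their labels (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=n, replace=False)  # n distinct row indices
    return X[idx], Y[idx]

# Synthetic stand-in for (X_train, Y_train): row i is [2*i, 2*i + 1] with label i.
X_demo = np.arange(20).reshape(10, 2)
Y_demo = np.arange(10)
X_small, Y_small = subsample(X_demo, Y_demo, n=4)
```

In this notebook one could call `subsample(X_train, Y_train, n=5000)` and fit a model on the result. The accuracy obtained that way is only a rough guide to what the full training set would give.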


The next cell takes about 5 minutes.  The accuracy on the training set is $100\%$! 

In [8]:
%%time
M.score(X_train, Y_train)

CPU times: user 4min 36s, sys: 332 ms, total: 4min 37s
Wall time: 4min 38s


1.0

On the test set, the accuracy is still $98.28\%$!

In [9]:
%%time
M.score(X_test, Y_test)

CPU times: user 55.8 s, sys: 72.3 ms, total: 55.9 s
Wall time: 55.9 s


0.9828
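
The value returned by `score` is the mean accuracy, i.e. the fraction of images that are classified correctly. The sketch below shows this computation on made-up labels (not the real predictions) and converts the test accuracy reported above into a number of misclassified images. The function name `accuracy` is illustrative.

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of positions where prediction and label agree (illustrative helper)."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

# Made-up labels and predictions: 4 of 5 agree.
y_true = np.array([7, 2, 1, 0, 4])
y_pred = np.array([7, 2, 1, 0, 9])
acc = accuracy(y_true, y_pred)

# An accuracy of 0.9828 on 10,000 test images corresponds to this many errors.
n_test   = 10_000
n_errors = round((1 - 0.9828) * n_test)
```

Hence the model misclassifies 172 of the 10,000 test images.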