# MNIST 

In this example, we consider the classic machine learning dataset MNIST and the task of classifying handwritten digits. By modern computer vision standards this dataset is considered small, yet it is sufficiently large that many standard classifiers (e.g. those in the Python package `sklearn`) require significant time to train a model. Nonetheless, [Epsilon](http://epopt.io/) is able to fit a model that achieves near state-of-the-art accuracy in a few minutes. 

<img src="mnist.png" />

The standard task is to train a multiclass classifier that can correctly identify digits from their pixel intensity values. For the purposes of this example, we simplify this task slightly and instead consider the binary classification task of even vs. odd. To build our classifier we have a training set of 60K images of dimension 28x28 and a test set of 10K images.
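
As a small illustration of the even vs. odd relabeling, the sketch below maps digit labels to +1 (even) and -1 (odd). The `digits` array is a hypothetical stand-in for the MNIST label vector, not data taken from the dataset.

```python
import numpy as np

# Hypothetical digit labels standing in for the MNIST label vector
digits = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

# Even digits map to +1, odd digits map to -1
labels = (digits % 2 == 0) * 2 - 1
```

Encoding the classes as +1/-1 rather than 0/1 is what lets the sign of a linear score be compared directly with the label.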

## Support vector machine and hinge loss

In [1]:
import io
import urllib
import cvxpy as cp
import epopt as ep
import numpy as np
import numpy.linalg as LA
import scipy.sparse as sp

mnist = np.load(io.BytesIO(urllib.urlopen("http://epopt.s3.amazonaws.com/mnist.npz").read()))

In [None]:
def error(x, y):
    return 1 - np.sum(x == y) / float(len(x))

The hinge loss, $\sum_i \max(0, 1 - y_i x_i^T \theta)$, is straightforward to define in CVXPY:

In [2]:
def hinge_loss(theta, X, y):
    return cp.sum_entries(cp.max_elemwise(1 - sp.diags([y],[0])*X*theta, 0))
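
As a sanity check on what this expression computes, here is a minimal NumPy sketch of the same quantity evaluated at a fixed `theta`. The function name `hinge_loss_np` and the toy arrays are illustrative assumptions; they are not part of CVXPY or Epsilon.

```python
import numpy as np

def hinge_loss_np(theta, X, y):
    """Sum of max(0, 1 - y_i * x_i^T theta) over all examples."""
    margins = y * X.dot(theta)
    return np.sum(np.maximum(1 - margins, 0))

# Toy data (hypothetical): three examples with two features each
X_toy = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y_toy = np.array([1, -1, 1])
theta_toy = np.array([2.0, 0.5])

# Only the second example violates the margin, contributing 1.5
loss_toy = hinge_loss_np(theta_toy, X_toy, y_toy)
```

Multiplying the scores elementwise by `y` plays the same role as the `sp.diags([y],[0])` product in the CVXPY version.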

In [3]:
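# Load data and form even (+1) vs. odd (-1) labels
# (assumes this first model is trained on the raw pixel features)
X = mnist["X"]
y = (mnist["Y"].ravel() % 2 == 0)*2-1
Xtest = mnist["Xtest"]
ytest = (mnist["Ytest"][:10000,:].ravel() % 2 == 0)*2-1

# Parameters
m, n = X.shape
theta = cp.Variable(n)
lam = 10

# Form problem with CVXPY and solve with Epsilon
f = ep.hinge_loss(theta, X, y) + lam*cp.sum_squares(theta)
prob = cp.Problem(cp.Minimize(f))
ep.solve(prob)

# Get solution
theta0 = np.ravel(theta.value)
print "Train error:", error((X.dot(theta0)>0)*2-1, y)
print "Test error:", error((Xtest.dot(theta0)>0)*2-1, ytest)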

For convenience, Epsilon provides this hinge loss definition as `ep.hinge_loss()`, along with several others; see [`functions.py`](https://github.com/mwytock/epsilon/blob/master/python/epopt/functions.py) for more.

## Non-linear classifier using random Fourier features

In [None]:
def sq_dist(X, Y):
    """Squared euclidean distance for two sets of points."""
    return (
        np.tile(np.sum(X**2, axis=1), (Y.shape[0],1)).T +
        np.tile(np.sum(Y**2, axis=1), (X.shape[0],1)) -
        2*X.dot(Y.T))
    
def median_dist(X):
    """Compute the approximate median distance by sampling pairs."""
    k = 3000
    idx = np.random.randint(0, X.shape[0], k)
    D = sq_dist(X[idx,:], X[idx,:])
    return np.sqrt(np.median(D))
    
def pca(X, dim):
    """Perform centered PCA."""
    X = X - X.mean(axis=0)
    return LA.eigh(X.T.dot(X))[1][:,-dim:]
    
def error(x, y):
    return 1 - np.sum(x == y) / float(len(x))

np.random.seed(0)
# Reuse the `mnist` archive loaded from the URL in the first cell

X = mnist["X"]
y = (mnist["Y"].ravel() % 2 == 0)*2-1
Xtest = mnist["Xtest"]
ytest = (mnist["Ytest"][:10000,:].ravel() % 2 == 0)*2-1

V = pca(X, 50)
Xp = X.dot(V)
sigma = median_dist(Xp)

n = 4000
W = np.random.randn(Xp.shape[1], n) / sigma
b = np.random.uniform(0, 2*np.pi, n)
X = np.cos(Xp.dot(W) + b)
Xtest = np.cos(Xtest.dot(V).dot(W) + b)
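
The transformation above follows the random Fourier feature construction: with `W` drawn from a Gaussian scaled by `1/sigma` and `b` uniform on `[0, 2*pi]`, inner products of the cosine features approximate a Gaussian kernel with bandwidth `sigma`. The sketch below checks this numerically on two random points. All names with a `_toy` suffix are illustrative assumptions, and the `sqrt(2/n)` scaling is the standard normalization; the cell above omits it, which only rescales the features by a constant.

```python
import numpy as np

rng = np.random.RandomState(0)
d, n_feat, sigma_toy = 5, 20000, 2.0

# Two arbitrary points in d dimensions
x1 = rng.randn(d)
x2 = rng.randn(d)

# Random projection and phase, drawn as in the cell above
W_toy = rng.randn(d, n_feat) / sigma_toy
b_toy = rng.uniform(0, 2 * np.pi, n_feat)

def features(x):
    """Random Fourier features with the standard sqrt(2/n) normalization."""
    return np.sqrt(2.0 / n_feat) * np.cos(x.dot(W_toy) + b_toy)

# Inner product of the features vs. the exact Gaussian kernel value
approx = features(x1).dot(features(x2))
exact = np.exp(-np.sum((x1 - x2) ** 2) / (2 * sigma_toy ** 2))
```

The two values agree more closely as `n_feat` grows, which is why a linear classifier on these features can behave like a kernel SVM.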

Given our transformed dataset, the next step is to fit the classifier. We apply the `hinge_loss()` function discussed above together with a squared L2 penalty on `theta`.

In [None]:
# Parameters
m, n = X.shape
theta = cp.Variable(n)
lam = 10

# Form problem with CVXPY and solve with Epsilon
f = ep.hinge_loss(theta, X, y) + lam*cp.sum_squares(theta)
prob = cp.Problem(cp.Minimize(f))
ep.solve(prob)

# Get solution
theta0 = np.ravel(theta.value)
print "Train error:", error((X.dot(theta0)>0)*2-1, y)
print "Test error:", error((Xtest.dot(theta0)>0)*2-1, ytest)
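
To make the objective concrete without a solver, the sketch below minimizes the same regularized hinge loss on toy data with a plain subgradient method. This is a swapped-in technique for illustration only, not how Epsilon solves the problem; the function `fit_svm_subgradient`, the step size, and the toy data are all assumptions.

```python
import numpy as np

def fit_svm_subgradient(X, y, lam, step=1e-3, iters=200):
    """Minimize sum(max(0, 1 - y * X.dot(theta))) + lam * ||theta||^2."""
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        # Examples with margin below 1 contribute -y_i * x_i to the subgradient
        active = y * X.dot(theta) < 1
        grad = -X[active].T.dot(y[active]) + 2 * lam * theta
        theta -= step * grad
    return theta

# Toy data (hypothetical): two well-separated clusters in 2D
rng = np.random.RandomState(0)
y_toy = np.repeat([1, -1], 100)
X_toy = 2.0 * y_toy[:, None] + 0.5 * rng.randn(200, 2)

theta_toy = fit_svm_subgradient(X_toy, y_toy, lam=0.1)

# Same prediction rule as above: the sign of the linear score
pred_toy = (X_toy.dot(theta_toy) > 0) * 2 - 1
train_err = 1 - np.mean(pred_toy == y_toy)
```

A fixed step size is enough for this small example; on a problem the size of MNIST a dedicated solver is the more practical choice.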

## Conclusions