# Keras notes: fun on mnist
## MLP on mnist
Following the TensorFlow [tutorial](https://www.tensorflow.org/versions/r0.9/tutorials/mnist/beginners/index.html) on MNIST, an MLP with a single fully-connected layer and a softmax output, optimized with SGD at a *learning rate* of 0.5, is expected to reach about 92% accuracy.
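
As a rough illustration of what this model computes for one image, here is a minimal pure-Python sketch of the softmax output and the categorical cross-entropy loss against a one-hot label. The helper names and the logit values are made up for illustration; this is not how Keras implements them internally.

```python
import math

def softmax(logits):
    # subtract the max before exponentiating for numerical stability
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def one_hot(label, num_classes=10):
    # map an integer label to a binary vector, e.g. 3 -> [0, 0, 0, 1, 0, ...]
    return [1.0 if i == label else 0.0 for i in range(num_classes)]

def categorical_crossentropy(target, probs):
    # -sum_k target_k * log(probs_k); only the true class contributes
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)

# hypothetical logits for one image (illustrative values, not model output)
logits = [0.0] * 10
logits[3] = 2.0

probs = softmax(logits)
loss = categorical_crossentropy(one_hot(3), probs)
print(probs.index(max(probs)), loss)
```

The loss shrinks as the probability assigned to the true class approaches 1, which is what SGD pushes the layer's weights towards.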

In [14]:
from keras.models import Sequential
from keras.layers import Dense, Activation, Conv2D, pooling, Flatten, Dropout
from keras.datasets import mnist
from keras.optimizers import SGD
from keras.utils import np_utils

(X_train, y_train), (X_test, y_test) = mnist.load_data()

def reshape(X_train, X_test):
    # reshape [28, 28] to 784
    (a, b, c) = X_train.shape
    X_train = X_train.reshape((a, b*c))
    (a, b, c) = X_test.shape
    X_test = X_test.reshape((a, b*c))
    return X_train, X_test


def scale(X_train, X_test):
    # scale the elements to [0, 1]
    # note: scaling is important for higher accuracy; TensorFlow's data loader
    # already scales the data, so the tutorial feeds it directly to the MLP.
    X_train = X_train.astype('float32')
    X_train /= 255
    X_test = X_test.astype('float32')
    X_test /= 255
    return X_train, X_test

# map the label to a binary vector with dimension 10
y_train = np_utils.to_categorical(y_train, 10)
y_test = np_utils.to_categorical(y_test, 10)

X_train, X_test = reshape(X_train, X_test)
X_train, X_test = scale(X_train, X_test)

simple_MLP = Sequential([
    Dense(10, input_dim=784),
    # now the model will take as input arrays of shape (*, 784)
    # and output arrays of shape (*, 10)
    # * represents the dimension of batch, can be different between different batches.
    Activation('softmax'),
])

# the definition below is equivalent to the definition above
# model = Sequential([Dense(10, input_dim=784, activation='softmax')])
sgd = SGD(lr=0.5)
simple_MLP.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
# The TensorFlow tutorial runs 1000 steps with batch_size 100, which is roughly 2 epochs.
simple_MLP.fit(X_train, y_train, batch_size=100, epochs=2)
score = simple_MLP.evaluate(X_test, y_test, batch_size=100)
print(score)  # [test loss, test accuracy]

Epoch 1/2
Epoch 2/2
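
The "roughly 2 epochs" figure in the comment above can be checked with a little arithmetic. The sketch below assumes 60000 training images (the size of the training set returned by `keras.datasets.mnist`) and, as a labeled assumption, a 55000-image training split for the tutorial; `epochs_equivalent` is a hypothetical helper name.

```python
def epochs_equivalent(steps, batch_size, n_train):
    # number of passes over the training set covered by `steps` mini-batches
    return steps * batch_size / float(n_train)

# 60000: training-set size returned by keras.datasets.mnist.load_data()
keras_epochs = epochs_equivalent(1000, 100, 60000)
# 55000: assumed size of the tutorial's training split
tutorial_epochs = epochs_equivalent(1000, 100, 55000)

print("%.2f epochs on 60000 samples" % keras_epochs)     # 1.67
print("%.2f epochs on 55000 samples" % tutorial_epochs)  # 1.82
```

Either way the tutorial sees slightly fewer than two full passes over the data, so two epochs is a close match.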


Let's try a more complicated MLP: adding a ReLU activation to the first dense layer and one more fully-connected layer gives better accuracy (about 96.5% in the run below).

In [16]:
complex_MLP = Sequential([
    Dense(32, input_dim=784, activation='relu'),
    Dense(10, activation='softmax'),
])

sgd = SGD(lr=0.5)
complex_MLP.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])

complex_MLP.fit(X_train, y_train, batch_size=100, epochs=5)
score = complex_MLP.evaluate(X_test, y_test, batch_size=100)
print(score)  # [test loss, test accuracy]

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
[0.1213740740250796, 0.96500000476837156]
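
To get a feel for how much larger the second MLP is, the trainable parameters of both models can be counted by hand. This is a small sketch with a hypothetical helper `dense_params`; it assumes each `Dense` layer has a weight matrix plus one bias per output unit, which is the Keras default.

```python
def dense_params(n_in, n_out):
    # weight matrix (n_in x n_out) plus one bias per output unit
    return n_in * n_out + n_out

# simple_MLP: 784 -> 10
simple_total = dense_params(784, 10)

# complex_MLP: 784 -> 32 -> 10
complex_total = dense_params(784, 32) + dense_params(32, 10)

print(simple_total)   # 7850
print(complex_total)  # 25450
```

The hidden layer roughly triples the parameter count, even with only 32 units.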


## Convolutional Neural Network on mnist
An MLP works well, but a more complex model such as a CNN is needed for higher accuracy. Since MNIST is a fairly simple dataset, the gap between the MLP and the CNN is small here; CNNs perform much better on more challenging datasets. The final accuracy of this network on MNIST is about 99.2%.

In [20]:
# reshape the data to an image-like tensor
def reshape_CNN(X_train, X_test):
    # TensorFlow shape: (num_images, height, width, color_channels)
    X_train = X_train.reshape((X_train.shape[0], 28, 28, 1))
    X_test = X_test.reshape((X_test.shape[0], 28, 28, 1))
    return X_train, X_test

X_train, X_test = reshape_CNN(X_train, X_test)

ml_CNN = Sequential([
    # TensorFlow order: (height, width, color_channels)
    Conv2D(32, (5, 5), padding='same', input_shape=(28, 28, 1), activation='relu'),
    # output shape (28, 28, 32)
    pooling.MaxPooling2D(pool_size=(2, 2), padding='same'),  # keep the maximum of each 2x2 patch
    # output shape (14, 14, 32)
    Conv2D(64, (5, 5), padding='same', activation='relu'),
    # output shape (14, 14, 64)
    pooling.MaxPooling2D(pool_size=(2, 2), padding='same'),
    # output shape (7, 7, 64)
    Flatten(),
    Dense(1024, activation='relu'),
    Dropout(0.5),
    Dense(10, activation='softmax'),
])

ml_CNN.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

ml_CNN.fit(X_train, y_train, batch_size=100, epochs=5)
score = ml_CNN.evaluate(X_test, y_test, batch_size=100)
print(score)  # [test loss, test accuracy]

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
[0.026601558394356743, 0.99220000565052036]
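
The output shapes noted in the comments of the CNN definition can be reproduced by hand. The sketch below assumes that 'same' padding gives an output size of ceil(input size / stride), that the convolutions use stride 1, and that the 2x2 pooling layers use stride 2. The helper names are hypothetical, and the parameter count assumes one bias per filter or unit.

```python
import math

def same_out(size, stride):
    # with 'same' padding, output size = ceil(input size / stride)
    return int(math.ceil(size / float(stride)))

def conv_params(k_h, k_w, c_in, c_out):
    # one k_h x k_w x c_in kernel per output channel, plus one bias each
    return k_h * k_w * c_in * c_out + c_out

def dense_params(n_in, n_out):
    # weight matrix plus one bias per output unit
    return n_in * n_out + n_out

side = 28
side = same_out(side, 1)   # conv 5x5, stride 1 -> 28
side = same_out(side, 2)   # max pool 2x2       -> 14
side = same_out(side, 1)   # conv 5x5, stride 1 -> 14
side = same_out(side, 2)   # max pool 2x2       -> 7

flat = side * side * 64    # Flatten() output: 7 * 7 * 64 = 3136

total = (conv_params(5, 5, 1, 32) + conv_params(5, 5, 32, 64)
         + dense_params(flat, 1024) + dense_params(1024, 10))

print(side, flat, total)   # 7 3136 3274634
```

Almost all of the parameters sit in the first dense layer, which connects the 3136 flattened features to 1024 units.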
