## Learning MNIST with a Multi-Layer Perceptron

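First, let's download and decompress the MNIST data set: 60,000 training images and 10,000 test images, each with a matching label file.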

In [None]:
!wget http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
!wget http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
!wget http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
!wget http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
!gzip -d train*.gz t10k*.gz
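
Before going further, it can be worth checking that the decompressed files are intact. The sketch below reads the header of an IDX file, the binary format used by the MNIST files, with the standard library only. The helper name `read_idx_header` is hypothetical and not part of MXNet.

```python
import struct

def read_idx_header(path):
    """Return (element type code, dimensions) read from an IDX file header."""
    with open(path, "rb") as f:
        # Bytes 0-1 are zero, byte 2 is the element type code,
        # byte 3 is the number of dimensions.
        zero, dtype_code, ndim = struct.unpack(">HBB", f.read(4))
        if zero != 0:
            raise ValueError("not an IDX file: %s" % path)
        # Each dimension size is a big-endian unsigned 32-bit integer.
        dims = struct.unpack(">" + "I" * ndim, f.read(4 * ndim))
    return dtype_code, dims

# Example call (assumes the file is in the current directory):
# read_idx_header("train-images-idx3-ubyte")
```

For the training images, the header should report type code 8 (unsigned byte) and dimensions (60000, 28, 28). The label files have a single dimension.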

In [9]:
import mxnet as mx
import logging
import os

In [10]:
logging.basicConfig(level=logging.INFO)

nb_epochs=50

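MXNet provides a convenient iterator for MNIST, `mx.io.MNISTIter`. We use it to build the training and validation iterators. With no paths given, the training iterator reads the training files from the current directory. The validation iterator reads the 10,000 test images.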

In [11]:
train_iter = mx.io.MNISTIter(shuffle=True)
val_iter = mx.io.MNISTIter(image="./t10k-images-idx3-ubyte", label="./t10k-labels-idx1-ubyte")
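
To make the role of a data iterator concrete, here is a minimal NumPy sketch of mini-batch iteration. It is not MXNet's implementation: the function `iterate_minibatches` is hypothetical, the batch size of 128 is chosen to match the value passed to `Speedometer` later in this notebook, and dropping the incomplete final batch is an assumption of this sketch.

```python
import numpy as np

def iterate_minibatches(images, labels, batch_size=128, shuffle=False, seed=0):
    """Yield (image_batch, label_batch) pairs of exactly batch_size samples."""
    n = len(images)
    order = np.arange(n)
    if shuffle:
        # Visit the samples in a random order.
        np.random.RandomState(seed).shuffle(order)
    # Stop before any incomplete final batch (an assumption of this sketch).
    for start in range(0, n - batch_size + 1, batch_size):
        idx = order[start:start + batch_size]
        yield images[idx], labels[idx]
```

Shuffling the training set changes the order in which samples are visited, which usually helps stochastic optimizers. The validation set does not need it.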

We build a Multi-Layer Perceptron with:
- an input layer receiving a flattened MNIST image (28x28 --> 784),
- a fully connected hidden layer with 512 neurons activated by the ReLU function,
- a dropout layer (drop probability 0.2) to reduce overfitting,
- a second fully connected hidden layer with 256 neurons activated by the ReLU function,
- a second dropout layer (drop probability 0.2) to reduce overfitting,
- an output layer with 10 neurons (one per digit class), holding class probabilities computed by the softmax function.

In [12]:
data = mx.sym.Variable('data')
data = mx.sym.Flatten(data=data)
fc1  = mx.sym.FullyConnected(data=data, name='fc1', num_hidden=512)
act1 = mx.sym.Activation(data=fc1, name='relu1', act_type="relu")
drop1= mx.sym.Dropout(data=act1,p=0.2)
fc2  = mx.sym.FullyConnected(data=drop1, name='fc2', num_hidden = 256)
act2 = mx.sym.Activation(data=fc2, name='relu2', act_type="relu")
drop2= mx.sym.Dropout(data=act2,p=0.2)
fc3  = mx.sym.FullyConnected(data=drop2, name='fc3', num_hidden=10)
mlp  = mx.sym.SoftmaxOutput(data=fc3, name='softmax')
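
To see what this symbolic graph computes, the following NumPy sketch reproduces the same layer sizes at inference time. It is an illustration only: the helper names and the random initialization are assumptions, and dropout is left out because it is inactive at inference.

```python
import numpy as np

LAYER_SIZES = [784, 512, 256, 10]  # input, two hidden layers, output

def init_params(rng):
    """Small random weights and zero biases, for illustration only."""
    return [(rng.normal(0.0, 0.01, size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(LAYER_SIZES[:-1], LAYER_SIZES[1:])]

def softmax(z):
    # Subtract the row maximum for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def forward(images, params):
    """Inference-time forward pass: Flatten, two FC+ReLU layers, FC, softmax."""
    x = images.reshape(len(images), -1)      # Flatten to (batch, 784)
    for w, b in params[:-1]:
        x = np.maximum(x @ w + b, 0.0)       # FullyConnected + ReLU
    w, b = params[-1]
    return softmax(x @ w + b)                # class probabilities

def count_params(params):
    """Total number of weights and biases."""
    return sum(w.size + b.size for w, b in params)
```

With these layer sizes, the network has 784x512 + 512 + 512x256 + 256 + 256x10 + 10 = 535,818 trainable parameters.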

Now, we need to:
- bind the model to the shapes of the training data and labels,
- initialize the parameters, i.e. set initial values for all weights,
- pick an optimizer and a learning rate, which control how weights are adjusted from the gradients computed by backpropagation.

In [13]:
# Train on the first GPU; use context=mx.cpu() if no GPU is available.
mod = mx.mod.Module(mlp, context=mx.gpu(0))
mod.bind(data_shapes=train_iter.provide_data, label_shapes=train_iter.provide_label)
mod.init_params(initializer=mx.init.Xavier())
# Alternative: plain SGD with a smaller learning rate.
#mod.init_optimizer('sgd', optimizer_params=(('learning_rate', 0.01),))
mod.init_optimizer('adagrad', optimizer_params=(('learning_rate', 0.1),))
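
The cell above selects AdaGrad, which scales each weight's step by the history of its squared gradients. Here is a minimal NumPy sketch of the textbook update rule. The function name `adagrad_step` and the value of `eps` are assumptions, and MXNet's own implementation may differ in details such as weight decay or gradient rescaling.

```python
import numpy as np

def adagrad_step(weight, grad, history, learning_rate=0.1, eps=1e-7):
    """Apply one AdaGrad update in place and return the updated weight."""
    # Accumulate the squared gradient for every weight.
    history += grad ** 2
    # Weights with a large gradient history take smaller steps.
    weight -= learning_rate * grad / (np.sqrt(history) + eps)
    return weight

# Example: minimize f(w) = sum(w**2), whose gradient is 2*w.
w = np.array([1.0, -2.0])
h = np.zeros_like(w)
for _ in range(100):
    adagrad_step(w, 2 * w, h)
```

Because the effective step shrinks as the history grows, AdaGrad tolerates a larger initial learning rate (0.1 here) than the 0.01 used in the commented-out SGD line.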

Time to train!

In [14]:
mod.fit(train_iter, eval_data=val_iter, num_epoch=nb_epochs,
        batch_end_callback=mx.callback.Speedometer(128, 100))

INFO:root:Epoch[0] Batch [100]	Speed: 93204.00 samples/sec	accuracy=0.745900
INFO:root:Epoch[0] Batch [200]	Speed: 92606.10 samples/sec	accuracy=0.875938
INFO:root:Epoch[0] Batch [300]	Speed: 93716.34 samples/sec	accuracy=0.894687
INFO:root:Epoch[0] Batch [400]	Speed: 98678.08 samples/sec	accuracy=0.910859
INFO:root:Epoch[0] Train-accuracy=0.917444
INFO:root:Epoch[0] Time cost=0.634
INFO:root:Epoch[0] Validation-accuracy=0.940905
INFO:root:Epoch[1] Batch [100]	Speed: 105258.28 samples/sec	accuracy=0.929533
INFO:root:Epoch[1] Batch [200]	Speed: 103471.85 samples/sec	accuracy=0.931172
INFO:root:Epoch[1] Batch [300]	Speed: 106309.81 samples/sec	accuracy=0.936797
INFO:root:Epoch[1] Batch [400]	Speed: 106854.23 samples/sec	accuracy=0.942891
INFO:root:Epoch[1] Train-accuracy=0.941348
INFO:root:Epoch[1] Time cost=0.574
INFO:root:Epoch[1] Validation-accuracy=0.955829
INFO:root:Epoch[2] Batch [100]	Speed: 112074.59 samples/sec	accuracy=0.949

INFO:root:Epoch[17] Time cost=0.530
INFO:root:Epoch[17] Validation-accuracy=0.981671
INFO:root:Epoch[18] Batch [100]	Speed: 112141.31 samples/sec	accuracy=0.989867
INFO:root:Epoch[18] Batch [200]	Speed: 116011.97 samples/sec	accuracy=0.989141
INFO:root:Epoch[18] Batch [300]	Speed: 109663.07 samples/sec	accuracy=0.990469
INFO:root:Epoch[18] Batch [400]	Speed: 112154.43 samples/sec	accuracy=0.991328
INFO:root:Epoch[18] Train-accuracy=0.990205
INFO:root:Epoch[18] Time cost=0.535
INFO:root:Epoch[18] Validation-accuracy=0.982672
INFO:root:Epoch[19] Batch [100]	Speed: 116968.91 samples/sec	accuracy=0.991182
INFO:root:Epoch[19] Batch [200]	Speed: 114257.03 samples/sec	accuracy=0.989453
INFO:root:Epoch[19] Batch [300]	Speed: 108295.12 samples/sec	accuracy=0.990938
INFO:root:Epoch[19] Batch [400]	Speed: 117809.77 samples/sec	accuracy=0.992578
INFO:root:Epoch[19] Train-accuracy=0.992887
INFO:root:Epoch[19] Time cost=0.530
INFO:root:Epoch[19] Validation-accuracy=0.983674
INFO:root:Epoch[20] Batch

INFO:root:Epoch[36] Batch [300]	Speed: 116549.17 samples/sec	accuracy=0.996016
INFO:root:Epoch[36] Batch [400]	Speed: 110060.09 samples/sec	accuracy=0.997188
INFO:root:Epoch[36] Train-accuracy=0.997901
INFO:root:Epoch[36] Time cost=0.528
INFO:root:Epoch[36] Validation-accuracy=0.983273
INFO:root:Epoch[37] Batch [100]	Speed: 119067.83 samples/sec	accuracy=0.996519
INFO:root:Epoch[37] Batch [200]	Speed: 111317.49 samples/sec	accuracy=0.996094
INFO:root:Epoch[37] Batch [300]	Speed: 114146.98 samples/sec	accuracy=0.996484
INFO:root:Epoch[37] Batch [400]	Speed: 115374.48 samples/sec	accuracy=0.997500
INFO:root:Epoch[37] Train-accuracy=0.997785
INFO:root:Epoch[37] Time cost=0.525
INFO:root:Epoch[37] Validation-accuracy=0.984175
INFO:root:Epoch[38] Batch [100]	Speed: 113929.94 samples/sec	accuracy=0.996519
INFO:root:Epoch[38] Batch [200]	Speed: 111846.48 samples/sec	accuracy=0.996406
INFO:root:Epoch[38] Batch [300]	Speed: 112082.31 samples/sec	accuracy=0.996406
INFO:root:Epoch[38] Batch [400]

In [17]:
mod.save_checkpoint("mlp", nb_epochs)

INFO:root:Saved checkpoint to "mlp-0050.params"


Let's measure the accuracy of the trained model on the validation set.

In [18]:
metric = mx.metric.Accuracy()
mod.score(val_iter, metric)
print(metric.get())

('accuracy', 0.984375)
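
For reference, classification accuracy is the fraction of samples whose most probable class matches the label. A minimal NumPy sketch follows. The function name `accuracy` is hypothetical, and this is not the code behind `mx.metric.Accuracy`.

```python
import numpy as np

def accuracy(probabilities, labels):
    """Fraction of samples whose highest-probability class equals the label."""
    predicted = np.argmax(probabilities, axis=1)
    return float(np.mean(predicted == labels))

# Example: four samples, two classes; the third prediction is wrong.
probs = np.array([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]])
labels = np.array([1, 0, 0, 0])
print(accuracy(probs, labels))  # prints 0.75
```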
