In [1]:
from utils import *
%matplotlib inline

# Train the neural network

<br>
<center><img src="support/robot.gif" width=600></center>

In this section, we will discuss how to train the previously defined network with data. We first import the libraries. The new ones are `mxnet.init` for more weight initialization methods, the `datasets` and `transforms` to load and transform computer vision datasets, `matplotlib` for drawing, and `time` for benchmarking.

In [2]:
from mxnet import nd, gluon, init, autograd

from mxnet.gluon import nn
from mxnet.gluon.data.vision import datasets, transforms

import matplotlib.pyplot as plt
from time import time

## Get data

### Training Dataset: MNIST

The handwritten digit MNIST dataset is one of the most commonly used datasets in deep learning. So we'll use it here
The dataset can be automatically downloaded through Gluon's `data.vision.datasets` module.

In [5]:
mnist_train = datasets.MNIST(train=True)
X, y = mnist_train[0]
print('X shape: %s dtype: %s' % (X.shape, X.dtype))
print("Number of images: %d" % len(mnist_train))

X shape: (28, 28, 1) dtype: <class 'numpy.uint8'>
Number of images: 60000


In order to feed data into a Gluon model, we need to convert the images to the `(channel, height, weight)` format with a floating point data type. It can be done by `transforms.ToTensor`. In addition, we normalize all pixel values to be between 0 and 1. We chain these two transforms together and apply it to the first element of the data pair, namely the images.

`MNIST` is a subclass of `gluon.data.Dataset`

Transform dataset:
- channel first, float32
- Normalize



In [6]:
def normalize(data):
    return data.astype('float32')/255

transformer = transforms.Compose([
    transforms.ToTensor(),
    normalize])

mnist_train = mnist_train.transform_first(transformer)

### Data Loading

In [7]:
batch_size = 256

train_data = gluon.data.DataLoader(
    mnist_train, batch_size=batch_size, shuffle=True, num_workers=4)

The returned `train_data` is an iterator that yields batches of images and labels pairs.

In [8]:
for data, label in train_data:
    print(data.shape, label.shape)
    break

(256, 1, 28, 28) (256,)


Finally, we create a validation dataset and data loader.

### Validation Dataset

In [9]:
mnist_valid = gluon.data.vision.MNIST(train=False)

valid_data = gluon.data.DataLoader(
    mnist_valid.transform_first(transformer),
    batch_size=batch_size, num_workers=4)

## Define the model

We reimplement the same LeNet introduced before. One difference here is that we changed the weight initialization method to `Xavier`, which is a popular choice for deep convolutional neural networks.

In [10]:
net = nn.Sequential()
with net.name_scope():
    net.add(
        nn.Conv2D(channels=6, kernel_size=5, activation='relu'),
        nn.MaxPool2D(pool_size=2, strides=2),
        nn.Conv2D(channels=16, kernel_size=3, activation='relu'),
        nn.MaxPool2D(pool_size=2, strides=2),
        nn.Flatten(),
        nn.Dense(120, activation="relu"),
        nn.Dense(84, activation="relu"),
        nn.Dense(10)
    )
net.initialize(init=init.Xavier())

Besides the neural network, we need to define the loss function and optimization method for training. We will use standard softmax cross entropy loss for classification problems. It first performs softmax on the output to obtain the predicted probability, and then compares the label with the cross entropy.

### Loss

In [11]:
softmax_cross_entropy = gluon.loss.SoftmaxCrossEntropyLoss()

<center><img src="support/cross_entropy.png" width=400></center>

The optimization method we pick is the standard stochastic gradient descent with constant learning rate of 0.1.

### Optimization

In [12]:
trainer = gluon.Trainer(net.collect_params(),
                        'sgd', {'learning_rate': 0.1})

<center><img src="support/optimization.gif" width=400></center>

The `trainer` is created with all parameters (both weights and gradients) in `net`. Later on, we only need to call the `step` method to update its weights.

### Accuracy 

In [13]:
def acc(output, label):
    # output: (batch, num_output) float32 ndarray
    # label: (batch, ) int32 ndarray
    acc = (output.argmax(axis=1) == label.astype('float32'))
    return acc.mean().asscalar()

## Training loop

Now we can implement the complete training loop.

In [14]:
for epoch in range(10):
    train_loss, train_acc, valid_acc = 0., 0., 0.
    tic = time()
    for data, label in train_data:
        with autograd.record():
            output = net(data)
            loss = softmax_cross_entropy(output, label)
        loss.backward()
        
        trainer.step(batch_size)
        
        train_loss += loss.mean().asscalar()
        train_acc += acc(output, label)

  
    print("Epoch[%d] Loss:%.3f Acc:%.3f|%.3f Perf: %.1f img/sec"%(
        epoch, train_loss/len(train_data),
        train_acc/len(train_data),
        valid_acc/len(valid_data), 
        len(mnist_train)/(time()-tic)))

Epoch[0] Loss:2.302 Acc:0.112|0.000 Perf: 13074.0 img/sec
Epoch[1] Loss:2.301 Acc:0.112|0.000 Perf: 13459.5 img/sec
Epoch[2] Loss:2.301 Acc:0.112|0.000 Perf: 14561.0 img/sec
Epoch[3] Loss:2.301 Acc:0.112|0.000 Perf: 13906.9 img/sec
Epoch[4] Loss:2.301 Acc:0.112|0.000 Perf: 13997.2 img/sec
Epoch[5] Loss:2.301 Acc:0.112|0.000 Perf: 14125.1 img/sec
Epoch[6] Loss:2.301 Acc:0.112|0.000 Perf: 13050.7 img/sec
Epoch[7] Loss:2.301 Acc:0.112|0.000 Perf: 12960.7 img/sec
Epoch[8] Loss:2.301 Acc:0.112|0.000 Perf: 13985.7 img/sec
Epoch[9] Loss:2.300 Acc:0.112|0.000 Perf: 13794.4 img/sec


## Save the model

Finally, we save the trained parameters onto disk, so that we can use them later.


<center><img src="support/save.gif" width=600></center>

In [15]:
"Validation accuracy: %.2f"%(valid_acc/len(valid_data))

'Validation accuracy: 0.00'

In [17]:
net.save_parameters('net.params')