Copyright © 2017-2021 ABBYY Production LLC

In [1]:
#@title
# 
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# MNIST IRNN tutorial

This tutorial partially reproduces an experiment from the [Identity RNN article](https://arxiv.org/pdf/1504.00941.pdf).

The experiment demonstrates that IRNN works well on long sequences.

Each MNIST image (28 x 28) is treated as a sequence of 784 pixels.

The article claims that IRNN can achieve 0.9+ accuracy under these conditions.

This tutorial contains the following steps:

* Download and prepare MNIST dataset
* Build the net
* Train the net

## Download and prepare MNIST dataset

First, we need to download the dataset.

In [2]:
from sklearn.datasets import fetch_openml
X, y = fetch_openml('mnist_784', version=1, return_X_y=True, as_frame=False)

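Now we need to normalize the data and convert it to the 32-bit data types that NeoML works with. The formula below inverts the pixel intensities and rescales them to the `[-1, 1]` range.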

In [3]:
import numpy as np

# Normalize
X = (255 - X) * 2 / 255 - 1

# Fix data types
X = X.astype(np.float32)
y = y.astype(np.int32)
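
To see what the normalization formula does, here is a minimal sketch that applies it to a few pixel values. The `pixels` array is made up for illustration and is not part of the dataset.

```python
import numpy as np

# Hypothetical pixel intensities covering the MNIST range [0, 255]
pixels = np.array([0.0, 127.5, 255.0])

# Same formula as in the cell above: invert, then rescale to [-1, 1]
normalized = ((255 - pixels) * 2 / 255 - 1).astype(np.float32)

# Background (0) maps to 1, mid-gray maps to 0, full ink (255) maps to -1
print(normalized)
```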

Finally, we have to split it into train and test parts.

In [4]:
# Split into train/test
train_size = 60000
X_train, X_test = X[:train_size], X[train_size:]
y_train, y_test = y[:train_size], y[train_size:]
del X, y

## Build the net


### Create math engine (choose device)

First, we need to create a math engine. It is the entity responsible for computational operations and data allocation for neural networks. It determines the device used for neural network training and inference.

In this tutorial we'll use a single-threaded CPU math engine.

In [5]:
import neoml

math_engine = neoml.MathEngine.CpuMathEngine(1)

### Build the DNN

First, we need the `neoml.Dnn.Dnn` object, which represents a neural network (a graph of layers). Every net needs a math engine to perform its operations, and the math engine can't be changed after the net is created.

In [6]:
dnn = neoml.Dnn.Dnn(math_engine)

The data is fed to the network via a special `neoml.Dnn.Source` layer.

In [7]:
data = neoml.Dnn.Source(dnn, 'data')  # Source for data

Then we need to transpose the data into sequences of length 784. We can do that with the `neoml.Dnn.Transpose` layer, which swaps 2 dimensions of the blob.

The original data will be wrapped into a 2-dimensional blob where `BatchWidth` is equal to the batch size and `Channels` is equal to the image size. This layer transforms it into sequences (`BatchLength`) whose length is the image size, where each element of the sequence has size `1`.

In [8]:
transpose = neoml.Dnn.Transpose(data, first_dim='batch_length',
                                second_dim='channels', name='transpose')
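
The effect of swapping these two dimensions can be sketched with plain NumPy. This is only an illustration of the shape change, assuming the 7-dimensional blob layout described later in this tutorial; the batch size of 4 is hypothetical and no NeoML code is involved.

```python
import numpy as np

batch_size, image_size = 4, 784  # hypothetical small batch

# Axis order: (BatchLength, BatchWidth, ListSize, Height, Width, Depth, Channels)
batch = np.arange(batch_size * image_size, dtype=np.float32)
batch = batch.reshape(1, batch_size, 1, 1, 1, 1, image_size)

# Swap BatchLength (axis 0) with Channels (axis 6)
sequences = np.swapaxes(batch, 0, 6)

# 784 time steps, 4 sequences per batch, 1 value per step
print(sequences.shape)
```

Pixel `t` of image `b` ends up at time step `t` of sequence `b`, so each image becomes one long sequence of single values.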

Now we can add the `neoml.Dnn.Irnn` layer.

In [9]:
hidden_size = 100
irnn = neoml.Dnn.Irnn(transpose, hidden_size, identity_scale=1.,
                      input_weight_std=1e-3, name='irnn')
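
For intuition, the recurrence described in the IRNN article is `h_t = ReLU(W_x x_t + W_h h_{t-1} + b)`, with `W_h` initialized to a scaled identity matrix and `W_x` drawn from a narrow normal distribution. The NumPy sketch below follows that description. It is not NeoML's implementation, and the way `identity_scale` and `input_weight_std` map onto the initialization is an assumption based on the parameter names. The function `irnn_forward` is hypothetical.

```python
import numpy as np

def irnn_forward(x, w_in, w_rec, bias):
    """Runs a ReLU RNN over x of shape (seq_len, batch, input_size).

    Returns the hidden states, shape (seq_len, batch, hidden_size).
    """
    seq_len, batch, _ = x.shape
    hidden_size = w_rec.shape[0]
    h = np.zeros((batch, hidden_size), dtype=x.dtype)
    outputs = []
    for t in range(seq_len):
        # h_t = ReLU(x_t W_x + h_{t-1} W_h + b)
        h = np.maximum(0.0, x[t] @ w_in + h @ w_rec + bias)
        outputs.append(h)
    return np.stack(outputs)

rng = np.random.default_rng(0)
input_size, hidden_size = 1, 100
identity_scale, input_weight_std = 1.0, 1e-3  # assumed meaning, see above

# Input weights: small random values; recurrent weights: scaled identity
w_in = rng.normal(0.0, input_weight_std,
                  size=(input_size, hidden_size)).astype(np.float32)
w_rec = (identity_scale * np.eye(hidden_size)).astype(np.float32)
bias = np.zeros(hidden_size, dtype=np.float32)

# Two hypothetical sequences of 784 scalar inputs in [-1, 1]
x = rng.uniform(-1, 1, size=(784, 2, input_size)).astype(np.float32)
states = irnn_forward(x, w_in, w_rec, bias)
print(states.shape)
```

With the identity initialization, the hidden state is at first carried from step to step almost unchanged, which is the property the article relies on for long sequences.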

Recurrent layers in NeoML usually return whole sequences. To reproduce the experiment, we must take only the last element of each sequence. For this purpose we'll use the `neoml.Dnn.SubSequence` layer.

In [10]:
subseq = neoml.Dnn.SubSequence(irnn, start_pos=-1,
                               length=1, name='subseq')
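
In NumPy terms, taking the last element of every sequence is a slice along the time axis. The sketch below is only an analogy for `start_pos=-1, length=1`, with hypothetical sizes; it does not call NeoML.

```python
import numpy as np

seq_len, batch, hidden = 5, 2, 3  # hypothetical sizes
outputs = np.arange(seq_len * batch * hidden,
                    dtype=np.float32).reshape(seq_len, batch, hidden)

# Keep a sub-sequence of length 1 that starts at the last time step
last = outputs[-1:]
print(last.shape)
```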

Now we use a fully-connected layer to form logits (a non-normalized distribution) over the MNIST classes.

In [11]:
n_classes = 10
fc = neoml.Dnn.FullyConnected(subseq, n_classes, name='fc')

Here we must add an additional `neoml.Dnn.Source` for the labels and a loss layer.

In [12]:
labels = neoml.Dnn.Source(dnn, 'labels')  # Source for labels
loss = neoml.Dnn.CrossEntropyLoss((fc, labels), name='loss')
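
As a reference point, cross-entropy over logits with integer class labels can be sketched in NumPy as follows. This assumes the loss layer applies softmax to the logits and averages over the batch, which is the common convention; the helper `softmax_cross_entropy` is hypothetical and is not NeoML code.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy of softmax(logits) against integer labels."""
    # Subtract the row maximum for numerical stability
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick the log-probability of the correct class for every sample
    return -log_probs[np.arange(len(labels)), labels].mean()

# Hypothetical batch: uniform logits over 10 classes
logits = np.zeros((2, 10), dtype=np.float32)
labels = np.array([3, 7], dtype=np.int32)

loss_value = softmax_cross_entropy(logits, labels)
print(loss_value)  # uniform predictions give log(10)
```

An untrained classifier that predicts all 10 classes equally has a loss of `log(10)`, which is a useful baseline when reading the first training epochs.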

We'll calculate accuracy by means of NeoML. To do this we'll need the special `neoml.Dnn.Accuracy` layer (and a `neoml.Dnn.Sink` layer for extracting the accuracy's output).

In [13]:
# Auxiliary layers in order to get statistics
accuracy = neoml.Dnn.Accuracy((fc, labels), name='accuracy')
# The accuracy layer writes its result to its output
# We need an additional sink layer to extract it
accuracy_sink = neoml.Dnn.Sink(accuracy, name='accuracy_sink')

### Create solver

A solver is an object responsible for weight optimization (based on gradient values). In this sample we'll use the `neoml.Dnn.AdaptiveGradient` solver (NeoML's implementation of [Adam](https://en.wikipedia.org/wiki/Stochastic_gradient_descent#Adam)).

In [14]:
lr = 1e-6

# Create solver
dnn.solver = neoml.Dnn.AdaptiveGradient(
    math_engine, learning_rate=lr,
    l1=0., l2=0.,  # No regularization
    max_gradient_norm=1.,  # clip grad
    moment_decay_rate=0.9,
    second_moment_decay_rate=0.999)
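
The solver parameters map onto the textbook Adam update roughly as sketched below. This follows the standard formulation with gradient-norm clipping; NeoML's `AdaptiveGradient` may differ in details such as bias correction, the epsilon value, or where clipping is applied. The helpers `clip_by_norm` and `adam_step` and the value of `eps` are hypothetical.

```python
import numpy as np

def clip_by_norm(grad, max_norm):
    """Rescales grad so that its L2 norm does not exceed max_norm."""
    norm = np.linalg.norm(grad)
    return grad if norm <= max_norm else grad * (max_norm / norm)

def adam_step(w, grad, m, v, t, lr=1e-6, beta1=0.9, beta2=0.999, eps=1e-8):
    """One textbook Adam update; t is the 1-based step number."""
    m = beta1 * m + (1 - beta1) * grad       # first moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2  # second moment estimate
    m_hat = m / (1 - beta1 ** t)             # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

w0 = np.array([1.0, -2.0])  # hypothetical weights
m, v = np.zeros_like(w0), np.zeros_like(w0)

# A gradient of norm 5 is clipped to norm 1 before the update
grad = clip_by_norm(np.array([3.0, 4.0]), max_norm=1.0)
w1, m, v = adam_step(w0, grad, m, v, t=1)
print(grad, w1 - w0)
```

On the first step each weight moves by roughly `lr` in the direction opposite to its gradient, regardless of the gradient's magnitude.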

## Train the net

NeoML nets accept data only as `neoml.Blob.Blob`.

Blobs are 7-dimensional arrays located in device memory. Each dimension has a specific purpose:

1. `BatchLength` - temporal axis (used in recurrent layers)
2. `BatchWidth` - classic batch
3. `ListSize` - list axis, used when objects are related to the same entity, but without ordering (unlike `BatchLength`)
4. `Height` - height of the image
5. `Width` - width of the image
6. `Depth` - depth of the 3-dimensional image
7. `Channels` - channels of the image (also used when object is a 1-dimensional vector)

In our case we will use `ndarray` slicing to split the data into batches. Blobs will be created from these batches right before feeding them to the net.

In [15]:
def irnn_data_iterator(X, y, batch_size, math_engine):
    """Slices numpy arrays into batches and wraps them in blobs"""
    def make_blob(data, math_engine):
        """Wraps numpy data into neoml blob"""
        shape = data.shape
        if len(shape) == 2:  # data
            # Wrap 2-D array into blob of (BatchWidth, Channels) shape
            return neoml.Blob.asblob(math_engine, data,
                                     (1, shape[0], 1, 1, 1, 1, shape[1]))
        elif len(shape) == 1:  # dense labels
            # Wrap 1-D array into blob of (BatchWidth,) shape
            return neoml.Blob.asblob(math_engine, data,
                                     (1, shape[0], 1, 1, 1, 1, 1))
        else:
            raise ValueError(f'Unsupported array rank: {len(shape)}')

    start = 0
    data_size = y.shape[0]
    while start < data_size:
        yield (make_blob(X[start : start+batch_size], math_engine),
               make_blob(y[start : start+batch_size], math_engine))
        start += batch_size

To train the net, call `dnn.learn` with the data as its argument.

To run the net without training, call `dnn.run` with the data as its argument.

The data argument is a `dict` whose keys are the names of the `neoml.Dnn.Source` layers and whose values are the corresponding `neoml.Blob.Blob`s.

In [16]:
import time


def run_net(X, y, batch_size, dnn, is_train):
    """Runs dnn on given data"""
    start = time.time()
    total_loss = 0.
    run_iter = dnn.learn if is_train else dnn.run
    math_engine = dnn.math_engine
    layers = dnn.layers
    loss = layers['loss']
    accuracy = layers['accuracy']
    sink = layers['accuracy_sink']

    accuracy.reset = True  # Reset previous statistics
    # Iterate over batches
    for X_batch, y_batch in irnn_data_iterator(X, y, batch_size, math_engine):
        # Run the network on the batch data
        run_iter({'data': X_batch, 'labels': y_batch})
        total_loss += loss.last_loss * y_batch.batch_width  # Update epoch loss
        accuracy.reset = False  # Don't reset statistics within one epoch

    avg_loss = total_loss / y.shape[0]
    avg_acc = sink.get_blob().asarray()[0]
    run_time = time.time() - start
    return avg_loss, avg_acc, run_time
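
The function weights each batch loss by the batch size before averaging. A minimal sketch of why that matters, assuming `last_loss` is the mean loss over a batch (the numbers below are made up):

```python
# Hypothetical per-batch mean losses; the last batch is smaller
batch_losses = [0.5, 0.3, 0.9]
batch_sizes = [40, 40, 20]

# Weighted average: every sample contributes equally
total_loss = sum(loss * size for loss, size in zip(batch_losses, batch_sizes))
avg_loss = total_loss / sum(batch_sizes)

# Plain mean over batches: the small last batch is over-weighted
naive_avg = sum(batch_losses) / len(batch_losses)

print(avg_loss, naive_avg)
```

In this tutorial both dataset sizes are divisible by the batch size, so the two averages coincide, but the weighted form stays correct for any batch size.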

Now we can train the net.

*Note*: Training takes 3-4 hours. You may uncomment the print statements to see the progress.

In [17]:
%%time

import time

batch_size = 40
n_epoch = 200

for epoch in range(n_epoch):
    # Train
    train_loss, train_acc, run_time = run_net(X_train, y_train, batch_size,
                                      dnn, is_train=True)
    # print(f'Train #{epoch}\tLoss: {train_loss:.4f}\t'
    #       f'Accuracy: {train_acc:.4f}\tTime: {run_time:.2f} sec')
    # Test
    test_loss, test_acc, run_time = run_net(X_test, y_test, batch_size,
                                      dnn, is_train=False)
    # print(f'Test  #{epoch}\tLoss: {test_loss:.4f}\t'
    #       f'Accuracy: {test_acc:.4f}\tTime: {run_time:.2f} sec')
print(f'Final test acc: {test_acc:.4f}')

Final test acc: 0.9050
Wall time: 3h 54min 34s


As we can see, this model has achieved over 90% accuracy on these long sequences.