### Handwriting recognition using MXNet, a distributed deep learning platform, in Julia

This demo shows how to build a simple handwriting recognition system based on a neural network.

We use the MNIST dataset: the training set consists of 60,000 labeled samples, and we test on a separate set of 10,000 images.

**Credits**

[Original source](https://github.com/dmlc/MXNet.jl/tree/master/examples)


In [9]:
using MXNet

### Network definition

We construct a simple 3-layer MLP (multilayer perceptron).



Variables are placeholders for input arrays. We give each variable a unique name.

In [2]:
#-- Option 1: explicit composition
data = mx.Variable(:data)

MXNet.mx.SymbolicNode(MXNet.mx.MX_SymbolHandle(Ptr{Void} @0x000000000a3d0420))

The input is fed to a fully connected layer that computes Y = WX + b.
This is the main computation module in the network.
Each layer also needs a unique name. We'll talk more about naming in the next section.

In [3]:
fc1  = mx.FullyConnected(data = data, name=:fc1, num_hidden=128)


MXNet.mx.SymbolicNode(MXNet.mx.MX_SymbolHandle(Ptr{Void} @0x000000000a3d0480))
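To make Y = WX + b concrete, here is a minimal plain-Julia sketch of what a fully connected layer computes. It does not use MXNet; the function name `fully_connected` and the small sizes are hypothetical, chosen only for illustration, and samples are assumed to be stored one per column.

```julia
# Plain-Julia sketch of a fully connected layer: Y = W*X + b.
# X holds one sample per column; W has one row per hidden unit.
function fully_connected(W::AbstractMatrix, b::AbstractVector, X::AbstractMatrix)
    return W * X .+ b   # b is broadcast across the columns (the batch)
end

# Hypothetical sizes for illustration: 4 inputs, 3 hidden units, batch of 2.
W = ones(Float32, 3, 4)
b = Float32[0.5, 0.0, -0.5]
X = Float32[1 2; 1 2; 1 2; 1 2]

Y = fully_connected(W, b, X)
println(size(Y))   # (3, 2): num_hidden × batch size
```

In the network defined here, `num_hidden=128` plays the role of the 3 rows of `W` in this toy example.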

Activation layers apply a non-linear function to the previous layer's output.
Here we use the Rectified Linear Unit (ReLU), which computes Y = max(X, 0) element-wise.

In [4]:
act1 = mx.Activation(data = fc1, name=:relu1, act_type=:relu)
fc2  = mx.FullyConnected(data = act1, name=:fc2, num_hidden=64)
act2 = mx.Activation(data = fc2, name=:relu2, act_type=:relu)
fc3  = mx.FullyConnected(data = act2, name=:fc3, num_hidden=10)

MXNet.mx.SymbolicNode(MXNet.mx.MX_SymbolHandle(Ptr{Void} @0x000000000a3d0a80))
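As a rough illustration of the stack just defined, the sketch below shows ReLU in plain Julia and counts the trainable parameters of the three fully connected layers. It assumes an input of 784 values per image (the shape the MNIST data provider reports); `relu` and `layer_sizes` are names introduced here, not part of MXNet.

```julia
# ReLU applied element-wise: Y = max(X, 0).
relu(x) = max.(x, zero(eltype(x)))

println(relu(Float32[-2.0, 0.0, 3.5]))   # Float32[0.0, 0.0, 3.5]

# Each fully connected layer has (inputs × outputs) weights plus one bias per output.
# Sizes: 784 inputs (assumed), then num_hidden = 128, 64, 10 as in the cells above.
layer_sizes = [784, 128, 64, 10]
n_params = sum(layer_sizes[i] * layer_sizes[i+1] + layer_sizes[i+1]
               for i in 1:length(layer_sizes)-1)
println(n_params)   # 109386
```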

Finally, we have a loss layer that compares the network's output with the label and generates gradient signals.

In [5]:
mlp = mx.SoftmaxOutput(data = fc3, name=:softmax)

MXNet.mx.SymbolicNode(MXNet.mx.MX_SymbolHandle(Ptr{Void} @0x000000000a629ea0))
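The following plain-Julia sketch shows the standard softmax with cross-entropy loss, which is what a softmax output layer conventionally computes: the forward pass turns scores into probabilities, and the gradient with respect to the scores is the probabilities minus the one-hot labels. The helper names are hypothetical, and MXNet's own implementation may differ in details such as gradient scaling.

```julia
# Numerically stable softmax over each column (one column per sample).
function softmax_columns(Z::AbstractMatrix)
    E = exp.(Z .- maximum(Z, dims=1))   # subtract the column max to avoid overflow
    return E ./ sum(E, dims=1)
end

# Gradient of the cross-entropy loss with respect to the scores Z:
# softmax(Z) minus the one-hot encoding of the labels.
function softmax_grad(Z::AbstractMatrix, labels::AbstractVector{<:Integer})
    G = softmax_columns(Z)
    for (col, label) in enumerate(labels)
        G[label + 1, col] -= 1   # labels are 0-based, Julia rows are 1-based
    end
    return G
end

# Hypothetical scores: 3 classes, 2 samples.
Z = Float32[2 0; 1 0; 0 0]
P = softmax_columns(Z)
G = softmax_grad(Z, [0, 2])
println(P)
```

Each column of `P` sums to 1, and each column of `G` sums to 0, because the probabilities and the one-hot vector both sum to 1.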

### Data Loading 

We fetch and load the MNIST dataset and partition it into two sets: 60,000 examples for training and 10,000 examples for testing. The helper script returns one data provider per set, each serving the images in batches of `batch_size`.

In [6]:
batch_size = 100
include(joinpath(Pkg.dir("MXNet"),"examples","mnist","mnist-data.jl"))
train_provider, eval_provider = get_mnist_providers(batch_size)

(MXNet.mx.MXDataProvider(MXNet.mx.MX_DataIterHandle(Ptr{Void} @0x000000000a8e50b0),Tuple{Symbol,Tuple}[(:data,(784,100))],Tuple{Symbol,Tuple}[(:softmax_label,(100,))],100,true,true),MXNet.mx.MXDataProvider(MXNet.mx.MX_DataIterHandle(Ptr{Void} @0x000000000ac682a0),Tuple{Symbol,Tuple}[(:data,(784,100))],Tuple{Symbol,Tuple}[(:softmax_label,(100,))],100,true,true))

### Training

With the network and data source defined, we can finally start to train our model. We do this with MXNet's convenience wrapper for feed forward neural networks (it can also be made to handle RNNs with explicit unrolling).


In [7]:
model = mx.FeedForward(mlp, context=mx.cpu())

# optimizer
optimizer = mx.SGD(lr=0.1, momentum=0.9, weight_decay=0.00001)

# fit parameters
mx.fit(model, optimizer, train_provider, eval_data=eval_provider, n_epoch=20)


[1m[34mINFO: Start training on MXNet.mx.Context[CPU0]
[0m[1m[34mINFO: Initializing parameters...
[0m[1m[34mINFO: Creating KVStore...
[0m[1m[34mINFO: TempSpace: Total 0 MB allocated on CPU0
[0m[1m[34mINFO: Start training...
[0m[1m[34mINFO: ## Training summary
[0m[1m[34mINFO:           accuracy = 0.7613
[0m[1m[34mINFO:               time = 2.0320 seconds
[0m[1m[34mINFO: ## Validation summary
[0m[1m[34mINFO:           accuracy = 0.9534
[0m[1m[34mINFO: ## Training summary
[0m[1m[34mINFO:           accuracy = 0.9594
[0m[1m[34mINFO:               time = 1.5780 seconds
[0m[1m[34mINFO: ## Validation summary
[0m[1m[34mINFO:           accuracy = 0.9625
[0m[1m[34mINFO: ## Training summary
[0m[1m[34mINFO:           accuracy = 0.9716
[0m[1m[34mINFO:               time = 1.9220 seconds
[0m[1m[34mINFO: ## Validation summary
[0m[1m[34mINFO:           accuracy = 0.9702
[0m[1m[34mINFO: ## Training summary
[0m[1m[34mINFO:           accuracy

In [8]:
probs = mx.predict(model, eval_provider)

[1m[34mINFO: TempSpace: Total 0 MB allocated on CPU0
[0m

10×10000 Array{Float32,2}:
 2.7592f-13   2.2813f-16   8.19169f-10  …  3.95268f-15  2.00731f-19
 1.12923f-13  1.53626f-15  0.999995        2.03693f-24  4.82747f-20
 1.72981f-12  1.0          5.23094f-9      2.44641f-23  5.95852f-18
 2.23802f-12  2.51851f-16  1.42238f-10     1.13823f-16  1.39462f-25
 2.80721f-15  4.88567f-18  4.96745f-6      6.00545f-23  1.00998f-15
 2.18267f-14  3.27711f-21  1.37281f-10  …  1.0          1.72565f-17
 2.89086f-19  7.09912f-17  4.64837f-9      9.26938f-13  1.0        
 1.0          3.85409f-20  2.94261f-7      2.09347f-18  1.76729f-25
 3.30772f-13  4.47905f-20  2.41562f-7      2.73302f-13  1.10216f-15
 6.86849f-11  8.52513f-27  1.32489f-10     8.94669f-16  5.7505f-22 

In [20]:
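# Collect the true labels of every evaluation batch
labels = Array[]
for batch in eval_provider
  # copy turns the batch's label NDArray into a Julia Array
  push!(labels, copy(mx.get(eval_provider, batch, :softmax_label)))
end
# Concatenate the per-batch vectors into one 10000-element vector
labels = cat(1, labels...)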


10000-element Array{Float32,1}:
 7.0
 2.0
 1.0
 0.0
 4.0
 1.0
 4.0
 9.0
 5.0
 9.0
 0.0
 6.0
 9.0
 ⋮  
 5.0
 6.0
 7.0
 8.0
 9.0
 0.0
 1.0
 2.0
 3.0
 4.0
 5.0
 6.0
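The cell above uses the positional form `cat(1, labels...)`, which belongs to older Julia releases. As a sketch for Julia 0.7 and later, the same collect-then-concatenate pattern can be written as follows; the `batches` vector is a hypothetical stand-in for the arrays returned by the data provider.

```julia
# Stand-in for per-batch label arrays (hypothetical values, batch size 3).
batches = [Float32[7, 2, 1], Float32[0, 4, 1], Float32[4, 9, 5]]

labels = Vector{Float32}[]
for batch in batches
    push!(labels, copy(batch))   # keep an independent copy of each batch's labels
end

# On Julia 0.7 and later, cat(1, xs...) is written cat(xs...; dims=1);
# reduce(vcat, xs) gives the same result without splatting.
all_labels = reduce(vcat, labels)
println(length(all_labels))   # 9
```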

### Visualise the network

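Note: this cell creates `visualize.dot` and `visualize.svg` in the working directory, so it needs write permission there, and it calls the Graphviz `dot` command. Run it after `using MXNet`; the `UndefVarError` recorded in the output means `mx` had not been loaded in that session.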

In [1]:
open("visualize.dot", "w") do io
    println(io, mx.to_graphviz(mlp))
end
run(pipeline(`dot -Tsvg visualize.dot`, stdout="visualize.svg"))


LoadError: UndefVarError: mx not defined

<img src=".\visualize.svg" width="100">

### Evaluation

After the model is trained, we can evaluate it on a held-out test set. For each image, we take the class with the highest predicted probability and compare it with the true label:

In [22]:
correct = 0
for i = 1:length(labels)
  # labels are 0...9
  if indmax(probs[:,i]) == labels[i]+1
    correct += 1
  end
end
println(mx.format("Accuracy on eval set: {1:.2f}%", 100correct/length(labels)))


Accuracy on eval set: 97.88%
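The loop above relies on `indmax`, which was renamed `argmax` in Julia 0.7. The sketch below shows the same accuracy computation on a recent Julia version (1.1 or later, for `eachcol`), using small hypothetical `probs` and `labels` arrays in place of the real model output.

```julia
# Hypothetical 3-class probabilities for 4 samples (one column per sample).
probs = Float32[0.1 0.8 0.2 0.3;
                0.7 0.1 0.2 0.3;
                0.2 0.1 0.6 0.4]
labels = Float32[1, 0, 2, 0]   # 0-based class labels

# argmax over each column gives a 1-based row index; subtract 1 to get the class.
predicted = [argmax(col) - 1 for col in eachcol(probs)]
accuracy = 100 * count(predicted .== labels) / length(labels)
println(accuracy)   # 75.0
```

Three of the four toy samples are classified correctly; the last one is predicted as class 2 while its label is 0.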
