# Neural Networks 101 with `gluon`

Now we're going to implement a simple example of training a neural network with Gluon. We'll walk through all of the steps that you'll typically need, regardless of the application.

(For notebooks that don't rely on spoken instructions, see https://gluon.mxnet.io/ )

The steps are:

1. Define network
1. Initialize parameters
1. Loop over inputs
  1. Forward pass: propagate an input through the network to generate an output
  1. Compute the loss by comparing the output to the label
  1. Run backpropagation to calculate the gradient
  1. Update the parameters by (stochastic) gradient descent
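
The steps above can be sketched without Gluon at all. The following is a minimal NumPy illustration, not Gluon code: the one-parameter model `y = w * x`, the true weight of 3, the learning rate, and every name in it are assumptions chosen for the example, and the gradient is written out by hand instead of being computed by backpropagation.

```python
import numpy as np

# Hypothetical toy problem: learn w in y = w * x, where the true w is 3.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 3.0 * x

# 1. Define the network: a single multiplication by a learnable weight.
def forward(w, x):
    return w * x

# 2. Initialize parameters.
w = 0.0
learning_rate = 0.1

# 3. Loop over inputs, one example at a time.
for epoch in range(5):
    for x_i, y_i in zip(x, y):
        output = forward(w, x_i)            # forward pass
        loss = 0.5 * (output - y_i) ** 2    # compare the output to the label
        grad = (output - y_i) * x_i         # gradient of the loss w.r.t. w
        w -= learning_rate * grad           # gradient descent update

print(w)
```

The rest of this notebook follows the same outline, with Gluon supplying the model, the loss, the gradients, and the update rule.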

In [1]:
from __future__ import print_function
import mxnet as mx
from mxnet import nd, autograd, gluon

We'll also want to set the contexts for our data and our models.

In [2]:
data_ctx = mx.cpu()
# Use the CPU line below instead if no GPU is available
# model_ctx = mx.cpu()
model_ctx = mx.gpu(0)

## `Block`s in `gluon`

`gluon.Block` is the basic building block of models. You can define networks by
composing `Block`s and by inheriting from `Block`.

Any object that inherits from `gluon.Block` and defines a `forward` function is a `Block`.

```
class Net(gluon.Block):
    [...]  # We cover the __init__ function later

    # One or more NDArrays can be passed to `forward`
    def forward(self, x):
        # Computation
        # Do something with your data x to compute y
        return y
```

### Purpose of `Block`

- *Computation* (usually) depends on input data as well as parameters to be learned
- Inconvenient to manually pass parameters and data to function, so `gluon.Block` encapsulates parameters
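
The encapsulation idea can be illustrated with a plain-NumPy analogue. `TinyBlock` and `Affine` are hypothetical names, not part of Gluon, and this sketch makes no claim about how `gluon.Block` is implemented. It only mirrors the pattern: the object owns its parameters, and calling it runs `forward`.

```python
import numpy as np

class TinyBlock:
    """Hypothetical stand-in for a Block: calling the object runs forward()."""

    def __call__(self, x):
        return self.forward(x)

    def forward(self, x):
        raise NotImplementedError


class Affine(TinyBlock):
    """Computes y = x . w + b and stores w and b on the object itself."""

    def __init__(self, in_units):
        self.w = np.zeros(in_units)  # parameters live inside the block
        self.b = 0.0

    def forward(self, x):
        return x @ self.w + self.b


net_sketch = Affine(in_units=2)
net_sketch.w[:] = [2.0, -3.4]
net_sketch.b = 4.2
# The caller passes only data; the parameters travel with the object.
y_sketch = net_sketch(np.array([[1.0, 1.0], [0.0, 0.0]]))
print(y_sketch)
```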

### Using `Block`s

- Gluon defines many `Block`s for you
- For example `Dense` layers
- `Dense` implements `output = activation(dot(input, weight) + bias)`


### A simple `Block`

In [3]:
net = gluon.nn.Dense(1, in_units=2)  # Single output unit, 2 input units, no activation function

As promised, the `Dense` layer encapsulates the parameters for us:

In [4]:
print(net.weight)
print(net.bias)

Parameter dense0_weight (shape=(1, 2), dtype=float32)
Parameter dense0_bias (shape=(1,), dtype=float32)
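
The printed weight shape `(1, 2)` is `(units, in_units)`, so the matrix product inside the layer uses the transposed weight. Here is a NumPy sketch of the computation for a layer without an activation. The parameter values and the input batch are assumptions chosen for illustration.

```python
import numpy as np

# Assumed example values; the weight is laid out as (units, in_units) = (1, 2),
# matching the shape printed for Dense(1, in_units=2).
weight = np.array([[2.0, -3.4]])
bias = np.array([4.2])
x = np.array([[1.0, 2.0],
              [0.5, -1.0]])   # batch of 2 examples, 2 features each

# With no activation, the layer is an affine map. Because of the
# (units, in_units) layout, the product uses the transposed weight.
output = x @ weight.T + bias
print(output.shape)  # (2, 1)
```

Each row of `output` is the prediction for one example, which is why the layer output has shape `(batch_size, 1)`.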


Here we relied on the knowledge that `Dense` exposes the parameters `weight` and `bias`.
But `Block`s can be composed and may contain more parameters.
`collect_params()` allows us to retrieve all Parameters associated with a `Block`.
    

In [5]:
net.collect_params()

dense0_ (
  Parameter dense0_weight (shape=(1, 2), dtype=float32)
  Parameter dense0_bias (shape=(1,), dtype=float32)
)

The returned object is a `gluon.parameter.ParameterDict`. 

In [6]:
type(net.collect_params())

mxnet.gluon.parameter.ParameterDict

Before we can use the `Block`, we need to tell `gluon` what the *initial values* for our parameters should be!

We'll need to pass in two arguments.

* An initializer, many of which live in the `mx.init` module (also available under the name `mx.initializer`).
* A context where the parameters should live. In this case we'll pass in the `model_ctx`. Most often this will be either a GPU or a list of GPUs.

In [7]:
net.collect_params().initialize(mx.initializer.Uniform(0.01), ctx=model_ctx)

Now we can access the actual parameter values:

In [8]:
print(net.weight.data())
print(net.bias.data())


[[ 0.00097627  0.00185689]]
<NDArray 1x2 @gpu(0)>

[ 0.]
<NDArray 1 @gpu(0)>
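
These values are consistent with the initializer: `Uniform(0.01)` samples each weight from the interval [-0.01, 0.01], and `Dense` initializes its bias to zeros by default. Here is a NumPy sketch of the same scheme. The random generator and seed are assumptions, so the sampled numbers will differ from the ones printed above.

```python
import numpy as np

rng = np.random.default_rng(0)
scale = 0.01

# Uniform(scale) draws each weight from the interval [-scale, scale].
weight = rng.uniform(-scale, scale, size=(1, 2))
# The bias starts at zero, as in the printed output above.
bias = np.zeros(1)

print(weight, bias)
```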


### Optimization

In [9]:
square_loss = gluon.loss.L2Loss()
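
`L2Loss` returns half the squared error for each example in the batch. Here is a NumPy sketch of that formula, assuming the default loss weight of 1. The function name `l2_loss` and the input values are made up for illustration.

```python
import numpy as np

def l2_loss(pred, label):
    """Half squared error per example, averaged over non-batch axes."""
    label = label.reshape(pred.shape)          # align the label with the prediction
    per_element = 0.5 * (pred - label) ** 2
    return per_element.reshape(len(pred), -1).mean(axis=1)

pred = np.array([[1.0], [2.0], [4.0]])   # shape (batch, 1), like the Dense output
label = np.array([1.0, 0.0, 1.0])        # shape (batch,)
print(l2_loss(pred, label))              # one loss value per example
```

The factor of one half makes the derivative with respect to the prediction simply `pred - label`.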

In [10]:
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.0001})
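
When the training loop later calls `trainer.step(batch_size)`, the gradient is normalized by `1 / batch_size` before the SGD update is applied. Here is a NumPy sketch of one such step for plain SGD, ignoring momentum and weight decay. The starting weight and the gradient values are hypothetical.

```python
import numpy as np

learning_rate = 0.0001
batch_size = 4

weight = np.array([[0.001, 0.002]])
# Hypothetical gradient, summed over the 4 examples in the batch by backward().
grad_sum = np.array([[-8.0, 12.0]])

# Plain SGD: rescale the summed gradient to a batch mean, then step downhill.
weight = weight - learning_rate * grad_sum / batch_size
print(weight)
```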

### Data

In [11]:
num_inputs = 2
num_outputs = 1
num_examples = 10000

In [12]:
X = nd.random_normal(shape=(num_examples, num_inputs))
noise = 0.01 * nd.random_normal(shape=(num_examples, ))

In [13]:
def real_fn(X):
    return 2 * X[:, 0] - 3.4 * X[:, 1] + 4.2

y = real_fn(X) + noise
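
Because the labels come from a known linear function, the parameters a well-trained model should approach are known in advance: weights 2 and -3.4, and bias 4.2. As a sanity check, here is a NumPy sketch that generates data the same way and recovers those values with ordinary least squares. It uses NumPy's random generator with an assumed seed, so the samples differ from the notebook's.

```python
import numpy as np

rng = np.random.default_rng(0)
num_examples, num_inputs = 10000, 2

X = rng.normal(size=(num_examples, num_inputs))
noise = 0.01 * rng.normal(size=num_examples)
y = 2 * X[:, 0] - 3.4 * X[:, 1] + 4.2 + noise

# Append a column of ones so the bias is estimated together with the weights.
X_with_bias = np.hstack([X, np.ones((num_examples, 1))])
solution, *_ = np.linalg.lstsq(X_with_bias, y, rcond=None)
print(solution)  # approximately [2, -3.4, 4.2]
```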

In [14]:
print(X)


[[ 0.7740038   1.04344046]
 [ 1.18392551  1.89171135]
 [-1.23474145 -1.771029  ]
 ..., 
 [ 0.08873925 -0.45150325]
 [-0.13049959  0.15614532]
 [-0.22753173 -0.19928493]]
<NDArray 10000x2 @cpu(0)>


In [15]:
print(y)


[ 2.21105146  0.13669638  7.76050282 ...,  5.90324974  3.42296672
  4.4221096 ]
<NDArray 10000 @cpu(0)>


### Batching

In [16]:
batch_size = 4
train_data = gluon.data.DataLoader(
    gluon.data.ArrayDataset(X, y), batch_size=batch_size, shuffle=True)
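
Conceptually, a shuffling data loader visits the examples in a random order and hands them out in chunks of `batch_size`. Here is a NumPy sketch of that idea. `iterate_minibatches` is a hypothetical helper and is not how `DataLoader` is implemented.

```python
import numpy as np

def iterate_minibatches(X, y, batch_size, rng):
    """Yield (data, label) batches in a fresh random order."""
    order = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        yield X[idx], y[idx]

rng = np.random.default_rng(0)
X_demo = np.arange(20.0).reshape(10, 2)   # 10 examples, 2 features
y_demo = np.arange(10.0)
batches = list(iterate_minibatches(X_demo, y_demo, batch_size=4, rng=rng))
print([len(data) for data, _ in batches])  # [4, 4, 2]
```

In this sketch the last batch is smaller because 10 is not a multiple of 4. In the notebook, 10000 examples divide evenly into batches of 4.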

### Training loop

In [17]:
epochs = 10
num_batches = num_examples / batch_size
print(num_batches)

2500.0


In [18]:
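def train_loop(epochs):
    for e in range(epochs):
        cumulative_loss = 0
        for i, (data, label) in enumerate(train_data):
            # Move the batch to the same context as the model parameters
            data = data.as_in_context(model_ctx)
            label = label.as_in_context(model_ctx)
            # Record the forward pass so autograd can compute gradients
            with autograd.record():
                output = net(data)
                loss = square_loss(output, label)
            loss.backward()
            # step() normalizes the gradient by 1/batch_size before the SGD update
            trainer.step(batch_size)
            cumulative_loss += nd.mean(loss).asscalar()
        # cumulative_loss sums per-batch means, so dividing by num_examples
        # reports the mean loss scaled by 1/batch_size
        print("Epoch %s, loss: %.4f" % (e, cumulative_loss / num_examples))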

In [19]:
train_loop(epochs)

Epoch 0, loss: 3.2707
Epoch 1, loss: 1.9822
Epoch 2, loss: 1.2014
Epoch 3, loss: 0.7281
Epoch 4, loss: 0.4413
Epoch 5, loss: 0.2675
Epoch 6, loss: 0.1621
Epoch 7, loss: 0.0983
Epoch 8, loss: 0.0596
Epoch 9, loss: 0.0361
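
A note on how to read these numbers: the loop adds one batch mean per batch and then divides by `num_examples` rather than by the number of batches, so the printed value is the average per-example loss divided by `batch_size`. Here is a small arithmetic sketch with made-up losses. The values are assumptions, not taken from the run above.

```python
batch_size = 4
# Hypothetical per-example losses for 8 examples, i.e. 2 batches of 4.
losses = [1.0, 3.0, 2.0, 2.0, 4.0, 0.0, 2.0, 2.0]
num_examples = len(losses)
num_batches = num_examples // batch_size

batch_means = [
    sum(losses[i:i + batch_size]) / batch_size
    for i in range(0, num_examples, batch_size)
]
cumulative_loss = sum(batch_means)

reported = cumulative_loss / num_examples   # what the training loop prints
mean_loss = cumulative_loss / num_batches   # average loss per example
print(reported, mean_loss)                  # 0.5 2.0
```

The downward trend across epochs is unaffected by this constant factor.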


## Getting the learned model parameters

In [20]:
params = net.collect_params()  # this returns a ParameterDict
for param in params.values():
    print(param.name, param.data())

dense0_weight 
[[ 1.84507799 -3.11706018]]
<NDArray 1x2 @gpu(0)>
dense0_bias 
[ 3.85613751]
<NDArray 1 @gpu(0)>
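
Since the data was generated with weights 2 and -3.4 and bias 4.2, the learned values can be compared against them directly. Here is a short sketch of that comparison, using the numbers copied from the output above. The dictionary keys are names chosen for illustration.

```python
true_params = {"w0": 2.0, "w1": -3.4, "bias": 4.2}
# Values copied from the printed output above.
learned_params = {"w0": 1.84507799, "w1": -3.11706018, "bias": 3.85613751}

errors = {
    name: abs(learned_params[name] - true_params[name])
    for name in true_params
}
for name, error in errors.items():
    print("%s: true=%.2f learned=%.4f abs_error=%.4f"
          % (name, true_params[name], learned_params[name], error))
```

After 10 epochs the estimates are close to the generating values but have not fully converged, which matches the loss still decreasing at the last epoch.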
