In [None]:
from utils import *
%matplotlib inline

# Create a neural network

<center><img src="support/neuralnetwork.gif" width=500></center>

Now let's look at how to create neural networks in Gluon. In addition to the NDArray package (`nd`) that we just covered, we will now also import the neural network package `nn` from `gluon`.

In [None]:
import mxnet as mx

from mxnet import nd
from mxnet.gluon import nn

## Create your first neural network layer

Let's start with a dense layer with 2 output units.
When the layer is printed, its input size shows as `None` because no data has been passed in yet.

<center><img src="support/fullyconnected.png" width=400></center>

In [None]:
layer = nn.Dense(2, activation="relu")
layer

Then initialize its weights. Called without arguments, `initialize()` uses the default method, which draws random values uniformly from $[-0.07, 0.07]$; here we pass `mx.init.Xavier()` to use Xavier initialization instead.

Initialization

In [None]:
layer.params

In [None]:
layer.initialize(mx.init.Xavier())

Then we do a forward pass with random data. We create a random input `x` of shape $(3,4)$ and feed it into the layer to compute the output.

Forward with input `x`

In [None]:
N = 3
x = nd.random.uniform(low=-1, high=1, shape=(N, 4))
x

In [None]:
output = layer(x)
output

Inferred shape

In [None]:
layer.params

As can be seen, the layer's 2 output units produced a $(3,2)$ shape output from our $(3,4)$ input. Note that we didn't specify the input size of `layer` (though we could have, with the argument `in_units=4`). The system infers it automatically the first time we feed in data, and then creates and initializes the weights. So we can access the weight after the first forward pass:

Weights

In [None]:
layer.weight.data()
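
To make the shapes concrete, here is a minimal, framework-free sketch of what a dense layer with a ReLU activation computes. It is not Gluon's implementation: the helper `dense_relu`, the nested-list representation, and the random values are assumptions made for illustration. The weight is laid out as `(units, in_units)`, which is the $(2,4)$ shape the layer reports after the first forward pass.

```python
import random


def dense_relu(x, weight, bias):
    """Compute relu(x @ weight.T + bias) on nested lists (hypothetical helper)."""
    out = []
    for row in x:
        out_row = []
        for w_row, b in zip(weight, bias):
            z = sum(xi * wi for xi, wi in zip(row, w_row)) + b
            out_row.append(max(z, 0.0))  # ReLU clips negatives to zero
        out.append(out_row)
    return out


random.seed(0)
batch, in_units, units = 3, 4, 2

# Input of shape (3, 4), drawn uniformly from [-1, 1] like `x` above.
x = [[random.uniform(-1, 1) for _ in range(in_units)] for _ in range(batch)]
# Weight of shape (units, in_units) = (2, 4); the values here are arbitrary.
weight = [[random.uniform(-1, 1) for _ in range(in_units)] for _ in range(units)]
bias = [0.0] * units

y = dense_relu(x, weight, bias)
print(len(y), len(y[0]))  # 3 2
```

Each of the 3 input rows is mapped to 2 values, one per output unit, which is why the input size is only needed once real data arrives.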

## Chain layers into a neural network

Let's first consider the simple case where a neural network is a chain of layers. During the forward pass, we run the layers sequentially, one by one. The following code implements a famous network called [LeNet](http://yann.lecun.com/exdb/lenet/) using `nn.Sequential`.

In [None]:
net = nn.Sequential()
with net.name_scope():
    # Add a sequence of layers.
    net.add(
        nn.Conv2D(channels=6, kernel_size=(5, 5), activation='relu'),

        nn.MaxPool2D(pool_size=2, strides=2),
        nn.Conv2D(channels=16, kernel_size=3, activation='relu'),
        nn.MaxPool2D(pool_size=2, strides=2),
        
        nn.Flatten(),
        nn.Dense(120, activation="relu"),
        nn.Dense(84, activation="relu"),
        nn.Dense(10)
    )
net

In [None]:
mx.viz.plot_network(net(mx.sym.var('data')), 
                    shape={"data":(1, 1, 28, 28)},
                    node_attrs={"shape":"oval","fixedsize":"False"},
                   )

<!--Mention the tuple option for kernel and stride as an exercise for the reader? Or leave it out as too much info for now?-->

The usage of `nn.Sequential` is similar to `nn.Dense`. In fact, both of them are subclasses of `nn.Block`. The following code shows how to initialize the weights and run the forward pass.

Run network

In [None]:
net.initialize()
# Input shape is (batch_size, color_channels, height, width)
x = nd.random.uniform(shape=(4, 1, 28, 28))
y = net(x)
y.shape
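
The output shape can be checked by hand. The sketch below traces the spatial size of a $28 \times 28$ input through the convolution and pooling layers defined above, assuming no padding and a convolution stride of 1 (the defaults when those arguments are omitted). The helper `out_size` is hypothetical and not part of Gluon.

```python
def out_size(size, kernel, stride=1):
    """Output length along one spatial axis, assuming no padding."""
    return (size - kernel) // stride + 1


size, channels = 28, 1
shapes = []
# (layer description, kernel size, stride, output channels or None for pooling)
layers = [
    ("conv 5x5", 5, 1, 6),
    ("pool 2x2", 2, 2, None),
    ("conv 3x3", 3, 1, 16),
    ("pool 2x2", 2, 2, None),
]
for name, kernel, stride, out_channels in layers:
    size = out_size(size, kernel, stride)
    if out_channels is not None:
        channels = out_channels  # pooling keeps the channel count
    shapes.append((name, (channels, size, size)))

flat = channels * size * size  # number of features after Flatten

for name, shape in shapes:
    print(name, shape)
print("flatten", flat)
```

The spatial size goes 28, 24, 12, 10, 5, so `Flatten` hands 16 × 5 × 5 = 400 features per example to the first dense layer. The final `Dense(10)` then produces 10 values for each of the 4 examples in the batch.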

We can use `[]` to index a particular layer. For example, the following
prints the shape of the first layer's weight.

Specific layer

In [None]:
net

In [None]:
"First Conv2D layer weight shape {}".format(net[0].weight.data().shape)

## Create a neural network flexibly

In `nn.Sequential`, MXNet will automatically construct the forward function that sequentially executes added layers.
Now let's introduce another way to construct a network with a flexible forward function.

To do it, we create a subclass of `nn.Block` and implement two methods:

- `__init__` creates the layers.
- `forward` defines the forward function.

In [None]:
class MixMLP(nn.Block):
    def __init__(self, **kwargs):
        # Run `nn.Block`'s init method
        super(MixMLP, self).__init__(**kwargs)
        with self.name_scope():
            self.features = nn.Sequential()
            # Already within a name scope, no need to create
            # another scope.
            self.features.add(
                nn.Dense(3, activation='relu'),
                nn.Dense(4)
            )
            self.output = nn.Dense(5)
    def forward(self, x):
        y = nd.relu(self.features(x))
        print("Features", y)
        return self.output(y)

net2 = MixMLP()
net2

In the sequential chaining approach, we can only add instances with `nn.Block` as the base class and then run them in a forward pass. In this example, we used `print` to get the intermediate results and `nd.relu` to apply relu activation. So this approach provides a more flexible way to define the forward function.

The usage of `net2` is the same as before.

Print statements

In [None]:
net2.initialize()

In [None]:
x = nd.random.uniform(shape=(2,2))
out = net2(x)

Finally, let's access a particular layer's weight:

Weight access

In [None]:
net2.features[1].weight.data()

# Fast, portable neural networks with `hybrid`
<br>
<center><img src="support/fast.gif" width=300></center>

First, let's understand imperative and symbolic programming.

## Imperative Pseudofunction
```
def our_function(A, B, C, D):
    # Compute some intermediate values
    E = basic_function1(A, B)
    F = basic_function2(C, D)
    
    # Produce the thing you really care about
    G = basic_function3(E, F)
    return G
    
# Load up some data
W = some_stuff()
X = some_stuff()
Y = some_stuff()
Z = some_stuff()
    
result = our_function(W, X, Y, Z)
```

## Symbolic Pseudofunction

```
# Placeholders to stand in for real data
A = placeholder() 
B = placeholder()
C = placeholder()
D = placeholder()

# Compute some intermediate values
E = symbolic_function1(A, B)
F = symbolic_function2(C, D)
    
# Produce the thing you really care about
G = symbolic_function3(E, F)
    
our_function = library.compile(inputs=[A, B, C, D], outputs=[G])   
    
# Load up some data
W = some_stuff()
X = some_stuff()
Y = some_stuff()
Z = some_stuff()
    
result = our_function(W, X, Y, Z)
```
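
The two pseudofunctions above can be made concrete with a small, framework-free sketch. Everything in it (`Node`, `placeholder`, `compile_graph`, and the choice of arithmetic operations) is a hypothetical stand-in for illustration, not an MXNet API. The imperative version computes on real values line by line, while the symbolic version only records a graph and evaluates it after "compiling".

```python
# Imperative: each line runs immediately on concrete values.
def imperative(a, b, c, d):
    e = a + b
    f = c * d
    return e - f


# Symbolic: operators on Node objects only record what should be computed.
class Node:
    def __init__(self, op, inputs=(), name=None):
        self.op = op
        self.inputs = inputs
        self.name = name

    def __add__(self, other):
        return Node("add", (self, other))

    def __mul__(self, other):
        return Node("mul", (self, other))

    def __sub__(self, other):
        return Node("sub", (self, other))


def placeholder(name):
    return Node("placeholder", name=name)


OPS = {
    "add": lambda x, y: x + y,
    "mul": lambda x, y: x * y,
    "sub": lambda x, y: x - y,
}


def compile_graph(inputs, output):
    """Return a function that evaluates `output` given values for `inputs`."""
    def run(*values):
        env = {node.name: value for node, value in zip(inputs, values)}

        def evaluate(node):
            if node.op == "placeholder":
                return env[node.name]
            args = [evaluate(child) for child in node.inputs]
            return OPS[node.op](*args)

        return evaluate(output)
    return run


A, B, C, D = (placeholder(n) for n in "ABCD")
G = (A + B) - (C * D)  # builds a graph; nothing is computed yet
symbolic = compile_graph([A, B, C, D], G)

imperative_result = imperative(1.0, 2.0, 3.0, 4.0)
symbolic_result = symbolic(1.0, 2.0, 3.0, 4.0)
print(imperative_result, symbolic_result)  # -9.0 -9.0
```

Because the symbolic version holds the whole graph before any data arrives, a real library has the opportunity to optimize it as a unit. The price is that `G` is not a number you can print or step through in a debugger.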

## Tradeoffs 

### Imperative Programs Tend to be More Flexible
* A familiar style and faster debugging mean you get to try out more ideas.
* The catch is that imperative programs are *comparatively* slow.

### Symbolic Programs Tend to be More Efficient
* Memory efficiency comes from reusing storage for intermediate results; speed optimizations come from operator folding.
* The catch is the tricky indirection of working with placeholders.

## Getting the best of both worlds with MXNet Gluon's `HybridBlock`s


**All of MXNet's predefined layers are HybridBlocks.** This means that any network consisting entirely of predefined MXNet layers can be compiled and run at much faster speeds by calling ``.hybridize()``.

## HybridSequential

In [None]:
def get_net():
    # Construct an MLP
    net = nn.HybridSequential()
    with net.name_scope():
        net.add(nn.Dense(256, activation="relu"))
        net.add(nn.Dense(128, activation="relu"))
        net.add(nn.Dense(2))
    # initialize the parameters
    net.collect_params().initialize()
    return net

# forward
x = nd.random_normal(shape=(1, 512))
net = get_net()
print('=== net(x) ==={}'.format(net(x)))

In [None]:
net.hybridize()
print('=== net(x) ==={}'.format(net(x)))

### Performance
Compare the performance before and after hybridizing 
by measuring the time it takes to make 1000 forward passes through the network.

In [None]:
from time import time
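def bench(net, x):
    # MXNet runs operations asynchronously, so wait for pending work
    # to finish before starting the clock.
    mx.nd.waitall()
    start = time()
    for i in range(1000):
        y = net(x)
    # Wait again so the measured time covers all 1000 forward passes.
    mx.nd.waitall()
    return time() - start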

In [None]:
net = get_net()
print('Before hybridizing: %.4f sec'%(bench(net, x)))
net.hybridize()
print('After hybridizing: %.4f sec'%(bench(net, x)))

## Let's dive deeper into how `hybridize` works
* Recall that Gluon networks are composed of Blocks, each of which subclasses `gluon.Block`.
* For hybrid networks, we have `gluon.HybridBlock`.
* To define a `HybridBlock`, we have to define a `hybrid_forward` function:

## HybridBlock

In [None]:
from mxnet import gluon

class Net(gluon.HybridBlock):
    def __init__(self, **kwargs):
        super(Net, self).__init__(**kwargs)
        with self.name_scope():
            self.fc1 = nn.Dense(256)
            self.fc2 = nn.Dense(128)
            self.fc3 = nn.Dense(2)

    def hybrid_forward(self, F, x):
        # F is a function space that depends on the type of x
        # If x's type is NDArray, then F will be mxnet.nd
        # If x's type is Symbol, then F will be mxnet.sym
        print('type(x): {}, F: {}'.format(
                type(x).__name__, F.__name__))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)

In [None]:
net = Net()
net.collect_params().initialize()
x = nd.random_normal(shape=(1, 512))
print('=== 1st forward ===')
y = net(x)
print('=== 2nd forward ===')
y = net(x)

In [None]:
net.hybridize()
print('=== 1st forward ===')
y = net(x)
print('=== 2nd forward ===')
y = net(x)
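
In the cells above, the `print` inside `hybrid_forward` should fire on every call before `hybridize()`, but only on the first call afterwards: that call runs the function once with symbolic inputs to build a graph, and later calls reuse the cached graph without re-entering the Python code. The toy sketch below imitates that behavior. It is not how MXNet is implemented; `ToyHybrid`, `eager`, `trace`, and `evaluate` are hypothetical names invented for this illustration.

```python
from types import SimpleNamespace

# Two toy "function spaces" with the same interface. They are stand-ins
# for the roles of mxnet.nd and mxnet.sym, not MXNet objects.
eager = SimpleNamespace(name="eager", relu=lambda v: max(v, 0.0))
trace = SimpleNamespace(name="trace", relu=lambda expr: ("relu", expr))

calls = []  # records which function space each hybrid_forward call saw


def hybrid_forward(F, x):
    calls.append(F.name)
    return F.relu(x)


def evaluate(expr, value):
    """Evaluate a traced expression on a concrete value."""
    if expr == "x":
        return value
    op, arg = expr
    assert op == "relu"
    return max(evaluate(arg, value), 0.0)


class ToyHybrid:
    def __init__(self):
        self.hybridized = False
        self.graph = None

    def hybridize(self):
        self.hybridized = True

    def __call__(self, value):
        if not self.hybridized:
            # Imperative path: run the Python code on real data every time.
            return hybrid_forward(eager, value)
        if self.graph is None:
            # First hybridized call: trace once with a placeholder.
            self.graph = hybrid_forward(trace, "x")
        # Later calls reuse the cached graph.
        return evaluate(self.graph, value)


net = ToyHybrid()
net(-1.0)
net(2.0)
net.hybridize()
outputs = [net(-1.0), net(2.0)]

print(calls)    # ['eager', 'eager', 'trace']
print(outputs)  # [0.0, 2.0]
```

The same `hybrid_forward` body serves both paths because it only touches its inputs through `F`, which is the reason `hybrid_forward` takes `F` as an argument.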