# Network Manipulation

In this part, we learn how to inspect and manipulate a network, including
running forward and backward passes.

In [1]:
"""Initialization (see "00 Basic solver usage")."""
import os
import numpy as np

# Silence caffe network loading output. Must be set before importing caffe
os.environ["GLOG_minloglevel"] = '2'
import caffe
CAFFE_ROOT="/caffe"
os.chdir(CAFFE_ROOT) # change the current directory to the caffe root, to help
                     # with the relative paths
USE_GPU = True
if USE_GPU:
    caffe.set_device(0)
    caffe.set_mode_gpu()
else:
    caffe.set_mode_cpu()
# For reproducible results
caffe.set_random_seed(0) # custom modification, remove this line from your code if it doesn't work
np.random.seed(0)

We're going to use the basic MNIST example, with the adapted _LeNet_ network.

In [2]:
net = caffe.Net("examples/mnist/lenet_train_test.prototxt", caffe.TRAIN)

## Inspecting the network

You can inspect the network, to see the layer names (`net._layer_names`), or the
list of layers (`net.layers`) with their types.

In [3]:
print("Network layers:")
for name, layer in zip(net._layer_names, net.layers):
    print("{:<7}: {:17s}({} blobs)".format(name, layer.type, len(layer.blobs)))

Network layers:
mnist  : Data             (0 blobs)
conv1  : Convolution      (2 blobs)
pool1  : Pooling          (0 blobs)
conv2  : Convolution      (2 blobs)
pool2  : Pooling          (0 blobs)
ip1    : InnerProduct     (2 blobs)
relu1  : ReLU             (0 blobs)
ip2    : InnerProduct     (2 blobs)
loss   : SoftmaxWithLoss  (0 blobs)


To access the weights of a layer, simply access the blobs of that layer. Usually, there are either no blobs (inputs, ReLU, Pooling, loss) or 2 blobs, the weights and the bias. You can modify them as you would modify a numpy array.

Another way to access those parameters is through `net.params`, which maps each layer name to its list of parameter blobs.

In [4]:
len(net.params["ip1"])

2

To inspect the data flow, and the gradients over these flows, you can inspect the network blobs. `net.blobs` is an `OrderedDict` (bottom to top) of blob names to blobs. These blobs carry the data between the layers (see http://caffe.berkeleyvision.org/tutorial/net_layer_blob.html).

In [5]:
print("Blobs:")
for name, blob in net.blobs.items():  # .items() works on both Python 2 and 3
    print("{:<5}:  {}".format(name, blob.data.shape))

Blobs:
data :  (64, 1, 28, 28)
label:  (64,)
conv1:  (64, 20, 24, 24)
pool1:  (64, 20, 12, 12)
conv2:  (64, 50, 8, 8)
pool2:  (64, 50, 4, 4)
ip1  :  (64, 500)
ip2  :  (64, 10)
loss :  ()
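
The spatial sizes above follow from the layer hyperparameters. As a rough sketch, assuming the stock LeNet settings (5x5 convolutions with stride 1 and no padding, 2x2 pooling with stride 2; check your own prototxt), they can be reproduced by hand. The helper `out_size` is a hypothetical name for illustration, not part of pycaffe.

```python
def out_size(size, kernel, stride=1, pad=0):
    """Spatial output size of a sliding window (rounding down)."""
    return (size + 2 * pad - kernel) // stride + 1

# Assumed hyperparameters: 28x28 input, 5x5 convs, 2x2 pooling with stride 2.
conv1 = out_size(28, kernel=5)
pool1 = out_size(conv1, kernel=2, stride=2)
conv2 = out_size(pool1, kernel=5)
pool2 = out_size(conv2, kernel=2, stride=2)

# Number of values per sample that reach ip1: 50 channels of pool2 x pool2.
ip1_inputs = 50 * pool2 * pool2
print(conv1, pool1, conv2, pool2, ip1_inputs)
```

Caffe's pooling layer may round differently when the division is not exact; for these sizes the division is exact, so the result does not depend on the rounding.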


The blobs provide access to:
- `num`: the number of samples in the blob (usually `batch_size`)
- `channels`, `height`, `width`: the dimensions of a sample
- `shape`: a tuple with the blob dimensions, `(num, channels, height, width)` for a 4-D blob
- `count`: the total number of elements, `num * channels * height * width` for a 4-D blob
- `data`: the data stored in the blob (see forward propagation)
- `diff`: the computed gradient for the blob (see backward propagation)

Be aware that when getting the `data` or `diff` of a blob, it is a reference, or pointer, to the actual memory, so anything changing it will change all current references. To get a snapshot of the value that will not change, copy it with `blob.data.copy()`.
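
The difference between a reference and a snapshot can be illustrated without Caffe. This is a minimal sketch in which a plain NumPy array stands in for the blob's memory (an assumption for illustration; the names `buffer`, `view` and `snapshot` are hypothetical).

```python
import numpy as np

buffer = np.zeros(4, dtype=np.float32)  # stand-in for the blob's memory
view = buffer                           # like `blob.data`: same memory
snapshot = buffer.copy()                # like `blob.data.copy()`: independent

buffer[...] = 7.0  # e.g. a later forward pass overwriting the blob

print(view)      # follows the change
print(snapshot)  # keeps the old values
```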

In [6]:
print("Blob attributes:")
[e for e in dir(net.blobs["label"]) if not e.startswith("__")]

Blob attributes:


['channels',
 'count',
 'data',
 'diff',
 'height',
 'num',
 'reshape',
 'shape',
 'width']

To examine the links between the layers, you can access the lists of bottom/top names of the layers. `net.bottom_names` and `net.top_names` are pseudo-dictionaries that map a layer name to the list of blob names that are the bottoms/tops of that layer.

In [7]:
net.top_names["mnist"]

['data', 'label']

In [8]:
net.top_names["ip1"]

['ip1']

In [9]:
net.bottom_names["loss"]

['ip2', 'label']

The inputs of the network (layers of type `Input`, not LMDB or HDF5 or other input types), and the outputs of the network (layers whose `top`s are no one's `bottom`) can be accessed with `net.inputs` and `net.outputs`.

In [10]:
net.inputs # No inputs, since our input layer is of type "Data", not "Input"

['data', 'label']

In [11]:
net.outputs # In testing mode, we would also have 'accuracy'

['loss']

## Forward propagation

Now, let's run an example through the network. 

As we haven't covered how to
load the data yet, we're going to generate a random batch of values.

In [12]:
batch = np.random.randn(*net.blobs["data"].shape) * 50 # normal distribution(0, 50), in the shape of the input batch
labels = np.random.randint(0, 10, net.blobs["label"].shape) # random labels

To provide an input for the network, we are going to fill the `data` field of
the input blobs with our batch.

We cannot assign to the `data` field directly, since it is exposed from C++.
However, we can overwrite its contents in place, like so:

In [13]:
net.blobs["data"].data[...] = batch
net.blobs["label"].data[...] = labels
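
The `[...]` assignment copies values into the existing buffer instead of creating a new array. A small sketch of what that implies, using a NumPy array as a stand-in for `net.blobs["data"].data` (an assumption for illustration): the values are cast to the buffer's dtype, and a batch of the wrong shape is rejected.

```python
import numpy as np

blob_data = np.zeros((2, 3), dtype=np.float32)  # stand-in for the blob buffer
batch = np.random.randn(2, 3) * 50              # float64, same shape

blob_data[...] = batch  # in-place copy; values are cast to float32
print(blob_data.dtype)

# A batch with an incompatible shape cannot be copied into the buffer.
try:
    blob_data[...] = np.zeros((5, 7))
    shape_error = False
except ValueError:
    shape_error = True
print(shape_error)
```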

Running the forward pass is then just a matter of calling `net.forward()`.

In [14]:
net.forward()

{'loss': array(2.3706960678100586, dtype=float32)}

It is also possible to do a partial forward pass by specifying the start and end layers of the forward pass:

In [15]:
res = net.forward(start="mnist", end="conv1")

If the network had `Input` layers, we could have called `net.forward` with the data directly, as such:

```python
net.forward(data=batch, label=labels)
```

`data` and `label` are the blob names, and `batch` and `labels` are the values.

The output of the network can be consulted at any layer (or blob) by inspecting
the `data` field.

The scores for each class are given by the output of the last fully
connected layer, `ip2`. To get a single prediction, take the index of the
maximum of this array (its `argmax`).

In [16]:
net.blobs["ip2"].data[0]

array([-0.48029846,  0.61164796, -0.92312622, -0.47545347, -0.46422586,
       -0.03909576,  0.40830323, -0.50112289, -0.07321799,  0.5741204 ], dtype=float32)
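
As a short illustration, the predicted class for the scores printed above is the index of the largest value. The array is copied from that output so the snippet runs without a network; on a live network the same idea applies to the whole batch with `net.blobs["ip2"].data.argmax(axis=1)`.

```python
import numpy as np

# Scores of the first sample, copied from the output above.
scores = np.array([-0.48029846, 0.61164796, -0.92312622, -0.47545347,
                   -0.46422586, -0.03909576, 0.40830323, -0.50112289,
                   -0.07321799, 0.5741204], dtype=np.float32)

predicted_class = int(np.argmax(scores))  # index of the highest score
print(predicted_class)  # 1
```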

Below is the output of the loss layer. This is what is minimized during
training. If all goes well, it should decrease as training progresses.

In [17]:
net.blobs["loss"].data

array(2.3706960678100586, dtype=float32)
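
To get a feel for this number, here is a minimal NumPy sketch of a softmax cross-entropy loss, assuming the loss is averaged over the batch (the function name `softmax_loss` is hypothetical, and this is not Caffe's implementation). With 10 classes and uninformative scores the loss is ln(10), about 2.30, which is close to the value an untrained network gives above.

```python
import numpy as np

def softmax_loss(scores, labels):
    """Mean cross-entropy of softmax(scores) against integer labels."""
    shifted = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

scores = np.zeros((64, 10))            # every class equally likely
labels = np.random.randint(0, 10, 64)  # random labels, as in the batch above
print(softmax_loss(scores, labels), np.log(10))
```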

## Backward propagation

The backward propagation computes the gradient for all the weights of the layers, and all the data blobs of the network, and stores them in the corresponding `diff`. It does not update the weights, but only computes the gradients.

The initial values of the diffs represent the function we are differentiating. The basic setup is to set all the diffs to 0, except for the diff of the loss blob, which is set to 1. This ensures that the gradients are computed with respect to the loss function. These coefficients are stored in `net.blob_loss_weights`:

In [18]:
net.blob_loss_weights

OrderedDict([('data', 0.0),
             ('label', 0.0),
             ('conv1', 0.0),
             ('pool1', 0.0),
             ('conv2', 0.0),
             ('pool2', 0.0),
             ('ip1', 0.0),
             ('ip2', 0.0),
             ('loss', 1.0)])

Then we can just run `net.backward()` to compute the diffs:

In [19]:
net.backward()
net.layers[list(net._layer_names).index("ip2")].blobs[0].diff # Gradient for the parameters of the ip2 layer

array([[ -1.01645060e-01,  -6.20732643e-03,  -6.90511540e-02, ...,
         -4.16178778e-02,  -5.50730200e-03,  -1.02949061e-03],
       [  5.90243051e-03,   6.79359585e-03,  -5.63192554e-03, ...,
          1.76133737e-02,  -1.25561077e-02,  -7.78268659e-05],
       [ -5.64169660e-02,   1.90727238e-04,  -3.76255438e-02, ...,
         -2.53438223e-02,  -1.07933357e-02,   7.93526197e-05],
       ..., 
       [ -5.50880432e-02,  -9.05541796e-03,  -1.71823744e-02, ...,
         -2.88205203e-02,   3.69150680e-03,   1.57054863e-04],
       [  1.49490433e-02,  -6.27137860e-03,   1.67495478e-02, ...,
          3.13645788e-03,  -2.81939493e-03,   1.18995311e-04],
       [  1.03935905e-01,   2.79635563e-03,   3.92960571e-02, ...,
          3.40057276e-02,   1.03832046e-02,   1.61820513e-04]], dtype=float32)
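
To make the role of the seed concrete: with a diff of 1 on `loss`, the gradient that reaches the `ip2` blob is the derivative of the loss with respect to the scores. The sketch below assumes a softmax cross-entropy loss averaged over the batch, for which that derivative is `(softmax - one_hot) / batch_size`, and checks one entry numerically. It is plain NumPy with hypothetical helper names, not Caffe code.

```python
import numpy as np

def softmax(scores):
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def mean_loss(scores, labels):
    probs = softmax(scores)
    return -np.log(probs[np.arange(len(labels)), labels]).mean()

rng = np.random.RandomState(0)
scores = rng.randn(4, 10)       # small stand-in for the ip2 blob
labels = rng.randint(0, 10, 4)

# Analytic gradient of the mean loss with respect to the scores.
grad = softmax(scores)
grad[np.arange(4), labels] -= 1.0
grad /= 4

# Finite-difference check on a single entry.
eps = 1e-6
bumped = scores.copy()
bumped[0, 3] += eps
numeric = (mean_loss(bumped, labels) - mean_loss(scores, labels)) / eps
print(grad[0, 3], numeric)
```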

If we had two loss layers, we could weight one more than the other by setting the diffs. Or, to optimize for a specific output, we could set the diffs manually. For instance, to compute the gradients that optimize the output of the first class (index 0), we could do:

In [20]:
# We need to clear the previously computed diffs of the layer parameters, otherwise the new gradients are added to them
for l in net.layers:
    for b in l.blobs:
        b.diff[...] = 0
        
d = net.blobs["ip2"].diff # Top of the ip2 layer
d[...] = 0 # Clear the diff
d[:, 0] = 1 # Optimize for each element of the batch, for the first class (index 0)
net.backward(start="ip2") # Start the backpropagation at the ip2 layer, working down
net.layers[list(net._layer_names).index("ip2")].blobs[0].diff

array([[ 69.95400238,   3.21474099,  39.70861435, ...,  23.07016373,
          4.77206326,   0.09005913],
       [  0.        ,   0.        ,   0.        , ...,   0.        ,
          0.        ,   0.        ],
       [  0.        ,   0.        ,   0.        , ...,   0.        ,
          0.        ,   0.        ],
       ..., 
       [  0.        ,   0.        ,   0.        , ...,   0.        ,
          0.        ,   0.        ],
       [  0.        ,   0.        ,   0.        , ...,   0.        ,
          0.        ,   0.        ],
       [  0.        ,   0.        ,   0.        , ...,   0.        ,
          0.        ,   0.        ]], dtype=float32)
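
The pattern in this output (one non-zero row, zeros elsewhere) follows from how a fully connected layer turns the diff of its top blob into a weight gradient: `weight_diff += top_diff.T . bottom`. The sketch below reproduces that with random stand-in activations (an assumption; the real values come from `ip1`), and also shows why the diffs had to be cleared first.

```python
import numpy as np

rng = np.random.RandomState(0)
batch_size, n_in, n_out = 64, 500, 10

bottom = rng.rand(batch_size, n_in)       # stand-in for the ip1 activations
top_diff = np.zeros((batch_size, n_out))
top_diff[:, 0] = 1                        # same seed as in the cell above

weight_diff = np.zeros((n_out, n_in))     # cleared parameter diff
weight_diff += top_diff.T.dot(bottom)     # gradients are accumulated

# Only row 0 receives a gradient: the sum of the inputs over the batch.
print(np.allclose(weight_diff[0], bottom.sum(axis=0)))
print(weight_diff[1:].any())

# Running the same step again without clearing doubles the gradient.
weight_diff += top_diff.T.dot(bottom)
print(np.allclose(weight_diff[0], 2 * bottom.sum(axis=0)))
```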

## Updating the weights

To update the weights, we subtract the diffs from the layer weights, scaled by a learning rate.

In [21]:
lr = 0.01
for l in net.layers:
    for b in l.blobs:
        b.data[...] -= lr * b.diff
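
The same update rule can be tried on a toy problem to see that it does reduce the function being differentiated. This is a minimal sketch of plain gradient descent on `f(w) = sum(w**2)`, where the array `w` stands in for a parameter blob's `data` and `diff` for its gradient (both names are illustrative, not pycaffe objects).

```python
import numpy as np

w = np.array([3.0, -2.0])  # stand-in for a parameter blob's data
lr = 0.1

for step in range(100):
    diff = 2 * w           # gradient of f(w) = sum(w**2)
    w[...] -= lr * diff    # same in-place update as above

print(w)  # both entries shrink towards the minimum at 0
```

Each step multiplies `w` by `1 - 2 * lr`, so a learning rate that is too large (here, above 1) would make the values grow instead of shrink.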

Now that we have all the elements, we will move on to writing our own custom solver.