# Deep Learning for Beginners - Programming Exercises

by Aline Sindel, Katharina Breininger and Tobias Würfl

Pattern Recognition Lab, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany 
# Exercise 6



In [None]:
# minor set-up work
import numpy as np # we will definitely need this

# automatic reloading
%load_ext autoreload
%autoreload 2

%matplotlib inline

## LeNet

As the last part of the programming exercises, we use our developed operators to construct a simple neural network inspired by the traditional LeNet architecture:

<figure>
<img src="files/img/lenet.jpg" width="600">
<figcaption><center>Source: LeCun et al, 1998.$^1$</center></figcaption>
</figure>

Use two convolutional layers with $5 \times 5$ kernels and $6$ and $10$ channels, respectively. Each convolution is followed by a ReLU unit and max pooling with a neighborhood and stride of 2 in each dimension. The top of the network is formed by three fully connected (FC) layers with ReLU activations, producing outputs of dimensionality $120$, $84$ and, finally, the number of categories. Use the SoftMaxCrossEntropyLoss as the loss layer.
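Before wiring up the layers, it can help to check how many features reach the first FC layer. The following is a minimal sketch, assuming $28 \times 28$ MNIST inputs and a convolution stride of 1. The helper names (`conv_out`, `pool_out`, `lenet_flat_size`) are hypothetical and not part of the exercise framework, and whether your own `ConvolutionalLayer` pads its input depends on your implementation from the previous exercises, so both cases are shown.

```python
def conv_out(size, kernel, padding=0, stride=1):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, window=2, stride=2):
    """Spatial output size of max pooling along one dimension."""
    return (size - window) // stride + 1


def lenet_flat_size(input_size=28, kernel=5, padding=0, channels=(6, 10)):
    """Number of features entering the first fully connected layer."""
    size = input_size
    for _ in channels:
        size = conv_out(size, kernel, padding)  # convolution (ReLU keeps the shape)
        size = pool_out(size)                   # 2x2 max pooling with stride 2
    return channels[-1] * size * size


# Assumption: no padding, so each 5x5 convolution shrinks the image by 4 pixels.
print(lenet_flat_size(padding=0))
# Assumption: zero padding of 2, so each 5x5 convolution preserves the image size.
print(lenet_flat_size(padding=2))
```

The resulting number is the input dimensionality that the first FC layer has to accept after flattening.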

First, have a look at the class ```NeuralNetwork```, which provides the basic framework for combining the different layers and stacking them into a functioning network. You don't need to adapt this class, but you can use it to implement the LeNet architecture. You may also want to refer back to the description of the framework (Exercise 3).

### Implementation task

Next, implement the LeNet architecture in the ```build``` function and train and test your network using the scripts provided below. Then, choose one of the extra tasks below to tune your network.

**Extra tasks**: Experiment, for example, with the activation function and DropOut, tune the learning rate, or look at the effect of initialization. Feel free to add your own evaluations and plots. You can get the full test data of the MNIST data object by calling ```net.data_layer.get_test_set```.
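For your own evaluations, a classification accuracy is a natural first metric. The sketch below is self-contained and uses toy data. It assumes that predictions are arrays of per-class scores with shape (samples, categories) and that labels are one-hot encoded, as the test script's use of `np.argmax` suggests. The function name `accuracy` is hypothetical, and the exact return format of ```get_test_set``` should be checked in the `Helpers` module before feeding real data into it.

```python
import numpy as np


def accuracy(predictions, labels):
    """Fraction of samples whose highest-scoring class matches the one-hot label.

    Assumes both arrays have shape (n_samples, n_categories).
    """
    predictions = np.asarray(predictions)
    labels = np.asarray(labels)
    predicted_classes = np.argmax(predictions, axis=1)
    true_classes = np.argmax(labels, axis=1)
    return float(np.mean(predicted_classes == true_classes))


# Toy example with three samples and three categories
toy_predictions = np.array([[0.1, 0.7, 0.2],
                            [0.8, 0.1, 0.1],
                            [0.3, 0.3, 0.4]])
toy_labels = np.array([[0, 1, 0],
                       [0, 0, 1],
                       [0, 0, 1]])
print(accuracy(toy_predictions, toy_labels))  # two of the three samples are correct
```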

$^1$ LeCun Y., Bottou L., Bengio Y. and Haffner P. Gradient-based Learning Applied to Document Recognition. In Proc. IEEE, 1998.

In [None]:
# %load src/network.py

# Nothing to do in this cell: Just make yourself familiar with the NeuralNetwork class.
class NeuralNetwork:
    def __init__(self, weights_initializer, bias_initializer):
        # list which will contain the loss after training
        self.loss = []
        self.data_layer = None   # the layer providing data
        self.loss_layer = None   # the layer calculating the loss and the prediction
        self.layers = []
        self.weights_initializer = weights_initializer
        self.bias_initializer = bias_initializer
        self.label_tensor = None # the labels of the current iteration

    def append_fixed_layer(self, layer):
        """ Add a non-trainable layer to the network. """
        self.layers.append(layer)
    
    def append_trainable_layer(self, layer):
        """ Add a new layer with trainable parameters to the network. Initialize the parameters of 
        the layer using the network's initializers for weights and bias.
        """
        layer.initialize(self.weights_initializer, self.bias_initializer)
        self.layers.append(layer)

    def forward(self):
        """ Compute the forward pass through the network. """
        # fetch some training data
        input_tensor, self.label_tensor = self.data_layer.forward()
        # delegate the pass through the regular layers
        activation_tensor = self.__forward_input(input_tensor)
        # calculate the loss of the network using the final loss layer
        return self.loss_layer.forward(activation_tensor, self.label_tensor)

    def __forward_input(self, input_tensor):
        """ Compute the forward pass through the network, stopping before the 
            loss layer.
            param: input_tensor (np.ndarray): input to the network
            returns: activation of the last "regular" layer
        """
        activation_tensor = input_tensor
        # pass the input up the network
        for layer in self.layers:
            activation_tensor = layer.forward(activation_tensor)
        # return the activation of the last layer
        return activation_tensor

    def backward(self):
        """ Perform the backward pass during training. """
        error_tensor = self.loss_layer.backward(self.label_tensor)
        # propagate the error back through the layers in reverse order
        for layer in reversed(self.layers):
            error_tensor = layer.backward(error_tensor)

    def train(self, iterations):
        """ Train the network for a fixed number of steps.
            param: iterations (int): number of iterations for training 
        """
        for layer in self.layers:
            layer.phase = Phase.train  # Make sure phase is set to "train" for all layers
        for i in range(iterations):
            loss = self.forward()  # go up the network
            self.loss.append(loss)  # save the loss
            self.backward()  # and down again
            print('.', end='')


    def test(self, input_tensor):
        """ Apply the (trained) network to input data to generate a prediction. 
            param: input_tensor (np.ndarray): input (image or vector)
            returns (np.ndarray): prediction by the network
        """
        for layer in self.layers:
            layer.phase = Phase.test  # Make sure phase is set to "test" for all layers
        activation_tensor = self.__forward_input(input_tensor)
        return self.loss_layer.predict(activation_tensor)

In [None]:
# %load src/le_net_0.py
#----------------------------------
# Exercise: LeNet
#----------------------------------
# The original python file can be reloaded by typing %load src/le_net_0.py in the first line of this cell.
# After successfully solving this exercise, type the following command in the first line of this cell:
# %%writefile src/le_net.py
# This will save the result to a python file, which you will need for the next exercises.

from src.network import NeuralNetwork
# Load your implementations of the previous exercises here
from src.layers.initializers import He, Const, UniformRandom
from src.layers.conv import FlattenLayer, ConvolutionalLayer
from src.layers.fully_connected import FullyConnectedLayer
from src.layers.pooling import MaxPoolLayer
from src.layers.activation_functions import ReLU, Sigmoid
from src.layers.softmax_crossentropy import SoftMaxCrossEntropyLoss

def build():
    """ returns: a neural network architecture built according to the provided specification
    """ 
    
    net = NeuralNetwork(He(), Const(0.1))
    learning_rate = 0.001
    categories = 10  # MNIST, numbers 0-9
    
    # TODO: Implement the architecture of LeNet by adding layers to net. 
    # Have a look at the NeuralNetwork class to see how layers can be added.
    # For the constructor arguments of the layers, refer to your implementations of the previous exercises.
    
    return net

In [None]:
# %load src/train_mnist.py
import matplotlib
import numpy as np
import matplotlib.pyplot as plt

net = build()

from Tests import Helpers
# TODO: You can change the parameters. Depending on the batch size and the number of iterations you choose,
# the training script can run for a few minutes or longer.

# parameters
batch_size = 20  # set the batch size here
net.data_layer = Helpers.MNISTData(batch_size)
n_iters = 100  # set the number of iterations here

# training
net.train(n_iters)

# plotting
# net.loss grows with every call to net.train, so plot against its actual length
plt.plot(range(len(net.loss)), net.loss)
plt.xlabel('iterations')
plt.ylabel('softmax loss')
plt.show()


In [None]:
# %load src/test_mnist.py
# Perform the prediction for a random test sample from the dataset:
x, l = net.data_layer.get_random_test_sample()

# plotting
fig = plt.figure(figsize=(5, 5))
plt.imshow(x[:28*28].reshape(28, 28), cmap='gray')
plt.title('Input image')
plt.axis('off')
plt.show()

# network prediction
print('Prediction with highest output: {}'.format(np.argmax(net.test(x))))
print('Ground truth: {}'.format(np.argmax(l)))


## Summary and Outlook
In the programming exercises, we implemented some of the most common building blocks of neural networks, including fully connected layers, activation functions, convolutional layers and regularization operators. Finally, we combined these operators into a working network.

We covered only a small subset of elements that are relevant for neural networks. We encourage you to play with other operators, for example batch normalization$^2$, alternative activation functions, initialization strategies or recurrent units. You may also refactor the framework to experiment with different optimizers, like SGD with momentum, Adam or AdaGrad, or extend the framework to allow for weight decay.
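As a starting point for such a refactoring, here is a minimal sketch of SGD with momentum written as a standalone update rule. The class name `SgdWithMomentum` and its `update` method are hypothetical and not part of the exercise framework; how an optimizer object would be attached to the individual layers is left open.

```python
import numpy as np


class SgdWithMomentum:
    """Hypothetical optimizer sketch, not part of the exercise framework."""

    def __init__(self, learning_rate, momentum=0.9):
        self.learning_rate = learning_rate
        self.momentum = momentum
        self.velocity = None  # running, exponentially decaying sum of past steps

    def update(self, weights, gradient):
        """Return the updated weights for one optimization step."""
        if self.velocity is None:
            self.velocity = np.zeros_like(weights)
        # keep a fraction of the previous step and add the new gradient step
        self.velocity = self.momentum * self.velocity - self.learning_rate * gradient
        return weights + self.velocity


# Toy usage: minimize f(w) = 0.5 * ||w||^2, whose gradient is simply w
optimizer = SgdWithMomentum(learning_rate=0.1, momentum=0.9)
w = np.array([1.0, -2.0])
for _ in range(200):
    w = optimizer.update(w, gradient=w)
print(np.linalg.norm(w))  # close to zero after 200 steps
```

Keeping the optimizer state (here the velocity) in its own object means each trainable layer can hold a separate instance per parameter tensor, which is one possible design among several.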

We hope you enjoyed the programming exercises and gained a deeper understanding of neural network operators and frameworks. Have fun on your journey further into deep learning and neural networks!

$^2$ Ioffe S., Szegedy C. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. In Proc. ICML, 2015.