In [1]:
# reloading modified files
%load_ext autoreload
%autoreload 2

import numpy as np
from modules import Network, LinearLayer, Sigmoid, ReLU, MSE, CrossEntropyLoss

# Forward Pass

In this exercise you will implement some of the basic building blocks of artificial neural networks. You will create a network consisting of several layers, each of which implements the `forward` function inherited from the base class `Module`. A skeleton for your implementation is provided in `modules.py`. Work through this notebook to validate your code.

## Network

As mentioned in the lecture, the notion of a *layer* is not well-defined and we may even regard a whole network as a layer predicting a desired output from input data. Therefore the class ```Network``` will be a subclass of ```Module```. Each subclass of ```Module``` has to provide a ```forward``` function mapping input to output. An instance of ```Network``` will itself store several layers of class ```Module```. Calling ```forward``` is supposed to sequentially execute ```forward``` on the layers stored in the network. Follow the comments below and complete the implementation in ```modules.py```.
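
The sequential execution described above can be sketched without the class machinery. This is a minimal illustration, not the required `modules.py` implementation: `run_sequentially`, `double` and `shift` are hypothetical names, and plain Python functions stand in for `Module` objects.

```python
import numpy as np

def run_sequentially(layers, x):
    """Feed x through the layers in order; each output becomes the next input."""
    for layer in layers:
        x = layer(x)
    return x

# Two toy "layers" written as plain functions (hypothetical stand-ins for Module objects).
def double(v):
    return 2 * v

def shift(v):
    return v + 1

result = run_sequentially([double, shift], np.array([1.0, 2.0]))
print(result)  # [3. 5.]
```

The order of the list matters: swapping `double` and `shift` would compute `2 * (v + 1)` instead of `2 * v + 1`.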

## Layers

This section introduces a selection of common layers used in neural networks.

### Linear Layer [1 point]

A linear layer performs an affine-linear transformation mapping input $x$ to $Wx + b$. Implement the class `LinearLayer` and test your implementation by running the cell below:

In [2]:
W = np.ones((3, 4))
b = np.arange(1, 4)

linear = LinearLayer(W, b)
x = np.ones(4)

# compare the largest absolute deviation, so that negative errors cannot hide
assert np.max(np.abs(linear.forward(x) - np.array([5, 6, 7]))) < 1e-6
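
For reference, the expected values `[5, 6, 7]` can be reproduced by hand with plain NumPy. The sketch below uses a hypothetical helper `affine` and assumes that `W` has shape `(outputs, inputs)` and that `x` is a one-dimensional vector, as in the cell above.

```python
import numpy as np

def affine(W, b, x):
    """Return W @ x + b for a weight matrix W, bias vector b and input vector x."""
    return W @ x + b

W = np.ones((3, 4))      # 3 outputs, 4 inputs
b = np.arange(1, 4)      # bias [1, 2, 3]
x = np.ones(4)

y = affine(W, b, x)
print(y)  # [5. 6. 7.]
```

Each row of `W` sums the four ones in `x` to 4, and adding the bias entries 1, 2 and 3 gives 5, 6 and 7.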

### Sigmoid [1 point]

The sigmoid activation maps input $x$ to ${e^x} / (1 + e^x)$. Complete the forward pass of the class `Sigmoid`:

In [3]:
out = Sigmoid().forward(np.array([0, -1, 10]))
# sigmoid(10) is approximately 0.9999546, not exactly 1
out_expected = np.array([0.5, 0.2689414, 0.9999546])
assert np.max(np.abs(out - out_expected)) < 1e-6
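
A practical caveat: evaluating ${e^x} / (1 + e^x)$ literally overflows in floating point for large positive $x$, because `np.exp(x)` becomes `inf` and `inf / inf` is `nan`. One common workaround, shown here as a sketch with the hypothetical name `stable_sigmoid`, only ever exponentiates non-positive numbers.

```python
import numpy as np

def stable_sigmoid(x):
    """Sigmoid that avoids overflow by only exponentiating -|x|."""
    x = np.asarray(x, dtype=float)
    e = np.exp(-np.abs(x))  # lies in [0, 1], so it cannot overflow
    # For x >= 0 use 1 / (1 + e^{-x}); for x < 0 use e^{x} / (1 + e^{x}).
    return np.where(x >= 0, 1.0 / (1.0 + e), e / (1.0 + e))

values = stable_sigmoid(np.array([0.0, -1.0, 10.0, 1000.0, -1000.0]))
```

Both branches are algebraically identical to the formula above; they differ only in which form is numerically safe for the sign of $x$.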

### ReLU [1 point]

**Re**ctified **L**inear **U**nits are among the most common activations in use. In their forward pass, non-negative input values are left unchanged and negative values are set to $0$, i.e. input $x$ is mapped to $\max(x, 0)$, where the maximum is taken element-wise.

Test your implementation:

In [4]:
out = ReLU().forward(np.array([-3.14, 0, 1, 10]))
out_expected = np.array([0., 0., 1., 10])
assert np.max(np.abs(out - out_expected)) < 1e-6
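
One NumPy detail is easy to get wrong when taking the element-wise maximum: `np.maximum` compares two arrays entry by entry, whereas `np.max` reduces an array to its largest entry and treats a second positional argument as an axis. The snippet below only illustrates this difference and is not meant as the `ReLU` class itself.

```python
import numpy as np

x = np.array([-3.14, 0.0, 1.0, 10.0])

elementwise = np.maximum(x, 0)  # compares every entry with 0
largest = np.max(x, 0)          # here 0 is the axis: returns the largest entry

print(elementwise)
print(largest)  # 10.0
```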

## Loss

A network is supposed to predict a desired output from input data. The quality of the prediction is assessed by *loss functions* comparing the predicted output with the target or ground truth. In our implementation a loss is a subclass of ```Module```. It therefore also features a ```forward``` function. All loss functions take the output of a network and a corresponding target as input for their forward pass.

### MSE [1 point]

The MSE loss has already been discussed in the context of linear regression. Implement the `forward` function calculating the mean squared difference of prediction and target.

Test your implementation:

In [5]:
out = MSE().forward(np.array([0., 1., 2., 1.5]), np.array([0., 1., 1., -1.]))
out_expected = 7.25/4
# the loss is a scalar, so no maximum is needed
assert np.abs(out - out_expected) < 1e-6
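
The expected value `7.25/4` follows directly from the definition: the element-wise differences are `[0, 0, 1, 2.5]`, their squares sum to `7.25`, and there are four entries. A minimal standalone sketch, with the hypothetical function name `mean_squared_error`:

```python
import numpy as np

def mean_squared_error(prediction, target):
    """Average of the squared element-wise differences."""
    prediction = np.asarray(prediction, dtype=float)
    target = np.asarray(target, dtype=float)
    return np.mean((prediction - target) ** 2)

loss = mean_squared_error([0., 1., 2., 1.5], [0., 1., 1., -1.])
print(loss)  # 1.8125
```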

### Cross Entropy [1 point]

Many problems in machine learning are formulated as *classification* tasks. Given some input $x$ we want to predict a discrete class label $l\in\{1, \ldots, L\}$. In order to train neural networks, the forward pass has to be differentiable. As the prediction of discrete values is not differentiable, we instead predict a vector in $\mathbb{R}^L$ containing one score per label, where a higher score means the label is considered more likely to be correct. We can transform this vector into a valid probability distribution using the softmax function (https://en.wikipedia.org/wiki/Softmax_function)
$$
\sigma \, \colon \, \mathbb{R}^L \to \left\{ p \in \mathbb{R}^L \, \middle| \, p_i > 0, \sum_{i=1}^L p_i = 1 \right\}, \quad \sigma_j ( z ) = \frac{e^{z_j}}{\sum_{i=1}^L e^{z_i}} \quad \text{for } j \in \left\{ 1, \ldots, L \right\}.
$$

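This enables us to define a proper loss function by taking the negative logarithm of the predicted probability of the target label $l$, i.e. $\ell (x, l) = -\log (\sigma_l (x))$. Implement the cross entropy loss, where $x$ now denotes the prediction of our network (the vector of scores) and $l$ is the target label. Note that the test below passes the label as a zero-based index, so label `0` refers to the first entry of the prediction.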

Test your implementation:

In [6]:
out = CrossEntropyLoss().forward(np.array([-3.14, 0, 1, 10]), 0)
out_expected = 13.1401
# the loss is a scalar, so no maximum is needed
assert np.abs(out - out_expected) < 1e-4
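
Computing $-\log(\sigma_l(x))$ by first evaluating the softmax and then taking the logarithm can overflow for large scores. A common alternative, sketched below with the hypothetical names `log_softmax` and `cross_entropy`, subtracts the largest score before exponentiating; this leaves the softmax unchanged because the shift cancels in numerator and denominator. Labels are assumed to be zero-based indices, matching the test above.

```python
import numpy as np

def log_softmax(z):
    """Logarithm of the softmax, computed with the max-shift trick."""
    z = np.asarray(z, dtype=float)
    shifted = z - np.max(z)  # largest entry becomes 0, so exp cannot overflow
    return shifted - np.log(np.sum(np.exp(shifted)))

def cross_entropy(z, label):
    """Negative log-probability of the (zero-based) target label."""
    return -log_softmax(z)[label]

z = np.array([-3.14, 0.0, 1.0, 10.0])
loss = cross_entropy(z, 0)
```

For this input the score `10` dominates the sum, so the loss is close to $10 - (-3.14) = 13.14$, which agrees with the expected value in the test.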

## Layers in a network [1 point]

As a last step, implement the class `Network` following the instructions in `modules.py`.

In [7]:
W1 = np.ones((3, 4))
b1 = np.linspace(1, 3, 3)
linear1 = LinearLayer(W1, b1)

W2 = np.ones((1, 3))
b2 = np.ones((1))
linear2 = LinearLayer(W2, b2)

relu = ReLU()

net = Network([linear1, relu])

net.add_layer(linear2)

assert np.max(np.abs(net.forward(np.array([-3.14, 0, 1, 10])) - 30.58)) < 1e-4