<div style="color: green; font-weight: bold">
Comments for Exercise 01: <br/>
Comment here
</div>

<div style="color: green; font-weight: bold">
Comments for Exercise 02: <br/>
Comment here
</div>

Exercise 03

In [14]:
import numpy as np
from sklearn import datasets

<div style="color: green; font-weight: bold">
Comments for Exercise 03: <br/>
Our forward pass computes the same result as the sample solution. The sample solution, however, works on a copy of the input data and performs the calculation in place via array manipulation. <br/>
Our backward pass, however, is wrong. This layer has no parameters B and b, so we need neither to compute their gradients nor to multiply the upstream gradient by B. It is sufficient to apply the ReLU to the stored input, as the sample solution does.
</div>

In [15]:
class ReLULayer(object):
    def forward(self, input):
        # remember the input for later backpropagation
        self.input = input
        # return the ReLU of the input
        relu = np.maximum(0, input)
        return relu

    def backward(self, upstream_gradient):
        # compute the derivative of the weights from upstream_gradient and the stored input
        self.grad_b = np.sum(upstream_gradient, axis=0)
        self.grad_B = np.dot(self.input.T, upstream_gradient)
        # compute the downstream gradient to be passed to the preceding layer
        downstream_gradient = np.dot(upstream_gradient, self.B.T)
        return downstream_gradient

    def update(self, learning_rate):
        pass # ReLU is parameter-free
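
A minimal sketch of a parameter-free backward pass, as described in the comment above. The class name `ReLULayerSketch` is ours and only serves the illustration; we assume the derivative at exactly zero is taken to be 0. This is not the sample solution itself.

```python
import numpy as np


class ReLULayerSketch(object):
    """ReLU layer with a parameter-free backward pass (illustrative sketch)."""

    def forward(self, input):
        # remember the input for later backpropagation
        self.input = input
        return np.maximum(0, input)

    def backward(self, upstream_gradient):
        # the derivative of ReLU is 1 where the input was positive and 0 elsewhere
        # (0 is assumed at input == 0), so the upstream gradient is masked element-wise
        return upstream_gradient * (self.input > 0)

    def update(self, learning_rate):
        pass  # ReLU is parameter-free


layer = ReLULayerSketch()
x = np.array([[-1.0, 2.0], [3.0, -4.0]])
out = layer.forward(x)
grad = layer.backward(np.ones_like(x))
print(out)
print(grad)
```

The gradient is passed through unchanged for positive inputs and blocked for the others; no `self.B` is needed.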

<div style="color: green; font-weight: bold">
Comments for Exercise 03: <br/>
The forward pass is functionally the same as the sample solution: it simply implements the softmax function. <br/>
The backward pass is also similar to the sample solution, but written differently.
</div>

In [16]:
class OutputLayer(object):
    def __init__(self, n_classes):
        self.n_classes = n_classes

    def forward(self, input):
        # remember the input for later backpropagation
        self.input = input
        # return the softmax of the input        
        if np.isnan(input).any():
            input[np.isnan(input)] = 0.0
        input_shifted = input - np.max(input, axis=1, keepdims=True)
        exp_input = np.exp(input_shifted)
        softmax = exp_input / np.sum(exp_input, axis=1, keepdims=True)
        return softmax

    def backward(self, predicted_posteriors, true_labels):
        # return the loss derivative with respect to the stored inputs
        # (use cross-entropy loss and the chain rule for softmax,
        #  as derived in the lecture)
        downstream_gradient = (predicted_posteriors - true_labels) / len(true_labels) # your code here
        return downstream_gradient

    def update(self, learning_rate):
        pass # softmax is parameter-free
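
The expression `predicted_posteriors - true_labels` only matches the softmax/cross-entropy gradient when both arrays have the same shape, i.e. when the labels are one-hot encoded. The following sketch shows one way to do this, assuming the labels arrive as integer class indices (as `make_moons` returns them). The helper names `softmax`, `one_hot` and `cross_entropy_gradient` are ours.

```python
import numpy as np


def softmax(z):
    # subtract the row-wise maximum for numerical stability
    z_shifted = z - np.max(z, axis=1, keepdims=True)
    exp_z = np.exp(z_shifted)
    return exp_z / np.sum(exp_z, axis=1, keepdims=True)


def one_hot(labels, n_classes):
    # turn integer labels of shape (N,) into indicator rows of shape (N, n_classes)
    return np.eye(n_classes)[labels]


def cross_entropy_gradient(posteriors, labels):
    # gradient of the mean cross-entropy loss w.r.t. the softmax inputs
    n_instances, n_classes = posteriors.shape
    return (posteriors - one_hot(labels, n_classes)) / n_instances


logits = np.array([[2.0, 0.0], [0.0, 0.0], [-1.0, 3.0]])
labels = np.array([0, 1, 1])
posteriors = softmax(logits)
gradient = cross_entropy_gradient(posteriors, labels)
print(gradient)
```

Each row of the gradient sums to zero, because both the posteriors and the one-hot labels sum to one per instance.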

<div style="color: green; font-weight: bold">
Comments for Exercise 03: <br/>
The sample solution restricts the range the initial weight values can take, while we do not. Its approach is likely better because the initial guess will be better. <br/>
Our forward pass is the same as the sample solution; the only difference is the exception handling we added. <br/>
The backward pass is identical to the sample solution.
</div>
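
One common way to restrict the range of the initial weights is to scale them by the number of inputs. The sketch below is an assumption on our part: we do not reproduce the range used in the sample solution, and `init_linear_parameters` is a hypothetical helper.

```python
import numpy as np


def init_linear_parameters(n_inputs, n_outputs, rng):
    # hypothetical helper: shrink the weights with the fan-in so that the
    # preactivations stay on a similar scale as the inputs
    scale = 1.0 / np.sqrt(n_inputs)
    B = rng.normal(scale=scale, size=(n_inputs, n_outputs))
    # start the intercepts at zero (an assumption, not taken from the sample solution)
    b = np.zeros(n_outputs)
    return B, b


rng = np.random.default_rng(0)
B, b = init_linear_parameters(30, 30, rng)
print("standard deviation of B:", B.std())
```

With unscaled `np.random.normal` weights, the preactivations of a wide layer grow with the number of inputs, which can push the softmax into saturation early in training.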

In [17]:
class LinearLayer(object):
    def __init__(self, n_inputs, n_outputs):
        self.n_inputs  = n_inputs
        self.n_outputs = n_outputs
        # randomly initialize weights and intercepts
        self.B = np.random.normal(size=(self.n_inputs, self.n_outputs)) 
        self.b = np.random.normal(size=n_outputs) 

    def forward(self, input):
        # remember the input for later backpropagation
        global preactivations
        self.input = input
        # compute the scalar product of input and weights
        # (these are the preactivations for the subsequent non-linear layer)
        try:
            preactivations = np.dot(input, self.B) + self.b
        except:
            print("Exception")
        return preactivations

    def backward(self, upstream_gradient):
        # compute the derivative of the weights from
        # upstream_gradient and the stored input
        # grad_b: the upstream gradient summed over the mini-batch (axis 0)
        # grad_B: matrix product of the transposed input with the upstream gradient
        self.grad_b = np.sum(upstream_gradient, axis=0)
        self.grad_B = np.dot(self.input.T, upstream_gradient)  # your code here
        # compute the downstream gradient to be passed to the preceding layer
        downstream_gradient = np.dot(upstream_gradient, self.B.T)
        return downstream_gradient

    def update(self, learning_rate):
        # update the weights by batch gradient descent
        self.B = self.B - learning_rate * self.grad_B
        self.b = self.b - learning_rate * self.grad_b
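
The gradient formulas of the linear layer can be checked numerically. The sketch below compares them against central finite differences on a small random example. The function names and the test loss are ours and are only meant as an illustration.

```python
import numpy as np


def linear_forward(X, B, b):
    return np.dot(X, B) + b


def linear_backward(X, B, upstream_gradient):
    # same formulas as in LinearLayer.backward
    grad_b = np.sum(upstream_gradient, axis=0)
    grad_B = np.dot(X.T, upstream_gradient)
    downstream_gradient = np.dot(upstream_gradient, B.T)
    return grad_B, grad_b, downstream_gradient


def numerical_gradient(f, array, eps=1e-6):
    # central finite differences of the scalar function f w.r.t. each entry
    grad = np.zeros_like(array)
    for index in np.ndindex(*array.shape):
        original = array[index]
        array[index] = original + eps
        f_plus = f()
        array[index] = original - eps
        f_minus = f()
        array[index] = original  # restore the entry
        grad[index] = (f_plus - f_minus) / (2 * eps)
    return grad


rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))
B = rng.normal(size=(3, 2))
b = rng.normal(size=2)
G = rng.normal(size=(4, 2))  # fixed upstream gradient

# scalar test loss whose gradient w.r.t. the preactivations is exactly G
loss = lambda: np.sum(linear_forward(X, B, b) * G)

grad_B, grad_b, grad_X = linear_backward(X, B, G)
max_error = max(
    np.abs(grad_B - numerical_gradient(loss, B)).max(),
    np.abs(grad_b - numerical_gradient(loss, b)).max(),
    np.abs(grad_X - numerical_gradient(loss, X)).max(),
)
print("largest deviation:", max_error)
```

A deviation close to floating-point precision indicates that the analytic gradients agree with the numerical ones.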

<div style="color: green; font-weight: bold">
Comments for Exercise 03: <br/>
We essentially take the same approach as the sample solution, but our code contains some inefficiencies and errors. First, we calculate the initial gradient twice: once before the loop and once inside it. Second, we skip the backward function of the ReLULayer, because the loop only handles the OutputLayer and the LinearLayers. We should therefore extend the loop to also include the ReLULayer and remove the duplicate calculation.
</div>

In [18]:
class MLP(object):
    def __init__(self, n_features, layer_sizes):
        # construct a multi-layer perceptron
        # with ReLU activation in the hidden layers and softmax output
        # (i.e. it predicts the posterior probability of a classification problem)
        #
        # n_features: number of inputs
        # len(layer_sizes): number of layers
        # layer_sizes[k]: number of neurons in layer k
        # (specifically: layer_sizes[-1] is the number of classes)
        self.n_layers = len(layer_sizes)
        self.layers   = []

        # create interior layers (linear + ReLU)
        n_in = n_features
        for n_out in layer_sizes[:-1]:
            self.layers.append(LinearLayer(n_in, n_out))
            self.layers.append(ReLULayer())
            n_in = n_out

        # create last linear layer + output layer
        n_out = layer_sizes[-1]
        self.layers.append(LinearLayer(n_in, n_out))
        self.layers.append(OutputLayer(n_out))

    def forward(self, X):
        # X is a mini-batch of instances
        batch_size = X.shape[0]
        # flatten the other dimensions of X (in case instances are images)
        X = X.reshape(batch_size, -1)

        # compute the forward pass
        # (implicitly stores internal activations for later backpropagation)
        result = X
        for layer in self.layers:
            result = layer.forward(result)
        return result

    def backward(self, predicted_posteriors, true_classes):
        # perform backpropagation w.r.t. the prediction for the latest mini-batch X
        # Initialize the loss gradient by using the backward function in the output layer
        batch_size = len(true_classes)
        true_classes = true_classes.reshape((batch_size, 1))
        downstream_gradient = (predicted_posteriors - true_classes) / batch_size
        for layer in reversed(self.layers):
            if isinstance(layer, OutputLayer):
                downstream_gradient = layer.backward(predicted_posteriors, true_classes)
            elif isinstance(layer, LinearLayer):
                downstream_gradient = layer.backward(downstream_gradient)
        return downstream_gradient

    def update(self, X, Y, learning_rate):
        posteriors = self.forward(X)
        self.backward(posteriors, Y)
        for layer in self.layers:
            layer.update(learning_rate)

    def train(self, x, y, n_epochs, batch_size, learning_rate):
        N = len(x)
        n_batches = N // batch_size
        for i in range(n_epochs):
            # print("Epoch", i)
            # reorder data for every epoch
            # (i.e. sample mini-batches without replacement)
            permutation = np.random.permutation(N)

            for batch in range(n_batches):
                # create mini-batch
                start = batch * batch_size
                x_batch = x[permutation[start:start+batch_size]]
                y_batch = y[permutation[start:start+batch_size]]

                # perform one forward and backward pass and update network parameters
                self.update(x_batch, y_batch, learning_rate)
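
The restructuring described in the comment above could look roughly like the following sketch: the output layer produces the initial gradient exactly once, and every other layer, ReLU included, passes it on. The function `backpropagate` and the recording stand-in layers are hypothetical; they only log the order of the calls, and the labels are assumed to be one-hot encoded.

```python
import numpy as np


def backpropagate(layers, predicted_posteriors, one_hot_labels):
    # the last layer turns posteriors and labels into the initial loss gradient
    gradient = layers[-1].backward(predicted_posteriors, one_hot_labels)
    # every remaining layer (linear and ReLU alike) passes the gradient on
    for layer in reversed(layers[:-1]):
        gradient = layer.backward(gradient)
    return gradient


# hypothetical stand-in layers that only record the order of the calls
class RecordingLayer(object):
    def __init__(self, name, log):
        self.name, self.log = name, log

    def backward(self, upstream_gradient):
        self.log.append(self.name)
        return upstream_gradient


class RecordingOutputLayer(object):
    def __init__(self, log):
        self.log = log

    def backward(self, predicted_posteriors, true_labels):
        self.log.append("output")
        return (predicted_posteriors - true_labels) / len(true_labels)


log = []
layers = [RecordingLayer("linear1", log), RecordingLayer("relu1", log),
          RecordingLayer("linear2", log), RecordingOutputLayer(log)]
posteriors = np.array([[0.7, 0.3], [0.2, 0.8]])
labels = np.array([[1.0, 0.0], [0.0, 1.0]])
gradient = backpropagate(layers, posteriors, labels)
print(log)
```

Because no `isinstance` checks are involved, a newly added layer type cannot be skipped by accident.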

<div style="color: green; font-weight: bold">
Comments for Exercise 03: <br/>
Our solution is nearly identical to the sample solution; the only difference is that we compute the sum and the mean in a single step instead of one after the other.
</div>

In [23]:



if __name__=="__main__":

    # set training/test set size
    N = 2000

    # create training and test data
    X_train, Y_train = datasets.make_moons(N, noise=0.05)
    X_test,  Y_test  = datasets.make_moons(N, noise=0.05)
    n_features = 2
    n_classes  = 2

    # rescale features to [-1, 1] (offset and scaling are taken from the training set)
    offset  = X_train.min(axis=0)
    scaling = X_train.max(axis=0) - offset
    X_train = ((X_train - offset) / scaling - 0.5) * 2.0
    X_test  = ((X_test  - offset) / scaling - 0.5) * 2.0

    # set hyperparameters (play with these!)
    layer_sizes = [5, 5, n_classes]
    n_epochs = 5
    batch_size = 200
    learning_rate = 0.05

    # create network
    network = MLP(n_features, layer_sizes)

    # train
    network.train(X_train, Y_train, n_epochs, batch_size, learning_rate)

    # test
    predicted_posteriors = network.forward(X_test)
    # determine class predictions from posteriors by winner-takes-all rule
    predicted_classes = np.argmax(predicted_posteriors, axis=1)
    # compute and output the error rate of predicted_classes
    error_rate = np.mean(predicted_classes != Y_test.reshape(-1))
    print("error rate:", error_rate)


error rate: 0.83
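
The two ways of computing the error rate mentioned in the comment above give the same number. A small illustration with made-up labels (the arrays are not taken from the experiment):

```python
import numpy as np

# made-up predictions and ground truth, for illustration only
predicted = np.array([0, 1, 1, 0, 1])
truth = np.array([0, 1, 0, 0, 0])

mistakes = predicted != truth
# sum first, then divide by the number of instances
error_two_steps = np.sum(mistakes) / len(truth)
# mean of the boolean array in a single step
error_one_step = np.mean(mistakes)
print(error_two_steps, error_one_step)
```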


<div style="color: green; font-weight: bold">
Comments for Exercise 03: <br/>
The last part of the exercise, repeating the experiment with different hyperparameter settings, was not present in the sample solution.
</div>

In [21]:
def calculate_validation_error(network, X_val, Y_val):
    predicted_posteriors = network.forward(X_val)
    predicted_classes = np.argmax(predicted_posteriors, axis=1)
    error_rate = np.mean(predicted_classes != Y_val)
    return error_rate

layer_sizes_list = [[2, 2, n_classes], [3, 3, n_classes], [5, 5, n_classes], [30, 30, n_classes]]
print("\nValidation Error for various layers:")

# Train and compare the networks
for layer_sizes in layer_sizes_list:
    network = MLP(n_features, layer_sizes)
    network.train(X_train, Y_train, n_epochs=5, batch_size=200, learning_rate=0.05)
    validation_error = calculate_validation_error(network, X_test, Y_test)
    print("-----------------------------")
    print("Layer Sizes: ", layer_sizes)
    print("Validation Error: ", validation_error)


Validation Error for various layers:
-----------------------------
Layer Sizes:  [2, 2, 2]
Validation Error:  0.866
-----------------------------
Layer Sizes:  [3, 3, 2]
Validation Error:  0.5
-----------------------------
Layer Sizes:  [5, 5, 2]
Validation Error:  0.121
-----------------------------
Layer Sizes:  [30, 30, 2]
Validation Error:  0.5
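
Each configuration above was trained only once from a random initialization, so part of the spread between the results may be due to chance. One possible way to make the comparison more robust is to repeat every run several times and report mean and standard deviation. The helper `sweep` and the stand-in `fake_train_and_evaluate` are hypothetical; in practice the latter would train an `MLP` and return its validation error.

```python
import numpy as np


def sweep(train_and_evaluate, configurations, n_repeats=5, seed=0):
    # hypothetical helper: train_and_evaluate(configuration) must train a fresh
    # network and return its validation error
    np.random.seed(seed)  # makes the whole sweep reproducible
    results = {}
    for configuration in configurations:
        errors = [train_and_evaluate(configuration) for _ in range(n_repeats)]
        results[tuple(configuration)] = (np.mean(errors), np.std(errors))
    return results


# stand-in for "train an MLP and return its validation error";
# it returns random numbers and does not model a real network
def fake_train_and_evaluate(layer_sizes):
    return np.random.uniform(0.0, 1.0 / layer_sizes[0])


results = sweep(fake_train_and_evaluate, [[2, 2, 2], [5, 5, 2]])
for layer_sizes, (mean_error, std_error) in results.items():
    print(layer_sizes, "mean:", mean_error, "std:", std_error)
```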


