Name:   Anh Tuan Tran
Matrikelnummer:  7015463
Email:   antr00001@stud.uni-saarland.de
   
Name:   Deborah Dormah Kanubala
Matrikelnummer:   7025906
Email:  dkanubala@aimsammi.org

Name:    Irem Begüm Gündüz
Matrikelnummer:     7026821
Email: irgu00001@stud.uni-saarland.de

## 4.4.a Building your own feed-forward network

Import numpy, which is really all we need to create our own NN.

In [1]:
import numpy as np

Recall that our simple neural network consisted of two layers. We also added a `ReLU` function as a non-linearity to the output of our intermediate layer. Given an input $\mathbf{x} \in \mathbb{R}^n $ we have

$ \mathbf{h} = f^{(1)}(\mathbf{x}; \mathbf{W}, c) = \text{ReLU}(\mathbf{W}^\mathsf{T} \mathbf{x} + c) $

$ \mathbf{y} = f^{(2)}(\mathbf{h}; \mathbf{w}, b) = \text{softmax}(\mathbf{w}^\mathsf{T} \mathbf{h} + b) $

In this exercise you will create your own network, in a way that lets you specify its depth: instead of a single intermediate layer $\mathbf{h}$, the network has $n$ of them, $\mathbf{h}_{i}$ with $i \in \{1, \dots, n\}$.
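To make the two formulas concrete, the following minimal sketch traces them by hand for one input. All parameter values (`W`, `c`, `w`, `b`) are hypothetical and were chosen only so that the arithmetic is easy to follow; they are not taken from the lecture.

```python
import numpy as np

# Hypothetical parameters, chosen only so the arithmetic is easy to follow.
x = np.array([1.0, 2.0])                   # input with 2 components
W = np.array([[1.0, -1.0],
              [0.5, -1.0]])                # first-layer weights, shape (2, 2)
c = np.array([0.0, 0.0])                   # first-layer bias
w = np.eye(2)                              # output weights (identity matrix)
b = np.array([0.0, 1.0])                   # output bias

pre_h = W.T @ x + c                        # [ 2., -3.]
h = np.where(pre_h > 0, pre_h, 0.0)        # ReLU zeroes the negative entry: [2., 0.]

logits = w.T @ h + b                       # [2., 1.]
exp_logits = np.exp(logits)
y = exp_logits / exp_logits.sum()          # softmax: approx. [0.731, 0.269]

print(h, y)
```

The ReLU removes the negative pre-activation, and the softmax turns the two remaining scores into probabilities that sum to 1.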

**NOTE**: You are not allowed to use any built-in functions to calculate the ReLU, Softmax or the forward pass directly.

**NOTE 2**: Remember to include the non-linearity at every layer. Remember to also add the bias to every layer. Finally, remember to apply the softmax in the output layer.

In [2]:
def relu(x):
    """
    Implement the ReLU function as defined in the lecture
    Input: an array of numbers
    Output: ReLU(x)
    """
    # TODO: Implement
    # raise NotImplementedError
    return x * (x > 0)

In [3]:
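def softmax(x):
    """
    Implement the `softmax` function as defined in the lecture
    Input: an array of scores; the last axis is normalized
    Output: softmax(x), non-negative and summing to 1 along the last axis
    """
    # Subtracting the maximum leaves the result unchanged mathematically
    # but keeps np.exp from overflowing on large inputs.
    e_x = np.exp(x - np.max(x, axis=-1, keepdims=True))
    # keepdims=True lets the division broadcast for batched (2-D) inputs too.
    return e_x / np.sum(e_x, axis=-1, keepdims=True)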
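Two properties of a softmax implementation are worth checking separately: it should survive large scores, and it should normalize each row of a batch independently. The sketch below uses a hypothetical helper named `softmax_rows` (not part of the assignment template) to illustrate both.

```python
import numpy as np

def softmax_rows(scores):
    """Hypothetical helper: softmax over the last axis of a 1-D or 2-D array."""
    scores = np.asarray(scores, dtype=float)
    # Shift so the largest entry per row is 0; the result is unchanged.
    shifted = scores - scores.max(axis=-1, keepdims=True)
    exp_scores = np.exp(shifted)
    # keepdims=True keeps a trailing axis of size 1 so the division broadcasts per row.
    return exp_scores / exp_scores.sum(axis=-1, keepdims=True)

batch = np.array([[1.0, 2.0, 3.0],
                  [1000.0, 1000.0, 1000.0]])  # np.exp(1000.0) on its own overflows to inf
probs = softmax_rows(batch)
print(probs.sum(axis=-1))  # each row sums to 1
```

Without the shift, the second row would produce `inf / inf`; without `keepdims=True`, the row sums would have shape `(2,)` and could not be divided into the `(2, 3)` array.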

In [4]:
class FFNetwork:
    """
    Class representing the feed-forward neural network
    """
    def __init__(self, input_dim: int, hidden_dim: int,
                 output_dim: int, hidden_size: int):
        """
        Args:
        input_dim: dimensionality of `x`
        hidden_dim: dimensionality of the intermediate `h_i`
        output_dim: dimensionality of `y`
        hidden_size: number of intermediate layers `h_i`
        """
        # TODO: Implement
        # Initialize each layer as a random matrix of the
        # appropriate dimensions
        
        ## SOLUTION ##
        self.input_dim = input_dim
        self.hidden_dim = hidden_dim
        self.output_dim = output_dim
        self.hidden_size = hidden_size
        
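        # Build `hidden_size` affine layers in total:
        #   first:  input_dim  -> hidden_dim
        #   middle: hidden_dim -> hidden_dim
        #   last:   hidden_dim -> output_dim
        # Caveat: for hidden_size == 1 only the first case applies, so the
        # network maps input_dim -> hidden_dim and never projects to output_dim.
        self.linear_weights = []
        for i in range(hidden_size):
            if i == 0:
                inp_dim = input_dim
                out_dim = hidden_dim
            elif i != hidden_size - 1:
                inp_dim = hidden_dim
                out_dim = hidden_dim
            else:
                inp_dim = hidden_dim
                out_dim = output_dim
            # Weights are stored as (inp_dim, out_dim), so the forward pass
            # computes `x @ w + bias`.
            w = np.random.randn(inp_dim, out_dim)
            bias = np.random.randn(out_dim)
            self.linear_weights.append({
                'w': w,
                'bias': bias,
            })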
        
        ## SOLUTION ##
    
    def forward(self, x):
        """
        Args:
        x: input to the neural network
        
        Output:
        `y`, i.e. the prediction of the network
        
        Note: Remember to apply the ReLU and add the bias for each layer
        """
        # TODO: Implement the forward pass of the network,
        # i.e. calculate `y` from an input `x`
        # Remember that each layer's output is calculated by
        # f^(i) = ReLU(W_i^T * f^(i-1) + b_i)
        res = x
        
        ## SOLUTION ##
        res = np.array (res)
        for i in range (self.hidden_size):
            res = res @ self.linear_weights[i]['w'] + self.linear_weights[i]['bias']
            res = relu (res)
        res = softmax (res)
        ## SOLUTION ##
        
        return res

Your implementation needs to be compatible with the following test code:

In [5]:
np.random.seed(0)

# A configuration that reflects the example from the lecture
# i.e. our input is of size 2, our intermediate layers are also of size 2,
# and we will only have 1 hidden layer.
network = FFNetwork(2, 2, 2, 1)
network.forward([1.,0.])

array([0.97420925, 0.02579075])

Disclaimer: Do not expect a correct output at this stage; you are simply building the structure of the network.

However, our setup also allows us to create larger networks:

In [6]:
np.random.seed(0)

network = FFNetwork(2, 3, 2, 4)
network.forward([1.,0.]) 

array([0.2164466, 0.7835534])

Some sanity checks:

1. You should be seeing the number of units you specified as output units in your output.
1. The numbers in your outputs should be in the range $[0,1]$
1. The numbers should add up to $1$
1. Varying the structure of the network should not break its functionality.

In [7]:
# sanity check

np.random.seed(0)


# Varying the structure of the network should not break its functionality.

for inp_dim in range (2, 5):
    for out_dim in range (2, 5):
        for hidden_dim in range (2, 5):
            for hidden_size in range (2, 5):
                network = FFNetwork(inp_dim, hidden_dim, out_dim, hidden_size)
                output = network.forward(np.random.randn (inp_dim)) 

                # You should be seeing the number of units you specified as output units in your output.
                assert len(output) == network.output_dim
                # The numbers in your outputs should be in the range [0,1]
                assert np.all((output >= 0) & (output <= 1))
                # The numbers should add up to 1
                assert np.abs(np.sum(output) - 1) < 1e-9

## 4.4.b Implementing a feed-forward network using `torch`

### 4.4.b.1 Creating the network (1 point)

For this we will be using the `nn` module of `torch`, which contains modules representing types of layers. In your case, the specific relevant module would be that of a *fully connected linear layer*.

We will also be using the `nn.functional` module to take advantage of the built-in functions for ReLU and Softmax. In this exercise, you are allowed to use them.
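As a quick orientation, the sketch below shows the three building blocks mentioned above on a toy tensor: `nn.Linear` for a fully connected layer, and `F.relu` and `F.softmax` from `nn.functional`. The tensor values and layer sizes are arbitrary and serve only as an illustration.

```python
import torch
import torch.nn.functional as F
from torch import nn

scores = torch.tensor([-1.0, 0.0, 2.0])

activated = F.relu(scores)            # negative entries are clamped to zero
probs = F.softmax(scores, dim=-1)     # `dim` selects the axis that is normalized

# A fully connected layer mapping 3 input features to 2 output features.
layer = nn.Linear(in_features=3, out_features=2)
projected = layer(scores)             # randomly initialized, so only the shape is predictable

print(activated)
print(probs.sum())
print(projected.shape)
```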

In [8]:
import torch
import torch.nn.functional as F

from torch import nn

In [9]:
class TorchFFNetwork(nn.Module):
    """
    A `torch` version of the network implemented for 4.4.a
    """
    def __init__(self, input_dim: int, hidden_dim: int,
                 output_dim: int, hidden_size: int):
        """
        Args:
        input_dim: dimensionality of `x`
        hidden_dim: dimensionality of the intermediate `h_i`
        output_dim: dimensionality of `y`
        hidden_size: number of intermediate layers `h_i`
        """
        ## SOLUTION ##
        super(TorchFFNetwork, self).__init__()
        
        self.input_dim = input_dim
        self.hidden_dim = hidden_dim
        self.output_dim = output_dim
        self.hidden_size = hidden_size
        
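        # nn.ModuleList registers every layer, so that `model.parameters()`
        # exposes all weights and biases to the optimizer.
        # Caveat: for hidden_size == 1 only the input_dim -> hidden_dim layer
        # is created, so the output then has hidden_dim units.
        self.linear_layers = nn.ModuleList([])
        for i in range(hidden_size):
            if i == 0:
                inp_dim = input_dim
                out_dim = hidden_dim
            elif i != hidden_size - 1:
                inp_dim = hidden_dim
                out_dim = hidden_dim
            else:
                inp_dim = hidden_dim
                out_dim = output_dim
            self.linear_layers.append(nn.Linear(inp_dim, out_dim, bias=True))
        # dim=1 normalizes over the feature axis of a (batch, features) tensor.
        self.softmax = nn.Softmax(dim=1)
        self.relu = nn.ReLU()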
        ## SOLUTION ##

    def forward(self, x):
        ## SOLUTION ##
        if (len (x.shape) == 1):
            # Extending the first dimension for batch 
            res = x[None] 
        else:
            res = x
        for i in range (self.hidden_size):
            res = self.linear_layers [i] (res)
            res = self.relu (res)
        res = self.softmax (res)
        
        if (len (x.shape) == 1):
            return res [0]
        else:
            return res
        
        ## SOLUTION ##
        
 


Your implementation, once more, needs to be compatible with the following test code:

In [10]:
torch_network = TorchFFNetwork(2, 3, 2, 1)

In [11]:
with torch.no_grad():
    print(torch_network(torch.tensor([1.,0.])))

tensor([0.2167, 0.5665, 0.2167])


Note that the `forward` method is automatically called when you call your network object.
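A small sketch of that mechanism, using a hypothetical toy module `Doubler` that exists only for this illustration: calling the object goes through `nn.Module.__call__`, which in turn invokes `forward`.

```python
import torch
from torch import nn

class Doubler(nn.Module):
    """Hypothetical toy module, used only to illustrate the call mechanism."""
    def forward(self, x):
        return 2 * x

doubler = Doubler()
x = torch.tensor([1.0, 2.0])

# Calling the module object dispatches to `forward`.
same = torch.equal(doubler(x), doubler.forward(x))
print(same)
```

Calling the object rather than `forward` directly is the usual style, since the call path also runs any hooks registered on the module.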

### 4.4.b.2 Training your network (1 point)

Even though we have not covered how training actually works, we will treat the training of a neural network as a black-box procedure for now. Later on we will learn the internals of the training process (and even implement them ourselves!).

For now, train a neural network (the one you created above) to learn the XOR operation. You are to create a neural network with the appropriate number of input variables, an intermediate hidden layer with 2 units and an output layer with 2 units.

Notes:
- Please read [this introduction to the optimization loop in PyTorch](https://pytorch.org/tutorials/beginner/basics/optimization_tutorial.html#optimization-loop). It should give you a good overview of what PyTorch needs from you to train a neural network.
- You are to train the network until the network learns the operation. Remember to set your random seeds so the results are reproducible.
- There are many optimizers available. Adam is more complex than SGD and has not yet been covered in the lecture, but it is used in code exactly like SGD and often performs much better.
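The last note can be illustrated with a minimal sketch: the optimization loop stays the same, and only the optimizer class changes. The helper `minimize` and the toy objective $(p - 3)^2$ are assumptions made for this example; they are not part of the assignment and say nothing about which optimizer is better on XOR.

```python
import torch

def minimize(optimizer_cls, steps=200, lr=0.1):
    """Hypothetical helper: minimize (p - 3)^2 with the given optimizer class."""
    p = torch.zeros(1, requires_grad=True)
    optimizer = optimizer_cls([p], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()          # clear gradients from the previous step
        loss = ((p - 3.0) ** 2).sum()
        loss.backward()                # compute d(loss)/dp
        optimizer.step()               # update p in place
    return p.item()

# Same loop, different optimizer: only the class passed in changes.
sgd_result = minimize(torch.optim.SGD)
adam_result = minimize(torch.optim.Adam)
print(sgd_result, adam_result)
```

Both runs should end close to the minimizer $p = 3$.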

In [12]:
# Our training X, where each instance includes an x1 and an x2, (where the operation is defined as x1 XOR x2)
training_x = [[0,0], [0,1], [1,0], [1,1]]

# We have only covered softmax in the lecture, so we format the output as follows:
training_y = [[1,0], [0,1], [0,1], [1,0]]

# The Y is formatted such that its first element corresponds to the probability of the XOR resulting in a 0
# and its second element to the probability of the XOR resulting in a 1

################################################################
# TODO: Adapt the training set so it can be used with `pytorch`
################################################################


training_x = torch.tensor (training_x, dtype=torch.float32)
training_y = torch.tensor (training_y, dtype=torch.float32)
training_data = (training_x, training_y)
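The one-hot targets above can also be derived from the inputs instead of being typed by hand. The following sketch does this with NumPy; the variable names `inputs`, `labels` and `one_hot` are introduced here for illustration only.

```python
import numpy as np

inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

labels = inputs[:, 0] ^ inputs[:, 1]     # bitwise XOR per row: [0, 1, 1, 0]
one_hot = np.eye(2, dtype=int)[labels]   # row k of the identity is the one-hot code of class k

print(one_hot.tolist())  # [[1, 0], [0, 1], [0, 1], [1, 0]]
```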

In [55]:
# Create the model from the previous class and pick a learning rate
torch.manual_seed(42)
(inp_dim, hidden_dim, out_dim, hidden_size) = (2, 10, 2, 3)
model = TorchFFNetwork(inp_dim, hidden_dim, out_dim, hidden_size)
model.train ()
learning_rate = 1e-2
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
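# The model already applies a softmax, so its outputs are probabilities rather
# than logits; MSE against the one-hot targets is used as the loss.
loss_fn = nn.MSELoss()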
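For intuition about the scale of the loss values printed during training, here is the mean squared error computed by hand for one hypothetical prediction. The numbers are made up for illustration; with its default settings `nn.MSELoss` likewise averages the squared differences over all elements.

```python
import numpy as np

y_pred = np.array([0.9, 0.1])   # hypothetical softmax output
y_true = np.array([1.0, 0.0])   # one-hot target for "XOR is 0"

# Mean of the squared differences: (0.1**2 + 0.1**2) / 2
mse = np.mean((y_pred - y_true) ** 2)
print(mse)
```

A prediction that is off by 0.1 in each component therefore gives a loss of about 0.01.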

In [56]:
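def train_loop(data, model, loss_fn, optimizer):
    """
    Train `model` on `data` = (inputs, targets), updating the parameters
    after every single example (batch size 1) for a fixed number of epochs.
    """
    num_epoch = 5000
    training_x, training_y = data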
    for i_epoch in range (num_epoch):
        epoch_losses = []
        for x, y in zip (training_x, training_y):
            optimizer.zero_grad()
            y_pred = model (x)
            loss = loss_fn (y_pred, y)
            loss.backward()
            optimizer.step()
            epoch_losses.append (loss.detach ().numpy ())
        if i_epoch % 100 == 0:
            print ("epoch:", i_epoch, "mean epoch loss:", np.mean (epoch_losses))


In [57]:
# TODO: Run training
train_loop (training_data, model, loss_fn, optimizer)

epoch: 0 mean epoch loss: 0.25630927
epoch: 100 mean epoch loss: 0.0012917169
epoch: 200 mean epoch loss: 0.00022464931
epoch: 300 mean epoch loss: 8.97075e-05
epoch: 400 mean epoch loss: 4.6921836e-05
epoch: 500 mean epoch loss: 2.8103903e-05
epoch: 600 mean epoch loss: 1.8245566e-05
epoch: 700 mean epoch loss: 1.2460289e-05
epoch: 800 mean epoch loss: 8.836173e-06
epoch: 900 mean epoch loss: 6.4254327e-06
epoch: 1000 mean epoch loss: 4.7666604e-06
epoch: 1100 mean epoch loss: 3.5876328e-06
epoch: 1200 mean epoch loss: 2.7335243e-06
epoch: 1300 mean epoch loss: 2.1008789e-06
epoch: 1400 mean epoch loss: 1.625893e-06
epoch: 1500 mean epoch loss: 1.2671265e-06
epoch: 1600 mean epoch loss: 9.919838e-07
epoch: 1700 mean epoch loss: 7.792564e-07
epoch: 1800 mean epoch loss: 6.1426437e-07
epoch: 1900 mean epoch loss: 4.855406e-07
epoch: 2000 mean epoch loss: 3.8477003e-07
epoch: 2100 mean epoch loss: 3.0529534e-07
epoch: 2200 mean epoch loss: 2.427858e-07
epoch: 2300 mean epoch loss: 1.9324

In [58]:
# testing
with torch.no_grad():
    for x, y in zip(training_x, training_y):
        # No optimizer call is needed here: nothing is being trained.
        y_pred = model(x)
        print('input', x)
        print('output', y_pred)
        print('gt output', y)
        print()

input tensor([0., 0.])
output tensor([9.9997e-01, 2.7327e-05])
gt output tensor([1., 0.])

input tensor([0., 1.])
output tensor([2.6398e-05, 9.9997e-01])
gt output tensor([0., 1.])

input tensor([1., 0.])
output tensor([1.8236e-05, 9.9998e-01])
gt output tensor([0., 1.])

input tensor([1., 1.])
output tensor([9.9998e-01, 1.7359e-05])
gt output tensor([1., 0.])
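To read these probabilities as class decisions, one can take the index of the larger entry per row. The sketch below does this for the four outputs printed above (copied in as literals) and compares them with the ground truth.

```python
import numpy as np

# Network outputs copied from the printout above.
outputs = np.array([[9.9997e-01, 2.7327e-05],
                    [2.6398e-05, 9.9997e-01],
                    [1.8236e-05, 9.9998e-01],
                    [9.9998e-01, 1.7359e-05]])
targets = np.array([[1, 0], [0, 1], [0, 1], [1, 0]])

predicted = outputs.argmax(axis=1)   # index of the most probable class per input
expected = targets.argmax(axis=1)    # class index encoded by the one-hot target

print(predicted.tolist(), bool((predicted == expected).all()))
```

All four predicted classes match the XOR truth table.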

