Name: Subrat Kishore Dutta  
Matrikelnummer:  7028082
Email: subratkishoredutta1234@gmail.com,sudu00001@stud.uni-saarland.de  
   
Name: Prathvish Mithare 
Matrikelnummer:  7028692
Email:   prmi00001@stud.uni-saarland.de

## 4.4.a Building your own feed-forward network

Import numpy, which is really all we need to create our own NN.

In [265]:
import numpy as np

Recall that our simple neural network consisted of two layers. We also added a `ReLU` function as a non-linearity to the output of our intermediate layer. Given an input $\mathbf{x} \in \mathbb{R}^n $ we have

$ \mathbf{h} = f^{(1)}(\mathbf{x}; \mathbf{W},c) = \text{ReLU}(\mathbf{W}^\mathsf{T} \mathbf{x} + c) $ 

$ \mathbf{y} = f^{(2)}(\mathbf{h}; \mathbf{w},b) = \text{softmax}( \mathbf{w}^\mathsf{T} \mathbf{h} + b) $

In this exercise you will create your own network. However, we will do it in a way that allows you to specify the depth of the network, i.e. we extend our network such that there isn't just one intermediate layer $\mathbf{h}$, but rather $n$ of them, $\mathbf{h}_{i}$ with $i \in \{1, \dots, n\}$.

**NOTE**: You are not allowed to use any built-in functions to calculate the ReLU, Softmax or the forward pass directly.

**NOTE 2**: Remember to include the non-linearity at every layer. Remember to also add the bias to every layer. Finally, remember to apply the softmax in the output layer.

In [266]:
def relu(x):
    """
    Implement the ReLU function as defined in the lecture
    Input: an array of numbers
    Output: ReLU(x)
    """
    # TODO: Implement
    return np.maximum(x,0)

In [267]:
def softmax(x):
    """
    Implement the `softmax` function as defined in the lecture
    """
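    # TODO: Implement
    # Shift by the maximum before exponentiating: the result is the same
    # mathematically, but large inputs no longer overflow in np.exp.
    e=np.exp(x - np.max(x))
    return e / e.sum()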

In [271]:
class FFNetwork:
    """
    Class representing the feed-forward neural network
    """
    def __init__(self, input_dim: int, hidden_dim: int,
                 output_dim: int, hidden_size: int):
        """
        Args:
        input_dim: dimensionality of `x`
        hidden_dim: dimensionality of the intermediate `h_i`
        output_dim: dimensionality of `y`
        hidden_size: number of intermediate layers `h_i`
        """
        # TODO: Implement
        # Initialize each layer as a random matrix of the
        # appropriate dimensions
        
        ## SOLUTION ##
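        # Each matrix has one extra row that holds the bias of that layer.
        # Note: self.i plus the `hidden_size` matrices in self.h give
        # hidden_size + 1 ReLU layers in total.
        self.i=np.random.randn(input_dim+1,hidden_dim)  # +1 for the bias
        self.h=np.random.randn(hidden_size,hidden_dim+1,hidden_dim)  # +1 for the bias
        self.o=np.random.randn(hidden_dim+1,output_dim)  # +1 for the bias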
        ## SOLUTION ##
    
    def forward(self, x):
        """
        Args:
        x: input to the neural network
        
        Output:
        `y`, i.e. the prediction of the network
        
        Note: Remember to apply the ReLU and add the bias for each layer
        """
        # TODO: Implement the forward pass of the network,
        # i.e. calculate `y` from an input `x`
        # Remember that each layer's output is calculated by
        # f^(i) = ReLU(W_i^T * f^(i-1) + b_i)
        ## SOLUTION ##
        
        res = np.array([x])
        res=np.hstack((res,np.ones((res.shape[0],1))))
        res=relu(np.matmul(res,self.i))
        for j in range(self.h.shape[0]):
            res=np.hstack((res,np.ones((res.shape[0],1))))
            res=relu(np.matmul(res,self.h[j]))
        res=np.hstack((res,np.ones((res.shape[0],1))))
        res=softmax(np.matmul(res,self.o))
        
        
        ## SOLUTION ##
        
        return res
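
The implementation above folds each bias into its weight matrix by appending a constant 1 to the layer input. As a minimal sketch (the names `W_aug`, `W`, and `b` are illustrative and not part of the exercise), the following checks that this is equivalent to computing `x @ W + b` explicitly:

```python
import numpy as np

rng = np.random.default_rng(0)

# Augmented weight matrix: the last row plays the role of the bias vector.
W_aug = rng.standard_normal((3, 2))   # (input_dim + 1, hidden_dim)
W, b = W_aug[:-1], W_aug[-1]          # split into weights and bias

x = np.array([[-1.0, 0.0]])           # one input as a row vector
x_aug = np.hstack((x, np.ones((x.shape[0], 1))))  # append the constant 1

folded = x_aug @ W_aug                # bias folded into the matrix product
explicit = x @ W + b                  # bias added explicitly

print(np.allclose(folded, explicit))
```

Both variants compute the same affine map; the augmented form just keeps one array per layer.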

Your implementation needs to be compatible with the following test code:

In [272]:
np.random.seed(7)

# A configuration that reflects the example from the lecture
# i.e. our input is of size 2, our intermediate layers are also of size 2,
# and we will only have 1 hidden layer.
network = FFNetwork(2, 2, 2, 1)
network.forward([-1.,0.])

array([[0.63720011, 0.36279989]])

Disclaimer: Do not expect a correct output at this stage, you are simply building the structure of the network.

However, our setup also allows us to create larger networks:

In [273]:
np.random.seed(0)

network = FFNetwork(2, 3, 2, 4)
network.forward([1.,0.]) 

array([[0.12968929, 0.87031071]])

Some sanity checks:

1. You should be seeing the number of units you specified as output units in your output.
1. The numbers in your outputs should be in the range $[0,1]$
1. The numbers should add up to $1$
1. Varying the structure of the network should not break its functionality.
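
These checks can also be automated. The sketch below uses a compact stand-in forward pass (`tiny_forward` and `check_output` are hypothetical helpers, not the `FFNetwork` class itself) and asserts the first three properties for several network shapes:

```python
import numpy as np

def tiny_forward(x, layers):
    """Stand-in forward pass: ReLU on every layer, softmax on the last one."""
    res = np.asarray(x, dtype=float).reshape(1, -1)
    for k, W in enumerate(layers):
        res = np.hstack((res, np.ones((res.shape[0], 1))))  # constant bias input
        z = res @ W
        if k < len(layers) - 1:
            res = np.maximum(z, 0)       # ReLU
        else:
            e = np.exp(z - z.max())      # softmax, shifted for stability
            res = e / e.sum()
    return res

def check_output(y, output_dim):
    """Assert the sanity checks on a prediction `y`."""
    assert y.shape[-1] == output_dim     # one value per output unit
    assert np.all((y >= 0) & (y <= 1))   # values lie in [0, 1]
    assert np.isclose(y.sum(), 1.0)      # values sum to 1
    return True

rng = np.random.default_rng(0)
results = []
for in_dim, hid_dim, out_dim, depth in [(2, 2, 2, 1), (2, 3, 2, 4), (5, 8, 3, 2)]:
    # input -> hidden, `depth` hidden -> hidden layers, hidden -> output
    dims = [in_dim] + [hid_dim] * (depth + 1) + [out_dim]
    layers = [rng.standard_normal((a + 1, b)) for a, b in zip(dims[:-1], dims[1:])]
    y = tiny_forward(rng.standard_normal(in_dim), layers)
    results.append(check_output(y, out_dim))
```

Running the loop over different shapes also exercises the fourth point: changing the structure should not raise an error.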

## 4.4.b Implementing a feed-forward network using `torch`

### 4.4.b.1 Creating the network (1 point)

For this we will be using the `nn` module of `torch`, which contains modules representing types of layers. In your case, the specific relevant module would be that of a *fully connected linear layer*.

We will also be using the `nn.functional` module to take advantage of the built in functions for ReLU and Softmax. In this exercise, you are allowed to use them.
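
As a quick illustration of these building blocks (a sketch; the layer sizes here are arbitrary), `nn.Linear` holds a weight matrix and a bias vector, while `F.relu` and `F.softmax` are plain functions applied to tensors:

```python
import torch
import torch.nn.functional as F
from torch import nn

torch.manual_seed(0)

layer = nn.Linear(in_features=2, out_features=3)  # fully connected layer with bias
x = torch.tensor([1.0, 0.0])

h = F.relu(layer(x))        # non-negative activations, shape (3,)
y = F.softmax(h, dim=-1)    # probabilities over the last dimension

weight_shape = tuple(layer.weight.shape)  # (out_features, in_features)
bias_shape = tuple(layer.bias.shape)      # (out_features,)
```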

In [274]:
import torch
import torch.nn.functional as F

from torch import nn

In [275]:
class TorchFFNetwork(nn.Module):
    """
    A `torch` version of the network implemented for 4.4.a
    """
    def __init__(self, input_dim: int, hidden_dim: int,
                 output_dim: int, hidden_size: int):
        """
        Args:
        input_dim: dimensionality of `x`
        hidden_dim: dimensionality of the intermediate `h_i`
        output_dim: dimensionality of `y`
        hidden_size: number of intermediate layers `h_i`
        """
        ## SOLUTION ##
        super().__init__()
        self.input_dim = input_dim
        self.hidden_dim = hidden_dim
        self.output_dim = output_dim
        self.hidden_size = hidden_size
        self.fci = nn.Linear(input_dim, hidden_dim)
        # One separate linear layer per hidden layer; reusing a single
        # nn.Linear would share the same weights across all of them.
        self.fch = nn.ModuleList(
            [nn.Linear(hidden_dim, hidden_dim) for _ in range(hidden_size)])
        self.fco = nn.Linear(hidden_dim, output_dim)
        self.relu = nn.ReLU()
        # dim=-1 normalizes over the output units and avoids the
        # implicit-dimension deprecation warning.
        self.softmax = nn.Softmax(dim=-1)
        ## SOLUTION ##

    def forward(self, x):
        ## SOLUTION ##
        x = self.relu(self.fci(x))
        for layer in self.fch:
            x = self.relu(layer(x))
        x = self.softmax(self.fco(x))
        return x
        ## SOLUTION ##
 


Your implementation, once more, needs to be compatible with the following test code:

In [276]:
torch_network = TorchFFNetwork(2, 3, 2, 1)

In [277]:
with torch.no_grad():
    print(torch_network(torch.tensor([1.,0.])))



tensor([0.4038, 0.5962])


Note that the `forward` method is automatically called when you call your network object.
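
A small sketch of this behaviour, using a hypothetical toy module `Doubler` that is not part of the exercise: calling the object goes through `nn.Module.__call__`, which in turn invokes `forward`.

```python
import torch
from torch import nn

class Doubler(nn.Module):
    """Hypothetical toy module used only for this illustration."""
    def forward(self, x):
        return 2 * x

module = Doubler()
x = torch.tensor([1.0, 0.0])

via_call = module(x)             # nn.Module.__call__ runs any hooks, then forward
via_forward = module.forward(x)  # direct call, bypasses __call__ and its hooks
```

Both give the same tensor here; calling the object is the usual style because it also runs any registered hooks.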

### 4.4.b.2 Training your network (1 point)

Even though we have not covered how training actually works, we will proceed with the training of a neural network as a black-box procedure. Later on we will learn the internals of the training process (and even implement them ourselves!).

For now, train a neural network (the one you created above) to learn the XOR operation. You are to create a neural network with the appropriate number of input variables, an intermediate hidden layer with 2 units and an output layer with 2 units.

Notes:
- Please read [this introduction to the optimization loop in PyTorch](https://pytorch.org/tutorials/beginner/basics/optimization_tutorial.html#optimization-loop). It should give you a good overview of what PyTorch needs from you to train a neural network.
- You are to train the network until the network learns the operation. Remember to set your random seeds so the results are reproducible.
- There are many optimizers available. Adam is more complex than SGD and has not yet been covered in the lecture, but it is used in code the same way as SGD and often converges much faster.
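
A minimal sketch of one optimization loop, on a hypothetical one-parameter regression problem rather than the XOR task, shows the three calls PyTorch expects per step. Swapping `torch.optim.SGD` for `torch.optim.Adam` changes only the constructor (the helper `fit` and its defaults are assumptions made for this illustration):

```python
import torch
from torch import nn

def fit(optimizer_cls, steps=200, lr=0.1):
    """Fit y = 3x with a single linear unit; return (first loss, last loss)."""
    torch.manual_seed(0)
    model = nn.Linear(1, 1)
    optimizer = optimizer_cls(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    x = torch.tensor([[0.0], [1.0], [2.0]])
    y = 3 * x
    losses = []
    for _ in range(steps):
        loss = loss_fn(model(x), y)
        optimizer.zero_grad()   # clear gradients left over from the previous step
        loss.backward()         # compute gradients of the loss
        optimizer.step()        # update the parameters
        losses.append(loss.item())
    return losses[0], losses[-1]

sgd_first, sgd_last = fit(torch.optim.SGD)
adam_first, adam_last = fit(torch.optim.Adam)
```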

In [278]:
# Our training X, where each instance includes an x1 and an x2, (where the operation is defined as x1 XOR x2)
training_x = [[0,0], [0,1], [1,0], [1,1]]

# We have only covered softmax in the lecture, so we format the output as follows:
training_y = [[1,0], [0,1], [0,1], [1,0]]

# The Y is formatted such that its first element corresponds to the probability of the XOR resulting in a 0
# and the second element to the probability of the XOR resulting in a 1

################################################################
# TODO: Adapt the training set so it can be used with `pytorch`
################################################################
X=torch.tensor(training_x,dtype=torch.float32)
Y=torch.tensor(training_y,dtype=torch.float32)

In [279]:
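# Create the model from the previous class and pick a learning rate
# Note: `torch_network` was constructed (with 3 hidden units) before the seed
# is set here, so its initial weights are not controlled by this seed.
torch.manual_seed(42)
model = torch_network
learning_rate = 0.1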

In [280]:
def train_loop(data, model, loss_fn, optimizer):
    # TODO: Implement
    X,Y=data[0],data[1]
    # Compute prediction and loss
    pred = model(X)
    loss = loss_fn(pred, Y)

    # Backpropagation
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    loss= loss.item()
    print(f"loss: {loss:>7f}")

In [281]:
# TODO: Run training
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
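# Note: nn.CrossEntropyLoss expects unnormalized scores (logits) and applies
# log-softmax itself; the model here already outputs softmax probabilities.
loss_fn = nn.CrossEntropyLoss()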
data=(X,Y)
epochs=range(100)
for epoch in epochs:
    print("epoch:",epoch)
    train_loop(data,model,loss_fn,optimizer)

epoch: 0
loss: 0.706204
epoch: 1
loss: 0.691678
epoch: 2
loss: 0.693942
epoch: 3
loss: 0.690251
epoch: 4
loss: 0.689541
epoch: 5
loss: 0.689409
epoch: 6
loss: 0.685096
epoch: 7
loss: 0.678389
epoch: 8
loss: 0.671014
epoch: 9
loss: 0.657473
epoch: 10
loss: 0.641583
epoch: 11
loss: 0.612608
epoch: 12
loss: 0.589663
epoch: 13
loss: 0.560368
epoch: 14
loss: 0.524388
epoch: 15
loss: 0.487841
epoch: 16
loss: 0.443445
epoch: 17
loss: 0.397335
epoch: 18
loss: 0.361403
epoch: 19
loss: 0.338889
epoch: 20
loss: 0.324128
epoch: 21
loss: 0.317894
epoch: 22
loss: 0.315419
epoch: 23
loss: 0.314367
epoch: 24
loss: 0.313882
epoch: 25
loss: 0.313635
epoch: 26
loss: 0.313497
epoch: 27
loss: 0.313416
epoch: 28
loss: 0.313369
epoch: 29
loss: 0.313338
epoch: 30
loss: 0.313318
epoch: 31
loss: 0.313305
epoch: 32
loss: 0.313296
epoch: 33
loss: 0.313289
epoch: 34
loss: 0.313284
epoch: 35
loss: 0.313280
epoch: 36
loss: 0.313277
epoch: 37
loss: 0.313275
epoch: 38
loss: 0.313273
epoch: 39
loss: 0.313272
epoch: 40
loss: 0.3
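
The loss in this log levels off near 0.3133 rather than approaching 0. One likely explanation, sketched below, is that `nn.CrossEntropyLoss` applies a log-softmax to its input, so feeding it probabilities that already went through a softmax bounds the loss from below. For a perfect prediction `[1, 0]` with target `[1, 0]` the loss is $\log(1 + e^{-1}) \approx 0.3133$. The sketch assumes a PyTorch version that accepts class-probability targets, as the training code above does:

```python
import math
import torch
from torch import nn

loss_fn = nn.CrossEntropyLoss()

perfect_probs = torch.tensor([[1.0, 0.0]])  # already-normalized "perfect" prediction
target = torch.tensor([[1.0, 0.0]])         # class-probability target, as in the training set

floor = loss_fn(perfect_probs, target).item()  # loss when probabilities are passed as input
expected = math.log(1 + math.exp(-1))          # closed form of the same quantity

# Passing unnormalized scores (logits) instead lets the loss approach 0.
logits = torch.tensor([[20.0, -20.0]])
logit_loss = loss_fn(logits, target).item()
```

The predictions can still be correct in this setting; only the reported loss value is bounded away from 0.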

In [282]:
model(torch.tensor([0.,1.]))>0.5



tensor([False,  True])
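
The check above covers a single input. A sketch of a fuller check, assuming a hypothetical helper `xor_accuracy` that accepts any callable mapping a batch of inputs to class probabilities, compares the arg-max prediction with the expected XOR value for all four inputs:

```python
import torch
import torch.nn.functional as F

def xor_accuracy(model):
    """Fraction of the four XOR inputs that `model` classifies correctly."""
    inputs = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
    expected = torch.tensor([0, 1, 1, 0])   # x1 XOR x2
    with torch.no_grad():
        predicted = model(inputs).argmax(dim=-1)
    return (predicted == expected).float().mean().item()

def ideal_model(batch):
    """Stand-in for a trained network: returns exact one-hot XOR outputs."""
    labels = (batch[:, 0] != batch[:, 1]).long()
    return F.one_hot(labels, num_classes=2).float()

def constant_model(batch):
    """Stand-in that always predicts class 0."""
    return torch.tensor([[1., 0.]]).repeat(len(batch), 1)

accuracy = xor_accuracy(ideal_model)
baseline = xor_accuracy(constant_model)
```

With the trained network from above one would call `xor_accuracy(model)`; a value of 1.0 means all four cases are classified correctly.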