# Homework 1
### Marco Sicklinger, 03/2021

## Question 1

In [18]:
import torch as pt
import random

Defining a more sophisticated `print` function (the one used in the laboratory).

In [34]:
def pretty_print(obj, title=None):
    if title is not None:
        print(title)
    print(obj)
    print("\n")

Creating class for multi-layer perceptrons:

In [35]:
class MultiLayerPerceptron(pt.nn.Module):

    def __init__(self):
        super().__init__()

        # Create members to simulate layers
        self._h_layer_1 = pt.nn.Linear(in_features = 5, out_features = 11, bias = False)
        self._h_layer_2 = pt.nn.Linear(in_features = 11, out_features = 16, bias = False)
        self._h_layer_3 = pt.nn.Linear(in_features = 16, out_features = 13, bias = False)
        self._h_layer_4 = pt.nn.Linear(in_features = 13, out_features = 8, bias = False)
        self._o_layer = pt.nn.Linear(in_features = 8, out_features = 4, bias = False)

    def forward(self, X):

        out = self._h_layer_1(X)
        out = pt.nn.functional.relu(out)

        out = self._h_layer_2(out)
        out = pt.nn.functional.relu(out)

        out = self._h_layer_3(out)
        out = pt.nn.functional.relu(out)

        out = self._h_layer_4(out)
        out = pt.nn.functional.relu(out)

        out = self._o_layer(out)
        # Normalize over the feature axis (last dim) so that batched inputs also work
        out = pt.nn.functional.softmax(out, dim=-1)

        return out

        

## Question 2

Creating instance of `MultiLayerPerceptron` class:

In [36]:
mlp = MultiLayerPerceptron()

#### Print summary with standard method

In [37]:
pretty_print(mlp, "Multi-Layer Perceptron")

Multi-Layer Perceptron
MultiLayerPerceptron(
  (_h_layer_1): Linear(in_features=5, out_features=11, bias=False)
  (_h_layer_2): Linear(in_features=11, out_features=16, bias=False)
  (_h_layer_3): Linear(in_features=16, out_features=13, bias=False)
  (_h_layer_4): Linear(in_features=13, out_features=8, bias=False)
  (_o_layer): Linear(in_features=8, out_features=4, bias=False)
)




#### Print summary with `torchsummary.summary` 

In [38]:
import sys
from torchsummary import summary

In [39]:
summary(mlp)

Layer (type:depth-idx)                   Param #
├─Linear: 1-1                            55
├─Linear: 1-2                            176
├─Linear: 1-3                            208
├─Linear: 1-4                            104
├─Linear: 1-5                            32
Total params: 575
Trainable params: 575
Non-trainable params: 0



## Question 3

### *No bias* case
Since the network has a total of six layers, input and output layers included, we need five matrices to store the weights.  

A first matrix $W^{(1)}$ is needed between the input layer and the first hidden layer. Since the input layer has 5 nodes and the first hidden layer has 11, $W^{(1)}$ has the form
$$
W^{(1)}=
\begin{pmatrix}
w^{(1)}_{1,1} & \dotsm & w^{(1)}_{1,5}\\
\vdots & \ddots & \vdots\\
w^{(1)}_{11,1} & \dotsm & w^{(1)}_{11,5}
\end{pmatrix},
$$
that is, an $11\times5$ matrix.

A second matrix of weights is needed between the first and the second hidden layers: since the former belongs to $\mathbb{R}^{11}$ and the latter to $\mathbb{R}^{16}$, this time we need
$$
W^{(2)}=
\begin{pmatrix}
w^{(2)}_{1,1} & \dotsm & w^{(2)}_{1,11}\\
\vdots & \ddots & \vdots\\
w^{(2)}_{16,1} & \dotsm & w^{(2)}_{16,11}
\end{pmatrix},
$$
that is, a $16\times11$ matrix.

Since hidden layers two and three belong to $\mathbb{R}^{16}$ and $\mathbb{R}^{13}$, respectively, we now need a $13\times16$ matrix:
$$
W^{(3)}=
\begin{pmatrix}
w^{(3)}_{1,1} & \dotsm & w^{(3)}_{1,16}\\
\vdots & \ddots & \vdots\\
w^{(3)}_{13,1} & \dotsm & w^{(3)}_{13,16}
\end{pmatrix}.
$$

By the same reasoning, the remaining layers require
$$
W^{(4)}=
\begin{pmatrix}
w^{(4)}_{1,1} & \dotsm & w^{(4)}_{1,13}\\
\vdots & \ddots & \vdots\\
w^{(4)}_{8,1} & \dotsm & w^{(4)}_{8,13}
\end{pmatrix}
$$
$$
W^{(5)}=
\begin{pmatrix}
w^{(5)}_{1,1} & \dotsm & w^{(5)}_{1,8}\\
\vdots & \ddots & \vdots\\
w^{(5)}_{4,1} & \dotsm & w^{(5)}_{4,8}
\end{pmatrix},
$$
which are $8\times13$ and $4\times8$ matrices, respectively.

The total number of parameters is therefore
$$
N_p=11\cdot5+16\cdot11+13\cdot16+8\cdot13+4\cdot8=575.
$$
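
The count above can be cross-checked with a short pure-Python sketch. The names `layer_sizes` and `count_weights` are illustrative and not part of the original notebook; each weight matrix between consecutive layers is assumed to have shape (outputs, inputs), as in the matrices above.

```python
# Layer widths from input to output, as described above (illustrative name).
layer_sizes = [5, 11, 16, 13, 8, 4]

def count_weights(sizes):
    """Return the weight-matrix shapes and the total number of weights."""
    # One (n_out, n_in) matrix for each pair of consecutive layers.
    shapes = [(n_out, n_in) for n_in, n_out in zip(sizes[:-1], sizes[1:])]
    total = sum(rows * cols for rows, cols in shapes)
    return shapes, total

shapes, total = count_weights(layer_sizes)
print(shapes)  # [(11, 5), (16, 11), (13, 16), (8, 13), (4, 8)]
print(total)   # 575
```

The result agrees with the `Total params: 575` line reported by `summary(mlp)`.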

### *Bias* case
In this case, the number of biases in each hidden layer and in the output layer must be added to the parameter count of the previous case.

Since the first hidden layer belongs to $\mathbb{R}^{11}$, here we need a bias vector
$$
b^{(1)}=
\begin{pmatrix}
b^{(1)}_{1} \\
\vdots \\
b^{(1)}_{11}
\end{pmatrix},
$$
which is an $11\times1$ vector. By a similar argument, the bias vectors for the remaining hidden layers and for the output layer are
$$
b^{(2)}=
\begin{pmatrix}
b^{(2)}_{1} \\
\vdots \\
b^{(2)}_{16}
\end{pmatrix},\,\,\,\,\,
b^{(3)}=
\begin{pmatrix}
b^{(3)}_{1} \\
\vdots \\
b^{(3)}_{13}
\end{pmatrix},\,\,\,\,\,
b^{(4)}=
\begin{pmatrix}
b^{(4)}_{1} \\
\vdots \\
b^{(4)}_{8}
\end{pmatrix},\,\,\,\,\,
b^{(5)}=
\begin{pmatrix}
b^{(5)}_{1} \\
\vdots \\
b^{(5)}_{4}
\end{pmatrix},
$$
which are $16\times1$, $13\times1$, $8\times1$ and $4\times 1$ vectors, respectively.

The total number of parameters then rises to
$$
N_p=575+11\cdot1+16\cdot1+13\cdot1+8\cdot1+4\cdot1=575+52=627.
$$
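
As a sanity check, the same count can be reproduced in PyTorch by rebuilding the layers with `bias=True`. This is only a sketch: the `pt.nn.Sequential` container and the names `sizes` and `mlp_with_bias` are assumptions made for illustration, and the activations are omitted because they hold no parameters.

```python
import torch as pt

# Same layer widths as the network above, but with biases enabled.
sizes = [5, 11, 16, 13, 8, 4]
mlp_with_bias = pt.nn.Sequential(
    *[pt.nn.Linear(n_in, n_out, bias=True)
      for n_in, n_out in zip(sizes[:-1], sizes[1:])]
)

# Count weights and biases separately by parameter name.
n_weights = sum(p.numel() for name, p in mlp_with_bias.named_parameters()
                if name.endswith("weight"))
n_biases = sum(p.numel() for name, p in mlp_with_bias.named_parameters()
               if name.endswith("bias"))
print(n_weights, n_biases, n_weights + n_biases)  # 575 52 627
```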

## Question 4
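
The cell below calls `norm(p)` on each weight tensor. As far as I understand, when no `dim` argument is given PyTorch treats the tensor as a flattened vector, so the printed values are entrywise norms (the 2-norm being the Frobenius norm) rather than induced matrix norms. A minimal pure-Python sketch of the two quantities, using a hypothetical $2\times2$ matrix `W` chosen only for illustration:

```python
import math

# Hypothetical 2x2 weight matrix, chosen only for illustration.
W = [[3.0, -4.0],
     [0.0, 12.0]]

# Flatten the matrix into a single list of entries.
entries = [w for row in W for w in row]

# Entrywise 1-norm: sum of the absolute values of all entries.
norm_1 = sum(abs(w) for w in entries)

# Entrywise 2-norm (Frobenius norm): square root of the sum of squares.
norm_2 = math.sqrt(sum(w * w for w in entries))

print(norm_1)  # 19.0
print(norm_2)  # 13.0
```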

In [40]:
# Print every parameter tensor, followed by its 1-norm and 2-norm
for par_name, par in mlp.state_dict().items():
    print(par_name, par)
    print('\n')
    norm_1 = par.norm(1).item()
    pretty_print(norm_1, "1-norm of tensor {}".format(par_name))
    norm_2 = par.norm(2).item()
    pretty_print(norm_2, "2-norm of tensor {}".format(par_name))
    print('-------------------------------------------------------')
    print('\n')

_h_layer_1.weight tensor([[ 0.1471,  0.3730, -0.0693, -0.1390, -0.2650],
        [ 0.4020, -0.0305, -0.2658,  0.3058,  0.0800],
        [-0.2078, -0.2932,  0.1696,  0.2723, -0.0473],
        [ 0.1108, -0.1379, -0.1138,  0.2811, -0.3071],
        [ 0.3451, -0.2241, -0.3402, -0.2188,  0.3836],
        [-0.0816, -0.2221,  0.3596, -0.3859,  0.3842],
        [ 0.1919, -0.0690, -0.0911, -0.1626, -0.3167],
        [-0.0872,  0.0738,  0.4034,  0.1438,  0.3268],
        [-0.0642,  0.1412,  0.0314,  0.2642, -0.0618],
        [-0.0170, -0.2534, -0.4254,  0.2709,  0.0534],
        [-0.3370, -0.3395, -0.0854, -0.2339, -0.0328]])


1-norm of tensor _h_layer_1.weight
11.44139289855957


2-norm of tensor _h_layer_1.weight
1.7872098684310913


-------------------------------------------------------


_h_layer_2.weight tensor([[ 0.1309,  0.1489,  0.0418, -0.2248, -0.2164, -0.2639, -0.0938, -0.0142,
          0.1847, -0.0436,  0.2053],
        [-0.2421, -0.1001,  0.0068, -0.0866,  0.0682, -0.0189, -0.075