#### Homework 1

1. Build the MLP in the image above using PyTorch (PT) built-ins
2. Provide a calculation of the exact number of parameters of the MLP
   - Do it first supposing that the layers don't have a bias term, then supposing that the bias is present wherever it's possible
3. Calculate the $L_1$ (vectorial) norm and the Frobenius norm for the params of each layer
4. Given 10 random datapoints, feed them into the network. This operation must be done all in one single command and must **not** make use of loops.
   - Given the output of the network, using PyTorch code, find the class of assignment of each datapoint. This also must be done in a single PyTorch command without using loops.
   - Drafting a vector of ground truths (whichever labels you like), provide code for the calculation of the accuracy
     - Tip: first get the number of correct assignments, then...

In [1]:
import torch
!pip install torchinfo
from torchinfo import summary



## 1. Build the MLP in the image above using PT built-ins

In [2]:
class MLP(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = torch.nn.Sequential(
            torch.nn.Linear(5, 11),
            torch.nn.ReLU(),
            torch.nn.Linear(11, 16),
            torch.nn.ReLU(),
            torch.nn.Linear(16, 13),
            torch.nn.ReLU(),
            torch.nn.Linear(13, 8),
            torch.nn.ReLU(),
            torch.nn.Linear(8, 4),
            torch.nn.Softmax(dim=1)
        )

    def forward(self, X):
        return self.layers(X)

  

## 2. Provide calculation for the exact number of parameters of the MLP

Supposing that no bias term is present, the number of weights is the sum, across the layers, of the product of each layer's input and output dimensions (since the architecture is dense). The input neurons carry no weights of their own, so they are not counted as a layer.

$$ \sum_l in_l \times out_l = 5 \times 11 + 11 \times 16 + 16 \times 13 + 13 \times 8 + 8 \times 4 = 575$$

If we consider the bias term, we have to add one weight for every neuron that is not an input node. Writing $n_l$ for the number of neurons in layer $l$, with $l = 0$ the input layer:
$$ \sum_{l \neq 0} n_l = 11+16+13+8+4 = 52$$

The total number of weights will therefore be:
$$\sum_l in_l \times out_l + \sum_{l \neq 0} n_l = 575 + 52 = 627$$
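
As a quick cross-check of the arithmetic above, the count can be reproduced in plain Python from the list of layer widths. This is only a minimal sketch: the names `layer_sizes`, `weights`, and `biases` are illustrative and are not part of the homework code.

```python
# Layer widths read off the architecture: input, four hidden layers, output.
layer_sizes = [5, 11, 16, 13, 8, 4]

# One weight per (input, output) connection between consecutive layers.
weights = sum(n_in * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# One bias per neuron in every layer except the input layer.
biases = sum(layer_sizes[1:])

print(weights, biases, weights + biases)
```

Keeping the widths in a single list makes it easy to redo the count if the architecture changes.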

The model summary in the cell below confirms that the calculation is correct.

In [3]:
model = MLP()
summary(model)

Layer (type:depth-idx)                   Param #
MLP                                      --
├─Sequential: 1-1                        --
│    └─Linear: 2-1                       66
│    └─ReLU: 2-2                         --
│    └─Linear: 2-3                       192
│    └─ReLU: 2-4                         --
│    └─Linear: 2-5                       221
│    └─ReLU: 2-6                         --
│    └─Linear: 2-7                       112
│    └─ReLU: 2-8                         --
│    └─Linear: 2-9                       36
│    └─Softmax: 2-10                     --
Total params: 627
Trainable params: 627
Non-trainable params: 0

## 3. Calculate the L1 (vectorial) norm and the Frobenius norm for the params of each layer

In [8]:
@torch.no_grad()
def get_norm(norm):
  # Norm of each Linear layer's weight matrix (biases are not included)
  return [l.weight.norm(p=norm).item() for l in model.layers
          if isinstance(l, torch.nn.Linear)]

print("L1 norm: ", get_norm(1))
print("Frobenius norm: ", get_norm('fro'))

L1 norm:  [12.455259323120117, 25.966724395751953, 26.8754940032959, 14.715418815612793, 6.213144779205322]
Frobenius norm:  [1.911424160003662, 2.27777099609375, 2.1251652240753174, 1.6577861309051514, 1.2323379516601562]
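
For reference, both norms can be read as operating on the flattened entries of a weight matrix: the vectorial L1 norm sums the absolute values, and the Frobenius norm is the square root of the sum of squares. The pure-Python sketch below illustrates the two definitions on a small, made-up matrix `W` (hypothetical values, not taken from the model).

```python
import math

# Hypothetical 2x2 weight matrix, chosen only for illustration.
W = [[1.0, -2.0],
     [3.0, -4.0]]

# Flatten the matrix into a single list of entries.
entries = [w for row in W for w in row]

l1_norm = sum(abs(w) for w in entries)                   # 1 + 2 + 3 + 4
frobenius_norm = math.sqrt(sum(w * w for w in entries))  # sqrt(1 + 4 + 9 + 16)

print(l1_norm, frobenius_norm)
```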


## 4. Given 10 random datapoints, feed them into the network. This operation must be done all in one single command and must not make use of loops.

In [5]:
y_true = torch.randint(0, 4, (10,))
print("true:", y_true)
y_out = model(torch.rand((10, 5)))
print("out:", y_out)

true: tensor([2, 3, 3, 0, 3, 1, 1, 0, 0, 2])
out: tensor([[0.2323, 0.1684, 0.2854, 0.3139],
        [0.2320, 0.1708, 0.2842, 0.3130],
        [0.2318, 0.1709, 0.2842, 0.3131],
        [0.2318, 0.1703, 0.2845, 0.3134],
        [0.2319, 0.1701, 0.2845, 0.3135],
        [0.2316, 0.1712, 0.2840, 0.3131],
        [0.2325, 0.1682, 0.2856, 0.3137],
        [0.2316, 0.1704, 0.2845, 0.3136],
        [0.2317, 0.1697, 0.2848, 0.3137],
        [0.2320, 0.1703, 0.2844, 0.3133]], grad_fn=<SoftmaxBackward0>)


In [6]:
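def accuracy(y_true, y_pred):
  # Fraction of positions where the predicted class equals the label
  return torch.mean((y_true == y_pred).float())

def get_pred_from_out(y_out):
  # Predicted class = index of the largest output in each row
  return torch.max(y_out, dim=1).indices

def accuracy_from_output(y_true, y_out):
  return accuracy(y_true, get_pred_from_out(y_out))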


In [9]:
print("true:", y_true)
print("pred:", get_pred_from_out(y_out))
print("acc :", accuracy_from_output(y_true, y_out).item())

true: tensor([2, 3, 3, 0, 3, 1, 1, 0, 0, 2])
pred: tensor([3, 3, 3, 3, 3, 3, 3, 3, 3, 3])
acc : 0.30000001192092896
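
The reported accuracy can be sanity-checked by hand from the printed tensors: three of the ten predictions match the labels. The sketch below uses plain Python lists copied from the output above, so it is only a check and not a loop-free PyTorch solution. It also assumes the result is held in single precision, which would explain the trailing digits in the printed value.

```python
import struct

# Values copied from the printed `true` and `pred` tensors above.
y_true = [2, 3, 3, 0, 3, 1, 1, 0, 0, 2]
y_pred = [3] * 10

# Count the matching positions, then divide by the number of datapoints.
correct = sum(t == p for t, p in zip(y_true, y_pred))
acc = correct / len(y_true)

# Round-trip through a 32-bit float to mimic single-precision storage.
acc_float32 = struct.unpack("f", struct.pack("f", acc))[0]

print(correct, acc, acc_float32)
```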
