## A.I. Assignment 5

## Learning Goals

By the end of this lab, you should be able to:
* Work more confidently with tensors in PyTorch
* Create a simple multilayer perceptron model with PyTorch
* Visualise the parameters


### Task

Build a fully connected feed-forward network that adds two bits. Determine a proper architecture for this network (which dataset do you use for this problem? how many layers? how many neurons in each layer? what is the activation function? what is the loss function? etc.)

Create at least 3 such networks and compare their performance (how accurate are they? how fast do they train to reach an accuracy of 1?)

For the best network, display the weights of each layer.


In [250]:
import torch
import torch.nn as nn
from collections import OrderedDict

In [251]:
# input layer: 2 neurons
# hidden layer: 32 neurons
# output layer: 2 neurons
# relu activation -> tanh activation
model1 = nn.Sequential(OrderedDict([
    ('hidden_net', nn.Linear(2,32)),
    ('hidden_act', nn.ReLU()),
    ('output_net', nn.Linear(32,2)),
    ('output_act', nn.Tanh())
]))

# input layer: 2 neurons
# hidden layer: 16 neurons
# output layer: 2 neurons
# relu activation -> sigmoid activation
model2 = nn.Sequential(OrderedDict([
    ('hidden_net', nn.Linear(2,16)),
    ('hidden_act', nn.ReLU()),
    ('output_net', nn.Linear(16,2)),
    ('output_act', nn.Sigmoid())
]))

# input layer: 2 neurons
# hidden layer: 64 neurons
# output layer: 2 neurons
# sigmoid activation -> sigmoid activation
model3 = nn.Sequential(OrderedDict([
    ('hidden_net', nn.Linear(2,64)),
    ('hidden_act', nn.Sigmoid()),
    ('output_net', nn.Linear(64,2)),
    ('output_act', nn.Sigmoid())
]))

In [252]:
print(model1)
print(model2)
print(model3)

Sequential(
  (hidden_net): Linear(in_features=2, out_features=32, bias=True)
  (hidden_act): ReLU()
  (output_net): Linear(in_features=32, out_features=2, bias=True)
  (output_act): Tanh()
)
Sequential(
  (hidden_net): Linear(in_features=2, out_features=16, bias=True)
  (hidden_act): ReLU()
  (output_net): Linear(in_features=16, out_features=2, bias=True)
  (output_act): Sigmoid()
)
Sequential(
  (hidden_net): Linear(in_features=2, out_features=64, bias=True)
  (hidden_act): Sigmoid()
  (output_net): Linear(in_features=64, out_features=2, bias=True)
  (output_act): Sigmoid()
)
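To compare the three architectures by size as well as by accuracy, it can help to count their trainable parameters. The sketch below uses plain Python and two hypothetical helpers (`linear_param_count` and `mlp_param_count` are not part of the notebook); it assumes each `nn.Linear` layer holds `in_features * out_features` weights plus `out_features` biases, which matches the `bias=True` shown in the printed models.

```python
# Count trainable parameters of a fully connected network from its layer sizes.
# Helper names are illustrative only; they do not come from the notebook.

def linear_param_count(in_features, out_features, bias=True):
    """Weights plus (optionally) biases of one fully connected layer."""
    return in_features * out_features + (out_features if bias else 0)

def mlp_param_count(layer_sizes):
    """Sum the parameters of consecutive layers, e.g. [2, 32, 2]."""
    return sum(
        linear_param_count(a, b)
        for a, b in zip(layer_sizes, layer_sizes[1:])
    )

# Layer sizes taken from the three models defined above.
architectures = {
    "model1": [2, 32, 2],
    "model2": [2, 16, 2],
    "model3": [2, 64, 2],
}

for name, sizes in architectures.items():
    print(name, mlp_param_count(sizes))
```

The activation layers (`ReLU`, `Tanh`, `Sigmoid`) add no parameters, so only the two linear layers contribute to each total.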


In [253]:
# input data
data_in = torch.tensor([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=torch.float)
print(data_in)

tensor([[0., 0.],
        [0., 1.],
        [1., 0.],
        [1., 1.]])


In [254]:
# expected output data
data_target = torch.tensor([[0, 0], [0, 1], [0, 1], [1, 0]], dtype=torch.float)
print(data_target)

tensor([[0., 0.],
        [0., 1.],
        [0., 1.],
        [1., 0.]])
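Each target row appears to encode the sum of the two input bits as a two-bit number, `[carry, sum]`, which is the truth table of a half adder. The following plain-Python sketch derives the same rows from integer addition; the helper name `half_adder_rows` is an illustration and not part of the notebook.

```python
# Derive the half-adder truth table from ordinary integer addition.
# The helper name is hypothetical; the notebook hard-codes these rows instead.

def half_adder_rows():
    rows = []
    for a in (0, 1):
        for b in (0, 1):
            # a + b is 0, 1 or 2; split it into a high bit (carry) and a low bit (sum)
            carry, low = divmod(a + b, 2)
            rows.append(((a, b), (carry, low)))
    return rows

for (a, b), (carry, low) in half_adder_rows():
    print(f"{a} + {b} -> carry={carry}, sum={low}")
```

The rows come out in the same order as `data_in`, so the second element of each pair lines up with the corresponding row of `data_target`.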


In [255]:
# loss functions:
### model 1: nn.MSELoss() - Mean Squared Error loss; the mean squared difference between the predicted and target outputs
### model 2: nn.CrossEntropyLoss() - meant for multi-class classification (it expects raw logits and one class, or a class distribution, per sample), so it is a questionable fit for two independent output bits
### model 3: nn.L1Loss() - Mean Absolute Error loss; the mean absolute difference between the predicted and target outputs

# optimizers:
### model 1: Adam
### model 2: SGD (Stochastic Gradient Descent) 
### model 3: SGD

criterion1 = nn.MSELoss()
optimizer1 = torch.optim.Adam(model1.parameters(), lr=0.01)
criterion2 = nn.CrossEntropyLoss()
optimizer2 = torch.optim.SGD(model2.parameters(), lr=0.01)
criterion3 = nn.L1Loss()
optimizer3 = torch.optim.SGD(model3.parameters(), lr=0.01)
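One point worth checking about the loss chosen for model 2: when `nn.CrossEntropyLoss` is given floating-point targets with the same shape as its input, it treats each target row as class probabilities and computes `-sum(target * log_softmax(input))`. Under that assumption, a target row of `[0, 0]` contributes no loss at all, whatever the network predicts. The plain-Python sketch below illustrates this with two hypothetical helpers and compares it with a per-bit binary cross-entropy, the formula behind `nn.BCELoss`. It reimplements the formulas for illustration and is not the library code.

```python
import math

# Illustrative re-implementations; helper names are not from the notebook.

def soft_label_cross_entropy(logits, target):
    """-sum(target * log_softmax(logits)) for a single sample."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return -sum(t * (x - log_z) for x, t in zip(logits, target))

def binary_cross_entropy(probs, target):
    """Mean per-bit binary cross-entropy for a single sample."""
    terms = [
        t * math.log(p) + (1 - t) * math.log(1 - p)
        for p, t in zip(probs, target)
    ]
    return -sum(terms) / len(terms)

# Target row for the input (0, 0): both output bits should be 0.
target = [0.0, 0.0]

# Any prediction gives zero soft-label cross-entropy for this row ...
print(soft_label_cross_entropy([0.9, 0.1], target))
# ... while the per-bit loss still penalises outputs far from 0.
print(binary_cross_entropy([0.7, 0.7], target))
```

If this reading is right, the `[0, 0]` row gives model 2 no training signal, which would be consistent with that row being the one it gets wrong in the results, although other factors such as the small SGD learning rate may also play a part.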

In [256]:
def train(model, inputs, outputs, criterion, optimizer):
    for step in range(10000):
        # clears the gradients of all optimized tensors
        optimizer.zero_grad()

        # computes the loss by comparing the model's predictions (model(inputs)) with the actual outputs (outputs)
        loss = criterion(model(inputs), outputs)

        # computes gradients of the loss with respect to model parameters
        loss.backward()

        # updates the model parameters based on the gradients computed previously
        optimizer.step()
        
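        # stop as soon as every thresholded output bit matches its target
        if ((model(inputs) >= 0.5) == outputs).float().mean() == 1:
            return step

    # step budget exhausted without reaching 100% accuracy
    return None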
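As a variation on the loop above, the sketch below runs the accuracy check under `torch.no_grad()` so that the extra forward pass does not build a gradient graph, and makes the "never converged" case explicit. The function name `train_until_exact`, the `max_steps` parameter, the small 8-neuron network, the `nn.BCELoss` criterion and the learning rate are all assumptions chosen for illustration, not the notebook's settings.

```python
import torch
import torch.nn as nn

def train_until_exact(model, inputs, targets, criterion, optimizer, max_steps=10000):
    """Return the first step at which all thresholded bits match, else None."""
    for step in range(max_steps):
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()

        # Evaluation only: no gradients are needed for the stopping check.
        with torch.no_grad():
            predicted = (model(inputs) >= 0.5).float()
        if torch.equal(predicted, targets):
            return step
    return None

torch.manual_seed(0)  # fixed seed so repeated runs start from the same weights
inputs = torch.tensor([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=torch.float)
targets = torch.tensor([[0, 0], [0, 1], [0, 1], [1, 0]], dtype=torch.float)

# Hypothetical network and settings, chosen only for this example.
net = nn.Sequential(nn.Linear(2, 8), nn.ReLU(), nn.Linear(8, 2), nn.Sigmoid())
step = train_until_exact(
    net, inputs, targets, nn.BCELoss(),
    torch.optim.Adam(net.parameters(), lr=0.05),
)

if step is None:
    print("No exact match within the step budget")
else:
    print(f"Exact match at step {step}")
```

Handling `None` at the call site avoids reporting a convergence step for a model that never converged.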

In [257]:
for model, criterion, optimizer in [
    (model1, criterion1, optimizer1),
    (model2, criterion2, optimizer2),
    (model3, criterion3, optimizer3),
]:
    step = train(model, data_in, data_target, criterion, optimizer)
    outputs = model(data_in)

    outputs_pairs = outputs.view(outputs.size(0), -1, 2)
    data_target_pairs = data_target.view(data_target.size(0), -1, 2)

    print(f'Predicted sum: \n{(outputs_pairs >= 0.5).float()}')
    print(f'Target sum: \n{data_target_pairs}')
    
    # Compare pairs of elements
    accuracy = ((outputs_pairs >= 0.5).float() == data_target_pairs).all(dim=-1).float().mean()
    print(f'Got 100% accuracy at step {step}')
    print(f'Training Accuracy: {accuracy.item()*100}%')

Predicted sum: 
tensor([[[0., 0.]],

        [[0., 1.]],

        [[0., 1.]],

        [[1., 0.]]])
Target sum: 
tensor([[[0., 0.]],

        [[0., 1.]],

        [[0., 1.]],

        [[1., 0.]]])
Got 100% accuracy at step 20
Training Accuracy: 100.0%
Predicted sum: 
tensor([[[0., 1.]],

        [[0., 1.]],

        [[0., 1.]],

        [[1., 0.]]])
Target sum: 
tensor([[[0., 0.]],

        [[0., 1.]],

        [[0., 1.]],

        [[1., 0.]]])
Got 100% accuracy at step None
Training Accuracy: 75.0%
Predicted sum: 
tensor([[[0., 1.]],

        [[0., 1.]],

        [[0., 1.]],

        [[0., 1.]]])
Target sum: 
tensor([[[0., 0.]],

        [[0., 1.]],

        [[0., 1.]],

        [[1., 0.]]])
Got 100% accuracy at step None
Training Accuracy: 50.0%
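Note that two different accuracy measures are in play: the stopping test inside `train` is element-wise (the fraction of individual bits that match), while the reported training accuracy is row-wise (both bits of a row must match). The plain-Python sketch below, with hypothetical helper names, recomputes both measures from the thresholded predictions printed above for the second and third networks.

```python
# Compare element-wise and row-wise accuracy on the predictions shown above.
# Helper names are illustrative only.

def elementwise_accuracy(pred, target):
    """Fraction of individual bits that match."""
    pairs = [(p, t) for pr, tr in zip(pred, target) for p, t in zip(pr, tr)]
    return sum(p == t for p, t in pairs) / len(pairs)

def rowwise_accuracy(pred, target):
    """Fraction of rows in which every bit matches."""
    return sum(pr == tr for pr, tr in zip(pred, target)) / len(target)

target = [[0, 0], [0, 1], [0, 1], [1, 0]]
pred_model2 = [[0, 1], [0, 1], [0, 1], [1, 0]]  # thresholded output of model 2
pred_model3 = [[0, 1], [0, 1], [0, 1], [0, 1]]  # thresholded output of model 3

for name, pred in [("model2", pred_model2), ("model3", pred_model3)]:
    print(name, elementwise_accuracy(pred, target), rowwise_accuracy(pred, target))
```

The row-wise values agree with the 75% and 50% reported above. The element-wise values are higher, because a row with one wrong bit still counts its correct bit.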


In [145]:
print('Weight of network1 :\n',model1[0].weight)
print('Weight of network2 :\n',model2[0].weight)
print('Weight of network3 :\n',model3[0].weight)

Weight of network1 :
 Parameter containing:
tensor([[-3.2480, -3.2885],
        [ 3.3860,  3.4893],
        [ 4.2936, -3.8407],
        [ 3.6871,  3.6478],
        [-2.9255, -2.9753],
        [-2.9699, -2.9571],
        [ 2.8422,  2.6562],
        [-3.3319, -3.4475],
        [-4.5871, -4.7537],
        [-2.3196, -2.5263],
        [-3.8301, -3.5614],
        [-3.8168, -3.5684],
        [-3.7551,  4.5618],
        [-3.1402, -3.1070],
        [-4.0948, -4.1819],
        [ 2.7616,  2.8866],
        [ 3.2162,  3.2121],
        [-1.8533, -1.9224],
        [-3.3478, -3.6515],
        [ 2.8064,  1.2328],
        [-3.6245, -3.6802],
        [ 3.6965, -4.3792],
        [-3.7593, -3.7607],
        [-3.3812, -3.2490],
        [-2.5035, -2.1806],
        [-3.6973, -3.5702],
        [-0.5385, -2.1092],
        [ 4.2259, -3.8352],
        [ 3.9677,  3.6659],
        [-2.9159, -2.6737],
        [-2.8601, -2.8625],
        [-2.7602, -0.5675]], requires_grad=True)
Weight of network2 :
 Parameter contain