## A.I. Assignment 5

## Learning Goals

By the end of this lab, you should be able to:
* Work more comfortably with tensors in PyTorch
* Create a simple multilayer perceptron model with PyTorch
* Visualise the parameters


### Task

Build a fully connected feed-forward network that adds two bits. Determine a proper architecture for this network (which dataset do you use for this problem? how many layers? how many neurons in each layer? what is the activation function? what is the loss function? etc.)

Create at least 3 such networks and compare their performance (how accurate are they? how fast do they train to reach an accuracy of 1?)

Display the weights of each layer for the best one.
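
Adding two bits is the classic half adder: the sum bit is the XOR of the inputs and the carry bit is their AND. As a minimal sketch in plain Python (the helper name `half_adder` is only illustrative and is not used later in the notebook), the four input/target pairs the network has to learn can be listed as follows:

```python
def half_adder(a, b):
    """Return [carry, sum] for two input bits."""
    carry = a & b   # 1 only when both bits are 1
    total = a ^ b   # 1 when exactly one bit is 1
    return [carry, total]

# Enumerate all four input combinations and their two-bit results.
truth_table = {(a, b): half_adder(a, b) for a in (0, 1) for b in (0, 1)}
for bits, result in truth_table.items():
    print(bits, "->", result)
```

Because the sum bit is an XOR, it is not linearly separable, which is one reason a hidden layer is needed for this task.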


In [8]:
import torch
import torch.nn as nn
from collections import OrderedDict
from datetime import datetime

In [18]:
# The dataset contains only 0s and 1s because the task is to add two bits.
# Each network has three parts: an input layer, one hidden layer, and an output layer.

# MODEL 1: a feed-forward network with 2 input neurons, 32 hidden neurons, and 2 output neurons.
# The hidden layer uses ReLU (simple and effective); the output layer uses a sigmoid,
# since each output bit is a binary classification.
# MODEL 2 and MODEL 3: the same layout with 128 and 16 hidden neurons, respectively;
# MODEL 3 uses a sigmoid instead of ReLU in the hidden layer.

model1 = nn.Sequential(OrderedDict([
    ('hidden', nn.Linear(2, 32)),
    ('hidden_activation', nn.ReLU()),
    ('output', nn.Linear(32, 2)),
    ('output_activation', nn.Sigmoid())
]))

# Widening the hidden layer (from 32 to 128) lets the model potentially capture more complex relationships in the data.
model2 = nn.Sequential(OrderedDict([
    ('hidden', nn.Linear(2,128)),
    ('hidden_act', nn.ReLU()),
    ('output', nn.Linear(128,2)),
    ('output_act', nn.Sigmoid())
]))
model3 = nn.Sequential(OrderedDict([
    ('hidden', nn.Linear(2,16)),
    ('hidden_activation', nn.Sigmoid()),
    ('output', nn.Linear(16,2)),
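    # NOTE: this key repeats 'hidden_activation' from above, so the OrderedDict keeps only
    # three entries and model3 ends up without an output activation (see the printed model).
    ('hidden_activation', nn.Sigmoid())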
]))

In [19]:
print(model1)
print(model2)
print(model3)

Sequential(
  (hidden): Linear(in_features=2, out_features=32, bias=True)
  (hidden_activation): ReLU()
  (output): Linear(in_features=32, out_features=2, bias=True)
  (output_activation): Sigmoid()
)
Sequential(
  (hidden): Linear(in_features=2, out_features=128, bias=True)
  (hidden_act): ReLU()
  (output): Linear(in_features=128, out_features=2, bias=True)
  (output_act): Sigmoid()
)
Sequential(
  (hidden): Linear(in_features=2, out_features=16, bias=True)
  (hidden_activation): Sigmoid()
  (output): Linear(in_features=16, out_features=2, bias=True)
)
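
The printed structure of `model3` has only three entries, even though four were passed in. This follows from how `OrderedDict` treats a repeated key: the later value overwrites the earlier one, and the key keeps its original position. A small sketch with placeholder strings instead of real layers (the strings are illustrative only):

```python
from collections import OrderedDict

layers = OrderedDict([
    ('hidden', 'Linear(2, 16)'),
    ('hidden_activation', 'first Sigmoid'),
    ('output', 'Linear(16, 2)'),
    ('hidden_activation', 'second Sigmoid'),  # same key as the second entry
])

# Only three keys survive; the repeated key keeps its original slot.
for name, layer in layers.items():
    print(name, "->", layer)
```

Giving the last entry a distinct name such as `'output_activation'` would keep all four layers, at the cost of changing the training results recorded in this notebook.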


In [11]:
# data_in holds every possible combination of two bits (0 and 1)
data_in = torch.tensor([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=torch.float32)

print(data_in)

tensor([[0., 0.],
        [0., 1.],
        [1., 0.],
        [1., 1.]])


In [13]:
# data_target holds the result of adding the two bits of each row, as [carry, sum]
data_target = torch.tensor([[0,0], [0,1], [0,1], [1,0]], dtype=torch.float32)
print(data_target)

tensor([[0., 0.],
        [0., 1.],
        [0., 1.],
        [1., 0.]])


In [20]:
# Each model is paired with a different loss function:
# - BCEWithLogitsLoss() (model 1): binary cross-entropy that applies a sigmoid internally.
#   model1 already ends in nn.Sigmoid(), so the sigmoid is applied twice here;
#   nn.BCELoss() is the variant that expects probabilities.
# - MSELoss() (model 2): the mean squared error between each element of the prediction and the target.
# - L1Loss() (model 3), also known as absolute error loss: the mean absolute difference between
#   the prediction and the target over all elements.
# In every case the loss measures the error between the predicted output and the target output.
criterion1 = nn.BCEWithLogitsLoss()
criterion2 = nn.MSELoss()
criterion3 = nn.L1Loss()
optimizer1 = torch.optim.SGD(model1.parameters(), lr=0.01)
optimizer2 = torch.optim.SGD(model2.parameters(), lr=0.01)
optimizer3 = torch.optim.SGD(model3.parameters(), lr=0.01)
models = [model1, model2, model3]
criterions = [criterion1,criterion2,criterion3]
optimizers = [optimizer1,optimizer2,optimizer3]

# We use stochastic gradient descent (SGD) as the optimization algorithm with a learning rate of 0.01 for all models.
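
`nn.BCEWithLogitsLoss` applies a sigmoid internally, while `model1` already ends in `nn.Sigmoid()`. The sketch below uses plain Python instead of PyTorch to estimate the lowest mean loss reachable in that setup, assuming the loss only ever receives values between 0 and 1. The function `bce_with_logits` is a hand-written stand-in for the formula, not the library implementation.

```python
import math

def bce_with_logits(z, y):
    """Binary cross-entropy on a raw score z, with the sigmoid applied inside."""
    p = 1.0 / (1.0 + math.exp(-z))
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# Flattened targets from data_target: [carry, sum] for inputs 00, 01, 10, 11.
targets = [0, 0, 0, 1, 0, 1, 1, 0]

# If the loss is fed sigmoid outputs, every "score" is confined to (0, 1).
# The best case is then a score near 0 for target 0 and near 1 for target 1.
best_case = [bce_with_logits(float(y), y) for y in targets]
loss_floor = sum(best_case) / len(best_case)
print(f"lowest reachable mean loss: {loss_floor:.4f}")
```

Under this assumption the loss cannot approach zero, however long the model trains. Pairing the sigmoid output with `nn.BCELoss()`, or dropping the final sigmoid and keeping `nn.BCEWithLogitsLoss()`, avoids the double sigmoid.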

In [15]:
# Train the model
# We train each model for 1000 epochs. During training, the loss is computed with the chosen loss function,
# and the optimizer updates the model parameters to minimize this loss.
def train(model, inputs, outputs, criterion, optimizer):
    for epoch in range(1000):
        optimizer.zero_grad()
        output = model(inputs)
        loss = criterion(output, outputs)
        loss.backward()
        optimizer.step()
        if (epoch + 1) % 100 == 0:
            print(f"Epoch [{epoch+1}/1000], Loss: {loss.item():.4f}")
            
# In the next part, we train each model
accuracies = []

for i in range(3):
    print("Model", i+1, ": ")
    start_time = datetime.now()
    train(models[i],data_in, data_target, criterions[i], optimizers[i])
    
    # Next, we compute the accuracy: after training, each model's output is thresholded at 0.5
    # and compared to the target. The accuracy is the fraction of correctly predicted output bits.
    outputs = models[i](data_in)
    predicted = (outputs >= 0.5).float()

    accuracy = (predicted == data_target).float().mean()
    end_time = datetime.now()

    # An accuracy of 1 means the model learned the addition for all input combinations (00, 01, 10, 11).
    # The reported time covers the full 1000-epoch run plus evaluation, not the moment accuracy first reached 1.
    if accuracy == 1:
        print("The time for model ", i+1, "to get accuracy 1 was", end_time - start_time)
        
    accuracies.append(accuracy)
    

Model 1 : 
Epoch [100/1000], Loss: 0.7760
Epoch [200/1000], Loss: 0.7695
Epoch [300/1000], Loss: 0.7635
Epoch [400/1000], Loss: 0.7578
Epoch [500/1000], Loss: 0.7526
Epoch [600/1000], Loss: 0.7478
Epoch [700/1000], Loss: 0.7434
Epoch [800/1000], Loss: 0.7393
Epoch [900/1000], Loss: 0.7356
Epoch [1000/1000], Loss: 0.7323
Model 2 : 
Epoch [100/1000], Loss: 0.2183
Epoch [200/1000], Loss: 0.2025
Epoch [300/1000], Loss: 0.1903
Epoch [400/1000], Loss: 0.1797
Epoch [500/1000], Loss: 0.1703
Epoch [600/1000], Loss: 0.1620
Epoch [700/1000], Loss: 0.1545
Epoch [800/1000], Loss: 0.1477
Epoch [900/1000], Loss: 0.1414
Epoch [1000/1000], Loss: 0.1356
The time for model  2 to get accuracy 1 was 0:00:00.462541
Model 3 : 
Epoch [100/1000], Loss: 0.3750
Epoch [200/1000], Loss: 0.3749
Epoch [300/1000], Loss: 0.3749
Epoch [400/1000], Loss: 0.3749
Epoch [500/1000], Loss: 0.3749
Epoch [600/1000], Loss: 0.3748
Epoch [700/1000], Loss: 0.3749
Epoch [800/1000], Loss: 0.3748
Epoch [900/1000], Loss: 0.3748
Epoch [
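
The timing printed above covers a fixed 1000-epoch run. To measure how quickly a network reaches an accuracy of 1, one option is to check the predictions after every update and stop at the first epoch where all output bits match. The sketch below is a possible variant, not the notebook's method: `epochs_to_full_accuracy` is a hypothetical helper, the `max_epochs` limit and the seed are arbitrary, and the model mirrors the layout of `model2`.

```python
import torch
import torch.nn as nn

def epochs_to_full_accuracy(model, inputs, targets, criterion, optimizer, max_epochs=5000):
    """Train until every output bit is correct; return the epoch count, or None."""
    for epoch in range(1, max_epochs + 1):
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
        # Evaluate without tracking gradients and threshold at 0.5.
        with torch.no_grad():
            predicted = (model(inputs) >= 0.5).float()
        if torch.equal(predicted, targets):
            return epoch
    return None  # accuracy 1 was not reached within max_epochs

torch.manual_seed(0)  # arbitrary seed, only for repeatability
inputs = torch.tensor([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=torch.float32)
targets = torch.tensor([[0, 0], [0, 1], [0, 1], [1, 0]], dtype=torch.float32)
model = nn.Sequential(nn.Linear(2, 128), nn.ReLU(), nn.Linear(128, 2), nn.Sigmoid())
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

result = epochs_to_full_accuracy(model, inputs, targets, nn.MSELoss(), optimizer)
print("epochs needed:", result)
```

Counting epochs instead of wall-clock time makes the comparison between models independent of machine load.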

In [16]:
for accuracy in accuracies:
    print("Accuracy: {:.2f}%".format(accuracy.item()*100))

Accuracy: 62.50%
Accuracy: 100.00%
Accuracy: 62.50%
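
The accuracy here is computed per output bit, not per input row. As a rough check in plain Python, a hypothetical model that always predicts 0 for both bits already gets 5 of the 8 target bits right:

```python
# Targets from data_target, one [carry, sum] pair per input row.
targets = [[0, 0], [0, 1], [0, 1], [1, 0]]

# Hypothetical constant prediction: every output bit is 0.
predicted = [[0, 0] for _ in targets]

# Element-wise accuracy, as computed by (predicted == data_target).float().mean().
bits = [(p, t) for p_row, t_row in zip(predicted, targets) for p, t in zip(p_row, t_row)]
elementwise = sum(p == t for p, t in bits) / len(bits)

# Per-sample accuracy: a row counts only if both of its bits are right.
per_sample = sum(p_row == t_row for p_row, t_row in zip(predicted, targets)) / len(targets)

print(f"element-wise: {elementwise:.2%}, per-sample: {per_sample:.2%}")
```

The element-wise value matches the 62.50% reported for models 1 and 3. That suggests, but does not prove, that those two models output values below 0.5 for every input.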


In [17]:
# Now we display the weights of the best-performing model (the one with the highest accuracy).
# Note: best_model[0] is the hidden layer, so only the first layer's weights are printed.
best_model = models[accuracies.index(max(accuracies))]
print("The weights of the best model are: ", best_model[0].weight)

The weights of the best model are:  Parameter containing:
tensor([[ 0.4900, -0.3621],
        [ 0.0456,  0.1005],
        [-0.3126,  0.1588],
        [-0.3836, -0.0766],
        [ 0.5458,  0.2008],
        [-0.2888, -0.6026],
        [-0.6038,  0.4639],
        [ 0.4141,  0.6408],
        [-0.4101, -0.4607],
        [-0.4255,  0.3358],
        [-0.6347,  0.5502],
        [ 0.3085, -0.5927],
        [ 0.0680,  0.4877],
        [ 0.1634,  0.2946],
        [-0.0496, -0.4931],
        [-0.1507, -0.5999],
        [-0.4730,  0.0181],
        [-0.1828, -0.2440],
        [ 0.0757, -0.3814],
        [-0.3157, -0.4939],
        [ 0.6544,  0.0960],
        [-0.2633,  0.4348],
        [-0.2384,  0.6393],
        [-0.3434,  0.4190],
        [-0.0185,  0.4230],
        [-0.6746,  0.3041],
        [-0.4605, -0.6994],
        [-0.6107, -0.1823],
        [-0.2228,  0.1888],
        [-0.6613, -0.1034],
        [ 0.0067,  0.3856],
        [-0.0564, -0.1842],
        [ 0.3779,  0.4395],
        [-0.1584, 
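
The task asks for the weights of each layer, while the cell above prints only the hidden layer's weight matrix. One way to list every weight and bias is `named_parameters()`. The sketch below uses a freshly initialised stand-in with the same layout as `model2` (named `example_model` here, an illustrative name); in the notebook, `best_model` would take its place.

```python
import torch.nn as nn
from collections import OrderedDict

# Stand-in with the same layout as model2; its values are untrained.
example_model = nn.Sequential(OrderedDict([
    ('hidden', nn.Linear(2, 128)),
    ('hidden_act', nn.ReLU()),
    ('output', nn.Linear(128, 2)),
    ('output_act', nn.Sigmoid()),
]))

# named_parameters() yields (name, tensor) pairs for every weight and bias.
shapes = {name: tuple(param.shape) for name, param in example_model.named_parameters()}

for name, param in example_model.named_parameters():
    print(name, tuple(param.shape))
    print(param.data)
```

Activation layers such as `nn.ReLU` and `nn.Sigmoid` have no parameters, so only the two `nn.Linear` layers appear in the listing.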