# PyTorch - homework 2: neural networks

-- Prof. Dorien Herremans

Please run the whole notebook with your code and submit the `.ipynb` file on eDimension that includes your answers [so after you run it]. 

In [1]:
from termcolor import colored

student_number="1004455"
student_name="Victoria Yong"

print(colored("Homework by "  + student_name + ', number: ' + student_number,'red'))

Homework by Victoria Yong, number: 1004455


## Question 1 -- XOR neural network [3pts]

a) Train an (at least) 2-layer neural network that can solve the XOR problem. Hint: be sure to check both this week's and last week's labs.

b) Check the predictions resulting from your model in the second code box below.


In [2]:
import numpy as np
import pandas as pd

import torch
import torch.nn as nn
import torch.optim as optim

In [3]:
# load your data
feats = torch.Tensor([[0, 0],
                        [0, 1],
                        [1, 0],
                        [1, 1]])
labels = torch.Tensor([[0, 1, 1, 0]]).view(-1, 1)

in_dim = feats.size(1)

# name your model xor
def xor(in_dim=2, out_dim=1):
    # define your model, loss function, optimizer, etc.
    model = nn.Sequential(
        nn.Linear(in_dim, 128),
        nn.Sigmoid(),
        nn.Linear(128, out_dim),
        nn.Sigmoid()
    ).cuda()

    # initialize weights
    for m in model.modules():
        if isinstance(m, nn.Linear):
            m.weight.data.normal_(0, 1)

    return model

#########################################################################################

model = xor(in_dim)
epochs = 1000

loss_func = nn.BCELoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# train the model
for epoch in range(epochs):
    for j in range(feats.size(0)):
        # sample one training example at random (stochastic updates)
        idx = np.random.randint(feats.size(0))
        # plain tensors suffice: torch.autograd.Variable is deprecated since PyTorch 0.4
        x = feats[idx].cuda()
        y = labels[idx].cuda()

        optimizer.zero_grad()

        pred = model(x)
        loss = loss_func(pred, y)
        loss.backward()
        optimizer.step()

    if epoch % 100 == 0:
        print(f"Epoch: {epoch}, Loss: {loss.cpu().data.numpy()}, ")


Epoch: 0, Loss: 0.0027467745821923018, 
Epoch: 100, Loss: 0.5857964754104614, 
Epoch: 200, Loss: 0.3704299330711365, 
Epoch: 300, Loss: 0.7951344847679138, 
Epoch: 400, Loss: 0.3863573670387268, 
Epoch: 500, Loss: 0.3926236629486084, 
Epoch: 600, Loss: 0.3062394857406616, 
Epoch: 700, Loss: 0.20453286170959473, 
Epoch: 800, Loss: 0.18820373713970184, 
Epoch: 900, Loss: 0.18948961794376373, 


In [4]:
# test your model using the following functions (make sure the output is printed and saved when you submit this notebook):
# depending on how you defined your network you may need to slightly tweak the below prediction function

test = [[0,0],[0,1],[1,1],[1,0]]

for trial in test: 
  Xtest = torch.Tensor(trial).cuda()
  y_hat = model(Xtest)

  if y_hat > 0.5:
    prediction = 1
  else: 
    prediction = 0

  print("{0} xor {1} = {2}".format(int(Xtest[0]), int(Xtest[1]), prediction))

0 xor 0 = 0
0 xor 1 = 1
1 xor 1 = 0
1 xor 0 = 1


## Question 2  [2pts]

Imagine a neural network model for a multilabel classification task. 

a) Which loss function should you use?

b) The resulting trained model has a high variance error. Give 4 possible solutions to improve the model. 


```
a) Binary Cross Entropy Loss

b) 
- 1 : Add dropout layers
- 2 : Use data augmentation on training data
- 3 : Implement early stopping
- 4 : Get more training data that is balanced across classes

```
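
To illustrate answer (a): in a multilabel task each label is an independent yes/no decision, so the binary cross entropy is computed per label and then averaged. Below is a minimal pure-Python sketch of that arithmetic. The probabilities, targets, and the helper name `multilabel_bce` are made up for illustration and are not part of the homework. In PyTorch the same loss is available as `nn.BCELoss` (on sigmoid outputs) or `nn.BCEWithLogitsLoss` (on raw logits).

```python
import math

def multilabel_bce(probs, targets):
    """Mean binary cross entropy over every (sample, label) pair."""
    total, count = 0.0, 0
    for p_row, t_row in zip(probs, targets):
        for p, t in zip(p_row, t_row):
            total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
            count += 1
    return total / count

# two samples with three labels each; several labels may be active at once
probs = [[0.9, 0.2, 0.7],
         [0.1, 0.8, 0.6]]
targets = [[1, 0, 1],
           [0, 1, 0]]

loss = multilabel_bce(probs, targets)
print(f"multilabel BCE: {loss:.4f}")
```

Because every label gets its own sigmoid and its own loss term, the predicted probabilities of one sample do not have to sum to 1, unlike with softmax and categorical cross entropy.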


## Question 3 - Improve hit classification [5pts]

Remember the hit prediction dataset from last week? 

a) Improve the model using a multilayer perceptron. 

b) Make sure to run your models on the GPU. 

c) Tweak the hyperparameters, such as the number of nodes or layers, or others. Show two possible configurations, explain which works better, and very briefly explain why this may be the case. 




In [5]:
# load train_data
train_url = 'https://dorax.s3.ap-south-1.amazonaws.com/herremans_hit_1030training.csv'
test_url = 'https://dorax.s3.ap-south-1.amazonaws.com/herremans_hit_1030test.csv'

train_data = pd.read_csv(train_url)
train_labels = train_data['Topclass1030']
train_feats = train_data.drop('Topclass1030', axis=1)



In [6]:
# code your model 1

# MLP
class MLP(nn.Module):
 
  def __init__(self, input_size, num_classes):
    super(MLP, self).__init__()
    self.fc1 = nn.Linear(input_size, 64)
    self.fc2 = nn.Linear(64, 16)
    self.fc3 = nn.Linear(16, num_classes)
   
  def forward(self, x):
    out = self.fc1(x)
    out = torch.relu(out)
    out = self.fc2(out)
    out = torch.relu(out)
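    out = self.fc3(out)
    # NOTE: this ReLU clamps the value to >= 0, so the sigmoid below can only
    # return values in [0.5, 1); without it the full (0, 1) range is reachable
    out = torch.relu(out)
    out = torch.sigmoid(out)
    return out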

# train model
num_outputs = 1
num_input_features = train_data.shape[1] - 1
model1 = MLP(num_input_features, num_outputs).cuda()

lr_rate = 1e-3
loss_function = nn.BCELoss() 
optimizer = torch.optim.SGD(model1.parameters(), lr=lr_rate)
epochs = 50 

for i in range(epochs):
  for j in range(train_data.shape[0]):
      feats = torch.tensor(train_feats.loc[j].values).float().cuda()
      label = torch.tensor([train_labels.loc[j]]).float().cuda()

      optimizer.zero_grad()
      pred = model1(feats)

      loss = loss_function(pred, label)  # already on the GPU, no extra .cuda() needed
      loss.backward() 
      optimizer.step() 

  if i % 10 == 0:
      print (f"Epoch: {i}\t Loss: {loss}")


Epoch: 0	 Loss: 0.6931471824645996
Epoch: 10	 Loss: 0.7532297968864441
Epoch: 20	 Loss: 0.7110101580619812
Epoch: 30	 Loss: 0.6931471824645996
Epoch: 40	 Loss: 0.6931471824645996
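
One possible reading of the repeated loss value 0.6931471824645996: the forward pass applies ReLU to the last linear layer before the sigmoid, and the sigmoid of a non-negative number is never below 0.5. Whenever the last linear layer produces a negative value, the prediction is exactly 0.5, the binary cross entropy equals ln 2 ≈ 0.6931 for either label, and the ReLU passes no gradient back for that sample. The pure-Python sketch below reproduces this arithmetic; the helper names are illustrative and not taken from the notebook.

```python
import math

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bce(p, y):
    """Binary cross entropy for a single prediction p and label y."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# output of the final block (ReLU followed by sigmoid) for a range of pre-activations
pre_activations = [-3.0, -0.5, 0.0, 0.5, 3.0]
for z in pre_activations:
    p = sigmoid(relu(z))
    print(f"z = {z:+.1f} -> prediction = {p:.4f}")

# any negative pre-activation yields p = 0.5, and then the loss is ln 2 for both labels
p_stuck = sigmoid(relu(-1.0))
print(bce(p_stuck, 0), bce(p_stuck, 1), math.log(2))
```

If this is indeed what happens, returning `torch.sigmoid(self.fc3(out))` directly would let the model output probabilities below 0.5 as well.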


In [7]:
# evaluate model 1 (called model1 here)
import pandas as pd 

def run_evaluation(my_model):

  test = pd.read_csv(test_url)
  labels = test.iloc[:,-1]
  test = test.drop('Topclass1030', axis=1)
  testdata = torch.Tensor(test.values)
  testlabels = torch.Tensor(labels.values).view(-1,1).cuda()

  TP = 0
  TN = 0
  FN = 0
  FP = 0

  for i in range(0, testdata.size()[0]): 
    # print(testdata[i].size())
    Xtest = testdata[i].cuda()
    # NOTE: my_model.eval() is never called, so any dropout layer is still active here
    y_hat = my_model(Xtest)
    
    if y_hat > 0.5:
      prediction = 1
    else: 
      prediction = 0

    if (prediction == testlabels[i]):
      if (prediction == 1):
        TP += 1
      else: 
        TN += 1

    else:
      if (prediction == 1):
        FP += 1
      else: 
        FN += 1

  print("True Positives: {0}, True Negatives: {1}".format(TP, TN))
  print("False Positives: {0}, False Negatives: {1}".format(FP, FN))
  rate = TP/(FN+TP)
  print("Class specific accuracy of correctly predicting a hit song is {0}".format(rate))

run_evaluation(model1)

True Positives: 42, True Negatives: 17
False Positives: 12, False Negatives: 8
Class specific accuracy of correctly predicting a hit song is 0.84


In [8]:
# code your model 2

class MLP2(nn.Module):
 
  def __init__(self, input_size, num_classes):
    super(MLP2, self).__init__()
    self.fc1 = nn.Linear(input_size, 32)
    self.fc2 = nn.Linear(32, 8)
    self.fc3 = nn.Linear(8, num_classes)
    self.dropout = nn.Dropout(p=0.75)
   
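  def init_weights(self):
      # initialize weights from N(0, 1); note that this method is never called
      # below, so model2 trains from PyTorch's default initialization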
      for m in self.modules():
          if isinstance(m, nn.Linear):
              m.weight.data.normal_(0, 1)
              
  def forward(self, x):
    out = self.fc1(x)
    out = torch.relu(out)
    out = self.dropout(out)
    out = self.fc2(out)
    out = torch.relu(out)
    out = self.fc3(out)
    # NOTE: as in MLP, the ReLU before the sigmoid limits the output to [0.5, 1)
    out = torch.relu(out)
    out = torch.sigmoid(out)
    return out

# train model
num_outputs = 1
num_input_features = train_data.shape[1] - 1
model2 = MLP2(num_input_features, num_outputs).cuda()

lr_rate = 1e-4
loss_function = nn.BCELoss() 
optimizer = torch.optim.SGD(model2.parameters(), lr=lr_rate, weight_decay=5e-4)

epochs = 100 

for i in range(epochs):
    for j in range(train_data.shape[0]):
        feats = torch.tensor(train_feats.loc[j].values).float().cuda()
        label = torch.tensor([train_labels.loc[j]]).float().cuda()

        optimizer.zero_grad()
        pred = model2(feats)

        loss = loss_function(pred, label)  # already on the GPU, no extra .cuda() needed
        loss.backward() 
        optimizer.step() 

    if i % 10 == 0:
        print (f"Epoch: {i}\t Loss: {loss}")


Epoch: 0	 Loss: 0.763859212398529
Epoch: 10	 Loss: 0.6931471824645996
Epoch: 20	 Loss: 0.8280501961708069
Epoch: 30	 Loss: 0.9652078151702881
Epoch: 40	 Loss: 0.836501955986023
Epoch: 50	 Loss: 0.805893063545227
Epoch: 60	 Loss: 0.9505552649497986
Epoch: 70	 Loss: 0.850154459476471
Epoch: 80	 Loss: 0.7896213531494141
Epoch: 90	 Loss: 0.8565614223480225


In [9]:
# evaluate model 2 (called model2 here)

run_evaluation(model2)

True Positives: 50, True Negatives: 0
False Positives: 29, False Negatives: 0
Class specific accuracy of correctly predicting a hit song is 1.0
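
The reported "class specific accuracy" is TP / (TP + FN), i.e. the recall on the hit class. As a cross-check, the sketch below recomputes it together with overall accuracy and precision from the confusion counts printed by the two evaluation cells. The helper `summarize` is a hypothetical name introduced only for this illustration.

```python
def summarize(tp, tn, fp, fn):
    """Derive common metrics from binary confusion counts."""
    total = tp + tn + fp + fn
    return {
        "recall": tp / (tp + fn),      # the notebook's "class specific accuracy"
        "precision": tp / (tp + fp),   # share of predicted hits that are real hits
        "accuracy": (tp + tn) / total, # share of all songs classified correctly
    }

# counts copied from the evaluation outputs of model1 and model2
model1_metrics = summarize(tp=42, tn=17, fp=12, fn=8)
model2_metrics = summarize(tp=50, tn=0, fp=29, fn=0)

for name, metrics in [("model 1", model1_metrics), ("model 2", model2_metrics)]:
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

With these counts model 2 has the higher recall, while model 1 has the higher overall accuracy and precision, because model 2 predicts every test song as a hit.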


Which works better and why do you think this may be (very briefly)? 


**[your answer here, also please summarise the differences between your two models]**

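From the evaluation, model 2 seemed to perform better on the class-specific accuracy of predicting a hit (1.0 versus 0.84), although it achieves this by predicting every test song as a hit (0 true negatives and 29 false positives), while model 1 classifies 59 of the 79 test songs correctly. Model 1 is a basic multilayer perceptron with two hidden layers of 64 and 16 units. It uses the ReLU activation function to add non-linearities, and the final output is passed through a sigmoid function for binary classification. Model 2 uses smaller hidden layers (32 and 8 units), adds a dropout layer with probability 0.75 after the first hidden layer, lowers the learning rate from 1e-3 to 1e-4, adds a weight decay of 5e-4, and trains for twice as many epochs as model 1 (100 instead of 50).

The dropout layer randomly sets a fraction of the activations to 0 during training, which helps to prevent overfitting.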
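
To make the dropout description concrete, here is a minimal pure-Python sketch of inverted dropout, the variant that `nn.Dropout` implements: during training each activation is zeroed with probability p and the survivors are scaled by 1 / (1 - p), while in evaluation mode the input passes through unchanged. The function name `dropout` and the example values are illustrative assumptions, not code from the notebook.

```python
import random

def dropout(activations, p=0.75, training=True, rng=None):
    """Inverted dropout on a list of activations."""
    if not training:
        return list(activations)  # evaluation mode: identity
    rng = rng or random.Random()
    scale = 1.0 / (1.0 - p)       # keeps the expected activation unchanged
    return [0.0 if rng.random() < p else a * scale for a in activations]

hidden = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
rng = random.Random(0)            # fixed seed so the run is repeatable

train_out = dropout(hidden, p=0.75, training=True, rng=rng)
eval_out = dropout(hidden, p=0.75, training=False)

print("training:  ", train_out)
print("evaluation:", eval_out)
```

In PyTorch the switch between the two behaviours is made with `model.train()` and `model.eval()`; a freshly constructed module starts in training mode.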

Additionally, submit your results [here](https://forms.gle/NtJJEE7Wm5ZRM3Je7) for 'Class specific accuracy of correctly predicting a hit song' and see if you got the best performance of the class! Good luck!