# Welcome to assignment #5!

Please submit your solution to this notebook in the Whiteboard under the corresponding Assignment entry. Upload both the .ipynb file and the exported .pdf of this notebook.

If you have any questions, ask them either in the tutorials or in the "Mattermost" channel. The channel is called SSL_WS_2324; you can join the server using this [Link](https://mattermost.imp.fu-berlin.de/signup_user_complete/?id=h5ssupqokprtpyf4dr7xabiwpc&md=link&sbr=su) and then search for the public channel.


This week we will learn representations using a contrastive loss function.

# Slide Review

[Google Form](https://forms.gle/3DTirLWzpmbatqnV7) for the slide review. Please take a minute to go through the slides again and help improve the lecture.

Please make sure to only choose your top 5 slides per lecture!

# PapagAI

The reflective study has been running since the second week.
Register on the [PapagAI website](https://www.papag.ai) and write your first reflection about your impressions and challenges in the context of the lectures and tutorials you had this week and the previous one. The reflection should be longer than 100 words. You can check out this [YouTube video](https://www.youtube.com/watch?v=QdmZHocZQBk&ab_channel=FernandoRamosL%C3%B3pez) with instructions on how to register, create a reflection and get AI feedback.

Please note that this task is obligatory for this course. Make sure each of you writes a reflection, not only one person per group.

#### Please state both names of your group members here:
Authors: Omar Ahmed and Can Aydin

# Assignment 5: Contrastive Learning

## Ex. 5.1 Supervised model baseline

Implement a small supervised ConvNet and train it on [MNIST](https://pytorch.org/vision/main/generated/torchvision.datasets.MNIST.html) to obtain a baseline accuracy to compare against later; this helps to evaluate the representation quality. Try using similar hyperparameters (e.g., learning rate) for your contrastive learning in 5.2 and 5.3. You may train for 3-5 epochs. **(RESULT)**

In [13]:
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision.transforms as transforms
import torchvision.datasets as datasets
from torch.utils.data import DataLoader, random_split,Dataset
import torch.nn.functional as F
import matplotlib.pyplot as plt
import numpy as np
torch.manual_seed(42)

<torch._C.Generator at 0x7b4074b810d0>

In [14]:
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])
# train=False loads the 10,000-image MNIST test split, which is split 80/20 below
mnist_dataset = datasets.MNIST(root='./data', train=False, download=True, transform=transform)

train_size = int(0.8 * len(mnist_dataset))
test_size = len(mnist_dataset) - train_size

train_dataset, test_dataset = random_split(mnist_dataset, [train_size, test_size])

batch_size = 32

loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=False)

#### The Neural Network

In [15]:
class ConvNeuralNetwork(nn.Module):
  def __init__(self):
    super(ConvNeuralNetwork, self).__init__()
    # Feature extractor: 1x28x28 -> 8x14x14 -> 16x7x7 -> 32x7x7
    self.conv_layers = nn.Sequential(
      nn.Conv2d(1, 8, kernel_size=3, stride=1, padding=1),
      nn.ReLU(),
      nn.MaxPool2d(kernel_size=2, stride=2),
      nn.Conv2d(8, 16, kernel_size=3, stride=1, padding=1),
      nn.ReLU(),
      nn.MaxPool2d(kernel_size=2, stride=2),
      nn.Conv2d(16, 32, kernel_size=3, stride=1, padding=1),
      nn.ReLU(),
    )
    # Classifier head on the flattened 32 * 7 * 7 = 1568 features
    self.linear_layers = nn.Sequential(
      nn.Linear(32 * 7 * 7, 32),
      nn.ReLU(),
      nn.Linear(32, 10),
    )

  def forward(self, x):
    x = self.conv_layers(x)
    x = x.view(x.size(0), -1)  # flatten to (batch, 1568)
    x = self.linear_layers(x)
    return x

In [16]:
model = ConvNeuralNetwork()
optimizer = optim.Adam(model.parameters(), lr = 0.001)
loss_function = nn.CrossEntropyLoss()

In [17]:
train_accuracy = []
epoch_count = []
epochs = 3

for epoch in range(epochs):
    model.train()
    correct = 0
    total = 0

    for data in loader:
        inputs, labels = data
        optimizer.zero_grad()
        y_train = model(inputs)
        loss = loss_function(y_train, labels)
        loss.backward()
        optimizer.step()

        y_train_softmax = F.softmax(y_train,dim=1)
        _, predicted = torch.max(y_train_softmax,1)
        correct += (predicted == labels).sum().item()
        total += labels.size(0)
    train_accuracy_r = correct / total
    train_accuracy.append(train_accuracy_r)
    epoch_count.append(epoch+1)
    print(f"Epoch {epoch}, Train Accuracy: {100 * train_accuracy_r:.2f}%")


Epoch 0, Train Accuracy: 73.95%
Epoch 1, Train Accuracy: 95.04%
Epoch 2, Train Accuracy: 96.64%


## Ex. 5.2 Contrastive Learning

Implement a ConvNet to learn representations in a contrastive fashion for the [MNIST](https://pytorch.org/vision/main/generated/torchvision.datasets.MNIST.html) dataset. 3 Conv layers should be sufficient. You don't need a fully connected layer in the end during training. **(RESULT)**

Test the quality of your representations using a classifier consisting of just one linear layer. What accuracy can you achieve based on your representations? Compare against the accuracy of your supervised model. **(RESULT)**

In [18]:
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,)),
])
augmentation = transforms.Compose([
    transforms.RandomAffine(degrees = 20, translate = (0.1,0.1), scale = (0.9,1.1)),
    transforms.ToTensor(),
    transforms.Normalize((0.5,),(0.5,))
])

mnist_dataset = datasets.MNIST(root='./data', train=False, download=True, transform = transform)
augmented_dataset = datasets.MNIST(root='./data', train = False, download= True, transform = augmentation)

batch_size = 32

loader = DataLoader(mnist_dataset, batch_size=batch_size, shuffle=False)
augmented_loader = DataLoader(augmented_dataset, batch_size=batch_size, shuffle=False)
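
The two loaders above are later consumed in lockstep, which pairs image k of one with image k of the other only because both use `shuffle=False`. A possible alternative, sketched below, is a wrapper dataset that returns both views of the same image from a single index, so that one shuffled loader would suffice. `TwoViewDataset` and the toy data are hypothetical names for illustration. In the notebook the base dataset would be MNIST loaded without a `transform`, with `transform` and `augmentation` passed as the two callables.

```python
import random

class TwoViewDataset:
    """Hypothetical wrapper: returns (plain view, augmented view, label) per index."""

    def __init__(self, base_dataset, base_transform, augment):
        self.base_dataset = base_dataset      # yields (raw_image, label)
        self.base_transform = base_transform  # e.g. ToTensor + Normalize
        self.augment = augment                # e.g. RandomAffine + ToTensor + Normalize

    def __len__(self):
        return len(self.base_dataset)

    def __getitem__(self, index):
        raw_image, label = self.base_dataset[index]
        return self.base_transform(raw_image), self.augment(raw_image), label

# Toy stand-ins so the sketch runs without torchvision: "images" are floats.
rng = random.Random(0)
toy_base = [(float(i), i % 10) for i in range(5)]
toy_pairs = TwoViewDataset(
    toy_base,
    base_transform=lambda x: x,
    augment=lambda x: x + rng.uniform(-0.1, 0.1),
)

plain, augmented, label = toy_pairs[3]
print(len(toy_pairs), plain, label, abs(augmented - plain) <= 0.1)
```

Because the object defines `__len__` and `__getitem__`, it can be handed to `DataLoader` like any map-style dataset.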

In [19]:
class ContrastiveConvNet(nn.Module):
  def __init__(self):
    super(ContrastiveConvNet, self).__init__()
    # Same feature extractor as in 5.1, without the fully connected head
    self.conv_layers = nn.Sequential(
      nn.Conv2d(1, 8, kernel_size=3, stride=1, padding=1),
      nn.ReLU(),
      nn.MaxPool2d(kernel_size=2, stride=2),
      nn.Conv2d(8, 16, kernel_size=3, stride=1, padding=1),
      nn.ReLU(),
      nn.MaxPool2d(kernel_size=2, stride=2),
      nn.Conv2d(16, 32, kernel_size=3, stride=1, padding=1),
      nn.ReLU(),
    )

  def forward(self, x):
    x = self.conv_layers(x)
    x = x.view(x.size(0), -1)  # representation of size 32 * 7 * 7 = 1568
    return x

In [20]:
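def simple_contrastive_loss(z_i,z_j,q):
  # q = 1 marks a positive pair (its distance is minimised),
  # q = 0 marks a negative pair (its distance is maximised)
  distance = F.pairwise_distance(z_i,z_j,p=2)
  loss = q * distance - (1-q) * distance
  return loss.mean()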

In [21]:
model_1 = ContrastiveConvNet()
optimizer = torch.optim.Adam(model_1.parameters(),lr=0.001)

In [22]:
for epoch in range(epochs):
  total_loss = 0
  for (data,_), (aug_data,_) in zip(loader,augmented_loader):
    optimizer.zero_grad()
    loss = 0

    z_i = model_1(data)
    z_j = model_1(aug_data)

    positive_loss = simple_contrastive_loss(z_i,z_j,torch.ones(data.size(0)))
    loss += positive_loss

    for anchor_index in range(data.size(0)):
      for negative_index in range(data.size(0)):
        if anchor_index != negative_index:
          negative_loss = simple_contrastive_loss(z_i[anchor_index].unsqueeze(0), z_i[negative_index].unsqueeze(0),torch.zeros(1))
          loss += negative_loss

    loss.backward()
    optimizer.step()
    total_loss += loss.item()
  average_loss = total_loss/len(loader)
  print(f"Epoch {epoch+1}, Loss: {average_loss}")


Epoch 1, Loss: -15180371.17985091
Epoch 2, Loss: -290139717.07348245
Epoch 3, Loss: -1291932111.9488819
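
The loss values above grow more negative every epoch. That is what the formula predicts: for a negative pair (q = 0) it reduces to the negated distance, which can be lowered without limit by pushing embeddings apart. Below is a small plain-Python check of this arithmetic, where `pair_loss` is a hypothetical scalar restatement of `simple_contrastive_loss`:

```python
import math

def pair_loss(distance, q):
    """Scalar form of q * d - (1 - q) * d for a single pair."""
    return q * distance - (1 - q) * distance

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

anchor = (0.0, 0.0)
negative_losses = []
for scale in (1.0, 10.0, 100.0):
    negative = (3.0 * scale, 4.0 * scale)   # moved farther away each time
    distance = euclidean(anchor, negative)  # 5, 50, 500
    negative_losses.append(pair_loss(distance, q=0))

print(negative_losses)        # the loss keeps falling as the pair separates
print(pair_loss(5.0, q=1))    # positive pair: the loss equals the distance
```

A margin, as asked for in Ex. 5.3, is one way to stop negative pairs from contributing once they are far enough apart.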


In [28]:
class LinearClassifier(nn.Module):
  def __init__(self,i,n):
    super(LinearClassifier,self).__init__()
    self.fc = (nn.Linear(i,n))

  def forward(self,x):
    x = self.fc(x)
    return x

In [34]:
classifier = LinearClassifier(1568 ,10)
classifier_optim = torch.optim.Adam(classifier.parameters(),lr=0.001)

In [35]:
model_1.eval()

ContrastiveConvNet(
  (conv_layers): Sequential(
    (0): Conv2d(1, 8, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU()
    (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(8, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (4): ReLU()
    (5): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (6): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): ReLU()
  )
)

In [36]:
for param in model_1.conv_layers.parameters():
    param.requires_grad = False

In [37]:
model_1.eval()
sample_data, _ = next(iter(loader))
sample_output = model_1(sample_data)
print(sample_output.shape)


torch.Size([32, 1568])
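
A linear probe is normally fitted on the frozen representations before its accuracy is measured. The following is a minimal sketch of such a training loop, not the notebook's actual run: `encoder`, `probe`, `probe_optim` and `probe_loader` are hypothetical stand-ins (a tiny encoder and random MNIST-shaped data) so that the block runs offline. In the notebook they would correspond to `model_1`, `classifier`, `classifier_optim` and a loader over the training split.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Hypothetical stand-ins: a small frozen encoder and random 1x28x28 inputs.
encoder = nn.Sequential(
    nn.Conv2d(1, 4, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(4), nn.Flatten()
)
images = torch.randn(64, 1, 28, 28)
targets = torch.randint(0, 10, (64,))
probe_loader = DataLoader(TensorDataset(images, targets), batch_size=32, shuffle=True)

# Freeze the encoder so that only the probe is trained.
encoder.eval()
for param in encoder.parameters():
    param.requires_grad = False
encoder_before = [param.clone() for param in encoder.parameters()]

probe = nn.Linear(4 * 7 * 7, 10)           # one linear layer on the flat features
probe_before = probe.weight.detach().clone()
probe_optim = torch.optim.Adam(probe.parameters(), lr=0.001)
loss_function = nn.CrossEntropyLoss()

for epoch in range(3):
    probe.train()
    for data, target in probe_loader:
        with torch.no_grad():              # the encoder only provides features
            representations = encoder(data)
        probe_optim.zero_grad()
        loss = loss_function(probe(representations), target)
        loss.backward()
        probe_optim.step()

encoder_unchanged = all(
    torch.equal(before, after)
    for before, after in zip(encoder_before, encoder.parameters())
)
probe_changed = not torch.equal(probe_before, probe.weight)
print(encoder_unchanged, probe_changed)
```

Fitting the probe on one split and evaluating it on a held-out split keeps the reported accuracy comparable to a test accuracy.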


In [38]:
correct = 0
total = 0
classifier.eval()
with torch.no_grad():
  for (data,target) in loader:
    representations = model_1(data)
    outputs = classifier(representations)
    _, predicted = torch.max(outputs.data,1)

    total+= target.size(0)
    correct += (predicted == target).sum().item()
accuracy = 100 * correct / total
print(f'Accuracy {accuracy}%')

Accuracy 9.87%


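The accuracy is at chance level (about 10% for 10 classes). The linear classifier is evaluated with its random initial weights, since `classifier_optim` is created but never used in a training loop, so this number says little about the quality of the representations.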


## Ex. 5.3 Contrastive Loss with Margin (BONUS)

Implement a contrastive loss function with margin. Does this improve your representation quality? Check the accuracy with a classifier like in 5.2. Compare your results with those from 5.1 and 5.2. **(RESULT)**

In [39]:
def contrastive_loss(z_ref, z_pos,z_neg,margin = 1.0):
  positive_distance = F.pairwise_distance(z_ref, z_pos, p = 2)
  negative_distance = F.pairwise_distance(z_ref, z_neg, p = 2)
  loss = torch.clamp(positive_distance - negative_distance + margin, min = 0.0)
  return loss.mean()
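
The function above scores an anchor against one positive and one negative at the same time, a form often called a triplet (hinge) loss. Another common reading of "contrastive loss with margin" is pairwise: positive pairs are penalised by their squared distance, negative pairs only while they are closer than the margin. Both are restated below for scalar distances as a hedged illustration. `triplet_hinge` and `margin_pair_loss` are hypothetical names, and the squared form is one common convention rather than the only one.

```python
def triplet_hinge(positive_distance, negative_distance, margin=1.0):
    """max(0, d_pos - d_neg + margin), the same hinge as contrastive_loss."""
    return max(0.0, positive_distance - negative_distance + margin)

def margin_pair_loss(distance, q, margin=1.0):
    """Pairwise form: q * d^2 + (1 - q) * max(0, margin - d)^2."""
    return q * distance ** 2 + (1 - q) * max(0.0, margin - distance) ** 2

# Triplet: no penalty once the negative is at least `margin` farther than the positive.
print(triplet_hinge(0.5, 2.0))   # 0.0
print(triplet_hinge(0.5, 1.0))   # 0.5

# Pairwise: a far negative costs nothing, a close negative is pushed away.
print(margin_pair_loss(2.0, q=0))    # 0.0
print(margin_pair_loss(0.25, q=0))   # 0.5625
print(margin_pair_loss(0.5, q=1))    # 0.25
```

In both forms the negative term is bounded below by zero, unlike a loss that simply subtracts the negative distance.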

In [40]:
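model_2 = ContrastiveConvNet()
# Optimise model_2 here. The recorded run passed model_1.parameters() instead,
# so the losses printed below come from a model_2 that was never updated.
optimizer = torch.optim.Adam(model_2.parameters(),lr=0.001)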

In [44]:
for epoch in range(epochs):
    total_loss = 0
    for (data, _), (aug_data, _) in zip(loader, augmented_loader):
        optimizer.zero_grad()
        loss = 0
        z_i = model_2(data)
        z_j = model_2(aug_data)
        for anchor_index in range(data.size(0)):
            z_anchor = z_i[anchor_index].unsqueeze(0)
            z_positive = z_j[anchor_index].unsqueeze(0)

            for negative_index in range(data.size(0)):
                if anchor_index != negative_index:
                    z_negative = z_i[negative_index].unsqueeze(0)

                    negative_loss = contrastive_loss(z_anchor, z_positive, z_negative, margin=1.0)
                    loss += negative_loss
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    average_loss = total_loss / len(loader)
    print(f"Epoch {epoch+1}, Loss: {average_loss}")


Epoch 1, Loss: 775.2120826404315
Epoch 2, Loss: 767.4655979144306
Epoch 3, Loss: 769.4723883680642
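
The nested Python loops above issue one loss call per (anchor, negative) pair, up to 32 x 31 = 992 calls per batch. Assuming the intended objective is the plain sum of those hinge terms, the same quantity can be computed with tensor operations. The sketch below uses a hypothetical helper `batch_triplet_loss` and checks it against the explicit loop on small random embeddings. Distances between anchors are computed manually with a small `eps` (an assumption) so that the square root stays differentiable on the diagonal.

```python
import torch
import torch.nn.functional as F

def batch_triplet_loss(z_i, z_j, margin=1.0, eps=1e-12):
    """Sum of max(0, d(a, pos) - d(a, neg) + margin) over all pairs a != neg."""
    batch = z_i.size(0)
    positive = F.pairwise_distance(z_i, z_j, p=2)       # shape (B,)
    diff = z_i.unsqueeze(1) - z_i.unsqueeze(0)          # shape (B, B, D)
    negative = (diff.pow(2).sum(dim=2) + eps).sqrt()    # shape (B, B)
    hinge = torch.clamp(positive.unsqueeze(1) - negative + margin, min=0.0)
    off_diagonal = ~torch.eye(batch, device=z_i.device).bool()
    return hinge[off_diagonal].sum()

torch.manual_seed(0)
z_i = 0.2 * torch.randn(8, 16)           # stand-in embeddings
z_j = z_i + 0.1 * torch.randn(8, 16)     # stand-in augmented views
vectorised = batch_triplet_loss(z_i, z_j)

# Reference: the explicit double loop over anchors and negatives.
looped = torch.zeros(())
for a in range(8):
    d_pos = F.pairwise_distance(z_i[a:a + 1], z_j[a:a + 1], p=2)
    for n in range(8):
        if a != n:
            d_neg = F.pairwise_distance(z_i[a:a + 1], z_i[n:n + 1], p=2)
            looped = looped + torch.clamp(d_pos - d_neg + 1.0, min=0.0).sum()

print(torch.allclose(vectorised, looped, atol=1e-3))
```

The small tolerance accounts for the `eps` that `F.pairwise_distance` adds to each difference, which the manual distance handles slightly differently.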


In [45]:
correct = 0
total = 0
classifier.eval()
with torch.no_grad():
  for (data,target) in loader:
    representations = model_2(data)
    outputs = classifier(representations)
    _, predicted = torch.max(outputs.data,1)

    total+= target.size(0)
    correct += (predicted == target).sum().item()
accuracy = 100 * correct / total
print(f'Accuracy {accuracy}%')

Accuracy 11.62%


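### The model from 5.1 performed best (96.64% training accuracy after 3 epochs). The models from 5.2 and 5.3 both performed poorly with the linear classifier (9.87% and 11.62%); 5.3 was the better of the two, but both values are close to the 10% expected from random guessing on 10 classes.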