# Welcome to assignment #6!

Please submit your solution of this notebook in the Whiteboard at the corresponding Assignment entry. We need you to upload the .ipynb-file and the exported .pdf of this notebook.

If you have any questions, ask them in either in the tutorials or in the "Mattermost" channel. The channel is called SSL_WS_2324, you can join the server using this [Link](https://mattermost.imp.fu-berlin.de/signup_user_complete/?id=h5ssupqokprtpyf4dr7xabiwpc&md=link&sbr=su) and can search for the public channel.


This week we want to to implement a PixelRNN on MNIST.

# PapagAI

From the second week onwards we started the reflective study.
Register on the [PapagAI website](https://www.papag.ai) and write your first reflection about your impressions and challenges in the context of the lectures and tutorials you had this and previous week. The size of reflection can be anywhere bigger than 100 words. You can check out this [YouTube video](https://www.youtube.com/watch?v=QdmZHocZQBk&ab_channel=FernandoRamosL%C3%B3pez) with instructions on how to register, create a reflection and get an ai feedback.

Please note, that this task is an obligatory one for this course and make sure each of you does the reflection, not only one person per group.

#### Please state both names of your group members here:
Authors: Omar Ahmed, Can Aydin

# Assignment 6: PixelRNN

Paper: [https://arxiv.org/pdf/1601.06759.pdf](https://arxiv.org/pdf/1601.06759.pdf) <br>
(Probably) Useful Blogpost: [https://towardsdatascience.com/auto-regressive-generative-models-pixelrnn-pixelcnn-32d192911173](https://towardsdatascience.com/auto-regressive-generative-models-pixelrnn-pixelcnn-32d192911173)

## Ex. 6.1 Architecture

Pixel Recurrent Neural Networks perform a generative task using a recurrent network architecture. Working on images, we usually work with 2 spatial dimensions (+ color channel dimension in case of colored images). The paper presents 2 different versions of LSTM layers to make it work, but you don't have to. You are free to use any other recurrent layer and any library you want. Whatever works best for you.

Implement your model architecture here. **(RESULT)**

In [26]:
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torch.utils.data import DataLoader
import torchvision.datasets as datasets
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import random


In [7]:
input_size = 28
sequence_length = 28
num_classes = 10
hidden_size = 256
num_layers = 2
learning_rate = 0.001
batch_size = 64
num_epochs = 2

### BiDirectional LSTM (not required)


In [8]:
class BILSTM(nn.Module):
  def __init__(self, input_size, hidden_size, num_layers, num_classes):
    super (BILSTM, self).__init__()
    self.hidden_size = hidden_size
    self.num_layers = num_layers
    self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first = True,
                        bidirectional = True)
    self.fc = nn.Linear(hidden_size * 2, num_classes)

  def forward(self,x):
    h0 = torch.zeros(self.num_layers*2, x.size(0), self.hidden_size)
    c0 = torch.zeros(self.num_layers*2, x.size(0), self.hidden_size)

    out, _ = self.lstm(x,(h0,c0))
    out = self.fc(out[:,-1,:])

    return out

In [37]:
class yourPixelRNN(nn.Module):
  def __init__(self, input_size, hidden_size, num_layers):
    super(yourPixelRNN, self).__init__()
    self.hidden_size = hidden_size
    self.num_layers = num_layers

    # Row LSTM
    self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first = True)

    # Full FC -> Probability for pixel value
    self.fc = nn.Linear(hidden_size, 1)
  def forward(self,x):
    h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size)
    c0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size)

    out, _ = self.lstm(x,(h0,c0))
    out = self.fc(out)
    return out



## Ex. 6.2 Training on MNIST

Train your model on the [MNIST](https://pytorch.org/vision/main/generated/torchvision.datasets.MNIST.html) dataset. **(RESULT)**

In [39]:
train_dataset = datasets.MNIST(root='dataset/', train = True, transform = transforms.ToTensor(), download = True)
train_loader = DataLoader(dataset = train_dataset, batch_size = batch_size, shuffle = True)

test_dataset = datasets.MNIST(root='dataset/', train = False, transform = transforms.ToTensor(), download = True)
test_loader = DataLoader(dataset = test_dataset, batch_size = batch_size, shuffle = True)

model = yourPixelRNN(28, hidden_size, num_layers)
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr = learning_rate)

for epoch in range(num_epochs):
  for batch_idx, (data, _) in enumerate(train_loader):
    data = data.reshape(data.size(0), 28, 28)
    data = data.float()
    # Taking every sequence in the batch, for each sequence, take all rows except the last one
    input_sequence = data[:, :-1, :]
    # Taking every pixel after the current
    target_sequence = data[:, 1:, :]

    scores = model(input_sequence)
    loss = criterion(scores, target_sequence)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()



  return F.mse_loss(input, target, reduction=self.reduction)


In [24]:
print(model)

yourPixelRNN(
  (lstm): LSTM(28, 256, num_layers=2, batch_first=True)
  (fc): Linear(in_features=256, out_features=1, bias=True)
)


## Ex. 6.3 Complete some samples

Using your trained model, are you able to complete MNIST samples that are partially masked? Show your best results and compare them to the original. **(RESULT)**

In [55]:
def plot_images(original, masked, completed):
    plt.figure(figsize=(10, 3))
    for i in range(10):
        # Original Image
        plt.subplot(3, 10, i + 1)
        plt.imshow(original[i].reshape(28, 28), cmap='gray')
        plt.axis('on')

        # Masked Image
        plt.subplot(3, 10, i + 11)
        plt.imshow(masked[i].reshape(28, 28), cmap='gray')
        plt.axis('on')

        # Generated Image
        plt.subplot(3, 10, i + 21)
        plt.imshow(completed[i].reshape(28, 28), cmap='gray')
        plt.axis('on')
    plt.show()

model.eval()

# Lets randomly mask the upper or lower half

for images, _ in test_loader:
  masked_images = images.clone()

# Just set the pixel value to 0
  for img in masked_images:
    if random.choice([True,False]):
      img[:, :14, :] = 0
    else:
      img[:, 14:, : ] = 0
  generated_images = model(masked_images)
plot_images(images, masked_images, generated_images)


ValueError: ignored