# 5. Generative adversarial networks

- Used as part of INFO8010 Deep Learning (Gilles Louppe, 2018-2019).
- Originally adapted from [PyTorch Tutorial for Deep Learning Researchers](https://github.com/yunjey/pytorch-tutorial) (Yunjey Choi, 2018).

---

In [1]:
import os
import torch
import torchvision
import torch.nn as nn
import tensorboardX

from torchvision import transforms
from torchvision.utils import save_image

In [None]:
# Device configuration
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Hyper-parameters

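The hyperparameters needed to train your first GAN: `latent_size` is the dimension of the noise vector fed to the generator, `hidden_size` the width of the hidden layers, `image_size` the length of a flattened 28×28 MNIST image, and `sample_dir` the directory where sampled images are saved.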

In [None]:
latent_size = 64
hidden_size = 256
image_size = 784
num_epochs = 200
batch_size = 100
sample_dir = 'samples'
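The sizes above are tied to MNIST: each image is 28×28 with a single channel, and the fully connected models work on flattened vectors. A quick sanity check, assuming the standard 60,000-image MNIST training split, also shows how many steps one epoch takes with `batch_size = 100`. The names `channels`, `height`, `width`, `train_set_size` and `steps_per_epoch` are illustrative only.

```python
# MNIST images are 28x28 pixels with a single (grayscale) channel.
channels, height, width = 1, 28, 28

# The fully connected discriminator and generator work on flattened images.
image_size = channels * height * width

# The MNIST training split contains 60,000 images.
train_set_size = 60000
batch_size = 100

# Number of optimization steps in one pass over the data, i.e. len(data_loader).
steps_per_epoch = train_set_size // batch_size

print(image_size, steps_per_epoch)
```

With these values, the training loop's `if (i+1) % 200 == 0` condition prints three progress lines per epoch.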

# Data

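As usual, we define a data loader that normalizes the input images before they are fed to the model. `transforms.ToTensor()` yields pixel values in [0, 1], and `transforms.Normalize` with `mean=0.5` and `std=0.5` maps them to [-1, 1].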

In [None]:
# Create the sample directory if it does not exist
os.makedirs(sample_dir, exist_ok=True)

# Image processing
transform = transforms.Compose([
                transforms.ToTensor(),
                transforms.Normalize(mean=[0.5],   # one value per channel (MNIST is grayscale;
                                     std=[0.5])])  # RGB images would need three values)

# MNIST dataset
mnist = torchvision.datasets.MNIST(root='./data/',
                                   train=True,
                                   transform=transform,
                                   download=True)

# Data loader
data_loader = torch.utils.data.DataLoader(dataset=mnist,
                                          batch_size=batch_size, 
                                          shuffle=True)
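`transforms.Normalize` computes `(x - mean) / std` for each channel. The short check below applies that same formula by hand to a few pixel values (a sketch of the arithmetic, not the torchvision implementation) and shows that `(x + 1) / 2`, the formula used by the `denorm` helper in the Utils section, undoes it:

```python
import torch

# Pixel values as produced by transforms.ToTensor(): in [0, 1].
pixels = torch.tensor([0.0, 0.25, 0.5, 1.0])

# Same per-channel formula as transforms.Normalize(mean=[0.5], std=[0.5]).
mean, std = 0.5, 0.5
normalized = (pixels - mean) / std  # now in [-1, 1]

# Inverse mapping, as in denorm(): back to [0, 1].
restored = ((normalized + 1) / 2).clamp(0, 1)

print(normalized)
print(restored)
```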

<div class="alert alert-success">
<b>EXERCISE</b>:

Define the discriminator and generator architectures using the nn.Sequential() container.
Remember the building blocks of any neural architecture; in particular, consider:
<ul>
    <li>The number of hidden layers and units</li>
    <li>The non-linearities between layers</li>
    <li>The final activation function</li>
</ul>
   
Pay special attention to the last activation function of your models.
</div>
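Two requirements help with the choice of final activations: the generator's samples are compared against real images normalized to [-1, 1], and the discriminator's output is interpreted as the probability that its input is real (which is also what `nn.BCELoss` expects, if you use it). The snippet below only checks the output ranges of two common final activations, `nn.Tanh` and `nn.Sigmoid`, so they can be compared against those requirements:

```python
import torch
import torch.nn as nn

# A wide range of pre-activation values.
pre_activations = torch.linspace(-10.0, 10.0, steps=101)

tanh_out = nn.Tanh()(pre_activations)        # squashed into the range [-1, 1]
sigmoid_out = nn.Sigmoid()(pre_activations)  # squashed into the range [0, 1]

print(tanh_out.min().item(), tanh_out.max().item())
print(sigmoid_out.min().item(), sigmoid_out.max().item())
```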

In [None]:
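# Discriminator: maps a flattened image (image_size values) to a single score
D = nn.Sequential(
    # your layers here
    )

# Generator: maps a latent vector (latent_size values) to a flattened image (image_size values)
G = nn.Sequential(
    # your layers here
    )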

In [None]:
# TensorBoard writer (logs are written under ./runs/ by default)
writer = tensorboardX.SummaryWriter()

In [None]:
# Device setting
D = D.to(device)
G = G.to(device)

<div class="alert alert-success">
<b>EXERCISE</b>:

Define the loss function you would like to minimize and two optimizers, one for the discriminator and one for the generator, with reasonable learning rates.
Remember what was discussed during the lecture.

</div>


In [None]:
criterion = 
d_optimizer = 
g_optimizer = 
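If you choose binary cross-entropy (`nn.BCELoss`), recall that for a predicted probability p and a target y it computes -[y log p + (1 - y) log(1 - p)], averaged over the batch by default. The check below, with made-up probabilities, shows what the two possible targets reduce it to: a target of 1 gives -log p and a target of 0 gives -log(1 - p). Choosing the targets for real images, fake images and the generator update is part of the exercise.

```python
import math

import torch
import torch.nn as nn

criterion = nn.BCELoss()  # mean reduction by default

# Hypothetical discriminator outputs (probabilities of "real").
probs = torch.tensor([0.9, 0.2])
ones = torch.ones_like(probs)
zeros = torch.zeros_like(probs)

loss_target_one = criterion(probs, ones)    # mean of -log(p)
loss_target_zero = criterion(probs, zeros)  # mean of -log(1 - p)

expected_one = -(math.log(0.9) + math.log(0.2)) / 2
expected_zero = -(math.log(0.1) + math.log(0.8)) / 2
print(loss_target_one.item(), expected_one)
print(loss_target_zero.item(), expected_zero)
```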

# Utils

Here are some auxiliary functions which will make your training easier.
<ul>
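    <li><code>denorm</code> maps values from [-1, 1], the range of the normalized data, back to [0, 1], so that the generator's output can be saved as a meaningful image.</li>
    <li><code>reset_grad</code> resets the gradients of both models, which PyTorch otherwise accumulates across <code>backward()</code> calls: you will have to call this function yourself!</li>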
</ul>


In [None]:
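# Utils
def denorm(x):
    # Map values from [-1, 1] back to [0, 1] (inverse of Normalize(mean=0.5, std=0.5))
    out = (x + 1) / 2
    return out.clamp(0, 1)

def reset_grad():
    # Clear the accumulated gradients of both networks
    d_optimizer.zero_grad()
    g_optimizer.zero_grad()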
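Why resetting gradients matters: PyTorch adds each new gradient into `.grad` instead of overwriting it, so two `backward()` calls without a reset in between sum their contributions. In the GAN loop, each loss can also backpropagate through both networks (the discriminator loss on fake images reaches the generator's weights unless the fake images are detached, and the generator loss passes through the discriminator), which is why `reset_grad()` clears both optimizers. A minimal illustration on a single parameter:

```python
import torch

# A single learnable parameter; the loss 3 * w has gradient 3.
w = torch.ones(1, requires_grad=True)

# Two backward passes without resetting: the gradients are summed.
for _ in range(2):
    (3 * w).sum().backward()
accumulated = w.grad.clone()  # 3 + 3 = 6

# Resetting first (the effect of optimizer.zero_grad()) gives a fresh gradient.
w.grad.zero_()
(3 * w).sum().backward()
fresh = w.grad.clone()  # 3

print(accumulated.item(), fresh.item())
```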

<div class="alert alert-success">
<b>EXERCISE</b>:


To write a proper training loop, you need to:

<ul>
    <li>Define the labels (targets) used at training time for real and fake images.</li>
    <li>Evaluate the discriminator on each of the inputs it requires:</li>
    <ul>
        <li>It has access to two kinds of input, real images and generated images: use both, and combine the two corresponding losses into its final loss.</li>
    </ul>
    <li>Use the TensorBoard methods called in the loop properly: what does each of them expect as input?</li>
</ul>

</div>

In [None]:
total_step = len(data_loader)

# Add appropriate labels

for epoch in range(num_epochs):
    for i, (images, _) in enumerate(data_loader):
        images = images.reshape(images.size(0), -1).to(device)
        
        ## Train your discriminator

        outputs = D(?)
        
        d_loss_real = criterion()
        real_score = outputs
        
        z = ?
        
        fake_images = G(?)
        outputs = D(fake_images)
        d_loss_fake = criterion()
        fake_score = outputs
        
        # Backprop and optimize the discriminator
        d_loss = ?
        d_loss.backward()
        d_optimizer.step()
        
        ## Train your generator
        
        fake_images = G(?)
        
        writer.add_graph(?)
          
        outputs = D(?)
        
        g_loss = criterion(?, ?)
        
        # Backprop and optimize
        g_loss.backward()
        g_optimizer.step()
        
        if (i+1) % 200 == 0:
            print('Epoch [{}/{}], Step [{}/{}], d_loss: {:.4f}, g_loss: {:.4f}, D(x): {:.2f}, D(G(z)): {:.2f}' 
                  .format(epoch+1, num_epochs, i+1, total_step, d_loss.item(), g_loss.item(), 
                          real_score.mean().item(), fake_score.mean().item()))
    
    # At the end of each epoch, log the losses to TensorBoard

    writer.add_scalar(?)
    
    # Save real images
    if (epoch+1) == 1:
        images = images.reshape(images.size(0), 1, 28, 28)
        save_image(denorm(images), os.path.join(sample_dir, 'real_images.png'))
    
        
        writer.add_image(?)
    
    # Save sampled images
    fake_images = fake_images.reshape(fake_images.size(0), 1, 28, 28)
    save_image(denorm(fake_images), os.path.join(sample_dir, 'fake_images-{}.png'.format(epoch+1)))

    writer.add_image(?)
    
# Save the model checkpoints 
torch.save(G.state_dict(), 'G.ckpt')
torch.save(D.state_dict(), 'D.ckpt')
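The checkpoints store only the parameters (`state_dict`), so reloading them requires building the same architecture first. A minimal sketch of that round trip, using a small stand-in generator (`make_generator` is hypothetical; replace it with your own `G` definition), a temporary file instead of `G.ckpt`, and a standard normal latent prior (the usual choice, assumed here):

```python
import os
import tempfile

import torch
import torch.nn as nn

latent_size, image_size = 64, 784

def make_generator():
    # Hypothetical stand-in architecture, only used to demonstrate saving/loading.
    return nn.Sequential(nn.Linear(latent_size, image_size), nn.Tanh())

G_demo = make_generator()
ckpt_path = os.path.join(tempfile.mkdtemp(), 'G.ckpt')
torch.save(G_demo.state_dict(), ckpt_path)

# Rebuild the same architecture, then load the saved parameters into it.
G_loaded = make_generator()
G_loaded.load_state_dict(torch.load(ckpt_path, map_location='cpu'))
G_loaded.eval()

# Sample new images from the reloaded generator without tracking gradients.
with torch.no_grad():
    z = torch.randn(8, latent_size)
    samples = G_loaded(z).reshape(-1, 1, 28, 28)
    reference = G_demo(z).reshape(-1, 1, 28, 28)

print(samples.shape)
```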

# When you think you are done ...

Training this model can be slow on a machine without a GPU.
In that case, you can run your notebook on [Google Colab](https://colab.research.google.com/notebooks/welcome.ipynb), which offers GPU runtimes.

<div class="alert alert-success">
<b>EXERCISE</b>:


Investigate what happens when the same set of images is always used within a training batch.
Monitor the behaviour of the losses and of the generated samples in TensorBoard.
Which phenomenon do you observe?

</div>

<div class="alert alert-success">
<b>EXERCISE</b>:
    
Now, instead of the MNIST dataset, try to train a GAN on images from the CIFAR-10 dataset (32×32 RGB images, available as <code>torchvision.datasets.CIFAR10</code>).
In this case, you should define architectures based on convolutional layers.

</div>
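As a starting point for the shapes only, here is a sketch of a DCGAN-style convolutional generator that maps a latent vector to a 3×32×32 image, the CIFAR-10 image size. Each `nn.ConvTranspose2d` with `kernel_size=4, stride=2, padding=1` doubles the spatial size, following out = (in - 1) × stride - 2 × padding + kernel_size. The channel widths (256, 128, 64) are arbitrary choices. Since CIFAR-10 has three channels, the `Normalize` transform would also need three values, e.g. `mean=[0.5, 0.5, 0.5]` and `std=[0.5, 0.5, 0.5]`.

```python
import torch
import torch.nn as nn

latent_size = 64

# Transposed convolutions upsample 1x1 -> 4x4 -> 8x8 -> 16x16 -> 32x32.
G_conv = nn.Sequential(
    nn.ConvTranspose2d(latent_size, 256, kernel_size=4, stride=1, padding=0),  # 1 -> 4
    nn.BatchNorm2d(256),
    nn.ReLU(True),
    nn.ConvTranspose2d(256, 128, kernel_size=4, stride=2, padding=1),  # 4 -> 8
    nn.BatchNorm2d(128),
    nn.ReLU(True),
    nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),   # 8 -> 16
    nn.BatchNorm2d(64),
    nn.ReLU(True),
    nn.ConvTranspose2d(64, 3, kernel_size=4, stride=2, padding=1),     # 16 -> 32
    nn.Tanh(),  # keeps outputs in the normalized [-1, 1] range
)

# Latent vectors are shaped (N, latent_size, 1, 1) for the convolutions.
z = torch.randn(8, latent_size, 1, 1)
fake = G_conv(z)
print(fake.shape)
```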