<a href="https://colab.research.google.com/github/MatchLab-Imperial/deep-learning-course/blob/master/07_VAE_GAN.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Coursework



## Task 1: MNIST generation using VAE and GAN

**Report**
* Train the VAE model given in the tutorial using the same hyperparameters (batch size, optimizer, number of epochs, etc.) but with the latent dimensionality increased to 10. Compute the MSE on the reconstructed `x_test` images and the Inception Score (IS). Then train the same model without the KL divergence loss and compute the MSE and IS again. Report the results in a table and discuss them. For the IS, use `compute_inception_score(model, 10, denormalize=False)` in this case.

* Train the GAN given in the tutorial for 10 epochs, with the dimensionality of the initial random sample increased to 10, and report its IS (use the same table as in the VAE case). Discuss the difference between the IS obtained for the VAE and for the GAN, and link it to the qualitative results. For the IS, use `compute_inception_score(model, 10, denormalize=True)` in this case.
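
As a reminder of what the IS measures, the sketch below computes the standard definition, $\exp\big(\mathbb{E}_x\,\mathrm{KL}(p(y|x)\,\|\,p(y))\big)$, from a list of class-probability vectors. It is only an illustration: the function name `inception_score` and the toy inputs are assumptions, and the course helper `compute_inception_score` (whose internals are not shown here) may differ in details such as the classifier used or how samples are split.

```python
import math


def inception_score(probs):
    """Inception Score from per-image class probabilities.

    probs: list of lists, one probability vector p(y|x) per generated image,
    each summing to 1 (hypothetical classifier outputs).
    """
    n = len(probs)
    num_classes = len(probs[0])
    # Marginal class distribution p(y): average of p(y|x) over all images
    marginal = [sum(p[k] for p in probs) / n for k in range(num_classes)]
    # Mean KL(p(y|x) || p(y)); entries with p(y|x) = 0 contribute nothing
    mean_kl = 0.0
    for p in probs:
        mean_kl += sum(pk * math.log(pk / marginal[k])
                       for k, pk in enumerate(p) if pk > 0)
    mean_kl /= n
    return math.exp(mean_kl)


# Toy case 1: the classifier is unsure about every image (uniform predictions)
uniform = [[0.1] * 10 for _ in range(20)]
# Toy case 2: confident predictions spread evenly over the 10 classes
one_hot = [[1.0 if k == i % 10 else 0.0 for k in range(10)] for i in range(20)]

print(f"uniform predictions:  IS = {inception_score(uniform):.3f}")
print(f"confident + balanced: IS = {inception_score(one_hot):.3f}")
```

The two toy cases bracket the possible range for 10 classes: the score is lowest when predictions carry no class information and highest when each sample is classified confidently and all classes are covered equally.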



## Task 2: Quantitative VS Qualitative Results

In this task, we will compare two trained models for colouring images. The first is the model trained during the tutorial, which uses a cGAN approach to predict the pixel-wise RGB values of a B&W image. The second is a simple UNet autoencoder trained with a Mean Absolute Error (MAE) loss, which predicts the RGB image directly without any GAN-based learning strategy. We refer to the first and second models as the cGAN and MAE models, respectively. For this task, weights for both models after 20 epochs of training are provided. If desired, the MAE model can be trained with the code below:

In [None]:
set_seed(42)

# Training Hyperparameters
img_shape = (32, 32)
num_epochs = 20
batch_size = 128
lr = 2.0e-4
betas = (0.5, 0.999)

# Data
train_dataset = Cifar('CIFAR10', train=True)
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)

# Model, optimizer and criterion (MAE)
generator_mae = CGenerator().to(DEVICE)
optimizer_mae = torch.optim.Adam(generator_mae.parameters(), lr=lr, betas=betas)
criterion_mae = nn.L1Loss()

for epoch in range(num_epochs):
    g_avg_loss = []

    for batch_i, (imgs_rgb, imgs_bw) in enumerate(train_loader):
        # Move data to device
        imgs_rgb, imgs_bw = imgs_rgb.to(DEVICE), imgs_bw.to(DEVICE)

        # Generate fake rgb images
        optimizer_mae.zero_grad()
        imgs_rgb_generated = generator_mae(imgs_bw)
        g_loss = criterion_mae(imgs_rgb_generated, imgs_rgb)

        # Backward pass and optimize
        g_loss.backward()
        optimizer_mae.step()

        g_avg_loss.append(g_loss.item())

        # Plot examples
        if batch_i % 50 == 0:
            with torch.no_grad():
                show_colored_images(imgs_rgb, imgs_bw, imgs_rgb_generated)

        # Print progress
        if batch_i % 10 == 0:
            print(f"[Epoch {epoch}/{num_epochs}] [Batch {batch_i}/{len(train_loader)}] "
                  f"[G loss: {np.mean(g_avg_loss):.4f}] ")

# Save model
torch.save(generator_mae.state_dict(), 'generator_mae.pth')

Instead of training the models, we can directly load their pre-trained weights by running:

In [None]:
!wget https://github.com/MatchLab-Imperial/deep-learning-course/raw/master/asset/07_VAE_GAN/cgan_cifar10_epoch20.pth
!wget https://github.com/MatchLab-Imperial/deep-learning-course/raw/master/asset/07_VAE_GAN/cgenerator_mae_cifar10_epoch20.pth

In [None]:
# Pre-trained cGAN (map_location lets the weights load on CPU-only machines too)
generator_cGAN = CGAN().to(DEVICE)
generator_cGAN.load_state_dict(torch.load('cgan_cifar10_epoch20.pth', map_location=DEVICE))
generator_cGAN.eval()

# Pre-trained MAE model
generator_mae = CGenerator().to(DEVICE)
generator_mae.load_state_dict(torch.load('cgenerator_mae_cifar10_epoch20.pth', map_location=DEVICE))
generator_mae.eval()

We have loaded both models and are ready to compare them. In this task, you are asked to analyse the difference between the quantitative and the qualitative results. To do so, we provide two pieces of code. The first computes the MAE metric for both models on the test dataset. This metric is widely used in image generation tasks such as image upsampling, image reconstruction, and image translation.

In [None]:
# Data
test_dataset = Cifar('CIFAR10', train=False)
test_loader = DataLoader(test_dataset, batch_size=128, shuffle=False)

# Loss
l1_mae, l1_cgan = [], []

with torch.no_grad():  # Disable gradient calculation
    for imgs_rgb, imgs_bw in test_loader:
        imgs_rgb, imgs_bw = imgs_rgb.to(DEVICE), imgs_bw.to(DEVICE)
        imgs_rgb_generated_mae = generator_mae(imgs_bw)
        imgs_rgb_generated_cgan = generator_cGAN(imgs_bw)

        l1_mae.append(F.l1_loss(imgs_rgb_generated_mae, imgs_rgb).item())
        l1_cgan.append(F.l1_loss(imgs_rgb_generated_cgan, imgs_rgb).item())

print("MAE (Trained MAE): {:.4f}".format(np.mean(l1_mae)))
print("MAE (Trained cGAN): {:.4f}".format(np.mean(l1_cgan)))
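
Note that the loop above averages per-batch means, so a smaller final batch counts as much as a full one. The sketch below shows the arithmetic of a sample-weighted average instead. The helper `weighted_mean` and all numbers are hypothetical, and the batch sizes assume the test split holds the standard 10,000 CIFAR-10 images.

```python
def weighted_mean(batch_means, batch_sizes):
    """Average per-batch means, weighting each batch by its number of samples."""
    if len(batch_means) != len(batch_sizes):
        raise ValueError("batch_means and batch_sizes must have the same length")
    total = sum(batch_sizes)
    if total == 0:
        raise ValueError("no samples")
    return sum(m * n for m, n in zip(batch_means, batch_sizes)) / total


# Assumed layout: 10,000 images with batch size 128 -> 78 full batches + one of 16
batch_sizes = [128] * 78 + [16]
# Made-up per-batch MAE values; the small final batch happens to be harder
batch_means = [0.05] * 78 + [0.09]

unweighted = sum(batch_means) / len(batch_means)
weighted = weighted_mean(batch_means, batch_sizes)
print(f"unweighted: {unweighted:.5f}  weighted: {weighted:.5f}")
```

In the evaluation loop, the batch sizes could be collected with `imgs_rgb.size(0)` alongside each loss value. The difference is usually small, but it matters when two models have nearly identical scores.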

The next piece of code will show coloured examples for both networks, so you can check them visually and discuss which model is better. First, we need to create an iterator object to go through the test dataset:

In [None]:
# Change seed and num_samples here to see different samples
set_seed(42)
num_samples = 3

# Data
test_dataset = Cifar('CIFAR10', train=False)
test_loader = DataLoader(test_dataset, batch_size=1, shuffle=True)

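# Create the iterator once; calling iter() inside the loop would restart it every time
test_iter = iter(test_loader)

for _ in range(num_samples):
    img_rgb, img_bw = next(test_iter)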
    img_rgb, img_bw = img_rgb.to(DEVICE), img_bw.to(DEVICE)

    with torch.no_grad():
        img_rgb_generated_mae = generator_mae(img_bw)
        img_rgb_generated_cgan = generator_cGAN(img_bw)
        show_colored_two_models(img_rgb, img_bw, img_rgb_generated_mae, img_rgb_generated_cgan)

The two models obtain similar MAE values. If we only took the quantitative metric into account, as many scientific articles do, we would conclude that the MAE model is better. However, we also need to inspect the results of the two networks visually before declaring which model is best.

**Report**


*   Run the previous code to analyse several coloured images for both models. Based on the previous results and on GAN theory, discuss from both the numerical and the visual perspective whether the two models are similar or whether one is better. You can include visual examples together with their MAE values in the report to support your arguments. The figure for this task can be placed in the Appendix, but the discussion must go in the main text.