# HOMEWORK 2 NEURAL NETWORKS AND DEEP LEARNING

---
A.A. 2021/22 (6 CFU) - Dr. Alberto Testolin, Dr. Umberto Michieli
---
Student: Matteo Grandin
---
id: 2020374

# Unsupervised Deep Learning

### General overview
In this homework you will learn how to implement and test neural network models for
solving unsupervised problems. For simplicity, and for continuity with the kind of data you have seen
before, the homework is based on FashionMNIST images. However, you can optionally explore
different image collections (e.g., Caltech or CIFAR) or other datasets based on your interests. The basic tasks
require testing and analyzing the convolutional autoencoder implemented during the
Lab practice. If you prefer, you can opt for a fully-connected autoencoder, which should achieve similar
performance given the relatively small size of the FashionMNIST images. As in the previous
homework, you should explore the use of advanced optimizers and regularization methods. Learning
hyperparameters should be tuned using appropriate search procedures, and final accuracy should be
evaluated using a cross-validation setup. More advanced tasks require exploring denoising and
variational / adversarial architectures.

## Convolutional autoencoder
- implement and test (convolutional) autoencoder, reporting the trend of reconstruction loss and
some examples of image reconstruction; explore advanced optimizers and regularization methods

In [1]:
import matplotlib.pyplot as plt # plotting library
import numpy as np # this module is useful to work with numerical arrays
import pandas as pd # this module is useful to work with tabular data
import random # this module will be used to select random samples from a collection
import os # this module will be used just to create directories in the local filesystem
from tqdm import tqdm # this module is useful to plot progress bars
from sklearn.model_selection import KFold # this class generates the train/validation index splits used for cross-validation

import torch
import torchvision
import torch.nn.functional as F
from torchvision import transforms
from torch.utils.data import DataLoader
from torch import nn
import pickle

In [2]:
## Dataset
# training and validation will be performed on the training dataset
train_dataset = torchvision.datasets.FashionMNIST('data', train=True, download=True)
# test dataset will only be used for evaluating final model performance
test_dataset  = torchvision.datasets.FashionMNIST('data', train=False, download=True)

label_names=['t-shirt','trouser','pullover','dress','coat','sandal','shirt',
             'sneaker','bag','boot']
num_labels = len(label_names)

In [3]:
## Data transformation
train_transform = transforms.Compose([
    # OneHotEncoder(num_classes=10),
    transforms.ToTensor()
])
test_transform = transforms.Compose([
    # OneHotEncoder(num_classes=10),
    transforms.ToTensor()
])

# Set the train transform
train_dataset.transform = train_transform
# Set the test transform
test_dataset.transform = test_transform


In [4]:
## Model definition
class Encoder(nn.Module):
    def __init__(self, encoded_space_dim):
        super().__init__()
        ### Convolutional section
        self.encoder_cnn = nn.Sequential(
            # First convolutional layer
            nn.Conv2d(in_channels= 1, out_channels=8, kernel_size=3, 
                      stride=2, padding=1),
            nn.ReLU(True),
            # Second convolutional layer
            nn.Conv2d(in_channels=8, out_channels=16, kernel_size=3, 
                      stride=2, padding=1),
            nn.ReLU(True),
            # Third convolutional layer
            nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3, 
                      stride=2, padding=0),
            nn.ReLU(True)
        )
        ### Flatten layer
        self.flatten = nn.Flatten(start_dim=1)
        ### Linear section
        self.encoder_lin = nn.Sequential(
            # First linear layer

            nn.Linear(in_features=32*3*3, out_features=64),
            nn.ReLU(True),
            # Second linear layer
            nn.Linear(in_features=64, out_features=encoded_space_dim)
        )
    def forward(self, x):
        # Apply convolutions
        x = self.encoder_cnn(x)
        # Flatten
        x = self.flatten(x)
        # Apply linear layers
        x = self.encoder_lin(x)
        return x

class Decoder(nn.Module):
    def __init__(self, encoded_space_dim):
        super().__init__()
        ### Linear section
        self.decoder_lin = nn.Sequential(
            # First linear layer
            nn.Linear(in_features=encoded_space_dim, out_features=64),
            nn.ReLU(True),
            # Second linear layer
            nn.Linear(in_features=64, out_features=3*3*32),
            nn.ReLU(True)
        )
        ### Unflatten
        self.unflatten = nn.Unflatten(dim=1, unflattened_size=(32, 3, 3))
        ### Convolutional section
        self.decoder_conv = nn.Sequential(
            # First transposed convolution
            nn.ConvTranspose2d(in_channels=32, out_channels=16, kernel_size=3, 
                               stride=2, output_padding=0),
            nn.ReLU(True),
            # Second transposed convolution
            nn.ConvTranspose2d(in_channels=16, out_channels=8, kernel_size=3, 
                               stride=2, padding=1, output_padding=1),
            nn.ReLU(True),
            # Third transposed convolution
            nn.ConvTranspose2d(in_channels=8, out_channels=1, kernel_size=3, 
                               stride=2, padding=1, output_padding=1)
        ) 
    def forward(self, x):
        # Apply linear layers
        x = self.decoder_lin(x)
        # Unflatten
        x = self.unflatten(x)
        # Apply transposed convolutions
        x = self.decoder_conv(x)
        # Apply a sigmoid to force the output to be between 0 and 1 (valid pixel values)
        x = torch.sigmoid(x)
        return x
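
The `32*3*3` input size of the first linear layer and the `output_padding` values in the decoder follow from the standard output-size formulas for convolutions and transposed convolutions. The sketch below checks that arithmetic in plain Python, assuming 28x28 input images and dilation 1; the helper names `conv_out` and `conv_transpose_out` are hypothetical and not part of the notebook.

```python
def conv_out(size, kernel, stride, padding):
    """Output size of nn.Conv2d along one spatial dimension (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1

def conv_transpose_out(size, kernel, stride, padding, output_padding):
    """Output size of nn.ConvTranspose2d along one spatial dimension (dilation 1)."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding

# (kernel, stride, padding) of the three encoder convolutions
encoder_layers = [(3, 2, 1), (3, 2, 1), (3, 2, 0)]
# (kernel, stride, padding, output_padding) of the three transposed convolutions
# (the first one leaves padding at its default value of 0)
decoder_layers = [(3, 2, 0, 0), (3, 2, 1, 1), (3, 2, 1, 1)]

encoder_sizes = [28]
for k, s, p in encoder_layers:
    encoder_sizes.append(conv_out(encoder_sizes[-1], k, s, p))

decoder_sizes = [encoder_sizes[-1]]
for k, s, p, op in decoder_layers:
    decoder_sizes.append(conv_transpose_out(decoder_sizes[-1], k, s, p, op))

print(encoder_sizes)  # [28, 14, 7, 3]
print(decoder_sizes)  # [3, 7, 14, 28]
```

The encoder ends with 32 feature maps of size 3x3, and the decoder mirrors the same sizes back to 28x28.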
        

In [5]:
### Training function
def train_epoch(encoder, decoder, device, dataloader, loss_fn, optimizer):
    # Set train mode for both the encoder and the decoder
    encoder.train()
    decoder.train()
    train_loss = []
    # Iterate the dataloader (we do not need the label values, this is unsupervised learning)
    for (image_batch, _) in dataloader: # with "_" we just ignore the labels (the second element of the dataloader tuple)
        # Move tensor to the proper device
        image_batch = image_batch.to(device)
        # Encode data
        encoded_data = encoder(image_batch)
        # Decode data
        decoded_data = decoder(encoded_data)
        # Evaluate loss
        loss = loss_fn(decoded_data, image_batch)
        # Backward pass
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # Add loss to the list
        train_loss.append(loss.item()) # .item() returns the batch loss as a plain Python float
    train_loss = np.mean(train_loss)
    #print(f"Batch Train loss: {train_loss}")
    return train_loss

### Validation function
def validate_epoch(encoder, decoder, device, dataloader, loss_fn):
    # Set evaluation mode for encoder and decoder
    encoder.eval()
    decoder.eval()
    with torch.no_grad(): # No need to track the gradients
        # Define the lists to store the outputs for each batch
        conc_out = []
        conc_label = []
        for (image_batch, _) in dataloader:
            # Move tensor to the proper device
            image_batch = image_batch.to(device)
            # Encode data
            encoded_data = encoder(image_batch)
            # Decode data
            decoded_data = decoder(encoded_data)
            # Append the network output and the original image to the lists
            conc_out.append(decoded_data.cpu())
            conc_label.append(image_batch.cpu())
        # Create a single tensor with all the values in the lists
        conc_out = torch.cat(conc_out)
        conc_label = torch.cat(conc_label) 
        # Evaluate global loss
        val_loss = loss_fn(conc_out, conc_label)
        #print(f"Batch Validation loss: {val_loss}")
    return val_loss.item() # plain Python float, so it can be stored and averaged with NumPy

In [6]:
#useful functions
def plot_losses(train_losses, val_losses):
    plt.figure(figsize=(10,5))
    plt.plot(train_losses, label="Training loss")
    plt.plot(val_losses, label="Validation loss")
    plt.legend()
    plt.show()

def reset_weights(layer):
    # meant to be used as model.apply(reset_weights): apply() already visits every submodule
    if isinstance(layer, (nn.Conv2d, nn.ConvTranspose2d, nn.Linear)):
        layer.reset_parameters()

## Optimize hyperparameters
- 1 pt: optimize hyperparameters using grid/random search or automatic tuning tools (e.g., Optuna)
- final accuracy should be evaluated using a cross-validation setup (concatenate the training and test sets as described here, then evaluate the accuracy for each fold and take the average: https://www.machinecurve.com/index.php/2021/02/03/how-to-use-k-fold-cross-validation-with-pytorch/)
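
A minimal sketch of the cross-validation setup described in the second point, assuming small random tensors as stand-ins for the FashionMNIST splits (`toy_train` and `toy_test` are hypothetical names and sizes). It only shows how the two splits can be merged and divided into folds; the training and scoring of each fold are left as a comment.

```python
import numpy as np
import torch
from sklearn.model_selection import KFold
from torch.utils.data import ConcatDataset, DataLoader, Subset, TensorDataset

# Stand-ins for the FashionMNIST train/test sets (random images, hypothetical sizes)
toy_train = TensorDataset(torch.rand(60, 1, 28, 28), torch.randint(0, 10, (60,)))
toy_test = TensorDataset(torch.rand(10, 1, 28, 28), torch.randint(0, 10, (10,)))

# Merge both splits so that every sample is used for validation exactly once
full_dataset = ConcatDataset([toy_train, toy_test])

kfold = KFold(n_splits=5, shuffle=True, random_state=0)
fold_sizes = []
seen_val_ids = []
for fold, (train_ids, val_ids) in enumerate(kfold.split(np.arange(len(full_dataset)))):
    train_loader = DataLoader(Subset(full_dataset, train_ids.tolist()), batch_size=16, shuffle=True)
    val_loader = DataLoader(Subset(full_dataset, val_ids.tolist()), batch_size=16)
    # ... train a fresh model on train_loader and score it on val_loader here ...
    fold_sizes.append((len(train_loader.dataset), len(val_loader.dataset)))
    seen_val_ids.extend(val_ids.tolist())

print(fold_sizes)
```

The final score would then be the average of the per-fold validation scores.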

In [7]:
#decide whether to load the pretrained model or to train from scratch
load_good_model = False

#create a folder called training to save the model
if not load_good_model:
    os.makedirs('data/training', exist_ok=True)
    #clear the training folder
    for f in os.listdir('data/training'):
        os.remove(os.path.join('data/training', f))

In [8]:
#create param combinations for grid search parameter tuning
#each combination is [learning rate, unused, unused, encoded space dimension]
lr = 1e-3
encoded_space_dim = 2
param_combinations = [[lr, 0, 0, encoded_space_dim], [1e-2, 0, 0, 2], [1e-3, 0, 0, 8], [1e-2, 0, 0, 8]]
print(len(param_combinations))

4
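
The same four combinations can also be generated rather than typed by hand, which scales better when more values are added. The following is a sketch using `itertools.product`; the names `learning_rates`, `encoded_space_dims` and `grid` are hypothetical, and the two zeros are kept only to match the list layout used in the cell above.

```python
from itertools import product

learning_rates = [1e-3, 1e-2]
encoded_space_dims = [2, 8]

# Same layout as above: [learning rate, unused, unused, encoded space dimension]
grid = [[lr, 0, 0, dim] for dim, lr in product(encoded_space_dims, learning_rates)]

print(len(grid))  # 4
```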


In [9]:
### Main block
k_folds = 5
num_epochs = 50

# Check if the GPU is available
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
print(f'Selected device: {device}')

train_losses = []
val_losses = []

print(f"Training set: {train_dataset}")

for comb, params in enumerate(param_combinations):
    print("*****************************************************************************")
    print(f"Parameter combination {comb}: {params}")
    lr, par2, par3, encoded_space_dim = params
    ## train the model
    # perform cross validation 
    kfold = KFold(n_splits=k_folds, shuffle=True)
    #initialize training and validation losses
    comb_train_losses = np.zeros(num_epochs)
    comb_val_losses = np.zeros(num_epochs)
    for fold, (train_ids, validation_ids) in enumerate(kfold.split(train_dataset)):
        print("___________________________________________________")
        print(f"Fold {fold}")
        # Sample elements randomly from a given list of ids, no replacement.
        train_subsampler = torch.utils.data.SubsetRandomSampler(train_ids)
        validation_subsampler = torch.utils.data.SubsetRandomSampler(validation_ids)
        # dataloaders
        train_dataloader = DataLoader(train_dataset, batch_size=256, sampler=train_subsampler) 
        validation_dataloader = DataLoader(train_dataset, batch_size=256, sampler=validation_subsampler) 
        # initialize models
        enc = Encoder(encoded_space_dim)
        dec = Decoder(encoded_space_dim)
        #reset weights
        enc.apply(reset_weights)
        dec.apply(reset_weights)
        # initialize optimizer
        params_to_optimize = [
            {'params': enc.parameters()},
            {'params': dec.parameters()}
        ]
        optim = torch.optim.Adam(params_to_optimize, lr=lr, weight_decay=1e-5)
        # initialize loss function
        loss_fn = nn.MSELoss()
        # move to device
        enc.to(device)
        dec.to(device)
        # train the model
        i_train_losses = []
        i_val_losses = []
        for epoch in tqdm(range(num_epochs)):
            #print(f"Epoch {epoch+1}/{num_epochs}")
            #train
            epoch_train_loss = train_epoch(enc, dec, device, train_dataloader, loss_fn, optim)
            #validate
            epoch_val_loss = validate_epoch(enc, dec, device, validation_dataloader, loss_fn)
            # store losses
            i_train_losses.append(epoch_train_loss)
            i_val_losses.append(epoch_val_loss)
            # save model
            torch.save(enc.state_dict(), f"data/training/encoder_{comb}_{fold}_{epoch}.pt")
            torch.save(dec.state_dict(), f"data/training/decoder_{comb}_{fold}_{epoch}.pt")
        
        comb_train_losses += np.array(i_train_losses)/k_folds
        comb_val_losses += np.array(i_val_losses)/k_folds
    
    plot_losses(comb_train_losses, comb_val_losses)

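    # train and validation loss for parameter combination
    comb_train_loss = comb_train_losses[-1] # last epoch
    comb_val_loss = comb_val_losses[-1] # last epoch
    print(f"\n\n\nCombination {comb} Train loss: {comb_train_loss}")
    print(f"Combination {comb} Validation loss: {comb_val_loss}")
    # keep the fold-averaged loss curves of this combination
    train_losses.append(comb_train_losses)
    val_losses.append(comb_val_losses)

# save the loss curves of all combinations (comb holds the index of the last combination here)
with open(f"data/training/losses_{comb}.pkl", 'wb') as f:
    pickle.dump([train_losses, val_losses], f)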

        

Selected device: cuda
Training set: Dataset FashionMNIST
    Number of datapoints: 60000
    Root location: data
    Split: Train
*****************************************************************************
Parameter combination 0: [0.001, 0, 0, 2]
___________________________________________________
Fold 0



## Supervised fine tuning and comparison with supervised methods
- 1 pt: fine-tune the (convolutional) autoencoder using a supervised classification task, and compare
classification accuracy and learning speed with results achieved in Homework 1
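
One possible way to set up the fine-tuning step is sketched below, assuming a small fully-connected `ToyEncoder` as a hypothetical stand-in for the pretrained encoder and a random batch in place of real images and labels. The learning rates are assumed values, chosen only to illustrate giving the pretrained part a smaller step than the new classification head.

```python
import torch
from torch import nn

torch.manual_seed(0)

class ToyEncoder(nn.Module):
    """Hypothetical stand-in for a pretrained encoder: (N, 1, 28, 28) -> (N, dim)."""
    def __init__(self, encoded_space_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 64), nn.ReLU(),
                                 nn.Linear(64, encoded_space_dim))

    def forward(self, x):
        return self.net(x)

encoded_space_dim = 8
encoder = ToyEncoder(encoded_space_dim)  # in practice: load the pretrained weights here
classifier_head = nn.Linear(encoded_space_dim, 10)  # 10 FashionMNIST classes

# Smaller learning rate for the pretrained part, larger one for the new head (assumed values)
optimizer = torch.optim.Adam([
    {'params': encoder.parameters(), 'lr': 1e-4},
    {'params': classifier_head.parameters(), 'lr': 1e-3},
])
loss_fn = nn.CrossEntropyLoss()

# One supervised step on a random batch standing in for real images and labels
images = torch.rand(16, 1, 28, 28)
labels = torch.randint(0, 10, (16,))
logits = classifier_head(encoder(images))
loss = loss_fn(logits, labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()

print(logits.shape, loss.item())
```

Freezing the encoder entirely (by leaving its parameters out of the optimizer) would be an alternative worth comparing in terms of learning speed.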

## Latent space exploration and generation of new samples
- 2 pt: explore the latent space structure (e.g., PCA, t-SNE) and generate new samples from latent codes
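
A minimal sketch of both parts of this task, assuming random vectors as stand-ins for the latent codes produced by the encoder (the array `latent_codes` and its shape are hypothetical). PCA gives a 2D view of the codes, and new codes are drawn from a per-dimension Gaussian fitted to the existing ones, which is only one of several possible sampling strategies.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Stand-in for encoder outputs: 500 latent codes of dimension 8 (random, hypothetical)
latent_codes = rng.normal(size=(500, 8))

# Project the codes onto their first two principal components for a 2D scatter plot
pca = PCA(n_components=2)
codes_2d = pca.fit_transform(latent_codes)

# Draw new codes from a Gaussian fitted to each latent dimension independently
mean, std = latent_codes.mean(axis=0), latent_codes.std(axis=0)
new_codes = rng.normal(loc=mean, scale=std, size=(16, 8))
# new_codes would then be converted to a tensor and passed through the decoder

print(codes_2d.shape, new_codes.shape)
```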

## Variational autoencoder / GAN / SimCLR
- 2 pt: implement and test variational (convolutional) autoencoder or GAN or SimCLR
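
For the variational option, the two ingredients that differ from a plain autoencoder are the reparameterization trick and the KL term in the loss. The sketch below shows both on a hypothetical fully-connected `ToyVAE` (not the convolutional architecture used above), with a random batch standing in for real images and binary cross-entropy assumed as the reconstruction term.

```python
import torch
import torch.nn.functional as F
from torch import nn

class ToyVAE(nn.Module):
    """Hypothetical fully-connected VAE for 28x28 images with pixel values in [0, 1]."""
    def __init__(self, latent_dim=8):
        super().__init__()
        self.enc = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 128), nn.ReLU())
        self.fc_mu = nn.Linear(128, latent_dim)
        self.fc_logvar = nn.Linear(128, latent_dim)
        self.dec = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                                 nn.Linear(128, 28 * 28), nn.Sigmoid())

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        # Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return self.dec(z).view(-1, 1, 28, 28), mu, logvar

def vae_loss(recon, x, mu, logvar):
    # Reconstruction term (summed over pixels) plus KL divergence to the N(0, I) prior
    recon_term = F.binary_cross_entropy(recon, x, reduction='sum')
    kl_term = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_term + kl_term, kl_term

torch.manual_seed(0)
vae = ToyVAE()
x = torch.rand(16, 1, 28, 28)  # random batch standing in for real images
recon, mu, logvar = vae(x)
loss, kl = vae_loss(recon, x, mu, logvar)
loss.backward()

print(recon.shape, loss.item(), kl.item())
```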