# Implementing a Simple DCGAN in PyTorch
> A simple DCGAN implementation for generating images from the CIFAR-10 dataset. I mainly built this to try out logging and experiment tracking with Weights & Biases.

- toc: true 
- badges: true
- comments: true
- categories: [Python, Computer Vision, Deep Learning, GANs, Pytorch]
- hide: false
- image: images/dcgan.png

## Overview
Below is a short overview of the key features of a DCGAN 

### Basic GAN Loss Function: 


$\underset{G}{\min}\, \underset{D}{\max}\, V(D,G) = \mathbb{E}_{x\sim p_{data}(x)}\big[\log D(x)\big] + \mathbb{E}_{z\sim p_{z}(z)}\big[\log(1-D(G(z)))\big]$
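
To make the objective concrete, here is a minimal sketch in plain Python (no PyTorch needed) that estimates $V(D,G)$ from a handful of made-up discriminator outputs. The helper names `value_function` and `bce` and the sample numbers are assumptions for illustration only and are not part of the implementation further down. The sketch also hints at why binary cross-entropy is a natural training criterion: with hard labels of `1` for real and `0` for fake, the summed BCE over the two batches is just $-V(D,G)$.

```python
import math

def value_function(d_real, d_fake):
    """Monte Carlo estimate of V(D, G) from discriminator outputs.

    d_real: D(x) for real samples; d_fake: D(G(z)) for generated samples.
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def bce(preds, target):
    """Mean binary cross-entropy of predictions against one constant target."""
    return -sum(target * math.log(p) + (1.0 - target) * math.log(1.0 - p)
                for p in preds) / len(preds)

# Hypothetical discriminator outputs, for illustration only
d_real = [0.9, 0.8, 0.7]
d_fake = [0.2, 0.1, 0.3]

v = value_function(d_real, d_fake)
d_loss = bce(d_real, 1.0) + bce(d_fake, 0.0)  # what D minimizes with hard labels
print(v, d_loss)  # d_loss equals -v up to floating-point error
```

Maximizing $V$ over $D$ is therefore the same as minimizing the discriminator's total BCE, which is how the objective is usually implemented in practice.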



### Special Features of the DCGAN:


* Uses convolutional layers in the discriminator and transposed-convolutional layers in the generator
* Both networks use batch norm layers; the discriminator uses `LeakyReLU` activations, while the generator uses `ReLU` activations with a `Tanh` on its output layer
* The generator's input is a latent vector drawn from a standard normal distribution and its output is a `3 x 32 x 32` RGB image
* In this implementation, I also added in [label smoothing](https://towardsdatascience.com/what-is-label-smoothing-108debd7ef06)
* More details are in the [paper](https://arxiv.org/pdf/1511.06434.pdf)

### Implementation Details:
I've borrowed most of the code for this from the wonderful PyTorch tutorial [here](https://pytorch.org/tutorials/beginner/dcgan_faces_tutorial.html) but have made a couple of tweaks. The first is the structure of the generator and discriminator: since the CIFAR-10 dataset has images of size `32 x 32`, the output size of the generator and the input size of the discriminator have to change. The second is label smoothing, which I added to help stabilize the training process.

## Requirements

In [0]:
#collapse-hide
#Author: Sairam Sundaresan
#Version: 1.0
#Date May 1, 2020
# Preliminaries
# WandB – Install the W&B library
%pip install wandb -q

In [0]:
from __future__ import print_function
import argparse
import random # to set the python random seed
%matplotlib inline
import os
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
from IPython.display import HTML
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.nn.parallel
import torch.backends.cudnn as cudnn
import torch.utils.data
import torchvision.utils as vutils
import torch.optim as optim
from torchvision import datasets, transforms
# Ignore excessive warnings
import logging
logging.propagate = False 
logging.getLogger().setLevel(logging.ERROR)

# Set random seed for reproducibility
manualSeed = 42
random.seed(manualSeed)
torch.manual_seed(manualSeed)

# WandB – Import the wandb library
import wandb
wandb.login()
wandb.init(project="dcgan") # Change the project name based on your W & B account

## Parameters of Interest
Note that the PyTorch tutorial [referenced below](https://pytorch.org/tutorials/beginner/dcgan_faces_tutorial.html) is designed for the **Celeb-A Faces** dataset and produces `64 x 64` images. I've tweaked the network architecture to produce `32 x 32` images to match the **CIFAR-10** dataset. The parameters below reflect this change.

In [0]:
# Number of workers for dataloader
workers = 2

# Batch size during training
batch_size = 128

# Spatial size of training images. CIFAR-10 images are already 32 x 32,
#   so the data pipeline below applies no resize transform.
image_size = 32

# Number of channels in the training images. For color images this is 3
nc = 3

# Size of z latent vector (i.e. size of generator input)
nz = 100

# Size of feature maps in generator
ngf = 64

# Size of feature maps in discriminator
ndf = 64

# Number of training epochs
num_epochs = 30

# Learning rate for optimizers
lr = 0.0002

# Beta1 hyperparam for Adam optimizers
beta1 = 0.5

# Number of GPUs available. Use 0 for CPU mode.
ngpu = 1

## Model Definition
Let's define a generator and discriminator first. Weight initialization is a key factor in being able to produce a decent GAN and, as per the paper, the weights are drawn from a _normal_ distribution with `0` mean and a standard deviation of `0.02`. Also note that unlike in the original PyTorch tutorial, I've removed one layer from the generator (at the end) and from the discriminator (at the beginning) to accommodate the CIFAR-10 dataset.
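
The layer shapes can be checked with a little arithmetic before running any PyTorch code. The sketch below uses the standard output-size formulas for `Conv2d` and `ConvTranspose2d` (assuming dilation 1 and no output padding) together with the kernel, stride, and padding values from the generator and discriminator defined below. The helper names `conv_out` and `deconv_out` are made up for this illustration.

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of nn.Conv2d (dilation=1)."""
    return (size + 2 * padding - kernel) // stride + 1

def deconv_out(size, kernel, stride, padding):
    """Spatial output size of nn.ConvTranspose2d (dilation=1, output_padding=0)."""
    return (size - 1) * stride - 2 * padding + kernel

# (kernel, stride, padding) per layer, copied from the model definitions
gen_layers = [(4, 1, 0), (4, 2, 1), (4, 2, 1), (4, 2, 1)]
disc_layers = [(4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 1, 0)]

gen_sizes = [1]  # the latent vector enters as a 1 x 1 feature map
for k, s, p in gen_layers:
    gen_sizes.append(deconv_out(gen_sizes[-1], k, s, p))

disc_sizes = [32]  # CIFAR-10 images are 32 x 32
for k, s, p in disc_layers:
    disc_sizes.append(conv_out(disc_sizes[-1], k, s, p))

print(gen_sizes)   # [1, 4, 8, 16, 32]
print(disc_sizes)  # [32, 16, 8, 4, 1]
```

One more stride-2 layer on either side would double the resolution to `64 x 64`, which is the size the tutorial's networks work with.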

In [0]:
def weights_init(m):
    classname = m.__class__.__name__
    if classname.find('Conv') != -1:
        nn.init.normal_(m.weight.data, 0.0, 0.02)
    elif classname.find('BatchNorm') != -1:
        nn.init.normal_(m.weight.data, 1.0, 0.02)
        nn.init.constant_(m.bias.data, 0)

In [0]:
# Generator
class Generator(nn.Module):
    def __init__(self, ngpu):
        super(Generator, self).__init__()
        self.ngpu = ngpu
        self.main = nn.Sequential(
            nn.ConvTranspose2d( nz, ngf * 8, 4, 1, 0, bias=False),
            nn.BatchNorm2d(ngf * 8),
            nn.ReLU(True),
            nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 4),
            nn.ReLU(True),
            nn.ConvTranspose2d( ngf * 4, ngf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 2),
            nn.ReLU(True),
            nn.ConvTranspose2d( ngf * 2, nc, 4, 2, 1, bias=False),
            nn.Tanh()
        )

    def forward(self, input):
        return self.main(input)

In [0]:
# Discriminator
class Discriminator(nn.Module):
    def __init__(self, ngpu):
        super(Discriminator, self).__init__()
        self.ngpu = ngpu
        self.main = nn.Sequential(
            nn.Conv2d(nc, ndf, 4, 2, 1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ndf, ndf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 2),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ndf * 2, ndf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 4),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ndf * 4, 1, 4, 1, 0, bias=False),
            nn.Sigmoid()
        )

    def forward(self, input):
        return self.main(input)

## Defining the Training Function
The training function first trains the discriminator and then the generator, as shown below. Note that by setting the real label value to `0.9` and the fake label value to `0.1`, I've applied label smoothing, which has been shown to improve the results produced by the GAN.
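
To see what the smoothed targets do, consider the binary cross-entropy for a single prediction. The plain-Python sketch below (the `bce` helper is only for illustration and is not the `nn.BCELoss` used in training) suggests that with a target of `0.9` the loss is smallest when the discriminator outputs `0.9` rather than `1.0`, so overconfident predictions on real images are penalized instead of rewarded.

```python
import math

def bce(p, target):
    """Binary cross-entropy for one prediction p in (0, 1)."""
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))

real_label = 0.9  # smoothed target for real images, as in the training function

# Scan predictions on a grid and find where the loss is smallest
grid = [i / 100 for i in range(1, 100)]
best_p = min(grid, key=lambda p: bce(p, real_label))

print(best_p)                                        # 0.9
print(bce(0.99, real_label) > bce(0.9, real_label))  # True: overconfidence costs more
print(bce(0.99, 1.0) < bce(0.9, 1.0))                # True: a hard label rewards it
```

Note that the training function reuses the same `real_label` as the target for the generator update, so the generator is also pushed toward `D(G(z)) = 0.9` rather than `1.0`.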

In [0]:
def train(args, gen, disc, device, dataloader, optimizerG, optimizerD, criterion, epoch, iters):
  gen.train()
  disc.train()
  img_list = []
  fixed_noise = torch.randn(64, config.nz, 1, 1, device=device)

  # Establish convention for real and fake labels during training (with label smoothing)
  real_label = 0.9
  fake_label = 0.1
  for i, data in enumerate(dataloader, 0):

      #*****
      # Update Discriminator
      #*****
      ## Train with all-real batch
      disc.zero_grad()
      # Format batch
      real_cpu = data[0].to(device)
      b_size = real_cpu.size(0)
      label = torch.full((b_size,), real_label, device=device)
      # Forward pass real batch through D
      output = disc(real_cpu).view(-1)
      # Calculate loss on all-real batch
      errD_real = criterion(output, label)
      # Calculate gradients for D in backward pass
      errD_real.backward()
      D_x = output.mean().item()

      ## Train with all-fake batch
      # Generate batch of latent vectors
      noise = torch.randn(b_size, config.nz, 1, 1, device=device)
      # Generate fake image batch with G
      fake = gen(noise)
      label.fill_(fake_label)
      # Classify all fake batch with D
      output = disc(fake.detach()).view(-1)
      # Calculate D's loss on the all-fake batch
      errD_fake = criterion(output, label)
      # Calculate the gradients for this batch
      errD_fake.backward()
      D_G_z1 = output.mean().item()
      # Add the gradients from the all-real and all-fake batches
      errD = errD_real + errD_fake
      # Update D
      optimizerD.step()

      #*****
      # Update Generator
      #*****
      gen.zero_grad()
      label.fill_(real_label)  # fake labels are real for generator cost
      # Since we just updated D, perform another forward pass of all-fake batch through D
      output = disc(fake).view(-1)
      # Calculate G's loss based on this output
      errG = criterion(output, label)
      # Calculate gradients for G
      errG.backward()
      D_G_z2 = output.mean().item()
      # Update G
      optimizerG.step()

      # Output training stats
      if i % 50 == 0:
          print('[%d/%d][%d/%d]\tLoss_D: %.4f\tLoss_G: %.4f\tD(x): %.4f\tD(G(z)): %.4f / %.4f'
                % (epoch, args.epochs, i, len(dataloader),
                    errD.item(), errG.item(), D_x, D_G_z1, D_G_z2))
          wandb.log({
              "Gen Loss": errG.item(),
              "Disc Loss": errD.item()})

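      # Check how the generator is doing by saving G's output on fixed_noise.
      # main() passes iters=0 on every call, so this counter restarts each epoch
      # and the first condition fires on the first batch of every epoch.
      # Epochs run from 1 to args.epochs, so the last epoch is args.epochs.
      if (iters % 500 == 0) or ((epoch == args.epochs) and (i == len(dataloader)-1)):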
          with torch.no_grad():
              fake = gen(fixed_noise).detach().cpu()
          img_list.append(wandb.Image(vutils.make_grid(fake, padding=2, normalize=True)))
          wandb.log({
              "Generated Images": img_list})
      iters += 1

## Monitoring the Run
Once we have all the pieces in place, all we need to do is train the model and watch it learn.

In [0]:
#collapse-hide
wandb.watch_called = False 
# WandB – Config is a variable that holds and saves hyperparameters and inputs
config = wandb.config          # Initialize config
config.batch_size = batch_size 
config.epochs = num_epochs         
config.lr = lr              
config.beta1 = beta1
config.nz = nz          
config.no_cuda = False         
config.seed = manualSeed # random seed (default: 42)
config.log_interval = 10 # how many batches to wait before logging training status

def main():
    use_cuda = not config.no_cuda and torch.cuda.is_available()
    device = torch.device("cuda" if use_cuda else "cpu")
    kwargs = {'num_workers': 1, 'pin_memory': True} if use_cuda else {}
    
    # Set random seeds and deterministic pytorch for reproducibility
    random.seed(config.seed)       # python random seed
    torch.manual_seed(config.seed) # pytorch random seed
    np.random.seed(config.seed) # numpy random seed
    torch.backends.cudnn.deterministic = True

    # Load the dataset
    transform = transforms.Compose(
        [transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

    trainset = datasets.CIFAR10(root='./data', train=True,
                                            download=True, transform=transform)
    trainloader = torch.utils.data.DataLoader(trainset, batch_size=config.batch_size,
                                              shuffle=True, num_workers=workers)

    # Create the generator
    netG = Generator(ngpu).to(device)

    # Handle multi-gpu if desired
    if (device.type == 'cuda') and (ngpu > 1):
        netG = nn.DataParallel(netG, list(range(ngpu)))

    # Apply the weights_init function to randomly initialize all weights
    #  to mean=0, stdev=0.02.
    netG.apply(weights_init)

    # Create the Discriminator
    netD = Discriminator(ngpu).to(device)

    # Handle multi-gpu if desired
    if (device.type == 'cuda') and (ngpu > 1):
        netD = nn.DataParallel(netD, list(range(ngpu)))

    # Apply the weights_init function to randomly initialize all weights
    #  to mean=0, stdev=0.02.
    netD.apply(weights_init)

    # Initialize BCELoss function
    criterion = nn.BCELoss()

    # Setup Adam optimizers for both G and D
    optimizerD = optim.Adam(netD.parameters(), lr=config.lr, betas=(config.beta1, 0.999))
    optimizerG = optim.Adam(netG.parameters(), lr=config.lr, betas=(config.beta1, 0.999))
    
    # WandB – wandb.watch() automatically fetches all layer dimensions, gradients, model parameters and logs them automatically to your dashboard.
    # Using log="all" logs histograms of parameter values in addition to gradients
    wandb.watch(netG, log="all")
    wandb.watch(netD, log="all")
    iters = 0
    for epoch in range(1, config.epochs + 1):
        train(config, netG, netD, device, trainloader, optimizerG, optimizerD, criterion, epoch, iters)
        
    # WandB – Save the model checkpoint. This automatically saves a file to the cloud and associates it with the current run.
    torch.save(netG.state_dict(), "model.h5")
    wandb.save('model.h5')

if __name__ == '__main__':
    main()

Files already downloaded and verified
[1/30][0/391]	Loss_D: 1.5330	Loss_G: 2.0065	D(x): 0.4798	D(G(z)): 0.4893 / 0.1298
[1/30][50/391]	Loss_D: 0.8715	Loss_G: 4.9647	D(x): 0.7946	D(G(z)): 0.3070 / 0.0045
[1/30][100/391]	Loss_D: 0.7232	Loss_G: 3.8356	D(x): 0.8442	D(G(z)): 0.1959 / 0.0156
[1/30][150/391]	Loss_D: 0.8965	Loss_G: 2.2287	D(x): 0.7690	D(G(z)): 0.2954 / 0.0948
[1/30][200/391]	Loss_D: 0.9530	Loss_G: 1.8095	D(x): 0.6412	D(G(z)): 0.1987 / 0.1468
[1/30][250/391]	Loss_D: 0.9138	Loss_G: 2.8588	D(x): 0.6709	D(G(z)): 0.1570 / 0.0513
[1/30][300/391]	Loss_D: 0.8062	Loss_G: 2.6343	D(x): 0.7680	D(G(z)): 0.1848 / 0.0608
[1/30][350/391]	Loss_D: 1.1042	Loss_G: 1.6227	D(x): 0.5420	D(G(z)): 0.1842 / 0.1997
[2/30][0/391]	Loss_D: 1.0089	Loss_G: 1.7177	D(x): 0.5759	D(G(z)): 0.2028 / 0.1679
[2/30][50/391]	Loss_D: 0.9720	Loss_G: 2.1761	D(x): 0.7768	D(G(z)): 0.3881 / 0.0989
[2/30][100/391]	Loss_D: 0.9278	Loss_G: 2.0908	D(x): 0.7203	D(G(z)): 0.2952 / 0.1097
[2/30][150/391]	Loss_D: 0.8625	Loss_G: 2.261

## Loss Curve and Results
For a few lines of code, the GAN produces pretty decent images in 30 epochs as you can see below.

### Loss
![](images_for_posts/LossCurve.png "Loss Curves")
### Images
![](images_for_posts/dcgan.png "Images after 30 epochs")

## References
1. DCGAN PyTorch Tutorial: https://pytorch.org/tutorials/beginner/dcgan_faces_tutorial.html