# DCGAN

A DCGAN is a direct extension of the GAN described above, except that it explicitly uses convolutional and convolutional-transpose layers in the discriminator and generator, respectively. It was first described by Radford et al. in the paper Unsupervised Representation Learning With Deep Convolutional Generative Adversarial Networks.

The discriminator is made up of strided convolution layers, batch norm layers, and LeakyReLU activations. Its input is a 3x64x64 image and its output is a scalar probability that the input is from the real data distribution.

The generator is composed of convolutional-transpose layers, batch norm layers, and ReLU activations. Its input is a latent vector, z, drawn from a standard normal distribution, and its output is a 3x64x64 RGB image. The strided conv-transpose layers allow the latent vector to be transformed into a volume with the same shape as an image.

In the paper, the authors also give some tips about how to set up the optimizers, how to calculate the loss functions, and how to initialize the model weights, all of which will be explained in the coming sections.

In [1]:
from __future__ import print_function
#%matplotlib inline
import argparse
import os
import random
import torch
import torch.nn as nn
import torch.nn.parallel
import torch.backends.cudnn as cudnn
import torch.optim as optim
import torch.utils.data
import torchvision.datasets as dset
import torchvision.transforms as transforms
import torchvision.utils as vutils
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
from IPython.display import HTML

# Set random seed for reproducibility
manualSeed = 999
#manualSeed = random.randint(1, 10000) # use if you want new results
print("Random Seed: ", manualSeed)
random.seed(manualSeed)
torch.manual_seed(manualSeed)



Random Seed:  999


<torch._C.Generator at 0x10b3b67b0>

## Inputs

Let's define some inputs for the run:

-  **dataroot** - the path to the root of the dataset folder. We will
   talk more about the dataset in the next section
-  **workers** - the number of worker processes for loading the data with
   the DataLoader
-  **batch_size** - the batch size used in training. The DCGAN paper
   uses a batch size of 128
-  **image_size** - the spatial size of the images used for training.
   This implementation defaults to 64x64. If another size is desired,
   the structures of D and G must be changed. See
   [this issue](https://github.com/pytorch/examples/issues/70) for more
   details
-  **nc** - number of color channels in the input images. For color
   images this is 3
-  **nz** - length of latent vector
-  **ngf** - relates to the depth of feature maps carried through the
   generator
-  **ndf** - sets the depth of feature maps propagated through the
   discriminator
-  **num_epochs** - number of training epochs to run. Training for
   longer will probably lead to better results but will also take much
   longer
-  **lr** - learning rate for training. As described in the DCGAN paper,
   this number should be 0.0002
-  **beta1** - beta1 hyperparameter for Adam optimizers. As described in
   the paper, this number should be 0.5
-  **ngpu** - number of GPUs available. If this is 0, code will run in
   CPU mode. If this number is greater than 0 it will run on that number
   of GPUs




In [2]:
# Root directory for dataset
dataroot = "data/celeba"

# Number of workers for dataloader
workers = 2

# Batch size during training
batch_size = 128

# Spatial size of training images. All images will be resized to this
#   size using a transform.
image_size = 64

# Number of channels in the training images. For color images this is 3
nc = 3

# Size of z latent vector (i.e. size of generator input)
nz = 100

# Size of feature maps in generator
ngf = 64

# Size of feature maps in discriminator
ndf = 64

# Number of training epochs
num_epochs = 5

# Learning rate for optimizers
lr = 0.0002

# Beta1 hyperparam for Adam optimizers
beta1 = 0.5

# Number of GPUs available. Use 0 for CPU mode.
ngpu = 1

## Dataset

The CelebA dataset is a large-scale face attributes dataset consisting of more than 200,000 celebrity images. It is commonly used as a benchmark for image classification, face recognition, and generative modeling tasks. Each image in the dataset is annotated with 40 attributes such as gender, age, and facial hair.

The CelebA dataset can be downloaded from various sources such as the official website or through PyTorch's torchvision module. The aligned images are in JPEG format and have a resolution of 178x218 pixels. The dataset also comes with a pre-defined training, validation, and test split, which is useful for training and evaluating machine learning models. This notebook does not use the split and trains on every image found under dataroot.

The code block below sets up a PyTorch dataloader for the CelebA dataset. The dataset is created using the ImageFolder class from the torchvision.datasets module. The ImageFolder class expects the images to be organized in subdirectories, where each subdirectory represents a different class, so the images must sit in at least one subdirectory of dataroot rather than directly inside it. The class labels that ImageFolder produces are ignored during training. The root parameter is set to dataroot, which is the directory containing the CelebA dataset. The transform parameter is set to a series of image transformations: resizing, center cropping, converting to a tensor, and normalizing the pixel values to the range [-1, 1]. The resulting dataset is then wrapped in a dataloader using the DataLoader class from the torch.utils.data module. The dataloader feeds shuffled batches of images to the networks during training.

The code block also sets the device to run on either the GPU or CPU depending on the availability of a GPU and the value of ngpu. Finally, the code block plots a batch of training images using the matplotlib.pyplot module. The vutils.make_grid() function is used to create a grid of images from the batch, which is then displayed using the matplotlib.pyplot.imshow() function.

In [None]:
# We can use an image folder dataset the way we have it setup.
# Create the dataset
dataset = dset.ImageFolder(root=dataroot,
                           transform=transforms.Compose([
                               transforms.Resize(image_size),
                               transforms.CenterCrop(image_size),
                               transforms.ToTensor(),
                               transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
                           ]))
# Create the dataloader
dataloader = torch.utils.data.DataLoader(dataset, batch_size=batch_size,
                                         shuffle=True, num_workers=workers)

# Decide which device we want to run on
device = torch.device("cuda:0" if (torch.cuda.is_available() and ngpu > 0) else "cpu")

# Plot some training images
real_batch = next(iter(dataloader))
plt.figure(figsize=(8,8))
plt.axis("off")
plt.title("Training Images")
plt.imshow(np.transpose(vutils.make_grid(real_batch[0].to(device)[:64], padding=2, normalize=True).cpu(),(1,2,0)))
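
To make the effect of `transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))` concrete, here is a minimal sketch of the per-channel arithmetic in plain Python. The helper names `normalize` and `denormalize` are made up for this illustration and are not part of torchvision.

```python
# Per-channel arithmetic performed by Normalize(mean=0.5, std=0.5).
# ToTensor() yields values in [0, 1]; Normalize computes (x - mean) / std.

def normalize(x, mean=0.5, std=0.5):
    """Map a pixel value from [0, 1] to [-1, 1]."""
    return (x - mean) / std

def denormalize(y, mean=0.5, std=0.5):
    """Invert the mapping, for example before displaying an image."""
    return y * std + mean

pixels = [0.0, 0.25, 0.5, 1.0]
normalized = [normalize(p) for p in pixels]
restored = [denormalize(v) for v in normalized]

print(normalized)  # [-1.0, -0.5, 0.0, 1.0]
print(restored)    # [0.0, 0.25, 0.5, 1.0]
```

Note that the plotting code passes `normalize=True` to `vutils.make_grid`, which rescales by the minimum and maximum of the tensor rather than applying this exact inverse.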

## Implementation

This code block defines a function weights_init that is used to initialize the weights of the generator and discriminator neural networks. The function takes a module m as input and checks its class name using the __class__.__name__ attribute. If the class name contains the string 'Conv', the weights are initialized using a normal distribution with mean 0 and standard deviation 0.02. If the class name contains the string 'BatchNorm', the weights are initialized with a normal distribution with mean 1 and standard deviation 0.02, and the biases are set to 0. The nn.init module from PyTorch is used to perform the weight initialization.

Weight initialization is an important step in training neural networks because it can strongly affect how well training proceeds. The DCGAN paper specifies that all model weights be drawn from a normal distribution with mean 0 and standard deviation 0.02, which is what this function implements for the convolutional layers. Starting from small random values helps avoid poor solutions early in training, although it does not guarantee good results. The best initialization scheme in general depends on the architecture and the problem being solved.

In [None]:
# custom weights initialization called on netG and netD
def weights_init(m):
    classname = m.__class__.__name__
    if classname.find('Conv') != -1:
        nn.init.normal_(m.weight.data, 0.0, 0.02)
    elif classname.find('BatchNorm') != -1:
        nn.init.normal_(m.weight.data, 1.0, 0.02)
        nn.init.constant_(m.bias.data, 0)
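
Because `weights_init` dispatches on a substring of the class name, it is worth checking which layer types each branch catches. The sketch below mirrors only the name-matching logic in plain Python, without touching any tensors. The helper `init_rule` and its return strings are illustrative and do not exist in PyTorch.

```python
# Which initialization rule weights_init would select, based only on the
# class name. This mirrors the if/elif structure but returns a description
# instead of modifying any weights.

def init_rule(classname):
    if classname.find('Conv') != -1:
        return 'weight ~ N(0.0, 0.02)'
    elif classname.find('BatchNorm') != -1:
        return 'weight ~ N(1.0, 0.02), bias = 0'
    return 'left at default'

layer_names = ['Conv2d', 'ConvTranspose2d', 'BatchNorm2d', 'LeakyReLU', 'Tanh']
rules = {name: init_rule(name) for name in layer_names}

for name, rule in rules.items():
    print(f'{name:16s} -> {rule}')
```

Note that `ConvTranspose2d` contains the substring `Conv`, so the generator's transposed convolutions take the same branch as the discriminator's ordinary convolutions.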

## Generator

The code block below defines the generator network for the DCGAN. The generator takes a random noise vector of length nz (shaped nz x 1 x 1) as input and produces an image of size nc x 64 x 64 as output.

The architecture consists of a series of transposed convolutional layers (sometimes loosely called deconvolutional layers) that gradually increase the spatial resolution of the feature maps. Every transposed convolution except the last is followed by batch normalization, which helps stabilize training, and a rectified linear unit (ReLU) activation, which introduces non-linearity. The output layer instead uses a tanh activation so that the pixel values of the generated image lie in the range -1 to 1, the same range as the normalized training images.

The nz hyperparameter sets the length of the input noise vector and ngf sets the base number of generator filters. The first layer produces ngf * 8 feature maps, and the number of feature maps is halved at each subsequent layer while the spatial size doubles, until the final layer outputs nc channels at 64 x 64.

The ngpu hyperparameter specifies the number of GPUs to use for training. The class only stores this value; wrapping the model for multi-GPU execution happens when the network is instantiated. The weights_init function defined earlier is applied to initialize the weights of the convolutional and batch normalization layers in the generator.

The main sequential module defines the layers of the generator, and the forward method passes a batch of noise vectors through it to generate the output images.

In [None]:
# Generator Code

class Generator(nn.Module):
    def __init__(self, ngpu):
        super(Generator, self).__init__()
        self.ngpu = ngpu
        self.main = nn.Sequential(
            # input is Z, going into a convolution
            nn.ConvTranspose2d( nz, ngf * 8, 4, 1, 0, bias=False),
            nn.BatchNorm2d(ngf * 8),
            nn.ReLU(True),
            # state size. (ngf*8) x 4 x 4
            nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 4),
            nn.ReLU(True),
            # state size. (ngf*4) x 8 x 8
            nn.ConvTranspose2d( ngf * 4, ngf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 2),
            nn.ReLU(True),
            # state size. (ngf*2) x 16 x 16
            nn.ConvTranspose2d( ngf * 2, ngf, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf),
            nn.ReLU(True),
            # state size. (ngf) x 32 x 32
            nn.ConvTranspose2d( ngf, nc, 4, 2, 1, bias=False),
            nn.Tanh()
            # state size. (nc) x 64 x 64
        )

    def forward(self, input):
        return self.main(input)
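
The state-size comments in the generator can be checked by hand. For a transposed convolution with no dilation and no output padding (the defaults, which the code above relies on), the output size is `(size - 1) * stride - 2 * padding + kernel`. The sketch below traces that formula through the five layers in plain Python; the helper name `conv_transpose_out` is made up for this illustration.

```python
# Trace the spatial size through the generator's five ConvTranspose2d layers.
# Assumes dilation=1 and output_padding=0, which are the defaults.

def conv_transpose_out(size, kernel, stride, padding):
    """Output height/width of a transposed convolution."""
    return (size - 1) * stride - 2 * padding + kernel

# (kernel, stride, padding) for each layer, copied from the Generator class
g_layers = [(4, 1, 0), (4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 2, 1)]

size = 1  # the latent vector is nz x 1 x 1
g_sizes = [size]
for kernel, stride, padding in g_layers:
    size = conv_transpose_out(size, kernel, stride, padding)
    g_sizes.append(size)

print(g_sizes)  # [1, 4, 8, 16, 32, 64]
```

The first layer expands the 1x1 input to 4x4, and each stride-2 layer after it doubles the size. This is why changing image_size requires adding or removing layers rather than only changing a constant.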

The Generator class is initialized with the number of GPUs to be used, which is specified by the ngpu argument. The netG object is created as an instance of this class, and moved to the device specified earlier.

If multiple GPUs are available and ngpu is greater than 1, nn.DataParallel is used to parallelize the model across multiple GPUs.

The weights_init function is applied to the generator to initialize all weights in the network.

Finally, the print function is used to display the architecture of the generator network.

In [None]:
# Create the generator
netG = Generator(ngpu).to(device)

# Handle multi-gpu if desired
if (device.type == 'cuda') and (ngpu > 1):
    netG = nn.DataParallel(netG, list(range(ngpu)))

# Apply the weights_init function to randomly initialize conv weights
#  to mean=0, stdev=0.02 (batch norm weights use mean=1, stdev=0.02).
netG.apply(weights_init)

# Print the model
print(netG)

## Discriminator

This is the implementation of the Discriminator neural network used in the DCGAN architecture.

The Discriminator takes an image as input and tries to determine whether it is real or fake. It is implemented as a convolutional neural network with five convolutional layers. The first four use a kernel size of 4, a stride of 2, and a padding of 1, so each halves the spatial resolution, and each is followed by a leaky ReLU activation. Batch normalization is applied after the second, third, and fourth convolutions but not the first. The final convolution (kernel size 4, stride 1, no padding) reduces the 4x4 feature map to a single value, and a Sigmoid turns it into the probability that the input is real.

The hyperparameter nc is the number of input channels and ndf is the number of filters in the first convolutional layer; ngpu is the number of GPUs available for training.

The nn.Sequential module is used to define the layers of the Discriminator in order. The number of filters doubles at each of the second through fourth convolutions (ndf * 2, ndf * 4, ndf * 8) while the kernel size and stride stay the same.

The forward function takes the input image and passes it through the layers of the network to produce the output.


In [3]:
class Discriminator(nn.Module):
    def __init__(self, ngpu):
        super(Discriminator, self).__init__()
        self.ngpu = ngpu
        self.main = nn.Sequential(
            # input is (nc) x 64 x 64
            nn.Conv2d(nc, ndf, 4, 2, 1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf) x 32 x 32
            nn.Conv2d(ndf, ndf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 2),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf*2) x 16 x 16
            nn.Conv2d(ndf * 2, ndf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 4),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf*4) x 8 x 8
            nn.Conv2d(ndf * 4, ndf * 8, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 8),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf*8) x 4 x 4
            nn.Conv2d(ndf * 8, 1, 4, 1, 0, bias=False),
            nn.Sigmoid()
        )

    def forward(self, input):
        return self.main(input)
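
The discriminator's state-size comments follow from the standard convolution output formula, `floor((size + 2 * padding - kernel) / stride) + 1`, assuming no dilation (the default). The sketch below traces it through the five layers in plain Python; the helper name `conv_out` is made up for this illustration.

```python
# Trace the spatial size through the discriminator's five Conv2d layers.
# Assumes dilation=1, which is the default.

def conv_out(size, kernel, stride, padding):
    """Output height/width of a strided convolution."""
    return (size + 2 * padding - kernel) // stride + 1

# (kernel, stride, padding) for each layer, copied from the Discriminator class
d_layers = [(4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 1, 0)]

size = 64  # input images are nc x 64 x 64
d_sizes = [size]
for kernel, stride, padding in d_layers:
    size = conv_out(size, kernel, stride, padding)
    d_sizes.append(size)

print(d_sizes)  # [64, 32, 16, 8, 4, 1]
```

The final 4x4 kernel with no padding collapses the 4x4 feature map to a single value per image, which the Sigmoid then maps to a probability.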

To recap, the Discriminator class is a convolutional neural network that takes an nc x 64 x 64 image and outputs a scalar probability that the image is real. Its four stride-2 convolutions each downsample the feature maps by a factor of 2, starting with ndf output channels and doubling from there. A final convolution followed by a sigmoid produces a value between 0 and 1.

The code below instantiates the discriminator and moves it to the selected device. If ngpu > 1 and a CUDA device is in use, it wraps the model with nn.DataParallel to enable parallel computation across multiple GPUs. It then applies the weights_init function, which draws convolutional weights from a normal distribution with mean 0 and standard deviation 0.02 (batch norm weights use mean 1 and the same standard deviation, with biases set to 0). Finally, print(netD) displays the architecture of the discriminator network.

In [None]:
# Create the Discriminator
netD = Discriminator(ngpu).to(device)

# Handle multi-gpu if desired
if (device.type == 'cuda') and (ngpu > 1):
    netD = nn.DataParallel(netD, list(range(ngpu)))
    
# Apply the weights_init function to randomly initialize conv weights
#  to mean=0, stdev=0.02 (batch norm weights use mean=1, stdev=0.02).
netD.apply(weights_init)

# Print the model
print(netD)
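
As a rough sanity check on the printed architectures, the number of learnable parameters in both networks can be worked out by hand from the layer definitions. The sketch below does this in plain Python using the hyperparameter values from the Inputs section. The helper names are made up for this illustration, and the count assumes 4x4 kernels with `bias=False` throughout, as in the class definitions.

```python
# Hand count of learnable parameters in the generator and discriminator.
nz, ngf, ndf, nc = 100, 64, 64, 3  # values from the Inputs section

def conv_params(c_in, c_out, kernel=4):
    # bias=False in every conv layer, so only the kernel weights count
    return c_in * c_out * kernel * kernel

def bn_params(channels):
    # one learnable scale and one learnable shift per channel
    return 2 * channels

# Channel counts at the input and after each (transposed) convolution
g_channels = [nz, ngf * 8, ngf * 4, ngf * 2, ngf, nc]
d_channels = [nc, ndf, ndf * 2, ndf * 4, ndf * 8, 1]

g_total = sum(conv_params(a, b) for a, b in zip(g_channels, g_channels[1:]))
g_total += sum(bn_params(c) for c in g_channels[1:-1])  # BN after all but the last layer

d_total = sum(conv_params(a, b) for a, b in zip(d_channels, d_channels[1:]))
d_total += sum(bn_params(c) for c in d_channels[2:-1])  # BN after convs 2, 3 and 4 only

print(g_total)  # 3576704
print(d_total)  # 2765568
```

With a model in hand, the same totals should come out of `sum(p.numel() for p in netG.parameters())` and the equivalent expression for `netD`.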

## Loss Function

- criterion: a Binary Cross Entropy loss function that measures the difference between two probability distributions, in this case between the predicted output and the target output for both the real and fake samples.
- fixed_noise: a fixed batch of noise vectors that will be used to visualize the progression of the generator's output during training.
- real_label: a label used to indicate real samples during training.
- fake_label: a label used to indicate fake samples during training.
- optimizerD: an Adam optimizer that updates the parameters of the discriminator model using the gradients of the loss function with respect to the model parameters.
- optimizerG: an Adam optimizer that updates the parameters of the generator model using the gradients of the loss function with respect to the model parameters.

In [4]:
# Initialize BCELoss function
criterion = nn.BCELoss()

# Create batch of latent vectors that we will use to visualize
#  the progression of the generator
fixed_noise = torch.randn(64, nz, 1, 1, device=device)

# Establish convention for real and fake labels during training
real_label = 1
fake_label = 0

# Setup Adam optimizers for both G and D
optimizerD = optim.Adam(netD.parameters(), lr=lr, betas=(beta1, 0.999))
optimizerG = optim.Adam(netG.parameters(), lr=lr, betas=(beta1, 0.999))
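
To give a feel for what the optimizer settings mean, the sketch below applies the textbook Adam update to a single scalar parameter using lr = 0.0002 and beta1 = 0.5. It is a plain-Python illustration of the published update rule, not PyTorch's implementation, and the function name `adam_step` is made up for this example.

```python
import math

def adam_step(param, grad, m, v, t, lr=0.0002, beta1=0.5, beta2=0.999, eps=1e-8):
    """One textbook Adam update for a scalar parameter at step t (t starts at 1)."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

# On the very first step the update size is about lr, whatever the gradient scale.
p_small, _, _ = adam_step(1.0, grad=0.3, m=0.0, v=0.0, t=1)
p_large, _, _ = adam_step(1.0, grad=30.0, m=0.0, v=0.0, t=1)
print(1.0 - p_small, 1.0 - p_large)
```

Compared with PyTorch's default beta1 of 0.9, a value of 0.5 makes the running mean of gradients forget older gradients faster.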


## Training

This is the training loop for the GAN. Here's what's happening in each iteration:

- The Discriminator (D) is updated to maximize log(D(x)) + log(1 - D(G(z))), where x is a real image and G(z) is the fake image generated by the Generator (G) from a noise vector z.
- First, a batch of real images is fed to D and the output is compared to the target label 1 (real) to calculate the loss errD_real. Calling backward() on this loss computes gradients for D.
- Then, a batch of fake images is generated by G from random noise vectors and fed to D. The fake batch is detached so that no gradients flow into G during this step. The output is compared to the target label 0 (fake) to calculate the loss errD_fake, and a second backward() call accumulates its gradients on top of those from the real batch.
- The optimizer step for D then uses the accumulated gradients. The sum errD = errD_real + errD_fake is kept only for reporting.
- The Generator is updated to maximize log(D(G(z))).
- The same fake batch, this time without detaching, is passed through the just-updated D. The output is compared to the target label 1 (the images are fake, but we want D to classify them as real) to calculate the loss errG, which is backpropagated through D into G.
- The Adam optimizer for G then updates only G's weights.
- The losses errD and errG are saved for plotting later.
- Every 500 iterations, and on the final iteration, the images generated by G from the fixed set of noise vectors are saved for visualization.
- The training loop runs for a fixed number of epochs. This loop does not write model weights to disk. After training is complete, the losses and generated images are plotted in the sections that follow.
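
The link between the binary cross-entropy criterion and the two objectives in the list above can be worked through with a single pair of numbers. The sketch below is plain Python with hypothetical discriminator outputs; the function `bce` is a hand-written per-sample formula, whereas nn.BCELoss additionally averages over the batch by default.

```python
import math

def bce(p, y):
    """Binary cross-entropy for one prediction p in (0, 1) and a label y in {0, 1}."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

d_x = 0.9   # hypothetical D output on a real image
d_gz = 0.2  # hypothetical D output on a generated image

# Discriminator step: real images get label 1, fake images get label 0.
errD_real = bce(d_x, 1)    # equals -log(D(x))
errD_fake = bce(d_gz, 0)   # equals -log(1 - D(G(z)))
errD = errD_real + errD_fake

# Generator step: the same fake output is scored against label 1.
errG = bce(d_gz, 1)        # equals -log(D(G(z)))

print(round(errD, 4), round(errG, 4))
```

Minimizing errD is therefore the same as maximizing log(D(x)) + log(1 - D(G(z))), and minimizing errG is the same as maximizing log(D(G(z))).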






In [None]:
# Training Loop

# Lists to keep track of progress
img_list = []
G_losses = []
D_losses = []
iters = 0

print("Starting Training Loop...")
# For each epoch
for epoch in range(num_epochs):
    # For each batch in the dataloader
    for i, data in enumerate(dataloader, 0):
        
        ############################
        # (1) Update D network: maximize log(D(x)) + log(1 - D(G(z)))
        ###########################
        ## Train with all-real batch
        netD.zero_grad()
        # Format batch
        real_cpu = data[0].to(device)
        b_size = real_cpu.size(0)
        label = torch.full((b_size,), real_label, dtype=torch.float, device=device)
        # Forward pass real batch through D
        output = netD(real_cpu).view(-1)
        # Calculate loss on all-real batch
        errD_real = criterion(output, label)
        # Calculate gradients for D in backward pass
        errD_real.backward()
        D_x = output.mean().item()

        ## Train with all-fake batch
        # Generate batch of latent vectors
        noise = torch.randn(b_size, nz, 1, 1, device=device)
        # Generate fake image batch with G
        fake = netG(noise)
        label.fill_(fake_label)
        # Classify all fake batch with D
        output = netD(fake.detach()).view(-1)
        # Calculate D's loss on the all-fake batch
        errD_fake = criterion(output, label)
        # Calculate the gradients for this batch
        errD_fake.backward()
        D_G_z1 = output.mean().item()
        # Sum the two losses for reporting; the gradients were already
        # accumulated by the two backward() calls above
        errD = errD_real + errD_fake
        # Update D
        optimizerD.step()

        ############################
        # (2) Update G network: maximize log(D(G(z)))
        ###########################
        netG.zero_grad()
        label.fill_(real_label)  # fake labels are real for generator cost
        # Since we just updated D, perform another forward pass of all-fake batch through D
        output = netD(fake).view(-1)
        # Calculate G's loss based on this output
        errG = criterion(output, label)
        # Calculate gradients for G
        errG.backward()
        D_G_z2 = output.mean().item()
        # Update G
        optimizerG.step()
        
        # Output training stats
        if i % 50 == 0:
            print('[%d/%d][%d/%d]\tLoss_D: %.4f\tLoss_G: %.4f\tD(x): %.4f\tD(G(z)): %.4f / %.4f'
                  % (epoch, num_epochs, i, len(dataloader),
                     errD.item(), errG.item(), D_x, D_G_z1, D_G_z2))
        
        # Save Losses for plotting later
        G_losses.append(errG.item())
        D_losses.append(errD.item())
        
        # Check how the generator is doing by saving G's output on fixed_noise
        if (iters % 500 == 0) or ((epoch == num_epochs-1) and (i == len(dataloader)-1)):
            with torch.no_grad():
                fake = netG(fixed_noise).detach().cpu()
            img_list.append(vutils.make_grid(fake, padding=2, normalize=True))
            
        iters += 1
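
The condition that decides when a grid of generated images is appended to `img_list` can be simulated without training anything. The sketch below replays just that bookkeeping in plain Python. The function name `snapshot_iterations` and the run length of 1200 batches per epoch are made up for this illustration.

```python
def snapshot_iterations(num_epochs, batches_per_epoch, every=500):
    """Return the global iteration indices at which a snapshot would be saved."""
    saved = []
    iters = 0
    for epoch in range(num_epochs):
        for i in range(batches_per_epoch):
            last_batch = (epoch == num_epochs - 1) and (i == batches_per_epoch - 1)
            if iters % every == 0 or last_batch:
                saved.append(iters)
            iters += 1
    return saved

# Hypothetical run: 5 epochs of 1200 batches each.
saved = snapshot_iterations(num_epochs=5, batches_per_epoch=1200)
print(len(saved), saved[:3], saved[-1])  # 13 [0, 500, 1000] 5999
```

The count is tied to the global iteration counter rather than to epochs, so the number of frames in the later animation grows with both the dataset size and num_epochs.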

## Results

The code below generates a plot showing the Generator and Discriminator loss during training. The x-axis represents the training iterations, and the y-axis represents the loss. With matplotlib's default color cycle, the blue line corresponds to the Generator loss and the orange line to the Discriminator loss. GAN losses tend to oscillate rather than decrease steadily, so the plot is most useful for spotting instabilities, such as one network overpowering the other, rather than for judging image quality.

In [None]:
plt.figure(figsize=(10,5))
plt.title("Generator and Discriminator Loss During Training")
plt.plot(G_losses,label="G")
plt.plot(D_losses,label="D")
plt.xlabel("iterations")
plt.ylabel("Loss")
plt.legend()
plt.show()
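
Per-iteration GAN losses are usually noisy, so a smoothed curve can be easier to read. The sketch below is one possible helper, a trailing moving average in plain Python. The function name `moving_average` and the default window of 100 are arbitrary choices for this illustration and are not part of the original notebook.

```python
def moving_average(values, window=100):
    """Trailing moving average; the first entries average over fewer points."""
    smoothed = []
    running = 0.0
    for idx, value in enumerate(values):
        running += value
        if idx >= window:
            running -= values[idx - window]  # drop the value that left the window
        smoothed.append(running / min(idx + 1, window))
    return smoothed

demo = moving_average([1.0, 3.0, 5.0, 7.0], window=2)
print(demo)  # [1.0, 2.0, 4.0, 6.0]
```

After training, something like `plt.plot(moving_average(G_losses), label="G (smoothed)")` could be added to the plot alongside the raw curves.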

## Visualization

In [None]:
#%%capture
fig = plt.figure(figsize=(8,8))
plt.axis("off")
ims = [[plt.imshow(np.transpose(i,(1,2,0)), animated=True)] for i in img_list]
ani = animation.ArtistAnimation(fig, ims, interval=1000, repeat_delay=1000, blit=True)

HTML(ani.to_jshtml())

## Comparison Between Real and Fake Images

In [None]:
# Grab a batch of real images from the dataloader
real_batch = next(iter(dataloader))

# Plot the real images
plt.figure(figsize=(15,15))
plt.subplot(1,2,1)
plt.axis("off")
plt.title("Real Images")
plt.imshow(np.transpose(vutils.make_grid(real_batch[0].to(device)[:64], padding=5, normalize=True).cpu(),(1,2,0)))

# Plot the fake images from the last epoch
plt.subplot(1,2,2)
plt.axis("off")
plt.title("Fake Images")
plt.imshow(np.transpose(img_list[-1],(1,2,0)))
plt.show()