# Introduction

The original GANs are hard to train and produce inconsistent results. We therefore look at a follow-up paper, [DCGAN](https://arxiv.org/abs/1511.06434), which imposes a set of architectural constraints that make training more stable and improve sample quality. GANs provide an attractive alternative to maximum-likelihood techniques. One can additionally argue that their learning process and the lack of a heuristic cost function (such as pixel-wise independent mean-squared error) make them attractive for representation learning. However, GANs are known to be unstable to train, often resulting in generators that produce nonsensical outputs.

The paper proposes and evaluates a set of constraints on the architectural topology of convolutional GANs that make them stable to train in most settings. It names this class of architectures Deep Convolutional GANs (DCGAN).

Historical attempts to scale up GANs using CNNs to model images were unsuccessful. The paper identifies a family of architectures that trains stably across a range of datasets and allows training of higher-resolution and deeper generative models.

The paper proposes three main changes for training GANs more reliably:
* **All-convolutional network.** Replace max pooling with strided convolutions, allowing the network to learn its own spatial downsampling. The paper uses this approach in the discriminator. It also uses it in the generator, with fractional-strided convolutions, so that the generator learns its own spatial upsampling.
* **No fully connected layers on top of convolutional features.** The first layer of the generator takes a uniform noise vector Z as input. It could be called fully connected, since it is just a matrix multiplication. However, its result is reshaped into a 4-dimensional tensor and used as the start of the convolution stack. In the discriminator, the last convolution layer is flattened and fed into a single sigmoid output.
* **Batch normalization.** Batchnorm stabilizes learning by normalizing the input to each unit to have zero mean and unit variance. This helps with training problems caused by poor initialization and improves gradient flow in deeper models. It proved critical for getting deep generators to begin learning. It also prevents the generator from collapsing all samples to a single point, a common GAN failure mode known as **mode collapse**. Applying batchnorm directly to all layers, however, caused sample oscillation and model instability. The authors avoided this by not applying batchnorm to the generator output layer or the discriminator input layer.

On activations, the generator uses ReLU in every layer except the output layer, which uses Tanh. The authors observed that a bounded output activation let the model learn more quickly to saturate and cover the color space of the training distribution. In the discriminator, they found the leaky rectified activation (LeakyReLU) to work well, especially for higher-resolution modeling. This contrasts with the original GAN paper, which used the maxout activation.

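The following is a minimal pure-Python sketch of the training-mode batchnorm transform for a single unit. It makes "zero mean and unit variance" concrete. The names `batch_norm`, `gamma` and `beta` and the toy batch are illustrative assumptions. `nn.BatchNorm2d` does the same thing per channel over the batch and spatial dimensions, learns `gamma` and `beta`, and keeps running statistics for evaluation mode.

```python
import math


def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one unit's activations across a mini-batch (training mode).

    values: the unit's activation for each example in the batch.
    gamma, beta: scale and shift; learnable in a real layer, fixed here.
    """
    n = len(values)
    mean = sum(values) / n
    # Biased variance (divide by n), as batchnorm uses during training
    var = sum((v - mean) ** 2 for v in values) / n
    normalized = [(v - mean) / math.sqrt(var + eps) for v in values]
    # Scale and shift so the layer can still undo the normalization if useful
    return [gamma * x + beta for x in normalized]


batch = [2.0, 4.0, 6.0, 8.0]
out = batch_norm(batch)
print([round(x, 3) for x in out])
```

Whatever the scale of the incoming activations, the output has (approximately) zero mean and unit variance. The learnable `gamma` and `beta` ensure this does not restrict what the layer can represent.
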
### Architectural guidelines from the above discussion
* Replace any pooling layers with strided convolutions (discriminator) and fractional-strided convolutions (generator).
* Use batchnorm in both the generator and the discriminator, except the generator output layer and discriminator input layer.
* Remove fully connected hidden layers for deeper architectures.
* Use ReLU activation in generator for all layers except for the output, which uses Tanh.
* Use LeakyReLU activation in the discriminator for all layers.
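
The first guideline replaces pooling with learned resampling. The model in this notebook uses kernel size 4, stride 2 and padding 1, so each strided convolution halves the spatial size and each fractional-strided (transposed) convolution doubles it. The pure-Python sketch below traces those sizes using the output-size formulas from the PyTorch `nn.Conv2d` and `nn.ConvTranspose2d` documentation, assuming dilation 1. The helper names are illustrative, and the 64×64 resolution is this notebook's choice rather than a DCGAN requirement.

```python
def conv_out(size, kernel=4, stride=2, padding=1):
    """Spatial output size of a strided convolution (nn.Conv2d, dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_transpose_out(size, kernel=4, stride=2, padding=1, output_padding=0):
    """Spatial output size of a fractional-strided convolution
    (nn.ConvTranspose2d, dilation 1)."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


# Generator: 1x1 noise "image" -> 4x4 (stride 1, no padding), then four x2 upsamplings
g_sizes = [1]
g_sizes.append(conv_transpose_out(g_sizes[-1], stride=1, padding=0))
for _ in range(4):
    g_sizes.append(conv_transpose_out(g_sizes[-1]))

# Discriminator: 64x64 image -> four /2 downsamplings, then a 4x4 valid conv to 1x1
d_sizes = [64]
for _ in range(4):
    d_sizes.append(conv_out(d_sizes[-1]))
d_sizes.append(conv_out(d_sizes[-1], stride=1, padding=0))

print("generator:", g_sizes)      # generator: [1, 4, 8, 16, 32, 64]
print("discriminator:", d_sizes)  # discriminator: [64, 32, 16, 8, 4, 1]
```

The two stacks mirror each other. This is why the discriminator ends in a single 1×1 output that can feed one sigmoid unit.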

# Model

```python
import torch, torchvision
import torch.utils as tUtils
import torchvision.datasets as datasets
import torchvision.transforms as transforms
import torch.optim as optim
from torch.autograd import Variable
import torch.nn as nn
import torch.nn.functional as F
import torchvision.utils as vutils
```

```python
class Discriminator(nn.Module):
    def __init__(self, nc, ndf):
        super(Discriminator, self).__init__()
        self.main = nn.Sequential(
            # Input: nc x H x W
            nn.Conv2d(nc, ndf, 4, 2, 1, bias=False),
            # No BatchNorm on the discriminator input layer
            nn.LeakyReLU(0.2, inplace=True),
            # Shape: ndf x H/2 x W/2
            nn.Conv2d(ndf, ndf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 2),
            nn.LeakyReLU(0.2, inplace=True),
            # Shape: (ndf*2) x H/4 x W/4
            nn.Conv2d(ndf * 2, ndf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 4),
            nn.LeakyReLU(0.2, inplace=True),
            # Shape: (ndf*4) x H/8 x W/8
            nn.Conv2d(ndf * 4, ndf * 8, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 8),
            nn.LeakyReLU(0.2, inplace=True),
            # Shape: (ndf*8) x H/16 x W/16
            nn.Conv2d(ndf * 8, 1, 4, 1, 0, bias=False),
            # Shape: 1 x 1 x 1 when H = W = 64
            nn.Sigmoid()
        )

    def forward(self, x):
        # Output shape is (batch, 1, 1, 1): one real/fake probability per image
        return self.main(x)


class Generator(nn.Module):
    def __init__(self, nz, ngf, nc):
        super(Generator, self).__init__()
        self.main = nn.Sequential(
            # Input noise: nz x 1 x 1 per sample
            nn.ConvTranspose2d(nz, ngf * 8, 4, 1, 0, bias=False),
            nn.BatchNorm2d(ngf * 8),
            nn.ReLU(True),
            # Shape: (ngf*8) x 4 x 4
            nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 4),
            nn.ReLU(True),
            # Shape: (ngf*4) x 8 x 8
            nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 2),
            nn.ReLU(True),
            # Shape: (ngf*2) x 16 x 16
            nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf),
            nn.ReLU(True),
            # Shape: ngf x 32 x 32
            nn.ConvTranspose2d(ngf, nc, 4, 2, 1, bias=False),
            # No BatchNorm on the generator output layer; Tanh maps pixels to [-1, 1]
            nn.Tanh()
            # Shape: nc x 64 x 64
        )

    def forward(self, inp):
        return self.main(inp)
```

```python
def weights_init(m):
    # DCGAN initializes weights from a zero-centred normal with std 0.02.
    # 'Conv' matches both Conv2d and ConvTranspose2d.
    classname = m.__class__.__name__
    if classname.find('Conv') != -1:
        m.weight.data.normal_(0.0, 0.02)
    elif classname.find('BatchNorm') != -1:
        # BatchNorm scale starts near 1, shift at 0
        m.weight.data.normal_(1.0, 0.02)
        m.bias.data.fill_(0)
```

```python
# Image channels, latent size, base feature maps of G and D
nc = 3; nz = 100; ngf = 64; ndf = 64
```

```python
netG = Generator(nz, ngf, nc)
netG.apply(weights_init)
print(netG)
```

```
Generator (
  (main): Sequential (
    (0): ConvTranspose2d(100, 512, kernel_size=(4, 4), stride=(1, 1), bias=False)
    (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True)
    (2): ReLU (inplace)
    (3): ConvTranspose2d(512, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (4): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True)
    (5): ReLU (inplace)
    (6): ConvTranspose2d(256, 128, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (7): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True)
    (8): ReLU (inplace)
    (9): ConvTranspose2d(128, 64, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (10): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True)
    (11): ReLU (inplace)
    (12): ConvTranspose2d(64, 3, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (13): Tanh ()
  )
)
```

```python
netD = Discriminator(nc, ndf)
netD.apply(weights_init)
print(netD)
```

```
Discriminator (
  (main): Sequential (
    (0): Conv2d(3, 64, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (1): LeakyReLU (0.2, inplace)
    (2): Conv2d(64, 128, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (3): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True)
    (4): LeakyReLU (0.2, inplace)
    (5): Conv2d(128, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (6): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True)
    (7): LeakyReLU (0.2, inplace)
    (8): Conv2d(256, 512, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (9): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True)
    (10): LeakyReLU (0.2, inplace)
    (11): Conv2d(512, 1, kernel_size=(4, 4), stride=(1, 1), bias=False)
    (12): Sigmoid ()
  )
)
```
```python
# Binary cross-entropy between D's sigmoid output and the real/fake target
criterion = nn.BCELoss()
```

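`nn.BCELoss` computes the mean binary cross-entropy `-(y*log(p) + (1-y)*log(1-p))` between a predicted probability `p` and a target `y`. The training loop uses it in two ways:

* **Discriminator update.** D is trained with target 1 for real images and 0 for generated ones.
* **Generator update.** G's samples are passed to D with target 1. G therefore minimizes `-log D(G(z))` instead of `log(1 - D(G(z)))`, the "non-saturating" alternative proposed in the original GAN paper.

Below is a pure-Python sketch of both losses. The function names and discriminator outputs are hypothetical values chosen for illustration.

```python
import math


def bce(p, y, eps=1e-12):
    """Binary cross-entropy for one prediction p in (0, 1) and target y in {0, 1}.
    eps guards against log(0); nn.BCELoss instead clamps its log outputs at -100."""
    return -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))


def mean(xs):
    return sum(xs) / len(xs)


# Hypothetical discriminator outputs for a batch of 3
d_real = [0.9, 0.8, 0.7]  # D(x) on real images
d_fake = [0.2, 0.1, 0.3]  # D(G(z)) on generated images

# Discriminator loss: real images labelled 1, fakes labelled 0
err_d = mean([bce(p, 1) for p in d_real]) + mean([bce(p, 0) for p in d_fake])
# Generator loss: fakes labelled 1, so G minimizes -log D(G(z))
err_g = mean([bce(p, 1) for p in d_fake])
print(round(err_d, 4), round(err_g, 4))
```

When D confidently rejects fakes (`p` near 0), `log(1 - p)` is nearly flat, but `-log(p)` is steep. Labelling fakes as real therefore still gives the generator a useful gradient early in training.
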
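```python
# Adam for both networks. The DCGAN paper uses lr = 0.0002 and lowers beta1
# from the default 0.9 to 0.5, which it reports helps stabilize training.
optimizerD = optim.Adam(netD.parameters(), lr=0.0002, betas=(0.5, 0.999))
optimizerG = optim.Adam(netG.parameters(), lr=0.0002, betas=(0.5, 0.999))
```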

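```python
def get_noise(shape, device='cpu'):
    # Latent vectors z from a standard normal (the paper samples z uniformly)
    return torch.randn(shape, device=device)
```

```python
import os
from torch.utils.data import DataLoader

# Directory for sample images and checkpoints
outf = os.path.join(os.getcwd(), 'results')
os.makedirs(outf, exist_ok=True)

# `dataloader` is not defined elsewhere in this notebook. As an assumption, read
# RGB images from a hypothetical ImageFolder root './data' (images must sit in a
# subdirectory), resized to 64x64 and scaled to [-1, 1] to match the Tanh output.
dataset = datasets.ImageFolder(
    root='./data',
    transform=transforms.Compose([
        transforms.Resize(64),
        transforms.CenterCrop(64),
        transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
    ]))
dataloader = DataLoader(dataset, batch_size=128, shuffle=True)  # paper: mini-batches of 128

# Train on the GPU when available. Module.to() moves parameters in place, so the
# optimizers created earlier still reference the right tensors.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
netD.to(device)
netG.to(device)
```

```python
def train(epochs, nz):
    for epoch in range(epochs):
        for i, (images, _) in enumerate(dataloader, 0):
            batchsize = images.size(0)
            real = images.to(device)
            real_labels = torch.ones(batchsize, device=device)
            fake_labels = torch.zeros(batchsize, device=device)

            # (1) Update D: maximize log D(x) + log(1 - D(G(z)))
            netD.zero_grad()
            outp = netD(real).view(-1)  # (B, 1, 1, 1) -> (B,) to match the labels
            errD_real = criterion(outp, real_labels)
            errD_real.backward()
            D_x = outp.mean().item()

            noise = get_noise((batchsize, nz, 1, 1), device)
            fake = netG(noise)
            # detach() keeps this backward pass from reaching the generator
            outp = netD(fake.detach()).view(-1)
            errD_fake = criterion(outp, fake_labels)
            errD_fake.backward()
            D_G_z1 = outp.mean().item()

            errD = errD_real + errD_fake
            optimizerD.step()

            # (2) Update G: maximize log D(G(z)) by labelling fakes as real
            netG.zero_grad()
            outp = netD(fake).view(-1)
            errG = criterion(outp, real_labels)
            errG.backward()
            D_G_z2 = outp.mean().item()
            optimizerG.step()

            print('[%d/%d][%d/%d] Loss_D: %.4f Loss_G: %.4f D(x): %.4f D(G(z)): %.4f / %.4f'
                  % (epoch, epochs, i, len(dataloader),
                     errD.item(), errG.item(), D_x, D_G_z1, D_G_z2))
            if i % 100 == 0:
                vutils.save_image(real, '%s/real_samples.png' % outf, normalize=True)
                with torch.no_grad():
                    samples = netG(get_noise((batchsize, nz, 1, 1), device))
                vutils.save_image(samples,
                                  '%s/fake_samples_epoch_%03d.png' % (outf, epoch),
                                  normalize=True)
        # Checkpoint both networks at the end of every epoch
        torch.save(netG.state_dict(), '%s/netG_epoch_%d.pth' % (outf, epoch))
        torch.save(netD.state_dict(), '%s/netD_epoch_%d.pth' % (outf, epoch))

# Example call (the epoch count is arbitrary):
# train(epochs=25, nz=nz)
```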