# Introduction to Generative Adversarial Networks (GAN)

Generative adversarial networks (GANs), introduced in 2014 by Goodfellow et al., are an alternative to variational autoencoders (VAEs) for learning latent spaces of images. They enable the generation of fairly realistic synthetic images by forcing the generated images to be statistically almost indistinguishable from real ones.

An intuitive way to understand GANs is to imagine a forger trying to create a fake Picasso painting. At first, the forger is pretty bad at the task. He mixes some of his fakes with authentic Picassos and shows them all to an art dealer. The art dealer makes an authenticity assessment for each painting and gives the forger feedback about what makes a Picasso look like a Picasso. The forger goes back to his studio to prepare some new fakes. As time goes on, the forger becomes increasingly competent at imitating the style of Picasso, and the art dealer becomes increasingly expert at spotting fakes. In the end, they have on their hands some excellent fake Picassos.

That’s what a GAN is: a forger network and an expert network, each being trained to best the other. As such, a GAN is made of two parts:
 - ***Generator network:*** Takes as input a random vector (a random point in the latent space) and decodes it into a synthetic image.
 - ***Discriminator network (or adversary):*** Takes as input an image (real or synthetic) and predicts whether the image came from the training set or was created by the generator network.

The generator network is trained to be able to fool the discriminator network, and thus it evolves toward generating increasingly realistic images as training goes on: artificial images that look indistinguishable from real ones, to the extent that it’s impossible for the discriminator network to tell the two apart. Meanwhile, the discriminator is constantly adapting to the gradually improving capabilities of the generator, setting a high bar of realism for the generated images. Once training is over, the generator is capable of turning any point in its input space into a believable image. Unlike VAEs, this latent space has fewer explicit guarantees of meaningful structure; in particular, it isn’t continuous.

![capture](https://user-images.githubusercontent.com/13174586/52191499-f004c680-286a-11e9-88f0-0842352d5750.JPG)

Remarkably, a GAN is a system where the optimization minimum isn’t fixed. Normally, gradient descent consists of rolling down hills in a static loss landscape. But with a GAN, every step taken down the hill changes the entire landscape a little. It’s a dynamic system where the optimization process is seeking not a minimum, but an equilibrium between two forces. For this reason, GANs are notoriously difficult to train—getting a GAN to work requires lots of careful tuning of the model architecture and training parameters.
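
To make the moving-landscape picture concrete, here is a small sketch on a toy problem that is not a GAN: one player minimizes `f(x, y) = x * y` over `x` while the other maximizes it over `y`. The function, learning rate, and step count are illustrative assumptions. The only equilibrium is `(0, 0)`, yet simultaneous gradient steps circle around it and drift outward instead of settling.

```python
import math

def simultaneous_gradient_steps(x, y, lr=0.1, steps=100):
    """Simultaneous gradient descent (on x) and ascent (on y) for the toy game f(x, y) = x * y."""
    radii = [math.hypot(x, y)]           # distance from the equilibrium (0, 0)
    for _ in range(steps):
        grad_x = y                       # df/dx
        grad_y = x                       # df/dy
        # Both players update at once, each using the other's current value
        x, y = x - lr * grad_x, y + lr * grad_y
        radii.append(math.hypot(x, y))
    return x, y, radii

x, y, radii = simultaneous_gradient_steps(1.0, 1.0)
print(radii[0], radii[-1])               # distance before and after the updates
```

Each step multiplies the distance from `(0, 0)` by `sqrt(1 + lr**2)`, so neither player converges by following its own gradient alone. This is one intuition for why a two-player setup needs more careful tuning than ordinary loss minimization.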

![capture](https://user-images.githubusercontent.com/13174586/52191555-496cf580-286b-11e9-9aab-c4f7cd5be266.JPG)

### A Schematic GAN Implementation
In this section, we’ll explain how to implement a GAN in Keras, in its barest form. The specific implementation is a deep convolutional GAN (DCGAN): a GAN where the generator and discriminator are *deep convnets*. In particular, it uses a `Conv2DTranspose` layer for image upsampling in the generator.

We’ll train the GAN on images from CIFAR10, a dataset of 50,000 32 × 32 RGB images belonging to 10 classes (5,000 images per class). To make things easier, we’ll only use images belonging to the class “horse” (class index 7).
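
CIFAR10 labels are integers from 0 to 9. The sketch below lists the standard class names in label order and filters a tiny list of `(image_id, label)` pairs, mirroring in plain Python the boolean-mask selection applied to `y_train` later in this section. The helper `select_class` and the sample pairs are made up for illustration.

```python
# Standard CIFAR10 class names, in label-index order
CIFAR10_CLASSES = ['airplane', 'automobile', 'bird', 'cat', 'deer',
                   'dog', 'frog', 'horse', 'ship', 'truck']

def select_class(samples, class_name):
    """Keeps only the samples whose integer label matches class_name."""
    target = CIFAR10_CLASSES.index(class_name)
    return [sample for sample, label in samples if label == target]

# Hypothetical (image_id, label) pairs standing in for the real dataset
samples = [('img0', 6), ('img1', 7), ('img2', 3), ('img3', 7)]
print(CIFAR10_CLASSES.index('frog'), CIFAR10_CLASSES.index('horse'))  # → 6 7
print(select_class(samples, 'horse'))                                 # → ['img1', 'img3']
```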

Schematically, the GAN looks like this:
 - A generator network maps vectors of shape `(latent_dim,)` to images of shape `(32, 32, 3)`.
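 - A discriminator network maps images of shape `(32, 32, 3)` to a binary score estimating the probability that the image is real. (The training code later in this section encodes real images as 0 and generated images as 1, so its sigmoid output is read as the probability that an image is generated.)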
 - A gan network chains the generator and the discriminator together: `gan(x) = discriminator(generator(x))`. Thus this gan network maps latent space vectors to the discriminator’s assessment of the realism of these latent vectors as decoded by the generator.
 - We train the discriminator using examples of real and fake images along with “real”/“fake” labels, just as we train any regular image-classification model.
 - To train the generator, we use the gradients of the generator’s weights with regard to the loss of the gan model. This means, at every step, we move the weights of the generator in a direction that makes the discriminator more likely to classify as “real” the images decoded by the generator. In other words, we train the generator to fool the discriminator.
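
The last bullet can be illustrated with a scalar toy model in plain Python rather than Keras. Everything here is an assumption made for illustration: a linear generator `g(z) = a*z + b`, a logistic discriminator `d(x) = sigmoid(w*x + c)` whose output is read as the probability that `x` is real, and hand-derived gradients of the loss `-log d(g(z))`. One gradient step changes only the generator's parameters and raises the discriminator's “real” score for the same latent point.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def discriminator(x, w, c):
    """Toy discriminator: probability that x is real."""
    return sigmoid(w * x + c)

def generator(z, a, b):
    """Toy generator: maps a latent scalar z to a 'sample'."""
    return a * z + b

def generator_step(z, a, b, w, c, lr=0.1):
    """One gradient-descent step on -log d(g(z)); w and c stay frozen."""
    p_real = discriminator(generator(z, a, b), w, c)
    # Chain rule: d(-log sigmoid(s))/ds = -(1 - sigmoid(s)), with s = w*(a*z + b) + c
    dloss_ds = -(1.0 - p_real)
    grad_a = dloss_ds * w * z
    grad_b = dloss_ds * w
    return a - lr * grad_a, b - lr * grad_b

z, a, b = 0.5, 1.0, 0.0      # latent point and generator parameters
w, c = 2.0, -1.0             # frozen discriminator parameters
before = discriminator(generator(z, a, b), w, c)
a, b = generator_step(z, a, b, w, c)
after = discriminator(generator(z, a, b), w, c)
print(before, after)         # the "real" score goes up after the step
```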
 
### Tricks for using GANs

The process of training GANs and tuning GAN implementations is notoriously difficult. There are a number of known tricks we should keep in mind. Like most things in deep learning, it’s more alchemy than science: these tricks are heuristics, not theory-backed guidelines. They’re supported by a level of intuitive understanding of the phenomenon at hand, and they’re known to work well empirically, although not necessarily in every context.

Here are a few of the tricks used in the implementation of the GAN generator and discriminator in this section. It isn’t an exhaustive list of GAN-related tips; we’ll find many more across the GAN literature:
 - We use `tanh` as the last activation in the generator, instead of `sigmoid`, which is more commonly found in other types of models.
 - We sample points from the latent space using a normal (Gaussian) distribution, not a uniform distribution.
 - Stochasticity is good to induce robustness. Because GAN training results in a dynamic equilibrium, GANs are likely to get stuck in all sorts of ways. Introducing randomness during training helps prevent this. We introduce randomness in two ways: *by using dropout in the discriminator* and *by adding random noise to the labels for the discriminator*.
 - Sparse gradients can hinder GAN training. In deep learning, sparsity is often a desirable property, but not in GANs. Two things can induce gradient sparsity: `max pooling` operations and `ReLU` activations. Instead of max pooling, we recommend using *strided convolutions for downsampling*, and we recommend using a *`LeakyReLU` layer instead of a `ReLU` activation*. It’s similar to ReLU, but it relaxes sparsity constraints by allowing small negative activation values.
 - In generated images, it’s common to see checkerboard artifacts caused by unequal coverage of the pixel space in the generator. To fix this, we use a kernel size that’s divisible by the stride size whenever we use a strided `Conv2DTranspose` or `Conv2D` in both the generator and the discriminator.
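
The checkerboard point in the last bullet can be checked with a little counting. The sketch below is a 1-D simplification with hypothetical helpers: `transposed_conv_coverage` counts how many (input position, kernel offset) pairs write to each output position of a transposed convolution with no padding. When the kernel size is not divisible by the stride, the counts alternate, which is the unequal coverage behind the artifact; when it is divisible, the interior is covered evenly.

```python
def transposed_conv_coverage(n_in, kernel, stride):
    """Counts the contributions to each output position of a 1-D transposed convolution (no padding)."""
    n_out = (n_in - 1) * stride + kernel
    counts = [0] * n_out
    for i in range(n_in):
        for k in range(kernel):
            counts[i * stride + k] += 1   # input i, kernel tap k lands here
    return counts

def interior(counts, kernel):
    """Drops the border positions, which are always covered less."""
    return counts[kernel - 1:len(counts) - (kernel - 1)]

uneven = transposed_conv_coverage(n_in=5, kernel=3, stride=2)
even = transposed_conv_coverage(n_in=5, kernel=4, stride=2)
print(interior(uneven, 3))  # → [2, 1, 2, 1, 2, 1, 2]
print(interior(even, 4))    # → [2, 2, 2, 2, 2, 2]
```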
 
![capture](https://user-images.githubusercontent.com/13174586/52193210-0a8f6d80-2874-11e9-8f87-a6071d70e34f.JPG)

### The Generator
First, let’s develop a `generator` model that turns a vector (from the latent space— during training it will be sampled at random) into a candidate image. One of the many issues that commonly arise with GANs is that the generator gets stuck with generated images that look like noise. A possible solution is to use dropout on both the discriminator and the generator.

### GAN Generator Network

In [3]:
import keras
from keras import layers
import numpy as np

latent_dim=32
channels= 3
height=32
width=32

generator_input = keras.Input(shape=(latent_dim,))

x= layers.Dense(128*16*16)(generator_input)          #Transforms the input into
x= layers.LeakyReLU()(x)                             #a 16 × 16 128-channel
x= layers.Reshape((16,16,128))(x)                    #feature map

x= layers.Conv2D(256, 5, padding='same')(x)
x= layers.LeakyReLU()(x)

x= layers.Conv2DTranspose(256, 4, strides=2, padding='same')(x)          #Upsamples
x= layers.LeakyReLU()(x)                                                 #to 32 × 32

x= layers.Conv2D(256,5, padding='same')(x)
x= layers.LeakyReLU()(x)
x= layers.Conv2D(256,5, padding='same')(x)
x= layers.LeakyReLU()(x)

#Produces a 32 × 32 3-channel feature map (shape of a CIFAR10 image)
x= layers.Conv2D(channels, 7, activation='tanh', padding='same')(x)
generator= keras.models.Model(generator_input, x) #Instantiates the generator model, which maps 
                                                  #the input of shape (latent_dim,) into an image of shape (32, 32, 3)
generator.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         (None, 32)                0         
_________________________________________________________________
dense_1 (Dense)              (None, 32768)             1081344   
_________________________________________________________________
leaky_re_lu_1 (LeakyReLU)    (None, 32768)             0         
_________________________________________________________________
reshape_1 (Reshape)          (None, 16, 16, 128)       0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 16, 16, 256)       819456    
_________________________________________________________________
leaky_re_lu_2 (LeakyReLU)    (None, 16, 16, 256)       0         
_________________________________________________________________
conv2d_transpose_1 (Conv2DTr (None, 32, 32, 256)       1048832   
__________
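
The parameter counts in the summary above can be reproduced by hand. The helper functions below are illustrative, not part of Keras; they apply the usual formulas (weights plus one bias per output unit or filter) to the first three weighted layers of the generator.

```python
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out

def conv_params(kernel, in_channels, filters):
    """Square kernel weights plus one bias per filter; the same count holds for Conv2DTranspose."""
    return kernel * kernel * in_channels * filters + filters

print(dense_params(32, 128 * 16 * 16))   # dense_1            → 1081344
print(conv_params(5, 128, 256))          # conv2d_1           → 819456
print(conv_params(4, 256, 256))          # conv2d_transpose_1 → 1048832
```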

### The Discriminator
Next, we’ll develop a discriminator model that takes as input a candidate image (real or synthetic) and classifies it into one of two classes: “generated image” or “real image that comes from the training set.”

### The GAN Discriminator Network

In [4]:
discriminator_input= layers.Input(shape=(height, width, channels))

x= layers.Conv2D(128, 3)(discriminator_input)
x= layers.LeakyReLU()(x)
x= layers.Conv2D(128, 4, strides=2)(x)
x= layers.LeakyReLU()(x)
x= layers.Conv2D(128, 4, strides=2)(x)
x= layers.LeakyReLU()(x)
x= layers.Conv2D(128, 4, strides=2)(x)
x= layers.LeakyReLU()(x)
x= layers.Flatten()(x)

x= layers.Dropout(0.4)(x) #One dropout layer: an important trick!

x= layers.Dense(1, activation='sigmoid')(x) #Classification layer

discriminator= keras.models.Model(discriminator_input, x) #Instantiates the discriminator model, which turns a (32, 32, 3) input 
                                                          #into a binary classification decision (fake/real)
discriminator.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_2 (InputLayer)         (None, 32, 32, 3)         0         
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 30, 30, 128)       3584      
_________________________________________________________________
leaky_re_lu_6 (LeakyReLU)    (None, 30, 30, 128)       0         
_________________________________________________________________
conv2d_6 (Conv2D)            (None, 14, 14, 128)       262272    
_________________________________________________________________
leaky_re_lu_7 (LeakyReLU)    (None, 14, 14, 128)       0         
_________________________________________________________________
conv2d_7 (Conv2D)            (None, 6, 6, 128)         262272    
_________________________________________________________________
leaky_re_lu_8 (LeakyReLU)    (None, 6, 6, 128)         0         
__________
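
The spatial sizes in the summary follow from the default `'valid'` padding of `Conv2D`: an input of size `n` with kernel size `k` and stride `s` yields `floor((n - k) / s) + 1` outputs. The helper below is a hypothetical name used only to trace the four convolutions of the discriminator, along with the parameter counts of its first two layers.

```python
def valid_conv_output(n, kernel, stride=1):
    """Output size along one spatial axis for a convolution with 'valid' padding."""
    return (n - kernel) // stride + 1

size = 32
sizes = []
for kernel, stride in [(3, 1), (4, 2), (4, 2), (4, 2)]:
    size = valid_conv_output(size, kernel, stride)
    sizes.append(size)

print(sizes)                       # → [30, 14, 6, 2]
print(size * size * 128)           # length of the flattened vector → 512
print(3 * 3 * 3 * 128 + 128)       # parameters of the first Conv2D → 3584
print(4 * 4 * 128 * 128 + 128)     # parameters of each strided Conv2D → 262272
```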

In [5]:
discriminator_optimizer= keras.optimizers.RMSprop(lr= 0.0008,
                                                 clipvalue=1.0, #Uses gradient clipping (by value) in the optimizer
                                                 decay=1e-8)    #To stabilize training, uses learning-rate decay

In [6]:
discriminator.compile(optimizer=discriminator_optimizer, loss='binary_crossentropy')
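
The two stabilizing options passed to `RMSprop` can be sketched in plain Python. This assumes the behavior of the legacy Keras optimizers used here: `clipvalue` clamps every gradient component to `[-clipvalue, clipvalue]`, and `decay` applies the time-based schedule `lr / (1 + decay * iterations)`. The function names are illustrative and the gradient values are made up.

```python
def clip_by_value(gradients, clipvalue):
    """Clamps each gradient component to the range [-clipvalue, clipvalue]."""
    return [max(-clipvalue, min(clipvalue, g)) for g in gradients]

def decayed_learning_rate(initial_lr, decay, iterations):
    """Time-based decay: the rate shrinks slowly as the update count grows."""
    return initial_lr / (1.0 + decay * iterations)

print(clip_by_value([0.3, -2.5, 7.0], clipvalue=1.0))  # → [0.3, -1.0, 1.0]
print(decayed_learning_rate(0.0008, 1e-8, 0))          # the initial rate, unchanged
print(decayed_learning_rate(0.0008, 1e-8, 10000))      # slightly below 0.0008
```

With `decay=1e-8`, the rate drops by only about 0.01% over 10,000 updates, so the schedule is very gentle.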

### The Adversarial Network

Finally, we’ll set up the GAN, which chains the generator and the discriminator. When trained, this model will move the generator in a direction that improves its ability to fool the discriminator. This model turns latent-space points into a classification decision—“fake” or “real”—and it’s meant to be trained with labels that are always “these are real images.” So, training gan will update the weights of generator in a way that makes discriminator more likely to predict “real” when looking at fake images. It’s very important to note that we set the discriminator to be frozen during training (non-trainable): its weights won’t be updated when training gan. If the discriminator weights could be updated during this process, then we’d be training the discriminator to always predict “real,” which isn’t what we want!

### Adversarial Network

In [7]:
discriminator.trainable= False #Sets discriminator weights to non-trainable (this will only apply to the gan model)

gan_input= keras.Input(shape=(latent_dim,))
gan_output= discriminator(generator(gan_input))
gan= keras.models.Model(gan_input, gan_output)

In [8]:
gan_optimizer= keras.optimizers.RMSprop(lr=0.0004, clipvalue=1.0, decay=1e-8)
gan.compile(gan_optimizer, loss='binary_crossentropy')

### How to Train DCGAN

Now we can begin training. To recapitulate, this is what the training loop looks like schematically. For each training step, we do the following:
 - Draw random points in the latent space (random noise).
 - Generate images with generator using this random noise.
 - Mix the generated images with real ones.
 - Train discriminator using these mixed images, with corresponding targets: either “real” (for the real images) or “fake” (for the generated images).
 - Draw new random points in the latent space.
 - Train gan using these random vectors, with targets that all say “these are real images.” This updates the weights of the generator (only, because the discriminator is frozen inside gan) to move them toward getting the discriminator to predict “these are real images” for generated images: this trains the generator to fool the discriminator.
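
Two bookkeeping details of this loop are easy to get wrong: the noisy labels and the cycling through the real images. The sketch below reproduces both in plain Python with hypothetical helper names. It follows the label encoding used by the training code in this section, where generated images come first and are labeled 1, and real images are labeled 0.

```python
import random

def noisy_labels(batch_size, noise=0.05, rng=random):
    """Labels for [generated..., real...]: 1 and 0, each nudged upward by up to `noise`."""
    generated = [1.0 + noise * rng.random() for _ in range(batch_size)]
    real = [0.0 + noise * rng.random() for _ in range(batch_size)]
    return generated + real

def batch_starts(n_samples, batch_size, steps):
    """Start indices of the real-image batches, wrapping before a partial batch occurs."""
    start, starts = 0, []
    for _ in range(steps):
        starts.append(start)
        start += batch_size
        if start > n_samples - batch_size:
            start = 0
    return starts

labels = noisy_labels(4, rng=random.Random(0))
print(labels)                    # four values in [1, 1.05), then four in [0, 0.05)
print(batch_starts(50, 20, 5))   # → [0, 20, 0, 20, 0]
```

Because the index resets before a partial batch can occur, every batch is full, at the cost of skipping a few trailing images when the dataset size is not a multiple of the batch size.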


### Implement GAN Training

In [9]:
import os
from keras.preprocessing import image

(x_train, y_train), (_,_)= keras.datasets.cifar10.load_data()  #Loads CIFAR10 data

In [37]:
x_train= x_train[y_train.flatten()==7] #Selects horse images (class 7)

x_train= x_train.reshape((x_train.shape[0],)+ (height, width, channels)).astype('float32')/255.0 #Normalizes data

In [38]:
iterations=10000
batch_size=20
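save_dir= 'GAN_images'  #Specifies where we want to save generated images
os.makedirs(save_dir, exist_ok=True)  #Creates the directory if needed, so saving images doesn't fail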

In [40]:
start=0
for step in range(iterations):
    random_latent_vectors= np.random.normal(size=(batch_size, latent_dim)) #Samples random points in the latent space
    
    generated_images= generator.predict(random_latent_vectors) #Decodes them to fake images
    
    stop= start+batch_size
    real_images= x_train[start:stop]                                 #Combines them
    combined_images= np.concatenate([generated_images, real_images]) #with real images
    
    labels= np.concatenate([np.ones((batch_size,1)),  #Assembles labels: 1 for generated images,
                           np.zeros((batch_size,1))]) #0 for real images (same order as combined_images)
    
    labels+= 0.05*np.random.random(labels.shape) #Adds random noise to the labels—an important trick!
    
    d_loss= discriminator.train_on_batch(combined_images, labels) #Trains the discriminator
    
    random_latent_vectors= np.random.normal(size=(batch_size, latent_dim)) #Samples random points in the latent space
    
    misleading_targets= np.zeros((batch_size, 1)) #Assembles targets that say 
                                                  #“these are all real images” (0 means real here; it’s a lie!)
        
    a_loss= gan.train_on_batch(random_latent_vectors, misleading_targets) #Trains the generator (via the gan model, 
                                                                          #where the discriminator weights are frozen)
        
    start += batch_size
    if start> len(x_train) -batch_size:
        start=0
        
    if step %100 == 0:            #Occasionally saves and plots (every 100 steps)
        gan.save_weights('gan.h5')#Saves model weights
        
        print('discriminator loss:', d_loss)  #Prints 
        print('adversarial_loss:', a_loss)    #metrics
        
        img= image.array_to_img(generated_images[0]*255., scale=False)
        img.save(os.path.join(save_dir, 'generated_horse'+str(step)+'.png')) #Saves one generated image
        img= image.array_to_img(real_images[0]*255., scale=False)
        img.save(os.path.join(save_dir, 'real_horse'+str(step)+'.png'))      #Saves one real image for comparison



discriminator loss: 0.6617702
adversarial_loss: 0.68149436
discriminator loss: 0.69179994
adversarial_loss: 0.7019526
discriminator loss: 0.69367456
adversarial_loss: 0.74562055
discriminator loss: 0.6808406
adversarial_loss: 0.7324556
discriminator loss: 0.6894037
adversarial_loss: 0.7302474
discriminator loss: 0.6715142
adversarial_loss: 1.2362105
discriminator loss: 0.69639766
adversarial_loss: 0.75128037
discriminator loss: 0.6497221
adversarial_loss: 1.1229557
discriminator loss: 0.6870656
adversarial_loss: 0.7571552
discriminator loss: 0.7025717
adversarial_loss: 0.74121445
discriminator loss: 0.70527303
adversarial_loss: 0.72985065
discriminator loss: 0.6891298
adversarial_loss: 0.78328544
discriminator loss: 0.70616543
adversarial_loss: 0.7662188
discriminator loss: 0.72330284
adversarial_loss: 0.8188443
discriminator loss: 0.8074007
adversarial_loss: 0.7552217
discriminator loss: 0.6813648
adversarial_loss: 0.7658
discriminator loss: 0.6993841
adversarial_loss: 0.7587241
discr

When training, we may see the adversarial loss begin to increase considerably while the discriminator loss tends to zero: the discriminator may end up dominating the generator. If that’s the case, try reducing the discriminator learning rate and increasing the dropout rate of the discriminator.
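
As a reference point for reading these numbers: binary crossentropy equals ln 2 ≈ 0.693 whenever the predicted probability is 0.5, whatever the label. Losses hovering near 0.69, as in the log above, are therefore consistent with a discriminator whose outputs stay close to 0.5, while a loss near zero indicates confident and correct predictions. The helper below is a plain-Python sketch of the per-sample loss, not the Keras implementation, and it ignores the label noise used in training.

```python
import math

def binary_crossentropy(y_true, y_pred):
    """Per-sample binary crossentropy for a label y_true and a predicted probability y_pred."""
    return -(y_true * math.log(y_pred) + (1.0 - y_true) * math.log(1.0 - y_pred))

print(binary_crossentropy(1.0, 0.5))   # ln 2, about 0.693
print(binary_crossentropy(0.0, 0.5))   # same value for the other label
print(binary_crossentropy(0.0, 0.01))  # confident and correct: close to 0
print(binary_crossentropy(0.0, 0.99))  # confident and wrong: large
```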