# <div style="text-align: right">Generative Adversarial Networks to create new Simpsons character faces</div>

<div style="text-align: right">by Neel Indap (indap.n@husky.neu.edu)</div>

## Introduction
***

### What is a Generative Adversarial Network (GAN)?
A Generative Adversarial Network (GAN) is a model that learns to generate new data with the same statistical structure as its training data. GANs are generative models that are usually trained without labels, although the discriminator inside them is trained as an ordinary supervised classifier on real-versus-fake labels.
It consists of two neural networks: a generator, which takes random noise and turns it into samples, and a discriminator, which takes a sample and tries to determine whether it is real (from the training data) or fake (from the generator). At each step, both models are updated to minimize their own losses, until ideally the generator produces samples that the discriminator cannot distinguish from real images.
The setup can be thought of as a game between the two models, each competing to win.
For more insight, refer to the original paper by Ian Goodfellow and his colleagues, which explains the motivation behind this approach.<br>
[GAN paper](https://arxiv.org/abs/1406.2661)
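In the paper, this game is written as a two-player minimax problem. In the paper's notation, $D(x)$ is the discriminator's estimated probability that $x$ is real, $G(z)$ is the generator's output for noise $z$, $p_{\text{data}}$ is the distribution of the training data and $p_z$ is the noise distribution:

$$
\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
$$

The paper also notes that early in training $\log(1 - D(G(z)))$ saturates, so in practice the generator is trained to maximize $\log D(G(z))$ instead. That is the form of `g_loss` used in the Loss Functions section, where the generated images are scored against the "real" label.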

If you are new to neural networks, check out this video. It is a good starting point for understanding what they are and how they work.<br>
[What is a Neural Network?](https://www.youtube.com/watch?v=aircAruvnKk)

## GAN Architecture
***
![GAN Architecture](./images/GAN_architecture.png)

### Why GAN?
Yann LeCun, Facebook's director of AI research at the time, called GANs the most interesting idea in machine learning in the last ten years. This motivated me to understand how the algorithm works.
<br>Since GANs were introduced, many improvements have been published, most of them focused on image generation.
In this project, I train the model on a custom set of only 100 images. By comparison, the DCGAN-tensorflow implementation this project builds on was demonstrated on the CelebA dataset of roughly 200k celebrity face images.

Because a small dataset makes stable training harder, I also examine the impact of changing the hyperparameters and of modifying the network itself.

## Improvements on GAN - DCGAN
***

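Shortly after GANs were introduced, Radford, Metz and Chintala published [Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks](https://arxiv.org/abs/1511.06434) (DCGAN).

The paper proposes architectural guidelines for convolutional GANs, including batch normalization in the generator and discriminator, strided convolutions in place of pooling, and ReLU/LeakyReLU activations, which together make training considerably more stable.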

## Setting up code
***

The code is hosted on [GitHub](https://github.com/neelindap/DCGAN-tensorflow).

Clone the repository using `git clone https://github.com/neelindap/DCGAN-tensorflow`

After cloning the repository, install the following dependency:
`pip install Pillow`

**_NOTE_**:<br>

It is assumed that TensorFlow is already set up on the system. If not, refer to this [tutorial](https://www.tensorflow.org/install/).

## Running the Code
***

To run the code, navigate to the cloned directory in a terminal and run
`python main.py --train --crop`

This automatically picks up the training images in the folder `./Data/Simpsons_64`.
To use a different dataset, place the images in a new folder under `./Data` and change the value of the `dataset` flag in `main.py` to that folder's name.

## GAN model - TensorBoard Visualization
***

![TensorBoard graph visualization](./images/GAN.png)

## Code Snippets
***

### Generator Model

![Generator](./images/Generator.png)

### Discriminator Model

![Discriminator](./images/Discriminator.png)

### Generator and Discriminator models

The Generator and Discriminator models are formed as follows:

| Variable | Shape | Size (parameters) | Bytes |
|---|---|---|---|
| generator/g_h0_lin/Matrix:0 | 100x16384 | 1638400 | 6553600 |
| generator/g_h0_lin/bias:0 | 16384 | 16384 | 65536 |
| generator/g_bn0/beta:0 | 1024 | 1024 | 4096 |
| generator/g_bn0/gamma:0 | 1024 | 1024 | 4096 |
| generator/g_h1/w:0 | 5x5x512x1024 | 13107200 | 52428800 |
| generator/g_h1/biases:0 | 512 | 512 | 2048 |
| generator/g_bn1/beta:0 | 512 | 512 | 2048 |
| generator/g_bn1/gamma:0 | 512 | 512 | 2048 |
| generator/g_h2/w:0 | 5x5x256x512 | 3276800 | 13107200 |
| generator/g_h2/biases:0 | 256 | 256 | 1024 |
| generator/g_bn2/beta:0 | 256 | 256 | 1024 |
| generator/g_bn2/gamma:0 | 256 | 256 | 1024 |
| generator/g_h3/w:0 | 5x5x128x256 | 819200 | 3276800 |
| generator/g_h3/biases:0 | 128 | 128 | 512 |
| generator/g_bn3/beta:0 | 128 | 128 | 512 |
| generator/g_bn3/gamma:0 | 128 | 128 | 512 |
| generator/g_h4/w:0 | 5x5x64x128 | 204800 | 819200 |
| generator/g_h4/biases:0 | 64 | 64 | 256 |
| generator/g_bn4/beta:0 | 64 | 64 | 256 |
| generator/g_bn4/gamma:0 | 64 | 64 | 256 |
| generator/g_h5/w:0 | 5x5x3x64 | 4800 | 19200 |
| generator/g_h5/biases:0 | 3 | 3 | 12 |
| discriminator/d_h0_conv/w:0 | 5x5x3x64 | 4800 | 19200 |
| discriminator/d_h0_conv/biases:0 | 64 | 64 | 256 |
| discriminator/d_h1_conv/w:0 | 5x5x64x128 | 204800 | 819200 |
| discriminator/d_h1_conv/biases:0 | 128 | 128 | 512 |
| discriminator/d_bn1/beta:0 | 128 | 128 | 512 |
| discriminator/d_bn1/gamma:0 | 128 | 128 | 512 |
| discriminator/d_h2_conv/w:0 | 5x5x128x256 | 819200 | 3276800 |
| discriminator/d_h2_conv/biases:0 | 256 | 256 | 1024 |
| discriminator/d_bn2/beta:0 | 256 | 256 | 1024 |
| discriminator/d_bn2/gamma:0 | 256 | 256 | 1024 |
| discriminator/d_h3_conv/w:0 | 5x5x256x512 | 3276800 | 13107200 |
| discriminator/d_h3_conv/biases:0 | 512 | 512 | 2048 |
| discriminator/d_bn3/beta:0 | 512 | 512 | 2048 |
| discriminator/d_bn3/gamma:0 | 512 | 512 | 2048 |
| discriminator/d_h4_conv/w:0 | 5x5x512x1024 | 13107200 | 52428800 |
| discriminator/d_h4_conv/biases:0 | 1024 | 1024 | 4096 |
| discriminator/d_bn4/beta:0 | 1024 | 1024 | 4096 |
| discriminator/d_bn4/gamma:0 | 1024 | 1024 | 4096 |
| discriminator/d_h5_lin/Matrix:0 | 16384x1 | 16384 | 65536 |
| discriminator/d_h5_lin/bias:0 | 1 | 1 | 4 |

All variables are of type `float32_ref`.<br>
Total size of variables: 36507524<br>
Total bytes of variables: 146030096
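As a sanity check on the listing, the per-network and total parameter counts can be recomputed from the shapes alone: each variable holds the product of its dimensions, and each `float32` takes 4 bytes. The short Python sketch below does exactly that; the shapes are copied from the table, and grouping them into two lists is only for readability.

```python
from math import prod

BYTES_PER_FLOAT32 = 4

# Variable shapes copied from the listing above, layer by layer.
GENERATOR_SHAPES = [
    (100, 16384), (16384,), (1024,), (1024,),      # g_h0_lin Matrix/bias, g_bn0 beta/gamma
    (5, 5, 512, 1024), (512,), (512,), (512,),     # g_h1 w/biases, g_bn1 beta/gamma
    (5, 5, 256, 512), (256,), (256,), (256,),      # g_h2, g_bn2
    (5, 5, 128, 256), (128,), (128,), (128,),      # g_h3, g_bn3
    (5, 5, 64, 128), (64,), (64,), (64,),          # g_h4, g_bn4
    (5, 5, 3, 64), (3,),                           # g_h5 (output layer, no batch norm)
]
DISCRIMINATOR_SHAPES = [
    (5, 5, 3, 64), (64,),                          # d_h0_conv (no batch norm)
    (5, 5, 64, 128), (128,), (128,), (128,),       # d_h1_conv, d_bn1
    (5, 5, 128, 256), (256,), (256,), (256,),      # d_h2_conv, d_bn2
    (5, 5, 256, 512), (512,), (512,), (512,),      # d_h3_conv, d_bn3
    (5, 5, 512, 1024), (1024,), (1024,), (1024,),  # d_h4_conv, d_bn4
    (16384, 1), (1,),                              # d_h5_lin Matrix/bias
]

def count_params(shapes):
    """Total number of scalar parameters across a list of tensor shapes."""
    return sum(prod(shape) for shape in shapes)

g_params = count_params(GENERATOR_SHAPES)
d_params = count_params(DISCRIMINATOR_SHAPES)
total = g_params + d_params
print(f"generator:     {g_params:,}")
print(f"discriminator: {d_params:,}")
print(f"total:         {total:,} params, {total * BYTES_PER_FLOAT32:,} bytes")
```

The totals match the listing (36,507,524 parameters, 146,030,096 bytes), split almost evenly between the generator (19,072,515) and the discriminator (17,435,009). Note also that 16384 = 4 × 4 × 1024: the generator's first linear layer projects the 100-dimensional noise vector to a 4×4×1024 feature map, and the discriminator's final linear layer reads a flattened 4×4×1024 feature map.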

### Loss Functions
***

Next, we define the loss functions for the 2 models as follows:

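```python
import tensorflow as tf

# D_logits / D:   discriminator logits and sigmoid outputs for a batch of real images
# D_logits_ / D_: the same for a batch of generated (fake) images
d_loss_real = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(logits=D_logits, labels=tf.ones_like(D)))

d_loss_fake = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(logits=D_logits_, labels=tf.zeros_like(D_)))

# The generator wants its fakes classified as real, so it uses the "real" label (1).
g_loss = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(logits=D_logits_, labels=tf.ones_like(D_)))

d_loss = d_loss_real + d_loss_fake
```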

Where:<br>
`d_loss_real` is the discriminator's loss on real images, which it should label as real (1)<br>
`d_loss_fake` is the discriminator's loss on generated images, which it should label as fake (0)<br>
`g_loss` is the generator's loss: the discriminator's output on the same generated images, scored against the real label (1), so the generator is rewarded for fooling the discriminator<br>

The total loss of the discriminator (`d_loss`) is the sum of `d_loss_real` and `d_loss_fake`.
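To make the arithmetic behind these three losses concrete, here is a minimal NumPy sketch. The helper uses the numerically stable formula that TensorFlow documents for `tf.nn.sigmoid_cross_entropy_with_logits`, `max(x, 0) - x * z + log(1 + exp(-|x|))`; it is an illustration, not the project's code, and the logits are made-up values for a discriminator that is currently winning.

```python
import numpy as np

def sigmoid_cross_entropy_with_logits(logits, labels):
    """Element-wise sigmoid cross-entropy in the stable form
    max(x, 0) - x * z + log(1 + exp(-|x|)), which never overflows exp()."""
    x = np.asarray(logits, dtype=np.float64)
    z = np.asarray(labels, dtype=np.float64)
    return np.maximum(x, 0) - x * z + np.log1p(np.exp(-np.abs(x)))

# Hypothetical discriminator logits for 4 real and 4 generated images.
d_logits_real = np.array([2.0, 1.5, 0.5, 3.0])
d_logits_fake = np.array([-1.0, -2.0, 0.0, -0.5])

# Discriminator: real images should score 1, fakes should score 0.
d_loss_real = sigmoid_cross_entropy_with_logits(d_logits_real, np.ones_like(d_logits_real)).mean()
d_loss_fake = sigmoid_cross_entropy_with_logits(d_logits_fake, np.zeros_like(d_logits_fake)).mean()
d_loss = d_loss_real + d_loss_fake

# Generator: the same fake logits, but scored against the "real" label.
g_loss = sigmoid_cross_entropy_with_logits(d_logits_fake, np.ones_like(d_logits_fake)).mean()

print(f"d_loss = {d_loss:.4f}, g_loss = {g_loss:.4f}")
```

Because the fake logits are mostly negative (the discriminator is confident they are fake), `d_loss_fake` is small while `g_loss` is large; as the generator improves and pushes those logits up, the two move in opposite directions.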

### Optimizer
***

We use the Adam optimizer for both models. Each optimizer minimizes its own model's loss and, through `var_list`, updates only that model's variables:
```python
d_optim = tf.train.AdamOptimizer(config.learning_rate, beta1=config.beta1).minimize(self.d_loss, var_list=self.d_vars)
g_optim = tf.train.AdamOptimizer(config.learning_rate, beta1=config.beta1).minimize(self.g_loss, var_list=self.g_vars)
```
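The learning rate is 0.0001 and `beta1` (the exponential decay rate for Adam's first-moment estimate) is 0.5. The DCGAN paper recommends lowering `beta1` from Adam's default of 0.9 to 0.5, because the higher value led to oscillation and unstable training.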
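To see what `beta1` controls, here is a minimal NumPy sketch of a single Adam update as written in the Adam paper (Kingma and Ba). It is not TensorFlow's code: `tf.train.AdamOptimizer` computes the same moment estimates but folds the bias correction into the step size and applies `epsilon` slightly differently. `adam_step` is a hypothetical helper, and the toy problem uses a larger learning rate than 0.0001 so that it converges within a couple of thousand steps.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-4, beta1=0.5, beta2=0.999, eps=1e-8):
    """One Adam update (Kingma & Ba form); t is the 1-based step count.

    Returns the updated (param, m, v).
    """
    m = beta1 * m + (1 - beta1) * grad        # EMA of gradients (first moment)
    v = beta2 * v + (1 - beta2) * grad ** 2   # EMA of squared gradients (second moment)
    m_hat = m / (1 - beta1 ** t)              # correct the bias from initialising m at 0
    v_hat = v / (1 - beta2 ** t)              # same correction for v
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

# Toy usage: minimise f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x, m, v = 0.0, 0.0, 0.0
for t in range(1, 2001):
    grad = 2.0 * (x - 3.0)
    x, m, v = adam_step(x, grad, m, v, t, lr=0.05)
print(round(float(x), 4))
```

With `beta1 = 0.5`, each older gradient's weight in the first-moment estimate halves at every step, so the update direction follows only the last few gradients and reacts quickly as the two networks push each other around; with the default 0.9 the optimizer carries much more momentum.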

### Hyper-parameters
***

1.	Output activation function: tanh for the generator and sigmoid for the discriminator
2.	Cost function: sigmoid cross-entropy
3.	Optimizer: Adam with learning rate 0.0001 and beta1 (decay rate of the first-moment estimate) 0.5
4.	Network Architecture: 5-layer Neural Network
5.	Network Initializer: random normal initializer
6.	Batch Size: 25
7.	Total Images: 100
8.	Epochs: 10000
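Because the generator's output layer uses tanh, its pixels lie in [-1, 1], so real training images need to be put on the same scale before the discriminator compares the two. The minimal NumPy sketch below shows that scaling and its inverse (used when saving generated samples as viewable images). The upstream DCGAN-tensorflow code applies equivalent transforms, but the function names here are hypothetical and the exact preprocessing in this fork is an assumption.

```python
import numpy as np

def to_tanh_range(images_uint8):
    """Scale 8-bit pixels in [0, 255] to [-1, 1], the output range of tanh."""
    return images_uint8.astype(np.float32) / 127.5 - 1.0

def from_tanh_range(images):
    """Map values in [-1, 1] back to 8-bit pixels, clipping anything out of range."""
    return np.clip((images + 1.0) * 127.5, 0, 255).round().astype(np.uint8)

# Usage: a fake 64x64 RGB image survives the round trip unchanged.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
scaled = to_tanh_range(image)
print(scaled.min() >= -1.0, scaled.max() <= 1.0)
print(np.array_equal(from_tanh_range(scaled), image))
```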

## Model in training
***

The gist of the model training code is as follows:

```python
# Update D network: one Adam step on d_loss using real images and noise
_, summary_str = self.sess.run([d_optim, self.d_sum],
                               feed_dict={self.inputs: batch_images, self.z: batch_z})
self.writer.add_summary(summary_str, counter)

# Update G network: one Adam step on g_loss using the same noise batch
_, summary_str = self.sess.run([g_optim, self.g_sum],
                               feed_dict={self.z: batch_z})
self.writer.add_summary(summary_str, counter)

# Run g_optim twice to make sure that d_loss does not go to zero (different from paper)
_, summary_str = self.sess.run([g_optim, self.g_sum],
                               feed_dict={self.z: batch_z})
self.writer.add_summary(summary_str, counter)

# Evaluate the current losses for logging
errD_fake = self.d_loss_fake.eval({self.z: batch_z})
errD_real = self.d_loss_real.eval({self.inputs: batch_images})
errG = self.g_loss.eval({self.z: batch_z})
```
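Combining the hyperparameters with this loop gives the overall training schedule. The Python sketch below works through that arithmetic and draws one noise batch. The uniform [-1, 1] noise is an assumption based on the upstream DCGAN-tensorflow code, and the constant names are hypothetical.

```python
import numpy as np

# Values from the hyper-parameter list.
NUM_IMAGES = 100
BATCH_SIZE = 25
EPOCHS = 10000
# Length of the noise vector z: the first dimension of generator/g_h0_lin/Matrix (100x16384).
Z_DIM = 100
# The loop above runs d_optim once and g_optim twice per batch.
D_STEPS_PER_BATCH = 1
G_STEPS_PER_BATCH = 2

batches_per_epoch = NUM_IMAGES // BATCH_SIZE   # floor division drops any leftover images
total_batches = batches_per_epoch * EPOCHS
d_updates = total_batches * D_STEPS_PER_BATCH
g_updates = total_batches * G_STEPS_PER_BATCH
print(f"{batches_per_epoch} batches/epoch, {total_batches:,} batches in total")
print(f"{d_updates:,} discriminator updates, {g_updates:,} generator updates")

# Assumption: batch_z is drawn uniformly from [-1, 1], and the same batch_z
# is fed to all three sess.run calls for a given batch.
rng = np.random.default_rng(seed=0)
batch_z = rng.uniform(-1.0, 1.0, size=(BATCH_SIZE, Z_DIM)).astype(np.float32)
print(batch_z.shape, batch_z.dtype)
```

With only 100 images, an "epoch" is just 4 batches, so 10,000 epochs amount to 40,000 discriminator updates and twice as many generator updates.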

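Each `sess.run` call above also evaluates a merged summary op, `self.d_sum` or `self.g_sum`; here `_sum` stands for *summary*, not a sum of losses. These ops bundle the histograms, images and scalar losses that are written to TensorBoard: the discriminator's summary covers the noise `z`, the discriminator's outputs on real images, the real-image loss and the total discriminator loss, while the generator's summary covers `z`, the discriminator's outputs on fakes, the generated images `G`, the fake-image loss and the generator loss.<br>
They are defined as follows: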

```python
self.g_sum = tf.summary.merge([self.z_sum, self.d__sum,
      self.G_sum, self.d_loss_fake_sum, self.g_loss_sum])
self.d_sum = tf.summary.merge([self.z_sum, self.d_sum, 
      self.d_loss_real_sum, self.d_loss_sum])
```

The losses are captured in TensorBoard:

![Discriminator loss](./images/d_loss.PNG)
**Fig. 1 : Discriminator Loss**

![Generator loss](./images/g_loss.PNG)
**Fig. 2 : Generator Loss**

As the plots show, the two losses move roughly in opposite directions, as expected for adversaries. <br>
When the discriminator's loss is low, it can reliably tell the fake images from the real ones, which means the generator is not yet producing convincing output; when the generator's loss is low, the reverse holds.


### Training Output
***

![GAN](./images/GAN.gif)

## Test Output
***

![Test Output](./images/test_20180425010142.png)

## Conclusion
***

The test output suggests that the model has started to pick up features of the various Simpsons characters and is combining them to generate new faces.
Training did lose track around epoch 600, when the model started generating noise instead of faces. It recovered over roughly the next 400 epochs and was producing better output again by around epoch 1000.

With more images and longer training, I expect the model would become stable enough to generate better output.


## Future Scope
***

Although GANs have many applications, they are still not stable enough to produce reliable results.<br>
The model is susceptible to mode collapse: once the generator finds outputs that fool the discriminator, it keeps producing the same few results over and over. <br><br>
GANs also have convergence problems: the losses do not reliably indicate progress, so it is hard to know when to stop training. To address this, Arjovsky et al. proposed the Wasserstein GAN (WGAN), which replaces the Jensen-Shannon divergence underlying the original GAN objective with the Wasserstein distance. Its loss correlates with image quality, which makes training progress easier to interpret.

![WGAN loss functions](./images/WGAN.png)
**Fig. 3: Loss functions in WGAN**
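The core of the WGAN change fits in a few lines. The NumPy sketch below shows the loss definitions and weight clipping described in the WGAN paper; it is not part of this project's code, and `wgan_losses` and `clip_weights` are hypothetical helper names. Compared with the sigmoid cross-entropy losses used in this project, the critic (WGAN's name for the discriminator) outputs an unbounded score instead of a probability, and its weights are clipped to a small range (0.01 in the paper) to keep it approximately Lipschitz. The paper also updates the critic several times (5) per generator update and uses RMSProp rather than Adam.

```python
import numpy as np

CLIP_VALUE = 0.01  # critic weight-clipping range c used in the WGAN paper

def wgan_losses(critic_real, critic_fake):
    """WGAN losses from raw critic scores (no sigmoid, no cross-entropy).

    The critic maximises mean(f(x)) - mean(f(G(z))), i.e. minimises its negation;
    the generator minimises -mean(f(G(z))).
    """
    critic_loss = np.mean(critic_fake) - np.mean(critic_real)
    generator_loss = -np.mean(critic_fake)
    return critic_loss, generator_loss

def clip_weights(weights, c=CLIP_VALUE):
    """Clip every critic weight array into [-c, c]; applied after each critic update."""
    return [np.clip(w, -c, c) for w in weights]

# Hypothetical critic scores for a batch of real and generated images.
critic_real = np.array([1.2, 0.8, 1.0])
critic_fake = np.array([-0.4, 0.1, -0.3])
critic_loss, generator_loss = wgan_losses(critic_real, critic_fake)
# -critic_loss is the critic's estimate of the Wasserstein distance (up to a constant factor).
print(f"critic loss {critic_loss:.3f}, estimated distance {-critic_loss:.3f}")

clipped = clip_weights([np.array([[0.5, -0.002], [-0.3, 0.008]])])
print(clipped[0])
```

The negated critic loss is the quantity the WGAN paper tracks during training: it tends to fall as sample quality improves, which is what makes it more useful than the GAN losses for deciding when to stop.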

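Another line of research has produced Capsule Networks (Sabour, Frosst and Hinton, 2017), which are designed to preserve spatial relationships between features that the pooling layers of CNNs can discard, and which have shown promising results compared with CNNs on some image tasks.<br>
These networks could potentially be used in place of CNNs in the GAN architecture.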

## References
***

1. Generative Adversarial Networks: https://arxiv.org/abs/1406.2661
2. GAN tutorial: https://medium.com/@awjuliani/generative-adversarial-networks-explained-with-a-classic-spongebob-squarepants-episode-54deab2fce39
3. Generative models: https://en.wikipedia.org/wiki/Generative_model
4. Discriminative models: https://en.wikipedia.org/wiki/Discriminative_model
5. CNN: http://cs231n.github.io/convolutional-networks/
6. DCGAN: https://github.com/carpedm20/DCGAN-tensorflow
7. Hacks for GAN: https://github.com/soumith/ganhacks
8. Unsupervised Representation Learning with Deep Convolutional GANs: https://arxiv.org/abs/1511.06434
9. Activation functions in neural networks: https://towardsdatascience.com/activation-functions-neural-networks-1cbd9f8d91d6
10. Understanding Xavier initialization in deep neural networks: https://prateekvjoshi.com/2016/03/29/understanding-xavier-initialization-in-deep-neural-networks/
11. DCGAN (MXNet Gluon tutorial): http://gluon.mxnet.io/chapter14_generative-adversarial-networks/dcgan.html
12. WGAN: https://arxiv.org/abs/1701.07875
13. What is a CapsNet or Capsule Network?: https://hackernoon.com/what-is-a-capsnet-or-capsule-network-2bfbe48769cc


## Licenses
***

The text in the document by Neel Indap is licensed under CC BY 3.0 https://creativecommons.org/licenses/by/3.0/us/

The code in the document by Neel Indap is licensed under the MIT License https://opensource.org/licenses/MIT

![License](https://licensebuttons.net/l/by/3.0/us/88x31.png)