# Generative Adversarial Networks 
Based on tutorials by Ian Goodfellow et al.






### By Jeremy Hartmann 

<img src="./images/Yann%20LeCun%20-%20Best%20thing%20since%20sliced%20bread.png" alt="Drawing" style="width: 100%;"/>


Build some hype
- Yann LeCun
- Called GANs the most interesting idea in the last ten years in machine learning
- After reading about them, I agree.


# Summary

- Motivation
- What are GANs?
- Taxonomy of Generative Models
- Cost Function
- Intuition
- Future Work and Research



# Motivation

- Simulate many possible futures for planning or simulated RL (reinforcement learning)
- Semi-supervised learning (missing data)
- Multi-modal output
- Realistic generation tasks

Simulated Futures:
- The model learns to project different possible future states of the world from the current state.

Semi-supervised learning:
- Train on incomplete or missing data, e.g., data that is only partially labeled. Labeling data is one of the most time-consuming efforts in ML, because it must be done by humans (e.g., Amazon Mechanical Turk).

Multi-modal output:
- A single input can correspond to many different correct answers, all equally valid (e.g., when sitting down in a room, any of the chairs is a valid choice).

Realistic Generation tasks:
- Generate realistic images, audio, speech, etc.

Some current research examples of GANs follow...

### Interactive Image Generation (Zhu et al. 2016)

<img src="./images/iGAN_Zhu2016.gif" alt="Drawing" style="height: 100%;"/>


iGAN, Zhu et al. 2016 (Generative Visual Manipulation on the Natural Image Manifold)
- The user edits the canvas, which updates the projection from the latent space onto the natural image manifold.
- Color strokes and line contours are used to interpolate through the latent space.
- Several possible outputs are shown on the right as the user edits.


### 3D GAN (Wu et al. 2016)
![3DGan_Wu2016.gif](./images/3DGan_Wu2016.gif)

3D GAN, Wu et al. 2016 (MIT): Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling
- Maps the latent space into a 3-dimensional voxel space using a volumetric (3D) CNN.


# What are GANs?

![generative-adversarial-network_Diagram_KDNuggets.png](./images/generative-adversarial-network_Diagram_KDNuggets.png)

Generator:
- Samples from the prior distribution of the latent space.
- Latent space: essentially a source of randomness used to produce images.
- Generates a minibatch of samples, which is fed into the discriminator.
- Usually some form of deconvolutional (strided transposed convolution) network.
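To see how a strided transposed convolution upsamples, the usual output-size formula (dilation 1) can be applied layer by layer. This is a sketch: the kernel size 4, stride 2, padding 1 setting and the starting 4×4 feature map are assumed DCGAN-style choices, not taken from a specific model in these slides.

```python
def conv_transpose_output_size(size, kernel, stride, padding, output_padding=0):
    """Spatial output size of a transposed ("deconvolution") layer with dilation 1."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding

# Hypothetical generator: z is projected to a 4x4 feature map, then upsampled three times.
sizes = [4]
for _ in range(3):
    sizes.append(conv_transpose_output_size(sizes[-1], kernel=4, stride=2, padding=1))

print(sizes)  # [4, 8, 16, 32]
```

With these settings each layer exactly doubles the height and width, so three layers take a 4×4 map to a 32×32 image.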

Discriminator:
- Essentially a binary classifier that determines whether the input is real or fake.
- Can be a feed-forward network (e.g., a ConvNet).

Simultaneous Gradient Descent
- Both minibatches (real and generated) are processed.
- Both the generator and the discriminator then update their weights based on their cost functions.

Analogy (Grad student/Supervisor):
- The grad student and the supervisor
- The grad student produces research papers (generator) for the supervisor (discriminator).
- The grad student wants to maximize the supervisor's grade/evaluation.
- The supervisor is very meticulous and tries to find absolutely every detail wrong with the paper, lowering the grade.
- Feedback is given to the grad student to improve.
- The supervisor improves their evaluation techniques.
- And so the cycle continues...

Image from KDNuggets: (http://www.kdnuggets.com/2017/01/generative-adversarial-networks-hot-topic-machine-learning.html)


### The Latent Space
- A source of randomness, but a more meaningful one.
- A high-dimensional vector space that represents the manifold of the real data.

![LatentSpaceExample-TomWhite.PNG](./images/LatentSpaceExample-TomWhite.PNG)

The high-dimensional latent space:
- The key thing to understand in GANs.
- Directions in it can correspond to attributes such as gender, smiling, etc.
- ...or people wearing glasses.
- Image: (Tom White, 2016) Sampling Generative Networks
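Moving between points is how the latent space gets explored. White (2016) suggests spherical linear interpolation (slerp) rather than a straight line between two latent vectors, because under a Gaussian prior most of the probability mass lies near a shell, and a straight line cuts through the unlikely low-norm interior. A minimal sketch in plain Python, assuming a standard normal prior and a 100-dimensional latent space (both illustrative choices); the generator that would decode each point is not shown.

```python
import math
import random

def sample_prior(dim, rng):
    """Draw z ~ N(0, I) from the latent prior (assumed standard normal here)."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]

def slerp(t, p0, p1):
    """Spherical linear interpolation between latent vectors p0 (t=0) and p1 (t=1)."""
    norm0 = math.sqrt(sum(a * a for a in p0))
    norm1 = math.sqrt(sum(b * b for b in p1))
    cos_omega = sum(a * b for a, b in zip(p0, p1)) / (norm0 * norm1)
    cos_omega = max(-1.0, min(1.0, cos_omega))  # guard against rounding error
    omega = math.acos(cos_omega)                # angle between the two vectors
    if math.sin(omega) < 1e-8:
        # (Anti-)parallel vectors: fall back to linear interpolation.
        return [(1 - t) * a + t * b for a, b in zip(p0, p1)]
    s0 = math.sin((1 - t) * omega) / math.sin(omega)
    s1 = math.sin(t * omega) / math.sin(omega)
    return [s0 * a + s1 * b for a, b in zip(p0, p1)]

rng = random.Random(0)
z0, z1 = sample_prior(100, rng), sample_prior(100, rng)
# Five points along the arc from z0 to z1; a trained generator would decode each one.
path = [slerp(i / 4, z0, z1) for i in range(5)]
```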


### Vector Arithmetic on the Latent Space

![VectorArithmetic_1_Radford-2016.PNG](./images/VectorArithmetic_1_Radford-2016.PNG)


Vector arithmetic:
- Alec Radford et al. 2016 (ICLR). Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks
- Introduced DCGAN (Deep Convolutional GAN), which is very popular and one of the most stable GAN architectures for images.

- The concept of the latent space is very important to understanding GANs.

- Take a representation of a man with glasses (not the actual image!),
- subtract a representation of a man without glasses,
- and add a woman without glasses.

- These are vectors in the latent space.

What do we get?

### Vector Arithmetic

- A woman with glasses (obviously!)

![VectorArithmetic_2_Radford-2016.PNG](./images/VectorArithmetic_2_Radford-2016.PNG)

Woman with glasses
- The latent vector representation of this concept.
- Each image is a small perturbation of this latent vector.
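The arithmetic above can be sketched in a few lines. Radford et al. average the latent vectors of three exemplars per concept before doing the arithmetic, since single samples gave unstable results. This toy version uses made-up 4-dimensional vectors (dimension 0 loosely standing for gender, dimension 1 for glasses); a real example would decode `result` and its noisy neighbors with a trained generator.

```python
import random

def mean_vector(vectors):
    """Element-wise mean of a list of equal-length latent vectors."""
    n = len(vectors)
    return [sum(components) / n for components in zip(*vectors)]

def add(u, v):
    return [a + b for a, b in zip(u, v)]

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

# Hypothetical latent vectors: three exemplars per concept (values are made up).
man_with_glasses = [[1.0, 1.0, 0.0, 0.2], [0.9, 1.1, 0.1, 0.0], [1.1, 0.9, -0.1, 0.1]]
man_without_glasses = [[1.0, 0.0, 0.0, 0.2], [0.9, 0.1, 0.1, 0.0], [1.1, -0.1, -0.1, 0.1]]
woman_without_glasses = [[-1.0, 0.0, 0.0, 0.3], [-0.9, 0.0, 0.1, 0.3], [-1.1, 0.0, -0.1, 0.3]]

# man with glasses - man without glasses + woman without glasses
result = add(sub(mean_vector(man_with_glasses), mean_vector(man_without_glasses)),
             mean_vector(woman_without_glasses))

# Eight small uniform perturbations around the result, like the grid in the figure.
rng = random.Random(0)
neighbors = [[x + rng.uniform(-0.25, 0.25) for x in result] for _ in range(8)]
```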


- Easy to understand at a high level
- Lots of intricacies at the low level
- Many interesting research questions (later...)

# Taxonomy of Generative Models

![TaxonomyOfGenerativeModels.PNG](./images/TaxonomyOfGenerativeModels.PNG)

Taxonomy of Generative Models:
- The three most popular generative models are:
 - FVBN (Fully Visible Belief Nets), e.g., WaveNet
 - GAN (What we are discussing!)
 - Variational Autoencoders
 
 
- Explicit Models
 - Defines an explicit density function.
 - Must capture all the complexity of the data while remaining computationally tractable.
 - Fairly straightforward to train: plug the model's definition of the density function into the expression for the likelihood and follow the gradient uphill.
 - Tractable: the density can be computed exactly.
 - Approximate: the density needs some kind of Monte Carlo or variational approximation.
 - VAE (Kingma et al. 2014)
 - FVBN (WaveNet, van den Oord et al. 2016)
 
- Implicit Models
 - Do not explicitly represent the probability distribution over the space where the data lies.
 - Instead, provide a way to interact with the model less directly, usually by sampling from it.
 - Generative Stochastic Network (Bengio et al. 2014).
 
 Markov Chains:
 - Do not scale well to higher dimensions.
 
 Nonlinear ICA:
 - The function G must be invertible, which is a costly restriction.
 - z and x must be the same size.
 
 FVBN:
 - Slow to generate samples.
 
 
 
  

### Fully Visible Belief Nets (FVBN) - WaveNet (Oord et al. 2016)
![WaveNet_BlogPost-Fig2-Anim-160908-r01.gif](./images/WaveNet_BlogPost-Fig2-Anim-160908-r01.gif)

Disadvantage:
- FVBN sampling is not parallel: each sample is generated sequentially, whereas GANs can generate in parallel.
- WaveNet takes about 2 minutes to generate 1 second of audio.
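The sequential bottleneck follows from how an FVBN defines its density: it applies the chain rule of probability to an $n$-dimensional $x$, so each element is conditioned on all of the previous ones.

$$p_{\text{model}}(x) = \prod_{i=1}^{n} p_{\text{model}}\left(x_i \mid x_1, \ldots, x_{i-1}\right)$$

Sampling $x_i$ requires $x_1, \ldots, x_{i-1}$ to be generated first, so one sample costs $n$ sequential steps (for WaveNet, one step per audio sample). A GAN generator instead computes all of $x = G(z)$ in a single pass.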


### Variational AutoEncoders (Kingma et al. 2013)
![variational-autoencoder-faces._AlecRadford.jpg](./images/variational-autoencoder-faces._AlecRadford.jpg)

Variational Autoencoders:
- Tend to produce blurry, lower-quality samples (GAN samples are subjectively better, though this is hard to prove).
- Might not be asymptotically consistent (not yet proven).
 - That is, even with infinite data, the gap between the model and the data distribution might not close.

# Cost Function

### Discriminator
   $J^{(D)} (\theta^{(D)}, \theta^{(G)}) = \\
   -\frac{1}{2} \mathbb{E}_{x \sim p_{data}} \log{(D(x))} - \frac{1}{2} \mathbb{E}_{z} \log{(1 - D(G(z)))}$

The Cost Function
- The discriminator minimizes this cost.
- What's happening here is really interesting:
- It is the standard cross-entropy cost that is minimized when training a binary classifier.
- The discriminator wants to maximize $\log D(x)$ on real data
- and to push $D(G(z))$ toward 0 on generated data, i.e., maximize $\log(1 - D(G(z)))$.

- First term: the expectation over real data $x$ of the discriminator's log probability, $\log D(x)$.
- Second term: the expectation over latent samples $z$ of $\log(1 - D(G(z)))$, where the discriminator scores the generator's output.
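Replacing each expectation with a minibatch mean gives a direct estimate of $J^{(D)}$. A minimal sketch, assuming we already have the discriminator's outputs as probabilities: `d_real` holds $D(x)$ on real samples and `d_fake` holds $D(G(z))$ on generated ones (the numbers below are made up).

```python
import math

def discriminator_cost(d_real, d_fake):
    """Minibatch estimate of J^(D) = -1/2 E_x[log D(x)] - 1/2 E_z[log(1 - D(G(z)))].

    d_real: discriminator outputs D(x) on real samples, each in (0, 1).
    d_fake: discriminator outputs D(G(z)) on generated samples, each in (0, 1).
    """
    real_term = sum(math.log(d) for d in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - d) for d in d_fake) / len(d_fake)
    return -0.5 * real_term - 0.5 * fake_term

# A discriminator that is right on both minibatches has a low cost...
good = discriminator_cost([0.9, 0.8, 0.95], [0.1, 0.2, 0.05])
# ...while one that is fooled by the generator has a high cost.
fooled = discriminator_cost([0.9, 0.8, 0.95], [0.9, 0.8, 0.95])
```

When the discriminator outputs 1/2 for everything, i.e. it cannot tell real from fake, the cost is exactly $\log 2$.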



### The real samples
# $-\frac{1}{2} \mathbb{E}_{x \sim p_{data}} \log{(D(x))}$

This is the expectation over $x$ drawn from the data distribution, i.e., our real images.
- These could come from MNIST, ImageNet, etc.
- The discriminator wants to maximize $\log D(x)$.

### Latent Space Samples
# $- \frac{1}{2} \mathbb{E}_{z} \log{(1 - D(G(z)))}$

This is the expectation over $z$ drawn from the prior distribution of the latent space.
- $z$ is the (latent space) source of noise for the generator.
- The latent vector can have more or fewer dimensions than $x$.
- The discriminator wants to minimize $D(G(z))$, i.e., to recognize generated samples as fake.


### Another View (Gradient Ascent)

### $\nabla_{\theta^{(D)}} \frac{1}{m} \sum_{i=1}^{m} \big[ \log D(x^{(i)}) + \log\big(1 - D(G(z^{(i)}))\big) \big]$

- Key: the cost function evaluates two minibatches.

The crux: the discriminator's cost function takes one minibatch from the generator and one minibatch from the real data distribution.
- Nabla ($\nabla$) is the gradient; here the discriminator follows it uphill (gradient ascent).
- $m$ is the number of training examples in each minibatch.
- The examples can be images... or something else.
- The sum averages the log probabilities over the minibatch.

- Medium article by Arthur Juliani (https://medium.com/@awjuliani/generative-adversarial-networks-explained-with-a-classic-spongebob-squarepants-episode-54deab2fce39).
- Gradient ascent (maximize!)
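The update can be sketched for the simplest possible discriminator, a one-dimensional logistic classifier $D(x) = \sigma(wx + b)$ with $\theta^{(D)} = (w, b)$. This is an illustrative assumption, not the ConvNet discriminator from the slides, and the generated samples are passed in directly rather than produced by a generator. Because $\frac{d}{da}\log\sigma(a) = 1 - \sigma(a)$ and $\frac{d}{da}\log(1 - \sigma(a)) = -\sigma(a)$, the gradient has a closed form. Note that the objective equals $-2J^{(D)}$, so ascending it is the same as descending $J^{(D)}$.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

# Assumes equal-sized real and fake minibatches, as in the sum over i = 1..m.
def objective(w, b, real, fake):
    """(1/m) * sum_i [log D(x_i) + log(1 - D(G(z_i)))] for D(x) = sigmoid(w*x + b)."""
    m = len(real)
    return sum(math.log(sigmoid(w * x + b)) + math.log(1.0 - sigmoid(w * g + b))
               for x, g in zip(real, fake)) / m

def gradient(w, b, real, fake):
    """Gradient of the objective with respect to the discriminator parameters (w, b)."""
    m = len(real)
    dw = db = 0.0
    for x, g in zip(real, fake):
        r = 1.0 - sigmoid(w * x + b)  # d/da of log D(x)
        f = -sigmoid(w * g + b)       # d/da of log(1 - D(G(z)))
        dw += r * x + f * g
        db += r + f
    return dw / m, db / m

def ascent_step(w, b, real, fake, lr=0.1):
    """One step of gradient *ascent*: move the parameters uphill."""
    dw, db = gradient(w, b, real, fake)
    return w + lr * dw, b + lr * db

real_batch = [2.0, 2.5, 1.5, 3.0]    # minibatch from the data distribution (made up)
fake_batch = [-1.0, 0.0, -0.5, 0.5]  # minibatch from the generator (made up)
w, b = 0.0, 0.0
before = objective(w, b, real_batch, fake_batch)
for _ in range(50):
    w, b = ascent_step(w, b, real_batch, fake_batch)
after = objective(w, b, real_batch, fake_batch)
```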



### Minimax Game

- Zero-sum game

 $J^{(D)} = -\frac{1}{2} \mathbb{E}_{x \sim p_{data}} \log{(D(x))} -  \\ 
 \frac{1}{2} \mathbb{E}_{z} \log{(1 - D(G(z)))}$
 
 $J^{(G)} = -J^{(D)}$
 
- The saddle point is the Nash equilibrium.
- The generator minimizes the log probability of the discriminator being correct.

Minimax Game
- The generator's cost is the negative of the discriminator's cost.
- The equilibrium is a saddle point of the discriminator's loss.
- Resembles the Jensen-Shannon divergence rather than the KL divergence.
- In practice, heuristics such as the non-saturating generator cost are used so the generator still receives strong gradients when the discriminator confidently rejects its samples (see Ian Goodfellow's tutorial paper).
- Problems arise because simultaneous gradient descent is not guaranteed to find the minimax solution; it may find a max-min instead.
 - Mode collapse problem (later...)
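The heuristic in question, described in Goodfellow's tutorial and used as `g_loss` in the TensorFlow snippet in the orphan slides, replaces the minimax generator cost $\frac{1}{2}\mathbb{E}_z \log(1 - D(G(z)))$ with $-\frac{1}{2}\mathbb{E}_z \log D(G(z))$. The sketch below compares the gradient of each cost, for a single sample, with respect to the discriminator's logit $a$, where $D(G(z)) = \sigma(a)$. Treating the logit as the quantity the generator moves is a simplifying assumption for illustration.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def minimax_grad(a):
    """d/da of the minimax generator cost (1/2) * log(1 - sigmoid(a)), with D(G(z)) = sigmoid(a)."""
    return -0.5 * sigmoid(a)

def non_saturating_grad(a):
    """d/da of the non-saturating generator cost -(1/2) * log(sigmoid(a))."""
    return -0.5 * (1.0 - sigmoid(a))

# a = -8: the discriminator confidently rejects the sample (typical early in training).
for a in (-8.0, 0.0, 8.0):
    print(f"D(G(z))={sigmoid(a):.4f}  minimax={minimax_grad(a):+.4f}  "
          f"non-saturating={non_saturating_grad(a):+.4f}")
```

When $D(G(z))$ is close to 0, the minimax gradient nearly vanishes while the non-saturating one stays close to $-1/2$, which is why the generator keeps learning under the heuristic.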

# Intuition behind GANs

### MSE vs GANs
- GANs do not use Mean Squared Error

![IntuitiveMSE_HowGANsWork.PNG](./images/IntuitiveMSE_HowGANsWork.PNG)

MSE vs GANs
- Unlike most other supervised learning techniques, GANs do not use mean squared error. 
- The natural image manifold is shown in red.
- MSE averages over all possible solutions (blue), resulting in a blurry, low-quality sample.
- GANs choose one correct solution out of all correct solutions.
- An MSE-trained model therefore learns blurry images.
- When the discriminator receives any one correct solution, it treats it as realistic, unlike MSE.
- Image: Ledig et al. 2016 (SRGAN)
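A one-dimensional toy version of the figure: suppose that for one input there are exactly two equally valid answers, −1 and +1 (two points on the "manifold"; this setup is an illustrative assumption). The prediction that minimizes MSE is their mean, 0, which is not a valid answer at all. This is the 1-D analogue of a blurry average image.

```python
# Two equally valid targets for the same input (a hypothetical two-mode "manifold").
valid_answers = [-1.0, 1.0]

def mse(prediction, targets):
    """Mean squared error of a single prediction against every valid target."""
    return sum((prediction - t) ** 2 for t in targets) / len(targets)

# Search a grid of candidate predictions for the MSE minimizer.
candidates = [i / 100 for i in range(-150, 151)]
best = min(candidates, key=lambda p: mse(p, valid_answers))

print("MSE-optimal prediction:", best)  # the average of the two modes
print("MSE at a valid answer:", mse(1.0, valid_answers))
```

A discriminator, by contrast, would accept either −1 or +1 as realistic and reject the average.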



# Future Work

- Mode collapse problem
- Other Games: Understand how continuous high-dimensional non-convex games converge
- The Convergence problem
- Evaluation of generative models. 
- Discrete outputs
- Using the latent code $z$

Other games
- Game theory in general and convergence. 
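A standard toy example of the convergence problem is the two-player game with value $V(x, y) = xy$: one player controls $x$ and minimizes $V$, the other controls $y$ and maximizes it. The equilibrium is $x = y = 0$, yet simultaneous gradient updates circle it instead of converging. With step size $\eta$, each simultaneous step multiplies the distance to the equilibrium by exactly $\sqrt{1 + \eta^2}$, so the players spiral outward. The step size and starting point below are arbitrary choices.

```python
import math

def simultaneous_step(x, y, lr):
    """One simultaneous gradient step on V(x, y) = x * y.

    Player 1 descends V in x (dV/dx = y); player 2 ascends V in y (dV/dy = x).
    Both updates use the *old* values, which is what makes the step simultaneous.
    """
    return x - lr * y, y + lr * x

x, y, lr = 1.0, 0.0, 0.1
radii = [math.hypot(x, y)]  # distance from the equilibrium (0, 0)
for _ in range(100):
    x, y = simultaneous_step(x, y, lr)
    radii.append(math.hypot(x, y))
```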

Evaluation
- How do we quantitatively rate the generated results?

Discrete outputs:
- For NLP (natural language processing): the generator must be differentiable, which is hard when outputs are discrete tokens.

Using the latent code
- The latent code contains very high-level semantic information about the real data distribution.
- It is not yet clear how best to use this.

# Questions

# References

## Papers
See the BibTeX file in the repo.


## Repo
https://github.com/jjhartmann/NN-GAN-Presentation


## Websites 


- http://www.deeplearningbook.org/contents/generative_models.html // Deep Learning book (Goodfellow, Bengio and Courville)
- https://affinelayer.com/pixsrv/ // Image-to-image TensorFlow implementation for the paper by Isola et al.
- https://phillipi.github.io/pix2pix/ // Image-to-image (pix2pix) website
- https://github.com/goodfeli // Ian Goodfellow GitHub
- https://github.com/damianavila/RISE // Jupyter notebook slide extension
- https://github.com/soumith/ganhacks/blob/master/README.md // Awesome hacks/tricks for GAN training
- https://www.youtube.com/playlist?list=PLJscN9YDD1buxCitmej1pjJkR5PMhenTF // 2016 NIPS tutorial
- https://github.com/adeshpande3/Generative-Adversarial-Networks // GAN TensorFlow implementation
- https://www.tensorflow.org/get_started/mnist/pros // TensorFlow tutorial
- https://blog.openai.com/generative-models/ // OpenAI blog
- https://deepmind.com/blog/wavenet-generative-model-raw-audio/ // WaveNet
- http://distill.pub/2016/deconv-checkerboard/ // Blog on why generators produce checkerboard artifacts
- http://3dgan.csail.mit.edu/ // 3D GAN
- http://www.kdnuggets.com/2017/01/generative-adversarial-networks-hot-topic-machine-learning.html // KDNuggets article on GANs
- https://github.com/awjuliani/TF-Tutorials // TensorFlow implementation of DCGAN
- https://jaan.io/what-is-variational-autoencoder-vae-tutorial/ // Tutorial on VAEs


# ORPHANS

### Unrolled GAN (Metz et al. 2016)
![ModeCollapse_solutions_unrolledGAN.PNG](./images/ModeCollapse_solutions_unrolledGAN.PNG)

### Original GAN Flow Chart
![GAN%20Flow%20Graph.PNG](./images/GAN%20Flow%20Graph.PNG)

### Coordinating Global Structure
![Gans%20on%20imageNet_coordinating%20global%20structure.PNG](./images/Gans%20on%20imageNet_coordinating%20global%20structure.PNG)

### Perspective
![Gans%20on%20imageNet_perspective.PNG](./images/Gans%20on%20imageNet_perspective.PNG)

### Counting
![Gans%20on%20imageNet.PNG](./images/Gans%20on%20imageNet.PNG)

### Problem With Batch Normalization
![Problem%20with%20batch%20normalization%20-%20correlations..PNG](./images/Problem%20with%20batch%20normalization%20-%20correlations..PNG)

### Text to Image Synthesis
![text-to-image-synthesis.PNG](./images/text-to-image-synthesis.PNG)

### Text to Image Synthesis: Network
![TextToImageGANetwork_Reed2016.png](./images/TextToImageGANetwork_Reed2016.png)

### How does this work in code...

```python
import tensorflow as tf  # TensorFlow 1.x API (tf.placeholder, tf.log)

# generator(), discriminator() and z_size are defined in the full example (see References).

# These two placeholders are the inputs to the generator and discriminator, respectively.
z_in = tf.placeholder(shape=[None, z_size], dtype=tf.float32)        # random latent vectors z
real_in = tf.placeholder(shape=[None, 32, 32, 1], dtype=tf.float32)  # real 32x32x1 images

Gz = generator(z_in)                # generates images G(z) from random z vectors
Dx = discriminator(real_in)         # D(x): probabilities for real images
Dg = discriminator(Gz, reuse=True)  # D(G(z)): probabilities for generated images, same weights

# Together these define the optimization objective of the GAN.
# Discriminator: cross-entropy cost, proportional to J^(D) estimated on the minibatch.
d_loss = -tf.reduce_mean(tf.log(Dx) + tf.log(1. - Dg))
# Generator: the non-saturating heuristic -log D(G(z)) rather than -J^(D).
g_loss = -tf.reduce_mean(tf.log(Dg))
```

See References for the full code and TensorFlow examples.

### Super Resolution GAN (Ledig et al. 2016)

![SuperResolutionGAN_Ledig.PNG](./images/SuperResolutionGAN_Ledig.PNG)

Ledig et al. (2016), SRGAN (Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network). The panels show:
- Bicubic interpolation
- A deep residual network optimized for MSE
- A deep residual generative adversarial network optimized for a loss more sensitive to human perception
- The original high-resolution (HR) image



### Predicting Next Video Frame  (Lotter et al. 2016)

![NextFramePrediction_MSEvsGAN_Lotter-2016.PNG](./images/NextFramePrediction_MSEvsGAN_Lotter-2016.PNG)

- MSE: the image lacks detail because the model blurs over many possible answers.
- AL (adversarial loss): chooses one possible correct solution, so the image is sharper.


### Mode Collapse Problem

- The generator produces samples from only one (or a few) modes, and training can get stuck cycling between modes.
- $\min_G \max_D V(G, D) \neq  \max_D \min_G V(G, D)$

![ModeCollapse_issues_Metz-2016.PNG](./images/ModeCollapse_issues_Metz-2016.PNG)



Min-max is not guaranteed.
- Simultaneous gradient descent can instead produce the max-min solution.
- Max-min: the minimization over the generator sits in the inner loop, so the generator maps every z to the single point the current discriminator finds most realistic, focusing on a single mode.

Here we can see that at various iterations in the training, different modes are targeted.
- Research on unrolled GANs (Metz et al. 2016) appears to reduce this issue. But is it solved?



### KL and Reverse KL Divergence
![KL%20v%20Reverse%20KL%20diagram.PNG](./images/KL%20v%20Reverse%20KL%20diagram.PNG)

Jensen-Shannon Divergence (Symmetric)
- A symmetrized, smoothed version of the KL (Kullback–Leibler) divergence,
- which measures how one probability distribution diverges from another (KL itself is not symmetric).

- p: the real probability distribution
- q: the model
- Maximum likelihood, $\mathrm{KL}(p \| q)$: averages over the density, spreading q over all modes of p.
- Reverse KL, $\mathrm{KL}(q \| p)$: tends to minimize the divergence by fitting a single mode of the density.
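To make the asymmetry concrete, here is a small sketch over discrete distributions. The bimodal $p$ and the two candidate models are made-up numbers: `q_cover` spreads its mass over both modes, and `q_mode` puts most of its mass on one mode. JS is computed with its standard definition, the average KL of $p$ and $q$ to their mixture $m = (p + q)/2$.

```python
import math

def kl(p, q):
    """KL(p || q) = sum_i p_i * log(p_i / q_i); terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js(p, q):
    """Jensen-Shannon divergence: average KL of p and q to their mixture m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# A bimodal "data" distribution over 5 bins, with modes at bins 0 and 4.
p = [0.45, 0.04, 0.02, 0.04, 0.45]
q_cover = [0.2, 0.2, 0.2, 0.2, 0.2]      # covers both modes, with mass in between
q_mode = [0.90, 0.04, 0.02, 0.02, 0.02]  # concentrates on a single mode

for name, q in [("cover", q_cover), ("mode", q_mode)]:
    print(name, "KL(p||q)=%.3f" % kl(p, q), "KL(q||p)=%.3f" % kl(q, p), "JS=%.3f" % js(p, q))
```

Maximum likelihood (smaller $\mathrm{KL}(p \| q)$) prefers `q_cover`, while reverse KL (smaller $\mathrm{KL}(q \| p)$) prefers `q_mode`. JS gives the same value in either argument order and never exceeds $\log 2$.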
