# Introduction 
This tutorial introduces the basics of Generative Adversarial Networks (GANs) and provides a simple implementation that generates MNIST images. GANs, proposed by Ian Goodfellow et al. in 2014, are a class of models that create data similar to their training data. Formally, GANs are generative models that learn the data distribution from the training examples; by sampling from the learned distribution, we can create realistic new outputs. In data science, we sometimes cannot get enough data to train our models. GANs can help by generating additional data, which can be combined with semi-supervised learning algorithms to obtain a better model.



# Tutorial content
In this tutorial, we will introduce the basics of GANs and provide a simple example implemented with [TensorFlow](https://www.tensorflow.org/).

We'll start with an overview of the architecture of GANs. Then we'll go a little deeper into the formal definition of GANs. After that, we implement a GAN with [TensorFlow](https://www.tensorflow.org/) and try to generate some MNIST images. Finally, we'll cover the problems of GANs and some interesting applications of GANs.

We will cover the following topics in this tutorial.
- [Architecture](#Architecture)
- [Formal definition](#Formal-definition)
- [Implementation](#Implementation)
- [Problems](#Problems)
- [More applications](#More-applications)
- [Summary and reference](#Summary-and-reference)


# Architecture 
Generative Adversarial Networks consist of two networks: a generator network and a discriminator network. The training process is like a game: the generator tries to produce data that is indistinguishable from the training data, while the discriminator tries to tell whether a sample comes from the training data or from the generator. As the game goes on, the generator becomes better at making fake data, and it becomes harder for the discriminator to tell real from fake. In game-theoretic terms, the two networks may eventually reach a Nash equilibrium, where the generator perfectly models the distribution of the training data and the discriminator cannot distinguish generated samples from training samples. The general architecture of GANs looks like this:
![image.png](./pic/2.jpg)




# Formal definition
Formally, let $G(z; \theta_g)$ denote the generative model, where $z$ is noise and $\theta_g$ are the parameters of the model, and let $D(x; \theta_d)$ denote the discriminator model. $G(z; \theta_g)$ outputs data in the same form as the training data, and $D(x; \theta_d)$ outputs a single scalar representing the probability that $x$ comes from the training data. We train both networks with stochastic gradient descent. We train $D$ to maximize the probability of assigning the correct label to both training data and generated data. At the same time, we train $G$ to minimize $\log(1 - D(G(z)))$. The whole game can be represented as: $$\min_{G}\max_{D} V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_{z}(z)}[\log(1 - D(G(z)))]$$
The training process can be expressed as Algorithm 1.
![image.png](./pic/3.png)
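
To make the value function concrete, here is a minimal sketch that estimates $V(D, G)$ from discriminator outputs on a small batch. The function name `value_function` and the probabilities are made up for illustration and are not part of the implementation below. The last call reproduces the result from the GAN paper that a discriminator which outputs 0.5 everywhere gives $V = -\log 4$.

```python
import math

def value_function(d_real, d_fake):
    """Estimate V(D, G) from discriminator outputs.

    d_real: probabilities D(x) for real samples
    d_fake: probabilities D(G(z)) for generated samples
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

# A discriminator that separates real from fake fairly well (made-up numbers).
v_good_d = value_function(d_real=[0.9, 0.8, 0.95], d_fake=[0.1, 0.2, 0.05])

# At the equilibrium the discriminator cannot tell the two apart: D = 0.5.
v_equilibrium = value_function(d_real=[0.5, 0.5, 0.5], d_fake=[0.5, 0.5, 0.5])

print(v_good_d, v_equilibrium)
```

The discriminator tries to push this value up, which is why the confident discriminator scores higher than the one stuck at 0.5, while the generator tries to push it down.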

# Implementation
In this section, we present an example implementation of GANs based on TensorFlow and try to generate some MNIST images.
First, we import the required libraries.

In [None]:
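# Note: this tutorial uses the TensorFlow 1.x API
# (tf.Session, tf.placeholder, tf.contrib, tensorflow.examples).
import math  # used by generator() to compute layer sizes

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
import numpy as np
import matplotlib.pyplot as plt
import scipy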

We also load the MNIST dataset.

In [None]:
mnist = input_data.read_data_sets("MNIST_data/")

Next we define two helper functions. conv2d() creates a filter variable w and forwards the call to tf.nn.conv2d; it then creates a bias variable and adds it to the result. batch_norm() simply forwards its call to tf.contrib.layers.batch_norm.

In [None]:
def conv2d(x, df_dim, name = 'conv2d'):
    with tf.variable_scope(name):
        w = tf.get_variable('w', [5, 5, x.get_shape()[-1], df_dim],
                            initializer=tf.truncated_normal_initializer(stddev=0.04))
        conv = tf.nn.conv2d(x, w, strides = [1, 2, 2, 1], padding = 'SAME')
        b = tf.get_variable('b', [df_dim], 
                            initializer=tf.constant_initializer(0.0))
        conv = tf.nn.bias_add(conv, b)
        return conv
    
def batch_norm(input_, name):
    return tf.contrib.layers.batch_norm(input_, 
                                       center=True,
                                       scale=True,
                                       is_training=True,
                                       scope=name)

We first define our discriminator model. It is similar to the TensorFlow sample CNN model for MNIST; see [here](https://www.tensorflow.org/tutorials/layers) for details.

In this model, we have four conv layers. The first layer is conv + relu, and each of the remaining layers is a pipeline of conv + batch_norm + relu. Finally, we create two variables for a fully connected layer and compute the output of the network, a \[batch_size, 1\] vector of logits. Applying a sigmoid to a logit gives the probability that the image comes from the training data.

In [None]:
def discriminator(image, reuse = False):
    with tf.variable_scope('dis') as scope:
        if reuse:
            tf.get_variable_scope().reuse_variables()
        # number of filters in first conv layer
        df_dim = 64

        # four conv layers; each one takes the previous layer's output
        h0 = tf.nn.relu(conv2d(image, df_dim, 'd_h0'))
        h1 = tf.nn.relu(batch_norm(conv2d(h0, df_dim * 2, 'd_h1'), 'd_bn1'))
        h2 = tf.nn.relu(batch_norm(conv2d(h1, df_dim * 4, 'd_h2'), 'd_bn2'))
        h3 = tf.nn.relu(batch_norm(conv2d(h2, df_dim * 8, 'd_h3'), 'd_bn3'))

        # flatten the feature maps
        # (batch_size is a global that is set before this function is called)
        h3 = tf.reshape(h3, [batch_size, -1])

        # Final layer: fully connected, one logit per image
        w = tf.get_variable('d_w', [h3.get_shape().as_list()[1], 1], 
                            initializer=tf.truncated_normal_initializer(stddev=0.04))
        b = tf.get_variable('d_b', [1], initializer=tf.constant_initializer(0))
        y_pred = tf.matmul(h3, w) + b

        return y_pred
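
As a sanity check on the shapes, the sketch below walks a 28x28 input through four chained stride-2 convolutions with 'SAME' padding, where each output side is ceil(input / stride). The helper name `same_conv_out` is illustrative, and the walk assumes each conv layer consumes the previous layer's output. Under these assumptions the last feature map is 2x2 with 512 channels, so the final weight matrix would have 2048 rows.

```python
def same_conv_out(size, stride=2):
    """Output side length of a 'SAME'-padded convolution: ceil(size / stride)."""
    return (size + stride - 1) // stride

df_dim = 64          # filters in the first conv layer, as in discriminator()
size = 28            # MNIST images are 28x28
sizes = [size]
for _ in range(4):   # one step per conv layer: d_h0 .. d_h3
    size = same_conv_out(size)
    sizes.append(size)

channels = df_dim * 8                 # filters in the last conv layer
flat_dim = size * size * channels     # number of rows of the final weight
print(sizes, flat_dim)
```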

Now we can define our generator network. The network takes a vector of noise and upsamples it into a 28x28 image. Where the discriminator downsamples with strided convolutions, the generator performs the opposite operation through transposed convolution layers.

We also define a helper function to keep the code clean. deconv2d() just creates a new filter variable and forwards the call to tf.nn.conv2d_transpose.

In [None]:
def deconv2d(input_, out_shape, name = 'deconv2d'):
    with tf.variable_scope(name):
        w = tf.get_variable('w', [5, 5, out_shape[-1], input_.get_shape()[-1]], 
                           initializer=tf.random_normal_initializer(stddev=0.04))
        deconv = tf.nn.conv2d_transpose(input_, w, output_shape = out_shape, 
                                        strides = [1, 2, 2, 1])
        return deconv

The generator consists of four transposed convolutions, arranged as a pipeline of deconv2d + batch_norm + relu, with tanh applied after the last one instead. We gradually increase the spatial size of the output until it becomes a 28x28 image.

In [None]:
def generator(z, batch_size, reuse = False):
    with tf.variable_scope('gen') as scope:
        if reuse:
            tf.get_variable_scope().reuse_variables()
        s = 28 # Output size of the image
        def out_size(size, stride):
            return int(math.ceil(float(size) / stride))
        s2 = out_size(s, 2)
        s4 = out_size(s, 4)
        s8 = out_size(s, 8)
        s16 = out_size(s, 16)
        gf_dim = 64 #Dimension of gen filters in first conv layer

        # Project z to s16 * s16 * gf_dim * 8 values so it can be reshaped into
        # a feature map (z itself only has z_dim values per sample)
        proj_dim = s16 * s16 * gf_dim * 8
        w0 = tf.get_variable('g_w0', [z.get_shape().as_list()[-1], proj_dim],
                             initializer=tf.random_normal_initializer(stddev=0.04))
        b0 = tf.get_variable('g_b0', [proj_dim],
                             initializer=tf.constant_initializer(0.0))
        h0 = tf.reshape(tf.matmul(z, w0) + b0, [-1, s16, s16, gf_dim * 8])
        h0 = tf.nn.relu(batch_norm(h0, 'g_bn0'))

        # First Deconv layer
        h1 = deconv2d(h0, [batch_size, s8, s8, gf_dim * 4], 'g_h1')
        h1 = tf.nn.relu(batch_norm(h1, 'g_bn1'))

        # Second Deconv layer
        h2 = deconv2d(h1, [batch_size, s4, s4, gf_dim * 2], 'g_h2')
        h2 = tf.nn.relu(batch_norm(h2, 'g_bn2'))

        # Third Deconv layer
        h3 = deconv2d(h2, [batch_size, s2, s2, gf_dim], 'g_h3')
        h3 = tf.nn.relu(batch_norm(h3, 'g_bn3'))

        # Fourth Deconv layer; tanh keeps pixel values in [-1, 1]
        h4 = deconv2d(h3, [batch_size, s, s, 1], 'g_h4')
        return tf.nn.tanh(h4)
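
The spatial sizes used in the generator can be checked by hand. The sketch below recomputes them with the same `out_size` helper and then verifies that each requested output size is one that a stride-2 'SAME' transposed convolution can produce from its input. The function `transpose_shape_ok` is a hypothetical helper written for this check, not a TensorFlow API.

```python
import math

def out_size(size, stride):
    """Same helper as in generator(): ceil(size / stride)."""
    return int(math.ceil(float(size) / stride))

s = 28
# Spatial sizes from the smallest feature map up to the final image.
sizes = [out_size(s, 16), out_size(s, 8), out_size(s, 4), out_size(s, 2), s]

def transpose_shape_ok(in_size, requested_out, stride=2):
    """A 'SAME' transposed conv can map in_size to requested_out only if the
    matching forward conv maps requested_out back to in_size."""
    return math.ceil(requested_out / stride) == in_size

checks = [transpose_shape_ok(a, b) for a, b in zip(sizes, sizes[1:])]
print(sizes, checks)
```

Because 28 is not a power of two, the sizes are not simple doublings (4 goes to 7, not 8), which is why the code computes them with a ceiling instead of multiplying by the stride.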

So, now we have both the generator model and the discriminator model, and we can write code to train them. In TensorFlow, we first create a session. We also create placeholders for the inputs of the generator and discriminator.

In [None]:
session = tf.Session()
z_dim = 100 # dim for vector z
batch_size = 64
x = tf.placeholder("float", [None, 28, 28, 1])
z = tf.placeholder("float", [None, z_dim])

Then we create the output of our networks.

In [None]:
D = discriminator(x) # logits for real images (sigmoid gives probabilities)
G = generator(z, batch_size) # generated images
Dg = discriminator(G, reuse = True) # logits for generated images

Now we can define the loss functions for both networks. The generator wants its images to look real, that is, to be labeled 1 by the discriminator. Therefore, its loss can be computed from Dg and a target of 1. We use tf.nn.sigmoid_cross_entropy_with_logits to calculate the loss. The function takes two parameters, logits ($x$) and labels ($z$, not to be confused with the noise vector), and returns $z \cdot -\log(\mathrm{sigmoid}(x)) + (1 - z) \cdot -\log(1 - \mathrm{sigmoid}(x))$ for each element. The reduce_mean function then takes the mean over the tensor returned by tf.nn.sigmoid_cross_entropy_with_logits, so the loss is a scalar rather than a vector or matrix.

In [None]:
# Calculate the loss function for generator model
g_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits = Dg, 
                                                                labels = tf.ones_like(Dg)))
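
To see what this loss computes, here is a small pure-Python sketch of the per-element cross entropy. It uses the numerically stable form given in the TensorFlow documentation, max(x, 0) - x * z + log(1 + exp(-|x|)), and compares it against the direct formula. The logits are made-up values and the function names are illustrative.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_cross_entropy(logit, label):
    """Numerically stable form: max(x, 0) - x * z + log(1 + exp(-|x|))."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))

def naive_cross_entropy(logit, label):
    """Direct form: z * -log(sigmoid(x)) + (1 - z) * -log(1 - sigmoid(x))."""
    p = sigmoid(logit)
    return label * -math.log(p) + (1.0 - label) * -math.log(1.0 - p)

# Generator loss for a few made-up discriminator logits on generated images.
fake_logits = [-2.0, 0.0, 3.0]
g_losses = [sigmoid_cross_entropy(x, 1.0) for x in fake_logits]
g_loss_value = sum(g_losses) / len(g_losses)   # the mean, as reduce_mean would give
print(g_losses, g_loss_value)
```

With a label of 1 the loss reduces to $-\log(\mathrm{sigmoid}(x))$, so it shrinks as the discriminator's logit for a generated image grows.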

The loss function of the discriminator consists of two parts: the loss of predicting real images (target label 1) and the loss of predicting fake ones (target label 0). We just add them together.

In [None]:
# Loss function for predicting real data (target label 1)
d_loss_real = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits = D, 
                                                                    labels = tf.ones_like(D)))
# Loss function for predicting fake data (target label 0)
d_loss_fake = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits = Dg, 
                                                                    labels = tf.zeros_like(Dg)))
d_loss = d_loss_real + d_loss_fake

Now we build two optimizers, each restricted to the variables of one network. We use the Adam optimizer, a variant of stochastic gradient descent, for both.

In [None]:
t_vars = tf.trainable_variables()
# Optimizer for discriminator
d_optim = tf.train.AdamOptimizer()\
    .minimize(d_loss, var_list = [var for var in t_vars if 'd_' in var.name])
# Optimizer for generator
g_optim = tf.train.AdamOptimizer()\
    .minimize(g_loss, var_list= [var for var in t_vars if 'g_' in var.name])
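
The variable lists above are selected by substring matching on variable names. The sketch below shows how that filter behaves on a handful of illustrative names (the real ones come from tf.trainable_variables()), and why it is fragile: a hypothetical generator variable whose name happens to contain 'd_' would also land in the discriminator's list. Filtering by scope prefix is one possible alternative, shown here only as an assumption-labeled sketch.

```python
# Illustrative names only; the real list comes from tf.trainable_variables().
var_names = [
    'dis/d_h0/w:0', 'dis/d_h0/b:0', 'dis/d_bn1/beta:0', 'dis/d_w:0',
    'gen/g_h1/w:0', 'gen/g_bn1/gamma:0',
]
d_vars = [n for n in var_names if 'd_' in n]
g_vars = [n for n in var_names if 'g_' in n]

# Hypothetical generator variable: 'embed_' contains the substring 'd_'.
risky = 'gen/g_embed_w:0'
matches_both = ('d_' in risky) and ('g_' in risky)

# Matching on the variable scope prefix avoids that ambiguity.
all_names = var_names + [risky]
d_by_scope = [n for n in all_names if n.startswith('dis/')]
g_by_scope = [n for n in all_names if n.startswith('gen/')]
print(len(d_vars), len(g_vars), matches_both)
```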

Next, we can train our model.

In [None]:
# init the variables (run through the session created above)
session.run(tf.global_variables_initializer())

# run 20 epochs
epoch = 20
# each epoch has 1000 iterations
iteration = 1000
for i in range(epoch):
    for j in range(iteration):
        # generate random z vector
        batch_z = np.random.uniform(-1, 1, [batch_size, z_dim]).astype(np.float32)
        # get the real images
        batch_images = mnist.train.next_batch(batch_size)
        batch_images = np.reshape(batch_images[0], [batch_size, 28, 28, 1])
        # update the discriminator
        _, dLoss = session.run([d_optim, d_loss], feed_dict = {
            z: batch_z, 
            x: batch_images
        })
        
        # update the generator
        _, gLoss = session.run([g_optim, g_loss], feed_dict = {
            z: batch_z
        })

After training, we can look at the outputs. Training will take a while, especially on computers without GPU support for TensorFlow. Here is a sample output.
![image.png](./pic/4.png)
It looks much like the real training data. Here is a sample output from the middle of the training process.
![image.png](./pic/5.png)
It does not look as good as the previous one, right?


# Problems
It seems that our model gets a good result, but there are some problems with GANs.
1. It's hard to define what a good result is. Unlike models whose performance can be measured with a clear metric, GANs do not have a clear criterion of goodness.
2. GANs are hard to train. Since we train the model with gradient-based optimization, there is no guarantee that we reach the global optimum. We have to be careful when picking hyperparameters and designing the training process; otherwise one network can overpower the other and we cannot get the desired outputs.
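
One way to see how one network can overpower the other is to look at the generator's gradient when the discriminator is very confident. The sketch below compares, as a function of the discriminator logit $a$ on a generated sample, the gradient of the minimax loss $\log(1 - \mathrm{sigmoid}(a))$ from the formal definition with the gradient of $-\log(\mathrm{sigmoid}(a))$, which is what the cross entropy with a target of 1 in the implementation computes. The logit values are made up for illustration.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def grad_minimax(a):
    """d/da log(1 - sigmoid(a)) = -sigmoid(a)."""
    return -sigmoid(a)

def grad_non_saturating(a):
    """d/da -log(sigmoid(a)) = sigmoid(a) - 1."""
    return sigmoid(a) - 1.0

# a = -6: the discriminator is confident the sample is fake.
for a in [-6.0, 0.0, 6.0]:
    print(a, grad_minimax(a), grad_non_saturating(a))
```

When the discriminator confidently rejects generated samples (very negative $a$), the minimax gradient is close to zero and the generator learns very slowly, while the second form still gives a gradient close to -1.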

# More applications
GANs can be used for more than generating images. Here are some applications.
- [Text-to-image](https://github.com/reedscot/icml2016): generate images from plain text.
- [Generating time-lapse videos from an image](https://arxiv.org/abs/1709.07592): predict the next few frames from a given image.
- [Generating new data with GANs](https://www.toptal.com/machine-learning/generative-adversarial-networks): generate new training data to improve the performance of a model.

# Summary and reference
This tutorial only highlights the basics of GANs and gives a very simple example. For further details about GANs, please refer to the following links.
1. [GANs tutorial by Ian Goodfellow, NIPS 2016](https://channel9.msdn.com/Events/Neural-Information-Processing-Systems-Conference/Neural-Information-Processing-Systems-Conference-NIPS-2016/Generative-Adversarial-Networks)
2. [DCGAN implementation](https://github.com/carpedm20/DCGAN-tensorflow)
3. [GANs paper](https://arxiv.org/pdf/1406.2661.pdf)
4. [Brandon Amos's Image Completion Project](https://bamos.github.io/2016/08/09/deep-completion/)
5. [Tensorflow](https://www.tensorflow.org/)
6. [MNIST datasets](http://yann.lecun.com/exdb/mnist/)
