<h1>Deep Convolutional GAN (DCGAN) and Image Reconstruction using MNIST</h1>

<h1>Authors</h1>

<ul>
    <li>Davide Anghileri</li>
    <li>Denis Dushi</li>
    <li>Nathan Consuegra</li>
</ul>

<h1>Introduction</h1>

<p>There are 2 main goals for this project:
<ul>
    <li>The first is to implement a Deep Convolutional Generative Adversarial Network (DCGAN) composed of two modules: 1) a discriminator, a binary convolutional neural network classifier that distinguishes real images from fake ones, and 2) a generator, a transposed-convolution ("deconvolutional") neural network that starts from a vector of 100 random numbers drawn from a standard normal distribution and produces a new picture similar to those in the dataset, in order to fool the discriminator (see the original paper [1]).</li>
    <br/>
    <li>The second, following [4], is to reconstruct an image with missing pixels. For this task we reuse the trained generator, feed it a vector of 100 random numbers, and then use Adam optimization to adjust that input until the generated image matches the known pixels and can fill in the missing ones (see the original paper [4]).</li>
</ul>
<br/>
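
The adversarial objective described above can be sketched in plain NumPy. This is only an illustration under stated assumptions: the helper names `sigmoid_cross_entropy` and `gan_losses` and the example logits are hypothetical, and the formulas mirror the sigmoid cross-entropy losses used in this notebook (real images labelled 1, generated images labelled 0 for the discriminator, and labelled 1 for the generator).

```python
import numpy as np

def sigmoid_cross_entropy(logits, labels):
    """Numerically stable -[y*log(s) + (1-y)*log(1-s)] with s = sigmoid(logits)."""
    return np.maximum(logits, 0) - logits * labels + np.log1p(np.exp(-np.abs(logits)))

def gan_losses(d_real_logits, d_fake_logits):
    """Return (discriminator loss, generator loss) for one batch of discriminator logits."""
    d_loss_real = sigmoid_cross_entropy(d_real_logits, np.ones_like(d_real_logits)).mean()
    d_loss_fake = sigmoid_cross_entropy(d_fake_logits, np.zeros_like(d_fake_logits)).mean()
    # The generator is rewarded when the discriminator labels its images as real.
    g_loss = sigmoid_cross_entropy(d_fake_logits, np.ones_like(d_fake_logits)).mean()
    return d_loss_real + d_loss_fake, g_loss

# Hypothetical logits: the discriminator is confident on both real and fake images,
# so its own loss is small while the generator loss is large.
d_loss, g_loss = gan_losses(np.array([4.0, 3.0]), np.array([-4.0, -3.0]))
print(d_loss, g_loss)
```

When the discriminator cannot tell the two apart (all logits equal to 0), each cross-entropy term equals log 2.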

<h1>Architecture</h1>

<br/>

![Image7-Monitor.png](../../images/network.jpg)
 The network architecture for the Generator (G) and the Discriminator (D)

<br/>
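
To make the layer dimensions concrete, the following sketch traces the spatial size through both networks. It assumes the kernel, stride and padding settings used in this notebook's `generator` and `discriminator` functions and TensorFlow's usual output-size rules for 'same' and 'valid' padding; the helper names are hypothetical.

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of a 2-D convolution (TensorFlow padding rules)."""
    if padding == 'same':
        return (size + stride - 1) // stride  # ceil(size / stride)
    return (size - kernel) // stride + 1      # 'valid'

def conv_transpose_out(size, kernel, stride, padding):
    """Spatial output size of a 2-D transposed convolution (TensorFlow padding rules)."""
    if padding == 'same':
        return size * stride
    return size * stride + max(kernel - stride, 0)  # 'valid'

def trace_sizes(size, layers, out_fn):
    """Apply out_fn layer by layer and collect the spatial size after each layer."""
    sizes = [size]
    for kernel, stride, padding in layers:
        size = out_fn(size, kernel, stride, padding)
        sizes.append(size)
    return sizes

# (kernel, stride, padding) for each layer
generator_layers = [(4, 1, 'valid')] + [(4, 2, 'same')] * 4
discriminator_layers = [(4, 2, 'same')] * 4 + [(4, 1, 'valid')]

generator_sizes = trace_sizes(1, generator_layers, conv_transpose_out)
discriminator_sizes = trace_sizes(64, discriminator_layers, conv_out)
print(generator_sizes)      # [1, 4, 8, 16, 32, 64]
print(discriminator_sizes)  # [64, 32, 16, 8, 4, 1]
```

The generator therefore expands a 1 x 1 x 100 input to a 64 x 64 image, and the discriminator reduces a 64 x 64 image to a single logit.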
<p>The following are the training details:

<ul>
    <li>Batch size: 100</li>
    <li>Learning rate: 0.0002</li>
    <li>Training epochs: 20</li>
    <li>Leaky ReLU slope: 0.2</li>
    <li>Adam optimizer:</li>
        <ul>
            <li>beta1: 0.5</li>
        </ul>
    <li>Dataset normalization:</li>
        <ul>
            <li>(range: -1 ~ 1)</li>
            <li>(pix_val - 0.5) / 0.5</li>
        </ul>
    <li>Weight init:</li>
        <ul>
            <li>Normal distribution</li>
        </ul>
</ul>
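
As a small illustration of the dataset normalization listed above, the sketch below maps pixel values from [0, 1] to [-1, 1], which matches the range of the generator's tanh output. The function names `normalize` and `denormalize` are hypothetical, and the inverse mapping is an assumption added for display purposes.

```python
import numpy as np

def normalize(pix_val):
    """Map pixel values from [0, 1] to [-1, 1]."""
    return (pix_val - 0.5) / 0.5

def denormalize(x):
    """Inverse mapping, from [-1, 1] back to [0, 1] (e.g. before displaying an image)."""
    return x * 0.5 + 0.5

pixels = np.array([0.0, 0.25, 0.5, 1.0], dtype=np.float32)
scaled = normalize(pixels)
print(scaled)               # [-1.  -0.5  0.   1. ]
print(denormalize(scaled))  # [0.   0.25 0.5  1.  ]
```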

The following code is adapted from [2] and [3].

In [None]:
def wrapper(learning_rate = 0.0002, seed = 1234, beta1 = 0.5, batch_size = 100, maskType = 'center'):

    import tensorflow as tf
    import numpy as np
    
    # Use this module to get the TensorBoard logdir
    from hops import tensorboard
    tensorboard_logdir = tensorboard.logdir()
    
    # use this module to get the project path in hdfs
    from hops import hdfs
    project_path = hdfs.project_path()

    import datetime #logging the time for model checkpoints and training
    
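    # set the parameters (batch_size, seed, beta1 and maskType are used as passed in)
    lr = learning_rate
    train_epoch = 2  # short run; the results reported below were obtained with 20 epochs
    image_size = 64
    image_shape = [image_size,image_size,1]
    lam = 0.2  # weight of the perceptual loss in the completion objective
    centerScale = 0.3  # the 'center' mask hides the region between 30% and 70% of each axis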
    

    # Step 1 - Collect dataset
    # MNIST - handwritten digits: 55K training, 5K validation and 10K test images + labels.
    from tensorflow.examples.tutorials.mnist import input_data
    # read_data_sets ensures that the data has been downloaded to the local training folder,
    # then unpacks it and returns the train/validation/test DataSet instances.
    # reshape=[] is falsy, so the images keep their [28, 28, 1] shape instead of being flattened.
    mnist = input_data.read_data_sets("MNIST_data/", one_hot=True, reshape=[])
    
    # to have deterministic results
    tf.set_random_seed(seed)
    
    ########-----------------------------------DISCRIMINATOR --------------------------------------------------########
    # Takes a [batch_size, 64, 64, channels] tensor and outputs, for each image, the probability that it is real (and not fake).
    # Leaky ReLU (lrelu, defined below) is used because it works well for higher-resolution modeling.
    
    def discriminator(x, isTrain=True, reuse=False):
        with tf.variable_scope('discriminator', reuse=reuse):
            
            # 1st hidden layer
            conv1 = tf.layers.conv2d(x, 128, [4, 4], strides=(2, 2), padding='same')
            lrelu1 = lrelu(conv1, 0.2)
    
            # 2nd hidden layer
            conv2 = tf.layers.conv2d(lrelu1, 256, [4, 4], strides=(2, 2), padding='same')
            lrelu2 = lrelu(tf.layers.batch_normalization(conv2, training=isTrain), 0.2)
    
            # 3rd hidden layer
            conv3 = tf.layers.conv2d(lrelu2, 512, [4, 4], strides=(2, 2), padding='same')
            lrelu3 = lrelu(tf.layers.batch_normalization(conv3, training=isTrain), 0.2)
    
            # 4th hidden layer
            conv4 = tf.layers.conv2d(lrelu3, 1024, [4, 4], strides=(2, 2), padding='same')
            lrelu4 = lrelu(tf.layers.batch_normalization(conv4, training=isTrain), 0.2)
    
            # output layer
            conv5 = tf.layers.conv2d(lrelu4, 1, [4, 4], strides=(1, 1), padding='valid')
            o = tf.nn.sigmoid(conv5)
    
            return o, conv5
    
    ########-------------------------------------GENERATOR ----------------------------------------------------########
    # The generator takes a 100-dimensional noise vector and upsamples it to a 64 x 64 image,
    # i.e. it maps a [batch_size, 1, 1, 100] input to a [batch_size, 64, 64, channels] output.
    # Deterministic spatial resampling functions (such as pooling) are replaced with strided
    # (transposed) convolutions, allowing the network to learn its own spatial upsampling.
    # Leaky ReLUs are used as the activation of each hidden layer.
    # Batch normalization is used to stabilize learning and help gradient flow.
    
    def lrelu(x, th=0.2):
        return tf.maximum(th * x, x)

    # G(z)
    def generator(x, isTrain=True, reuse=False):
        with tf.variable_scope('generator', reuse=reuse):
    
            # 1st hidden layer
            conv1 = tf.layers.conv2d_transpose(x, 1024, [4, 4], strides=(1, 1), padding='valid')
            lrelu1 = lrelu(tf.layers.batch_normalization(conv1, training=isTrain), 0.2)
    
            # 2nd hidden layer
            conv2 = tf.layers.conv2d_transpose(lrelu1, 512, [4, 4], strides=(2, 2), padding='same')
            lrelu2 = lrelu(tf.layers.batch_normalization(conv2, training=isTrain), 0.2)
    
            # 3rd hidden layer
            conv3 = tf.layers.conv2d_transpose(lrelu2, 256, [4, 4], strides=(2, 2), padding='same')
            lrelu3 = lrelu(tf.layers.batch_normalization(conv3, training=isTrain), 0.2)
    
            # 4th hidden layer
            conv4 = tf.layers.conv2d_transpose(lrelu3, 128, [4, 4], strides=(2, 2), padding='same')
            lrelu4 = lrelu(tf.layers.batch_normalization(conv4, training=isTrain), 0.2)
    
            # output layer
            conv5 = tf.layers.conv2d_transpose(lrelu4, 1, [4, 4], strides=(2, 2), padding='same')
            o = tf.nn.tanh(conv5)
    
            return o
    
    ########------------------------------------- TRAINING ----------------------------------------------------########

    sess = tf.Session() 

    # fixed z to generate images to see progress in the tensorboard
    fixed_z_ = np.random.normal(0, 1, (batch_size, 1, 1, 100))
    
    # x is the real image input tensor, z the randomly generated input tensor
    x = tf.placeholder(tf.float32, shape=(None, 64, 64, 1))
    z = tf.placeholder(tf.float32, shape=(None, 1, 1, 100))
    mask = tf.placeholder(tf.float32, shape=(64, 64, 1), name='mask')
    isTrain = tf.placeholder(dtype=tf.bool)
    
    # networks : generator
    G_z = generator(z, isTrain)
    
    G_imgs = generator(z, isTrain, True)

    # networks : discriminator
    D_real, D_real_logits = discriminator(x, isTrain)
    D_fake, D_fake_logits = discriminator(G_z, isTrain, reuse=True) 
    
    # loss for each network
    D_loss_real = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=D_real_logits, labels=tf.ones([batch_size, 1, 1, 1])))
    D_loss_fake = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=D_fake_logits, labels=tf.zeros([batch_size, 1, 1, 1])))
    D_loss = D_loss_real + D_loss_fake
    G_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=D_fake_logits, labels=tf.ones([batch_size, 1, 1, 1])))
    
    # trainable variables for each network
    T_vars = tf.trainable_variables()
    D_vars = [var for var in T_vars if var.name.startswith('discriminator')]
    G_vars = [var for var in T_vars if var.name.startswith('generator')]
    
    # Completion.
 
    contextual_loss = tf.reduce_sum(
            tf.contrib.layers.flatten(
            tf.abs(tf.multiply(mask, G_imgs) - tf.multiply(mask, x))), 1)
    perceptual_loss = G_loss
    # we want to minimize the reconstruction error on the unmasked part, but also how easily the discriminator detects the generated image
    complete_loss = contextual_loss + lam*perceptual_loss
    # compute symbolic derivatives of the complete_loss with respect to z
    grad_complete_loss = tf.gradients(complete_loss, z)
    
    # optimizer for each network
    with tf.control_dependencies(tf.get_collection(tf.GraphKeys.UPDATE_OPS)):
        D_optim = tf.train.AdamOptimizer(lr, beta1=beta1).minimize(D_loss, var_list=D_vars)
        G_optim = tf.train.AdamOptimizer(lr, beta1=beta1).minimize(G_loss, var_list=G_vars)
        
    #Outputs a Summary containing a single scalar value.
    tf.summary.scalar('Generator_loss', G_loss)
    tf.summary.scalar('Discriminator_loss_real', D_loss_real)
    tf.summary.scalar('Discriminator_loss_fake', D_loss_fake)
    tf.summary.scalar('Discriminator_loss', D_loss)
        

    
    #Output up to 15 generated images using fixed_z_
    images_for_tensorboard = generator(z, isTrain, True)
    tf.summary.image('Generated_images', images_for_tensorboard, 15)
    merged = tf.summary.merge_all()
    
    # summary for the completion part
    tf.summary.scalar('Complete_loss', tf.reduce_mean(complete_loss))
    tf.summary.image('before', x, 15)
    # x, mask and G_imgs are TensorFlow tensors, so use tf.multiply rather than np.multiply
    masked_images = tf.multiply(x, mask)
    tf.summary.image('masked', masked_images, 15)
    # hats is equal to the previously generated images, so it could be omitted
    tf.summary.image('hats', G_imgs, 15)
    inv_masked_hat_images = tf.multiply(G_imgs, 1.0-mask)
    completed = masked_images + inv_masked_hat_images
    tf.summary.image('completed', completed, 15)
    merged2 = tf.summary.merge_all()
        
    writer = tf.summary.FileWriter(tensorboard_logdir, sess.graph)
    
    # create the saver in order to save the model
    saver = tf.train.Saver()

    sess.run(tf.global_variables_initializer())
     
    # MNIST resize and normalization
    train_set = tf.image.resize_images(mnist.train.images, [64, 64]).eval(session=sess)
    train_set = (train_set - 0.5) / 0.5  # normalization; range: -1 ~ 1
    val_set = tf.image.resize_images(mnist.validation.images, [64, 64]).eval(session=sess)
    val_set = (val_set - 0.5) / 0.5  # normalization; range: -1 ~ 1

    #Every iteration performs two updates: one to the discriminator and one to the generator.
    #For the discriminator update, we take a batch of real MNIST images as positive examples,
    #while the images produced by the generator from a random z vector are the negative ones.
    #For the generator update, we feed a random z vector to the generator and pass its output
    #to the discriminator to obtain a probability score; the cross-entropy loss is minimized
    #and only the generator's weights and biases are updated.
    
    for epoch in range(train_epoch):
        
        for i in range(mnist.train.num_examples // batch_size):
            
            # update discriminator
            x_ = train_set[i*batch_size:(i+1)*batch_size]
            z_ = np.random.normal(0, 1, (batch_size, 1, 1, 100))
            loss_d_, _ = sess.run([D_loss, D_optim], {x: x_, z: z_, isTrain: True})
    
            # update generator
            z_ = np.random.normal(0, 1, (batch_size, 1, 1, 100))
            loss_g_, _ = sess.run([G_loss, G_optim], {z: z_, x: x_, isTrain: True})
    
            # every 100 iterations write the summaries to TensorBoard (model saving is commented out below)
            if np.mod(i,100)==0:
                #save_path = saver.save(sess, hdfs.project_path()+"Resources/pretrained_gan", global_step=i + epoch*(mnist.train.num_examples // batch_size))
                x_ = val_set[0:batch_size]
                summary = sess.run(merged, {x: x_, z: fixed_z_, isTrain: False})
                writer.add_summary(summary, i + epoch*(mnist.train.num_examples // batch_size))
                
    #START COMPLETION PART

    # number of images (better to use a validation set)
    nImgs = mnist.train.num_examples
    
    batch_idxs = int(np.ceil(nImgs/batch_size))
    batch_idxs = 1 #only for having some samples and stop, no need to reconstruct all images
    
    # define the type of mask to apply
    if maskType == 'random':
        fraction_masked = 0.3
        mask1 = np.ones(image_shape).astype(np.float32)
        mask1[np.random.random(image_shape[:2]) < fraction_masked] = 0.0
        mask1=mask1.astype(np.float32)
    elif maskType == 'center':
        mask1 = np.ones(image_shape).astype(np.float32)
        sz = image_size
        l = int(image_size*centerScale)
        u = int(image_size*(1.0-centerScale))
        mask1[l:u, l:u, :] = 0.0
        mask1=mask1.astype(np.float32)
    elif maskType == 'left':
        mask1 = np.ones(image_shape).astype(np.float32)
        c = image_size // 2
        mask1[:,:c,:] = 0.0
        mask1=mask1.astype(np.float32)
    elif maskType == 'grid':
        mask1 = np.zeros(image_shape).astype(np.float32)
        mask1[::4,::4,:] = 1.0
        mask1=mask1.astype(np.float32)
    else:
        raise ValueError("unknown maskType: %s" % maskType)

    #for each batch
    for idx in range(0, batch_idxs):
        l = idx*batch_size
        u = min((idx+1)*batch_size, nImgs)
        batchSz = u-l
        batch_images = train_set[l:u]
        batch_images = np.array(batch_images).astype(np.float32)
    
        #shouldn't occur since 55000 is multiple of 100, but present for safety
        if batchSz < batch_size:
            padSz = ((0, int(batch_size-batchSz)), (0,0), (0,0), (0,0))
            batch_images = np.pad(batch_images, padSz, 'constant').astype(np.float32)

        # starts with 100 random numbers
        zhats = np.random.normal(0, 1, (batch_size, 1, 1, 100)).astype(np.float32)
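        m = 0  # Adam first-moment estimate
        v = 0  # Adam second-moment estimate
        beta2 = beta1  # note: beta2 is tied to beta1 here (Adam's usual default is 0.999)
        nIter = 10000 # number of iterations of the completion optimization for each batch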

        for i in range(nIter):
            fd = {
                z: zhats,
                mask: mask1,
                x: batch_images,
                isTrain: False
            }
            run = [complete_loss, grad_complete_loss]
            loss, g = sess.run(run, feed_dict=fd)

            if i % 1000 == 0:
                # save the images before, masked, hats (generated) and completed.
                summary2 = sess.run(merged2, {z: zhats, mask: mask1, x: batch_images, isTrain: False})
                writer.add_summary(summary2, i+(idx*nIter) )
                             
            # Optimize single completion with Adam
            m_prev = np.copy(m)
            v_prev = np.copy(v)
            m = beta1 * m_prev + (1 - beta1) * g[0]
            v = beta2 * v_prev + (1 - beta2) * np.multiply(g[0], g[0])
            m_hat = m / (1 - beta1 ** (i + 1))
            v_hat = v / (1 - beta2 ** (i + 1))
            zhats += - np.true_divide(lr * m_hat, (np.sqrt(v_hat) + 0.000000001))
            #note: z is initialised from N(0, 1) (like fixed_z_), so clipping to [-1, 1] truncates its tails; the interval may need tuning
            zhats = np.clip(zhats, -1, 1)



In [None]:
from hops import util

#Define dict for hyperparameters
args_dict = {'learning_rate': [ 0.0002], 'seed' : [5678], 'beta1' : [0.5], 'batch_size' : [100], 'maskType' : ['left', 'grid']}

# Generate a grid for the given hyperparameters
args_dict_grid = util.grid_params(args_dict)

print(args_dict_grid)
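
For intuition, a hyperparameter grid is the Cartesian product of the listed values. The sketch below uses only the standard library; `expand_grid` is a hypothetical helper and does not reproduce the return format of `util.grid_params`, it only shows which combinations such a grid contains.

```python
import itertools

def expand_grid(params):
    """Return one dict per combination of the listed hyperparameter values."""
    keys = sorted(params)
    value_lists = [params[k] for k in keys]
    return [dict(zip(keys, combo)) for combo in itertools.product(*value_lists)]

args = {'learning_rate': [0.0002], 'seed': [5678], 'beta1': [0.5],
        'batch_size': [100], 'maskType': ['left', 'grid']}
runs = expand_grid(args)
for run in runs:
    print(run)
```

With the values above, only `maskType` has more than one candidate, so the grid contains two runs: one with the 'left' mask and one with the 'grid' mask.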

In [None]:
from hops import experiment

tensorboard_hdfs_logdir = experiment.launch(spark, wrapper, args_dict_grid, name='mnist gan')

In [None]:
from hops import tensorboard

tensorboard.visualize(spark, tensorboard_hdfs_logdir)

<h1>Results</h1>

<h3>Image generation using DCGAN</h3>

Random images generated after 20 epochs.
![Image7-Monitor.png](../../images/final_grid.jpg)
<p>After training for <b>20 epochs</b>, the generator was able to produce images very similar to our training data. Training the model for 20 epochs takes about 3 days on a single CPU and about 1 hour on a GPU.
<br/>
<br/>
We can see the <b>generator and discriminator losses</b>, evaluated on the validation set every 100 iterations. (The y axis shows the loss and the x axis the iteration number; 550 iterations correspond to 1 epoch, since we use a batch size of 100 and the training dataset consists of 55000 images.)<br/><br/>
    
![Image7-Monitor.png](../../images/DCGAN_Mnist_tensorboard_scalars_50kIter.jpg)
<br/>
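
The relation between the iteration number on the x axis and the training epoch can be written out as a small sketch. The constants come from the text above; the helper names `global_step` and `epoch_of` are hypothetical.

```python
TRAIN_IMAGES = 55000
BATCH_SIZE = 100
ITERS_PER_EPOCH = TRAIN_IMAGES // BATCH_SIZE  # 550 iterations per epoch

def global_step(epoch, i):
    """Iteration number written to TensorBoard for batch i of a given epoch (both 0-based)."""
    return i + epoch * ITERS_PER_EPOCH

def epoch_of(step):
    """0-based epoch that a global iteration number falls in."""
    return step // ITERS_PER_EPOCH

print(ITERS_PER_EPOCH)        # 550
print(global_step(1, 0))      # 550, the first iteration of the second epoch
print(20 * ITERS_PER_EPOCH)   # 11000 iterations in 20 epochs
```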
We can also observe that the generator starts to produce fairly good images after a few iterations and then keeps trying to improve them, with many <b>oscillations and without converging to a fixed solution</b>. We can observe how the generated images evolve at each epoch. (Note that at epoch 7 the generated images are random pixels, even though earlier epochs produced digit-like images, and afterwards the generator again starts to produce good images.) The following image shows the image generation process over 10 epochs. <br/><br/>

![Image7-Monitor.png](../../images/gif_speed_low.gif)

<br/>
Finally, we performed <b>hyperparameter tuning</b> and obtained the following results:
<ul>
<li>There is no clear difference between a learning rate of 0.0005 and one of 0.0002, probably because the Adam optimizer automatically adapts the step size for each parameter,</li>
<li>Setting the momentum parameter <i>beta1</i> to 0.5 helps reduce the oscillations during training compared to 0.9,</li>
<li>A batch size of 100 images is a good compromise between solution quality and training time for the MNIST dataset.</li>
</ul>
<br/>
These results agree with the results presented in [1].
</p>
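
One way to build intuition for the role of <i>beta1</i> is to look at how quickly Adam's first-moment estimate reacts when the gradient changes sign. The sketch below is an assumption-laden toy (a scalar gradient that is +1 for 10 steps and then -1), not a reproduction of the GAN training dynamics; the helper names are hypothetical.

```python
def first_moment(grads, beta1):
    """Bias-corrected Adam first-moment estimate after each gradient in grads."""
    m, estimates = 0.0, []
    for t, g in enumerate(grads, start=1):
        m = beta1 * m + (1 - beta1) * g
        estimates.append(m / (1 - beta1 ** t))
    return estimates

def steps_to_flip(beta1, before=10, after=50):
    """Number of steps the estimate needs to turn negative after the gradient flips from +1 to -1."""
    grads = [1.0] * before + [-1.0] * after
    estimates = first_moment(grads, beta1)
    for k, value in enumerate(estimates[before:], start=1):
        if value < 0:
            return k
    return None

print(steps_to_flip(0.5))  # 1
print(steps_to_flip(0.9))  # 5
```

With beta1 = 0.5 the estimate follows the new gradient direction after a single step, whereas with beta1 = 0.9 it keeps pushing in the old direction for several steps, which is one plausible source of overshooting.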

<h3>Image Reconstruction</h3>

The objective is <b>to reconstruct an image that contains missing pixels</b>. To do so, we take the generator from the previous step, give it a vector of 100 random numbers as input, and then use Adam optimization to adjust that input until the generated image is able to fill in the missing pixels.
The following figure shows the results of this reconstruction: the first column contains the image before the pixels were deleted, the second column the masked image, and the third the reconstructed image.
<br/>
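
The optimization over the generator input can be sketched in NumPy as follows. This is a minimal toy under explicit assumptions: a fixed random linear map (`toy_generator`) stands in for the trained DCGAN generator, only the contextual (masked L1) loss is used because the perceptual term needs a discriminator, and the learning rate and beta values are sketch values rather than the ones used in this notebook.

```python
import numpy as np

rng = np.random.RandomState(0)
n_pixels, z_dim = 64, 8

# Toy linear "generator" standing in for the trained generator (assumption).
W = rng.normal(0.0, 1.0, (n_pixels, z_dim))

def toy_generator(z):
    return W.dot(z)

# Target image produced from a hidden latent vector; the second half of its pixels is missing.
z_true = rng.uniform(-0.8, 0.8, z_dim)
x = toy_generator(z_true)
mask = np.ones(n_pixels)
mask[n_pixels // 2:] = 0.0

def contextual_loss_and_grad(z):
    """L1 error on the known pixels and its (sub)gradient with respect to z."""
    diff = mask * (toy_generator(z) - x)
    return np.abs(diff).sum(), W.T.dot(np.sign(diff))

# Adam on z only; the generator weights stay fixed.
lr, beta1, beta2, eps = 0.01, 0.9, 0.999, 1e-8  # sketch values, not the notebook's
z = np.zeros(z_dim)
m = np.zeros(z_dim)
v = np.zeros(z_dim)
losses = []
for t in range(1, 501):
    loss, g = contextual_loss_and_grad(z)
    losses.append(loss)
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    z = np.clip(z - lr * m_hat / (np.sqrt(v_hat) + eps), -1.0, 1.0)

print(losses[0], losses[-1])
```

Only the input vector is updated, so the error on the known pixels shrinks while the generator itself is left untouched.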

![Image7-Monitor.png](../../images/res.jpg)
<br/>
![Image7-Monitor.png](../../images/center.gif)
<br/>
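
The masked and reconstructed images shown above are obtained by combining the known pixels of the original with the generated pixels in the missing region. The sketch below rebuilds three of the mask types from the notebook code and the blending step in NumPy; the function names `make_mask` and `blend` are hypothetical, and the constant images are placeholders for a real digit and a generator output.

```python
import numpy as np

def make_mask(mask_type, image_size=64, center_scale=0.3):
    """Return a [image_size, image_size, 1] float mask: 1 = known pixel, 0 = missing pixel."""
    shape = (image_size, image_size, 1)
    if mask_type == 'center':
        mask = np.ones(shape, dtype=np.float32)
        lo = int(image_size * center_scale)
        hi = int(image_size * (1.0 - center_scale))
        mask[lo:hi, lo:hi, :] = 0.0
    elif mask_type == 'left':
        mask = np.ones(shape, dtype=np.float32)
        mask[:, :image_size // 2, :] = 0.0
    elif mask_type == 'grid':
        mask = np.zeros(shape, dtype=np.float32)
        mask[::4, ::4, :] = 1.0
    else:
        raise ValueError('unknown mask type: %s' % mask_type)
    return mask

def blend(original, generated, mask):
    """Keep the known pixels of the original and fill the missing ones from the generated image."""
    return mask * original + (1.0 - mask) * generated

# Placeholder images: all-ones "original" and all-minus-ones "generated".
original = np.ones((64, 64, 1), dtype=np.float32)
generated = -np.ones((64, 64, 1), dtype=np.float32)
completed = blend(original, generated, make_mask('left'))
print(completed[0, 0, 0], completed[0, 63, 0])  # -1.0 (filled in), 1.0 (kept)
```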


<h1>References</h1>
<ol type="1">
  <li>Radford, Alec, Luke Metz, and Soumith Chintala. "Unsupervised representation learning with deep convolutional 
    generative adversarial networks." arXiv preprint arXiv:1511.06434 (2015). (Full paper: https://arxiv.org/pdf/1511.06434.pdf)</li>
<li>Brandon Amos. Image Completion with Deep Learning in TensorFlow. http://bamos.github.io/2016/08/09/deep-completion. Accessed: [08/01/2018]</li>
<li>Znxlwm, "Tensorflow implementation of Generative Adversarial Networks (GAN) and Deep Convolutional Generative Adversarial Networks for MNIST dataset". https://github.com/znxlwm/tensorflow-MNIST-GAN-DCGAN Accessed: [08/01/2018]</li>
  <li>Raymond A. Yeh, Chen Chen, Teck Yian Lim, Alexander G. Schwing, Mark Hasegawa-Johnson, Minh N. Do. "Semantic Image Inpainting with Deep Generative Models". arXiv preprint arXiv:1607.07539 (2017). (Full paper: https://arxiv.org/pdf/1607.07539.pdf)</li>
</ol>