# DCGAN

This notebook works through the paper 'Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks' by Alec Radford et al., with the goal of understanding it and turning it into modular code.

*Learning reusable feature representations from large unlabeled datasets has been an area of active research. In the context of computer vision, one can leverage the practically unlimited amount of unlabeled images and videos to learn good intermediate representations, which can then be used on a variety of supervised learning tasks such as image classification.*

*We propose that one way to build good image representations is by training Generative Adversarial Networks (GANs, Goodfellow et al, 2014), and later reusing parts of the generator and discriminator networks as feature extractors for supervised tasks. GANs provide an attractive alternative to maximum likelihood techniques.*

Since Ian Goodfellow's GAN was published, many research fields have adopted GANs. Yet training one to produce usable output is so difficult that it is routinely described as notorious: in practice, GANs suffer from inherent instability such as mode collapse. [NIPS 2016 Tutorial](https://www.youtube.com/watch?v=AJVyzd0rqdc)

*One can additionally argue that their learning process and the lack of a heuristic cost function (such as pixel-wise independent mean-square error) are attractive to representation learning. <span style="color:red">GANs have been known to be unstable to train, often resulting in generators that produce nonsensical outputs.</span> There has been very limited published research in trying to understand and visualize what GANs learn, and the intermediate representations of multi-layer GANs.*


In [2]:
import tensorflow as tf



### Dataset

From the paper, Section 4: Details of Adversarial Training

*We trained DCGANs on three datasets, Large-scale Scene Understanding (LSUN) (Yu et al. 2015), Imagenet-1k and a newly assembled Faces dataset. Details on the usage of each of these datasets are given below.*

### Framework

From the paper, Section 4: Details of Adversarial Training

*No pre-processing was applied to training images besides scaling to the range of the <span style="font-weight:bold;color:black">tanh activation</span> function <span style="font-weight:bold;color:blue">[-1,1]</span>. All models were trained with mini-batch <span style="font-weight:bold;color:black">stochastic gradient descent (SGD)</span> with a mini-batch size of <span style="font-weight:bold;color:blue">128</span>. All weights were initialized from a <span style="font-weight:bold;color:blue">zero</span>-centered <span style="font-weight:bold;color:black">Normal distribution</span> with standard deviation <span style="font-weight:bold;color:blue">0.02</span>. In the <span style="font-weight:bold;color:black">LeakyReLU</span>, the slope of the leak was set to <span style="font-weight:bold;color:blue">0.2</span> in all models. While previous GAN work has used momentum to accelerate training, <span style="font-weight:bold;color:black">we used the Adam optimizer</span> (Kingma & Ba, 2014) with tuned hyperparameters. We found the suggested <span style="font-weight:bold;color:black">learning rate</span> of <span style="font-weight:bold;color:blue">0.001</span>, to be too high, using 0.0002 instead. Additionally, we found leaving the momentum term **β1** at the suggested value of 0.9 resulted in training oscillation and instability while reducing it to <span style="font-weight:bold;color:blue">0.5</span> helped stabilize training.*
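
The two element-wise operations named in the passage above, scaling to the tanh range and the LeakyReLU, can be sketched in plain Python. This is only an illustration: the helper names `scale_to_tanh_range` and `leaky_relu` are hypothetical, and the 8-bit input range [0, 255] is an assumption, since the paper does not state the raw pixel format.

```python
def scale_to_tanh_range(pixel):
    """Map a pixel value from [0, 255] (assumed 8-bit input) to [-1, 1]."""
    return pixel / 127.5 - 1.0


def leaky_relu(x, slope=0.2):
    """Identity for positive inputs; negative inputs are scaled by `slope`.

    The default slope of 0.2 is the value quoted from the paper.
    """
    return x if x > 0 else slope * x


# The end points and the midpoint of the assumed input range.
print([scale_to_tanh_range(p) for p in (0, 127.5, 255)])  # [-1.0, 0.0, 1.0]
```

Scaling the training images to [-1, 1] matters because a generator ending in tanh can only emit values in that range, so real and generated images should share it.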

In [None]:
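learning_rate = 0.0002  # the paper found the suggested 0.001 too high
beta1 = 0.5             # Adam momentum term, reduced from the suggested 0.9
size_of_batch = 128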
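
To see where the two tuned Adam hyperparameters from the paper (learning rate 0.0002 and β1 = 0.5) enter the update, here is a minimal single-parameter sketch in plain Python. `adam_step` is a hypothetical helper, and `beta2 = 0.999` and `epsilon = 1e-8` are the defaults from Kingma & Ba, which the quoted passage does not mention. In a TensorFlow 1.x notebook the optimizer would more likely be created with something like `tf.train.AdamOptimizer(0.0002, beta1=0.5)`.

```python
import math


def adam_step(param, grad, m, v, step,
              learning_rate=0.0002, beta1=0.5, beta2=0.999, epsilon=1e-8):
    """One Adam update for a single scalar parameter.

    `step` counts updates starting from 1. Returns the new (param, m, v).
    """
    # Exponential moving averages of the gradient and of its square.
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad * grad
    # Bias correction compensates for m and v starting at zero.
    m_hat = m / (1.0 - beta1 ** step)
    v_hat = v / (1.0 - beta2 ** step)
    param = param - learning_rate * m_hat / (math.sqrt(v_hat) + epsilon)
    return param, m, v


# First update from param = 0 with gradient 1: the step is about -learning_rate.
param, m, v = adam_step(param=0.0, grad=1.0, m=0.0, v=0.0, step=1)
print(param, m)
```

A smaller β1 makes the moving average `m` forget old gradients faster, which is consistent with the paper's observation that 0.5 reduced oscillation compared with 0.9.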

### Generator

From the paper, Figure 1

DCGAN generator used for LSUN scene modeling. A 100 dimensional uniform distribution Z is projected to a small spatial extent convolutional representation with many feature maps. A series of four fractionally-strided convolutions (in some recent papers, these are wrongly called deconvolutions) then convert this high level representation into a 64 × 64 pixel image. Notably, no fully connected or pooling layers are used.

![](imgs/figure1.jpg)
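
The shapes in Figure 1 follow a simple rule: each stride-2 fractionally-strided convolution doubles the spatial size and halves the number of feature maps, except the last one, which produces the image channels. The sketch below traces those shapes; `generator_shapes` is a hypothetical helper written for this illustration, not part of the paper or of TensorFlow.

```python
def generator_shapes(output_size=64, output_channels=3,
                     projection_channels=1024, num_layers=4):
    """Return (height, width, channels) after the projection and each layer."""
    # The projection is num_layers doublings smaller than the final image.
    size = output_size // (2 ** num_layers)
    channels = projection_channels
    shapes = [(size, size, channels)]
    for layer in range(1, num_layers + 1):
        size *= 2  # stride 2 doubles height and width
        # The last layer emits the image channels; the others halve the depth.
        channels = output_channels if layer == num_layers else channels // 2
        shapes.append((size, size, channels))
    return shapes


for shape in generator_shapes():
    print(shape)
# (4, 4, 1024)
# (8, 8, 512)
# (16, 16, 256)
# (32, 32, 128)
# (64, 64, 3)
```

These are the same sizes the generator code computes by repeatedly halving `output_dimension` and `projection_channel`.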

In [7]:
def generator(z):    
    
    with tf.variable_scope('generator') as scope:

        output_dimension = 64
        conv4_dimension = int(output_dimension/ 2) # 32
        conv3_dimension = int(conv4_dimension / 2) # 16
        conv2_dimension = int(conv3_dimension / 2) # 8
        conv1_dimension = int(conv2_dimension / 2) # 4
        projection_dimension = conv1_dimension

        output_channel = 3
        projection_channel = 1024
        conv1_channel = projection_channel
        conv2_channel = int(conv1_channel / 2) # 512
        conv3_channel = int(conv2_channel / 2) # 256
        conv4_channel = int(conv3_channel / 2) # 128

        size_filter = 5

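        # Project and reshape layer: a 100-dimensional z cannot be reshaped to 4 x 4 x 1024 directly,
        # so it is first projected to that many values with a learned linear map
        projection = tf.layers.dense(z, projection_dimension * projection_dimension * projection_channel, kernel_initializer=tf.random_normal_initializer(mean=0, stddev=0.02), name='g_projection')
        hidden_layer = tf.reshape(projection, [size_of_batch, projection_dimension, projection_dimension, projection_channel])
        # The paper uses ReLU in the generator; LeakyReLU is used in the discriminator
        hidden_layer = tf.nn.relu(hidden_layer)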

        # Fractionally-strided convolutional layer 1: 4 x 4 x 1024 -> 8 x 8 x 512
        output_shape1 = [size_of_batch, conv2_dimension, conv2_dimension, conv2_channel]
        w_conv1 = tf.get_variable('w_conv1',[size_filter, size_filter, output_shape1[-1], int(hidden_layer.get_shape()[-1])],initializer=tf.random_normal_initializer(mean=0, stddev=0.02))
        b_conv1 = tf.get_variable('b_conv1',[output_shape1[-1]],initializer=tf.constant_initializer(0.5))

        conv1 = tf.nn.conv2d_transpose(hidden_layer, w_conv1, output_shape=output_shape1, strides=[1,2,2,1], padding='SAME') + b_conv1
        # tf.layers.batch_normalization takes `training` and `name` (not `is_training` / `scope`)
        conv1 = tf.layers.batch_normalization(inputs=conv1, center=True, scale=True, training=True, name='g_bn1')
        # The paper uses ReLU in the generator
        conv1 = tf.nn.relu(conv1)

        # Fractionally-strided convolutional layer 2: 8 x 8 x 512 -> 16 x 16 x 256
        output_shape2 = [size_of_batch, conv3_dimension, conv3_dimension, conv3_channel]
        w_conv2 = tf.get_variable('w_conv2',[size_filter, size_filter, output_shape2[-1], int(conv1.get_shape()[-1])],initializer=tf.random_normal_initializer(mean=0, stddev=0.02))
        b_conv2 = tf.get_variable('b_conv2',[output_shape2[-1]],initializer=tf.constant_initializer(0.5))

        conv2 = tf.nn.conv2d_transpose(conv1, w_conv2, output_shape=output_shape2, strides=[1, 2, 2, 1],padding='SAME') + b_conv2
        # tf.layers.batch_normalization takes `training` and `name` (not `is_training` / `scope`)
        conv2 = tf.layers.batch_normalization(inputs=conv2, center=True, scale=True, training=True, name='g_bn2')
        # The paper uses ReLU in the generator
        conv2 = tf.nn.relu(conv2)

        # Fractionally-strided convolutional layer 3: 16 x 16 x 256 -> 32 x 32 x 128
        output_shape3 = [size_of_batch, conv4_dimension, conv4_dimension, conv4_channel]
        w_conv3 = tf.get_variable('w_conv3',[size_filter, size_filter, output_shape3[-1], int(conv2.get_shape()[-1])],initializer=tf.random_normal_initializer(mean=0, stddev=0.02))
        b_conv3 = tf.get_variable('b_conv3',[output_shape3[-1]],initializer=tf.constant_initializer(0.5))

        conv3 = tf.nn.conv2d_transpose(conv2, w_conv3, output_shape=output_shape3, strides=[1, 2, 2, 1],padding='SAME') + b_conv3
        # tf.layers.batch_normalization takes `training` and `name` (not `is_training` / `scope`)
        conv3 = tf.layers.batch_normalization(inputs=conv3, center=True, scale=True, training=True, name='g_bn3')
        # The paper uses ReLU in the generator
        conv3 = tf.nn.relu(conv3)

        # Fractionally-strided convolutional layer 4 (output): 32 x 32 x 128 -> 64 x 64 x 3
        output_shape4 = [size_of_batch, output_dimension, output_dimension, output_channel]
        w_conv4 = tf.get_variable('w_conv4',[size_filter, size_filter, output_shape4[-1], int(conv3.get_shape()[-1])],initializer=tf.random_normal_initializer(mean=0, stddev=0.02))
        b_conv4 = tf.get_variable('b_conv4',[output_shape4[-1]],initializer=tf.constant_initializer(0.5))

        conv4 = tf.nn.conv2d_transpose(conv3, w_conv4, output_shape=output_shape4, strides=[1,2,2,1],padding='SAME') + b_conv4
        # The paper applies no batch normalization to the generator output layer;
        # tanh maps the output to [-1, 1], the range the training images are scaled to
        conv4 = tf.nn.tanh(conv4)

        return conv4
 