# Generative Image to Image Translation with CycleGAN  
[Parag K. Mital](https://pkmital.com)  
[Creative Applications of Deep Learning](https://www.kadenze.com/programs/creative-applications-of-deep-learning-with-tensorflow)  
[Kadenze, Inc.](https://kadenze.com)  

In [1]:
# Bit of formatting because inline code is not styled very well by default:
from IPython.core.display import HTML
HTML("""
<style> 
div.text_cell_render h1 {
    font-size: 18pt;
}
div.text_cell_render h2 {
    font-size: 14pt;
}
div.input {
    width: 115ex;
}
div.text_cell {
    width: 115ex;
    border: 0px;
}
div.text_cell_render {
    font-family: Menlo, Courier, monospace;
    line-height: 145%;
    width: 115ex;
}
.rendered_html code { 
    padding: 2px 4px;
    color: #c7254e;
    background-color: #f9f2f4;
    border-radius: 4px;
}
</style>""")

# Introduction

This workshop introduces you to the following work:

<img src="cycle-gan-paper.png" alt="CycleGAN" style="height: 500px;"/>

Image to image translation covers a very wide set of applications in computer graphics, computer vision, and deep learning with image and video.  The basic idea is to translate an input image into an output image.  This is different from, say, autoencoding an image, where the input and the target output are exactly the same.  In this case, they are expected to come from different sets of images.  For instance, the input image might be a landscape photo and the output could be an artistic stylization of that photo.  Or perhaps the input is a photo of a horse, and the output should be a photo of a zebra instead.  Or say you have maps showing the outlines of streets and highways, and you'd like to apply a texture to these images so that they look like satellite images instead.  Or maybe you want to recreate the Google AI Experiment where they convert sketches to pictures of cats.  Or perhaps you want to recreate the app FaceApp which adds smiles to people's faces.  These are all examples of image to image translation.

One of the earliest demonstrations of this idea is in a paper in 2001 called Image Analogies.  Aaron Hertzmann and colleagues showed the basic idea of image to image translation by using analogous image pairs.  You would give an example analogy, such as an image of a street map, and the corresponding satellite image, and then you could give it any other street map image, and it would give you a new satellite image for it.

<img src="image-analogies-paper.png" alt="Image Analogies" style="height: 300px;"/>

How could we build a neural network to do something similar?  In the last course, we saw how to use autoencoders and generative adversarial networks for some pretty impressive unsupervised modeling of image collections.  Some of you even explored what might happen if you created an autoencoder where the input was different from the output.  That sort of network can essentially learn to translate one image into another.  Say, for instance, you had black and white images and wanted to convert them to color images: you could create a dataset containing both versions of each image, and then feed pairs of them to an autoencoder.  The network would take a black and white image as input, the loss function would compare its output against the corresponding color image, and the network would need to learn to essentially colorize the image.
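
As a minimal sketch of how such a paired dataset could be assembled, the snippet below derives a black and white input from each color target using NumPy.  The function name `make_colorization_pair` is hypothetical, and the luminance weights (the common ITU-R BT.601 coefficients) are an assumption chosen for illustration rather than anything prescribed by the course material:

```python
import numpy as np

def make_colorization_pair(color_img):
    """Return (gray_input, color_target) for one H x W x 3 image in [0, 1]."""
    # Weighted sum of the R, G, B channels (assumed BT.601 luma weights).
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    gray = np.tensordot(color_img, weights, axes=([-1], [0]))
    # Keep a trailing channel axis so the input is H x W x 1.
    return gray[..., np.newaxis], color_img

rng = np.random.RandomState(0)
color = rng.rand(8, 8, 3).astype(np.float32)
gray, target = make_colorization_pair(color)
print(gray.shape, target.shape)
```

Each pair then gives the network its input (`gray`) and the image its output is compared against (`target`).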

Though as we continued further in the course, we saw that autoencoders generally have a pretty big issue with images: their loss function generally uses an L2 or squared loss, and this often results in really blurry reconstructions.  We then saw that the generative adversarial network did not have this issue since it learned its own loss function by training a separate network, a discriminator, to say whether an image was real or fake.  So could we build a generative adversarial network that could learn to colorize an image?  Or to convert street maps into satellite maps?  Or any other potential application of image to image translation?  This session will cover one of the state of the art techniques for image to image translation called CycleGAN.

# CycleGAN

There have been some pretty impressive developments since the original GAN was released only 2 years ago, including a number of attempts to build an image to image translation network.  In this session, we'll look at one of the most impressive networks currently out there called CycleGAN.

CycleGAN builds on earlier work called Pix2Pix.  The Pix2Pix network requires paired translations, which means that while training, for each input, you need to specify exactly what the output should look like.  CycleGAN instead just requires two _unpaired_ collections of images and will do its best to find the mapping between them, without you having to specify what the pairs are.  It's a really impressive network and pretty simple to build.  We'll need many of the same components that we needed for the DCGAN and VAEGAN networks we built in Course 1 Session 5.

<img src="cycle-gan-figure-2.png" alt="CycleGAN Figure 2" style="height: 300px;"/>

We're going to need to build a set of operations that maps one image collection to another, and build those operations for each of the two image collections, $X$ and $Y$.  We'll also need these operations to form a complete cycle: one generator maps an image from the first collection, $X$, to a fake image in the style of the second collection, which we denote mathematically by $\hat{Y}$, and another generator maps both the real $Y$ and the fake $\hat{Y}$ back to the first collection, giving $\hat{X}$.  In the figure below, we can see how this works graphically with two collections, $X$ and $Y$:

<img src="cycle-gan-figure-3.png" alt="CycleGAN Figure 3" style="height: 300px;"/>

# Encoder

This network essentially trains 2 autoencoders using a set of generators and discriminators.  The idea is pretty easy to grasp if you are already familiar with generative adversarial networks, and most of the tricky bits are in the details of implementation.

The Generator is slightly different though.  Instead of starting with a random feature vector of say 100 values, we'll actually start with an image and then compose something not unlike an Autoencoder.  The structure of this Autoencoder is a little different.  We'll have three major components: an encoder, transformer, and decoder. 

The encoder is composed of a few convolutional layers with stride 2 which will downsample the image with each layer.  The authors create it with 3 layers, and use padding on the first layer.

Let's see it written up as code.  First some imports.  We'll include TensorFlow as well as a contrib package which makes writing layers a lot easier, similar to Keras.  We'll also be using Instance Normalization (http://arxiv.org/abs/1607.08022) for our layer normalization and Leaky ReLus as the authors of CycleGAN have done.  I've included my implementation of CycleGAN and all utility functions in the `cycle_gan.py` module which you can also find as part of the `cadl` repo: https://github.com/pkmital/pycadl (easily pip installed by: `pip install cadl`):

In [2]:
import numpy as np
import tensorflow as tf
import tensorflow.contrib.layers as tfl
from cadl.cycle_gan import lrelu, instance_norm


Now let's write a function for the encoder, following the CycleGAN architecture.  We want a padding layer, then 3 convolutional layers with strides of 1, 2, and 2.  The first convolutional layer will have a 7x7 kernel, and the rest will be 3x3.  We'll initialize the weights from a truncated normal distribution with a standard deviation of 0.02.  For an activation function, we'll be using a Leaky ReLu with 0.2 leakage.  We'll also double the number of filters with each layer, starting with 32, then going to 64 and finally 128.  Lastly, we'll use instance normalization, which is similar to batch normalization, and use the TensorFlow layers module to do the entire convolution operation for us in a single convenient function:

In [3]:
def encoder(x, n_filters=32, k_size=3, normalizer_fn=instance_norm,
        activation_fn=lrelu, scope=None, reuse=None):
    with tf.variable_scope(scope or 'encoder', reuse=reuse):
        h = tf.pad(x, [[0, 0], [k_size, k_size], [k_size, k_size], [0, 0]],
                "REFLECT")
        h = tfl.conv2d(
                inputs=h,
                num_outputs=n_filters,
                kernel_size=7,
                stride=1,
                padding='VALID',
                weights_initializer=tf.truncated_normal_initializer(stddev=0.02),
                biases_initializer=None,
                normalizer_fn=normalizer_fn,
                activation_fn=activation_fn,
                scope='1',
                reuse=reuse)
        h = tfl.conv2d(
                inputs=h,
                num_outputs=n_filters * 2,
                kernel_size=k_size,
                stride=2,
                weights_initializer=tf.truncated_normal_initializer(stddev=0.02),
                biases_initializer=None,
                normalizer_fn=normalizer_fn,
                activation_fn=activation_fn,
                scope='2',
                reuse=reuse)
        h = tfl.conv2d(
                inputs=h,
                num_outputs=n_filters * 4,
                kernel_size=k_size,
                stride=2,
                weights_initializer=tf.truncated_normal_initializer(stddev=0.02),
                biases_initializer=None,
                normalizer_fn=normalizer_fn,
                activation_fn=activation_fn,
                scope='3',
                reuse=reuse)
    return h

We're also being explicit about our scope and reuse as we'll need to reuse these variables a few times which we'll see in a few moments.
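
The `lrelu` and `instance_norm` helpers are imported from `cadl.cycle_gan`, so their internals are not shown here.  As a rough NumPy sketch of what these two operations compute, the functions below (`lrelu_np` and `instance_norm_np`, both hypothetical names) assume a leak of 0.2 as described above, an epsilon of 1e-5, and leave out the learned scale and offset that the real layer also carries:

```python
import numpy as np

def lrelu_np(x, leak=0.2):
    # Pass positive values through; scale negative values by `leak`.
    return np.maximum(x, leak * x)

def instance_norm_np(x, epsilon=1e-5):
    # x has shape [batch, height, width, channels].  Statistics are computed
    # per image and per channel, over the spatial axes only.
    mean = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + epsilon)

rng = np.random.RandomState(0)
x = rng.randn(2, 16, 16, 4) * 5.0 + 3.0
h = instance_norm_np(x)
print(np.abs(h.mean(axis=(1, 2))).max())  # close to zero for every image/channel
print(lrelu_np(np.array([-1.0, 0.0, 2.0])))
```

Unlike batch normalization, no statistics are shared across the images in a batch, which is why this works with the batch size of 1 used later on.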

# Residual Blocks and Transformer

The next part of the Generator is the Transformer.  This is going to be 6 or 9 Residual Blocks, a really powerful module that comes up in a lot of recent architectures.  Instead of simply having a convolutional layer, we'll have convolutional layers whose output is added to the block's original input.  So what the layers learn is a sort of residual function of the input, residual meaning what's left over.  The residual block allows the base activation to persist, and learns a simple addition on top of it.  This is useful because it ensures the original activation has a path to the output.  Similarly, it is useful for backpropagation since the gradient has less chance of exploding or vanishing, as it typically can in very deep networks.  To read more about residual networks, check the original paper which shows how to create a network of over 1000 layers, all without having issues with vanishing or exploding gradients!

Alright, let's code up the residual block.  All the convolutions will use a stride of 1 and have 128 channels.  Each block will have a pad layer, a 3x3 convolution with Leaky ReLu and Instance Normalization, another pad layer, another convolution with Instance Normalization but no nonlinearity, and then an addition with the starting activation.

In [4]:
def residual_block(x, n_channels=128, normalizer_fn=instance_norm,
        activation_fn=lrelu, kernel_size=3, scope=None, reuse=None):
    with tf.variable_scope(scope or 'residual', reuse=reuse):
        h = tf.pad(x, [[0, 0], [1, 1], [1, 1], [0, 0]], "REFLECT")
        h = tfl.conv2d(
                inputs=h,
                num_outputs=n_channels,
                kernel_size=kernel_size,
                weights_initializer=tf.truncated_normal_initializer(stddev=0.02),
                biases_initializer=None,
                normalizer_fn=normalizer_fn,
                padding='VALID',
                activation_fn=activation_fn,
                scope='1',
                reuse=reuse)
        h = tf.pad(h, [[0, 0], [1, 1], [1, 1], [0, 0]], "REFLECT")
        h = tfl.conv2d(
                inputs=h,
                num_outputs=n_channels,
                kernel_size=kernel_size,
                weights_initializer=tf.truncated_normal_initializer(stddev=0.02),
                biases_initializer=None,
                normalizer_fn=normalizer_fn,
                padding='VALID',
                activation_fn=None,
                scope='2',
                reuse=reuse)
        h = tf.add(x, h)
    return h

Now we can compose many residual blocks to create our Transformer:

In [5]:
def transform(x, img_size=256, reuse=None):
    h = x
    if img_size >= 256:
        n_blocks = 9
    else:
        n_blocks = 6
    for block_i in range(n_blocks):
        with tf.variable_scope('block_{}'.format(block_i), reuse=reuse):
            h = residual_block(h, reuse=reuse)
    return h
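
To make the residual idea concrete outside of TensorFlow, here is a small NumPy sketch of a single block.  It is an illustration under simplifying assumptions, not the module's implementation: it works on one `[H, W, C]` image, skips instance normalization, and all function names are hypothetical.  It shows two things: reflect padding by 1 before a 3x3 `VALID` convolution keeps the spatial size, and a block whose last convolution outputs zeros is exactly the identity:

```python
import numpy as np

def conv3x3_valid(x, w):
    # x: [H, W, C_in], w: [3, 3, C_in, C_out]; stride 1, no padding.
    out_h, out_w = x.shape[0] - 2, x.shape[1] - 2
    out = np.zeros((out_h, out_w, w.shape[3]))
    for i in range(3):
        for j in range(3):
            out += x[i:i + out_h, j:j + out_w, :] @ w[i, j]
    return out

def reflect_pad(x):
    # Mirror one pixel on each spatial border, like tf.pad(..., "REFLECT").
    return np.pad(x, [(1, 1), (1, 1), (0, 0)], mode='reflect')

def residual_block_np(x, w1, w2, leak=0.2):
    h = conv3x3_valid(reflect_pad(x), w1)
    h = np.maximum(h, leak * h)            # Leaky ReLU after the first conv
    h = conv3x3_valid(reflect_pad(h), w2)  # no nonlinearity after the second
    return x + h                           # skip connection

rng = np.random.RandomState(0)
x = rng.randn(8, 8, 4)
w1 = rng.randn(3, 3, 4, 4) * 0.02
w2_zero = np.zeros((3, 3, 4, 4))
y = residual_block_np(x, w1, w2_zero)
print(y.shape, np.allclose(y, x))  # (8, 8, 4) True
```

Because every block starts out close to the identity when its weights are small, stacking 6 or 9 of them does not prevent the encoder's activations from reaching the decoder.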

# Decoder

Great, now the last piece we need to code up our Generator is the Decoder.  This is basically going to do the opposite of our Encoder.  We'll have two transposed convolution (sometimes called deconvolution) layers with stride 2, followed by a regular convolution with stride 1, with kernel sizes of 3, 3, and 7.  Before the last layer we'll also pad to avoid boundary artifacts with the larger 7x7 kernel, and our last activation will be a tanh, meaning our images will be in the range of -1 to 1.  This is a convenient normalization for images, as it means the starting point is basically a grey image.  We'll need to keep this in mind when we feed our data into the network and ensure we are using images in the same range.

In [6]:
def decoder(x, n_filters=32, k_size=3, normalizer_fn=instance_norm,
        activation_fn=lrelu, scope=None, reuse=None):
    with tf.variable_scope(scope or 'decoder', reuse=reuse):
        h = tfl.conv2d_transpose(
                inputs=x,
                num_outputs=n_filters * 2,
                kernel_size=k_size,
                stride=2,
                weights_initializer=tf.truncated_normal_initializer(stddev=0.02),
                biases_initializer=None,
                normalizer_fn=normalizer_fn,
                activation_fn=activation_fn,
                scope='1',
                reuse=reuse)
        h = tfl.conv2d_transpose(
                inputs=h,
                num_outputs=n_filters,
                kernel_size=k_size,
                stride=2,
                weights_initializer=tf.truncated_normal_initializer(stddev=0.02),
                biases_initializer=None,
                normalizer_fn=normalizer_fn,
                activation_fn=activation_fn,
                scope='2',
                reuse=reuse)
        h = tf.pad(h, [[0, 0], [k_size, k_size], [k_size, k_size], [0, 0]],
                "REFLECT")
        h = tfl.conv2d(
                inputs=h,
                num_outputs=3,
                kernel_size=7,
                stride=1,
                weights_initializer=tf.truncated_normal_initializer(stddev=0.02),
                biases_initializer=None,
                padding='VALID',
                normalizer_fn=normalizer_fn,
                activation_fn=tf.nn.tanh,
                scope='3',
                reuse=reuse)
    return h
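
Since the decoder ends in a tanh, training images need to be scaled to the same range.  A minimal sketch of that conversion, assuming the images start out as 8-bit pixels in [0, 255] (the helper names `to_tanh_range` and `from_tanh_range` are hypothetical and not part of `cadl`):

```python
import numpy as np

def to_tanh_range(img_uint8):
    # Map pixel values from [0, 255] to [-1, 1].
    return img_uint8.astype(np.float32) / 127.5 - 1.0

def from_tanh_range(img):
    # Map network outputs from [-1, 1] back to displayable uint8 pixels.
    return np.clip((img + 1.0) * 127.5, 0, 255).round().astype(np.uint8)

pixels = np.array([[0, 128, 255]], dtype=np.uint8)
scaled = to_tanh_range(pixels)
print(scaled.min(), scaled.max())  # -1.0 1.0
print(from_tanh_range(scaled))
```

The second function is what you would apply to a generated image before saving or plotting it.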

Putting it all together, our Generator will first encode, then transform, and then finally decode like so:

In [7]:
def generator(x, scope=None, reuse=None):
    img_size = x.get_shape().as_list()[1]
    with tf.variable_scope(scope or 'generator', reuse=reuse):
        h = encoder(x, reuse=reuse)
        h = transform(h, img_size, reuse=reuse)
        h = decoder(h, reuse=reuse)
    return h
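
It can help to trace how the image size changes through the generator.  The sketch below is plain Python arithmetic, not TensorFlow code, and the helper names are hypothetical.  It assumes that the layers which do not pass `padding='VALID'` fall back to `'SAME'` padding, so a stride-2 convolution halves the size and a stride-2 transposed convolution doubles it:

```python
def same_conv(size, stride):
    # 'SAME' padding: output size is ceil(input / stride).
    return -(-size // stride)

def valid_conv(size, kernel, stride=1):
    # 'VALID' padding: no implicit padding.
    return (size - kernel) // stride + 1

def generator_shapes(img_size=256, n_filters=32):
    shapes = [('input', img_size, 3)]
    # Encoder: reflect-pad by 3, 7x7 VALID conv, then two stride-2 convs.
    s = valid_conv(img_size + 2 * 3, kernel=7)
    shapes.append(('encoder/1', s, n_filters))
    s = same_conv(s, 2)
    shapes.append(('encoder/2', s, n_filters * 2))
    s = same_conv(s, 2)
    shapes.append(('encoder/3', s, n_filters * 4))
    # Transformer: residual blocks keep both size and channels.
    shapes.append(('transform', s, n_filters * 4))
    # Decoder: two stride-2 transposed convs, then pad by 3 and 7x7 VALID conv.
    s = s * 2
    shapes.append(('decoder/1', s, n_filters * 2))
    s = s * 2
    shapes.append(('decoder/2', s, n_filters))
    s = valid_conv(s + 2 * 3, kernel=7)
    shapes.append(('decoder/3', s, 3))
    return shapes

for name, size, channels in generator_shapes():
    print('{:<10s} {} x {} x {}'.format(name, size, size, channels))
```

For a 256 x 256 input, the residual blocks operate on a 64 x 64 x 128 activation, and the output is again 256 x 256 x 3, which is what lets us feed one generator's output straight into the other.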

# PatchGAN, Receptive Field Sizes, and the Discriminator

The other major component is the Discriminator.  Conceptually, this network takes an image as input and outputs a single value.  For a real image it should output 1, and for a fake image it should output 0.  The generator wants the opposite: for the discriminator to output 1 on its fake images.  In any case, the discriminator's output should saturate at 0 or 1, so it will need a sigmoid as its final activation.  The network will take as input a 256 x 256 pixel image and use a series of 5 convolutional layers not unlike the ones we've already used.  The first three layers will be stride 2, and then the last 2 will be stride 1.  We'll double the number of outputs with each layer until the last layer, which will have a single channel as output.

Unlike a typical GAN, what we're creating is what the Pix2Pix and CycleGAN authors call a PatchGAN discriminator.  This network doesn't actually reduce down the image to a single value, but instead will reduce down the 256 x 256 pixel image to a spatial map with 1 channel as output.  The resulting map effectively has individual discriminators which we average together to get the final result.  The authors show a few possible combinations of stride and layer sizes to get effectively different receptive field sizes in the final layer, and show that this combination of 5 layers seems to have the best performance and a receptive field size of 70.

Let's break it down a bit more and see how they come up with a receptive field size of 70:

<img src="receptive-field-sizes.png" alt="Receptive Field Sizes" style="height: 250px;"/>

In the image above, we can see the input layer at the top, and the final layer at the bottom.  Working from the final layer back to the top, we can see how 1 neuron depends on an increasing number of neurons in preceding layers.  The receptive field at each layer for a single neuron in the last layer is written in the right margin: [1, 4, 7, 16, 34, 70].
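
These numbers can be reproduced with a short calculation.  Walking backwards from one output neuron, a layer with kernel size $k$ and stride $s$ grows a receptive field of $r$ into $(r - 1) \cdot s + k$.  The sketch below applies this rule to five layers with 4x4 kernels (the kernel size used by the discriminator code in this session); the function name is hypothetical:

```python
def receptive_fields(layers):
    """layers: list of (kernel_size, stride), ordered from input to output."""
    rf = 1
    sizes = [rf]
    # Walk from the last layer back to the input.
    for kernel, stride in reversed(layers):
        rf = (rf - 1) * stride + kernel
        sizes.append(rf)
    return sizes

# Five 4x4 convolutions: three with stride 2, then two with stride 1.
patchgan_layers = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]
print(receptive_fields(patchgan_layers))  # [1, 4, 7, 16, 34, 70]
```

Changing the number of stride-2 layers in `patchgan_layers` is a quick way to see how other configurations trade off patch size against depth.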

The code for the discriminator looks like so:

In [8]:
def discriminator(x, n_filters=64, k_size=4, activation_fn=lrelu,
        normalizer_fn=instance_norm, scope=None, reuse=None):
    with tf.variable_scope(scope or 'discriminator', reuse=reuse):
        h = tfl.conv2d(
                inputs=x,
                num_outputs=n_filters,
                kernel_size=k_size,
                stride=2,
                weights_initializer=tf.truncated_normal_initializer(stddev=0.02),
                biases_initializer=None,
                activation_fn=activation_fn,
                normalizer_fn=None,
                scope='1',
                reuse=reuse)
        h = tfl.conv2d(
                inputs=h,
                num_outputs=n_filters * 2,
                kernel_size=k_size,
                stride=2,
                weights_initializer=tf.truncated_normal_initializer(stddev=0.02),
                biases_initializer=None,
                activation_fn=activation_fn,
                normalizer_fn=normalizer_fn,
                scope='2',
                reuse=reuse)
        h = tfl.conv2d(
                inputs=h,
                num_outputs=n_filters * 4,
                kernel_size=k_size,
                stride=2,
                weights_initializer=tf.truncated_normal_initializer(stddev=0.02),
                biases_initializer=None,
                activation_fn=activation_fn,
                normalizer_fn=normalizer_fn,
                scope='3',
                reuse=reuse)
        h = tfl.conv2d(
                inputs=h,
                num_outputs=n_filters * 8,
                kernel_size=k_size,
                stride=1,
                weights_initializer=tf.truncated_normal_initializer(stddev=0.02),
                biases_initializer=None,
                activation_fn=activation_fn,
                normalizer_fn=normalizer_fn,
                scope='4',
                reuse=reuse)
        h = tfl.conv2d(
                inputs=h,
                num_outputs=1,
                kernel_size=k_size,
                stride=1,
                weights_initializer=tf.truncated_normal_initializer(stddev=0.02),
                biases_initializer=None,
                activation_fn=tf.nn.sigmoid,
                scope='5',
                reuse=reuse)
        return h
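
To see what "a spatial map with 1 channel" means in practice, we can work out the size of the discriminator's output.  This sketch assumes the convolutions above use `'SAME'` padding (no `padding` argument is passed), so only the strides change the size; the random scores simply stand in for a real discriminator output:

```python
import numpy as np

def patch_map_size(img_size=256, strides=(2, 2, 2, 1, 1)):
    size = img_size
    for stride in strides:
        # 'SAME' padding: output size is ceil(input / stride).
        size = -(-size // stride)
    return size

size = patch_map_size()
print(size)  # 32

# A hypothetical discriminator output: one score per patch location.
rng = np.random.RandomState(0)
patch_scores = rng.uniform(0.0, 1.0, size=(1, size, size, 1))
print(patch_scores.shape, float(patch_scores.mean()))
```

Under that assumption a 256 x 256 image yields a 32 x 32 grid of scores, each judging one overlapping patch, and averaging the grid gives a single value per image.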

# Connecting the Pieces

Now we've got all the major components we need to create a CycleGAN.  We just need to connect them up using a few placeholders, create our loss functions, and finally build our training method.  Let's start with our placeholders.

We'll start with placeholders for each of the two collections which I'll call `X` and `Y`:

In [9]:
img_size = 256
X_real = tf.placeholder(name='X', shape=[1, img_size, img_size, 3], dtype=tf.float32)
Y_real = tf.placeholder(name='Y', shape=[1, img_size, img_size, 3], dtype=tf.float32)

To get the "fake" outputs of these "real" inputs, we give them to a corresponding generator.  We'll have one generator for each direction that we'd like to go in.  One which converts the X style to a Y style, and vice-versa.

In [10]:
X_fake = generator(Y_real, scope='G_yx')
Y_fake = generator(X_real, scope='G_xy')

Because this is a CycleGAN, we'll enforce an additional constraint: the image reconstructed after a full cycle should match the original image, which we measure with an L1 loss.  This will effectively test both generators by generating from X to Y and then back to X again.  Similarly, for Y, we'll generate to X, and again to Y.  To get these images, we simply reuse the existing generators and create the cycle images:

In [11]:
X_cycle = generator(Y_fake, scope='G_yx', reuse=True)
Y_cycle = generator(X_fake, scope='G_xy', reuse=True)

Our discriminators will then act on the `real` and `fake` images like so:

In [12]:
D_X_real = discriminator(X_real, scope='D_X')
D_Y_real = discriminator(Y_real, scope='D_Y')
D_X_fake = discriminator(X_fake, scope='D_X', reuse=True)
D_Y_fake = discriminator(Y_fake, scope='D_Y', reuse=True)

To create our generator's loss, we'll compute the L1 distance between the `cycle` and `real` images, and test how well the generator "fools" the discriminator:

In [13]:
l1 = 10.0
loss_cycle = tf.reduce_mean(l1 * tf.abs(X_real - X_cycle)) + \
             tf.reduce_mean(l1 * tf.abs(Y_real - Y_cycle))
loss_G_xy = tf.reduce_mean(tf.square(D_Y_fake - 1.0)) + loss_cycle
loss_G_yx = tf.reduce_mean(tf.square(D_X_fake - 1.0)) + loss_cycle

The authors suggest using a constant weighting of 10.0 on the L1 cycle loss.
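
To get a feel for how the two terms combine, here is the same computation written in NumPy on toy arrays.  This is only a sketch mirroring the TensorFlow expressions above; the function name `generator_loss` and the toy shapes are assumptions for illustration:

```python
import numpy as np

def generator_loss(d_fake, real_x, cycle_x, real_y, cycle_y, l1=10.0):
    # Least-squares adversarial term: the generator wants D(fake) close to 1.
    adversarial = np.mean(np.square(d_fake - 1.0))
    # Cycle term: L1 distance between each image and its reconstruction.
    cycle = np.mean(l1 * np.abs(real_x - cycle_x)) + \
            np.mean(l1 * np.abs(real_y - cycle_y))
    return adversarial + cycle

x = np.zeros((1, 4, 4, 3))
y = np.ones((1, 4, 4, 3))
fooled = np.ones((1, 2, 2, 1))
# Perfect reconstruction and a fully fooled discriminator give zero loss.
print(generator_loss(fooled, x, x, y, y))
# A constant reconstruction error of 0.1 on X contributes about 10.0 * 0.1.
print(generator_loss(fooled, x, x + 0.1, y, y))
```

With a weighting of 10.0, even a small average pixel error in the reconstruction outweighs the adversarial term, which can be at most 1.0 when the discriminator's output lies between 0 and 1.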

Finally, we'll need to compute the loss for our discriminators.  Unlike the generators which use the current generation of fake images, we'll actually use a history buffer of generated images, and randomly sample a generated image from this history buffer.  Previous work on GANs has shown this can help training and the CycleGAN authors suggest using it as well.  We'll take care of keeping track of this history buffer on the CPU side of things and create a placeholder for the TensorFlow graph to help send the history image into the graph:

In [14]:
X_fake_sample = tf.placeholder(name='X_fake_sample',
        shape=[None, img_size, img_size, 3], dtype=tf.float32)
Y_fake_sample = tf.placeholder(name='Y_fake_sample',
        shape=[None, img_size, img_size, 3], dtype=tf.float32)

Now we'll ask the discriminator to assess these images:

In [15]:
D_X_fake_sample = discriminator(X_fake_sample, scope='D_X', reuse=True)
D_Y_fake_sample = discriminator(Y_fake_sample, scope='D_Y', reuse=True)

And now we can create our loss for the discriminator.  Unlike the original GAN implementation, we use a squared loss instead of a binary cross entropy loss.  According to the CycleGAN authors, this loss is more stable during training:

In [16]:
loss_D_Y = (tf.reduce_mean(tf.square(D_Y_real - 1.0)) + \
            tf.reduce_mean(tf.square(D_Y_fake_sample))) / 2.0
loss_D_X = (tf.reduce_mean(tf.square(D_X_real - 1.0)) + \
            tf.reduce_mean(tf.square(D_X_fake_sample))) / 2.0
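
As a quick sanity check of this loss, the NumPy sketch below evaluates the same expression on toy patch maps (the function name and the 32 x 32 shape are assumptions for illustration).  A discriminator that is always right scores 0, while one that outputs 0.5 everywhere scores 0.25:

```python
import numpy as np

def discriminator_loss(d_real, d_fake_sample):
    # Real patches should score 1, sampled fake patches should score 0.
    return (np.mean(np.square(d_real - 1.0)) +
            np.mean(np.square(d_fake_sample))) / 2.0

shape = (1, 32, 32, 1)
perfect = discriminator_loss(np.ones(shape), np.zeros(shape))
undecided = discriminator_loss(np.full(shape, 0.5), np.full(shape, 0.5))
print(perfect, undecided)  # 0.0 0.25
```

A discriminator loss hovering around 0.25 during training therefore suggests it can no longer tell real from generated images.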

# Training

Let's now take a look at how to train such a model.  I've wrapped everything we've just done into a convenient module called `cycle_gan`.  We can create the entire network like so:

In [17]:
tf.reset_default_graph()
from cadl.cycle_gan import cycle_gan
net = cycle_gan(img_size=img_size)

This will return the entire network in a dict for us:

In [18]:
list(net.items())

[('loss_cycle_Y', <tf.Tensor 'mul_1:0' shape=() dtype=float32>),
 ('X_real', <tf.Tensor 'X:0' shape=(1, 256, 256, 3) dtype=float32>),
 ('l1', 10.0),
 ('G_yx_vars',
  [<tf.Variable 'G_yx/encoder/1/Conv/weights:0' shape=(7, 7, 3, 32) dtype=float32_ref>,
   <tf.Variable 'G_yx/encoder/1/instance_norm/scale:0' shape=(32,) dtype=float32_ref>,
   <tf.Variable 'G_yx/encoder/1/instance_norm/offset:0' shape=(32,) dtype=float32_ref>,
   <tf.Variable 'G_yx/encoder/2/Conv/weights:0' shape=(3, 3, 32, 64) dtype=float32_ref>,
   <tf.Variable 'G_yx/encoder/2/instance_norm/scale:0' shape=(64,) dtype=float32_ref>,
   <tf.Variable 'G_yx/encoder/2/instance_norm/offset:0' shape=(64,) dtype=float32_ref>,
   <tf.Variable 'G_yx/encoder/3/Conv/weights:0' shape=(3, 3, 64, 128) dtype=float32_ref>,
   <tf.Variable 'G_yx/encoder/3/instance_norm/scale:0' shape=(128,) dtype=float32_ref>,
   <tf.Variable 'G_yx/encoder/3/instance_norm/offset:0' shape=(128,) dtype=float32_ref>,
   <tf.Variable 'G_yx/block_0/residual/1/C

Just like in the original GAN implementation, we'll create individual optimizers which can only update certain parts of the network.  The original GAN had two optimizers, one for the generator and one for the discriminator.  Even though the discriminator depends on input from the generator, we would only optimize the variables belonging to the discriminator when training the discriminator.  If we did not do this, we'd be making the generator *worse*, when what we really want to happen is for both networks to get better.  We'll do the same thing here, except now we actually have 4 networks to optimize, and so we'll need 4 optimizers: `G_xy`, `G_yx`, `D_X`, and `D_Y`, and they should update only their respective parts of the computational graph, `G_xy_vars`, `G_yx_vars`, `D_X_vars`, and `D_Y_vars`.

First let's get the variables:

In [19]:
training_vars = tf.trainable_variables()
D_X_vars = [v for v in training_vars if v.name.startswith('D_X')]
D_Y_vars = [v for v in training_vars if v.name.startswith('D_Y')]
G_xy_vars = [v for v in training_vars if v.name.startswith('G_xy')]
G_yx_vars = [v for v in training_vars if v.name.startswith('G_yx')]

And then build the optimizers:

In [20]:
learning_rate = 0.001
D_X = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(
        net['loss_D_X'], var_list=D_X_vars)
D_Y = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(
        net['loss_D_Y'], var_list=D_Y_vars)
G_xy = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(
        net['loss_G_xy'], var_list=G_xy_vars)
G_yx = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(
        net['loss_G_yx'], var_list=G_yx_vars)

As part of the discriminator training, we test how it classifies real images and generated images.  For the generated images, the discriminator takes a randomly selected image from the last 50 or so generated images.  This is to make the training a bit more stable, according to: Shrivastava, A., Pfister, T., Tuzel, O., Susskind, J., Wang, W., & Webb, R. (2016). Learning from Simulated and Unsupervised Images through Adversarial Training. Retrieved from http://arxiv.org/abs/1612.07828 - see Section 2.3 for details.  The idea here is the discriminator should still be able to say that older generated images are fake.  Otherwise, it may be the case that the generator just re-learns things the discriminator has forgotten about, so keeping a history might help with making things more stable.

To set this up, we determine our `capacity`, such as 50 images, and create a list of images all initialized to 0:

In [21]:
# How many fake generations to keep around
capacity = 50

# Storage for fake generations
fake_Xs = capacity * [np.zeros((1, img_size, img_size, 3), dtype=np.float32)]
fake_Ys = capacity * [np.zeros((1, img_size, img_size, 3), dtype=np.float32)]
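
One possible way to use such a buffer during training is sketched below.  This is an assumed policy for illustration (a ring buffer that overwrites the oldest entry and samples uniformly from the entries written so far); the function `sample_and_store` is hypothetical, and the exact policy in `cadl.cycle_gan` may differ:

```python
import numpy as np

def sample_and_store(history, new_image, step, rng):
    """Return a random past generation, then store `new_image` in the buffer."""
    capacity = len(history)
    # Until the buffer has filled once, only sample from slots written so far.
    n_valid = min(step, capacity)
    if n_valid == 0:
        sampled = new_image
    else:
        sampled = history[rng.randint(n_valid)]
    # Overwrite the oldest slot (ring buffer).
    history[step % capacity] = new_image
    return sampled

capacity, img_size = 50, 8  # small images keep the example fast
history = capacity * [np.zeros((1, img_size, img_size, 3), dtype=np.float32)]
rng = np.random.RandomState(0)
for step in range(120):
    # Stand-in for a generated image: every pixel holds the step number.
    fake = np.full((1, img_size, img_size, 3), step, dtype=np.float32)
    sampled = sample_and_store(history, fake, step, rng)
stored = sorted(int(h[0, 0, 0, 0]) for h in history)
print(stored[0], stored[-1])  # 70 119
```

The `sampled` image is what would be fed to the `X_fake_sample` or `Y_fake_sample` placeholder when running the discriminator's optimizer.  Note that the list is filled by assigning new arrays to its slots, so the fact that all initial entries refer to the same zero array is harmless.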

# Examples


## Terrain Generation

A classic image generation problem is turning labeled images into real life ones.  For instance, we may have a labeled image of a street scene depicting roads, sidewalks, lakes, etc., and want to turn all of our low poly labels into a rich, high resolution texture.  We can explore this process using Google Maps, for instance, since it provides high quality images of both street maps and satellite imagery.  The resulting CycleGAN looks like this:

<img src="terrain-generation.png" alt="Terrain Generation" style="height: 150px;"/>

## Character/Hero Generation

We experimented with an image collection of Pokemon and another image collection of 8-bit low resolution heroes.  By applying the encoder model on the low resolution collection, we expect to get high resolution, Pokemon-esque images.  For the reverse process, we expect to turn our Pokemon images into 8-bit renderings.  Of course this is simple enough to do without Deep Learning, so the first process is the more interesting one.  Interestingly, the colors tend to completely change.  There are likely some extensions to the loss that we could add to help enforce that colors stay more constant, and these have been explored in the literature.

<img src="character-generation.png" alt="Character Generation" style="height: 250px;"/>

# Train Your Own

Training your own model can take a few hours.  The training routine is `cycle_gan.train`, found in the same `cycle_gan.py` module.

# Pre-Trained Models

# Conclusion

Was this useful or did you make something awesome with CycleGAN?  Share it with me at https://twitter.com/pkmital - I'd love to hear!

Also, if you are interested in learning more about these networks and related techniques, including Seq2Seq, DRAW, MDN, WaveNet, and plenty more, have a look at https://www.kadenze.com/programs/creative-applications-of-deep-learning-with-tensorflow