# Converting photos to paintings using CycleGAN

We just learned how CycleGAN works and how it converts images from one domain to another without a paired dataset. Now, we will learn how to implement CycleGAN in TensorFlow and use it to convert photos to paintings, as shown:

![image](images/1.png)

The dataset used in this section can be downloaded from https://people.eecs.berkeley.edu/~taesung_park/CycleGAN/datasets/monet2photo.zip. Once you have downloaded the dataset, unzip the archive and place it in the data folder. The unzipped archive consists of four folders, trainA, trainB, testA, and testB, which hold the training and testing images.


The folder trainA consists of paintings (Monet) and the folder trainB consists of photos. Since we are mapping photos (x) to paintings (y), trainB, which holds the photos, is our source domain, and trainA, which holds the paintings, is our target domain.
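
Before training, it can help to confirm that the archive was unpacked where the code expects it. The following is a minimal sketch, not part of the original notebook: the helper name `summarize_dataset` is hypothetical, and the root path `data/monet2photo` is assumed from the paths used later in the training code.

```python
import os

# Folder names as described above.
EXPECTED_FOLDERS = ["trainA", "trainB", "testA", "testB"]

def summarize_dataset(root):
    """Return {folder name: number of files}; raise if a folder is missing."""
    counts = {}
    for folder in EXPECTED_FOLDERS:
        path = os.path.join(root, folder)
        if not os.path.isdir(path):
            raise IOError("missing folder: " + path)
        counts[folder] = len(os.listdir(path))
    return counts

# Example usage (assumes the archive was unzipped into the data folder):
# print(summarize_dataset("data/monet2photo"))
```

If any of the four folders is missing, the helper raises an error instead of letting training fail later with a less obvious message.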

## Import the required libraries

Import the required libraries:

In [1]:
import warnings
warnings.filterwarnings('ignore')

import numpy as np
import tensorflow as tf
tf.logging.set_verbosity(tf.logging.ERROR)

import matplotlib.pyplot as plt
%matplotlib inline

import scipy.misc as misc
from PIL import Image
import os

## Defining Convolution


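Define the convolution operation. It is used by both the generator and the discriminator; when is_dis is True, spectral normalization (defined later in this section) is applied to the weights, which is the case for the discriminator: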

In [2]:
def conv(inputs, nums_out, kernel_size, stride, padding, is_dis=False):
    c = int(inputs.shape[-1])

    weight = tf.get_variable("weight", shape=[kernel_size, kernel_size, c, nums_out], initializer=tf.random_normal_initializer(stddev=0.02))
    bias = tf.get_variable("bias", shape=[nums_out], initializer=tf.constant_initializer([0]))
    
    if is_dis:
        return tf.nn.conv2d(inputs, spectral_norm("SN",weight), [1, stride, stride, 1], padding) + bias
    else:
        return tf.nn.conv2d(inputs, weight, [1, stride, stride, 1], padding) + bias    

## Defining Deconvolution


Define the deconvolution (transposed convolution) operation, which the generator uses for upsampling:

In [3]:
def deconv(inputs, nums_out, kernel_size, stride):
    c = int(inputs.shape[-1])
    batch = int(inputs.shape[0])
    height = int(inputs.shape[1])
    width = int(inputs.shape[2])
    
    weight = tf.get_variable("weight", shape=[kernel_size, kernel_size, nums_out, c], initializer=tf.random_normal_initializer(stddev=0.02))
    bias = tf.get_variable("bias", shape=[nums_out], initializer=tf.constant_initializer([0.]))
    
    return tf.nn.conv2d_transpose(inputs, weight, output_shape=[batch, height*stride, width*stride, nums_out], strides=[1, stride, stride, 1]) + bias

## Instance Normalization


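We know that batch normalization normalizes each channel using statistics computed across the whole batch, whereas instance normalization normalizes each image independently, computing the mean and variance of every channel over its spatial dimensions only: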

In [4]:
epsilon = 1e-8
def InstanceNorm(inputs):
    mean, var = tf.nn.moments(inputs, axes=[1, 2], keep_dims=True)
    scale = tf.get_variable("scale", shape=mean.shape[-1], initializer=tf.constant_initializer([1.]))
    shift = tf.get_variable("shift", shape=mean.shape[-1], initializer=tf.constant_initializer([0.]))
    return (inputs - mean) * scale / tf.sqrt(var + epsilon) + shift
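
To see what normalizing over axes 1 and 2 means, here is a small NumPy sketch of the same computation. It is only an illustration: the name `instance_norm_np` is hypothetical, the input is random toy data, and the learnable scale and shift are left out.

```python
import numpy as np

def instance_norm_np(x, eps=1e-8):
    # x has shape [batch, height, width, channels].
    # Statistics are taken over height and width only, so every
    # (image, channel) pair is normalized on its own.
    mean = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.RandomState(0)
x = rng.rand(2, 4, 4, 3) * 10.0          # toy batch of 2 "images"
y = instance_norm_np(x)

per_map_mean = y.mean(axis=(1, 2))       # shape [2, 3], close to 0
per_map_std = y.std(axis=(1, 2))         # shape [2, 3], close to 1
```

Each of the 2 x 3 feature maps ends up with roughly zero mean and unit standard deviation, regardless of the other image in the batch.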

## Spectral Normalization



We apply spectral normalization on the convolutional layers of the discriminator. The spectral norm of a matrix is its largest singular value. So, we regularize the weights of each convolutional layer by dividing them by the largest singular value of the weights in that layer.


How do we do that? Computing a full singular value decomposition at every training step would be expensive, so Miyato et al. use a method called power iteration to estimate the spectral norm of each layer. In the power iteration method, we repeatedly multiply a vector by the weight matrix and its transpose and normalize the result, so that the vector converges toward the dominant singular vector.


Let us say the convolutional weights are reshaped into a matrix $W$, and we have a random row vector $u$ with one entry per column of $W$. Now, we apply the power iteration method:


$\hat{v} = \frac{u W^T}{|| u W^T ||}$


$\hat{u} = \frac{\hat{v} W}{|| \hat{v} W ||}$



The spectral norm is then estimated, and the final weights are calculated, as:


$\sigma(W) = (\hat{v} W) \hat{u}^T$


$W = \frac{W}{\sigma(W)}$

In [5]:
def spectral_norm(name, w, iteration=1):
  
    w_shape = w.shape.as_list()
    w = tf.reshape(w, [-1, w_shape[-1]])
  
    with tf.variable_scope(name, reuse=False):
        u = tf.get_variable("u", [1, w_shape[-1]], initializer=tf.truncated_normal_initializer(), trainable=False)
    u_hat = u
    v_hat = None

    
    def l2_norm(v, eps=1e-12):
        return v / (tf.reduce_sum(v ** 2) ** 0.5 + eps)

    for i in range(iteration):
        v_ = tf.matmul(u_hat, tf.transpose(w))
        v_hat = l2_norm(v_)
        u_ = tf.matmul(v_hat, w)
        u_hat = l2_norm(u_)
    sigma = tf.matmul(tf.matmul(v_hat, w), tf.transpose(u_hat))
    w_norm = w / sigma
    
    with tf.control_dependencies([u.assign(u_hat)]):
        w_norm = tf.reshape(w_norm, w_shape)
    
    return w_norm
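
As a sanity check on the idea, the following NumPy sketch runs the same power iteration on a random matrix and compares the estimate with the largest singular value returned by `np.linalg.svd`. The function name `estimate_sigma` and the matrix size are assumptions made for this illustration. It also runs many iterations in one call, whereas the TensorFlow code above runs one iteration per training step and carries `u` over between steps.

```python
import numpy as np

def estimate_sigma(w, iterations=1, seed=0):
    """Estimate the largest singular value of w with power iteration."""
    rng = np.random.RandomState(seed)
    u = rng.randn(1, w.shape[1])               # random row vector
    for _ in range(iterations):
        v = u.dot(w.T)
        v = v / (np.linalg.norm(v) + 1e-12)    # normalize v
        u = v.dot(w)
        u = u / (np.linalg.norm(u) + 1e-12)    # normalize u
    return v.dot(w).dot(u.T).item()

rng = np.random.RandomState(1)
w = rng.randn(48, 8)                           # stand-in for reshaped conv weights

sigma_est = estimate_sigma(w, iterations=200)
sigma_true = np.linalg.svd(w, compute_uv=False)[0]

# After normalization, the largest singular value is approximately 1.
w_norm = w / sigma_est
top_after = np.linalg.svd(w_norm, compute_uv=False)[0]
```

With enough iterations the estimate approaches the true spectral norm, which is why persisting `u` across training steps lets a single iteration per step suffice in practice.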

## Defining Activation function


Define the Leaky ReLU activation function:

In [6]:
def leaky_relu(inputs, slope=0.2):
    return tf.maximum(slope*inputs, inputs)
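
A quick NumPy sketch of the same formula shows its effect on a few sample values. The name `leaky_relu_np` is hypothetical. Note that the `maximum` formulation only gives the leaky ReLU when the slope lies between 0 and 1.

```python
import numpy as np

def leaky_relu_np(x, slope=0.2):
    # For 0 < slope < 1: positive inputs pass through unchanged,
    # negative inputs are scaled by the slope.
    return np.maximum(slope * x, x)

out = leaky_relu_np(np.array([-2.0, 0.0, 3.0]))
# out is [-0.4, 0.0, 3.0]
```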

## Defining Discriminator

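The discriminator takes a random 70 x 70 crop of the input image and passes it through four convolutional layers. The final layer is a convolution whose kernel covers the entire remaining feature map, so it acts as a fully connected layer and outputs a single score per image: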

In [7]:
class discriminator:
    def __init__(self, name):
        self.name = name

    def __call__(self, inputs, reuse = False):
        #discriminator
        inputs = tf.random_crop(inputs, [batchsize, 70, 70, 3])
        with tf.variable_scope(self.name, reuse=reuse):
            with tf.variable_scope("c64"):
                inputs = leaky_relu(conv(inputs, 64, 5, 2, "SAME", True))
            with tf.variable_scope("c128"):
                inputs = leaky_relu(InstanceNorm(conv(inputs, 128, 5, 2, "SAME", True)))
            with tf.variable_scope("c256"):
                inputs = leaky_relu(InstanceNorm(conv(inputs, 256, 5, 2, "SAME", True)))
            with tf.variable_scope("c512"):
                inputs = leaky_relu(InstanceNorm(conv(inputs, 512, 5, 2, "SAME", True)))
            with tf.variable_scope("fully_conv"):
                kernel_size = int(inputs.shape[1])  # kernel spans the whole feature map
                inputs = tf.squeeze(conv(inputs, 1, kernel_size, 1, "VALID", True), axis=[1, 2, 3])
        return inputs

    @property
    def var(self):
        return tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope=self.name)
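
To see why the last layer behaves like a fully connected layer, we can trace the spatial size of the 70 x 70 crop through the network. This is a small sketch in plain Python (the helper name `same_conv_size` is hypothetical), using the rule that a "SAME" convolution with stride s outputs ceil(size / s):

```python
import math

def same_conv_size(size, stride):
    # Output size of a "SAME"-padded convolution.
    return int(math.ceil(size / float(stride)))

sizes = [70]                      # side length of the random crop
for _ in range(4):                # c64, c128, c256, c512 all use stride 2
    sizes.append(same_conv_size(sizes[-1], 2))
# sizes is [70, 35, 18, 9, 5]

# The last layer is a "VALID" convolution with stride 1 whose kernel size
# equals the remaining feature-map size, so its output is 1 x 1.
kernel_size = sizes[-1]
final_size = (sizes[-1] - kernel_size) + 1
```

The 1 x 1 x 1 output is then squeezed into one logit per image.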

## Defining Generator

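The generator consists of an initial 7 x 7 convolution, two strided convolutions for downsampling, six residual blocks, and deconvolutional layers that upsample back to the input resolution: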

In [8]:
class generator:
    def __init__(self, name):
        self.name = name

    def __call__(self, inputs, reuse=False):
        with tf.variable_scope(name_or_scope=self.name, reuse=reuse):
            inputs = tf.pad(inputs, tf.constant([[0, 0], [3, 3], [3, 3], [0, 0]]))
            with tf.variable_scope("c7s1-32"):
                inputs = tf.nn.relu(InstanceNorm(conv(inputs, 32, 7, 1, "VALID")))
            with tf.variable_scope("d64"):
                inputs = tf.nn.relu(InstanceNorm(conv(inputs, 64, 3, 2, "SAME")))
            with tf.variable_scope("d128"):
                inputs = tf.nn.relu(InstanceNorm(conv(inputs, 128, 3, 2, "SAME")))
            for i in range(6):
                with tf.variable_scope("R"+str(i)):
                    temp = inputs
                    with tf.variable_scope("R_conv1"):
                        inputs = tf.pad(inputs, tf.constant([[0, 0], [1, 1], [1, 1], [0, 0]]), "REFLECT")
                        inputs = tf.nn.relu(InstanceNorm(conv(inputs, 128, 3, 1, "VALID")))
                    with tf.variable_scope("R_conv2"):
                        inputs = tf.pad(inputs, tf.constant([[0, 0], [1, 1], [1, 1], [0, 0]]), "REFLECT")
                        inputs = InstanceNorm(conv(inputs, 128, 3, 1, "VALID"))
                    inputs = temp + inputs
            with tf.variable_scope("u64"):
                inputs = tf.nn.relu(InstanceNorm(deconv(inputs, 64, 3, 2)))
            with tf.variable_scope("u32"):
                inputs = tf.nn.relu(InstanceNorm(deconv(inputs, 32, 3, 2)))
            with tf.variable_scope("c7s1-3"):
                inputs = tf.nn.tanh((deconv(inputs, 3, 7, 1)))
            return (inputs + 1.) * 127.5

    @property
    def var(self):
        return tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope=self.name)
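
The following sketch traces the spatial size of a 128 x 128 input through the generator layers defined above, to confirm that the output has the same resolution as the input. The helper names are hypothetical and only sizes are tracked, not actual tensors.

```python
import math

def valid_conv(size, kernel, stride=1):
    # "VALID" convolution: no implicit padding.
    return (size - kernel) // stride + 1

def same_conv(size, stride):
    # "SAME" convolution: output is ceil(size / stride).
    return int(math.ceil(size / float(stride)))

def trace_generator(size=128):
    trace = [("input", size)]
    size = valid_conv(size + 2 * 3, 7)        # pad 3 per side, 7x7 VALID conv
    trace.append(("c7s1-32", size))
    for name in ("d64", "d128"):              # two stride-2 downsampling convs
        size = same_conv(size, 2)
        trace.append((name, size))
    for _ in range(6):                        # each block: two (pad 1, 3x3 VALID) convs
        size = valid_conv(valid_conv(size + 2, 3) + 2, 3)
    trace.append(("R0-R5", size))
    for name, stride in (("u64", 2), ("u32", 2), ("c7s1-3", 1)):
        size = size * stride                  # deconv output is input size * stride
        trace.append((name, size))
    return trace

shapes = trace_generator(128)
```

The residual blocks keep the size at 32 x 32, and the two stride-2 deconvolutions bring it back to 128 x 128. The final tanh output in [-1, 1] is rescaled to [0, 255] by (inputs + 1.) * 127.5.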


Define the image width, image height, number of images, and batch size:

In [9]:
image_width = 128
image_height = 128


num_images = 8000
batchsize = 1

## Cycle GAN 

Now we will see how to implement CycleGAN. Check the comments in the code for a better understanding:

In [11]:
class CycleGAN:
    
    def __init__(self):
        self.X = tf.placeholder("float", shape=[batchsize, image_height, image_width, 3])
        self.Y = tf.placeholder("float", shape=[batchsize, image_height, image_width, 3])
        
        #maps x to y 
        G = generator("G")
        
        #maps y to x
        F = generator("F")
        
        #discriminate between real source image and fake source image
        self.Dx = discriminator("Dx")
        
        #discriminate between real target image and fake target image
        self.Dy = discriminator("Dy")
        
        #fake source image
        self.fake_X = F(self.Y)
        
        #fake target image
        self.fake_Y = G(self.X)
        
        #real source image logits
        self.Dx_logits_real = self.Dx(self.X)
        
        #real target image logits
        self.Dy_logits_real = self.Dy(self.Y)
        
        #fake source image logits
        self.Dx_logits_fake = self.Dx(self.fake_X, True)
        
        #fake target image logits
        self.Dy_logits_fake = self.Dy(self.fake_Y, True)
        
        #cycle consistency loss
        self.cycle_loss = tf.reduce_mean(tf.abs(F(self.fake_Y, True) - self.X)) + \
                        tf.reduce_mean(tf.abs(G(self.fake_X, True) - self.Y))
      
        
        #discriminator loss (refer equations in the chapter)
        self.Dy_loss = -tf.reduce_mean(self.Dy_logits_real) + tf.reduce_mean(self.Dy_logits_fake)
        self.Dx_loss = -tf.reduce_mean(self.Dx_logits_real) + tf.reduce_mean(self.Dx_logits_fake)
        
        
        #loss of generator G (maps x to y)
        self.G_loss = -tf.reduce_mean(self.Dy_logits_fake) + 10. * self.cycle_loss
        
        #loss of generator F (maps y to x)
        self.F_loss = -tf.reduce_mean(self.Dx_logits_fake) + 10. * self.cycle_loss
        
        #optimize discriminator
        self.Dx_optimizer = tf.train.AdamOptimizer(2e-4, beta1=0., beta2=0.9).minimize(self.Dx_loss, var_list=[self.Dx.var])
        self.Dy_optimizer = tf.train.AdamOptimizer(2e-4, beta1=0., beta2=0.9).minimize(self.Dy_loss, var_list=[self.Dy.var])
        
        #optimize generator
        self.G_optimizer = tf.train.AdamOptimizer(2e-4, beta1=0., beta2=0.9).minimize(self.G_loss, var_list=[G.var])
        self.F_optimizer = tf.train.AdamOptimizer(2e-4, beta1=0., beta2=0.9).minimize(self.F_loss, var_list=[F.var])

        self.sess = tf.Session()
        self.sess.run(tf.global_variables_initializer())
        
        self.train()


    def train(self):
        
        
        #real Images (source images)
        X_path = 'data/monet2photo/trainB//'
        
        #paintings (target images)
        Y_path = 'data/monet2photo/trainA//'
               
        Y = os.listdir(Y_path)[:num_images]
        X = os.listdir(X_path)[:num_images]
        
        #we use only the first 100 images
        X = X[:100]
        Y = Y[:100]
        
        nums = len(Y)
        
        saver = tf.train.Saver()
        
        #create folders for saving the fake paintings and fake photos produced by the generators
        if not os.path.exists('fake_painiting'):
            os.makedirs('fake_painiting')

        if not os.path.exists('fake_real_image'):
            os.makedirs('fake_real_image') 
            
        print('started training...')
        
        
        #for each epoch
        for epoch in range(100000):
            
            #for every batch
            for i in range(int(nums / batchsize) - 1):
                
                #select batch of images
                X_img = np.zeros([batchsize, image_height, image_width, 3])
                Y_img = np.zeros([batchsize, image_height, image_width, 3])
                
                for j in np.arange(i * batchsize, i * batchsize + batchsize, 1):
                    
                    #load the source image as RGB and resize it with PIL
                    img = Image.open(X_path + X[j]).convert('RGB').resize((image_width, image_height), Image.BILINEAR)
                    X_img[j - i * batchsize, :, :, :] = np.array(img)
                    
                    #load the target image as RGB and resize it with PIL
                    img = Image.open(Y_path + Y[j]).convert('RGB').resize((image_width, image_height), Image.BILINEAR)
                    Y_img[j - i * batchsize, :, :, :] = np.array(img)
                
                #train the discriminator
                self.sess.run(self.Dy_optimizer, feed_dict={self.X: X_img, self.Y: Y_img})
                self.sess.run(self.Dx_optimizer, feed_dict={self.X: X_img, self.Y: Y_img})
                
                #train the generator
                self.sess.run(self.G_optimizer, feed_dict={self.X: X_img, self.Y: Y_img})
                self.sess.run(self.F_optimizer, feed_dict={self.X: X_img, self.Y: Y_img})
                
                #compute loss and save generated images on every 50th iteration
                if i % 50 == 0:
                    
                    #compute loss
                    [Dx_loss, Dy_loss, G_loss, F_loss, fake_X, fake_Y, cyc_loss] = \
                    self.sess.run([self.Dx_loss, self.Dy_loss, self.G_loss, self.F_loss, \
                       self.fake_X, self.fake_Y, self.cycle_loss], feed_dict={self.X: X_img, self.Y: Y_img})
                    
                    print("Epoch: {}, iteration: {}, Dx Loss: {}, Dy Loss: {}, G Loss: {}, F Loss: {}, Cycle Loss: {}". \
                          format(epoch, i, Dx_loss, Dy_loss, G_loss, F_loss, cyc_loss))

                    print('\n')
                    
                    #store the generated images
                    Image.fromarray(np.uint8(fake_Y)[0, :, :, :]).save(".//fake_painiting//"+str(epoch)+"_"+str(i)+".jpg")
                    Image.fromarray(np.uint8(fake_X)[0, :, :, :]).save(".//fake_real_image//" + str(epoch) + "_" + str(i) + ".jpg")
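
To make the loss terms used in the class concrete, here is a small NumPy sketch that evaluates the same formulas on toy arrays. The function names and the toy values are assumptions made for this illustration; the weight of 10 on the cycle consistency loss is taken from the code above.

```python
import numpy as np

def cycle_loss_np(x, x_cycled, y, y_cycled):
    # Mean absolute error between each image and its reconstruction
    # after a full cycle (x -> G -> F and y -> F -> G).
    return np.mean(np.abs(x_cycled - x)) + np.mean(np.abs(y_cycled - y))

def generator_loss_np(d_logits_fake, cycle_loss, lam=10.0):
    # The generator tries to raise the discriminator score of its fakes
    # while keeping the cycle reconstruction error small.
    return -np.mean(d_logits_fake) + lam * cycle_loss

def discriminator_loss_np(d_logits_real, d_logits_fake):
    # The discriminator tries to score real images high and fakes low.
    return -np.mean(d_logits_real) + np.mean(d_logits_fake)

x = np.zeros((1, 2, 2, 3))
x_cycled = np.full((1, 2, 2, 3), 2.0)   # reconstruction is off by 2 everywhere
y = np.ones((1, 2, 2, 3))
y_cycled = np.ones((1, 2, 2, 3))        # perfect reconstruction

cyc = cycle_loss_np(x, x_cycled, y, y_cycled)                     # 2.0
g = generator_loss_np(np.array([0.5]), cyc)                       # -0.5 + 10 * 2.0 = 19.5
d = discriminator_loss_np(np.array([1.0]), np.array([0.5]))       # -1.0 + 0.5 = -0.5
```

Because the cycle term is multiplied by 10 and the images are in the range 0 to 255, it dominates the generator loss early in training, which is why the logged G and F losses are roughly ten times the cycle loss.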

## Start Training 

In [None]:
cyc = CycleGAN()

started training...
Epoch: 0, iteration: 0, Dx Loss: -4.33659744263, Dy Loss: -2.71015405655, G Loss: 1475.7109375, F Loss: 1474.48071289, Cycle Loss: 147.429473877


Epoch: 0, iteration: 50, Dx Loss: 8.18372154236, Dy Loss: -11.3952150345, G Loss: 669.335083008, F Loss: 627.104492188, Cycle Loss: 64.9336242676




Credit for the code used in this section goes to [MingtaoGuo](https://github.com/MingtaoGuo/CycleGAN).


The applications of CycleGAN are numerous, since it can be used to translate images from one domain to another without a paired dataset. In the next section, we will learn how StackGAN is used for converting text descriptions to photorealistic images.