# CycleGAN, Monet-ization of Photographs

This notebook describes how a CycleGAN can be used to translate images from one domain $X$ to another domain $Y$. Here, the goal is to transform real-life photos into paintings in the style of [Claude Monet][1].

A Generative Adversarial Network (GAN) is made up of two subnetworks, a generative network ($G$) and a discriminative network ($D_Y$). The task of the generative network is to transform a photo (domain $X$) into a painting (domain $Y$). The generative process can be described simply as $G: X \to Y$, where $G$ is a generative network that has learnt the mapping from domain $X$ to domain $Y$. During training, the two networks constantly play a game of cat and mouse: the generative network tries to generate images that can pass as paintings, while the discriminative network tries to detect which paintings are real and which have been produced by the generative network.


What sets a CycleGAN apart from a regular GAN is that it introduces two additional networks: a second generative network ($F$) and a second discriminator ($D_X$) that distinguishes real photographs from generated photographs. The task of generator $F$ is to learn the mapping $F: Y \to X$, and it is trained with an adversarial loss in the same way as generator $G$. The two additional networks create a cyclic connection in which $F(G(x)) \approx x$ and $G(F(y)) \approx y$. The cyclic mapping discourages the generator from mapping many different inputs $x$ to the same output $y$, a problem called mode collapse. To train the cyclic mapping, another type of loss, called cycle consistency loss, is introduced.
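
As a toy illustration of why the cyclic mapping discourages collapse, the sketch below replaces the generators with scalar functions. It is not part of the model; the names `cycle_error`, `g_invertible`, `f_inverse` and `g_collapsed` are made up for this example.

```python
def cycle_error(forward, backward, samples):
    """Mean absolute difference between x and backward(forward(x))."""
    return sum(abs(backward(forward(x)) - x) for x in samples) / len(samples)


def g_invertible(x):
    # Hypothetical stand-in for G: every input gets its own output.
    return 2.0 * x + 1.0


def f_inverse(y):
    # Hypothetical stand-in for F: exactly undoes g_invertible.
    return (y - 1.0) / 2.0


def g_collapsed(x):
    # A "collapsed" generator that maps every input to the same output.
    return 1.0


samples = [0.0, 0.5, 1.0]
print(cycle_error(g_invertible, f_inverse, samples))  # 0.0
print(cycle_error(g_collapsed, f_inverse, samples))   # 0.5
```

Once the forward mapping throws away the differences between inputs, no backward mapping can recover all of them, so the cycle error cannot reach zero.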

I recommend reading the [original paper][2] for a more rigorous and in-depth explanation of the model.

[1]: https://en.wikipedia.org/wiki/Claude_Monet
[2]: https://arxiv.org/pdf/1703.10593.pdf


## Load Kaggle Data

Locate and load the data from the Kaggle dataset `gan-getting-started`. The CycleGAN needs images of real photos and images of Monet paintings.

The datasets are loaded as batched `tf.data.Dataset` objects with batch size 1 and image size 256x256, with pixel values scaled to $[-1, 1]$.
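
The scaling used by `normalize_img` in this notebook, and its inverse used when plotting, can be checked on single pixel values. This is a minimal sketch in plain Python; `normalize` and `denormalize` are illustrative names, and rounding in `denormalize` is an assumption of this sketch (the plotting code in this notebook casts with `astype(np.uint8)` instead).

```python
def normalize(pixel):
    """Map an 8-bit pixel value in [0, 255] to [-1, 1]."""
    return pixel / 127.5 - 1.0


def denormalize(value):
    """Map a value in [-1, 1] back to an 8-bit pixel value (rounded)."""
    return int(round(value * 127.5 + 127.5))


for pixel in (0, 127, 255):
    scaled = normalize(pixel)
    print(pixel, "->", scaled, "->", denormalize(scaled))
```

The range $[-1, 1]$ matches the `tanh` activation at the end of the generator, so generated and real images live on the same scale.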

In [None]:
import os
import numpy as np
import matplotlib.pyplot as plt
from functools import partial

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from kaggle_datasets import KaggleDatasets

import tensorflow_datasets as tfds

GCS_PATH = KaggleDatasets().get_gcs_path('gan-getting-started')

FILENAMES_MONET = tf.io.gfile.glob(GCS_PATH + "/monet_tfrec/" + "*.tfrec")
FILENAMES_REAL = tf.io.gfile.glob(GCS_PATH + "/photo_tfrec/" + "*.tfrec")
#FILENAMES_MONET_TEST = tf.io.gfile.glob('../input/gan-getting-started/monet_tfrec/' + "*.tfrec")
#FILENAMES_REAL_TEST = tf.io.gfile.glob('../input/gan-getting-started/photo_tfrec/' + "*.tfrec")

print("Monet TFRecord files:", len(FILENAMES_MONET))
print("Photo TFRecord files:", len(FILENAMES_REAL))

In [None]:
AUTOTUNE = tf.data.AUTOTUNE
BATCH_SIZE = 1

IMAGE_SIZE = [256,256]

def decode_image(image):
    # Decode a JPEG-encoded string into a float32 tensor with 3 channels.
    image = tf.image.decode_jpeg(image, channels=3)
    image = tf.cast(image, tf.float32)
    #image = tf.reshape(image, [*IMAGE_SIZE, 3])
    return image

def normalize_img(img):
    img = tf.cast(img, dtype=tf.float32)
    return (img / 127.5) - 1.0



def read_tfrecord(example, labeled=False):
    tfrecord_format = (
        {
            "image": tf.io.FixedLenFeature([], tf.string)
        }
    )
    example = tf.io.parse_single_example(example, tfrecord_format)
    image = decode_image(example["image"])
    image = normalize_img(image)

    return image


def load_dataset(filenames, labeled=False):
    ignore_order = tf.data.Options()
    ignore_order.experimental_deterministic = False  # disable order, increase speed
    dataset = tf.data.TFRecordDataset(
        filenames
    )  
    dataset = dataset.with_options(
        ignore_order
    )  
    dataset = dataset.map(
        partial(read_tfrecord, labeled=labeled), num_parallel_calls=AUTOTUNE
    )
    return dataset

def get_dataset(filenames, labeled=True):
    dataset = load_dataset(filenames, labeled=labeled)
    #dataset = dataset.shuffle(2048)
    # Batch first, then prefetch, so that whole batches are prepared ahead of time.
    dataset = dataset.batch(BATCH_SIZE)
    dataset = dataset.prefetch(buffer_size=AUTOTUNE)
    return dataset

def data():
    return get_dataset(FILENAMES_REAL),get_dataset(FILENAMES_MONET)

## Augmenting Data

Data augmentation is a common way to reduce overfitting when training neural networks on images. The augmentation methods used here are random mirroring together with resizing and random cropping.
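
To make the crop-and-mirror step concrete, here is a small sketch on a nested list instead of an image tensor. The function `random_crop_and_flip` is a hypothetical helper written for this illustration; the notebook itself relies on `tf.image.random_crop` and `tf.image.random_flip_left_right`.

```python
import random


def random_crop_and_flip(image, crop_size, rng):
    """Crop a random crop_size x crop_size window, then mirror it half the time.

    `image` is a list of rows, standing in for an image tensor.
    """
    height, width = len(image), len(image[0])
    top = rng.randint(0, height - crop_size)    # randint bounds are inclusive
    left = rng.randint(0, width - crop_size)
    crop = [row[left:left + crop_size] for row in image[top:top + crop_size]]
    if rng.random() < 0.5:
        crop = [row[::-1] for row in crop]      # horizontal mirror
    return crop


# A 6x6 "image" whose pixel values encode their own position.
image = [[10 * r + c for c in range(6)] for r in range(6)]
patch = random_crop_and_flip(image, crop_size=4, rng=random.Random(0))
print(len(patch), len(patch[0]))  # 4 4
```

Resizing to 300x300 before cropping back to 256x256 gives the crop 44 pixels of slack in each direction, so every epoch sees a slightly shifted version of each image.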

Source for data Augmentation:
https://colab.research.google.com/github/junyanz/pytorch-CycleGAN-and-pix2pix/blob/master/CycleGAN.ipynb

In [None]:
from keras.preprocessing.image import ImageDataGenerator
import imgaug.augmenters as iaa

class DataAugmenter:
    def __init__(self):
        data_generator = ImageDataGenerator(brightness_range=[0.8,1.2])
    
    def normalize_img(self,img):
        img = tf.cast(img, dtype=tf.float32)
        return (img / 127.5) - 1.0

    def random_crop(self,image):
        # Crop a random IMAGE_SIZE window from the batched image tensor.
        cropped_image = tf.image.random_crop(
            image, size=[BATCH_SIZE,IMAGE_SIZE[0],IMAGE_SIZE[1], 3])

        return cropped_image
    
    """def augment_color(self):
        aug = iaa.Sequential([
            iaa.MultiplyHue((0.1,9.9)),
            iaa.MultiplyBrightness(mul=(0.1,9.9)),
            iaa.LogContrast(gain=(0.1,9.9))
        ])
        return aug"""
    
    def aug_color(self,image):
        image = tf.image.random_brightness(image, 0.2)
        image = tf.image.random_contrast(image, 0.8,1.2)
        return tf.image.random_saturation(image, 0.8,1.2)

    def augment(self,image):
        # resizing to 300 x 300 x 3
        image = tf.image.resize(image, [300, 300],
                              method=tf.image.ResizeMethod.NEAREST_NEIGHBOR)
        image = self.random_crop(image)

        image = tf.image.random_flip_left_right(image)
        #image = self.aug_color(image)
        return image

### Sample some photos
The code snippet below displays two real photographs and two Monet paintings from the datasets.

In [None]:
train_real, train_monet = data()
_, ax = plt.subplots(2, 2, figsize=(15, 15))
for i, samples in enumerate(zip(train_real.take(2), train_monet.take(2))):
    real = (((samples[0][0] * 127.5) + 127.5).numpy()).astype(np.uint8)
    monet = (((samples[1][0] * 127.5) + 127.5).numpy()).astype(np.uint8)
    ax[i, 0].imshow(monet)
    ax[i, 0].set_title("Monet Sample")
    ax[i, 1].imshow(real)
    ax[i, 1].set_title("Real Photo Sample")

plt.show()


## Network Architecture

We follow the architectures used in the original paper, [Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks][1].

[1]: https://arxiv.org/pdf/1703.10593.pdf

CycleGANs require two network architectures, one for the generator and one for the discriminator. 

### Discriminative Network Architecture
Discriminator: `C64->C128->C256->C512`, as described in the paper. `Ck` is a 4x4 convolutional layer with `k` filters, followed by an InstanceNormalization layer and a LeakyReLU activation with slope 0.2. The first layer (`C64`) skips InstanceNormalization, and a final convolution reduces the output to a single channel.
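
The discriminator does not output one score per image but a grid of scores, one per image patch. The sketch below traces the spatial size through the layers, assuming the defaults of the `Discriminator` class in this notebook (4x4 kernels, `'valid'` padding, strides 2, 2, 2, 1, 1). `conv_output_size` and `layer_specs` are names introduced for this example.

```python
def conv_output_size(size, kernel, stride):
    """Output length of a convolution with 'valid' padding (no padding)."""
    return (size - kernel) // stride + 1


size = 256
# (filters, stride) for C64, C128, C256, C512 and the final 1-channel convolution.
layer_specs = [(64, 2), (128, 2), (256, 2), (512, 1), (1, 1)]
for filters, stride in layer_specs:
    size = conv_output_size(size, kernel=4, stride=stride)
    print(f"{filters:>3} filters -> {size}x{size}")
```

Under these assumptions a 256x256 input ends as a 24x24 map of logits, and the adversarial loss is averaged over that map.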




In [None]:
from tensorflow import keras
from tensorflow.keras.layers import Input, LeakyReLU, Conv2D
from tensorflow_addons.layers import InstanceNormalization
from tensorflow.keras.initializers import RandomNormal

from tensorflow.keras.models import Model

#  C64->C128->C256->C512
class Discriminator:
    def __init__(self,padding ='valid',strides=(2,2),kernel=(4,4),initializer = RandomNormal(mean=0.,stddev=0.02),alpha=0.2):
        img_inp = Input(shape = (256, 256, 3))
        conv_1 = Conv2D(64,kernel,strides=2,use_bias=False,kernel_initializer=initializer,padding=padding)(img_inp)
        act_1 = LeakyReLU(alpha)(conv_1)
    
        conv_2 = Conv2D(128,kernel,strides=strides,use_bias=False,kernel_initializer=initializer,padding=padding)(act_1)
        
        batch_norm_2 = InstanceNormalization(gamma_initializer=keras.initializers.RandomNormal(mean=0.0, stddev=0.02))(conv_2)
        act_2 = LeakyReLU(alpha)(batch_norm_2)
    
        conv_3 = Conv2D(256,kernel,strides=strides,use_bias=False,kernel_initializer=initializer,padding=padding)(act_2)
        batch_norm_3 = InstanceNormalization(gamma_initializer=keras.initializers.RandomNormal(mean=0.0, stddev=0.02))(conv_3)
        act_3 = LeakyReLU(alpha)(batch_norm_3)
    
        conv_4 = Conv2D(512,kernel,strides=(1,1),use_bias=False,kernel_initializer=initializer)(act_3)
        batch_norm_4 = InstanceNormalization(gamma_initializer=keras.initializers.RandomNormal(mean=0.0, stddev=0.02))(conv_4)
        act_4 = LeakyReLU(alpha)(batch_norm_4)
    
        #zero_pad_1 = ZeroPadding2D()(act_4)
        outputs = Conv2D(1,kernel,strides=1,use_bias=False,kernel_initializer=initializer)(act_4)
    
        self.model = Model(img_inp, outputs)


### Generative Network Architecture
The generative network has a different structure from the discriminative network: a few downsampling layers, followed by residual blocks, followed by upsampling layers.

Note: `c7s1-k` means a convolutional layer where `kernel_size=7`, `strides=1` and `filters=k`

`dk` is a downsampling convolutional layer with `strides=2`, `kernel_size=3` and `filters=k` 

`uk` is an upsampling layer with `strides=1/2`, `kernel_size=3` and `filters=k` (a `Conv2DTranspose` with `strides=(2,2)`).

`Rk` is a residual block with `filters=k`, `strides=1` and `kernel_size=3`.

**Downsampling layers:** `c7s1-64->d128->d256`

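**Residual blocks:**
`R256->R256->R256->R256->R256->R256->R256->R256->R256`

The paper uses nine residual blocks for 256x256 images. The `Generator` class in this notebook defaults to `n_res=8`.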

**Upsampling layers:** `u128->u64->c7s1-3`
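
To see how the layers listed above change the image resolution, the following sketch traces the spatial size through the generator. It assumes `'same'` padding for the strided layers and reflection padding of 3 around the 7x7 convolutions, as in the `Generator` class of this notebook. The helper names are illustrative.

```python
import math


def same_conv(size, stride):
    """Output length of a convolution with 'same' padding."""
    return math.ceil(size / stride)


def same_transposed_conv(size, stride):
    """Output length of a transposed convolution with 'same' padding."""
    return size * stride


size = 256
trace = [("input", size)]
# c7s1-64: reflection padding of 3, then an unpadded 7x7 convolution, keeps the size.
size = (size + 2 * 3) - 7 + 1
trace.append(("c7s1-64", size))
for name in ("d128", "d256"):
    size = same_conv(size, stride=2)
    trace.append((name, size))
trace.append(("residual blocks", size))  # stride 1 with padding: size unchanged
for name in ("u128", "u64"):
    size = same_transposed_conv(size, stride=2)
    trace.append((name, size))
size = (size + 2 * 3) - 7 + 1
trace.append(("c7s1-3", size))

for name, value in trace:
    print(f"{name:>15}: {value}x{value}")
```

The residual blocks therefore operate on 64x64 feature maps, and the output has the same 256x256 resolution as the input.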

**Residual skip connections** are useful for several reasons. One is that they mitigate the problem of vanishing gradients. Another is that they address the degradation problem in deep neural networks: instead of learning the full mapping between input and output, the network only has to learn the residual, that is, the difference between the input and the desired output. [Here is a link][1] to a nice video explaining the concept and advantages of residual blocks.
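
The idea of learning only the residual can be written in a few lines. This is a conceptual sketch on plain lists, not the layer code used in this notebook; `residual_block`, `zero_transform` and `small_correction` are hypothetical names.

```python
def residual_block(x, transform):
    """Return x + transform(x), element by element."""
    return [xi + ti for xi, ti in zip(x, transform(x))]


def zero_transform(x):
    # Hypothetical transform whose weights are all zero.
    return [0.0 for _ in x]


def small_correction(x):
    # Hypothetical learned residual: a small adjustment to each value.
    return [0.1 * xi for xi in x]


features = [1.0, -2.0, 3.0]
print(residual_block(features, zero_transform))    # [1.0, -2.0, 3.0]
print(residual_block(features, small_correction))
```

With a zero transform the block is the identity, so adding more blocks cannot make the representation worse by default; each block only has to learn a correction.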

Pixels on the border of an image take part in fewer convolution windows than pixels closer to the center, so border content tends to be preserved less well by the network. To combat this, we use **reflection padding**, which extends the image at each border with a mirrored copy of the adjacent pixels.
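
A small pure-Python sketch of reflection padding is shown here for intuition. It mirrors around the border pixel without repeating it, which is how the `'REFLECT'` mode of `tf.pad` behaves. `reflect_pad_1d` and `reflect_pad_2d` are illustrative helpers, not part of any library.

```python
def reflect_pad_1d(values, pad):
    """Pad a list on both sides by mirroring it, without repeating the edge value."""
    if not 0 <= pad < len(values):
        raise ValueError("pad must be non-negative and smaller than len(values)")
    left = values[pad:0:-1]          # values[pad], ..., values[1]
    right = values[-2:-pad - 2:-1]   # values[-2], ..., values[-pad - 1]
    return left + values + right


def reflect_pad_2d(image, pad):
    """Apply reflection padding along the width, then along the height."""
    padded_rows = [reflect_pad_1d(row, pad) for row in image]
    return reflect_pad_1d(padded_rows, pad)


print(reflect_pad_1d([1, 2, 3, 4], 2))  # [3, 2, 1, 2, 3, 4, 3, 2]

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
for row in reflect_pad_2d(image, 1):
    print(row)
```

Unlike zero padding, the padded border continues the image content, which avoids introducing artificial dark edges.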

Sources for some of the code:
https://keras.io/examples/generative/cyclegan/
https://theailearner.com/tag/patchgan/
https://machinelearningmastery.com/how-to-develop-cyclegan-models-from-scratch-with-keras/


[1]:https://www.youtube.com/watch?v=rya-1nX8ktc&t=521s&ab_channel=TheCodingLib

In [None]:
### https://stackoverflow.com/questions/50677544/reflection-padding-conv2d
from tensorflow.keras.layers import Layer, InputSpec

class ReflectionPadding2D(Layer):
    def __init__(self, padding=(1, 1), **kwargs):
        super(ReflectionPadding2D, self).__init__(**kwargs)
        self.padding = tuple(padding)  # (height padding, width padding)
        # Inputs are batched images: (batch, height, width, channels).
        self.input_spec = InputSpec(ndim=4)

    def compute_output_shape(self, s):
        """Output shape for the "channels_last" configuration."""
        return (s[0], s[1] + 2 * self.padding[0], s[2] + 2 * self.padding[1], s[3])

    def call(self, x, mask=None):
        h_pad, w_pad = self.padding
        return tf.pad(x, [[0,0], [h_pad,h_pad], [w_pad,w_pad], [0,0] ], 'REFLECT')



In [None]:
from tensorflow.keras.layers import Reshape, Dense, Input, ReLU, Conv2D, Conv2DTranspose, Concatenate, ZeroPadding2D
from tensorflow_addons.layers import InstanceNormalization
class Generator:
    def __init__(self,k=64,n_res=8):
        img_inp = Input(shape = (256, 256, 3))
        c7s164 = ReflectionPadding2D(padding = (3,3))(img_inp)
        c7s164 = Conv2D(64,(7,7),(1,1),kernel_initializer=tf.random_normal_initializer(0., 0.02))(c7s164)
        #c7s164 = InstanceNormalization(gamma_initializer=keras.initializers.RandomNormal(mean=0.0, stddev=0.02))(c7s164)
        c7s164 = ReLU()(c7s164)

        d128 = Conv2D(128,(3,3),(2,2),kernel_initializer=tf.random_normal_initializer(0., 0.02),padding="same")(c7s164)
        d128 = InstanceNormalization(gamma_initializer=keras.initializers.RandomNormal(mean=0.0, stddev=0.02))(d128)
        d128 = ReLU()(d128)

        d256 = Conv2D(256,(3,3),(2,2),kernel_initializer=tf.random_normal_initializer(0., 0.02),padding="same")(d128)
        d256 = InstanceNormalization(gamma_initializer=keras.initializers.RandomNormal(mean=0.0, stddev=0.02))(d256)
        d256 = ReLU()(d256)


        # RESIDUAL BLOCKS

        curr = d256
        res = d256
        k=256
        for _ in range(n_res):
            res = ReflectionPadding2D()(res)
            res = Conv2D(k,(3,3),kernel_initializer=tf.random_normal_initializer(0., 0.02))(res)
            res = InstanceNormalization(gamma_initializer=keras.initializers.RandomNormal(mean=0.0, stddev=0.02))(res)
            res = ReLU()(res)

            res = ReflectionPadding2D()(res)
            res = Conv2D(k,(3,3),kernel_initializer=tf.random_normal_initializer(0., 0.02))(res)
            res = InstanceNormalization(gamma_initializer=keras.initializers.RandomNormal(mean=0.0, stddev=0.02))(res)
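            # NOTE: Concatenate stacks the skip connection along the channel axis, so the
            # channel count grows with every block. A standard residual block would use
            # tf.keras.layers.Add() here. Switching changes the architecture, so weights
            # saved from this version would no longer load.
            res = Concatenate()([res,curr])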
            curr = res
            

        u128 = Conv2DTranspose(128,(3,3),(2,2),kernel_initializer=tf.random_normal_initializer(0., 0.02),padding="same")(res)
        u128 = InstanceNormalization(gamma_initializer=keras.initializers.RandomNormal(mean=0.0, stddev=0.02))(u128)
        u128 = ReLU()(u128)

        u64 = Conv2DTranspose(64,(3,3),(2,2),kernel_initializer=tf.random_normal_initializer(0., 0.02),padding="same")(u128)
        u64 = InstanceNormalization(gamma_initializer=keras.initializers.RandomNormal(mean=0.0, stddev=0.02))(u64)
        u64 = ReLU()(u64)

        c7s13 = ReflectionPadding2D(padding=(3,3))(u64)
        c7s13 = Conv2D(3,(7,7),activation='tanh')(c7s13)

        self.model = Model(img_inp, c7s13)


## Training and Loss Function

Training can take some time, so use a GPU accelerator.

In [None]:
import tensorflow as tf

device_name = tf.test.gpu_device_name()
if "GPU" not in device_name:
    print("GPU device not found")
else:
    print('Found GPU at: {}'.format(device_name))

### Loss Functions
There are three types of losses for the generators: **adversarial loss**, **identity loss** and **cycle consistency loss**.

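**Adversarial loss** pushes generated images to be as indistinguishable from Monet paintings as possible. In the paper, the adversarial loss is a least-squares loss, where $G$ minimizes $\mathbb{E}_{x \sim p_{\text{data}}(x)}\left[(D_Y(G(x))-1)^{2}\right]$. The code in this notebook uses binary cross-entropy on the discriminator logits instead, so $G$ minimizes $-\mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\log D_Y(G(x))\right]$.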

To prevent **mode collapse** in the mapping $X \to Y$, we add a cyclic transformation. The two mappings $G: X \to Y$ and $F: Y \to X$ must be consistent with each other. The resulting loss, known as **cycle consistency loss**, is added to the adversarial loss during training so that $F(G(x)) \approx x$ and $G(F(y)) \approx y$. It consists of two terms, $\mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\|F(G(x))-x\|_1\right]$ and $\mathbb{E}_{y \sim p_{\text{data}}(y)}\left[\|G(F(y))-y\|_1\right]$. In this notebook the first term is added to the loss of $G$ and the second to the loss of $F$.

The last one is the **identity loss**, which helps preserve the color composition between input and output. The identity loss for $G$ is $\mathbb{E}_{y \sim p_{\text{data}}(y)}\left[\|G(y)-y\|_1\right]$.

There is only one relevant loss function for the discriminators, and that is an **adversarial loss**. For the mapping $G: X \to Y$, the discriminator $D_Y$ maximizes $\mathbb{E}_{y \sim p_{\text{data}}(y)}\left[\log D_{Y}(y)\right] + \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\log \left(1-D_{Y}(G(x))\right)\right]$, which is equivalent to minimizing the binary cross-entropy between its predictions and the real/generated labels.
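
The way the loss terms are combined can be sketched without TensorFlow. The example below uses plain lists in place of tensors and assumes the weights set in the `CycleGAN` class of this notebook (`cycle_weight = 10`, `identity_weight = 0.5`) together with binary cross-entropy on logits. The function names `l1`, `bce_with_logits`, `generator_loss` and `discriminator_loss` are introduced here for illustration.

```python
import math


def l1(a, b):
    """Mean absolute error between two equally long lists."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def bce_with_logits(logits, target):
    """Mean binary cross-entropy for raw logits and a constant target (0.0 or 1.0)."""
    # Numerically stable form: max(z, 0) - z * t + log(1 + exp(-|z|))
    terms = (max(z, 0.0) - z * target + math.log1p(math.exp(-abs(z))) for z in logits)
    return sum(terms) / len(logits)


def generator_loss(fake_logits, real, cycled, same_domain, identity,
                   cycle_weight=10.0, identity_weight=0.5):
    adversarial = bce_with_logits(fake_logits, 1.0)   # try to fool the discriminator
    cycle = cycle_weight * l1(real, cycled)
    ident = cycle_weight * identity_weight * l1(same_domain, identity)
    return adversarial + cycle + ident


def discriminator_loss(real_logits, fake_logits):
    # Real images are labelled 1, generated images 0; each half is weighted by 0.5.
    return 0.5 * bce_with_logits(real_logits, 1.0) + 0.5 * bce_with_logits(fake_logits, 0.0)


undecided = [0.0, 0.0]  # logits of 0 correspond to a probability of 0.5
x = [0.2, -0.4]
y = [0.7, 0.1]
print(generator_loss(undecided, x, x, y, y))      # only the adversarial term remains
print(discriminator_loss(undecided, undecided))
```

With perfect reconstructions and an undecided discriminator, both losses reduce to $\log 2 \approx 0.693$; any reconstruction error is amplified by the cycle weight.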




In [None]:
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model
from tensorflow.keras.losses import MeanAbsoluteError, MeanSquaredError,BinaryCrossentropy
import tensorflow as tf
from tensorflow.keras import backend as K

class CycleGAN(Model):

    def __init__(self,shape=((256, 256, 3))):#,batch):
        super(CycleGAN,self).__init__()
        x = Input(shape=shape)
        y = Input(shape=shape)
        self.cycle_weight = 10
        self.identity_weight = 0.5
        self.augmenter = DataAugmenter()
        
        super(CycleGAN,self).compile()
        self.genG = Generator().model
        self.genF = Generator().model
        self.discX = Discriminator().model
        self.discY = Discriminator().model

        self.genG_optimizer = tf.keras.optimizers.Adam(learning_rate=2e-4,beta_1=0.5)
        self.genF_optimizer = tf.keras.optimizers.Adam(learning_rate=2e-4,beta_1=0.5)
        self.discX_optimizer = tf.keras.optimizers.Adam(learning_rate=2e-4,beta_1=0.5)
        self.discY_optimizer = tf.keras.optimizers.Adam(learning_rate=2e-4,beta_1=0.5)

        self.cycle_loss = MeanAbsoluteError()
        self.identity_loss = MeanAbsoluteError()
        self.gen_loss = BinaryCrossentropy(from_logits=True)
        self.disc_loss = BinaryCrossentropy(from_logits=True)

    @tf.function
    def train_step(self,data_batch):
        x,y = data_batch
        x = self.augmenter.augment(x)
        y = self.augmenter.augment(y)
        with tf.GradientTape(persistent=True) as tape:
            gen_y = self.genG(x, training=True)
            gen_x = self.genF(y, training=True)
            recon_x = self.genF(gen_y, training=True)
            recon_y = self.genG(gen_x, training=True)

            # Identity
            identity_x = self.genF(x, training=True)
            identity_y = self.genG(y, training=True)

            # disc
            predict_x = self.discX(x, training=True)
            predict_gen_x = self.discX(gen_x, training=True)

            predict_y = self.discY(y, training=True)
            predict_gen_y = self.discY(gen_y, training=True)

            G_identity_loss =  self.identity_loss(y,identity_y)* self.identity_weight * self.cycle_weight
            F_identity_loss = self.identity_loss(x, identity_x)* self.identity_weight * self.cycle_weight

            G_cycle_loss = self.cycle_loss(x, recon_x)* self.cycle_weight
            F_cycle_loss = self.cycle_loss(y, recon_y)* self.cycle_weight

            G_gen_loss = self.gen_loss(tf.ones_like(predict_gen_y),predict_gen_y)
            F_gen_loss = self.gen_loss(tf.ones_like(predict_gen_x),predict_gen_x,)

            Y_disc_loss = self.disc_loss(tf.ones_like(predict_y),predict_y)/2 + self.disc_loss(tf.zeros_like(predict_gen_y),predict_gen_y)/2
            X_disc_loss = self.disc_loss(tf.ones_like(predict_x),predict_x)/2 + self.disc_loss(tf.zeros_like(predict_gen_x),predict_gen_x)/2
            G_total_loss = G_cycle_loss+G_identity_loss+G_gen_loss
            F_total_loss = F_cycle_loss+F_identity_loss+F_gen_loss
    
        gradsG = tape.gradient(G_total_loss, self.genG.trainable_variables)
        gradsF = tape.gradient(F_total_loss, self.genF.trainable_variables)

        discX_grads = tape.gradient(X_disc_loss, self.discX.trainable_variables)
        discY_grads = tape.gradient(Y_disc_loss, self.discY.trainable_variables)

        self.genG_optimizer.apply_gradients(
            zip(gradsG, self.genG.trainable_variables)
        )
        self.genF_optimizer.apply_gradients(
            zip(gradsF, self.genF.trainable_variables)
        )

        # Update the weights of the discriminators
        self.discX_optimizer.apply_gradients(
            zip(discX_grads, self.discX.trainable_variables)
        )
        self.discY_optimizer.apply_gradients(
            zip(discY_grads, self.discY.trainable_variables)
        )
        

        return {
            "G_loss": G_total_loss,
            "F_loss": F_total_loss,
            "D_X_loss": X_disc_loss,
            "D_Y_loss": Y_disc_loss,
        }

### Keeping track of the training using Callbacks

We can monitor how the network's generative ability evolves by implementing our own Keras `Callback` object. The `on_epoch_begin(self, epoch)` method is called at the start of every epoch of training; here it plots a sample every fifth epoch.

In [None]:
class ShowProgressCallback(keras.callbacks.Callback):
    def __init__(self):
        super().__init__()
        # Keep one fixed photo batch (shape [1, 256, 256, 3]) to visualize progress on.
        self.image = next(iter(train_real.take(1)))
    
    def generated_monet(self,epoch):
        f, axarr = plt.subplots(1,2,figsize=(10,10))
        # self.model is the CycleGAN instance that Keras attaches during fit().
        generated_image = self.model.genG(self.image)
        real = (((self.image[0] * 127.5) + 127.5).numpy()).astype(np.uint8)
        gen = (((generated_image[0] * 127.5) + 127.5).numpy()).astype(np.uint8)
        axarr[0].imshow(gen)
        axarr[1].imshow(real)
        axarr[1].set_title("epoch: " + str(epoch) + ', Real photo' )
        axarr[0].set_title("epoch: " + str(epoch) + ', Generated Painting' )
        plt.show()    
        
    def generated_real(self,epoch,images_to_show = 2):
        f, axarr = plt.subplots(images_to_show,2,figsize=(10,10))
        for i, image in enumerate(train_monet.take(images_to_show)):
            generated_image = self.model.genF(image)
            painting = (((image[0] * 127.5) + 127.5).numpy()).astype(np.uint8)
            photo = (((generated_image[0] * 127.5) + 127.5).numpy()).astype(np.uint8)
            axarr[i,0].imshow(painting)  # real Monet painting
            axarr[i,1].imshow(photo)     # generated photograph
        plt.show()
    
    def on_epoch_begin(self, epoch, logs=None):
        if epoch%5 ==0:
            print("Epoch:", epoch)
            #self.generated_real(epoch)
            self.generated_monet(epoch)
            

I'm loading an already trained network from a previous session. Set `new_model = True` if you want to train a new model from scratch.

In [None]:
# TRAINING
new_model = False  # set to True to train a new model from scratch
cgan = CycleGAN()
epochs=25

# The subclassed model defines no call(), so mark it as built
# to allow saving and loading HDF5 weights.
cgan.built=True

if new_model:
    cgan.fit(tf.data.Dataset.zip((train_real, train_monet)),epochs=epochs, callbacks=[ShowProgressCallback()],verbose=1)
    cgan.save_weights('cycleGAN.h5')
else:
    cgan.load_weights('../input/cgan-model/cycleGAN_BCE_250.h5')


**Some generated samples**

In [None]:
images_to_show = 10
for image in train_real.take(images_to_show):
    f, axarr = plt.subplots(1,2,figsize=(15,15))
    gen = cgan.genG(image)
    real = (((image[0] * 127.5) + 127.5).numpy()).astype(np.uint8)
    gen = (((gen[0] * 127.5) + 127.5).numpy()).astype(np.uint8)

    axarr[1].imshow(real)
    axarr[1].set_title('Real Photograph')
    axarr[0].imshow(gen)
    axarr[0].set_title('Generated Painting')
    plt.show()


### Predicting and Saving

In [None]:
from PIL import Image
os.makedirs('./images', exist_ok=True)  # do not fail if the folder already exists
for i, element in enumerate(train_real.as_numpy_iterator()):
    if i % 200 == 0:
        print(i)
    generated_image = cgan.genG(element)
    gen = (((generated_image[0] * 127.5) + 127.5).numpy()).astype(np.uint8)
    im = Image.fromarray(gen)
    im.save( './images/' + str(i) +".jpg")


In [None]:
import shutil
shutil.make_archive("/kaggle/working/images", 'zip', "./images")

Thanks for reading!\
Youssef Taoudi

Author links:\
[Github][2]\
[LinkedIn][1]

[1]:https://www.linkedin.com/in/youssef-taoudi-4ba43b128/
[2]:https://github.com/Taoudi