# AIM
<div style = "text-align: justify"><b>WARNING: This notebook assumes that you are familiar with the basic concepts of GANs and focuses only on the implementation of CycleGAN.</b> The authors of the CycleGAN paper showed that their model can "paint" photos in the style of Monet, and they backed the claim with convincing results. <b>We will implement the CycleGAN architecture with one departure from the paper: the generator is a U-net instead of the ResNet the authors used.</b></div>

# Paper can be found [here](https://arxiv.org/pdf/1703.10593.pdf)
# Dataset has been taken from [Kaggle Competition](https://www.kaggle.com/c/gan-getting-started)

In [None]:
import cv2
import numpy as np
import matplotlib.pyplot as plt
import tensorflow_addons as tfa
from tensorflow import keras
import tensorflow as tf

# Performance

In [None]:
img = cv2.imread('../input/another-image/Capture.PNG')
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
plt.figure(figsize = (20,20))
plt.imshow(img)
plt.axis('off')

# Model
## Definition
<div style = "text-align: justify">CycleGAN was introduced in 2017 (the paper linked above first appeared on arXiv in March 2017). The model can perform image-to-image translation without paired data. <b>Paired data means having a source image and its corresponding target image.</b> First we need to know what neural style transfer is. <b>Neural style transfer is an optimization technique that takes two images, a source image and a style reference image (usually a painting), and redraws the source image in the style of the reference.</b> That is what we aim to do as well.</div>

## Methodology
<div style = "text-align: justify">CycleGAN uses the same adversarial loss (a minimax game between Discriminator and Generator) as vanilla GANs. However, as you might have guessed, this is not enough. <b>In terms of neural style transfer, what we want is to retain the basic structure of the source image while rendering it in the style of Monet's paintings. With the adversarial loss alone, however, nothing stops the model from converting the image into a completely different Monet painting.</b></div>

#### *Confused? Look at the example below.*

In [None]:
eg = ['../input/examples/poppies.jpg', '../input/examples/hith-eiffel-tower-istock_000016468972large-2.jpg']

img1 = cv2.cvtColor(cv2.resize(cv2.imread(eg[0]), (256,256)), cv2.COLOR_BGR2RGB)
img2 = cv2.cvtColor(cv2.resize(cv2.imread(eg[1]), (256,256)), cv2.COLOR_BGR2RGB)
plt.figure(figsize = (20,30))
plt.subplot(1,2,1)
plt.imshow(img2)
plt.axis('off')
plt.title('Eiffel tower Photo')

plt.subplot(1,2,2)
plt.imshow(img1)
plt.axis('off')
plt.title('Poppy field Painting by Claude Monet')

<div style = "text-align: justify">Let's say your image in the X domain is a photo of the <b>Eiffel tower</b> and the image drawn from the Y domain is the <b>Poppy Field painting.</b> If you minimize only the adversarial loss, the Generator just has to produce something the Discriminator cannot tell apart from a real Monet, and you may end up with a painting of the poppy field. <b>However, what we want is a painting of the Eiffel tower as Monet would have painted it.</b> I hope you understand the difference.</div>

## Cyclic Consistency
<div style = "text-align: justify">To handle the above issue, we add another loss function called the cycle consistency loss. <b>What we do is generate an image using the Generator G and then feed the generated image to another Generator F whose aim is to regenerate the original image.</b> Continuing the above example, G takes in the photo of the Eiffel tower and outputs the Poppy Field painting. Then this painting is fed to F, which tries to output the original photo of the Eiffel tower. <b>What is the loss in all this?</b> Focusing on G, we add an L1 loss between the regenerated photo of the Eiffel tower and the original photograph <b>(which will surely not be the same before training)</b>. <b>How does that help?</b> If G throws away the structure of the photo, F cannot recover it and the L1 loss stays high. This prevents G from blindly "transforming the image completely", and with enough training only the style of the target domain is incorporated into the source image (<b>a painting of the Eiffel tower by Monet</b>).</div>
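To make the idea concrete, here is a minimal NumPy sketch of the cycle consistency term. The three `toy_*` functions are hypothetical stand-ins for the generators (they are not the networks built later in this notebook), and the weight of 10 is assumed to match the `LAMBDA` value used in the loss cell further down.

```python
import numpy as np

def cycle_loss(original, regenerated, weight=10.0):
    """Weighted mean absolute (L1) difference between an image and its reconstruction."""
    return weight * np.mean(np.abs(original - regenerated))

# Hypothetical toy "generators", for illustration only
def toy_G(x):
    # stands in for photo -> painting: simply brightens the image
    return x + 0.25

def toy_F_inverse(y):
    # stands in for painting -> photo: undoes toy_G exactly
    return y - 0.25

def toy_F_collapse(y):
    # a bad F that ignores its input and always returns a black image
    return np.zeros_like(y)

photo = np.full((4, 4, 3), 0.5)

loss_good = cycle_loss(photo, toy_F_inverse(toy_G(photo)))   # structure preserved
loss_bad = cycle_loss(photo, toy_F_collapse(toy_G(photo)))   # structure lost
```

When the round trip recovers the photo, the loss is zero; when the information about the photo is lost on the way, the loss is large. That pressure is what discourages G from discarding the content of its input.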

In [None]:
img = cv2.cvtColor(cv2.imread('../input/loss-diagram/Capture.PNG'), cv2.COLOR_BGR2RGB)
plt.figure(figsize = (30,30))
plt.imshow(img)
plt.axis('off')

<div style = "text-align: justify">The generators G and F are the same as explained above. The discriminator <b>Dy</b> aims to distinguish between an actual painting y and a generated painting G(x). The discriminator <b>Dx</b> aims to distinguish between an actual photo x and a generated photo F(y). As you can see in the 2nd and 3rd pictures, there is a cycle-consistency loss that calculates the L1 distance between the original image and the regenerated image.</div>
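For reference, the two pieces described above can be written compactly in the notation of the CycleGAN paper. The symbol $\lambda$ is the weight of the cycle term (set to 10 in the loss cell of this notebook); the identity loss that this notebook also uses is an additional term not shown here.

$$
\mathcal{L}_{cyc}(G, F) = \mathbb{E}_{x}\big[\lVert F(G(x)) - x \rVert_1\big] + \mathbb{E}_{y}\big[\lVert G(F(y)) - y \rVert_1\big]
$$

$$
\mathcal{L}(G, F, D_X, D_Y) = \mathcal{L}_{GAN}(G, D_Y, X, Y) + \mathcal{L}_{GAN}(F, D_X, Y, X) + \lambda \, \mathcal{L}_{cyc}(G, F)
$$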

# Create Dataset

In [None]:
import os
from PIL import Image

In [None]:
paths = ['../input/gan-getting-started/photo_jpg/',
         '../input/gan-getting-started/monet_jpg/']

In [None]:
from keras.preprocessing.image import img_to_array

In [None]:
def get_img (path) :
    
    x = []
    for img_path in os.listdir(path) :
        x.append(img_to_array(Image.fromarray(cv2.cvtColor(cv2.imread(
            os.path.join(path,img_path)), cv2.COLOR_BGR2RGB))))
        
    x = np.array(x)
    x/= 255.0
    return x

In [None]:
X = get_img(paths[0])
y = get_img(paths[1])

# Dataset overview and Visualization

In [None]:
print(X.shape[1:])
print(y.shape[1:])

In [None]:
print(X.min())
print(X.max())

In [None]:
print(y.min())
print(y.max())

In [None]:
plt.figure(figsize = (10,50))

i = 0

while i < 16 :
    
    plt.subplot(8,2,i+1)
    plt.imshow(X[i])
    plt.axis('off')
    plt.title('Photo images')
    
    plt.subplot(8,2,i+2)
    plt.imshow(y[i])
    plt.axis('off')
    plt.title('Monet images')
    
    i += 2

# Train validation split

In [None]:
photo_images = X[:300]
monet_images = y

In [None]:
print(photo_images.shape)
print(monet_images.shape)

<div style = "text-align: justify">What matters is only the training data, because all of the photo images will be converted to Monet-style paintings during the testing phase. Now we must convert the train set to a <b>tf.data.Dataset</b>, with a shuffle buffer size of 1000 and a batch size of 1 (each gradient step uses a single photo/painting pair).</div>
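One detail worth knowing: building the dataset from a tuple ties each photo to the Monet painting at the same index, and shuffling then moves those pairs around together. CycleGAN does not need any such pairing, so an alternative is to shuffle the two domains independently every epoch. The sketch below only illustrates that idea with NumPy index arrays; `unpaired_indices` is a hypothetical helper and is not used by the rest of this notebook.

```python
import numpy as np

def unpaired_indices(num_photos, num_monets, rng):
    """Return (photo_index, monet_index) pairs drawn from two independent shuffles."""
    n = min(num_photos, num_monets)
    photo_order = rng.permutation(num_photos)[:n]
    monet_order = rng.permutation(num_monets)[:n]
    return [(int(p), int(m)) for p, m in zip(photo_order, monet_order)]

rng = np.random.default_rng(0)
epoch_1 = unpaired_indices(300, 300, rng)   # a fresh pairing for one epoch
epoch_2 = unpaired_indices(300, 300, rng)   # a different pairing for the next
```

Each call uses every image once but matches photos and paintings differently, so the model never sees a fixed photo/painting pairing across epochs.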

In [None]:
train_dataset = tf.data.Dataset.from_tensor_slices((photo_images, monet_images))
train_dataset = train_dataset.shuffle(1000).batch(1)

In [None]:
print(f'Number of training samples : {len(train_dataset)}.')

In [None]:
plt.figure(figsize = (10,50))

for n, (photo, monet) in train_dataset.enumerate() :
    plt.subplot(1,2,1)
    plt.imshow(photo[0])
    plt.title('Photo Image')
    plt.axis('off')
    
    plt.subplot(1,2,2)
    plt.imshow(monet[0])
    plt.title('Monet Image')
    plt.axis('off')
    
    if n == 10 :
        break
plt.show()

# Model Architecture
## Discriminator (PatchGAN)
<br/>

![img](https://miro.medium.com/max/1050/1*46CddTc5JwkFW_pQb4nGZQ.png)

<div style = "text-align: justify">The discriminator is a PatchGAN model, where each value in the output array corresponds to a 70x70 overlapping patch of the input image. The model relies on the idea of the receptive field: a patch of input pixels maps to a single value in the output array. In this notebook the last layer has no sigmoid, so the outputs are raw logits (the loss below is created with from_logits = True); passing a logit through a sigmoid gives the probability that the corresponding patch is real. The outputs for all cells can be averaged to get a score for the entire image. <b>Which patch is represented by which output value?</b> This can be found by backtracking from the output array to the input image, tracing back the receptive fields.</div>
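The 70x70 figure can be checked with a little arithmetic. The sketch below assumes the layer settings of the discriminator defined in this notebook: five 4x4 convolutions with strides 2, 2, 2, 1, 1, where the last two use explicit zero padding of one pixel per side followed by 'valid' convolution.

```python
# (kernel_size, stride) of the five Conv2D layers in the discriminator
DIS_LAYERS = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]

def receptive_field(layers):
    """Size of the input patch seen by a single output value."""
    field, jump = 1, 1
    for kernel, stride in layers:
        field += (kernel - 1) * jump   # each layer widens the patch
        jump *= stride                 # distance between neighbouring outputs, in input pixels
    return field

def output_size(size=256):
    """Spatial size of the discriminator output for a square input."""
    for _ in range(3):                 # 'same' padding with stride 2 halves the size
        size = (size + 1) // 2
    for _ in range(2):                 # pad 1 px per side, then 'valid' 4x4 conv with stride 1
        size = (size + 2) - 4 + 1
    return size

patch = receptive_field(DIS_LAYERS)    # 70
grid = output_size(256)                # 30
```

So a 256x256 input yields a 30x30 grid of logits, each judging one 70x70 patch.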

In [None]:
from keras.models import Model
from keras.layers import Input
from keras.layers import Conv2D
from keras.layers import Dropout
from keras.layers import LeakyReLU
from keras.layers import Activation
from keras.layers import Concatenate
from keras.layers import ZeroPadding2D
from keras.layers import Conv2DTranspose
from keras.initializers import RandomNormal

In [None]:
init = RandomNormal(0., 0.02)

In [None]:
def dis () :
    
    src  = Input((256, 256, 3,))
    
    conv = Conv2D(2**6, (4,4), strides = 2, padding = 'same', kernel_initializer = init, use_bias = False) (src)
    leak = LeakyReLU(0.2)(conv)
    
    conv = Conv2D(2**7, (4,4), strides = 2, padding = 'same', kernel_initializer = init, use_bias = False)(leak)
    norm = tfa.layers.InstanceNormalization(gamma_initializer = init)(conv)
    leak = LeakyReLU(0.2)(norm)
    
    conv = Conv2D(2**8, (4,4), strides = 2, padding = 'same', kernel_initializer = init, use_bias = False)(leak)
    norm = tfa.layers.InstanceNormalization(gamma_initializer = init)(conv)
    leak = LeakyReLU(0.2)(norm)
    
    zero = ZeroPadding2D()(leak)
    
    conv = Conv2D(2**9, (4,4), strides = 1, padding = 'valid',kernel_initializer = init, use_bias = False)(zero)
    norm = tfa.layers.InstanceNormalization(gamma_initializer = init)(conv)
    leak = LeakyReLU(0.2)(norm)
    
    zero = ZeroPadding2D()(leak)
    
    conv = Conv2D(2**0, (4,4), strides = 1, padding = 'valid',kernel_initializer = init, use_bias = False)(zero)
    
    return Model(inputs = src, outputs = conv)

In [None]:
Dx = dis()
Dy = dis()

In [None]:
keras.utils.plot_model(Dx, './dis.png', show_shapes = True, dpi = 64)

## Generator (U-net)

<br/>

![img](https://miro.medium.com/max/1050/1*lvXoKMHoPJMKpKK7keZMEA.png)

<div style = "text-align: justify">We will be using the U-net architecture for the Generator. <b>In the paper, the authors used a residual network, but it did not work for me.</b> The U-net architecture consists of two parts, an Encoder and a Decoder. In the Encoder, the image is downsampled step by step until it reaches a size of 1x1 at the bottleneck layer; in the Decoder, it is upsampled from the bottleneck back to the output resolution, and skip connections concatenate each Decoder stage with the Encoder output of the same size. The main idea behind this model is that <b>in the encoder path, the model learns what features are in the image and in the decoder path, it learns where these features are in the image.</b></div>
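As a quick sanity check on the 1x1 bottleneck, the spatial sizes can be traced by hand. This sketch assumes a 256x256 input and eight stride-2 convolutions with 'same' padding, matching the generator defined in this notebook.

```python
def encoder_sizes(size=256, num_down=8):
    """Spatial size after each stride-2, 'same'-padded convolution."""
    sizes = []
    for _ in range(num_down):
        size = (size + 1) // 2
        sizes.append(size)
    return sizes

enc = encoder_sizes()                        # [128, 64, 32, 16, 8, 4, 2, 1]
dec = [s * 2 for s in reversed(enc)]         # [2, 4, 8, 16, 32, 64, 128, 256]

# Every decoder stage except the last has an encoder output of the same size
# to concatenate with (the skip connections).
skips = [s for s in dec[:-1] if s in enc]
```

The encoder ends at 1x1, and each of the seven intermediate decoder stages finds a matching encoder feature map for its skip connection.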

In [None]:
def gen () :
    
    src = Input((256, 256, 3,))
    
    conv_064_0 = Conv2D(2**6, (4,4), strides = 2, padding = 'same', kernel_initializer = init, use_bias = False)(src)
    leak_064_0 = LeakyReLU(0.2)(conv_064_0)
    
    conv_128_0 = Conv2D(2**7, (4,4), strides = 2, padding = 'same', kernel_initializer = init, use_bias = False)(leak_064_0)
    norm_128_0 = tfa.layers.InstanceNormalization(gamma_initializer = init)(conv_128_0)
    leak_128_0 = LeakyReLU(0.2)(norm_128_0)
    
    conv_256_0 = Conv2D(2**8, (4,4), strides = 2, padding = 'same', kernel_initializer = init, use_bias = False)(leak_128_0)
    norm_256_0 = tfa.layers.InstanceNormalization(gamma_initializer = init)(conv_256_0)
    leak_256_0 = LeakyReLU(0.2)(norm_256_0)
    
    conv_512_0 = Conv2D(2**9, (4,4), strides = 2, padding = 'same', kernel_initializer = init, use_bias = False)(leak_256_0)
    norm_512_0 = tfa.layers.InstanceNormalization(gamma_initializer = init)(conv_512_0)
    leak_512_0 = LeakyReLU(0.2)(norm_512_0)
    
    conv_512_1 = Conv2D(2**9, (4,4), strides = 2, padding = 'same', kernel_initializer = init, use_bias = False)(leak_512_0)
    norm_512_1 = tfa.layers.InstanceNormalization(gamma_initializer = init)(conv_512_1)
    leak_512_1 = LeakyReLU(0.2)(norm_512_1)
    
    conv_512_2 = Conv2D(2**9, (4,4), strides = 2, padding = 'same', kernel_initializer = init, use_bias = False)(leak_512_1)
    norm_512_2 = tfa.layers.InstanceNormalization(gamma_initializer = init)(conv_512_2)
    leak_512_2 = LeakyReLU(0.2)(norm_512_2)
    
    conv_512_3 = Conv2D(2**9, (4,4), strides = 2, padding = 'same', kernel_initializer = init, use_bias = False)(leak_512_2)
    norm_512_3 = tfa.layers.InstanceNormalization(gamma_initializer = init)(conv_512_3)
    leak_512_3 = LeakyReLU(0.2)(norm_512_3)
    
    conv_512_4 = Conv2D(2**9, (4,4), strides = 2, padding = 'same', kernel_initializer = init, use_bias = False)(leak_512_3)
    norm_512_4 = tfa.layers.InstanceNormalization(gamma_initializer = init)(conv_512_4)
    leak_512_4 = LeakyReLU(0.2)(norm_512_4)
    
    
    
    tran_512_3 = Conv2DTranspose(2**9, (4,4), strides = 2, padding = 'same', kernel_initializer = init, use_bias = False)(leak_512_4)
    norm_512_3 = tfa.layers.InstanceNormalization(gamma_initializer = init)(tran_512_3)
    drop_512_3 = Dropout(0.5)(norm_512_3)
    relu_512_3 = Activation('relu')(drop_512_3)
    conc_512_3 = Concatenate()([relu_512_3, leak_512_3])
    
    tran_512_2 = Conv2DTranspose(2**9, (4,4), strides = 2, padding = 'same', kernel_initializer = init, use_bias = False)(conc_512_3)
    norm_512_2 = tfa.layers.InstanceNormalization(gamma_initializer = init)(tran_512_2)
    drop_512_2 = Dropout(0.5)(norm_512_2)
    relu_512_2 = Activation('relu')(drop_512_2)
    conc_512_2 = Concatenate()([relu_512_2, leak_512_2])
    
    tran_512_1 = Conv2DTranspose(2**9, (4,4), strides = 2, padding = 'same', kernel_initializer = init, use_bias = False)(conc_512_2)
    norm_512_1 = tfa.layers.InstanceNormalization(gamma_initializer = init)(tran_512_1)
    drop_512_1 = Dropout(0.5)(norm_512_1)
    relu_512_1 = Activation('relu')(drop_512_1)
    conc_512_1 = Concatenate()([relu_512_1, leak_512_1])
    
    tran_512_0 = Conv2DTranspose(2**9, (4,4), strides = 2, padding = 'same', kernel_initializer = init, use_bias = False)(conc_512_1)
    norm_512_0 = tfa.layers.InstanceNormalization(gamma_initializer = init)(tran_512_0)
    relu_512_0 = Activation('relu')(norm_512_0)
    conc_512_0 = Concatenate()([relu_512_0, leak_512_0])
    
    tran_256_0 = Conv2DTranspose(2**8, (4,4), strides = 2, padding = 'same', kernel_initializer = init, use_bias = False)(conc_512_0)
    norm_256_0 = tfa.layers.InstanceNormalization(gamma_initializer = init)(tran_256_0)
    relu_256_0 = Activation('relu')(norm_256_0)
    conc_256_0 = Concatenate()([relu_256_0, leak_256_0])
    
    tran_128_0 = Conv2DTranspose(2**7, (4,4), strides = 2, padding = 'same', kernel_initializer = init, use_bias = False)(conc_256_0)
    norm_128_0 = tfa.layers.InstanceNormalization(gamma_initializer = init)(tran_128_0)
    relu_128_0 = Activation('relu')(norm_128_0)
    conc_128_0 = Concatenate()([relu_128_0, leak_128_0])
    
    tran_064_0 = Conv2DTranspose(2**6, (4,4), strides = 2, padding = 'same', kernel_initializer = init, use_bias = False)(conc_128_0)
    norm_064_0 = tfa.layers.InstanceNormalization(gamma_initializer = init)(tran_064_0)
    relu_064_0 = Activation('relu')(norm_064_0)
    conc_064_0 = Concatenate()([relu_064_0, leak_064_0])
    
    conv_003_0 = Conv2DTranspose(3**1, (4,4), strides = 2, padding = 'same', kernel_initializer = init, use_bias = False)(conc_064_0)
    
    return Model(inputs = src, outputs = conv_003_0)

In [None]:
Gg = gen()
Gf = gen()

In [None]:
keras.utils.plot_model(Gg, './gen.png', show_shapes = True, dpi = 64)

# Loss functions

In [None]:
bin_entropy = keras.losses.BinaryCrossentropy(from_logits = True)

In [None]:
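LAMBDA = 10

# Generator loss: the generator wants the discriminator to label its output as real (1)
def g_loss (dis_output) :
    return bin_entropy(tf.ones_like(dis_output), dis_output)

# Discriminator loss: real patches should be labelled 1 and generated patches 0
def d_loss (dis_output_real, dis_output_fake) :
    
    real_loss = bin_entropy(tf.ones_like(dis_output_real), dis_output_real)
    fake_loss = bin_entropy(tf.zeros_like(dis_output_fake), dis_output_fake)
    total_loss = real_loss + fake_loss
    
    return total_loss * 0.5

# Cycle consistency loss: L1 distance between the original and the regenerated image.
# tf.reduce_mean returns a scalar; keras.losses.mean_absolute_error averages only over
# the channel axis and would return one value per pixel.
def c_loss (original_image,regenerated_image) :
    return tf.reduce_mean(tf.abs(original_image - regenerated_image)) * LAMBDA

# Identity loss: a generator fed an image that is already in its target domain
# should return it unchanged
def i_loss (target_image,output_target_image) :
    return tf.reduce_mean(tf.abs(target_image - output_target_image)) * LAMBDA * 0.5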
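To see what the adversarial terms compute, here is a NumPy sketch of binary cross-entropy on raw logits, averaged over all patches. It is only meant to mirror the behaviour of the `bin_entropy` object above; `bce_from_logits` is a hypothetical helper, and the 30x30 output shape is an assumption based on the discriminator defined earlier.

```python
import numpy as np

def bce_from_logits(labels, logits):
    """Numerically stable binary cross-entropy on logits, averaged over all elements."""
    labels = np.asarray(labels, dtype=np.float64)
    logits = np.asarray(logits, dtype=np.float64)
    per_patch = (np.maximum(logits, 0.0)
                 - logits * labels
                 + np.log1p(np.exp(-np.abs(logits))))
    return per_patch.mean()

undecided = np.zeros((1, 30, 30, 1))        # discriminator has no opinion
sure_fake = np.full((1, 30, 30, 1), -5.0)   # discriminator is confident the image is fake

# Generator loss: compare the discriminator output against all-ones labels
loss_undecided = bce_from_logits(np.ones_like(undecided), undecided)   # about 0.693
loss_sure_fake = bce_from_logits(np.ones_like(sure_fake), sure_fake)   # a little above 5
```

The generator is penalised more the more confidently the discriminator rejects its output, which is the signal that pushes it toward realistic paintings.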

# Model Optimizers

In [None]:
gen_g_optimizer = keras.optimizers.Adam(learning_rate = 0.0002, beta_1 = 0.5)
gen_f_optimizer = keras.optimizers.Adam(learning_rate = 0.0002, beta_1 = 0.5)
dis_y_optimizer = keras.optimizers.Adam(learning_rate = 0.0002, beta_1 = 0.5)
dis_x_optimizer = keras.optimizers.Adam(learning_rate = 0.0002, beta_1 = 0.5)

# Training function
#### *The variable names follow the paper's horse-to-zebra example: 'horse' stands for the photo domain (X) and 'zebra' for the Monet domain (Y).*

In [None]:
@tf.function
def train_batch (src_horse_image, src_zebra_image) :
    
    with tf.GradientTape(persistent = True) as tape :
        
        # horse->zebra->horse (Gg and Dy)
        gen_zebra_image = Gg(src_horse_image, training = True)
        gen_horse_image = Gf(gen_zebra_image, training = True)
        
        sam_zebra_image = Gg(src_zebra_image, training = True)
        
        real_output_Dy  = Dy(src_zebra_image, training = True)
        fake_output_Dy  = Dy(gen_zebra_image, training = True)
        
        
        gen_g_loss = g_loss(fake_output_Dy)
        cyc_g_loss = c_loss(src_horse_image, gen_horse_image)
        idn_g_loss = i_loss(src_zebra_image, sam_zebra_image)
        dis_y_loss = d_loss(real_output_Dy , fake_output_Dy )
        
        # zebra->horse->zebra (Gf and Dx)
        gen_horse_image = Gf(src_zebra_image, training = True)
        gen_zebra_image = Gg(gen_horse_image, training = True)
        
        sam_horse_image = Gf(src_horse_image, training = True)
        
        real_output_Dx  = Dx(src_horse_image, training = True)
        fake_output_Dx  = Dx(gen_horse_image, training = True)
        
        
        gen_f_loss = g_loss(fake_output_Dx)
        cyc_f_loss = c_loss(src_zebra_image, gen_zebra_image)
        idn_f_loss = i_loss(src_horse_image, sam_horse_image)
        dis_x_loss = d_loss(real_output_Dx , fake_output_Dx )
        
        total_gen_g_loss = gen_g_loss + idn_g_loss + (cyc_g_loss + cyc_f_loss)
        total_gen_f_loss = gen_f_loss + idn_f_loss + (cyc_g_loss + cyc_f_loss)
    
    gen_g_grad = tape.gradient(total_gen_g_loss, Gg.trainable_variables)
    gen_f_grad = tape.gradient(total_gen_f_loss, Gf.trainable_variables)
    dis_y_grad = tape.gradient(dis_y_loss, Dy.trainable_variables)
    dis_x_grad = tape.gradient(dis_x_loss, Dx.trainable_variables)
    
    gen_g_optimizer.apply_gradients(zip(gen_g_grad, Gg.trainable_variables))
    gen_f_optimizer.apply_gradients(zip(gen_f_grad, Gf.trainable_variables))
    dis_y_optimizer.apply_gradients(zip(dis_y_grad, Dy.trainable_variables))
    dis_x_optimizer.apply_gradients(zip(dis_x_grad, Dx.trainable_variables))

# Custom fit() function

In [None]:
def fig_plot (sam_photo, gen_image, sam_monet) :
    
    plt.figure(figsize = (20,50))
    
    plt.subplot(1,3,1)
    plt.imshow(sam_photo[0])
    plt.title('Photo Image')
    plt.axis('off')
    
    plt.subplot(1,3,2)
    plt.imshow(gen_image[0])
    plt.title('GenerateImg')
    plt.axis('off')
    
    plt.subplot(1,3,3)
    plt.imshow(sam_monet[0])
    plt.title('Monet Image')
    plt.axis('off')
    plt.show()
    
def fit (EPOCHS) :
    
    for epoch in range(EPOCHS) :
        
        print('[',end='')
        for n, (photo, monet) in train_dataset.enumerate() :
            if (n+1)%10== 0 :
                print('#',end='')
            if (n+1) == 300 :
                print(']',end='')
            train_batch(photo, monet)
        print()
        
        for sam_photo , sam_monet in train_dataset.take(1) :
            gen_image = Gg(sam_photo, training = True)
            fig_plot(sam_photo, gen_image , sam_monet)

In [None]:
fit(40)

# Validation

In [None]:
plt.figure(figsize = (10,50))

i = 0
while i < 16 :
    
    x = np.random.randint(0, len(X))
    
    plt.subplot(8,2,i+1)
    plt.imshow(X[x])
    plt.axis('off')
    plt.title('Photo')
    
    monet = Gg(np.reshape(X[x], (1, 256, 256, 3)))
    plt.subplot(8,2,i+2)
    plt.imshow(monet[0])
    plt.axis('off')
    plt.title('Monet')
    
    i += 2

In [None]:
keras.models.save_model(Gg, './Gen_g.h5')
keras.models.save_model(Gf, './Gen_f.h5')
keras.models.save_model(Dy, './Dis_y.h5')
keras.models.save_model(Dx, './Dis_x.h5')

# Submission

In [None]:
! mkdir ../images

In [None]:
i = 1
for img in X:
    
    prediction = Gg(np.reshape(img, (1,256,256,3)), training=False)[0].numpy()
    # The training images were scaled to [0, 1], so map the output back to [0, 255]
    prediction = (np.clip(prediction, 0.0, 1.0) * 255.0).astype(np.uint8)
    
    im = Image.fromarray(prediction)
    im.save("../images/" + str(i) + ".jpg")
    i += 1

In [None]:
import shutil
shutil.make_archive("/kaggle/working/images", 'zip', "/kaggle/images")

### *You can use further augmentation to improve the results, but this was not the aim of my notebook. It was meant only for learning purposes.*