# Image-to-Image Translation with GANs
One popular application of GANs that has received a lot of attention recently is image-to-image translation: transforming an input image so that it cannot be distinguished from images belonging to a target class. Depending on the task, one of two GAN variants is used; both are introduced below. If the training data contain corresponding pairs of images from the source and target distributions, i.e. if the mapping $x_i\rightarrow y_i$ is known for every element $x_i\in X$ of the source distribution and its counterpart $y_i\in Y$ in the target distribution, a conditional GAN [1] can be used. If no such mapping is known, or if none exists, a cycle-consistent GAN [2] can be trained on the unpaired data.

<img src="figures_tim/un_paired.png",width=400> |  
- | - 
Paired and unpaired training data. Taken from [2] | 

## Paired training data: conditional GAN pix2pix

<img src="figures_tim/pix2pix_example.png",width=600> |
- | - 
Example applications of a cGAN. Taken from [1] |

In a conditional GAN, both the generator and the discriminator see the input data. Because the output pixels are not treated as conditionally independent of each other, whole connected structures that differ between output and target can be penalized. With a random noise vector $z$, the objective of a conditional GAN can be expressed as $$\mathcal{L}_{cGAN}(G,D) = \mathbb{E}_{x,y}[\log D(x,y)]+\mathbb{E}_{x,z}[\log(1-D(x,G(x,z)))].$$

The generator $G$ tries to minimize this objective against an adversarial $D$ that tries to maximize it, so $$G^* = \text{argmin}_G\text{max}_D\mathcal{L}_{cGAN}(G,D).$$ It was found to be beneficial to add a conventional loss function to the GAN's objective, so that the generator not only has to fool the discriminator but also has to stay close to the ground truth. The L1 distance is usually chosen as this conventional loss, since the L2 distance tends to produce blurrier outputs. The final objective is therefore $$G^* = \text{argmin}_G\text{max}_D\mathcal{L}_{cGAN}(G,D) + \lambda\mathcal{L}_{L1}(G)$$ with a weighting parameter $\lambda$ and where $$\mathcal{L}_{L1}(G)=\mathbb{E}_{x,y,z}[||y-G(x,z)||_1].$$
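
As a minimal sketch of how these terms can be estimated on a batch, the snippet below uses the standard $\log(1-D)$ form for generated pairs. The discriminator outputs and pixel values are made-up numbers, the function names are hypothetical, and the default `lam=100.0` is an assumption rather than a value given in this text.

```python
import math


def cgan_objective(d_real, d_fake):
    """Batch estimate of L_cGAN.

    d_real: values D(x, y) for real pairs, each in (0, 1)
    d_fake: values D(x, G(x, z)) for generated pairs, each in (0, 1)
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def l1_loss(target, output):
    """Mean absolute difference between two flattened images."""
    return sum(abs(t - o) for t, o in zip(target, output)) / len(target)


def generator_objective(d_fake, target, output, lam=100.0):
    """Part of the full objective that depends on G: fake term plus lam * L1."""
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return fake_term + lam * l1_loss(target, output)


# Hypothetical discriminator outputs for a batch of two pairs.
d_real = [0.9, 0.8]
d_fake = [0.2, 0.1]
# Hypothetical flattened 4-pixel target y and generator output G(x, z).
y = [0.0, 0.5, 1.0, 0.5]
g = [0.1, 0.5, 0.8, 0.5]

print("L_cGAN estimate:", cgan_objective(d_real, d_fake))
print("L1 term:", l1_loss(y, g))
print("generator objective:", generator_objective(d_fake, y, g))
```

A discriminator that cannot tell real from generated pairs outputs 0.5 everywhere, which gives an objective of $2\log 0.5$; a confident discriminator pushes the value towards 0.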

The defining feature of the pix2pix realisation of a conditional GAN is the network structure. Previous realisations mostly used encoder-decoder structures like the one in the picture below, where the input is gradually downsampled to a bottleneck and then upsampled again to the final output.

<img src="figures_tim/encoder_decoder.png",width=600> |
- | - 
Usual encoder-decoder structure. Taken from [3] |

In this setup all of the input's information has to pass through the complete network, including the bottleneck. In image-to-image translation tasks, however, input and output share a lot of low-level information that is better passed across the net directly. In colorization tasks, for example, the shapes of objects have to be the same in input and output. For this reason the authors implemented skip connections between the individual layers that allow information to bypass certain levels, as shown in the picture below.

<img src="figures_tim/unet.png",width=600> |
- | - 
Special U-Net structure for pix2pix and cycleGAN. Taken from [3] |
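
To make the effect of the skip connections concrete, the following sketch only tracks tensor shapes through a toy U-Net; it does not build a network. The depth, filter counts and channel numbers are illustrative assumptions, not values stated in this text, and `unet_shapes` is a hypothetical helper.

```python
def unet_shapes(size=256, out_channels=2, base_filters=64, depth=8):
    """Trace (spatial size, channels) through a toy U-Net with skip connections.

    Assumes `size` is divisible by 2**depth. All defaults are illustrative.
    """
    # Encoder: every layer halves the spatial size; filters grow up to 8x base.
    encoder = []
    s = size
    for i in range(depth):
        s //= 2
        encoder.append((s, base_filters * min(2 ** i, 8)))

    # Decoder: every layer doubles the spatial size. Its output is concatenated
    # with the mirrored encoder output, so low-level detail bypasses the bottleneck.
    decoder = []
    s = encoder[-1][0]
    for i in range(depth - 1):
        skip_size, skip_channels = encoder[depth - 2 - i]
        s *= 2
        assert s == skip_size, "skip connection needs matching spatial sizes"
        decoder.append({
            "size": s,
            "upsampled": skip_channels,
            "skip": skip_channels,
            "concatenated": 2 * skip_channels,
        })

    # The last layer upsamples back to the input resolution.
    final = (s * 2, out_channels)
    return encoder, decoder, final


encoder, decoder, final = unet_shapes()
for (size, channels), layer in zip(reversed(encoder[:-1]), decoder):
    print(f"encoder {size}x{size}x{channels} -> skip -> "
          f"decoder {layer['size']}x{layer['size']}x{layer['concatenated']}")
print("output:", final)
```

Without the concatenation, each decoder layer would only see what survived the 1x1 bottleneck; with it, half of each decoder layer's input channels come straight from the encoder layer of the same resolution.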


## Unpaired training data: cycle-consistent GAN cycleGAN

<img src="figures_tim/cyclegan_example.jpg",width=600> |
- | - 
Example applications for a cycle-consistent GAN. Taken from [2] |

In the case of unpaired training data, e.g. style transfer, the GAN has no ground truth to compare against. Therefore, a second generator $F$ is trained for the inverse direction $Y\rightarrow X$, and a so-called cycle-consistency loss is introduced. It penalizes the difference between the original input and its reconstruction after the inverse generator has been applied to the generated data, for both directions of the network. It is written as $$\mathcal{L}_{cyc}(G,F)=\mathbb{E}_x[||F(G(x))-x||_1]+\mathbb{E}_y[||G(F(y))-y||_1].$$ The GAN loss for the direction $X\rightarrow Y$ is again written as $$\mathcal{L}_{GAN}(G,D_Y,X,Y) = \mathbb{E}_y[\log D_Y(y)]+\mathbb{E}_{x}[\log(1-D_Y(G(x)))],$$ and analogously for the opposite direction with $F$ and $D_X$.

The total loss function reads 
$$\mathcal{L}(G,F,D_X,D_Y)=\mathcal{L}_{GAN}(G,D_Y,X,Y)+\mathcal{L}_{GAN}(F,D_X,Y,X)+\lambda\mathcal{L}_{cyc}(G,F)$$
with some control parameter $\lambda$, and our goal is to solve $$G^*, F^* = \text{argmin}_{G,F}\text{max}_{D_X,D_Y}\mathcal{L}(G,F,D_X,D_Y)$$
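
The cycle-consistency term can be illustrated with toy "generators" that act on short lists of pixel values. This is only a sketch: `G`, `F_exact` and `F_poor` are hypothetical stand-ins for trained networks, the data are made up, and the default `lam=10.0` is an assumption.

```python
def l1(a, b):
    """Mean absolute difference between two flattened images."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def cycle_loss(G, F, xs, ys):
    """Batch estimate of E_x|F(G(x)) - x| + E_y|G(F(y)) - y|."""
    forward = sum(l1(F(G(x)), x) for x in xs) / len(xs)
    backward = sum(l1(G(F(y)), y) for y in ys) / len(ys)
    return forward + backward


def total_loss(gan_g, gan_f, cyc, lam=10.0):
    """Sum of both GAN losses plus the weighted cycle-consistency loss."""
    return gan_g + gan_f + lam * cyc


# Toy mappings: G brightens an image, F is meant to undo that.
def G(x):
    return [v + 0.25 for v in x]


def F_exact(y):
    return [v - 0.25 for v in y]   # perfect inverse of G


def F_poor(y):
    return [v - 0.125 for v in y]  # only undoes half of the shift


xs = [[0.0, 0.5], [0.25, 0.75]]  # hypothetical samples from X
ys = [[0.5, 1.0], [0.25, 0.75]]  # hypothetical, unpaired samples from Y

print("cycle loss, exact inverse:", cycle_loss(G, F_exact, xs, ys))
print("cycle loss, poor inverse: ", cycle_loss(G, F_poor, xs, ys))
```

Note that no pairing between `xs` and `ys` is needed: each sample is only compared with its own reconstruction.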

<img src="figures_tim/cyclegan_scheme.png",width=600> | <img src="figures_tim/cyclegan_reconstruction.png",width=300>
- | - 
  | cycleGAN scheme and image reconstruction. Taken from [2]

## Example: Colorizing black and white images with pix2pix
We want to use the pix2pix implementation of the conditional GAN to colorize black and white images. Digital color images are usually represented in the additive RGB color model, where values between 0 and 255 determine the intensity in each of the color channels red, green and blue. A color space which is more suitable for our task is the CIELAB color space, or Lab color space for short. Its three channels comprise lightness (L), which is nothing but a greyscale representation of the image, and the green-red (a) and blue-yellow (b) components of the image.

<img src="figures_tim/colorspace.eps",width=400> | 
- | - 
  | Different colorspaces. Adapted from [5]

The Lab color space was designed to approximate human vision: only about 5% of the photoreceptor cells in the human eye (cones) are sensitive to color, while the remaining 95% (rods) only detect brightness. By using this color space, we only need to predict two color channels and can use the sharp greyscale input when calculating the final output. So we need to find a mapping

<img src="figures_tim/lab_mapping.png",width=800> |
- | - 
 Mapping from greyscale input to color channels. Taken from [5] | 
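
The split into a lightness input and two color channels to predict can be sketched per pixel as follows. The conversion uses the standard sRGB to XYZ to CIELAB formulas with a D65 white point; whether the implementation used here applies exactly these constants is an assumption, and `split_for_colorization` is a hypothetical helper.

```python
def srgb_to_lab(r, g, b):
    """Convert one 8-bit sRGB pixel to CIELAB (D65 white point)."""
    def linearize(c):
        c = c / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = linearize(r), linearize(g), linearize(b)
    # Linear RGB -> CIE XYZ
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl

    def f(t):
        delta = 6.0 / 29.0
        if t > delta ** 3:
            return t ** (1.0 / 3.0)
        return t / (3.0 * delta ** 2) + 4.0 / 29.0

    # Normalise by the D65 reference white before applying f.
    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
    lightness = 116.0 * fy - 16.0
    a = 500.0 * (fx - fy)
    b_channel = 200.0 * (fy - fz)
    return lightness, a, b_channel


def split_for_colorization(pixels):
    """Return the network input (L) and the target (a, b) for a list of RGB pixels."""
    lab = [srgb_to_lab(*p) for p in pixels]
    lightness = [l for l, _, _ in lab]
    ab = [(a, b) for _, a, b in lab]
    return lightness, ab


pixels = [(255, 255, 255), (128, 128, 128), (255, 0, 0)]
inputs, targets = split_for_colorization(pixels)
for p, l, ab in zip(pixels, inputs, targets):
    print(p, "-> L = %.2f, (a, b) = (%.2f, %.2f)" % (l, ab[0], ab[1]))
```

Grey pixels end up with a and b close to zero, so all color information sits in the two channels the generator has to predict, while L can be copied from the input unchanged.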

A publicly available TensorFlow implementation [4] of pix2pix was used to train a cGAN on 9492 images from the Colornet dataset [6]. Training was done on OIST's HPC Saion GPU cluster. The training images cover a very wide range of objects, although the majority of them show humans. Because of this variety in the training data, the GAN performs well on only some of the validation images. Most generated images end up with a brown tint, presumably because brown is close to most other colors and therefore produces the smallest error on average. Some successful and failure cases, as well as the generator and discriminator losses extracted from TensorBoard, are shown below.

Discriminator loss | Generator GAN loss | Generator L1 loss
- | - | -
<img src="figures_tim/discriminator_loss.png",width=300> | <img src="figures_tim/generator_loss_gan.png",width=300> | <img src="figures_tim/generator_loss_l1.png",width=300>

Input | Output | Target
- | - | -
![Test](figures_tim/colornet1-inputs.png) | ![Test](figures_tim/colornet1-outputs.png) | ![Test](figures_tim/colornet1-targets.png)
![Test](figures_tim/colornet2-inputs.png) | ![Test](figures_tim/colornet2-outputs.png) | ![Test](figures_tim/colornet2-targets.png)
![Test](figures_tim/colornet3-inputs.png) | ![Test](figures_tim/colornet3-outputs.png) | ![Test](figures_tim/colornet3-targets.png)
![Test](figures_tim/colornet4-inputs.png) | ![Test](figures_tim/colornet4-outputs.png) | ![Test](figures_tim/colornet4-targets.png)
![Test](figures_tim/colornet5-inputs.png) | ![Test](figures_tim/colornet5-outputs.png) | ![Test](figures_tim/colornet5-targets.png)
![Test](figures_tim/colornet6-inputs.png) | ![Test](figures_tim/colornet6-outputs.png) | ![Test](figures_tim/colornet6-targets.png)
![Test](figures_tim/colornet7-inputs.png) | ![Test](figures_tim/colornet7-outputs.png) | ![Test](figures_tim/colornet7-targets.png)
![Test](figures_tim/colornet8-inputs.png) | ![Test](figures_tim/colornet8-outputs.png) | ![Test](figures_tim/colornet8-targets.png)
![Test](figures_tim/colornet9-inputs.png) | ![Test](figures_tim/colornet9-outputs.png) | ![Test](figures_tim/colornet9-targets.png)
![Test](figures_tim/colornet10-inputs.png) | ![Test](figures_tim/colornet10-outputs.png) | ![Test](figures_tim/colornet10-targets.png)
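
The brown tint noted for this dataset is consistent with how an L1 loss behaves under uncertainty: if a grey pixel could plausibly have one of several colors, the single prediction with the smallest average L1 error is the channel-wise median of those colors. The sketch below illustrates this with made-up (a, b) values; the candidate colors are hypothetical and not measured from the dataset.

```python
from statistics import median

# Hypothetical (a, b) values a grey pixel might take in different training images.
candidates = {
    "red": (70, 50),
    "green": (-60, 50),
    "blue": (20, -70),
    "yellow": (-5, 80),
    "skin": (15, 20),
    "grey": (0, 0),
}


def expected_l1(pred, colors):
    """Average L1 error of one prediction against all plausible colors."""
    return sum(abs(pred[0] - a) + abs(pred[1] - b) for a, b in colors) / len(colors)


colors = list(candidates.values())
# The L1 error separates per channel, so the channel-wise median minimises it.
best = (median([a for a, _ in colors]), median([b for _, b in colors]))

print("L1-optimal single guess (a, b):", best)
print("its average error:", expected_l1(best, colors))
for name, color in candidates.items():
    print(f"always predicting {name}: {expected_l1(color, colors):.2f}")
```

With these numbers the optimal guess has small positive a and b, which corresponds roughly to a muted yellowish-brown. The adversarial term is meant to counteract this averaging, but with very diverse training data it may not fully succeed.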

In another attempt, the network was also trained on 6000 images of the Linnaeus 5 dataset [7], which contains 1200 images in each of the 5 categories 'berry', 'bird', 'dog', 'flower' and 'other'. This dataset also includes 400 validation images per category. Together with the greater similarity of the training images within each category, this led to more successful colorizations. Some of them, as well as failure cases and the loss functions, are shown below.

Discriminator loss | Generator GAN loss | Generator L1 loss
- | - | -
<img src="figures_tim/discriminator_loss_linn.png",width=300> | <img src="figures_tim/generator_loss_gan_linn.png",width=300> | <img src="figures_tim/generator_loss_l1_linn.png",width=300>

Input | Output | Target
- | - | -
![Test](figures_tim/berry11-inputs.png) | ![Test](figures_tim/berry11-outputs.png) | ![Test](figures_tim/berry11-targets.png)
![Test](figures_tim/berry169-inputs.png) | ![Test](figures_tim/berry169-outputs.png) | ![Test](figures_tim/berry169-targets.png)
![Test](figures_tim/berry280-inputs.png) | ![Test](figures_tim/berry280-outputs.png) | ![Test](figures_tim/berry280-targets.png)
![Test](figures_tim/berry286-inputs.png) | ![Test](figures_tim/berry286-outputs.png) | ![Test](figures_tim/berry286-targets.png)
![Test](figures_tim/berry301-inputs.png) | ![Test](figures_tim/berry301-outputs.png) | ![Test](figures_tim/berry301-targets.png)
![Test](figures_tim/berry315-inputs.png) | ![Test](figures_tim/berry315-outputs.png) | ![Test](figures_tim/berry315-targets.png)
![Test](figures_tim/berry335-inputs.png) | ![Test](figures_tim/berry335-outputs.png) | ![Test](figures_tim/berry335-targets.png)
![Test](figures_tim/berry342-inputs.png) | ![Test](figures_tim/berry342-outputs.png) | ![Test](figures_tim/berry342-targets.png)
![Test](figures_tim/berry350-inputs.png) | ![Test](figures_tim/berry350-outputs.png) | ![Test](figures_tim/berry350-targets.png)
![Test](figures_tim/berry374-inputs.png) | ![Test](figures_tim/berry374-outputs.png) | ![Test](figures_tim/berry374-targets.png)
![Test](figures_tim/berry385-inputs.png) | ![Test](figures_tim/berry385-outputs.png) | ![Test](figures_tim/berry385-targets.png)
![Test](figures_tim/bird11-inputs.png) | ![Test](figures_tim/bird11-outputs.png) | ![Test](figures_tim/bird11-targets.png)
![Test](figures_tim/bird95-inputs.png) | ![Test](figures_tim/bird95-outputs.png) | ![Test](figures_tim/bird95-targets.png)
![Test](figures_tim/bird99-inputs.png) | ![Test](figures_tim/bird99-outputs.png) | ![Test](figures_tim/bird99-targets.png)
![Test](figures_tim/bird105-inputs.png) | ![Test](figures_tim/bird105-outputs.png) | ![Test](figures_tim/bird105-targets.png)
![Test](figures_tim/bird110-inputs.png) | ![Test](figures_tim/bird110-outputs.png) | ![Test](figures_tim/bird110-targets.png)
![Test](figures_tim/bird128-inputs.png) | ![Test](figures_tim/bird128-outputs.png) | ![Test](figures_tim/bird128-targets.png)
![Test](figures_tim/bird150-inputs.png) | ![Test](figures_tim/bird150-outputs.png) | ![Test](figures_tim/bird150-targets.png)
![Test](figures_tim/bird211-inputs.png) | ![Test](figures_tim/bird211-outputs.png) | ![Test](figures_tim/bird211-targets.png)
![Test](figures_tim/bird250-inputs.png) | ![Test](figures_tim/bird250-outputs.png) | ![Test](figures_tim/bird250-targets.png)
![Test](figures_tim/bird261-inputs.png) | ![Test](figures_tim/bird261-outputs.png) | ![Test](figures_tim/bird261-targets.png)
![Test](figures_tim/bird394-inputs.png) | ![Test](figures_tim/bird394-outputs.png) | ![Test](figures_tim/bird394-targets.png)
![Test](figures_tim/dog68-inputs.png) | ![Test](figures_tim/dog68-outputs.png) | ![Test](figures_tim/dog68-targets.png)
![Test](figures_tim/dog101-inputs.png) | ![Test](figures_tim/dog101-outputs.png) | ![Test](figures_tim/dog101-targets.png)
![Test](figures_tim/dog118-inputs.png) | ![Test](figures_tim/dog118-outputs.png) | ![Test](figures_tim/dog118-targets.png)
![Test](figures_tim/dog153-inputs.png) | ![Test](figures_tim/dog153-outputs.png) | ![Test](figures_tim/dog153-targets.png)
![Test](figures_tim/dog168-inputs.png) | ![Test](figures_tim/dog168-outputs.png) | ![Test](figures_tim/dog168-targets.png)
![Test](figures_tim/dog181-inputs.png) | ![Test](figures_tim/dog181-outputs.png) | ![Test](figures_tim/dog181-targets.png)
![Test](figures_tim/dog198-inputs.png) | ![Test](figures_tim/dog198-outputs.png) | ![Test](figures_tim/dog198-targets.png)
![Test](figures_tim/dog202-inputs.png) | ![Test](figures_tim/dog202-outputs.png) | ![Test](figures_tim/dog202-targets.png)
![Test](figures_tim/dog243-inputs.png) | ![Test](figures_tim/dog243-outputs.png) | ![Test](figures_tim/dog243-targets.png)
![Test](figures_tim/dog304-inputs.png) | ![Test](figures_tim/dog304-outputs.png) | ![Test](figures_tim/dog304-targets.png)
![Test](figures_tim/dog336-inputs.png) | ![Test](figures_tim/dog336-outputs.png) | ![Test](figures_tim/dog336-targets.png)
![Test](figures_tim/dog387-inputs.png) | ![Test](figures_tim/dog387-outputs.png) | ![Test](figures_tim/dog387-targets.png)
![Test](figures_tim/flower66-inputs.png) | ![Test](figures_tim/flower66-outputs.png) | ![Test](figures_tim/flower66-targets.png)
![Test](figures_tim/flower125-inputs.png) | ![Test](figures_tim/flower125-outputs.png) | ![Test](figures_tim/flower125-targets.png)
![Test](figures_tim/flower133-inputs.png) | ![Test](figures_tim/flower133-outputs.png) | ![Test](figures_tim/flower133-targets.png)
![Test](figures_tim/flower135-inputs.png) | ![Test](figures_tim/flower135-outputs.png) | ![Test](figures_tim/flower135-targets.png)
![Test](figures_tim/flower158-inputs.png) | ![Test](figures_tim/flower158-outputs.png) | ![Test](figures_tim/flower158-targets.png)
![Test](figures_tim/flower163-inputs.png) | ![Test](figures_tim/flower163-outputs.png) | ![Test](figures_tim/flower163-targets.png)
![Test](figures_tim/flower198-inputs.png) | ![Test](figures_tim/flower198-outputs.png) | ![Test](figures_tim/flower198-targets.png)
![Test](figures_tim/flower217-inputs.png) | ![Test](figures_tim/flower217-outputs.png) | ![Test](figures_tim/flower217-targets.png)
![Test](figures_tim/flower223-inputs.png) | ![Test](figures_tim/flower223-outputs.png) | ![Test](figures_tim/flower223-targets.png)
![Test](figures_tim/flower224-inputs.png) | ![Test](figures_tim/flower224-outputs.png) | ![Test](figures_tim/flower224-targets.png)
![Test](figures_tim/flower242-inputs.png) | ![Test](figures_tim/flower242-outputs.png) | ![Test](figures_tim/flower242-targets.png)
![Test](figures_tim/flower261-inputs.png) | ![Test](figures_tim/flower261-outputs.png) | ![Test](figures_tim/flower261-targets.png)
![Test](figures_tim/flower315-inputs.png) | ![Test](figures_tim/flower315-outputs.png) | ![Test](figures_tim/flower315-targets.png)
![Test](figures_tim/flower331-inputs.png) | ![Test](figures_tim/flower331-outputs.png) | ![Test](figures_tim/flower331-targets.png)
![Test](figures_tim/flower360-inputs.png) | ![Test](figures_tim/flower360-outputs.png) | ![Test](figures_tim/flower360-targets.png)
![Test](figures_tim/flower368-inputs.png) | ![Test](figures_tim/flower368-outputs.png) | ![Test](figures_tim/flower368-targets.png)
![Test](figures_tim/flower398-inputs.png) | ![Test](figures_tim/flower398-outputs.png) | ![Test](figures_tim/flower398-targets.png)
![Test](figures_tim/other55-inputs.png) | ![Test](figures_tim/other55-outputs.png) | ![Test](figures_tim/other55-targets.png)
![Test](figures_tim/other93-inputs.png) | ![Test](figures_tim/other93-outputs.png) | ![Test](figures_tim/other93-targets.png)
![Test](figures_tim/other103-inputs.png) | ![Test](figures_tim/other103-outputs.png) | ![Test](figures_tim/other103-targets.png)
![Test](figures_tim/other136-inputs.png) | ![Test](figures_tim/other136-outputs.png) | ![Test](figures_tim/other136-targets.png)
![Test](figures_tim/other143-inputs.png) | ![Test](figures_tim/other143-outputs.png) | ![Test](figures_tim/other143-targets.png)
![Test](figures_tim/other173-inputs.png) | ![Test](figures_tim/other173-outputs.png) | ![Test](figures_tim/other173-targets.png)
![Test](figures_tim/other235-inputs.png) | ![Test](figures_tim/other235-outputs.png) | ![Test](figures_tim/other235-targets.png)
![Test](figures_tim/other246-inputs.png) | ![Test](figures_tim/other246-outputs.png) | ![Test](figures_tim/other246-targets.png)
![Test](figures_tim/other260-inputs.png) | ![Test](figures_tim/other260-outputs.png) | ![Test](figures_tim/other260-targets.png)
![Test](figures_tim/other277-inputs.png) | ![Test](figures_tim/other277-outputs.png) | ![Test](figures_tim/other277-targets.png)
![Test](figures_tim/other283-inputs.png) | ![Test](figures_tim/other283-outputs.png) | ![Test](figures_tim/other283-targets.png)
![Test](figures_tim/other358-inputs.png) | ![Test](figures_tim/other358-outputs.png) | ![Test](figures_tim/other358-targets.png)
![Test](figures_tim/other366-inputs.png) | ![Test](figures_tim/other366-outputs.png) | ![Test](figures_tim/other366-targets.png)
![Test](figures_tim/other400-inputs.png) | ![Test](figures_tim/other400-outputs.png) | ![Test](figures_tim/other400-targets.png)


## References
[1] Phillip Isola et al., arXiv:1611.07004v2  
[2] Jun-Yan Zhu et al., arXiv:1703.10593v4  
[3] https://affinelayer.com/pix2pix/  
[4] https://github.com/affinelayer/pix2pix-tensorflow  
[5] https://blog.floydhub.com/colorizing-b-w-photos-with-neural-networks/  
[6] https://www.floydhub.com/emilwallner/datasets/colornet  
[7] http://chaladze.com/l5/