## What is CycleGAN

CycleGAN is a technique for automatically training image-to-image translation models without paired examples. The models are trained in an unsupervised manner using one collection of images from the source domain and another from the target domain; the two collections do not need to be related in any way.

The CycleGAN is an extension of the GAN architecture that involves the simultaneous training of two generator models and two discriminator models.

One generator takes images from the first domain as input and outputs images for the second domain, and the other generator takes images from the second domain as input and generates images for the first domain.

Discriminator models are then used to determine how plausible the generated images are and update the generator models accordingly.

CycleGAN can transfer the characteristics of one image domain to another, that is, map one distribution of images onto another. It treats the task as an image reconstruction problem: we first take an input image (x) and use generator G to translate it into the other domain, then reverse the process with generator F to reconstruct the original image. Finally, we compute a reconstruction loss between the real and reconstructed images (the original CycleGAN paper uses the L1, or mean absolute error, distance here rather than mean squared error).


## What problem CycleGAN solves

![Imgur](https://imgur.com/GdbPyH7.png)


Traditionally, training an image-to-image translation model requires a dataset comprised of paired examples. That is, a large dataset of many examples of input images X (e.g. summer landscapes) and the same image with the desired modification that can be used as an expected output image Y (e.g. winter landscapes).

The requirement for a paired training dataset is a limitation. These datasets are challenging and expensive to prepare, e.g. photos of different scenes under different conditions.

While there has been a great deal of research into this task, most of it has utilized supervised training, where we have access to (x, y) pairs of corresponding images from the two domains we want to learn to translate between.

The key insight of the UC Berkeley group behind CycleGAN was that we do not, in fact, need perfect pairs.[1] Instead, we simply complete the cycle: we translate from one domain to another and then back again. For example, we go from a summer picture of a park (domain A) to a winter one (domain B) and then back again to summer (domain A).

Now we have essentially created a cycle, and, ideally, the original picture (a) and the reconstructed picture (â) are the same.

If they are not, we can measure their loss on a pixel level, thereby getting the first loss of our CycleGAN: cycle-consistency loss.

The most important feature of the CycleGAN architecture is that, even with unpaired images from two domains and no direct mapping between them, a CycleGAN model should still be able to translate images between those domains.

---

The model architecture is comprised of two generator models: one generator (Generator-A) for generating images for the first domain (Domain-A) and the second generator (Generator-B) for generating images for the second domain (Domain-B).

* Generator-A -> Domain-A
* Generator-B -> Domain-B


The generator models perform image translation, meaning that the image generation process is conditional on an input image, specifically an image from the other domain. Generator-A takes an image from Domain-B as input and Generator-B takes an image from Domain-A as input.

* Domain-B -> Generator-A -> Domain-A
* Domain-A -> Generator-B -> Domain-B


The first discriminator model (Discriminator-A) takes real images from Domain-A and generated images from Generator-A and predicts whether they are real or fake.

The second discriminator model (Discriminator-B) takes real images from Domain-B and generated images from Generator-B and predicts whether they are real or fake.

* Domain-A -> Discriminator-A -> [Real/Fake]
* Domain-B -> Generator-A -> Discriminator-A -> [Real/Fake]
* Domain-B -> Discriminator-B -> [Real/Fake]
* Domain-A -> Generator-B -> Discriminator-B -> [Real/Fake]
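The data flow above can be traced with a toy sketch. The "images" here are hypothetical dictionaries that only record a domain and an origin, and the two functions are placeholders for real neural networks; the point is only to show which generator consumes which domain and what each discriminator ends up seeing.

```python
# Hypothetical stand-ins: an "image" is just a dict recording its domain
# and whether it is real or generated. Real generators are neural networks.
def generator_a(image):
    """Generator-A: takes a Domain-B image and produces a Domain-A image."""
    if image["domain"] != "B":
        raise ValueError("Generator-A expects an image from Domain-B")
    return {"domain": "A", "origin": "generated"}


def generator_b(image):
    """Generator-B: takes a Domain-A image and produces a Domain-B image."""
    if image["domain"] != "A":
        raise ValueError("Generator-B expects an image from Domain-A")
    return {"domain": "B", "origin": "generated"}


real_a = {"domain": "A", "origin": "real"}
real_b = {"domain": "B", "origin": "real"}

# Each discriminator sees real and generated images of its own domain only.
discriminator_a_inputs = [real_a, generator_a(real_b)]
discriminator_b_inputs = [real_b, generator_b(real_a)]

for name, batch in (("Discriminator-A", discriminator_a_inputs),
                    ("Discriminator-B", discriminator_b_inputs)):
    print(name, [(img["domain"], img["origin"]) for img in batch])
```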

The discriminator model is responsible for taking a real or generated image as input and predicting whether it is real or fake.

The discriminator model is implemented as a PatchGAN model.

The difference between a PatchGAN and a regular GAN discriminator is that, given for example 256x256 images, the regular discriminator maps a 256x256 image to a single scalar output that signifies "real" or "fake", whereas the PatchGAN maps a 256x256 image to an NxN array of outputs X, where each `X_ij` signifies whether patch `ij` of the image is real or fake. N depends on the dimensions of the input image.

And what would be the patch `ij` in the input? Well, output `X_ij` is just a neuron in a convnet, and we can trace back its receptive field to see which input pixels it is sensitive to.

## What is PatchGAN

![Imgur](https://imgur.com/ltsuAV3.png)

#### In the CycleGAN architecture, the receptive field of each discriminator output turns out to be a 70x70 patch of the input image!

This means that, for a 256x256 input, the PatchGAN outputs a feature map of roughly 30x30 points. Each point on the feature map can see a 70x70-pixel patch of the input (this is called the receptive field size).
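The 70x70 figure can be checked by tracing the receptive field backwards through the convolution layers. The sketch below assumes the layer layout commonly used for this discriminator (five 4x4 convolutions, the first three with stride 2 and the last two with stride 1); that layout is an assumption here, not something stated in the text above.

```python
# Assumed layer layout: (kernel_size, stride) for each convolution in the
# commonly used 70x70 PatchGAN discriminator.
PATCHGAN_LAYERS = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]


def receptive_field(layers):
    """Return the side length of the input patch seen by one output value."""
    size = 1  # start from a single output neuron
    # Walk backwards from the output towards the input: each layer widens
    # the patch by its kernel and scales it by its stride.
    for kernel, stride in reversed(layers):
        size = (size - 1) * stride + kernel
    return size


print(receptive_field(PATCHGAN_LAYERS))  # 70
```

Going backwards, the patch grows as 1, 4, 7, 16, 34, 70, which is where the name "70x70 PatchGAN" comes from.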


So to be precise, the PatchGAN architecture is equivalent to chopping up the image into 70x70 patches, making a big batch out of these patches, and running a discriminator on each patch, with batchnorm applied across the batch, then averaging the results.


The advantage of a PatchGAN over a regular GAN discriminator is that it has fewer parameters and, being fully convolutional, can be applied to input images of different sizes, e.g. larger or smaller than 256×256 pixels.


The output of the model depends on the size of the input image and may be a single value or a square activation map of values. Each value scores how likely it is that the corresponding patch of the input image is real. These values can be averaged to give an overall likelihood or classification score if needed.
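To make the dependence on input size concrete, here is a minimal sketch that computes the side length of the output map and averages a small grid of patch scores. It assumes 4x4 kernels with padding 1 and strides of 2, 2, 2, 1, 1, as in common reference implementations; the function names are made up for this example.

```python
def conv_output_size(size, kernel=4, stride=2, padding=1):
    """Spatial size after one convolution (standard floor formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def patch_map_size(input_size, strides=(2, 2, 2, 1, 1)):
    """Side length of the PatchGAN output map for a square input."""
    size = input_size
    for stride in strides:
        size = conv_output_size(size, stride=stride)
    return size


def overall_score(patch_scores):
    """Average a 2-D grid of per-patch scores into a single number."""
    flat = [score for row in patch_scores for score in row]
    return sum(flat) / len(flat)


# Prints: 128 -> 14, 256 -> 30, 512 -> 62
for side in (128, 256, 512):
    print(side, "->", patch_map_size(side))

# A toy 2x2 grid of patch scores reduced to one overall score.
print(overall_score([[0.9, 0.8], [0.7, 0.6]]))
```

Under these assumptions a 256x256 input yields the 30x30 map mentioned earlier, and the same network produces a smaller or larger map for other input sizes without any change to its weights.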

![Imgur](https://imgur.com/BsVr7vn.png)


---


## Cycle Consistency and the Concept of Translation


The CycleGAN adds an extension to the standard GAN architecture called cycle consistency. This is the idea that an image output by the first generator could be used as input to the second generator, and the output of the second generator should match the original image.

The reverse is also true: that an output from the second generator can be fed as input to the first generator and the result should match the input to the second generator.

Cycle consistency is a concept from machine translation where a phrase translated from English to French should translate from French back to English and be identical to the original phrase. The reverse process should also be true.


To use the cycle-consistency loss, we need two generators: one translating from A to B, called `G_AB` (sometimes referred to simply as G), and another translating from B to A, called `G_BA` (referred to as F for brevity). There are technically two losses: a forward cycle-consistency loss and a backward cycle-consistency loss.

---


## Forward Cycle Consistency Loss

* Input photo of summer (collection 1) to GAN 1
* Output photo of winter from GAN 1
* Input photo of winter from GAN 1 to GAN 2
* Output photo of summer from GAN 2
* Compare photo of summer (collection 1) to photo of summer from GAN 2


## Backward Cycle Consistency Loss
* Input photo of winter (collection 2) to GAN 2
* Output photo of summer from GAN 2
* Input photo of summer from GAN 2 to GAN 1
* Output photo of winter from GAN 1
* Compare photo of winter (collection 2) to photo of winter from GAN 1
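
The two procedures above can be sketched with toy generators. Everything here is a stand-in: images are flat lists of pixel values, "winter" is modelled as a darker image, the function names are hypothetical, and the comparison uses an L1 (mean absolute) distance, which is an assumption based on the original CycleGAN paper.

```python
def l1_loss(image, reconstruction):
    """Mean absolute difference between two equally sized flat images."""
    return sum(abs(a - b) for a, b in zip(image, reconstruction)) / len(image)


# Toy stand-ins for the two generators (hypothetical, for illustration only).
def summer_to_winter(image):   # plays the role of "GAN 1"
    return [pixel - 0.25 for pixel in image]


def winter_to_summer(image):   # plays the role of "GAN 2"
    return [pixel + 0.125 for pixel in image]  # deliberately imperfect inverse


def forward_cycle_loss(summer):
    """summer -> GAN 1 -> winter -> GAN 2 -> summer, compared to the input."""
    reconstructed = winter_to_summer(summer_to_winter(summer))
    return l1_loss(summer, reconstructed)


def backward_cycle_loss(winter):
    """winter -> GAN 2 -> summer -> GAN 1 -> winter, compared to the input."""
    reconstructed = summer_to_winter(winter_to_summer(winter))
    return l1_loss(winter, reconstructed)


summer_photo = [0.5, 0.75, 1.0]
winter_photo = [0.25, 0.5, 0.75]
print(forward_cycle_loss(summer_photo))   # 0.125
print(backward_cycle_loss(winter_photo))  # 0.125
```

Because the second toy generator only undoes half of the darkening, both cycles leave a residual of 0.125 per pixel; a perfect pair of inverse generators would give a loss of zero.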

---

## Identity Loss

The idea of identity loss is simple: we want to enforce that
CycleGAN preserves the overall color structure (or
temperature) of the picture. So we introduce a regularization
term that helps us keep the tint of the picture consistent with
the original image. Imagine this as a way of ensuring that even
after applying many filters onto your image, you still can
recover the original image.


This is done by feeding the images already in domain A to the
Generator from B to A (`G_BA`), because the CycleGAN should
understand that they are already in the correct domain. In other
words, we penalize unnecessary changes to the image: if we
feed in a zebra and are trying to “zebrafy” an image, we get
the same zebra back, as there is nothing to do.
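
A minimal sketch of this idea, assuming an L1 distance and using two made-up toy generators in place of the real B-to-A network: the loss is zero when the generator leaves a domain-A image alone and grows as the generator tints it.

```python
def l1_loss(image, output):
    """Mean absolute difference between two equally sized flat images."""
    return sum(abs(a - b) for a, b in zip(image, output)) / len(image)


def identity_loss(generator_b_to_a, real_a):
    """Penalise changes the B-to-A generator makes to an image already in A."""
    return l1_loss(real_a, generator_b_to_a(real_a))


# Hypothetical generators, for illustration only.
def well_behaved(image):
    return list(image)                 # returns the zebra unchanged


def tinting(image):
    return [pixel + 0.25 for pixel in image]  # shifts every pixel's brightness


zebra = [0.0, 0.5, 0.25]
print(identity_loss(well_behaved, zebra))  # 0.0
print(identity_loss(tinting, zebra))       # 0.25
```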

---

### The Objective Function

There are two components to the CycleGAN objective function, an adversarial loss and a cycle consistency loss. Both are essential to getting good results.
If you are familiar with GANs, the adversarial loss should come as no surprise. Both generators are attempting to “fool” their corresponding discriminator into being less able to distinguish their generated images from the real versions. We use the least squares loss (found by Mao et al. to be more effective than the typical log-likelihood loss) to capture this.


![Imgur](https://imgur.com/D4O3pIQ.png)
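
As a rough illustration of the least squares adversarial loss, the sketch below works directly on lists of discriminator scores. The function names are hypothetical, and the targets of 1 for real and 0 for fake are the usual convention rather than something specified above; some implementations also scale the discriminator loss by 0.5.

```python
def mean(values):
    values = list(values)
    return sum(values) / len(values)


def discriminator_loss(scores_on_real, scores_on_fake):
    """Least squares loss: push real scores towards 1 and fake scores towards 0."""
    real_term = mean((score - 1.0) ** 2 for score in scores_on_real)
    fake_term = mean(score ** 2 for score in scores_on_fake)
    return real_term + fake_term


def generator_loss(scores_on_fake):
    """Least squares loss: the generator wants its fakes scored as 1 (real)."""
    return mean((score - 1.0) ** 2 for score in scores_on_fake)


scores_on_real = [1.0, 0.5]   # discriminator outputs for real images
scores_on_fake = [0.0, 0.5]   # discriminator outputs for generated images
print(discriminator_loss(scores_on_real, scores_on_fake))  # 0.25
print(generator_loss(scores_on_fake))                      # 0.625
```

With a PatchGAN discriminator, each list would hold the per-patch scores of the output map rather than one score per image.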

However, the adversarial loss alone is not sufficient to produce good images, as it leaves the model under-constrained. It enforces that the generated output be of the appropriate domain, but does not enforce that the input and output are recognizably the same. For example, a generator that outputs an image y that is an excellent example of that domain, but looks nothing like x, would do well by the standard of the adversarial loss, despite not giving us what we really want.
The cycle consistency loss addresses this issue. It relies on the expectation that if you convert an image to the other domain and back again, by successively feeding it through both generators, you should get back something similar to what you put in. It enforces that F(G(x)) ≈ x and G(F(y)) ≈ y.


![Imgur](https://imgur.com/J4kru05.png)

We can create the full objective function by putting these loss terms together, and weighting the cycle consistency loss by a hyperparameter λ. The original paper sets λ = 10.

![Imgur](https://imgur.com/LrU8Xl7.png)
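
For readers who cannot view the images, the cycle consistency loss and the full objective can be written out as text. This follows the formulation in the original CycleGAN paper as best recalled; the symbols `D_X` and `D_Y` (the discriminators for domains X and Y) are not named in the text above, and the optional identity loss is left out.

```latex
\mathcal{L}_{\text{cyc}}(G, F) =
  \mathbb{E}_{x}\big[\lVert F(G(x)) - x \rVert_1\big]
  + \mathbb{E}_{y}\big[\lVert G(F(y)) - y \rVert_1\big]

\mathcal{L}(G, F, D_X, D_Y) =
  \mathcal{L}_{\text{GAN}}(G, D_Y, X, Y)
  + \mathcal{L}_{\text{GAN}}(F, D_X, Y, X)
  + \lambda \, \mathcal{L}_{\text{cyc}}(G, F)
```

The first term of the cycle loss corresponds to F(G(x)) ≈ x and the second to G(F(y)) ≈ y, and λ controls how strongly both are weighted against the two adversarial terms.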
