# Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks
This note is a section-by-section summary of the [paper](https://arxiv.org/abs/1703.10593) based on personal understanding. Some personal remarks appear at the end of the note.



## 1. Introduction

The goal is to learn a mapping $G:X \rightarrow Y$ such that the distribution of $G(X)$ is indistinguishable from the distribution of $Y$. However, infinitely many mappings satisfy this constraint, and adversarial training on its own is prone to mode collapse. Matching distributions therefore does not guarantee that an individual input $x$ and output $y$ are paired up in a meaningful way. Solution: add an inverse mapping $F:Y \rightarrow X$ and a **cycle consistency loss** that encourages $F(G(x))\simeq x$ and $G(F(y))\simeq y$.


<br/><br/>
## 2. Related work
...




## 3. Formulation

#### 3.1. Adversarial Loss

$$
\mathcal{L}_{GAN} (G,D_Y,X,Y) = \mathbb{E}_{y\sim p_{data}(y)}\left[ \log D_Y(y)\right] + \mathbb{E}_{x\sim p_{data}(x)}\left[ \log (1 - D_Y(G(x)))\right]
$$

Train by solving $\min_G \max_{D_Y} \mathcal{L}_{GAN} (G, D_Y, X, Y)$, and similarly for $F:Y\rightarrow X$ with its discriminator $D_X$.
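
As a rough illustration of the adversarial loss above, the sketch below estimates $\mathcal{L}_{GAN}$ from a batch of discriminator outputs. The function name `gan_loss` and the toy scores are assumptions made for this example; the discriminator outputs are assumed to be probabilities strictly between 0 and 1.

```python
import math

def gan_loss(d_real, d_fake):
    """Monte Carlo estimate of L_GAN from discriminator outputs in (0, 1).

    d_real: D_Y(y) for real samples y.
    d_fake: D_Y(G(x)) for translated samples G(x).
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

# A discriminator that cannot tell real from fake outputs 0.5 everywhere.
confused = gan_loss([0.5, 0.5], [0.5, 0.5])

# A confident, correct discriminator pushes the value toward its maximum of 0.
confident = gan_loss([0.99, 0.98], [0.01, 0.02])
```

$D_Y$ tries to raise this value (toward 0) while $G$ tries to lower it, which is the min-max game stated above.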



#### 3.2. Cycle Consistency Loss

$$
\mathcal{L}_{cyc}(G,F) = \mathbb{E}_{x\sim p_{data}(x)} \left[ ||F(G(x))-x||_1\right] + \mathbb{E}_{y\sim p_{data}(y)} \left[ ||G(F(y))-y||_1\right]
$$
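
A minimal sketch of the cycle consistency loss on toy data, assuming images are flat lists of pixel values. The toy mappings `G`, `F_exact` and `F_lossy` are hypothetical stand-ins for the generators, and the L1 norm is summed over pixels as in the formula (implementations often average instead).

```python
def l1_distance(a, b):
    """L1 norm of the difference between two equally sized flat images."""
    return sum(abs(p - q) for p, q in zip(a, b))

def cycle_loss(G, F, xs, ys):
    """Monte Carlo estimate of L_cyc over samples xs from X and ys from Y."""
    forward = sum(l1_distance(F(G(x)), x) for x in xs) / len(xs)
    backward = sum(l1_distance(G(F(y)), y) for y in ys) / len(ys)
    return forward + backward

# Toy, hypothetical "generators" acting on flat lists of pixel values.
def G(x):          # X -> Y: brighten every pixel
    return [p + 1.0 for p in x]

def F_exact(y):    # Y -> X: exact inverse of G
    return [p - 1.0 for p in y]

def F_lossy(y):    # Y -> X: ignores its input (a collapsed mapping)
    return [0.0 for _ in y]

xs = [[0.0, 1.0], [2.0, 3.0]]
ys = [[1.0, 2.0], [3.0, 4.0]]

loss_exact = cycle_loss(G, F_exact, xs, ys)  # inverse pair: no penalty
loss_lossy = cycle_loss(G, F_lossy, xs, ys)  # collapsed F is penalized
```

The collapsed mapping could still fool a discriminator with a plausible output, but it cannot reconstruct its input, so only the cycle term penalizes it.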



#### 3.3. Full objective

$$
\mathcal{L} (G,F,D_X,D_Y,X,Y) = \mathcal{L}_{GAN} (G,D_Y,X,Y) + \mathcal{L}_{GAN} (F,D_X,Y,X) + \lambda\mathcal{L}_{cyc}(G,F)
$$

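Such a setup can also be seen as a special case of “adversarial autoencoders”: $F\circ G$ and $G\circ F$ each act as an autoencoder whose intermediate representation is pushed by the adversarial loss to look like an image from the other domain.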

<br/><br/>
## 4. Implementation

#### Network Architecture
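- *Generator*: three convolutions, several residual blocks, two fractionally-strided convolutions with stride 1/2, and a final convolution that maps features to RGB. Uses instance normalization.
- *Discriminator*: 70x70 PatchGANs, which classify whether overlapping 70x70 image patches are real or fake.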
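
The "70x70" in the discriminator bullet is the receptive field of one output unit. The arithmetic can be checked with the sketch below. The layer layout (4x4 kernels, three stride-2 convolutions followed by two stride-1 convolutions) is an assumption based on the commonly used pix2pix-style PatchGAN and is not stated in this note.

```python
def receptive_field(layers):
    """Receptive field of one output unit for a stack of (kernel, stride) convs."""
    rf = 1
    # Walk from the output back to the input, growing the field layer by layer.
    for kernel, stride in reversed(layers):
        rf = (rf - 1) * stride + kernel
    return rf

# Assumed layout: 4x4 kernels, strides 2, 2, 2, 1, 1.
patchgan_layers = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]
rf = receptive_field(patchgan_layers)
```

Under that assumed layout, `rf` comes out to 70, so each discriminator output judges one 70x70 patch of the input image.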

#### Training details
- replace the negative log-likelihood in $\mathcal{L}_{GAN}$ with a least-squares loss:
    - $G$ minimizes $\mathbb{E}_{x\sim p_{data}(x)} \left[ (D(G(x))-1)^2 \right]$
    - $D$ minimizes $\mathbb{E}_{y\sim p_{data}(y)} \left[ (D(y)-1)^2\right] + \mathbb{E}_{x\sim p_{data}(x)} \left[ D(G(x))^2\right]$
    
    
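- reduce model oscillation following [this](https://arxiv.org/abs/1612.07828) paper, which proposes:
    - a *self-regularization* term,
    - a local adversarial loss,
    - updating the discriminator using a history of refined images (this is the strategy adopted here: the discriminators are updated with a buffer of 50 previously generated images rather than only the latest ones).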
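
The two training tricks above can be sketched in plain Python. The least-squares functions follow the formulas in this section. The `ImagePool` class is a hypothetical illustration of an image history: the default capacity of 50 follows the buffer size stated in the paper, while the 50% swap rule is an assumption borrowed from common implementations.

```python
import random

def lsgan_generator_loss(d_fake):
    """Mean of (D(G(x)) - 1)^2 over a batch of discriminator scores."""
    return sum((s - 1.0) ** 2 for s in d_fake) / len(d_fake)

def lsgan_discriminator_loss(d_real, d_fake):
    """Mean of (D(y) - 1)^2 plus mean of D(G(x))^2."""
    real_term = sum((s - 1.0) ** 2 for s in d_real) / len(d_real)
    fake_term = sum(s ** 2 for s in d_fake) / len(d_fake)
    return real_term + fake_term

class ImagePool:
    """History of generated images used when updating the discriminator."""

    def __init__(self, capacity=50, rng=None):
        self.capacity = capacity
        self.images = []
        self.rng = rng or random.Random()

    def query(self, image):
        """Store `image` and return the image to show the discriminator."""
        if self.capacity == 0:
            return image                   # no history: pass through
        if len(self.images) < self.capacity:
            self.images.append(image)      # buffer not full: keep and return
            return image
        if self.rng.random() < 0.5:        # assumed 50% swap rule
            idx = self.rng.randrange(self.capacity)
            old, self.images[idx] = self.images[idx], image
            return old                     # return an older image instead
        return image                       # otherwise use the new image

# Strings stand in for generated images in this toy run.
pool = ImagePool(capacity=2, rng=random.Random(0))
shown = [pool.query(f"fake_{i}") for i in range(6)]
```

Because the discriminator sometimes sees older fakes, it cannot overfit to the generator's most recent outputs, which is the intended damping effect.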
    
#### Others
- $\lambda = 10$
- Adam with batch size 1
- lr = 0.0002 for the first 100 epochs, then linearly decayed to zero over the next 100 epochs
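
The learning rate schedule above can be written as a small function. The function name, the 0-indexed epoch convention, and the exact point where decay starts are assumptions made for this sketch.

```python
def learning_rate(epoch, base_lr=0.0002, constant_epochs=100, decay_epochs=100):
    """Learning rate for a 0-indexed epoch: constant, then linear decay to zero."""
    if epoch < constant_epochs:
        return base_lr
    remaining = constant_epochs + decay_epochs - epoch
    return base_lr * max(0.0, remaining / decay_epochs)

# Sample the schedule at the start, at the midpoint of the decay, and at the end.
schedule = [learning_rate(e) for e in (0, 99, 150, 200)]
```

In a framework such as PyTorch, the same multiplier could be passed to a lambda-based scheduler; the pure-Python form is used here only to make the arithmetic explicit.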


<br/><br/>
## 5. Results
...





## 6. Limitations and Discussion


- little success on tasks that require geometric changes (e.g. dog$\rightarrow$cat translation resulted in minimal changes to the input)
- a lingering gap remains between the results achievable with paired training data and those achieved by the unpaired method
    - in some cases this gap may be very hard, or even impossible, to close