# GAN architectures

<img src="images/GANsnRoses.png" width="150pt"/>

Many GANs in our armory!

## [Progressive GAN](https://arxiv.org/abs/1710.10196) (Oct 2017)

Challenge: generate __high-resolution, high-detail__ pictures.

### Progressively growing the resolution

<img src="images/ProGAN_grow.webp" width="600pt"/>

### Fading between resolutions

<img src="images/ProGAN_blend.png" width="600pt"/>
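
The fade between resolutions can be sketched as a linear blend. This is a minimal NumPy illustration, not the paper's code: the names `upsample_nearest_2x`, `fade_in` and `alpha` are chosen here, and the two arrays stand in for the RGB outputs of the old and the newly added generator stage.

```python
import numpy as np

def upsample_nearest_2x(x):
    """Double the spatial resolution of an (N, C, H, W) batch by pixel repetition."""
    return x.repeat(2, axis=2).repeat(2, axis=3)

def fade_in(low_res_rgb, high_res_rgb, alpha):
    """Blend the upsampled old output with the output of the new block.

    alpha = 0 keeps only the old (upsampled) resolution,
    alpha = 1 keeps only the new resolution.
    """
    return (1.0 - alpha) * upsample_nearest_2x(low_res_rgb) + alpha * high_res_rgb

low = np.zeros((1, 3, 4, 4))   # stand-in for the already trained 4x4 stage
high = np.ones((1, 3, 8, 8))   # stand-in for the freshly added 8x8 block
for alpha in (0.0, 0.5, 1.0):
    print(alpha, fade_in(low, high, alpha).mean())
```

During training `alpha` would be ramped from 0 to 1, so the new block is introduced smoothly.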

### Tricks

- __Increased variation__ with batch statistics:  
  a __minibatch standard-deviation layer__ concatenated as an extra feature map towards the end of the discriminator.
- __Controlled weights__ with an equalized learning rate: weights are scaled at run time by the per-layer constant from [He's initialization](https://arxiv.org/abs/1502.01852).
- __Controlled feature maps__ with pixelwise feature-vector normalization in the generator.
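
The three tricks above can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions: the function names are made up here, the standard-deviation layer uses the whole batch as one group, and the weight layout `(C_out, C_in, k, k)` is an assumption of this sketch.

```python
import numpy as np

def minibatch_stddev(x, eps=1e-8):
    """Append one feature map holding the average std-dev over the batch.

    x: (N, C, H, W). Returns (N, C + 1, H, W).
    """
    std = np.sqrt(x.var(axis=0) + eps)           # (C, H, W): std over the batch
    mean_std = std.mean()                        # one scalar statistic
    extra = np.full((x.shape[0], 1) + x.shape[2:], mean_std)
    return np.concatenate([x, extra], axis=1)

def equalized_weight(w):
    """Scale weights at run time by He's constant sqrt(2 / fan_in).

    w: (C_out, C_in, k, k), assumed to be initialized from N(0, 1).
    """
    fan_in = w.shape[1] * w.shape[2] * w.shape[3]
    return w * np.sqrt(2.0 / fan_in)

def pixel_norm(x, eps=1e-8):
    """Normalize the feature vector of each pixel to unit mean square."""
    return x / np.sqrt((x ** 2).mean(axis=1, keepdims=True) + eps)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 16, 8, 8))
print(minibatch_stddev(x).shape)                 # (4, 17, 8, 8)
print((pixel_norm(x) ** 2).mean(axis=1).max())   # close to 1
```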

## [Style GAN](https://arxiv.org/abs/1812.04948) (Dec 2018)

The paper redesigns the generator; the discriminator and the loss are left unchanged.

<img src="images/StyleGAN.png" width="500pt"/>

### No input layer, latents "from the side"

- __No input layer__ for the noise;
- a __learned constant__ instead.
- Latent variables $w$ __affinely transformed__;
- noise $\sigma$ __scale transformed__ (learned per-channel scaling);
- both $w$ and $\sigma$ inserted __from the side__;
- generator controlled at __different resolutions__.

### Mapping framework from latents $z$ to styles $w$

Latents $z \in \mathcal{Z}$ are mapped non-linearly (by an 8-layer MLP) to the intermediate latent space $\mathcal{W}$.

$\mathcal{W}$ represents high-level characteristics.

In $\mathcal{W}$ the variables are more disentangled than in $\mathcal{Z}$ (as shown by the separability measure).
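
As a rough sketch of the mapping from $z$ to $w$, the MLP below uses random, untrained weights and only illustrates the shapes involved. The function names and the weight scaling are assumptions of this example; depth 8 and width 512 follow the sizes reported in the paper.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def mapping_network(z, layers):
    """Map latents z of shape (N, D) to styles w of shape (N, D).

    layers: list of (W, b) pairs with W of shape (D, D) and b of shape (D,).
    """
    h = z
    for W, b in layers:
        h = leaky_relu(h @ W + b)
    return h

rng = np.random.default_rng(0)
D, depth = 512, 8
# Random, untrained weights: for shape illustration only.
layers = [(rng.normal(size=(D, D)) * np.sqrt(2.0 / D), np.zeros(D))
          for _ in range(depth)]
z = rng.normal(size=(4, D))
w = mapping_network(z, layers)
print(w.shape)   # (4, 512)
```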

### Adaptive Instance Normalization

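Batch normalization shares statistics across the batch, which reduces per-sample variability; instance normalization, which normalizes each feature map of each sample separately, works better here.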

These are the normalization definitions:

$$\large
BN(x) = \alpha \frac{x - \mu(x)}{\sigma(x)} + \beta
$$

$$\large
IN(x_i) = \alpha \frac{x_i - \mu(x_i)}{\sigma(x_i)} + \beta
$$

$$\large
AdaIN(x_i, y) = y_{s,i} \frac{x_i - \mu(x_i)}{\sigma(x_i)} + y_{b,i}
$$

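where $y = (y_s, y_b)$ is the style (an affine transform of $w$). The scale and bias come from the style, so the normalization itself has __no more trained parameters__ ($\alpha$ and $\beta$ are gone).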
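
A minimal NumPy sketch of the AdaIN definition above. The parameter names `A` and `b` for the affine map from $w$ to $y$ are hypothetical, and the layout `(N, C, H, W)` is an assumption of this example.

```python
import numpy as np

def instance_norm(x, eps=1e-8):
    """Normalize each feature map of each sample; x has shape (N, C, H, W)."""
    mu = x.mean(axis=(2, 3), keepdims=True)
    sigma = x.std(axis=(2, 3), keepdims=True)
    return (x - mu) / (sigma + eps)

def adain(x, w, A, b):
    """AdaIN(x_i, y) = y_s,i * (x_i - mu(x_i)) / sigma(x_i) + y_b,i.

    w: (N, D) styles; A: (D, 2C) and b: (2C,) define the affine map y = w A + b.
    """
    C = x.shape[1]
    y = w @ A + b                  # (N, 2C)
    y_s = y[:, :C, None, None]     # per-channel scale
    y_b = y[:, C:, None, None]     # per-channel bias
    return y_s * instance_norm(x) + y_b

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 4, 8, 8))
w = rng.normal(size=(2, 16))
A = rng.normal(size=(16, 8)) * 0.1
b = np.concatenate([np.ones(4), np.zeros(4)])   # start near scale 1, bias 0
print(adain(x, w, A, b).shape)                  # (2, 4, 8, 8)
```

After AdaIN, the mean and standard deviation of every feature map are set by the style alone, whatever statistics the input had.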

### Mixing regularization

Generate __several $w$s__ and feed them at different resolutions, switching between them at random layers.

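FID on FFHQ by share of training samples that use mixing regularization (rows) and number of $w$s mixed at test time (columns); lower is better: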

Mixing regularization |   1   |   2   |   3   |   4
:-------------------- | :---: | :---: | :---: | :---:
0% | 4.42 | 8.22 | 12.88 | 17.41
50% | 4.41 | 6.10 | 8.71 | 11.61
90% | __4.40__ | __5.11__ | 6.88 | 9.03
100% | 4.83 | 5.17 | __6.63__ | __8.40__
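
Style mixing with two latents can be sketched as picking a random crossover layer. This is an illustrative sketch: `mix_styles`, `mixing_prob` and the choice of 18 layers with 512-dimensional styles are assumptions of the example.

```python
import numpy as np

def mix_styles(w_a, w_b, num_layers, rng, mixing_prob=0.9):
    """Return per-layer styles of shape (num_layers, D).

    With probability mixing_prob the layers before a random crossover
    point use w_a and the remaining layers use w_b; otherwise all use w_a.
    """
    styles = np.tile(w_a, (num_layers, 1))
    if rng.random() < mixing_prob:
        crossover = rng.integers(1, num_layers)   # in [1, num_layers - 1]
        styles[crossover:] = w_b
    return styles

rng = np.random.default_rng(0)
w_a, w_b = rng.normal(size=512), rng.normal(size=512)
styles = mix_styles(w_a, w_b, num_layers=18, rng=rng)
print(styles.shape)   # (18, 512)
```

Because neighbouring layers sometimes receive unrelated styles, the network cannot assume that adjacent styles are correlated.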

### Results

FID after each successive design change (lower is better):

| Method | CelebA-HQ | FFHQ |
| :----- | :-------: | :--: |
| Baseline Progressive GAN | 7.79 | 8.04 |
| + Tuning (incl. bilinear up/down) | 6.11 | 5.25 |
| + Add mapping and styles | 5.34 | 4.85 |
| + Remove traditional input | 5.07 | 4.88 |
| + Add noise inputs | __5.06__ | 4.42 |
| + Mixing regularization | 5.17 | __4.40__ |

### Mixing styles

<img src="images/StyleGAN_styles.png" width="500pt"/>


## [Style GAN 2](https://arxiv.org/abs/1912.04958) (Dec 2019)

Localized improvements to StyleGAN.

### Normalization artifacts

AdaIN produces __droplet-like artifacts__.

This is the generator's attempt to __bypass the normalization using spikes__: a strong localized spike dominates the statistics, so the signal elsewhere can be scaled freely.

<img src="images/StyleGAN2_droplets.png" width="750pt"/>

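Main change: __noise (and bias) added after normalization__ but __before the next modulation__, outside the style block.

Secondary change: normalizing and modulating the standard deviation alone is enough, so __mean normalization is not needed__ anymore.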

<img src="images/StyleGAN2.png" width="750pt"/>

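Final step: weight (de)modulation.  
Do not normalize/scale the feature maps directly; __adjust the convolution weights__ instead ($s_i$: style scale of input feature map $i$, $j$: output feature map, $k$: spatial position in the kernel):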

$$\large
w'_{ijk} = s_i \cdot w_{ijk}
$$

$$\large
w''_{ijk} = \frac{w'_{ijk}}{\sqrt{\sum_{i,k}{w'_{ijk}}^2 + \epsilon}}
$$
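
The two equations can be checked numerically. In this sketch the weight array is laid out as `w[i, j, k]` to match the indices above; this layout and the function name are assumptions of the example, and real implementations typically order the axes differently.

```python
import numpy as np

def modulate_demodulate(w, s, eps=1e-8):
    """Apply w' = s_i * w and then w'' = w' / sqrt(sum_{i,k} w'^2 + eps).

    w: (C_in, C_out, K) weights indexed as w[i, j, k]; s: (C_in,) style scales.
    """
    w1 = s[:, None, None] * w                                  # modulation
    norm = np.sqrt((w1 ** 2).sum(axis=(0, 2), keepdims=True) + eps)
    return w1 / norm                                           # demodulation

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 4, 9))         # 8 inputs, 4 outputs, 3x3 kernel flattened
s = rng.uniform(0.5, 2.0, size=8)
w2 = modulate_demodulate(w, s)
print((w2 ** 2).sum(axis=(0, 2)))      # each output feature map has unit norm
```

Whatever the style scales are, the weights feeding each output feature map end up with unit L2 norm, which is what replaces the explicit normalization of the activations.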

### Better progressive growing

Progressive growing generates artifacts:

<img src="images/StyleGAN_issues.gif" width="600pt"/>

These are mitigated by a revised network architecture (skip connections and residual blocks instead of progressive growing):

<img src="images/StyleGAN_progressive.png" width="600pt"/>

## [Style GAN 2 ADA](https://arxiv.org/abs/2006.06676) (Jun 2020)

Problem: with a small dataset __D memorizes the data__ (overfitting).

Solution: __augment the dataset__.

Problem: the __generator learns to produce augmented images__ ("leaking" augmentations).

Solution: __augment both reals and fakes__ before they reach D. The generator is trained through the augmentation, so this requires __differentiable augmentations__.

<img src="images/StyleGAN2-ADA.png" width="750pt"/>
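
The data flow can be sketched as follows: the same stochastic augmentation, applied with probability $p$, sits in front of the discriminator for both real and generated batches. The function `augment` and the use of a horizontal flip are illustrative choices; NumPy has no gradients, so an actual training loop would express this in an autodiff framework.

```python
import numpy as np

def augment(images, p, rng):
    """Flip each image horizontally with probability p; images: (N, C, H, W)."""
    flip = rng.random(images.shape[0]) < p
    out = images.copy()
    out[flip] = out[flip, :, :, ::-1]
    return out

rng = np.random.default_rng(0)
reals = rng.normal(size=(8, 3, 16, 16))   # stand-in for a batch of training images
fakes = rng.normal(size=(8, 3, 16, 16))   # stand-in for generator outputs
p = 0.5
d_input_real = augment(reals, p, rng)     # what D sees for real images
d_input_fake = augment(fakes, p, rng)     # what D sees for generated images
print(d_input_real.shape, d_input_fake.shape)
```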

### Leaking augmentations

Whether an augmentation leaks depends on its type and on the probability $p$ with which it is applied:

<img src="images/StyleGAN2-ADA_leaking.png" width="750pt"/>

## [Style GAN 3, Alias-Free GAN](https://arxiv.org/abs/2106.12423) (Jun 2021)

### The problem

> Textures produced by the generator are not translation/rotation equivariant: details stick to pixel coordinates instead of following the depicted surfaces.

<img src="images/StyleGAN3_alias01.gif" width="750"/>
<img src="images/StyleGAN3_alias02.gif" width="750"/>

Watch some videos [here](https://nvlabs.github.io/stylegan3/).

### Continuous signal interpretation (and bandlimit requirement)

By the Nyquist-Shannon sampling theorem:

> A band-limited signal can be fully reconstructed from its samples if the sampling rate is more than 2 times its maximum frequency.

Consider the image $Z$ as the continuous image signal $z$ multiplied by a sampling grid of Dirac deltas $III_s$ with sampling rate $s$.

$$\Large
Z = z \odot III_s
$$

The opposite direction is obtained by convolution with the 2D sinc function $\phi_s$ (the ideal interpolation filter):

$$\Large
z = Z * \phi_s
$$

<img src="images/StyleGAN3_continuous.png" width="600"/>
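
A 1D analogue of $z = Z * \phi_s$ can be tried numerically. The sketch below samples a band-limited sine and rebuilds it between the samples with a sinc kernel; the signal, the sampling rate and the function name `reconstruct` are choices of this example, and the sum is truncated to a finite number of samples, so the result is only approximate.

```python
import numpy as np

s = 8.0                                    # sampling rate (samples per unit length)
n = np.arange(256)                         # sample indices

def signal(t):
    """Continuous signal z: a 1 Hz sine, well below the s / 2 limit."""
    return np.sin(2 * np.pi * 1.0 * t)

Z = signal(n / s)                          # discrete samples Z

def reconstruct(Z, s, t):
    """z(t) = sum_n Z[n] * sinc(s * t - n); np.sinc is sin(pi x) / (pi x)."""
    t = np.atleast_1d(t)
    idx = np.arange(len(Z))
    return (Z[None, :] * np.sinc(s * t[:, None] - idx[None, :])).sum(axis=1)

t = np.linspace(15.0, 17.0, 50)            # points between samples, far from the edges
err = np.abs(reconstruct(Z, s, t) - signal(t)).max()
print(err)                                 # small; limited by truncating the sum
```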

Continuous and equivalent discrete operators $f$ and $F$ are related by:

$$\Large
f(z) = F(z \odot III_s) * \phi_s \qquad F(Z) = f(Z * \phi_s) \odot III_s 
$$

### Equivariance

Given the input, __we wish equivariance under translation and rotation__.

An operator $f$ is equivariant with respect to a transformation $t$ if the two commute: $t\cdot f = f\cdot t$.
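
The commutation property is easy to test numerically. In this 1D sketch, translation is a circular shift, $f$ is a circular convolution, and a position-dependent gain serves as a counterexample; all names are made up for the illustration.

```python
import numpy as np

def circular_conv(x, k):
    """1D circular convolution computed via the FFT."""
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(k, n=len(x))))

def translate(x, shift=3):
    """The transformation t: a circular shift."""
    return np.roll(x, shift)

rng = np.random.default_rng(0)
x = rng.normal(size=32)
k = rng.normal(size=5)

def conv(v):
    return circular_conv(v, k)

def position_gain(v):
    """Depends on the absolute position, so it should not commute with shifts."""
    return v * np.arange(len(v))

print(np.allclose(translate(conv(x)), conv(translate(x))))                    # True
print(np.allclose(translate(position_gain(x)), position_gain(translate(x))))  # False
```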

### Operators

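- __Convolution__:
  - respects the bandwidth requirement and is translation equivariant by nature,
  - rotation equivariance requires radially symmetric kernels, obtained here with 1x1 convolutions.
- __Upsample/Downsample__:
  - translation equivariant with an ideal low-pass filter, rotation equivariant if the filter is also radially symmetric,
  - implemented as zero-interleave then convolve (up) and convolve then drop samples (down).
- __Nonlinearity__:
  - pointwise operations are translation/rotation equivariant in the continuous domain,
  - they create new high frequencies, so the bandwidth limit is managed by changing resolution: upsample, apply the nonlinearity, low-pass filter, downsample.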
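
The handling of the nonlinearity can be sketched in 1D. This example uses ideal FFT-based resampling on a periodic signal of even length, which is an assumption made for brevity and not the filter design used in the paper; the function names are made up here.

```python
import numpy as np

def upsample(x, factor):
    """Ideal band-limited upsampling by zero-padding the spectrum (even len(x))."""
    n = len(x)
    spectrum = np.zeros(n * factor // 2 + 1, dtype=complex)
    spectrum[: n // 2 + 1] = np.fft.rfft(x)
    return np.fft.irfft(spectrum, n=n * factor) * factor

def lowpass_downsample(x, factor):
    """Ideal low-pass filter and decimation in one step: truncate the spectrum."""
    n = len(x) // factor
    spectrum = np.fft.rfft(x)[: n // 2 + 1]
    return np.fft.irfft(spectrum, n=n) / factor

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def filtered_nonlinearity(x, factor=2):
    """Upsample, apply the pointwise nonlinearity, low-pass filter, downsample."""
    return lowpass_downsample(leaky_relu(upsample(x, factor)), factor)

rng = np.random.default_rng(0)
x = rng.normal(size=32)
y = filtered_nonlinearity(x)
print(y.shape)   # (32,)
```

The nonlinearity runs at a higher sampling rate, where the frequencies it creates can be represented, and the low-pass step removes what the original rate cannot carry before the extra samples are dropped.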

### Changed structure

- The learned 4x4x1024 input constant is replaced by Fourier features (same sampling space but representation in 36x36x1024).
- No noise inputs.
- Fewer layers (14 vs. 18).
- Low-pass filtering with a radially symmetric jinc ("sombrero") filter.

<img src="images/StyleGAN3.png" width="600"/>
<center><a href="https://medium.com/@steinsfu/stylegan3-clearly-explained-793edbcc8048">Steins</a></center>