## Gaussian Autoregressive Normalizing Flows

<center><img src="pics/flows_how2.png" width=800 /></center>



TODO: a few words about objective

* $f = f_{K} \circ f_{K - 1} \circ \dots \circ f_1$, where each $f_k$ is a diffeomorphism.

* $f^{-1} = g = g_1 \circ g_2 \circ \dots \circ g_{K}$, where $g_{k} = f_{k}^{-1}$.
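For reference, the change-of-variables formula that this composition relies on can be written as below. This is the standard result rather than something stated in these notes; $p_z$ denotes the base density, and $J_{f_k}$ is the Jacobian of $f_k$ evaluated at the output of $f_{k-1} \circ \dots \circ f_1$.

$$
    \begin{aligned}
        \log p(x \mid \theta) &= \log p_z(f(x, \theta)) + \log | \det J_f |, \\
        \log | \det J_f | &= \sum_{k=1}^{K} \log | \det J_{f_k} |
    \end{aligned}
$$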

For a Gaussian autoregressive NF, $f_k$ and $g_k$ are given by the following formulas:
$$
    \begin{aligned}
        g_k: x_j &= \sigma_j(x_{1:j-1}) z_j + \mu_j(x_{1:j-1}), \quad z_j \sim N(0, 1)\\
        f_k:      z_j &=  (x_j - \mu_j(x_{1:j-1})) \frac{1}{\sigma_j(x_{1:j-1})}
    \end{aligned}
$$

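$g_k$ is sequential ($x_j$ requires the previously generated $x_{1:j-1}$), while $f_k$ is not sequential (all of $x$ is known, so every $z_j$ can be computed in parallel).

As a result, we have 1) slow autoregressive sampling and 2) fast likelihood computation.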
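A minimal sketch of this asymmetry, assuming PyTorch. The conditioner `cond` is a hypothetical toy (fixed strictly lower-triangular linear maps, not part of these notes), chosen only so that $\mu_j$ and $\log \sigma_j$ depend on $x_{1:j-1}$ alone:

```python
import torch

torch.manual_seed(0)
D = 4
# Hypothetical toy conditioner: strictly lower-triangular weights guarantee
# that mu_j and log_sigma_j depend only on x_{1:j-1}.
W_mu = torch.tril(torch.randn(D, D, dtype=torch.float64), diagonal=-1)
W_ls = torch.tril(0.1 * torch.randn(D, D, dtype=torch.float64), diagonal=-1)

def cond(x):
    """Return mu(x), log_sigma(x) for a batch x of shape (bs, D)."""
    return x @ W_mu.T, x @ W_ls.T

def f(x):
    """x -> z: all coordinates in one pass, because x is fully known."""
    mu, log_sigma = cond(x)
    return (x - mu) * torch.exp(-log_sigma)

def g(z):
    """z -> x: one coordinate per step, because x_j needs x_{1:j-1}."""
    x = torch.zeros_like(z)
    for j in range(z.shape[1]):
        mu, log_sigma = cond(x)  # only columns < j of x influence entry j
        x[:, j] = torch.exp(log_sigma[:, j]) * z[:, j] + mu[:, j]
    return x

z = torch.randn(3, D, dtype=torch.float64)
x = g(z)      # D sequential steps
z_rec = f(x)  # a single parallel pass
```

Here `z_rec` should match `z` up to floating-point error, which confirms that the two maps are inverses of each other.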

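If instead $\mu_j$ and $\sigma_j$ are conditioned on the latent prefix $z_{1:j-1}$, the roles of $f_k$ and $g_k$ swap:

$$
    \begin{aligned}
        f_k:      z_j &=  \sigma_j(z_{1:j-1}) x_j + \mu_j(z_{1:j-1}) \\
        g_k: x_j &= (z_j - \mu_j(z_{1:j-1})) \frac{1}{\sigma_j(z_{1:j-1})}, \quad z_j \sim N(0, 1)
    \end{aligned}
$$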

$g_k$ is not sequential, while $f_k$ is sequential.

As a result, we have 1) fast sampling and 2) slow autoregressive likelihood computation.

Moreover, training becomes much slower, because maximum likelihood training has to evaluate the sequential $f_k$ on every training example.

So we would like a model that has neither of these drawbacks but is still expressive enough.

## RealNVP

For RealNVP, $f_k$ and $g_k$ are given by the following formulas (neither of them is sequential):

<center><img src="pics/RealNVPblock.png" width=800 /></center>

**Q:** How can we model $\boldsymbol{\sigma}(\cdot, \theta)$ and $\boldsymbol{\mu}(\cdot, \theta)$ in the $2D$ data case?

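```python
import torch
from torch import nn

x = torch.randn(8, 2)            # x : tensor (bs, 2)
mask = torch.tensor([1.0, 0.0])  # mask is [0, 1] or [1, 0]
NN = nn.Linear(2, 2)             # placeholder network, NN : (bs, 2) -> (bs, 2)

x_1 = x * mask                         # tensor (bs, 2)
logit = NN(x_1)                        # tensor (bs, 2)
mu, log_sigma = logit.chunk(2, dim=1)  # tensors (bs, 1), (bs, 1)
# What to do next?
mu = mu * (1 - mask)                   # (bs, 2): zero shift where mask == 1
log_sigma = log_sigma * (1 - mask)     # (bs, 2): zero log-scale where mask == 1
z = log_sigma.exp() * x + mu           # masked coordinate is copied, the other is scaled and shifted
```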

### Jacobian

<center><img src="pics/RealNVPblock.png" width=800 /></center>

$$
    \begin{aligned}
        \log\det \left(\frac{\partial \boldsymbol{z}}{\partial \boldsymbol{x}}\right) &= \log\det \begin{bmatrix}\mathbf{I}_d & 0_{d \times (m - d)}\\ \frac{\partial \boldsymbol{z}_2}{\partial \boldsymbol{x}_1} & \frac{\partial \boldsymbol{z}_2}{\partial \boldsymbol{x}_2} \end{bmatrix} \\
        &= \text{sum } [ \underbrace{0, 0, \dots, 0}_{d \text{ times}} ,  \log \frac{\partial z_{d + 1}}{\partial x_{d + 1}}, \dots , \log \frac{\partial z_{m}}{\partial x_{m}}] = ?
    \end{aligned}
$$
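The triangular structure of this Jacobian can be checked numerically. The sketch below assumes PyTorch, $d = 1$, $m = 2$, and a hypothetical affine coupling `coupling` with hand-picked `mu` and `log_sigma` functions (illustrative choices, not taken from these notes). It compares the log-determinant of the full matrix from `torch.autograd.functional.jacobian` with the sum of the logarithms of its diagonal entries.

```python
import torch
from torch.autograd.functional import jacobian

def mu(x1):
    """Hypothetical shift function."""
    return torch.sin(x1)

def log_sigma(x1):
    """Hypothetical log-scale function."""
    return 0.5 * torch.tanh(x1)

def coupling(x):
    """Affine coupling for a single point x of shape (2,):
    z_1 = x_1, z_2 = sigma(x_1) * x_2 + mu(x_1)."""
    x1, x2 = x[:1], x[1:]
    z2 = torch.exp(log_sigma(x1)) * x2 + mu(x1)
    return torch.cat([x1, z2])

x = torch.tensor([0.3, -1.2])
J = jacobian(coupling, x)                          # (2, 2), lower triangular
log_det_full = torch.logdet(J)                     # log-determinant of the full matrix
log_det_diag = torch.log(torch.diagonal(J)).sum()  # sum of logs of the diagonal entries
```

The two quantities should agree up to floating-point error, since the determinant of a triangular matrix is the product of its diagonal.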

**Q:** What are the input and output of a RealNVP block?

```python
# x : tensor (bs, 2) 

z, log_det = RealNVPBlock(x) # tensors (bs, 2), (bs, 2)
```

* `log_det` is the batch of vectors $[\log \frac{\partial z_{1}}{\partial x_{1}}, \log \frac{\partial z_{2}}{\partial x_{2}}]$
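One way such a block could look for $2D$ data is sketched below, assuming PyTorch. The class name `RealNVPBlock2D`, the `keep_idx` argument, the hidden size, and the `inverse` method are illustrative assumptions, not the notebook's reference implementation.

```python
import torch
from torch import nn

class RealNVPBlock2D(nn.Module):
    """Hypothetical affine coupling block for 2D data (illustrative sketch)."""

    def __init__(self, keep_idx, hidden=32):
        super().__init__()
        mask = torch.zeros(2)
        mask[keep_idx] = 1.0  # 1 marks the coordinate that is copied as x_1
        self.register_buffer("mask", mask)
        self.net = nn.Sequential(
            nn.Linear(2, hidden), nn.ReLU(), nn.Linear(hidden, 2)
        )

    def _params(self, v_1):
        """Compute mu and log_sigma, zeroed on the copied coordinate."""
        mu, log_sigma = self.net(v_1).chunk(2, dim=1)  # (bs, 1), (bs, 1)
        return mu * (1 - self.mask), log_sigma * (1 - self.mask)  # (bs, 2) each

    def forward(self, x):
        mu, log_sigma = self._params(x * self.mask)
        z = log_sigma.exp() * x + mu
        return z, log_sigma  # log dz_i/dx_i equals log_sigma (0 on x_1)

    def inverse(self, z):
        # z * mask equals x * mask, because the masked coordinate is copied.
        mu, log_sigma = self._params(z * self.mask)
        return (z - mu) * torch.exp(-log_sigma)

block = RealNVPBlock2D(keep_idx=0)
x = torch.randn(5, 2)
z, log_det = block(x)      # tensors (bs, 2), (bs, 2)
x_rec = block.inverse(z)   # recovers x without any sequential loop
```

Both directions use a single network pass, which is the property that distinguishes the coupling block from the autoregressive flows above.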

General scheme of a RealNVP block (for both $2D$ data and image data):

```python
# x : tensor (bs, *shape)

x_1 = prepare_x1(x)
logit = NN(x_1)
mu, log_sigma = logit.chunk(2, dim=1)  # split the network output into two halves
z = coupling(mu, log_sigma, x) # z.shape == x.shape
log_det = log_frac_dz_i_dx_i(mu, log_sigma, x) # log_det.shape == x.shape !!!
```

**Q:** How to combine several RealNVP blocks?

```python
# x : tensor (bs, 2) 

# RealNVPBlockList = [RealNVPBlock(0), RealNVPBlock(1), ...]
log_det = 0
for block in RealNVPBlockList:
    z, curr_log_det = block(x)
    log_det += curr_log_det  # log-determinants of composed maps add up
    x = z
```

**Q:** How to train RealNVP model?

Use **ForwardKL** in the data $X$-space (or, equivalently, **ReverseKL** in the latent $Z$-space). The objective to minimize over $\theta$ is:

$$-E_{\pi(x)} \left(\log p_z(f(x, \theta)) + \log | \det J_f|\right)$$
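A minimal training sketch for this objective, assuming PyTorch, a standard normal base density $p_z$, and toy Gaussian data standing in for samples from $\pi(x)$. The `Coupling` class, the `nll` helper, and all hyperparameters are illustrative assumptions.

```python
import math
import torch
from torch import nn

class Coupling(nn.Module):
    """Hypothetical 2D affine coupling layer: returns z and log dz_i/dx_i."""

    def __init__(self, keep_idx, hidden=32):
        super().__init__()
        mask = torch.zeros(2)
        mask[keep_idx] = 1.0
        self.register_buffer("mask", mask)
        self.net = nn.Sequential(nn.Linear(2, hidden), nn.Tanh(), nn.Linear(hidden, 2))

    def forward(self, x):
        mu, log_sigma = self.net(x * self.mask).chunk(2, dim=1)
        mu, log_sigma = mu * (1 - self.mask), log_sigma * (1 - self.mask)
        return log_sigma.exp() * x + mu, log_sigma

def nll(blocks, x):
    """Monte Carlo estimate of -E[log p_z(f(x)) + log|det J_f|], p_z = N(0, I)."""
    log_det = torch.zeros(x.shape[0])
    for block in blocks:
        x, curr_log_det = block(x)
        log_det = log_det + curr_log_det.sum(dim=1)  # sum over coordinates
    log_p_z = -0.5 * (x ** 2).sum(dim=1) - 0.5 * x.shape[1] * math.log(2 * math.pi)
    return -(log_p_z + log_det).mean()

torch.manual_seed(0)
blocks = nn.ModuleList([Coupling(i % 2) for i in range(4)])  # alternate the copied coordinate
optimizer = torch.optim.Adam(blocks.parameters(), lr=1e-3)
data = 2.0 * torch.randn(512, 2) + 1.0  # toy stand-in for samples from pi(x)

losses = []
for step in range(50):
    optimizer.zero_grad()
    loss = nll(blocks, data)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
```

Note that the per-coordinate `log_det` returned by each block is summed over coordinates before it enters the objective.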

**Q:** How do we split the data vector $\boldsymbol{x}$ into $[\boldsymbol{x}_1, \boldsymbol{x}_2]$ when $\boldsymbol{x}$ is an image?

### RealNVP block for image data case

The splitting schemes were proposed in the original RealNVP [article](https://arxiv.org/pdf/1605.08803.pdf).

<center><img src="pics/image_splitting_realnvp.png" width=800 /></center>

**Q:** What does the picture show?

### `CheckerboardCouplingLayer`

<center><img src="pics/checkerboard_splitting.png" width=400 /></center>

**Q:** Let the input $\boldsymbol{x}$ have shape `(bs, c, w, h)`. 

What is the input and output of the network which produces $\boldsymbol{\mu}, \boldsymbol{\log \sigma}$ (what tensors and of which shape)? 

What is the output of `CheckerboardCouplingLayer` (what tensors and of which shape)?
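One possible way to build the checkerboard mask is sketched below, assuming PyTorch. The helper `checkerboard_mask` and the convention that `"even"` marks pixels with an even coordinate sum are assumptions made for illustration; the layer in the notebook may use a different convention.

```python
import torch

def checkerboard_mask(w, h, parity="even"):
    """Return a (1, 1, w, h) float mask of zeros and ones.

    "even" marks pixels whose coordinate sum i + j is even, "odd" the rest.
    """
    i = torch.arange(w).view(-1, 1)
    j = torch.arange(h).view(1, -1)
    mask = ((i + j) % 2 == 0).float()
    if parity == "odd":
        mask = 1.0 - mask
    return mask.view(1, 1, w, h)

x = torch.randn(2, 3, 4, 4)             # (bs, c, w, h)
mask = checkerboard_mask(4, 4, "even")  # broadcasts over batch and channels
x_1 = x * mask                          # same shape as x, unselected pixels zeroed
```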

### `ChannelCouplingLayer`

<center><img src="pics/channelwise_splitting.png" width=400 /></center>

**Q:** Do we need to mask the input tensor $\boldsymbol{x}$ in order to get $\boldsymbol{x}_1$?

**Q:** Let the input $\boldsymbol{x}$ have shape `(bs, 2 * c, w, h)`. 

What is the input and output of the network which produces $\boldsymbol{\mu}, \boldsymbol{\log \sigma}$ (what tensors and of which shape)?

What is the output of `ChannelCouplingLayer` (what tensors and of which shape)?

### `squeeze` and `undo_squeeze` operations

<center><img src="pics/squeezing.png" width=600 /></center>

**Q:** Let the input $\boldsymbol{x}$ have shape `(bs, c, w, h)`. What is the shape of the tensor after the `squeeze` operation?
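A possible implementation of the two operations is sketched below, assuming PyTorch and even `w` and `h`. The order in which the four pixels of each $2 \times 2$ patch are placed into channels is an assumption and may differ from the figure.

```python
import torch

def squeeze(x):
    """(bs, c, w, h) -> (bs, 4 * c, w // 2, h // 2).

    Each 2x2 spatial patch is moved into four consecutive channels.
    """
    bs, c, w, h = x.shape
    x = x.reshape(bs, c, w // 2, 2, h // 2, 2)  # (b, c, i, di, j, dj)
    x = x.permute(0, 1, 3, 5, 2, 4)             # (b, c, di, dj, i, j)
    return x.reshape(bs, 4 * c, w // 2, h // 2)

def undo_squeeze(x):
    """Inverse of squeeze: (bs, 4 * c, w, h) -> (bs, c, 2 * w, 2 * h)."""
    bs, c4, w, h = x.shape
    c = c4 // 4
    x = x.reshape(bs, c, 2, 2, w, h)            # (b, c, di, dj, i, j)
    x = x.permute(0, 1, 4, 2, 5, 3)             # (b, c, i, di, j, dj)
    return x.reshape(bs, c, 2 * w, 2 * h)

x = torch.arange(2 * 3 * 4 * 4, dtype=torch.float32).reshape(2, 3, 4, 4)
y = squeeze(x)
x_rec = undo_squeeze(y)
```

Both operations only permute entries, so they do not change the log-determinant of the flow.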

**Expected ordering of Coupling layers**

(following the original article [RealNVP](https://arxiv.org/pdf/1605.08803.pdf))

```python
#input: (bs, 1, w, h)
CheckerboardCouplingLayer("even"),
CheckerboardCouplingLayer("odd"),
CheckerboardCouplingLayer("even"),
CheckerboardCouplingLayer("odd"),
```

```python
# squeeze the tensor: (bs, 1, w, h) -> (bs, 4, w/2, h/2)
squeeze()
ChannelCouplingLayer("top")
ChannelCouplingLayer("bottom")
ChannelCouplingLayer("top")
ChannelCouplingLayer("bottom")
```

```python
# undo the squeeze: (bs, 4, w/2, h/2) -> (bs, 1, w, h)
undo_squeeze()
CheckerboardCouplingLayer("even"),
CheckerboardCouplingLayer("odd"),
CheckerboardCouplingLayer("even"),
CheckerboardCouplingLayer("odd")
```