In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
import io
import itertools
import time
from IPython.display import clear_output

from sklearn.decomposition import PCA
import scipy

import torch
import torch.optim as optim
import torch.nn as nn
import torch.utils.data as data
import torch.nn.functional as F
import torch.distributions as TD
from torchvision.utils import make_grid
from torchvision import transforms

import pickle
import os
import sys
sys.path.append('../../homeworks') # to grab dgm_utils from ../../homeworks directory
from tqdm.notebook import tqdm
from scipy.stats import multivariate_normal


# for IWAE
from matplotlib import ticker, cm
from matplotlib import gridspec
from matplotlib import collections  as mplc
from scipy.special import logsumexp
# end for IWAE

if torch.cuda.is_available():
    DEVICE = 'cuda'
    GPU_DEVICE = 1
    torch.cuda.set_device(GPU_DEVICE)
else:
    DEVICE='cpu'
# DEVICE='cpu'

import warnings
warnings.filterwarnings('ignore')

# <span style="color:red"> No! </span>

# dgm_utils
from dgm_utils import train_model, plot_training_curves, show_samples
from dgm_utils import visualize_2d_samples, visualize_2d_densities, visualize_2d_data
# from seminar6_utils import visualize_2d_map, visualize_samples_pdf

# <center>Deep Generative Models</center>
## <center>Seminar 7</center>

<center><img src="pics/AIMastersLogo.png" width=600 /></center>
<center>24.10.2022</center>


## Plan

1. RealNVP implementation hints
    
    - RealNVP on 2D data
    
    - RealNVP for image data

## RealNVP

<center><img src="pics/flows_how2.png" width=800 /></center>



* $f = f_{K} \circ f_{K - 1} \circ \dots \circ f_1$. $f_{k}$ is a **RealNVP** coupling layer.

* $f^{-1} = g = g_1 \circ g_2 \circ \dots \circ g_{K}$. The inverses $g_{k} = f_{k}^{-1}$ are easily derived from the $f_k$.
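The ordering in the two bullets above can be illustrated with a minimal sketch. The `Shift`, `Scale` and `Flow` classes are hypothetical toy stand-ins (not RealNVP layers and not part of `dgm_utils`); they only show that the inverse applies the layer inverses in reverse order.

```python
import torch
import torch.nn as nn


class Shift(nn.Module):
    """Toy invertible layer: z = x + b (illustrative only)."""

    def __init__(self, b):
        super().__init__()
        self.b = b

    def forward(self, x):
        return x + self.b

    def inverse(self, z):
        return z - self.b


class Scale(nn.Module):
    """Toy invertible layer: z = x * s with s != 0 (illustrative only)."""

    def __init__(self, s):
        super().__init__()
        self.s = s

    def forward(self, x):
        return x * self.s

    def inverse(self, z):
        return z / self.s


class Flow(nn.Module):
    """f = f_K o ... o f_1; the inverse runs the layers in reverse order."""

    def __init__(self, layers):
        super().__init__()
        self.layers = nn.ModuleList(layers)

    def forward(self, x):
        for layer in self.layers:  # f_1 first, f_K last
            x = layer(x)
        return x

    def inverse(self, z):
        for layer in reversed(list(self.layers)):  # g_K first, g_1 last
            z = layer.inverse(z)
        return z


flow = Flow([Shift(1.0), Scale(2.0)])
x = torch.tensor([1.0, -3.0])
z = flow(x)              # (x + 1) * 2
x_rec = flow.inverse(z)  # z / 2 - 1
```

Because `Shift` and `Scale` do not commute, applying the inverses in the wrong order would not recover `x`.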

### $f_k$ and $g_k$

<center><img src="pics/RealNVPblock.png" width=800 /></center>

**Question** How do we model $\boldsymbol{\sigma}(\cdot, \theta)$ and $\boldsymbol{\mu}(\cdot, \theta)$ in the 2D data case?

Partial answer:)

```python
# x : tensor (bs, 2) 

x_1 = x * mask # tensor (bs, 2), mask is [0, 1] or [1, 0]
mu, log_sigma = NN(x_1).chunk(2, dim=1) # tensors (bs, 1), (bs, 1); assumes NN(x_1) has shape (bs, 2)

```

**Question** What is the `mask` shape?

### Jacobian

$$
\begin{aligned}
\log\det \left(\frac{\partial \boldsymbol{z}}{\partial \boldsymbol{x}}\right) &= \log\det \begin{bmatrix}\mathbf{I}_d & 0_{d \times (m - d)}\\ \frac{\partial \boldsymbol{z}_2}{\partial \boldsymbol{x}_1} & \frac{\partial \boldsymbol{z}_2}{\partial \boldsymbol{x}_2} \end{bmatrix} \\
&= \operatorname{sum} \Big[ \underbrace{0, 0, \dots, 0}_{d \text{ times}}, \log \frac{\partial z_{d + 1}}{\partial x_{d + 1}}, \dots, \log \frac{\partial z_{m}}{\partial x_{m}} \Big] = \; ?
\end{aligned}
$$

**Question** What are the input and output of a RealNVP block?

```python
# x : tensor (bs, 2) 

z, log_det = RealNVPBlock(x) # tensors (bs, 2), (bs, 2)
```

* `log_det` is the batch of vectors $[\log \frac{\partial z_{1}}{\partial x_{1}}, \log \frac{\partial z_{2}}{\partial x_{2}}]$
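The block interface described above can be sketched as follows. This is a minimal illustration, not the seminar's reference solution. The class name `AffineCoupling2D`, the hidden size, the `tanh` bound on `log_sigma`, and the convention $z_2 = x_2 \cdot e^{\log\sigma} + \mu$ are all assumptions; the figure may use a different sign or scaling convention.

```python
import torch
import torch.nn as nn


class AffineCoupling2D(nn.Module):
    """Hypothetical 2D coupling block with a binary mask."""

    def __init__(self, mask, hidden=32):
        super().__init__()
        # mask has shape (2,), e.g. [1., 0.]; it broadcasts over the batch
        self.register_buffer("mask", torch.as_tensor(mask, dtype=torch.float32))
        self.net = nn.Sequential(
            nn.Linear(2, hidden), nn.ReLU(), nn.Linear(hidden, 4)
        )

    def _params(self, x_1):
        mu, log_sigma = self.net(x_1).chunk(2, dim=1)  # (bs, 2) each
        inv_mask = 1.0 - self.mask
        # tanh keeps log_sigma bounded (assumed stabilisation, not required)
        return mu * inv_mask, torch.tanh(log_sigma) * inv_mask

    def forward(self, x):
        x_1 = x * self.mask  # the part that passes through unchanged
        mu, log_sigma = self._params(x_1)
        z = x_1 + (1.0 - self.mask) * (x * torch.exp(log_sigma) + mu)
        # per-coordinate log-derivatives; zero where mask == 1
        return z, log_sigma

    def inverse(self, z):
        z_1 = z * self.mask  # equals x_1, so mu and log_sigma can be recomputed
        mu, log_sigma = self._params(z_1)
        return z_1 + (1.0 - self.mask) * (z - mu) * torch.exp(-log_sigma)


torch.manual_seed(0)
layer = AffineCoupling2D([1.0, 0.0])
x = torch.randn(5, 2)
z, log_det = layer(x)      # shapes (5, 2) and (5, 2)
x_rec = layer.inverse(z)   # should match x up to float error
```

Summing `log_det` over the last dimension gives the log-determinant of the Jacobian per sample, which can be checked numerically against `torch.autograd.functional.jacobian`.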

**Question** How do we train a RealNVP model?

Use **ForwardKL** in the data $X$-space (or **ReverseKL** in the latent $Z$-space which is equivalent). Objective:

$$-E_{\pi(x)} \left(\log p_z(f(x, \theta)) + \log | \det J_f|\right)$$

See *Seminar 6* for details.
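As a rough sketch, the objective above becomes a negative log-likelihood loss averaged over a batch. The helper `flow_nll` and the toy `ScaleLayer` are hypothetical names introduced for illustration. The sketch assumes each layer returns `(z, log_det)` with per-coordinate log-derivatives and that the prior is a factorised standard normal.

```python
import torch
import torch.nn as nn
import torch.distributions as TD


class ScaleLayer(nn.Module):
    """Toy stand-in for a coupling layer: z = x * exp(log_s)."""

    def __init__(self, dim):
        super().__init__()
        self.log_s = nn.Parameter(torch.zeros(dim))

    def forward(self, x):
        # second output: per-coordinate log |dz/dx|, same shape as x
        return x * torch.exp(self.log_s), self.log_s.expand_as(x)


def flow_nll(layers, x, prior):
    """Monte Carlo estimate of -E[log p_z(f(x)) + log|det J_f|] over a batch."""
    z = x
    log_det_total = torch.zeros(x.shape[0], device=x.device)
    for layer in layers:
        z, log_det = layer(z)
        # log-determinants of composed layers add up
        log_det_total = log_det_total + log_det.flatten(1).sum(dim=1)
    log_pz = prior.log_prob(z).flatten(1).sum(dim=1)
    return -(log_pz + log_det_total).mean()


torch.manual_seed(0)
layers = nn.ModuleList([ScaleLayer(2), ScaleLayer(2)])
prior = TD.Normal(0.0, 1.0)
optimizer = torch.optim.Adam(layers.parameters(), lr=1e-2)

x = 3.0 * torch.randn(64, 2)  # toy data with standard deviation 3
loss_before = flow_nll(layers, x, prior)

# one gradient step on the same batch
optimizer.zero_grad()
loss_before.backward()
optimizer.step()
loss_after = flow_nll(layers, x, prior)
```

The `flatten(1)` calls make the same function usable for image-shaped tensors, where the sum runs over channels and pixels.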

**Question** How do we split the data vector $\boldsymbol{x}$ into $[\boldsymbol{x}_1, \boldsymbol{x}_2]$ when $\boldsymbol{x}$ is an image?

### RealNVP block for image data case

The splitting schemes were proposed in the original RealNVP [article](https://arxiv.org/pdf/1605.08803.pdf).

<center><img src="pics/image_splitting_realnvp.png" width=800 /></center>

### `CheckerboardCouplingLayer`

<center><img src="pics/checkerboard_splitting.png" width=400 /></center>

**Question** Let the input $\boldsymbol{x}$ have shape `(bs, c, w, h)`. What is the output of the network that produces $\boldsymbol{\mu}, \boldsymbol{\log \sigma}$? What does `CheckerboardCouplingLayer` output (which tensors, and of what shapes)?
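One possible way to build the checkerboard mask is sketched below. The function name `checkerboard_mask` and the convention that `"even"` keeps pixels whose coordinate sum is even are assumptions, not taken from the seminar code.

```python
import torch


def checkerboard_mask(w, h, parity="even"):
    """Binary mask of shape (1, 1, w, h) that broadcasts over batch and channels."""
    rows = torch.arange(w).view(-1, 1)
    cols = torch.arange(h).view(1, -1)
    mask = ((rows + cols) % 2 == 0).float()  # 1 where the coordinate sum is even
    if parity == "odd":
        mask = 1.0 - mask
    return mask.view(1, 1, w, h)


x = torch.randn(8, 3, 4, 4)            # (bs, c, w, h)
mask = checkerboard_mask(4, 4, "even")
x_1 = x * mask                          # conditioning part, same shape as x
x_2 = x * (1.0 - mask)                  # part to be transformed
```

Under this scheme the network sees a full-size masked tensor, so $\boldsymbol{\mu}$ and $\boldsymbol{\log \sigma}$ would naturally have the same shape as $\boldsymbol{x}$ and be zeroed where `mask == 1`.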

### `ChannelCouplingLayer`

<center><img src="pics/channelwise_splitting.png" width=400 /></center>

**Question** Do we need to mask the input tensor $\boldsymbol{x}$ in order to get $\boldsymbol{x}_1$?

**Question** Let the input $\boldsymbol{x}$ have shape `(bs, 2 * c, w, h)`. What is the output of the network that produces $\boldsymbol{\mu}, \boldsymbol{\log \sigma}$? What does `ChannelCouplingLayer` output (which tensors, and of what shapes)?
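A channel-wise split can be sketched with `torch.chunk` instead of a mask. The helpers `channel_split` and `channel_merge`, and the meaning given to `"top"` and `"bottom"`, are assumptions made for illustration.

```python
import torch


def channel_split(x, part="top"):
    """Return (x_1, x_2): x_1 conditions the network, x_2 gets transformed."""
    first, second = x.chunk(2, dim=1)  # each (bs, c, w, h)
    return (first, second) if part == "top" else (second, first)


def channel_merge(x_1, x_2, part="top"):
    """Inverse of channel_split: restore the original channel order."""
    parts = [x_1, x_2] if part == "top" else [x_2, x_1]
    return torch.cat(parts, dim=1)


x = torch.randn(8, 4, 2, 2)                 # (bs, 2 * c, w, h) with c = 2
x_1, x_2 = channel_split(x, "bottom")
x_back = channel_merge(x_1, x_2, "bottom")  # same tensor as x
```

With this split the network input has only `c` channels, so $\boldsymbol{\mu}$ and $\boldsymbol{\log \sigma}$ would each have shape `(bs, c, w, h)`.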

### `squeeze` and `undo_squeeze` operations

<center><img src="pics/squeezing.png" width=600 /></center>

**Question** Let the input $\boldsymbol{x}$ have shape `(bs, c, w, h)`. What is the shape of the tensor after the `squeeze` operation?
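A minimal sketch of the two operations, assuming even spatial sizes and that each 2×2 spatial block is moved into 4 consecutive channels. The exact channel ordering used in the paper or in the homework may differ.

```python
import torch


def squeeze(x):
    """(bs, c, w, h) -> (bs, 4 * c, w / 2, h / 2); w and h must be even."""
    bs, c, w, h = x.shape
    x = x.reshape(bs, c, w // 2, 2, h // 2, 2)
    x = x.permute(0, 1, 3, 5, 2, 4)  # move the 2x2 block offsets next to channels
    return x.reshape(bs, c * 4, w // 2, h // 2)


def undo_squeeze(x):
    """(bs, 4 * c, w, h) -> (bs, c, 2 * w, 2 * h); exact inverse of squeeze."""
    bs, c, w, h = x.shape
    x = x.reshape(bs, c // 4, 2, 2, w, h)
    x = x.permute(0, 1, 4, 2, 5, 3)  # interleave the offsets back into space
    return x.reshape(bs, c // 4, w * 2, h * 2)


x = torch.arange(2 * 1 * 4 * 4, dtype=torch.float32).view(2, 1, 4, 4)
y = squeeze(x)            # shape (2, 4, 2, 2)
x_back = undo_squeeze(y)  # shape (2, 1, 4, 4), equal to x
```

Both operations only permute entries, so they contribute zero to the log-determinant.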

**Expected ordering of Coupling layers**


```python
#input: (bs, 1, w, h)
CheckerboardCouplingLayer("even"),
CheckerboardCouplingLayer("odd"),
CheckerboardCouplingLayer("even"),
CheckerboardCouplingLayer("odd"),
# squeeze the tensor: (bs, 1, w, h) -> (bs, 4, w/2, h/2)
squeeze(),
ChannelCouplingLayer("top"),
ChannelCouplingLayer("bottom"),
ChannelCouplingLayer("top"),
ChannelCouplingLayer("bottom"),
# unsqueeze the tensor: (bs, 4, w/2, h/2) -> (bs, 1, w, h)
undo_squeeze(),
CheckerboardCouplingLayer("even"),
CheckerboardCouplingLayer("odd"),
CheckerboardCouplingLayer("even"),
CheckerboardCouplingLayer("odd")
```