In [1]:
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
import io
import itertools
import time
from IPython.display import clear_output

from sklearn.decomposition import PCA
import scipy

import torch
import torch.optim as optim
import torch.nn as nn
import torch.utils.data as data
import torch.nn.functional as F
import torch.distributions as TD
from torchvision.utils import make_grid
from torchvision import transforms

import pickle
import os
import sys
from tqdm.notebook import tqdm
from scipy.stats import multivariate_normal

if torch.cuda.is_available():
    DEVICE = 'cuda'
    GPU_DEVICE = 2
    torch.cuda.set_device(GPU_DEVICE)
else:
    DEVICE='cpu'
# DEVICE='cpu'

import warnings
warnings.filterwarnings('ignore')

# <span style="color:red"> No! </span>

# <center>Deep Generative Models</center>
## <center>Seminar 5</center>

<center><img src="pics/AIMastersLogo.png" width=600 /></center>
<center>10.10.2022</center>


## Plan

0. Missing topics of GMM via EM

1. VAE 
    
    - Your questions about VAE
    

### Missing topics of GMM via EM

## VAE

<img src="pics/vae-gaussian.png" width=800 height=800 />

$L(q, \theta) = \mathbb{E}_{z \sim q(z| x, \phi)} \ln p(x|z, \theta) - KL(q(z| x, \phi)||p(z))$

In the questions below consider 2 cases:

* $2D$ data
* Images data

**Question 1.** How to model VAE encoder $q(\boldsymbol{z} | \boldsymbol{x}, \phi)$? What does the encoder take as **input**? What is the **output** of the encoder?

<img src="pics/vae-gaussian.png" width=600 />

```python
# x : tensor (bs, 2) - first case
#   : tensor (bs, 3, w, h) - second case

z_stats = VAEEncoder(x) # (bs, 2*z_dim)

mu_z = z_stats[:, :z_dim] #(bs, z_dim)
log_sigma_z = z_stats[:, z_dim:] # (bs, z_dim)
```
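
A minimal sketch of such an encoder for the $2D$ case, assuming a small fully connected network. The class name `MLPEncoder`, the hidden width and the activation are illustrative choices, not something fixed by the seminar. For images one would typically replace the linear layers with a convolutional stack followed by a flatten.

```python
import torch
import torch.nn as nn


class MLPEncoder(nn.Module):
    """Hypothetical encoder q(z | x, phi) for 2D data (illustrative only)."""

    def __init__(self, x_dim=2, z_dim=2, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(x_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 2 * z_dim),  # mu_z and log_sigma_z stacked together
        )

    def forward(self, x):
        z_stats = self.net(x)  # (bs, 2 * z_dim)
        # First half is mu_z, second half is log_sigma_z, as in the snippet above
        mu_z, log_sigma_z = z_stats.chunk(2, dim=1)
        return mu_z, log_sigma_z


torch.manual_seed(0)
encoder = MLPEncoder(x_dim=2, z_dim=3)
x = torch.randn(5, 2)              # (bs, 2)
mu_z, log_sigma_z = encoder(x)     # both (bs, z_dim)
sigma_z = log_sigma_z.exp()        # predicting log sigma keeps sigma positive
```

Predicting $\log \sigma_z$ rather than $\sigma_z$ lets the network output any real number while the standard deviation stays strictly positive.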

**Question 2.** How to model VAE decoder $p(\boldsymbol{x} | \boldsymbol{z}, \theta)$? What does the decoder take as **input**? What is the **output** of the decoder?

<img src="pics/vae-gaussian.png" width=600 />

```python
# z : tensor (bs, z_dim)

X_stats = VAEDecoder(z) # X_stats : tensor (bs, 2 * 2) - first case (mu_X and sigma_X statistics)
                        #         : tensor (bs, 3, w, h) - second case (predict only mu's!)
...

if not sample_from_decoder:
    return mu_X
else:
    # sample from N(mu_X, diag(sigma_X^2))
    return mu_X + sigma_X * torch.randn_like(mu_X)
```
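
A possible decoder for the $2D$ case, written as a sketch. It assumes (this is not stated above) that the second half of the output stores $\log \sigma_X$; the class name `MLPDecoder` and the method name `generate` are hypothetical.

```python
import torch
import torch.nn as nn
import torch.distributions as TD


class MLPDecoder(nn.Module):
    """Hypothetical decoder p(x | z, theta) for 2D data (illustrative only)."""

    def __init__(self, z_dim=2, x_dim=2, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 2 * x_dim),  # mu_X and log_sigma_X (assumed layout)
        )

    def forward(self, z):
        mu_x, log_sigma_x = self.net(z).chunk(2, dim=1)
        return mu_x, log_sigma_x

    @torch.no_grad()
    def generate(self, z, sample_from_decoder=False):
        mu_x, log_sigma_x = self(z)
        if not sample_from_decoder:
            return mu_x  # mean of p(x | z, theta)
        # draw x ~ N(mu_X, diag(sigma_X^2))
        return TD.Normal(mu_x, log_sigma_x.exp()).sample()


torch.manual_seed(0)
decoder = MLPDecoder(z_dim=3, x_dim=2)
z = torch.randn(5, 3)                                   # (bs, z_dim)
x_mean = decoder.generate(z)                            # (bs, 2)
x_noisy = decoder.generate(z, sample_from_decoder=True)  # (bs, 2)
```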

**Question 3.** How to model $p(z)$?

<img src="pics/vae-gaussian.png" width=600 />

$\mathcal{N}(0, I_{\text{z_dim}})$
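
One way to represent this prior in code is with `torch.distributions`. The snippet below is a small sketch; `z_dim = 4` and the batch size are arbitrary illustrative values.

```python
import torch
import torch.distributions as TD

z_dim = 4  # illustrative latent dimension

# Independent(..., 1) treats the last dimension as one multivariate event,
# so log_prob returns a single number per latent vector.
prior = TD.Independent(TD.Normal(torch.zeros(z_dim), torch.ones(z_dim)), 1)

z = prior.sample((8,))       # (8, z_dim)
log_p_z = prior.log_prob(z)  # (8,)
```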

**Question 4.** How to sample from VAE?

<img src="pics/vae-gaussian.png" width=600 />

```python
# 1. Sample latent codes from the prior N(0, I_{z_dim})
z_sample = torch.randn(bs, z_dim) # tensor (bs, z_dim)

# 2. Decode them
X_stats = VAEDecoder(z_sample) # X_stats : tensor (bs, 2 * 2) - first case
                               #         : tensor (bs, 3, w, h) - second case (predict only mu's!)
...
```
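
For the image case the two steps might look as follows. This is only a sketch: the decoder here is an untrained stand-in built from linear layers (a real one would be a trained, most likely convolutional, network), and the sizes `w = h = 8` are arbitrary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bs, z_dim, w, h = 16, 2, 8, 8  # illustrative sizes

# Untrained stand-in for a decoder that predicts only the pixel means
decoder = nn.Sequential(
    nn.Linear(z_dim, 64),
    nn.ReLU(),
    nn.Linear(64, 3 * w * h),
    nn.Sigmoid(),                 # keep predicted means in [0, 1]
    nn.Unflatten(1, (3, w, h)),   # (bs, 3 * w * h) -> (bs, 3, w, h)
)

with torch.no_grad():
    z_sample = torch.randn(bs, z_dim)  # 1. z ~ N(0, I_{z_dim})
    x_mean = decoder(z_sample)         # 2. decode: means of p(x | z, theta)
```

Since only the means are predicted in this case, `x_mean` itself is shown as the generated image.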

**Question 5.** 

1. Which **loss** function are we optimizing?

2. What parameters do we optimize?

3. How does it relate to the EM algorithm?

```python
for x in dataloader:
    vae_opt.zero_grad()
    loss = vae_loss(x) # what is the loss function here???
    loss.backward()
    vae_opt.step()
```

1. We optimize **ELBO**:

    $$L(\phi, \theta) = \mathbb{E}_{z \sim q(z| x, \phi)} \ln p(x|z, \theta) - KL(q(z| x, \phi)||p(z))$$
    
   but we do it in a tricky way:
   
   * $p(z) = \mathcal{N}(0, I_{\text{z_dim}})$
   
   * $q(z| x, \phi) = \mathcal{N}\big(\mu_z(x|\phi), \text{diag}\big( (\sigma^{(1)}_z(x| \phi))^2, \dots, (\sigma^{(\text{z_dim})}_z(x | \phi))^2\big)\big)$
   
   * $\Rightarrow$ $KL(q(z| x, \phi)||p(z))$ has closed-form expression (as a function of $\mu_z(x|\phi), \boldsymbol{\sigma_z}(x | \phi)$)
   
   * Reparameterization trick: $\mathbb{E}_{z \sim q(z| x, \phi)} \ln p(x|z, \theta) = \mathbb{E}_{\epsilon \sim \mathcal{N}(0, I_{\text{z_dim}})} \ln p(x|\mu_z(x | \phi) + \boldsymbol{\sigma}_z(x | \phi) \odot \epsilon, \theta)$. Over a mini-batch, with a single sample $\epsilon_i$ per object, this term is estimated by $\sum\limits_{i = 1}^{\text{batch_size}} \ln p(X_i|\mu_z(X_i | \phi) + \boldsymbol{\sigma}_z(X_i | \phi) \odot \epsilon_i, \theta)$
   
   **Subquestion 5.1**: How to compute the log-likelihood $\ln p(X_i|\mu_z(X_i | \phi) + \boldsymbol{\sigma}_z(X_i | \phi) \odot \epsilon_i, \theta)$?
 
 
2. We optimize ELBO with respect to both $\phi$ (encoder parameters) and $\theta$ (decoder parameters)

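3. The optimization procedure is in fact the **Variational EM** algorithm with the E-step and M-step performed jointly: both parameter sets take one gradient step from the same point $(\phi_k, \theta_k)$, so the $\theta$ update uses $\phi_k$ rather than $\phi_{k+1}$: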

$$\begin{aligned}
\phi_{k + 1} &= \phi_{k} + \eta \nabla_{\phi} L(q(\cdot| \boldsymbol{X}, \phi), \theta_k)|_{\phi=\phi_{k}} \\
\theta_{k + 1} &= \theta_{k} + \eta \nabla_{\theta} L(q(\cdot| \boldsymbol{X}, \color{red}{\phi_{k}}), \theta)|_{\theta=\theta_{k}}
\end{aligned}$$
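
The pieces above can be put together in a short sketch for the $2D$ case. It rests on several assumptions that are not part of the seminar text: both networks are tiny MLPs, the decoder outputs $\mu_X$ and $\log \sigma_X$ (so the log-likelihood in Subquestion 5.1 is a diagonal Gaussian log-density), the loss is averaged rather than summed over the batch, and the data is a toy standard normal sample. The helper names `gaussian_kl` and `vae_loss` are illustrative.

```python
import itertools

import torch
import torch.nn as nn
import torch.distributions as TD


def gaussian_kl(mu_z, log_sigma_z):
    """Closed-form KL(N(mu, diag(sigma^2)) || N(0, I)) per object, shape (bs,)."""
    return 0.5 * (torch.exp(2 * log_sigma_z) + mu_z ** 2 - 1.0 - 2 * log_sigma_z).sum(dim=1)


def vae_loss(x, encoder, decoder):
    """Negative single-sample ELBO estimate, averaged over the batch."""
    mu_z, log_sigma_z = encoder(x).chunk(2, dim=1)
    eps = torch.randn_like(mu_z)                   # eps ~ N(0, I)
    z = mu_z + log_sigma_z.exp() * eps             # reparameterization trick
    mu_x, log_sigma_x = decoder(z).chunk(2, dim=1)
    # ln p(x | z, theta) for a diagonal Gaussian decoder, summed over x dims
    log_lik = TD.Normal(mu_x, log_sigma_x.exp()).log_prob(x).sum(dim=1)
    return (gaussian_kl(mu_z, log_sigma_z) - log_lik).mean()


torch.manual_seed(0)
x_dim, z_dim, eta = 2, 2, 1e-2
encoder = nn.Sequential(nn.Linear(x_dim, 32), nn.ReLU(), nn.Linear(32, 2 * z_dim))  # phi
decoder = nn.Sequential(nn.Linear(z_dim, 32), nn.ReLU(), nn.Linear(32, 2 * x_dim))  # theta

# One optimizer over both parameter sets: phi and theta are updated jointly,
# with both gradients evaluated at the current point (phi_k, theta_k).
vae_opt = torch.optim.SGD(
    itertools.chain(encoder.parameters(), decoder.parameters()), lr=eta
)

X = torch.randn(256, x_dim)  # toy data, stands in for the real dataset
losses = []
for step in range(20):
    x = X[torch.randint(0, len(X), (64,))]
    vae_opt.zero_grad()
    loss = vae_loss(x, encoder, decoder)  # minimizing -ELBO = maximizing ELBO
    loss.backward()
    vae_opt.step()
    losses.append(loss.item())
```

Plain SGD is used here so that the step mirrors the gradient-ascent updates written above; in practice an adaptive optimizer such as Adam is a common substitute.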

### Time for your questions regarding VAE