# Lecture #21: Implementation of Variational Autoencoders
## AM 207: Advanced Scientific Computing
### Stochastic Methods for Data Analysis, Inference and Optimization
### Fall, 2021

<img src="fig/logos.jpg" style="height:150px;">

## Outline

1. Overall Structure of a VAE Implementation
2. Implementing the ELBO
3. Implementing the Log-Likelihood

## Overall Structure of VAE Implementation

A VAE implementation requires us to keep track of two models: the generative model and the inference model. We need to update the parameters of both models during training, and we will eventually need to evaluate the model by generating synthetic data and by computing the log-likelihood.

Passing the two models back and forth between functions quickly gets complicated, so we choose to implement a VAE class.

```python
from autograd import grad  # automatic differentiation of the objective

class VAE:
    # Initialization
    def __init__(self, generative_architecture, inference_architecture):

        # We use the Feedforward neural network class to make the generative and inference models
        # 1. Initialize weights for the generative and inference models
        # 2. Instantiate generative model
        self.generative = Feedforward(generative_architecture, weights=generative_weights)
        # 3. Instantiate inference model
        self.inference = Feedforward(inference_architecture, weights=inference_weights)
        # 4. Do other initializations...

    # Define the variational objective: the negative ELBO, using S samples for the Monte Carlo estimate in the ELBO;
    # y_var is the variance of the Gaussian likelihood
    def make_objective(self, y, y_var, S):
        # ...
        return variational_objective, grad(variational_objective)

    # Generate synthetic data from the generative model
    def generate(self, N=100):
        # ...
        return synthetic_samples

    # Infer the variational parameters of the Gaussian approximation of p(z|y) given y
    def infer(self, y):
        # ...
        return variational_means, variational_variances

    # Compute the log-likelihood of the data y using S samples for the Monte Carlo estimate in the log-likelihood
    def log_likelihood(self, y, S=100):
        # ...
        return lkhd

    # Fit the model given training data y, using S samples for the Monte Carlo estimate in the ELBO
    def fit(self, y, S):
        # ...
        return None
```

## Defining `.make_objective`
The `.make_objective` function of the VAE class takes in three parameters: `y`, the training data; `y_var`, the variance $\sigma_y^2$ of the Gaussian likelihood; and `S`, the number of samples used for the Monte Carlo estimate in the ELBO. It returns the variational objective (the negative ELBO) and the gradient of that objective.

Recall that the ELBO for a VAE, whose generative model has parameters $w$ and whose inference model has parameters $v$, is written as follows:

\begin{align}
\mathrm{ELBO}(w, q_v) &= \sum_{n=1}^N \left[\mathbb{E}_{q_v}\left[\log p_w(y_n|z_n) \right] - \mathrm{D}_{\mathrm{KL}}\left[q_v(z_n)\,\|\,p(z_n) \right]\right]\\
&= \sum_{n=1}^N \left[\mathbb{E}_{q_v}\left[\log p_w(y_n|z_n) \right] - \mathbb{E}_{q_v}\left[\log \frac{q_v(z_n)}{p(z_n)} \right]\right]\\
&= \sum_{n=1}^N \left[\mathbb{E}_{q_v}\left[\log p_w(y_n|z_n) \right] - \mathbb{E}_{q_v}\left[\log q_v(z_n)\right] + \mathbb{E}_{q_v} \left[\log p(z_n) \right]\right]\\
&\approx \sum_{n=1}^N \left[\frac{1}{S}\sum_{s=1}^S \log p_w(y_n|z^s_n)  - \frac{1}{S}\sum_{s=1}^S\log q_v(z^s_n) + \frac{1}{S}\sum_{s=1}^S \log p(z^s_n)\right], \quad z^s_n \sim q_v(z_n).
\end{align}

We see that we need to implement three terms: 

1. $\mathbb{E}_{q_v}\left[\log p_w(y_n|z_n) \right]$
2. $\mathbb{E}_{q_v}\left[\log q_v(z_n)\right]$
3. $\mathbb{E}_{q_v} \left[\log p(z_n) \right]$

When we implement $\mathbb{E}_{q_v}\left[\log p_w(y_n|z_n) \right]$, we first need to sample $z_n$ from the inference model $q_v$, given the current inference parameters. Recall that $q_v$ is the Gaussian distribution $\mathcal{N}(\mu_v(y_n), \sigma^2_v(y_n))$, where $\mu_v(y_n)$ and $\sigma^2_v(y_n)$ are the mean and variance of the variational posterior for $y_n$.

Recall that $p_w(y_n|z_n)$ is the Gaussian distribution $\mathcal{N}(y_n; f_w(z_n), \sigma_y^2I)$, where $f_w$ is the generative model given the current generative parameters. So we need to pass the samples $z_n$ through the generative model to obtain $f_w(z_n)$, and then evaluate the log-pdf of the Gaussian with mean $f_w(z_n)$ and variance $\sigma_y^2$ at $y_n$.

When we implement $\mathbb{E}_{q_v}\left[\log q_v(z_n)\right]$, we evaluate the log-pdf of the Gaussian $\mathcal{N}(\mu_v(y_n), \sigma^2_v(y_n))$ at the samples $z_n$, drawn from the variational posterior $q_v$ for $y_n$.

Finally, when we implement $\mathbb{E}_{q_v} \left[\log p(z_n) \right]$, we evaluate the log-pdf of the standard Gaussian $\mathcal{N}(0, I)$ at the samples $z_n$, drawn from the variational posterior $q_v$ for $y_n$.
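
As a quick sanity check on terms 2 and 3, the sketch below estimates $\mathbb{E}_{q}\left[\log q(z) - \log p(z)\right]$ by Monte Carlo for a one-dimensional Gaussian $q = \mathcal{N}(\mu, \sigma^2)$ and compares it with the closed-form KL divergence to the standard Gaussian prior. The helper names and the values of `mu`, `sigma` and `S` are arbitrary choices made for illustration; they are not part of the VAE class.

```python
import numpy as np
from scipy.stats import norm

def kl_monte_carlo(mu, sigma, S, rng):
    """Monte Carlo estimate of KL[N(mu, sigma^2) || N(0, 1)] using S samples."""
    # Draw z ~ q by shifting and scaling standard normal noise
    z = mu + sigma * rng.standard_normal(S)
    log_q = norm.logpdf(z, mu, sigma)   # term 2: log q(z)
    log_p = norm.logpdf(z, 0.0, 1.0)    # term 3: log p(z)
    return np.mean(log_q - log_p)

def kl_closed_form(mu, sigma):
    """Exact KL[N(mu, sigma^2) || N(0, 1)]."""
    return 0.5 * (sigma**2 + mu**2 - 1.0 - np.log(sigma**2))

rng = np.random.default_rng(0)
mu, sigma = 0.7, 0.5   # illustrative variational parameters
kl_mc = kl_monte_carlo(mu, sigma, S=200_000, rng=rng)
kl_exact = kl_closed_form(mu, sigma)
print(kl_mc, kl_exact)
```

With a large number of samples the two numbers should agree closely; with small `S` the Monte Carlo estimate is noticeably noisier, which is the trade-off controlled by `S` in the ELBO.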

```python
from autograd import numpy as np
from autograd import grad
from autograd.scipy.stats import norm

def make_objective(self, y, y_var, S):
    # number of observations; this assumes y is stored with shape (y_dim, N)
    N = y.shape[-1]

    def variational_objective(params, t):
        '''
        Definition of the negative ELBO
        params: the parameters of the generative and inference models concatenated into a single vector
        t: the current iteration number, required by autograd
        '''

        # unpack the generative and inference model parameters
        inference_weights, generative_weights = self.unpack_weights(params)

        # infer z's: this returns the variational means and variances concatenated
        z_params = self.inference.forward(inference_weights, y)

        # unpack the variational means and standard deviations
        mean, std = self.unpack_params(z_params)

        # sample z's by shifting and scaling standard normal noise,
        # which keeps the samples differentiable in mean and std
        z_samples = np.random.normal(0, 1, size=(S, self.z_dim, N)) * std + mean

        # predict y's
        y_pred = self.generative.forward(generative_weights, z_samples)

        # evaluate term 1: log-likelihood, summed over the dimensions of y
        log_likelihood = np.sum(norm.logpdf(y, y_pred, y_var ** 0.5), axis=-2)

        # evaluate term 2: evaluate sampled z's under variational distribution
        log_qz_given_y = np.sum(norm.logpdf(z_samples, mean, std), axis=-2)

        # evaluate term 3: evaluate sampled z's under prior
        log_pz = np.sum(norm.logpdf(z_samples, 0.0, 1.0), axis=-2)

        # compute the elbo, averaged over the S samples and the N observations
        # (the Monte Carlo ELBO estimate rescaled by 1/N)
        elbo = np.mean(log_likelihood - log_qz_given_y + log_pz)

        # return the negative elbo to be minimized
        return -elbo

    return variational_objective, grad(variational_objective)
```
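
The sampling and summation steps in the objective rely on NumPy broadcasting, so it helps to track array shapes explicitly. The sketch below is a standalone illustration under the assumption that `mean` and `std` have shape `(z_dim, N)`, one column per observation; the sizes and the randomly generated variational parameters are made up for the example.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
S, z_dim, N = 4, 2, 3   # illustrative sizes (assumptions)

# Hypothetical variational parameters, one column per observation
mean = rng.normal(size=(z_dim, N))
std = np.exp(rng.normal(size=(z_dim, N)))   # positive standard deviations

# Standard normal noise with a leading sample axis; broadcasting applies
# each observation's mean and std to all S samples
eps = rng.standard_normal((S, z_dim, N))
z_samples = eps * std + mean

# Summing over axis=-2 collapses the latent dimensions and leaves
# one log-density per (sample, observation) pair
log_q = np.sum(norm.logpdf(z_samples, mean, std), axis=-2)

print(z_samples.shape)   # (4, 2, 3)
print(log_q.shape)       # (4, 3)
```

Taking `np.mean` of an array with shape `(S, N)` then averages over both the Monte Carlo samples and the observations.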

## Defining `.log_likelihood`

The `.log_likelihood` function computes the log-likelihood of the observed data given a learned generative model $f_w$. Recall that the log-likelihood can be written as:
\begin{align}
\ell_w(y) &= \frac{1}{N}\sum_{n=1}^N \log \left[\mathbb{E}_{z_n \sim p(z)} \mathcal{N}(y_n; f_w(z_n), \sigma_y^2I) \right]\\
&\approx \frac{1}{N}\sum_{n=1}^N \log \left[\frac{1}{S}\sum_{s=1}^S \mathcal{N}(y_n; f_w(z^s_n), \sigma_y^2I) \right], \quad z^s_n \sim p(z).
\end{align}

Because the Gaussian pdf takes very small values in high dimensions, the terms $\mathcal{N}(y_n; f_w(z^s_n), \sigma_y^2I)$ can underflow to zero in floating point, which makes the logarithm of their average numerically unstable. We instead write the log-likelihood in an equivalent but more numerically stable way:

\begin{align}
\ell_w(y) &\approx \frac{1}{N}\sum_{n=1}^N \log \frac{1}{S}\sum_{s=1}^S \mathcal{N}(y_n; f_w(z^s_n), \sigma_y^2I) , \quad z^s_n \sim p(z)\\
&= -\log S + \frac{1}{N}\sum_{n=1}^N \underbrace{\log \sum_{s=1}^S \mathrm{exp}}_{\texttt{log-sum-exp function}}\left\{ \log \mathcal{N}(y_n; f_w(z^s_n), \sigma_y^2I)\right\}, \quad z^s_n \sim p(z),
\end{align}
where the $\log \sum_{s=1}^S \mathrm{exp}$ portion of the expression is computed using `scipy`'s `logsumexp` function. 
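
To see why the rewrite matters, the sketch below compares the naive computation with the log-sum-exp version on a few log-densities of the magnitude that high-dimensional Gaussians can produce. The specific values in `log_p` are made up for illustration.

```python
import numpy as np
from scipy.special import logsumexp

# Hypothetical log-densities log N(y_n; f_w(z^s), sigma_y^2 I) for S = 3 samples
log_p = np.array([-1000.0, -1001.0, -1002.0])
S = len(log_p)

# Naive: exponentiate, average, take the log. exp(-1000) underflows to 0,
# so the result is log(0) = -inf.
with np.errstate(divide="ignore"):
    naive = np.log(np.mean(np.exp(log_p)))

# Stable: stay in log space the whole time
stable = -np.log(S) + logsumexp(log_p)

print(naive)    # -inf
print(stable)   # a finite value close to -1000.7
```

The stable version effectively subtracts the largest log-density before exponentiating and adds it back afterwards, so no term underflows.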

```python
from autograd import numpy as np
from scipy.special import logsumexp

def log_likelihood(self, y, S=100):
    # Number of observations; this assumes y is stored with shape (y_dim, N)
    N = y.shape[-1]
    # Define the diagonal of the covariance matrix for the Gaussian likelihood
    # (assumes the observation variance is stored on the instance as self.y_var)
    cov = self.y_var * np.ones((1, self.y_dim, 1))
    # We reshape the observations into a 3D array of shape (N, y_dim, 1)
    y_tile = y.reshape((1, self.y_dim, N)).T
    # We sample from the prior
    z_samples = np.random.normal(0, 1, size=(N, self.z_dim, S))
    # We feed the samples through the generative model
    y_synthetic = self.generative.forward(self.generative.weights, z_samples)

    # Compute the constant term of the log Gaussian pdf
    const = -0.5 * (self.y_dim * np.log(2 * np.pi) + np.sum(np.log(cov)))
    # Compute the exponential term of the log Gaussian pdf, shape (N, S)
    exponential = np.sum(-0.5 * ((y_synthetic - y_tile)**2 / cov), axis=1)
    # Add the constant and the exponential terms
    llkhd = const + exponential
    # Compute the per-observation log-likelihood using the logsumexp trick for numeric stability
    lkhd = -np.log(S) + logsumexp(llkhd, axis=1)
    # Average over the N observations, matching the 1/N factor in the formula
    return np.mean(lkhd)
```
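
One way to sanity-check an estimator of this form is to apply it to a model whose marginal likelihood is known in closed form. The sketch below is a standalone illustration, not the VAE class itself: it swaps the neural network for a hypothetical linear map $f_w(z) = Wz$, for which $y \sim \mathcal{N}(0, WW^\top + \sigma_y^2 I)$ exactly. The matrix `W`, the variance `y_var`, and the sizes are all made-up values.

```python
import numpy as np
from scipy.special import logsumexp

rng = np.random.default_rng(0)
z_dim, y_dim, N, S = 1, 2, 5, 50_000
W = np.array([[1.0], [0.5]])   # hypothetical linear "decoder": f_w(z) = W z
y_var = 0.5                    # observation variance sigma_y^2

# Simulate data from the model, stored with shape (y_dim, N)
y = W @ rng.standard_normal((z_dim, N)) \
    + np.sqrt(y_var) * rng.standard_normal((y_dim, N))

# Monte Carlo estimate: sample z from the prior, shape (N, z_dim, S)
z_samples = rng.standard_normal((N, z_dim, S))
y_synthetic = W @ z_samples    # broadcasts to shape (N, y_dim, S)
y_tile = y.T[:, :, None]       # shape (N, y_dim, 1)

# Log Gaussian pdf split into its constant and exponential terms
const = -0.5 * y_dim * np.log(2 * np.pi * y_var)
exponential = -0.5 * np.sum((y_synthetic - y_tile) ** 2, axis=1) / y_var
log_lik_mc = -np.log(S) + logsumexp(const + exponential, axis=1)   # shape (N,)

# Exact marginal log-density under N(0, W W^T + sigma_y^2 I)
cov = W @ W.T + y_var * np.eye(y_dim)
_, logdet = np.linalg.slogdet(cov)
quad = np.sum(y * np.linalg.solve(cov, y), axis=0)
log_lik_exact = -0.5 * (y_dim * np.log(2 * np.pi) + logdet + quad)

print(np.mean(log_lik_mc), np.mean(log_lik_exact))
```

In this low-dimensional setting, sampling from the prior works well. For higher-dimensional latent spaces the same estimator typically needs many more samples, because few prior samples land where the likelihood is large.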