# Variational Autoencoders

During this lab we will implement a Variational Autoencoder. 

You have already had a chance to experiment with two packages that are well suited to the implementation, namely `torch.distributions` and `pyro`. You are free to choose whichever framework you are most comfortable with.

This lab does not start with a description of VAEs; please refer to the lecture materials and the additional reading on GitHub.


## Requirements

Your VAE should meet the following requirements:

- modularity: the encoder and decoder should be attributes of the VAE model, so that either can be readily replaced with any other encoder or decoder.

- probabilistic formulation: the VAE loss should follow a probabilistic interpretation that uses the log-probability of the decoder rather than an MSE loss.

- easy sampling: the VAE should have a method that samples from the model.

- regularization coefficient: your implementation should take a float `beta` as an argument and use it to weight the KL term in the loss, as in $\beta$-VAE.

- independence from data: your implementation should not depend on a particular data set or task.

- device agnostic: you should be able to train your model on both a CPU and a GPU.
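For reference, the probabilistic formulation and the regularization coefficient combine into the following per-data-point objective, written in the usual $\beta$-VAE form. The symbols $q_\phi$ (encoder), $p_\theta$ (decoder) and $p(z)$ (prior) are notation assumed here; check them against the lecture materials.

$$
-\mathrm{ELBO}_\beta(x) = -\mathbb{E}_{q_\phi(z \mid x)}\left[\log p_\theta(x \mid z)\right] + \beta \, \mathrm{KL}\left(q_\phi(z \mid x) \,\|\, p(z)\right)
$$

Setting $\beta = 1$ recovers the standard negative ELBO.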


## Hints

If implementing a VAE seems too complex at first, the hints below break it down into small, manageable steps. Arguably the easiest way to implement it (a matter of personal taste) is with `torch.distributions`.

### Hint 1 

Implement a Gaussian Encoder. 

Implement a class `EncoderGaussian` that has a neural network as an attribute. It should take a data point $x$ as input and output a vector $w \in \mathbb{R}^{2 \times D}$, where $D$ is the dimensionality of the latent space and $w$ parametrizes a multivariate normal distribution. The encoder should have a `log_prob` method.
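One possible shape for such a class is sketched below. It is only a sketch under stated assumptions: the covariance is taken to be diagonal, the $2 \times D$ outputs are interpreted as $D$ means followed by $D$ log standard deviations, and the layer sizes in the usage lines are hypothetical.

```python
import torch
from torch import nn
from torch.distributions import Independent, Normal


class EncoderGaussian(nn.Module):
    """Diagonal Gaussian q(z | x) parametrized by a neural network."""

    def __init__(self, net: nn.Module):
        super().__init__()
        self.net = net  # maps x to a vector of size 2 * D

    def forward(self, x: torch.Tensor) -> Independent:
        # Assumption: first half of the output is the mean,
        # second half is the log standard deviation.
        mean, log_std = self.net(x).chunk(2, dim=-1)
        # Independent treats the last dimension as one D-dimensional event.
        return Independent(Normal(mean, log_std.exp()), 1)

    def log_prob(self, x: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
        return self(x).log_prob(z)


# Usage with hypothetical sizes: 784 inputs and D = 2 latent dimensions.
net = nn.Sequential(nn.Linear(784, 64), nn.ReLU(), nn.Linear(64, 2 * 2))
encoder = EncoderGaussian(net)
x = torch.rand(8, 784)
z = encoder(x).rsample()
log_q = encoder.log_prob(x, z)
print(z.shape, log_q.shape)  # torch.Size([8, 2]) torch.Size([8])
```

Predicting the log standard deviation keeps the scale positive without any constraint on the network output; a softplus on the raw output would be an equally reasonable choice.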

### Hint 2 

Once you have a Gaussian Encoder, find a way to sample from the multivariate normal that allows gradients to propagate through the sample. The reparametrization trick or the `rsample()` method may be of interest to you.

https://pytorch.org/docs/stable/distributions.html#pathwise-derivative
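The difference between `rsample()` and `sample()` can be checked on a toy example. The snippet below is a small illustration with made-up parameters, not part of the required implementation.

```python
import torch
from torch.distributions import Normal

mean = torch.zeros(3, requires_grad=True)
log_std = torch.zeros(3, requires_grad=True)
q = Normal(mean, log_std.exp())

# rsample() draws z = mean + std * eps with eps ~ N(0, I), so z stays
# connected to the parameters in the autograd graph.
z = q.rsample()
z.sum().backward()
print(mean.grad)  # tensor([1., 1., 1.]) since dz/dmean = 1

# sample() is computed without gradient tracking, so the result is detached.
z_detached = q.sample()
print(z.requires_grad, z_detached.requires_grad)  # True False

# The same reparametrization written by hand.
eps = torch.randn(3)
z_manual = mean + log_std.exp() * eps
```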

### Hint 3

Implement a Gaussian Decoder.

Implement a class `DecoderGaussian` that has a neural network as an attribute. It should take a vector $z \in \mathbb{R}^D$ as input and output the parameters of a distribution from which a data point $x$ can be sampled. The decoder should have a `log_prob` method.
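A minimal sketch of such a decoder follows. It assumes that the network predicts only the mean and that the standard deviation is a fixed constant shared across all dimensions (the `std` argument is a choice made for this sketch); predicting a per-dimension scale is another option. The layer sizes are hypothetical.

```python
import torch
from torch import nn
from torch.distributions import Independent, Normal


class DecoderGaussian(nn.Module):
    """Gaussian p(x | z) whose mean is produced by a neural network."""

    def __init__(self, net: nn.Module, std: float = 1.0):
        super().__init__()
        self.net = net  # maps z to the mean of x
        self.std = std  # assumption: fixed standard deviation

    def forward(self, z: torch.Tensor) -> Independent:
        mean = self.net(z)
        # All data dimensions together form a single event.
        return Independent(Normal(mean, self.std), 1)

    def log_prob(self, z: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
        return self(z).log_prob(x)


# Usage with hypothetical sizes: D = 2 latent dimensions and 784 outputs.
net = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 784))
decoder = DecoderGaussian(net)
z = torch.randn(8, 2)
x = decoder(z).sample()
log_p = decoder.log_prob(z, x)
print(x.shape, log_p.shape)  # torch.Size([8, 784]) torch.Size([8])
```

With a fixed standard deviation the log-probability equals a scaled squared error plus a constant, but returning a distribution object keeps the loss in probabilistic form and makes it easy to swap in a different likelihood.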

### Hint 4

Implement a VAE with the encoder, decoder and prior as attributes and with a `sample` method. It should be easy to swap in different encoders and decoders and to apply the model to different data sets.


Note that the encoder parameters are the variational parameters of the VAE (recall the guide in Pyro).
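The pieces could be combined roughly as follows. This is a sketch, not a reference solution: it assumes that calling the encoder or decoder returns a `torch.distributions` object, uses a standard normal prior, and estimates the expectation with a single sample per data point. `DiagonalGaussianNet` is a hypothetical stand-in used only to make the example self-contained.

```python
import torch
from torch import nn
from torch.distributions import Independent, Normal, kl_divergence


class DiagonalGaussianNet(nn.Module):
    """Hypothetical stand-in used as both encoder and decoder in this sketch."""

    def __init__(self, net: nn.Module):
        super().__init__()
        self.net = net

    def forward(self, inputs):
        mean, log_std = self.net(inputs).chunk(2, dim=-1)
        return Independent(Normal(mean, log_std.exp()), 1)


class VAE(nn.Module):
    """Assumes encoder(x) and decoder(z) each return a torch distribution."""

    def __init__(self, encoder, decoder, latent_dim: int, beta: float = 1.0):
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder
        self.beta = beta
        # Buffers follow the module across devices with .to(device).
        self.register_buffer("prior_mean", torch.zeros(latent_dim))
        self.register_buffer("prior_std", torch.ones(latent_dim))

    @property
    def prior(self):
        return Independent(Normal(self.prior_mean, self.prior_std), 1)

    def loss(self, x):
        q = self.encoder(x)
        z = q.rsample()  # reparametrized sample, keeps gradients
        log_px_z = self.decoder(z).log_prob(x)
        kl = kl_divergence(q, self.prior)
        # Negative ELBO with the KL term weighted by beta, averaged over the batch.
        return (-log_px_z + self.beta * kl).mean()

    @torch.no_grad()
    def sample(self, n: int):
        z = self.prior.sample((n,))
        return self.decoder(z).sample()


# Usage with hypothetical sizes; runs on a GPU when one is available.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
vae = VAE(
    encoder=DiagonalGaussianNet(nn.Linear(784, 2 * 2)),
    decoder=DiagonalGaussianNet(nn.Linear(2, 2 * 784)),
    latent_dim=2,
    beta=1.0,
).to(device)
x = torch.rand(8, 784, device=device)
loss = vae.loss(x)
loss.backward()
samples = vae.sample(5)
print(samples.shape)  # torch.Size([5, 784])
```

Registering the prior parameters as buffers is one way to keep the model device agnostic, since they move together with the network weights.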

Once your VAE implementation is ready, train it on the MNIST data set. You do not need to worry about architectures: any architecture will do for the encoder and decoder parametrizations. Use your VAE to complete the following tasks.



## Tasks

### Task 1

Train your VAE on MNIST. Include a learning curve for the train and test sets, with consecutive epochs on the $x$-axis and the $-\mathrm{ELBO}$ on the $y$-axis. Pay attention to the aesthetics of the plot. Does your model converge?
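A plotting helper along these lines may be a useful starting point. It assumes you have collected one average $-\mathrm{ELBO}$ value per epoch for each split; the function name is hypothetical and the numbers in the usage line are placeholders, not real training results.

```python
import matplotlib.pyplot as plt


def plot_learning_curve(train_neg_elbo, test_neg_elbo):
    """Plot the per-epoch average negative ELBO for the train and test sets."""
    epochs = range(1, len(train_neg_elbo) + 1)
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.plot(epochs, train_neg_elbo, marker="o", label="train")
    ax.plot(epochs, test_neg_elbo, marker="s", label="test")
    ax.set_xlabel("epoch")
    ax.set_ylabel(r"$-\mathrm{ELBO}$")
    ax.set_title("VAE learning curve")
    ax.legend()
    ax.grid(alpha=0.3)
    fig.tight_layout()
    return fig, ax


# Placeholder numbers for illustration only, not real training results.
fig, ax = plot_learning_curve([210.0, 180.0, 165.0], [205.0, 182.0, 170.0])
```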

### Optional Extension

Add a scatter plot to your learning curve to show the average $-\mathrm{ELBO}$ per epoch together with the contributions from each of the data points.
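One way to draw such a plot is sketched below. It assumes the per-data-point $-\mathrm{ELBO}$ values have been stored as one array per epoch (tensors would first need `.detach().cpu()`); the function name is hypothetical and the data in the usage lines is synthetic.

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_curve_with_scatter(per_sample_neg_elbo, seed=0):
    """per_sample_neg_elbo: one 1-D array of per-data-point values per epoch."""
    rng = np.random.default_rng(seed)
    fig, ax = plt.subplots(figsize=(6, 4))
    means = []
    for epoch, values in enumerate(per_sample_neg_elbo, start=1):
        values = np.asarray(values)
        # Horizontal jitter keeps overlapping points visible.
        jitter = rng.uniform(-0.2, 0.2, size=values.shape)
        ax.scatter(epoch + jitter, values, s=4, alpha=0.2, color="tab:blue")
        means.append(values.mean())
    ax.plot(range(1, len(means) + 1), means,
            color="tab:red", marker="o", label="epoch average")
    ax.set_xlabel("epoch")
    ax.set_ylabel(r"$-\mathrm{ELBO}$")
    ax.legend()
    fig.tight_layout()
    return fig, ax


# Synthetic placeholder values for illustration only, not real results.
rng = np.random.default_rng(0)
fake_epochs = [rng.normal(loc=200.0 - 15.0 * e, scale=20.0, size=500) for e in range(3)]
fig, ax = plot_curve_with_scatter(fake_epochs)
```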
