# "Variational Autoencoder: learning an underlying distribution and generating new data"
> "Constructing an autoencoder that learns the underlying distribution of the input data, generated from a multi-dimensional smooth function `f=f(x_1,x_2,x_3,x_4)`. The model can then be used to generate new data by sampling from the learned distribution."

- toc: true
- branch: master
- badges: true
- comments: true
- author: Lorenzo Posti
- categories: [neural network, autoencoder, variational autoencoder, basics, jupyter]


## Variational AutoEncoder (VAE): an algorithm to work with distributions

This notebook builds an `Autoencoder` model that learns the underlying distribution of the data. To do this, we modify the autoencoder so that the `encoder` no longer learns a compressed representation of each input. Instead, it learns the parameters of the distribution of the data in the latent (compressed) space.
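To make the difference concrete, here is a minimal sketch. The sizes `ninp` and `nlatent` and the single `nn.Linear` layers are hypothetical, chosen only for illustration. A plain autoencoder's encoder returns one code vector per input. A VAE encoder returns twice as many numbers, which are read as the mean and the log-variance of a Gaussian in latent space.

```python
import torch
from torch import nn

torch.manual_seed(0)
ninp, nlatent = 4, 2  # hypothetical sizes, for illustration only

# Plain autoencoder: one deterministic code vector per input
ae_encoder = nn.Linear(ninp, nlatent)

# VAE: 2*nlatent outputs per input, interpreted as the mean and the
# log-variance of a diagonal Gaussian in latent space
vae_encoder = nn.Linear(ninp, 2 * nlatent)

x = torch.randn(5, ninp)                              # batch of 5 hypothetical inputs
code = ae_encoder(x)                                  # shape (5, nlatent)
mean, logvar = torch.chunk(vae_encoder(x), 2, dim=1)  # each of shape (5, nlatent)

print(code.shape, mean.shape, logvar.shape)
```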

So the idea is to start from an observed sample of the data distribution $P({\bf X})$ and pass it to the `encoder`, which reduces its dimensionality, i.e. $P({\bf X})\mapsto P({\bf X}_{\rm c})$ with ${\bf X}\in\mathbb{R}^m$, ${\bf X}_{\rm c}\in\mathbb{R}^n$ and $n<m$. In other words, in a VAE the `encoder` does not represent the input data ${\bf X}$ with a `code` ${\bf X}_{\rm c}$. Rather, it represents the initial data distribution $P({\bf X})$ with a compressed distribution $P({\bf X}_{\rm c})$, which we usually need to approximate in some analytic form, e.g. a multivariate normal $P({\bf X}_{\rm c})\sim \mathcal{N}(\mu,\Sigma)$. In the implementation below $\Sigma$ is taken to be diagonal, so for each input the `encoder` only has to output the means and the log-variances of the $n$ latent dimensions.
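Drawing ${\bf X}_{\rm c}$ from this Gaussian in a way that can still be trained by backpropagation relies on the reparametrization trick, which the `reparametrize` method implements: draw $\epsilon\sim\mathcal{N}(0, I)$ and set ${\bf X}_{\rm c} = \mu + \exp(\tfrac{1}{2}\log\sigma^2)\,\epsilon$. The sketch below uses hypothetical values for `mean` and `logvar` and a toy scalar loss. It checks that the samples have the expected mean and variance and that gradients reach both parameters.

```python
import torch

torch.manual_seed(0)

# Hypothetical encoder outputs for one input with 4 latent dimensions
mean = torch.tensor([0.0, 1.0, -2.0, 0.5], requires_grad=True)
logvar = torch.tensor([0.0, -1.0, 0.5, -2.0], requires_grad=True)

# Reparametrization: eps ~ N(0, I) does not depend on the parameters,
# so z = mean + std * eps is differentiable w.r.t. mean and logvar
n_samples = 100_000
std = torch.exp(0.5 * logvar)
eps = torch.randn(n_samples, mean.shape[0])
z = mean + std * eps               # broadcast to shape (n_samples, 4)

# The samples follow N(mean, diag(exp(logvar)))
print(z.detach().mean(dim=0))      # close to mean
print(z.detach().var(dim=0))       # close to exp(logvar)

# Gradients of a toy scalar loss flow back to both parameters
z.pow(2).mean().backward()
print(mean.grad, logvar.grad)
```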

```python
#collapse-hide
import numpy as np
import matplotlib.pylab as plt
from scipy.special import i0, i1, k0, k1
from torch import tensor
from torch import nn
from torch.nn import functional as F
import torch, math
import random

%config Completer.use_jedi = False
%matplotlib inline

rng = np.random.default_rng()
```

```python
class VariationalAutoEncoder(nn.Module):
    def __init__(self, ninp, nlatent=4, **kwargs):
        super().__init__()
        self.nlatent = nlatent
        # encoder: ninp -> 32 -> 16 -> 2*nlatent (means and log-variances)
        self.encodeLayer1 = nn.Linear(in_features=ninp, out_features=32)
        self.encodeLayer2 = nn.Linear(in_features=32,   out_features=16)
        self.encodeOut    = nn.Linear(in_features=16,   out_features=2 * nlatent)
        # decoder: nlatent -> 16 -> 32 -> ninp
        self.decodeLayer1 = nn.Linear(in_features=nlatent, out_features=16)
        self.decodeLayer2 = nn.Linear(in_features=16,   out_features=32)
        self.decodeOut    = nn.Linear(in_features=32,   out_features=ninp)
        self.ELBO_loss = None  # loss of the most recent forward pass

    def encoder(self, x):
        h = F.relu(self.encodeLayer1(x))
        h = F.relu(self.encodeLayer2(h))
        # split the output into the mean and log-variance of the latent Gaussian
        mean, logvar = torch.split(self.encodeOut(h), self.nlatent, dim=1)
        return mean, logvar

    def decoder(self, encoded):
        h = F.relu(self.decodeLayer1(encoded))
        h = F.relu(self.decodeLayer2(h))
        return self.decodeOut(h)

    def reparametrize(self, mean, logvar):
        # z = mean + std * eps with eps ~ N(0, I) and std = exp(0.5 * logvar);
        # torch.randn_like keeps eps on the same device and dtype as mean
        eps = torch.randn_like(mean)
        return eps * torch.exp(0.5 * logvar) + mean

    # https://towardsdatascience.com/variational-autoencoder-demystified-with-pytorch-implementation-3a06bee395ed
    # https://arxiv.org/pdf/1312.6114.pdf
    # https://stats.stackexchange.com/questions/318748/deriving-the-kl-divergence-loss-for-vaes?noredirect=1&lq=1
    def _ELBO(self, x, decoded, mean, logvar):
        # reconstruction term: the negative MSE stands in for log p(x|z);
        # reduction='mean' averages over samples *and* features, which sets
        # its weight relative to the KL term below (summed over latent dims)
        logpx_z = -F.mse_loss(decoded, x, reduction='mean')
        # closed-form KL(N(mean, diag(exp(logvar))) || N(0, I)), one value per sample
        KLdiv = -0.5 * torch.sum(1 + logvar - mean ** 2 - logvar.exp(), dim=1)
        # negative ELBO averaged over the batch: the quantity to minimize
        return (KLdiv - logpx_z).mean()

    def forward(self, x):
        mean, logvar = self.encoder(x)
        z = self.reparametrize(mean, logvar)
        decoded = self.decoder(z)
        # store the loss so the training loop can call backward() on it
        self.ELBO_loss = self._ELBO(x, decoded, mean, logvar)
        return decoded

    def getELBO_loss(self, x):
        # same pipeline as forward(), but returns the loss instead of the output
        mean, logvar = self.encoder(x)
        z = self.reparametrize(mean, logvar)
        decoded = self.decoder(z)
        return self._ELBO(x, decoded, mean, logvar)
```
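The KL term in `_ELBO` is the closed-form divergence between the diagonal Gaussian $\mathcal{N}(\mu, {\rm diag}(\sigma^2))$ and a standard normal, $D_{\rm KL} = -\tfrac{1}{2}\sum_j \left(1 + \log\sigma_j^2 - \mu_j^2 - \sigma_j^2\right)$. As a sanity check, the minimal sketch below compares that expression with `torch.distributions.kl_divergence` on random, hypothetical encoder outputs (a batch of 3 inputs with 4 latent dimensions).

```python
import torch
from torch.distributions import Normal, kl_divergence

torch.manual_seed(0)

# Hypothetical encoder outputs: batch of 3, 4 latent dimensions
mean = torch.randn(3, 4)
logvar = torch.randn(3, 4)

# Closed form, as written in _ELBO: one KL value per input
kl_closed = -0.5 * torch.sum(1 + logvar - mean ** 2 - logvar.exp(), dim=1)

# Same quantity from torch.distributions, summed over latent dimensions
q = Normal(mean, torch.exp(0.5 * logvar))
prior = Normal(torch.zeros_like(mean), torch.ones_like(mean))
kl_reference = kl_divergence(q, prior).sum(dim=1)

print(kl_closed)
print(kl_reference)
print(torch.allclose(kl_closed, kl_reference, atol=1e-4))
```

Because the KL term pulls the latent distribution towards $\mathcal{N}(0, I)$, the usual way to generate new data from a trained VAE is to draw latent vectors from $\mathcal{N}(0, I)$ (e.g. `torch.randn(nsamples, 4)` for 4 latent dimensions) and pass them through `decoder`.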