## Recap from class 3

We are interested in a posterior 
$$
p(\theta|\mathcal{D}) = \frac{p(\mathcal{D}|\theta) p(\theta)}{p(\mathcal{D})}
$$
which may be intractable.

In that case we resort to approximate inference, either through sampling (MCMC) or optimization (VI).

In the latter we select a (simple) approximate posterior $q_\nu(\theta)$ and we optimize the parameters $\nu$ by maximizing the evidence lower bound (ELBO)

$$
\begin{align}
\log p(\mathcal{D}) \geq  \mathcal{L}(\nu) &= - \int q_\nu(\theta) \log \frac{q_\nu(\theta)}{p(\mathcal{D}|\theta) p (\theta)} d\theta  \\
&= \mathbb{E}_{\theta \sim q_\nu(\theta)} \left[\log p(\mathcal{D}|\theta)\right]- D_{KL}[q_\nu(\theta) || p(\theta)]  \nonumber 
\end{align}
$$

Maximizing the ELBO is equivalent to minimizing $D_{KL}[q_\nu(\theta) || p(\theta|\mathcal{D})]$, which makes $q_\nu(\theta)$ close to $p(\theta|\mathcal{D})$
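As a minimal sketch of the ELBO above, the following NumPy snippet estimates $\mathcal{L}(\nu)$ by Monte Carlo for a toy conjugate model. The model is an assumption made purely for illustration: prior $\theta \sim \mathcal{N}(0, 1)$, likelihood $x_i|\theta \sim \mathcal{N}(\theta, 1)$ and a Gaussian $q_\nu$ with $\nu = (m, s)$. The function names and data values are hypothetical. Because this model is conjugate, the exact posterior and $\log p(\mathcal{D})$ are known, so the bound can be checked numerically.

```python
import numpy as np

def log_likelihood(theta, data):
    """log p(D|theta) for each sample in theta (shape (S,)), unit-variance Gaussian noise."""
    diff = data[None, :] - theta[:, None]
    return -0.5 * np.sum(diff ** 2 + np.log(2 * np.pi), axis=1)

def kl_to_std_normal(m, s):
    """Closed-form KL[N(m, s^2) || N(0, 1)]."""
    return 0.5 * (s ** 2 + m ** 2 - 1.0 - 2.0 * np.log(s))

def elbo(m, s, data, n_samples=20000, seed=0):
    """Monte Carlo estimate of E_q[log p(D|theta)] - KL[q || p(theta)]."""
    rng = np.random.default_rng(seed)
    theta = m + s * rng.standard_normal(n_samples)  # reparameterized samples from q
    return log_likelihood(theta, data).mean() - kl_to_std_normal(m, s)

def log_evidence(data):
    """Exact log p(D) for the assumed toy model (D ~ N(0, I + 11^T))."""
    n = data.size
    quad = np.sum(data ** 2) - np.sum(data) ** 2 / (n + 1)
    return -0.5 * (n * np.log(2 * np.pi) + np.log(n + 1) + quad)

data = np.array([0.8, 1.1, 1.4, 0.9, 1.3])  # made-up observations
n = data.size

# Exact posterior of the toy model: N(sum(x) / (n + 1), 1 / (n + 1))
post_mean, post_std = data.sum() / (n + 1), (n + 1) ** -0.5

elbo_exact_q = elbo(post_mean, post_std, data)  # q equal to the true posterior
elbo_poor_q = elbo(0.0, 1.0, data)              # q equal to the prior

print(f"log p(D)           = {log_evidence(data):.3f}")
print(f"ELBO, q = posterior = {elbo_exact_q:.3f}")
print(f"ELBO, q = prior     = {elbo_poor_q:.3f}")
```

When $q_\nu$ matches the true posterior the ELBO should agree with $\log p(\mathcal{D})$ up to Monte Carlo error, while a poor choice of $\nu$ gives a strictly looser bound.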

What can we do to improve this?

1. Using more flexible approximate posteriors
1. Making the bound tighter

## More flexible approximate posteriors for VI

One way to obtain a more flexible posterior that is still tractable is to start with a simple distribution and apply a sequence of invertible transformations

This is the key idea behind [normalizing flows](https://arxiv.org/abs/1505.05770)

Let's say that $z\sim q(z)$, where $q$ is simple, *e.g.* a standard Gaussian,

and that there is a smooth and invertible transformation $f$, so that $f^{-1}(f(z)) = z$

Then $z' = f(z)$ is also a random variable, and its density is

$$
q_{z'}(z') = q(z) \left| \frac{\partial f^{-1}}{\partial z'} \right| = q(z) \left| \frac{\partial f}{\partial z} \right|^{-1}
$$

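that is, the original density divided by the absolute value of the Jacobian determinant of the transformation, evaluated at $z = f^{-1}(z')$. Here $|\cdot|$ denotes the absolute value of the determinant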
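The change of variables formula can be checked numerically. The sketch below assumes, only as an example, a one-dimensional standard Gaussian $q$ and the transformation $f(z) = e^z$ (so $z'$ is log-normal); the helper names are hypothetical. Since $f(0) = 1$ and $f$ is increasing, half of the probability mass of $z'$ should lie below 1, which we verify both by integrating the transformed density and by sampling.

```python
import numpy as np

def std_normal_pdf(z):
    """Density of the base distribution q(z) = N(0, 1)."""
    return np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)

def f(z):
    """Example invertible transformation (an assumption for this sketch)."""
    return np.exp(z)

def f_inv(z_prime):
    return np.log(z_prime)

def df_dz(z):
    """Derivative of f, i.e. the 1x1 Jacobian."""
    return np.exp(z)

def transformed_pdf(z_prime):
    """q_{z'}(z') = q(z) |df/dz|^{-1} with z = f^{-1}(z')."""
    z = f_inv(z_prime)
    return std_normal_pdf(z) / np.abs(df_dz(z))

# Integrate the transformed density over (0, 1] with the trapezoidal rule
grid = np.linspace(1e-6, 1.0, 200001)
dens = transformed_pdf(grid)
mass_below_one = np.sum(0.5 * (dens[1:] + dens[:-1]) * np.diff(grid))

# Compare against samples pushed through f
rng = np.random.default_rng(0)
samples = f(rng.standard_normal(100000))
empirical_mass = np.mean(samples < 1.0)

print(f"P(z' < 1) from density: {mass_below_one:.4f}")
print(f"P(z' < 1) from samples: {empirical_mass:.4f}")
```

Both estimates should be close to 0.5, which suggests that the Jacobian correction is what keeps the transformed density properly normalized.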

We can also apply a chain of transformations $f_1, f_2, \ldots, f_K$, with $z_k = f_k(z_{k-1})$, obtaining

$$
q_K(z_K) = q_0(z_0) \prod_{k=1}^K \left| \frac{\partial f_k}{\partial z_{k-1}} \right|^{-1}
$$

With this we can go from a simple Gaussian to more expressive/complex/multi-modal distributions 
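To illustrate the chained formula, here is a hedged NumPy sketch using planar flows, $f(z) = z + u \tanh(w^\top z + b)$, one of the flow types proposed in the normalizing flows paper linked above. The parameter values are arbitrary choices for this example (they satisfy $w^\top u > -1$, which keeps each planar flow invertible), and the function names are hypothetical. The log-density is tracked by subtracting the log-determinant of each step, exactly as in the product above.

```python
import numpy as np

def planar_flow(z, u, w, b):
    """Apply f(z) = z + u * tanh(w^T z + b) to a batch z of shape (N, D).

    Returns f(z) and log|det df/dz| for every row.
    """
    a = z @ w + b                        # (N,)
    h = np.tanh(a)
    z_new = z + np.outer(h, u)           # (N, D)
    h_prime = 1.0 - h ** 2               # derivative of tanh
    log_det = np.log(np.abs(1.0 + h_prime * (u @ w)))
    return z_new, log_det

def std_normal_logpdf(z):
    """log q_0(z) for a D-dimensional standard Gaussian."""
    d = z.shape[1]
    return -0.5 * np.sum(z ** 2, axis=1) - 0.5 * d * np.log(2 * np.pi)

def flow_log_density(z0, params):
    """Push z0 through the chain and return z_K and log q_K(z_K)."""
    z, log_q = z0, std_normal_logpdf(z0)
    for u, w, b in params:
        z, log_det = planar_flow(z, u, w, b)
        log_q = log_q - log_det          # multiply by |det|^{-1} in log space
    return z, log_q

# Two 1-D planar flows with arbitrary example parameters (u, w, b)
params = [
    (np.array([2.0]), np.array([3.0]), 0.0),
    (np.array([-0.5]), np.array([1.0]), 1.0),
]

z0 = np.linspace(-8.0, 8.0, 20001)[:, None]
z_k, log_q_k = flow_log_density(z0, params)

# Both maps are increasing in 1-D, so z_k is sorted and we can integrate q_K over it
x, y = z_k[:, 0], np.exp(log_q_k)
total_mass = np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))
print(f"integral of q_K: {total_mass:.4f}")
```

With these particular parameters the first flow pushes mass away from the origin, so the resulting density is bimodal even though the base distribution is a single Gaussian.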

Nowadays several types of flows exist in the literature, *e.g.* planar, radial, autoregressive

[Normalizing flows have been used to make the approximate posterior in VAE more expressive](https://arxiv.org/abs/1809.05861)

Three excellent blog posts covering normalizing flows:
- https://blog.evjang.com/2018/01/nf1.html
- http://akosiorek.github.io/ml/2018/04/03/norm_flows.html
- https://lilianweng.github.io/lil-log/2018/10/13/flow-based-deep-generative-models.html

[Normalizing flows in Pyro](https://bmazoure.github.io/posts/nf-in-pyro/)

## Tighter bounds for VI

- [Auxiliary Deep Generative Models](https://arxiv.org/abs/1602.05473)
- [Importance Weighted Autoencoders](https://arxiv.org/abs/1509.00519)
- [Debiasing Evidence Approximations: On Importance-weighted Autoencoders and Jackknife Variational Inference](https://openreview.net/forum?id=HyZoi-WRb)
- [Tighter Variational Bounds are Not Necessarily Better](https://arxiv.org/abs/1802.04537)
- http://artem.sobolev.name/posts/2019-04-26-neural-samplers-and-hierarchical-variational-inference.html
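
As a rough illustration of what a tighter bound means, the sketch below estimates the importance-weighted bound from the Importance Weighted Autoencoders paper listed above, $\mathcal{L}_K = \mathbb{E}\left[\log \frac{1}{K}\sum_{k=1}^K w_k\right]$ with $w_k = p(\mathcal{D}|\theta_k)p(\theta_k)/q_\nu(\theta_k)$ and $\theta_k \sim q_\nu$. For $K=1$ this is the standard ELBO. The toy model (prior $\mathcal{N}(0,1)$, unit-variance Gaussian likelihood), the deliberately poor $q_\nu = \mathcal{N}(0.5, 1)$, the data and all function names are assumptions made for this example only.

```python
import numpy as np

def log_normal_pdf(x, mean, std):
    """Log-density of N(mean, std^2), evaluated elementwise."""
    return -0.5 * np.log(2 * np.pi) - np.log(std) - 0.5 * ((x - mean) / std) ** 2

def log_joint(theta, data):
    """log p(D|theta) + log p(theta) for the toy model; theta has shape (R, K)."""
    log_prior = log_normal_pdf(theta, 0.0, 1.0)
    log_lik = log_normal_pdf(data, theta[..., None], 1.0).sum(axis=-1)
    return log_prior + log_lik

def importance_weighted_bound(k, m, s, data, n_rep=10000, seed=0):
    """Monte Carlo estimate of L_K, averaged over n_rep independent replicates."""
    rng = np.random.default_rng(seed)
    theta = m + s * rng.standard_normal((n_rep, k))   # K samples from q per replicate
    log_w = log_joint(theta, data) - log_normal_pdf(theta, m, s)
    top = log_w.max(axis=1, keepdims=True)            # for a stable log-mean-exp
    log_mean_w = top[:, 0] + np.log(np.mean(np.exp(log_w - top), axis=1))
    return log_mean_w.mean()

def log_evidence(data):
    """Exact log p(D) for the assumed toy model."""
    n = data.size
    quad = np.sum(data ** 2) - np.sum(data) ** 2 / (n + 1)
    return -0.5 * (n * np.log(2 * np.pi) + np.log(n + 1) + quad)

data = np.array([0.8, 1.1, 1.4, 0.9, 1.3])  # made-up observations
bounds = {k: importance_weighted_bound(k, m=0.5, s=1.0, data=data) for k in (1, 5, 50)}
log_z = log_evidence(data)

for k, value in bounds.items():
    print(f"L_{k:<2d} = {value:.3f}")
print(f"log p(D) = {log_z:.3f}")
```

The estimates should increase with $K$ and approach $\log p(\mathcal{D})$ from below, up to Monte Carlo error. Note that, as one of the references above argues, a tighter bound is not necessarily better for learning $\nu$.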

## Other perspectives

- [Stein Variational gradient descent](https://arxiv.org/abs/1608.04471)
- [Approximate MCMC](https://arxiv.org/abs/1908.03491)
- Adversarially learned inference (ALI)

https://papers.nips.cc/paper/8681-practical-deep-learning-with-bayesian-principles.pdf

[A Comprehensive guide to Bayesian Convolutional Neural Network with Variational Inference](https://arxiv.org/pdf/1901.02731.pdf)

GECO: https://arxiv.org/pdf/1810.00597.pdf

- Blundell weight uncertainty (bayes by backprop): https://arxiv.org/pdf/1505.05424.pdf, https://github.com/ThirstyScholar/bayes-by-backprop, https://gluon.mxnet.io/chapter18_variational-methods-and-uncertainty/bayes-by-backprop.html, http://krasserm.github.io/2019/03/14/bayesian-neural-networks/

- Graves, practical VI for NN: https://papers.nips.cc/paper/4329-practical-variational-inference-for-neural-networks.pdf
- https://csc2541-f17.github.io/slides/lec04.pdf


FLIPOUT: https://arxiv.org/abs/1803.04386

- advances in VI: https://arxiv.org/pdf/1711.05597.pdf
- Variational dropout and local reparameterization trick: http://papers.nips.cc/paper/5666-variational-dropout-and-the-local-reparameterization-trick, https://alsibahi.xyz/snippets/2019/06/15/pyro_mnist_bnn_kl.html
- Uncertainty Estimations by Softplus normalization in Bayesian Convolutional Neural Networks with Variational Inference: https://arxiv.org/pdf/1806.05978.pdf

[Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning](https://arxiv.org/abs/1506.02142)

Kalman VAE

https://arxiv.org/pdf/1710.057416.pdf

Bayesian optimization

http://pyro.ai/examples/bo.html

Frameworks
- [Automatic Differentiation Variational Inference (ADVI)](https://arxiv.org/abs/1603.00788)
- [Operator Variational Inference (OPVI)](https://papers.nips.cc/paper/6091-operator-variational-inference.pd)

[Do Deep Generative Models Know What They Don't Know?](https://arxiv.org/pdf/1810.09136.pdf)