### Evidence lower bound (ELBO)

The evidence lower bound (ELBO) is a quantity used in variational inference as a tractable lower bound on the log marginal likelihood (the evidence) of a model given some observed data.

The ELBO is expressed as follows:
$$
\begin{aligned}
\text{ELBO} \coloneqq E_{z \sim q_{\phi}} \left [ \text{log} \frac{p_{\theta}(x, z)}{q_{\phi} (z)} \right ]
\end{aligned}
$$

where $p_{\theta}(x, z)$ is the joint distribution of the data $x$ and latent variables $z$, and $q_{\phi} (z)$ is an approximate posterior distribution over $z$. The ELBO can be used to optimize the parameters of the approximate posterior distribution so that **it approximates the true posterior distribution as closely as possible.**
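
Because the ELBO is an expectation under $q_{\phi}$, it can be estimated by averaging $\text{log } p_{\theta}(x, z) - \text{log } q_{\phi}(z)$ over samples $z \sim q_{\phi}$. The sketch below illustrates this on a toy model that is an assumption made for this example and is not part of the text above: $z \sim N(0, 1)$, $x|z \sim N(z, 1)$, with a Gaussian $q_{\phi}(z) = N(m, s^2)$. The function names are hypothetical.

```python
import math
import random


def log_normal_pdf(value, mean, std):
    """Log density of a univariate Normal(mean, std**2) at `value`."""
    return -0.5 * math.log(2.0 * math.pi * std**2) - 0.5 * ((value - mean) / std) ** 2


def log_joint(x, z):
    """log p(x, z) for the assumed toy model z ~ N(0, 1), x | z ~ N(z, 1)."""
    return log_normal_pdf(z, 0.0, 1.0) + log_normal_pdf(x, z, 1.0)


def elbo_monte_carlo(x, m, s, n_samples=50_000, seed=0):
    """Estimate E_{z ~ q}[log p(x, z) - log q(z)] with q(z) = N(m, s**2)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        z = rng.gauss(m, s)  # draw one sample z ~ q
        total += log_joint(x, z) - log_normal_pdf(z, m, s)
    return total / n_samples


if __name__ == "__main__":
    # Monte Carlo estimate of the ELBO for one observation x = 1.0.
    print(elbo_monte_carlo(x=1.0, m=0.3, s=0.8))
```

The estimate is noisy, so its accuracy depends on the number of samples. In practice the same average is usually computed on mini-batches and differentiated with respect to $\phi$ and $\theta$.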

#### Why do we use ELBO?

We use the ELBO in variational inference as a proxy for the log marginal likelihood, which is often intractable to compute directly. Because the ELBO is a lower bound on the log marginal likelihood, it gives us a tractable objective to optimize, and raising the ELBO raises a guaranteed floor on the log marginal likelihood. For fixed model parameters $\theta$, the evidence does not depend on $\phi$, so maximizing the ELBO with respect to $\phi$ is equivalent to minimizing the KL divergence $D_{\text{KL}}(q_{\phi}(z)||p_{\theta}(z|x))$ between the approximate distribution and the true posterior. This yields the best approximation of the true posterior within the chosen family of distributions.

#### Properties
1. **The evidence is always larger than or equal to the ELBO. We refer to this inequality as the ELBO inequality.**
   $$
   \begin{aligned}
   \text{log }p_{\theta}(x) &= \text{log} \int p_{\theta} (x|z) p(z) dz \\
   &= \text{log}\int p_{\theta}(x, z) dz \\
   &= \text{log}\int p_{\theta}(x, z) \frac{q_{\phi}(z)}{q_{\phi}(z)} dz \\
   &= \text{log}\int q_{\phi}(z) \frac{p_{\theta}(x, z)}{q_{\phi}(z)} dz \\
   &= \text{log} E_{z \sim q_{\phi}} \left [ \frac{p_{\theta}(x, z)}{q_{\phi}(z)} \right ] \\
   &\geq E_{z \sim q_{\phi}} \left [\text{log} \frac{p_{\theta}(x, z)}{q_{\phi}(z)} \right ] \quad \because \text{Jensen's inequality (log is a concave function).} \\
   \\
   \therefore \text{evidence} &\geq \text{ELBO}
   \end{aligned}
   $$

2. **The KL divergence from $q_{\phi}(z)$ to $p_{\theta}(z|x)$, $D_{\text{KL}}(q_{\phi}(z)||p_{\theta}(z|x))$, equals $\text{evidence} - \text{ELBO}$.**
   $$
   \begin{aligned}
   D_{\text{KL}}(q_{\phi}(z)||p_{\theta}(z|x)) &= \int q_{\phi}(z) \text{log} \frac{q_{\phi}(z)}{p_{\theta}(z|x)} dz \\
   &= E_{z \sim q_{\phi}} \left [\text{log}\frac{q_{\phi}(z)}{p_{\theta}(z|x)} \right ] \\
   &= E_{z \sim q_{\phi}}[\text{log }q_{\phi}(z)] - E_{z \sim q_{\phi}}[\text{log } p_{\theta}(z|x)] \\
   &= E_{z \sim q_{\phi}}[\text{log }q_{\phi}(z)] - E_{z \sim q_{\phi}} \left [ \text{log} \left ( p_{\theta}(z|x) \frac{p_{\theta}(x)}{p_{\theta}(x)} \right ) \right ] \\ 
   &= E_{z \sim q_{\phi}}[\text{log }q_{\phi}(z)] - E_{z \sim q_{\phi}} \left [ \text{log} \frac{p_{\theta}(z, x)}{p_{\theta}(x)} \right ] \\ 
   &= E_{z \sim q_{\phi}}[\text{log }q_{\phi}(z)] - E_{z \sim q_{\phi}} [ \text{log } p_{\theta}(z, x) ] + E_{z \sim q_{\phi}}[\text{log } p_{\theta}(x)] \\ 
   &= \text{log } p_{\theta}(x) - E_{z \sim q_{\phi}} \left [ \text{log} \frac{p_{\theta}(x, z)}{q_{\phi}(z)} \right ] \quad \because \text{log } p_{\theta}(x) \text{ does not depend on } z \\
   &= \text{evidence} - \text{ELBO}
   \end{aligned}
   $$
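
Both properties can be checked numerically on a model where every term has a closed form. The sketch below assumes a toy model that does not appear in the derivations above: $z \sim N(0, 1)$ and $x|z \sim N(z, 1)$, so that $x \sim N(0, 2)$ and the exact posterior is $N(x/2, 1/2)$, with $q_{\phi}(z) = N(m, s^2)$. The function names are hypothetical.

```python
import math


def log_evidence(x):
    """log p(x) for z ~ N(0, 1), x | z ~ N(z, 1), which gives x ~ N(0, 2)."""
    return -0.5 * math.log(2.0 * math.pi * 2.0) - x**2 / 4.0


def elbo(x, m, s):
    """Closed-form ELBO for q(z) = N(m, s**2): E_q[log p(x|z)] + E_q[log p(z)] + H[q]."""
    expected_log_lik = -0.5 * math.log(2.0 * math.pi) - 0.5 * ((x - m) ** 2 + s**2)
    expected_log_prior = -0.5 * math.log(2.0 * math.pi) - 0.5 * (m**2 + s**2)
    entropy = 0.5 * math.log(2.0 * math.pi * math.e * s**2)
    return expected_log_lik + expected_log_prior + entropy


def kl_to_posterior(x, m, s):
    """KL(q || p(z | x)), where the exact posterior is N(x / 2, 1 / 2)."""
    post_mean, post_var = x / 2.0, 0.5
    return (
        0.5 * math.log(post_var / s**2)
        + (s**2 + (m - post_mean) ** 2) / (2.0 * post_var)
        - 0.5
    )


if __name__ == "__main__":
    x = 1.0
    # The last setting is the exact posterior, where the gap should vanish.
    for m, s in [(0.0, 1.0), (0.3, 0.8), (x / 2.0, math.sqrt(0.5))]:
        gap = log_evidence(x) - elbo(x, m, s)
        print(f"m={m:.3f} s={s:.3f} gap={gap:.6f} KL={kl_to_posterior(x, m, s):.6f}")
```

For every choice of `m` and `s`, the gap between the evidence and the ELBO is non-negative and matches the KL divergence up to floating-point error. It shrinks to zero when $q_{\phi}$ equals the exact posterior.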

#### Appendix: What is the log marginal likelihood or evidence?

The marginal likelihood (also known as the evidence) is the probability of the observed data under a model, averaged over all possible values of the model parameters weighted by their prior. The log marginal likelihood is its logarithm, and is defined as:

$$\log p(x) = \log \int p(x|\theta)p(\theta) d\theta$$

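where $x$ is the observed data, $\theta$ are the model parameters, $p(x|\theta)$ is the likelihood function, and $p(\theta)$ is the prior distribution of the parameters. In the latent-variable setting used in the sections above, the latent variables $z$ are the quantity being integrated out, which gives $\log p_{\theta}(x) = \log \int p_{\theta}(x, z) dz$.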

The log marginal likelihood is often used as a model selection criterion because it automatically penalizes overly flexible models: a model that spreads its prior probability over many possible datasets assigns less probability to any particular one. In Bayesian inference, the marginal likelihood is also used to compute the Bayes factor, which is the ratio of the evidences of two competing models.
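
As a small illustration of the integral and the Bayes factor, the sketch below compares two models of a coin-flip sequence. The example is an assumption chosen because both evidences have closed forms: a fixed fair coin ($\theta = 0.5$, no free parameter) and a coin with unknown bias under a uniform prior on $\theta$, for which $\int \theta^{k} (1 - \theta)^{n-k} d\theta = \frac{k!(n-k)!}{(n+1)!}$. The function names are hypothetical.

```python
import math


def log_evidence_fair_coin(n_heads, n_flips):
    """log p(x) when theta is fixed at 0.5, so nothing is integrated out."""
    return n_flips * math.log(0.5)


def log_evidence_uniform_prior(n_heads, n_flips):
    """log of the integral of theta^k (1 - theta)^(n - k) over [0, 1].

    The integral equals k! (n - k)! / (n + 1)!, computed here with lgamma.
    """
    k, n = n_heads, n_flips
    return math.lgamma(k + 1) + math.lgamma(n - k + 1) - math.lgamma(n + 2)


def log_bayes_factor(n_heads, n_flips):
    """log [p(x | uniform-prior model) / p(x | fair-coin model)]."""
    return log_evidence_uniform_prior(n_heads, n_flips) - log_evidence_fair_coin(
        n_heads, n_flips
    )


if __name__ == "__main__":
    for heads in (5, 10):
        print(heads, "heads out of 10:", log_bayes_factor(heads, 10))
```

With 5 heads out of 10 the log Bayes factor is negative, so the simpler fair-coin model has the higher evidence. With 10 heads out of 10 it is positive, so the more flexible model is preferred.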