## 5. The Downside of VI and ELBO

Let’s break this down carefully because there are actually two different layers of “parametric” assumptions in Bayesian modeling:

Parametric Model Assumption (the Bayesian model):

- We choose a prior $p(z)$ (e.g., a Gaussian) and a likelihood $p(x \mid z)$ (e.g., another parametric family).
- This step is common to both MCMC and VI, because it defines the joint $p(x, z)$.
- In other words, we typically do have a “parametric” or at least a specified functional form for $p(z)$ and $p(x \mid z)$. This is what we call the generative model.

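
As a concrete illustration of "specifying the generative model", here is a minimal sketch that assumes a toy model chosen only for this example: a standard normal prior on a scalar $z$ and unit-variance Gaussian observations centered at $z$. The function names (`log_prior`, `log_likelihood`, `log_joint`) are hypothetical and not part of any library.

```python
import numpy as np

def log_prior(z):
    # log N(z | 0, 1): the assumed prior p(z)
    return -0.5 * z**2 - 0.5 * np.log(2 * np.pi)

def log_likelihood(x, z):
    # sum_i log N(x_i | z, 1): the assumed likelihood p(x | z)
    x = np.asarray(x, dtype=float)
    return np.sum(-0.5 * (x - z) ** 2 - 0.5 * np.log(2 * np.pi))

def log_joint(x, z):
    # log p(x, z) = log p(x | z) + log p(z); both MCMC and VI start from this
    return log_prior(z) + log_likelihood(x, z)

print(log_joint([1.2, 0.8, 1.5], 1.0))
```

Everything an inference method needs from the model is contained in `log_joint`; neither MCMC nor VI changes it.
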
Inference Method Assumption (approximating the posterior $p(z \mid x)$):

- MCMC: Does not require choosing a parametric family for the posterior. Instead, it builds a Markov chain whose stationary distribution is $p(z \mid x)$, which is fully determined by the model $p(x, z)$, so the chain converges (in principle) to the exact posterior.
- So MCMC does not parametrically approximate $p(z \mid x)$. It draws as many samples as you like, and those samples come from the true posterior only asymptotically, that is, assuming you run the chain long enough and it mixes well.
- Variational Inference (VI): Does require choosing a parametric family $q_\phi(z)$ to approximate $p(z \mid x)$. For example, a factorized Gaussian, a mixture of Gaussians, or a normalizing flow with parameters $\phi$. You then optimize $\phi$ by maximizing the ELBO, which is equivalent to minimizing $\mathrm{KL}(q_\phi(z) \,\|\, p(z \mid x))$, so that $q_\phi(z)$ is as close as possible to the true posterior within the chosen family.

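
To make the MCMC side concrete, the following is a minimal random-walk Metropolis sketch. It assumes a toy conjugate model picked only for illustration ($z \sim \mathcal{N}(0, 1)$, $x_i \mid z \sim \mathcal{N}(z, 1)$), for which the exact posterior is known to be $\mathcal{N}\big(\sum_i x_i / (n+1),\ 1/(n+1)\big)$. The data values, step size, and burn-in length are arbitrary assumptions, and `metropolis` is a hypothetical helper, not a library function.

```python
import numpy as np

def log_joint(x, z):
    # Unnormalized log p(x, z) for z ~ N(0, 1), x_i | z ~ N(z, 1).
    # Additive constants are dropped; Metropolis only needs differences.
    return -0.5 * z**2 - 0.5 * np.sum((x - z) ** 2)

def metropolis(x, n_steps=20000, step=0.5, seed=0):
    rng = np.random.default_rng(seed)
    z = 0.0
    lp = log_joint(x, z)
    samples = np.empty(n_steps)
    for t in range(n_steps):
        z_new = z + step * rng.normal()          # symmetric random-walk proposal
        lp_new = log_joint(x, z_new)
        if np.log(rng.uniform()) < lp_new - lp:  # accept with prob min(1, ratio)
            z, lp = z_new, lp_new
        samples[t] = z                           # a rejected move repeats the old state
    return samples

x = np.array([1.2, 0.8, 1.5, 0.9, 1.1])
samples = metropolis(x)[5000:]                   # discard an assumed burn-in
exact_mean = x.sum() / (len(x) + 1)
exact_var = 1.0 / (len(x) + 1)
print("MCMC mean, variance: ", samples.mean(), samples.var())
print("exact mean, variance:", exact_mean, exact_var)
```

Note that the sampler never chooses a functional form for the posterior. It only evaluates the joint, and the sample statistics approach the exact values as the chain gets longer.
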
Therefore:

- Both MCMC and VI share the same “parametric model”: they assume you have already picked a prior $p(z)$ and likelihood $p(x \mid z)$. That is a modeling choice.
- The difference is that MCMC attempts to sample from the true posterior without imposing a further parametric form on $p(z \mid x)$, while VI uses a separate parametric family $q_\phi(z)$ to approximate $p(z \mid x)$.

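
For contrast with sampling, here is a minimal sketch of VI under stated assumptions: a toy model $z \sim \mathcal{N}(0, 1)$, $x_i \mid z \sim \mathcal{N}(z, 1)$, and a Gaussian family $q_\phi(z) = \mathcal{N}(m, s^2)$ with $\phi = (m, \log s)$. For this particular pairing the ELBO has a closed form, so plain gradient ascent suffices. The learning rate, step count, and function names are illustrative choices, not a recommended recipe.

```python
import numpy as np

def elbo(x, m, log_s):
    # ELBO (up to an additive constant) for q(z) = N(m, s^2):
    # E_q[log p(x, z)] + entropy of q.
    s2 = np.exp(2 * log_s)
    n = len(x)
    expected_log_joint = -0.5 * (m**2 + s2) - 0.5 * (np.sum((x - m) ** 2) + n * s2)
    entropy = log_s  # entropy of N(m, s^2) is log s + const
    return expected_log_joint + entropy

def fit_gaussian_vi(x, lr=0.05, n_steps=2000):
    n = len(x)
    m, log_s = 0.0, 0.0
    for _ in range(n_steps):
        s2 = np.exp(2 * log_s)
        grad_m = x.sum() - (n + 1) * m       # d ELBO / d m
        grad_log_s = 1.0 - (n + 1) * s2      # d ELBO / d log s
        m += lr * grad_m
        log_s += lr * grad_log_s
    return m, np.exp(2 * log_s)

x = np.array([1.2, 0.8, 1.5, 0.9, 1.1])
m, s2 = fit_gaussian_vi(x)
print("VI mean, variance:   ", m, s2)
print("exact mean, variance:", x.sum() / (len(x) + 1), 1.0 / (len(x) + 1))
```

In this toy case the true posterior is itself Gaussian, so the chosen family contains it and VI recovers it exactly. The approximation error appears only when the family is too restrictive for the posterior at hand.
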
Hence:

- MCMC:
  - Parametric model for prior/likelihood (the usual Bayesian model).
  - But the posterior is not forced into a parametric form; you represent it with samples that (asymptotically) come from the exact posterior.
- VI:
  - Same parametric model for prior/likelihood.
  - Additionally picks a parametric family $q_\phi(z)$ for the posterior approximation. This can cause “variational bias” if $q_\phi(z)$ cannot capture all complexities of $p(z \mid x)$, a gap that remains even at the optimum of the ELBO.

That is the essential distinction.
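
A small numerical sketch of variational bias follows. It assumes, purely for illustration, that the true posterior is a two-dimensional Gaussian with correlation 0.9, and that the variational family is a factorized (mean-field) Gaussian. For a Gaussian target, the factorized Gaussian minimizing $\mathrm{KL}(q \,\|\, p)$ is known in closed form: it matches the mean and uses variances equal to the inverse of the diagonal of the precision matrix. The helper `kl_gaussians_same_mean` is hypothetical and written out here only for the example.

```python
import numpy as np

def kl_gaussians_same_mean(cov_q, cov_p):
    # KL(q || p) for two Gaussians that share the same mean.
    k = cov_q.shape[0]
    trace_term = np.trace(np.linalg.solve(cov_p, cov_q))
    logdet_term = np.linalg.slogdet(cov_p)[1] - np.linalg.slogdet(cov_q)[1]
    return 0.5 * (trace_term - k + logdet_term)

rho = 0.9                                       # assumed posterior correlation
true_cov = np.array([[1.0, rho], [rho, 1.0]])   # stand-in for the true posterior
precision = np.linalg.inv(true_cov)

# Closed-form optimum of KL(q || p) over factorized Gaussians.
mean_field_var = 1.0 / np.diag(precision)
mean_field_cov = np.diag(mean_field_var)

gap = kl_gaussians_same_mean(mean_field_cov, true_cov)
print("true marginal variances:", np.diag(true_cov))
print("mean-field variances:   ", mean_field_var)
print("remaining KL(q || p):   ", gap)
```

The best factorized approximation reports marginal variances of $1 - \rho^2$ instead of 1 and leaves a strictly positive KL gap. No amount of further optimization removes it, because the family simply cannot represent the correlation.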