Commit 2ce02a5

pass vae
1 parent 3eb6cc9 commit 2ce02a5

File tree

1 file changed: +23 -32 lines changed


src/content/lessons/vae.mdx

Lines changed: 23 additions & 32 deletions
@@ -7,52 +7,44 @@ heroImage: '../../assets/blog-placeholder-3.jpg'

import T from '../../components/TypstMath.astro'

-It this lesson, we will derive the core principles behind Variational Autoencoders (VAEs) and understand how they work.
+In this lecture, we will derive the core principles behind Variational Autoencoders (VAEs).
+The goal is to learn a generative model in the form of: i) a low-dimensional latent space, and ii) a decoder that maps from the latent space to the data space.

-### Generative Models by Autoencoding
+{/* To achieve this, the autoencoder will also learn a corresponding encoder that maps from the data space to the latent space. */}

-The goal is to learn a generative model in the form of:
+### Autoencoders

-- a low-dimensional latent space
-- a decoder that maps from the latent space to the data space
-
-To achieve this, the an autoencoder will also learn a corresponding encoder that maps from the data space to the latent space.
-
-### Plain Autoencoders
-
-Plain autoencoders jointly learn an encoder $e_\phi(x)$ and a decoder $d_\theta(z)$ by minimizing the reconstruction error:
+**Idea.** The core idea of vanilla autoencoders is to *jointly* learn an *encoder* $e_\varphi$ and a *decoder* $d_\theta$. The encoder maps a data point $x \in \mathbb{R}^d$ to a low-dimensional *latent representation* $e_\varphi(x)$, and the decoder maps this latent representation back to the data space $\mathbb{R}^d$, producing the reconstruction $d_\theta(e_\varphi(x))$.
+The parameters of the encoder and the decoder are learned by minimizing the error between a data point $x$ and its reconstruction $d_\theta(e_\varphi(x))$:

$$
-\mathcal{L}(\phi, \theta) = \sum_{x \in \text{data}} \| x - d_\theta(e_\phi(x)) \|^2
+\mathcal{L}(\varphi, \theta) = \sum_{x \in \text{data}} \| \underbrace{x}_{\mathrm{sample}} - \underbrace{d_\theta(e_\varphi(x))}_{\substack{\mathrm{sample} \\ \mathrm{reconstruction}}} \|^2 \enspace.
$$

-(or in typst) <T block v='cal(L)(phi, theta) = sum_(x in "data") || x - d_(theta)(e_(phi)(x)) ||^2' />
+{/* (or in typst) <T block v='cal(L)(phi, theta) = sum_(x in "data") || x - d_(theta)(e_(phi)(x)) ||^2' /> */}

-### A Probabilistic Interpretation of Autoencoders
+#### A Probabilistic Interpretation of Autoencoders

As in most cases where a squared error is minimized, we can interpret the decoder as a Gaussian likelihood model:

$$
-p_\theta(x|z) = \mathcal{N}(x; d_\theta(z), I)
+p(x|z, \theta) = \underbrace{\mathcal{N}(x; d_\theta(z), I)}_{\substack{\text{density of a Gaussian variable with mean } d_\theta(z) \\ \text{and identity covariance, evaluated at point } x}} \enspace.
$$

-(or in typst) <T block v='p_(theta)(x|z) = N(x; d_(theta)(z), I)' />
+{/* (or in typst) <T block v='p_(theta)(x|z) = cal(N)(x; d_(theta)(z), I)' /> */}

-With words, we assume that the decoder predicts the mean of a Gaussian distribution (of fixed identity covariance).
-
-
-We have the equivalence between minimizing the reconstruction error and maximizing the log-likelihood:
+In other words, the decoder predicts the mean of a Gaussian distribution (with fixed identity covariance), and minimizing the reconstruction error is equivalent to maximizing the log-likelihood:

$$
-\arg\min_{\phi, \theta} \sum_{x \in \text{data}} \| x - d_\theta(e_\phi(x)) \|^2
+\arg\min_{\varphi, \theta} \sum_{x \in \text{data}} \| x - d_\theta(e_\varphi(x)) \|^2
=
-\arg\max_{\phi, \theta} \sum_{x \in \text{data}} \log p_\theta(x|z = e_\phi(x))
+\arg\max_{\varphi, \theta} \sum_{x \in \text{data}} \log p(x|z = e_\varphi(x), \theta)
$$

(or in typst) <T block v='arg min_(phi, theta) sum_(x in "data") || x - d_(theta)(e_(phi)(x)) ||^2 = arg max_(phi, theta) sum_(x in "data") log p_(theta)(x|z = e_(phi)(x))' />

<details>
-<summary>Derivations on the equivalence</summary>
+<summary>"Proof" of the equivalence</summary>
<div>
We start with the log-likelihood:
$$
@@ -82,14 +74,14 @@ $$
Thus, maximizing the log-likelihood is equivalent to minimizing the squared error, i.e.:

$$
-\arg\min_{\phi, \theta} \sum_{x \in \text{data}} \| x - d_\theta(e_\phi(x)) \|^2
+\arg\min_{\varphi, \theta} \sum_{x \in \text{data}} \| x - d_\theta(e_\varphi(x)) \|^2
=
-\arg\max_{\phi, \theta} \sum_{x \in \text{data}} \log p_\theta(x|z = e_\phi(x))
+\arg\max_{\varphi, \theta} \sum_{x \in \text{data}} \log p(x|z = e_\varphi(x), \theta)
$$
</div>
</details>

-### Overview of Variational Autoencoders (VAEs)
+#### Overview of Variational Autoencoders (VAEs)

In Variational Autoencoders (VAEs), the position $z_i$ in the latent space (for a data point $x_i$) is supposed to be a random variable.
Indeed, there is technically some uncertainty on the exact position of $z_i$ that best explains $x_i$, especially given that we consider all points jointly.
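
To make the reconstruction objective and its Gaussian-likelihood reading above concrete, here is a minimal sketch in PyTorch; the layer sizes, batch size, and random data are placeholders for illustration only, not part of the lesson:

```python
import math
import torch
from torch import nn

d, k = 784, 32                                                             # data and latent dims (arbitrary)
encoder = nn.Sequential(nn.Linear(d, 128), nn.ReLU(), nn.Linear(128, k))   # e_phi
decoder = nn.Sequential(nn.Linear(k, 128), nn.ReLU(), nn.Linear(128, d))   # d_theta

x = torch.randn(16, d)                  # stand-in for a batch of data points
x_hat = decoder(encoder(x))             # reconstruction d_theta(e_phi(x))

# Squared reconstruction error, summed over dimensions and data points.
sq_err = ((x - x_hat) ** 2).sum()

# Gaussian log-likelihood with mean x_hat and identity covariance:
# log N(x; x_hat, I) = -0.5 * ||x - x_hat||^2 - (dim/2) * log(2*pi),
# so maximizing it is the same as minimizing the squared error.
log_lik = torch.distributions.Normal(x_hat, 1.0).log_prob(x).sum()
assert torch.allclose(log_lik, -0.5 * sq_err - 0.5 * x.numel() * math.log(2 * math.pi))
```

Minimizing `sq_err` (or, equivalently, the summed negative log-likelihood) with any gradient-based optimizer gives the plain autoencoder training loop.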
@@ -98,7 +90,7 @@ In a VAE, on thus manipulates, for each point $x_i$, a distribution on its laten
We will now decompose the construction of the VAE, starting with formulations that have only the decoder.
The encoder will be introduced later as a trick (i.e., amortization).

-### MAP Estimation of the Latent Variables
+#### MAP Estimation of the Latent Variables

We are interested in estimating both the decoder parameters $\theta$ and the latent variables $z_i$ for each data point $x_i$.

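The body of this section is largely not shown above; as a rough sketch of what joint MAP estimation of $\theta$ and the latents $z_i$ can look like, assuming the Gaussian likelihood from earlier and a standard normal prior on $z_i$ (sizes, optimizer, and data are placeholders):

```python
import torch
from torch import nn

d, k = 784, 32
decoder = nn.Sequential(nn.Linear(k, 128), nn.ReLU(), nn.Linear(128, d))  # d_theta

x = torch.randn(16, d)                        # a small batch of data points x_i
z = torch.zeros(16, k, requires_grad=True)    # one free latent variable z_i per point

opt = torch.optim.Adam([z, *decoder.parameters()], lr=1e-2)
for _ in range(200):
    opt.zero_grad()
    # Negative log-joint (up to constants): -log p(x_i | z_i, theta) - log p(z_i),
    # with p(x|z, theta) = N(x; d_theta(z), I) and prior p(z) = N(0, I).
    loss = 0.5 * ((x - decoder(z)) ** 2).sum() + 0.5 * (z ** 2).sum()
    loss.backward()
    opt.step()
```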
@@ -138,11 +130,10 @@
//However, this would lead to overfitting, as we could always increase the likelihood by increasing the capacity of the decoder and setting $z_i$ to arbitrary values.


-### Variational Inference
-
-### Reparameterization Trick
+#### Variational Inference

-### Doubly Stochastic Variational Inference
+#### Reparameterization Trick

-### Prior and latent space misconception
+#### Doubly Stochastic Variational Inference

+#### Prior and latent space misconception
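
The bodies of these last sections are not included above. As a reference sketch of the reparameterization trick with a Gaussian encoder (diagonal covariance) and a standard normal prior, where function names, shapes, and the placeholder tensors are illustrative rather than taken from the lesson:

```python
import torch

def reparameterize(mu, log_var):
    # Draw z ~ N(mu, diag(exp(log_var))) as a differentiable function of (mu, log_var):
    # z = mu + sigma * eps with eps ~ N(0, I), so gradients flow through mu and log_var.
    eps = torch.randn_like(mu)
    return mu + torch.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    # Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dims.
    return 0.5 * (mu ** 2 + torch.exp(log_var) - 1.0 - log_var).sum(dim=-1)

# Negative ELBO for one batch, given encoder outputs (mu, log_var) and a decoder
# mean x_hat = d_theta(z); the reconstruction term is -log N(x; x_hat, I) up to an
# additive constant. The tensors below are placeholders.
mu, log_var = torch.zeros(16, 32), torch.zeros(16, 32)
z = reparameterize(mu, log_var)
x, x_hat = torch.randn(16, 784), torch.randn(16, 784)
neg_elbo = (0.5 * ((x - x_hat) ** 2).sum(dim=-1) + kl_to_standard_normal(mu, log_var)).mean()
```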
