# $\beta$-VAE and the Relationship Between $\beta$ and $\sigma$

## $\beta$-VAE

$\beta$-VAE is a variant of the standard Variational Autoencoder (VAE) that multiplies the Kullback-Leibler (KL) divergence term in the VAE loss function by a regularization coefficient $\beta$.

In a standard VAE, the loss function takes the form:

$$L = \text{Reconstruction Error} + D_{\mathrm{KL}}(q(z|x) \,\|\, p(z))$$


In $\beta$-VAE, the loss function is modified as:

$$L = \text{Reconstruction Error} + \beta \cdot D_{\mathrm{KL}}(q(z|x) \,\|\, p(z))$$


where $\beta$ is a positive coefficient that controls the strength of regularization; setting $\beta = 1$ recovers the standard VAE loss.
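
As a rough illustration of the modified loss, here is a minimal sketch in plain Python. It assumes a diagonal Gaussian encoder parameterized by a mean and a log-variance per latent dimension, a standard normal prior, and summed squared error as the reconstruction error; the function names `kl_diag_gaussian` and `beta_vae_loss` are hypothetical and chosen only for this example.

```python
import math

def kl_diag_gaussian(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var))

def beta_vae_loss(x, x_recon, mu, log_var, beta=1.0):
    """Reconstruction error plus beta times the KL term."""
    # Assumption: reconstruction error is the summed squared error.
    recon = sum((a - b) ** 2 for a, b in zip(x, x_recon))
    return recon + beta * kl_diag_gaussian(mu, log_var)

# Same inputs, two values of beta: only the weight on the KL term changes.
x, x_recon = [1.0, 2.0], [0.9, 2.2]
mu, log_var = [0.5, -0.3], [-1.0, 0.2]
print(beta_vae_loss(x, x_recon, mu, log_var, beta=1.0))
print(beta_vae_loss(x, x_recon, mu, log_var, beta=4.0))
```

In a real model the reconstruction term depends on the chosen decoder likelihood, so the squared error here is only a stand-in.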

## Key Features of $\beta$-VAE

1. When $\beta > 1$, the model places greater emphasis on regularization, which promotes learning more disentangled and interpretable latent representations.

2. $\beta$-VAE was designed to address the problem of learning disentangled representations, in which individual dimensions of the latent space correspond to separate factors of variation in the data.

3. Higher values of $\beta$ can lead to better disentanglement of factors, but at the cost of reduced reconstruction quality.

## Relationship Between $\beta$ and $\sigma$

There is a specific relationship between the $\beta$ parameter and the variance $\sigma^{2}$ in the latent space:

1. In a standard VAE, the prior distribution over the latent space $p(z)$ is typically assumed to be a standard normal distribution $N(0, I)$, where $I$ is the identity matrix. Increasing $\beta$ in a $\beta$-VAE increases the weight of the KL-divergence regularization term.

2. As $\beta$ increases, the model pushes the approximate posterior $q(z|x)$ (the distribution output by the encoder) closer to the prior distribution $p(z)$.

3. This means that the variance $\sigma^{2}$ of the distribution $q(z|x)$ becomes closer to one, and the mean value $\mu$ tends toward zero.

4. When $\beta$ is very high, the encoder will strongly tend to output mean values close to zero and variances close to one for each dimension of the latent space.

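5. In some implementations of $\beta$-VAE, instead of directly changing $\beta$, researchers sometimes adjust the variance of the prior distribution, $p(z) = N(0, \sigma_p^{2}I)$ (written $\sigma_p^{2}$ here to keep it distinct from the encoder variance $\sigma^{2}$). Mathematically, this has an effect similar to changing $\beta$, though not identical: a smaller $\sigma_p^{2}$ penalizes large means more strongly, much as a larger $\beta$ does, but it also moves the target for the encoder variance from one to $\sigma_p^{2}$, whereas $\beta$ leaves the minimizer of the KL term unchanged.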
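
The points in the list above can be made concrete with the closed-form KL divergence, assuming a diagonal Gaussian encoder $q(z|x) = N(\mu, \operatorname{diag}(\sigma^{2}))$ with per-dimension mean $\mu_j$ and variance $\sigma_j^{2}$. The symbol $\sigma_p^{2}$ for the prior variance is introduced here only to keep it distinct from the encoder variance. For the standard normal prior:

$$D_{\mathrm{KL}}\big(q(z|x) \,\|\, N(0, I)\big) = \frac{1}{2} \sum_j \left( \mu_j^{2} + \sigma_j^{2} - \ln \sigma_j^{2} - 1 \right)$$

For a prior with variance $\sigma_p^{2}$:

$$D_{\mathrm{KL}}\big(q(z|x) \,\|\, N(0, \sigma_p^{2} I)\big) = \frac{1}{2} \sum_j \left( \frac{\mu_j^{2} + \sigma_j^{2}}{\sigma_p^{2}} - \ln \frac{\sigma_j^{2}}{\sigma_p^{2}} - 1 \right)$$

Each summand in the first expression is zero only at $\mu_j = 0$, $\sigma_j^{2} = 1$, and multiplying by $\beta$ scales the penalty without moving that minimum. In the second expression the minimum sits at $\mu_j = 0$, $\sigma_j^{2} = \sigma_p^{2}$.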

Thus, a high $\beta$ value can be informally viewed as a way to constrain the distribution $q(z|x)$: its mean is pulled toward zero and its variance toward one, so it moves closer to the standard normal distribution and carries less information about $x$.
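
The pull toward the standard normal can be checked numerically on a toy problem. The sketch below assumes a single latent dimension and a hypothetical reconstruction term, the expected squared distance of $z \sim N(\mu, \sigma^{2})$ from a fixed target, scaled by an assumed noise variance. None of this comes from a real model; the names `toy_objective` and `toy_optimum` and the values `target` and `noise_var` are illustrative only.

```python
import math

def kl_to_standard_normal(mu, var):
    """KL( N(mu, var) || N(0, 1) ) for a single latent dimension."""
    return 0.5 * (mu ** 2 + var - math.log(var) - 1.0)

def toy_objective(mu, var, beta, target=2.0, noise_var=0.1):
    # Hypothetical reconstruction term: E[(z - target)^2] / (2 * noise_var)
    # for z ~ N(mu, var), which equals ((mu - target)^2 + var) / (2 * noise_var).
    recon = ((mu - target) ** 2 + var) / (2.0 * noise_var)
    return recon + beta * kl_to_standard_normal(mu, var)

def toy_optimum(beta, target=2.0, noise_var=0.1):
    # Obtained by setting the partial derivatives of toy_objective
    # with respect to mu and var to zero.
    mu = target / (1.0 + beta * noise_var)
    var = beta * noise_var / (1.0 + beta * noise_var)
    return mu, var

for beta in (1.0, 4.0, 100.0, 10000.0):
    mu, var = toy_optimum(beta)
    print(f"beta={beta:>8}: mu={mu:.4f}, var={var:.4f}")
```

In this toy setting the optimal mean shrinks toward zero and the optimal variance grows toward one as `beta` increases, matching the behavior described above.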