# **Beta-VAE Loss Derivation as a Lagrangian under KKT Conditions**

## **Objective**

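The beta-VAE modifies the standard VAE objective by scaling the KL divergence term with a factor, beta. Choosing `beta > 1` strengthens the pull of the approximate posterior toward the prior, which is intended to encourage disentangled latent representations.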

Given:

`L_beta-VAE = E_q(z|x)[ log p(x|z) ] - beta * D_KL( q(z|x) || p(z) )`

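we aim to derive this objective as a Lagrangian under the KKT (Karush-Kuhn-Tucker) conditions. Note that the expression above is maximized; the loss minimized during training is its negative.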
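To make the two terms concrete, here is a minimal sketch of the objective for a single data point. It assumes a diagonal Gaussian posterior `q(z|x) = N(mu, diag(exp(log_var)))` and a standard normal prior `p(z) = N(0, I)`, neither of which is fixed by the text above; under those assumptions the KL term has a closed form. The function names and the `recon_log_lik` argument (an estimate of `E_q(z|x)[ log p(x|z) ]` supplied by the caller) are illustrative.

```python
import math


def gaussian_kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dims.

    Assumption: diagonal Gaussian posterior and standard normal prior.
    """
    return 0.5 * sum(
        math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var)
    )


def beta_vae_objective(recon_log_lik, mu, log_var, beta):
    """Objective to MAXIMIZE: reconstruction term minus beta times the KL term.

    recon_log_lik is a caller-supplied estimate of E_q(z|x)[ log p(x|z) ].
    """
    return recon_log_lik - beta * gaussian_kl_to_standard_normal(mu, log_var)


# The training loss is the negative of the objective.
objective = beta_vae_objective(recon_log_lik=-10.0, mu=[1.0], log_var=[0.0], beta=4.0)
loss = -objective
```

With `beta = 1` the same function gives the standard VAE objective (the ELBO) for that data point.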

## **Step 1: Constrained Optimization Setup**

Set up the objective as a constrained optimization problem, where we maximize the reconstruction likelihood while constraining the KL divergence:

`maximize_q(z|x) E_q(z|x)[ log p(x|z) ]`  
**subject to**  
`D_KL( q(z|x) || p(z) ) <= delta`

where `delta >= 0` is a predefined upper bound on the KL divergence, limiting how far the approximate posterior `q(z|x)` may move from the prior `p(z)`.

## **Step 2: Formulating the Lagrangian**

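Convert the constrained problem to an unconstrained one by introducing a Lagrange multiplier, beta. Because we are maximizing, the constraint violation `D_KL( q(z|x) || p(z) ) - delta` is subtracted, weighted by beta: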

`L = E_q(z|x)[ log p(x|z) ] - beta * ( D_KL( q(z|x) || p(z) ) - delta )`

Expanding the terms, we get:

`L = E_q(z|x)[ log p(x|z) ] - beta * D_KL( q(z|x) || p(z) ) + beta * delta`
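The expanded form differs from the beta-VAE objective only by the term `beta * delta`, which does not depend on `q(z|x)`. The sketch below checks this numerically on a toy model that is an assumption of this example, not part of the derivation: scalar `x` and `z`, posterior `q(z|x) = N(mu, 1)`, prior `p(z) = N(0, 1)`, and decoder `p(x|z) = N(x; z, 1)`. Both terms then have closed forms in `mu`, and a grid search over `mu` stands in for the maximization over `q(z|x)`.

```python
import math


def recon_term(mu, x):
    """E_q[ log p(x|z) ] for q = N(mu, 1) and p(x|z) = N(x; z, 1) (toy assumption)."""
    return -0.5 * math.log(2.0 * math.pi) - 0.5 * ((x - mu) ** 2 + 1.0)


def kl_term(mu):
    """KL( N(mu, 1) || N(0, 1) ) in closed form."""
    return 0.5 * mu ** 2


def lagrangian(mu, x, beta, delta):
    """Full Lagrangian, including the constant beta * delta."""
    return recon_term(mu, x) - beta * (kl_term(mu) - delta)


def objective_without_constant(mu, x, beta):
    """Same expression with the beta * delta term dropped."""
    return recon_term(mu, x) - beta * kl_term(mu)


x, beta, delta = 2.0, 1.0, 0.5
grid = [i / 100.0 for i in range(-300, 301)]  # candidate values of mu

# The two expressions differ by the same constant at every grid point ...
max_offset_error = max(
    abs(lagrangian(m, x, beta, delta) - objective_without_constant(m, x, beta) - beta * delta)
    for m in grid
)

# ... so they are maximized by the same mu.
best_full = max(grid, key=lambda m: lagrangian(m, x, beta, delta))
best_dropped = max(grid, key=lambda m: objective_without_constant(m, x, beta))
```

Since the offset is constant, either expression can be optimized for a fixed beta.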

## **Step 3: Applying KKT Conditions**

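At a constrained optimum (under standard regularity conditions), the KKT conditions hold:

1. **Stationarity:** the gradient of `L` with respect to `q(z|x)` (in practice, the encoder parameters) is zero
2. **Primal Feasibility:** `D_KL( q(z|x) || p(z) ) <= delta`
3. **Dual Feasibility:** `beta >= 0`
4. **Complementary Slackness:** `beta * ( D_KL( q(z|x) || p(z) ) - delta ) = 0`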

### **Interpretation**

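- If `D_KL( q(z|x) || p(z) ) = delta` (constraint active): `beta` may be positive, so the KL term stays in the objective.
- If `D_KL( q(z|x) || p(z) ) < delta` (constraint inactive): `beta = 0`, so the KL term drops out and the objective reduces to the reconstruction term alone. This is not the standard VAE, which corresponds to `beta = 1`.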
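Both cases can be seen in a toy problem that has a closed-form solution. The model here is an assumption made for illustration: scalar `x` and `z`, `q(z|x) = N(mu, 1)`, `p(z) = N(0, 1)`, and `p(x|z) = N(x; z, 1)`. Then `D_KL = 0.5 * mu^2`, the reconstruction term is `-0.5 * (x - mu)^2` up to a constant, and setting the derivative of the Lagrangian with respect to `mu` to zero gives `(x - mu) - beta * mu = 0`. The helper names are hypothetical.

```python
import math


def kl_to_prior(mu):
    """KL( N(mu, 1) || N(0, 1) ) in closed form (toy assumption)."""
    return 0.5 * mu ** 2


def solve_constrained(x, delta):
    """Return (mu, beta) for: maximize -0.5 * (x - mu)^2  s.t.  0.5 * mu^2 <= delta."""
    if delta <= 0:
        raise ValueError("delta must be positive in this sketch")
    if kl_to_prior(x) <= delta:
        # Constraint inactive: the unconstrained optimum mu = x is feasible.
        return x, 0.0
    # Constraint active: mu sits on the boundary, with the same sign as x.
    mu = math.copysign(math.sqrt(2.0 * delta), x)
    # Multiplier from the zero-gradient condition (x - mu) - beta * mu = 0.
    beta = x / mu - 1.0
    return mu, beta


def kkt_residuals(x, delta, mu, beta):
    """Quantities that the KKT conditions constrain, for inspection."""
    slack = kl_to_prior(mu) - delta
    return {
        "gradient": (x - mu) - beta * mu,  # should be 0
        "primal": slack,                   # should be <= 0
        "dual": beta,                      # should be >= 0
        "slackness": beta * slack,         # should be 0
    }


# Tight budget: the constraint is active and beta is positive.
mu_active, beta_active = solve_constrained(x=2.0, delta=0.5)
# Loose budget: the constraint is inactive and beta is zero.
mu_inactive, beta_inactive = solve_constrained(x=2.0, delta=3.0)
```

In the active case a smaller `delta` yields a larger `beta`, which is the sense in which beta encodes the strength of the KL constraint.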

## **Conclusion: Final Loss Function**

Because the term `beta * delta` in the expanded Lagrangian does not depend on `q(z|x)`, it can be dropped without changing the maximizer. What remains is the beta-VAE objective:

`L_beta-VAE = E_q(z|x)[ log p(x|z) ] - beta * D_KL( q(z|x) || p(z) )`

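In practice, beta is treated as a hyperparameter rather than solved for from a chosen `delta`. Setting `beta = 1` recovers the standard VAE objective, while larger values penalize the KL divergence more heavily, trading reconstruction accuracy for a latent code closer to the prior, which is intended to promote disentanglement.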
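The trade-off can be illustrated with a small sweep over beta. The sketch assumes a toy model chosen for its closed form, not taken from the derivation: scalar `x` and `z`, `q(z|x) = N(mu, 1)`, `p(z) = N(0, 1)`, and `p(x|z) = N(x; z, 1)`. In that model, maximizing the reconstruction term minus `beta * D_KL` over `mu` gives `mu = x / (1 + beta)`.

```python
def optimal_mu(x, beta):
    """Maximizer of -0.5 * (x - mu)^2 - beta * 0.5 * mu^2 (toy assumption)."""
    return x / (1.0 + beta)


def squared_error(x, mu):
    """The mu-dependent part of the reconstruction term, up to sign and scale."""
    return (x - mu) ** 2


def kl_to_prior(mu):
    """KL( N(mu, 1) || N(0, 1) ) in closed form."""
    return 0.5 * mu ** 2


x = 2.0
betas = [0.0, 0.5, 1.0, 4.0, 16.0]

rows = []
for beta in betas:
    mu = optimal_mu(x, beta)
    rows.append((beta, mu, squared_error(x, mu), kl_to_prior(mu)))

for beta, mu, err, kl in rows:
    print(f"beta={beta:5.1f}  mu={mu:.3f}  sq_error={err:.3f}  KL={kl:.3f}")
```

As beta grows, the optimal `mu` shrinks toward the prior mean, so the KL term falls while the reconstruction error rises.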

