```{contents}
```

## Energy-Based Models (EBMs)

An **Energy-Based Model (EBM)** is a **probabilistic deep learning framework** that learns to **assign low energy (high probability)** to desirable or realistic configurations (e.g., real images, valid sentences)
and **high energy (low probability)** to undesirable or unlikely configurations.

---

### 🔹 Intuitive Analogy

Think of an EBM as a **landscape of hills and valleys**:

* Each point represents a data configuration (an image, sentence, etc.).
* The **energy function** assigns a height to each configuration.
* **Realistic data → valleys (low energy)**
* **Unrealistic data → hills (high energy)**

The model learns to **shape this energy landscape** so real samples lie in valleys, and fake ones lie on peaks.

---

### Mathematical Foundation

Let $x$ be a data point (e.g., image) and $E_\theta(x)$ be an **energy function** parameterized by neural network parameters $\theta$.

The model defines a probability distribution as:

$$
p_\theta(x) = \frac{\exp(-E_\theta(x))}{Z_\theta}
$$

where $Z_\theta = \int \exp(-E_\theta(x)) dx$ is the **partition function**
that normalizes the probabilities (like in physics or Boltzmann statistics).
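
As a rough illustration of the normalization above, the sketch below evaluates a hypothetical one-dimensional double-well energy on a grid and estimates $Z_\theta$ with a Riemann sum. The energy function, grid bounds, and helper names are assumptions made for this example only. A grid like this is feasible only in very low dimensions, which hints at why $Z_\theta$ becomes a problem for images.

```python
import math

def energy(x):
    # Hypothetical double-well energy (illustration only): valleys at x = -1 and x = +1.
    return (x * x - 1.0) ** 2

def grid_density(energy_fn, lo=-3.0, hi=3.0, n=601):
    """Approximate p(x) = exp(-E(x)) / Z on an evenly spaced 1-D grid."""
    dx = (hi - lo) / (n - 1)
    xs = [lo + i * dx for i in range(n)]
    unnorm = [math.exp(-energy_fn(x)) for x in xs]
    z = sum(unnorm) * dx  # Riemann-sum estimate of the partition function
    return xs, [u / z for u in unnorm], z

xs, probs, z = grid_density(energy)
dx = xs[1] - xs[0]
print(f"Z ~ {z:.4f}, total probability ~ {sum(probs) * dx:.4f}")
print(f"p(valley at x=1) = {probs[400]:.4f}, p(hill at x=0) = {probs[300]:.4f}")
```

The valley at $x = 1$ receives a higher density than the hill at $x = 0$, matching the low-energy, high-probability reading of the formula.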

---

### 🔹 Interpretation

* **Low energy ⇒ high probability**
* **High energy ⇒ low probability**

So, learning an EBM means **minimizing energy for real data** and **increasing energy for fake data**.

---

### Training Objective

Training tries to minimize the **negative log-likelihood (NLL)** of the data:

$$
\mathcal{L}(\theta) = -\mathbb{E}_{x \sim p_{\text{data}}} [\log p_\theta(x)]
$$

Expanding:

$$
\mathcal{L}(\theta) = \mathbb{E}_{x \sim p_{\text{data}}} [E_\theta(x)] + \log Z_\theta
$$
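
The intermediate step, written out from the definition of $p_\theta(x)$, is just the logarithm of the density:

$$
\log p_\theta(x) = -E_\theta(x) - \log Z_\theta
$$

Negating this and averaging over the data gives the expanded loss. The term $\log Z_\theta$ does not depend on $x$, so it passes through the expectation unchanged.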

Computing $Z_\theta$ exactly is intractable for high-dimensional data,
so **approximation methods** are used.

---

### Gradient of the Loss

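The loss is the negative log-likelihood, so its gradient is the negative of the log-likelihood gradient. For a single data point $x$, the gradient of the log-likelihood w.r.t. the parameters is: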

$$
\nabla_\theta \log p_\theta(x) = -\nabla_\theta E_\theta(x) + \mathbb{E}_{x' \sim p_\theta} [\nabla_\theta E_\theta(x')]
$$

This has **two competing terms**:

| **Term**                                                       | **Effect**                                                           |
| -------------------------------------------------------------- | -------------------------------------------------------------------- |
| $-\nabla_\theta E_\theta(x)$                                 | Lowers energy for real data (pulls real data to valleys)             |
| $\mathbb{E}_{x' \sim p_\theta} [\nabla_\theta E_\theta(x')]$ | Raises energy for fake samples (pushes unrealistic samples to hills) |

Thus, the model **learns to separate real and fake data** by shaping the energy landscape.
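
One way to sanity-check the two-term gradient is to compare it against finite differences on a toy model. The sketch below assumes a hypothetical one-parameter energy $E_\theta(x) = \theta x^2$ and replaces the integral over $x$ with a sum on a fixed grid. The names `XS`, `energy`, and `grad_log_prob` are illustrative and not part of any library.

```python
import math

DX = 0.01
XS = [-5.0 + DX * i for i in range(1001)]  # integration grid on [-5, 5]

def energy(theta, x):
    # Hypothetical one-parameter energy used only for this check.
    return theta * x * x

def d_energy_d_theta(theta, x):
    # Derivative of theta * x^2 with respect to theta.
    return x * x

def log_prob(theta, x):
    """log p_theta(x) = -E_theta(x) - log Z_theta, with Z estimated on the grid."""
    z = sum(math.exp(-energy(theta, g)) for g in XS) * DX
    return -energy(theta, x) - math.log(z)

def grad_log_prob(theta, x):
    """Data term -dE/dtheta(x) plus the model expectation of dE/dtheta."""
    weights = [math.exp(-energy(theta, g)) for g in XS]
    total = sum(weights)
    model_expectation = sum(
        w * d_energy_d_theta(theta, g) for w, g in zip(weights, XS)
    ) / total
    return -d_energy_d_theta(theta, x) + model_expectation

theta, x, eps = 0.7, 1.3, 1e-5
analytic = grad_log_prob(theta, x)
numeric = (log_prob(theta + eps, x) - log_prob(theta - eps, x)) / (2 * eps)
print(f"analytic={analytic:.6f} numeric={numeric:.6f}")
```

The two numbers should agree closely. In a real EBM the model expectation cannot be summed on a grid and has to be estimated from samples, which is where MCMC enters.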

---

### Sampling from the Model

To generate data from an EBM, we must **sample** from $p_\theta(x)$, which is challenging.

A common method is **Markov Chain Monte Carlo (MCMC)**, especially **Langevin Dynamics**:

$$
x_{t+1} = x_t - \frac{\alpha}{2} \nabla_x E_\theta(x_t) + \sqrt{\alpha}\, \eta_t
$$

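where $\eta_t \sim \mathcal{N}(0, I)$ is Gaussian noise and $\alpha$ is the step size.

This is like **noisy gradient descent** over the energy surface: samples drift downhill into energy valleys, while the injected noise keeps them exploring instead of collapsing to a single minimum.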
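
A minimal sketch of this update, assuming the illustrative energy $E(x) = x^2 / 2$ (so the target density is a standard normal). The function names, step size, and chain length are choices made for this example, with `step_size` playing the role of $\alpha$.

```python
import math
import random

def grad_energy(x):
    # Gradient of the illustrative energy E(x) = x^2 / 2.
    return x

def langevin_sample(grad_fn, x0, step_size=0.1, n_steps=200, rng=random):
    """Run one Langevin chain and return its final state."""
    x = x0
    for _ in range(n_steps):
        noise = rng.gauss(0.0, 1.0)
        # x_{t+1} = x_t - (alpha / 2) * grad E(x_t) + sqrt(alpha) * eta_t
        x = x - 0.5 * step_size * grad_fn(x) + math.sqrt(step_size) * noise
    return x

rng = random.Random(0)
# Start every chain far from the valley (x0 = 5) to show that it still finds it.
samples = [langevin_sample(grad_energy, x0=5.0, rng=rng) for _ in range(1000)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(f"mean={mean:.3f} var={var:.3f}")
```

The sample mean and variance should land near 0 and 1. A finite step size introduces a small bias, which is why practical samplers either use small steps or add a Metropolis correction.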

---

### Training Algorithms

| **Method**                                              | **Description**                                                                   |
| ------------------------------------------------------- | --------------------------------------------------------------------------------- |
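| **Contrastive Divergence (CD)**                         | Approximate the model expectation with a few MCMC (e.g., Gibbs) steps started from data samples (the classic method for RBMs). |
| **Persistent Contrastive Divergence (PCD)**             | Keep the MCMC chains alive across parameter updates instead of restarting them from data, which stabilizes training. |
| **Score Matching**                                      | Avoid the partition function by matching $\nabla_x \log p_\theta(x) = -\nabla_x E_\theta(x)$, which does not depend on $Z_\theta$. |
| **Noise-Contrastive Estimation (NCE)**                  | Avoid the partition function by training the model to distinguish real data from samples of a known noise distribution. |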
| **Langevin Dynamics (Modern EBMs)**                     | Iteratively refine samples using gradients of the energy function.                |

---

### Relationship with Other Models

| **Model**                              | **Connection to EBM**                                                                  |
| -------------------------------------- | -------------------------------------------------------------------------------------- |
| **Boltzmann Machine**                  | Original EBM with binary neurons and stochastic sampling.                              |
| **Restricted Boltzmann Machine (RBM)** | Simplified EBM (visible and hidden units with no intra-layer connections).             |
| **GANs**                               | Generator creates low-energy samples, discriminator implicitly learns energy function. |
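| **VAEs**                               | Define a normalized latent-variable density trained via a variational bound; EBMs specify the density only up to $Z_\theta$. |
| **Diffusion Models**                   | Learn the score $\nabla_x \log p(x)$ of noise-perturbed data, which is the negative gradient of an energy at each noise level. |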

---

### Example: Intuition with Image Modeling

* Suppose $E_\theta(x)$ is a CNN that outputs a scalar energy for image $x$.
* During training:

  1. Real images → lower energy (valleys)
  2. Random/noisy images → higher energy (hills)
* At inference:

  * Start from random noise and use **Langevin dynamics** to descend into a low-energy region, which yields a realistic-looking image.

---

### Applications of EBMs

| **Domain**                   | **Use Case**                                              |
| ---------------------------- | --------------------------------------------------------- |
| **Image Generation**         | Generate realistic samples (e.g., CIFAR-10, CelebA).      |
| **Anomaly Detection**        | Abnormal data has high energy (low likelihood).           |
| **Reinforcement Learning**   | Model energy as potential landscape for optimal policies. |
| **Representation Learning**  | Learn useful feature embeddings without supervision.      |
| **Denoising and Inpainting** | Model "natural" energy surfaces of clean data.            |
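
The anomaly-detection row can be sketched with a very small stand-in model: fit an energy to normal data, then flag inputs whose energy exceeds a high quantile of the training energies. The Gaussian-shaped energy, the 99% quantile, and the helper names below are assumptions for illustration. A deep EBM would replace `fit_gaussian_energy` with a trained network.

```python
import random
import statistics

def fit_gaussian_energy(train):
    """Return E(x) = ((x - mu) / sigma)^2 / 2 fitted to the training data."""
    mu = statistics.fmean(train)
    sigma = statistics.pstdev(train)
    return lambda x: 0.5 * ((x - mu) / sigma) ** 2

def energy_threshold(energy_fn, train, quantile=0.99):
    """Energy value below which the given fraction of training points fall."""
    energies = sorted(energy_fn(x) for x in train)
    return energies[int(quantile * (len(energies) - 1))]

rng = random.Random(0)
train = [rng.gauss(10.0, 2.0) for _ in range(2000)]  # synthetic "normal" data
energy_fn = fit_gaussian_energy(train)
threshold = energy_threshold(energy_fn, train)

def is_anomaly(x):
    # High energy means low likelihood under the model.
    return energy_fn(x) > threshold

print(is_anomaly(10.5), is_anomaly(25.0))
```

Only the ranking of energies matters here, so the intractable partition function never has to be computed.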

---

###  Advantages and Limitations

| **Aspect**                | **Advantage**                       | **Limitation**                   |
| ------------------------- | ----------------------------------- | -------------------------------- |
| **Likelihood estimation** | Exact formula (up to normalization) | Partition function intractable   |
| **Training stability**    | No adversarial training needed      | MCMC sampling is slow            |
| **Flexibility**           | Works for any data type             | Hard to scale to high-res data   |
| **Interpretability**      | Energy landscape is meaningful      | Gradient estimation can be noisy |

---

### Modern Implementations

Modern deep EBMs use **neural networks** (usually CNNs or ResNets) to parameterize $E_\theta(x)$.

Examples:

* **Deep Energy Models (LeCun et al.)**
* **Score-Based Models** (e.g., NCSN, diffusion-based reformulations)
* **EBMs trained with Langevin sampling** (Du & Mordatch, 2020)

---

**Summary**

| **Concept**            | **Description**                                                |
| ---------------------- | -------------------------------------------------------------- |
| **Goal**               | Learn energy surface over data space                           |
| **Energy Function**    | $E_\theta(x)$: neural network assigns "energy" to each input   |
| **Training Objective** | Push down energy of real data, raise for fake data             |
| **Sampling**           | MCMC or Langevin dynamics                                      |
| **Key Equation**       | $p_\theta(x) = \frac{e^{-E_\theta(x)}}{Z_\theta}$            |
| **Applications**       | Generation, anomaly detection, unsupervised learning           |

---

**In short:**

An **Energy-Based Model** learns a function $E_\theta(x)$ that represents how "compatible" or "realistic" a configuration is.
It bridges the gap between **explicit probabilistic modeling (like VAEs)** and **implicit learning (like GANs)** by shaping an energy landscape that defines the data distribution.