# Bregman Divergence

## Formal Definition (Source: Bregman, L.M., 1967, "The relaxation method...")

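Let $\phi: S \to \mathbb{R}$ be a **strictly convex** function on a convex set $S \subseteq \mathbb{R}^d$ that is **differentiable** on the relative interior $\operatorname{ri}(S)$.  
The Bregman divergence $D_\phi$ between points $\mathbf{p} \in S$ and $\mathbf{q} \in \operatorname{ri}(S)$ is: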

$$
D_\phi(\mathbf{p}, \mathbf{q}) = \phi(\mathbf{p}) - \phi(\mathbf{q}) - \langle \nabla \phi(\mathbf{q}), \mathbf{p} - \mathbf{q} \rangle
$$

where:
- $\nabla \phi(\mathbf{q})$ is the gradient of $\phi$ at $\mathbf{q}$
- $\langle \cdot, \cdot \rangle$ denotes the inner product
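
The definition translates almost directly into code. The following is a minimal sketch, assuming points are plain Python sequences of floats and that the caller supplies both the generator and its gradient; the names `bregman_divergence`, `sq_norm`, and `sq_norm_grad` are illustrative and do not come from the source.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def bregman_divergence(
    phi: Callable[[Vector], float],
    grad_phi: Callable[[Vector], List[float]],
    p: Vector,
    q: Vector,
) -> float:
    """Return phi(p) - phi(q) - <grad phi(q), p - q>."""
    g = grad_phi(q)
    inner = sum(g_i * (p_i - q_i) for g_i, p_i, q_i in zip(g, p, q))
    return phi(p) - phi(q) - inner


# Illustrative generator (an assumption for this sketch): phi(x) = ||x||^2.
def sq_norm(x: Vector) -> float:
    return sum(x_i * x_i for x_i in x)


def sq_norm_grad(x: Vector) -> List[float]:
    return [2.0 * x_i for x_i in x]


d = bregman_divergence(sq_norm, sq_norm_grad, [1.0, 2.0], [3.0, 1.0])
print(d)  # 5.0, which equals ||p - q||^2 = (-2)^2 + 1^2
```

Passing the gradient explicitly keeps the sketch free of dependencies; an automatic-differentiation library could supply it instead.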

---

## Key Properties
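1. **Non-negative**: $D_\phi(\mathbf{p}, \mathbf{q}) \geq 0$, with equality iff $\mathbf{p} = \mathbf{q}$ (a consequence of the strict convexity of $\phi$)
2. **Not symmetric**: In general $D_\phi(\mathbf{p}, \mathbf{q}) \neq D_\phi(\mathbf{q}, \mathbf{p})$, so $D_\phi$ is not a metric
3. **Convex** in the first argument $\mathbf{p}$ (for fixed $\mathbf{q}$), but not necessarily in the second argument $\mathbf{q}$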
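
These properties can be spot-checked numerically. The sketch below assumes the negative-entropy generator $\phi(\mathbf{x}) = \sum_i x_i \log x_i$ as an illustrative choice, with arbitrarily chosen test points; the helper names are hypothetical, and a check on three points illustrates the properties rather than proving them.

```python
import math


def neg_entropy(x):
    # phi(x) = sum_i x_i log x_i, defined for x_i > 0
    return sum(x_i * math.log(x_i) for x_i in x)


def neg_entropy_grad(x):
    # d phi / d x_i = log x_i + 1
    return [math.log(x_i) + 1.0 for x_i in x]


def bregman(phi, grad_phi, p, q):
    g = grad_phi(q)
    inner = sum(g_i * (p_i - q_i) for g_i, p_i, q_i in zip(g, p, q))
    return phi(p) - phi(q) - inner


# Arbitrary test points (assumptions for this sketch)
p = [0.2, 0.8]
q = [0.5, 0.5]
r = [0.7, 0.3]

d_pq = bregman(neg_entropy, neg_entropy_grad, p, q)
d_qp = bregman(neg_entropy, neg_entropy_grad, q, p)
d_pp = bregman(neg_entropy, neg_entropy_grad, p, p)

# 1. Non-negativity, with zero when both arguments coincide
non_negative = d_pq >= 0.0 and d_qp >= 0.0 and d_pp == 0.0

# 2. Asymmetry: swapping the arguments changes the value
asymmetric = abs(d_pq - d_qp) > 1e-6

# 3. Midpoint convexity in the first argument, with q held fixed
mid = [(a + b) / 2.0 for a, b in zip(p, r)]
lhs = bregman(neg_entropy, neg_entropy_grad, mid, q)
rhs = 0.5 * (d_pq + bregman(neg_entropy, neg_entropy_grad, r, q))
convex_in_first = lhs <= rhs

print(non_negative, asymmetric, convex_in_first)
```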

---

## Common Examples

### 1. Squared Euclidean Distance
When $\phi(\mathbf{x}) = \|\mathbf{x}\|^2$:
$$
D_\phi(\mathbf{p}, \mathbf{q}) = \|\mathbf{p} - \mathbf{q}\|^2
$$
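
The intermediate steps follow from the definition, using $\nabla \phi(\mathbf{q}) = 2\mathbf{q}$:

$$
\begin{aligned}
D_\phi(\mathbf{p}, \mathbf{q}) &= \|\mathbf{p}\|^2 - \|\mathbf{q}\|^2 - \langle 2\mathbf{q}, \mathbf{p} - \mathbf{q} \rangle \\
&= \|\mathbf{p}\|^2 - 2\langle \mathbf{p}, \mathbf{q} \rangle + \|\mathbf{q}\|^2 \\
&= \|\mathbf{p} - \mathbf{q}\|^2
\end{aligned}
$$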

### 2. Kullback-Leibler Divergence
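With the negative entropy $\phi(\mathbf{p}) = \sum_i p_i \log p_i$ (defined for $p_i > 0$), the Bregman divergence is the generalized KL divergence, which equals the standard KL divergence when $\mathbf{p}$ and $\mathbf{q}$ are probability distributions: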
$$
D_\phi(\mathbf{p}, \mathbf{q}) = \sum_i \left( p_i \log \frac{p_i}{q_i} - p_i + q_i \right)
$$
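
As a sketch of where this expression comes from, substitute $\partial \phi / \partial q_i = \log q_i + 1$ into the definition:

$$
\begin{aligned}
D_\phi(\mathbf{p}, \mathbf{q}) &= \sum_i p_i \log p_i - \sum_i q_i \log q_i - \sum_i (\log q_i + 1)(p_i - q_i) \\
&= \sum_i p_i \log p_i - \sum_i p_i \log q_i - \sum_i p_i + \sum_i q_i \\
&= \sum_i p_i \log \frac{p_i}{q_i} - \sum_i p_i + \sum_i q_i
\end{aligned}
$$

If $\sum_i p_i = \sum_i q_i = 1$, the last two sums cancel and only $\sum_i p_i \log \frac{p_i}{q_i}$ remains.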

---

## Calculation Example: Mahalanobis Distance

**Given**:
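- $\phi(\mathbf{x}) = \frac{1}{2}\mathbf{x}^T \mathbf{A} \mathbf{x}$ (with $\mathbf{A}$ symmetric positive definite, so that $\nabla \phi(\mathbf{x}) = \mathbf{A}\mathbf{x}$ and $D_\phi(\mathbf{p}, \mathbf{q}) = \frac{1}{2}(\mathbf{p}-\mathbf{q})^T \mathbf{A} (\mathbf{p}-\mathbf{q})$, half the squared Mahalanobis distance)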
- $\mathbf{p} = [1, 2]^T$, $\mathbf{q} = [3, 1]^T$
- $\mathbf{A} = \begin{bmatrix}2 & 0\\0 & 1\end{bmatrix}$

**Step 1**: Compute gradient  
$\nabla \phi(\mathbf{q}) = \mathbf{A}\mathbf{q} = \begin{bmatrix}6\\1\end{bmatrix}$

**Step 2**: Evaluate $\phi$ terms  
$\phi(\mathbf{p}) = \frac{1}{2}[1, 2]\begin{bmatrix}2 & 0\\0 & 1\end{bmatrix}\begin{bmatrix}1\\2\end{bmatrix} = 3$  
$\phi(\mathbf{q}) = \frac{1}{2}[3, 1]\begin{bmatrix}2 & 0\\0 & 1\end{bmatrix}\begin{bmatrix}3\\1\end{bmatrix} = 9.5$

**Step 3**: Compute inner product  
$\langle \nabla \phi(\mathbf{q}), \mathbf{p}-\mathbf{q} \rangle = [6,1] \cdot [-2,1] = (6)(-2) + (1)(1) = -11$

**Final Result**:
$$
D_\phi(\mathbf{p}, \mathbf{q}) = 3 - 9.5 - (-11) = 4.5
$$
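
The three steps can be reproduced in a few lines of dependency-free Python. This is a minimal sketch; the helpers `mat_vec`, `dot`, and `phi` are names introduced here for illustration. It also compares the result with the closed form $\frac{1}{2}(\mathbf{p}-\mathbf{q})^T \mathbf{A} (\mathbf{p}-\mathbf{q})$, which holds for this quadratic generator when $\mathbf{A}$ is symmetric.

```python
def mat_vec(A, x):
    # Matrix-vector product for a matrix stored as a list of rows
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def dot(x, y):
    return sum(x_i * y_i for x_i, y_i in zip(x, y))


def phi(A, x):
    # phi(x) = 1/2 * x^T A x
    return 0.5 * dot(x, mat_vec(A, x))


A = [[2.0, 0.0], [0.0, 1.0]]
p = [1.0, 2.0]
q = [3.0, 1.0]

# Step 1: gradient at q (A q, valid because A is symmetric)
grad_q = mat_vec(A, q)  # [6.0, 1.0]

# Step 2: phi terms
phi_p = phi(A, p)  # 3.0
phi_q = phi(A, q)  # 9.5

# Step 3: inner product <grad phi(q), p - q>
diff = [p_i - q_i for p_i, q_i in zip(p, q)]  # [-2.0, 1.0]
inner = dot(grad_q, diff)  # -11.0

d = phi_p - phi_q - inner
closed_form = 0.5 * dot(diff, mat_vec(A, diff))
print(d, closed_form)  # 4.5 4.5
```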

---

## References
1. Bregman, L.M. (1967). "The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming". *USSR Computational Mathematics and Mathematical Physics*, 7(3), 200-217.
2. Banerjee, A., Merugu, S., Dhillon, I.S., & Ghosh, J. (2005). "Clustering with Bregman Divergences". *Journal of Machine Learning Research*, 6, 1705-1749.