# Random Fourier features

One central difficulty with Gaussian Processes (GPs) -- and more generally all kernel methods, such as Support Vector Machines (SVMs) -- is their computational cost.

```python
import numpy as np
import matplotlib.pyplot as plt

from IPython.display import HTML, set_matplotlib_formats
set_matplotlib_formats('pdf', 'svg')
css_style = open('../../../_static/custom_style.css', 'r').read()
HTML(f'<style>{css_style}</style>')
```

<div class="theorem">
    
**Theorem (Bochner's theorem)** A continuous kernel $k(x, y) = k(x - y)$ is positive definite if and only if $k(\delta)$ is the Fourier transform of a non-negative measure.
    
</div>
<br>

Note that this statement slightly abuses the symbol $k$, using it to denote both the kernel $k(x, y)$ and its explicitly translation-invariant form $k(x - y)$; the intended meaning is clear from context. Bochner's theorem implies that every continuous, positive definite, translation-invariant kernel $k$ is the Fourier transform of some non-negative measure. If $k$ is scaled so that $k(0) = 1$, this measure is a probability measure. Assuming it has a density $p(\omega)$ and writing $\zeta_{\omega}(x) = e^{i \omega^\top x}$, we have

$$\begin{align}
k(x - y) = \int p(\omega) e^{i \omega^\top (x - y)} d\omega = \mathbb{E}_{\omega}\left[\zeta_{\omega}(x)\zeta_{\omega}^*(y)\right].
\end{align}$$

We can therefore obtain an unbiased estimate of $k(x - y)$ by sampling $\omega \sim p(\omega)$ and computing $\zeta_{\omega}(x)\zeta_{\omega}^*(y)$. Note, however, that even though $\mathbb{E}_{\omega}\left[\zeta_{\omega}(x)\zeta_{\omega}^*(y)\right]$ is real, an individual sample $\zeta_{\omega}(x)\zeta_{\omega}^*(y)$ is in general complex. This is a problem if we want to use $\zeta_{\omega}$ to represent real-valued functions. Since the expectation is real, it equals its own real part, so we can instead write

$$\begin{align}
\mathbb{E}_{\omega}\left[\zeta_{\omega}(x)\zeta_{\omega}^*(y)\right] &= \text{Re}\left[\mathbb{E}_{\omega}\left[e^{i \omega^\top (x - y)}\right]\right] \\
                                                                     &= \mathbb{E}_{\omega}\left[\text{Re}\left[e^{i \omega^\top (x - y)}\right]\right] \\
                                                                     &= \mathbb{E}_{\omega}\left[\cos(\omega^\top (x - y))\right].
\end{align}$$
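As a quick numerical illustration of the identity above, the following sketch compares the complex and the cosine Monte Carlo estimates. It assumes an exponentiated-quadratic (RBF) kernel with unit lengthscale, a choice not fixed by the text, whose spectral measure is a standard Gaussian. The inputs `x`, `y` and the sample count `num_samples` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumption: RBF kernel k(x - y) = exp(-||x - y||^2 / 2), whose spectral
# measure p(omega) is a standard Gaussian N(0, I).
x = np.array([0.3, -0.5])
y = np.array([-0.2, 0.4])
exact = np.exp(-0.5 * np.sum((x - y) ** 2))

num_samples = 200_000
omega = rng.standard_normal((num_samples, x.shape[0]))
phase = omega @ (x - y)  # omega^T (x - y) for every sample

# Averaging the complex exponential leaves a small imaginary residue.
complex_estimate = np.mean(np.exp(1j * phase))
# Averaging the cosine is real by construction.
cosine_estimate = np.mean(np.cos(phase))

print(f"exact kernel value: {exact:.4f}")
print(f"complex estimate:   {complex_estimate:.4f}")
print(f"cosine estimate:    {cosine_estimate:.4f}")
```

The real part of the complex estimate coincides with the cosine estimate, and the imaginary part only vanishes in expectation.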

We would like to manipulate the expression above into an expectation of the form $\mathbb{E}_{\omega}\left[f(x)f(y)\right]$ rather than $\mathbb{E}_{\omega}\left[f(x - y)\right]$, which we can achieve via the following trick. Using the fact that

$$\begin{align}
\mathbb{E}_{\phi}\left[\cos(z + n\phi)\right] = 0,
\end{align}$$

for all $z \in \mathbb{R}, n \in \mathbb{N}^+$, where $\phi \sim \text{Uniform}[0, 2\pi]$, we can re-write the expectation as

$$\begin{align}
\mathbb{E}_{\omega}\left[\zeta_{\omega}(x)\zeta_{\omega}^*(y)\right] &= \mathbb{E}_{\omega, \phi}\left[\cos\left(\omega^\top (x - y)\right) + \cos\left(\omega^\top (x + y) + 2\phi\right)\right] \\
                                                                     &= \mathbb{E}_{\omega, \phi}\left[2 \cos\left(\omega^\top x + \phi\right) \cos\left(\omega^\top y + \phi\right)\right].
\end{align}$$
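The two steps of this trick can be checked numerically. In the sketch below, `u` and `v` are arbitrary stand-ins for $\omega^\top x$ and $\omega^\top y$ (hypothetical values chosen for illustration). The product-to-sum identity holds for every individual phase, while the phase-dependent term vanishes only on average.

```python
import numpy as np

rng = np.random.default_rng(1)

# Arbitrary stand-ins for omega^T x and omega^T y (illustrative values).
u, v = 0.7, -1.3
phi = rng.uniform(0.0, 2.0 * np.pi, size=500_000)

# Product-to-sum identity: holds exactly for each sampled phase.
sum_form = np.cos(u - v) + np.cos(u + v + 2.0 * phi)
product_form = 2.0 * np.cos(u + phi) * np.cos(v + phi)
identity_gap = np.max(np.abs(sum_form - product_form))

# The phase-dependent term averages to approximately zero,
# so the product averages to cos(u - v).
phase_term_mean = np.mean(np.cos(u + v + 2.0 * phi))
product_mean = np.mean(product_form)

print("max identity gap:          ", identity_gap)
print("mean of cos(u + v + 2 phi):", phase_term_mean)
print("mean of product:           ", product_mean)
print("cos(u - v):                ", np.cos(u - v))
```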

We can therefore obtain an unbiased, real-valued estimate of $k(x - y)$ by sampling $\omega \sim p(\omega), \phi \sim \text{Uniform}[0, 2\pi]$ and computing

$$\begin{align}
\mathbb{E}_{\omega}\left[\zeta_{\omega}(x)\zeta_{\omega}^*(y)\right] \approx z_{\omega, \phi}(x) z_{\omega, \phi}(y), \text{ where } z_{\omega, \phi}(x) = \sqrt{2} \cos(\omega^\top x + \phi).
\end{align}$$

In fact, we can go a bit further by drawing $N$ independent pairs $(\omega_n, \phi_n)$ and computing the estimate

$$\begin{align}
\mathbb{E}_{\omega}\left[\zeta_{\omega}(x)\zeta_{\omega}^*(y)\right] \approx \frac{1}{N} \sum_{n = 1}^N z_{\omega_n, \phi_n}(x) z_{\omega_n, \phi_n}(y).
\end{align}$$

This is also an unbiased estimate of $k$, but its variance is lower than in the $N = 1$ case: the variance of an average of $N$ i.i.d. random variables is $1/N$ times the variance of a single one. We therefore arrive at the following algorithm for estimating $k$.

<div class="definition">
    
**Algorithm (Random Fourier Features)** Given a translation-invariant kernel $k$ that is the Fourier transform of a probability measure $p$, we have the unbiased real-valued estimator
    
$$\begin{align}
k(x - y) \approx \frac{1}{N} \sum_{n = 1}^N z_{\omega_n, \phi_n}(x) z_{\omega_n, \phi_n}(y) = z^\top(x)z(y),
\end{align}$$
    
where we have used the notation $z(x) = \frac{1}{\sqrt{N}} \left[ z_{\omega_1, \phi_1}(x), \dots, z_{\omega_N, \phi_N}(x) \right]^\top$, and $\omega_n \sim p(\omega)$, $\phi_n \sim \text{Uniform}[0, 2\pi]$ are independent and identically distributed samples.
    
</div>
<br>
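The algorithm can be sketched in a few lines of NumPy. The example assumes an RBF kernel with lengthscale $\ell$, for which $p(\omega)$ is a Gaussian $\mathcal{N}(0, \ell^{-2} I)$. This kernel choice, and the helper names `sample_rff_params`, `rff_features` and `rbf_kernel`, are illustrative assumptions rather than part of the algorithm statement. The $1/N$ factor is folded into the features, so that the Gram matrix is approximated by a single matrix product.

```python
import numpy as np


def sample_rff_params(num_features, input_dim, lengthscale, rng):
    """Sample frequencies and phases, assuming an RBF kernel."""
    # Spectral measure of the RBF kernel: omega ~ N(0, I / lengthscale^2).
    omega = rng.standard_normal((num_features, input_dim)) / lengthscale
    phi = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    return omega, phi


def rff_features(x, omega, phi):
    """Map inputs of shape (M, D) to random features of shape (M, N)."""
    num_features = omega.shape[0]
    # sqrt(2) cos(omega^T x + phi), scaled by 1 / sqrt(N).
    return np.sqrt(2.0 / num_features) * np.cos(x @ omega.T + phi)


def rbf_kernel(x, y, lengthscale):
    """Exact RBF Gram matrix, used only as a reference."""
    sq_dists = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * sq_dists / lengthscale ** 2)


rng = np.random.default_rng(0)
x = rng.uniform(-2.0, 2.0, size=(50, 3))
lengthscale = 1.5

omega, phi = sample_rff_params(
    num_features=5000, input_dim=3, lengthscale=lengthscale, rng=rng
)
z = rff_features(x, omega, phi)

k_exact = rbf_kernel(x, x, lengthscale)
k_approx = z @ z.T  # approximates the 50 x 50 Gram matrix
max_abs_error = np.max(np.abs(k_approx - k_exact))
print(f"max abs error with N = 5000 features: {max_abs_error:.3f}")
```

Working with the explicit features `z` rather than the Gram matrix is what makes the approach attractive: downstream linear algebra can be carried out in the $N$-dimensional feature space.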

Now there remains the question of how large the error of this estimator is. Since $-\sqrt{2} \leq z_{\omega, \phi} \leq \sqrt{2}$, each term $z_{\omega_n, \phi_n}(x) z_{\omega_n, \phi_n}(y)$ in the average lies in $[-2, 2]$, an interval of width $4$. We can therefore use Hoeffding's inequality{cite}`grimmett2020probability` to obtain the following high-probability bound on the absolute error.

<div class="lemma">
    
**Result (Hoeffding for RFF)** The RFF estimator of $k$, using $N$ pairs $(\omega_n, \phi_n)$, obeys
    
$$\begin{align}
\mathbb{P}\big(|z^\top(x)z(y) - k(x, y)| \geq \epsilon \big) \leq 2 \exp\left(-N \frac{\epsilon^2}{8}\right).
\end{align}$$
    
</div>
<br>

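Therefore, for any fixed pair of inputs, the probability that the RFF estimate deviates from $k(x, y)$ by at least $\epsilon$ decays exponentially with $N$. Note that this is a pointwise guarantee: it holds for a given pair $(x, y)$, not uniformly over all inputs.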
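A rough empirical check of this behaviour is sketched below, again assuming an RBF kernel with unit lengthscale and a single illustrative input pair. For each $N$, the estimator is recomputed over many independent trials and the observed failure rate is compared with the Hoeffding bound for an average of $N$ terms confined to $[-2, 2]$, namely $2\exp(-N\epsilon^2/8)$. The values of `epsilon` and `num_trials` are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumption: RBF kernel with unit lengthscale, so omega ~ N(0, I).
x = np.array([0.3, -0.5])
y = np.array([-0.2, 0.4])
exact = np.exp(-0.5 * np.sum((x - y) ** 2))

epsilon = 0.2
num_trials = 2000
failure_rates, bounds = [], []

for num_features in [10, 100, 1000]:
    # One independent set of (omega_n, phi_n) pairs per trial.
    omega = rng.standard_normal((num_trials, num_features, 2))
    phi = rng.uniform(0.0, 2.0 * np.pi, size=(num_trials, num_features))

    zx = np.sqrt(2.0) * np.cos(omega @ x + phi)
    zy = np.sqrt(2.0) * np.cos(omega @ y + phi)
    estimates = np.mean(zx * zy, axis=1)  # one RFF estimate per trial

    failure_rate = np.mean(np.abs(estimates - exact) >= epsilon)
    # Hoeffding bound for an average of terms lying in [-2, 2].
    bound = 2.0 * np.exp(-num_features * epsilon ** 2 / 8.0)

    failure_rates.append(failure_rate)
    bounds.append(bound)
    print(f"N = {num_features:4d}: empirical {failure_rate:.4f}, bound {bound:.4f}")
```

For small $N$ the bound exceeds one and is therefore vacuous; it only becomes informative once $N \epsilon^2$ is large.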

## References

```{bibliography} ./references.bib
```