## Gaussian Process

<div class="alert alert-block alert-info">

For the partitioned Gaussian vector
$$
\begin{bmatrix} \mathbf{x}_1 \\ \mathbf{x}_2 \end{bmatrix} \sim \mathcal{N}\left( \begin{bmatrix} \boldsymbol{\mu}_1 \\ \boldsymbol{\mu}_2 \end{bmatrix}, \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix} \right),
$$

the conditional distribution of $\mathbf{x}_2$ given $\mathbf{x}_1$ is Gaussian with mean and covariance
$$
\boldsymbol{\mu}_{2|1} = \boldsymbol{\mu}_2 + \Sigma_{21} \Sigma_{11}^{-1} (\mathbf{x}_1 - \boldsymbol{\mu}_1),
$$
$$
\Sigma_{2|1} = \Sigma_{22} - \Sigma_{21} \Sigma_{11}^{-1} \Sigma_{12}.
$$


Let:
$$
\mathbf{x}_1 = \mathbf{f}, \quad \mathbf{x}_2 = \mathbf{f}_*, \quad \boldsymbol{\mu}_1 = \mathbf{0}, \quad \boldsymbol{\mu}_2 = \mathbf{0},
$$
$$
\Sigma_{11} = K(X, X), \quad \Sigma_{12} = K(X, X_*), \quad \Sigma_{21} = K(X_*, X), \quad \Sigma_{22} = K(X_*, X_*).
$$

Then:
$$
p\left( \begin{bmatrix} \mathbf{f} \\ \mathbf{f}_* \end{bmatrix} \middle| X, X_* \right) = \mathcal{N}\left( \begin{bmatrix} \mathbf{0} \\ \mathbf{0} \end{bmatrix}, \begin{bmatrix} K(X, X) & K(X, X_*) \\ K(X_*, X) & K(X_*, X_*) \end{bmatrix} \right),
$$

---

The posterior $\textbf{mean}$ is:
$$
\begin{aligned}
\boldsymbol{\mu}_{\mathbf{f}_* | \mathbf{f}} &= \boldsymbol{\mu}_2 + \Sigma_{21} \Sigma_{11}^{-1} (\mathbf{x}_1 - \boldsymbol{\mu}_1)\\ &= K(X_*, X) K(X, X)^{-1} \mathbf{f}.
\end{aligned}
$$

The posterior $\textbf{covariance}$ is:
$$
\begin{aligned}
\Sigma_{\mathbf{f}_* | \mathbf{f}} &= \Sigma_{22} - \Sigma_{21} \Sigma_{11}^{-1} \Sigma_{12}\\ &= K(X_*, X_*) - K(X_*, X) K(X, X)^{-1} K(X, X_*).
\end{aligned}
$$
  
</div>

---
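
The conditioning formulas above can be sketched numerically. This is a minimal illustration, assuming NumPy, a zero prior mean, noise-free observations, and an RBF kernel; the helper names `rbf` and `gp_posterior` and the small `jitter` term are assumptions added here for numerical stability, not part of the derivation.

```python
import numpy as np

def rbf(A, B, sigma_f=1.0, ell=1.0):
    """RBF kernel matrix between 1D input arrays A and B."""
    d = A[:, None] - B[None, :]
    return sigma_f**2 * np.exp(-0.5 * d**2 / ell**2)

def gp_posterior(X, f, X_star, jitter=1e-10):
    """Posterior mean and covariance of f_* given f for a zero-mean GP."""
    K = rbf(X, X) + jitter * np.eye(len(X))   # K(X, X), jitter is an assumption
    K_s = rbf(X_star, X)                      # K(X_*, X)
    K_ss = rbf(X_star, X_star)                # K(X_*, X_*)
    mean = K_s @ np.linalg.solve(K, f)        # K(X_*, X) K(X, X)^{-1} f
    cov = K_ss - K_s @ np.linalg.solve(K, K_s.T)
    return mean, cov

X = np.array([-1.0, 0.0, 1.5])
f = np.sin(X)
# Predicting at the training inputs should reproduce f with near-zero variance.
mean, cov = gp_posterior(X, f, X)
```

Using `np.linalg.solve` instead of forming $K(X,X)^{-1}$ explicitly is a common design choice for stability.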

## Bayesian Quadrature 1D example
### Integrating f(t)
For a Gaussian process $f \sim \mathcal{GP}(m(x), k(x, x'))$, the integral

$$
F(x) = \int_a^x f(t) dt
$$

can be expressed in closed form.

---

Let's start with the kernel integral $v[k]$ that enters the mean of the integral.

Using the RBF kernel $k(y,x) = \sigma_f^2\exp \left(-\frac{1}{2l^2} |x-y|^2 \right)$:
$$
\begin{aligned}
v[k] &= \int_a^x k(t,s) dt \\ 
     &= \int_a^x \sigma_f^2\exp \left(-\frac{1}{2l^2} |s-t|^2 \right) dt \\
\end{aligned}
$$
From here on we write $\sigma$ for the length scale $l$. Factoring out $\sigma_f^2 \cdot \sigma \sqrt{2\pi}$ turns the integrand into a normalized Gaussian density $k^*(t,s)$:
$$
\begin{aligned}
v[k] &= \sigma_f^2 \cdot \sigma \sqrt{2\pi} \int_a^x \frac{1}{\sigma\sqrt{2\pi}}\exp \left(-\frac{(s-t)^2}{2\sigma^2} \right) dt \\
     &= \sigma_f^2 \cdot \sigma \sqrt{2\pi} \int_a^x k^*(t,s) dt \\
\end{aligned}
$$
Now working with just $k^*(t,s)$:
$$
\begin{aligned}
v^*[k] &= \int_a^x \frac{1}{\sigma\sqrt{2\pi}}\exp \left(-\frac{(s-t)^2}{2\sigma^2} \right) dt \\
     &= \frac{1}{2} \left[1+\text{erf} \left(\frac{x-s}{\sigma\sqrt{2}} \right) \right] - \frac{1}{2} \left[1+\text{erf} \left(\frac{a-s}{\sigma\sqrt{2}} \right) \right] \\
     &= \frac{1}{2} \left[ \text{erf} \left(\frac{x-s}{\sigma\sqrt{2}} \right) - \text{erf}\left(\frac{a-s}{\sigma\sqrt{2}} \right) \right] \\
\end{aligned}
$$
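
As a sanity check, the closed form for $v[k]$ can be compared with numerical integration. This is a small sketch using only the Python standard library; the function names (`v_closed`, `simpson`, `rbf`) and the parameter values are illustrative assumptions, and `ell` plays the role of $\sigma$ above.

```python
import math

def rbf(t, s, sigma_f=1.0, ell=1.0):
    """RBF kernel k(t, s) with signal std sigma_f and length scale ell."""
    return sigma_f**2 * math.exp(-0.5 * (s - t)**2 / ell**2)

def v_closed(s, a, x, sigma_f=1.0, ell=1.0):
    """Closed form of the integral of k(t, s) over t in [a, x]."""
    scale = sigma_f**2 * ell * math.sqrt(2.0 * math.pi)
    z = ell * math.sqrt(2.0)
    return scale * 0.5 * (math.erf((x - s) / z) - math.erf((a - s) / z))

def simpson(g, lo, hi, n=2000):
    """Composite Simpson rule with n (even) subintervals."""
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(lo + i * h)
    return total * h / 3.0

s, a, x = 0.3, -1.0, 2.0
numeric = simpson(lambda t: rbf(t, s, 1.5, 0.7), a, x)
closed = v_closed(s, a, x, 1.5, 0.7)
```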

---

For the uncertainty we have to integrate again. 

$$
\begin{aligned}
vv[k] &= \int_a^x \int_a^x k(t,s) dsdt \\
      &= \sigma_f^2 \cdot \sigma \sqrt{2\pi} \int_a^x \int_a^x k^*(t,s) dtds \\
      &= \sigma_f^2 \cdot \sigma \sqrt{2\pi} \int_a^x \left[ \frac{1}{2}\text{erf} \left(\frac{x-s}{\sigma\sqrt{2}} \right) - \frac{1}{2}\text{erf} \left(\frac{a-s}{\sigma\sqrt{2}} \right) \right] ds \\
\end{aligned}
$$

Focusing on just $k^*(t,s)$, let $u_1 = \frac{x-s}{\sigma\sqrt{2}}$ and $u_2 = \frac{a-s}{\sigma\sqrt{2}}$  

$\implies$ $du_1 = du_2 = du = \frac{-1}{\sigma\sqrt{2}} \cdot ds$ 

$\implies$ $ds = -\sigma\sqrt{2} \cdot du$

$$
\begin{aligned}
vv^*[k] &= \int_a^x \frac{1}{2}\text{erf}(u_1)ds - \int_a^x\frac{1}{2}\text{erf}(u_2)ds \\
       &= \int_{\frac{x-a}{\sigma\sqrt{2}}}^0 -\frac{1}{2}\text{erf}(u_1) \cdot \sigma\sqrt{2} \cdot du_1 - \int_0^{\frac{a-x}{\sigma\sqrt{2}}} - \frac{1}{2}\text{erf}(u_2) \cdot \sigma\sqrt{2} \cdot du_2 \\
       &= \int_0^{\frac{x-a}{\sigma\sqrt{2}}} \frac{1}{2}\text{erf}(u) \cdot \sigma\sqrt{2} \cdot du - \int_{\frac{a-x}{\sigma\sqrt{2}}}^0 \frac{1}{2}\text{erf}(u) \cdot \sigma\sqrt{2} \cdot du \\
       &= \frac{\sigma\sqrt{2}}{2} \left(\int_0^{\frac{x-a}{\sigma\sqrt{2}}} \text{erf}(u)du + \int_0^{\frac{x-a}{\sigma\sqrt{2}}} \text{erf}(u)du \right) \\
       &= \sigma\sqrt{2} \left(\int_0^{\frac{x-a}{\sigma\sqrt{2}}} \text{erf}(u)du \right) \\
\end{aligned}
$$

Using $\int_0^\theta \text{erf}(u)du = \theta \, \text{erf}(\theta) + \frac{e^{-\theta^2} - 1}{\sqrt{\pi}}$ with $\theta = \frac{x-a}{\sigma\sqrt{2}}$:

$$
\begin{aligned}
vv^*[k] &= \sigma\sqrt{2} \left(\theta \, \text{erf}(\theta) + \frac{e^{-\theta^2} - 1}{\sqrt{\pi}} \right) \\
        &= (x-a) \, \text{erf} \left( \frac{x-a}{\sigma \sqrt{2}} \right) + \sigma\sqrt{2} \left( \frac{\exp{\left(-{\frac{(x-a)^2}{2 \sigma^2}}\right)} - 1}{\sqrt{\pi}} \right)\\
\end{aligned}
$$

---

So we have:
$$
\begin{aligned}
v[k] &= \left( \sigma_f^2 \cdot \sigma \sqrt{2\pi} \right) v^*[k] \\
     &= \left( \sigma_f^2 \cdot \sigma \sqrt{2\pi} \right) \, \left[\frac{1}{2} \text{erf} \left(\frac{x-s}{\sigma\sqrt{2}} \right) - \frac{1}{2} \text{erf}\left(\frac{a-s}{\sigma\sqrt{2}} \right) \right] \\
     \\
vv[k] &= \left( \sigma_f^2 \cdot \sigma \sqrt{2\pi} \right) vv^*[k] \\
      &= \left( \sigma_f^2 \cdot \sigma \sqrt{2\pi} \right) \left[(x-a) \, \text{erf} \left( \frac{x-a}{\sigma \sqrt{2}} \right) + \sigma\sqrt{2} \left( \frac{\exp{\left(-{\frac{(x-a)^2}{2 \sigma^2}}\right)} - 1}{\sqrt{\pi}} \right) \right] \\
      &= \sigma_f^2 \left[ \sigma \sqrt{2\pi} \, (x-a) \, \text{erf} \left( \frac{x-a}{\sigma \sqrt{2}} \right) + 2 \sigma^2 \left( e^{-{\frac{(x-a)^2}{2 \sigma^2}}} - 1 \right) \right] \\
\end{aligned}
$$
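
The double-integral result can be checked the same way: integrating the closed-form $v[k]$ over $s \in [a, x]$ numerically should reproduce $vv[k]$. The sketch below uses only the standard library; the function names and test values are assumptions for illustration, with `ell` standing for $\sigma$.

```python
import math

def v_closed(s, a, x, sigma_f, ell):
    """Integral of the RBF kernel k(t, s) over t in [a, x]."""
    z = ell * math.sqrt(2.0)
    scale = sigma_f**2 * ell * math.sqrt(2.0 * math.pi)
    return scale * 0.5 * (math.erf((x - s) / z) - math.erf((a - s) / z))

def vv_closed(a, x, sigma_f, ell):
    """Double integral of the RBF kernel over [a, x] x [a, x]."""
    d = x - a
    erf_term = ell * math.sqrt(2.0 * math.pi) * d * math.erf(d / (ell * math.sqrt(2.0)))
    exp_term = 2.0 * ell**2 * (math.exp(-d**2 / (2.0 * ell**2)) - 1.0)
    return sigma_f**2 * (erf_term + exp_term)

def simpson(g, lo, hi, n=2000):
    """Composite Simpson rule with n (even) subintervals."""
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(lo + i * h)
    return total * h / 3.0

a, x, sigma_f, ell = -1.0, 2.0, 1.5, 0.7
vv_numeric = simpson(lambda s: v_closed(s, a, x, sigma_f, ell), a, x)
vv_exact = vv_closed(a, x, sigma_f, ell)
```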



In summary:
$$
\boxed{v[k] = \left( \sigma_f^2 \cdot \sigma \sqrt{2\pi} \right) \, \left[\frac{1}{2} \text{erf} \left(\frac{x-s}{\sigma\sqrt{2}} \right) - \frac{1}{2} \text{erf}\left(\frac{a-s}{\sigma\sqrt{2}} \right) \right]}
$$

$$
\boxed{vv[k] = \sigma_f^2 \left[ \sigma \sqrt{2\pi} \, (x-a) \, \text{erf} \left( \frac{x-a}{\sigma \sqrt{2}} \right) + 2 \sigma^2 \left( e^{-{\frac{(x-a)^2}{2 \sigma^2}}} - 1 \right) \right]}
$$


$$
\boxed{\mathbb{E}_{f|\mathbf{D}}[F(x)] = M(x) + \mathbf{k}_F^T(x) K^{-1} \Big(f(X)-m(X) \Big)}
$$

$$
\boxed{\text{Cov}_{f|\mathbf{D}}\left(F(x), F(x')\right) = K(x,x') - \mathbf{k}_F^T(x) K^{-1} \mathbf{k}_F(x')}
$$

---

## Bayesian Quadrature
### Integrating f(t)p(t) - univariate case

For a Gaussian process $f \sim \mathcal{GP}(m(x), k(x, x'))$ and a Gaussian density $p(t) = \mathcal{N}(t \mid \mu_p, \sigma_p^2)$, the integral

$$
\begin{aligned}
F(x) &= \int_a^x f(t) d\mathbb{P}(t) \\
     &= \int_a^x f(t) p(t) dt\\
\end{aligned}
$$

can be expressed in closed form.

---

**Prior mean function:**

$$
\begin{aligned}
M(x) &= \mathbb{E}[F(x)] \\
      &= \mathbb{E}\left[\int_a^x f(t)p(t) dt\right] \\
      &= \int_a^x \mathbb{E}[f(t)] p(t) dt \\
      &= \int_a^x m(t) p(t)dt
\end{aligned}
$$


**Prior covariance function:**

$$
\begin{aligned} 
K(x,x') &= \text{Cov}\Big(F(x), F(x')\Big) \\
        &= \text{Cov}\left(\int_a^x f(t) p(t) dt , \int_a^{x'} f(s) p(s) ds\right) \\
        &= \int_a^x \int_a^{x'} \text{Cov}\Big(f(t), f(s)\Big)p(s)p(t)\, ds\, dt \\
        &= \int_a^x \int_a^{x'} k(t,s) p(s)p(t)\, ds\, dt \\
\end{aligned} 
$$

---

Integration is a linear operator, so $F(x)$ is a linear functional of the Gaussian process $f$ and is therefore itself a Gaussian process:
$$
F(x) \sim \mathcal{GP}\left( M(x),\ K(x, x') \right)
$$

where:

* $M(x) = \int_a^x m(t)p(t)\,dt$
* $K(x, x') =  \int_a^x \int_a^{x'} k(t, s)p(t)p(s)\,ds\,dt$

---

**Posterior mean function:**

$$
\begin{aligned}
\mathbb{E}_{f|\mathbf{D}}[F(x)] 
     &= \mathbb{E}_{f|\mathbf{D}}\left[\int_a^x f(t)p(t) dt\right] \\
     &= \int_a^x \mathbb{E}_{f|\mathbf{D}}[f(t)] p(t) dt \\
     &= \int_a^x \left[ \int_F f(t) p(f|\mathbf{D}) df \right] p(t)  dt\\
     &= \int_a^x \bar{f}_{f|\mathbf{D}}(t) p(t)dt
\end{aligned}
$$

Using the partitioned Gaussian result from before:
$$
\boldsymbol{\mu}_{\mathbf{f}_* | \mathbf{f}} = \boldsymbol{\mu}_2 + \Sigma_{21} \Sigma_{11}^{-1} (\mathbf{x}_1 - \boldsymbol{\mu}_1)
$$
The posterior $\textbf{mean of f}$ is:
$$
\bar{f}_{f|\mathbf{D}}(t) = m(t) + k(t,X) k(X,X)^{-1} \Big( f(X) - m(X) \Big)
$$
Substituting gives:
$$
\begin{aligned}
\mathbb{E}_{f|\mathbf{D}}[F(x)]
    &= \int_a^x \Big[m(t) + k(t, X) k(X, X)^{-1} \Big( f(X) - m(X) \Big)\Big] p(t)dt \\
    &= \int_a^x m(t)p(t)dt + \int_a^x k(t, X) k(X, X)^{-1} \Big( f(X) - m(X) \Big)p(t)dt \\
    &= M(x) + \underbrace{\left[ \int_a^x k(t, X)p(t)dt \right]}_{\mathbf{k}_F} k(X, X)^{-1} \Big( f(X) - m(X) \Big) \\
\end{aligned}
$$


**Posterior covariance function:**
$$
\begin{aligned} 
\text{Cov}_{f|\mathbf{D}}\left(F(x), F(x')\right)
         &=\text{Cov}_{f|\mathbf{D}}\left(\int_a^x f(t)p(t) dt, \int_a^{x'} f(s)p(s) ds\right) \\
         &= \int_a^x \int_a^{x'} \text{Cov}_{f|\mathbf{D}}\Big(f(t)p(t), f(s)p(s)\Big)\, ds\, dt \\
         &= \int_a^x \int_a^{x'} \text{Cov}_{f|\mathbf{D}}\Big(f(t), f(s)\Big)p(s)p(t)\, ds\, dt \\
\end{aligned} 
$$

Using the partitioned Gaussian result from before:
$$
\boldsymbol{\Sigma}_{\mathbf{f}_* | \mathbf{f}} = \Sigma_{22} - \Sigma_{21} \Sigma_{11}^{-1} \Sigma_{12}
$$
The posterior $\textbf{covariance function of f}$ is:
$$
\text{Cov}_{f|\mathbf{D}}\Big(f(t), f(s)\Big) = k(t,s) - k(t,X) k(X,X)^{-1} k(X,s)
$$
Substituting gives:
$$
\begin{aligned} 
\text{Cov}_{f|\mathbf{D}}\left(F(x), F(x')\right)
        &= \int_a^x \int_a^{x'} \Big[ k(t, s) - k(t, X) k(X, X)^{-1} k(X, s) \Big] p(s)p(t)\, ds\, dt \\
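        &= \int_a^x \int_a^{x'} k(t, s)p(t)p(s)\, ds\, dt - \int_a^x \int_a^{x'} k(t, X) k(X, X)^{-1} k(X, s) p(s)p(t)\, ds\, dt \\
        &= K(x,x') - \int_a^x \int_a^{x'} \Big[ k(t, X)p(t)\Big] k(X, X)^{-1} \Big[ k(X, s) p(s) \Big]\, ds\, dt \\
        &= K(x,x') - \left[ \int_a^x k(t, X)p(t)\, dt \right] k(X, X)^{-1} \left[ \int_a^{x'} k(X, s) p(s)\, ds \right] \\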
\end{aligned} 
$$

---

We have:
$$
\boxed{\mathbb{E}_{f|\mathbf{D}}[F(x)] = M(x) + \mathbf{k}_F^T(x) K^{-1} \Big(f(X)-m(X) \Big)}
$$

$$
\boxed{\text{Cov}_{f|\mathbf{D}}\left(F(x), F(x')\right) = K(x,x') - \mathbf{k}_F^T(x) K^{-1} \mathbf{k}_F(x')}
$$


where $$\textbf{k}_F(x) = \Big[ \int_a^x k(t, x_1)p(t)dt,\, \int_a^x k(t, x_2)p(t)dt,\, \cdots ,\, \int_a^x k(t, x_n)p(t)dt \Big]$$

and $K = k(X,X)$ is the Gram matrix of the training inputs under the kernel $k$.

---

### Integrating f(x)p(x)

Let's start with the kernel integral $v[k]$ that enters the mean of the integral.

For now, we use the isotropic RBF kernel $k(\mathbf{y},\mathbf{x}) = \sigma_f^2\exp \left(-\frac{1}{2l^2} ||\mathbf{x}-\mathbf{y}||^2 \right)$.

In the univariate case:
$$
\begin{aligned}
v[k] &= \int_a^x k(t,s)p(t) dt \\
     &= \int_a^x \sigma_f^2 \exp \left(-\frac{1}{2l^2} (s-t)^2 \right) \frac{1}{\sigma_p \sqrt{2 \pi}} \exp\left( -\frac{1}{2 \sigma_p^2}(t - \mu_p)^2 \right) dt \\
     &= \sigma_f^2 l \sqrt{2 \pi} \int_a^x \frac{1}{l \sqrt{2 \pi}} \exp \left(-\frac{1}{2l^2} (s-t)^2 \right) \frac{1}{\sigma_p \sqrt{2 \pi}} \exp\left(-\frac{1}{2 \sigma_p^2}(t - \mu_p)^2 \right) dt \\
     &= \sigma_f^2 l \sqrt{2 \pi} \int_a^x \mathcal{N}(t | s, l^2)\,\mathcal{N}(t | \mu_p, \sigma_p^2) \, dt \\
\end{aligned}
$$


Using result A.7 in *Carl Edward Rasmussen & Christopher K. I. Williams (2006)*:

The product of two Gaussians gives another (un-normalized) Gaussian
$$
\mathcal{N}(x | a, A)\,\mathcal{N}(x | b, B)
= Z^{-1}\,\mathcal{N}(x | c, C)
$$
where
$$
\mathbf{c} = \mathbf{C}\,(A^{-1}a + B^{-1}b), 
\qquad 
C = \left(A^{-1} + B^{-1}\right)^{-1}
$$

The normalizing constant looks itself like a Gaussian

$$
Z^{-1} = (2\pi)^{-D/2} \, |A + B|^{-1/2} \, 
\exp\left(-\tfrac{1}{2}(a-b)^T(A+B)^{-1}(a-b)\right)
$$
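
The identity can be verified pointwise in one dimension. The following sketch uses only the standard library; `normal_pdf` and `gaussian_product` are hypothetical helper names, and `A`, `B`, `C` are scalar variances.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a univariate normal N(x | mean, var)."""
    return math.exp(-0.5 * (x - mean)**2 / var) / math.sqrt(2.0 * math.pi * var)

def gaussian_product(a, A, b, B):
    """Return (Z_inv, c, C) with N(x|a,A) N(x|b,B) = Z_inv N(x|c,C) in 1D."""
    C = 1.0 / (1.0 / A + 1.0 / B)
    c = C * (a / A + b / B)
    Z_inv = normal_pdf(a, b, A + B)   # the constant is itself a Gaussian in a - b
    return Z_inv, c, C

a, A, b, B = 0.4, 0.5, -1.0, 2.0
Z_inv, c, C = gaussian_product(a, A, b, B)
# Largest pointwise gap between the two sides over a few test points.
max_gap = max(
    abs(normal_pdf(x, a, A) * normal_pdf(x, b, B) - Z_inv * normal_pdf(x, c, C))
    for x in [-2.0, -0.5, 0.0, 0.7, 3.0]
)
```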

Applying the result:

$$
\mathbf{c} = \mathbf{C}\,((l^2)^{-1} s + (\sigma_p^2)^{-1} \mu_p), 
\qquad 
C = \left((l^2)^{-1} + (\sigma_p^2)^{-1}\right)^{-1}
$$

The normalizing constant becomes:

$$
\begin{aligned}
Z^{-1} &= (2\pi)^{-1/2} \, |(l^2) + (\sigma_p^2)|^{-1/2} \, \exp\left(-\tfrac{1}{2}(s-\mu_p)^T((l^2)+(\sigma_p^2))^{-1}(s-\mu_p)\right) \\
&= \frac{1}{\sqrt{2\pi \, (l^2 + \sigma_p^2)}} \, \exp\left(-\frac{(s-\mu_p)^2}{2(l^2+\sigma_p^2)}\right) \\
\end{aligned}
$$

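Taking $[a,x] = (-\infty, \infty)$, the normalized Gaussian $\mathcal{N}(t | c, C)$ integrates to 1, so only $Z^{-1}$ remains:
$$
\begin{aligned}
v[k] &= \sigma_f^2 l \sqrt{2 \pi} \int_{-\infty}^{\infty} \mathcal{N}(t | s, l^2)\,\mathcal{N}(t | \mu_p, \sigma_p^2) \, dt \\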
     &= \sigma_f^2 l \sqrt{2 \pi} \, \frac{1}{\sqrt{2\pi \, (l^2 + \sigma_p^2)}} \, \exp\left(-\frac{(s-\mu_p)^2}{2(l^2+\sigma_p^2)}\right) \\
     &= \sigma_f^2 \, \sqrt{\frac{l^2}{l^2 + \sigma_p^2}}\, \exp\left(-\frac{(s-\mu_p)^2}{2(l^2+\sigma_p^2)}\right) \\
\end{aligned}
$$
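
A quick numerical check of this kernel mean, assuming the integration range is wide enough to stand in for the whole real line. The sketch is standard-library only; the names `v_gauss` and `simpson` and the parameter values are illustrative assumptions.

```python
import math

def v_gauss(s, sigma_f, ell, mu_p, sigma_p):
    """Closed-form integral of k(t, s) N(t | mu_p, sigma_p^2) over the real line."""
    var = ell**2 + sigma_p**2
    return sigma_f**2 * math.sqrt(ell**2 / var) * math.exp(-0.5 * (s - mu_p)**2 / var)

def simpson(g, lo, hi, n=4000):
    """Composite Simpson rule with n (even) subintervals."""
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(lo + i * h)
    return total * h / 3.0

sigma_f, ell, mu_p, sigma_p, s = 1.3, 0.8, 0.5, 1.2, -0.4

def integrand(t):
    k = sigma_f**2 * math.exp(-0.5 * (s - t)**2 / ell**2)
    p = math.exp(-0.5 * (t - mu_p)**2 / sigma_p**2) / (sigma_p * math.sqrt(2.0 * math.pi))
    return k * p

# Truncate the real line at +/- 10 standard deviations of p (an approximation).
v_numeric = simpson(integrand, mu_p - 10.0 * sigma_p, mu_p + 10.0 * sigma_p)
v_exact = v_gauss(s, sigma_f, ell, mu_p, sigma_p)
```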

---

For the uncertainty we have to integrate again. We again use $[a,x] = (-\infty, \infty)$.


$$
\begin{aligned}
vv[k] &= \int \int k(t,s)p(t)p(s) dtds \\
      &= \int \Big[\int k(t,s)p(t) dt\Big] \,p(s)ds \\
      &= \int v[k] \,p(s)ds \\
      &= \int \sigma_f^2 \, \sqrt{\frac{l^2}{l^2 + \sigma_p^2}}\, \exp\left(-\frac{(s-\mu_p)^2}{2(l^2+\sigma_p^2)}\right) \mathcal{N}(s | \mu_p, \sigma_p^2) \, ds \\
      &= \sigma_f^2 \, \sqrt{\frac{l^2}{l^2 + \sigma_p^2}} \sqrt{(2\pi) (\sigma_p^2 + l^2)} \int \frac{1}{\sqrt{(2\pi) (\sigma_p^2 + l^2)}}\exp\left(-\frac{(s-\mu_p)^2}{2(l^2+\sigma_p^2)}\right) \mathcal{N}(s | \mu_p, \sigma_p^2) \, ds \\
      &= \sigma_f^2 \, \sqrt{\frac{l^2}{l^2 + \sigma_p^2}} \sqrt{(2\pi) (\sigma_p^2 + l^2)} \int \mathcal{N}(s | \mu_p, \sigma_p^2 + l^2) \, \mathcal{N}(s | \mu_p, \sigma_p^2) \, ds\\
\end{aligned}
$$

Using the earlier A.7 result (the exponential factor equals 1 because both Gaussians share the mean $\mu_p$):

$$
Z^{-1} = (2\pi)^{-1/2} \, (2\sigma_p^2 + l^2)^{-1/2}
$$

We then have:
$$
\begin{aligned}
vv[k] &= \sigma_f^2 \, \sqrt{\frac{l^2}{l^2 + \sigma_p^2}} \sqrt{(2\pi) (\sigma_p^2 + l^2)} \int \mathcal{N}(s | \mu_p, \sigma_p^2 + l^2) \, \mathcal{N}(s | \mu_p, \sigma_p^2) \, ds\\
      &= \sigma_f^2 \, \sqrt{\frac{l^2}{l^2 + \sigma_p^2}} \sqrt{(2\pi) (\sigma_p^2 + l^2)} \, (2\pi)^{-1/2} \, (2\sigma_p^2 + l^2)^{-1/2} \\
      &= \sigma_f^2 \, \sqrt{\frac{l^2}{l^2 + 2\sigma_p^2}}  \\
\end{aligned}
$$
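
The value of $vv[k]$ can also be checked by integrating the closed-form $v[k]$ against $p(s)$ numerically. This is a standard-library sketch under the same truncation assumption as before (the real line is replaced by $\mu_p \pm 10\sigma_p$); all names and parameter values are illustrative.

```python
import math

def v_gauss(s, sigma_f, ell, mu_p, sigma_p):
    """Kernel mean: integral of k(t, s) N(t | mu_p, sigma_p^2) dt."""
    var = ell**2 + sigma_p**2
    return sigma_f**2 * math.sqrt(ell**2 / var) * math.exp(-0.5 * (s - mu_p)**2 / var)

def vv_gauss(sigma_f, ell, sigma_p):
    """Closed form of the double integral of k(t, s) p(t) p(s)."""
    return sigma_f**2 * math.sqrt(ell**2 / (ell**2 + 2.0 * sigma_p**2))

def simpson(g, lo, hi, n=4000):
    """Composite Simpson rule with n (even) subintervals."""
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(lo + i * h)
    return total * h / 3.0

sigma_f, ell, mu_p, sigma_p = 1.3, 0.8, 0.5, 1.2

def integrand(s):
    p = math.exp(-0.5 * (s - mu_p)**2 / sigma_p**2) / (sigma_p * math.sqrt(2.0 * math.pi))
    return v_gauss(s, sigma_f, ell, mu_p, sigma_p) * p

vv_numeric = simpson(integrand, mu_p - 10.0 * sigma_p, mu_p + 10.0 * sigma_p)
vv_exact = vv_gauss(sigma_f, ell, sigma_p)
```

Note that the result does not depend on $\mu_p$, which matches the boxed formula.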

---

So we have:

$$
\boxed{\mathbb{E}_{f|\mathbf{D}}[F(x)] = M(x) + \mathbf{k}_F^T(x) K^{-1} \Big(f(X)-m(X) \Big)}
$$

$$
\boxed{\text{Cov}_{f|\mathbf{D}}\left(F(x), F(x')\right) = K(x,x') - \mathbf{k}_F^T(x) K^{-1} \mathbf{k}_F(x')}
$$

$$
\boxed{v[k] = \sigma_f^2 \sqrt{\frac{l^2}{l^2 + \sigma_p^2}} \; \exp\left[- \frac{(s - \mu_p)^2}{2 (l^2 + \sigma_p^2)} \right]}
$$

$$
\boxed{vv[k] = \sigma_f^2 \sqrt{\frac{l^2}{l^2 + 2 \sigma_p^2}}}
$$
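
Putting the four boxed results together gives a small Bayesian quadrature routine. The sketch below assumes NumPy, a zero prior mean ($m \equiv 0$, so $M = 0$), noise-free observations, and integration over the whole real line; the function names, node placement, `jitter` term, and the test integrand $\cos(t)$ are assumptions chosen for illustration.

```python
import numpy as np

def rbf(A, B, sigma_f, ell):
    """RBF Gram matrix between 1D input arrays A and B."""
    d = A[:, None] - B[None, :]
    return sigma_f**2 * np.exp(-0.5 * d**2 / ell**2)

def bayesian_quadrature(X, fX, sigma_f, ell, mu_p, sigma_p, jitter=1e-8):
    """Posterior mean and variance of the integral of f(t) N(t | mu_p, sigma_p^2)."""
    K = rbf(X, X, sigma_f, ell) + jitter * np.eye(len(X))
    var_sum = ell**2 + sigma_p**2
    # k_F: kernel mean v[k] evaluated at every training input
    k_F = sigma_f**2 * np.sqrt(ell**2 / var_sum) * np.exp(-0.5 * (X - mu_p)**2 / var_sum)
    # vv[k]: prior variance of the integral
    vv = sigma_f**2 * np.sqrt(ell**2 / (ell**2 + 2.0 * sigma_p**2))
    w = np.linalg.solve(K, k_F)       # quadrature weights K^{-1} k_F
    return w @ fX, vv - w @ k_F

X = np.linspace(-4.0, 4.0, 17)
bq_mean, bq_var = bayesian_quadrature(X, np.cos(X), 1.0, 1.0, 0.0, 1.0)
exact = np.exp(-0.5)   # E[cos(t)] for t ~ N(0, 1)
```

The posterior mean is a weighted sum of the observed function values, so once the weights are computed the same nodes can be reused for other integrands.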

### Product of 2 Gaussians

The same results can be derived directly by completing the square in the exponent:

$$
\begin{aligned}
v[k] &= \int_a^x k(t,s)p(t) dt \\ 
     &= \int_a^x \sigma_f^2 \exp \left(-\frac{1}{2l^2} (s-t)^2 \right) \frac{1}{\sigma_p \sqrt{2 \pi}} \exp\left(-\frac{1}{2 \sigma_p^2}(t - \mu_p)^2 \right) dt \\
     &= \int_a^x \frac{\sigma_f^2}{\sigma_p \sqrt{2 \pi}} \exp \left(-\frac{1}{2l^2} (s-t)^2  -\frac{1}{2 \sigma_p^2}(t - \mu_p)^2 \right) dt \\
     &= \int_a^x \frac{\sigma_f^2}{\sigma_p \sqrt{2 \pi}} \exp \left(-\frac{1}{2l^2} \left(s^2-2st+t^2\right)  -\frac{1}{2 \sigma_p^2}\left(t^2 - 2t\mu_p + \mu_p^2\right) \right) dt \\
     &= \int_a^x \frac{\sigma_f^2}{\sigma_p \sqrt{2 \pi}} \exp \left( -\frac{1}{2} \left[ \left(\frac{1}{l^2} + \frac{1}{\sigma_p^2}\right) t^2 - 2\left(\frac{s}{l^2} + \frac{\mu_p}{\sigma_p^2}\right) t + \left(\frac{s^2}{l^2} + \frac{\mu_p^2}{\sigma_p^2}\right) \right] \right)dt \\
     &= \int_a^x \frac{\sigma_f^2}{\sigma_p \sqrt{2 \pi}} \exp \left(-\frac{1}{2} \left(\frac{1}{l^2} + \frac{1}{\sigma_p^2}\right) \left[ \left( t^2 - 2 \frac{\frac{s}{l^2} + \frac{\mu_p}{\sigma_p^2}}{\frac{1}{l^2} + \frac{1}{\sigma_p^2}} t \right) + \frac{\frac{s^2}{l^2} + \frac{\mu_p^2}{\sigma_p^2}}{\frac{1}{l^2} + \frac{1}{\sigma_p^2}} \right]\right)dt \\
\end{aligned}
$$
Let $\upsilon^2 = \frac{l^2\sigma_p^2}{l^2 + \sigma_p^2}$, $m = \frac{\frac{s}{l^2} + \frac{\mu_p}{\sigma_p^2}}{\frac{1}{l^2} + \frac{1}{\sigma_p^2}}, c_1 = \frac{\frac{s^2}{l^2} + \frac{\mu_p^2}{\sigma_p^2}}{\frac{1}{l^2} + \frac{1}{\sigma_p^2}}$

$$
\begin{aligned}
v[k] &= \int_a^x \frac{\sigma_f^2}{\sigma_p \sqrt{2 \pi}} \exp \left(-\frac{1}{2\upsilon^2} \left[\left( t^2 - 2mt \right) + c_1 \right]\right)dt \\
     &= \int_a^x \frac{\sigma_f^2}{\sigma_p \sqrt{2 \pi}} \exp \left(-\frac{1}{2\upsilon^2} \left[ \left(\left(t - m\right)^2 - m^2\right) + c_1\right]\right)dt \\
     &= \int_a^x \frac{\sigma_f^2}{\sigma_p \sqrt{2 \pi}} \exp \left(-\frac{1}{2\upsilon^2} \left(t - m\right)^2 \right) \exp \left( -\frac{1}{2\upsilon^2} \left[ - m^2 + c_1 \right] \right)  dt \\
\end{aligned}
$$


Define $c_2(s) = \exp \left(\frac{m^2 - c_1}{2\upsilon^2} \right)\,\frac{\upsilon \,\sigma_f^2}{\sigma_p}$, written $c_2$ for short (it depends on $s$ through $m$ and $c_1$):


$$
\begin{aligned}
v[k] &= c_2 \int_a^x \frac{1}{\upsilon \sqrt{2 \pi}} \exp \left(-\frac{1}{2\upsilon^2} \left(t - m\right)^2 \right) dt \\
     &= c_2 \int_a^x k^*(t,m) dt \\
\end{aligned}
$$

If most of the density of $p$ falls in regions where the GP $f$ is confident, we can take $[a,x] = (-\infty, \infty)$.

Then $\int_{-\infty}^{\infty} k^*(t,m) dt = 1$ and $v[k] = c_2$.



#### Simplify $\upsilon^2$, $m$, $c_1$ and $c_2$.

Using:

$\upsilon^2 = \frac{l^2 \sigma_p^2}{l^2 + \sigma_p^2}$

$m = \frac{\frac{s}{l^2} + \frac{\mu_p}{\sigma_p^2}}{\frac{1}{l^2} + \frac{1}{\sigma_p^2}} = \frac{s \sigma_p^2 + \mu_p l^2}{l^2 + \sigma_p^2}$

$c_1 = \frac{\frac{s^2}{l^2} + \frac{\mu_p^2}{\sigma_p^2}}{\frac{1}{l^2} + \frac{1}{\sigma_p^2}} = \frac{s^2 \sigma_p^2 + \mu_p^2 l^2}{l^2 + \sigma_p^2}$

$c_2 = \exp \left(\frac{m^2 - c_1}{2\upsilon^2} \right)\,\frac{\upsilon \,\sigma_f^2}{\sigma_p}$

We simplify:

$m^2 - c_1 = \frac{(s \sigma_p^2 + \mu_p l^2)^2}{(l^2 + \sigma_p^2)^2} - \frac{s^2 \sigma_p^2 + \mu_p^2 l^2}{l^2 + \sigma_p^2} = \frac{(s \sigma_p^2 + \mu_p l^2)^2 - (s^2 \sigma_p^2 + \mu_p^2 l^2)(l^2 + \sigma_p^2)}{(l^2 + \sigma_p^2)^2}$

*Numerator 1st term:* $(s \sigma_p^2 + \mu_p l^2)^2 = s^2 \sigma_p^4 + 2 s \mu_p l^2 \sigma_p^2 + \mu_p^2 l^4$

*Numerator 2nd term:* $(s^2 \sigma_p^2 + \mu_p^2 l^2)(l^2 + \sigma_p^2) = s^2 \sigma_p^2 l^2 + s^2 \sigma_p^4 + \mu_p^2 l^4 + \mu_p^2 l^2 \sigma_p^2$

*Numerator:*
$$
\begin{aligned}
(s^2 \sigma_p^4 + 2 s \mu_p l^2 \sigma_p^2 + \mu_p^2 l^4) - (s^2 \sigma_p^4 + s^2 l^2 \sigma_p^2 + \mu_p^2 l^4 + \mu_p^2 l^2 \sigma_p^2) \\
= 2 s \mu_p l^2 \sigma_p^2 - s^2 l^2 \sigma_p^2 - \mu_p^2 l^2 \sigma_p^2 \\
= l^2 \sigma_p^2 (2 s \mu_p - s^2 - \mu_p^2) \\
= - l^2 \sigma_p^2 (s - \mu_p)^2
\end{aligned}
$$

$\frac{m^2 - c_1}{2 \upsilon^2} = \frac{- l^2 \sigma_p^2 (s - \mu_p)^2}{(l^2 + \sigma_p^2)^2} \cdot \frac{l^2 + \sigma_p^2}{2 l^2 \sigma_p^2} = - \frac{(s - \mu_p)^2}{2 (l^2 + \sigma_p^2)}$


$\frac{\upsilon \sigma_f^2}{\sigma_p} = \frac{\sqrt{\frac{l^2 \sigma_p^2}{(l^2 + \sigma_p^2)}} \, \sigma_f^2}{\sigma_p} = \sigma_f^2 {\frac{l}{\sqrt{l^2 + \sigma_p^2}}}$

$$
\boxed{c_2(s) = \exp\left(\frac{m^2 - c_1}{2 \upsilon^2}\right) \cdot \frac{\upsilon \sigma_f^2}{\sigma_p} = \sigma_f^2 \sqrt{\frac{l^2}{l^2 + \sigma_p^2}} \; \exp\left[- \frac{(s - \mu_p)^2}{2 (l^2 + \sigma_p^2)} \right]}
$$
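
The algebra can be checked by evaluating $c_2(s)$ directly from the definitions of $\upsilon^2$, $m$ and $c_1$ and comparing with the simplified form. This is a standard-library sketch; the function names and test values are assumptions for illustration.

```python
import math

def c2_from_definitions(s, sigma_f, ell, mu_p, sigma_p):
    """Evaluate c_2(s) from upsilon^2, m and c_1 without simplifying."""
    prec = 1.0 / ell**2 + 1.0 / sigma_p**2      # 1 / upsilon^2
    ups2 = 1.0 / prec
    m = (s / ell**2 + mu_p / sigma_p**2) / prec
    c1 = (s**2 / ell**2 + mu_p**2 / sigma_p**2) / prec
    return math.exp((m**2 - c1) / (2.0 * ups2)) * math.sqrt(ups2) * sigma_f**2 / sigma_p

def c2_simplified(s, sigma_f, ell, mu_p, sigma_p):
    """Simplified closed form of c_2(s)."""
    var = ell**2 + sigma_p**2
    return sigma_f**2 * math.sqrt(ell**2 / var) * math.exp(-0.5 * (s - mu_p)**2 / var)

params = (1.3, 0.8, 0.5, 1.2)   # sigma_f, ell, mu_p, sigma_p
c2_gap = max(
    abs(c2_from_definitions(s, *params) - c2_simplified(s, *params))
    for s in [-2.0, -0.4, 0.0, 0.5, 1.7]
)
```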

---

For the uncertainty we have to integrate again. We again use $[a,x] = (-\infty, \infty)$.


$$
\begin{aligned}
vv[k] &= \int \int k(t,s)p(t)p(s) dtds \\
      &= \int \Big[\int k(t,s)p(t) dt\Big] \,p(s)ds \\
      &= \int c_2(s) \,p(s)ds \\
      &= \int \sigma_f^2 \sqrt{\frac{l^2}{l^2 + \sigma_p^2}} \; \exp\left[- \frac{(s - \mu_p)^2}{2 (l^2 + \sigma_p^2)} \right] \,\frac{1}{\sigma_p \sqrt{2 \pi}} \exp\left(-\frac{1}{2 \sigma_p^2}(s - \mu_p)^2 \right) ds \\
     &= \sigma_f^2 \sqrt{\frac{l^2}{l^2 + \sigma_p^2}} \,\frac{1}{\sigma_p \sqrt{2 \pi}} \int \exp\left[- \frac{(s - \mu_p)^2}{2 (l^2 + \sigma_p^2)} \right] \exp\left(-\frac{1}{2 \sigma_p^2}(s - \mu_p)^2 \right) ds \\
\end{aligned}
$$

Focusing on the integral:
$$
\begin{aligned}
\int \exp\left[- \frac{(s - \mu_p)^2}{2 (l^2 + \sigma_p^2)} \right] \exp\left(-\frac{1}{2 \sigma_p^2}(s - \mu_p)^2 \right) ds 
&= \int \exp\left[- \frac{(s - \mu_p)^2}{2 (l^2 + \sigma_p^2)} - \frac{(s - \mu_p)^2}{2 \sigma_p^2} \right]ds \\
&= \int \exp \Big[ - (s - \mu_p)^2 \Big( \frac{1}{2 (l^2 + \sigma_p^2)} + \frac{1}{2 \sigma_p^2} \Big) \Big]ds \\
&= \int \exp \Big[ - (s - \mu_p)^2 \frac{l^2 + 2 \sigma_p^2}{2 \sigma_p^2 (l^2 + \sigma_p^2)} \Big]ds \\
&= \sqrt{2 \pi} \, \sqrt{\frac{\sigma_p^2 (l^2 + \sigma_p^2)}{l^2 + 2 \sigma_p^2}} \\
\end{aligned} 
$$

Therefore:
$$
\begin{aligned}
vv[k] &= \sigma_f^2 \sqrt{\frac{l^2}{l^2 + \sigma_p^2}} \,\frac{1}{\sigma_p \sqrt{2 \pi}} \int \exp\left[- \frac{(s - \mu_p)^2}{2 (l^2 + \sigma_p^2)} \right] \exp\left(-\frac{1}{2 \sigma_p^2}(s - \mu_p)^2 \right) ds \\
      &= \sigma_f^2 \sqrt{\frac{l^2}{l^2 + \sigma_p^2}} \,\frac{1}{\sigma_p \sqrt{2 \pi}} \left[\sqrt{2 \pi} \, \sqrt{\frac{\sigma_p^2 (l^2 + \sigma_p^2)}{l^2 + 2 \sigma_p^2}} \right]\\
      &= \sigma_f^2 \sqrt{\frac{l^2}{l^2 + \sigma_p^2}} \,\frac{1}{\sigma_p} \left[\sigma_p\sqrt{\frac{l^2 + \sigma_p^2}{l^2 + 2 \sigma_p^2}} \right]\\
      &= \sigma_f^2 \sqrt{\frac{l^2}{l^2 + 2 \sigma_p^2}} \\
\end{aligned}
$$

---

## Bayesian Quadrature
### Integrating f(t)p(t) - multivariate case


---

We have:
$$
\boxed{\mathbb{E}_{f|\mathbf{D}}[F(x)] = M(x) + \mathbf{k}_F^T(x) K^{-1} \Big(f(X)-m(X) \Big)}
$$

$$
\boxed{\text{Cov}_{f|\mathbf{D}}\left(F(x), F(x')\right) = K(x,x') - \mathbf{k}_F^T(x) K^{-1} \mathbf{k}_F(x')}
$$


where $$\textbf{k}_F(x) = \Big[ \int_a^x k(t, x_1)p(t)dt,\, \int_a^x k(t, x_2)p(t)dt,\, \cdots ,\, \int_a^x k(t, x_n)p(t)dt \Big]$$

and $K = k(X,X)$ is the Gram matrix of the training inputs under the kernel k.

---

### Integrating f(x)p(x)

Let's start with the kernel integral $v[k]$ that enters the mean of the integral.

We will use the anisotropic Gaussian kernel $k(\mathbf{y}, \mathbf{x}) = \sigma_f^2 \exp\Big[-\frac{1}{2} (\mathbf{x}-\mathbf{y})^T \mathbf{L}^{-2} (\mathbf{x}-\mathbf{y}) \Big]$. This is known as the ARD Gaussian kernel when $\mathbf{L}^{-2} = \mathrm{diag}(l_1^{-2}, \dots, l_d^{-2})$:



$$
\begin{aligned}
v[k] &= \int k(\mathbf{t},\mathbf{s})p(\mathbf{t}) d\mathbf{t} \\ 
     &= \int \sigma_f^2 \exp\Big[-\frac{1}{2} (\mathbf{t}-\mathbf{s})^T \mathbf{L}^{-2} (\mathbf{t}-\mathbf{s}) \Big] \frac{1}{\sqrt{(2\pi)^d |\Sigma_p|}} \exp\Big[-\frac{1}{2} (\mathbf{t}-\boldsymbol{\mu}_p)^T \Sigma_p^{-1} (\mathbf{t}-\boldsymbol{\mu}_p)\Big] d\mathbf{t} \\
     &= \sigma_f^2 \sqrt{(2\pi)^d |\mathbf{L^2}|} \int \frac{1}{\sqrt{(2\pi)^d |\mathbf{L^2}|}} \exp \Big[-\frac{1}{2} (\mathbf{t}-\mathbf{s})^T \mathbf{L}^{-2} (\mathbf{t}-\mathbf{s}) \Big] \frac{1}{\sqrt{(2\pi)^d |\Sigma_p|}} \exp\Big[-\frac{1}{2} (\mathbf{t}-\boldsymbol{\mu}_p)^T \Sigma_p^{-1} (\mathbf{t}-\boldsymbol{\mu}_p)\Big] d\mathbf{t} \\
     &= \sigma_f^2 (2\pi)^{d/2} |\mathbf{L}| \int \mathcal{N}(\mathbf{t} \mid \mathbf{s}, \mathbf{L}^2) \, \mathcal{N}(\mathbf{t} \mid \boldsymbol{\mu}_p, \Sigma_p) \, d\mathbf{t}.
\end{aligned}
$$

Using result A.7 in *Carl Edward Rasmussen & Christopher K. I. Williams (2006)*:

The product of two Gaussians gives another (un-normalized) Gaussian
$$
\mathcal{N}(x | a, A)\,\mathcal{N}(x | b, B)
= Z^{-1}\,\mathcal{N}(x | c, C)
$$
where
$$
\mathbf{c} = \mathbf{C}\,(A^{-1}a + B^{-1}b), 
\qquad 
C = \left(A^{-1} + B^{-1}\right)^{-1}
$$

The normalizing constant looks itself like a Gaussian

$$
Z^{-1} = (2\pi)^{-D/2} \, |A + B|^{-1/2} \, 
\exp\left(-\tfrac{1}{2}(a-b)^T(A+B)^{-1}(a-b)\right)
$$

Applying the result:
$$
\mathbf{C} = \Big((\mathbf{L^2})^{-1} + \Sigma_p^{-1}\Big)^{-1} \quad \text{(which reduces to } \Big(\tfrac{1}{l^2} I + \Sigma_p^{-1}\Big)^{-1} \text{ in the isotropic case } \mathbf{L} = l I \text{)}
$$

$$
\mathbf{c} = \mathbf{C} \Big((\mathbf{L^2})^{-1} \mathbf{s} + \Sigma_p^{-1} \boldsymbol{\mu}_p \Big)
$$

$$
Z^{-1} = (2\pi)^{-d/2} |\Sigma_p + \mathbf{L^2}|^{-1/2} \exp\Big[-\frac{1}{2} (\mathbf{s}-\boldsymbol{\mu}_p)^T (\Sigma_p + \mathbf{L^2})^{-1} (\mathbf{s}-\boldsymbol{\mu}_p) \Big]
$$


The integral of a normalized Gaussian over all $\mathbf{t}$ is 1. So only the normalization constant is necessary:

$$
\begin{aligned}
v[k] &= \sigma_f^2 (2\pi)^{d/2} |\mathbf{L}| Z^{-1} \\
     &= \sigma_f^2 (2\pi)^{d/2} |\mathbf{L}|  (2\pi)^{-d/2} |\Sigma_p + \mathbf{L^2}|^{-1/2} \exp\Big[-\frac{1}{2} (\mathbf{s}-\boldsymbol{\mu}_p)^T (\Sigma_p + \mathbf{L^2})^{-1} (\mathbf{s}-\boldsymbol{\mu}_p) \Big] \\
     &= \sigma_f^2 |\mathbf{L}| |\Sigma_p + \mathbf{L^2}|^{-1/2} \exp\Big[-\frac{1}{2} (\mathbf{s}-\boldsymbol{\mu}_p)^T (\Sigma_p +  \mathbf{L^2})^{-1} (\mathbf{s}-\boldsymbol{\mu}_p) \Big] \\
\end{aligned}
$$

---
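
A sketch of the multivariate kernel mean $v[k]$ derived above, assuming NumPy and a symmetric positive definite $\mathbf{L}^2$ passed in directly as the matrix `L2`. The function name `v_multi` and the test values are hypothetical. For diagonal $\mathbf{L}^2$ and $\Sigma_p$ the result should factor into a product of univariate kernel means, which gives a simple consistency check.

```python
import numpy as np

def v_multi(s, sigma_f, L2, mu_p, Sigma_p):
    """Kernel mean of the anisotropic Gaussian kernel against N(mu_p, Sigma_p)."""
    S = Sigma_p + L2
    diff = s - mu_p
    quad = diff @ np.linalg.solve(S, diff)
    # |L| |Sigma_p + L^2|^{-1/2} written as sqrt(|L^2| / |Sigma_p + L^2|)
    return sigma_f**2 * np.sqrt(np.linalg.det(L2) / np.linalg.det(S)) * np.exp(-0.5 * quad)

ells = np.array([0.5, 2.0])       # per-dimension length scales
sigs = np.array([1.0, 0.3])       # per-dimension standard deviations of p
s = np.array([0.2, -1.0])
mu_p = np.array([0.0, 0.5])
sigma_f = 1.3

v_full = v_multi(s, sigma_f, np.diag(ells**2), mu_p, np.diag(sigs**2))

# Product of univariate kernel means, one factor per dimension.
var = ells**2 + sigs**2
v_factored = sigma_f**2 * np.prod(np.sqrt(ells**2 / var) * np.exp(-0.5 * (s - mu_p)**2 / var))
```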



For the uncertainty we have to integrate again, this time over all of $\mathbb{R}^d$.


$$
\begin{aligned}
vv[k] &= \int \int k(\mathbf{t},\mathbf{s})p(\mathbf{t})p(\mathbf{s}) \, d\mathbf{t} \, d\mathbf{s} \\
      &= \int \Big[\int k(\mathbf{t},\mathbf{s})p(\mathbf{t}) \, d\mathbf{t}\Big] \,p(\mathbf{s}) \, d\mathbf{s} \\
      &= \int v[k] \,p(\mathbf{s}) \, d\mathbf{s} \\
      &= \int \sigma_f^2 |\mathbf{L}| |\Sigma_p + \mathbf{L^2}|^{-1/2} \exp\Big[-\frac{1}{2} (\mathbf{s}-\boldsymbol{\mu}_p)^T (\Sigma_p +  \mathbf{L^2})^{-1} (\mathbf{s}-\boldsymbol{\mu}_p) \Big] \, \mathcal{N}(\mathbf{s} | \boldsymbol{\mu}_p, \Sigma_p) \, d\mathbf{s} \\
      &= \sigma_f^2 |\mathbf{L}| |\Sigma_p + \mathbf{L^2}|^{-1/2} \sqrt{(2\pi)^d |\Sigma_p + \mathbf{L^2}|} \int \mathcal{N}(\mathbf{s} | \boldsymbol{\mu}_p, \Sigma_p + \mathbf{L^2}) \, \mathcal{N}(\mathbf{s} | \boldsymbol{\mu}_p, \Sigma_p) \, d\mathbf{s}\\
\end{aligned}
$$

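Using the earlier A.7 result (the exponential factor equals 1 because both Gaussians share the mean $\boldsymbol{\mu}_p$):

$$
Z^{-1} = (2\pi)^{-d/2} \, \Big|[\Sigma_p + \mathbf{L^2}] + \Sigma_p \Big|^{-1/2}
$$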

We then have:
$$
\begin{aligned}
vv[k] &= \sigma_f^2 |\mathbf{L}| |\Sigma_p + \mathbf{L^2}|^{-1/2} \sqrt{(2\pi)^d |\Sigma_p + \mathbf{L^2}|} (2\pi)^{-d/2} \, \Big|[\Sigma_p + \mathbf{L^2}] + \Sigma_p \Big|^{-1/2} \\
      &= \sigma_f^2 |\mathbf{L}| \, |2\Sigma_p + \mathbf{L^2}|^{-1/2} \\
\end{aligned}
$$
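
The multivariate $vv[k]$ is a ratio of determinants and can be sketched in a few lines. This assumes NumPy and a symmetric positive definite `L2` standing for $\mathbf{L}^2$; `vv_multi` and the test matrices are hypothetical. Two consistency checks are set up: a diagonal case that should factor per dimension, and an isotropic kernel with a rotated covariance, where only the eigenvalues of $\Sigma_p$ should matter.

```python
import numpy as np

def vv_multi(sigma_f, L2, Sigma_p):
    """Double kernel integral for the anisotropic Gaussian kernel and N(mu_p, Sigma_p)."""
    # |L| |2 Sigma_p + L^2|^{-1/2} written as sqrt(|L^2| / |2 Sigma_p + L^2|)
    return sigma_f**2 * np.sqrt(np.linalg.det(L2) / np.linalg.det(2.0 * Sigma_p + L2))

sigma_f = 1.3
ells = np.array([0.5, 2.0])
sigs = np.array([1.0, 0.3])

# Diagonal case: product of the univariate results l / sqrt(l^2 + 2 sigma_p^2).
vv_diag = vv_multi(sigma_f, np.diag(ells**2), np.diag(sigs**2))
vv_diag_expected = sigma_f**2 * np.prod(ells / np.sqrt(ells**2 + 2.0 * sigs**2))

# Isotropic kernel with a rotated (correlated) covariance.
theta = 0.6
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
ell = 0.9
Sigma_rot = R @ np.diag(sigs**2) @ R.T
vv_rot = vv_multi(sigma_f, ell**2 * np.eye(2), Sigma_rot)
vv_rot_expected = sigma_f**2 * np.prod(ell / np.sqrt(ell**2 + 2.0 * sigs**2))
```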

---

So we have:

$$
\boxed{\mathbb{E}_{f|\mathbf{D}}[F(x)] = M(x) + \mathbf{k}_F^T(x) K^{-1} \Big(f(X)-m(X) \Big)}
$$

$$
\boxed{\text{Cov}_{f|\mathbf{D}}\left(F(x), F(x')\right) = K(x,x') - \mathbf{k}_F^T(x) K^{-1} \mathbf{k}_F(x')}
$$

$$
\boxed{v[k] = \sigma_f^2 |\mathbf{L}| |\Sigma_p + \mathbf{L^2}|^{-1/2} \exp\Big[-\frac{1}{2} (\mathbf{s}-\boldsymbol{\mu}_p)^T (\Sigma_p +  \mathbf{L^2})^{-1} (\mathbf{s}-\boldsymbol{\mu}_p) \Big]}
$$

$$
\boxed{vv[k] = \sigma_f^2 |\mathbf{L}| \, |2\Sigma_p + \mathbf{L^2}|^{-1/2}}
$$