# Joint Distributions and Variational Posteriors

---

## Variational Gaussian Process

### Joint distribution

$$
\begin{align*}
p(\mathbf{y}, \mathbf{f} \mid \mathbf{X})
&=
p(\mathbf{y} \mid \mathbf{f})
p(\mathbf{f} \mid \mathbf{X}) \\
p(\mathbf{f} \mid \mathbf{X})
&=
N(\mathbf{0}, k(\mathbf{X}, \mathbf{X}))
\end{align*}
$$
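
As a minimal sketch of this generative model, the snippet below draws one sample of $(\mathbf{f}, \mathbf{y})$. The notes do not fix a kernel or a likelihood, so the squared-exponential kernel `rbf_kernel`, the Gaussian likelihood, and the values of `noise_std` and `jitter` are all assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel k(A, B); an assumed choice of k."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)

def sample_joint(X, noise_std=0.1, jitter=1e-6, seed=0):
    """Draw (f, y) from p(y | f) p(f | X), assuming a Gaussian likelihood."""
    rng = np.random.default_rng(seed)
    n = len(X)
    K = rbf_kernel(X, X)
    # Cholesky factor of K (plus jitter for numerical stability).
    L = np.linalg.cholesky(K + jitter * np.eye(n))
    f = L @ rng.standard_normal(n)               # f ~ N(0, k(X, X))
    y = f + noise_std * rng.standard_normal(n)   # y_n ~ N(f_n, noise_std^2)
    return f, y

X = np.linspace(-3.0, 3.0, 20)[:, None]
f, y = sample_joint(X)
```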

### Variational posterior

$$
\begin{align*}
p(\mathbf{f} \mid \mathbf{X}, \mathbf{y})
&\approx
q(\mathbf{f}) \\
&=
N(\boldsymbol{\mu}^*, \boldsymbol{\Sigma}^*)
\end{align*}
$$
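
The parameters $\boldsymbol{\mu}^*$ and $\boldsymbol{\Sigma}^*$ are usually chosen by maximizing the evidence lower bound (ELBO), $E_{q}[\log p(\mathbf{y} \mid \mathbf{f})] - \mathrm{KL}(q(\mathbf{f}) \,\|\, p(\mathbf{f} \mid \mathbf{X}))$. The sketch below evaluates it under an assumed Gaussian likelihood with a hypothetical noise variance `noise_var` and an assumed squared-exponential kernel. In that special case the exact posterior is itself Gaussian, so plugging it in as $q$ should reproduce the log marginal likelihood.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    """Squared-exponential kernel; an assumed choice of k."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq_dist / lengthscale**2)

def elbo(y, K, mu, Sigma, noise_var):
    """ELBO for q(f) = N(mu, Sigma), prior N(0, K), Gaussian likelihood."""
    n = len(y)
    # E_q[log N(y | f, noise_var * I)]
    expected_loglik = (-0.5 * n * np.log(2.0 * np.pi * noise_var)
                       - 0.5 * (np.sum((y - mu) ** 2) + np.trace(Sigma)) / noise_var)
    # KL(N(mu, Sigma) || N(0, K))
    kl = 0.5 * (np.trace(np.linalg.solve(K, Sigma))
                + mu @ np.linalg.solve(K, mu) - n
                + np.linalg.slogdet(K)[1] - np.linalg.slogdet(Sigma)[1])
    return expected_loglik - kl

X = np.arange(5.0)[:, None]
y = np.sin(X[:, 0])
noise_var = 0.25
K = rbf_kernel(X, X)
Ky = K + noise_var * np.eye(5)

# Exact GP regression posterior, used here as the variational distribution.
mu_exact = K @ np.linalg.solve(Ky, y)
Sigma_exact = K - K @ np.linalg.solve(Ky, K)

log_marginal = (-0.5 * y @ np.linalg.solve(Ky, y)
                - 0.5 * np.linalg.slogdet(Ky)[1]
                - 0.5 * 5 * np.log(2.0 * np.pi))
elbo_exact = elbo(y, K, mu_exact, Sigma_exact, noise_var)   # tight bound
elbo_prior = elbo(y, K, np.zeros(5), K, noise_var)          # looser bound
```

For a non-Gaussian likelihood the expected log-likelihood term generally has no closed form, and the bound stays strictly below the log marginal likelihood.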

---

## Variational Sparse Gaussian Process

### Joint distribution

$$
\begin{align*}
p(\mathbf{y}, \mathbf{f}, \mathbf{u} \mid \mathbf{X})
&=
p(\mathbf{y} \mid \mathbf{f})
p(\mathbf{f} \mid \mathbf{X}, \mathbf{u})
p(\mathbf{u} \mid \mathbf{Z}) \\
p(\mathbf{f} \mid \mathbf{X}, \mathbf{u})
&=
N(
    k(\mathbf{X}, \mathbf{Z})
    k(\mathbf{Z}, \mathbf{Z})^{-1}
    \mathbf{u},
    k(\mathbf{X}, \mathbf{X}) -
    k(\mathbf{X}, \mathbf{Z})
    k(\mathbf{Z}, \mathbf{Z})^{-1}
    k(\mathbf{Z}, \mathbf{X})
) \\
p(\mathbf{u} \mid \mathbf{Z})
&=
N(\mathbf{0}, k(\mathbf{Z}, \mathbf{Z}))
\end{align*}
$$
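
The conditional $p(\mathbf{f} \mid \mathbf{X}, \mathbf{u})$ can be computed directly from the three kernel blocks. The sketch below assumes a squared-exponential kernel and hypothetical inducing inputs `Z` and values `u`. As a sanity check, evaluating the conditional at $\mathbf{X} = \mathbf{Z}$ should return mean $\mathbf{u}$ and zero covariance.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    """Squared-exponential kernel; an assumed choice of k."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq_dist / lengthscale**2)

def conditional_f_given_u(X, Z, u):
    """Mean and covariance of p(f | X, u)."""
    Kxx = rbf_kernel(X, X)
    Kxz = rbf_kernel(X, Z)
    Kzz = rbf_kernel(Z, Z)
    # A = k(X, Z) k(Z, Z)^{-1}, using the symmetry of k(Z, Z).
    A = np.linalg.solve(Kzz, Kxz.T).T
    mean = A @ u
    cov = Kxx - A @ Kxz.T
    return mean, cov

Z = np.array([[-2.0], [0.0], [2.0]])        # hypothetical inducing inputs
u = np.array([0.5, -1.0, 0.3])              # hypothetical inducing values
X = np.linspace(-3.0, 3.0, 7)[:, None]

mean, cov = conditional_f_given_u(X, Z, u)
mean_at_Z, cov_at_Z = conditional_f_given_u(Z, Z, u)
```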

### Variational posterior

$$
\begin{align*}
p(\mathbf{f}, \mathbf{u} \mid \mathbf{X}, \mathbf{y})
&\approx
p(\mathbf{f} \mid \mathbf{X}, \mathbf{u})
q(\mathbf{u}) \\
q(\mathbf{u})
&=
N(\boldsymbol{\mu}^*, \boldsymbol{\Sigma}^*)
\end{align*}
$$

(Question) Isn't it too bold to approximate the posterior by factorizing it as $p(\mathbf{f}, \mathbf{u} \mid \mathbf{X}, \mathbf{y}) \approx p(\mathbf{f} \mid \mathbf{X}, \mathbf{u}) q(\mathbf{u})$?
I do not see why $p(\mathbf{f} \mid \mathbf{X}, \mathbf{y}, \mathbf{u}) = p(\mathbf{f} \mid \mathbf{X}, \mathbf{u})$ should hold. Moreover, the expression for $p(\mathbf{f} \mid \mathbf{X}, \mathbf{u})$ was derived under the assumption $p(\mathbf{u} \mid \mathbf{Z}) = N(\mathbf{0}, k(\mathbf{Z}, \mathbf{Z}))$, so can it still be used once the distribution of $\mathbf{u}$ has been updated to $q(\mathbf{u}) = N(\boldsymbol{\mu}^*, \boldsymbol{\Sigma}^*)$?
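
One possible way to look at the second half of this question, offered as a sketch rather than a settled answer: a conditional such as $p(\mathbf{f} \mid \mathbf{X}, \mathbf{u})$ is a function of the value of $\mathbf{u}$ and is determined by the joint prior covariance of $(\mathbf{f}, \mathbf{u})$; it does not depend on which marginal $\mathbf{u}$ is later drawn from. It can therefore be integrated against any Gaussian $q(\mathbf{u})$, giving $q(\mathbf{f}) = N(\mathbf{A}\boldsymbol{\mu}^*, k(\mathbf{X}, \mathbf{X}) + \mathbf{A}(\boldsymbol{\Sigma}^* - k(\mathbf{Z}, \mathbf{Z}))\mathbf{A}^\top)$ with $\mathbf{A} = k(\mathbf{X}, \mathbf{Z}) k(\mathbf{Z}, \mathbf{Z})^{-1}$. The first half of the question is a fair point: replacing $p(\mathbf{f} \mid \mathbf{X}, \mathbf{y}, \mathbf{u})$ by $p(\mathbf{f} \mid \mathbf{X}, \mathbf{u})$ is not an identity in general, and that replacement is where the approximation enters. The code assumes a squared-exponential kernel and hypothetical values for `Z`, `mu_star`, and `Sigma_star`.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    """Squared-exponential kernel; an assumed choice of k."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq_dist / lengthscale**2)

def marginal_q_f(X, Z, mu_star, Sigma_star):
    """q(f) obtained by integrating p(f | X, u) against q(u) = N(mu_star, Sigma_star)."""
    Kxx = rbf_kernel(X, X)
    Kxz = rbf_kernel(X, Z)
    Kzz = rbf_kernel(Z, Z)
    A = np.linalg.solve(Kzz, Kxz.T).T            # k(X, Z) k(Z, Z)^{-1}
    mean = A @ mu_star
    cov = Kxx + A @ (Sigma_star - Kzz) @ A.T
    return mean, cov

Z = np.array([[-2.0], [0.0], [2.0]])             # hypothetical inducing inputs
X = np.linspace(-3.0, 3.0, 7)[:, None]
Kxx, Kzz = rbf_kernel(X, X), rbf_kernel(Z, Z)

# Choosing q(u) equal to the prior p(u | Z) recovers the prior over f.
prior_mean, prior_cov = marginal_q_f(X, Z, np.zeros(3), Kzz)

# A tighter q(u) shrinks the marginal variances of f.
post_mean, post_cov = marginal_q_f(X, Z, np.array([0.5, -1.0, 0.3]), 0.1 * np.eye(3))
```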

---

## Stochastic Variational Sparse Gaussian Process

### Joint distribution

$$
\begin{align*}
p(\mathbf{y}, \mathbf{f}, \mathbf{u} \mid \mathbf{X})
&=
p(\mathbf{y} \mid \mathbf{f})
p(\mathbf{f} \mid \mathbf{X}, \mathbf{u})
p(\mathbf{u} \mid \mathbf{Z}) \\
&\approx
\left\{\prod_{n=1}^{N}{
    p(y_n \mid f_n)
    p(f_n \mid \mathbf{x}_n, \mathbf{u})
}\right\}
p(\mathbf{u} \mid \mathbf{Z}) \\
&\approx
\left\{\prod_{m=1}^{M}{
    p(y_m \mid f_m)
    p(f_m \mid \mathbf{x}_m, \mathbf{u})
}\right\}^{\frac{N}{M}}
p(\mathbf{u} \mid \mathbf{Z}) \\
p(f_n \mid \mathbf{x}_n, \mathbf{u})
&=
N(
    k(\mathbf{x}_n, \mathbf{Z})
    k(\mathbf{Z}, \mathbf{Z})^{-1}
    \mathbf{u},
    k(\mathbf{x}_n, \mathbf{x}_n) -
    k(\mathbf{x}_n, \mathbf{Z})
    k(\mathbf{Z}, \mathbf{Z})^{-1}
    k(\mathbf{Z}, \mathbf{x}_n)
) \\
p(\mathbf{u} \mid \mathbf{Z})
&=
N(\mathbf{0}, k(\mathbf{Z}, \mathbf{Z}))
\end{align*}
$$
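
In log space the exponent $\frac{N}{M}$ becomes a multiplicative factor on the minibatch sum. The sketch below illustrates only that rescaling, using per-point Gaussian log-likelihood terms as a stand-in for the per-point factors (the Gaussian likelihood, `noise_var`, and the synthetic data are assumptions). Averaging the rescaled estimate over a partition of the data into disjoint minibatches recovers the full-data sum, which is why the estimate is unbiased under uniform sampling.

```python
import numpy as np

def gaussian_loglik(y, f, noise_var):
    """Per-point log N(y_n | f_n, noise_var); an assumed likelihood."""
    return -0.5 * np.log(2.0 * np.pi * noise_var) - 0.5 * (y - f) ** 2 / noise_var

rng = np.random.default_rng(0)
N, M = 12, 4                                   # data size and minibatch size
f = rng.standard_normal(N)                     # synthetic latent values
y = f + 0.1 * rng.standard_normal(N)           # synthetic observations

terms = gaussian_loglik(y, f, noise_var=0.01)  # one term per data point
full_sum = terms.sum()                         # log of the full product over n

# Split the data into N / M disjoint minibatches and rescale each by N / M.
batches = rng.permutation(N).reshape(N // M, M)
estimates = np.array([(N / M) * terms[idx].sum() for idx in batches])
```

Each entry of `estimates` is a noisy version of `full_sum`; only their average matches it.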

### Variational posterior

$$
\begin{align*}
p(\mathbf{f}, \mathbf{u} \mid \mathbf{X}, \mathbf{y})
&\approx
p(\mathbf{f} \mid \mathbf{X}, \mathbf{u})
q(\mathbf{u}) \\
q(\mathbf{u})
&=
N(\boldsymbol{\mu}^*, \boldsymbol{\Sigma}^*)
\end{align*}
$$

---

## Heteroscedastic Gaussian Process Regression

### Joint distribution

$$
\begin{align*}
p(\mathbf{y}, \mathbf{f}, \mathbf{r} \mid \mathbf{X})
&=
p(\mathbf{y} \mid \mathbf{f}, \mathbf{r})
p(\mathbf{f} \mid \mathbf{X})
p(\mathbf{r} \mid \mathbf{X}) \\
p(\mathbf{y} \mid \mathbf{f}, \mathbf{r})
&= N(\mathbf{f}, \mathrm{diag}(\exp(\mathbf{r}))) \\
p(\mathbf{f} \mid \mathbf{X})
&=
N(\mathbf{0}, k_f(\mathbf{X}, \mathbf{X})) \\
p(\mathbf{r} \mid \mathbf{X})
&=
N(\boldsymbol{\mu}_r, k_r(\mathbf{X}, \mathbf{X}))
\end{align*}
$$
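
A minimal sketch of sampling from this model: one GP draw for the latent function $\mathbf{f}$, a second GP draw for the log noise variance $\mathbf{r}$, and observations whose noise variance is $\exp(r_n)$ at each input. The squared-exponential form of both $k_f$ and $k_r$, the lengthscales, the constant prior mean `mu_r`, and the `jitter` value are assumptions chosen for illustration.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel; assumed form for both k_f and k_r."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)

def sample_gp(mean, K, rng, jitter=1e-6):
    """Draw one sample from N(mean, K) via a Cholesky factor."""
    L = np.linalg.cholesky(K + jitter * np.eye(len(K)))
    return mean + L @ rng.standard_normal(len(K))

rng = np.random.default_rng(0)
n = 30
X = np.linspace(-3.0, 3.0, n)[:, None]
mu_r = np.full(n, -2.0)   # hypothetical constant prior mean of the log-variance

f = sample_gp(np.zeros(n), rbf_kernel(X, X, lengthscale=1.0), rng)  # f ~ N(0, k_f)
r = sample_gp(mu_r, rbf_kernel(X, X, lengthscale=2.0), rng)         # r ~ N(mu_r, k_r)

noise_var = np.exp(r)                                   # input-dependent noise variance
y = f + np.sqrt(noise_var) * rng.standard_normal(n)     # y_n ~ N(f_n, exp(r_n))
```

Working with the log-variance keeps `noise_var` positive without any constraint on $\mathbf{r}$.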

### Variational posterior

$$
\begin{align*}
p(\mathbf{f} \mid \mathbf{X}, \mathbf{y})
&\approx
q(\mathbf{f}) \\
&=
N(\boldsymbol{\mu}^*_f, \boldsymbol{\Sigma}^*_f) \\
p(\mathbf{r} \mid \mathbf{X}, \mathbf{y})
&\approx
q(\mathbf{r}) \\
&=
N(\boldsymbol{\mu}^*_r, \boldsymbol{\Sigma}^*_r)
\end{align*}
$$
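
If $q(\mathbf{f})$ and $q(\mathbf{r})$ are treated as independent (a mean-field assumption that the equations above suggest but do not state), the expected log-likelihood of each observation has a closed form that uses only the marginal means and variances: $E[\log N(y_n \mid f_n, \exp(r_n))] = -\tfrac{1}{2}\log 2\pi - \tfrac{1}{2}\mu_{r,n} - \tfrac{1}{2}\left((y_n - \mu_{f,n})^2 + \sigma_{f,n}^2\right)\exp\left(-\mu_{r,n} + \tfrac{1}{2}\sigma_{r,n}^2\right)$. The sketch below implements this formula; the function name and the example values are hypothetical.

```python
import numpy as np

def expected_loglik(y, mu_f, var_f, mu_r, var_r):
    """E_{q(f_n) q(r_n)}[log N(y_n | f_n, exp(r_n))], elementwise over n.

    Assumes f_n and r_n are independent Gaussians under q.
    """
    expected_sq_err = (y - mu_f) ** 2 + var_f          # E[(y - f)^2]
    expected_precision = np.exp(-mu_r + 0.5 * var_r)   # E[exp(-r)], a log-normal mean
    return (-0.5 * np.log(2.0 * np.pi)
            - 0.5 * mu_r
            - 0.5 * expected_sq_err * expected_precision)

y = np.array([0.3, -1.2])
mu_f = np.array([0.1, -1.0])
mu_r = np.array([-1.0, 0.5])

# With zero variances, q collapses to a point and the plain log-density is recovered.
point = expected_loglik(y, mu_f, 0.0, mu_r, 0.0)
direct = -0.5 * np.log(2.0 * np.pi * np.exp(mu_r)) - 0.5 * (y - mu_f) ** 2 / np.exp(mu_r)

# Uncertainty in f or r lowers the expected log-likelihood.
spread = expected_loglik(y, mu_f, 0.2, mu_r, 0.3)
```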