# Gaussian Process Regression

The previous two regression methods define a set of parameters and search for the best-fitting combination. In some cases, however, the relationship between the dependent and independent variables is hard to specify in advance, and a parameter-based regression approach may fail to produce an effective prediction model.

By Bayes' rule, we can encode our uncertainty about the mapping as a prior distribution over mappings and then compute the posterior once data are observed. A Gaussian process can be used to represent this prior distribution without fixing a parametric form for the mapping.

<img src="img/gaussian_process_example.png" width="600">

**Definition**: A Gaussian process is a collection of random variables, any finite number of which have a joint Gaussian distribution, which means each random variable is distributed normally and their joint distribution is also Gaussian.
  
A Gaussian process is specified by two components: a mean function $\mu(x)$ and a covariance function $cov(x,x')$.

$$\mu(x) = \Bbb{E}[f(x)]$$
$$cov(x,x') = \Bbb{E}[(f(x)-\mu(x))(f(x')-\mu(x'))]$$

The Gaussian process can be expressed as,
$$f(x)\sim \mathcal{GP}(\mu(x),cov(x,x'))$$

Gaussian distributions have the nice algebraic property of being closed under conditioning and marginalization. Being closed under conditioning and marginalization means that the resulting distributions from these operations are also Gaussian, which makes many problems in statistics and machine learning tractable.

If $X$ and $Y$ are subsets of the random variables in a Gaussian process, their joint distribution can be written as

$$p_{XY} = \left[\begin{array}{c}X\\Y \end{array}\right]\sim \mathcal{N}(\mu, \Sigma)=\mathcal{N}\left(
    \left[\begin{array}{c}\mu_X\\ \mu_Y \end{array}\right] ,
    \left[\begin{array}{cc}\Sigma_{XX} & \Sigma_{XY}\\ \Sigma_{YX} & \Sigma_{YY} \end{array}\right]
\right)$$

**Marginalization**  
The marginal distributions of $X$ and $Y$ are obtained by reading off the corresponding blocks of $\mu$ and $\Sigma$:
$$X\sim \mathcal{N}(\mu_X, \Sigma_{XX})$$
$$Y\sim \mathcal{N}(\mu_Y, \Sigma_{YY})$$
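As a minimal sketch of marginalization, the snippet below picks out the blocks of a joint Gaussian and compares them with sample statistics. The numbers in `mu` and `Sigma`, the split into `X` (first two variables) and `Y` (last variable), and the helper name `marginal` are illustrative assumptions, not values from the figures in this document.

```python
import numpy as np

def marginal(mu, Sigma, idx):
    """Return the mean and covariance of the variables at positions `idx`."""
    idx = np.asarray(idx)
    return mu[idx], Sigma[np.ix_(idx, idx)]

# Hypothetical joint Gaussian over three variables: X = first two, Y = last one.
mu = np.array([0.0, 1.0, -1.0])
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])

mu_X, Sigma_XX = marginal(mu, Sigma, [0, 1])
mu_Y, Sigma_YY = marginal(mu, Sigma, [2])

# Empirical check: draw from the joint, keep only the X columns,
# and compare their sample moments with the marginal parameters.
rng = np.random.default_rng(0)
samples = rng.multivariate_normal(mu, Sigma, size=100_000)
emp_mean_X = samples[:, :2].mean(axis=0)
emp_cov_X = np.cov(samples[:, :2], rowvar=False)
```

Marginalizing a Gaussian needs no integration in practice: dropping the rows and columns of the variables that are not of interest is enough, which is what `marginal` does.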

**Conditioning**  
Conditioning gives the distribution of one variable given an observed value of the other. Using the same example,
$$X|Y \sim \mathcal{N}(\mu_X+\Sigma_{XY}\Sigma_{YY}^{-1}(Y-\mu_Y), \Sigma_{XX}-\Sigma_{XY}\Sigma_{YY}^{-1}\Sigma_{YX})$$
$$Y|X \sim \mathcal{N}(\mu_Y+\Sigma_{YX}\Sigma_{XX}^{-1}(X-\mu_X), \Sigma_{YY}-\Sigma_{YX}\Sigma_{XX}^{-1}\Sigma_{XY})$$
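The conditioning formula for $X|Y$ can be sketched in NumPy as follows. The function name `condition`, its argument names, and the two-variable example values are assumptions made for illustration; the code uses a linear solve in place of an explicit inverse, which is a common numerical choice rather than something the formula requires.

```python
import numpy as np

def condition(mu_X, mu_Y, S_XX, S_XY, S_YY, y):
    """Mean and covariance of X | Y = y for a jointly Gaussian (X, Y)."""
    # gain = S_XY @ inv(S_YY), computed with a solve (S_YY is symmetric).
    gain = np.linalg.solve(S_YY, S_XY.T).T
    mean = mu_X + gain @ (y - mu_Y)
    cov = S_XX - gain @ S_XY.T  # S_YX is the transpose of S_XY
    return mean, cov

# Hypothetical example: two scalar variables with unit variance and correlation 0.8.
mu_X, mu_Y = np.array([0.0]), np.array([0.0])
S_XX = np.array([[1.0]])
S_XY = np.array([[0.8]])
S_YY = np.array([[1.0]])

cond_mean, cond_cov = condition(mu_X, mu_Y, S_XX, S_XY, S_YY, y=np.array([1.0]))
print(cond_mean, cond_cov)
```

In this example, observing $Y=1$ pulls the mean of $X$ from 0 to 0.8 and shrinks its variance from 1 to 0.36: the stronger the correlation, the more the observation tells us about $X$.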

<img src="img/gaussian_process_feature.png" width="600">

#### **Bayesian Inference**
Bayesian inference updates a statistical hypothesis as new information becomes available. The posterior probability $p(X|Y)$ can be derived from the prior probability $p(X)$ and the likelihood function $p(Y|X)$ (which can be evaluated from the given data $X$ and $Y$) by Bayes' rule,
$$p(X|Y) = \frac{p(Y|X)p(X)}{p(Y)} $$

$p(Y)$ is the marginal likelihood, 
$$p(Y) = \int p(Y|X)p(X)dX $$
 
We usually take the log marginal likelihood $\log p(Y)$ as the objective function during learning.
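To make the marginal likelihood integral concrete, here is a small sketch for a one-dimensional case where everything is Gaussian. The specific model (prior $X\sim\mathcal{N}(0,1)$, likelihood $Y|X\sim\mathcal{N}(X, 0.25)$, observation 0.7) and all variable names are assumptions chosen for illustration. The integral is approximated by a simple sum on a grid and compared with the closed form $Y\sim\mathcal{N}(0, 1.25)$ that holds for this particular model.

```python
import numpy as np

def normal_pdf(z, mean, var):
    """Density of a univariate normal distribution."""
    return np.exp(-0.5 * (z - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

prior_var = 1.0   # assumed prior variance of X
noise_var = 0.25  # assumed likelihood (noise) variance
y_obs = 0.7       # assumed observation

# Grid over X wide enough that the integrand is negligible at the ends.
x = np.linspace(-10.0, 10.0, 20001)
dx = x[1] - x[0]

prior = normal_pdf(x, 0.0, prior_var)        # p(X)
likelihood = normal_pdf(y_obs, x, noise_var)  # p(Y = y_obs | X)

# Marginal likelihood p(Y) = integral of p(Y|X) p(X) dX, as a Riemann sum.
p_y = np.sum(likelihood * prior) * dx
log_p_y = np.log(p_y)

# Posterior on the grid by Bayes' rule.
posterior = likelihood * prior / p_y
posterior_mean = np.sum(x * posterior) * dx

# Closed form for this Gaussian model, used only as a check.
p_y_exact = normal_pdf(y_obs, 0.0, prior_var + noise_var)
```

Dividing by `p_y` is what makes `posterior` integrate to one, so the marginal likelihood acts as the normalizing constant of the posterior.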


#### **Gaussian process with Bayesian inference**

A two-dimensional Gaussian process example is shown here.
<img src="img/gaussian_process_distribution_example.png" width="600">

The next problem is how to set up the mean $\mu$ and covariance $\Sigma$. In a Gaussian process, the covariance matrix is determined by a covariance function $k$, called the *kernel*.  
<img src="img/gaussian_process_kernel.png" width="600">
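As a rough sketch of how a kernel produces the covariance matrix, the code below evaluates a kernel on a grid of inputs and draws functions from the resulting prior. The squared-exponential (RBF) kernel, the zero mean, the parameters `length_scale` and `variance`, and the small jitter added to the diagonal are all assumptions for this example; the text above does not commit to a particular kernel.

```python
import numpy as np

def rbf_kernel(xa, xb, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel: variance * exp(-(x - x')^2 / (2 * length_scale^2))."""
    sq_dist = (xa[:, None] - xb[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / length_scale ** 2)

# Inputs at which the process is evaluated.
x = np.linspace(-5.0, 5.0, 50)

# Assumed zero mean function; the covariance matrix comes from the kernel.
mu = np.zeros_like(x)
K = rbf_kernel(x, x)

# A small diagonal jitter keeps the matrix numerically positive definite.
K_jitter = K + 1e-8 * np.eye(len(x))

# Each row is one function drawn from the prior, evaluated at the 50 inputs.
rng = np.random.default_rng(0)
prior_samples = rng.multivariate_normal(mu, K_jitter, size=3)
```

With this kernel, nearby inputs get covariance close to `variance` and distant inputs get covariance close to zero, so a smaller `length_scale` yields more rapidly varying sample functions.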


#### Reference
https://github.com/aidanscannell/probabilistic-modelling/blob/master/notebooks/gaussian-process-regression.ipynb
https://distill.pub/2019/visual-exploration-gaussian-processes/
