## Introduction to Gaussian Processes

$
% DEFINITIONS
\newcommand{\bff}{\mathbf{f}}
\newcommand{\bm}{\mathbf{m}}
\newcommand{\bk}{\mathbf{k}}
\newcommand{\bx}{\mathbf{x}}
\newcommand{\by}{\mathbf{y}}
\newcommand{\bz}{\mathbf{z}}
\newcommand{\bA}{\mathbf{A}}
\newcommand{\bB}{\mathbf{B}}
\newcommand{\bC}{\mathbf{C}}
\newcommand{\bD}{\mathbf{D}}
\newcommand{\bI}{\mathbf{I}}
\newcommand{\bK}{\mathbf{K}}
\newcommand{\bL}{\mathbf{L}}
\newcommand{\bM}{\mathbf{M}}
\newcommand{\bX}{\mathbf{X}}
\newcommand{\bLambda}{\boldsymbol{\Lambda}}
\newcommand{\bSigma}{\boldsymbol{\Sigma}}
\newcommand{\bmu}{\boldsymbol{\mu}}
\newcommand{\calN}{\mathcal{N}}
\newcommand{\R}{\mathbb{R}}
\newcommand{\E}{\mathbb{E}}
\newcommand{\C}{\mathbb{C}}
\newcommand{\Rd}{\R^d}
\newcommand{\Rdd}{\R^{d\times d}}
\newcommand{\bzero}{\mathbf{0}}
\newcommand{\GP}{\mbox{GP}}
% END OF DEFINITIONS
$
In many engineering problems, we have to deal with functions that are unknown.
For example, in oil reservoir modeling, the permeability tensor $\bK(\bx)$ and the porosity $\phi(\bx)$ of
the ground are generally unknown quantities.
Therefore, we would like to treat them as if they were random.
That is, we need to talk about probabilities on function spaces.
This is achieved via the theory of *random fields*.
However, instead of developing the generic mathematical theory of random fields,
we concentrate on a special class of random fields, the *Gaussian random fields*,
also known as *Gaussian processes*.

A Gaussian process (GP) is a generalization of a multivariate Gaussian distribution to
*infinite* dimensions.
It essentially defines a probability measure on a function space.
When we say that $f(\cdot)$ is a GP, we mean that it is a random variable whose values are
functions.
Mathematically, we write:
\begin{equation}
f(\cdot) | m(\cdot), k(\cdot, \cdot) \sim \GP\left(f(\cdot) | m(\cdot), k(\cdot, \cdot) \right),
\end{equation}
where 
$m:\Rd\rightarrow\R$ is the *mean function* and
$k:\Rd\times\Rd\rightarrow\R$ is the *covariance function*.
So, compared to a multivariate normal, we have:

+ A random function $f(\cdot)$ instead of a random vector $\bx$.
+ A mean function $m(\cdot)$ instead of a mean vector $\bmu$.
+ A covariance function $k(\cdot,\cdot)$ instead of a covariance matrix $\bSigma$.

But what does this definition actually mean? It gets its meaning from the multivariate Gaussian distribution. Here is how:

+ Let $\bx_1,\dots,\bx_n$ be $n$ points in $\R^d$. To keep the notation compact, let us arrange these
points as the rows of an $n\times d$ matrix:
\begin{equation}
\bX =
\left(
\begin{array}{c}
\bx_1\\
\vdots\\
\bx_n
\end{array}
\right).
\end{equation}
+ Let $\bff\in\R^n$ be the vector of outputs of $f(\cdot)$ at each of the rows of $\bX$, i.e.,
\begin{equation}
\bff =
\left(
\begin{array}{c}
f(\bx_1)\\
\vdots\\
f(\bx_n)
\end{array}
\right).
\end{equation}
+ The fact that $f(\cdot)$ is a GP with mean function $m(\cdot)$ and covariance function $k(\cdot,\cdot)$ *means* that, for any choice of inputs $\bX$,
the vector of outputs $\bff$ follows the multivariate normal distribution:
\begin{equation}
\bff | \bX, m(\cdot), k(\cdot, \cdot) \sim \calN\left(\bff | \bm(\bX), \bk(\bX, \bX) \right),
\end{equation}
with mean vector:
$$
\bm(\bX) =
\left(
\begin{array}{c}
m(\bx_1)\\
\vdots\\
m(\bx_n)
\end{array}
\right),
$$
and covariance matrix:
$$
\bk(\bX, \bX) = \left(
\begin{array}{ccc}
k(\bx_1,\bx_1) & \dots & k(\bx_1, \bx_n)\\
\vdots & \ddots & \vdots\\
k(\bx_n, \bx_1) & \dots & k(\bx_n, \bx_n)
\end{array}
\right).
$$
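Because $\bff$ is just a multivariate normal vector, we can draw samples of a GP at any finite set of inputs. The following is a minimal sketch in Python/NumPy that forms $\bm(\bX)$ and $\bk(\bX,\bX)$ on a grid and samples $\bff$ through a Cholesky factorization. The zero mean function, the squared exponential covariance function $k(\bx,\bx') = s^2\exp\left(-\|\bx-\bx'\|^2/(2\ell^2)\right)$, the jitter value, and the names `mean_fn`, `cov_fn`, `ell`, and `s` are illustrative assumptions, not choices made in the text.

```python
import numpy as np

def mean_fn(X):
    """Assumed zero mean function, m(x) = 0, evaluated at every row of X."""
    return np.zeros(X.shape[0])

def cov_fn(X1, X2, ell=0.3, s=1.0):
    """Assumed squared exponential covariance function,
    k(x, x') = s^2 exp(-||x - x'||^2 / (2 ell^2)).

    X1 is n1 x d and X2 is n2 x d; returns the n1 x n2 matrix of k values.
    """
    sq_dist = np.sum((X1[:, None, :] - X2[None, :, :]) ** 2, axis=-1)
    return s ** 2 * np.exp(-0.5 * sq_dist / ell ** 2)

# n = 50 inputs in d = 1 dimension, stored as the rows of an n x d matrix X.
X = np.linspace(0.0, 1.0, 50)[:, None]
n = X.shape[0]

m_X = mean_fn(X)     # mean vector m(X), shape (n,)
K_XX = cov_fn(X, X)  # covariance matrix k(X, X), shape (n, n)

# If z ~ N(0, I) and L L^T = k(X, X), then m(X) + L z ~ N(m(X), k(X, X)).
# A small diagonal "jitter" (an assumed value, 1e-8) keeps the Cholesky
# factorization stable when k(X, X) is numerically close to singular.
L = np.linalg.cholesky(K_XX + 1e-8 * np.eye(n))

rng = np.random.default_rng(0)
Z = rng.standard_normal((n, 3))  # 3 independent standard normal vectors
F = m_X[:, None] + L @ Z         # each column is one sample of f, shape (n, 3)
```

Plotting each column of `F` against `X` shows three random functions drawn from the GP, evaluated on the grid. The Cholesky route is used here because it makes the "$\bm + \bL\mathbf{z}$" construction explicit; the jitter slightly inflates the variance, which is usually negligible.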


### 1. Priors on function spaces

### 2. Mean function


## Interpretation of the mean
What is the meaning of $m(\cdot)$?
Well, it is quite easy to grasp.
For any point $\bx\in\R^d$, $m(\bx)$ should give us the value we believe is most probable for
$f(\bx)$ (for a Gaussian, the most probable value and the expected value coincide).
Mathematically, $m(\bx)$ is nothing more than the expected value of the random variable $f(\bx)$.
That is:
\begin{equation}
m(\bx) = \mathbb{E}[f(\bx)].
\end{equation}

In practical applications, we usually take the mean to be:

+ zero, i.e.,
$$
m(\bx) = 0.
$$

+ a constant, i.e.,
$$
m(\bx) = c,
$$
where $c$ is a parameter.

+ linear, i.e.,
$$
m(\bx) = c_0 + \sum_{i=1}^dc_ix_i,
$$
where $c_i, i=0,\dots,d$ are parameters.

+ using a set of $p$ basis functions (generalized linear model), i.e.,
$$
m(\bx) = \sum_{i=1}^pc_i\phi_i(\bx),
$$
where the $c_i$ are parameters and the $\phi_i(\cdot)$ are basis functions.

+ and endless other possibilities.
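The choices above can be written as functions that act on all rows of $\bX$ at once and return the mean vector $\bm(\bX)$. This is a minimal NumPy sketch; the function names, the parameter values, and the particular basis functions in the last case ($1$, $x$, $x^2$ in one dimension) are hypothetical illustrations.

```python
import numpy as np

def zero_mean(X):
    """m(x) = 0 for every row x of X."""
    return np.zeros(X.shape[0])

def constant_mean(X, c):
    """m(x) = c for every row x of X."""
    return np.full(X.shape[0], c)

def linear_mean(X, c0, c):
    """m(x) = c_0 + sum_i c_i x_i, where c is a length-d vector."""
    return c0 + X @ c

def glm_mean(X, c, basis):
    """m(x) = sum_i c_i phi_i(x), where basis is a list of functions phi_i.

    Each phi_i maps the n x d input matrix to a length-n vector.
    """
    Phi = np.column_stack([phi(X) for phi in basis])  # n x (number of basis functions)
    return Phi @ c

X = np.array([[0.0], [1.0], [2.0]])  # n = 3 points in d = 1 dimension
basis = [lambda Z: np.ones(Z.shape[0]),  # phi_1(x) = 1   (hypothetical choice)
         lambda Z: Z[:, 0],              # phi_2(x) = x
         lambda Z: Z[:, 0] ** 2]         # phi_3(x) = x^2
print(linear_mean(X, 1.0, np.array([2.0])))           # [1. 3. 5.]
print(glm_mean(X, np.array([1.0, 0.0, 1.0]), basis))  # [1. 2. 5.]
```

Note that the constant and linear means are special cases of the generalized linear model, with basis functions $1$ and $1, x_1, \dots, x_d$, respectively.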

## Interpretation of the covariance function
What is the meaning of $k(\cdot, \cdot)$?
This concept is considerably more challenging than the mean.
Let's try to break it down:

+ Let $\bx\in\Rd$. Then $k(\bx, \bx)$ is the variance of the random variable $f(\bx)$, i.e.,
$$
k(\bx, \bx) = \mathbb{V}[f(\bx)] = \mathbb{E}\left[\left(f(\bx) - m(\bx) \right)^2 \right].
$$
Since $f(\bx)$ is normally distributed, this means that we believe there is about $95\%$ probability that the value of
the random variable $f(\bx)$ falls within the interval:
$$
\left(m(\bx) - 2\sqrt{k(\bx, \bx)}, m(\bx) + 2\sqrt{k(\bx,\bx)}\right).
$$

+ Let $\bx,\bx'\in\Rd$. Then $k(\bx, \bx')$ tells us how the random variables $f(\bx)$ and
$f(\bx')$ are correlated. In particular, $k(\bx,\bx')$ is equal to the covariance
of the random variables $f(\bx)$ and $f(\bx')$, i.e.,
$$
k(\bx, \bx') = \mathbb{C}[f(\bx), f(\bx')]
= \mathbb{E}\left[
\left(f(\bx) - m(\bx)\right)
\left(f(\bx') - m(\bx')\right)
\right].
$$
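These interpretations can be checked numerically: draw many independent realizations of the pair $(f(\bx), f(\bx'))$ from the bivariate normal implied by the GP, compare the sample mean, variance, and covariance with $m$ and $k$, and count how often $f(\bx)$ lands in the two-standard-deviation interval (for a Gaussian, the exact probability is about $0.954$). The constant mean, the squared exponential covariance function, its parameters, and the two points are assumptions made only for this illustration.

```python
import numpy as np

def m(x):
    """Assumed constant mean function, m(x) = 1."""
    return 1.0

def k(x, xp, ell=0.5, s=2.0):
    """Assumed squared exponential covariance function."""
    return s ** 2 * np.exp(-0.5 * np.sum((x - xp) ** 2) / ell ** 2)

x, xp = np.array([0.2]), np.array([0.5])

# Finite-dimensional marginal of the GP at the two points x and x'.
mean = np.array([m(x), m(xp)])
cov = np.array([[k(x, x), k(x, xp)],
                [k(xp, x), k(xp, xp)]])

rng = np.random.default_rng(1)
samples = rng.multivariate_normal(mean, cov, size=200_000)  # rows: (f(x), f(x'))
fx, fxp = samples[:, 0], samples[:, 1]

print(fx.mean(), m(x))                                 # E[f(x)] vs m(x)
print(fx.var(), k(x, x))                               # V[f(x)] vs k(x, x)
print(np.mean((fx - m(x)) * (fxp - m(xp))), k(x, xp))  # C[f(x), f(x')] vs k(x, x')

# Fraction of samples inside (m(x) - 2 sqrt(k(x,x)), m(x) + 2 sqrt(k(x,x))).
half_width = 2.0 * np.sqrt(k(x, x))
inside = np.abs(fx - m(x)) < half_width
print(inside.mean())  # close to 0.954
```

The Monte Carlo estimates agree with $m$ and $k$ only up to sampling error, which shrinks like one over the square root of the number of samples.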

## Properties of the covariance function

+ There is one property of the covariance function that we can note right away.
Namely, for any $\bx\in\Rd$, $k(\bx, \bx) \ge 0$, with strict inequality unless $f(\bx)$ is known exactly.
This is easily understood by interpreting $k(\bx, \bx)$ as the variance
of the random variable $f(\bx)$.

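+ For many commonly used covariance functions, such as the squared exponential, $k(\bx, \bx')$ becomes smaller as the distance between $\bx$ and $\bx'$ grows. This is a modeling choice (nearby inputs have strongly correlated outputs), not a requirement; the linear and periodic covariance functions, for instance, do not behave this way.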

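+ For any choice of points $\bX\in\R^{n\times d}$, the covariance matrix $\bk(\bX, \bX)$ has
to be symmetric positive semi-definite, so that the vector of outputs $\bff$ indeed follows a (possibly degenerate) multivariate
normal distribution. In practice, we usually work with covariance functions for which it is positive-definite, so that the distribution has a density.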
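These properties can be inspected numerically for a specific covariance function. The sketch below assumes the one-dimensional squared exponential covariance function with hypothetical parameters `ell` and `s` (a common choice, not one prescribed by the text). It checks that $k(x,x)\ge 0$, that $k(0, r)$ decreases with $r$ for this particular kernel, and that $\bk(\bX,\bX)$ is symmetric with no eigenvalue below zero beyond round-off. It also illustrates that the matrix can be numerically close to singular, which is why a small diagonal "jitter" (here an assumed $10^{-8}$) is commonly added before a Cholesky factorization.

```python
import numpy as np

def k(x, xp, ell=0.2, s=1.0):
    """Assumed 1-D squared exponential covariance function (vectorized)."""
    return s ** 2 * np.exp(-0.5 * (x - xp) ** 2 / ell ** 2)

# Property 1: k(x, x) is a variance, so it cannot be negative.
xs = np.linspace(-3.0, 3.0, 7)
print(k(xs, xs))  # all equal to s^2 = 1 for this kernel

# Property 2 (true for this kernel, not for every kernel):
# k(0, r) decreases as the distance r grows.
r = np.array([0.0, 0.1, 0.2, 0.5, 1.0])
print(k(0.0, r))

# Property 3: k(X, X) is symmetric positive semi-definite for any inputs.
X = np.sort(np.random.default_rng(2).uniform(0.0, 1.0, size=30))
K = k(X[:, None], X[None, :])        # 30 x 30 covariance matrix
eigvals = np.linalg.eigvalsh(K)      # real eigenvalues of a symmetric matrix
print(np.allclose(K, K.T), eigvals.min())  # min eigenvalue >= 0 up to round-off

# The smallest eigenvalues are tiny, so a Cholesky factorization of K itself
# may fail numerically; the (assumed) jitter makes it robust.
L = np.linalg.cholesky(K + 1e-8 * np.eye(len(X)))
```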


### Sampling from a Gaussian Process

**Write about the algorithm to sample from a Gaussian Process here**

### Karhunen-Loeve Expansion

### Numerical Approximation to the KL Expansion

**Talk about the Nystrom Approximation**

## Elliptic Partial Differential Equation Example

