# Gaussian calibration

$\DeclareMathOperator*{\argmin}{arg\,min}$
$\newcommand{\Rset}{\mathbb{R}}$
$\newcommand{\Nset}{\mathbb{N}}$
$\newcommand{\vect}[1]{\boldsymbol{#1}}$

## Introduction

We consider a computer model $\vect{h}$ (i.e. a deterministic
function) to calibrate:

$$
\vect{z} = \vect{h}(\vect{x}, \vect{\theta}),
$$

where

-  $\vect{x} \in \Rset^{d_x}$ is the input vector;
-  $\vect{z} \in \Rset^{d_z}$ is the output vector;
-  $\vect{\theta} \in \Rset^{d_h}$ are the unknown parameters of
   $\vect{h}$ to calibrate.

Let $n \in \Nset$ be the number of observations. The standard
hypothesis of probabilistic calibration is:

$$
\vect{Y}^i = \vect{z}^i + \vect{\varepsilon}^i,
$$

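for $i=1,\ldots,n$, where $\vect{z}^i = \vect{h}(\vect{x}^i, \vect{\theta})$
and $\vect{\varepsilon}^i$ is a random measurement error. The observed
outputs $\vect{y}^i$ are realizations of the random vectors $\vect{Y}^i$.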
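The observation model can be simulated directly. The sketch below uses NumPy and assumes a hypothetical scalar model $\vect{h}(x, \vect{\theta}) = \theta_0 + \theta_1 x$ (so $d_x = d_z = 1$ and $d_h = 2$), an illustrative "true" parameter, and independent Gaussian errors with standard deviation 0.1; none of these choices come from the text.

```python
import numpy as np

def h(x, theta):
    """Hypothetical computer model: z = theta_0 + theta_1 * x (scalar output)."""
    return theta[0] + theta[1] * x

rng = np.random.default_rng(0)
n = 50
theta_true = np.array([1.0, 2.0])     # assumed "true" parameter, for illustration only
x = np.linspace(0.0, 1.0, n)          # inputs x^1, ..., x^n
z = h(x, theta_true)                  # model outputs z^i = h(x^i, theta)
sigma = 0.1                           # assumed standard deviation of the measurement error
eps = rng.normal(0.0, sigma, size=n)  # random measurement errors epsilon^i
y = z + eps                           # observations y^i = z^i + epsilon^i
print(y[:3])
```

In a real calibration study, only `x` and `y` are available; `theta_true` is what the calibration tries to recover.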



The goal of Gaussian calibration is to estimate $\vect{\theta}$
based on the $n$ inputs $(\vect{x}^1, \ldots, \vect{x}^n)$ and the
$n$ associated output observations $(\vect{y}^1, \ldots, \vect{y}^n)$.
In other words, calibration reduces the discrepancy between the
observations $(\vect{y}^1, \ldots, \vect{y}^n)$ and the model
predictions $(\vect{h}(\vect{x}^1, \vect{\theta}), \ldots, \vect{h}(\vect{x}^n, \vect{\theta}))$.
Since $(\vect{y}^1, \ldots, \vect{y}^n)$ are realizations of random
vectors, the estimate of $\vect{\theta}$, denoted by
$\hat{\vect{\theta}}$, is also a random vector. Hence, a secondary
goal of calibration is to estimate the distribution of
$\hat{\vect{\theta}}$, which represents the uncertainty of the
calibration process.

In the remainder of this section, the input $\vect{x}$ no longer
appears in the equations, so we simplify the model into:

$$
\vect{z} = \vect{h}(\vect{\theta}).
$$

## Bayesian calibration

The Bayesian calibration framework is based on two hypotheses.

The first hypothesis is that the parameter $\vect{\theta}$ has a
known distribution, called the *prior* distribution, and denoted by
$p(\vect{\theta})$.

The second hypothesis is that the output observations
$(\vect{y}^1, \ldots, \vect{y}^n)$ are sampled from a known
conditional distribution denoted by $p(\vect{y} | \vect{\theta})$.

For any $\vect{y}\in\Rset^{d_z}$ such that $p(\vect{y})>0$,
Bayes' theorem implies that the conditional distribution of
$\vect{\theta}$ given $\vect{y}$ is:

$$
p(\vect{\theta} | \vect{y}) = \frac{p(\vect{y} | \vect{\theta}) p(\vect{\theta})}{p(\vect{y})}
$$

for any $\vect{\theta}\in\Rset^{d_h}$.



The denominator of the previous Bayes fraction is independent of
$\vect{\theta}$, so that the posterior distribution is
proportional to the numerator:

$$
p(\vect{\theta} | \vect{y}) \propto  p(\vect{y} | \vect{\theta}) p(\vect{\theta}),
$$

for any $\vect{\theta}\in\Rset^{d_h}$.

In Gaussian calibration, both distributions are assumed to be
Gaussian.

More precisely, we make the hypothesis that the parameter
$\vect{\theta}$ has the Gaussian distribution:

$$
\vect{\theta} \sim \mathcal{N}(\vect{\mu}, B),
$$

where $\vect{\mu}\in\Rset^{d_h}$ is the mean of the Gaussian prior
distribution, called the *background*, and
$B\in\Rset^{d_h \times d_h}$ is the covariance matrix of the prior
distribution.

Secondly, we assume that the output observations have the
conditional Gaussian distribution:

$$
\vect{y} | \vect{\theta} \sim \mathcal{N}(\vect{h}(\vect{\theta}), R),
$$

where $R\in\Rset^{d_z \times d_z}$ is the covariance matrix of the
output observations.
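These two Gaussian hypotheses can be checked numerically in one dimension. The NumPy sketch below assumes a hypothetical identity model $h(\theta) = \theta$ with $d_h = d_z = 1$, a prior $\mathcal{N}(0, 1)$, an observation variance $R = 0.25$ and a single observation $y = 1$ (illustrative values only). It evaluates the Bayes numerator $p(y|\theta)p(\theta)$ on a grid and normalizes it with a Riemann sum: the normalizing constant is $p(y)$, which for this linear model is the density of $\mathcal{N}(\mu, B + R)$ at $y$.

```python
import numpy as np

def gauss_pdf(t, mean, var):
    """Density of the univariate normal distribution N(mean, var) at t."""
    return np.exp(-0.5 * (t - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

mu, B = 0.0, 1.0   # prior N(mu, B)
R = 0.25           # observation variance
y = 1.0            # a single observation; the model is h(theta) = theta
theta = np.linspace(-5.0, 5.0, 20001)
dtheta = theta[1] - theta[0]

numerator = gauss_pdf(y, theta, R) * gauss_pdf(theta, mu, B)  # p(y|theta) p(theta)
evidence = numerator.sum() * dtheta                            # p(y) by a Riemann sum
posterior = numerator / evidence
post_mean = (theta * posterior).sum() * dtheta
print(evidence, gauss_pdf(y, mu, B + R), post_mean)  # post_mean is close to 0.8
```

The value 0.8 is the closed-form posterior mean $(\mu/B + y/R)/(1/B + 1/R)$ for these numbers; the grid only confirms that dividing by $p(y)$ rescales the numerator without moving it.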

## Posterior distribution

Denote by $\|\cdot\|_B$ the Mahalanobis distance associated with
the matrix $B$:

$$
\|\vect{\theta}-\vect{\mu} \|^2_B = (\vect{\theta}-\vect{\mu} )^T B^{-1} (\vect{\theta}-\vect{\mu} ),
$$

for any $\vect{\theta},\vect{\mu} \in \Rset^{d_h}$. To lighten the
notation, let $H = \vect{h}$ denote the model seen as a function of
$\vect{\theta}$ only. Denote by $\|\cdot\|_R$ the Mahalanobis distance
associated with the matrix $R$:

$$
\|\vect{y}-H(\vect{\theta})\|^2_R = (\vect{y}-H(\vect{\theta}))^T R^{-1} (\vect{y}-H(\vect{\theta})),
$$

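for any $\vect{\theta} \in \Rset^{d_h}$ and any
$\vect{y} \in \Rset^{d_z}$. Up to additive constants, the logarithms
of the likelihood and of the prior densities are
$-\frac{1}{2}\|\vect{y}-H(\vect{\theta})\|^2_R$ and
$-\frac{1}{2}\|\vect{\theta}-\vect{\mu}\|^2_B$. Substituting them into
the proportionality relation above, the posterior distribution of
$\vect{\theta}$ given the observations $\vect{y}$ is: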

$$
   p(\vect{\theta}|\vect{y}) \propto \exp\left( -\frac{1}{2} \left( \|\vect{y}-H(\vect{\theta})\|^2_R 
   + \|\vect{\theta}-\vect{\mu} \|^2_B \right) \right)
$$

for any $\vect{\theta}\in\Rset^{d_h}$.
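The unnormalized log-posterior translates directly into code. In this NumPy sketch, the helper names `mahalanobis_sq` and `log_posterior_unnormalized` are hypothetical, and so is the linear test model with $d_h = 2$ and $d_z = 3$. The squared Mahalanobis distances are computed with a linear solve rather than an explicit matrix inverse, which is numerically preferable.

```python
import numpy as np

def mahalanobis_sq(v, C):
    """Squared Mahalanobis distance v^T C^{-1} v, via a linear solve (no explicit inverse)."""
    return float(v @ np.linalg.solve(C, v))

def log_posterior_unnormalized(theta, y, H, mu, B, R):
    """log p(theta | y) up to an additive constant, for the Gaussian prior and likelihood."""
    return -0.5 * (mahalanobis_sq(y - H(theta), R) + mahalanobis_sq(theta - mu, B))

# Hypothetical linear model with d_h = 2 parameters and d_z = 3 outputs.
x = np.array([0.0, 0.5, 1.0])
H = lambda theta: theta[0] + theta[1] * x
mu = np.array([0.0, 0.0])
B = np.diag([1.0, 4.0])
R = 0.01 * np.eye(3)
y = np.array([1.0, 2.0, 3.0])

value = log_posterior_unnormalized(np.array([1.0, 2.0]), y, H, mu, B, R)
print(value)  # -1.0: zero data misfit, prior term 1^2/1 + 2^2/4 = 2
```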

## MAP estimator

Since the exponential function is increasing, the maximum of the
posterior density of $\vect{\theta}$ given the observations
$\vect{y}$ is reached at the minimizer of the quadratic form in the
exponent:

$$
   \hat{\vect{\theta}} = \argmin_{\vect{\theta}\in\Rset^{d_h}} \frac{1}{2} \left( \|\vect{y} - H(\vect{\theta})\|^2_R 
   + \|\vect{\theta}-\vect{\mu} \|^2_B \right).
$$

It is called the *maximum a posteriori* estimator, or *MAP*
estimator.

## Regularity of solutions of the Gaussian Calibration

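Gaussian calibration is a tradeoff: the second term acts as a
*spring* which pulls the parameter $\vect{\theta}$ toward the
background $\vect{\mu}$, with a stiffness set by $B^{-1}$ (the
smaller the prior covariance $B$, the stronger the pull), while the
first term pulls the predictions as close as possible to the
observations. Because the positive definite matrix $B^{-1}$ is added to
the least squares part of the Hessian matrix of the cost function, the
problem can be better conditioned than the plain nonlinear least
squares problem, especially when the observations alone do not
determine all the components of $\vect{\theta}$.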
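A small NumPy example illustrates this regularizing effect. It assumes a hypothetical Jacobian whose two columns are identical, so that the observations cannot distinguish the two parameters, with $R = I$ and $B = I$ chosen for illustration only. The least squares matrix $J^T R^{-1} J$ is then singular, whereas $B^{-1} + J^T R^{-1} J$ is not.

```python
import numpy as np

# Hypothetical Jacobian with two identical columns: the data cannot
# distinguish theta_0 from theta_1, so plain least squares is ill-posed.
J = np.array([[1.0, 1.0],
              [2.0, 2.0],
              [3.0, 3.0]])
R = np.eye(3)   # assumed observation covariance
B = np.eye(2)   # assumed prior covariance

gn = J.T @ np.linalg.solve(R, J)  # J^T R^{-1} J (least squares term only)
reg = np.linalg.inv(B) + gn       # B^{-1} + J^T R^{-1} J

print(np.linalg.eigvalsh(gn))   # [0, 28] up to rounding: singular
print(np.linalg.eigvalsh(reg))  # [1, 29] up to rounding: positive definite
```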

## Nonlinear Gaussian Calibration: 3DVAR

The cost function of the nonlinear Gaussian calibration problem is:

$$
   C(\vect{\theta}) = \frac{1}{2}\|\vect{y}-H(\vect{\theta})\|^2_R 
   + \frac{1}{2}\|\vect{\theta}-\vect{\mu} \|^2_B
$$

for any $\vect{\theta}\in\Rset^{d_h}$.

The goal of nonlinear Gaussian calibration is to find the value of
$\vect{\theta}$ which minimizes the cost function $C$. In
general, this requires a nonlinear unconstrained optimization
solver.

Let $J(\vect{\theta}) \in \Rset^{d_z \times d_h}$ be the Jacobian
matrix made of the partial derivatives of $\vect{h}$ with respect to
$\vect{\theta}$:

$$
J(\vect{\theta}) = \frac{\partial \vect{h}}{\partial \vect{\theta}}.
$$
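When the model is a black box, the Jacobian matrix is often approximated by finite differences. The NumPy sketch below uses forward differences with a fixed step (an assumption; production code typically scales the step to each component of $\vect{\theta}$) and compares the result with the analytic Jacobian of a hypothetical model $z_j = \theta_0 \exp(\theta_1 x_j)$. The function name `jacobian_fd` is illustrative.

```python
import numpy as np

def jacobian_fd(H, theta, step=1e-6):
    """Forward finite-difference approximation of the d_z x d_h Jacobian of H at theta."""
    z0 = H(theta)
    J = np.empty((z0.size, theta.size))
    for k in range(theta.size):
        t = theta.copy()
        t[k] += step                  # perturb one parameter at a time
        J[:, k] = (H(t) - z0) / step
    return J

# Hypothetical nonlinear model: z_j = theta_0 * exp(theta_1 * x_j).
x = np.array([0.0, 0.5, 1.0])
H = lambda theta: theta[0] * np.exp(theta[1] * x)
theta = np.array([2.0, -1.0])

J_exact = np.column_stack([np.exp(theta[1] * x), theta[0] * x * np.exp(theta[1] * x)])
err = np.max(np.abs(jacobian_fd(H, theta) - J_exact))
print(err)  # small, of the order of the step
```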



The gradient of the cost function $C$ can be expressed in terms of
the matrices $R$ and $B$ and the Jacobian matrix of the function
$\vect{h}$:

$$
   \frac{d }{d\vect{\theta}} C(\vect{\theta}) 
   = B^{-1} (\vect{\theta}-\vect{\mu}) + J(\vect{\theta})^T R^{-1} (H(\vect{\theta}) - \vect{y})
$$

for any $\vect{\theta}\in\Rset^{d_h}$.
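A central finite-difference check is a cheap way to validate an implementation of this gradient. The NumPy sketch below assumes a hypothetical model $z_j = \theta_0 \exp(\theta_1 x_j)$ with analytic Jacobian `jac`, and illustrative values for $\vect{\mu}$, $B$, $R$ and $\vect{y}$.

```python
import numpy as np

# Hypothetical model z_j = theta_0 * exp(theta_1 * x_j) and its analytic Jacobian.
x = np.array([0.0, 0.5, 1.0, 1.5])
def H(theta):
    return theta[0] * np.exp(theta[1] * x)
def jac(theta):
    e = np.exp(theta[1] * x)
    return np.column_stack([e, theta[0] * x * e])

mu = np.array([1.0, 0.0]); B = np.diag([1.0, 0.5])          # assumed prior
R = 0.05 * np.eye(4); y = np.array([2.1, 1.2, 0.8, 0.4])    # assumed data

def cost(theta):
    r, d = y - H(theta), theta - mu
    return 0.5 * r @ np.linalg.solve(R, r) + 0.5 * d @ np.linalg.solve(B, d)

def grad(theta):
    # B^{-1} (theta - mu) + J(theta)^T R^{-1} (H(theta) - y)
    return np.linalg.solve(B, theta - mu) + jac(theta).T @ np.linalg.solve(R, H(theta) - y)

theta = np.array([1.5, -0.7])
step = 1e-6
fd = np.array([(cost(theta + step * e) - cost(theta - step * e)) / (2 * step)
               for e in np.eye(2)])
g = grad(theta)
print(np.max(np.abs(fd - g)))  # should be tiny
```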

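Neglecting the second derivatives of $\vect{h}$ (Gauss-Newton
approximation), the Hessian matrix of the cost function is
approximated by

$$
   \frac{d^2 }{d\vect{\theta}^2} C(\vect{\theta}) 
   \approx B^{-1}  + J(\vect{\theta})^T R^{-1} J(\vect{\theta})
$$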

for any $\vect{\theta}\in\Rset^{d_h}$.

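If the covariance matrix $B$ is positive definite, then the matrix
$B^{-1} + J(\vect{\theta})^T R^{-1} J(\vect{\theta})$ is positive
definite, since $J(\vect{\theta})^T R^{-1} J(\vect{\theta})$ is
positive semi-definite. When $\vect{h}$ is linear, this matrix is the
exact Hessian, the cost function is strictly convex and the solution
of the Gaussian calibration is unique. For a nonlinear model, this
matrix only approximates the Hessian, so uniqueness is not guaranteed
in general; it nevertheless ensures that the Gauss-Newton direction
$-\left(B^{-1} + J(\vect{\theta})^T R^{-1} J(\vect{\theta})\right)^{-1} \frac{d}{d\vect{\theta}} C(\vect{\theta})$
is a descent direction whenever the gradient is nonzero.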
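The Gauss-Newton iteration built on this approximate Hessian can be sketched in a few lines of NumPy. The model, data, prior and the function name `gauss_newton` are hypothetical, and the plain full-step iteration is an assumption made for brevity: it converges on this mildly nonlinear toy problem, but a robust solver would add a line search or a trust region (as in Levenberg-Marquardt).

```python
import numpy as np

# Hypothetical toy problem (not from the text): z_j = theta_0 * exp(theta_1 * x_j).
x = np.array([0.0, 0.5, 1.0, 1.5])
y = np.array([2.1, 1.2, 0.8, 0.4])
mu = np.array([1.0, 0.0])
B = np.diag([1.0, 0.5])
R = 0.05 * np.eye(4)

def H(theta):
    return theta[0] * np.exp(theta[1] * x)

def jac(theta):
    e = np.exp(theta[1] * x)
    return np.column_stack([e, theta[0] * x * e])

def cost(theta):
    r, d = y - H(theta), theta - mu
    return 0.5 * r @ np.linalg.solve(R, r) + 0.5 * d @ np.linalg.solve(B, d)

def grad(theta):
    return np.linalg.solve(B, theta - mu) + jac(theta).T @ np.linalg.solve(R, H(theta) - y)

def gauss_newton(theta, max_iter=50, tol=1e-10):
    """Iterate theta <- theta - (B^{-1} + J^T R^{-1} J)^{-1} grad C(theta)."""
    B_inv = np.linalg.inv(B)
    for _ in range(max_iter):
        g = grad(theta)
        if np.linalg.norm(g) < tol:
            break
        J = jac(theta)
        hess_approx = B_inv + J.T @ np.linalg.solve(R, J)  # Gauss-Newton Hessian
        theta = theta - np.linalg.solve(hess_approx, g)
    return theta

theta_map = gauss_newton(mu.copy())  # start from the background
print(theta_map, cost(theta_map), np.linalg.norm(grad(theta_map)))
```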



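## Solving the Nonlinear Gaussian Calibration Problem

Solving the nonlinear Gaussian calibration problem involves the
Cholesky decompositions of the covariance matrices $B$ and $R$.
Writing $B = L_B L_B^T$ and $R = L_R L_R^T$, where $L_B$ and $L_R$
are lower triangular, we have
$\|\vect{\theta}-\vect{\mu}\|^2_B = \|L_B^{-1}(\vect{\theta}-\vect{\mu})\|^2$
and $\|\vect{y}-H(\vect{\theta})\|^2_R = \|L_R^{-1}(\vect{y}-H(\vect{\theta}))\|^2$.
This transforms the sum of two Mahalanobis distances into a single
squared Euclidean norm of a stacked residual vector, which leads to a
classical nonlinear least squares problem.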
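The Cholesky transformation can be checked numerically. This NumPy sketch builds the stacked "whitened" residual for a hypothetical model with correlated observation errors (all matrices and data are illustrative) and verifies that half its squared Euclidean norm equals the cost $C(\vect{\theta})$. The function name `whitened_residual` is hypothetical; `np.linalg.cholesky` returns the lower factor $L$ with $L L^T$ equal to its argument.

```python
import numpy as np

def whitened_residual(theta, y, H, mu, LB, LR):
    """Stacked residual whose squared Euclidean norm equals 2 C(theta).

    LB and LR are the lower Cholesky factors: B = LB LB^T and R = LR LR^T.
    """
    r_obs = np.linalg.solve(LR, y - H(theta))  # LR^{-1} (y - H(theta))
    r_prior = np.linalg.solve(LB, theta - mu)  # LB^{-1} (theta - mu)
    return np.concatenate([r_obs, r_prior])

# Hypothetical test problem with correlated observation errors.
x = np.array([0.0, 0.5, 1.0])
H = lambda theta: theta[0] * np.exp(theta[1] * x)
mu = np.array([1.0, 0.0])
B = np.array([[1.0, 0.2], [0.2, 0.5]])
R = np.array([[0.10, 0.02, 0.00],
              [0.02, 0.10, 0.02],
              [0.00, 0.02, 0.10]])
y = np.array([2.0, 1.3, 0.7])
LB, LR = np.linalg.cholesky(B), np.linalg.cholesky(R)

theta = np.array([1.8, -0.9])
r = whitened_residual(theta, y, H, mu, LB, LR)
cost_direct = (0.5 * (y - H(theta)) @ np.linalg.solve(R, y - H(theta))
               + 0.5 * (theta - mu) @ np.linalg.solve(B, theta - mu))
print(0.5 * r @ r, cost_direct)  # the two values agree up to rounding
```

Any nonlinear least squares solver can then be applied to `whitened_residual`; the triangular solves could also use a dedicated triangular routine.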

## Linear Gaussian Calibration: Bayesian BLUE

We make the hypothesis that $\vect{h}$ is linear (more precisely,
affine) with respect to $\vect{\theta}$, i.e., for any
$\vect{\theta}\in\Rset^{d_h}$, we have:

$$
\vect{h}(\vect{\theta}) = \vect{h}(\vect{\mu}) + J(\vect{\theta}-\vect{\mu} ),
$$

where $J$ is the constant Jacobian matrix of $\vect{h}$.

Let $A$ be the matrix defined by:

$$
A^{-1} = B^{-1} + J^T R^{-1} J.
$$

We denote by $K$ the Kalman matrix:

$$
K = A J^T R^{-1}.
$$
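The matrices $A$ and $K$ can be computed directly from their definitions. The NumPy sketch below uses a random hypothetical Jacobian and illustrative covariances. It also checks the equivalent form $K = B J^T (J B J^T + R)^{-1}$, which follows from the Woodbury matrix identity and only requires inverting a $d_z \times d_z$ matrix, which is cheaper when $d_z < d_h$.

```python
import numpy as np

rng = np.random.default_rng(1)
d_h, d_z = 2, 5
J = rng.normal(size=(d_z, d_h))         # hypothetical constant Jacobian
B = np.array([[1.0, 0.3], [0.3, 2.0]])  # prior covariance (assumed)
R = 0.1 * np.eye(d_z)                   # observation covariance (assumed)

# A = (B^{-1} + J^T R^{-1} J)^{-1} and K = A J^T R^{-1}
A = np.linalg.inv(np.linalg.inv(B) + J.T @ np.linalg.solve(R, J))
K = A @ J.T @ np.linalg.inv(R)

# Equivalent "observation space" form, obtained with the Woodbury identity.
K_alt = B @ J.T @ np.linalg.inv(J @ B @ J.T + R)
print(np.max(np.abs(K - K_alt)))  # zero up to rounding
```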



The maximum of the posterior distribution of $\vect{\theta}$ given
the observations $\vect{y}$ is:

$$
\hat{\vect{\theta}} = \vect{\mu} + K (\vect{y} - H(\vect{\mu})).
$$
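A quick numerical check confirms that this closed-form estimate minimizes the cost function $C$: the gradient $B^{-1}(\vect{\theta}-\vect{\mu}) + J^T R^{-1}(H(\vect{\theta}) - \vect{y})$ vanishes at $\hat{\vect{\theta}}$. The affine model, the value `h_mu` standing for $\vect{h}(\vect{\mu})$, and the data below are hypothetical and randomly generated with NumPy.

```python
import numpy as np

rng = np.random.default_rng(2)
d_h, d_z = 2, 6
J = rng.normal(size=(d_z, d_h))            # hypothetical constant Jacobian
mu = np.array([0.5, -1.0])                 # background
B = np.diag([1.0, 0.25])                   # assumed prior covariance
R = 0.2 * np.eye(d_z)                      # assumed observation covariance
h_mu = rng.normal(size=d_z)                # hypothetical value of h(mu)
H = lambda theta: h_mu + J @ (theta - mu)  # affine model h(mu) + J (theta - mu)
y = rng.normal(size=d_z)                   # synthetic observations

A = np.linalg.inv(np.linalg.inv(B) + J.T @ np.linalg.solve(R, J))
K = A @ J.T @ np.linalg.inv(R)
theta_hat = mu + K @ (y - H(mu))

# The gradient of C vanishes at theta_hat, so theta_hat is the MAP estimate.
grad = np.linalg.solve(B, theta_hat - mu) + J.T @ np.linalg.solve(R, H(theta_hat) - y)
print(theta_hat, np.linalg.norm(grad))
```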

It can be proved that:

$$
   p(\vect{\theta} | \vect{y}) \propto 
   \exp\left(-\frac{1}{2} (\vect{\theta} - \hat{\vect{\theta}})^T A^{-1} (\vect{\theta} - \hat{\vect{\theta}}) \right)
$$

for any $\vect{\theta}\in\Rset^{d_h}$.
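This identity can be verified numerically: for an affine model, the unnormalized log-posterior plus $\frac{1}{2}(\vect{\theta} - \hat{\vect{\theta}})^T A^{-1} (\vect{\theta} - \hat{\vect{\theta}})$ must not depend on $\vect{\theta}$. The NumPy sketch below evaluates this sum at a few random points for a hypothetical random problem; `h_mu` stands for $\vect{h}(\vect{\mu})$.

```python
import numpy as np

rng = np.random.default_rng(3)
d_h, d_z = 2, 4
J = rng.normal(size=(d_z, d_h))  # hypothetical constant Jacobian
mu = np.zeros(d_h); B = np.eye(d_h); R = 0.5 * np.eye(d_z)
h_mu = rng.normal(size=d_z)      # hypothetical h(mu)
H = lambda t: h_mu + J @ (t - mu)
y = rng.normal(size=d_z)

A_inv = np.linalg.inv(B) + J.T @ np.linalg.solve(R, J)
# theta_hat = mu + A J^T R^{-1} (y - H(mu)) = mu + K (y - H(mu))
theta_hat = mu + np.linalg.solve(A_inv, J.T @ np.linalg.solve(R, y - H(mu)))

def log_post(t):
    """Unnormalized log posterior: -(1/2) (||y - H(t)||_R^2 + ||t - mu||_B^2)."""
    r, d = y - H(t), t - mu
    return -0.5 * (r @ np.linalg.solve(R, r) + d @ np.linalg.solve(B, d))

# log_post(t) + (1/2) (t - theta_hat)^T A^{-1} (t - theta_hat) should be constant in t.
offsets = [log_post(t) + 0.5 * (t - theta_hat) @ A_inv @ (t - theta_hat)
           for t in rng.normal(size=(5, d_h))]
spread = max(offsets) - min(offsets)
print(spread)  # zero up to rounding
```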

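This implies that the posterior distribution of $\vect{\theta}$ given
$\vect{y}$ is Gaussian, with mean $\hat{\vect{\theta}}$ and covariance
matrix $A$:

$$
\vect{\theta} | \vect{y} \sim \mathcal{N}(\hat{\vect{\theta}}, A).
$$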
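Samples from this Gaussian posterior can be drawn with the Cholesky factor of $A$: if $A = L L^T$ and $\vect{\xi} \sim \mathcal{N}(\vect{0}, I)$, then $\hat{\vect{\theta}} + L\vect{\xi} \sim \mathcal{N}(\hat{\vect{\theta}}, A)$. The NumPy sketch below uses hypothetical values of $\hat{\vect{\theta}}$ and $A$ and compares the empirical mean and covariance of the samples with them.

```python
import numpy as np

rng = np.random.default_rng(4)
theta_hat = np.array([1.0, -0.5])           # hypothetical posterior mean
A = np.array([[0.20, 0.05], [0.05, 0.10]])  # hypothetical posterior covariance
L = np.linalg.cholesky(A)                   # A = L L^T

# Each row xi_i of `xi` gives theta_hat + L xi_i, which follows N(theta_hat, A).
xi = rng.standard_normal(size=(100_000, 2))
samples = theta_hat + xi @ L.T
print(samples.mean(axis=0), np.cov(samples, rowvar=False))
```

Such samples can be propagated through $\vect{h}$ to estimate the predictive uncertainty of the calibrated model.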