# Gaussian calibration

## References

-  N. H. Bingham and J. M. Fry (2010). *Regression: Linear Models in Statistics*, Springer Undergraduate Mathematics Series. Springer.
-  S. Huet, A. Bouvier, M. A. Poursat, and E. Jolivet (2004). *Statistical Tools for Nonlinear Regression*. Springer.
-  C. E. Rasmussen and C. K. I. Williams (2006). *Gaussian Processes for Machine Learning*. The MIT Press.
-  M. Asch, M. Bocquet, and M. Nodet (2016). *Data Assimilation: Methods, Algorithms, and Applications*. SIAM.
-  A. Tarantola (2005). *Inverse Problem Theory and Methods for Model Parameter Estimation*. SIAM.

## Introduction

We consider a computer model $\boldsymbol{h}$ (i.e. a deterministic
function) to calibrate:

$$
\boldsymbol{z} = \boldsymbol{h}(\boldsymbol{x}, \boldsymbol{\theta}),
$$

where

-  $\boldsymbol{x} \in \mathbb{R}^{d_x}$ is the input vector;
-  $\boldsymbol{z} \in \mathbb{R}^{d_z}$ is the output vector;
-  $\boldsymbol{\theta} \in \mathbb{R}^{d_h}$ are the unknown parameters of
   $\boldsymbol{h}$ to calibrate.

Let $n \in \mathbb{N}$ be the number of observations. The standard
hypothesis of the probabilistic calibration is:

$$
\boldsymbol{Y}^i = \boldsymbol{z}^i + \boldsymbol{\varepsilon}^i,
$$

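for $i=1,\ldots,n$, where
$\boldsymbol{z}^i = \boldsymbol{h}(\boldsymbol{x}^i, \boldsymbol{\theta})$ is
the model output at the $i$-th input and $\boldsymbol{\varepsilon}^i$ is a
random measurement error.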
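
The observation model above can be illustrated by simulating synthetic data. The following is a minimal sketch, not a reference implementation: the affine model `h`, the value `theta_true` and the noise level `sigma` are assumptions made up for the illustration.

```python
import numpy as np

def h(x, theta):
    """Hypothetical model, affine in x, with parameters theta = (a, b)."""
    return theta[0] + theta[1] * x

rng = np.random.default_rng(0)
n = 20                                  # number of observations
theta_true = np.array([1.0, 2.0])       # assumed "true" parameter value
sigma = 0.1                             # assumed std of the measurement error

x = np.linspace(0.0, 1.0, n)            # inputs x^1, ..., x^n
z = h(x, theta_true)                    # model outputs z^i
eps = rng.normal(0.0, sigma, size=n)    # measurement errors eps^i
y = z + eps                             # observations y^i = z^i + eps^i
```

Calibration works in the opposite direction: given `x` and `y`, it recovers an estimate of `theta_true`.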



The goal of Gaussian calibration is to estimate $\boldsymbol{\theta}$,
based on observations of $n$ inputs
$(\boldsymbol{x}^1, \ldots, \boldsymbol{x}^n)$ and the associated $n$
observations of the output $(\boldsymbol{y}^1, \ldots, \boldsymbol{y}^n)$. In
other words, the calibration process reduces the discrepancy between the
observations $(\boldsymbol{y}^1, \ldots, \boldsymbol{y}^n)$ and the
predictions $\boldsymbol{h}(\boldsymbol{\theta})$. Given that
$(\boldsymbol{y}^1, \ldots, \boldsymbol{y}^n)$ are realizations of a random
variable, the estimate of $\boldsymbol{\theta}$, denoted by
$\hat{\boldsymbol{\theta}}$, is also a random variable. Hence, the
secondary goal of calibration is to estimate the distribution of
$\hat{\boldsymbol{\theta}}$ representing the uncertainty of the
calibration process.

In the remainder of this section, the input $\boldsymbol{x}$ no longer
appears in the equations, so we simplify the model to:

$$
\boldsymbol{z} = \boldsymbol{h}(\boldsymbol{\theta}).
$$

## Bayesian calibration

The Bayesian calibration framework is based on two hypotheses.

The first hypothesis is that the parameter $\boldsymbol{\theta}$ has a
known distribution, called the *prior* distribution, and denoted by
$p(\boldsymbol{\theta})$.

The second hypothesis is that the output observations
$(\boldsymbol{y}^1, \ldots, \boldsymbol{y}^n)$ are sampled from a known
conditional distribution denoted by $p(\boldsymbol{y} | \boldsymbol{\theta})$.

For any $\boldsymbol{y}\in\mathbb{R}^{d_z}$ such that $p(\boldsymbol{y})>0$,
Bayes' theorem implies that the conditional distribution of
$\boldsymbol{\theta}$ given $\boldsymbol{y}$ is:

$$
p(\boldsymbol{\theta} | \boldsymbol{y}) = \frac{p(\boldsymbol{y} | \boldsymbol{\theta}) p(\boldsymbol{\theta})}{p(\boldsymbol{y})}
$$

for any $\boldsymbol{\theta}\in\mathbb{R}^{d_h}$.



The denominator of the Bayes fraction above does not depend on
$\boldsymbol{\theta}$, so the posterior distribution is
proportional to the numerator:

$$
p(\boldsymbol{\theta} | \boldsymbol{y}) \propto  p(\boldsymbol{y} | \boldsymbol{\theta}) p(\boldsymbol{\theta})
$$

for any $\boldsymbol{\theta}\in\mathbb{R}^{d_h}$.

In the Gaussian calibration, the two previous distributions are assumed
to be Gaussian.

More precisely, we make the hypothesis that the parameter
$\boldsymbol{\theta}$ has the Gaussian distribution:

$$
\boldsymbol{\theta} \sim \mathcal{N}(\boldsymbol{\mu}, B),
$$

where $\boldsymbol{\mu}\in\mathbb{R}^{d_h}$ is the mean of the Gaussian prior
distribution, called the *background*, and
$B\in\mathbb{R}^{d_h \times d_h}$ is the covariance matrix of the
parameter.

Secondly, we make the hypothesis that the output observations have the
conditional Gaussian distribution:

$$
\boldsymbol{y} | \boldsymbol{\theta} \sim \mathcal{N}(\boldsymbol{h}(\boldsymbol{\theta}), R),
$$

where $R\in\mathbb{R}^{d_z \times d_z}$ is the covariance matrix of the
output observations.

## Posterior distribution

Denote by $\|\cdot\|_B$ the Mahalanobis distance associated with
the matrix $B$:

$$
\|\boldsymbol{\theta}-\boldsymbol{\mu} \|^2_B = (\boldsymbol{\theta}-\boldsymbol{\mu} )^T B^{-1} (\boldsymbol{\theta}-\boldsymbol{\mu} ),
$$

for any $\boldsymbol{\theta},\boldsymbol{\mu} \in \mathbb{R}^{d_h}$. Denote by
$\|\cdot\|_R$ the Mahalanobis distance associated with the matrix
$R$:

$$
\|\boldsymbol{y}-\boldsymbol{h}(\boldsymbol{\theta})\|^2_R = (\boldsymbol{y}-\boldsymbol{h}(\boldsymbol{\theta}))^T R^{-1} (\boldsymbol{y}-\boldsymbol{h}(\boldsymbol{\theta})),
$$

for any $\boldsymbol{\theta} \in \mathbb{R}^{d_h}$ and any
$\boldsymbol{y} \in \mathbb{R}^{d_z}$. Substituting the Gaussian prior and
the Gaussian conditional distribution into Bayes' formula, the posterior
distribution of $\boldsymbol{\theta}$ given the observations
$\boldsymbol{y}$ is:

$$
   p(\boldsymbol{\theta}|\boldsymbol{y}) \propto \exp\left( -\frac{1}{2} \left( \|\boldsymbol{y}-\boldsymbol{h}(\boldsymbol{\theta})\|^2_R 
   + \|\boldsymbol{\theta}-\boldsymbol{\mu} \|^2_B \right) \right)
$$

for any $\boldsymbol{\theta}\in\mathbb{R}^{d_h}$.
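
The logarithm of this unnormalized posterior density is straightforward to evaluate numerically. The sketch below is one possible way to do it. The two-parameter model `h` and the values of `mu`, `B` and `R` are hypothetical and chosen only for the illustration.

```python
import numpy as np

def log_posterior(theta, y, h, mu, B, R):
    """Unnormalized log-posterior: -0.5 * (||y - h(theta)||_R^2 + ||theta - mu||_B^2)."""
    r_obs = y - h(theta)
    r_bkg = theta - mu
    # Solve linear systems rather than forming explicit inverses.
    d_obs = r_obs @ np.linalg.solve(R, r_obs)
    d_bkg = r_bkg @ np.linalg.solve(B, r_bkg)
    return -0.5 * (d_obs + d_bkg)

def h(theta):
    """Hypothetical model from R^2 to R^3."""
    return np.array([theta[0], theta[1], theta[0] * theta[1]])

mu = np.array([1.0, 2.0])     # background (assumed)
B = np.diag([0.5, 0.5])       # prior covariance (assumed)
R = 0.1 * np.eye(3)           # observation covariance (assumed)
y = h(mu)                     # observations that match the background exactly

lp_at_mu = log_posterior(mu, y, h, mu, B, R)
lp_away = log_posterior(mu + 0.3, y, h, mu, B, R)
```

With these values, both Mahalanobis distances vanish at `mu`, so `lp_at_mu` is the maximum of the log-posterior and `lp_away` is strictly smaller.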

## MAP estimator

The maximum of the posterior distribution of $\boldsymbol{\theta}$ given
the observations $\boldsymbol{y}$ is reached at:

$$
   \hat{\boldsymbol{\theta}} = \textrm{argmin}_{\boldsymbol{\theta}\in\mathbb{R}^{d_h}} \frac{1}{2} \left( \|\boldsymbol{y} - \boldsymbol{h}(\boldsymbol{\theta})\|^2_R 
   + \|\boldsymbol{\theta}-\boldsymbol{\mu} \|^2_B \right).
$$

It is called the *maximum a posteriori* estimator or *MAP*
estimator.

## Regularity of solutions of the Gaussian Calibration

The Gaussian calibration is a tradeoff between two terms. The second
term of the cost acts as a *spring* which pulls the parameter
$\boldsymbol{\theta}$ toward the background $\boldsymbol{\mu}$, while the
first term pulls the predictions as close as possible to the
observations. The strength of the spring is set by the matrix $B$: the
smaller the prior covariance, the stiffer the spring. Depending on the
matrix $B$, the problem may have better regularity properties than the
plain nonlinear least squares problem.
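
As a minimal illustration of this tradeoff, consider the scalar case $d_h = d_z = 1$ with the identity model $h(\theta) = \theta$, $B = \sigma_b^2$ and $R = \sigma_r^2$ (the symbols $\sigma_b$ and $\sigma_r$ are introduced here only for the example). Setting the derivative of the function minimized by the MAP estimator to zero gives:

$$
-\frac{y - \hat{\theta}}{\sigma_r^2} + \frac{\hat{\theta} - \mu}{\sigma_b^2} = 0
\quad \Longrightarrow \quad
\hat{\theta} = \mu + \frac{\sigma_b^2}{\sigma_b^2 + \sigma_r^2} (y - \mu).
$$

When $\sigma_b^2 \ll \sigma_r^2$, the estimate stays close to the background $\mu$. When $\sigma_b^2 \gg \sigma_r^2$, it moves close to the observation $y$.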

## Non Linear Gaussian Calibration : 3DVAR

The cost function of the nonlinear Gaussian calibration problem is:

$$
   C(\boldsymbol{\theta}) = \frac{1}{2}\|\boldsymbol{y}-\boldsymbol{h}(\boldsymbol{\theta})\|^2_R 
   + \frac{1}{2}\|\boldsymbol{\theta}-\boldsymbol{\mu} \|^2_B
$$

for any $\boldsymbol{\theta}\in\mathbb{R}^{d_h}$.

The goal of the nonlinear Gaussian calibration is to find the value of
$\boldsymbol{\theta}$ which minimizes the cost function $C$. In
general, this requires a nonlinear unconstrained optimization
solver.

Let $J(\boldsymbol{\theta}) \in \mathbb{R}^{d_z \times d_h}$ be the Jacobian
matrix made of the partial derivatives of $\boldsymbol{h}$ with respect to
$\boldsymbol{\theta}$:

$$
J(\boldsymbol{\theta}) = \frac{\partial \boldsymbol{h}}{\partial \boldsymbol{\theta}}.
$$



The gradient of the cost function $C$ can be expressed in terms of
the matrices $R$ and $B$ and the Jacobian
matrix of the function $\boldsymbol{h}$:

$$
   \frac{d }{d\boldsymbol{\theta}} C(\boldsymbol{\theta}) 
   = B^{-1} (\boldsymbol{\theta}-\boldsymbol{\mu}) + J(\boldsymbol{\theta})^T R^{-1} (\boldsymbol{h}(\boldsymbol{\theta}) - \boldsymbol{y})
$$

for any $\boldsymbol{\theta}\in\mathbb{R}^{d_h}$.

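Neglecting the second derivatives of $\boldsymbol{h}$ (the Gauss-Newton
approximation, which is exact when $\boldsymbol{h}$ is linear), the
Hessian matrix of the cost function is

$$
   \frac{d^2 }{d\boldsymbol{\theta}^2} C(\boldsymbol{\theta}) 
   \approx B^{-1}  + J(\boldsymbol{\theta})^T R^{-1} J(\boldsymbol{\theta})
$$

for any $\boldsymbol{\theta}\in\mathbb{R}^{d_h}$.

If the covariance matrices $B$ and $R$ are positive definite, then this
matrix is positive definite. When $\boldsymbol{h}$ is linear, the cost
function is therefore strictly convex and the solution of the Gaussian
calibration is unique. When $\boldsymbol{h}$ is nonlinear, the neglected
second-order terms may make the cost function non-convex, in which case
it may have several local minima.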
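
The gradient above, together with the matrix $B^{-1} + J^T R^{-1} J$ used as a curvature matrix, is enough to write a Gauss-Newton iteration for minimizing $C$. The sketch below is a bare-bones illustration with no line search or safeguards, not a production solver. The model `h`, its Jacobian `jac` and all numerical values are hypothetical.

```python
import numpy as np

def gauss_newton(h, jac, y, mu, B, R, n_iter=50, tol=1e-12):
    """Minimize C(theta) with steps based on B^{-1} + J^T R^{-1} J."""
    B_inv = np.linalg.inv(B)
    R_inv = np.linalg.inv(R)
    theta = mu.copy()                       # start from the background
    for _ in range(n_iter):
        J = jac(theta)
        grad = B_inv @ (theta - mu) + J.T @ R_inv @ (h(theta) - y)
        curv = B_inv + J.T @ R_inv @ J      # Gauss-Newton curvature matrix
        step = np.linalg.solve(curv, grad)
        theta = theta - step
        if np.linalg.norm(step) < tol:
            break
    return theta

def h(theta):
    """Hypothetical nonlinear model from R^2 to R^3."""
    return np.array([theta[0], theta[1], theta[0] * theta[1]])

def jac(theta):
    """Jacobian of the hypothetical model, of shape (3, 2)."""
    return np.array([[1.0, 0.0], [0.0, 1.0], [theta[1], theta[0]]])

mu = np.array([1.0, 2.0])
B = np.diag([0.5, 0.5])
R = 0.01 * np.eye(3)
y = h(np.array([1.2, 1.9])) + np.array([0.01, -0.02, 0.015])

theta_map = gauss_newton(h, jac, y, mu, B, R)

# The exact gradient of C should vanish at the computed minimizer.
J = jac(theta_map)
grad_at_map = (np.linalg.solve(B, theta_map - mu)
               + J.T @ np.linalg.solve(R, h(theta_map) - y))
```

The iteration starts from the background $\boldsymbol{\mu}$, which is a natural initial guess under the Gaussian prior. For strongly nonlinear models, a solver with step control would be a safer choice.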



## Solving the Non Linear Gaussian Calibration Problem

Solving the nonlinear Gaussian calibration problem involves the
Cholesky decomposition of the covariance matrices $B$ and $R$. This
makes it possible to transform the sum of the two squared Mahalanobis
distances into a single squared Euclidean norm, which leads to a
classical nonlinear least squares problem.
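
The transformation can be checked numerically. With $B = L_B L_B^T$ and $R = L_R L_R^T$, stacking the whitened residuals $L_R^{-1}(\boldsymbol{y} - \boldsymbol{h}(\boldsymbol{\theta}))$ and $L_B^{-1}(\boldsymbol{\theta} - \boldsymbol{\mu})$ gives a vector whose squared Euclidean norm equals the sum of the two squared Mahalanobis distances. The sketch below uses a hypothetical model and made-up matrices, and an actual implementation may organize the computation differently.

```python
import numpy as np

def h(theta):
    """Hypothetical nonlinear model from R^2 to R^3."""
    return np.array([theta[0], theta[1], theta[0] * theta[1]])

mu = np.array([1.0, 2.0])
B = np.array([[0.5, 0.1], [0.1, 0.3]])
R = np.array([[0.02, 0.005, 0.0], [0.005, 0.03, 0.002], [0.0, 0.002, 0.01]])
y = np.array([1.21, 1.88, 2.295])
theta = np.array([1.1, 1.7])

L_B = np.linalg.cholesky(B)    # lower triangular factor, B = L_B L_B^T
L_R = np.linalg.cholesky(R)    # lower triangular factor, R = L_R L_R^T

# Whitened residuals, stacked into a single vector.
r_obs = np.linalg.solve(L_R, y - h(theta))
r_bkg = np.linalg.solve(L_B, theta - mu)
r = np.concatenate([r_obs, r_bkg])
cost_whitened = 0.5 * (r @ r)

# Direct evaluation with the two Mahalanobis distances.
d_obs = y - h(theta)
d_bkg = theta - mu
cost_direct = 0.5 * (d_obs @ np.linalg.solve(R, d_obs)
                     + d_bkg @ np.linalg.solve(B, d_bkg))
```

The residual vector `r` has $d_z + d_h$ components and can be passed to any standard nonlinear least squares solver.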

## Linear Gaussian Calibration : bayesian BLUE

We make the hypothesis that $\boldsymbol{h}$ is linear with respect to
$\boldsymbol{\theta}$, i.e., for any
$\boldsymbol{\theta}\in\mathbb{R}^{d_h}$, we have:

$$
\boldsymbol{h}(\boldsymbol{\theta}) = \boldsymbol{h}(\boldsymbol{\mu}) + J(\boldsymbol{\theta}-\boldsymbol{\mu} ),
$$

where $J$ is the constant Jacobian matrix of $\boldsymbol{h}$.

Let $A$ be the matrix:

$$
A^{-1} = B^{-1} + J^T R^{-1} J.
$$

We denote by $K$ the Kalman matrix:

$$
K = A J^T R^{-1}.
$$



The maximum of the posterior distribution of $\boldsymbol{\theta}$ given
the observations $\boldsymbol{y}$ is reached at:

$$
\hat{\boldsymbol{\theta}} = \boldsymbol{\mu} + K (\boldsymbol{y} - \boldsymbol{h}(\boldsymbol{\mu})).
$$
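
The closed-form estimate can be computed directly from $A$ and $K$. The sketch below uses a hypothetical constant Jacobian `J` and made-up values for `mu`, `B`, `R` and `y`. It also compares $K$ with the equivalent expression $B J^T (J B J^T + R)^{-1}$, a standard identity for the Kalman gain, and checks that the gradient of the cost vanishes at the estimate.

```python
import numpy as np

J = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])   # hypothetical constant Jacobian
mu = np.array([0.0, 1.0])                            # background (assumed)
B = np.diag([1.0, 0.5])                              # prior covariance (assumed)
R = 0.1 * np.eye(3)                                  # observation covariance (assumed)
h_mu = np.array([0.5, 1.0, 1.5])                     # arbitrary value of h at mu

def h(theta):
    """Linear model: h(theta) = h(mu) + J (theta - mu)."""
    return h_mu + J @ (theta - mu)

y = np.array([0.7, 1.6, 2.9])                        # made-up observations

R_inv = np.linalg.inv(R)
A = np.linalg.inv(np.linalg.inv(B) + J.T @ R_inv @ J)
K = A @ J.T @ R_inv                                  # Kalman matrix
theta_hat = mu + K @ (y - h(mu))

# Equivalent form of the Kalman matrix.
K_alt = B @ J.T @ np.linalg.inv(J @ B @ J.T + R)

# Gradient of the cost function at theta_hat.
grad = (np.linalg.solve(B, theta_hat - mu)
        + J.T @ np.linalg.solve(R, h(theta_hat) - y))
```

The second form of $K$ inverts a $d_z \times d_z$ matrix instead of a $d_h \times d_h$ one, so the cheaper of the two depends on the relative sizes of the parameter and the output.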

It can be proved that:

$$
   p(\boldsymbol{\theta} | \boldsymbol{y}) \propto 
   \exp\left(-\frac{1}{2} (\boldsymbol{\theta} - \hat{\boldsymbol{\theta}})^T A^{-1} (\boldsymbol{\theta} - \hat{\boldsymbol{\theta}}) \right)
$$

for any $\boldsymbol{\theta}\in\mathbb{R}^{d_h}$.

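This implies that the posterior distribution of $\boldsymbol{\theta}$
given $\boldsymbol{y}$ is Gaussian, with mean $\hat{\boldsymbol{\theta}}$
and covariance matrix $A$:

$$
\boldsymbol{\theta} | \boldsymbol{y} \sim \mathcal{N}(\hat{\boldsymbol{\theta}},A)
$$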
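
The quadratic form in $\boldsymbol{\theta} - \hat{\boldsymbol{\theta}}$ can be obtained by completing the square. The following sketch uses the shorthand $\boldsymbol{\delta} = \boldsymbol{\theta} - \boldsymbol{\mu}$ and $\boldsymbol{d} = \boldsymbol{y} - \boldsymbol{h}(\boldsymbol{\mu})$, introduced here only for the derivation. Since $\boldsymbol{h}$ is linear, $\boldsymbol{y} - \boldsymbol{h}(\boldsymbol{\theta}) = \boldsymbol{d} - J \boldsymbol{\delta}$, and the definitions of $A$ and $K$ give $A^{-1} K = J^T R^{-1}$, so that:

$$
\begin{aligned}
\|\boldsymbol{d} - J\boldsymbol{\delta}\|^2_R + \|\boldsymbol{\delta}\|^2_B
&= \boldsymbol{\delta}^T \left(B^{-1} + J^T R^{-1} J\right) \boldsymbol{\delta}
 - 2 \boldsymbol{\delta}^T J^T R^{-1} \boldsymbol{d}
 + \boldsymbol{d}^T R^{-1} \boldsymbol{d} \\
&= \boldsymbol{\delta}^T A^{-1} \boldsymbol{\delta}
 - 2 \boldsymbol{\delta}^T A^{-1} K \boldsymbol{d}
 + \boldsymbol{d}^T R^{-1} \boldsymbol{d} \\
&= (\boldsymbol{\delta} - K\boldsymbol{d})^T A^{-1} (\boldsymbol{\delta} - K\boldsymbol{d})
 + \boldsymbol{d}^T R^{-1} \boldsymbol{d}
 - \boldsymbol{d}^T K^T A^{-1} K \boldsymbol{d}.
\end{aligned}
$$

Because $\boldsymbol{\delta} - K\boldsymbol{d} = \boldsymbol{\theta} - \hat{\boldsymbol{\theta}}$ and the last two terms do not depend on $\boldsymbol{\theta}$, only the first term contributes to the dependence of the posterior density on $\boldsymbol{\theta}$.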