# Mapping Uncertainty using Ensemble Modelling
### Abstract
>When modelling physical systems it is sometimes useful to understand how uncertainty varies across the domain (the range of input variable values) of the system. One way of doing this is to build a representation of the uncertainty, either at the full dimensionality of the domain or as an aggregation of the uncertainty into a lower-dimensional subdomain. A model of the system may consist of a noise-free model of the expected average behaviour with a model of the residual noise added. Thus a simulation of the model might be $p = p_m + p_n$, where $p$ is the simulated property value, $p_m$ is the modelled part of the simulation and $p_n$ is a simulation of the noise model.
>
>The uncertainty in the model can be expressed using maps of descriptive statistics of the model over the domain. Useful statistics include the mean, variance, standard deviation and relative standard deviation (RSD). To obtain these descriptive statistics it is necessary to produce a sample of models and calculate the statistics from it. Given that the simulated value is the sum of the noise-free and noise models, it is possible to compute the mean and variance of the noise-free model, $\mu_m$ and $\sigma_m^2$, and of the noise, $\mu_n$ and $\sigma_n^2$, independently and combine them to produce the mean $\mu$ and variance $\sigma^2$ of the simulations. Other statistics can be derived from the mean and variance using the definitions of the standard deviation, $\sigma = \sqrt{\sigma^2}$, and of the relative standard deviation, $\mathrm{RSD} = \frac{\sigma}{\mu}$.
>
>The case considered here is one where the uncertainty for an ensemble of noise-free models is combined with a noise model based on a [*Gaussian Process*](https://en.wikipedia.org/wiki/Gaussian_process).

## Introduction
This is a study of how to map the uncertainty of a modelling process. For the purposes of this study a single property will be modelled, so the model is a function from a vector $\mathbf{x}$ to a single estimated value $\hat{p}$, that is $\hat{p} = \mathrm{f}\left(\mathbf{x}\right): \mathbb{R}^n \mapsto \mathbb{R}$. A map of the uncertainty is the distribution of the uncertainty over the domain (or a subdomain) of the model. For variance, it is a map of the variance as a function of position in the domain, $\sigma^2\left(\mathbf{x^\prime}\right) = \mathbb{E}\left[\left(\mathrm{f}\left(\mathbf{x^\prime}\right) - \mathbb{E}\left[\mathrm{f}\left(\mathbf{x^\prime}\right)\right]\right)^2\right] : \mathbb{R}^m \mapsto \mathbb{R}$, where the expectation is over all possible models $\mathrm{f}\left(\mathbf{x^\prime}\right)$ and $m \le n$. If $m \lt n$, the map is an aggregation of the uncertainty in the missing dimensions into a subdomain.
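
As a rough illustration of this definition, the sketch below estimates a variance map from a small ensemble of models evaluated on a grid. The ensemble (`slopes`, and the linear form `a * x + y`), the helper name `variance_map`, and the choice to aggregate by averaging the variance along the dropped axis are all assumptions made for the example; other aggregation rules are possible.

```python
import numpy as np

def variance_map(ensemble, aggregate_axes=None, ddof=1):
    """Estimate the variance map from an ensemble of model evaluations.

    ensemble has shape (n_models, *grid_shape); axis 0 indexes the models.
    aggregate_axes names grid axes (numbered without the model axis) to
    collapse. Averaging the variance along them is an assumption.
    """
    values = np.asarray(ensemble, dtype=float)
    var = values.var(axis=0, ddof=ddof)  # sample variance across models
    if aggregate_axes is None:
        return var
    return var.mean(axis=aggregate_axes)

# Hypothetical ensemble: f_a(x, y) = a * x + y for three values of a.
slopes = np.array([1.0, 2.0, 3.0])
x = np.array([0.0, 1.0, 2.0])
y = np.array([0.0, 1.0])
xx, yy = np.meshgrid(x, y, indexing="ij")
ensemble = np.stack([a * xx + yy for a in slopes])  # shape (3, 3, 2)

full_map = variance_map(ensemble)                     # m = n = 2
x_only_map = variance_map(ensemble, aggregate_axes=1)  # m = 1 < n
```

Here the models differ only in their slope along `x`, so the variance grows as `x**2` and does not depend on `y`.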

It is assumed that the modelling uses some data $\mathbf{d}$ to which models can be fitted. The fitting process will produce multiple possible models rather than a single "best case" model. After fitting there will be multiple sets of residuals, given by $\mathbf{r_i} = \mathbf{d} - \mathbf{p_{m,i}}$, where $\mathbf{r_i}$ are the residuals for model $i$, calculated by subtracting the values predicted by that model, $\mathbf{p_{m,i}}$, from the observed data. Analysis of each set of residuals will produce its own model of the noise.
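
The residual calculation can be sketched with array broadcasting, so that one row of residuals is produced per fitted model. The function name `ensemble_residuals` and the numbers used are illustrative assumptions, not values from the study.

```python
import numpy as np

def ensemble_residuals(data, predictions):
    """Return r_i = d - p_{m,i} for every model i.

    data has shape (n_data,); predictions has shape (n_models, n_data).
    The result has shape (n_models, n_data), one row per model.
    """
    data = np.asarray(data, dtype=float)
    predictions = np.asarray(predictions, dtype=float)
    return data[np.newaxis, :] - predictions

# Hypothetical observations and predictions from two fitted models.
d = np.array([1.0, 2.0, 4.0])
p_m = np.array([[1.0, 2.0, 3.0],
                [0.5, 2.5, 4.0]])
r = ensemble_residuals(d, p_m)
```

Each row of `r` would then be analysed separately to build the noise model for the corresponding fitted model.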

In this study the residual noise will be modelled using a Gaussian Process. Gaussian Processes can be ["completely defined by their second-order statistics"](https://en.wikipedia.org/wiki/Gaussian_process#Covariance_functions); this means that, for a process assumed to have zero mean, only the variance (and covariance) needs to be modelled. The model is typically a function which describes the covariance between two points as a function of their separation. A feature of Gaussian Processes is that the mean and variance can be estimated directly from the model, without the need to sample simulations of the model.
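
To make the last point concrete, the sketch below computes the mean and variance of a zero-mean Gaussian Process conditioned on some residuals using the standard closed-form expressions, with no sampling. The squared-exponential covariance, its parameters (`variance`, `length`), the small `jitter` term and the data values are assumptions chosen for illustration; the study may use a different covariance function.

```python
import numpy as np

def sq_exp_cov(xa, xb, variance=1.0, length=1.0):
    """Squared-exponential covariance as a function of separation |xa - xb|."""
    sep = np.abs(xa[:, None] - xb[None, :])
    return variance * np.exp(-0.5 * (sep / length) ** 2)

def gp_predict(x_obs, r_obs, x_new, variance=1.0, length=1.0, jitter=1e-10):
    """Closed-form mean and variance of a zero-mean GP conditioned on r_obs."""
    # jitter is a tiny diagonal term added for numerical stability only.
    k_oo = sq_exp_cov(x_obs, x_obs, variance, length) + jitter * np.eye(len(x_obs))
    k_no = sq_exp_cov(x_new, x_obs, variance, length)
    mean = k_no @ np.linalg.solve(k_oo, r_obs)
    # Diagonal of K_nn - K_no K_oo^{-1} K_on; the prior variance is constant.
    reduction = np.einsum("ij,ji->i", k_no, np.linalg.solve(k_oo, k_no.T))
    return mean, variance - reduction

# Hypothetical 1D residuals at three locations.
x_obs = np.array([0.0, 1.0, 2.0])
r_obs = np.array([0.3, -0.1, 0.2])
x_new = np.array([1.0, 10.0])  # one observed location, one far away
mean, var = gp_predict(x_obs, r_obs, x_new)
```

At the observed location the mean reproduces the residual and the variance collapses towards zero; far from the data the mean returns to zero and the variance returns to the prior value.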

## Case Study: Uncertainty Map for a 3D Property
As an example, a 3-dimensional property will be studied. The simulations will take place on a regular grid of cells indexed by $ijk$. From the available data a number $n_m$ of models were produced; for each model a model of the noise is made and used to generate a large number $N$ of simulations. Thus, for a single model and a single simulation of the noise, the value for a cell is:
$$
p_{ijk} = p_{m,ijk} + p_{n,ijk}
$$
where $p_{ijk}$ is the simulated value for cell $ijk$, $p_{m,ijk}$ is the value predicted by the model for cell $ijk$ and $p_{n,ijk}$ is the value of the noise for cell $ijk$.
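
A single-model version of this sum can be sketched on a small grid as follows. The grid size, the form of the noise-free model `p_m`, and the use of independent Gaussian noise per cell (rather than a spatially correlated Gaussian Process simulation) are simplifying assumptions for the example only.

```python
import numpy as np

rng = np.random.default_rng(0)

grid_shape = (4, 3, 2)  # cells indexed by i, j, k
n_sims = 20000          # N simulations of the noise

# Hypothetical noise-free model evaluated on the grid.
i, j, k = np.indices(grid_shape)
p_m = 10.0 + i + 0.5 * j - 0.25 * k

# Stand-in noise: independent zero-mean Gaussian values for every cell.
noise_sd = 0.5
p_n = rng.normal(0.0, noise_sd, size=(n_sims, *grid_shape))

# p_ijk = p_m,ijk + p_n,ijk for every simulation.
p = p_m[np.newaxis] + p_n

# Per-cell descriptive statistics across the simulations.
mean_map = p.mean(axis=0)
var_map = p.var(axis=0, ddof=1)
rsd_map = np.sqrt(var_map) / mean_map
```

With a single fixed model, the per-cell mean approaches `p_m` and the per-cell variance approaches the noise variance as the number of simulations grows.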