# FOSM - a brief overview (with equations!)

## FOSM = "First Order, Second Moment", a name that describes the mathematics: a first-order (linear) approximation of the model is used to propagate second moments (variances and covariances)

## FOSM = "linear uncertainty analysis", page 460 in Anderson et al. (2015), PEST parlance

<img src="bayes.png" style="float: left; width: 25%; margin-right: 1%; margin-bottom: 0.5em;">
<img src="jacobi.jpg" style="float: left; width: 25%; margin-right: 1%; margin-bottom: 0.5em;">
<img src="gauss.jpg" style="float: left; width: 22%; margin-right: 1%; margin-bottom: 0.5em;">
<img src="schur.jpg" style="float: left; width: 22%; margin-right: 1%; margin-bottom: 0.5em;">
<p style="clear: both;">

## $\underbrace{P(\theta|\textbf{D})}_{\substack{\text{what we} \\ \text{know now}}} \propto \underbrace{\mathcal{L}(\theta | \textbf{D})}_{\substack{\text{what we} \\ \text{learned}}} \underbrace{P(\theta)}_{\substack{\text{what we} \\ \text{knew}}} $






## This in a nutshell is the famous Bayes Rule.
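
### As a minimal sketch of the rule (not part of the original material), consider a single parameter evaluated on a grid. The prior, the noise level and the observed value below are made-up numbers, and both the prior and the likelihood are assumed Gaussian. The point is only that multiplying the prior by the likelihood and renormalizing gives a narrower posterior.

```python
import numpy as np

# Grid of candidate values for a single (hypothetical) parameter theta
theta = np.linspace(-10.0, 10.0, 4001)
dtheta = theta[1] - theta[0]

# Assumed numbers, for illustration only
prior_mean, prior_sd = 0.0, 2.0   # "what we knew"
obs, noise_sd = 1.5, 1.0          # one noisy observation of theta

prior = np.exp(-0.5 * ((theta - prior_mean) / prior_sd) ** 2)
likelihood = np.exp(-0.5 * ((obs - theta) / noise_sd) ** 2)

# Bayes: posterior is proportional to likelihood times prior, then normalize
posterior = likelihood * prior
posterior /= posterior.sum() * dtheta

post_mean = (theta * posterior).sum() * dtheta
post_var = ((theta - post_mean) ** 2 * posterior).sum() * dtheta

# Closed-form result for the Gaussian-Gaussian case, for comparison
analytic_var = 1.0 / (1.0 / prior_sd**2 + 1.0 / noise_sd**2)
analytic_mean = analytic_var * (prior_mean / prior_sd**2 + obs / noise_sd**2)

print("prior variance:    ", prior_sd**2)
print("posterior variance:", post_var, "(analytic:", analytic_var, ")")
print("posterior mean:    ", post_mean, "(analytic:", analytic_mean, ")")
```

### The grid approach works for one parameter, but it becomes impractical as the number of parameters grows.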

### We can also think of this graphically, as taken from Anderson et al. (2015) in slightly different notation but the same equation and concept:

<img src="AW&H2015.png" style="float: right">

<img src="Fig10.3_Bayes_figure.png" style="float: center">

## The problem is that, in real-world applications, the likelihood function  $\mathcal{L}(\theta | \textbf{D})$ is high-dimensional and non-parametric, so rigorous Bayes requires non-linear (typically Monte Carlo) integration

## But, we can make some assumptions and greatly reduce computational burden. This is why we often suggest using these linear methods first before burning the silicon on the non-linear ones like Monte Carlo.  

## How do we reduce the computational burden?  By using these shortcuts:

## 0.) an approximate linear relation between pars and obs  
<img src="jacobi.jpg" width="200" height="200">

##     <center> $\mathbf{J} \approx \text{constant}$, $\frac{\partial\text{obs}}{\partial\text{par}} \approx \text{constant}$</center>


## 1.) The parameter and forecast posterior distributions are approximately Gaussian
<img src="gauss.jpg" width="200" height="200">
##  <center>  $ P(\boldsymbol{\theta}|\mathbf{D}) \approx \mathcal{N}(\overline{\boldsymbol{\mu}}_{\boldsymbol{\theta}},\overline{\boldsymbol{\Sigma}}_{\boldsymbol{\theta}})$ </center>
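
### The first shortcut (a constant Jacobian) can be illustrated with a small sketch. The "model" below is a made-up linear function, and `finite_difference_jacobian` is a hypothetical helper, not a PEST or pyemu routine. For a linear model the finite-difference Jacobian is the same no matter which parameter values we perturb around; for a real, non-linear model this only holds approximately.

```python
import numpy as np

# Hypothetical linear "model": 3 observations computed from 2 parameters
A = np.array([[1.0, 0.5],
              [0.0, 2.0],
              [-1.0, 1.0]])

def toy_model(par):
    """Return simulated observations for a parameter vector."""
    return A @ par

def finite_difference_jacobian(model, par, rel_step=0.01):
    """Forward-difference Jacobian: rows are observations, columns are parameters."""
    base = model(par)
    jac = np.zeros((base.size, par.size))
    for j in range(par.size):
        # perturb one parameter at a time by a fraction of its value
        delta = rel_step * abs(par[j]) if par[j] != 0.0 else rel_step
        perturbed = par.copy()
        perturbed[j] += delta
        jac[:, j] = (model(perturbed) - base) / delta
    return jac

jac_a = finite_difference_jacobian(toy_model, np.array([1.0, 2.0]))
jac_b = finite_difference_jacobian(toy_model, np.array([10.0, -3.0]))

print(jac_a)
print("same Jacobian at both parameter sets:", np.allclose(jac_a, jac_b))
```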

## Armed with these two assumptions, one can derive from Bayes' equation the Schur complement for conditional uncertainty propagation:
<img src="schur.jpg" width="200" height="200">


## <center> $\underbrace{\overline{\boldsymbol{\Sigma}}_{\boldsymbol{\theta}}}_{\substack{\text{what we} \\ \text{know now}}} = \underbrace{\boldsymbol{\Sigma}_{\boldsymbol{\theta}}}_{\substack{\text{what we} \\ \text{knew}}} - \underbrace{\boldsymbol{\Sigma}_{\boldsymbol{\theta}}\mathbf{J}^T\left[\mathbf{J}\boldsymbol{\Sigma}_{\boldsymbol{\theta}}\mathbf{J}^T + \boldsymbol{\Sigma}_{\boldsymbol{\epsilon}}\right]^{-1}\mathbf{J}\boldsymbol{\Sigma}_{\boldsymbol{\theta}}}_{\text{what we learned}}$ </center>

## some remarks:
## 0.) no parameter values or observation values
## 1.) "us + data" = $\overline{\Sigma}_{\theta}$; "us" = $\Sigma_{\theta}$
## 2.) the '-' on the RHS shows that we are (hopefully) collapsing the probability manifold in parameter space by "learning" from the data. Or put another way, we are subtracting from the uncertainty we started with (we started with the Prior uncertainty)
## 3.) uncertainty in our measurements of the world is encapsulated in $\Sigma_{\epsilon}$. If the "observations" are highly uncertain, then parameter "learning" decreases because $\Sigma_{\epsilon}$ sits inside the inverted term (it acts like a denominator). Put another way, if our measured data are assumed to be accurate and precise, then the uncertainty of the parameters constrained by these data is reduced more - we "learn" more. 
## 4.) what quantities are needed? $\mathbf{J}$, $\boldsymbol{\Sigma}_{\theta}$, and $\boldsymbol{\Sigma}_{\epsilon}$
## 5.) the diagonals of $\Sigma_{\theta}$ and $\overline{\Sigma}_{\theta}$ are the Prior and Posterior uncertainty (variance) of each adjustable parameter
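
### The Schur complement equation and remark 3 can be sketched in a few lines of numpy. The Jacobian, prior and noise values below are invented for illustration, and `schur_posterior_cov` is a hypothetical helper rather than the pyemu implementation. Only the three matrices are needed; no parameter or observation values appear.

```python
import numpy as np

def schur_posterior_cov(jac, prior_cov, noise_cov):
    """Posterior parameter covariance: prior minus the "what we learned" term."""
    innovation = jac @ prior_cov @ jac.T + noise_cov
    learned = prior_cov @ jac.T @ np.linalg.solve(innovation, jac @ prior_cov)
    return prior_cov - learned

# Made-up problem: 2 observations, 3 parameters
jac = np.array([[1.0, 0.5, 0.0],
                [0.2, 2.0, 0.1]])
prior_cov = 25.0 * np.eye(3)

# Same Jacobian and prior, two different levels of observation noise
post_precise = schur_posterior_cov(jac, prior_cov, 1.0 * np.eye(2))
post_noisy = schur_posterior_cov(jac, prior_cov, 100.0 * np.eye(2))

print("prior variances:              ", np.diag(prior_cov))
print("posterior, precise data:      ", np.diag(post_precise))
print("posterior, noisy data:        ", np.diag(post_noisy))
```

### Noisier observations leave the posterior variances closer to the prior, which is the "denominator" effect described in remark 3.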

# But what about forecasts? We can use the same assumptions:
<img src="jacobi.jpg" style="float: left; width: 25%; margin-right: 1%; margin-bottom: 0.5em;">
<img src="gauss.jpg" style="float: left; width: 22%; margin-right: 1%; margin-bottom: 0.5em;">
<p style="clear: both;">

## prior forecast uncertainty (variance): $\sigma^2_{s} = \mathbf{y}^T\boldsymbol{\Sigma}_{\boldsymbol{\theta}}\mathbf{y}$
## posterior forecast uncertainty (variance): $\overline{\sigma}^2_{s} = \mathbf{y}^T\overline{\boldsymbol{\Sigma}}_{\boldsymbol{\theta}}\mathbf{y}$
## some remarks:
## 0.) no parameter values or forecast values
## 1.) what's needed? $\mathbf{y}$, which is the *sensitivity of a given forecast* to each adjustable parameter. Each forecast will have its own $\mathbf{y}$
## 2.) How do I get $\mathbf{y}$? The easiest way is to include your forecast(s) as observations in the control file - then we get the $\mathbf{y}$'s for free during the parameter estimation process.
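
### A minimal sketch of these two forecast equations, assuming the forecast was carried as an extra row of the Jacobian as described in remark 2. All numbers and row names are invented for illustration.

```python
import numpy as np

# Hypothetical full Jacobian: two observation rows plus one forecast row
row_names = ["obs1", "obs2", "forecast1"]
full_jac = np.array([[1.0, 0.5, 0.0],
                     [0.2, 2.0, 0.1],
                     [3.0, -1.0, 4.0]])

# Split the forecast sensitivity vector y from the observation Jacobian J
is_forecast = np.array([name.startswith("forecast") for name in row_names])
jac = full_jac[~is_forecast]
y = full_jac[is_forecast][0]

prior_cov = 25.0 * np.eye(3)   # assumed prior parameter covariance
noise_cov = np.eye(2)          # assumed observation noise covariance

# Posterior parameter covariance via the Schur complement
innovation = jac @ prior_cov @ jac.T + noise_cov
post_cov = prior_cov - prior_cov @ jac.T @ np.linalg.solve(innovation, jac @ prior_cov)

# Forecast variances: y^T Sigma y with the prior and posterior covariances
prior_fore_var = float(y @ prior_cov @ y)
post_fore_var = float(y @ post_cov @ y)

print("prior forecast variance:    ", prior_fore_var)
print("posterior forecast variance:", post_fore_var)
```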

# Mechanics of calculating FOSM parameter and forecast uncertainty estimates

## in the PEST world:
<img src="workflow.png" width="1000" height="200">

## in PEST++
<img src="workflow++.png" width="1000" height="200">

# Hands on:  Demystifying matrices and vectors used in FOSM

### Pages 461-465 of Anderson et al. use the PREDUNC equation of PEST to discuss an applied view of FOSM, what goes into it, and what it means in practice.  Here we will look more closely at these.  The objective is to get a better feel for what is going on under the hood in linear uncertainty analyses. 

In [2]:
%matplotlib inline
import os
import numpy as np
import matplotlib.pyplot as plt
import pyemu



setting random seed


In [3]:
# load the Jacobian (.jcb) and build a Schur object for the FOSM analysis
sc = pyemu.Schur(jco="freyberg_zones_alt.jcb")

### Recall that a Jacobian matrix records how the simulated observations change as each parameter is changed.  It therefore has parameters in the columns and observations in the rows.  Each entry is built from the difference in an observation between a base run and a run in which the parameter at the column head was perturbed (typically by 1% of the base run value - controlled by the "parameter groups" info).  Now we'll display the Jacobian matrix from the freyberg_zones activity:

In [4]:
sc.jco.to_dataframe().loc[sc.pst.nnz_obs_names,:]

| observation | hk1 | hk2 | hk3 | hk4 | hk5 | hk6 |
|---|---|---|---|---|---|---|
| cr03c16 | 0.0 | -0.006942 | -0.002314 | 0.196697 | -0.004628 | -0.437361 |
| cr03c10 | 0.0 | -0.143473 | -0.057852 | -0.888606 | -0.004628 | -0.178184 |
| cr04c9 | 0.0 | -0.180498 | -0.053224 | -1.059848 | -0.006942 | -0.210581 |
| cr10c2 | 0.0 | -1.916057 | -1.832751 | -1.360678 | -0.057852 | -0.35174 |
| cr14c11 | 0.0 | -0.01157 | -0.004628 | 0.344798 | -0.013884 | -0.724307 |
| cr16c17 | 0.0 | 0.0 | 0.0 | 0.039339 | -0.194383 | -0.131902 |
| cr22c11 | 0.0 | 0.048596 | -0.016199 | -0.037025 | -0.069422 | -0.597032 |
| cr23c16 | 0.0 | 0.002314 | 0.0 | 0.009256 | -0.398022 | 0.138845 |
| cr25c5 | 0.0 | 0.323971 | -0.141159 | -1.737873 | -0.199011 | -0.914061 |
| cr27c7 | 0.0 | 0.182812 | 0.013884 | -0.740505 | -0.231408 | -1.064476 |


### This reports how the observations change in response to a change in a parameter.  We can also report how the forecasts of interest change as each parameter is perturbed.  Note that pyemu extracted the forecast rows from the Jacobian on instantiation:

In [5]:
sc.forecasts.to_dataframe()

| parameter | rivflux_cal | rivflux_fore | travel_time | fr03c16 | fr04c9 |
|---|---|---|---|---|---|
| hk1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| hk2 | 53.455223 | 60.628868 | -1724.220207 | -0.004628 | -0.180498 |
| hk3 | 57.157749 | 92.563157 | -217.060603 | -0.002314 | -0.069422 |
| hk4 | -183.275051 | 1.388447 | -1170.923936 | 0.222152 | -0.914061 |
| hk5 | -699.777467 | -906.656123 | -1719.823457 | -0.002314 | -0.006942 |
| hk6 | -1716.120931 | -1381.967934 | -1328.281303 | -0.393393 | -0.17587 |


### But the forecasts also have uncertainty because we have inherent uncertainty in the parameters.  Here's what we have defined for parameter uncertainty - the Prior.  It was constructed on-the-fly from the parameter bounds in the control file: 

In [6]:
sc.parcov.to_dataframe()

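| parameter | hk1 | hk2 | hk3 | hk4 | hk5 | hk6 |
|---|---|---|---|---|---|---|
| hk1 | 25.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| hk2 | 0.0 | 25.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| hk3 | 0.0 | 0.0 | 25.0 | 0.0 | 0.0 | 0.0 |
| hk4 | 0.0 | 0.0 | 0.0 | 25.0 | 0.0 | 0.0 |
| hk5 | 0.0 | 0.0 | 0.0 | 0.0 | 25.0 | 0.0 |
| hk6 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 25.0 |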


### Pages 463-464 in Anderson et al. (2015) spend some time on what is shown above.  For our purposes, a diagonal Prior - numbers only along the diagonal - shows that we expect the uncertainty of each parameter to result only from itself - there is no covariance with other parameters. The numbers themselves reflect "the innate parameter variability", and enter the maths as a variance (the squared standard deviation) around the parameter value.  This is called the "C(p) matrix of innate parameter variability" in PEST parlance.

## IMPORTANT POINT:  Again, how did PEST++ get the variances shown on the diagonal?  From the *parameter bounds* that were specified for each parameter in the PEST control file.
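
### One way such a bounds-based Prior could be built is sketched below. This is an illustration under stated assumptions, not the PEST++ or pyemu source: the bounds are hypothetical, the parameters are treated as untransformed, and the bounds are assumed to span four standard deviations (roughly a 95% interval). For log-transformed parameters the same arithmetic would be applied to the log10 of the bounds.

```python
import numpy as np

# Hypothetical parameter names and bounds (not taken from the Freyberg control file)
par_names = ["par_a", "par_b", "par_c"]
lower = np.array([5.0, 5.0, 5.0])
upper = np.array([25.0, 25.0, 25.0])

# Assumption: the bounds span this many standard deviations
sigma_range = 4.0

std = (upper - lower) / sigma_range
prior_cov = np.diag(std ** 2)   # diagonal: no prior correlation between parameters

print(prior_cov)
```

### With these made-up bounds each standard deviation is 5, so each diagonal entry is a variance of 25.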

### On pages 462-463, Anderson et al. (2015) also point out that forecast uncertainty has to take into account the noise/uncertainty in the observations.  Similar to the parameter Prior (the $\Sigma_{\theta}$ matrix), this is a covariance matrix, here of the measurement error associated with the observations.  It is the same $\Sigma_{\epsilon}$ that we discussed above. For our Freyberg problem, say each observation had a standard deviation of 1 around the observed value.  The $C(\epsilon)$ matrix would look like:

In [7]:
sc.obscov.to_dataframe().loc[sc.pst.nnz_obs_names,sc.pst.nnz_obs_names]

Unnamed: 0,cr03c16,cr03c10,cr04c9,cr10c2,cr14c11,cr16c17,cr22c11,cr23c16,cr25c5,cr27c7,cr30c16,cr34c8,cr35c11
cr03c16,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
cr03c10,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
cr04c9,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
cr10c2,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
cr14c11,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
cr16c17,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
cr22c11,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
cr23c16,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
cr25c5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0
cr27c7,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0


## IMPORTANT POINT:  How did PEST++ get the variances shown on the diagonal?  From the *weights* that were specified for each observation in the PEST control file.
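
### The weight-to-variance step can be sketched as follows, assuming each weight is the inverse of the measurement standard deviation (a common PEST convention, treated here as an assumption). The observation names and weights are hypothetical. Zero-weighted observations carry no information and are left out, which mirrors the use of `nnz_obs_names` above.

```python
import numpy as np

# Hypothetical observation names and weights; a zero weight means "not used in the fit"
obs_names = ["obs_a", "obs_b", "obs_c", "obs_d"]
weights = np.array([1.0, 1.0, 0.0, 2.0])

# Keep only the non-zero weighted observations
nonzero = weights > 0.0
nnz_names = [name for name, keep in zip(obs_names, nonzero) if keep]

# Assumption: weight = 1 / standard deviation, so variance = (1 / weight)^2
std = 1.0 / weights[nonzero]
obscov = np.diag(std ** 2)

print(nnz_names)
print(obscov)
```

### A weight of 1 gives a variance of 1, as in the matrix above; doubling the weight quarters the variance.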

## IMPORTANT POINT: You can use FOSM in the "pre-calibration" state to design an objective function (e.g. weights) to maximize forecast uncertainty reduction.

## IMPORTANT POINT: In PEST++, if a given observation has a larger-than-expected residual, the variance of said observation is reset to the variance implied by the residual.  That is, the diagonal elements of $\Sigma_{\epsilon}$ are reset according to the residuals

### Okay, enough emphasis.  Here's the point.  When we apply FOSM using these matrices above we can see how our uncertainty changes during calibration, first for parameters and then for forecasts:

In [8]:
sc.get_parameter_summary()

| parameter | percent_reduction | post_var | prior_var |
|---|---|---|---|
| hk1 | 0.0 | 25.0 | 25.0 |
| hk2 | 83.355118 | 4.16122 | 25.0 |
| hk3 | 77.741373 | 5.564657 | 25.0 |
| hk4 | 98.908852 | 0.272787 | 25.0 |
| hk5 | 98.065483 | 0.483629 | 25.0 |
| hk6 | 98.639404 | 0.340149 | 25.0 |


### Where did the "prior_var" and "post_var" columns come from?

### Why did uncertainty in "hk1" not change?

In [9]:
sc.get_forecast_summary()

| forecast | percent_reduction | post_var | prior_var |
|---|---|---|---|
| rivflux_cal | 98.514221 | 1290575.0 | 86861840.0 |
| rivflux_fore | 98.250037 | 1200521.0 | 68602660.0 |
| travel_time | 93.271596 | 15329390.0 | 227830900.0 |
| fr03c16 | 98.378174 | 0.08277065 | 5.103546 |
| fr04c9 | 98.599511 | 0.3164704 | 22.59713 |
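
### As a cross-check, some of the numbers in the summaries above can be reproduced by hand with numpy. The sketch below copies the forecast sensitivities for `fr03c16` and `fr04c9` and the diagonal Prior (25 for every parameter) from the printed tables, then applies $\mathbf{y}^T\boldsymbol{\Sigma}_{\boldsymbol{\theta}}\mathbf{y}$. The `percent_reduction` helper is a hypothetical name; its formula is inferred from the table values, not taken from the pyemu source.

```python
import numpy as np

# Forecast sensitivities copied from the sc.forecasts output (rows hk1..hk6)
y_fr03c16 = np.array([0.0, -0.004628, -0.002314, 0.222152, -0.002314, -0.393393])
y_fr04c9 = np.array([0.0, -0.180498, -0.069422, -0.914061, -0.006942, -0.17587])

# Prior parameter covariance copied from the sc.parcov output: 25 on the diagonal
prior_cov = 25.0 * np.eye(6)

# Prior forecast variance: y^T Sigma y
prior_var_fr03c16 = float(y_fr03c16 @ prior_cov @ y_fr03c16)
prior_var_fr04c9 = float(y_fr04c9 @ prior_cov @ y_fr04c9)

def percent_reduction(prior_var, post_var):
    """Percentage of the prior variance removed by conditioning on the data."""
    return 100.0 * (1.0 - post_var / prior_var)

# prior_var and post_var values copied from the two summary outputs
pr_hk2 = percent_reduction(25.0, 4.16122)
pr_fr03c16 = percent_reduction(5.103546, 0.08277065)

print("prior_var fr03c16:", prior_var_fr03c16)
print("prior_var fr04c9: ", prior_var_fr04c9)
print("percent_reduction hk2:    ", pr_hk2)
print("percent_reduction fr03c16:", pr_fr03c16)
```

### The posterior variances cannot be checked the same way from this page alone, because that requires the full Jacobian for all non-zero weighted observations.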
