# FOSM - a brief overview (with equations!)

## FOSM = "First Order, Second Moment", which is the mathematical name for the approach described here

## FOSM = "linear uncertainty analysis", page 460 in Anderson et al. (2015), PEST parlance

<img src="figs/bayes.png" style="float: left; width: 25%; margin-right: 1%; margin-bottom: 0.5em;">
<img src="figs/jacobi.jpg" style="float: left; width: 25%; margin-right: 1%; margin-bottom: 0.5em;">
<img src="figs/gauss.jpg" style="float: left; width: 22%; margin-right: 1%; margin-bottom: 0.5em;">
<img src="figs/schur.jpg" style="float: left; width: 22%; margin-right: 1%; margin-bottom: 0.5em;">
<p style="clear: both;">

## $\underbrace{P(\boldsymbol{\theta}|\textbf{d})}_{\substack{\text{what we} \\ \text{know now}}} \propto \underbrace{\mathcal{L}(\boldsymbol{\theta} | \textbf{d})}_{\substack{\text{what we} \\ \text{learned}}} \underbrace{P(\boldsymbol{\theta})}_{\substack{\text{what we} \\ \text{knew}}} $






## This, in a nutshell, is the famous Bayes' Rule.
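
To make the "knew / learned / know now" reading of Bayes' Rule concrete, here is a minimal one-parameter sketch evaluated on a grid. Everything in it is made up for illustration (the prior, the single datum `d = 1.5`, the noise level, and the helper names `gaussian` and `grid_mean_std`); it is not part of the Freyberg workflow.

```python
import numpy as np

# Hypothetical one-parameter problem; all numbers are illustrative assumptions.
theta = np.linspace(-20.0, 20.0, 4001)   # grid of candidate parameter values
dtheta = theta[1] - theta[0]

def gaussian(x, mean, std):
    """Gaussian probability density."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

prior = gaussian(theta, mean=0.0, std=2.0)        # what we knew
likelihood = gaussian(1.5, mean=theta, std=1.0)   # what we learned from d = 1.5

posterior = likelihood * prior                    # Bayes: proportional to the product
posterior /= posterior.sum() * dtheta             # normalize so it integrates to 1

def grid_mean_std(pdf):
    """Mean and standard deviation of a density tabulated on the theta grid."""
    mean = np.sum(theta * pdf) * dtheta
    var = np.sum((theta - mean) ** 2 * pdf) * dtheta
    return mean, np.sqrt(var)

prior_mean, prior_std = grid_mean_std(prior)
post_mean, post_std = grid_mean_std(posterior)
print(f"prior:     mean={prior_mean:.3f}, std={prior_std:.3f}")
print(f"posterior: mean={post_mean:.3f}, std={post_std:.3f}")
```

The posterior is narrower than the prior and is pulled toward the datum, which is the "learning" the equation describes.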

### We can also think of this graphically, as taken from Anderson et al. (2015) in slightly different notation but the same equation and concept:

<img src="figs/AW&H2015.png" style="float: right">

<img src="figs/Fig10.3_Bayes_figure.png" style="float: center">

## The catch is that, for real-world problems, the likelihood function $\mathcal{L}(\boldsymbol{\theta} | \textbf{d})$ is high-dimensional and non-parametric, requiring non-linear (typically Monte Carlo) integration for rigorous Bayes

## But, we can make some assumptions and greatly reduce computational burden. This is why we often suggest using these linear methods first before burning the silicon on the non-linear ones like Monte Carlo.  

## How do we reduce the computational burden?  By using these shortcuts:

## 0.) an approximate linear relation between pars and obs  

<img src="figs/jacobi.jpg" style="float: left; width: 15%; margin-right: 1%; margin-bottom: 0.5em;">

##     <center> $\mathbf{J} \approx \text{constant}$, $\frac{\partial\text{obs}}{\partial\text{par}} \approx \text{constant}$</center>

## 1.) The parameter and forecast prior and posterior distributions are approximately Gaussian

<img src="figs/gauss.jpg" style="float: left; width: 10%; margin-right: 1%; margin-bottom: 0.5em;">

##  <center>  $ P(\boldsymbol{\theta}|\mathbf{d}) \approx \mathcal{N}(\overline{\boldsymbol{\mu}}_{\boldsymbol{\theta}},\overline{\boldsymbol{\Sigma}}_{\boldsymbol{\theta}})$ </center>

## Armed with these two assumptions, from Bayes equations, one can derive the Schur complement for conditional uncertainty propagation:
<img src="figs/schur.jpg" style="float: left; width: 10%; margin-right: 1%; margin-bottom: 0.5em;">

## <center> $\underbrace{\overline{\boldsymbol{\Sigma}}_{\boldsymbol{\theta}}}_{\substack{\text{what we} \\ \text{know now}}} = \underbrace{\boldsymbol{\Sigma}_{\boldsymbol{\theta}}}_{\substack{\text{what we} \\ \text{knew}}} - \underbrace{\boldsymbol{\Sigma}_{\boldsymbol{\theta}}\bf{J}^T\left[\bf{J}\boldsymbol{\Sigma}_{\boldsymbol{\theta}}\bf{J}^T + \boldsymbol{\Sigma}_{\boldsymbol{\epsilon}}\right]^{-1}\bf{J}\boldsymbol{\Sigma}_{\boldsymbol{\theta}}}_{\text{what we learned}}$ </center>

## some remarks:
## 0.) no parameter values or observation values
## 1.) "us + data" = $\overline{\Sigma}_{\theta}$; "us" = $\Sigma_{\theta}$
## 2.) the '-' on the RHS shows that we are (hopefully) collapsing the probability manifold in parameter space by "learning" from the data. Or put another way, we are subtracting from the uncertainty we started with (we started with the Prior uncertainty)
## 3.) uncertainty in our measurements of the world is encapsulated in $\Sigma_{\epsilon}$. If the "observations" are highly uncertain, then parameter "learning" decreases because $\Sigma_{\epsilon}$ sits inside the inverted term (the "denominator"). Put another way, if our measured data are made (assumed) to be accurate and precise, then uncertainty associated with the parameters that are constrained by these measured data is reduced - we "learn" more. 
## 4.) what quantities are needed? $\bf{J}$, $\boldsymbol{\Sigma}_{\theta}$, and $\boldsymbol{\Sigma}_{\epsilon}$
## 5.) the diagonals of $\Sigma_{\theta}$ and $\overline{\Sigma}_{\theta}$ hold the Prior and Posterior uncertainty (variance) of each adjustable parameter
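
The Schur complement equation above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the Jacobian `J`, the prior covariance `prior_cov`, and the noise covariance `noise_cov` are small made-up matrices, and `schur_posterior` is a hypothetical helper, not a pyEMU or PEST++ function.

```python
import numpy as np

# Made-up problem: 4 adjustable parameters, 3 observations (illustrative only).
rng = np.random.default_rng(0)
J = rng.normal(size=(3, 4))                  # hypothetical Jacobian (obs x par)
prior_cov = np.diag([1.0, 0.5, 2.0, 0.25])   # Sigma_theta: what we knew
noise_cov = np.diag([0.1, 0.1, 0.1])         # Sigma_epsilon: measurement noise

def schur_posterior(J, prior_cov, noise_cov):
    """Posterior parameter covariance from the Schur complement equation."""
    innovation = J @ prior_cov @ J.T + noise_cov
    # solve() applies the inverse of the bracketed term without forming it
    learned = prior_cov @ J.T @ np.linalg.solve(innovation, J @ prior_cov)
    return prior_cov - learned

post_cov = schur_posterior(J, prior_cov, noise_cov)

# Same problem with much noisier observations: we should learn less.
post_cov_noisy = schur_posterior(J, prior_cov, 100.0 * noise_cov)

print("prior variances:          ", np.diag(prior_cov))
print("posterior variances:      ", np.diag(post_cov))
print("posterior (noisy obs) var:", np.diag(post_cov_noisy))
```

Note that no parameter values or observation values appear anywhere: only `J`, `prior_cov`, and `noise_cov` are needed.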

# But what about forecasts? We can use the same assumptions:
<img src="figs/jacobi.jpg" style="float: left; width: 15%; margin-right: 1%; margin-bottom: 0.5em;">
<img src="figs/gauss.jpg" style="float: left; width: 12%; margin-right: 1%; margin-bottom: 0.5em;">
<p style="clear: both;">

## prior forecast uncertainty (variance): $\sigma^2_{s} = \mathbf{y}^T\boldsymbol{\Sigma}_{\boldsymbol{\theta}}\mathbf{y}$
## posterior forecast uncertainty (variance): $\overline{\sigma}^2_{s} = \mathbf{y}^T\overline{\boldsymbol{\Sigma}}_{\boldsymbol{\theta}}\mathbf{y}$
## some remarks:

## - no parameter values or forecast values
## - what's needed? $\bf{y}$, which is the *sensitivity of a given forecast* to each adjustable parameter. Each forecast will have its own $\bf{y}$
## -   How do I get $\bf{y}$? the easiest way is to include your forecast(s) as an observation in the control file - then we get the $\bf{y}$'s for free during the parameter estimation process.
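
As a minimal sketch of the two forecast variance equations above, the snippet below propagates a prior and a posterior parameter covariance through a forecast sensitivity vector. All matrices and the vector `y` are made-up values chosen for illustration, and the posterior covariance is computed inline with the Schur complement so the example is self-contained.

```python
import numpy as np

# Made-up problem: 3 parameters, 2 observations, 1 forecast (illustrative only).
prior_cov = np.diag([1.0, 4.0, 0.25])        # Sigma_theta
J = np.array([[1.0, 0.5, 0.0],
              [0.0, 1.0, 2.0]])              # hypothetical Jacobian (obs x par)
noise_cov = np.diag([0.5, 0.5])              # Sigma_epsilon
y = np.array([2.0, -1.0, 0.5])               # forecast sensitivity to each parameter

# Posterior parameter covariance via the Schur complement
innovation = J @ prior_cov @ J.T + noise_cov
post_cov = prior_cov - prior_cov @ J.T @ np.linalg.solve(innovation, J @ prior_cov)

prior_var = y @ prior_cov @ y                # y^T Sigma_theta y
post_var = y @ post_cov @ y                  # y^T Sigma_theta_bar y

print(f"prior forecast variance:     {prior_var:.4f}")
print(f"posterior forecast variance: {post_var:.4f}")
```

Each forecast has its own `y`, so repeating the last two products with a different vector gives the uncertainty of a different forecast at almost no extra cost.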

## Mechanics of calculating FOSM parameter and forecast uncertainty estimates

### in the PEST world:
<img src="figs/workflow.png" style="float: left; width: 50%; margin-right: 1%; margin-bottom: 0.5em;">



## in PEST++

<img src="figs/workflow++.png" style="float: left; width: 50%; margin-right: 1%; margin-bottom: 0.5em;">


# Hands on:  Demystifying matrices and vectors used in FOSM

Pages 461-465 of Anderson et al. use the PREDUNC equation of PEST to discuss an applied view of FOSM, what goes into it, and what it means in practice.  Here we will look more closely at these.  The objective is to get a better feel for what is going on under the hood in linear uncertainty analyses. 

In [None]:
%matplotlib inline
import os
import sys
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
sys.path.append("..")
import pyemu



# Let's check out the files that pestpp-glm created

### Let's look at the parameter uncertainty summary written by pestpp:

In [None]:
if os.path.exists(os.path.join('..','master_glm','freyberg_pp.par.usum.csv')):
    df = pd.read_csv(os.path.join('..','master_glm','freyberg_pp.par.usum.csv'),index_col=0)
else:
    df = pd.read_csv(os.path.join('pstfiles','freyberg_pp.par.usum.csv'),index_col=0)
df


In [None]:
df.loc['CONST_RECH14__CN'].to_frame().T

In [None]:
axes = pyemu.plot_utils.plot_summary_distributions(df.loc['WELFLUX_023'].to_frame().T,subplots=True)

### There is a similar file for forecasts:

In [None]:
if os.path.exists(os.path.join('..','master_glm','freyberg_pp.pred.usum.csv')):
    axes = pyemu.plot_utils.plot_summary_distributions(os.path.join('..','master_glm','freyberg_pp.pred.usum.csv'),subplots=True)
else:
    axes = pyemu.plot_utils.plot_summary_distributions(os.path.join('pstfiles','freyberg_pp.pred.usum.csv'),subplots=True)

### So that's cool!  Questions:
### - where do the prior parameter distro's come from?
### - where do the prior forecast distro's come from?
### - why are the posterior distro's different from the priors?

## pyemu does the same calculations, but it also allows you to do other, more exciting things...

# FOSM with pyEMU

### The ``Schur`` object is one of the primary objects for FOSM in pyEMU and the only one we will talk about in this class

In [None]:
if os.path.exists(os.path.join('..','master_glm','freyberg_pp.jcb')):
    sc = pyemu.Schur(os.path.join('..','master_glm','freyberg_pp.jcb'),verbose=True)
else:
    sc = pyemu.Schur(os.path.join('pstfiles','freyberg_pp.jcb'),verbose=True)


### Now that seemed too easy, right?  Well, under the hood the ``Schur`` object found the control file ("freyberg_pp.pst") and used it to build the prior parameter covariance matrix, $\boldsymbol{\Sigma}_{\theta}$, from the parameter bounds and the observation noise covariance matrix, $\boldsymbol{\Sigma}_{\epsilon}$, from the observation weights.  These are the ``Schur.parcov`` and ``Schur.obscov`` attributes.  

### The ``Schur`` object also found the "++forecasts()" optional pestpp argument in the control, found the associated rows in the Jacobian matrix file and extracted those rows to serve as forecast sensitivity vectors:

In [None]:
sc.pst.pestpp_options

### Recall that a Jacobian matrix describes how observations change as a parameter is changed.  The Jacobian matrix therefore has parameters in the columns and observations in the rows.  Each entry comes from the difference in an observation between a base run and a run where the parameter at the column head was perturbed (typically 1% from the base run value - controlled by the "parameter groups" info), divided by the size of that perturbation.  Now we'll display the Jacobian matrix from the freyberg_zones activity:

In [None]:
sc.jco.to_dataframe().loc[sc.pst.nnz_obs_names,:]
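
The perturbation-based construction of a Jacobian described above can be sketched as follows. This is a toy illustration, not how PEST++ is implemented: `toy_model` is a hypothetical stand-in for a model run, `finite_difference_jacobian` is a hypothetical helper, and the 1% relative forward difference is simply the typical value mentioned above.

```python
import numpy as np

def toy_model(par):
    """Hypothetical stand-in for a model run: 2 parameters -> 3 observations."""
    k, r = par
    return np.array([r / k, 2.0 * r / k + 1.0, k * r])

def finite_difference_jacobian(model, base_par, rel_step=0.01):
    """Forward-difference Jacobian with observations in rows, parameters in columns.

    Assumes every base parameter value is nonzero, since the perturbation
    is taken as a fraction of the base value.
    """
    base_par = np.asarray(base_par, dtype=float)
    base_obs = model(base_par)                      # the base run
    jac = np.zeros((base_obs.size, base_par.size))
    for j in range(base_par.size):
        perturbed = base_par.copy()
        delta = rel_step * base_par[j]              # e.g. 1% of the base value
        perturbed[j] += delta
        # one extra model run per adjustable parameter
        jac[:, j] = (model(perturbed) - base_obs) / delta
    return jac

jac = finite_difference_jacobian(toy_model, [2.0, 0.5])
print(jac)
```

The cost is one model run per adjustable parameter plus the base run, which is why filling the Jacobian dominates the run time of a FOSM analysis.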

### The Jacobian reports how observations change in response to a change in a parameter.  We can likewise report how forecasts of interest change as a parameter is perturbed.  Note pyemu extracted the forecast rows from the Jacobian on instantiation:

In [None]:
sc.forecasts.to_dataframe()

### Each of these columns is a $\bf{y}$ vector used in the FOSM calculations...that's it!


### But the forecasts also have uncertainty because we have inherent uncertainty in the parameters.  Here's what we have defined for parameter uncertainty - the Prior.  It was constructed on-the-fly from the parameter bounds in the control file: 

In [None]:
sc.parcov.to_dataframe()

### Pages 463-464 in Anderson et al. (2015) spend some time on what is shown above.  For our purposes, a diagonal Prior - numbers only along the diagonal - shows that we expect the uncertainty for each parameter to result only from itself - there is no covariance with other parameters. The numbers themselves reflect "the innate parameter variability", which is input into the maths as a standard deviation around the parameter value.  This is called the "C(p) matrix of innate parameter variability" in PEST parlance.

## IMPORTANT POINT:  Again, how did PEST++ and pyEMU get the variances shown on the diagonal?  From the *parameter bounds* that were specified for each parameter in the PEST control file.
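
One common way to turn bounds into a prior variance is sketched below. It rests on assumptions that should be checked against your own setup: the bounds are taken to span 4 standard deviations (roughly a 95% interval), and log-transformed parameters are handled in log10 space. The parameter names and values are made up, `prior_variance_from_bounds` is a hypothetical helper, and only the column names (`parlbnd`, `parubnd`, `partrans`) follow the PEST control file.

```python
import numpy as np
import pandas as pd

# Made-up parameter data using PEST control-file column names.
par = pd.DataFrame(
    {"parlbnd": [0.1, 0.5], "parubnd": [10.0, 2.0], "partrans": ["log", "none"]},
    index=["hk1", "rch"],
)

def prior_variance_from_bounds(par, sigma_range=4.0):
    """Prior variance per parameter, ASSUMING bounds span `sigma_range` std devs."""
    lb = par["parlbnd"].to_numpy(dtype=float).copy()
    ub = par["parubnd"].to_numpy(dtype=float).copy()
    is_log = (par["partrans"] == "log").to_numpy()
    # log-transformed parameters are treated in log10 space
    lb[is_log] = np.log10(lb[is_log])
    ub[is_log] = np.log10(ub[is_log])
    std = (ub - lb) / sigma_range
    return pd.Series(std ** 2, index=par.index, name="prior_var")

prior_var = prior_variance_from_bounds(par)
prior_cov = np.diag(prior_var.to_numpy())   # diagonal Prior: no parameter covariance
print(prior_var)
```

Under these assumptions, widening the bounds of a parameter directly inflates its prior variance, so the bounds should reflect expert knowledge rather than being arbitrary.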

### On pages 462-463, Anderson et al. (2015) also point out that forecast uncertainty has to take into account the noise/uncertainty in the observations.   Similar to the parameter Prior (the $\Sigma_{\theta}$ matrix), this is a covariance matrix, here of the measurement error associated with the observations.  This is the same $\Sigma_{\epsilon}$ that we discussed above. For our Freyberg problem, say each observation had a standard deviation of 1 around the observed value.  The $C(\epsilon)$ matrix would look like:

In [None]:
sc.obscov.to_dataframe().loc[sc.pst.nnz_obs_names,sc.pst.nnz_obs_names]

### IMPORTANT POINT:  How did PEST++ and pyEMU get the variances shown on the diagonal?  From the *weights* that were specified for each observation in the PEST control file.
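
The link between weights and observation noise can be sketched as follows, assuming each weight is the inverse of the measurement standard deviation, so the variance is `(1 / weight)**2`. The observation names and weights are made up for illustration, and zero-weighted observations are simply dropped here, mirroring the use of only the non-zero-weighted observation names in the cell above.

```python
import numpy as np
import pandas as pd

# Made-up observation data; "weight" follows the PEST control-file column name.
obs = pd.DataFrame(
    {"weight": [1.0, 2.0, 0.0, 0.5]},
    index=["head_1", "head_2", "head_3", "flux_1"],
)

# Keep only observations that carry information (non-zero weight).
nz_obs = obs.loc[obs["weight"] > 0.0]

# ASSUMPTION: weight = 1 / standard deviation of measurement noise.
noise_std = 1.0 / nz_obs["weight"].to_numpy()
noise_cov = pd.DataFrame(
    np.diag(noise_std ** 2), index=nz_obs.index, columns=nz_obs.index
)
print(noise_cov)
```

A weight of 1 corresponds to the "standard deviation of 1" case: a unit variance on the diagonal. Doubling a weight cuts the implied variance by a factor of four, which is why the weighting scheme matters so much for the posterior uncertainty.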

### IMPORTANT POINT: You can use FOSM in the "pre-calibration" state to design an objective function (e.g. weights) to maximize forecast uncertainty reduction.

### IMPORTANT POINT: In PEST++, if a given observation has a larger-than-expected residual, the variance of said observation is reset to the variance implied by the residual.  That is, the diagonal elements of $\Sigma_{\epsilon}$ are reset according to the residuals

### Okay, enough emphasis.  Here's the point.  When we apply FOSM using these matrices above we can see how our uncertainty changes during calibration, first for parameters and then for forecasts:

In [None]:
df = sc.get_parameter_summary()
df.sort_values(ascending=False, by='percent_reduction').iloc[:20].percent_reduction.plot(kind="bar", figsize=(14,4))
df

### Do these results make sense?  

###  Where did the "prior_var" and "post_var" columns come from?

In [None]:
sc.get_forecast_summary()

### Do these results make sense?  Remember, these are not the "calibrated" forecast values, these are the prior (before calibration) and posterior (after calibration) forecast uncertainties...