# data worth

In this notebook, we will use outputs from previous notebooks (in particular `pestpp-glm_part1.ipynb`) to undertake data worth assessments based on first-order second-moment (FOSM) techniques. "Worth" is framed here as the extent to which data collection reduces the uncertainty surrounding a model prediction of management interest. Because these analyses can help target and optimize data acquisition strategies, this is a concept that really resonates with decision makers.

In [None]:
%matplotlib inline
import os
import shutil
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib as mpl
plt.rcParams['font.size']=12
import flopy
import pyemu


In [None]:
m_d = "master_glm"

In [None]:
pst = pyemu.Pst(os.path.join(m_d,"freyberg_pp.pst"))
print(pst.npar_adj)
pst.write_par_summary_table(filename="none")

### first ingredient: parameter covariance matrix (representing prior uncertainty in this instance)

In [None]:
cov = pyemu.Cov.from_binary(os.path.join(m_d,"prior_cov.jcb")).to_dataframe()
cov = cov.loc[pst.adj_par_names,pst.adj_par_names]
cov = pyemu.Cov.from_dataframe(cov)

### second ingredient: jacobian matrix

In [None]:
jco = os.path.join(m_d,"freyberg_pp.jcb")

### the third ingredient--the (diagonal) noise covariance matrix--populated on-the-fly using weights when constructing the Schur object below...

In [None]:
sc = pyemu.Schur(jco=jco,parcov=cov)
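
To make the "populated on-the-fly using weights" step concrete, here is a minimal NumPy sketch of how a diagonal noise covariance matrix can be built from observation weights. It assumes each weight is interpreted as the inverse of the measurement-noise standard deviation and that zero-weighted observations are dropped. The weight values are made up for illustration and are not taken from `freyberg_pp.pst`.

```python
import numpy as np

# Hypothetical observation weights (illustrative only).
# A zero weight means the observation carries no information.
weights = np.array([5.0, 5.0, 0.02, 0.0])

# Keep only non-zero weighted observations; read each weight as 1/sigma.
nz = weights > 0.0
sigma = 1.0 / weights[nz]

# Diagonal noise covariance: variances on the diagonal, no correlations.
obscov = np.diag(sigma ** 2)
print(obscov)
```

Under this reading, a high weight means a small noise variance, so that observation constrains the parameters more tightly.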

## there we have it--all computations done and contained within `sc`.  We will only be required to access different parts of `sc` below...

## Parameter uncertainty

First, let's inspect the (approximate) posterior parameter covariance matrix and the reduction in parameter uncertainty achieved through "data assimilation", before mapping to forecasts. Note that this matrix is *not* forecast-specific.

In [None]:
sc.posterior_parameter.to_dataframe().sort_index(axis=1).iloc[100:105,100:105]

We can see the posterior variance for each parameter along the diagonal. The off-diagonal entries are the covariances between pairs of parameters, so the matrix is symmetric.
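
For readers who want to see where such a matrix comes from, the following is a small NumPy sketch of the Schur complement update that FOSM analyses rely on. It is not pyemu's implementation: the jacobian `J`, prior covariance `parcov` and noise covariance `obscov` are tiny made-up matrices, chosen only so the arithmetic is easy to follow.

```python
import numpy as np

# Made-up 2-observation, 3-parameter problem (illustrative values only).
J = np.array([[1.0, 0.5, 0.0],
              [0.0, 1.0, 2.0]])        # jacobian: d(obs)/d(par)
parcov = np.diag([1.0, 4.0, 0.25])     # prior parameter covariance
obscov = np.diag([0.1, 0.1])           # diagonal noise covariance

# Schur complement: prior covariance minus the part explained by the data.
S = J @ parcov @ J.T + obscov
post = parcov - parcov @ J.T @ np.linalg.solve(S, J @ parcov)

# Percent reduction in variance for each parameter.
prior_var = np.diag(parcov)
post_var = np.diag(post)
pct_reduction = 100.0 * (1.0 - post_var / prior_var)
print(pct_reduction)
```

Nothing in this update depends on observed values or on a forecast: only sensitivities, prior uncertainty and noise enter, which is why the matrix is not forecast-specific.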

In [None]:
par_sum = sc.get_parameter_summary().sort_values("percent_reduction",ascending=False)
par_sum.head()

In [None]:
par_sum.loc[par_sum.index[:15],"percent_reduction"].plot(kind="bar")

What have we achieved by "notionally calibrating" our model to 13 head and 1 stream flow observations? Which parameters are informed? Will they matter for the forecast of interest? Which ones are un-informed?

## Forecast uncertainty

In [None]:
df = sc.get_forecast_summary()
df

In [None]:
df = df.loc[:,"percent_reduction"].dropna()
df.plot(kind="bar")
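
As a rough illustration of how parameter uncertainty maps to forecast uncertainty, the sketch below propagates a prior and a posterior parameter covariance through a forecast sensitivity vector `y`. All matrices and the vector `y` are hypothetical; this is a sketch of the linear propagation idea, not a reproduction of what `sc.get_forecast_summary()` does internally.

```python
import numpy as np

# Made-up problem: 2 observations, 3 parameters, 1 forecast.
J = np.array([[1.0, 0.5, 0.0],
              [0.0, 1.0, 2.0]])
parcov = np.diag([1.0, 4.0, 0.25])
obscov = np.diag([0.1, 0.1])
y = np.array([0.5, 1.0, -1.0])   # hypothetical forecast sensitivities

# Posterior parameter covariance via the Schur complement.
S = J @ parcov @ J.T + obscov
post = parcov - parcov @ J.T @ np.linalg.solve(S, J @ parcov)

# Linear (first-order) propagation to the forecast.
prior_fvar = float(y @ parcov @ y)
post_fvar = float(y @ post @ y)
pct_reduction = 100.0 * (1.0 - post_fvar / prior_fvar)
print(prior_fvar, post_fvar, pct_reduction)
```

A forecast that is sensitive mainly to parameters the observations do not inform will show little reduction, even when many parameters are well constrained.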

### one of the important assumptions we have made is that the model is able to fit observations to a level that is commensurate with measurement noise... Are we comfortable with this assumption? We will discuss this more in `pestpp-glm_part2.ipynb`

In [None]:
# recall...
pst.observation_data.loc[pst.nnz_obs_names,:]

### Related to the worth of observation data is the worth of parameter knowledge or "measurement". We quantify the worth of knowing parameters by calculating the parameter contributions to forecast uncertainty: we "fix" individual parameters or parameter groups (treat them as perfectly known) and quantify the resulting reduction in forecast uncertainty. Let's do this by group.

In [None]:
df = sc.get_par_group_contribution()

In [None]:
df

In [None]:
base = df.loc["base",:]
df = 100.0 * (base - df) / base

In [None]:
for forecast in df.columns:
    fore_df = df.loc[:,forecast].copy()
    fore_df.sort_values(inplace=True, ascending=False)
    ax = fore_df.iloc[:10].plot(kind="bar")
    ax.set_title(forecast)
    ax.set_ylabel("percent variance reduction")
    plt.show()
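
The idea of "fixing" a parameter can be sketched in a few lines of NumPy. The example assumes a diagonal prior, so treating a parameter as perfectly known amounts to dropping its row and column from the prior covariance and its column from the jacobian and forecast vector. The parameter names, matrices and the helper `posterior_forecast_variance` are all hypothetical.

```python
import numpy as np

def posterior_forecast_variance(J, parcov, obscov, y):
    """Posterior variance of a linear forecast (Schur complement)."""
    S = J @ parcov @ J.T + obscov
    post = parcov - parcov @ J.T @ np.linalg.solve(S, J @ parcov)
    return float(y @ post @ y)

# Made-up problem with three named parameters (names are illustrative).
names = ["hk", "rch", "wel"]
J = np.array([[1.0, 0.5, 0.0],
              [0.0, 1.0, 2.0]])
parcov = np.diag([1.0, 4.0, 0.25])   # diagonal prior is an assumption
obscov = np.diag([0.1, 0.1])
y = np.array([0.5, 1.0, -1.0])

base = posterior_forecast_variance(J, parcov, obscov, y)

contrib = {}
for i, name in enumerate(names):
    keep = [j for j in range(len(names)) if j != i]
    # Fix parameter i: remove it from every matrix and vector.
    fixed = posterior_forecast_variance(
        J[:, keep], parcov[np.ix_(keep, keep)], obscov, y[keep]
    )
    contrib[name] = 100.0 * (base - fixed) / base
print(contrib)
```

The percentage uses the same convention as the cell above, `100 * (base - fixed) / base`, so a large value flags a parameter whose perfect knowledge would remove much of the remaining forecast uncertainty.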

In [None]:
df = sc.get_removed_obs_importance()
base = df.loc["base",:]
df = 100 * (df - base) / base
df

In [None]:
for forecast in df.columns:
    fore_df = df.loc[:,forecast].copy()
    fore_df.sort_values(inplace=True, ascending=False)
    ax = fore_df.iloc[:10].plot(kind="bar")
    ax.set_title(forecast)
    ax.set_ylabel("percent variance increase")
    plt.show()
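
The "removed observation" test can be mimicked with a small NumPy sketch: drop one observation at a time, recompute the posterior forecast variance, and express the change as a percent increase over the all-data base case. The observation names and all numbers are hypothetical, and the loop is a simplified stand-in rather than pyemu's own routine.

```python
import numpy as np

def posterior_forecast_variance(J, parcov, obscov, y):
    """Posterior variance of a linear forecast (Schur complement)."""
    S = J @ parcov @ J.T + obscov
    post = parcov - parcov @ J.T @ np.linalg.solve(S, J @ parcov)
    return float(y @ post @ y)

# Made-up problem with three named observations (names are illustrative).
obs_names = ["head_1", "head_2", "flow_1"]
J = np.array([[1.0, 0.5, 0.0],
              [0.0, 1.0, 2.0],
              [1.0, 0.0, 1.0]])
parcov = np.diag([1.0, 4.0, 0.25])
obscov = np.diag([0.1, 0.1, 0.5])
y = np.array([0.5, 1.0, -1.0])

base = posterior_forecast_variance(J, parcov, obscov, y)

increase = {}
for i, name in enumerate(obs_names):
    keep = [j for j in range(len(obs_names)) if j != i]
    # Remove observation i: drop its jacobian row and its noise entry.
    v = posterior_forecast_variance(
        J[keep, :], parcov, obscov[np.ix_(keep, keep)], y
    )
    increase[name] = 100.0 * (v - base) / base
print(increase)
```

Observations whose removal causes the largest increase are the ones the forecast currently depends on most.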

In [None]:
df = sc.get_added_obs_importance()
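
The complementary "added observation" test starts from the prior (no data) and asks how much each observation, used on its own, would reduce forecast variance. The NumPy sketch below illustrates that idea under the assumption that the base case is the prior forecast variance. The observation names and numbers are made up.

```python
import numpy as np

# Made-up problem with three named observations (names are illustrative).
obs_names = ["head_1", "head_2", "flow_1"]
J = np.array([[1.0, 0.5, 0.0],
              [0.0, 1.0, 2.0],
              [1.0, 0.0, 1.0]])
parcov = np.diag([1.0, 4.0, 0.25])
noise_var = np.array([0.1, 0.1, 0.5])
y = np.array([0.5, 1.0, -1.0])

# Base case: no observations, so forecast variance comes from the prior.
prior_fvar = float(y @ parcov @ y)

reduction = {}
for i, name in enumerate(obs_names):
    j = J[i:i + 1, :]                       # single-row jacobian
    s = j @ parcov @ j.T + noise_var[i]     # 1x1 innovation variance
    post = parcov - parcov @ j.T @ np.linalg.solve(s, j @ parcov)
    fvar = float(y @ post @ y)
    reduction[name] = 100.0 * (prior_fvar - fvar) / prior_fvar
print(reduction)
```

Because each observation is evaluated alone, two observations carrying the same information will both score well; ranking them jointly requires adding them to a growing base list.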

In [None]:
### toward optimizing data acquisition: what data should we collect?

## an extra: parameter identifiability