In [None]:
%pylab inline
from __future__ import print_function
import os
import numpy as np

##Model background
Here is an example based on the Henry saltwater intrusion problem.  The synthetic model is a 2-dimensional SEAWAT model (X-Z domain) with 1 row, 120 columns and 20 layers.  The left boundary is a specified flux of freshwater, the right boundary is a specified head and concentration saltwater boundary.  The model has two stress periods: an initial steady state (calibration) period, then a transient period with less flux (forecast).  

<img src="henry/domain.png" width=800/>

The inverse problem has 603 parameters: 600 hydraulic conductivity pilot points, 1 global hydraulic conductivity, 1 specified flux multiplier for history matching and 1 specified flux multiplier for forecast conditions.  The inverse problem has 36 observations (21 heads and 15 concentrations) measured at the end of the steady-state calibration period.  Additional zero-weight observations of head and concentration are also available at each observation location at the end of the transient forecast period, as are the distances from the left edge of the domain to the 1%, 10% and 50% saltwater contours in the basal model layer at the end of the forecast stress period.  These distances, named ```pd_one```, ```pd_ten``` and ```pd_half```, are the forecasts we are interested in.  I previously calculated the jacobian matrix, which is in the `henry/` folder, along with the PEST control file.



##Using `pyemu`

In [None]:
import pyemu

First create a linear_analysis object.  We will use the `schur` derived type, which replicates the behavior of the `PREDUNC` suite of PEST.  We pass it the name of the jacobian matrix file.  Since we don't pass an explicit argument for `parcov` or `obscov`, `pyemu` attempts to build them from the parameter bounds and observation weights in a PEST control file (.pst) with the same base case name as the jacobian.  Since we are interested in forecast uncertainty as well as parameter uncertainty, we also pass the names of the forecast sensitivity vectors we are interested in, which are stored in the jacobian as well.  Note that the `forecasts` argument can be a mixed list of observation names, other jacobian files or PEST-compatible ASCII matrix files.

In [None]:
forecasts = ["pd_one","pd_ten","pd_half"]
la = pyemu.schur(jco=os.path.join("henry", "pest.jcb"), forecasts=forecasts)
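
To make the default covariance construction more concrete, here is a minimal NumPy sketch that does not call `pyemu` at all.  It assumes that the bounds of a log-transformed parameter span four standard deviations in log10 space (my understanding of the `pyemu` default, so treat `sigma_range` as an assumption) and that each observation weight is the inverse of the observation noise standard deviation.  The bounds and weights are made-up values, not the ones in the Henry control file.

```python
import numpy as np

# hypothetical bounds for three log-transformed parameters
lower = np.array([0.1, 1.0, 0.5])
upper = np.array([10.0, 100.0, 2.0])
sigma_range = 4.0  # assumption: the bounds span four standard deviations

# prior standard deviation in log10 space, then a diagonal prior covariance
prior_std = (np.log10(upper) - np.log10(lower)) / sigma_range
parcov = np.diag(prior_std ** 2)

# hypothetical non-zero observation weights, read as 1 / (noise standard deviation)
weights = np.array([2.0, 0.5, 10.0])
obscov = np.diag((1.0 / weights) ** 2)

print("prior variances:", np.diag(parcov))
print("noise variances:", np.diag(obscov))
```

Both matrices are diagonal here because bounds and weights carry no information about correlation.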


The screen output can be redirected to a log file by passing a file name to the `verbose` keyword argument, or suppressed entirely by passing `False` to the `verbose` argument.

In [None]:
la = pyemu.schur(jco=os.path.join("henry", "pest.jcb"), forecasts=forecasts,verbose=False)
# keep every observation except the forecasts
obs_names = [name for name in la.pst.obs_names if name not in forecasts]
la = la.get(par_names=la.pst.par_names,obs_names=obs_names)

We can inspect the `parcov` and `obscov` attributes by saving them to files.  We can save them as PEST-compatible ASCII or binary matrices (`.to_ascii()` or `.to_binary()`), PEST-compatible uncertainty files (`.to_uncfile()`), or simply as numpy ASCII arrays (`numpy.savetxt()`).  In fact, all matrix and covariance objects (including the forecasts) have these methods.  


In [None]:
la.parcov.to_uncfile(os.path.join("henry", "parcov.unc"), covmat_file=os.path.join("henry","parcov.mat"))

When saving an uncertainty file, if the covariance object is diagonal (`self.isdiagonal == True`), then you can force the uncertainty file to use standard deviation blocks instead of covariance matrix blocks by explicitly passing `covmat_file` as `None`:

In [None]:
la.obscov.to_uncfile(os.path.join("henry", "obscov.unc"), covmat_file=None)

##Posterior parameter uncertainty analysis
Let's calculate and save the posterior parameter covariance matrix:

In [None]:
la.posterior_parameter.to_ascii(os.path.join("henry", "posterior.mat"))

You can open this file in a text editor to examine it.  The diagonal of this matrix is the posterior variance of each parameter.  Since we have already calculated the posterior parameter covariance matrix, later accesses to the `posterior_parameter` attribute reuse the stored result instead of recalculating it:


In [None]:
la.posterior_parameter.to_dataframe() # renders as a table in the notebook

We can see the posterior variance of each parameter along the diagonal of this matrix.  Now, let's make a simple plot of prior vs posterior uncertainty for the 600 pilot point parameters:

In [None]:
fig = plt.figure(figsize=(10,5))
ax = plt.subplot(111)
#the prior is already diagonal
prior_var = la.parcov.x[3:,0]
#extract the diagonal of the posterior
post_var = np.diag(la.posterior_parameter.x)[3:]
#calculate the % uncertainty reduction 
ureduce = 100.0 * (1.0 - (post_var/prior_var))
index = np.arange(600)
width = 1.0
ax.bar(index,ureduce,width=width,facecolor='b',edgecolor="none")
ax.set_ylabel("% uncertainty reduction")
ax.set_xlabel("parameter number")
plt.show()

We can see that, at most, the uncertainty of any one of the 600 hydraulic conductivity parameters is reduced by only 5%, and the uncertainty of many parameters has not been reduced at all, meaning these parameters are not informed by the observations.  
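
For readers who want to see what sits behind `posterior_parameter`, here is a small NumPy sketch of Schur's complement for a linear model with Gaussian prior and noise.  It is an illustration under stated assumptions, not the `pyemu` implementation: the jacobian is random, and the prior and noise covariances are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
n_par, n_obs = 6, 3

jco = rng.normal(size=(n_obs, n_par))  # stand-in jacobian (observations x parameters)
parcov = np.eye(n_par)                 # hypothetical diagonal prior covariance
obscov = 0.1 * np.eye(n_obs)           # hypothetical observation noise covariance

# Schur's complement:
# post = parcov - parcov J^T (J parcov J^T + obscov)^-1 J parcov
innov = jco @ parcov @ jco.T + obscov
post = parcov - parcov @ jco.T @ np.linalg.solve(innov, jco @ parcov)

# percent reduction in variance for each parameter, as in the plot above
prior_var = np.diag(parcov)
post_var = np.diag(post)
ureduce = 100.0 * (1.0 - post_var / prior_var)
print(ureduce)
```

Note that no observation values appear anywhere in the calculation, only sensitivities and covariances, which is why the analysis can be done before inversion.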

##Prior forecast uncertainty
Now let's examine the prior variance of the forecasts:

In [None]:
prior = la.prior_forecast
print(prior) # dict keyed on forecast name


Sometimes it is more intuitive to think in terms of standard deviation, which in this case has units of ```meters``` and can be thought of as the "+/-" around the model-predicted distance from the left edge of the domain to the three saltwater concentration contours:

In [None]:
for pname,var in la.prior_forecast.items():
    print(pname,np.sqrt(var))
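
As a rough sketch of where each prior forecast variance comes from: under the linear assumption, it is the forecast sensitivity vector propagated through the prior parameter covariance.  The numbers below are invented for illustration, and `y` is a hypothetical sensitivity vector, not one of the Henry forecasts.

```python
import numpy as np

parcov = np.diag([0.25, 0.25, 1.0])  # hypothetical prior parameter covariance
y = np.array([2.0, -1.0, 0.5])       # hypothetical forecast sensitivity vector

# prior forecast variance: y^T * parcov * y
prior_var = float(y @ parcov @ y)
# standard deviation carries the units of the forecast itself
prior_std = np.sqrt(prior_var)
print(prior_var, prior_std)
```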

##Posterior forecast uncertainty
Now, let's calculate the posterior uncertainty (variance) of each forecast:

In [None]:
post = la.posterior_forecast
for pname,var in post.items():
    print(pname,np.sqrt(var))

That's it - we have completed linear-based uncertainty analysis for a model with 603 parameters, and we completed it before the actual inversion, so we can estimate the worth of continuing and completing the expensive inversion process!  We can see that the data we have provide at least some conditioning to each of these forecasts, indicating that the history-matching process is valuable:

In [None]:
print("{0:15s} {1:>10s} {2:>10s} {3:>10s}".format("forecast","prior var","post var","% reduced"))
for pname in prior.keys():
    uncert_reduction = 100.0 * ((prior[pname] - post[pname]) / prior[pname])
    print("{0:15s} {1:10.3f} {2:10.3f} {3:10.3f}".format(pname,prior[pname],post[pname],uncert_reduction))
    #print pname,prior[pname],post[pname],uncert_reduction

It is interesting that the uncertainty of the forecasts is reduced substantially even though the uncertainty for any one parameter is only slightly reduced.  This is because the right combinations of forecast-sensitive parameters are being informed by the observations.
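
A toy case makes this effect easy to see.  In the sketch below (an invented setup, not the Henry model), 100 parameters with unit prior variance are informed by a single observation of their sum, and the forecast is also their sum.  Each parameter's variance drops by just under 1%, while the forecast variance drops by about 99%.

```python
import numpy as np

n_par = 100
parcov = np.eye(n_par)          # hypothetical unit prior variances
jco = np.ones((1, n_par))       # one observation: the sum of all parameters
obscov = np.array([[1.0]])      # hypothetical noise variance
y = np.ones(n_par)              # forecast: also the sum of all parameters

# Schur's complement for the posterior parameter covariance
innov = jco @ parcov @ jco.T + obscov
post = parcov - parcov @ jco.T @ np.linalg.solve(innov, jco @ parcov)

# per-parameter and forecast variance reductions, in percent
par_reduction = 100.0 * (1.0 - np.diag(post) / np.diag(parcov))
prior_fore = float(y @ parcov @ y)
post_fore = float(y @ post @ y)
fore_reduction = 100.0 * (prior_fore - post_fore) / prior_fore

print("largest parameter reduction (%):", par_reduction.max())
print("forecast reduction (%):", fore_reduction)
```

The observation says almost nothing about any individual parameter, but it pins down the one combination of parameters that the forecast depends on.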

##Data worth
Now, let's try to identify which observations are most important for reducing the posterior forecast uncertainty (i.e., the forecast worth of every observation).  We simply recalculate Schur's complement without some observations and see how much the posterior forecast uncertainty increases.


```importance_of_observation_groups()``` is a thin wrapper that calls the underlying ```importance_of_observations()``` method using the observation groups in the PEST control file and stacks the results into a ```pandas DataFrame```.  

Let's see if the heads or the concentrations are more important:

In [None]:
df = la.importance_of_observation_groups()
df

The ```base``` row contains the results of the Schur's complement calculation using all observations.  The ```head``` and ```conc``` rows show how much the posterior forecast uncertainty increases when the head or concentration observations, respectively, are not used in history matching.

So, it looks like the heads and concentrations are both important for reducing the posterior uncertainty of the forecasts. 
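
The remove-and-recalculate idea can be sketched in plain NumPy as follows.  This is only an illustration on random numbers: the helper `posterior_forecast_var` and the two observation groups are hypothetical stand-ins, not the `pyemu` internals or the Henry observations.

```python
import numpy as np

def posterior_forecast_var(jco, parcov, obscov, y):
    """Posterior forecast variance from Schur's complement (linear, Gaussian)."""
    innov = jco @ parcov @ jco.T + obscov
    post = parcov - parcov @ jco.T @ np.linalg.solve(innov, jco @ parcov)
    return float(y @ post @ y)

rng = np.random.default_rng(1)
n_par, n_obs = 8, 5
jco = rng.normal(size=(n_obs, n_par))  # stand-in jacobian
parcov = np.eye(n_par)                 # hypothetical prior covariance
obscov = 0.1 * np.eye(n_obs)           # hypothetical noise covariance
y = rng.normal(size=n_par)             # hypothetical forecast sensitivity vector

# hypothetical observation groups, given as row indices of the jacobian
groups = {"head": [0, 1, 2], "conc": [3, 4]}

results = {"base": posterior_forecast_var(jco, parcov, obscov, y)}
for name, rows in groups.items():
    # drop the group's rows and recalculate without them
    keep = [i for i in range(n_obs) if i not in rows]
    results[name] = posterior_forecast_var(
        jco[keep], parcov, obscov[np.ix_(keep, keep)], y
    )

for name, var in results.items():
    print("{0:6s} {1:10.4f}".format(name, var))
```

Under these linear assumptions, removing observations can never lower the posterior forecast variance, so each group row is at least as large as the base row.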

##Parameter contribution to uncertainty
Let's look at which parameters contribute most to forecast uncertainty.  For demonstration purposes, let's group the hydraulic conductivity parameters by row.

In [None]:
par_groups = {}
for pname in la.pst.par_names:
    if pname.startswith('k'):
        row = "k_row_"+pname[2:4]
        if row not in par_groups.keys():
            par_groups[row] = []
        par_groups[row].append(pname)

par_groups["global_k"] = "global_k"
par_groups["histmatch_mult"] = "mult1"
par_groups["forecast_mult"] = "mult2"
df = la.get_contribution_dataframe(par_groups)
df

We see that the largest contributions to forecast uncertainty depend on the forecast.  Forecast ```pd_half``` is most sensitive to hydraulic conductivity parameters in row 10.  However, forecasts ```pd_one``` and ```pd_ten``` are most sensitive to the ```global_k``` parameter.
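
One way to think about a parameter group's contribution is to ask how much the forecast variance falls if that group were known perfectly.  The sketch below does this in NumPy on random numbers.  It assumes a diagonal prior, so that "perfectly known" reduces to dropping the group's columns, and the group names and the `forecast_vars` helper are hypothetical.  I have not checked that it matches `get_contribution_dataframe()` step for step.

```python
import numpy as np

def forecast_vars(jco, parcov, obscov, y):
    """Prior and posterior forecast variance for a linear, Gaussian problem."""
    prior = float(y @ parcov @ y)
    innov = jco @ parcov @ jco.T + obscov
    post_cov = parcov - parcov @ jco.T @ np.linalg.solve(innov, jco @ parcov)
    return prior, float(y @ post_cov @ y)

rng = np.random.default_rng(2)
n_par, n_obs = 6, 4
jco = rng.normal(size=(n_obs, n_par))                # stand-in jacobian
parcov = np.diag(rng.uniform(0.5, 2.0, size=n_par))  # assumption: diagonal prior
obscov = 0.1 * np.eye(n_obs)                         # hypothetical noise covariance
y = rng.normal(size=n_par)                           # hypothetical forecast sensitivities

# hypothetical parameter groups, given as column indices of the jacobian
par_groups = {"group_a": [0, 1, 2], "group_b": [3, 4], "group_c": [5]}

base_prior, base_post = forecast_vars(jco, parcov, obscov, y)
contrib = {}
for name, idx in par_groups.items():
    # treat the group as perfectly known: remove it from the problem
    keep = [i for i in range(n_par) if i not in idx]
    prior, post = forecast_vars(
        jco[:, keep], parcov[np.ix_(keep, keep)], obscov, y[keep]
    )
    contrib[name] = {"prior": base_prior - prior, "post": base_post - post}

for name, c in contrib.items():
    print("{0:8s} {1:10.4f} {2:10.4f}".format(name, c["prior"], c["post"]))
```

With a diagonal prior the prior contributions add up to the total prior forecast variance.  The posterior contributions generally do not add up this way, because history matching correlates the parameters.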