# Running `PESTPP-IES`

In this notebook we will use the PEST interface that we constructed and tested in the previous notebooks to carry out non-linear uncertainty analysis of parameters and forecasts, for both the prior and the posterior. Exciting!

In [None]:
import sys
import os
import shutil
import warnings
warnings.filterwarnings("ignore")
warnings.filterwarnings("ignore", category=DeprecationWarning) 
sys.path.append('../../dependencies/')
import pandas as pd
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
font = {'size'   : 12}
mpl.rc('font', **font)
import flopy as fp
import pyemu


First let's make sure the previous steps have been completed

In [None]:
mname = "sgn_50"
t_d = os.path.join("..","..","models","template")
assert os.path.exists(t_d)

In [None]:
pst = pyemu.Pst(os.path.join(t_d,"sgn.pst"))
assert pst.nobs != pst.nnz_obs

A critical first step in any predictive modeling analysis is a simple Monte Carlo analysis.  While simple in concept, this analysis provides valuable insights into many aspects of the modeling analysis, including clues about model stability (or otherwise) and prior-data conflict. If the outputs of interest are included as "observations" in the control file (as they are in this case), you also get prior predictive uncertainty!

Mechanically, if you set `noptmax=-1`, this tells PESTPP-IES to evaluate the initial (e.g. prior) parameter ensemble and quit. Easy as!

In [None]:
pst.control_data.noptmax = -1
pst.pestpp_options["save_binary"] = True
pst.write(os.path.join(t_d,"sgn.pst"),version=2)

Here we will use a pyemu helper function to start PESTPP-IES in parallel mode so that we have a master instance and several workers.  The master coordinates the runs that need to be done and the workers just work....

VERY IMPORTANT:  the `num_workers` argument needs to be set with respect to the computational power of your machine.  If you have a beefy workstation, then 10 is reasonable.  If you have a simple laptop, you probably need to use 4 or 5.
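Rather than hard-coding the worker count, one option is to derive it from the number of logical cores on the machine. The following is a minimal sketch only: the helper name `suggest_num_workers`, the cap of 10 and the one-core reserve are illustrative assumptions and are not part of pyemu.

```python
import os


def suggest_num_workers(cap=10, reserve=1):
    """Hypothetical helper: suggest a worker count from the logical core count.

    Leaves `reserve` cores free for the master and the operating system,
    never exceeds `cap`, and always returns at least 1.
    """
    n_cores = os.cpu_count() or 1  # os.cpu_count() can return None
    return max(1, min(cap, n_cores - reserve))


num_workers = suggest_num_workers()
print(num_workers)
```

The resulting value could then be passed as the `num_workers` argument in the cell below in place of the literal 10.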

In [None]:
worker_root = os.path.join("..","..","models")
pmc_m_d = os.path.join(worker_root,"master_prior_mc")
pyemu.os_utils.start_workers(t_d,"pestpp-ies","sgn.pst",num_workers=10,
                             master_dir=pmc_m_d,worker_root=worker_root,
                            port=4269)

Sweet!  Now we are in a position to plot the prior Monte Carlo results.  First, let's compare the simulated outputs to the observed values.  In this case, PESTPP-IES creates an "obs+noise" ensemble that, as the name suggests, is an observation ensemble made up of the observation values plus unique, additive observation noise realizations (the observation standard deviation is taken as the inverse of the observation weight unless otherwise specified).  Conceptually, this means that the posterior PESTPP-IES result will account for both parameter and observation noise uncertainty!
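To make the "standard deviation is the inverse of the weight" idea concrete, here is a minimal numpy sketch of how such an ensemble could be built. It assumes independent Gaussian noise, and the observation names, values and weights are made up for illustration; it is not the PESTPP-IES implementation.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical observation data: values and weights for three observations
obs_demo = pd.DataFrame(
    {"obsval": [10.0, 12.5, 0.3], "weight": [2.0, 2.0, 10.0]},
    index=["head_1", "head_2", "conc_1"],
)

num_reals = 500
std = 1.0 / obs_demo.weight.values  # standard deviation implied by each weight

# One row per realization: observed value plus an independent Gaussian noise draw
noise = rng.normal(loc=0.0, scale=std, size=(num_reals, obs_demo.shape[0]))
obs_plus_noise_demo = pd.DataFrame(
    obs_demo.obsval.values + noise, columns=obs_demo.index
)

# The sample standard deviations should be close to 1/weight
print(obs_plus_noise_demo.std())
```

A larger weight therefore means a tighter noise distribution around the observed value, which is why the weights matter for the plots that follow.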

Below we will plot each obs+noise PDF with the corresponding simulated PDF

In [None]:
plot_cols = pst.observation_data.loc[pst.nnz_obs_names].apply(lambda x: x.usecol + " "+x.oname,axis=1).to_dict()
plot_cols = {v: [k] for k, v in plot_cols.items()}
obs_plus_noise = pyemu.ObservationEnsemble.from_binary(pst=pst,filename=os.path.join(pmc_m_d,"sgn.obs+noise.jcb"))
pr_oe = pyemu.ObservationEnsemble.from_binary(pst=pst,filename=os.path.join(pmc_m_d,"sgn.0.obs.jcb"))
pyemu.plot_utils.ensemble_helper({"r":obs_plus_noise,"0.5":pr_oe},
                                 plot_cols=plot_cols,bins=20,sync_bins=False,
                                func_dict={o:lambda x: np.log10(x) for o in pst.nnz_obs_names if "conc" in o},
                                density=True)
plt.show()

Any thoughts on these plots?  Any plot where the prior simulated PDF doesn't statistically cover the obs+noise PDF is a problem: it indicates prior-data conflict...
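Eyeballing many histograms gets tedious, so a crude numerical screen can help. The sketch below flags observations whose simulated range and obs+noise range do not overlap at all. The helper `find_prior_data_conflict` is hypothetical, the ensembles are synthetic pandas DataFrames, and a strict range test is only one of several possible definitions of "coverage".

```python
import numpy as np
import pandas as pd


def find_prior_data_conflict(sim_ens, noise_ens):
    """Hypothetical helper: names of observations whose simulated range
    and obs+noise range do not overlap."""
    conflicts = []
    for name in noise_ens.columns:
        sim_min, sim_max = sim_ens[name].min(), sim_ens[name].max()
        obs_min, obs_max = noise_ens[name].min(), noise_ens[name].max()
        if sim_max < obs_min or sim_min > obs_max:
            conflicts.append(name)
    return conflicts


rng = np.random.default_rng(1)
# Synthetic ensembles: "ok" overlaps, "bad" is far from the observed values
sim = pd.DataFrame({"ok": rng.normal(10, 2, 100), "bad": rng.normal(50, 1, 100)})
noisy = pd.DataFrame({"ok": rng.normal(11, 0.5, 100), "bad": rng.normal(10, 0.5, 100)})

print(find_prior_data_conflict(sim, noisy))
```

Observations flagged this way are ones the prior cannot reproduce, so trying to fit them during history matching is likely to push parameters toward extreme values.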

OK! Now we are ready for some (attempted) history matching.  Let's use 2 iterations

In [None]:
noptm = 2
pst.control_data.noptmax = noptm
pst.pestpp_options["ies_no_noise"] = False
pst.write(os.path.join(t_d,"sgn.pst"),version=2)
ies_m_d = os.path.join(worker_root,"master_ies")

VERY IMPORTANT:  the `num_workers` argument needs to be set with respect to the computational power of your machine.  If you have a beefy workstation, then 10 is reasonable.  If you have a simple laptop, you probably need to use 4 or 5.

In [None]:
pyemu.os_utils.start_workers(t_d,"pestpp-ies","sgn.pst",num_workers=10,
                             master_dir=ies_m_d,worker_root=worker_root,port=4269)

Sweet!  Let's take a peek at the phi summary information from `pestpp-ies`

In [None]:
df = pd.read_csv(os.path.join(ies_m_d,"sgn.phi.actual.csv"))
df

In [None]:
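fig,ax = plt.subplots(1,1)
# skip the run-count and summary-statistic columns so that only per-realization phi series are plotted
summary_cols = ["iteration","total_runs","mean","standard_deviation","min","max"]
real_cols = [c for c in df.columns if c not in summary_cols]
_ = [ax.plot(df.total_runs,np.log10(df.loc[:,c].values),"0.5",lw=0.5) for c in real_cols]
ax.set_ylabel("$log_{10}\\phi$")
ax.set_xlabel("model runs")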
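For readers who want to see what a per-realization phi value represents, here is a minimal sketch of the weighted sum of squared residuals, computed for each row of a small ensemble. The helper name and the data are hypothetical; this illustrates the objective function, not the PESTPP-IES code.

```python
import pandas as pd


def phi_per_realization(sim_ens, obsval, weight):
    """Weighted sum of squared residuals for each row (realization) of sim_ens."""
    resid = sim_ens - obsval  # the Series aligns with the DataFrame columns
    return ((resid * weight) ** 2).sum(axis=1)


# Hypothetical data: two observations, three realizations
obsval = pd.Series({"o1": 1.0, "o2": 2.0})
weight = pd.Series({"o1": 1.0, "o2": 0.5})
sim = pd.DataFrame({"o1": [1.0, 2.0, 0.0], "o2": [2.0, 4.0, 2.0]})

phi = phi_per_realization(sim, obsval, weight)
print(phi.tolist())  # [0.0, 2.0, 1.0]
```

Each gray line in the plot above traces one such value across iterations, so a downward trend means that realization is fitting the weighted observations better.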

Now let's plot up the observations plus noise, the prior simulated values, and the posterior simulated values. First, let's load up the prior and posterior observation ensembles.

In [None]:
#pr_oe = pd.read_csv(os.path.join(ies_m_d,"sgn.0.obs.csv"),index_col=0)
#pt_oe = pd.read_csv(os.path.join(ies_m_d,"sgn.{0}.obs.csv".format(noptm)),index_col=0)

pr_oe = pyemu.ObservationEnsemble.from_binary(pst=pst,filename=os.path.join(ies_m_d,"sgn.0.obs.jcb"))
pt_oe = pyemu.ObservationEnsemble.from_binary(pst=pst,filename=os.path.join(ies_m_d,"sgn.{0}.obs.jcb".format(noptm)))
obs_plus_noise = pyemu.ObservationEnsemble.from_binary(pst=pst,filename=os.path.join(ies_m_d,"sgn.obs+noise.jcb"))


In [None]:
plot_cols = pst.observation_data.loc[pst.nnz_obs_names].apply(lambda x: x.usecol + " "+x.oname,axis=1).to_dict()
plot_cols = {v: [k] for k, v in plot_cols.items()}
pyemu.plot_utils.ensemble_helper({"r":obs_plus_noise,"0.5":pr_oe,"b":pt_oe},
                                  plot_cols=plot_cols,bins=20,sync_bins=False,
                                  func_dict={o:lambda x: np.log10(x) for o in pst.nnz_obs_names if "conc" in o})
plt.show()

Now let's plot up some head and concentration maps, always fun! Remember when we added observations for the simulated head and concentration in all model cells? Here is where that pays off!  We will extract the rows of the `pyemu.Pst.observation_data` dataframe that correspond to the heads and concentrations of the layer of interest.  We can use `lay1_hds`/`lay1_ucn`, `lay2_hds`/`lay2_ucn` or `lay3_hds`/`lay3_ucn`.

In [None]:
otag = "lay1"
otime = 1
obs = pst.observation_data
h_tag = "{0}__t{1}_hds".format(otag,otime)
c_tag = "{0}_t{1}_ucn".format(otag,otime)

print(h_tag,c_tag)

lay_hobs = obs.loc[obs.obsnme.str.contains(h_tag),:].copy()
assert lay_hobs.shape[0] > 0
lay_cobs = obs.loc[obs.obsnme.str.contains(c_tag),:].copy()
assert lay_cobs.shape[0] > 0
lay_hobs.loc[:,"i"] = lay_hobs.i.astype(int)
lay_cobs.loc[:,"i"] = lay_cobs.i.astype(int)
lay_hobs.loc[:,"j"] = lay_hobs.j.astype(int)
lay_cobs.loc[:,"j"] = lay_cobs.j.astype(int)

Let's just plot a few realizations

In [None]:
reals = pt_oe.index[:4].tolist()
if "base" not in reals:
    reals.append("base")
reals

Below, we just work out the min and max concentration and head values so that the plots are coherent

In [None]:
cmn = pt_oe.loc[reals,lay_cobs.obsnme].min().min()
cmx = pt_oe.loc[reals,lay_cobs.obsnme].max().max()
hmn = pt_oe.loc[reals,lay_hobs.obsnme].min().min()
hmx = pt_oe.loc[reals,lay_hobs.obsnme].max().max()
hlevels = np.linspace(hmn,hmx,4)

Now for some matplotlib hackery! For each realization, we will instantiate an empty numpy array and then fill it with the realization values. Then plot and add some nice things...
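The array-filling step relies on numpy integer-array indexing, which can be seen in isolation in the small sketch below. The cell table and values are made up, and the NaN background is an alternative to the `np.zeros` used in this notebook, chosen here so that cells without an observation stay blank.

```python
import numpy as np
import pandas as pd

# Hypothetical per-cell observation table: zero-based row (i) and column (j)
cells = pd.DataFrame(
    {"i": [0, 0, 1, 2], "j": [0, 2, 1, 2]},
    index=["h_0_0", "h_0_2", "h_1_1", "h_2_2"],
)
# Hypothetical realization: one simulated value per observation name
real_vals = pd.Series({"h_0_0": 10.0, "h_0_2": 12.0, "h_1_1": 11.0, "h_2_2": 9.0})

# NaN background so that cells without an observation are not drawn as real values
arr = np.full((cells.i.max() + 1, cells.j.max() + 1), np.nan)

# Integer-array indexing assigns each value to its (i, j) cell in one step;
# selecting real_vals by cells.index keeps values and cells in the same order
arr[cells.i.values, cells.j.values] = real_vals.loc[cells.index].values
print(arr)
```

The same pattern is used in the loop below, with `lay_hobs`/`lay_cobs` supplying the indices and a row of the ensemble supplying the values.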

In [None]:
lay_hobs

In [None]:
for real in reals:
    pr_harr = np.zeros((lay_hobs.i.max()+1,lay_hobs.j.max()+1))
    pr_harr[lay_hobs.i,lay_hobs.j] = pr_oe.loc[real,lay_hobs.obsnme]
    pr_carr = np.zeros((lay_cobs.i.max()+1,lay_cobs.j.max()+1))
    pr_carr[lay_cobs.i,lay_cobs.j] = pr_oe.loc[real,lay_cobs.obsnme]
    
    pt_harr = np.zeros((lay_hobs.i.max()+1,lay_hobs.j.max()+1))
    pt_harr[lay_hobs.i,lay_hobs.j] = pt_oe.loc[real,lay_hobs.obsnme]
    pt_carr = np.zeros((lay_cobs.i.max()+1,lay_cobs.j.max()+1))
    pt_carr[lay_cobs.i,lay_cobs.j] = pt_oe.loc[real,lay_cobs.obsnme]

    pr_carr[pr_carr<0.001] = np.nan
    pt_carr[pt_carr<0.001] = np.nan
    

    fig,axes = plt.subplots(1,2,figsize=(12,5))
    axes[0].imshow(pr_carr,vmin=cmn,vmax=cmx)
    cb = axes[1].imshow(pt_carr,vmin=cmn,vmax=cmx)
    plt.colorbar(cb,ax=axes[1])
    
    cs = axes[0].contour(pr_harr,levels=hlevels,colors="0.5")
    axes[0].clabel(cs)
    cs = axes[1].contour(pt_harr,levels=hlevels,colors="0.5")
    axes[1].clabel(cs)
    axes[0].set_title("{0} prior realization {1}".format(otag,real))
    axes[1].set_title("{0} posterior realization {1}".format(otag,real))
    
    
    plt.show()


For each realization, we see extreme concentration values in the prior, as we should, since the prior represents expert knowledge only and none of the aquifer-specific information contained in the observations.  The posterior realizations are tamer because the parameters have been conditioned on those aquifer-specific data.

Now let's run PESTPP-IES again but this time without using concentration observations.  This will give us a measure of how important those concentration observations are for reducing predictive uncertainty...

In [None]:
conc_nnz_obs = [o for o in pst.nnz_obs_names if "conc" in o]
pst.observation_data.loc[conc_nnz_obs,"weight"] = 0
pst.write(os.path.join(t_d,"sgn.pst"),version=2)
ies_m_d_ho = os.path.join(worker_root,"master_ies_headonly")


From here on, this is exactly the same code used above...

VERY IMPORTANT:  the `num_workers` argument needs to be set with respect to the computational power of your machine.  If you have a beefy workstation, then 10 is reasonable.  If you have a simple laptop, you probably need to use 4 or 5.

In [None]:
pyemu.os_utils.start_workers(t_d,"pestpp-ies","sgn.pst",num_workers=10,
                             master_dir=ies_m_d_ho,worker_root=worker_root,port=4269)

In [None]:
pr_oe_ho = pyemu.ObservationEnsemble.from_binary(pst=pst,filename=os.path.join(ies_m_d_ho,"sgn.0.obs.jcb"))
pt_oe_ho = pyemu.ObservationEnsemble.from_binary(pst=pst,filename=os.path.join(ies_m_d_ho,"sgn.{0}.obs.jcb".format(noptm)))
obs_plus_noise = pyemu.ObservationEnsemble.from_binary(pst=pst,filename=os.path.join(ies_m_d_ho,"sgn.obs+noise.jcb"))

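reals_ho = pt_oe_ho.index[:4].tolist()
# avoid plotting "base" twice if it is already among the first four realizations
if "base" not in reals_ho:
    reals_ho.append("base")
reals_ho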

In [None]:
cmn = pt_oe_ho.loc[reals_ho,lay_cobs.obsnme].min().min()
cmx = pt_oe_ho.loc[reals_ho,lay_cobs.obsnme].max().max()
hmn = pt_oe_ho.loc[reals_ho,lay_hobs.obsnme].min().min()
hmx = pt_oe_ho.loc[reals_ho,lay_hobs.obsnme].max().max()
hlevels = np.linspace(hmn,hmx,4)

In [None]:
for real in reals_ho:
    pr_harr = np.zeros((lay_hobs.i.max()+1,lay_hobs.j.max()+1))
    pr_harr[lay_hobs.i,lay_hobs.j] = pr_oe_ho.loc[real,lay_hobs.obsnme]
    pr_carr = np.zeros((lay_cobs.i.max()+1,lay_cobs.j.max()+1))
    pr_carr[lay_cobs.i,lay_cobs.j] = pr_oe_ho.loc[real,lay_cobs.obsnme]
    
    pt_harr = np.zeros((lay_hobs.i.max()+1,lay_hobs.j.max()+1))
    pt_harr[lay_hobs.i,lay_hobs.j] = pt_oe_ho.loc[real,lay_hobs.obsnme]
    pt_carr = np.zeros((lay_cobs.i.max()+1,lay_cobs.j.max()+1))
    pt_carr[lay_cobs.i,lay_cobs.j] = pt_oe_ho.loc[real,lay_cobs.obsnme]

    pr_carr[pr_carr<0.001] = np.nan
    pt_carr[pt_carr<0.001] = np.nan
    

    fig,axes = plt.subplots(1,2,figsize=(12,5))
    axes[0].imshow(pr_carr,vmin=cmn,vmax=cmx)
    cb = axes[1].imshow(pt_carr,vmin=cmn,vmax=cmx)
    plt.colorbar(cb,ax=axes[1])
    
    cs = axes[0].contour(pr_harr,levels=hlevels,colors="0.5")
    axes[0].clabel(cs)
    cs = axes[1].contour(pt_harr,levels=hlevels,colors="0.5")
    axes[1].clabel(cs)
    axes[0].set_title("{0} prior realization {1}".format(otag,real))
    axes[1].set_title("{0} posterior realization {1}".format(otag,real))
    
    
    plt.show()

The change from prior to posterior is not as drastic as the one we saw when using the concentration observations...

Now let's compare the simulated mass discharged to the GHBs with and without using concentration observations...

In [None]:
ghb_mass_onames = obs.loc[obs.obsnme.apply(lambda x: "tcum" in x and "ghb" in x),"obsnme"]

In [None]:
ghb_mass_onames

In [None]:
# preview the log10-transformed posterior values for all GHB mass observations
np.log10(np.abs(pt_oe.loc[:,ghb_mass_onames].values))

In [None]:
for ghb_mass_oname in ghb_mass_onames:
    fig,ax = plt.subplots(1,1)
    ax.hist(np.log10(np.abs(pt_oe.loc[:,ghb_mass_oname].values)),fc="m",alpha=0.5)
    ax.hist(np.log10(np.abs(pt_oe_ho.loc[:,ghb_mass_oname].values)),fc="c",alpha=0.5)
    ax.set_title(ghb_mass_oname)
    ax.set_xlabel("$log_{10}$ Kg")
    plt.show()

We see the value of those concentration observations now: the range of mass discharged to the GHBs is narrower when concentration observations are used for history matching, so the uncertainty in an important simulation result is lower...
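The comparison can also be put into a single number, for example the ratio of the widths of the central 90% intervals of the two ensembles. The sketch below uses synthetic values in place of the two posterior ensembles; the helper `percentile_range` and the choice of the 5th and 95th percentiles are assumptions made for illustration.

```python
import numpy as np


def percentile_range(values, lower=5, upper=95):
    """Width of the central percentile interval of an ensemble of values."""
    lo, hi = np.percentile(values, [lower, upper])
    return hi - lo


rng = np.random.default_rng(2)
# Synthetic log10 mass ensembles standing in for the two posteriors
with_conc = rng.normal(3.0, 0.2, 200)
heads_only = rng.normal(3.0, 0.6, 200)

ratio = percentile_range(with_conc) / percentile_range(heads_only)
print("range ratio (with conc / heads only): {0:.2f}".format(ratio))
```

A ratio well below 1 would indicate that the concentration observations narrowed the forecast range relative to the heads-only case.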

Let's do the same plotting for the wel-type boundaries, but this time, let's compare the difference in mass removed by the wel boundaries, since this is a direct measure of the performance of the pump-and-treat system.

In [None]:
# convert to a list so that [0] and [1] are positional lookups (the Series is indexed by observation name)
well_mass_onames = obs.loc[obs.obsnme.apply(lambda x: "tcum" in x and "wel" in x),"obsnme"].tolist()
d = pt_oe.loc[:,well_mass_onames[0]].values - pt_oe.loc[:,well_mass_onames[1]].values
d_ho = pt_oe_ho.loc[:,well_mass_onames[0]].values - pt_oe_ho.loc[:,well_mass_onames[1]].values

In [None]:
fig,ax = plt.subplots(1,1)
ax.hist(np.log10(d),fc="m",alpha=0.5)
ax.hist(np.log10(d_ho),fc="c",alpha=0.5)
_ = ax.set_xlabel("change in mass removed by wells ($log_{10} Kg$)")

Boom!  Again, the value of the concentration observations is clear: those observations have significantly reduced the uncertainty around the mass recovered by the pump-and-treat system.