# When worlds collide: optimization under uncertainty

In the previous optimization notebook ("freyberg_opt_1.ipynb"), we saw how we can use the PEST interface to also implement formal constrained management optimization.  And it was awesome!

But what about all those other notebooks where we droned on and on about prediction uncertainty and parameter estimation/data assimilation?  Was that all for nothing?! 

No!  It wasn't.  In fact, in the same way that predictions are uncertain, constraints based on simulation results are also uncertain.  Let's take this a step further:  In many cases, the predictions we are focused on in our modeling have certain values that are important to decision makers.  We saw this in the previous optimization notebook:  future surface-water/groundwater exchange flux was a prediction in earlier notebooks, but was also used as a constraint to avoid an unwanted outcome (too little flux to sustain ecological flows).  So it's only natural to think "how can we combine the uncertainty analysis concepts with management optimization?".  This is referred to as optimization under uncertainty, and the crux of the deal is the concept of "chance constraints".  

Most optimization algorithms require that each constraint be assigned a single "right-hand side" value: the value not to violate.  But uncertainty analysis gives us a range (or statistical distribution) of possible constraint/prediction values.  How can we reconcile the two?  Well, if we realize that the statistical distribution covers the range of possible values that a given constraint may take (because of uncertainty), then that leads us to the concept of "risk shifting".  "Risk" (aka reliability) is a simple scalar algorithmic control that ranges from 0.0 to 1.0.  A risk value of 0.5 is called "risk neutral" and it implies that 50% of the mass of the constraint probability density function is on either side of the value that corresponds to risk = 0.5.  Think of the mean of a normal distribution:  at the mean value, half of the distribution is on either side.  A risk of 0.95 implies we want to be 95% sure the constraint will not be violated and is referred to as "risk averse".  A risk-averse stance implies we will have to accept a suboptimal objective function value to be 95% sure.  Risk values below 0.5 are referred to as "risk tolerant", and they imply an increasingly small chance that the constraint will actually be satisfied (danger zone!).  
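To make "risk shifting" concrete, here is a minimal sketch that assumes the constraint value is normally distributed. The function name `risk_shifted_value` and the numbers are hypothetical and only for illustration; this is not a description of what PESTPP-OPT does internally.

```python
from statistics import NormalDist


def risk_shifted_value(mean, std, risk, sense="less_than"):
    """Shift a normally distributed constraint value for a given risk stance.

    Hypothetical helper for illustration only. `sense` says whether the
    constraint reads 'simulated value <= rhs' or 'simulated value >= rhs'.
    """
    if not 0.0 < risk < 1.0:
        raise ValueError("risk must be strictly between 0 and 1")
    # number of standard deviations that puts `risk` of the mass below it
    z = NormalDist().inv_cdf(risk)
    if sense == "less_than":
        # risk averse: push the value up, toward violating the upper bound
        return mean + z * std
    if sense == "greater_than":
        # risk averse: push the value down, toward violating the lower bound
        return mean - z * std
    raise ValueError("sense must be 'less_than' or 'greater_than'")


# made-up constraint distribution: mean of -100, standard deviation of 50
for risk in (0.05, 0.5, 0.95):
    print(risk, round(risk_shifted_value(-100.0, 50.0, risk), 1))
```

At risk = 0.5 the shift is zero, so the solver sees the mean value. As risk moves toward 1.0, the value the solver sees moves closer to the right-hand side, which leaves less room for the decision variables.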

Why did we say "actually be satisfied" above?  Well, just like predictions, we don't know what the "true" or "real world" value of the model-based constraints is (if we did, we wouldn't need to be modeling at all!).  So we don't know what "true" value the constraint will take because it's something we can't or haven't observed.  

Ok, enough words.  Let's see how this works in practice.  Note that to run this notebook, you need to have run both the previous optimization notebook and the PESTPP-IES part 1 notebook.


### Admin

Start off with the usual loading of dependencies and preparing model and PEST files. We will be continuing to work with the modified-Freyberg model (see "intro to model" notebook), and the high-dimensional PEST dataset prepared in the "pstfrom pest setup" and "obs and weights" notebooks. 

For the purposes of this notebook, you do not require familiarity with previous notebooks (but it helps...). 

Simply run the next few cells by pressing `shift+enter`.

In [None]:
import os
import warnings
warnings.filterwarnings("ignore")
warnings.filterwarnings("ignore", category=DeprecationWarning) 
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
# 'normal' is not a font family that matplotlib recognizes; use a generic family instead
font = {'family' : 'sans-serif',
        'size'   : 15}
matplotlib.rc('font', **font)
import shutil
import psutil

import sys
sys.path.insert(0,os.path.join("..", "..", "dependencies"))
import pyemu
import flopy
assert "dependencies" in flopy.__file__
assert "dependencies" in pyemu.__file__
sys.path.insert(0,"..")
import herebedragons as hbd



To maintain continuity in the series of tutorials, we use the PEST dataset prepared in the "obs and weights" tutorial. Run the next cell to copy the necessary files across. Note that you will need to have run the previous notebooks in the correct order beforehand.

Specify the path to the PEST dataset template folder. Recall that we will prepare our PEST dataset files in this folder, keeping them separate from the original model files. Then copy across pre-prepared model and PEST files:

In [None]:
# specify the temporary working folder
t_d = os.path.join('freyberg6_template_chance')
if os.path.exists(t_d):
    shutil.rmtree(t_d)

org_t_d = os.path.join("freyberg6_template")
if not os.path.exists(org_t_d):
    raise Exception("you need to run the '/part2_8_opt/freyberg_opt_1.ipynb' notebook")

shutil.copytree(org_t_d,t_d)

In [None]:
pst_path = os.path.join(t_d, 'freyberg_mf6.pst')

In [None]:
pst = pyemu.Pst(pst_path)

## Stacks

So mechanically, how do we come up with this constraint PDF?  We saw previously in the PESTPP-IES notebook that we had to run the posterior parameter ensemble to yield a predictive PDF.  Well, it's no different here:  We will grab that PESTPP-IES posterior parameter ensemble (and manipulate it a little to remove decision variables) and then identify that ensemble as a "stack" of parameter realizations that can be run through the model to yield constraint PDFs.  Easy as!  

Beware though: including a stack in the optimization means we need to evaluate the stack at least once (see "coupling" below), which means we need to queue up and run the stack along with the response matrix perturbation runs from before...lucky for you, PESTPP-OPT does this automagically!
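Once a stack has been run, each constraint has one simulated value per realization, and that collection is the constraint PDF. The sketch below uses a synthetic stack (random numbers, not model output) and a hypothetical helper `stack_quantile` to show how such a collection can be summarized. How PESTPP-OPT itself turns the stack into a risk offset is not shown here.

```python
import random
import statistics


def stack_quantile(values, q):
    """Empirical quantile by linear interpolation between order statistics.

    Hypothetical helper for illustration only.
    """
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] * (1.0 - frac) + ordered[hi] * frac


# synthetic stand-in for one constraint evaluated across a 500-member stack
random.seed(0)
constraint_stack = [random.gauss(-100.0, 50.0) for _ in range(500)]

stack_mean = statistics.mean(constraint_stack)
stack_std = statistics.stdev(constraint_stack)
q95 = stack_quantile(constraint_stack, 0.95)

print("mean:", round(stack_mean, 1))
print("standard deviation:", round(stack_std, 1))
print("95th percentile:", round(q95, 1))
```

The gap between the mean and the 95th percentile is the kind of margin that a risk-averse stance has to reserve for a "less than" constraint.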

In [None]:
# check that the pestpp-ies directory exists and that the posterior parameter ensemble exists
ies_dir = os.path.join("..","part2_6_ies","master_ies_1")
if not os.path.exists(ies_dir):
    raise Exception("you need to run the 'part2_6_ies/freyberg_ies_1_basics.ipynb' notebook")

In [None]:
pe_files = [f for f in os.listdir(ies_dir) if f.endswith(".par.csv") and f.startswith("freyberg_mf6")]
pe_files.sort()
pe_files

Now load the parameter ensemble from the last iteration of PESTPP-IES:

In [None]:
pe = pd.read_csv(os.path.join(ies_dir,pe_files[-1]),index_col=0)
pe
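One caveat with `pe_files.sort()`: it sorts the file names as text, so `pe_files[-1]` is only the final iteration while there are fewer than ten iterations. The sketch below assumes the files follow a `<case>.<iteration>.par.csv` naming pattern; the example file names and the helper `iteration_number` are hypothetical.

```python
import re


def iteration_number(filename):
    """Return the iteration number embedded in a parameter ensemble file name.

    Assumes names like '<case>.<iteration>.par.csv'; returns -1 otherwise.
    """
    match = re.search(r"\.(\d+)\.par\.csv$", filename)
    return int(match.group(1)) if match else -1


# hypothetical file listing with more than nine iterations
example_files = [
    "freyberg_mf6.0.par.csv",
    "freyberg_mf6.10.par.csv",
    "freyberg_mf6.2.par.csv",
]

# a plain text sort would put iteration 2 last; sorting on the number does not
latest = max(example_files, key=iteration_number)
print(latest)
```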

In [None]:
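par = pst.parameter_data
# parameters that were fixed are switched to "none" (no transformation) so they are no longer fixed
par.loc[par.partrans=="fixed","partrans"] = "none"
# find the constant ("cn") well multiplier parameters...
wpar = par.loc[par.parnme.str.contains("wel") & par.parnme.str.contains("cn"),"parnme"]
# ...and set them to 1.0 in every realization of the stack
pe.loc[:,wpar.values] = 1.0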

In [None]:
pe.to_csv(os.path.join(t_d,"par_stack.csv"))

In [None]:
pst.pestpp_options["opt_par_stack"] = "par_stack.csv"
pst.pestpp_options["opt_risk"] = 0.95

In [None]:
obs_org = pst.observation_data.copy()
obs = pst.observation_data
#obs.loc[obs.apply(lambda x: x.weight > 0 and "wel" in x.obsnme,axis=1),"weight"] = 0.0


In [None]:
pst.noptmax = 1
pst.write(pst_path,version=2)

## An aside on "coupling": interaction between decision variables, parameters, and constraints

blahblahblah




# Attention!

You must specify a number of workers that is adequate for ***your*** machine! Make sure to assign an appropriate value to the following `num_workers` variable:

In [None]:
num_workers = 15 #psutil.cpu_count(logical=False) # update according to your available resources!

Then specify the folder in which the PEST manager will run and record outcomes. It should be different from the `t_d` folder. 

In [None]:
m_d = os.path.join('master_opt_2')

The following cell deploys the PEST agents and manager and then starts the run using `pestpp-opt`. Run it by pressing `shift+enter`.

If you wish to see the outputs in real-time, switch over to the terminal window (the one which you used to launch the `jupyter notebook`). There you should see `pestpp-opt`'s progress. 

If you open the tutorial folder, you should also see a bunch of new folders there named `worker_0`, `worker_1`, etc. These are the agent folders. `pyemu` will remove them when PEST finishes running.

This run should take a while to complete (depending on the number of workers and the speed of your machine). If you get an error, make sure that your firewall or antivirus software is not blocking `pestpp-opt` from communicating with the agents (this is a common problem!).

In [None]:
pyemu.os_utils.start_workers(t_d,"pestpp-opt","freyberg_mf6.pst",num_workers=num_workers,worker_root=".",
                           master_dir=m_d)

### Processing PESTPP-OPT


In [None]:
obs = obs_org.loc[obs_org.weight > 0,:].copy()
wel_constraint_names = obs.loc[obs.obsnme.str.contains("inc") & obs.obsnme.str.contains("wel"),"obsnme"]
swgw_constraint_names = obs.loc[obs.obsnme.str.contains("inc") & obs.obsnme.str.contains("sfr"),"obsnme"]

In [None]:
[f for f in os.listdir(m_d) if f.endswith(".rei")]

Now we also have "chance" files, which, as the name implies, are residual files that represent the estimated and simulated observation quantities with the chance/risk offsets included.  Let's compare:

In [None]:
swgw_rhs = obs.loc[swgw_constraint_names,"obsval"].max()
wel_rhs = obs.loc[wel_constraint_names,"obsval"].max()
chance_df = pyemu.pst_utils.read_resfile(os.path.join(m_d,"freyberg_mf6.1.est+chance.rei"))
est_df = pyemu.pst_utils.read_resfile(os.path.join(m_d,"freyberg_mf6.1.est.rei"))
constraints = swgw_constraint_names.tolist()
constraints.extend(wel_constraint_names)
fig,ax = plt.subplots(1,1,figsize=(10,3))
est_df.loc[swgw_constraint_names,"chance+estimated"] = chance_df.loc[swgw_constraint_names,"modelled"]
est_df.loc[:,"estimated"] = est_df.modelled.values
est_df.loc[swgw_constraint_names,["estimated","chance+estimated"]].plot(ax=ax,kind="bar")
ax.plot(ax.get_xlim(),[swgw_rhs,swgw_rhs],"k--")

So here we see the cost of uncertainty:  we have to leave a large amount of groundwater in the system so that we can be sure (at 95% confidence) that the sw-gw flux will be 0.0 or negative (the orange bars are right at 0.0 for most stress periods, showing the precision of the optimization solver).

Now let's tie it all together:

In [None]:
wpar = par.loc[par.pargp=="decvars",:]
future_wpar_names = wpar.parnme
par_df = pyemu.pst_utils.read_parfile(os.path.join(m_d,"freyberg_mf6.par"))
par_df.loc[future_wpar_names,:]

In [None]:
wpar = wpar.loc[future_wpar_names,:].copy()
wpar.loc[:,"kij"] = wpar.apply(lambda x: (x.idx0,x.idx1,x.idx2),axis=1)
wpar.loc[:,"optimal"] = par_df.loc[wpar.parnme,"parval1"]
wpar

In [None]:
inst_vals = wpar.inst.unique()
inst_vals.sort()
inst_vals

In [None]:
colors = ["r","g","b","c","m","y","0.5"]
vals = {}
for inst in inst_vals:
    ipar = wpar.loc[wpar.inst==inst,:].copy()
    ipar.sort_values(by="kij",inplace=True)
    ipar.index = ipar.kij
    #ipar.optimal.plot(ax=ax,kind="bar",color=colors)
    vals[inst] = ipar.optimal

In [None]:

fig,axes = plt.subplots(2,1,figsize=(20,20))
colors = ["r","g","b","c","m","y","0.5"]
df = pd.DataFrame(vals).T
df.plot(ax=axes[0],kind="bar",color=colors)
axes[0].set_ylim(0,9)
nconst = len(wel_constraint_names)-1
axes[1].plot(np.arange(nconst),est_df.loc[wel_constraint_names,"modelled"].values[1:],"b",lw=1.5)
axes[1].plot(axes[1].get_xlim(),[wel_rhs,wel_rhs],"b--",lw=3.5)
axes[1].fill_between(np.arange(nconst),np.zeros(nconst) + wel_rhs,
                     est_df.loc[wel_constraint_names,"modelled"].values[1:],facecolor="b",alpha=0.5)
axt = plt.twinx(axes[1])
axt.plot(np.arange(nconst),est_df.loc[swgw_constraint_names,"modelled"].values[1:],"m",lw=1.5)
axt.fill_between(np.arange(nconst),np.zeros(nconst) + swgw_rhs,
                     est_df.loc[swgw_constraint_names,"modelled"].values[1:],facecolor="m",alpha=0.5)
axes[1].set_xticklabels(inst_vals)
axes[0].set_xlim(0,12)
axes[1].set_xticks(np.arange(len(wel_constraint_names)-1))
axes[1].set_xlim(0,12)
axt.plot(axes[1].get_xlim(),[swgw_rhs,swgw_rhs],"m--",lw=3.5)
axes[1].set_ylim(-10000,0)
axt.set_ylim(-700,800)
axes[0].set_title("Decision Variables",loc="left")
axes[1].set_title("Constraints",loc="left")
[i.set_color("b") for i in axes[1].get_yticklabels()]
[i.set_color("m") for i in axt.get_yticklabels()]
lb = axes[1].set_ylabel("groundwater extraction rate")
lb.set_color('b')
lb = axt.set_ylabel("sw-gw exchange rate")
lb.set_color('m')
plt.tight_layout()

Again, we see the cost of uncertainty: we are pumping much less water compared to the risk-neutral case so that we can leave that groundwater to discharge to the surface-water system.  #reliability

## Reusing previous results 


If we can rely on those estimated values from the solver, we can do some trickery to skip any additional model runs while we explore additional problem formulations.  This means we can change the constraint right-hand sides and the risk value without running the model again.


In [None]:
shutil.copy2(os.path.join(m_d,"freyberg_mf6.1.jcb"),os.path.join(m_d,"restart.jcb"))
pst.pestpp_options["base_jacobian"] = "restart.jcb"
shutil.copy2(os.path.join(m_d,"freyberg_mf6.1.jcb.rei"),os.path.join(m_d,"restart.res"))
pst.pestpp_options["hotstart_resfile"] = "restart.res"
shutil.copy2(os.path.join(m_d,"freyberg_mf6.1.obs_stack.csv"),os.path.join(m_d,"obs_stack.csv"))
pst.pestpp_options["opt_obs_stack"] = "obs_stack.csv"
pst.pestpp_options.pop("opt_par_stack",None)
pst.pestpp_options["opt_skip_final"] = True

pst.write(os.path.join(m_d,"freyberg_mf6_restart.pst"))


In [None]:
pyemu.os_utils.run("pestpp-opt freyberg_mf6_restart.pst",cwd=m_d)

Let's do something fun:  sweep over a full range of risk values and see how the objective function changes:

In [None]:
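def scrape_rec():
    # pull the best objective function value out of the restart run's record file
    obj_val = np.nan  # returned if the tag line is never found
    with open(os.path.join(m_d,"freyberg_mf6_restart.rec"),'r') as f:
        for line in f:
            if "---  best objective function value:" in line:
                print(line)
                obj_val = float(line.strip().split()[-1])
                break
    return obj_val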

obj_vals = []
cwname = "oname:cum_otype:lst_usecol:wel_totim:4383.5"
cw_vals = []
pst.observation_data.loc[cwname,"obgnme"] = "less_than"
pst.observation_data.loc[cwname,"obsval"] = 1.0e+10
pst.observation_data.loc[cwname,"weight"] = 1.0


risk_vals = np.linspace(0.001,0.999,100)
for risk_val in risk_vals:
    pst.pestpp_options["opt_risk"] = risk_val
    pst.write(os.path.join(m_d,"freyberg_mf6_restart.pst"))
    pyemu.os_utils.run("pestpp-opt freyberg_mf6_restart.pst",cwd=m_d)
    obj_vals.append(scrape_rec())
    df = pyemu.pst_utils.read_resfile(os.path.join(m_d,"freyberg_mf6_restart.1.est.rei"))
    cw_vals.append(df.loc[cwname,"modelled"])
    



In [None]:
cw_vals

In [None]:
fig,ax = plt.subplots(1,1,figsize=(10,10))
ax.plot(risk_vals,obj_vals)
ax.grid()
ax.set_xlabel("risk")
ax.set_ylabel("objective function")
ax.set_xticks(np.arange(0,1.1,0.1))
axt = plt.twinx(ax)
axt.plot(risk_vals,np.array(cw_vals)*-1,alpha=0.0)
axt.set_ylabel("cumulative groundwater extracted ($L^3$)")
plt.tight_layout()
plt.show()

So that is a million-dollar plot!  We are seeing the optimal solution to the constrained groundwater management problem across varying risk stances. And we can express the cost of uncertainty in terms of the volume of groundwater extracted.
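One way to put a number on the cost of uncertainty is to compare the objective function at a chosen risk value with its risk-neutral (0.5) value. The sketch below assumes a maximization problem, so a larger objective is better. The helper names and the example numbers are made up for illustration; with real results you would pass `risk_vals` and `obj_vals` from the sweep above.

```python
def interpolate(x, xs, ys):
    """Linearly interpolate y at x, given ascending xs and matching ys."""
    for x0, y0, x1, y1 in zip(xs, ys, xs[1:], ys[1:]):
        if x0 <= x <= x1:
            t = (x - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)
    raise ValueError("x is outside the range of xs")


def cost_of_uncertainty(risks, objs, risk, neutral=0.5):
    """Objective given up at `risk` relative to the risk-neutral stance."""
    return interpolate(neutral, risks, objs) - interpolate(risk, risks, objs)


# made-up sweep results: the objective falls as the stance gets more risk averse
example_risks = [0.1, 0.3, 0.5, 0.7, 0.9]
example_objs = [140.0, 120.0, 100.0, 70.0, 30.0]

print(cost_of_uncertainty(example_risks, example_objs, 0.9))
```

A positive result is objective function given up in exchange for reliability. A negative result (risk below 0.5) is objective gained by accepting a greater chance of violating the constraints.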