# When worlds collide: optimization under uncertainty

In the previous optimization notebook ("freyberg_opt_1.ipynb"), we saw how we can use the PEST interface to also implement formal constrained management optimization.  And it was awesome!

But what about all those other notebooks where we droned on and on about prediction uncertainty and parameter estimation/data assimilation?  Was that all for nothing?! 

No!  It wasn't.  In fact, in the same way that predictions are uncertain, constraints based on simulation results are also uncertain.  Let's take this a step further:  In many cases, the predictions we are focused on in our modeling have certain values that are important to decision makers.  We saw this in the previous optimization notebook:  future surface-water/groundwater exchange flux was a prediction in earlier notebooks, but was also used as a constraint to avoid an unwanted outcome (too little flux to sustain ecological flows).  So it's only natural to think "how can we combine the uncertainty analysis concepts with management optimization?".  This is referred to as optimization under uncertainty, and the crux of the deal is the concept of "chance constraints".  

Most optimization algorithms require that the constraint be assigned a single "right-hand side" value - the value not to violate.  But uncertainty analysis gives us a range (or statistical distribution) of possible constraint/prediction values.  How can we rectify this problem?  Well, if we realize that the statistical distribution covers the range of possible values that a given constraint may take (because of uncertainty), then that leads us to a concept of "risk shifting".  "Risk" (aka reliability) is a simple scalar algorithmic control that ranges from 0.0 to 1.0.  A risk value of 0.5 is called "risk neutral" and it implies that 50% of the mass of the constraint probability density function is on either side of the value that corresponds to risk = 0.5 - think of the mean of a normal distribution:  at the mean value, half of the distribution is on either side.  A risk of 0.95 implies we want to be 95% sure the constraint will not be violated and is referred to as "risk averse".  A risk-averse stance implies we will have to accept a suboptimal objective function value to be 95% sure.  The other side of the distribution (risk less than 0.5) is referred to as "risk tolerant" and, as risk approaches 0.0, it implies an ever smaller chance that the constraint will actually be satisfied (danger zone!).  
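To make risk shifting a little more concrete, here is a minimal sketch, not PESTPP-OPT's actual implementation. It assumes the constraint values are roughly normally distributed, uses synthetic numbers in place of real model outputs, and the function name `risk_shifted_value` is made up for illustration.

```python
import random
from statistics import NormalDist, mean, stdev


def risk_shifted_value(samples, risk):
    """Value below which a fraction `risk` of a normal fit to `samples` lies."""
    if not 0.0 < risk < 1.0:
        raise ValueError("risk must be strictly between 0 and 1 in this sketch")
    # number of standard deviations away from the mean for this risk level
    z = NormalDist().inv_cdf(risk)
    return mean(samples) + z * stdev(samples)


# synthetic stand-in for simulated constraint values (e.g. sw-gw exchange flux);
# these numbers are invented and do not come from the Freyberg model
rng = random.Random(0)
stack_outputs = [rng.gauss(-300.0, 40.0) for _ in range(500)]

shifted = {risk: risk_shifted_value(stack_outputs, risk) for risk in (0.05, 0.5, 0.95)}
for risk, value in shifted.items():
    print(f"risk={risk:.2f} -> constraint value {value:.1f}")
```

With risk = 0.5 the function simply returns the mean.  For a "less than" constraint, checking the risk = 0.95 value against the right-hand side is the risk-averse test, because 95% of the fitted distribution lies at or below that value.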

Why did we say "actually be satisfied" above?  Well, just like predictions, we don't know what the "true" or "real world" value of the model-based constraints is (if we did, we wouldn't need to be modeling at all!).  So we don't know what "true" value the constraint will take because it's something we can't or haven't observed.  

Ok, enough words.  Let's see how this works in practice.  Note that to run this notebook, you will need to have run both the previous optimization notebook and the PESTPP-IES part 1 notebook.


### Admin

Start off with the usual loading of dependencies and preparing model and PEST files. We will be continuing to work with the modified-Freyberg model (see "intro to model" notebook), and the high-dimensional PEST dataset prepared in the "pstfrom pest setup" and "obs and weights" notebooks. 

For the purposes of this notebook, you do not require familiarity with previous notebooks (but it helps...). 

Simply run the next few cells by pressing `shift+enter`.

In [None]:
import os
import warnings
warnings.filterwarnings("ignore")
import numpy as np
import pandas as pd
import matplotlib
# 'sans-serif' is a valid generic font family ('normal' is a font style, not a family)
font = {'family' : 'sans-serif',
        'size'   : 15}
matplotlib.rc('font', **font)
import matplotlib.pyplot as plt
import shutil
import psutil

import sys
sys.path.insert(0,os.path.join("..", "..", "dependencies"))
import pyemu
import flopy
assert "dependencies" in flopy.__file__
assert "dependencies" in pyemu.__file__
sys.path.insert(0,"..")
import herebedragons as hbd



To maintain continuity in the series of tutorials, we use the PEST dataset prepared in the "obs and weights" tutorial. Run the next cell to copy the necessary files across. Note that you will need to run the previous notebooks in the correct order beforehand.

Specify the path to the PEST dataset template folder. Recall that we will prepare our PEST dataset files in this folder, keeping them separate from the original model files. Then copy across pre-prepared model and PEST files:

In [None]:
# specify the temporary working folder
t_d = os.path.join('freyberg6_template_chance')
if os.path.exists(t_d):
    shutil.rmtree(t_d)

org_t_d = os.path.join("freyberg6_template")
if not os.path.exists(org_t_d):
    raise Exception("you need to run the '/part2_8_opt/freyberg_opt_1.ipynb' notebook")

shutil.copytree(org_t_d,t_d)

In [None]:
pst_path = os.path.join(t_d, 'freyberg_mf6.pst')

In [None]:
pst = pyemu.Pst(pst_path)

## Stacks

So mechanically, how do we come up with this constraint PDF?  We saw previously in the PESTPP-IES notebook that we had to run the posterior parameter ensemble to yield a predictive PDF.  Well, it's no different here:  We will grab that PESTPP-IES posterior parameter ensemble (and manipulate it a little to remove decision variables) and then identify that ensemble as a "stack" of parameter realizations that can be run thru the model to yield constraint PDFs.  Easy as!
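As a rough sketch of the "manipulate it a little" step, the snippet below builds a small synthetic ensemble with `pandas` and drops the decision-variable columns to leave a stack of uncertain parameters only.  The parameter names, the ensemble values, and the output filename are all hypothetical; the real ensemble would come from the PESTPP-IES results.

```python
import numpy as np
import pandas as pd

# synthetic stand-in for a posterior parameter ensemble: rows are realizations,
# columns are parameters (all names here are invented for illustration)
rng = np.random.default_rng(42)
par_names = ["hk_1", "hk_2", "rch_1", "wel_future_1", "wel_future_2"]
posterior = pd.DataFrame(rng.lognormal(size=(10, len(par_names))), columns=par_names)

# decision variables are chosen by the optimizer, so they do not belong in the stack
decision_vars = ["wel_future_1", "wel_future_2"]
stack = posterior.drop(columns=decision_vars)

print(stack.shape)
# the stack could then be saved for later use, e.g.:
# stack.to_csv("par_stack.csv")  # hypothetical filename
```

In PEST++, a file like this is, as far as we are aware, pointed to with the `opt_par_stack` control-file option; check the PEST++ documentation for the exact option names and file formats that your version supports.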

In [None]:
# check that the pestpp-ies directory exists and that the posterior parameter ensemble exists

## An aside on "coupling": interaction between decision variables, parameters, and constraints

blahblahblah




# Attention!

You must specify the number of workers that is adequate for ***your*** machine! Make sure to assign an appropriate value for the following `num_workers` variable:

In [None]:
num_workers = psutil.cpu_count(logical=False) # update according to your available resources!

Then specify the folder in which the PEST manager will run and record outcomes. It should be different from the `t_d` folder. 

In [None]:
m_d = os.path.join('master_opt_1')

The following cell deploys the PEST agents and manager and then starts the run using `pestpp-opt`. Run it by pressing `shift+enter`.

If you wish to see the outputs in real-time, switch over to the terminal window (the one which you used to launch the `jupyter notebook`). There you should see `pestpp-opt`'s progress. 

If you open the tutorial folder, you should also see a bunch of new folders there named `worker_0`, `worker_1`, etc. These are the agent folders. `pyemu` will remove them when PEST finishes running.

This run should take a while to complete (depending on the number of workers and the speed of your machine). If you get an error, make sure that your firewall or antivirus software is not blocking `pestpp-opt` from communicating with the agents (this is a common problem!).

In [None]:
pyemu.os_utils.start_workers(t_d,"pestpp-opt","freyberg_mf6.pst",num_workers=num_workers,worker_root=".",
                           master_dir=m_d)

### Processing PESTPP-OPT

Ok, so now what? Well, let's check out the constraints (since they include both the water use and sw-gw exchange fluxes).  Here are the files that might have what we need:

In [None]:
[f for f in os.listdir(m_d) if f.endswith(".rei")]

Wat?! What's with this "est" and "sim" stuff?  Well, in PESTPP-OPT, the linear-programming solution yields what it thinks the final constraint values should be, based on the assumed linearity of the response matrix - these are the "est"imated constraint values.  But we know that the relation between decision variables and constraints might be non-linear (nah, really?!).  So PESTPP-OPT actually "sim"ulates the model one last time with the optimal decision variable values to verify the results (the ".jcb.rei" files are the simulation results where the response matrix was calculated).  Let's compare these:

In [None]:
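# "sim" = final model run with the optimal decision variables; "est" = linear-programming estimate
sim_df = pyemu.pst_utils.read_resfile(os.path.join(m_d,"freyberg_mf6.1.sim.rei"))
est_df = pyemu.pst_utils.read_resfile(os.path.join(m_d,"freyberg_mf6.1.est.rei"))
# swgw_constraint_names and wel_constraint_names are the constraint name lists from the previous optimization notebook
constraints = swgw_constraint_names.tolist()
constraints.extend(wel_constraint_names)
fig,ax = plt.subplots(1,1,figsize=(10,3))
# put the estimated values next to the simulated ("modelled") values for plotting
sim_df.loc[swgw_constraint_names,"est"] = est_df.loc[swgw_constraint_names,"modelled"]
sim_df.loc[swgw_constraint_names,["modelled","est"]].plot(ax=ax,kind="bar")
# dashed line marks the sw-gw constraint right-hand side
ax.plot(ax.get_xlim(),[-250,-250],"k--")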

Ok, so we see that there is some mild nonlinearity but we are still pretty close.  #winning
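If the "est" versus "sim" distinction still feels abstract, the toy sketch below mimics it with a made-up nonlinear function standing in for the model.  It is only an illustration under those assumptions: `toy_model`, `response_matrix`, and all the numbers are hypothetical, and this is not how PESTPP-OPT is coded internally.

```python
import numpy as np


def toy_model(dv):
    """Hypothetical nonlinear 'model': maps two pumping rates to one exchange flux."""
    q1, q2 = dv
    return np.array([-400.0 + 30.0 * q1 + 20.0 * q2 + 1.5 * q1 * q2])


def response_matrix(model, dv0, delta=0.01):
    """Finite-difference sensitivities of model outputs to each decision variable."""
    base = model(dv0)
    jac = np.zeros((base.size, dv0.size))
    for j in range(dv0.size):
        dv = dv0.copy()
        dv[j] += delta
        jac[:, j] = (model(dv) - base) / delta
    return base, jac


dv_initial = np.array([1.0, 1.0])   # where the response matrix is calculated
dv_optimal = np.array([3.0, 2.0])   # pretend this came out of the LP solution

base, jac = response_matrix(toy_model, dv_initial)
est = base + jac @ (dv_optimal - dv_initial)   # linear "est"imate
sim = toy_model(dv_optimal)                    # actual "sim"ulated value

print("est:", est, "sim:", sim, "difference:", sim - est)
```

The gap between `est` and `sim` comes entirely from the `q1 * q2` interaction term, which a linear response matrix cannot represent.  The further the optimal decision variables move from where the matrix was calculated, the larger that gap can get.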

Hackery alert:  now let's visualize the pattern of groundwater use across the future stress periods and plot that with the constraint information:

In [None]:
par_df = pyemu.pst_utils.read_parfile(os.path.join(m_d,"freyberg_mf6.par"))
par_df.loc[future_wpar_names,:]

In [None]:
wpar = wpar.loc[future_wpar_names,:].copy()
wpar.loc[:,"kij"] = wpar.apply(lambda x: (x.idx0,x.idx1,x.idx2),axis=1)
wpar.loc[:,"optimal"] = par_df.loc[wpar.parnme,"parval1"]
wpar

In [None]:
inst_vals = wpar.inst.unique()
inst_vals.sort()
inst_vals

In [None]:
colors = ["r","g","b","c","m","y","0.5"]
vals = {}
for inst in inst_vals:
    ipar = wpar.loc[wpar.inst==inst,:].copy()
    ipar.sort_values(by="kij",inplace=True)
    ipar.index = ipar.kij
    #ipar.optimal.plot(ax=ax,kind="bar",color=colors)
    vals[inst] = ipar.optimal

In [None]:
fig,axes = plt.subplots(2,1,figsize=(20,20))
colors = ["r","g","b","c","m","y","0.5"]
df = pd.DataFrame(vals).T
df.plot(ax=axes[0],kind="bar",color=colors)
axes[0].set_ylim(0,9)
nconst = len(wel_constraint_names)-1
axes[1].plot(np.arange(nconst),sim_df.loc[wel_constraint_names,"modelled"].values[1:],"b",lw=1.5)
axes[1].plot(axes[1].get_xlim(),[-2350,-2350],"b--",lw=3.5)
axes[1].fill_between(np.arange(nconst),np.zeros(nconst) - 2350.,
                     sim_df.loc[wel_constraint_names,"modelled"].values[1:],facecolor="b",alpha=0.5)
axt = plt.twinx(axes[1])
axt.plot(np.arange(nconst),sim_df.loc[swgw_constraint_names,"modelled"].values[1:],"m",lw=1.5)
axt.fill_between(np.arange(nconst),np.zeros(nconst) - 250.,
                     sim_df.loc[swgw_constraint_names,"modelled"].values[1:],facecolor="m",alpha=0.5)
axes[1].set_xticklabels(inst_vals)
axes[0].set_xlim(0,12)
axes[1].set_xticks(np.arange(len(wel_constraint_names)-1))
axes[1].set_xlim(0,12)
axt.plot(axes[1].get_xlim(),[-250,-250],"m--",lw=3.5)
axes[1].set_ylim(-10000,0)
axt.set_ylim(-700,800)
axes[0].set_title("Decision Variables",loc="left")
axes[1].set_title("Constraints",loc="left")
[i.set_color("b") for i in axes[1].get_yticklabels()]
[i.set_color("m") for i in axt.get_yticklabels()]
lb = axes[1].set_ylabel("groundwater extraction rate")
lb.set_color('b')
lb = axt.set_ylabel("sw-gw exchange rate")
lb.set_color('m')
plt.tight_layout()

If you can see past the plotting hacks, you'll see that the optimal solution is relatively complex in terms of which extraction wells are active each stress period and that the optimal solution extracts substantially more water during stress periods 13 thru 16 (blue shaded region) but then must back off the extraction rate to meet the sw-gw constraints during stress period 22.  In fact, extra sw-gw exchange flux for stress periods 19-22 (magenta fill) is left in the stream - this is likely because of system memory and an imperfect spatial distribution of extraction wells.  Notice that in the later stress periods, the extraction is moved to wells located in the northern portion of the domain (smaller "j" values in the k-i-j info).

In the next notebook, we will move beyond deterministic/risk neutral optimization to including posterior parameter uncertainties in the optimization...

If you are interested in increasing the complexity of this optimization problem, try experimenting with requiring more sw-gw exchange (more negative than -250) and/or requiring more groundwater extraction (more negative than -2350.0).  You will soon see "infeasible" in the .rec file, meaning there is not a combination of extraction well rates that can simultaneously satisfy ecological and economic needs...
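To see why tightening the right-hand sides eventually leads to "infeasible", here is a deliberately tiny sketch with a single lumped pumping rate and a linear flux response.  Only the two right-hand side values (-2350 and -250) come from this notebook; the function `feasible_pumping_range`, the baseline flux, and the flux response per unit of pumping are hypothetical numbers chosen for illustration.

```python
def feasible_pumping_range(extraction_rhs, flux_rhs, base_flux, flux_per_unit_pumping):
    """Return (lowest, highest) total pumping that meets both constraints, or None.

    Sign convention follows the notebook: extraction and flux to the stream
    are negative, and both constraints are "less than" constraints.
    """
    # extraction constraint: -pumping <= extraction_rhs
    lowest = -extraction_rhs
    # flux constraint: base_flux + flux_per_unit_pumping * pumping <= flux_rhs
    highest = (flux_rhs - base_flux) / flux_per_unit_pumping
    if lowest > highest:
        return None  # no pumping rate satisfies both constraints
    return lowest, highest


# hypothetical system response: no pumping gives -800 of flux to the stream,
# and each unit of pumping removes 0.2 units of that flux
current = feasible_pumping_range(-2350.0, -250.0, base_flux=-800.0, flux_per_unit_pumping=0.2)
stricter = feasible_pumping_range(-2350.0, -400.0, base_flux=-800.0, flux_per_unit_pumping=0.2)

print("current constraints:", current)
print("stricter sw-gw constraint:", stricter)
```

In this toy setup, asking for more sw-gw exchange lowers the largest allowable pumping rate until it falls below the smallest rate that meets the extraction requirement, at which point the feasible range is empty.  The real problem has many decision variables and constraints, but the same squeeze is what the linear-programming solver reports as "infeasible".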