# Introduction to constrained multi-objective management optimization (under uncertainty) - yeah, it's getting deep!

In the two PESTPP-OPT notebooks, we introduced the concept of constrained management optimization under uncertainty.  We saw standard risk-neutral optimization and then piled on the learning and concepts with the idea of chances, chance constraints, risk/reliability, and stacks.  So if you are reading this notebook...

Ok, so now let's talk about the nature of constraints and objective functions.  In the Freyberg example, we have been treating the sw-gw exchange flux and the aggregate groundwater extraction rate for each stress period as "hard" inequality constraints: thou shalt not violate! But in many settings there is a general stakeholder preference to avoid unwanted outcomes, yet the exact nature of that avoidance is not known: "Sure we want to keep some groundwater flowing into the surface-water system but we also want to have plenty of water to drink".  Very imprecise...so how can we deal with this as science nerds?

Well, one way is to use so-called "multi-objective" optimization, where the goal is to map the trade-off between competing objectives. Unfortunately, this kind of trade-off mapping is very (very) computationally expensive because in most cases, we have to resort to "global" evolutionary-type algorithms.  Note that "multi-objective" doesn't mean go crazy with objectives.  Five or six is probably the most that can be used for algorithmic reasons.  

If you are interested in learning more about multi-objective optimization, #LMGTFY: "pareto frontier", "pareto dominance", "nsga-II", etc...
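To make "pareto dominance" a little more concrete, here is a minimal sketch in plain Python. It assumes every objective is to be minimized, and the candidate values are made up for illustration (they are not Freyberg results). The function names `dominates` and `nondominated` are hypothetical helpers, not part of pyemu or PEST++.

```python
def dominates(a, b):
    """True if objective vector `a` Pareto-dominates `b` (all objectives minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def nondominated(points):
    """Return the points that no other point dominates (the Pareto frontier)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# made-up (objective_1, objective_2) pairs, both to be minimized
candidates = [(1.0, 9.0), (2.0, 7.0), (3.0, 8.0), (4.0, 3.0), (5.0, 5.0), (6.0, 1.0)]
front = nondominated(candidates)
print(front)
```

The points that survive are the ones where you cannot improve one objective without giving something up on the other, which is exactly the trade-off we want to map.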


### Admin

Start off with the usual loading of dependencies and preparing model and PEST files. We will be continuing to work with the modified-Freyberg model (see "intro to model" notebook), and the high-dimensional PEST dataset prepared in the "pstfrom pest setup" and "obs and weights" notebooks. 

For the purposes of this notebook, you do not require familiarity with previous notebooks (but it helps...). 

Simply run the next few cells by pressing `shift+enter`.

In [None]:
import os
import warnings
warnings.filterwarnings("ignore")
warnings.filterwarnings("ignore", category=DeprecationWarning) 
import numpy as np
import pandas as pd
import matplotlib
# 'sans-serif' is a valid generic family; 'normal' is not a font family name
font = {'family' : 'sans-serif',
        'size'   : 15}
matplotlib.rc('font', **font)
import matplotlib.pyplot as plt
import shutil
import psutil

import sys
sys.path.insert(0,os.path.join("..", "..", "dependencies"))
import pyemu
import flopy
assert "dependencies" in flopy.__file__
assert "dependencies" in pyemu.__file__
sys.path.insert(0,"..")
import herebedragons as hbd



To maintain continuity in the series of tutorials, we use the PEST dataset prepared in the "obs and weights" tutorial. Run the next cell to copy the necessary files across. Note that you will need to run the previous notebooks in the correct order beforehand.

Specify the path to the PEST dataset template folder. Recall that we will prepare our PEST dataset files in this folder, keeping them separate from the original model files. Then copy across pre-prepared model and PEST files:

In [None]:
# specify the temporary working folder
t_d = os.path.join('freyberg6_template')
if os.path.exists(t_d):
    shutil.rmtree(t_d)

org_t_d = os.path.join("..","part2_8_opt","freyberg6_template")
if not os.path.exists(org_t_d):
    raise Exception("you need to run the '/part2_8_opt/freyberg_opt_1.ipynb' notebook")

shutil.copytree(org_t_d,t_d)

In [None]:
pst_path = os.path.join(t_d, 'freyberg_mf6.pst')

### Inspect the PEST Dataset

OK. We can now get started.

Load the PEST control file as a `Pst` object. We are going to use the PEST control file that was created in the "pstfrom pest setup" tutorial. This control file has observations with weights equal to the inverse of measurement noise (**not** weighted for visibility!).

In [None]:
pst = pyemu.Pst(pst_path)

### Run PESTPP-MOU

`PESTPP-MOU` implements constrained single- and multi-objective "global" optimization using evolutionary algorithms.  Additional terminology alert!


 - "individual": an optimization problem candidate solution. So just a decision variable vector - one value for each decision variable
 - "population":  well, a collection of individuals
 - "generation": a complete cycle of the evolutionary algorithm (think "iteration"), which involves generating a new "child" population by combining parents, evaluating the children's fitness (running the population thru the model), and then the (natural) selection, where the "best" individuals from the parent and child populations are kept.  No doubt "best" is where things get complicated...
 - "generator":  the algorithmic process to generate a child population.  Differential evolution is the default in PESTPP-MOU but there are others
 - "selector": the algorithmic process to pick the best individuals in the population to move to the next generation.  For single objective formulations, this is trivial. For multiobjective formulations, selection is also complex.
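The terminology above can be sketched with a toy single-objective differential evolution loop. This is only an illustration under stated assumptions: the `objective` function is a made-up stand-in for a model run, the `de_generation` helper is hypothetical, and the one-to-one parent-versus-child selection shown here is the textbook differential evolution choice, not necessarily what PESTPP-MOU does internally.

```python
import random


def objective(x):
    # made-up stand-in for a model run: sum of squares, minimum at x = 0
    return sum(v * v for v in x)


def de_generation(parents, rng, f=0.8, cr=0.7, lower=-5.0, upper=5.0):
    """One generation: generate children, evaluate them, select survivors."""
    n_dv = len(parents[0])
    next_pop = []
    for i, parent in enumerate(parents):
        # generator: combine three other individuals from the parent population
        a, b, c = rng.sample([p for j, p in enumerate(parents) if j != i], 3)
        forced = rng.randrange(n_dv)  # at least one variable comes from the mutant
        child = []
        for k in range(n_dv):
            if k == forced or rng.random() < cr:
                value = a[k] + f * (b[k] - c[k])
            else:
                value = parent[k]
            child.append(min(max(value, lower), upper))  # respect the bounds
        # selector (single objective): keep the better of parent and child
        next_pop.append(child if objective(child) <= objective(parent) else parent)
    return next_pop


rng = random.Random(42)
n_dv = 4  # number of decision variables in each individual
population = [[rng.uniform(-5.0, 5.0) for _ in range(n_dv)] for _ in range(2 * n_dv)]
initial_best = min(objective(ind) for ind in population)
for generation in range(50):
    population = de_generation(population, rng)
final_best = min(objective(ind) for ind in population)
print(initial_best, final_best)
```

With more than one objective, the single comparison in the selector has to be replaced by something like a dominance-based ranking, which is where most of the extra complexity lives.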


Well, there you have it - you are now ready for PESTPP-MOU! But wait, how big should the population be?  How many generations should I use?  Great questions!  A common rule of thumb is that the population should be about twice as large as the number of decision variables.  As for generations, lots (and this is the problem!).  50, 100, or more are not uncommon...
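To get a feel for why the number of generations is "the problem", here is some back-of-the-envelope arithmetic. It uses the population size (160), generation count (100) and worker count (15) that this notebook uses. It assumes one model run per individual per generation plus the initial population, and that each generation must finish before the next one starts. The `seconds_per_run` value is purely hypothetical, so plug in your own model run time.

```python
import math


def estimate_runs(pop_size, generations):
    # initial population plus one child population per generation (assumption)
    return pop_size * (generations + 1)


def estimate_wall_hours(pop_size, generations, num_workers, seconds_per_run):
    # each population is run in parallel batches of at most `num_workers` runs
    batches_per_population = math.ceil(pop_size / num_workers)
    total_batches = batches_per_population * (generations + 1)
    return total_batches * seconds_per_run / 3600.0


total_runs = estimate_runs(pop_size=160, generations=100)
wall_hours = estimate_wall_hours(160, 100, num_workers=15, seconds_per_run=5.0)
print(total_runs, wall_hours)
```

Doubling either the population size or the number of generations roughly doubles the cost, so it pays to think about both before hitting run.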




In [None]:
pst.pestpp_options["mou_objectives"] = ["oname:cum_otype:lst_usecol:sfr_totim:4383.5","oname:cum_otype:lst_usecol:wel_totim:4383.5"]
pst.observation_data.loc[pst.pestpp_options["mou_objectives"],'weight'] = 1.0
pst.observation_data.loc[pst.pestpp_options["mou_objectives"],'obgnme'] = "less_than_obj"
pst.prior_information.loc[:,"weight"] = 0.0

pst.pestpp_options["mou_population_size"] = 160 #twice the number of decision variables

pst.control_data.noptmax = 0
pst.write(pst_path,version=2)                       

In [None]:
pyemu.os_utils.run("pestpp-mou freyberg_mf6.pst",cwd=t_d)

In [None]:
pst.control_data.noptmax = 100
pst.write(pst_path,version=2)

# Attention!

You must specify a number of workers that is adequate for ***your*** machine! Make sure to assign an appropriate value to the following `num_workers` variable:

In [None]:
num_workers = 15# psutil.cpu_count(logical=False) # update according to your available resources!
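If you are unsure what to use, one possible starting point is to ask the operating system how many cores it can see and leave one free for everything else. This is only a sketch: `suggest_num_workers` is a hypothetical helper, and `os.cpu_count()` reports logical cores, which can be more than the number of physical cores.

```python
import os


def suggest_num_workers(reserve=1):
    # os.cpu_count() counts logical cores and can return None on some platforms
    logical = os.cpu_count() or 1
    # always keep at least one worker, even on a single-core machine
    return max(1, logical - reserve)


print(suggest_num_workers())
```

Treat the result as an upper bound rather than a recommendation; memory and disk speed can limit you well before the core count does.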

Then specify the folder in which the PEST manager will run and record outcomes. It should be different from the `t_d` folder. 

In [None]:
m_d = os.path.join('master_mou_1')

The following cell deploys the PEST agents and manager and then starts the run using `pestpp-mou`. Run it by pressing `shift+enter`.

If you wish to see the outputs in real-time, switch over to the terminal window (the one which you used to launch the `jupyter notebook`). There you should see `pestpp-mou`'s progress. 

If you open the tutorial folder, you should also see a bunch of new folders there named `worker_0`, `worker_1`, etc. These are the agent folders. `pyemu` will remove them when PEST finishes running.

This run should take a while to complete (depending on the number of workers and the speed of your machine). If you get an error, make sure that your firewall or antivirus software is not blocking `pestpp-mou` from communicating with the agents (this is a common problem!).

In [None]:
pyemu.os_utils.start_workers(t_d,"pestpp-mou","freyberg_mf6.pst",num_workers=num_workers,worker_root=".",
                           master_dir=m_d)

### Processing PESTPP-OPT

Ok, so now what? Well let's check out the constraints (since they include both the water use and sw-gw exchange fluxes).  Here are the files that might have what we need:

In [None]:
[f for f in os.listdir(m_d) if f.endswith(".rei")]

Wat?! What's with this "est" and "sim" stuff?  Well, in PESTPP-OPT, the linear-programming solution yields what it thinks the final constraint values should be, based on the assumed linearity of the response matrix - these are the "est"imated constraint values.  But we know that the relation between decision variables and constraints might be non-linear (nah, really?!).  So PESTPP-OPT actually "sim"ulates the model one last time with the optimal decision variable values to verify the results (the ".jcb.rei" files are the simulation results where the response matrix was calculated).  Let's compare these:

In [None]:
# "sim" = constraint values from the final model run; "est" = linear estimates
sim_df = pyemu.pst_utils.read_resfile(os.path.join(m_d,"freyberg_mf6.1.sim.rei"))
est_df = pyemu.pst_utils.read_resfile(os.path.join(m_d,"freyberg_mf6.1.est.rei"))
constraints = swgw_constraint_names.tolist()
constraints.extend(wel_constraint_names)
fig,ax = plt.subplots(1,1,figsize=(10,3))
sim_df.loc[swgw_constraint_names,"est"] = est_df.loc[swgw_constraint_names,"modelled"]
sim_df.loc[swgw_constraint_names,["modelled","est"]].plot(ax=ax,kind="bar")
ax.plot(ax.get_xlim(),[swgw_rhs,swgw_rhs],"k--")

Ok, so we see that there is some mild nonlinearity but we are still pretty close.  #winning
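The gap between "est" and "sim" values can be illustrated with a toy problem. In this sketch, `model` is a made-up, mildly nonlinear function standing in for a constraint, and `x0` and `x_opt` are hypothetical decision-variable values. The one-column "response matrix" is just a finite-difference slope; none of this reproduces what PESTPP-OPT does internally.

```python
def model(x):
    # made-up, mildly nonlinear "simulated" constraint value
    return 10.0 * x - 0.5 * x ** 2


def finite_difference_slope(func, x0, dx=0.01):
    # one-entry "response matrix": sensitivity of the output to the decision variable
    return (func(x0 + dx) - func(x0)) / dx


x0 = 1.0      # where the response matrix was calculated
x_opt = 3.0   # hypothetical optimum returned by the linear program

slope = finite_difference_slope(model, x0)
est = model(x0) + slope * (x_opt - x0)  # "est": linear extrapolation from x0
sim = model(x_opt)                      # "sim": run the model again at x_opt
print(est, sim, est - sim)
```

The further the optimum moves from where the slope was calculated, the larger the mismatch gets, which is why the final verification run is worth its cost.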

Hackery alert:  now let's visualize the pattern of groundwater use across the future stress periods and plot that with the constraint information:

In [None]:
par_df = pyemu.pst_utils.read_parfile(os.path.join(m_d,"freyberg_mf6.par"))
par_df.loc[future_wpar_names,:]

In [None]:
wpar = wpar.loc[future_wpar_names,:].copy()
wpar.loc[:,"kij"] = wpar.apply(lambda x: (x.idx0,x.idx1,x.idx2),axis=1)
wpar.loc[:,"optimal"] = par_df.loc[wpar.parnme,"parval1"]
wpar

In [None]:
inst_vals = wpar.inst.unique()
inst_vals.sort()
inst_vals

In [None]:
colors = ["r","g","b","c","m","y","0.5"]
vals = {}
for inst in inst_vals:
    ipar = wpar.loc[wpar.inst==inst,:].copy()
    ipar.sort_values(by="kij",inplace=True)
    ipar.index = ipar.kij
    #ipar.optimal.plot(ax=ax,kind="bar",color=colors)
    vals[inst] = ipar.optimal

In [None]:
fig,axes = plt.subplots(2,1,figsize=(20,20))
colors = ["r","g","b","c","m","y","0.5"]
df = pd.DataFrame(vals).T
df.plot(ax=axes[0],kind="bar",color=colors)
axes[0].set_ylim(0,9)
nconst = len(wel_constraint_names)-1
axes[1].plot(np.arange(nconst),sim_df.loc[wel_constraint_names,"modelled"].values[1:],"b",lw=1.5)
axes[1].plot(axes[1].get_xlim(),[wel_rhs,wel_rhs],"b--",lw=3.5)
axes[1].fill_between(np.arange(nconst),np.zeros(nconst) + wel_rhs,
                     sim_df.loc[wel_constraint_names,"modelled"].values[1:],facecolor="b",alpha=0.5)
axt = plt.twinx(axes[1])
axt.plot(np.arange(nconst),sim_df.loc[swgw_constraint_names,"modelled"].values[1:],"m",lw=1.5)
axt.fill_between(np.arange(nconst),np.zeros(nconst) + swgw_rhs,
                     sim_df.loc[swgw_constraint_names,"modelled"].values[1:],facecolor="m",alpha=0.5)
axes[1].set_xticklabels(inst_vals)
axes[0].set_xlim(0,12)
axes[1].set_xticks(np.arange(len(wel_constraint_names)-1))
axes[1].set_xlim(0,12)
axt.plot(axes[1].get_xlim(),[swgw_rhs,swgw_rhs],"m--",lw=3.5)
axes[1].set_ylim(-10000,0)
axt.set_ylim(-700,800)
axes[0].set_title("Decision Variables",loc="left")
axes[1].set_title("Constraints",loc="left")
[i.set_color("b") for i in axes[1].get_yticklabels()]
[i.set_color("m") for i in axt.get_yticklabels()]
lb = axes[1].set_ylabel("groundwater extraction rate")
lb.set_color('b')
lb = axt.set_ylabel("sw-gw exchange rate")
lb.set_color('m')
plt.tight_layout()

If you can see past the plotting hacks, you'll see that the optimal solution is relatively complex in terms of which extraction wells are active each stress period, and that the optimal solution extracts substantially more water during stress periods 13 thru 16 (blue shaded region) but then must back off the extraction rate to meet the sw-gw constraints during stress period 22.  In fact, extra sw-gw exchange flux for stress periods 19-22 (magenta fill) is left in the stream - this is likely because of system memory and an imperfect spatial distribution of extraction wells.  Notice that in the later stress periods, the extraction is moved to wells located in the northern portion of the domain (smaller "j" values in the k-i-j info).

In the next notebook, we will move beyond deterministic/risk neutral optimization to including posterior parameter uncertainties in the optimization...

If you are interested in increasing the complexity of this optimization problem, try experimenting with requiring more sw-gw exchange (more negative than 0) and/or requiring more groundwater extraction (more negative than -2350.0).  You will soon see "infeasible" in the .rec file, meaning there is no combination of extraction well rates that can simultaneously satisfy ecological and economic needs...