# Setting up a PEST interface from MODFLOW6 using the `PstFrom` class with `PyPestUtils` for advanced pilot point parameterization

In [None]:
import os
import shutil
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import pyemu
import flopy

In [None]:
import sys
sys.path.append(os.path.join("..","..","pypestutils"))

In [None]:
import pypestutils as ppu

An existing MODFLOW6 model is in the directory `freyberg_mf6`.  Let's check it out:

In [None]:
org_model_ws = os.path.join('freyberg_mf6')
os.listdir(org_model_ws)

You can see that all the input array and list data for this model have been written "externally" - this is key to using the `PstFrom` class. 

Let's quickly viz the model top just to remind us of what we are dealing with:

In [None]:
id_arr = np.loadtxt(os.path.join(org_model_ws,"freyberg6.dis_idomain_layer3.txt"))
top_arr = np.loadtxt(os.path.join(org_model_ws,"freyberg6.dis_top.txt"))
top_arr[id_arr==0] = np.nan
plt.imshow(top_arr)

Now let's copy those files to a temporary location just to make sure we don't goof up those original files:

In [None]:
tmp_model_ws = "temp_pst_from_ppu"
if os.path.exists(tmp_model_ws):
    shutil.rmtree(tmp_model_ws)
shutil.copytree(org_model_ws,tmp_model_ws)
os.listdir(tmp_model_ws)

Now we need just a tiny bit of info about the spatial discretization of the model - this is needed to work out separation distances between parameters so we can build a geostatistical prior covariance matrix later.
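
As a rough illustration of why the discretization matters, here is a minimal numpy-only sketch that turns row and column spacings into cell-center coordinates and pairwise separation distances. The 3 x 4 grid of 250 m cells is made up for the example (it is not the Freyberg grid), and this is not how `pyemu` does the bookkeeping internally.

```python
import numpy as np

# Hypothetical, tiny grid: 3 rows x 4 columns of 250 m cells (not the Freyberg grid).
delr = np.full(4, 250.0)  # column widths along a row
delc = np.full(3, 250.0)  # row heights along a column

# Cell-center coordinates, with the origin at the lower-left corner of the grid.
x_centers = np.cumsum(delr) - 0.5 * delr
y_centers = np.cumsum(delc[::-1])[::-1] - 0.5 * delc  # row 0 is the top row
xx, yy = np.meshgrid(x_centers, y_centers)

# Pairwise separation distances between all cell centers.
pts = np.column_stack([xx.ravel(), yy.ravel()])
dist = np.sqrt(((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=2))
print(dist.shape)
print(dist[0, 1])
```

A distance matrix like `dist` is the raw ingredient a variogram needs to turn "how far apart are these two parameters" into "how correlated are they".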

Here we will load the flopy sim and model instance just to help us define some quantities later - flopy is not required to use the `PstFrom` class.

In [None]:
sim = flopy.mf6.MFSimulation.load(sim_ws=tmp_model_ws)
m = sim.get_model("freyberg6")


Here we use the simple `SpatialReference` pyemu implements to help us spatially locate parameters

In [None]:
sr = pyemu.helpers.SpatialReference.from_namfile(
        os.path.join(tmp_model_ws, "freyberg6.nam"),
        delr=m.dis.delr.array, delc=m.dis.delc.array)
sr

Now we can instantiate a `PstFrom` class instance

In [None]:
template_ws = "freyberg6_template"
pf = pyemu.utils.PstFrom(original_d=tmp_model_ws, new_d=template_ws,
                 remove_existing=True,
                 longnames=True, spatial_reference=sr,
                 zero_based=False,start_datetime="1-1-2018")


## Observations

So now we have a `PstFrom` instance, but it's just an empty container at this point, so we need to add some PEST interface "observations" and "parameters".  Let's start with observations using MODFLOW6 heads.  These are stored in `heads.csv`:

In [None]:
df = pd.read_csv(os.path.join(tmp_model_ws,"heads.csv"),index_col=0)
df

The main entry point for adding observations is (surprise) `PstFrom.add_observations()`.  This method works on the list-type observation output file.  We need to tell it which column is the index column (this can be a string if there is a header or an int if there is no header) and then which columns contain quantities we want to monitor (i.e. "observe") in the control file - in this case we want to monitor all columns except the index column:

In [None]:
hds_df = pf.add_observations("heads.csv",insfile="heads.csv.ins",index_cols="time",
                    use_cols=list(df.columns.values),prefix="hds",)
hds_df

We can see that it returned a dataframe with lots of useful info: the observation names that were formed (`obsnme`), the values that were read from `heads.csv` (`obsval`) and also some generic weights and group names.  At this point, no control file has been created; we have simply prepared to add these observations to the control file later.
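
To make the idea concrete, here is a pandas-only sketch of how a time-indexed table can be flattened into one named observation per column and time. The CSV content, the site names and the naming scheme are all hypothetical; the names that `PstFrom` actually builds carry more metadata and will look different.

```python
import io
import pandas as pd

# Hypothetical stand-in for a model output file such as heads.csv.
csv_text = "time,site_a,site_b\n1.0,34.5,35.1\n32.0,34.2,34.9\n"
df = pd.read_csv(io.StringIO(csv_text), index_col="time")

# One observation per (column, time) pair; this naming scheme is illustrative only.
records = []
for col in df.columns:
    for t, val in df[col].items():
        records.append({"obsnme": f"hds_{col}_time:{t}", "obsval": val,
                        "weight": 1.0, "obgnme": f"hds_{col}"})
obs = pd.DataFrame(records).set_index("obsnme", drop=False)
print(obs)
```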

In [None]:
[f for f in os.listdir(template_ws) if f.endswith(".ins")]

Nice!  We also have a PEST-style instruction file for those obs.

Now let's do the same for SFR observations:

In [None]:
df = pd.read_csv(os.path.join(tmp_model_ws, "sfr.csv"), index_col=0)
sfr_df = pf.add_observations("sfr.csv", insfile="sfr.csv.ins", index_cols="time", use_cols=list(df.columns.values))
sfr_df

Sweet as!  Now that we have some observations, let's add parameters!

## Pilot points and `PyPestUtils`

This notebook is mostly meant to demonstrate some advanced pilot point parameterization that is possible with `PyPestUtils`, so we will only focus on HK and VK pilot point parameters.  This is just to keep the example short.  In practice, please please please parameterize boundary conditions too!

In [None]:
v = pyemu.geostats.ExpVario(contribution=1.0,a=5000,bearing=0,anisotropy=5)
pp_gs = pyemu.geostats.GeoStruct(variograms=v, transform='log')

In [None]:
pp_gs.plot()
print("spatial variogram")
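
For a feel of what an exponential variogram with `contribution=1.0` and `a=5000` implies, here is a small numpy sketch of covariance versus separation distance. It assumes the common convention C(h) = contribution * exp(-h / a) and ignores `bearing` and `anisotropy` entirely, so treat it as a simplified picture rather than a reproduction of the `GeoStruct` above.

```python
import numpy as np

def exp_covariance(h, contribution=1.0, a=5000.0):
    """Covariance at separation distance h, assuming C(h) = contribution * exp(-h / a)."""
    return contribution * np.exp(-np.asarray(h, dtype=float) / a)

h = np.array([0.0, 1000.0, 5000.0, 15000.0])
cov = exp_covariance(h)
gamma = 1.0 - cov  # semivariogram = sill - covariance (sill is 1.0 here)
for hi, ci, gi in zip(h, cov, gamma):
    print(f"h={hi:8.0f}  cov={ci:.3f}  gamma={gi:.3f}")
```

Under this convention the correlation has dropped to about 5% at a separation of roughly three times `a`.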

Now let's get the idomain array to use as a zone array - this keeps us from setting up parameters in inactive model cells:

In [None]:
ib = m.dis.idomain[0].array

Find HK files for the upper and lower model layers (assuming model layer 2 is a semi-confining unit):

In [None]:
hk_arr_files = [f for f in os.listdir(tmp_model_ws) if "npf_k_" in f and f.endswith(".txt") and "layer2" not in f]
hk_arr_files

In [None]:
for arr_file in hk_arr_files:
    tag = arr_file.split('.')[1].replace("_","-")
    pf.add_parameters(filenames=arr_file,par_type="pilotpoints",
                       par_name_base=tag,pargp=tag,zone_array=ib,
                       upper_bound=10.,lower_bound=0.1,ult_ubound=100,ult_lbound=0.01,
                       pp_options={"pp_space":3},geostruct=pp_gs)
    #let's also add the resulting hk array that modflow sees as observations
    # so we can make easy plots later...
    pf.add_observations(arr_file,prefix=tag,
                        obsgp=tag,zone_array=ib)

If you are familiar with how `PstFrom` has worked historically, you will know that it handed off the factor-file calculation (which requires solving the kriging equations for each active node) to a pure python implementation (well, with pandas and numpy).  This was ok for toy models, but hella slow for big ugly models.  If you look at the log entries above, you should see that `PstFrom` instead successfully handed off the solve to `PyPestUtils`, which is much faster for big models.  sweet ez!
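
If "solving the kriging equations for each active node" sounds abstract, here is a minimal numpy sketch of ordinary kriging factors for a single node. The pilot point locations, the isotropic exponential covariance and the helper names are all assumptions made for the illustration; this is not the `pyemu` or `PyPestUtils` implementation, which also handle anisotropy, search radii and so on.

```python
import numpy as np

def exp_cov(h, contribution=1.0, a=5000.0):
    # Assumed isotropic exponential covariance: C(h) = contribution * exp(-h / a).
    return contribution * np.exp(-h / a)

def ordinary_kriging_factors(pp_xy, node_xy):
    """Solve the ordinary kriging system for one node; returns one factor per pilot point."""
    n = pp_xy.shape[0]
    d_pp = np.linalg.norm(pp_xy[:, None, :] - pp_xy[None, :, :], axis=2)
    d_node = np.linalg.norm(pp_xy - node_xy, axis=1)
    A = np.ones((n + 1, n + 1))  # last row/column enforce that the factors sum to 1
    A[:n, :n] = exp_cov(d_pp)
    A[n, n] = 0.0
    b = np.ones(n + 1)
    b[:n] = exp_cov(d_node)
    return np.linalg.solve(A, b)[:n]  # drop the Lagrange multiplier

# Four hypothetical pilot points on a square and one model node inside it.
pp_xy = np.array([[0.0, 0.0], [750.0, 0.0], [0.0, 750.0], [750.0, 750.0]])
factors = ordinary_kriging_factors(pp_xy, np.array([250.0, 250.0]))

# Interpolate in log10 space, as is usual for log-transformed HK.
log_hk_pp = np.log10(np.array([1.0, 10.0, 1.0, 10.0]))
hk_node = 10 ** (factors @ log_hk_pp)
print(factors, hk_node)
```

The factor file is essentially this list of factors stored for every active node, which is why the cost grows quickly with model size and why a compiled solver helps.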

In [None]:
tpl_files = [f for f in os.listdir(template_ws) if f.endswith(".tpl")]
tpl_files

In [None]:
with open(os.path.join(template_ws,tpl_files[0]),'r') as f:
    for _ in range(2):
        print(f.readline().strip())
        


So those might look like pretty redic parameter names, but they contain heaps of metadata to help you post process things later...

So those are your standard pilot points for HK in layer 1 - same as it ever was...

### Build the control file, PEST interface files, and forward run script
At this point, we have some parameters and some observations, so we can create a control file:

In [None]:
pf.mod_sys_cmds.append("mf6")
pf.pre_py_cmds.insert(0,"import sys")
pf.pre_py_cmds.insert(1,"sys.path.append(os.path.join('..','..','..','pypestutils'))")
pf.build_pst()

In [None]:
with open(os.path.join(template_ws,"forward_run.py")) as f:
    for line in f:
        print(line.rstrip())

## Setting initial parameter bounds and values

Now, just for fun, let's push some initial parameter values (the `parval1` quantities) up toward their upper bounds (to 70% of the upper bound) before drawing the ensemble.  This will result in some ugly prior draws with values "stacked" at the upper bound.  This is meant to mimic the situation where the initial parameter values aren't "centered" with respect to the bounds.
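
A quick numpy sketch shows why off-center initial values lead to stacking. It assumes a log10-normal prior whose standard deviation is a quarter of the log10 bound range, and it uses plain clipping as a stand-in for bound enforcement; both are simplifying assumptions, not a description of what `pf.draw()` and `enforce()` do internally.

```python
import numpy as np

rng = np.random.default_rng(0)
lb, ub = 0.1, 10.0
parval1 = 0.7 * ub  # initial value placed near the upper bound

# Assumed prior: log10-normal, std dev equal to a quarter of the log10 bound range.
sigma = (np.log10(ub) - np.log10(lb)) / 4.0
draws = 10 ** rng.normal(np.log10(parval1), sigma, size=10000)

# Clipping stands in for bound enforcement: out-of-range draws land on the bound.
enforced = np.clip(draws, lb, ub)
frac_at_ub = np.mean(enforced == ub)
print(f"fraction of realizations stacked at the upper bound: {frac_at_ub:.2f}")
```

With the mean this close to the bound, a sizeable share of the draws ends up sitting exactly on the upper bound, which is the spike you see in an enforced histogram.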

In [None]:
par = pf.pst.parameter_data
par.pname.unique()
hk1par = par.loc[par.pname.str.contains("-layer1"),:]
assert hk1par.shape[0] > 0

In [None]:
par.loc[hk1par.parnme,"parval1"] = hk1par.parubnd.values - (hk1par.parubnd.values*0.3)

In [None]:
par.loc[hk1par.parnme,:]

In [None]:
pf.pst.write(os.path.join(pf.new_d,"pest_org.pst"),version=2)

## Generating a prior parameter ensemble, then run and viz a real

In [None]:
np.random.seed(122341)
pe = pf.draw(num_reals=100)

In [None]:
pe_org = pe._df.copy()  # keep a copy of the draws before bounds are enforced
pe.enforce()
pname = hk1par.parnme.iloc[0]  # positional access; the index holds parameter names
fig,ax = plt.subplots(1,1)
pe.loc[:,pname]._df.plot(kind="hist",ax=ax,fc="0.5",density=True,alpha=0.5)
pe_org.loc[:,pname].plot(kind="hist",ax=ax,fc="b",density=True,alpha=0.5)


Yikes! This happens a lot...what can we do about it?  Well, in the optimization world, there is the idea that you can "relax" constraints by changing them from strict "thou shalt not" quantities to penalties, which express more of a desire or preference.  In practice, how super sure are we that the bounds on HK are really the max and min values we are willing to accept?  Maybe we would tolerate some transgressions across these bounds?  If you are in this camp, here is a way to "relax" the problem.

First, we need to add the parameter values as observations, so that each time we run the model, we record the parameter values as output quantities we are monitoring:

In [None]:
pf.pst.add_pars_as_obs(pst_path=pf.new_d)

Let's see what we added:

In [None]:
pf.pst.observation_data.tail()

See how we now have observations that match the parameter names, but with "greater_than" and "less_than" column values and that they are set to the current lower and upper parameter bounds, respectively?  PESTPP-IES treats these observations as "range observations" or double-inequality values so that any value between these limits is accepted without penalty.  Notice also that the weights have been set as proportional to the distance between the bounds - this results in the penalty for values outside of the acceptable range being proportional to the standard deviation implied by the distance between the bounds - nice! 
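
Here is a small numpy sketch of the "range observation" idea: the residual is zero anywhere between the two limits and grows with the distance outside them. The function name, the limits and the unit weight are placeholders for illustration; how PESTPP-IES actually forms and weights these residuals is not reproduced here.

```python
import numpy as np

def range_residual(sim, lower, upper):
    """Zero inside [lower, upper]; signed distance to the nearest limit outside it."""
    sim = np.asarray(sim, dtype=float)
    return np.where(sim < lower, sim - lower,
                    np.where(sim > upper, sim - upper, 0.0))

lower, upper = 0.1, 10.0  # hypothetical limits, matching the HK bounds used above
weight = 1.0              # placeholder weight, not the value pyemu would assign
sim = np.array([0.05, 1.0, 10.0, 30.0])

res = range_residual(sim, lower, upper)
phi = (weight * res) ** 2  # per-observation contribution to the objective function
print(res)
print(phi)
```

Values inside the range contribute nothing to the objective function, so the bounds act as a soft preference rather than a hard wall.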

Now, we can increase the distance between bounds...there is a method for that:

In [None]:
pf.pst.dialate_par_bounds(dialate_factor=2.0)

In [None]:
pf.pst.parameter_data
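
One way to picture a dilation factor of 2 for a log-transformed parameter is to double the distance from the center of the bounds to each bound in log10 space. The sketch below does exactly that with a hypothetical helper; it is an assumption about the general idea, and the exact rule pyemu applies may differ, so check the resulting `parlbnd` and `parubnd` columns rather than trusting this arithmetic.

```python
import numpy as np

def dilate_log_bounds(lb, ub, factor):
    """Hypothetical helper: widen bounds about their log10 midpoint by `factor`."""
    log_lb, log_ub = np.log10(lb), np.log10(ub)
    center = 0.5 * (log_lb + log_ub)
    half_width = 0.5 * (log_ub - log_lb) * factor
    return 10 ** (center - half_width), 10 ** (center + half_width)

new_lb, new_ub = dilate_log_bounds(0.1, 10.0, 2.0)
print(new_lb, new_ub)
```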

In [None]:
np.random.seed(122341)
pe_dialated = pf.draw(num_reals=100)

In [None]:
pe_dialated.enforce()

In [None]:
pname = hk1par.parnme.iloc[0]  # positional access; the index holds parameter names
fig,axes = plt.subplots(3,1,sharex=True)
pe.loc[:,pname]._df.plot(kind="hist",ax=axes[0],fc="0.5",density=True,alpha=0.5)
axes[0].set_title("original with enforcement - yuck!")
pe_org.loc[:,pname].plot(kind="hist",ax=axes[1],fc="b",density=True,alpha=0.5)
axes[1].set_title("original without enforcement")
pe_dialated.loc[:,pname]._df.plot(kind="hist",ax=axes[2],fc="m",density=True,alpha=0.5)
axes[2].set_title("dilated bounds with enforcement")
