# Pilot Points Setup

In this notebook we are going to calibrate the Freyberg model with pilot points as the parameterisation device for hydraulic conductivity. We are using the same PEST control file constructed in the "freyberg pilotpoints setup" notebook. 

 
### Admin
We have provided some pre-cooked PEST dataset files, wrapped around the modified Freyberg model. This is the same dataset introduced in the "freyberg_pest_setup" and subsequent notebooks. We pick up here after the "freyberg pilotpoints setup" notebook.

The functions in the next cell import required dependencies and prepare a folder for you. This folder contains the model files and a preliminary PEST setup. Run the cells, then inspect the new folder named "freyberg_mf6" which has been created in your tutorial directory. (Just press `shift+enter` to run the cells). 

In [1]:
import sys
import os
import warnings
warnings.filterwarnings("ignore")
warnings.filterwarnings("ignore", category=DeprecationWarning) 

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt;
import psutil
import shutil

sys.path.insert(0,os.path.join("..", "..", "dependencies"))
import pyemu
import flopy
assert "dependencies" in flopy.__file__
assert "dependencies" in pyemu.__file__
sys.path.insert(0,"..")
import herebedragons as hbd

plt.rcParams['font.size'] = 10
pyemu.plot_utils.font =10

In [2]:
# folder containing original model files
org_d = os.path.join('..', '..', 'models', 'monthly_model_files_1lyr_newstress')
# a dir to hold a copy of the org model files
tmp_d = os.path.join('freyberg_mf6')
if os.path.exists(tmp_d):
    shutil.rmtree(tmp_d)
shutil.copytree(org_d,tmp_d)
# get executables
hbd.prep_bins(tmp_d)
# get dependency folders
hbd.prep_deps(tmp_d)
# run our convenience functions to prepare the PEST and model folder
hbd.prep_pest(tmp_d)

ins file for heads.csv prepared.
ins file for sfr.csv prepared.
noptmax:0, npar_adj:1, nnz_obs:24
written pest control file: freyberg_mf6\freyberg.pst


<pyemu.pst.pst_handler.Pst at 0x1788a6dd6f0>

### Run the non-pilot point setup
Just so we can compare; it's quite quick, so no worries.

In [3]:
pst_base = pyemu.Pst(os.path.join(tmp_d,'freyberg.pst'))
pst_base.control_data.noptmax=20
par = pst_base.parameter_data
par.loc['rch0', 'partrans'] = 'log'
obs = pst_base.observation_data
obs.loc[(obs.obgnme=="gage-1") & (obs['gage-1'].astype(float)<=3804.5), "weight"] = 0.05
pst_base.write(os.path.join(tmp_d, 'freyberg.pst'))
pyemu.os_utils.run("pestpp-glm freyberg.pst", cwd=tmp_d)
pst_base = pyemu.Pst(os.path.join(tmp_d,'freyberg.pst'))
assert pst_base.phi

noptmax:20, npar_adj:2, nnz_obs:30


Recall what phi we achieve using homogeneous `hk1` and `rch0`:

In [4]:
pst_base.phi

472.6138312468844

Let's see if we can beat that.

### Now run the pilot point setup

Build a new PEST control file that uses pilot point parameters:

In [9]:
# convenience function that builds a new control file with pilot point parameters for hk
hbd.add_ppoints(tmp_d)

   could not remove start_datetime
starting interp point loop for 800 points
took 2.868298 seconds
1 pars dropped from template file freyberg_mf6\freyberg6.npf_k_layer1.txt.tpl
29 pars added from template file .\hkpp.dat.tpl
starting interp point loop for 800 points
took 2.700344 seconds
2 pars dropped from template file freyberg_mf6\freyberg6.rch.tpl
29 pars added from template file .\rchpp.dat.tpl
noptmax:20, npar_adj:58, nnz_obs:30
new control file: 'freyberg_pp.pst'


Load the control file with pilot point parameters:

In [10]:
pst = pyemu.Pst(os.path.join(tmp_d,'freyberg_pp.pst'))

As usual, run once to check that it works (trust but verify!):

In [11]:
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [12]:
pst.control_data.noptmax=0
pst.write(os.path.join(tmp_d, 'freyberg_pp.pst'))

pyemu.os_utils.run("pestpp-glm freyberg_pp.pst", cwd=tmp_d)

noptmax:0, npar_adj:58, nnz_obs:30


Exception: run() returned non-zero: 1

Check if it completed successfully:

In [None]:
pst = pyemu.Pst(os.path.join(tmp_d, 'freyberg_pp.pst'))
assert pst.phi

Right then, let's increase NOPTMAX and start parameter estimation:

In [None]:
pst.control_data.noptmax = 8 #enough for this problem

Always remember to re-write the control file!

In [None]:
pst.write(os.path.join(tmp_d, 'freyberg_pp.pst'))

OK, good to go.

Now, when using derivative-based methods such as are implemented in PEST++GLM (and PEST/PEST_HP) the cost of more adjustable parameters is ...more run-time. Recall that PEST(++) needs to run the model the same number of times as there are adjustable parameters in order to fill the Jacobian matrix.
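
To make the cost concrete, here is a minimal sketch of a forward-difference Jacobian for a toy three-parameter "model". Everything in it (`make_counting_model`, `forward_difference_jacobian`, the toy equations and the step size) is made up for illustration and is not how PEST++ is implemented. The point is simply to count model runs: one per adjustable parameter, plus (in this sketch) one base run.

```python
def make_counting_model():
    """Return a toy 'model' and a counter that records how often it is run."""
    counter = {"runs": 0}

    def model(pars):
        counter["runs"] += 1
        k, r, s = pars
        # two simulated outputs; the equations are purely illustrative
        return [k * r + s, k - 2.0 * r]

    return model, counter


def forward_difference_jacobian(model, pars, rel_step=0.01):
    """Fill a Jacobian column by column, perturbing one parameter at a time."""
    base = model(pars)  # one base run
    jac = [[0.0] * len(pars) for _ in base]
    for j, p in enumerate(pars):
        step = rel_step * abs(p) if p != 0 else rel_step
        perturbed = list(pars)
        perturbed[j] = p + step
        out = model(perturbed)  # one extra run for this parameter
        for i in range(len(base)):
            jac[i][j] = (out[i] - base[i]) / step
    return jac


model, counter = make_counting_model()
jac = forward_difference_jacobian(model, [5.0, 2.0, 1.0])
n_runs = counter["runs"]
print(n_runs)  # 4 = 3 parameters + 1 base run
```

Add more parameters to the list and the run count grows in step with it, which is exactly why the pilot point setup is slower per iteration.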

We just went from having 2 adjustable parameters to having 58. So this is going to take quite a bit longer. 

Up until now, we have been running a single instance of `pestpp-glm`. Now, we are going to run `pestpp-glm` in parallel. 

To speed up the process, you will want to distribute the workload across as many parallel agents as possible. Normally, you will want to use the same number of agents (or fewer) as you have available CPU cores. Most personal computers (i.e. desktops or laptops) these days have between 4 and 10 cores. Servers or HPCs may have many more cores than this. Another limitation to keep in mind is the read/write speed of your machine's disk (e.g. your hard drive). PEST and the model software are going to be reading and writing lots of files. This often slows things down if agents are competing for the same resources to read/write to disk.
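
As a rough back-of-the-envelope aid, the sketch below picks a worker count and estimates how many "batches" of model runs are needed to fill a Jacobian. The function names, the idea of reserving one core for the manager and everything else, and the example numbers (8 cores, 101 runs) are assumptions for illustration only. It also ignores disk contention and any run scheduling that PEST++ does internally.

```python
import math


def suggest_num_workers(n_cores, n_runs, reserve=1):
    """Use at most (n_cores - reserve) workers, and never more workers than runs."""
    usable = max(1, n_cores - reserve)
    return min(usable, n_runs)


def n_batches(n_runs, n_workers):
    """Number of rounds needed if every worker does one run per round."""
    return math.ceil(n_runs / n_workers)


# hypothetical machine and problem size
workers = suggest_num_workers(n_cores=8, n_runs=101)
batches = n_batches(101, workers)
print(workers, batches)
```

In practice you could pass the value returned by `psutil.cpu_count(logical=False)` as `n_cores`. Doubling the workers roughly halves the number of batches, until you run out of cores or the disk becomes the bottleneck.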

The first thing we will do is specify the number of agents we are going to use.

### Attention!

You must specify the number which is adequate for ***your*** machine! Make sure to assign an appropriate value for the following `num_workers` variable. (If you are unsure how many cores you have, you can use `psutil` to check).

In [None]:
psutil.cpu_count(logical=False)

In [None]:
# set the number of parallel agents
num_workers = 6

Next, we shall specify the PEST run-manager/master directory folder as `m_d`. This is where outcomes of the PEST run will be recorded. It should be different from the `tmp_d` folder, which contains the "template" of the PEST dataset. This keeps everything separate and avoids silly mistakes.

In [None]:
m_d='master_pp'

The following cell deploys the PEST agents and manager and then starts the run using `pestpp-glm`. Run it by pressing `shift+enter`.

If you wish to see the outputs in real-time, switch over to the terminal window (the one which you used to launch the `jupyter notebook`). There you should see `pestpp-glm`'s progress written to the terminal window in real-time. 

If you open the tutorial folder, you should also see a bunch of new folders there named `worker_0`, `worker_1`, etc. These are the agent folders. The `master_pp` folder is where the manager is running. 

This run should take several minutes to complete (depending on the number of workers and the speed of your machine). If you get an error, make sure that your firewall or antivirus software is not blocking `pestpp-glm` from communicating with the agents (this is a common problem!).

> **Pro Tip**: Running PEST from within a `jupyter notebook` has a tendency to slow things down and hog a lot of RAM. When modelling in the "real world" it is more efficient to implement workflows in scripts which you can call from the command line.

In [None]:
pyemu.os_utils.start_workers(tmp_d, # the folder which contains the "template" PEST dataset
                            'pestpp-glm', #the PEST software version we want to run
                            'freyberg_pp.pst', # the control file to use with PEST
                            num_workers=num_workers, #how many agents to deploy
                            worker_root='.', #where to deploy the agent directories; relative to where python is running
                            master_dir=m_d, #the manager directory
                            )

## Outcomes

Re-load the control file and check the new Phi:

In [None]:
pst = pyemu.Pst(os.path.join(m_d, 'freyberg_pp.pst'))
assert pst.phi!=pst_base.phi
pst.phi

Sweet - we did way better! More parameters means more "flexibility" for PEST to obtain a better fit:

In [None]:
pst.phi / pst_base.phi
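
As a reminder of what this ratio is comparing: Phi is the sum of squared weighted residuals between measured and simulated values. The sketch below computes it by hand for three made-up observations. The function name `weighted_phi` and the numbers are hypothetical, not values from the Freyberg dataset.

```python
def weighted_phi(observed, simulated, weights):
    """Sum of squared weighted residuals: sum((w * (obs - sim))**2)."""
    return sum((w * (o - s)) ** 2 for o, s, w in zip(observed, simulated, weights))


# three made-up observations; the third has zero weight so it does not count
observed = [10.0, 20.0, 30.0]
simulated = [11.0, 18.0, 35.0]
weights = [1.0, 0.5, 0.0]

phi = weighted_phi(observed, simulated, weights)
print(phi)  # (1*-1)^2 + (0.5*2)^2 + 0 = 2.0
```

Note that a zero-weighted observation can be badly misfit without affecting Phi at all, so a low Phi only tells you about the observations you chose to weight.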

Check out the Phi progress. Not too bad. 

In [None]:
df_obj = pd.read_csv(os.path.join(m_d,"freyberg_pp.iobj"),index_col=0)
df_obj.total_phi.plot();
plt.ylabel('total_phi')

What about the fits with measured values? Doing better than before, for sure. 

 > (side note: recall we only added pilot points for hydraulic conductivity and recharge...not to mention all the other poorly known or unknowable parameter values...pumping rates, storage parameters, GHB parameters, aquifer geometry...etc, etc...perhaps we should have parameterised those as well?)

In [None]:
figs = pst.plot(kind="1to1");

In [None]:
df_paru = pd.read_csv(os.path.join(m_d,"freyberg_pp.par.usum.csv"),index_col=0)
hk_pars = [p for p in pst.par_names if p.startswith("hk")]
df_hk = df_paru.loc[hk_pars,:]
ax = pyemu.plot_utils.plot_summary_distributions(df_hk,label_post=True)
mn = np.log10(pst.parameter_data.loc[hk_pars[0].lower(),"parlbnd"])
mx = np.log10(pst.parameter_data.loc[hk_pars[0].lower(),"parubnd"])
ax.plot([mn,mn],ax.get_ylim(),"k--")
ax.plot([mx,mx],ax.get_ylim(),"k--")

In [None]:

pst.parrep(os.path.join(m_d, "freyberg_pp.par" ))
pst.write_input_files(pst_path=m_d)
pyemu.geostats.fac2real(os.path.join(m_d,"hkpp.dat"),
                        factors_file=os.path.join(m_d,"hkpp.dat.fac"),
                        out_file=os.path.join(m_d,"freyberg6.npf_k_layer1.txt"))
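
Conceptually, the step above turns a handful of pilot point values into a value for every model cell, using pre-computed interpolation factors. The sketch below shows the idea for a single cell. It is a simplified illustration and not pyemu's actual `fac2real` code. The names `interpolate_cell`, `pp_a` and `pp_b`, and the choice to average in log10 space, are assumptions made for this example.

```python
import math


def interpolate_cell(factors, pp_values, log_transform=True):
    """Weighted average of pilot point values for one cell.

    factors   : {pilot_point_name: weight}, weights assumed to sum to 1
    pp_values : {pilot_point_name: parameter value}
    """
    if log_transform:
        # average the log10 values, then back-transform (assumed for this sketch)
        log_val = sum(w * math.log10(pp_values[name]) for name, w in factors.items())
        return 10.0 ** log_val
    return sum(w * pp_values[name] for name, w in factors.items())


# hypothetical cell sitting halfway between two pilot points
factors = {"pp_a": 0.5, "pp_b": 0.5}
pp_values = {"pp_a": 1.0, "pp_b": 100.0}

k_log = interpolate_cell(factors, pp_values)                      # geometric mean
k_lin = interpolate_cell(factors, pp_values, log_transform=False)  # arithmetic mean
print(k_log, k_lin)
```

Because the factors depend only on geometry and the variogram, they can be computed once and reused every time the pilot point values change.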


In [None]:
df_pp = pyemu.pp_utils.pp_tpl_to_dataframe(os.path.join(m_d,"hkpp.dat.tpl"))
sim = flopy.mf6.MFSimulation.load(sim_ws=m_d, verbosity_level=0)
gwf= sim.get_model()
ax = gwf.npf.k.plot(colorbar=True,alpha=0.5)
ax.scatter(df_pp.x,df_pp.y,marker='x')

### Something is wrong...how does the calibrated HK field have so much more variability than the "truth"?  We had better check out the forecasts:

In [None]:
figs, axes = pyemu.plot_utils.plot_summary_distributions(os.path.join(m_d,
                    "freyberg_pp.pred.usum.csv"),subplots=True)
for ax in axes:
    fname = ax.get_title().lower()
    ylim = ax.get_ylim()
    v = pst.observation_data.loc[fname,"obsval"]
    ax.plot([v,v],ylim,"b--")
    ax.set_ylim(0, ylim[-1])

### Doh! What happened?  Answer: overfitting: we specified lots of parameters, so we are able to fit the observations really well - too well.  
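
To see the effect in isolation, here is a tiny made-up example that has nothing to do with the Freyberg model itself. The "truth" is a constant value of 10, observed three times with some error. A one-parameter model (the mean) fits imperfectly, while a three-parameter polynomial fits the noisy data exactly. All names and numbers are hypothetical.

```python
def lagrange_predict(xs, ys, x_new):
    """Evaluate the polynomial that passes exactly through (xs, ys) at x_new."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x_new - xj) / (xi - xj)
        total += term
    return total


def sum_sq(observed, simulated):
    """Unweighted sum of squared residuals."""
    return sum((o - s) ** 2 for o, s in zip(observed, simulated))


truth = 10.0                 # the real system is constant
xs = [0.0, 1.0, 2.0]         # where we have observations
obs = [10.5, 9.5, 10.5]      # truth plus made-up measurement error
x_forecast = 3.0             # where we want a forecast

# simple model: one parameter (the mean of the observations)
mean_val = sum(obs) / len(obs)
phi_simple = sum_sq(obs, [mean_val] * len(obs))
forecast_simple = mean_val

# flexible model: as many parameters as observations (fits the noise too)
phi_flexible = sum_sq(obs, [lagrange_predict(xs, obs, x) for x in xs])
forecast_flexible = lagrange_predict(xs, obs, x_forecast)

print("simple  : phi =", phi_simple, " forecast error =", abs(forecast_simple - truth))
print("flexible: phi =", phi_flexible, " forecast error =", abs(forecast_flexible - truth))
```

The flexible model wins on Phi and loses badly on the forecast, which is the same pattern we just saw with the pilot points.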

### Even though we are able to measure water levels very precisely, the model has problems (model error), so we shouldn't expect the model to reproduce the observations so well.  But how do we control this overfitting??? 