# Global Sensitivity Analysis

Sensitivity analysis has a lot of explanatory power. It gives modelers and stakeholders a direct view of what's "important" in a modeling analysis. So-called "local" analyses have been, and continue to be, produced as a by-product of Jacobian matrices. These analyses are "local" in the sense that the model input-output relations they represent are approximated as linear and are only valid in a highly localized region of parameter space, in the immediate vicinity of where the Jacobian was evaluated. Global sensitivity analyses (GSA), as the name suggests, attempt to evaluate and represent the relation between model inputs and outputs across the whole of parameter space, which addresses both issues with Jacobian-based analyses (the linearity assumption and the local validity). But this comes at a high computational cost (#nofreelunch).

A nice middle ground in the spectrum of GSA techniques is the Method of Morris. It samples sensitivities across parameter space, so it can provide a measure of how sensitivities vary as parameter values change. It is also relatively computationally efficient as GSA approaches go: we can get by with as few as 4 runs per adjustable parameter. But this efficiency has a cost: Morris can't explicitly differentiate between sources of nonlinearity, that is, parameter correlation and/or actual nonlinearity in the model itself. And let's be honest: 4 runs per parameter is still a lot of runs if you have 100s or more parameters and/or a slow-running model, so in practice we typically only evaluate very broad-scale parameters with GSA.
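To make the idea of sampling sensitivities across parameter space more concrete, here is a minimal sketch of Morris-style "elementary effects" on a toy function. It is not how `pestpp-sen` is implemented: `toy_model`, `morris_elementary_effects`, the unit-hypercube parameter range and the fixed positive step `delta` are all assumptions made for illustration (the textbook method works on a grid of levels and allows steps in both directions).

```python
import random
import statistics

def toy_model(p):
    """Hypothetical stand-in for a model: linear in p[0], nonlinear in p[1], insensitive to p[2]."""
    return 2.0 * p[0] + p[1] ** 2 + 0.0 * p[2]

def morris_elementary_effects(model, n_par, n_traj=4, delta=0.5, seed=0):
    """Simplified one-at-a-time trajectories on the unit hypercube.

    Returns a dict {parameter index: list of elementary effects} and the
    total number of model runs that were needed.
    """
    rng = random.Random(seed)
    effects = {i: [] for i in range(n_par)}
    n_runs = 0
    for _ in range(n_traj):
        # random starting point, leaving room to step by +delta
        point = [rng.uniform(0.0, 1.0 - delta) for _ in range(n_par)]
        base = model(point)
        n_runs += 1
        order = list(range(n_par))
        rng.shuffle(order)  # perturb the parameters in a random order
        for i in order:
            point[i] += delta
            new = model(point)
            n_runs += 1
            effects[i].append((new - base) / delta)
            base = new  # the next step starts from the perturbed point
    return effects, n_runs

effects, n_runs = morris_elementary_effects(toy_model, n_par=3)
summary = {}
for i, ee in effects.items():
    summary[i] = {
        "sen_mean": statistics.mean(ee),
        "sen_mean_abs": statistics.mean([abs(e) for e in ee]),
        "sen_std_dev": statistics.pstdev(ee),
    }
    print(i, summary[i])
print("model runs:", n_runs)
```

In this toy case the linear parameter has the same elementary effect everywhere (standard deviation of essentially zero), the squared parameter has an effect that depends on where in parameter space it was sampled, and the third parameter has no effect at all. The standard deviation alone cannot say whether the variability comes from nonlinearity or from interaction with other parameters.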

In this notebook we will demonstrate how to run `pestpp-sen` and interpret its results, applied to the Freyberg model interface we set up previously.
 

### The modified Freyberg PEST dataset

The modified Freyberg model is introduced in another tutorial notebook (see ["intro to freyberg model"](../part0_02_intro_to_freyberg_model/intro_freyberg_model.ipynb)). The current notebook picks up following the ["freyberg pstfrom pest setup"](../part2_01_pstfrom_pest_setup/freyberg_pstfrom_pest_setup.ipynb) notebook, in which a high-dimensional PEST dataset was constructed using `pyemu.PstFrom`. You may also wish to go through the ["intro to pyemu"](../part0_intro_to_pyemu/intro_to_pyemu.ipynb) and ["pstfrom sneakpeak"](../part1_02_pest_setup/pstfrom_sneakpeak.ipynb) notebooks beforehand.

The next couple of cells load necessary dependencies and prepare the PEST dataset folder for you. This is the dataset that was constructed during the ["freyberg pstfrom pest setup"](../part2_01_pstfrom_pest_setup/freyberg_pstfrom_pest_setup.ipynb) tutorial and then processed in the `freyberg_obs_and_weights.ipynb` notebook, so that notebook needs to have been run first. Simply press `shift+enter` to run the cells.

In [None]:
import os
import shutil
import warnings
warnings.filterwarnings("ignore")  # silences all warnings, DeprecationWarning included
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import psutil

import sys
import pyemu
import flopy
assert "dependencies" in flopy.__file__
assert "dependencies" in pyemu.__file__
sys.path.insert(0,"..")
import herebedragons as hbd

In [None]:
# specify the temporary working folder
t_d = os.path.join('freyberg6_template')
if os.path.exists(t_d):
    shutil.rmtree(t_d)

org_t_d = os.path.join("..","part2_02_obs_and_weights","freyberg6_template")
if not os.path.exists(org_t_d):
    raise Exception("you need to run the '/part2_02_obs_and_weights/freyberg_obs_and_weights.ipynb' notebook")
shutil.copytree(org_t_d,t_d)

Load the PEST control file as a `Pst` object.

In [None]:
pst_path = os.path.join(t_d, 'pest.pst')
pst = pyemu.Pst(pst_path)

In [None]:
# check to see if obs&weights notebook has been run
if not pst.observation_data.observed.sum()>0:
    raise Exception("You need to run the '/part2_02_obs_and_weights/freyberg_obs_and_weights.ipynb' notebook")

### `fixed`-ing and `tied`-ing parameters to reduce the computational burden

Using the `fixed` and `tied` parameter transform options, we can effectively reduce the number of adjustable parameters. 
Since we have many "constant" type pars, let's fix the grid-scale and pilot point pars to get the number of runs down...

In [None]:
par = pst.parameter_data
par.ptype.value_counts()

In [None]:
par.loc[par.ptype=="gr","partrans"] = "fixed"
par.loc[par.ptype=="pp","partrans"] = "fixed"
pst.adj_par_names

In [None]:
pst.npar_adj
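As a rough guide to why fixing parameters matters here, the sketch below estimates the number of model runs for a textbook one-at-a-time Morris design, where each trajectory needs one base run plus one run per adjustable parameter. The helper `morris_run_count`, the default of 4 trajectories and the parameter counts are assumptions for illustration; the number of runs `pestpp-sen` actually requests depends on its own settings.

```python
def morris_run_count(n_adj_par, n_traj=4):
    """Runs for a one-at-a-time Morris design: n_traj trajectories,
    each with one base run plus one run per adjustable parameter."""
    return n_traj * (n_adj_par + 1)

# hypothetical numbers of adjustable parameters
for n_adj in (10, 100, 10_000):
    print(f"{n_adj:>6} adjustable parameters -> {morris_run_count(n_adj):>6} model runs")
```

With the real control file, the same estimate could be made by passing `pst.npar_adj` to the helper.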

Then, re-write the PEST control file so that the changes are recorded. If you open `pest.pst` in a text editor, you'll see that the grid-scale and pilot point parameters are now listed as `fixed`.

In [None]:
pst.write(os.path.join(t_d, 'pest.pst'),version=2)

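It is always good to do the 'ole `noptmax=0` test (a single model run) before launching a large parallel run, to confirm that the interface still works.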

Now, we are going to run `pestpp-sen` in parallel to carry out the Method of Morris analysis. 

To speed up the process, you will want to distribute the workload across as many parallel agents as possible. Normally, you will want to use the same number of agents (or fewer) as you have available CPU cores. Most personal computers (i.e. desktops or laptops) these days have between 4 and 10 cores. Servers or HPCs may have many more cores than this. Another limitation to keep in mind is the read/write speed of your machine's disk (e.g. your hard drive). PEST and the model software are going to be reading and writing lots of files. This often slows things down if agents are competing for the same resources to read/write to disk.

The first thing we will do is specify the number of agents we are going to use.

# Attention!

You must specify the number which is adequate for ***your*** machine! Make sure to assign an appropriate value for the following `num_workers` variable:

In [None]:
num_workers = psutil.cpu_count(logical=False) #update this according to your resources
num_workers
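If `psutil` is not available, or if you would rather leave a core free for other work, something along the following lines could be used instead. This is only a sketch: `pick_num_workers` and its `reserve` argument are hypothetical names, and note that `os.cpu_count()` reports logical cores, whereas the cell above asks `psutil` for physical cores.

```python
import os

def pick_num_workers(requested=None, reserve=1):
    """Hypothetical helper: cap the number of agents at the logical core
    count minus a small reserve, and never return fewer than one."""
    available = os.cpu_count() or 1  # os.cpu_count() can return None
    limit = max(1, available - reserve)
    if requested is None:
        return limit
    return max(1, min(requested, limit))

print(pick_num_workers())             # as many agents as the machine allows
print(pick_num_workers(requested=4))  # at most 4 agents
```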

Next, we shall specify the PEST run-manager/master directory folder as `m_d`. This is where outcomes of the PEST run will be recorded. It should be different from the `t_d` folder, which contains the "template" of the PEST dataset. This keeps everything separate and avoids silly mistakes.

In [None]:
m_d = os.path.join('master_morris')

The following cell deploys the PEST agents and manager and then starts the run using `pestpp-sen` (using `pestpp-sen pest.pst /h localhost:4004` on the agents, and `pestpp-sen pest.pst /h :4004` on the manager).

Run it by pressing `shift+enter`.

If you wish to see the outputs in real-time, switch over to the terminal window (the one which you used to launch the `jupyter notebook`). There you should see `pestpp-sen`'s progress written to the terminal window in real-time. 

If you open the tutorial folder, you should also see a bunch of new folders there named `worker_0`, `worker_1`, etc. These are the agent folders. The `master_morris` folder is where the manager is running. 

This run should take several minutes to complete (depending on the number of workers and the speed of your machine). If you get an error, make sure that your firewall or antivirus software is not blocking `pestpp-sen` from communicating with the agents (this is a common problem!).

> **Pro Tip**: Running PEST from within a `jupyter notebook` has a tendency to slow things down and hog a lot of RAM. When modelling in the "real world" it is more efficient to implement workflows in scripts which you can call from the command line.

In [None]:
pyemu.os_utils.start_workers(t_d, # the folder which contains the "template" PEST dataset
                            'pestpp-sen', #the PEST software version we want to run
                            'pest.pst', # the control file to use with PEST
                            num_workers=num_workers, #how many agents to deploy
                            worker_root='.', #where to deploy the agent directories; relative to where python is running
                            master_dir=m_d, #the manager directory
                            )

### Explore the Outcomes

`pestpp-sen` writes the results of the Method of Morris to 3 primary output files: 
 - "pest.msn": the sensitivity summary of the objective function as currently defined in the control file
 - "pest.group.msn": the sensitivity summary of each observation group that contains at least one non-zero weighted observation
 - "pest.mos": the sensitivity summary of every observation in the control file, regardless of weight values


In [None]:
msn = pd.read_csv(os.path.join(m_d,"pest.msn"))
gmsn = pd.read_csv(os.path.join(m_d,"pest.group.msn"))
mos = pd.read_csv(os.path.join(m_d,"pest.mos"))


The msn file lists each adjustable parameter, the number of model runs for that parameter, then the mean sensitivity ("sen_mean"), the mean absolute sensitivity ("sen_mean_abs"), which can be important if the sign of the sensitivity is changing, and the standard deviation of the sensitivity ("sen_std_dev"):

In [None]:
msn.columns

The group summary is similar, except that there is now an "obs_group_name" column:

In [None]:
gmsn.columns

Things are only slightly different for the individual observation summary. There is a "mean" and a "sigma" column, which are the raw mean and standard deviation of the so-called "elementary effects", whereas the "scaled_sen" column, as the name implies, scales the absolute value of the elementary effects by the parameter standard deviation (implied by the bounds) and the observation standard deviation (implied by the weights). 

In [None]:
mos.columns

Ok, let's see what we have. First the objective function (why are we so crazy about the past?):

In [None]:
num_to_label = 3
#msn.loc[:,["sen_mean_abs","sen_std_dev"]] = msn.loc[:,["sen_mean_abs","sen_std_dev"]].apply(np.log10)
msn.sort_values(by="sen_mean_abs",ascending=False,inplace=True)
fig,ax = plt.subplots(1,1,figsize=(10,10))
ax.scatter(msn.sen_mean_abs,msn.sen_std_dev,marker=".",s=20,c="r")
for i in range(num_to_label):
    x,y = msn.sen_mean_abs.iloc[i],msn.sen_std_dev.iloc[i]
    name = msn.parameter_name.iloc[i]
    ax.text(x,y,name)
mn = min(ax.get_xlim()[0],ax.get_ylim()[0])
mx = max(ax.get_xlim()[1],ax.get_ylim()[1])
ax.plot([mn,mx],[mn,mx],"k--",lw=1.5)
ax.set_xlim(mn,mx)
ax.set_ylim(mn,mx)
ax.set_xlabel("mean sens")
ax.set_ylabel("stdev sens")

ax.grid()



Parameters that plot above the 1-to-1 line have a sensitivity that varies considerably across parameter space. In this setting, we see that broad-scale HK, storage, and recharge are the most influential inputs, and that there is some nonlinearity in the relation between HK/storage and the objective function. It's important to remember that we have drastically reduced the dimensionality of the parameterization to allow us to use this analysis. In some ways, this reinforces the need to use spatially distributed parameters for these properties, to avoid artificially inflating the importance of these constant parameters.
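The 1-to-1 line comparison can also be done without a plot. The sketch below flags parameters whose standard deviation exceeds their mean absolute sensitivity. The parameter names and numbers are made up for illustration and are not results from this notebook, and `describe` is a hypothetical helper.

```python
# Hypothetical Morris summary values, for illustration only
summary = [
    {"parameter_name": "par_a", "sen_mean_abs": 10.0, "sen_std_dev": 2.0},
    {"parameter_name": "par_b", "sen_mean_abs": 4.0, "sen_std_dev": 9.0},
    {"parameter_name": "par_c", "sen_mean_abs": 0.1, "sen_std_dev": 0.05},
]

def describe(row):
    """Label a parameter by which side of the 1-to-1 line it plots on."""
    above_line = row["sen_std_dev"] > row["sen_mean_abs"]
    return "varies across parameter space" if above_line else "fairly consistent"

labels = {row["parameter_name"]: describe(row) for row in summary}
for name, label in labels.items():
    print(name, "->", label)
```

With the `msn` dataframe loaded earlier, the equivalent filter would be `msn.loc[msn.sen_std_dev > msn.sen_mean_abs, "parameter_name"]`.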

Cool, but what about the forecasts?



In [None]:
num_to_label = 5
forecasts = pst.pestpp_options["forecasts"].split(",")
for forecast in forecasts:
    fmso = mos.loc[mos.observation_name==forecast,:].copy()
    fmso.sort_values(by="sen_mean_abs",ascending=False,inplace=True)
    fig,ax = plt.subplots(1,1,figsize=(10,10))
    ax.scatter(fmso.sen_mean_abs,fmso.sen_std_dev,marker=".",s=10,c="k")
    for i in range(num_to_label):  
        x,y = fmso.sen_mean_abs.iloc[i],fmso.sen_std_dev.iloc[i]
        name = fmso.parameter_name.iloc[i]
        t = ax.text(x*1.03,y*1.03,name)
        t.set_bbox(dict(facecolor='w', alpha=1.0, edgecolor='k'))
    mn = min(ax.get_xlim()[0],ax.get_ylim()[0])
    mx = max(ax.get_xlim()[1],ax.get_ylim()[1])
    ax.plot([mn,mx],[mn,mx],"k--",lw=1.5)
    ax.set_xlim(mn,mx)
    ax.set_ylim(mn,mx)
    ax.set_xlabel("mean sens")
    ax.set_ylabel("stdev sens")
    ax.grid()
    ax.set_title(forecast,loc="left")
    fig.tight_layout()
    

With these results, we can see that different parameters are important for fitting the past versus making each individual prediction of interest. This is an important lesson that seems to be hard for people to grasp...
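One simple way to quantify this is to compare the most sensitive parameters for the objective function with those for a forecast. The sketch below does this for two made-up sets of mean absolute sensitivities. The values, parameter names and the `top_n` helper are all hypothetical.

```python
def top_n(sens, n=3):
    """Names of the n parameters with the largest mean absolute sensitivity."""
    ranked = sorted(sens.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:n]]

# Hypothetical sen_mean_abs values per parameter, for illustration only
phi_sens = {"par_a": 9.0, "par_b": 7.5, "par_c": 3.0, "par_d": 0.4, "par_e": 0.1}
forecast_sens = {"par_a": 0.2, "par_b": 6.0, "par_c": 0.3, "par_d": 8.0, "par_e": 5.0}

phi_top = top_n(phi_sens)
forecast_top = top_n(forecast_sens)
shared = sorted(set(phi_top) & set(forecast_top))
only_forecast = sorted(set(forecast_top) - set(phi_top))

print("important for the fit:     ", phi_top)
print("important for the forecast:", forecast_top)
print("shared:                    ", shared)
print("forecast only:             ", only_forecast)
```

Parameters that end up in the "forecast only" list are the ones that history matching is least likely to inform, even though the prediction depends on them.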