# A deeper dive into the Gauss-Levenberg-Marquardt (GLM) algorithm: weights and noise

The Gauss-Levenberg-Marquardt (GLM) method is a gradient-based method used to search the objective function surface for its minimum value. It assumes that simulated values of observation targets vary continuously in response to changes in calibration-adjusted model parameters. Two critical aspects of using ensemble forms of GLM are how weights and observation noise are specified. We will explore both using the response surface of a simple 2-parameter form of the Freyberg model.
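For readers who want to see the mechanics, the sketch below applies a single GLM parameter upgrade, $\Delta p = (J^T Q J + \lambda I)^{-1} J^T Q r$, to a toy linear problem. The function name `glm_upgrade`, the toy Jacobian and the identity-scaled damping term are assumptions made purely for illustration; this is not how PEST++ is implemented.

```python
import numpy as np

def glm_upgrade(jac, residuals, weights, lam):
    """One GLM parameter upgrade for a weighted least-squares problem.

    jac       : (n_obs, n_par) sensitivity (Jacobian) matrix
    residuals : (n_obs,) observed minus simulated values
    weights   : (n_obs,) observation weights
    lam       : Marquardt lambda (>= 0); 0 gives a Gauss-Newton step
    """
    q = np.diag(weights ** 2)                      # weight matrix Q
    jtqj = jac.T @ q @ jac                         # approximate Hessian
    rhs = jac.T @ q @ residuals                    # weighted gradient term
    return np.linalg.solve(jtqj + lam * np.eye(jac.shape[1]), rhs)

# hypothetical linear "model": simulated values = jac @ parameters
jac = np.array([[1.0, 0.5], [0.2, 1.0], [1.0, 1.0]])
true_par = np.array([2.0, -1.0])
obs = jac @ true_par
par = np.zeros(2)                                  # starting parameter values
weights = np.ones(3)

gauss_newton_step = glm_upgrade(jac, obs - jac @ par, weights, lam=0.0)
damped_step = glm_upgrade(jac, obs - jac @ par, weights, lam=10.0)
print(par + gauss_newton_step, par + damped_step)
```

With `lam=0.0` the upgrade is a pure Gauss-Newton step, which solves this linear toy problem in one go. A larger lambda shortens the step and rotates it toward the steepest-descent direction.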

This notebook builds on the previous response-surface notebook - check it out for a discussion of the GLM lambda in the context of response surfaces...

### Admin
We have provided some pre-cooked PEST dataset files, wrapped around the modified Freyberg model. This is the same dataset introduced in the "freyberg_pest_setup" and "freyberg_k" notebooks. 

The functions in the next cell import required dependencies and prepare a folder for you. This folder contains the model files and a preliminary PEST setup. Run the cells, then inspect the new folder named "freyberg_mf6" which has been created in your tutorial directory. (Just press `shift+enter` to run the cells). 

In [None]:
import sys
import os
import warnings
warnings.filterwarnings("ignore")
warnings.filterwarnings("ignore", category=DeprecationWarning) 

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import shutil

# sys.path.insert(0,os.path.join("..", "..", "dependencies"))
import pyemu
import flopy
assert "dependencies" in flopy.__file__
assert "dependencies" in pyemu.__file__
sys.path.insert(0,"..")
import herebedragons as hbd
import response_surface as resurf

plt.rcParams['font.size'] = 10
pyemu.plot_utils.font =10

In [None]:
# folder containing original model files
org_d = os.path.join('..', '..', 'models', 'monthly_model_files_1lyr_newstress')
# a dir to hold a copy of the org model files
tmp_d = os.path.join('freyberg_mf6')

if os.path.exists(tmp_d):
    shutil.rmtree(tmp_d)
shutil.copytree(org_d,tmp_d)
# get executables
hbd.prep_bins(tmp_d)
# get dependency folders
hbd.prep_deps(tmp_d)
# run our convenience functions to prepare the PEST and model folder
hbd.prep_pest(tmp_d)

### Reminder - the modified-Freyberg model
Just a quick reminder of what the model looks like and what we are doing. 

It is a one-layer model. A river runs north-south, represented with the SFR package (green cells in the figure). On the southern border there is a GHB (cyan cells). No-flow cells are shown in black. Pumping wells are shown with red cells. 

Time-series of measured heads are available at the locations marked with black X's. River flux is also measured at three locations (headwater, tailwater and gage; not displayed).

The simulation starts with a steady state stress period, followed by twelve transient stress periods. These represent the historic period, for which measured data are available.

These are followed by a further twelve transient stress periods, which represent a period in the future. Modelling is undertaken to assess selected forecasts during the simulated period.

In [None]:
hbd.plot_freyberg(tmp_d)

### The PEST Control File

You may wish to explore the `freyberg_mf6` folder which has been created in the tutorial directory. In it you will find a PEST control file named `freyberg.pst`.

Let's use `pyemu` to load the PEST control file and check some details. 

In [None]:
pst = pyemu.Pst(os.path.join(tmp_d, 'freyberg.pst'))
pst.par_names

In [None]:
par = pst.parameter_data
par

We shall explore the effect of having two adjustable parameters: `hk1` and `rch0`. As we saw previously, these two parameters are correlated. If we use only head observations for calibration we are unable to achieve a unique solution.
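As a reminder of why head data alone cannot separate the two parameters, here is a deliberately simplified sketch. It assumes a hypothetical toy "model" in which head depends only on the ratio of recharge to hydraulic conductivity; `toy_head` and its constants are invented for illustration and are not the Freyberg model.

```python
# Hypothetical toy "model" (not the Freyberg model): head depends only on the
# ratio rch/hk, so head data alone cannot separate the two parameters.
def toy_head(hk, rch):
    return 10.0 * rch / hk

hk_true, rch_true = 5.0, 1.0
head_obs = toy_head(hk_true, rch_true)

def phi_heads(hk, rch):
    """Squared misfit to the single (noise-free) head observation."""
    return (head_obs - toy_head(hk, rch)) ** 2

phi_at_truth = phi_heads(5.0, 1.0)     # the "true" parameter pair
phi_scaled = phi_heads(10.0, 2.0)      # both parameters doubled, same ratio
phi_off_ratio = phi_heads(10.0, 1.0)   # ratio changed
print(phi_at_truth, phi_scaled, phi_off_ratio)
```

Every parameter pair with the same ratio fits the head observation equally well, which is what produces a long trough in the response surface rather than a single minimum.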

(We shall change `hk1` parameter bounds just to make visualization easier.)

In [None]:
par.loc['rch0', 'partrans'] = 'log'
par.loc['hk1', 'parlbnd'] = 1.5
par.loc['hk1', 'parubnd'] = 15

Re-write the control file:

In [None]:
pst.write(os.path.join(tmp_d, 'freyberg.pst'))

The `response_surface.py` file in the tutorial folder contains a few functions to run PEST++SWP and make plots. These run PEST++SWP a few hundred times for a combination of `hk1` and `rch0` values.

In [None]:
num_workers = 25

### Add Flux Observations

As we did in the "freyberg k, r and flux obs" tutorial, we now add a second set of observation data. These are measurements of stream flow. We now have observations of head and observations of stream flow.

In [None]:
pst = pyemu.Pst(os.path.join(tmp_d, 'freyberg.pst'))

In [None]:
# set weights to gage-1 observations during calibration period
obs = pst.observation_data
obs_times = obs.loc[~obs['gage-1'].isnull(), 'gage-1'].astype(float).values
calib_times = [str(i) for i in obs_times if i<4018.5]
obs.loc[obs['gage-1'].isin(calib_times), 'weight'] = 0.003 # for these experiments, we know this is a good, error-based value...

Re-write the control file.

In [None]:
pst.write(os.path.join(tmp_d, 'freyberg.pst'))

Run PEST++SWP again to recalculate the response surface.

In [None]:
org_d = "resp_weight1"
if os.path.exists(org_d):
    shutil.rmtree(org_d)
shutil.copytree(tmp_d,org_d)
resurf.run_respsurf(par_names=['hk1','rch0'],num_workers=num_workers,port=4269,WORKING_DIR=org_d)

And plot it up again. Now we see the objective function surface funneling down to a single point. We have achieved a unique solution. The "trough of despair" has become the "bowl of uniqueness"! A clear demonstration of the value of unique and diverse data...

In [None]:
fig, ax, resp_surf = resurf.plot_response_surface(cmap='jet', figsize=(7,7),WORKING_DIR=org_d) #maxresp=1e3,

### Understanding how weights change the response surface (and the result!)

A critical point in all of this: the weights assigned to the observations define the shape of the objective (likelihood) function; herein, we refer to this as the response surface. The extreme example of this is adding observations. But even just changing the relative weights between observations also changes things. To see this, let's triple the weight of the surface water flux observations:

In [None]:
# set weights to gage-1 observations during calibration period
obs = pst.observation_data
obs_times = obs.loc[~obs['gage-1'].isnull(), 'gage-1'].astype(float).values
calib_times = [str(i) for i in obs_times if i<4018.5]
obs.loc[obs['gage-1'].isin(calib_times), 'weight'] = 0.009
pst.write(os.path.join(tmp_d, 'freyberg.pst'))
mod_d = "resp_weight2"
if os.path.exists(mod_d):
    shutil.rmtree(mod_d)
shutil.copytree(tmp_d,mod_d)
resurf.run_respsurf(par_names=['hk1','rch0'],num_workers=num_workers,port=4269,WORKING_DIR=mod_d)

In [None]:
fig, ax, resp_surf = resurf.plot_response_surface(cmap='jet', figsize=(7,7),WORKING_DIR=mod_d) #maxresp=1e3,

That's a pretty shocking difference, especially when we consider that this is a simple 2-D (i.e. 2-parameter) problem. What's more, the goal of our data assimilation analyses is to navigate this surface to the (region around the) minimum. Let's see how changing the weights changes the parameter posterior distribution.
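Part of the reason a weight change has such a large effect is that weights enter the objective function squared. The sketch below uses made-up head and flux residuals (they are not Freyberg output) and a hypothetical helper `phi` to show how tripling the weight of one observation group shifts the balance of the objective function.

```python
import numpy as np

def phi(residuals, weights):
    """Weighted sum of squared residuals."""
    return float(np.sum((weights * residuals) ** 2))

# hypothetical residuals (observed minus simulated), not Freyberg output
head_res = np.array([0.5, -0.3, 0.2])
flux_res = np.array([120.0, -80.0])

head_w = np.ones(3)
flux_w = np.full(2, 0.003)

phi_head = phi(head_res, head_w)
phi_flux = phi(flux_res, flux_w)
phi_flux_tripled = phi(flux_res, 3.0 * flux_w)

# fraction of the total objective function contributed by the flux group
share_before = phi_flux / (phi_head + phi_flux)
share_after = phi_flux_tripled / (phi_head + phi_flux_tripled)
print(share_before, share_after)
```

Tripling the weight multiplies that group's contribution by nine, so the minimum of the combined surface is pulled toward whatever parameter values best fit the flux data.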

### Understanding how weights change results


In [None]:
t_d = "template"

if os.path.exists(t_d):
    shutil.rmtree(t_d)
shutil.copytree(org_d,t_d)
# get executables
hbd.prep_bins(t_d)
# get dependency folders
hbd.prep_deps(t_d)
# run our convenience functions to prepare the PEST and model folder
hbd.prep_pest(t_d)
pst = pyemu.Pst(os.path.join(t_d,"freyberg.pst"))
obs = pst.observation_data
obs_times = obs.loc[~obs['gage-1'].isnull(), 'gage-1'].astype(float).values
calib_times = [str(i) for i in obs_times if i<4018.5]
obs.loc[obs['gage-1'].isin(calib_times), 'weight'] = 0.003
obs.loc[pst.nnz_obs_names,"standard_deviation"] = 10. / obs.loc[pst.nnz_obs_names,"weight"]

par = pst.parameter_data
par.loc['rch0', 'partrans'] = 'log'
par.loc['hk1', 'parlbnd'] = 1.5
par.loc['hk1', 'parubnd'] = 15

par.loc[pst.adj_par_names,"partrans"] = "none"
par.loc[pst.adj_par_names,"parval1"] = (par.loc[pst.adj_par_names,"parlbnd"] + par.loc[pst.adj_par_names,"parubnd"]) / 2.0

pst.control_data.noptmax = 6
pst.write(os.path.join(t_d,"freyberg.pst"))

In [None]:
pst = pyemu.Pst(os.path.join(t_d,"freyberg.pst"))
obs = pst.observation_data
pst.write(os.path.join(t_d,"freyberg.pst"),version=2)
m_d = "master_orgweight"
pyemu.os_utils.start_workers(t_d,"pestpp-ies","freyberg.pst",num_workers=num_workers,worker_root=".",master_dir=m_d)

In [None]:
fig, ax, resp_surf = resurf.plot_response_surface(cmap='jet', figsize=(7,7),WORKING_DIR=org_d) #maxresp=1e3,
pes = []
for i in range(pst.control_data.noptmax+1):
    fname = os.path.join(m_d,"freyberg.{0}.par.csv".format(i))
    if not os.path.exists(fname):
        break
    pe = pd.read_csv(fname,index_col=0)    
    pes.append(pe)
assert len(pes) > 0
for real in pes[-1].index:
    xvals  = [pe.loc[real,"hk1"] for pe in pes]
    yvals  = [pe.loc[real,"rch0"] for pe in pes]
    ax.plot(xvals,yvals,marker=".",c="0.5",lw=0.5)
xvals = pes[-1].loc[:,"hk1"].values
yvals = pes[-1].loc[:,"rch0"].values
ax.scatter(xvals,yvals,marker=".",c="b",zorder=10)

In [None]:
pst = pyemu.Pst(os.path.join(t_d,"freyberg.pst"))
obs = pst.observation_data
#obs.loc[pst.nnz_obs_names,"standard_deviation"] = 1. / obs.loc[pst.nnz_obs_names,"weight"]
obs_times = obs.loc[~obs['gage-1'].isnull(), 'gage-1'].astype(float).values
calib_times = [str(i) for i in obs_times if i<4018.5]
obs.loc[obs['gage-1'].isin(calib_times), 'weight'] = 0.009
pst.write(os.path.join(t_d,"freyberg.pst"),version=2)
m_d = "master_largerweight"
pyemu.os_utils.start_workers(t_d,"pestpp-ies","freyberg.pst",num_workers=num_workers,worker_root=".",master_dir=m_d)

So the posterior for `hk1` covers the range of about 3-4 l/t and `rch0` spans about 1.05 to 1.15 l/t...pretty narrow...

In [None]:
fig, ax, resp_surf = resurf.plot_response_surface(cmap='jet', figsize=(7,7),WORKING_DIR=mod_d) #maxresp=1e3,
pes = []
for i in range(pst.control_data.noptmax+1):
    fname = os.path.join(m_d,"freyberg.{0}.par.csv".format(i))
    if not os.path.exists(fname):
        break
    pe = pd.read_csv(fname,index_col=0)    
    pes.append(pe)
for real in pes[-1].index:
    xvals  = [pe.loc[real,"hk1"] for pe in pes]
    yvals  = [pe.loc[real,"rch0"] for pe in pes]
    ax.plot(xvals,yvals,marker=".",c="0.5",lw=0.5)
xvals = pes[-1].loc[:,"hk1"].values
yvals = pes[-1].loc[:,"rch0"].values
ax.scatter(xvals,yvals,marker=".",c="b",zorder=10)

### Understanding how noise affects posterior results

In the results so far, the noise realizations used in the assimilation process were derived from the assumption that the noise standard deviation was the inverse of the assigned weights. This is not an ideal situation and in real-world practice (where models are imperfect simulators), we should always separate the weights and noise. This can be done in many, many ways. Herein, we will do this by supplying a `standard_deviation` column in the observation data. For our first experiment, let's assume the noise standard deviation is ten times greater than the noise implied by the inverse of the weights:
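To make the distinction concrete, here is a minimal sketch of drawing observation noise realizations two ways: with the standard deviation implied by the weights, and with an explicit standard deviation ten times larger. The observed values, weights and the helper `draw_noisy_obs` are hypothetical, and independent Gaussian noise is an assumption of this sketch; it is not meant to reproduce what PEST++IES does internally.

```python
import numpy as np

rng = np.random.default_rng(0)

obsval = np.array([35.0, 34.5, 1500.0])   # hypothetical observed values
weight = np.array([1.0, 1.0, 0.003])      # hypothetical weights
n_real = 5000                             # number of noise realizations

def draw_noisy_obs(obsval, std, n_real, rng):
    """Draw independent Gaussian noise realizations around obsval."""
    return obsval + rng.normal(0.0, 1.0, size=(n_real, obsval.size)) * std

std_from_weight = 1.0 / weight            # noise implied by the weights
std_explicit = 10.0 / weight              # explicit, ten times noisier

ens_implied = draw_noisy_obs(obsval, std_from_weight, n_real, rng)
ens_explicit = draw_noisy_obs(obsval, std_explicit, n_real, rng)
print(ens_implied.std(axis=0), ens_explicit.std(axis=0))
```

The weights are identical in both cases, so the objective function is unchanged; only the spread of the noisy observation ensemble that each realization tries to match differs.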

In [None]:
pst = pyemu.Pst(os.path.join(t_d,"freyberg.pst"))
obs = pst.observation_data
obs.loc[pst.nnz_obs_names,"standard_deviation"] = 10. / obs.loc[pst.nnz_obs_names,"weight"]
pst.write(os.path.join(t_d,"freyberg.pst"),version=2)
m_d = "master_bignoise"
pyemu.os_utils.start_workers(t_d,"pestpp-ies","freyberg.pst",num_workers=15,worker_root=".",master_dir=m_d)

In [None]:
fig, ax, resp_surf = resurf.plot_response_surface(cmap='jet', figsize=(7,7),WORKING_DIR=mod_d) #maxresp=1e3,
pes = []
for i in range(pst.control_data.noptmax+1):
    fname = os.path.join(m_d,"freyberg.{0}.par.csv".format(i))
    if not os.path.exists(fname):
        break
    pe = pd.read_csv(fname,index_col=0)    
    pes.append(pe)
for real in pes[-1].index:
    xvals  = [pe.loc[real,"hk1"] for pe in pes]
    yvals  = [pe.loc[real,"rch0"] for pe in pes]
    ax.plot(xvals,yvals,marker=".",c="0.5",lw=0.5)
xvals = pes[-1].loc[:,"hk1"].values
yvals = pes[-1].loc[:,"rch0"].values
ax.scatter(xvals,yvals,marker=".",c="b",zorder=10)
plt.show()
plt.close(fig)

Yowza! That's a very different result - the posterior for both parameters has substantially more variance; `hk1` ranges from less than 2 l/t to 11 l/t, while `rch0` ranges from 0.8 l/t to 1.3 l/t.  Let's get a less noisy result for comparison:

In [None]:
pst = pyemu.Pst(os.path.join(t_d,"freyberg.pst"))
obs = pst.observation_data
obs.loc[pst.nnz_obs_names,"standard_deviation"] = 5. / obs.loc[pst.nnz_obs_names,"weight"]
pst.write(os.path.join(t_d,"freyberg.pst"),version=2)
m_d = "master_mediumnoise"
pyemu.os_utils.start_workers(t_d,"pestpp-ies","freyberg.pst",num_workers=15,worker_root=".",master_dir=m_d)


In [None]:
fig, ax, resp_surf = resurf.plot_response_surface(cmap='jet', figsize=(7,7),WORKING_DIR=mod_d) #maxresp=1e3,
pes = []
for i in range(pst.control_data.noptmax+1):
    fname = os.path.join(m_d,"freyberg.{0}.par.csv".format(i))
    if not os.path.exists(fname):
        break
    pe = pd.read_csv(fname,index_col=0)    
    pes.append(pe)
for real in pes[-1].index:
    xvals  = [pe.loc[real,"hk1"] for pe in pes]
    yvals  = [pe.loc[real,"rch0"] for pe in pes]
    ax.plot(xvals,yvals,marker=".",c="0.5",lw=0.5)
xvals = pes[-1].loc[:,"hk1"].values
yvals = pes[-1].loc[:,"rch0"].values
ax.scatter(xvals,yvals,marker=".",c="b",zorder=10)
plt.show()
plt.close(fig)

OK, now it's clear - the noise is controlling the posterior variance of both parameters. This is not unexpected in a well-posed inverse problem, as both parameters are being strongly conditioned by all the observations. But in all cases, thought must be put into noise and, at the very least, weights and noise should be specified explicitly, especially in situations where weights are being adjusted to balance contributions to the objective function.
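The same behaviour can be seen in the textbook linear-Gaussian case, where the posterior parameter covariance is $(C_p^{-1} + J^T C_\epsilon^{-1} J)^{-1}$. The sketch below evaluates this for a made-up 2-parameter, 3-observation problem at three noise levels. The Jacobian, prior covariance and the helper `posterior_covariance` are assumptions for illustration only; the Freyberg model is nonlinear and PEST++IES is ensemble-based, so this is an analogy rather than a reproduction of the results above.

```python
import numpy as np

def posterior_covariance(prior_cov, jac, noise_std):
    """Posterior covariance for a linear model with Gaussian prior and noise."""
    noise_prec = np.diag(1.0 / noise_std ** 2)     # inverse noise covariance
    prior_prec = np.linalg.inv(prior_cov)          # inverse prior covariance
    return np.linalg.inv(prior_prec + jac.T @ noise_prec @ jac)

# hypothetical sensitivities, prior and base noise level
jac = np.array([[1.0, 0.5], [0.2, 1.0], [1.0, 1.0]])
prior_cov = np.diag([4.0, 1.0])
base_std = np.full(3, 0.1)

post_var = {}
for factor in (1.0, 5.0, 10.0):
    cov = posterior_covariance(prior_cov, jac, factor * base_std)
    post_var[factor] = np.diag(cov)                # posterior variances
    print(factor, post_var[factor])
```

In this linear setting the posterior variance of both parameters grows with the assumed noise level and approaches the prior variance as the noise becomes large, which mirrors the ordering seen across the three noise experiments.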