<img src="PEST++V3_cover.jpeg" style="float: left">

<img src="flopylogo.png" style="float: right">

<img src="AW&H2015.png" style="float: center">

# Looking at Parameter Identifiability

Sensitivity analyses can mask other artifacts that affect calibration and uncertainty. A primary issue is correlation between parameters.  For example, we saw that in a heads-only calibration we can't estimate both recharge and hydraulic conductivity independently - the parameters are correlated, so an increase in one can be offset by an increase in the other.  To address this shortcoming, Doherty and Hunt (2009) show that singular value decomposition can extend the sensitivity insight into __*parameter identifiability*__.  Parameter identifiability combines parameter insensitivity and correlation information, and reflects the robustness with which particular parameter values in a model might be calibrated. That is, an identifiable parameter is both sensitive and relatively uncorrelated, and thus is more likely to be estimated (identified) than an insensitive and/or correlated parameter. 
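To make the recharge/conductivity trade-off concrete, here is a minimal sketch that is separate from the Freyberg model. It assumes a hypothetical 1-D aquifer with uniform recharge `R`, transmissivity `T`, and fixed heads at both ends, for which the analytical head is `h(x) = h0 + R*x*(L - x)/(2*T)`. All names and numbers below are illustrative assumptions.

```python
import numpy as np

def heads(log_r, log_t, x, length=1000.0, h0=10.0):
    """Analytical heads for 1-D flow with uniform recharge and fixed-head ends."""
    r, t = 10.0 ** log_r, 10.0 ** log_t
    return h0 + r * x * (length - x) / (2.0 * t)

x_obs = np.linspace(100.0, 900.0, 9)   # hypothetical head observation locations
base = np.array([-3.0, 2.0])           # log10(R), log10(T)
delta = 1e-4                           # finite-difference increment

# Forward-difference Jacobian: one row per head, one column per parameter
h_base = heads(base[0], base[1], x_obs)
jac = np.empty((x_obs.size, 2))
for j in range(2):
    pert = base.copy()
    pert[j] += delta
    jac[:, j] = (heads(pert[0], pert[1], x_obs) - h_base) / delta

# Correlation between the two sensitivity columns and the singular spectrum
corr = np.corrcoef(jac[:, 0], jac[:, 1])[0, 1]
svals = np.linalg.svd(jac, compute_uv=False)
print(f"column correlation: {corr:.6f}")
print(f"singular value ratio: {svals[1] / svals[0]:.2e}")
```

Because heads depend only on the ratio `R/T`, the two Jacobian columns are scalar multiples of each other: the correlation is essentially -1 and the second singular value is effectively zero, so only one combination of the two parameters can be estimated from heads.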

Parameter identifiability is considered a "linear method" in that it assumes the Jacobian matrix sensitivities hold over a range of reasonable parameter values.  It addresses parameter correlation through singular value decomposition (SVD), exactly as we've seen earlier in this course.  Parameter identifiability ranges from 0 (completely unidentifiable with the observations available) to 1.0 (fully identifiable). We typically plot identifiability as a stacked bar chart in which each segment is the contribution from one of the included singular values. Another way to think of it: if a parameter lies strongly in the SVD solution space (that is, it projects mainly onto the singular vectors with the largest singular values, which are the ones retained ahead of the cutoff), it will have a higher identifiability. However, as Doherty and Hunt (2009) point out, identifiability is qualitative in nature because the singular value cutoff is user specified. 
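As a rough sketch of the calculation, the identifiability of a parameter is the sum of its squared components in the first `k` right singular vectors of the weighted Jacobian. The example below uses a made-up Jacobian rather than the Freyberg one; the function name `identifiability` and the toy matrix are assumptions for illustration.

```python
import numpy as np

def identifiability(jac, weights, k):
    """Return an (npar, k) array of per-singular-vector contributions.

    Summing across columns gives each parameter's identifiability.
    """
    weighted = weights[:, None] * jac            # Q^(1/2) X with Q = diag(w^2)
    _, _, vt = np.linalg.svd(weighted, full_matrices=True)
    return vt[:k, :].T ** 2                      # squared right-singular-vector entries

rng = np.random.default_rng(0)
jac = rng.normal(size=(12, 4))                   # 12 observations, 4 parameters
jac[:, 3] = jac[:, 2]                            # parameters 2 and 3 perfectly correlated
weights = np.ones(12)

for k in (1, 2, 3, 4):
    ident = identifiability(jac, weights, k).sum(axis=1)
    print(k, np.round(ident, 3))
```

Each column of the returned array corresponds to one segment of the stacked bar for a parameter. With three singular values retained, the two independent parameters reach an identifiability of 1 while the correlated pair share 0.5 each; with all four retained every parameter trivially reaches 1, which is why the choice of cutoff matters.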

You can access parameter identifiability at the command line using the PEST utility __*identpar*__.  As always, when you type identpar without arguments you'll get what the utility needs to run. For identpar.exe it looks like:



    IDENTPAR Version 14.01. Watermark Numerical Computing.


    IDENTPAR is run using the command:

        IDENTPAR casename numvec outbase matfile identfile [/s or /r]

    where

        casename  is a PEST control file basename,
        numvec    is the number of singular values to use,
        outbase   is the filename base of sensitivity vector output files,
        outfile   is the name of a matrix output file
        identfile is the name of a parameter identifiability output file, and
        /s or /r  instigates SVD on XtQX or Q^(1/2)X respectively (/s is default).

        Note: enter a filename of "null" for no pertinent output file.


The input of __numvec__ specifies the singular value cutoff that is used to calculate identifiability; the __identfile__ above provides the output in a format suitable for plotting.  

However, in our example here __we won't use the command line PEST utility__ but will instead take advantage of the pyemu version of parameter identifiability for convenience. Let's take a look at it more closely and see what we can learn from it and how to handle such information as the number of parameters rises.

### One last cool concept about identifiability that Doherty and Hunt (2009) point out:  
Because parameter identifiability uses the Jacobian matrix, it is the *sensitivity* of each observation that matters, not its actual measured value. This means you can add *hypothetical observations* to the existing observations, re-run the Jacobian matrix, and then re-plot identifiability. In this way identifiability becomes a quick but qualitative way to look at the worth of future data collection - an underused aspect of our modeling!   
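A toy illustration of this idea, assuming a hypothetical two-parameter problem (log recharge and log hydraulic conductivity) and a cutoff chosen by a relative singular-value threshold. The threshold rule, the function name, and all sensitivity values are assumptions made for this sketch.

```python
import numpy as np

def identifiability(jac, rel_cutoff=1e-6):
    """Identifiability using the singular values above rel_cutoff * largest."""
    _, s, vt = np.linalg.svd(jac, full_matrices=False)
    k = int(np.sum(s > rel_cutoff * s[0]))       # number of retained singular values
    return (vt[:k, :] ** 2).sum(axis=0), k

# Hypothetical heads-only Jacobian: sensitivities to the two parameters are
# equal and opposite, so the columns are perfectly correlated.
head_sens = np.array([0.5, 1.0, 1.5, 1.0, 0.5])
jac_heads = np.column_stack([head_sens, -head_sens])

# Hypothetical flux observation that responds to recharge only. No measured
# value is needed, only the row of sensitivities.
flux_row = np.array([[2.0, 0.0]])
jac_both = np.vstack([jac_heads, flux_row])

ident_heads, k_heads = identifiability(jac_heads)
ident_both, k_both = identifiability(jac_both)
print("heads only:", k_heads, np.round(ident_heads, 3))
print("with flux: ", k_both, np.round(ident_both, 3))
```

With heads alone only one singular value survives the cutoff and each parameter has an identifiability of 0.5. Adding the hypothetical flux row brings a second singular value into the solution space and both parameters become fully identifiable.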

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline
import sys
sys.path.append('..')
import numpy as np
import pandas as pd
import pyemu
import os, shutil
import re
from matplotlib.backends.backend_pdf import PdfPages
runall = False  # set to True to re-run PEST++ rather than copy the existing Jacobian file
import sensitivity_identifiability_helper as sih

In [None]:
import freyberg_setup as fs
fs.setup_pest_pp()
working_dir = fs.WORKING_DIR_PP
pst_name = fs.PST_NAME_PP

In [None]:
fs.plot_model(working_dir, pst_name)

# We need to calculate a Jacobian Matrix to look at sensitivity and identifiability

## We just need to set `NOPTMAX=-1` in the PST control file and run PESTPP

In [None]:
inpst = pyemu.Pst(os.path.join(working_dir,'freyberg_pp.pst'))
inpst.control_data.noptmax=-1
inpst.write(os.path.join(working_dir,'freyberg_jac.pst'))

In [None]:
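if runall:
    # run PEST++ from inside the working directory to compute the Jacobian
    os.chdir(working_dir)
    try:
        pyemu.helpers.run('pestpp freyberg_jac.pst')
    finally:
        # always return to the notebook directory, even if the run fails
        os.chdir('..')
else:
    # otherwise copy the existing Jacobian file from the current directory
    shutil.copy2('freyberg_jac.jcb', os.path.join(working_dir,'freyberg_jac.jcb'))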

# Let's load up the resulting Jacobian and look at sensitivity and identifiability

## Make a Schur Complement object in `pyemu`

In [None]:
sc = pyemu.Schur(jco=os.path.join(working_dir,'freyberg_jac.jcb'))

## Among other things, this loads the Jacobian matrix (called `jco`) as a property

In [None]:
plt.imshow(sc.jco.x[:25,:25].T)
plt.colorbar()


In [None]:
plt.imshow(np.log10(np.abs(sc.jco.x[:25,:25].T)))
plt.colorbar()

In [None]:
svals = sc.xtqx.s
plt.plot(svals.x)

In [None]:
plt.plot(svals.x)
plt.yscale('log')

# To look at identifiability we will need to create an `ErrVar` object in `pyemu`

In [None]:
ev = pyemu.ErrVar(jco=os.path.join(working_dir,'freyberg_jac.jcb'))

## We can get a dataframe of identifiability for any singular value cutoff

In [None]:
id_df = ev.get_identifiability_dataframe(singular_value=5).sort_values(by='ident', ascending=False)
id_df.head()

## It's easy to visualize these as stacked bar charts

In [None]:
id_result = sih.plot_id_bars(ev, 150)  # renamed to avoid shadowing the built-in id()

## It is more meaningful to look at a smaller singular value cutoff

In [None]:
id_result = sih.plot_id_bars(ev, 10)  # renamed to avoid shadowing the built-in id()

## How does this compare with CSS (Composite Scaled Sensitivities)?

In [None]:
plt.figure(figsize=(12,4))
ax = sc.get_par_css_dataframe()['pest_css'].sort_values(ascending=False).plot(kind='bar')
ax.set_yscale('log')

## We can read in the MLE covariance and look at correlation

In [None]:
covar = pyemu.Cov(sc.xtqx.x, names=sc.xtqx.row_names)
covar.df().head()

In [None]:
R = covar.to_pearson()
plt.imshow(R.df(), interpolation='nearest', cmap='viridis')
plt.colorbar()
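For reference, the standard conversion from a covariance matrix to a Pearson correlation matrix divides each entry by the product of the two corresponding standard deviations. The sketch below shows that conversion in plain NumPy on a made-up 3x3 matrix; the helper name `to_correlation` is an assumption for illustration and is not a `pyemu` function.

```python
import numpy as np

def to_correlation(cov):
    """Convert a covariance matrix to a Pearson correlation matrix."""
    std = np.sqrt(np.diag(cov))          # standard deviations from the diagonal
    return cov / np.outer(std, std)      # R_ij = C_ij / (sigma_i * sigma_j)

# Hypothetical covariance matrix for three parameters
cov = np.array([[4.0, 3.0, 0.0],
                [3.0, 9.0, -1.5],
                [0.0, -1.5, 1.0]])
corr = to_correlation(cov)
print(np.round(corr, 3))
```

The diagonal of the result is 1 by construction, and off-diagonal entries near +1 or -1 flag parameter pairs whose effects are hard to separate.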

### Let's look at correlation
CSS suffers from the challenge that parameters with high CSS may be correlated with other parameters. We can check that out. Identifiability, on the other hand, tends to be shared among correlated parameters, so the identifiability of each of those parameters is suppressed. This can make a big difference between what is "sensitive" and what is "identifiable".
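The contrast can be seen in a small made-up example. The sketch below assumes a hypothetical three-parameter Jacobian with unit weights and uses the root-mean-square of each column as a simplified stand-in for CSS (not the exact PEST formula).

```python
import numpy as np

# Hypothetical 6-observation, 3-parameter Jacobian (unit weights assumed)
a = np.array([3.0, -2.0, 1.0, 0.0, 0.0, 0.0])
b = np.array([0.0, 0.0, 0.0, 1.0, -1.0, 0.5])
jac = np.column_stack([a, a, b])   # p0 and p1 perfectly correlated, p2 independent

# Simplified composite sensitivity: root-mean-square of each column
css = np.sqrt((jac ** 2).mean(axis=0))

# Identifiability with the two non-zero singular values retained
_, s, vt = np.linalg.svd(jac, full_matrices=False)
ident = (vt[:2, :] ** 2).sum(axis=0)

print("css:  ", np.round(css, 3))
print("ident:", np.round(ident, 3))
```

The two correlated parameters have the highest composite sensitivity, yet each has an identifiability of only 0.5, while the less sensitive but independent parameter is fully identifiable.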

In [None]:
cpar = 'w0_r09_c16'
R.df().loc[cpar][np.abs(R.df().loc[cpar])>.5]

In [None]:
sih.plot_identifiability_spatial(ev, 13, True)