# pyEMU basics

In this exercise, we will explore some of the capabilities of pyemu for dealing with the PEST file formats, such as .pst, .jco/.jcb, .unc, .cov, .mat, etc., as well as for generating PEST interface elements.

In [None]:
%matplotlib inline
import os
import numpy as np
import matplotlib.pyplot as plt
import pyemu

In [None]:
pyemu.__path__  # check that we're pointing to the provided snapshot of pyemu (and flopy) repos

We will use some pre-cooked files in this notebook:

In [None]:
f_d = "handling_files"

os.listdir(f_d)

### Control files and the `Pst` class

pyEMU encapsulates the PEST control file in the `Pst` class.

In [None]:
pst = pyemu.Pst(os.path.join(f_d,"freyberg_pp.pst"))

In [None]:
pst

The "*" sections of the control file are stored as attributes of the `Pst` instance (the PEST variable names are used for consistency)

In [None]:
pst.parameter_data.head()

In [None]:
pst.observation_data.head()

Control data is handled by a special class that tries to prevent stupidity

In [None]:
pst.control_data.formatted_values

In [None]:
# deliberately invalid: noptmax must be an integer, so this assignment should be rejected
pst.control_data.noptmax = "junk"

PEST++ options are stored in a dict:

In [None]:
pst.pestpp_options

In [None]:
pst.pestpp_options["lambdas"].split(',')

### Writing a control file

In [None]:
pst.write(os.path.join(f_d,"test.pst"))

A preview of things to come...

In [None]:
pst.write(os.path.join(f_d,"test.pst"),version=2)

In [None]:
!cat "handling_files/test.pst"

### Constructing a control file from template and instruction files

### DIY: get a new control file from a template file (or files) and an instruction file (or files).  You can use the files in the `f_d` directory, from the GWV exercise, or you can write your own.  Change par bounds and obs weights, then write the control file

In [None]:
[f for f in os.listdir(f_d) if f[-3:] in ["tpl","ins"]]

In [None]:
pyemu.helpers.parse_dir_for_io_files(f_d)

In [None]:
# your code here
tpl_files = [os.path.join(f_d,"freyberg.rch.tpl")]
in_files = ["freyberg.rch"]
ins_files = [os.path.join(f_d,"freyberg.travel.ins")]
out_files = ["freyberg.travel"]
new_pst = pyemu.Pst.from_io_files(tpl_files=tpl_files,in_files=in_files,
                                  ins_files=ins_files,out_files=out_files)

In [None]:
new_pst.write(os.path.join(f_d,"test2.pst"))

In [None]:
new_pst.observation_data

In [None]:
new_pst.add_parameters

# Matrices

pyEMU implements a labeled matrix class and overloads the standard operators to make linear algebra easier.  Let's start with covariance matrices:

In [None]:
cov = pyemu.Cov.from_parameter_data(pst)
cov

In [None]:
cov.row_names[:5]

In [None]:
cov.col_names[:5]
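The row and column names are what make the overloaded operators safe: operands can be lined up by name rather than by position before they are multiplied. The sketch below illustrates that idea with plain NumPy. `align_and_multiply` is a hypothetical helper written only for this illustration; it is not part of pyemu, and pyemu's own alignment logic may differ.

```python
import numpy as np

def align_and_multiply(a, a_rows, a_cols, b, b_rows, b_cols):
    """Multiply a by b after reordering b's rows to match a's column names."""
    if set(a_cols) != set(b_rows):
        raise ValueError("column names of a must match row names of b")
    # position of each of a's column names within b's row names
    order = [b_rows.index(name) for name in a_cols]
    return a @ b[order, :], a_rows, b_cols

# a has columns (p1, p2); b stores its rows in the opposite order (p2, p1)
a = np.array([[1.0, 2.0]])
b = np.array([[10.0], [100.0]])
product, row_names, col_names = align_and_multiply(
    a, ["o1"], ["p1", "p2"], b, ["p2", "p1"], ["c1"]
)
print(product)   # name-aligned result
print(a @ b)     # positional result, which silently pairs the wrong parameters
```

The two printed values differ, which is the kind of silent mistake the labels are there to prevent.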

In [None]:
cov.isdiagonal

The `Cov` class has some nice built-in methods:

In [None]:
cov.inv

In [None]:
cov.s #singular values

In [None]:
cov.v #right singular vectors

The actual array of values is stored in the `.x` attribute:

In [None]:
cov.x[0:5]

### Check your understanding: Why is the `x` attribute 1-D?

In [None]:
x = cov.as_2d
x[x==0] = np.nan  # np.NaN was removed in NumPy 2.0; np.nan works in all versions
c = plt.imshow(x)
plt.colorbar(c)

In [None]:
post_cov = pyemu.Cov.from_ascii(os.path.join(f_d,"freyberg_pp.post.cov"))
post_cov.isdiagonal

In [None]:
x = post_cov.as_2d.copy()  # copy so the masking below cannot modify post_cov
x[x==0] = np.nan
c = plt.imshow(x)
plt.colorbar(c)

In [None]:
df = post_cov.to_dataframe()
rsum = df.sum(axis=1).sort_values()
rsum

### DIY: plot the singular spectrum of the posterior covariance matrix.  Then convert the posterior covariance matrix to correlation matrix, mask the diagonal and plot

In [None]:
#hint: Cov.to_pearson()
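If you want to see what the conversion involves before using the hint, here is a minimal NumPy sketch on a small made-up matrix (not the Freyberg posterior). It assumes a symmetric positive-definite covariance array; the variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
# build a small symmetric positive-definite "covariance" matrix for illustration
a = rng.standard_normal((5, 5))
cov_arr = a @ a.T + 5.0 * np.eye(5)

# singular spectrum, returned in descending order
sing_vals = np.linalg.svd(cov_arr, compute_uv=False)

# correlation: divide each entry by the product of the two standard deviations
std = np.sqrt(np.diag(cov_arr))
corr = cov_arr / np.outer(std, std)

# mask the diagonal (always 1) so the off-diagonal structure stands out when plotted
masked = np.ma.masked_where(np.eye(5, dtype=bool), corr)
```

Passing `masked` to `plt.imshow` leaves the diagonal blank, so the color scale is driven by the off-diagonal correlations.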

### Residual handling

The `Pst` class tries to load a residuals file in the constructor.  If that file is found, you can access some pretty cool stuff (you can also pass the name of a residual file to the `Pst` constructor...).  The `res` attribute is stored as a `pd.DataFrame`.

In [None]:
pst.phi

In [None]:
pst.phi_components

In [None]:
pst.res.head()

### Discrepancy based weight adjustment

In a perfect (model and algorithm) world, we would achieve a final objective function that is equal to the number of (non-zero weighted) observations. But because of model error and simplifying assumptions in the algorithms we use for history matching, this is rarely the case.  More often, the final objective function is much larger than the number of observations.  This implies that we were not able to "fit" as well as we thought we could (where "thought" is encapsulated in the weights in the control file).  This really matters when we do posterior uncertainty analyses following a PEST run; we will see this again in the FOSM and data-worth notebooks. Note: don't make this adjustment until after you are through with history matching!!!
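To see why the ideal objective function equals the number of observations, here is a small synthetic sketch. It assumes residuals that are pure Gaussian noise with a known standard deviation `sigma` and weights set to `1 / sigma`; all values are made up and none come from the Freyberg files.

```python
import numpy as np

rng = np.random.default_rng(42)
n_obs = 10_000
sigma = 0.5                                      # assumed noise standard deviation
residuals = rng.normal(0.0, sigma, size=n_obs)   # a "perfect" model leaves only noise
weights = np.full(n_obs, 1.0 / sigma)            # weight = 1 / noise standard deviation

phi = np.sum((weights * residuals) ** 2)         # sum of squared weighted residuals
print(phi / n_obs)                               # close to 1
```

Each weighted residual has unit variance, so each observation contributes about 1 to phi on average.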

The simplest way to try to rectify this situation is to adjust the weights in the control file so that the resulting contribution to the objective function from each observation (or, optionally, each observation group) is equal to 1 (or to the number of members of the group).  This is related to Morozov's discrepancy principle (google it!).  `pyEMU` has a built-in routine to help with this: `Pst.adjust_weights_discrepancy()` - great name!

In [None]:
# load a copy of the control file so we don't goof up later activities with the original
pst2 = pyemu.Pst(os.path.join(f_d,"freyberg_pp.pst"))
obs = pst2.observation_data
fig,axes = plt.subplots(2,1,figsize=(10,10))
pst2.observation_data.loc[pst2.nnz_obs_names,"weight"].plot(kind="bar",ax=axes[0])
pst2.res.loc[pst2.nnz_obs_names,:].apply(lambda x: (x.residual * obs.loc[x.name,"weight"])**2,axis=1).plot(kind="bar",ax=axes[1])
axes[0].set_title("original weights")
axes[1].set_title("original contribution to objective function")
axes[0].set_xticklabels([])
print("original phi:",pst2.phi)

So we see the objective function is much larger than the number of observations and the contribution to phi varies substantially across the observations...

Now for the weight adjustment:

In [None]:
pst2.adjust_weights_discrepancy()
obs = pst2.observation_data
fig,axes = plt.subplots(2,1,figsize=(10,10))
pst2.observation_data.loc[pst2.nnz_obs_names,"weight"].plot(kind="bar",ax=axes[0])
pst2.res.loc[pst2.nnz_obs_names,:].apply(lambda x: (x.residual * obs.loc[x.name,"weight"])**2,axis=1).plot(kind="bar",ax=axes[1])
axes[0].set_title("adjusted weights")
axes[1].set_title("adjusted contribution to objective function")
axes[0].set_xticklabels([])
print("adjusted phi:",pst2.phi)

Now we see the max contribution to phi from any observation is 1.0.  The reason some of them are less than 1.0 is that we did not want to turn the weights up for the observations that are already being matched well (so weights are either kept the same or decreased, never increased).
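The "decrease only" rule can be sketched in a few lines of plain Python. This illustrates the idea on made-up residuals and weights; it is not pyemu's implementation, and the variable names are hypothetical.

```python
residuals = [0.2, -3.0, 5.0, 0.5]
weights = [2.0, 1.0, 1.0, 1.0]

adjusted = []
for r, w in zip(residuals, weights):
    contribution = (r * w) ** 2
    if contribution > 1.0:
        # scale the weight down so that (r * w_new) ** 2 equals 1
        w_new = w / contribution ** 0.5
    else:
        # already fit well: keep the original weight (never increase it)
        w_new = w
    adjusted.append(w_new)

# contribution of each observation to phi after the adjustment
contributions = [(r * w) ** 2 for r, w in zip(residuals, adjusted)]
print(adjusted)
print(contributions)
```

The second and third observations are scaled down to a contribution of 1, while the first and last keep their original weights and contribute less than 1.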

### DIY: plot a bar chart of residuals for non-zero weighted obs

You can use the adjusted weight instance (`pst2`) or the original `pst`

### The Jacobian matrix

A derived `pyemu.Matrix` type...

In [None]:
jco = pyemu.Jco.from_binary(os.path.join(f_d,"freyberg_pp.jcb"))

In [None]:
df = jco.to_dataframe()
df.head()

### DIY: form the normal matrix (`XtQX`) with non-zero weight obs and plot (`X` is the Jacobian and `Q` is the inverse of the observation noise covariance matrix)

In [None]:
# hint Cov.from_observation_data()
obscov = pyemu.Cov.from_observation_data(pst)
Q = obscov.inv
xtqx = jco.T * Q * jco
x = xtqx.x
plt.imshow(x,vmin=-1,vmax=1)

### Now invert XtQX:
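Before trying this on the real matrix, note that `XtQX` is singular whenever there are fewer non-zero weighted observations than parameters, in which case a plain inverse does not exist. The NumPy sketch below uses a small random stand-in Jacobian and made-up weights (nothing here comes from the Freyberg files) to show the rank deficiency and one common workaround, the pseudo-inverse.

```python
import numpy as np

rng = np.random.default_rng(1)
n_obs, n_par = 4, 6                       # fewer observations than parameters
X = rng.standard_normal((n_obs, n_par))   # stand-in Jacobian (rows: obs, cols: pars)
weights = np.array([1.0, 2.0, 0.5, 1.0])  # stand-in observation weights
Q = np.diag(weights ** 2)                 # inverse of a diagonal noise covariance

xtqx_arr = X.T @ Q @ X                    # normal matrix, n_par x n_par
rank = np.linalg.matrix_rank(xtqx_arr, tol=1e-8)   # at most n_obs, so singular here

# the pseudo-inverse discards the (numerically) zero singular values
xtqx_pinv = np.linalg.pinv(xtqx_arr, rcond=1e-10)
print(rank, xtqx_pinv.shape)
```

Truncating small singular values like this is a design choice: it trades an exact inverse, which is unavailable, for a stable solution restricted to the directions the observations actually inform.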

### Some sweet plotting sugar:

In [None]:
pst.plot(kind="phi_pie")

In [None]:
pst.plot(kind='prior')

In [None]:
pst.plot(kind="1to1")

### DIY: Adjust the weights so that both non-zero obs groups contribute equally to the objective function (and plot!) - no model runs required...

In [None]:
# hint: pst.adjust_weights
print(pst.nnz_obs_groups)
obsgrp_dict = {"calhead":100,"calflux":100}
pst.adjust_weights(obsgrp_dict=obsgrp_dict)
pst.plot(kind="phi_pie")

In [None]:
pst.phi
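The arithmetic behind this kind of group rebalancing is simple: phi scales with the square of the weights, so multiplying every weight in a group by `sqrt(target / current)` moves that group's contribution to the target. Here is a plain-Python sketch on made-up observations. The group names and values are hypothetical, and this is an illustration of the idea rather than pyemu's code.

```python
# (group, residual, weight) for a handful of made-up observations
obs = [
    ("heads", 0.5, 2.0),
    ("heads", -1.0, 2.0),
    ("flux", 30.0, 0.1),
]
target = {"heads": 100.0, "flux": 100.0}   # desired phi contribution per group

# current phi contribution of each group
group_phi = {}
for grp, r, w in obs:
    group_phi[grp] = group_phi.get(grp, 0.0) + (r * w) ** 2

# scale each weight by sqrt(target / current) for its group
new_obs = [(grp, r, w * (target[grp] / group_phi[grp]) ** 0.5) for grp, r, w in obs]

# recompute the group contributions with the new weights
new_group_phi = {}
for grp, r, w in new_obs:
    new_group_phi[grp] = new_group_phi.get(grp, 0.0) + (r * w) ** 2
print(group_phi, new_group_phi)
```

No model runs are needed because only the weights change; the residuals stay the same.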

### Geostats in pyemu

These are pure Python, so they aren't super fast...

In [None]:
v_contribution = 1.0 # variance
v_range = 1000
exp_vario = pyemu.geostats.ExpVario(v_contribution,v_range)
exp_vario.plot()

Now let's build a covariance matrix from x-y points.  We can generate these randomly or just use the pilot points template file:

In [None]:
df = pyemu.pp_utils.pp_tpl_to_dataframe(os.path.join(f_d,"hkpp.dat.tpl"))
df.head()

In [None]:
plt.imshow(pyemu.geostats.ExpVario(0.1,5000).covariance_matrix(df.x,df.y,df.name).x)

Here we will just use a 1-D sequence to get a cov matrix (think "time series")

In [None]:
times = np.arange(0,365,1)
y = np.ones_like(times)
names = ["t_"+str(t) for t in times]

In [None]:
v_contribution = 1.0 # variance
v_range = 5 # days
exp_vario = pyemu.geostats.ExpVario(v_contribution,v_range)
exp_vario.plot()

In [None]:
cov = exp_vario.covariance_matrix(times,y,names)
plt.imshow(cov.x)
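For intuition about what the cell above computes, here is a NumPy-only sketch of a covariance matrix built from an exponential variogram. It assumes the form `C(h) = contribution * exp(-h / a)`; some packages scale the exponent differently so that `a` is the practical range, so check the pyemu documentation for the convention it uses. `exp_covariance` is a hypothetical helper, not a pyemu function.

```python
import numpy as np

def exp_covariance(x, y, contribution, a):
    """Covariance between all pairs of points for C(h) = contribution * exp(-h / a)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # pairwise separation distances
    h = np.sqrt((x[:, None] - x[None, :]) ** 2 + (y[:, None] - y[None, :]) ** 2)
    return contribution * np.exp(-h / a)

times = np.arange(0, 365, 1)
cov_arr = exp_covariance(times, np.ones_like(times), contribution=1.0, a=5.0)
print(cov_arr.shape)
```

The diagonal holds the variance (`contribution`), and entries decay with the separation in time, which produces the banded image.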

### Ensembles

The pyemu ensemble classes inherit from the pandas `DataFrame`, so all that nice stuff is included for free.

In [None]:
pe = pyemu.ParameterEnsemble.from_gaussian_draw(pst=pst,cov=pyemu.Cov.from_parameter_data(pst),num_reals=1000)
pe.head()

Check your understanding: where did the first (mean vector) and second (covariance matrix) moments come from in that ensemble generation?  

In [None]:
pe.iloc[:,0].hist()

In [None]:
pe.iloc[:,0].apply(np.log10).hist()

So that was really easy...but what if we want to express spatial/temporal correlation in the prior?  That means we need to form a mixed block-diagonal/diagonal cov matrix and then draw from it. In this case, we have spatially correlated pilot point parameters:

In [None]:
df = pyemu.pp_utils.pp_tpl_to_dataframe(os.path.join(f_d,"hkpp.dat.tpl"))
df.head()

Let's build a combined, block diagonal matrix:

In [None]:
ev = pyemu.geostats.ExpVario(1.0,1000)
gs = pyemu.geostats.GeoStruct(variograms=ev)
cov = pyemu.helpers.geostatistical_prior_builder(pst=pst,struct_dict={gs:df})
x = cov.x.copy()
x[x<1.0e-3] = np.nan  # np.NaN was removed in NumPy 2.0
plt.imshow(x)

This is the same call as above, except that here `cov` includes some off-diagonal terms for the pilot points (and we draw more realizations):

In [None]:
pe = pyemu.ParameterEnsemble.from_gaussian_draw(pst=pst,cov=cov,num_reals=10000)
pe.head()
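To make the "mixed block-diagonal/diagonal" idea concrete, here is a NumPy sketch that assembles such a matrix for six made-up parameters and draws from it using a Cholesky factor. This is one standard way to draw from a multivariate Gaussian and is not necessarily how pyemu does it internally; the sizes, variances and range are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(7)

# correlated block: 4 points on a line, exponential covariance with range 2
pts = np.arange(4, dtype=float)
block = np.exp(-np.abs(pts[:, None] - pts[None, :]) / 2.0)

# uncorrelated block: 2 parameters with their own variances
diag_block = np.diag([0.25, 4.0])

# assemble the mixed block-diagonal / diagonal covariance matrix
n1, n2 = block.shape[0], diag_block.shape[0]
cov_arr = np.zeros((n1 + n2, n1 + n2))
cov_arr[:n1, :n1] = block
cov_arr[n1:, n1:] = diag_block

# draw realizations as mean + L z, where L L^T = cov and z is standard normal
mean = np.zeros(n1 + n2)
L = np.linalg.cholesky(cov_arr)
num_reals = 20_000
draws = mean + rng.standard_normal((num_reals, n1 + n2)) @ L.T

# the empirical covariance of the draws approximates the matrix we started from
emp_cov_arr = np.cov(draws, rowvar=False)
```

With enough realizations the empirical covariance recovers both the off-diagonal block and the zeros elsewhere, up to sampling noise.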

Let's plot the values of the pilot points in space to see their correlation (or lack thereof)

In [None]:
df.index = df.parnme
df.loc[:,"parval1"] = pe.loc[0,df.parnme].values
fig = plt.figure(figsize=(10,10))
ax = plt.subplot(111,aspect="equal")
plt.scatter(df.x,df.y,c=df.parval1,s=500)

You can "kind of" see that correlation, but if we krige these values to the model grid, we can really see it...

In [None]:
df.loc[:,"parval1"] = pe.loc[0,df.parnme]
df.index = np.arange(df.shape[0])
arr = pyemu.geostats.fac2real(df,factors_file=os.path.join(f_d,"hkpp.dat.fac"),out_file=None)

In [None]:
plt.imshow(np.log10(arr))

### DIY: experiment with changing the variogram range and seeing how it changes the resulting parameter fields

FORESHADOWING: we can also form an empirical covariance matrix from this par ensemble!

In [None]:
emp_cov = pe.covariance_matrix()
x = emp_cov.x.copy()
x[x<1.0e-3] = np.nan  # np.NaN was removed in NumPy 2.0
plt.imshow(x)

### Spectral simulation

Because pyemu is pure Python (and because the developers are lazy), it only implements spectral simulation for grid-scale field generation.  For regular grids without anisotropy and without conditioning data ("known" property values), it is equivalent to sequential Gaussian simulation.

In [None]:
ev = pyemu.geostats.ExpVario(1.0,1)
gs = pyemu.geostats.GeoStruct(variograms=ev)
ss = pyemu.geostats.SpecSim2d(np.ones(100),np.ones(100),gs)
plt.imshow(ss.draw_arrays()[0])

In [None]:
ev = pyemu.geostats.ExpVario(1.0,5)
gs = pyemu.geostats.GeoStruct(variograms=ev)
ss = pyemu.geostats.SpecSim2d(np.ones(100),np.ones(100),gs)
plt.imshow(ss.draw_arrays()[0])

In [None]:
ev = pyemu.geostats.ExpVario(1.0,500)
gs = pyemu.geostats.GeoStruct(variograms=ev)
ss = pyemu.geostats.SpecSim2d(np.ones(100),np.ones(100),gs)
plt.imshow(ss.draw_arrays()[0])
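The three cells above show the fields getting smoother as the variogram range grows. The basic mechanics of spectral simulation can be sketched with NumPy's FFT: color white noise by the square root of the covariance spectrum. The sketch assumes a periodic grid with unit cell size and an exponential covariance `exp(-h / corr_len)`. `spectral_field` and `roughness` are hypothetical helpers, and pyemu's `SpecSim2d` may differ in its details.

```python
import numpy as np

def spectral_field(nrow, ncol, corr_len, rng):
    """One unconditional Gaussian field with a (periodic) exponential covariance."""
    # wrapped lag distances along each axis, in cell units
    dy = np.minimum(np.arange(nrow), nrow - np.arange(nrow))
    dx = np.minimum(np.arange(ncol), ncol - np.arange(ncol))
    h = np.sqrt(dy[:, None] ** 2 + dx[None, :] ** 2)
    cov = np.exp(-h / corr_len)
    # power spectrum of the covariance; clip any small negative values
    spectrum = np.clip(np.fft.fft2(cov).real, 0.0, None)
    # color white noise by the square root of the spectrum
    noise = rng.standard_normal((nrow, ncol))
    return np.fft.ifft2(np.sqrt(spectrum) * np.fft.fft2(noise)).real

def roughness(field):
    """Mean squared difference between horizontally adjacent cells."""
    return np.mean(np.diff(field, axis=1) ** 2)

rough = spectral_field(100, 100, 1.0, np.random.default_rng(0))
smooth = spectral_field(100, 100, 20.0, np.random.default_rng(0))
print(roughness(rough), roughness(smooth))
```

Both fields use the same random noise, so the difference in roughness comes only from the correlation length.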