# and now for something completely different...

### During these exercises, we have seen a traditional workflow: parameter estimation followed by uncertainty analysis.  In the pilot points notebooks, this took about 500 model runs to "calibrate" the model and then another 500 to 1000 to find a decent set (ensemble) of realizations that fit the data acceptably well.

### But...even using pilot points as a parameterization device is a form of regularization: ideally, we would have an HK parameter in every model cell. But, that is too expensive in terms of model runs for calibration and uncertainty analysis (at least as we have learned it so far).

### However, there are some new techniques that free us from these computational constraints so that we can more efficiently use lots of parameters.  One approach to this is the iterative Ensemble Smoother form of the GLM.  It is implemented in pyemu and a C++ version is in the works.  Let's see how this technique works for the freyberg model - a special version of the freyberg model with an HK parameter in every cell.

In [None]:
%matplotlib inline
import os, shutil
import sys
sys.path.append("..")
import numpy as np
from IPython.display import Image
import pandas as pd
import matplotlib.pyplot as plt

import flopy
import pyemu

In [None]:
import freyberg_setup as fs
fs.setup_pest_gr()
working_dir = fs.WORKING_DIR_GR
pst_name = fs.PST_NAME_GR

In [None]:
m = flopy.modflow.Modflow.load(fs.MODEL_NAM,model_ws=working_dir,load_only=["upw"],check=False)
pst = pyemu.Pst(os.path.join(working_dir,pst_name))
obs = pst.observation_data
obs.loc[obs.obgnme=="calhead","weight"] = 0.75
par = pst.parameter_data
hk_par = par.loc[par.pargp=="hk"].copy()
hk_par.loc[:,"i"] = hk_par.parnme.apply(lambda x: int(x.split('_')[1][1:]))
hk_par.loc[:,"j"] = hk_par.parnme.apply(lambda x: int(x.split('_')[2][1:]))
hk_par.loc[:,"x"] = m.sr.xcentergrid[hk_par.i,hk_par.j]
hk_par.loc[:,"y"] = m.sr.ycentergrid[hk_par.i,hk_par.j]
hk_par.head()

In [None]:
"number of parameters: {0} : WTF!".format(pst.npar)

In [None]:
os.chdir(working_dir)

### First we create an ``EnsembleSmoother`` instance:

In [None]:
pst.filename = pst_name
ies = pyemu.EnsembleSmoother(pst=pst,num_slaves=15,slave_dir=".",port=4005)

### ``EnsembleSmoother.initialize()`` does lots of things for you:
### - make draws from parcov for the initial ``ParameterEnsemble``
### - make draws from obscov for the "target" ``ObservationEnsemble``
### - run the initial ``ParameterEnsemble`` forward to get the initial ``ObservationEnsemble``
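
### The three steps above can be sketched with plain numpy on a toy problem. This is only an illustration under stated assumptions, not pyemu's implementation: the linear "model" ``G``, the sizes, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy problem: 5 parameters, 3 observations, linear "model"
n_par, n_obs, num_reals = 5, 3, 100
G = rng.normal(size=(n_obs, n_par))       # stand-in for the forward model

def forward(p):
    return G @ p

par_mean = np.zeros(n_par)                # prior mean
par_std = np.ones(n_par)                  # diagonal parcov: independent parameters
obs_val = np.array([1.0, -0.5, 2.0])      # "measured" values
obs_std = np.full(n_obs, 0.1)             # measurement noise standard deviation

# 1. draw the initial parameter ensemble from the prior
par_ens = par_mean + par_std * rng.normal(size=(num_reals, n_par))
# 2. draw the "target" observation ensemble: observed value + noise realization
target_ens = obs_val + obs_std * rng.normal(size=(num_reals, n_obs))
# 3. run every realization forward to get the simulated observation ensemble
sim_ens = np.array([forward(p) for p in par_ens])
```

### Each row of ``par_ens``, ``target_ens`` and ``sim_ens`` belongs to one realization, which is why step 3 costs exactly ``num_reals`` model runs.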

In [None]:
ies.initialize(num_reals=100)

# So what we just did was essentially an unconstrained Monte Carlo with 100 realizations - that's nothing new...

### Let's visualize the first few ``hk`` fields - drawn from prior (uncalibrated)

In [None]:
for real in ies.parensemble.index[:4]:
    arr = np.zeros((m.nrow,m.ncol))
    arr[hk_par.i,hk_par.j] = ies.parensemble.loc[real,hk_par.parnme]
    m.upw.hk[0] = arr
    m.upw.hk[0].plot(alpha=0.5,colorbar=True)
    plt.show()

## Those don't look very "geologic" - why? Answer: the Prior! (it's always about the Prior)

## Let's visualize the distributions (histograms) for each of the forecasts

### These distributions come from running the initial (uncalibrated) parameter ensemble

In [None]:
init_obs = ies.obsensemble.copy()
for forecast in pst.forecast_names:
    ax = ies.obsensemble.loc[:,forecast].hist(bins=10)
    ax.set_title(forecast)
    ylim = ax.get_ylim()
    v = ies.pst.observation_data.loc[forecast,"obsval"]
    ax.plot([v,v],ylim,"k--")
    plt.show()

## The initial (uncalibrated) phi distribution...not so good...
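
### For reference, phi is the weighted sum of squared residuals, computed here once per realization. This is a minimal sketch with made-up simulated values; the function name ``phi_per_realization`` is hypothetical, and the smoother itself may measure residuals against the noisy "target" ensemble rather than a single observed value.

```python
import numpy as np

def phi_per_realization(sim_ens, targets, weights):
    """Weighted sum of squared residuals, one value per realization (row)."""
    resid = (sim_ens - targets) * weights
    return (resid ** 2).sum(axis=1)

# three hypothetical realizations of two simulated heads
sim_ens = np.array([[10.0, 20.0],
                    [11.0, 19.0],
                    [14.0, 22.0]])
targets = np.array([10.0, 20.0])
weights = np.array([0.75, 0.75])   # the weight this notebook assigns to "calhead"

phi = phi_per_realization(sim_ens, targets, weights)
print(phi)
```

### A realization that matches the targets exactly has a phi of zero, and phi grows with the square of the weighted misfit.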

In [None]:
ies.current_phi.hist(bins=10)
plt.show()

### Since we only have a few observations we are trying to match, we can look at their distributions also.  The "blue" histogram shows the results of the initial parameter ensemble evaluation.  The "red" is the "target" distribution: each observation has a unique value for each realization: the observed value + a realization of measurement noise

In [None]:
for oname in pst.nnz_obs_names:
    ax = ies.obsensemble_0.loc[:,oname].hist(bins=10,alpha=0.5,color='r')
    ies.obsensemble.loc[:,oname].hist(bins=10,ax=ax,alpha=0.5,color='b')
    ax.set_title(oname)
    plt.show()

### ``EnsembleSmoother.update()`` propagates the ensemble forward, updating the ``ParameterEnsemble`` through the GLM algorithm, then runs the new ``ParameterEnsemble``.  In other words, we are going to use an approximate (low fidelity) Jacobian to update the entire parameter ensemble, then we are going to run another Monte Carlo
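
### To make the idea concrete, here is a sketch of a single ensemble update on a toy linear problem. Note that this is the basic Kalman-type ensemble smoother update, swapped in for illustration; it is not the iterative GLM form that pyemu uses, and the model ``G``, sizes, and variable names are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_par, n_obs, num_reals = 20, 5, 100
G = rng.normal(size=(n_obs, n_par))                 # hypothetical linear model
truth = rng.normal(size=n_par)
obs_std = 0.1
obs_val = G @ truth

par_ens = rng.normal(size=(num_reals, n_par))       # prior parameter ensemble
target_ens = obs_val + obs_std * rng.normal(size=(num_reals, n_obs))
sim_ens = par_ens @ G.T                             # "run" the ensemble

def phi(sim, targets):
    # sum of squared residuals, weighted by 1 / obs_std, per realization
    return (((sim - targets) / obs_std) ** 2).sum(axis=1)

# ensemble anomalies (deviations from the ensemble mean)
dp = par_ens - par_ens.mean(axis=0)
do = sim_ens - sim_ens.mean(axis=0)
c_po = dp.T @ do / (num_reals - 1)                  # parameter-observation cross-covariance
c_oo = do.T @ do / (num_reals - 1)                  # covariance of simulated observations
c_noise = np.eye(n_obs) * obs_std ** 2              # measurement noise covariance

# Kalman-type gain: c_po carries the sensitivity information that a
# finite-difference Jacobian would otherwise supply
gain = c_po @ np.linalg.inv(c_oo + c_noise)
new_par_ens = par_ens + (target_ens - sim_ens) @ gain.T

new_sim_ens = new_par_ens @ G.T                     # the second "Monte Carlo"
phi_before = phi(sim_ens, target_ens).mean()
phi_after = phi(new_sim_ens, target_ens).mean()
```

### No Jacobian is ever filled by perturbing parameters one at a time: the only model runs are the two passes through the ensemble, regardless of how many parameters there are.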

In [None]:
ies.update()

### Let's see how phi is doing...

In [None]:
ies.current_phi.hist(bins=10)
plt.show()
ies.current_phi.mean()

### Notice how much the ``phi`` distribution has decreased compared to the initialized ``EnsembleSmoother``: Nice!

### Now let's run through a few more updates...and plot the phi distribution each time

In [None]:
for i in range(2):
    ies.update()
    phi = ies.current_phi
    ax = plt.subplot(111)
    phi.hist(bins=10,ax=ax)
    ax.set_title("iteration:{0}, total model runs:{1}, avg phi:{2}".format(ies.iter_num,ies.total_runs,phi.mean()))
    plt.show()

### Holy Crap!  phi has gotten really good after only 400ish runs of the model - remember, there are over 800 parameters. Let's see how the forecasts are doing...

In [None]:
for forecast in pst.forecast_names:
    ax = ies.obsensemble.loc[:,forecast].hist(bins=10,color='b',alpha=0.5)
    init_obs.loc[:,forecast].hist(bins=10,ax=ax,color='g',alpha=0.5)
    ax.set_title(forecast)
    ylim = ax.get_ylim()
    v = ies.pst.observation_data.loc[forecast,"obsval"]
    ax.plot([v,v],ylim,"k--")
    plt.show()

In [None]:
ies.total_runs

In [None]:
df_sum = pd.read_csv(pst_name+".iobj.csv")
df_sum

In [None]:
ax = plt.subplot(111)
real_cols = [c for c in df_sum.columns if c.startswith("0")]
for rc in real_cols:
    ax.plot(df_sum.total_runs,df_sum.loc[:,rc],'0.5',lw=0.25)
ax.plot(df_sum.total_runs,df_sum.loc[:,"mean"],"k",lw=3)
plt.show()

## Awesome!  We are crushing phi...but how do the parameter fields look?

In [None]:
for real in ies.parensemble.index[:4]:
    arr = np.zeros((m.nrow,m.ncol))
    arr[hk_par.i,hk_par.j] = ies.parensemble.loc[real,hk_par.parnme]
    m.upw.hk[0] = arr
    m.upw.hk[0].plot(alpha=0.5,colorbar=True)
    plt.show()

### Uh oh. The fields look like noise...how can we fix this? Solution: a full covariance matrix that expresses spatial correlation

# iES with a full covariance matrix

## Now let's rerun the iES process but with a full, geostatistical prior covariance matrix
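
### What a "full" covariance matrix buys us can be sketched with numpy alone. The example below assumes an exponential covariance of the form ``contribution * exp(-h / a)`` on a hypothetical 1-D transect of cells; pyemu's exact convention for ``a`` is assumed here, not checked, and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical 1-D transect of cell centers, 250 length units apart
x = np.arange(0.0, 10000.0, 250.0)
h = np.abs(x[:, None] - x[None, :])          # separation distances between cells

contribution, a = 1.0, 2500.0                # values used for the variogram in this notebook
cov_full = contribution * np.exp(-h / a)     # exponential covariance (assumed form)
cov_diag = np.diag(np.diag(cov_full))        # same variances, no spatial correlation

def draw(cov, num_reals):
    # the Cholesky factor maps independent standard normals to correlated ones
    chol = np.linalg.cholesky(cov)
    return rng.normal(size=(num_reals, cov.shape[0])) @ chol.T

def neighbor_corr(ens):
    # correlation between values in adjacent cells, pooled over realizations
    return np.corrcoef(ens[:, :-1].ravel(), ens[:, 1:].ravel())[0, 1]

corr_diag = neighbor_corr(draw(cov_diag, 500))
corr_full = neighbor_corr(draw(cov_full, 500))
```

### With the diagonal matrix, neighboring cells are essentially uncorrelated (noise); with the full matrix, neighbors are strongly correlated, which is what makes a drawn field look smooth.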

In [None]:
v = pyemu.geostats.ExpVario(contribution=1.0,a=2500.0)
gs = pyemu.geostats.GeoStruct(variograms=v)

In [None]:
cov = pyemu.helpers.geostatistical_prior_builder(pst=pst,struct_dict={gs:[hk_par]},sigma_range=6)
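
### A note on ``sigma_range``: as we understand it, this is the number of standard deviations that the parameter bounds are taken to span, so the prior standard deviation is the bound width divided by ``sigma_range`` (in log10 space for log-transformed parameters). The arithmetic can be sketched as follows; the helper name and the bounds are hypothetical.

```python
import numpy as np

def std_from_bounds(lower, upper, sigma_range=6.0, log_transformed=True):
    """Standard deviation implied by bounds that span `sigma_range` sigmas."""
    if log_transformed:
        lower, upper = np.log10(lower), np.log10(upper)
    return (upper - lower) / sigma_range

# hypothetical HK bounds of 0.5 to 50, i.e. two orders of magnitude
std_6 = std_from_bounds(0.5, 50.0, sigma_range=6.0)
std_4 = std_from_bounds(0.5, 50.0, sigma_range=4.0)
```

### A larger ``sigma_range`` means the same bounds imply a tighter prior: here one third of a log10 unit for 6 versus one half for 4.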

### Let's see how this covariance looks compared to the one we used previously

In [None]:
plt.imshow(cov.x)

In [None]:
plt.imshow(ies.parcov.as_2d)

### Now we create a new ``iES`` and update 3 times...

In [None]:
ies = pyemu.EnsembleSmoother(pst=pst,num_slaves=15,slave_dir=".",parcov=cov,port=4005)
ies.initialize(num_reals=100)

### Let's visualize the new parameter fields:

In [None]:
for real in ies.parensemble.index[:4]:
    arr = np.zeros((m.nrow,m.ncol))
    arr[hk_par.i,hk_par.j] = ies.parensemble.loc[real,hk_par.parnme]
    m.upw.hk[0] = arr
    m.upw.hk[0].plot(alpha=0.5,colorbar=True)
    plt.show()

### Those fields look much more "geologic" (whatever that means)...let's see how well the smoother does with these fields

In [None]:
for _ in range(3):
    ies.update()
    phi = ies.current_phi
    ax = plt.subplot(111)
    phi.hist(bins=10,ax=ax)
    ax.set_title("iteration:{0}, total model runs:{1}".format(ies.iter_num,ies.total_runs))
    plt.show()

### phi looks really good still...let's see how the final (calibrated) parameter fields look....

In [None]:
for real in ies.parensemble.index[:4]:
    arr = np.zeros((m.nrow,m.ncol))
    arr[hk_par.i,hk_par.j] = ies.parensemble.loc[real,hk_par.parnme]
    m.upw.hk[0] = arr
    m.upw.hk[0].plot(alpha=0.5,colorbar=True)
    plt.show()

In [None]:
for forecast in pst.forecast_names:
    ax = ies.obsensemble.loc[:,forecast].hist(bins=10,color='b',alpha=0.5,label="posterior")
    # plot the prior on the same axes so the two histograms overlay
    init_obs.loc[:,forecast].hist(bins=10,ax=ax,color="0.5",alpha=0.5,label="prior")
    ax.set_title(forecast)
    ylim = ax.get_ylim()
    v = ies.pst.observation_data.loc[forecast,"obsval"]
    ax.plot([v,v],ylim,"k--")
    ax.legend()  # show the "posterior" and "prior" labels
    plt.show()

### We see that the final (posterior) ensemble is bracketing the "truth" for all forecasts...yeah! 