# Setting up a PEST interface from MODFLOW6 using the `PstFrom` class

The `PstFrom` class is a generalization of the prototype `PstFromFlopy` class. Because `PstFrom` is more general, users need to explicitly define which files are to be parameterized and which files contain model outputs to treat as observations. Two primary types of files are supported: arrays and lists. Array files contain values of a single data type (usually floating point), while list files have a few columns of index information followed by columns of floating-point values.
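To make the array/list distinction concrete, here is a minimal sketch in plain Python. The file contents and the helper names `read_array` and `read_list` are made up for illustration; they are not part of pyemu and not taken from the Freyberg model.

```python
# Hypothetical file contents, invented for illustration only.
array_text = "1.0 2.0 3.0\n4.0 5.0 6.0\n"       # array file: nothing but values
list_text = "1 3 5 -150.0\n1 7 2 -200.0\n"      # list file: layer, row, column, value


def read_array(text):
    """Parse a whitespace-delimited array file into a list of rows of floats."""
    return [[float(v) for v in line.split()]
            for line in text.splitlines() if line.strip()]


def read_list(text, n_index_cols):
    """Parse a list file into (index tuple, list of float values) records."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue
        parts = line.split()
        index = tuple(int(p) for p in parts[:n_index_cols])
        values = [float(p) for p in parts[n_index_cols:]]
        records.append((index, values))
    return records


array_data = read_array(array_text)
list_data = read_list(list_text, n_index_cols=3)
print(array_data)
print(list_data)
```

In an array file, the position of a value is its identity; in a list file, the index columns carry that identity explicitly.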

In [9]:
import os
import shutil
import numpy as np
import pandas as pd
import pyemu
import flopy

An existing MODFLOW6 model is in the directory `freyberg_mf6`.  Let's check it out:

In [4]:
org_model_ws = os.path.join('freyberg_mf6')
os.listdir(org_model_ws)

['freyberg6.wel_stress_period_data_20.txt',
 'freyberg6.rch_recharge_2.txt',
 'freyberg6.sfr_perioddata_13.txt',
 'freyberg6.sfr_perioddata_3.txt',
 'freyberg6.rch_recharge_14.txt',
 'freyberg6_freyberg.hds',
 'freyberg6.dis_idomain_layer1.txt',
 'freyberg6.rch_recharge_15.txt',
 'freyberg6.sfr_perioddata_2.txt',
 'freyberg6.sfr_perioddata_12.txt',
 'freyberg6.rch_recharge_3.txt',
 'freyberg6.wel_stress_period_data_21.txt',
 'freyberg6.wel_stress_period_data_23.txt',
 'freyberg6.rch_recharge_1.txt',
 'freyberg6.sfr_perioddata_10.txt',
 'freyberg6.rch_recharge_17.txt',
 'freyberg6.dis_idomain_layer3.txt',
 'freyberg6.ghb_stress_period_data_1.txt',
 'freyberg6.dis_idomain_layer2.txt',
 'freyberg6.sfr_perioddata_1.txt',
 'freyberg6.rch_recharge_16.txt',
 'freyberg6.sfr_perioddata_11.txt',
 'freyberg6.sfr_connectiondata.txt',
 'freyberg6.wel_stress_period_data_22.txt',
 'freyberg6.sto_sy_layer3.txt',
 'freyberg6.sfr_perioddata_15.txt',
 'freyberg6.rch_recharge_4.txt',
 'freyberg6.dis_top.t

You can see that all the input array and list data for this model have been written "externally" - this is key to using the `PstFrom` class. 

Now let's copy those files to a temporary location just to make sure we don't goof up those original files:


In [8]:
tmp_model_ws = "temp_pst_from"
if os.path.exists(tmp_model_ws):
    shutil.rmtree(tmp_model_ws)
shutil.copytree(org_model_ws,tmp_model_ws)
os.listdir(tmp_model_ws)

['freyberg6.wel_stress_period_data_20.txt',
 'freyberg6.rch_recharge_2.txt',
 'freyberg6.sfr_perioddata_13.txt',
 'freyberg6.sfr_perioddata_3.txt',
 'freyberg6.rch_recharge_14.txt',
 'freyberg6_freyberg.hds',
 'freyberg6.dis_idomain_layer1.txt',
 'freyberg6.rch_recharge_15.txt',
 'freyberg6.sfr_perioddata_2.txt',
 'freyberg6.sfr_perioddata_12.txt',
 'freyberg6.rch_recharge_3.txt',
 'freyberg6.wel_stress_period_data_21.txt',
 'freyberg6.wel_stress_period_data_23.txt',
 'freyberg6.rch_recharge_1.txt',
 'freyberg6.sfr_perioddata_10.txt',
 'freyberg6.rch_recharge_17.txt',
 'freyberg6.dis_idomain_layer3.txt',
 'freyberg6.ghb_stress_period_data_1.txt',
 'freyberg6.dis_idomain_layer2.txt',
 'freyberg6.sfr_perioddata_1.txt',
 'freyberg6.rch_recharge_16.txt',
 'freyberg6.sfr_perioddata_11.txt',
 'freyberg6.sfr_connectiondata.txt',
 'freyberg6.wel_stress_period_data_22.txt',
 'freyberg6.sto_sy_layer3.txt',
 'freyberg6.sfr_perioddata_15.txt',
 'freyberg6.rch_recharge_4.txt',
 'freyberg6.dis_top.t

Now we need just a tiny bit of info about the spatial discretization of the model. This is needed to work out separation distances between parameters so that we can build a geostatistical prior covariance matrix later.

In [10]:
sim = flopy.mf6.MFSimulation.load(sim_ws=tmp_model_ws)
m = sim.get_model("freyberg6")


loading simulation...
  loading simulation name file...
  loading tdis package...
  loading model gwf6...
    loading package dis...
    loading package ic...
    loading package npf...
    loading package sto...
    loading package oc...
    loading package wel...
    loading package rch...
    loading package ghb...
    loading package sfr...
    loading package obs...
  loading ims package freyberg6...


In [12]:
sr = pyemu.helpers.SpatialReference.from_namfile(
        os.path.join(tmp_model_ws, "freyberg6.nam"),
        delr=m.dis.delr.array, delc=m.dis.delc.array)
sr

   could not remove start_datetime


xul:0; yul:10000; rotation:0; proj4_str:None; units:meters; lenuni:2; length_multiplier:1.0
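As a rough illustration of why the discretization matters, the sketch below computes cell-center coordinates from an upper-left corner plus `delr`/`delc` spacings, and then the separation distance between two cells. The grid here is a hypothetical 2-row by 3-column example, and `cell_centers` and `separation` are made-up helpers; this is not how `SpatialReference` is implemented.

```python
import math
from itertools import accumulate


def cell_centers(xul, yul, delr, delc):
    """Return x-centers per column and y-centers per row.

    delr holds the column widths (spacing along a row) and delc holds the
    row heights (spacing along a column), following the MODFLOW convention.
    """
    x_right_edges = list(accumulate(delr))
    y_bottom_edges = list(accumulate(delc))
    xs = [xul + edge - width / 2.0 for edge, width in zip(x_right_edges, delr)]
    ys = [yul - (edge - height / 2.0) for edge, height in zip(y_bottom_edges, delc)]
    return xs, ys


def separation(xs, ys, cell_a, cell_b):
    """Euclidean distance between two cells given as zero-based (row, col)."""
    (row_a, col_a), (row_b, col_b) = cell_a, cell_b
    return math.hypot(xs[col_a] - xs[col_b], ys[row_a] - ys[row_b])


# Hypothetical grid: 2 rows, 3 columns, 250 x 250 cells, upper-left at (0, 500).
xs, ys = cell_centers(xul=0.0, yul=500.0, delr=[250.0] * 3, delc=[250.0] * 2)
dist = separation(xs, ys, (0, 0), (1, 2))
print(xs, ys, dist)
```

Distances like `dist` are what a variogram takes as input when filling in a covariance matrix for spatially distributed parameters.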

Now we can instantiate a `PstFrom` instance:

In [14]:
template_ws = "freyberg6_template"
pf = pyemu.prototypes.PstFrom(original_d=tmp_model_ws, new_d=template_ws,
                 remove_existing=True,
                 longnames=True, spatial_reference=sr,
                 zero_based=False,start_datetime="1-1-2018")

2020-05-08 10:18:07.233890 starting: opening PstFrom.log for logging
2020-05-08 10:18:07.234222 starting PstFrom process
2020-05-08 10:18:07.238573 starting: setting up dirs
2020-05-08 10:18:07.238797 starting: copying original_d 'temp_pst_from' to new_d 'freyberg6_template'
2020-05-08 10:18:07.329659 finished: copying original_d 'temp_pst_from' to new_d 'freyberg6_template' took: 0:00:00.090862
2020-05-08 10:18:07.329996 finished: setting up dirs took: 0:00:00.091423


So now that we have a `PstFrom` instance, we need to add some PEST interface "observations" and "parameters".  Let's start with observations of MODFLOW6 heads.  These are stored in `heads.csv`:

In [16]:
df = pd.read_csv(os.path.join(tmp_model_ws,"heads.csv"),index_col=0)
df

time,TRGW_2_2_15,TRGW_2_2_9,TRGW_2_3_8,TRGW_2_9_1,TRGW_2_13_10,TRGW_2_15_16,TRGW_2_21_10,TRGW_2_22_15,TRGW_2_24_4,TRGW_2_26_6,...,TRGW_0_9_1,TRGW_0_13_10,TRGW_0_15_16,TRGW_0_21_10,TRGW_0_22_15,TRGW_0_24_4,TRGW_0_26_6,TRGW_0_29_15,TRGW_0_33_7,TRGW_0_34_10
1.0,34.339372,34.581653,34.611271,34.872236,34.257588,34.136404,34.144487,34.027672,34.310869,34.171624,...,34.878147,34.263202,34.141617,34.150089,33.99238,34.316582,34.177245,33.909885,33.985756,33.890226
32.0,34.422185,34.680237,34.711364,34.97244,34.38069,34.245123,34.271719,34.137529,34.43667,34.312654,...,34.978308,34.385574,34.249921,34.276462,34.086955,34.441557,34.317,33.992042,34.101898,34.004997
61.0,34.495577,34.777642,34.811214,35.082668,34.481042,34.32975,34.375621,34.218584,34.55394,34.426574,...,35.088566,34.486939,34.335503,34.381386,34.157456,34.559386,34.432021,34.057642,34.195197,34.085663
92.0,34.540966,34.84769,34.884088,35.176852,34.535381,34.371788,34.431026,34.25709,34.630529,34.489744,...,35.183003,34.542429,34.378474,34.438019,34.191072,34.636992,34.496503,34.090249,34.243224,34.120639
122.0,34.537692,34.858948,34.897412,35.213451,34.519537,34.35289,34.413872,34.236319,34.633038,34.475282,...,35.219955,34.527373,34.360122,34.421738,34.173384,34.640429,34.48308,34.076125,34.225432,34.094988
153.0,34.485665,34.80541,34.84436,35.178093,34.436538,34.277193,34.326062,34.159879,34.555096,34.382574,...,35.18497,34.444549,34.284359,34.334209,34.107416,34.563074,34.390877,34.017332,34.144442,34.014826
183.0,34.399847,34.701522,34.739064,35.076753,34.311512,34.167961,34.194918,34.051615,34.420773,34.240694,...,35.083914,34.319099,34.174579,34.202683,34.013693,34.428786,34.248822,33.931927,34.025465,33.904396
214.0,34.299522,34.56998,34.604429,34.930548,34.172792,34.050218,34.050299,33.93636,34.259318,34.081111,...,34.937729,34.179416,34.055879,34.057091,33.913596,34.266768,34.088387,33.839822,33.895627,33.789732
245.0,34.213977,34.449149,34.47966,34.781462,34.059659,33.957166,33.933007,33.84625,34.116792,33.948424,...,34.78842,34.065183,33.961867,33.93865,33.835118,34.123339,33.954564,33.767182,33.791731,33.702517
275.0,34.165359,34.371695,34.398581,34.671283,33.99865,33.910022,33.870221,33.801381,34.028927,33.873414,...,34.677877,34.003361,33.914137,33.874983,33.796057,34.034601,33.878588,33.731237,33.737735,33.661381


The main entry point for adding observations is (surprise) `PstFrom.add_observations()`.  This method works on the list-type observation output file.  We need to tell it which column is the index column (a string if there is a header, or an int if there is no header) and which columns contain quantities we want to monitor (i.e., "observe") in the control file. In this case we want to monitor all columns except the index column:

In [17]:
hds_df = pf.add_observations("heads.csv",insfile="heads.csv.ins",index_cols="time",
                    use_cols=list(df.columns.values),prefix="hds")
hds_df

2020-05-08 10:23:42.699357 starting: adding observations from tabular output file
2020-05-08 10:23:42.701464 starting: reading list freyberg6_template/heads.csv
2020-05-08 10:23:42.721172 finished: reading list freyberg6_template/heads.csv took: 0:00:00.019708
2020-05-08 10:23:42.721862 starting: building insfile for tabular output file heads.csv
2020-05-08 10:23:42.788303 finished: building insfile for tabular output file heads.csv took: 0:00:00.066441
2020-05-08 10:23:42.788447 starting: adding observation from instruction file 'freyberg6_template/heads.csv.ins'
2020-05-08 10:23:42.812131 finished: adding observation from instruction file 'freyberg6_template/heads.csv.ins' took: 0:00:00.023684
2020-05-08 10:23:42.813378 finished: adding observations from tabular output file took: 0:00:00.114021


Unnamed: 0,obsnme,obsval,weight,obgnme
hds_use_col:trgw_0_13_10_time:1.0,hds_use_col:trgw_0_13_10_time:1.0,34.263202,1.0,obgnme
hds_use_col:trgw_0_13_10_time:122.0,hds_use_col:trgw_0_13_10_time:122.0,34.527373,1.0,obgnme
hds_use_col:trgw_0_13_10_time:153.0,hds_use_col:trgw_0_13_10_time:153.0,34.444549,1.0,obgnme
hds_use_col:trgw_0_13_10_time:183.0,hds_use_col:trgw_0_13_10_time:183.0,34.319099,1.0,obgnme
hds_use_col:trgw_0_13_10_time:214.0,hds_use_col:trgw_0_13_10_time:214.0,34.179416,1.0,obgnme
...,...,...,...,...
hds_use_col:trgw_2_9_1_time:640.0,hds_use_col:trgw_2_9_1_time:640.0,34.623235,1.0,obgnme
hds_use_col:trgw_2_9_1_time:671.0,hds_use_col:trgw_2_9_1_time:671.0,34.617522,1.0,obgnme
hds_use_col:trgw_2_9_1_time:701.0,hds_use_col:trgw_2_9_1_time:701.0,34.682258,1.0,obgnme
hds_use_col:trgw_2_9_1_time:732.0,hds_use_col:trgw_2_9_1_time:732.0,34.802225,1.0,obgnme


We can see that it returned a dataframe with lots of useful info: the observation names that were formed (`obsnme`), the values that were read from `heads.csv` (`obsval`), and also some generic weights and group names.  At this point, no control file has been created; we have simply prepared to add these observations to the control file later.
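The observation names in the output above appear to combine the prefix, the lower-cased column name, and the index value. The sketch below reproduces that pattern for a two-row excerpt of `heads.csv`. The helper `build_observations` is hypothetical and only mimics the names seen in the output; it is not the logic pyemu actually uses.

```python
import csv
import io

# Two rows and one column excerpted from the heads.csv values shown earlier.
csv_text = "time,TRGW_0_13_10\n1.0,34.263202\n32.0,34.385574\n"


def build_observations(text, index_col, prefix):
    """Map '<prefix>_use_col:<column>_<index_col>:<index value>' to the cell value."""
    observations = {}
    for row in csv.DictReader(io.StringIO(text)):
        for column, value in row.items():
            if column == index_col:
                continue  # the index column labels rows; it is not observed
            name = f"{prefix}_use_col:{column.lower()}_{index_col}:{row[index_col]}"
            observations[name] = float(value)
    return observations


obs = build_observations(csv_text, index_col="time", prefix="hds")
for name, value in obs.items():
    print(name, value)
```

Encoding the column and the time in the name means each observation can be traced back to its location in the output file without opening the instruction file.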

In [21]:
[f for f in os.listdir(template_ws) if f.endswith(".ins")]

['heads.csv.ins']

Nice!  We also have a PEST-style instruction file for those obs.

Now let's do the same for SFR observations:

In [23]:
df = pd.read_csv(os.path.join(tmp_model_ws, "sfr.csv"), index_col=0)
sfr_df = pf.add_observations("sfr.csv", insfile="sfr.csv.ins", index_cols="time", use_cols=list(df.columns.values))
sfr_df

2020-05-08 10:27:12.566446 starting: adding observations from tabular output file
2020-05-08 10:27:12.571128 starting: reading list freyberg6_template/sfr.csv
2020-05-08 10:27:12.575157 finished: reading list freyberg6_template/sfr.csv took: 0:00:00.004029
2020-05-08 10:27:12.575521 starting: building insfile for tabular output file sfr.csv
2020-05-08 10:27:12.598212 finished: building insfile for tabular output file sfr.csv took: 0:00:00.022691
2020-05-08 10:27:12.598365 starting: adding observation from instruction file 'freyberg6_template/sfr.csv.ins'
2020-05-08 10:27:12.612115 finished: adding observation from instruction file 'freyberg6_template/sfr.csv.ins' took: 0:00:00.013750
2020-05-08 10:27:12.612800 finished: adding observations from tabular output file took: 0:00:00.046354


Sweet as!  Now that we have some observations, let's add parameters!

Since we are all sophisticated and recognize the importance of expressing spatial and temporal uncertainty in the model inputs (and the corresponding spatial correlation in those uncertain inputs), let's use geostatistics to express uncertainty.  To do that we need to define "geostatistical structures":

In [None]:
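# NOTE: `v` was not defined in the original cell. The variogram below, including
# its range a=1000, is an assumed placeholder so that the cell runs.
v = pyemu.geostats.ExpVario(contribution=1.0, a=1000)
gr_gs = pyemu.geostats.GeoStruct(variograms=v)
wel_gs = pyemu.geostats.GeoStruct(variograms=v, name="wel")
rch_temporal_gs = pyemu.geostats.GeoStruct(variograms=pyemu.geostats.ExpVario(contribution=1.0, a=60))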
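To get a feel for what `contribution` and `a` control, here is a minimal sketch of a common exponential covariance form, `contribution * exp(-h / a)`. Whether pyemu's `ExpVario` uses exactly this form, and whether `a=60` is in days, are assumptions here; `exp_covariance` is a made-up helper, not a pyemu function.

```python
import math


def exp_covariance(h, contribution=1.0, a=60.0):
    """Exponential covariance at separation h (assumed form, see lead-in)."""
    return contribution * math.exp(-abs(h) / a)


# Covariance between two inputs at the same time, 30 units apart, and 180 units apart.
same_time = exp_covariance(0.0)
thirty_apart = exp_covariance(30.0)
far_apart = exp_covariance(180.0)
print(same_time, thirty_apart, far_apart)
```

Under this form, a larger `a` makes correlation decay more slowly with separation, while `contribution` sets the variance at zero separation.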