* The notebook is developed to code the methodology and extract relevant files that are suitable as input to PEST.
- pestpp/pestpp/benchmarks/mf6_freyberg - This is used as the standard benchmark folder from which the files are referred.
- Currently, the freyberg6_run_glm.pst which solves the Levenberg - Marquardt algorithm is considered.

**ind_pargrp_1.ipynb**
- Goal: To test the parameters influence individually with unique DERINC values, considering 0.0 m temperature at the surface as Top B.C. First for only 1 simulation.
- Description: Assigning individual parameter group names for the variables and simulate with individual  DERINCs. Here assigned a DERINC = 0.1
- Comments: Decide on a good DERINC value for the parameters (0.1 for all)

- Refer the file - _C:\Users\radhakrishna\OneDrive\Documents\Hannover_PhD\Work\Senstivity_analysis\PESTCalibration_model.xlsx\Simulation_strategy_

**The following steps have been considered for preparing the individual components of the control file:**

1. Parameter groups - external
2. Parameter names - external
3. Observation data
4. Template files (Edited manually)
5. Instruction files


In [6]:
import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)

  and should_run_async(code)


In [7]:
%matplotlib inline
import sys,os
import colors
import numpy as np
import matplotlib.cm
from matplotlib import pyplot as plt
import matplotlib.gridspec as gridspec
import h5py
import pandas as pd
from datetime import datetime
import matplotlib.image as mpimg

In [8]:
import shutil
import numpy as np
import pandas as pd
import pyemu
import flopy

### 1. Parameter groups external
- Extracting the data from the example

In [10]:
pargrp_data_example = pd.read_csv('../Freyberg_example/freyberg6_run_glm.pargrp_data.csv') 
pargrp_data_example

Unnamed: 0,pargpnme,inctyp,derinc,derinclb,forcen,derincmul,dermthd,splitthresh,splitreldiff,splitaction
0,sto_ss_0,relative,0.01,0.0,switch,2.0,parabolic,1e-05,0.5,smaller
1,npf_k33_1,relative,0.01,0.0,switch,2.0,parabolic,1e-05,0.5,smaller
2,npf_k_1,relative,0.01,0.0,switch,2.0,parabolic,1e-05,0.5,smaller
3,sto_ss_1,relative,0.01,0.0,switch,2.0,parabolic,1e-05,0.5,smaller
4,npf_k_0,relative,0.01,0.0,switch,2.0,parabolic,1e-05,0.5,smaller
5,npf_k33_0,relative,0.01,0.0,switch,2.0,parabolic,1e-05,0.5,smaller
6,npf_k_2,relative,0.01,0.0,switch,2.0,parabolic,1e-05,0.5,smaller
7,sto_sy_0,relative,0.01,0.0,switch,2.0,parabolic,1e-05,0.5,smaller
8,sto_ss_2,relative,0.01,0.0,switch,2.0,parabolic,1e-05,0.5,smaller
9,npf_k33_2,relative,0.01,0.0,switch,2.0,parabolic,1e-05,0.5,smaller


In [11]:
pargrp_data_example.dtypes


pargpnme         object
inctyp           object
derinc          float64
derinclb        float64
forcen           object
derincmul       float64
dermthd          object
splitthresh     float64
splitreldiff    float64
splitaction      object
dtype: object

#### Defining the variables in Parameter Group variables: 
- PARGPNME : Previously we had two parameter groups - 'thermal' , 'hydraulic'. To provide DERINC values for each parameter, we will be having individual parameter groups. 'n_m_gp', 'tcs_m_gp', 'af_m_gp', 'af_p_gp'   
- INCTYP : 'relative' - The increment used for forward-difference calculation of derivatives with respect to any parameter belonging to the group is calculated as the fraction of the current value of that parameter; that fraction is provided as the real variable DERINC. Ex: If current value of parameter = 10 & DERINC =  0.01. Then the next parameter value = 0.01 * 10 = 0.1.
- DERINC : '0.1' {n_m_gp}, '0.1' {tcs_m_gp}, '0.1' {af_m_gp}, '0.1' {af_p_gp} - The fraction of increment of the current value of parameter. [Consider the range of variation of the parameters, upper and lower bounds]. Currently, I'll be assigning the same DERINC value for all parameters and then later on switch to assigning individual increments.
- DERINCLB : '0.0'
- FORCEN : 'switch'. In the first iteration, forward difference method is employed. From the second iteration, it switches to central difference method for the remainder of the inversion process on the iteration after which the relative objective function reduction between successive iterations is less than PHIREDSWH. Note we need to define PHIREDSWH (Where is it defined in this format?)
- DERINCMUL : '1.0' - If three-point derivatives calculation is employed, the value of DERINC is multiplied by DERINCMUL.
- DERMTHD : 'parabolic' - This is preferred as it provides greater accuracy.
- [SPLITTHRESH] [SPLITRELDIFF] [SPLITACTION] - For the first analysis, we ignore the three variables.

In [12]:
pargrp = pargrp_data_example.copy()
# Dropping all rows to replace with the two newly defined rows
pargrp.drop(pargrp.index, inplace=True)
# Dropping the columns - splitthresh, splitreldiff, splitaction
pargrp.drop(columns=['splitthresh', 'splitreldiff', 'splitaction'], inplace=True)

# Adding row information:
pargrp.loc[0] = ['n_m_gp','relative',0.1,0.0,'switch',1.0,'parabolic']
pargrp.loc[1] = ['tcs_m_gp','relative',0.1,0.0,'switch',1.0,'parabolic']
pargrp.loc[2] = ['af_m_gp','relative',0.1,0.0,'switch',1.0,'parabolic']
pargrp.loc[3] = ['af_p_gp','relative',0.1,0.0,'switch',1.0,'parabolic']


pargrp

Unnamed: 0,pargpnme,inctyp,derinc,derinclb,forcen,derincmul,dermthd
0,n_m_gp,relative,0.1,0.0,switch,1.0,parabolic
1,tcs_m_gp,relative,0.1,0.0,switch,1.0,parabolic
2,af_m_gp,relative,0.1,0.0,switch,1.0,parabolic
3,af_p_gp,relative,0.1,0.0,switch,1.0,parabolic


In [13]:
pargrp.dtypes

pargpnme      object
inctyp        object
derinc       float64
derinclb     float64
forcen        object
derincmul    float64
dermthd       object
dtype: object

In [14]:
# Exporting the parameter group csv file
pargrp.to_csv('rk_model_glm.pargrp_data.csv', index=False)

### 2. Parameter names external
There are four parameters that we are considering for the first analysis: n_m, tcs_m, af_m, af_p

In [15]:
par_data_example = pd.read_csv('../Freyberg_example/freyberg6_run_glm.par_data.csv') 
par_data_example

Unnamed: 0,parnme,partrans,parchglim,parval1,parlbnd,parubnd,pargp,scale,offset,dercom,partied
0,npf_k33_0_000_000,log,factor,0.300000,0.030000,3.000000,npf_k33_0,1.0,0.0,1,
1,npf_k33_0_000_001,tied,factor,0.300000,0.030000,3.000000,npf_k33_0,1.0,0.0,1,npf_k33_0_000_000
2,npf_k33_0_000_002,tied,factor,0.300000,0.030000,3.000000,npf_k33_0,1.0,0.0,1,npf_k33_0_000_000
3,npf_k33_0_000_003,tied,factor,0.300000,0.030000,3.000000,npf_k33_0,1.0,0.0,1,npf_k33_0_000_000
4,npf_k33_0_000_004,tied,factor,0.300000,0.030000,3.000000,npf_k33_0,1.0,0.0,1,npf_k33_0_000_000
...,...,...,...,...,...,...,...,...,...,...,...
8170,welflx_2_9_16_5,tied,factor,202.364469,56.212353,393.486468,welflux_5,-1.0,0.0,1,welflx_2_11_13_5
8171,welflx_2_9_16_6,tied,factor,243.645615,67.679338,473.755363,welflux_6,-1.0,0.0,1,welflx_2_11_13_6
8172,welflx_2_9_16_7,tied,factor,262.442880,72.900800,510.305600,welflux_7,-1.0,0.0,1,welflx_2_11_13_7
8173,welflx_2_9_16_8,tied,factor,259.844454,72.179015,505.253105,welflux_8,-1.0,0.0,1,welflx_2_11_13_8


The following details need to be defined:
- PARNME: 'n_m', 'tcs_m', 'af_m', 'af_p' - It is the parameter name
- PARTRANS: 'none' {'log' - Can we considered later if the inversion process does not occur. Log-transformations helps in ensuring that the parameter changes and model output changes are more linear.}
- PARCHGLIM: 'factor' - Alteration to a parameter's value is factor-limited. { PARCHGLIM must be provided with a value of “relative” or “factor”. The former designates that alterations to a parameter’s value are factor-limited whereas the latter designates that alterations to its value are relative-limited. }
- PARVALI: 1.2, 1, 0.05, 0.005 - These are the starting values for the parameters
- PARLBND: 1.05, 0.8, 0.02, 0.002 - Lower bounds for the parameters.
- PARUBND: 3, 2.5, 0.1, 0.01 - Upper bounds for the parameters.
- PARGRP: 'n_m_gp','tcs_m_gp','af_m_gp', 'af_p_gp' - It is the parameter group names associated with the parameters. 
- SCALE: 1.0, 1.0, 1.0, 1.0 - No scale or offset is provided
- OFFSET: 0.0, 0.0, 0.0, 0.0 - No scale or offset is provided
- DERCOM: 1, 1, 1, 1 - Only model command exists. Hence we give 1 which represents 'ats'
- Partied: This column will be dropped since we have no tied elements

In [16]:
par_data = par_data_example.copy()
# Dropping all rows to replace with the two newly defined rows
par_data.drop(par_data.index, inplace=True)
# Dropping the columns - splitthresh, splitreldiff, splitaction
par_data.drop(columns=['partied'], inplace=True)

# Adding row information:
par_data.loc[0] = ['n_m','none','factor', 1.2, 1.05, 3, 'n_m_gp', 1.0, 0.0, 1]
par_data.loc[1] = ['tcs_m','none','factor', 1.0, 0.8, 2.5, 'tcs_m_gp', 1.0, 0.0, 1]
par_data.loc[2] = ['af_m','none','factor', 0.05, 0.02, 0.1, 'af_m_gp', 1.0, 0.0, 1]
par_data.loc[3] = ['af_p','none','factor', 0.005, 0.002, 0.01, 'af_p_gp', 1.0, 0.0, 1]


par_data

Unnamed: 0,parnme,partrans,parchglim,parval1,parlbnd,parubnd,pargp,scale,offset,dercom
0,n_m,none,factor,1.2,1.05,3.0,n_m_gp,1.0,0.0,1
1,tcs_m,none,factor,1.0,0.8,2.5,tcs_m_gp,1.0,0.0,1
2,af_m,none,factor,0.05,0.02,0.1,af_m_gp,1.0,0.0,1
3,af_p,none,factor,0.005,0.002,0.01,af_p_gp,1.0,0.0,1


In [17]:
# Exporting the parameter group csv file
par_data.to_csv('rk_model_glm.par_data.csv', index=False)

### 3. Observation data

In [9]:
filename_measurements = 'AWS_Yakou_ITP_Data_2015-20.xlsx'

### Processing data

In [5]:
df_AWS = pd.read_excel(f'{filename_measurements}',index_col=0,parse_dates=True)

  and should_run_async(code)


In [6]:
# Dropping an unnecessary column:
df_AWS.drop(['Unnamed: 34'], axis=1,inplace=True)

  and should_run_async(code)


In [7]:
df_AWS_details = pd.read_excel('Available_data.xlsx', sheet_name='AWS_Yakou_ITP_Data_2019_extract',index_col=0,parse_dates=True)  
df_AWS_details.head()

  and should_run_async(code)


Unnamed: 0_level_0,Parameter,Product Name & Direction,Variable name,Depth/Height [m],Time steps [mins],Time interval,Units,Comments
Sl.No.,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
1,Wind speed at 10 m (m/s),"010C/020C; 10 m, north",WS_10m,10.0,10,01.01.2019 00:00 - 31.12.2019 23:50,m/s,
2,Wind direction at 10 m (°),"010C/020C; 10 m, north",WD_10m,10.0,10,01.01.2019 00:00 - 31.12.2019 23:50,°,Why is the data all red?
3,Air temperature at 5 m (°C),"HMP45C; 5 m, north",Ta_5m,5.0,10,01.01.2019 00:00 - 31.12.2019 23:50,°C,
4,Relative humidity at 5 m (%),"HMP45C; 5 m, north",RH_5m,5.0,10,01.01.2019 00:00 - 31.12.2019 23:50,%,Is it relative? Says RH - Hence should be relative humidity
5,Precipitation at 10 m (mm),rain gauge (TE525M; 10 m),Rain,10.0,10,01.01.2019 00:00 - 31.12.2019 23:50,mm,Precipitation measurement at 10 m? Seems odd


In [8]:
### Renaming the columns - 
data_AWS = df_AWS.copy()
data_AWS.columns = df_AWS_details['Parameter'].values
data_AWS.columns

  and should_run_async(code)


Index(['Wind speed at 10 m (m/s)', 'Wind direction at 10 m (°)',
       'Air temperature at 5 m (°C)', 'Relative humidity at 5 m (%)',
       'Precipitation at 10 m (mm)', 'Air pressure (hPa)',
       'Infrared temperature (°C)', 'Infrared temperature (°C)', 'PAR_down',
       'PAR_up', 'Incoming shortwave radiation (W/m^2)',
       'Outgoing shortwave radiation (W/m^2)',
       'Incoming longwave radiation (W/m^2)',
       'Outgoing longwave radiation (W/m^2)', 'Net radiation (W/m^2)',
       'Soil heat flux at - 0.06 m (W/m^2) - 1',
       'Soil heat flux at - 0.06 m (W/m^2) - 2',
       'Soil heat flux at - 0.06 m (W/m^2) - 3',
       'Soil moisture at - 0.04 m (%)', 'Soil moisture at - 0.1 m (%)',
       'Soil moisture at - 0.2 m (%)', 'Soil moisture at - 0.4 m (%)',
       'Soil moisture at - 0.8 m (%)', 'Soil moisture at - 1.2 m (%)',
       'Soil moisture at - 1.6 m (%)', 'Soil temperature at - 0.00 m (°C)',
       'Soil temperature at - 0.04 m (°C)', 'Soil temperature at - 0.1 

### The data has -6999 values for missing data
- Therefore replacing -6999 with NaN vaues

In [9]:
data_AWS_syn = data_AWS.copy()
data_AWS_syn = data_AWS_syn.replace(-6999,None)

  and should_run_async(code)


In [10]:
data_AWS_prep = data_AWS_syn.copy()

  and should_run_async(code)


In [11]:
len(data_AWS_syn.index[np.where(np.isnan(data_AWS_syn))[0]])

  and should_run_async(code)


40078

In [12]:
len(data_AWS_syn['Wind speed at 10 m (m/s)']), len(data_AWS_syn.columns), len(data_AWS_syn['Wind speed at 10 m (m/s)'])*len(data_AWS_syn.columns)

  and should_run_async(code)


(278132, 33, 9178356)

### Comments:
- Few NaN values are present (278132/9178356 = 3.03% ). Since the data is available every 10 mins, we have considered to resample the data. 
- Find the number of NaN values in each column to identify where they are occuring and then decide weather to consider the data or not? - Try later

### Resampling the data to daily mean values 
* To observe the data on a seasonal scale

In [13]:
data_AWS_syn = data_AWS_syn.resample('D').mean()

  and should_run_async(code)


#### Precipitation - Daily summation of the values are determined instead of mean

In [14]:
data_AWS_prep_dailysum = data_AWS_prep['Precipitation at 10 m (mm)'].resample('D').sum()

  and should_run_async(code)


### Reducing the data to the year 2017

In [15]:
data_AWS_syn_2017 = data_AWS_syn.loc['2017']

# Changing the units of temperature to Kelvin

#data_AWS_syn_2017['Soil temperature at - 0.01 m (K)'] = data_AWS_syn_2017['Soil temperature at - 0.00 m (°C)'] + 273.15
#data_AWS_syn_2017['Soil temperature at - 0.04 m (K)'] = data_AWS_syn_2017['Soil temperature at - 0.04 m (°C)'] + 273.15
#data_AWS_syn_2017['Soil temperature at - 0.1 m (K)'] = data_AWS_syn_2017['Soil temperature at - 0.1 m (°C)'] + 273.15
#data_AWS_syn_2017['Soil temperature at - 0.2 m (K)'] = data_AWS_syn_2017['Soil temperature at - 0.2 m (°C)'] + 273.15
#data_AWS_syn_2017['Soil temperature at - 0.4 m (K)'] = data_AWS_syn_2017['Soil temperature at - 0.4 m (°C)'] + 273.15
#data_AWS_syn_2017['Soil temperature at - 0.8 m (K)'] = data_AWS_syn_2017['Soil temperature at - 0.8 m (°C)'] + 273.15
#data_AWS_syn_2017['Soil temperature at - 1.2 m (K)'] = data_AWS_syn_2017['Soil temperature at - 1.2 m (°C)'] + 273.15
#data_AWS_syn_2017['Soil temperature at - 1.6 m (K)'] = data_AWS_syn_2017['Soil temperature at - 1.6 m (°C)'] + 273.15

depths = [0.04, 0.1, 0.2, 0.4, 0.8, 1.2, 1.6]

for i, depth in enumerate(depths):
    data_AWS_syn_2017[f'Soil temperature at - {depth} m (K)'] = data_AWS_syn_2017[f'Soil temperature at - {depth} m (°C)'] + 273.15
    

  and should_run_async(code)


How do you convert the data to be PEST suitable?

In [16]:
# Need to change VWC to saturation of liquid [Which is the output]
# Coverting saturation to VWC [Liquid]: Saturation = VWC / (porosity * 100)

porosity_peat = 0.5 # Peat layer until 0.385
porosity_mineral = 0.3 # Mineral layer until 2.24 m
depths = [0.04, 0.1, 0.2, 0.4, 0.8, 1.2, 1.6]

for i, depth in enumerate(depths):
    if depth < 0.385:
        data_AWS_syn_2017[f'point - {depth} m saturation liquid'] = data_AWS_syn_2017[f'Soil moisture at - {depth} m (%)']/(porosity_peat*100)
    else:
        data_AWS_syn_2017[f'point - {depth} m saturation liquid'] = data_AWS_syn_2017[f'Soil moisture at - {depth} m (%)']/(porosity_mineral*100)

  and should_run_async(code)


In [25]:
obs_data_example = pd.read_csv('../Freyberg_example/freyberg6_run_glm.obs_data.csv') 
obs_data_example

  and should_run_async(code)


Unnamed: 0,obsnme,obsval,weight,obgnme
0,gage_1_20151231,951.710,0.000000,gage
1,gage_1_20160131,1530.100,0.004357,gage
2,gage_1_20160229,1855.300,0.003593,gage
3,gage_1_20160331,1907.100,0.003496,gage
4,gage_1_20160430,1747.700,0.003815,gage
...,...,...,...,...
1020,trgw_2_9_1_20170831,34.864,0.000000,trgw_2_9_1
1021,trgw_2_9_1_20170930,34.780,0.000000,trgw_2_9_1
1022,trgw_2_9_1_20171031,34.771,0.000000,trgw_2_9_1
1023,trgw_2_9_1_20171130,34.875,0.000000,trgw_2_9_1


The following details need to be defined:

Note: Here we need to add the values from the excel that we read previously.

- OBSNME: stemp_{depth}_{1-365} & smois_{depth}_{1-365}: Observation names - We have temperature and moisture values at 7 depths. 
- OBSVAL: The corresponding values need to be added in pandas from the dataframe - data_AWS_syn_2017
- WEIGHT: Assigning equal weights to all variables. 1/5110 = 1.9569 10e-4; 2 {soil_temp, soil_mois} * 7 {7 sensors} * 365 {1-365}  = 5110 
- obgnme: 'temp', 'mois'


In [26]:
len(np.arange(0,5110,1))

  and should_run_async(code)


5110

In [27]:
depth = 0.1
times = 1
data_AWS_syn_2017[f'Soil moisture at - 0.1 m (%)'][times - 1]

  and should_run_async(code)


8.352500000000006

In [28]:
data_AWS_syn_2017.columns

  and should_run_async(code)


Index(['Wind speed at 10 m (m/s)', 'Wind direction at 10 m (°)',
       'Air temperature at 5 m (°C)', 'Relative humidity at 5 m (%)',
       'Precipitation at 10 m (mm)', 'Air pressure (hPa)',
       'Infrared temperature (°C)', 'Infrared temperature (°C)', 'PAR_down',
       'PAR_up', 'Incoming shortwave radiation (W/m^2)',
       'Outgoing shortwave radiation (W/m^2)',
       'Incoming longwave radiation (W/m^2)',
       'Outgoing longwave radiation (W/m^2)', 'Net radiation (W/m^2)',
       'Soil heat flux at - 0.06 m (W/m^2) - 1',
       'Soil heat flux at - 0.06 m (W/m^2) - 2',
       'Soil heat flux at - 0.06 m (W/m^2) - 3',
       'Soil moisture at - 0.04 m (%)', 'Soil moisture at - 0.1 m (%)',
       'Soil moisture at - 0.2 m (%)', 'Soil moisture at - 0.4 m (%)',
       'Soil moisture at - 0.8 m (%)', 'Soil moisture at - 1.2 m (%)',
       'Soil moisture at - 1.6 m (%)', 'Soil temperature at - 0.00 m (°C)',
       'Soil temperature at - 0.04 m (°C)', 'Soil temperature at - 0.1 

In [29]:
# Creating a new dataframes with columns = obs_data_example.index and rows = 0 - 5110
# There are 365 values for one sensor in each year, hence we move from 0 - 364 & then it repeats again for the next sensor.

obs_data = pd.DataFrame(data=None,columns=obs_data_example.columns,index=np.arange(1,5111,1))


depths = [0.04, 0.1, 0.2, 0.4, 0.8, 1.2, 1.6]

times = np.arange(1, 366, 1)

# Adding column information:


x = 0
for i, depth in enumerate(depths):
    for j, time in enumerate(times):
        # Column name = obsnme
        obs_data.iloc[x,0] = f'stemp_{depth}_{j}'
        # Column name = obsval
        obs_data.iloc[x,1] = data_AWS_syn_2017[f'Soil temperature at - {depth} m (K)'][j]
        # Column name = weights : Assigning equal weight to all variables
        obs_data.iloc[x,2] = 1/5110
        # Column name = obgnme : Assigning observation group name
        obs_data.iloc[x,3] = 'temp'
        x = x + 1

for i, depth in enumerate(depths):
    for j, time in enumerate(times):
        obs_data.iloc[x,0] = f'smois_{depth}_{j}'
        # Column name = obsval
        obs_data.iloc[x,1] = data_AWS_syn_2017[f'point - {depth} m saturation liquid'][j]
        # Column name = weights : Assigning equal weight to all variables
        obs_data.iloc[x,2] = 1/5110
        # Column name = obgnme : Assigning observation group name
        obs_data.iloc[x,3] = 'mois'
        x = x + 1

        
obs_data

  and should_run_async(code)


Unnamed: 0,obsnme,obsval,weight,obgnme
1,stemp_0.04_0,261.023264,0.000196,temp
2,stemp_0.04_1,261.006667,0.000196,temp
3,stemp_0.04_2,260.854861,0.000196,temp
4,stemp_0.04_3,261.032708,0.000196,temp
5,stemp_0.04_4,261.347014,0.000196,temp
...,...,...,...,...
5106,smois_1.6_360,0.168612,0.000196,mois
5107,smois_1.6_361,0.167619,0.000196,mois
5108,smois_1.6_362,0.166811,0.000196,mois
5109,smois_1.6_363,0.16606,0.000196,mois


In [30]:
# Test - observation data in PEST 
obs_data.iloc[1000]

  and should_run_async(code)


obsnme    stemp_0.2_270
obsval       274.658646
weight         0.000196
obgnme             temp
Name: 1001, dtype: object

In [31]:
# Test - actual observation data
data_AWS_syn_2017[f'Soil temperature at - 0.2 m (K)'][270]

  and should_run_async(code)


274.6586458333333

In [32]:
# Exporting the parameter group csv file
obs_data.to_csv('rk_model_glm.obs_data.csv', index=False)

  and should_run_async(code)


### 4. Template files

- It could be created manually by editing the files.

### 5. Instruction files


In [115]:
# Simulation data - I have manually edited this file to remove all initial lines
sim_data = pd.read_csv('rk_model_glm_obs_dat.dat',sep=' ') 

sim_data

Unnamed: 0,time [s],point -0.04 temperature [K],point -0.1 temperature [K],point -0.2 temperature [K],point -0.4 temperature [K],point -0.8 temperature [K],point -1.2 temperature [K],point -1.6 temperature [K],point -0.04 saturation liquid,point -0.1 saturation liquid,point -0.2 saturation liquid,point -0.4 saturation liquid,point -0.8 saturation liquid,point -1.2 saturation liquid,point -1.6 saturation liquid
0,0.0,270.150000,270.150000,270.150000,270.150000,270.150000,270.150000,270.150000,0.112949,0.121899,0.121899,0.121899,0.121899,0.121899,0.121899
1,86400.0,261.652266,263.295593,265.624812,268.197106,269.781201,270.089641,270.145229,0.107571,0.106694,0.109828,0.115042,0.120251,0.121613,0.121876
2,172800.0,261.459825,262.569818,264.313520,266.733188,269.029989,269.836751,270.100691,0.107521,0.105897,0.107941,0.111763,0.117480,0.120485,0.121665
3,259200.0,262.665377,263.256887,264.328393,266.131844,268.398335,269.512045,270.014415,0.107855,0.106650,0.107960,0.110668,0.115584,0.119180,0.121266
4,345600.0,263.255535,263.698865,264.495959,265.905777,267.949258,269.188441,269.894629,0.108038,0.107169,0.108182,0.110285,0.114411,0.118011,0.120734
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
361,31190400.0,262.914728,262.945770,263.023392,263.267113,263.960163,264.822732,266.067327,0.107931,0.106301,0.106387,0.106662,0.107489,0.108631,0.110557
362,31276800.0,256.619808,258.161280,259.974858,261.901732,263.649212,264.746456,266.019845,0.106554,0.102154,0.103508,0.105219,0.107109,0.108524,0.110476
363,31363200.0,256.228922,257.258541,258.762280,260.835472,263.149134,264.552488,265.944987,0.106493,0.101552,0.102579,0.104232,0.106528,0.108258,0.110350
364,31449600.0,257.736143,258.199659,259.048259,260.566269,262.772838,264.323388,265.842164,0.106738,0.102180,0.102790,0.103999,0.106113,0.107953,0.110179


In [116]:
# Creating an instruction file suitable for the analysis
ins_data = pd.DataFrame(columns=sim_data.columns, index=sim_data.index)

# Adding the simulated variables [temperature & Moisture] in the instruction file
# Temperature
for i, depth in enumerate(depths):
    for j, time in enumerate(times):
        # Column name = obsnme
        # (i + 1) - Signifies the start from the 2nd column
        ins_data.iloc[j+1, i+1] = f' !stemp_{depth}_{j}! ' 

# Moisture
for i, depth in enumerate(depths):
    for j, time in enumerate(times):
        # Column name = obsnme
        # (i + 1) - Signifies the start from the 2nd column
        ins_data.iloc[j+1, i+8] = f' !smois_{depth}_{j}! ' 



ins_data.head()

Unnamed: 0,time [s],point -0.04 temperature [K],point -0.1 temperature [K],point -0.2 temperature [K],point -0.4 temperature [K],point -0.8 temperature [K],point -1.2 temperature [K],point -1.6 temperature [K],point -0.04 saturation liquid,point -0.1 saturation liquid,point -0.2 saturation liquid,point -0.4 saturation liquid,point -0.8 saturation liquid,point -1.2 saturation liquid,point -1.6 saturation liquid
0,,,,,,,,,,,,,,,
1,,!stemp_0.04_0!,!stemp_0.1_0!,!stemp_0.2_0!,!stemp_0.4_0!,!stemp_0.8_0!,!stemp_1.2_0!,!stemp_1.6_0!,!smois_0.04_0!,!smois_0.1_0!,!smois_0.2_0!,!smois_0.4_0!,!smois_0.8_0!,!smois_1.2_0!,!smois_1.6_0!
2,,!stemp_0.04_1!,!stemp_0.1_1!,!stemp_0.2_1!,!stemp_0.4_1!,!stemp_0.8_1!,!stemp_1.2_1!,!stemp_1.6_1!,!smois_0.04_1!,!smois_0.1_1!,!smois_0.2_1!,!smois_0.4_1!,!smois_0.8_1!,!smois_1.2_1!,!smois_1.6_1!
3,,!stemp_0.04_2!,!stemp_0.1_2!,!stemp_0.2_2!,!stemp_0.4_2!,!stemp_0.8_2!,!stemp_1.2_2!,!stemp_1.6_2!,!smois_0.04_2!,!smois_0.1_2!,!smois_0.2_2!,!smois_0.4_2!,!smois_0.8_2!,!smois_1.2_2!,!smois_1.6_2!
4,,!stemp_0.04_3!,!stemp_0.1_3!,!stemp_0.2_3!,!stemp_0.4_3!,!stemp_0.8_3!,!stemp_1.2_3!,!stemp_1.6_3!,!smois_0.04_3!,!smois_0.1_3!,!smois_0.2_3!,!smois_0.4_3!,!smois_0.8_3!,!smois_1.2_3!,!smois_1.6_3!


In [117]:
# Removing the last character from the last column '~' to mimic the file sfr.csv.in
ins_data["point -1.6 saturation liquid"] = ins_data["point -1.6 saturation liquid"].str[:-1]
ins_data["point -1.6 saturation liquid"].head()

0               NaN
1     !smois_1.6_0!
2     !smois_1.6_1!
3     !smois_1.6_2!
4     !smois_1.6_3!
Name: point -1.6 saturation liquid, dtype: object

In [118]:
# Replacing the time vaiable with ~dum
ins_data['time [s]'] = f' !dum! '

# Dropping unnecessary row and column
#ins_data.drop(['time [s]'], axis=1, inplace=True)
ins_data.drop([0], axis=0, inplace=True)
ins_data.head()


Unnamed: 0,time [s],point -0.04 temperature [K],point -0.1 temperature [K],point -0.2 temperature [K],point -0.4 temperature [K],point -0.8 temperature [K],point -1.2 temperature [K],point -1.6 temperature [K],point -0.04 saturation liquid,point -0.1 saturation liquid,point -0.2 saturation liquid,point -0.4 saturation liquid,point -0.8 saturation liquid,point -1.2 saturation liquid,point -1.6 saturation liquid
1,!dum!,!stemp_0.04_0!,!stemp_0.1_0!,!stemp_0.2_0!,!stemp_0.4_0!,!stemp_0.8_0!,!stemp_1.2_0!,!stemp_1.6_0!,!smois_0.04_0!,!smois_0.1_0!,!smois_0.2_0!,!smois_0.4_0!,!smois_0.8_0!,!smois_1.2_0!,!smois_1.6_0!
2,!dum!,!stemp_0.04_1!,!stemp_0.1_1!,!stemp_0.2_1!,!stemp_0.4_1!,!stemp_0.8_1!,!stemp_1.2_1!,!stemp_1.6_1!,!smois_0.04_1!,!smois_0.1_1!,!smois_0.2_1!,!smois_0.4_1!,!smois_0.8_1!,!smois_1.2_1!,!smois_1.6_1!
3,!dum!,!stemp_0.04_2!,!stemp_0.1_2!,!stemp_0.2_2!,!stemp_0.4_2!,!stemp_0.8_2!,!stemp_1.2_2!,!stemp_1.6_2!,!smois_0.04_2!,!smois_0.1_2!,!smois_0.2_2!,!smois_0.4_2!,!smois_0.8_2!,!smois_1.2_2!,!smois_1.6_2!
4,!dum!,!stemp_0.04_3!,!stemp_0.1_3!,!stemp_0.2_3!,!stemp_0.4_3!,!stemp_0.8_3!,!stemp_1.2_3!,!stemp_1.6_3!,!smois_0.04_3!,!smois_0.1_3!,!smois_0.2_3!,!smois_0.4_3!,!smois_0.8_3!,!smois_1.2_3!,!smois_1.6_3!
5,!dum!,!stemp_0.04_4!,!stemp_0.1_4!,!stemp_0.2_4!,!stemp_0.4_4!,!stemp_0.8_4!,!stemp_1.2_4!,!stemp_1.6_4!,!smois_0.04_4!,!smois_0.1_4!,!smois_0.2_4!,!smois_0.4_4!,!smois_0.8_4!,!smois_1.2_4!,!smois_1.6_4!


In [119]:
#ins_data['l1'] = 'l1 ~'
# Adding an extra column to mimic sfr.csv.ins
ins_data.insert(0, 'l1', 'l1 ')

In [122]:
# Exporting the instruction file
ins_data.to_csv('rk_model_glm_obs_data.dat.ins', header=False, index=False,sep='\t')

##### Dont forget to add 'pif ~' string manually 

##### Refer later!

### Employing another methodology to generate the input files
- https://github.com/pypest/pyemu_pestpp_workflow/blob/master/setup_pestpp_interface.ipynb

### View the existing model

In [63]:
org_model_ws = os.path.join('RkModel_template')
os.listdir(org_model_ws)

['rk_model_glm_obs_dat.dat',
 'ats_vis_mesh.h5.0.xmf',
 'ats_vis_data.h5.400.xmf',
 'ats_vis_data.h5.1200.xmf',
 'ats_vis_data.h5.1100.xmf',
 'checkpoint01213.h5',
 'ats_vis_data.h5.700.xmf',
 'ats_vis_data.h5',
 'ats_vis_data.h5.600.xmf',
 'ats_vis_data.VisIt.xmf',
 'ats_vis_data.h5.500.xmf',
 'ats_vis_data.h5.800.xmf',
 'ats_vis_mesh.h5',
 'ats_vis_data.h5.200.xmf',
 'ats_vis_data.h5.900.xmf',
 'checkpoint_final.h5',
 'rk_model_glm_input.xml',
 'ats_vis_data.h5.1000.xmf',
 'ats_vis_data.h5.0.xmf',
 'ats_vis_mesh.VisIt.xmf',
 'ats_vis_data.h5.300.xmf',
 'ats_vis_data.h5.100.xmf',
 'observations.dat',
 '.ipynb_checkpoints']

### Run the original model

In [64]:
# Change the model command to a py file if required! - Think of a way to delete all the previous files
pyemu.os_utils.run("ats --xml_file=rk_model_glm_input.xml", cwd=org_model_ws)

### Copy to preserve original model

In [65]:
tmp_model_ws = "temp_pst_from"
if os.path.exists(tmp_model_ws):
    shutil.rmtree(tmp_model_ws)
shutil.copytree(org_model_ws,tmp_model_ws)
os.listdir(tmp_model_ws)

['rk_model_glm_obs_dat.dat',
 'ats_vis_mesh.h5.0.xmf',
 'ats_vis_data.h5.400.xmf',
 'ats_vis_data.h5.1200.xmf',
 'ats_vis_data.h5.1100.xmf',
 'checkpoint01213.h5',
 'ats_vis_data.h5.700.xmf',
 'ats_vis_data.h5',
 'ats_vis_data.h5.600.xmf',
 'ats_vis_data.VisIt.xmf',
 'ats_vis_data.h5.500.xmf',
 'ats_vis_data.h5.800.xmf',
 'ats_vis_mesh.h5',
 'ats_vis_data.h5.200.xmf',
 'ats_vis_data.h5.900.xmf',
 'checkpoint_final.h5',
 'rk_model_glm_input.xml',
 'ats_vis_data.h5.1000.xmf',
 'ats_vis_data.h5.0.xmf',
 'ats_vis_mesh.VisIt.xmf',
 'ats_vis_data.h5.300.xmf',
 'ats_vis_data.h5.100.xmf',
 'observations.dat',
 '.ipynb_checkpoints']

### Construct a PEST interface

#### Start PstFrom() build

In [66]:
template_ws = "RkModel_PESTtemplate"
pf = pyemu.utils.PstFrom(
    original_d=tmp_model_ws,  # where to find reference model
    new_d=template_ws,  # where to build PEST
    remove_existing=True,  # Stomp in new_d, if it exists
    longnames=True,  # use PESTPP long paramter and observation names (handy storing metadata)
    spatial_reference=None,  # model spatial reference info
    zero_based=False,  # model uses zero-based references - Check if it is True or False
    start_datetime="1-1-2017"
)  # model start time reference

2021-11-02 10:54:02.080711 starting: opening PstFrom.log for logging
2021-11-02 10:54:02.082215 starting PstFrom process
2021-11-02 10:54:02.082813 starting: setting up dirs
2021-11-02 10:54:02.083111 starting: removing existing new_d 'RkModel_PESTtemplate'
2021-11-02 10:54:02.086573 finished: removing existing new_d 'RkModel_PESTtemplate' took: 0:00:00.003462
2021-11-02 10:54:02.086933 starting: copying original_d 'temp_pst_from' to new_d 'RkModel_PESTtemplate'
2021-11-02 10:54:02.090764 finished: copying original_d 'temp_pst_from' to new_d 'RkModel_PESTtemplate' took: 0:00:00.003831
2021-11-02 10:54:02.091583 finished: setting up dirs took: 0:00:00.008770


#### Add observations

So now that we have a `PstFrom` instance, but its just an empty container at this point, so we need to add some PEST interface "observations" and "parameters".  
#### Temperature and moisture observations


In [67]:
df = pd.read_csv(os.path.join(tmp_model_ws,"rk_model_glm_obs_dat.dat"),index_col=0, sep=' ')
display(df)

Unnamed: 0_level_0,point -0.04 temperature [K],point -0.1 temperature [K],point -0.2 temperature [K],point -0.4 temperature [K],point -0.8 temperature [K],point -1.2 temperature [K],point -1.6 temperature [K],point -0.04 saturation liquid,point -0.1 saturation liquid,point -0.2 saturation liquid,point -0.4 saturation liquid,point -0.8 saturation liquid,point -1.2 saturation liquid,point -1.6 saturation liquid
time [s],Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1
0.0,270.150000,270.150000,270.150000,270.150000,270.150000,270.150000,270.150000,0.112949,0.121899,0.121899,0.121899,0.121899,0.121899,0.121899
86400.0,261.652266,263.295593,265.624812,268.197106,269.781201,270.089641,270.145229,0.107571,0.106694,0.109828,0.115042,0.120251,0.121613,0.121876
172800.0,261.459825,262.569818,264.313520,266.733188,269.029989,269.836751,270.100691,0.107521,0.105897,0.107941,0.111763,0.117480,0.120485,0.121665
259200.0,262.665377,263.256887,264.328393,266.131844,268.398335,269.512045,270.014415,0.107855,0.106650,0.107960,0.110668,0.115584,0.119180,0.121266
345600.0,263.255535,263.698865,264.495959,265.905777,267.949258,269.188441,269.894629,0.108038,0.107169,0.108182,0.110285,0.114411,0.118011,0.120734
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
31190400.0,262.914728,262.945770,263.023392,263.267113,263.960163,264.822732,266.067327,0.107931,0.106301,0.106387,0.106662,0.107489,0.108631,0.110557
31276800.0,256.619808,258.161280,259.974858,261.901732,263.649212,264.746456,266.019845,0.106554,0.102154,0.103508,0.105219,0.107109,0.108524,0.110476
31363200.0,256.228922,257.258541,258.762280,260.835472,263.149134,264.552488,265.944987,0.106493,0.101552,0.102579,0.104232,0.106528,0.108258,0.110350
31449600.0,257.736143,258.199659,259.048259,260.566269,262.772838,264.323388,265.842164,0.106738,0.102180,0.102790,0.103999,0.106113,0.107953,0.110179


In [68]:
#df.to_csv(os.path.join(tmp_model_ws,"observations.csv"))

In [69]:
hds_df = pf.add_observations(
    "rk_model_glm_obs_dat.dat",  # model output file to use 
    insfile="rk_model_glm_obs_dat.dat.ins",  # optional, define name of PEST instruction file
    index_cols="time [s]",  # column used to index observation/outputs
    use_cols=list(df.columns.values),  # columns to setup observations for (can be multiple)
    prefix="tempmois",  # observation name prefix
)
display(hds_df)

2021-11-02 10:55:03.150971 starting: adding observations from output file rk_model_glm_obs_dat.dat
2021-11-02 10:55:03.151919 starting: adding observations from tabular output file '['rk_model_glm_obs_dat.dat']'
2021-11-02 10:55:03.152700 starting: reading list RkModel_PESTtemplate/rk_model_glm_obs_dat.dat
2021-11-02 10:55:03.158938 finished: reading list RkModel_PESTtemplate/rk_model_glm_obs_dat.dat took: 0:00:00.006238
2021-11-02 10:55:03.160098 starting: building insfile for tabular output file rk_model_glm_obs_dat.dat
2021-11-02 10:55:03.194126 finished: building insfile for tabular output file rk_model_glm_obs_dat.dat took: 0:00:00.034028
2021-11-02 10:55:03.194500 starting: adding observation from instruction file 'RkModel_PESTtemplate/rk_model_glm_obs_dat.dat.ins'
error processing instruction/output file pair: InstructionFile error processing instruction file on line number 3: unmatched observation marker '!', looking for '!' in token '!tempmois_usecol:point'
2021-11-02 10:55:03

Unnamed: 0,obsnme,obsval,weight,obgnme
tempmois_usecol:point -0.04 saturation liquid_time[s]:0.0,tempmois_usecol:point -0.04 saturation liquid_time[s]:0.0,1.000000e+10,1.0,tempmois_usecol:point -0.04 saturation liquid
tempmois_usecol:point -0.04 saturation liquid_time[s]:10022400.0,tempmois_usecol:point -0.04 saturation liquid_time[s]:10022400.0,1.000000e+10,1.0,tempmois_usecol:point -0.04 saturation liquid
tempmois_usecol:point -0.04 saturation liquid_time[s]:10108800.0,tempmois_usecol:point -0.04 saturation liquid_time[s]:10108800.0,1.000000e+10,1.0,tempmois_usecol:point -0.04 saturation liquid
tempmois_usecol:point -0.04 saturation liquid_time[s]:10195200.0,tempmois_usecol:point -0.04 saturation liquid_time[s]:10195200.0,1.000000e+10,1.0,tempmois_usecol:point -0.04 saturation liquid
tempmois_usecol:point -0.04 saturation liquid_time[s]:10281600.0,tempmois_usecol:point -0.04 saturation liquid_time[s]:10281600.0,1.000000e+10,1.0,tempmois_usecol:point -0.04 saturation liquid
...,...,...,...,...
tempmois_usecol:point -1.6 temperature [k]_time[s]:9590400.0,tempmois_usecol:point -1.6 temperature [k]_time[s]:9590400.0,1.000000e+10,1.0,tempmois_usecol:point -1.6 temperature [k]
tempmois_usecol:point -1.6 temperature [k]_time[s]:9676800.0,tempmois_usecol:point -1.6 temperature [k]_time[s]:9676800.0,1.000000e+10,1.0,tempmois_usecol:point -1.6 temperature [k]
tempmois_usecol:point -1.6 temperature [k]_time[s]:9763200.0,tempmois_usecol:point -1.6 temperature [k]_time[s]:9763200.0,1.000000e+10,1.0,tempmois_usecol:point -1.6 temperature [k]
tempmois_usecol:point -1.6 temperature [k]_time[s]:9849600.0,tempmois_usecol:point -1.6 temperature [k]_time[s]:9849600.0,1.000000e+10,1.0,tempmois_usecol:point -1.6 temperature [k]


In [70]:
[f for f in os.listdir(template_ws) if f.endswith(".ins")]

['rk_model_glm_obs_dat.dat.ins']

In [None]:
### Since the template file includes 

In [71]:
pst = pf.build_pst()

noptmax:0, npar_adj:0, nnz_obs:5124


In [72]:
[f for f in os.listdir(template_ws) if f.endswith(".py")]

['forward_run.py']

In [74]:
_ = [print(line.rstrip()) for line in open(os.path.join(template_ws,"forward_run.py"))]

import os
import multiprocessing as mp
import numpy as np
import pandas as pd
import pyemu
def main():

    try:
       os.remove(r'rk_model_glm_obs_dat.dat')
    except Exception as e:
       print(r'error removing tmp file:rk_model_glm_obs_dat.dat')
    pyemu.os_utils.run(r'ats --xml_file=rk_model_glm_input.xml')


if __name__ == '__main__':
    mp.freeze_support()
    main()



In [73]:
# only execute this block once!
pf.mod_sys_cmds.append("ats --xml_file=rk_model_glm_input.xml")
pst = pf.build_pst()

noptmax:0, npar_adj:0, nnz_obs:5124
2021-11-02 11:04:54.054578 forward_run line:pyemu.os_utils.run(r'ats --xml_file=rk_model_glm_input.xml')



In [75]:
_ = [print(line.rstrip()) for line in open(os.path.join(template_ws,"forward_run.py"))]

import os
import multiprocessing as mp
import numpy as np
import pandas as pd
import pyemu
def main():

    try:
       os.remove(r'rk_model_glm_obs_dat.dat')
    except Exception as e:
       print(r'error removing tmp file:rk_model_glm_obs_dat.dat')
    pyemu.os_utils.run(r'ats --xml_file=rk_model_glm_input.xml')


if __name__ == '__main__':
    mp.freeze_support()
    main()



In [3]:
import pyemu

  and should_run_async(code)


In [4]:
pst = pyemu.Pst("rk_model_glm_cf.pst")
pst.write("rk_model_glm_cf_v1.pst")

noptmax:0, npar_adj:4, nnz_obs:5110


  and should_run_async(code)
