This notebook derives residence-time distributions (RTDs) for the entire aquifer from an existing MODFLOW model. Any group or label in a 3D array can be read in, and an RTD can be made for each group. The approach is to:
* read an existing model
* create flux-weighted particle starting locations in every cell
* run MODPATH and read endpoints
* fit parametric distributions
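The flux-weighting step is done in a separate notebook. Purely as an illustration of the idea, the sketch below splits a fixed total number of particles among cells in proportion to each cell's flux. The function `allocate_particles`, the example flux values, and the largest-remainder rounding are assumptions for demonstration, not necessarily how the particle-generation notebook does it.

```python
import numpy as np

def allocate_particles(cell_flux, n_total):
    """Split n_total particles among cells in proportion to cell_flux.

    Hypothetical helper: uses largest-remainder rounding so the integer
    counts sum exactly to n_total.
    """
    flux = np.asarray(cell_flux, dtype=float).ravel()
    share = n_total * flux / flux.sum()   # ideal (fractional) particle counts
    counts = np.floor(share).astype(int)  # integer part for every cell
    shortfall = n_total - counts.sum()    # particles still to be assigned
    # hand the remaining particles to the cells with the largest remainders
    order = np.argsort(share - counts)[::-1]
    counts[order[:shortfall]] += 1
    return counts.reshape(np.shape(cell_flux))

# hypothetical flux into four cells
print(allocate_particles([1.0, 2.0, 3.0, 4.0], 10))  # [1 2 3 4]
```

Largest-remainder rounding is one simple way to keep the total particle count fixed; cells with very small flux may receive no particles at all.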

This notebook fits parametric distributions. Another notebook creates flux-weighted particles.
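The fitting code itself lives in the `fit_parametric_distributions` module, which is not shown here. As a hedged sketch of the general idea only, the example below fits a two-parameter gamma distribution to synthetic residence times with `scipy.stats.gamma.fit`, holding the location at zero. The synthetic ages and the choice of a gamma distribution are assumptions for demonstration.

```python
import numpy as np
import scipy.stats as ss

# synthetic residence times (years) drawn from a known gamma distribution
rng = np.random.default_rng(42)
true_shape, true_scale = 2.0, 30.0
ages = rng.gamma(true_shape, true_scale, size=5000)

# maximum-likelihood fit with the location fixed at zero (ages cannot be negative)
shape, loc, scale = ss.gamma.fit(ages, floc=0)
print('shape = {:.2f}, scale = {:.1f}'.format(shape, scale))

# compare fitted and empirical fractions no older than 65 years
fitted_yf = ss.gamma.cdf(65, shape, loc=loc, scale=scale)
empirical_yf = np.mean(ages <= 65)
print('young fraction: fitted {:.3f}, empirical {:.3f}'.format(fitted_yf, empirical_yf))
```

Fixing `floc=0` reduces the fit to shape and scale, which is usually what is wanted for ages that start at zero.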

### PFJ updates:

1. Process flux-weighted and volume-weighted particles at the same time.
2. Process all 3 FWP models at the same time.
3. Plot the results of all 6 RTDs on the same graph to generate Figure 2 of the paper.

In [None]:
__author__ = 'Jeff Starn'
%matplotlib notebook

from IPython.display import set_matplotlib_formats
set_matplotlib_formats('png', 'pdf')
from IPython.display import Image
from IPython.display import Math
from ipywidgets import interact, Dropdown
from IPython.display import display

import os
import sys
import shutil
import pickle
import numpy as np
import datetime as dt
import matplotlib.pyplot as plt
import matplotlib.ticker as mt
import matplotlib.patches as patches

import flopy as fp
import imeth
import fit_parametric_distributions
import pandas as pd
from osgeo import gdal  # the top-level 'gdal' module is deprecated in current GDAL releases
import scipy.stats as ss
import scipy.optimize as so
from scipy.interpolate import Rbf
from scipy.interpolate import griddata


# Preliminary stuff

## Set user-defined variables

MODFLOW and MODPATH use elapsed time and are not aware of calendar time. To place MODFLOW/MODPATH elapsed time on the calendar, two calendar dates are specified below: the beginning of the first stress period (`mf_start_date`) and the date when particles are released (`mp_release_date`). The release date could be used in many ways; for example, it could represent a sampling date, or it could be looped over to create a time-lapse set of ages.
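Placing elapsed time on the calendar is plain date arithmetic. A minimal sketch, assuming model time is already in days and using the same example dates as `mf_start_date_str` and `mp_release_date_str`; the elapsed times are made-up values:

```python
import datetime as dt

mf_start_date = dt.datetime.strptime('01/01/1900', '%m/%d/%Y')
mp_release_date = dt.datetime.strptime('01/01/2017', '%m/%d/%Y')

# elapsed model time (days since the start of the first stress period) -> calendar date
elapsed_days = [0.0, 365.0, 42734.0]
dates = [mf_start_date + dt.timedelta(days=d) for d in elapsed_days]
print([d.strftime('%Y-%m-%d') for d in dates])  # ['1900-01-01', '1901-01-01', '2017-01-01']

# calendar date -> elapsed model time in days
release_time_sim = (mp_release_date - mf_start_date).days
print(release_time_sim)  # 42734
```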

In [None]:
# Additional input by PFJ to run analyses of multiple models:

simulate_list = ['FWP1L_zK', 'FWP5L_zK', 'FWP5L_hK']  # list of models in the gen_mod_dict.py file to analyze

## Loop through home directory to get list of name files

In [None]:
homes = ['../Models']
fig_dir = '../Figures'

if not os.path.exists(fig_dir):
    os.mkdir(fig_dir)  # PFJ:  dst is not defined; changed to fig_dir.

mfpth = '../executables/MODFLOW-NWT_1.0.9/bin/MODFLOW-NWT_64.exe'
mp_exe_name = '../executables/modpath.6_0/bin/mp6x64.exe' 

mf_start_date_str = '01/01/1900' 
mp_release_date_str = '01/01/2017' 

age_cutoff = 65
year_cutoff = '01/01/1952'

surf_aq_lays = 3  # deepest layer of the surficial aquifer.

dir_list = []
mod_list = []
i = 0
r = 0

for home in homes:
    if os.path.exists(home):
        for dirpath, dirnames, filenames in os.walk(home):
            for f in filenames:
                if os.path.splitext(f)[-1] == '.nam':
                    mod = os.path.splitext(f)[0]
                    i += 1
                    if mod in simulate_list:
                        mod_list.append(mod)
                        dir_list.append(dirpath)
                        r += 1
                               
print('    {} models read'.format(i))
print('These {} models will be analyzed: {}'.format(r, mod_list))

## Create names and path for model workspace

The procedures in this notebook can be run interactively or from a batch file. To run them in batch mode, download the notebook as a Python script and wrap the per-model steps in a loop over the models; the remainder of the script then has to be indented to fall inside that loop. This may require some familiarity with Python.

In [None]:
# generate the list of name files; mod_list holds the name-file stems found
# by the directory walk, in the same order as dir_list, so each name file
# is guaranteed to exist in its workspace
nam_list = ['{}.nam'.format(model) for model in mod_list]
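The batch-mode option described above can be sketched as follows, under assumptions: the per-model steps are moved into a function (hypothetically named `analyze_model`) that is called once per model and workspace, so the body does not have to be re-indented by hand. The model name and path in the usage line are placeholders.

```python
import os

def analyze_model(model, model_ws):
    """Hypothetical placeholder for the per-model steps of this notebook."""
    nam_file = '{}.nam'.format(model)
    # load the model, read endpoints, and summarize here
    return os.path.join(model_ws, nam_file)

def main(mod_list, dir_list):
    # mod_list and dir_list are parallel lists built by the directory walk
    results = {}
    for model, model_ws in zip(mod_list, dir_list):
        results[model] = analyze_model(model, model_ws)
    return results

if __name__ == '__main__':
    print(main(['FWP1L_zK'], [os.path.join('..', 'Models', 'FWP1L_zK')]))
```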

# Load an existing model

In [None]:
# Read-in model info and check max/min nlay & create list of DIS objects.  
# Assumes all else is the same among models (hnoflow, hdry, etc)
print('Reading model information')
nlay_min = 100
nlay_max = 0

dis_objs = []
for i, model in enumerate(nam_list):
    nam_file = model
    model_ws = dir_list[i]
    
    fpmg = fp.modflow.Modflow.load(nam_file, model_ws=model_ws, exe_name=mfpth, version='mfnwt', 
                                   load_only=['DIS', 'BAS6', 'UPW', 'OC'], check=False)

    dis = fpmg.get_package('DIS')
    dis_objs.append(dis)
    bas = fpmg.get_package('BAS6')
    upw = fpmg.get_package('UPW')
    oc = fpmg.get_package('OC')

    delr = dis.delr
    delc = dis.delc
    nlay = dis.nlay
    nrow = dis.nrow
    ncol = dis.ncol
    bot = dis.getbotm()
    top = dis.gettop()

    hnoflo = bas.hnoflo
    ibound = np.asarray(bas.ibound.get_value())
    hdry = upw.hdry
    
    if nlay > nlay_max:
        nlay_max = nlay
    if nlay < nlay_min:
        nlay_min = nlay
        
    print('  .. done reading model {}'.format(i+1))

print('   ... all done')

print('minimum layers in a model:  {}'.format(nlay_min))
print('maximum layers in a model:  {}'.format(nlay_max))

## Specification of time in MODFLOW/MODPATH

There are several time-related concepts used in MODPATH.
* `simulation time` is the elapsed time, in model time units, from the beginning of the first stress period.
* `reference time` is an arbitrary value of `simulation time` between the beginning and end of the simulation.
* `tracking time` is the elapsed time relative to `reference time`. It is always positive, regardless of whether particles are tracked forward or backward.
* `release time` is when a particle is released; it is specified in `tracking time`.
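These definitions reduce to simple arithmetic: for backward tracking, simulation time equals reference time minus tracking time, and for forward tracking it equals reference time plus tracking time. A minimal sketch, assuming both times are in the same units; the function names and example values are illustrative only:

```python
def sim_to_track(sim_time, ref_time, backward=True):
    """Convert simulation time to tracking time (same units for both)."""
    return ref_time - sim_time if backward else sim_time - ref_time

def track_to_sim(track_time, ref_time, backward=True):
    """Convert tracking time back to simulation time."""
    return ref_time - track_time if backward else ref_time + track_time

ref_time = 42734.0     # example reference time at the end of the simulation, in days
release_sim = 42734.0  # particles released at the reference time
release_trk = sim_to_track(release_sim, ref_time)
final_trk = 3650.0     # example tracking time at which a particle terminates

# release tracking time, simulation time at termination, elapsed tracking time
print(release_trk, track_to_sim(final_trk, ref_time), final_trk - release_trk)
# 0.0 39084.0 3650.0
```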

In [None]:
# setup dictionaries of the MODFLOW units for proper labeling of figures.
lenunit = {0:'undefined units', 1:'feet', 2:'meters', 3:'centimeters'}
timeunit = {0:'undefined', 1:'second', 2:'minute', 3:'hour', 4:'day', 5:'year'}

# Create dictionary of divisors that convert model time units to days
# (days = model time / time_dict[itmuni])
time_dict = dict()
time_dict[0] = 1.0  # undefined units are assumed to be days
time_dict[1] = 24 * 60 * 60  # seconds per day
time_dict[2] = 24 * 60  # minutes per day
time_dict[3] = 24  # hours per day
time_dict[4] = 1.0  # days
time_dict[5] = 1.0  # years are left unconverted by this entry

In [None]:
# convert string representation of dates into Python datetime objects
mf_start_date = dt.datetime.strptime(mf_start_date_str , '%m/%d/%Y')
mp_release_date = dt.datetime.strptime(mp_release_date_str , '%m/%d/%Y')

# convert simulation time to days from the units specified in the MODFLOW DIS file
sim_time = np.append(0, dis.get_totim())
sim_time /= time_dict[dis.itmuni]

# make a list of simulation time formatted as calendar dates
date_list = [mf_start_date + dt.timedelta(days = item) for item in sim_time]

# reference time and date are set to the end of the last stress period
ref_time = sim_time[-1]
ref_date = date_list[-1]

# release time is calculated in tracking time (for particle release) and 
# in simulation time (for identifying head and budget components)
release_time_trk = np.abs((ref_date - mp_release_date).days)
release_time_sim = (mp_release_date - mf_start_date).days

## Read endpoint file

In [None]:
def purge(ep_data):
    """Report implausibly old particles and drop those older than Earth.

    Residence times in the 'rt' column are compared against geologic
    boundaries expressed in years.
    """
    pre_Quaternary = ep_data.loc[ep_data.rt >= 2.6e6]
    pre_Cretaceous = ep_data.loc[ep_data.rt >= 66e6]
    preCambrian = ep_data.loc[ep_data.rt >= 541e6]
    pre_earth = ep_data.loc[ep_data.rt >= 4.6e9]

    print('\nFor your information:')
    print('{} particles were simulated as being older than Earth!'.format(pre_earth.shape[0]))
    print('{} particles were simulated as being Precambrian in age or older.'.format(preCambrian.shape[0]))
    print('{} particles were simulated as being Cretaceous in age or older.'.format(pre_Cretaceous.shape[0]))
    print('{} particles were simulated as being pre-Quaternary in age.'.format(pre_Quaternary.shape[0]))

    ep_data = ep_data.loc[ep_data.rt < 4.6e9]
    print('Purged particles older than Earth')
    return ep_data

In [None]:
# Read the endpoint (EPT) files, pickle the endpoint dataframes, and store them for the CDF plots below.
# See code block 8 of GM_6flux_scale_analysis.ipynb.

dfdict = {}
for i, model in enumerate(mod_list):
    model_ws = dir_list[i]
    src = os.path.join(model_ws, 'zone_df.csv')
    zone_df = pd.read_csv(src, index_col=0)
    dis = dis_objs[i]
    for group in zone_df:
        print('\nAnalyzing EPT for {}'.format(model))
        
        # form the path to the endpoint files
        mpname_flux = '{}_flux_{}'.format(os.path.join(model_ws, model), group)
        mpname_vol = '{}_volume_{}'.format(os.path.join(model_ws, model), group)
        mflux = '{}_flux'.format(model)
        mvol = '{}_volume'.format(model)

        flux_endpoint_file = '{}.{}'.format(mpname_flux, 'mpend')
        vol_endpoint_file = '{}.{}'.format(mpname_vol, 'mpend')

        # read the endpoint file to generate a dataframe
        f_ep_data1 = fit_parametric_distributions.read_endpoints(flux_endpoint_file, dis, time_dict)
        v_ep_data1 = fit_parametric_distributions.read_endpoints(vol_endpoint_file, dis, time_dict)
        
        f_ep_data = purge(f_ep_data1)
        v_ep_data = purge(v_ep_data1)
        
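        # pickle the purged endpoint dataframes next to the endpoint files
        fdst = '{}_mod.pickle'.format(mpname_flux)
        with open(fdst, 'wb') as handle:
            pickle.dump(f_ep_data, handle)
        vdst = '{}_mod.pickle'.format(mpname_vol)
        with open(vdst, 'wb') as handle:
            pickle.dump(v_ep_data, handle)

        # keys do not include the group, so with more than one group only the last is kept
        dfdict[mflux] = f_ep_data
        dfdict[mvol] = v_ep_data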
        
print('....done')

In [None]:
print(list(dfdict.keys()))
dfdict['FWP1L_zK_flux'].dtypes

## Plot CDFs for each model
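Each curve in the next cell is an empirical CDF: the sorted residence times plotted against evenly spaced cumulative frequencies. The sketch below, with made-up ages, shows the same construction and how the young fraction written to the summary file (the share of particles no older than `age_cutoff`) relates to it.

```python
import numpy as np

# made-up residence times, in years
ages = np.array([120.0, 5.0, 40.0, 900.0, 65.0, 12.0, 3000.0, 70.0])
age_cutoff = 65

rt = np.sort(ages)
y_rt = np.linspace(0, 1, rt.size)  # same cumulative-frequency axis as the plot

# fraction of particles at or below the cutoff, as in the summary-statistics cell
young_fraction = np.mean(ages <= age_cutoff)
median_age = np.median(ages)
print(young_fraction, median_age)  # 0.5 67.5
```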

In [None]:
# Plot age distributions for all 3 models by flux/vol weight.

uniques = list(dfdict.keys())
sum_p = {mn: 0 for mn in uniques}  # particle count for each model/weighting

vplots = 1
hplots = 1
figsize = (8, 6)
fig, ax = plt.subplots(vplots, hplots, figsize=figsize)

# one color per model, repeated so the flux and volume curves of a model share a color;
# this relies on dfdict keys being inserted as model1_flux, model1_volume, model2_flux, ...
colors_line = plt.cm.brg(np.linspace(0, 1, len(mod_list)))
colors_line = colors_line.repeat(2, axis=0)

for i, md in enumerate(uniques):
    # solid lines for flux-weighted, dashed lines for volume-weighted particles
    linestyle = '-' if 'flux' in md else '--'

    # keep just particles that started in the surficial aquifer
    df = dfdict[md]
    df = df[df['Initial Layer'] <= surf_aq_lays]
    rt = df.rt.sort_values()  # 'rt' is "raw time" in the dataframe
    sum_p[md] += rt.count()
    y_rt = np.linspace(0, 1, rt.shape[0])  # empirical cumulative frequency

    ax.plot(rt, y_rt, c=colors_line[i], linestyle=linestyle, label=md)

# vertical line at the young-fraction age cutoff
ax.plot((age_cutoff, age_cutoff), (0.05, 1), 'k--')

ax.set_xscale('log')
ax.set_xlim(1e0, 1e6)
ax.set_ylim(bottom=0)

ax.legend(loc=0, frameon=False, fontsize=8)#, bbox_to_anchor=(0.20, 0.2), ncol=1)
ax.set_xlabel('Residence time, in years')
ax.set_ylabel('Cumulative frequency')

fig.subplots_adjust(top=0.96, hspace=0.15)

dst = '../RTD_flux_vs_vol.png'
fig.savefig(dst)
#plt.close()

In [None]:
# Calculate and write out summary stats for each model and weighting method.
dst = '../YFsummary_all_zones.dat'

with open(dst, 'w') as f:
    for md in dfdict:
        df = dfdict[md]
        n_part_all = df.shape[0]
        medageall = df.rt.median()
        # keep just particles that started in the surficial aquifer
        df = df[df['Initial Layer'] <= surf_aq_lays]
        medageglac = df.rt.median()
        n_part_glac = df.shape[0]
        # young fraction: share of surficial-aquifer particles no older than age_cutoff
        young = df[df.rt <= age_cutoff]
        yf = young.shape[0] / n_part_glac
        meanYFage = young.rt.mean()
        medYFage = young.rt.median()
        f.write('Summary for {}\n'.format(md))
        f.write('Total particles in model:  {}\n'.format(n_part_all))
        f.write('Total glacial particles:  {}\n'.format(n_part_glac))
        f.write('Median age all particles in model:  {}\n'.format(medageall))
        f.write('Median age all glacial particles:  {}\n'.format(medageglac))
        f.write('Fraction Young glacial particles:  {:.3%}\n'.format(yf))
        f.write('Median age glac YF:  {}\n'.format(medYFage))
        f.write('Mean age glac YF:  {}\n\n'.format(meanYFage))