# U.S. Geological Survey Class GW3099
Advanced Modeling of Groundwater Flow (GW3099)\
Boise, Idaho\
September 16 - 20, 2024

![title](../../images/ClassLocation.jpg)

# Pywatershed and MF6 coupling

## Introduction

### The challenge of siloed modeling domains
A recurring challenge of water sciences is working across disciplines. Increasingly, this is where important scientific questions and advances are located. From a software perspective, work across disciplines can result in new "code silos" being created when the software forged by interdisciplinary efforts becomes disconnected from its roots over time. 

One example is GSFLOW, the USGS coupled groundwater and surface-water flow model (Markstrom et al., 2008). GSFLOW represents a significant advancement in modeling many aspects of combined groundwater and surface water flow. However, over time, the GSFLOW code has diverged from the current, improved versions of the USGS hydrologic (Precipitation Runoff Modeling System, PRMS; Regan et al., 2022) and groundwater (MODFLOW; Langevin et al., 2017) models that it initially bridged.

The challenge of working across disciplines in a sustainable way is one issue addressed by the development of the Basic Model Interface (BMI; Hutton et al., 2020). Hughes et al. (2022) implemented BMI for MODFLOW 6 (MF6) and extended it to handle numerical-level model couplings. Hughes et al. (2022) specifically demonstrated the ease of coupling the PRMS model to MF6 through its BMI.

This notebook reproduces the results of Hughes et al. (2022) but substitutes `pywatershed` for [PRMS-BMI](https://github.com/nhm-usgs/bmi-prms-demo) in the Sagehen Creek Watershed Simulation. The pywatershed package contains the same core functionality as the PRMS model in an interactive language with a more modularized design. The goal of this notebook is to give a concrete example of how pywatershed can couple to MF6 or to other models via BMI.

We implement the model coupling design of Hughes et al. (2022) as illustrated in Figure 1. Before discussing it in more detail, we note that this coupling is similar to GSFLOW, with the following differences:
* Surface water is modeled on HRUs instead of a grid
* Surface runoff and interflow are not routed from upslope to downslope HRUs using "cascading flow"
* The surface water soilzone process is only solved once per timestep instead of being recalculated with MF6 outer solver iterations

Figure 1 shows how pywatershed and MF6 are one-way coupled. Pywatershed runs the hydrologic simulation on an unstructured grid of hydrologic response units (HRUs). Starting from atmospheric forcing inputs, and proceeding through canopy, snow, runoff, and soil, pywatershed uses process representations found in PRMS (National Hydrologic Model configuration). At the end of each timestep, fluxes and states are sent from pywatershed to MF6. As shown by arrows in the figure, separate spatial mappings exist from the HRUs to the MODFLOW grid and to the streamflow network. The gridded MF6 Unsaturated Zone Flow (UZF) package is conceptualized below the soil zone of the hydrologic model. Unsatisfied potential evapotranspiration in the hydrologic model, remapped from HRUs to gridcells, can be met by water in the UZF. Similarly, hydrologic recharges of groundwater are summed and remapped as infiltration to UZF. The hydrologic surface runoff and interflow fluxes are mapped from the HRUs into the MF6 streamflow routing (SFR) network.

As also shown in the figure, the model design allows the MF6 Unsaturated Zone Flow (UZF) package to determine saturation and reject some or all of the infiltration it receives from the hydrologic soil zone. Instead of sending this water back to the soil zone in a two-way coupling, the rejected water is sent as runoff to streamflow routing (SFR) using the "mover" (MVR) package. Groundwater exfiltration is likewise mapped as runoff to SFR via MVR.

| ![sagehen_schematic](sagehen_pws_mf6_coupling_schematic.png) |
|:--:|
| <i>Figure 1 Overview of spatial discretizations, the pywatershed-MF6 coupling, and physical process representations used for modeling the Sagehen Creek Watershed. The execution loop first executes pywatershed (1), then its fluxes are disaggregated with the two mappings and sent to MF6 (2) for execution via its BMI API. </i>|

### Requirements

The provided MODFLOW 6 dylib/dll (`sagehen_pws_mf6api/bin/libmf6.dylib`) may not work for you (it is for a Mac M1 chip). You can obtain a MODFLOW 6 dylib/dll for other platforms from https://github.com/MODFLOW-USGS/modflow6/releases. You should place it in `sagehen_pws_mf6api/bin/`.

If these prebuilt libraries do not work for you, the worst case (not too time consuming) is that you have to install and build MF6 yourself, as described starting here (after installing its environment): https://github.com/MODFLOW-USGS/modflow6-nightly-build/releases/tag/20230430

Once you have the dylib/dll, you can **test it**?
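
One way to smoke-test the library is to check that the file exists and that the operating system can load it. The sketch below uses only the Python standard library; `can_load_shared_library` is a hypothetical helper name, not part of `modflowapi` or `flopy`, and a successful load does not guarantee that the build is compatible with the rest of this notebook.

```python
import ctypes
import pathlib


def can_load_shared_library(lib_path):
    """Return True if the file exists and the OS loader accepts it."""
    lib_path = pathlib.Path(lib_path)
    if not lib_path.is_file():
        return False
    try:
        # ctypes.CDLL raises OSError if the library cannot be loaded,
        # for example when it was built for a different platform.
        ctypes.CDLL(str(lib_path))
    except OSError:
        return False
    return True


# Path taken from the Requirements text above; adjust for your platform.
print(can_load_shared_library("sagehen_pws_mf6api/bin/libmf6.dylib"))
```

A result of `False` suggests either a wrong path or a library built for another platform.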

The remaining dependencies for this notebook should be completely specified by `pws-csdms.yaml` or any of its frozen versions found in `env/` in the repository root directory. Installing these is described in the `README.md`.

In [None]:
import os
import pathlib as pl
import platform
import shutil

import flopy
import hvplot.xarray  # noqa
import jupyter_black
import numpy as np
import pywatershed as pws
import xarray as xr
from helpers import get_mf6_nightly_build
from modflowapi import ModflowApi

jupyter_black.load()
pws.utils.gis_files.download()
pws.utils.addtl_domain_files.download()

nb_output_dir = pl.Path("./step3_mf6_api").resolve()
if not nb_output_dir.exists():
    nb_output_dir.mkdir()

pws_root = pws.constants.__pywatershed_root__
domain_dir = (pws_root.parent / "test_data/sagehen_5yr/").resolve()
mf6_domain_dir = (pws_root.parent / "test_data/sagehen_mf6").resolve()

# Get the MF6 dylibs from yesterday's nightly build

In [None]:
nightly_build_dir = nb_output_dir / "mf6_nightly_build"
nightly_build_dir.mkdir(exist_ok=True)
flopy.utils.get_modflow(
    bindir=str(nightly_build_dir), repo="modflow6-nightly-build", quiet=True
)

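system = platform.system()  # note the call: platform.system is a function
if system == "Windows":
    mf6_dll = nightly_build_dir / "libmf6.dll"
elif system == "Darwin":
    mf6_dll = nightly_build_dir / "libmf6.dylib"
else:
    # Linux builds ship the shared library as libmf6.so
    mf6_dll = nightly_build_dir / "libmf6.so"
assert mf6_dll.exists(), f"MF6 shared library not found: {mf6_dll}"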

## Set up the run
### MF6 files staging
These need to be copied to the run directory.

In [None]:
files_dirs_to_cp = [
    "common",
    "sagehenmodel",
    "hru_weights.npz",
    "sagehen_postprocess_graphs.py",
    "prms_grid_v3-Copy1.nc",
]
rename = {
    "sagehenmodel": "run_dir",
    "sagehen_postprocess_graphs.py": "run_dir/sagehen_postprocess_graphs.py",
}
for name in files_dirs_to_cp:
    src = mf6_domain_dir / name
    if name in rename.keys():
        dst = nb_output_dir / rename[name]
    else:
        dst = nb_output_dir / name
    if not dst.exists():
        if src.is_dir():
            shutil.copytree(src, dst)
        else:
            shutil.copy(src, dst)

### pywatershed files
The pywatershed files do not need to be copied; they are just used in memory (though they could be written out if you like).

In [None]:
control_file = domain_dir / "sagehen_no_gw_cascades.control"
control = pws.Control.load_prms(control_file)

parameter_file = domain_dir / control.options["parameter_file"]
params = pws.parameters.PrmsParameters.load(parameter_file)
params = pws.utils.preprocess_cascades.preprocess_cascade_params(
    control, params, verbosity=0
)

### Plot HRUs and the MF6 grid?

# Spatial mappings

In [None]:
weights = np.load(nb_output_dir / "hru_weights.npz")

_HRU to UZF weights_

In [None]:
print(weights["uzfw"].shape)
uzfw = weights["uzfw"]
nuzf_infilt = uzfw.shape[0]

_HRU to SFR weights_

In [None]:
print(weights["sfrw"].shape)
sfrw = weights["sfrw"]

This indicates that there are 128 HRUs, 3386 gridcells, and 201 stream reaches.

Define a function to map HRU values to MODFLOW 6 values, which is just a dot product.

In [None]:
def hru2mf6(weights, values):
    return weights.dot(values)
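
To make the dot product concrete, here is a toy sketch with made-up weights for 3 HRUs and 2 target cells (these are not the Sagehen weights). It assumes the weight matrix has shape (targets, HRUs) and, for this volumetric example, that each column sums to one so that every HRU's water is fully distributed. That normalization is an assumption of the sketch, not something verified for `hru_weights.npz`.

```python
import numpy as np


def hru2mf6(weights, values):
    # Same mapping as in the notebook: (ntargets, nhru) . (nhru,) -> (ntargets,)
    return weights.dot(values)


# Hypothetical weights: rows are target cells, columns are HRUs.
toy_weights = np.array(
    [
        [1.0, 0.25, 0.0],
        [0.0, 0.75, 1.0],
    ]
)
toy_hru_volumes = np.array([10.0, 8.0, 4.0])  # e.g. m3/d per HRU

toy_cell_volumes = hru2mf6(toy_weights, toy_hru_volumes)
print(toy_cell_volumes)  # [12. 10.]

# With column-normalized weights the total volume is conserved.
print(toy_hru_volumes.sum(), toy_cell_volumes.sum())
```

The second HRU is split 25/75 between the two cells, so the cell totals are 10 + 2 and 6 + 4.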

#### Unit conversion factors

In [None]:
m2ft = 3.28081
in2m = 1.0 / (12.0 * m2ft)
acre2m2 = 43560.0 / (m2ft * m2ft)

In [None]:
hru_area_m2 = params.parameters["hru_area"] * acre2m2
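
As a worked check of these factors, the sketch below converts a depth of 1 inch over an area of 1 acre to a volume in cubic meters, which is the same depth-times-area conversion applied to the HRU fluxes later in the notebook. The constants are copied from the cell above; `depth_in_to_volume_m3` is a hypothetical helper name used only for illustration.

```python
m2ft = 3.28081  # feet per meter, as defined in the notebook
in2m = 1.0 / (12.0 * m2ft)  # meters per inch
acre2m2 = 43560.0 / (m2ft * m2ft)  # square meters per acre


def depth_in_to_volume_m3(depth_in, area_acres):
    """Convert a depth in inches over an area in acres to cubic meters."""
    return depth_in * in2m * area_acres * acre2m2


# One inch of water over one acre is roughly 102.8 cubic meters.
print(round(depth_in_to_volume_m3(1.0, 1.0), 1))
```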

### Run one-way coupled pywatershed and MODFLOW 6 models

#### Initialize pywatershed components

We establish and load the requisite inputs for pywatershed. This parameter dictionary loaded from a pickle file below was adapted from a PRMS6 parameter file.

In [None]:
run_dir = nb_output_dir / "run_dir"
assert run_dir.exists()
os.chdir(run_dir)
print("changing to run directory:\n", os.getcwd())

mf6_output_dir = run_dir / "output"

# This step is *critical* for MF6 initialization
if not mf6_output_dir.exists():
    mf6_output_dir.mkdir()

In [None]:
# use the 16 year CBH files from supplementary pywatershed files
control.options["input_dir"] = (
    pws.constants.__pywatershed_root__
    / "data/pywatershed_addtl_domains/sagehen_cbh_16yr"
)
control.edit_n_time_steps(5844)  # the 16 years in the forcing files above
control.options["calc_method"] = "numba"
control.options["budget_type"] = None
control.options["netcdf_output_dir"] = nb_output_dir / "pws_model_output"
# TODO: THIS NEXT LINE SHOULD NOT NEED TO HAPPEN
# control.options["netcdf_output_var_names"] = control.options[
#    "netcdf_output_var_names"
# ].tolist()
output_var_names = [
    "hru_ppt",
    "potet",
    "hru_actet",
    "ssres_in",
    "pref_flow_infil",
    "ssr_to_gw",
    "soil_to_gw",
    "sroff",
    "ssres_flow",
    "pref_flow",
    "dunnian_flow",
    "slow_flow",
]
control.options["netcdf_output_var_names"] = output_var_names
control.options["netcdf_output_var_names"].append("hru_horton_cascflow")

#### Create arrays to save hydrology results
While pywatershed has NetCDF output available, the existing plots are set up to work with numpy arrays collected during the run and saved at the end. We follow this path for ease of reproducing the plots. This also shows how custom output can be achieved quite easily with pywatershed.

In [None]:
ntimes = int(control.n_times)
print("Number of days to simulate {}".format(ntimes))

prms_vars = [
    "hru_ppt_out",
    "actet_out",
    "potet_out",
    "soilinfil_out",
    "runoff_out",
    "interflow_out",
]
prms_var_dict = {}
prms_var_dict["time_out"] = np.empty(ntimes, dtype="datetime64[s]")
for vv in prms_vars:
    prms_var_dict[vv] = np.zeros(
        (ntimes, hru_area_m2.shape[0]), dtype=np.float64
    )

In [None]:
prms_processes = [
    pws.PRMSSolarGeometry,
    pws.PRMSAtmosphere,
    pws.PRMSCanopy,
    pws.PRMSSnow,
    pws.PRMSRunoffCascadesNoDprst,
    pws.PRMSSoilzoneCascadesNoDprst,
]

prms_model = pws.Model(
    prms_processes,
    control=control,
    parameters=params,
)

In [None]:
# how can we tell what variables to couple to MF6?
# "SINF": "UZF-1", # surface infiltration
# "PET": "UZF-1", # potential ET
# "RUNOFF": "SFR-1"  # runoff

# In pywatershed only prognostic variables appear in the mass balance terms. There may be
# summary variables available but these are not checked in mass balance

In [None]:
pws.meta.get_vars(
    [
        *pws.PRMSRunoffCascadesNoDprst.get_mass_budget_terms()["outputs"],
        *pws.PRMSSoilzoneCascadesNoDprst.get_mass_budget_terms()["outputs"],
    ]
)

```
"RUNOFF" =  sroff + dunnian_flow + pref_flow + slow_flow
# redundancy?? ssres_flow = pref_flow + slow_flow=grav
# redundancy?? 
"SINF"[:nuzf_infilt] = (
        hru2mf6(uzfw, recharge) * in2m
    )  # ssr_to_gw + soil_to_gw
"PET"[:nuzf_infilt] = hru2mf6(uzfw, unused_pet) * in2m  # potet - actet
# need to use perv_actet ?
```

#### Initialize MODFLOW 6

The dylib/DLL Initialization requires all inputs AND the output directory to exist.

In [None]:
mf6_config_file = "mfsim.nam"
mf6 = ModflowApi(mf6_dll, working_directory=os.getcwd())
mf6.initialize(str(mf6_config_file))

Get information about the modflow model start and end times

In [None]:
current_time = mf6.get_current_time()
end_time = mf6.get_end_time()
print(
    f"MF current_time: {current_time}, prms control.start_time: {control.start_time}"
)
print(f"MF end_time: {end_time}, prms control.n_times: {control.n_times}")

#### Get pointers to MODFLOW 6 variables

As shown in Figure 1, pywatershed sends the following fluxes to MF6:

* SINF is surface infiltration to MF6's UZF package; it is mapped from groundwater recharge in pywatershed.
* PET is unsatisfied potential evapotranspiration in MF6's UZF package; it is also tracked through the soil zone in pywatershed.
* RUNOFF to MF6's SFR package comes from combined surface runoff and interflow from pywatershed.

To set these values on MF6, we need to get pointers into MF6 using its BMI interface.

In [None]:
sim_name = "sagehenmodel"
mf6_var_model_dict = {"SINF": "UZF-1", "PET": "UZF-1", "RUNOFF": "SFR-1"}
mf6_vars = {}
for vv, mm in mf6_var_model_dict.items():
    mf6_vars[vv] = mf6.get_value_ptr(
        mf6.get_var_address(vv, sim_name.upper(), mm)
    )

for vv, dd in mf6_vars.items():
    print(f"shape of {vv}: {dd.shape}")

The UZF model has 2 vertical layers, so the number of points in UZF is twice what was shown in the spatial weights mappings above.
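
Because the arrays returned by `get_value_ptr` share memory with MF6, they have to be updated in place with slice assignment rather than rebound to a new array. The sketch below imitates this with a plain NumPy array standing in for the MF6 pointer (no MF6 library is involved). It assumes, as the slicing used later in this notebook implies, that the first `nuzf_infilt` entries are the cells that receive infiltration; the sizes and values are made up.

```python
import numpy as np

nuzf_infilt = 3  # toy number of cells receiving infiltration
n_layers = 2

# Stand-in for the array returned by mf6.get_value_ptr(...).
sinf_ptr = np.zeros(nuzf_infilt * n_layers)
mf6_vars = {"SINF": sinf_ptr}

new_infiltration = np.array([0.1, 0.2, 0.3])

# In-place slice assignment writes into the shared memory.
mf6_vars["SINF"][:nuzf_infilt] = new_infiltration
print(sinf_ptr)

# Rebinding the dictionary entry would NOT touch the shared memory:
# mf6_vars["SINF"] = new_infiltration  # avoid this pattern
```

Only the first three entries change; the remaining entries keep their previous values.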

#### Run the models

Now we run the model. We use pywatershed to control the simulation.

In [None]:
prms_model.initialize_netcdf()
n_time_steps = control.n_times  # redundant above?
for istep in range(n_time_steps):
    prms_model.advance()

    if control.current_dowy == 1:
        if istep > 0:
            print("\n")
        print(f"Water year: {control.current_year + 1}")

    msg = f"Day of water year: {str(control.current_dowy + 1).zfill(3)}"
    print(msg, end="\r")

    # run pywatershed
    prms_model.calculate()
    prms_model.output()

    # calculate variables for output and coupling
    hru_ppt = prms_model.processes["PRMSAtmosphere"].hru_ppt.current

    potet = prms_model.processes["PRMSSoilzoneCascadesNoDprst"].potet
    actet = prms_model.processes["PRMSSoilzoneCascadesNoDprst"].hru_actet
    unused_pet = potet - actet

    # TODO: why is soil_infil not used?
    soil_infil = (
        prms_model.processes["PRMSSoilzoneCascadesNoDprst"].ssres_in
        + prms_model.processes["PRMSSoilzoneCascadesNoDprst"].pref_flow_infil
    )
    recharge = (
        prms_model.processes["PRMSSoilzoneCascadesNoDprst"].ssr_to_gw
        + prms_model.processes["PRMSSoilzoneCascadesNoDprst"].soil_to_gw
    )

    sroff = prms_model.processes["PRMSRunoffCascadesNoDprst"].sroff
    interflow = prms_model.processes["PRMSSoilzoneCascadesNoDprst"].ssres_flow
    prms_ro = (sroff + interflow) * in2m * hru_area_m2

    # save PRMS results (converted to m3/d)
    prms_var_dict["time_out"][istep] = control.current_time
    prms_var_dict["hru_ppt_out"][istep, :] = hru_ppt * in2m * hru_area_m2
    prms_var_dict["potet_out"][istep, :] = potet * in2m * hru_area_m2
    prms_var_dict["actet_out"][istep, :] = actet * in2m * hru_area_m2
    prms_var_dict["soilinfil_out"][istep, :] = soil_infil * in2m * hru_area_m2
    prms_var_dict["runoff_out"][istep, :] = sroff * in2m * hru_area_m2
    prms_var_dict["interflow_out"][istep, :] = interflow * in2m * hru_area_m2

    # Set MF6 pointers
    mf6_vars["RUNOFF"][:] = hru2mf6(sfrw, prms_ro)  # sroff + ssres_flow
    mf6_vars["SINF"][:nuzf_infilt] = (
        hru2mf6(uzfw, recharge) * in2m
    )  # ssr_to_gw + soil_to_gw
    mf6_vars["PET"][:nuzf_infilt] = (
        hru2mf6(uzfw, unused_pet) * in2m
    )  # potet - actet

    # run MODFLOW 6
    mf6.update()

try:
    mf6.finalize()
    prms_model.finalize()
    success = True
except Exception as err:
    raise RuntimeError("Failed to finalize MF6 or pywatershed") from err

In [None]:
# fpth = "output/pywatershed_output.npz"
# np.savez_compressed(
#     fpth,
#     time=prms_var_dict["time_out"],
#     ppt=prms_var_dict["ppt_out"],
#     potet=prms_var_dict["potet_out"],
#     actet=prms_var_dict["actet_out"],
#     infil=prms_var_dict["soilinfil_out"],
#     runoff=prms_var_dict["runoff_out"],
#     interflow=prms_var_dict["interflow_out"],
# )


pws_output_dir = nb_output_dir / "pws_output"
pws_output_dir.mkdir(exist_ok=True)
for prms_var in prms_vars:
    if prms_var == "time_out":
        continue
    var = prms_var[0:-4]
    print(var)
    ds = xr.Dataset(
        data_vars={var: (["time", "nhru"], prms_var_dict[prms_var])},
        coords={"time": prms_var_dict["time_out"].astype("datetime64[ns]")},
    )
    ds.to_netcdf(pws_output_dir / f"{var}.nc")
    del ds

#### Finalize models
Clean up. 

#### Save hydrology output
Save the variables to file that were collected into memory from pywatershed.

### Plot

We spare the user the details of the plotting. 

In [None]:
fig_out_dir = nb_output_dir / "figures"
fig_out_dir.mkdir(exist_ok=True)

run_dir = nb_output_dir / "run_dir"
os.chdir(run_dir)  # make sure

import sagehen_postprocess_graphs as graphs

In [None]:
graphs.streamflow_fig()

In [None]:
graphs.et_recharge_ppt_fig()

In [None]:
graphs.gwf_uzf_storage_changes_fig()

In [None]:
graphs.cumulative_streamflow_fig()

In [None]:
graphs.composite_fig()

## Conclusions

This one-way coupling of pywatershed and MODFLOW 6 demonstrates how software interoperability standards can bridge silos without creating new "code silos".

This demonstration also highlights how modular models provide clear process conceptualizations and easy access to variables required for cross-discipline couplings. 

In the near future, pywatershed will expand to support reproducing GSFLOW by implementing hydrologic "cascading flows" and a two-way coupling with MF6.


## References
* Hutton, E. W., Piper, M. D., & Tucker, G. E. (2020). The Basic Model Interface 2.0: A standard interface for coupling numerical models in the geosciences. Journal of Open Source Software, 5(51), 2317.
* Hughes, J. D., Russcher, M. J., Langevin, C. D., Morway, E. D., & McDonald, R. R. (2022). The MODFLOW Application Programming Interface for simulation control and software interoperability. Environmental Modelling & Software, 148, 105257.
* Langevin, C. D., Hughes, J. D., Banta, E. R., Niswonger, R. G., Panday, S., & Provost, A. M. (2017). Documentation for the MODFLOW 6 groundwater flow model (No. 6-A55). US Geological Survey.
* Markstrom, S. L., Niswonger, R. G., Regan, R. S., Prudic, D. E., & Barlow, P. M. (2008). GSFLOW-Coupled Ground-water and Surface-water FLOW model based on the integration of the Precipitation-Runoff Modeling System (PRMS) and the Modular Ground-Water Flow Model (MODFLOW-2005). US Geological Survey techniques and methods, 6, 240.
* Regan, R. S., Markstrom, S. L., Hay, L. E., Viger, R. J., Norton, P. A., Driscoll, J. M., & LaFontaine, J. H. (2018). Description of the national hydrologic model for use with the precipitation-runoff modeling system (prms) (No. 6-B9). US Geological Survey.
* Regan, R.S., Markstrom, S.L., LaFontaine, J.H., 2022, PRMS version 5.2.1: Precipitation-Runoff Modeling System (PRMS): U.S. Geological Survey Software Release, 02/10/2022.