# The Lake Package
The Lake (LAK) package can be used to model large surface water bodies with a single, uniform stage. The lake water balance is computed, and the resulting lake stage is determined by all inflows to and outflows from the lake. Several management options are available. Users can specify inflows and outflows, redirect discharge from other packages (e.g. DRN) to the lake, impose a fixed stage, or add a weir to control maximum water levels.

In this example, the Henschotermeer lake in the Utrechtse Heuvelrug is modelled using the Lake package. For demonstration purposes, a weir is included with an invert level of 7.0 m MSL.
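
To make the idea of a lake water balance with a weir concrete, here is a minimal sketch of an explicit "bucket" water balance. It is not how MODFLOW 6 solves the LAK package, which computes the lake stage implicitly together with the groundwater heads. The function `lake_stage_step` is hypothetical, and it assumes a constant lake area (vertical banks) and that all water above the weir invert leaves the lake within one time step.

```python
def lake_stage_step(stage, area, rain, evap, q_gw, weir_invert, dt=1.0):
    """Advance a hypothetical lake stage by one time step (explicit bucket model).

    stage       : lake stage at the start of the step (m)
    area        : lake surface area (m2), assumed constant (vertical banks)
    rain, evap  : rainfall and open-water evaporation rates (m/d)
    q_gw        : net groundwater inflow to the lake (m3/d, negative = outflow)
    weir_invert : weir crest elevation (m); water above it is spilled
    dt          : time step length (d)
    Returns (new_stage, weir_outflow_volume_in_m3).
    """
    # net volume change from rainfall, evaporation and groundwater exchange
    dv = (rain - evap) * area * dt + q_gw * dt
    new_stage = stage + dv / area
    # simplification: everything above the crest leaves over the weir this step
    spill = max(new_stage - weir_invert, 0.0) * area
    new_stage = min(new_stage, weir_invert)
    return new_stage, spill


# a wet period: the stage rises 0.016 m/d until it reaches the weir invert of 7.0 m
stage, spill = 6.0, 0.0
for day in range(100):
    stage, spill = lake_stage_step(
        stage, 1.0e5, rain=0.015, evap=0.001, q_gw=200.0, weir_invert=7.0
    )
print(stage, spill)
```

Once the stage reaches the invert, every additional inflow leaves over the weir, which is why the stage stops rising.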

In [None]:
import os
import pandas as pd
import xarray as xr
import matplotlib.pyplot as plt
import matplotlib
import flopy
import nlmod

In [None]:
# set up pretty logging and show package versions
nlmod.util.get_color_logger()
nlmod.show_versions()

## Define model extent and model workspace

In [None]:
extent = [152_000, 156_000, 453_500, 456_000]
model_name = "henschotermeer"
model_ws = "01_lake"
figdir, cachedir = nlmod.util.get_model_dirs(model_ws)

## Download and prepare data
We download:

- the layers from the subsurface model Regis
- the surface level data from AHN5
- the layer 'waterdeel' from the Basisregistratie Grootschalige Topografie (BGT)
- level areas from waterboard Vallei en Veluwe

In [None]:
regis = nlmod.read.download_regis(extent, cachedir=cachedir, cachename="regis")
ahn = nlmod.read.ahn.download_ahn5(extent, cachedir=cachedir, cachename="ahn5")
bgt = nlmod.read.bgt.download_bgt(extent, cachedir=cachedir, cachename="bgt")
bgt = bgt.set_index("identificatie")
la = nlmod.read.waterboard.download_data(
    "Vallei en Veluwe",
    "level_areas",
    extent=extent,
    cachedir=cachedir,
    cachename="la_v_en_v",
)

### Combine surface water data
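We add the summer and winter stages from the level areas and the minimum surface level from AHN to the BGT data. The stage of each feature is the mean of its summer and winter stage, where each of these is at least the minimum surface level. Finally, the largest water body, the Henschotermeer, is marked as a lake.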

In [None]:
columns = ["summer_stage", "winter_stage"]
bgt = nlmod.util.gdf_intersection_join(
    la,
    bgt,
    columns=columns,
    min_total_overlap=0.0,
    desc=f"Adding {columns} to bgt-data",
)

bgt = nlmod.gwf.surface_water.add_min_ahn_to_gdf(bgt, ahn, buffer=5.0)
bgt["summer_stage"] = bgt[["summer_stage", "ahn_min"]].max(1)
bgt["winter_stage"] = bgt[["winter_stage", "ahn_min"]].max(1)
bgt["stage"] = bgt[["summer_stage", "winter_stage"]].mean(1)

bgt["lake"] = None
henschotermeer = bgt.area.idxmax()  # 'W0662.ce4627405cb242ee8533946265644bb3'
bgt.loc[henschotermeer, "lake"] = "Henschotermeer"
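
The pandas methods `max(1)` and `mean(1)` skip missing values by default, so a feature outside any level area simply gets the minimum surface level as its stage. A stdlib sketch of the same rule for a single feature, where `None` stands in for a missing value and the function name `representative_stage` is hypothetical:

```python
def representative_stage(summer_stage, winter_stage, ahn_min):
    """Mean of the summer and winter stage, each raised to at least ahn_min.

    Missing values (None) are skipped, like pandas' default skipna=True.
    """

    def max_skipna(*values):
        present = [v for v in values if v is not None]
        return max(present) if present else None

    summer = max_skipna(summer_stage, ahn_min)
    winter = max_skipna(winter_stage, ahn_min)
    seasons = [v for v in (summer, winter) if v is not None]
    return sum(seasons) / len(seasons) if seasons else None


print(representative_stage(5.0, 4.0, 4.5))  # winter stage raised to 4.5
print(representative_stage(None, None, 6.2))  # no level-area data
```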

## Generate a model dataset
We generate a model dataset with the same resolution as REGIS (100 x 100 m) and refine the grid with refinement level 2 (to 25 x 25 m) along the shoreline of the Henschotermeer. Layers below `PZWAz3` are removed.

In [None]:
ds = nlmod.to_model_ds(regis, model_name=model_name, model_ws=model_ws)

# drop layers below "PZWAz3"
ds = ds.sel(layer=ds.layer.loc[:"PZWAz3"])

# refine around the edge of the lakes
bgt_lake_boundary = bgt.loc[~bgt["lake"].isna()].copy()
bgt_lake_boundary.geometry = bgt_lake_boundary.boundary
ds = nlmod.grid.refine(ds, refinement_features=[(bgt_lake_boundary, 2)])
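
The step from 100 m to 25 m at level 2 is consistent with a quadtree refinement in which every level splits a cell into 2 x 2 smaller cells. A small sketch of that arithmetic, with hypothetical helper names and assuming this halving rule:

```python
def refined_cell_size(base_size, level):
    """Cell size after quadtree refinement: each level halves the cell size."""
    return base_size / 2**level


def cells_per_base_cell(level):
    """Number of refined cells that replace one base cell (2 x 2 per level)."""
    return 4**level


print(refined_cell_size(100.0, 2), cells_per_base_cell(2))
```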

## Split surface water shapes by grid
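We split the surface water shapes by the model grid using the method `nlmod.grid.gdf_to_grid`. We then separate the resulting features into a variable `lak_grid` for the LAK package and a variable `drn_grid` for the DRN package. Only cells for which more than half of the cell area is covered by the lake are kept as lake cells, and drains in these cells are removed.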

In [None]:
bgt_grid = nlmod.grid.gdf_to_grid(bgt, ds).set_index("cellid")
mask = bgt_grid["lake"].isna()
drn_grid = bgt_grid[mask].copy()
lak_grid = bgt_grid[~mask].copy()
assert not lak_grid.index.duplicated().any()
# only keep cells where more than half of the cell area is covered by the lake
mask = lak_grid.area > 0.5 * ds["area"].sel(icell2d=lak_grid.index)
lak_grid = lak_grid[mask]
# set the geometry to the entire cell
gi = flopy.utils.GridIntersect(nlmod.grid.modelgrid_from_ds(ds), method="vertex")
lak_grid.geometry = gi.geoms[lak_grid.index]

# remove drains that overlap with the lake
drn_grid = drn_grid.loc[~drn_grid.index.isin(lak_grid.index)]
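
The selection above can be sketched without GeoPandas: keep a cell as a lake cell when the lake covers more than half of its area, and drop drains in cells that became lake cells. The functions `select_lake_cells` and `remove_overlapping_drains` are hypothetical, and the areas in the example are made up (625 m2 is a 25 x 25 m cell).

```python
def select_lake_cells(lake_area_per_cell, cell_area, threshold=0.5):
    """Cell ids whose lake-covered area exceeds threshold * cell area."""
    return sorted(
        cellid
        for cellid, lake_area in lake_area_per_cell.items()
        if lake_area > threshold * cell_area[cellid]
    )


def remove_overlapping_drains(drain_cells, lake_cells):
    """Drain cells that are not also lake cells."""
    lake = set(lake_cells)
    return [cellid for cellid in drain_cells if cellid not in lake]


lake_cells = select_lake_cells(
    {10: 600.0, 11: 300.0, 12: 625.0}, {10: 625.0, 11: 625.0, 12: 625.0}
)
print(lake_cells, remove_overlapping_drains([9, 10, 11], lake_cells))
```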

## Improve model dataset
### Set the top of the model
The lake’s bottom elevation is defined by the top of the model. For this reason, we set the top of the model to 3.0 m MSL at the lake cells, which corresponds to the estimated lake bottom. The bottom elevation is particularly important in areas where the lake may dry out, which can also lead to convergence issues in the model. In this case, a bottom elevation of 3.0 m MSL is sufficiently deep to ensure that the lake does not dry out during the transient simulation.

In [None]:
top = nlmod.resample.structured_da_to_ds(nlmod.resample.fillnan_da(ahn), ds)
# the bottom of the lake is at 3.0 m NAP (MSL)
top[lak_grid.index] = 3.0
ds = nlmod.layers.set_model_top(ds, top)
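
Because the model top acts as the lake bottom, the volume of water in the lake follows from the stage and the top of the lake cells. A simplified sketch with a hypothetical `lake_volume` function, assuming vertical banks within each cell (MODFLOW 6 builds its own stage–volume relation, which this does not reproduce): with a bottom at 3.0 m MSL and a stage around 6 to 7 m MSL, every lake cell holds several metres of water, so none of them falls dry.

```python
def lake_volume(stage, cells):
    """Water volume (m3) of a lake for a given stage (m).

    cells: iterable of (cell_area_m2, bottom_elevation_m) tuples of the lake cells.
    Cells with a bottom above the stage are dry and contribute nothing.
    """
    return sum(area * max(stage - bottom, 0.0) for area, bottom in cells)


print(lake_volume(6.0, [(625.0, 3.0), (625.0, 3.0)]))
```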

### Set time dimension
Before downloading meteorological data, we define the time dimension of our model dataset. The simulation covers five years, from the beginning of 2020 to the beginning of 2025, using monthly stress periods. The model is initialized with a steady-state stress period representing the mean meteorological conditions of 2019.

In [None]:
time = pd.date_range("2020", "2025", freq="MS")
ds = nlmod.time.set_ds_time(ds, start="2019", time=time)
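
Assuming, as in nlmod, that every timestamp in `time` marks the end of a stress period and `start` the beginning of the first one, the stress period lengths can be checked with the standard library. The function `stress_period_lengths` is hypothetical; the dates reproduce `pd.date_range("2020", "2025", freq="MS")`.

```python
from datetime import datetime


def stress_period_lengths(start, ends):
    """Length in days of each stress period, from the model start and the end times."""
    bounds = [start] + list(ends)
    return [(b - a).days for a, b in zip(bounds[:-1], bounds[1:])]


# end times of the stress periods: 2020-01-01, 2020-02-01, ..., 2025-01-01
ends = [datetime(2020 + m // 12, m % 12 + 1, 1) for m in range(61)]
perlen = stress_period_lengths(datetime(2019, 1, 1), ends)
print(len(perlen), perlen[:3])
```

This gives 61 stress periods: one steady-state period covering 2019 (365 days), followed by 60 monthly periods.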

### Download recharge data
We use the method `nlmod.read.knmi.get_recharge` to download meteorological data. Instead of calculating recharge as precipitation minus evaporation, we retrieve both variables separately by setting `method="separate"`; the variable `recharge` then contains the precipitation, which is also used as rainfall on the lake. Since our model area is relatively small, we represent it using a single meteorological station and a single precipitation station (`most_common_station=True`). With `add_stn_dimensions=True`, the variables recharge and evaporation are defined per station rather than for each model cell. Finally, to adopt the new default and suppress warnings about hourly precision, we set `hourly_precision=True`.

In [None]:
rch_ds = nlmod.read.knmi.get_recharge(
    ds=ds,
    method="separate",
    most_common_station=True,
    add_stn_dimensions=True,
    hourly_precision=True,
    cachedir=cachedir,
    cachename="knmi",
)
ds.update(rch_ds)

ds["starting_head"] = xr.full_like(ds.botm, 5.0)

## Generate and run model
### Generate a FloPy sim and gwf 

In [None]:
sim = nlmod.sim.sim(ds)  # simulation
tdis = nlmod.sim.tdis(ds, sim)  # time discretization
ims = nlmod.sim.ims(
    sim,
    complexity="COMPLEX",
    inner_dvclose=0.01,  # needed so the lake water balance is correct
    outer_dvclose=0.01,  # needed so the lake water balance is correct
    # rcloserecord=[0.01, "STRICT"],
)  # ims solver
gwf = nlmod.gwf.gwf(ds, sim, under_relaxation=True)  # groundwater flow model
dis = nlmod.gwf.dis(ds, gwf)  # spatial discretization
npf = nlmod.gwf.npf(ds, gwf)  # node property flow
sto = nlmod.gwf.sto(ds, gwf)  # storage
ic = nlmod.gwf.ic(ds, gwf)  # initial conditions
oc = nlmod.gwf.oc(ds, gwf)  # output control

### Meteorological stresses

In [None]:
rch = nlmod.gwf.rch(ds, gwf)
evt = nlmod.gwf.evt(ds, gwf)

### Lake package for the Henschotermeer

In [None]:
lak_resistance = 10.0  # days
lak_grid["clake"] = lak_resistance
lak_grid["strt"] = 6.0
lak_grid["lakeout"] = -1
lak_grid["outlet_invert"] = 7.0
nlmod.gwf.lake_from_gdf(
    gwf,
    lak_grid,
    ds,
    rainfall=ds["recharge"].to_pandas().iloc[:, 0],
    evaporation=ds["evaporation"].to_pandas().iloc[:, 0] * 1.25,
    boundname_column="lake",
    # maximum_stage_change=0.000001,
    # maximum_iterations=200,
);
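
The resistance `clake` (10 days) characterises the lake bed. As a simplified sketch of its physical meaning, the exchange between the lake and the aquifer over a bed area A follows Q = A (h_lake - h_aquifer) / c. The function `lake_aquifer_exchange` and its sign convention (positive means the lake loses water) are hypothetical; MODFLOW 6 computes the actual lake–aquifer flow from the connection data of the LAK package.

```python
def lake_aquifer_exchange(area, h_lake, h_aquifer, resistance):
    """Flow from lake to aquifer (m3/d) through a lake bed with a resistance (d).

    Positive: the lake loses water to the aquifer.
    Negative: groundwater seeps into the lake.
    """
    conductance = area / resistance  # m2/d
    return conductance * (h_lake - h_aquifer)


# a 25 x 25 m lake cell with a bed resistance of 10 days
print(lake_aquifer_exchange(625.0, 6.5, 6.3, 10.0))  # lake infiltrates
print(lake_aquifer_exchange(625.0, 6.0, 6.4, 10.0))  # groundwater seeps in
```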

### Drain package for all other surface water

In [None]:
drn_resistance = 1.0  # days
drn_grid["cond"] = drn_grid.area / drn_resistance
spd = nlmod.gwf.surface_water.build_spd(drn_grid, "DRN", ds)
drn = flopy.mf6.ModflowGwfdrn(gwf, stress_period_data={0: spd})
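
The drain conductance is the area of each feature in a cell divided by the resistance of 1 day, so a feature covering a full 25 x 25 m cell gets 625 m2/d. MODFLOW 6's DRN package only removes water when the head is above the drain elevation, as sketched below with the hypothetical function `drain_flux` (the outflow is returned as a positive number, whereas the MODFLOW budget reports it as negative):

```python
def drain_flux(head, elev, cond):
    """Outflow through a drain cell (m3/d): cond * (head - elev) if head > elev, else 0."""
    return cond * (head - elev) if head > elev else 0.0


cond = 625.0 / 1.0  # area (m2) / resistance (d)
print(drain_flux(5.5, 5.0, cond))  # head above the drain: water is removed
print(drain_flux(4.8, 5.0, cond))  # head below the drain: no flow
```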

### Write input-files and run Modflow 6

In [None]:
nlmod.sim.write_and_run(sim, ds)

## Post-processing
### Plot the head in the top layer
We plot the time-averaged head in the first model layer, `BXz2`. This layer is absent in the western part of the model domain, which is why this area appears white in the figure. Surface water features (`bgt`) are plotted on top of the map for reference. The finer model cells along the shoreline of Lake Henschotermeer are clearly visible.

In [None]:
head = nlmod.gwf.get_heads_da(ds)
ax = nlmod.plot.map_array(head.sel(layer="BXz2").mean("time"), ds=ds)
bgt.plot(edgecolor='k', facecolor='none', ax=ax);

### Plot the Lake stage
If the parameter `boundname_column` is specified in `nlmod.gwf.lake_from_gdf`, a CSV file named `lak_STAGE.csv` is generated containing observations of lake stages. Below, we plot the lake stage. Due to the wet conditions during the winter of 2023/2024, the lake level rises to 7.0 m MSL. At this elevation, the weir becomes active, preventing any further significant increase in lake stage.

In [None]:
lak_stage = pd.read_csv(os.path.join(ds.model_ws, "lak_STAGE.csv"), index_col=0)
lak_stage.index = pd.to_datetime(ds.time.start) + pd.to_timedelta(lak_stage.index, "d")
lak_stage.columns = [x.capitalize() for x in lak_stage.columns]

f, ax = plt.subplots(figsize=(10, 6), layout="constrained")
lak_stage.plot(ax=ax)
ax.set_xlabel("")
ax.set_ylabel("lake stage (m MSL)")
ax.grid()
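
The index of `lak_STAGE.csv` contains simulation times in days since the start of the model, which the code above converts to dates with pandas. The same conversion with only the standard library, using a hypothetical helper `simulation_times_to_dates`:

```python
from datetime import datetime, timedelta


def simulation_times_to_dates(start, times_in_days):
    """Convert MODFLOW simulation times (days since start) to datetimes."""
    return [start + timedelta(days=t) for t in times_in_days]


# the end of the steady-state period (2019) and of the first monthly period
print(simulation_times_to_dates(datetime(2019, 1, 1), [365.0, 396.0]))
```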

### Plot a cross-section
We plot a west–east cross-section at y = 454 612, showing the model layers. We can see the Utrechtse Heuvelrug on the left (west), with decreasing surface level eastwards. The lake "Henschotermeer" is displayed in blue, using the mean lake stage over the simulation period.

In [None]:
f, ax = plt.subplots(figsize=(10, 6), layout="constrained")
line = [(extent[0], 454_612), (extent[1], 454_612)]
dcs = nlmod.plot.DatasetCrossSection(ds, line, ax=ax, zmin=-100)
dcs.plot_layers(colors=nlmod.read.regis.get_legend())
dcs.plot_grid(vertical=False, lw=0.5)
dcs.label_layers()

lak_gdf = lak_grid.dissolve("lake")


def plot_lake_in_cs(linestring, dcs, height, ax, color="C0", zorder=0):
    xy = (dcs.line.project(linestring.boundary.geoms[0]), dcs.zmin)
    width = dcs.line.project(linestring.boundary.geoms[1]) - xy[0]
    rect = matplotlib.patches.Rectangle(
        xy, width, height, color=color, zorder=zorder
    )
    ax.add_patch(rect)

# plot the lake, from the bottom of the cross-section up to the mean lake stage
for lake in bgt.loc[~bgt.lake.isna(), "lake"].unique():
    mean_lake_stage = lak_stage[lake].mean()
    height = mean_lake_stage - dcs.zmin
    for geom in lak_gdf.loc[[lake]].intersection(dcs.line).geometry.values:
        if geom.geom_type == "MultiLineString":
            for geom_line in geom.geoms:
                plot_lake_in_cs(geom_line, dcs, height, ax)
        else:
            plot_lake_in_cs(geom, dcs, height, ax)


axes_bounds = nlmod.plot.get_inset_map_bounds(ax, extent, height=0.3)
mapax = nlmod.plot.inset_map(ax, extent, axes_bounds=axes_bounds)
nlmod.plot.add_xsec_line_and_labels(line, ax, mapax)
ax.set_xlabel("x (length along cross-section, m)")
ax.set_ylabel("z (m MSL)");
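
`plot_lake_in_cs` positions each lake rectangle by projecting the end points of the intersection between the lake and the section line onto that line: the first projection gives the x-position and the difference between both projections gives the width. A stdlib stand-in for shapely's `LineString.project` on a straight two-point line (the function `project_onto_line` is hypothetical; like shapely, it clamps the result to the extent of the line):

```python
import math


def project_onto_line(point, line_start, line_end):
    """Distance along a straight line to the projection of point, clamped to the line."""
    (x0, y0), (x1, y1), (px, py) = line_start, line_end, point
    dx, dy = x1 - x0, y1 - y0
    length = math.hypot(dx, dy)
    distance = ((px - x0) * dx + (py - y0) * dy) / length
    return min(max(distance, 0.0), length)


start, end = (152_000.0, 454_612.0), (156_000.0, 454_612.0)
x_left = project_onto_line((153_500.0, 454_612.0), start, end)
width = project_onto_line((154_300.0, 454_612.0), start, end) - x_left
print(x_left, width)
```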

### Plot a water balance of the lake
The main advantage of using the LAK package is that the lake water balance is explicitly accounted for. This water balance is saved in the file `lak.bgt`. Below, we read this file and plot the results, with inflows to the lake as positive and outflows as negative values. The primary fluxes into and out of the lake are rainfall (blue) and evaporation (orange), which are model inputs. The calculated flux between the lake and the groundwater (GWF, brown) shows a substantial contribution of groundwater flow to the lake during the winter of 2023/2024. This inflow, together with rainfall, raises the lake stage, which results in a negative storage term (grey). From the winter of 2023/2024 onward, the weir becomes active, causing external outflow of water (red).

In [None]:
fname = os.path.join(ds.model_ws, "lak.bgt")
cbf_lak = nlmod.gwf.output.get_cellbudgetfile(fname=fname, ds=ds)

# read budgets
lake_bgt = {}
for rn in cbf_lak.get_unique_record_names():
    key = rn.strip().decode()
    lake_bgt[key] = cbf_lak.get_data(text=rn)

colors = {
    "GWF": "tab:brown",
    "RAINFALL": "tab:blue",
    "CONSTANT": "tab:purple",
    "EVAPORATION": "tab:orange",
    "STORAGE": "tab:gray",
    "FROM-MVR": "tab:green",
    "EXT-OUTFLOW": "tab:red",
}


for name in lak_grid["lake"].unique():
    lakeno = int(lak_grid.loc[lak_grid["lake"] == name, "lakeno"].iloc[0])
    # generate a DataFrame
    data = {}
    for key in lake_bgt.keys():
        data[key] = [x[x["node"] == lakeno + 1]["q"].sum() for x in lake_bgt[key]]

    index = pd.to_datetime(ds.time.start) + pd.to_timedelta(cbf_lak.get_times(), "d")
    df = pd.DataFrame(data, index=index)

    f, ax = plt.subplots(figsize=(10, 6), layout="constrained")

    dfs = df.copy()
    dfs.index = [pd.to_datetime(ds.time.start)] + list(df.index[:-1])
    df_block = pd.concat((df, dfs)).sort_index(kind="mergesort")

    inflow = df_block.where(df_block > 0, 0.0)
    inflow = inflow.loc[:, ~(inflow == 0).all()]

    outflow = df_block.where(df_block < 0, 0.0)
    outflow = outflow.loc[:, ~(outflow == 0).all()]

    ymax = max(inflow.sum(1).max(), outflow.sum(1).max())

    inflow.plot.area(color=colors, ax=ax, linewidth=0)
    handles_in, labels_in = ax.get_legend_handles_labels()
    ax.set_ylim(-ymax, ymax)
    outflow.plot.area(color=colors, ax=ax, linewidth=0)
    ax.set_ylim(-ymax, ymax)

    ax.set_xlim(ds.time.data[0], ds.time.data[-1])
    ax.set_ylabel("Outflow (negative) and inflow (positive), in m3/d")
    nlmod.plot.title_inside(name, ax=ax)

    # remove double legend entries
    handles_all, labels_all = ax.get_legend_handles_labels()

    handles = handles_in[::-1]
    labels = labels_in[::-1]

    # move storage to the middle of the legend (between inflows and outflows)
    if "STORAGE" in labels:
        index = labels.index("STORAGE")
        handles.append(handles.pop(index))
        labels.append(labels.pop(index))
    for handle, label in zip(handles_all, labels_all):
        if label not in labels:
            handles.append(handle)
            labels.append(label)

    ax.legend(handles, labels, loc=2)
    ax.grid()
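
Because all terms of the lake budget are stored with their sign (positive into the lake, negative out of it), the balance of a stress period can be checked by comparing total inflow and outflow. The sketch below uses the percent-discrepancy definition that MODFLOW uses in its list-file budgets, 100 (in - out) / ((in + out) / 2); the function `budget_summary` and the flux values are hypothetical.

```python
def budget_summary(fluxes):
    """Total inflow, total outflow (both m3/d, positive) and percent discrepancy."""
    total_in = sum(q for q in fluxes.values() if q > 0)
    total_out = -sum(q for q in fluxes.values() if q < 0)
    mean = (total_in + total_out) / 2
    discrepancy = 100.0 * (total_in - total_out) / mean if mean else 0.0
    return total_in, total_out, discrepancy


# made-up fluxes for one stress period in which the lake stage rises
fluxes = {
    "RAINFALL": 900.0,
    "GWF": 300.0,
    "EVAPORATION": -250.0,
    "STORAGE": -850.0,
    "EXT-OUTFLOW": -100.0,
}
print(budget_summary(fluxes))
```

A discrepancy close to zero indicates that the lake water balance closes; this can be used to check the effect of solver settings such as `inner_dvclose` and `outer_dvclose`.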