# Ensemble Groundwater Predictions (EGP)
*R.A. Collenteur, Eawag, 2025*

This notebook shows how to combine Pastas with meteorological ensemble forecasts to generate ensemble groundwater predictions (EGPs). The goal is to forecast the groundwater levels in a monitoring well in Gossau, Switzerland, one month ahead. ECMWF meteorological ensemble forecasts with 51 members are used as input and represent the uncertainty in the input data. The Pastas model is calibrated on the 10 years before the start of the forecast, using meteorological data provided by MeteoSwiss. This example uses meteorological forecasts, but the approach extends readily to other inputs, such as ensembles of pumping forecasts.

<div class="alert alert-info">
<b>Note:</b>
Collenteur et al. (In Preparation) Ensemble groundwater predictions (EGP) in alluvial aquifers in Switzerland.
</div>

## 0. Import Python Packages

In [None]:
import matplotlib.pyplot as plt
import pandas as pd

import pastas as ps

ps.set_log_level("ERROR")
ps.show_versions()

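## 1. Load data

We load the observed heads and the precipitation, evaporation, and temperature series from CSV files, and plot them for the period 2004-2014.
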
In [None]:
head = pd.read_csv("data_forecast/heads.csv", index_col=0, parse_dates=True).squeeze()
prec = pd.read_csv("data_forecast/prec.csv", index_col=0, parse_dates=True).squeeze()
evap = pd.read_csv("data_forecast/evap.csv", index_col=0, parse_dates=True).squeeze()
temp = pd.read_csv("data_forecast/temp.csv", index_col=0, parse_dates=True).squeeze()

ps.plots.series(
    head,
    [prec, evap, temp],
    tmin="2004",
    tmax="2014",
    titles=False,
    labels=["Head\n[m]", "Prec.\n[mm/d]", "Evap.\n[mm/d]", "Temp.\n[°C]"],
    table=True,
)
plt.tight_layout()

## 2. Make Pastas Model

We now make a Pastas model to simulate the heads for this monitoring well in Gossau. Only meteorological data, which are also available as forecast data, are used to model the groundwater levels. A nonlinear recharge model including a snow module is applied to compute the recharge. The model is calibrated on weekly groundwater level data in the period 2004-2014.

In [None]:
ml = ps.Model(head)
ml.add_stressmodel(
    ps.RechargeModel(
        prec,
        evap,
        rfunc=ps.Gamma(),
        recharge=ps.rch.FlexModel(snow=True),
        temp=temp,
        name="rch",
    )
)

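# Step 1: fix the snow threshold temperature and calibrate without a noise model
ml.set_parameter("rch_tt", vary=False)
ml.solve(tmin="2004", tmax="2014", report=True, fit_constant=False, freq_obs="7D")

# Step 2: add a noise model, fix the root-zone storage capacity, and
# recalibrate starting from the step-1 optimum (initial=False)
ml.add_noisemodel(ps.ArNoiseModel())
ml.set_parameter("rch_srmax", vary=False)
ml.solve(tmin="2004", tmax="2014", initial=False, fit_constant=False, freq_obs="7D")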

### Visualize the model results

In [None]:
ml.plots.results();

## 3. Prepare the forecast ensembles

Now that we have a calibrated Pastas model, we can prepare the forecast data used to generate the ensemble groundwater predictions. The forecast data must be organized as a dictionary with one item per stressmodel: the `key` is the stressmodel name (i.e., the same name as in `ml.stressmodels`) and the `value` is a list of `pandas.DataFrame` objects (here precipitation, evaporation, and temperature). Each `DataFrame` must have the same `DatetimeIndex`, denoting the days to generate the forecast for, and the same number of columns, where each column represents one ensemble member. All these properties are checked internally.

In [None]:
fc = {
    "rch": [
        pd.read_csv("data_forecast/ensemble_prec.csv", index_col=0, parse_dates=True),
        pd.read_csv("data_forecast/ensemble_evap.csv", index_col=0, parse_dates=True),
        pd.read_csv("data_forecast/ensemble_temp.csv", index_col=0, parse_dates=True),
    ]
}

fig, axes = plt.subplots(1, 3, figsize=(10, 3))
names = ["Cum. Precipitation", "Cum. Evaporation", "Temperature"]
for i in range(3):
    series = fc["rch"][i]
    if i < 2:
        series.cumsum().plot(legend=False, ax=axes[i], color="k", alpha=0.7)
    else:
        series.plot(legend=False, ax=axes[i], color="k", alpha=0.7)

    axes[i].set_title(names[i])
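
The structural requirements on the forecast dictionary can be sketched as a small standalone check. This is a minimal illustration using only pandas, not the check that Pastas performs internally; the helper `check_forecast_dict`, the member names, and the toy values and dates are all hypothetical.

```python
import pandas as pd


def check_forecast_dict(fc, stressmodel_names):
    """Hypothetical helper: validate the structure of a forecast dictionary.

    Illustrative sketch only, not the internal Pastas check.
    """
    for name, frames in fc.items():
        if name not in stressmodel_names:
            raise ValueError(f"'{name}' is not a known stressmodel name.")
        reference = frames[0]
        for frame in frames:
            if not isinstance(frame.index, pd.DatetimeIndex):
                raise ValueError(f"'{name}': index must be a DatetimeIndex.")
            if not frame.index.equals(reference.index):
                raise ValueError(f"'{name}': all indices must be identical.")
            if frame.shape[1] != reference.shape[1]:
                raise ValueError(f"'{name}': unequal number of ensemble members.")
    return True


# Toy forecast: 30 days and 3 ensemble members (the notebook uses 51 members).
index = pd.date_range("2014-01-01", periods=30, freq="D")
members = ["m0", "m1", "m2"]
toy_fc = {
    "rch": [
        pd.DataFrame(1.0, index=index, columns=members),  # precipitation
        pd.DataFrame(0.5, index=index, columns=members),  # evaporation
        pd.DataFrame(5.0, index=index, columns=members),  # temperature
    ]
}

is_valid = check_forecast_dict(toy_fc, stressmodel_names=["rch"])
```

Running a check like this before calling the forecast function can make mismatches, such as a forecast file with a missing ensemble member, easier to trace back to the offending input.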

## 4. Compute GWL forecasts

In [None]:
# Draw parameter sets
params = ml.solver.get_parameter_sample(n=10)

# Generate the forecast
df = ps.forecast(ml, fc, params=params, post_process=True)

df.head()

The returned variable `df` is a DataFrame containing the ensemble groundwater predictions. Its columns form a `MultiIndex` with three levels: the first level is the ensemble member (i.e., a single meteorological forecast), the second level is the parameter set (one of the `n` sets drawn above), and the third level holds three columns with the mean prediction and the lower and upper bounds of the prediction interval (`mean`, `lower_bound`, `upper_bound`).
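
To make the three-level column layout concrete, the sketch below builds a small synthetic DataFrame with the same kind of structure and selects from it by level. The sizes, values, member labels, and level names (`member`, `param`, `stat`) are assumptions chosen for illustration and need not match what Pastas returns; only the third-level labels follow the description above.

```python
import numpy as np
import pandas as pd

# Hypothetical sizes; the notebook uses 51 members and n=10 parameter sets.
members = ["m0", "m1", "m2"]
param_sets = [0, 1]
stats = ["mean", "lower_bound", "upper_bound"]

# Level names are illustrative assumptions.
columns = pd.MultiIndex.from_product(
    [members, param_sets, stats], names=["member", "param", "stat"]
)
index = pd.date_range("2014-01-01", periods=5, freq="D")

# Synthetic heads: each (member, param) pair gets its own constant mean level,
# with bounds 0.1 m below and above that mean.
n_pairs = len(members) * len(param_sets)
means = 10.0 + 0.05 * np.arange(n_pairs)
row = np.column_stack([means, means - 0.1, means + 0.1]).ravel()
toy = pd.DataFrame(np.tile(row, (len(index), 1)), index=index, columns=columns)

# Select one statistic across all members and parameter sets.
means_only = toy.loc[:, (slice(None), slice(None), "mean")]

# Envelope over all members and parameter sets for each time step.
lower = toy.loc[:, (slice(None), slice(None), "lower_bound")].min(axis=1)
upper = toy.loc[:, (slice(None), slice(None), "upper_bound")].max(axis=1)

print(toy.shape[1], means_only.shape[1])
```

Each `slice(None)` keeps every label on that level, so the tuple `(slice(None), slice(None), "mean")` reads as "all members, all parameter sets, mean only". Taking the minimum of the lower bounds and the maximum of the upper bounds gives the outer envelope of all prediction intervals.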

### Plot the forecast results

In [None]:
fig, ax = plt.subplots(1, 1, figsize=(10, 5))

ml.oseries.series.loc[df.index].plot(
    ax=ax, marker=".", color="k", linestyle="None", zorder=100
)

plt.fill_between(
    df.index,
    df.loc[:, (slice(None), slice(None), "lower_bound")].min(axis=1),
    df.loc[:, (slice(None), slice(None), "upper_bound")].max(axis=1),
    color="gray",
    alpha=0.5,
)

df.loc[:, (slice(None), slice(None), "mean")].plot(color="gray", legend=False, ax=ax)


plt.legend(
    ["Observations", "95% Prediction interval", "Ensemble Members"], loc="upper left"
)

plt.xlabel("Time")
plt.ylabel("Head [m]");