# Inputs for the Probable Maximum Flood (PMF)

# Probable Maximum Precipitation (PMP)

Probable Maximum Precipitation (PMP) is the theoretical maximum amount of precipitation that could occur at a specific location within a given period of time, considering the most extreme meteorological conditions. PMP is a critical parameter in hydrology, especially for the design of infrastructure such as dams, reservoirs, and drainage systems.

There are several methods for calculating PMP, each varying in complexity and the type of data used. The method currently implemented in `xHydro` is based on the approach outlined by [Clavet-Gaumont et al. (2017)](https://doi.org/10.1016/j.ejrh.2017.07.003). This method involves maximizing the precipitable water over a given location, which refers to the total water vapor in the atmosphere that could potentially be converted into precipitation under ideal conditions. By maximizing this value, the method estimates the maximum precipitation that could theoretically occur at the location.


In [None]:
from pathlib import Path

import hvplot.xarray
import matplotlib.pyplot as plt
import numpy as np
import pooch
import xarray as xr
import xclim

import xhydro as xh
from xhydro.testing.helpers import deveraux

## Acquiring data

The acquisition of climatological data is outside the scope of `xHydro`. However, some examples of how to obtain and handle such data are provided in the [GIS operations](gis.ipynb) and [Use Case Example](use_case.ipynb) notebooks. For this notebook, we will use a test dataset consisting of 2 years and 3x3 grid cells from CanESM5 climate model data. In a real application, it would be preferable to have as many years of data as possible.

To perform the analysis, certain climatological variables are required.

- **Daily Timestep Variables**:
    - `pr` → Precipitation flux
    - `snw` → Snow water equivalent
    - `hus` → Specific humidity for multiple pressure levels
    - `zg` → Geopotential height for multiple pressure levels

- **Fixed Field Variables**:
    - `orog` → Surface altitude

In cold regions, it may be necessary to split total precipitation into rainfall and snowfall components. Many climate models already provide this data separately. However, if this data is not directly available, libraries such as `xclim` can approximate the split using precipitation and temperature data.
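As a rough illustration of such a split, the sketch below partitions total precipitation using a single temperature threshold. The helper name `split_precipitation` and the 0 °C threshold are assumptions made for this example. `xclim` offers more refined approaches, and this is not its implementation.

```python
import numpy as np


def split_precipitation(pr, tas, thresh=0.0):
    """Split total precipitation into (rain, snow) with a binary temperature threshold.

    pr: total precipitation (mm); tas: mean air temperature (degC).
    Hypothetical helper, for illustration only.
    """
    pr = np.asarray(pr, dtype=float)
    tas = np.asarray(tas, dtype=float)
    # Everything falling below the threshold temperature is counted as snow
    snow = np.where(tas < thresh, pr, 0.0)
    rain = pr - snow
    return rain, snow


pr = np.array([5.0, 12.0, 0.0, 8.0])
tas = np.array([-3.0, 2.5, -1.0, 0.0])
rain, snow = split_precipitation(pr, tas)
```

By construction, `rain + snow` always equals the total precipitation, so no water is lost in the split.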

In [None]:
path_day_zip = deveraux().fetch(
    "pmp/CMIP.CCCma.CanESM5.historical.r1i1p1f1.day.gn.zarr.zip",
    pooch.Unzip(),
)
ds_day = xr.open_zarr(Path(path_day_zip[0]).parents[0])

path_fx_zip = deveraux().fetch(
    "pmp/CMIP.CCCma.CanESM5.historical.r1i1p1f1.fx.gn.zarr.zip",
    pooch.Unzip(),
)
ds_fx = xr.open_zarr(Path(path_fx_zip[0]).parents[0])

# There are a few issues with attributes in this dataset that we need to address
ds_day["pr"].attrs = {"units": "mm", "long_name": "precipitation"}
ds_day["prsn"].attrs = {"units": "mm", "long_name": "snowfall"}
ds_day["rf"].attrs = {"units": "mm", "long_name": "rainfall"}

# Combine both datasets
ds = ds_day.convert_calendar("standard")
ds["orog"] = ds_fx["orog"]
ds

## Computing the PMP

The method outlined by [Clavet-Gaumont et al. (2017)](https://doi.org/10.1016/j.ejrh.2017.07.003) follows these steps:

1. **Identification of Major Precipitation Events**:  
   The first step involves identifying the major precipitation events that will be maximized. This is done by filtering events based on a specified threshold.

2. **Computation of Monthly 100-Year Precipitable Water**:  
   The next step involves calculating the 100-year precipitable water on a monthly basis using the Generalized Extreme Value (GEV) distribution, capped at 20% above the largest observed value.

3. **Maximization of Precipitation During Events**:  
   In this step, the precipitation events are maximized based on the ratio between the 100-year monthly precipitable water and the precipitable water during the major precipitation events. In snow-free regions, this is the final result.

4. **Seasonal Separation in Cold Regions**:  
   In cold regions, the results are separated into seasons (e.g., spring, summer) to account for snow during the computation of Probable Maximum Floods (PMF).

Together, these steps yield a PMP estimate that accounts for the moisture available in the atmosphere and, in cold regions, for the season in which an event occurs.


### Major precipitation events

The first step in calculating the Probable Maximum Precipitation (PMP) involves filtering the precipitation data to retain only the events that exceed a certain threshold. These major precipitation events will be maximized in subsequent steps. The function `xh.indicators.pmp.major_precipitation_events` can be used for this purpose. It also provides the option to sum precipitation over a specified number of days, which can help aggregate storm events. For 2D data, such as in this example, each grid point is treated independently.

In this example, we will retain only the 10% most intense storms (`quantile=0.9`) to avoid overemphasizing smaller precipitation events during the maximization process. Additionally, we will focus on rainfall (`rf`) rather than total precipitation (`pr`) to exclude snowstorms and ensure that we are only considering liquid precipitation events.
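The idea behind a quantile-based filter can be sketched with plain NumPy on a synthetic series. Whether `xHydro` computes the quantile over all days or only wet days, and whether the comparison is inclusive, is not stated here, so both choices below are assumptions.

```python
import numpy as np

# Synthetic daily rainfall (mm)
rainfall = np.array([0.0, 2.0, 15.0, 1.0, 40.0, 3.0, 0.5, 22.0, 4.0, 60.0])

# Threshold at the 90th percentile of the whole series (assumption)
threshold = np.quantile(rainfall, 0.9)

# Keep events at or above the threshold; everything else becomes NaN
major_events = np.where(rainfall >= threshold, rainfall, np.nan)
```

Days that do not qualify are set to NaN rather than dropped, which keeps the time axis intact for the later steps.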


In [None]:
help(xh.indicators.pmp.major_precipitation_events)

In [None]:
precipitation_events = xh.indicators.pmp.major_precipitation_events(
    ds.rf, windows=[1], quantile=0.9
)

ds.rf.isel(x=1, y=1).hvplot() * precipitation_events.isel(
    x=1, y=1, window=0
).hvplot.scatter(color="red")

### Daily precipitable water

<div class="alert alert-warning"> <b>WARNING</b>
    
This step should be avoided if possible, as it involves approximating precipitable water from the integral of specific humidity and will be highly sensitive to the number of pressure levels used. If available, users are strongly encouraged to use a variable or combination of variables that directly represent precipitable water.

</div>

Precipitable water can be estimated using `xhydro.indicators.pmp.precipitable_water` by integrating the vertical column of humidity. This process requires specific humidity, geopotential height, and elevation data. The resulting value represents the total amount of water vapor that could potentially be precipitated from the atmosphere under ideal conditions.
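The principle of integrating the humidity column can be illustrated with the textbook relation $PW = \frac{1}{g}\int q\,dp$. The sketch below assumes that the pressure of each level is known, whereas `precipitable_water` works from geopotential height and elevation. It is therefore not the `xHydro` implementation, and the helper `column_water` and the profile values are hypothetical.

```python
import numpy as np

G = 9.80665  # standard gravity (m s-2)


def column_water(hus, pressure):
    """Integrate specific humidity (kg/kg) over pressure (Pa) with the trapezoidal rule.

    Returns precipitable water in kg m-2, numerically equal to mm of liquid water.
    """
    hus = np.asarray(hus, dtype=float)
    pressure = np.asarray(pressure, dtype=float)
    layer_mean_hus = 0.5 * (hus[1:] + hus[:-1])
    layer_thickness = np.abs(np.diff(pressure))
    return float(np.sum(layer_mean_hus * layer_thickness) / G)


# Hypothetical profile from 1000 hPa up to 300 hPa
pressure = np.array([100000.0, 85000.0, 70000.0, 50000.0, 30000.0])
hus = np.array([0.012, 0.008, 0.005, 0.002, 0.0003])
pw_mm = column_water(hus, pressure)
```

Because each layer is represented by the mean of its two bounding levels, removing levels from the profile changes the result, which is the sensitivity mentioned in the warning above.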


In [None]:
help(xh.indicators.pmp.precipitable_water)

In [None]:
pw = xh.indicators.pmp.precipitable_water(
    hus=ds.hus,
    zg=ds.zg,
    orog=ds.orog,
    windows=[1],
    add_pre_lay=False,
)

pw.isel(x=1, y=1, window=0).hvplot()

### Monthly 100-year precipitable water

According to Clavet-Gaumont et al. (2017), a monthly 100-year precipitable water must be computed using the Generalized Extreme Value (GEV) distribution. The value should be limited to a maximum of 20% greater than the largest observed precipitable water value for a given month. This approach ensures that the estimated 100-year event is realistic and constrained by observed data.

To compute this, you can use the `xh.indicators.pmp.precipitable_water_100y` function. If using `rebuild_time`, the output will have the same time axis as the original data.
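The effect of the 20% cap can be shown with a few made-up numbers. The fitted values below are hypothetical stand-ins for the output of a GEV fit, and the variable names are chosen for this example only.

```python
import numpy as np

mf = 0.2  # maximum allowed exceedance of the observed maximum (20%)

# Hypothetical values for three months (mm)
pw_max_observed = np.array([20.0, 35.0, 50.0])
pw100_fitted = np.array([22.0, 45.0, 58.0])  # stand-in for GEV estimates

# The 100-year value may not exceed (1 + mf) times the observed maximum
cap = (1.0 + mf) * pw_max_observed
pw100_capped = np.minimum(pw100_fitted, cap)
```

Only the second month is affected here: its fitted value exceeds the cap, so the cap is used instead.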


In [None]:
help(xh.indicators.pmp.precipitable_water_100y)

In [None]:
pw100 = xh.indicators.pmp.precipitable_water_100y(
    pw.sel(window=1).chunk(dict(time=-1)),
    dist="genextreme",
    method="ML",
    mf=0.2,
    rebuild_time=True,
).compute()

pw.isel(x=1, y=1, window=0).hvplot() * pw100.isel(x=1, y=1).hvplot()

### Maximized precipitation

<div class="alert alert-info"> <b>INFO</b>
    
This step follows the methodology described in Clavet-Gaumont et al. (2017). Although it is referred to as "maximizing precipitation", it effectively applies a ratio based on the monthly 100-year precipitable water. If the precipitable water during a historical event exceeded this value, as observed for January 2011, the result may actually lower the precipitation rather than increase it.

</div>

With the information gathered so far, we can now proceed to maximize the precipitation events. Although `xHydro` does not provide an explicit function for this step, it can be accomplished by following these steps:

1. **Compute the Ratio**: First, calculate the ratio between the 100-year monthly precipitable water and the precipitable water during the major precipitation events.
   
2. **Apply the Ratio**: Next, apply this ratio to the precipitation values themselves to maximize the precipitation events accordingly.

This process effectively scales the precipitation events based on the 100-year precipitable water, giving an estimate of the maximum possible rainfall.
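A small worked example with made-up numbers shows how the ratio behaves, including the case where it falls below 1. All values and variable names are hypothetical.

```python
import numpy as np

pw100_month = np.array([30.0, 30.0, 30.0])  # 100-year precipitable water (mm)
pw_event = np.array([20.0, 25.0, 40.0])     # precipitable water on event days (mm)
rain_event = np.array([50.0, 80.0, 60.0])   # event rainfall (mm)

# Ratio between the 100-year value and the value observed during each event
ratio = pw100_month / pw_event

# Scale each event by its ratio
rain_maximized = ratio * rain_event
```

The third event had more precipitable water than the 100-year value, so its ratio is below 1 and the "maximized" rainfall ends up lower than the observed one.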


In [None]:
# Precipitable water on the day of the major precipitation events.
pw_events = pw.where(precipitation_events > 0)
ratio = pw100 / pw_events

# Apply the ratio onto precipitation itself
precipitation_max = ratio * precipitation_events
precipitation_max.name = "maximized_precipitation"

ds.rf.isel(x=1, y=1).hvplot() * precipitation_max.isel(
    x=1, y=1, window=0
).hvplot.scatter(color="red")

### Seasonal Mask

In cold regions, computing Probable Maximum Floods (PMFs) often involves scenarios that combine both rainfall and snowpack. Therefore, PMP values may need to be separated into two categories: rain-on-snow (i.e., "spring") and snow-free rainfall (i.e., "summer").

This can be computed easily using `xhydro.indicators.pmp.compute_spring_and_summer_mask`, which defines the start and end dates of spring, summer, and winter based on the presence of snow on the ground, with the following criteria:

1. **Winter**:  
   - Winter start: The first day after which there are at least 14 consecutive days with snow on the ground.  
   - Winter end: The last day with snow on the ground, followed by at least 45 consecutive snow-free days.

2. **Spring**:  
   - Spring start: 60 days before the end of winter.
   - Spring end: 30 days after the end of winter.

3. **Summer**:  
   - The summer period is defined as the time between winters. This period is not influenced by whether it falls in the traditional summer or fall seasons, but rather simply marks the interval between snow seasons.
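The criteria above can be sketched for a single synthetic winter. This is a simplified reading of the rules, not the `xHydro` implementation: the helper `first_run_start` is hypothetical, and edge cases such as several winters or missing data are ignored.

```python
import numpy as np


def first_run_start(flags, length):
    """Index of the first position that starts `length` consecutive True values, or None."""
    flags = np.asarray(flags, dtype=int)
    if flags.size < length:
        return None
    # Each entry is the number of True values in the window starting at that index
    run_sums = np.convolve(flags, np.ones(length, dtype=int), mode="valid")
    hits = np.flatnonzero(run_sums == length)
    return int(hits[0]) if hits.size else None


# Synthetic daily series: snow on the ground from day 60 to day 219
snow_on_ground = np.zeros(365, dtype=bool)
snow_on_ground[60:220] = True

# Winter start: first day that begins at least 14 consecutive snow days
winter_start = first_run_start(snow_on_ground, 14)

# Winter end: last snow day before the first run of 45 snow-free days
first_free = first_run_start(~snow_on_ground[winter_start:], 45)
winter_end = winter_start + first_free - 1

# Spring: 60 days before to 30 days after the end of winter
spring_start = winter_end - 60
spring_end = winter_end + 30
```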


In [None]:
help(xh.indicators.pmp.compute_spring_and_summer_mask)

In [None]:
mask = xh.indicators.pmp.compute_spring_and_summer_mask(
    ds.snw,
    thresh="1 cm",
    window_wint_end=14,  # Since the dataset used does not have a lot of snow, we need to be more lenient
    freq="YS-SEP",
)

mask

In [None]:
xclim.core.units.convert_units_to(
    ds.isel(x=1, y=1).snw, "cm", context="hydro"
).hvplot() * (mask.mask_spring.isel(x=1, y=1) * 10).hvplot() * (
    mask.mask_summer.isel(x=1, y=1) * 8
).hvplot()

### Final PMP

The final PMP is obtained by finding the maximum value over the `time` dimension. In our case, since we computed a season mask, we can further refine the results into a spring and summer PMP.
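The masking and reduction can be illustrated on a tiny synthetic array. The sketch assumes a mask that is 1 inside the season and NaN outside, which may differ from the exact encoding returned by `xHydro`.

```python
import numpy as np

# Hypothetical maximized precipitation with dimensions (time, cell), in mm
precip_max = np.array(
    [
        [np.nan, 40.0],
        [55.0, np.nan],
        [70.0, 65.0],
        [np.nan, 90.0],
    ]
)

# Hypothetical spring mask: 1 inside the season, NaN outside (assumption)
mask_spring = np.array(
    [
        [1.0, 1.0],
        [1.0, 1.0],
        [np.nan, np.nan],
        [np.nan, np.nan],
    ]
)

# Events outside the season become NaN and are ignored by the maximum
pmp_spring_demo = np.nanmax(precip_max * mask_spring, axis=0)
```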

In [None]:
pmp_spring = (precipitation_max * mask.mask_spring).max("time").compute()
pmp_summer = (precipitation_max * mask.mask_summer).max("time").compute()

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

pmp_spring.sel(window=1).plot(ax=axes[0], vmin=30, vmax=100)
axes[0].set_title("Spring PMP")

pmp_summer.sel(window=1).plot(ax=axes[1], vmin=30, vmax=100)
axes[1].set_title("Summer PMP")

## PMPs with aggregated storm configurations

In some cases, it may be preferable to avoid processing each grid cell independently. Instead, storms can be aggregated using various configurations to provide a more regionally representative estimate. These configurations allow for the spatial averaging of storm events, which can help reduce variability across grid cells and yield more reliable results.

Different aggregation configurations are discussed in Clavet-Gaumont et al. (2017) and have been implemented in `xHydro` under the function `xhydro.indicators.pmp.spatial_average_storm_configurations`.

Note that precipitable water must first be calculated in a distributed manner and then spatially averaged to obtain the aggregated precipitable water.
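One way to picture spatial averaging over a storm configuration is a moving block mean. The sketch below averages each 2x2 block of a small grid. The helper `block_mean_2x2` and the block shape are assumptions for illustration and do not reproduce the configurations implemented in `xHydro`.

```python
import numpy as np


def block_mean_2x2(field):
    """Mean of each 2x2 block anchored at its upper-left cell.

    Cells whose block would extend beyond the grid are set to NaN.
    """
    field = np.asarray(field, dtype=float)
    out = np.full(field.shape, np.nan)
    out[:-1, :-1] = (
        field[:-1, :-1] + field[1:, :-1] + field[:-1, 1:] + field[1:, 1:]
    ) / 4.0
    return out


# Precipitable water computed per grid cell first, then averaged (hypothetical values)
pw_grid = np.arange(9, dtype=float).reshape(3, 3)
pw_grid_avg = block_mean_2x2(pw_grid)
```

In this sketch, the last row and column cannot host a complete block, so they contain NaN values after averaging.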


In [None]:
help(xh.indicators.pmp.spatial_average_storm_configurations)

In [None]:
ds_agg = []
for variable in ["rf", "pw", "snw"]:
    # Precipitable water was computed above; the other variables come from the dataset
    da = pw if variable == "pw" else ds[variable]
    ds_agg.append(
        xh.indicators.pmp.spatial_average_storm_configurations(da, radius=3)
    )
ds_agg = xr.merge(ds_agg).chunk(dict(time=-1))

# The aggregation creates NaN values for snow, so we'll restrict the domain
ds_agg = ds_agg.isel(y=slice(0, -1), x=slice(0, -1))

ds_agg

After applying storm aggregation, the subsequent steps remain the same as before, following the standard PMP calculation process outlined earlier.

In [None]:
pe_agg = xh.indicators.pmp.major_precipitation_events(
    ds_agg.rf, windows=[1], quantile=0.9
)

pw100_agg = xh.indicators.pmp.precipitable_water_100y(
    ds_agg.sel(window=1).precipitable_water, dist="genextreme", method="ML", mf=0.2
)

# Maximization
pw_events_agg = ds_agg.precipitable_water.where(pe_agg > 0)
r_agg = pw100_agg / pw_events_agg

pmax_agg = r_agg * pe_agg

# Season mask
mask_agg = xh.indicators.pmp.compute_spring_and_summer_mask(
    ds_agg.snw,
    thresh="1 cm",
    window_wint_start=14,
    window_wint_end=14,
    spr_start=60,
    spr_end=30,
    freq="YS-SEP",
)

pmp_spring_agg = pmax_agg * mask_agg.mask_spring
pmp_summer_agg = pmax_agg * mask_agg.mask_summer

pmp_summer_agg

Previously, the final PMP for each season was obtained by taking the maximum value over the `time` dimension. In this updated approach, we can now take the maximum across both the `time` and `conf` dimensions, using our multiple storm configurations.
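Reducing over two dimensions at once can be sketched with NumPy on a small synthetic array. The dimension order `(conf, time, cell)` and the values are assumptions for this example.

```python
import numpy as np

# Hypothetical maximized precipitation with dimensions (conf, time, cell), in mm
values = np.array(
    [
        [[40.0, np.nan], [np.nan, 55.0]],  # configuration A
        [[45.0, 30.0], [np.nan, 50.0]],  # configuration B
    ]
)

# Maximum over both the configuration and time axes, ignoring NaN values
pmp_demo = np.nanmax(values, axis=(0, 1))
```

Each cell keeps the largest value found in any configuration at any time step.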


In [None]:
# Final results
pmp_spring_agg = pmp_spring_agg.max(dim=["time", "conf"])
pmp_summer_agg = pmp_summer_agg.max(dim=["time", "conf"])

pmp_summer_agg

In [None]:
# Compare results for the central grid cell
print(
    f"Grid-cell summer PMP: {np.round(pmp_summer.isel(x=1, y=1, window=0).values, 1)} mm"
)
print(
    f"Aggregated summer PMP: {np.round(pmp_summer_agg.isel(x=1, y=1, window=0).values, 1)} mm"
)

## Probable Maximum Snow Accumulation (PMSA)

The PMSA represents the theoretical maximum snow water equivalent (SWE) that could accumulate in a year under the most extreme meteorological conditions. The method currently implemented in `xHydro` follows the approach introduced by [Klein et al. (2016)](https://doi.org/10.1016/j.jhydrol.2016.03.031). In this study, the authors developed a PMSA estimation technique using simulated data from regional climate models to maximize the precipitable water leading to snowfall.


The method outlined by [Klein et al. (2016)](https://doi.org/10.1016/j.jhydrol.2016.03.031) follows these steps:

1. **Identification of the Precipitable Water leading to Snowfall**:  
   Precipitable water data are filtered using specified thresholds for rainfall and snowfall to isolate values associated with snowfall events.

2. **Computation of Monthly 100-Year Precipitable Water leading to Snowfall**:  
   Monthly 100-year precipitable water values associated with snowfall are estimated using the Gamma distribution.

3. **Maximization of Snowfall Events**:  
   Snowfall events are maximized based on the ratio between the 100-year monthly precipitable water leading to snowfall and the precipitable water observed during snowfall events.

4. **PMSA estimation**:  
   For each winter, both maximized and non-maximized snowfall events are aggregated. The PMSA is then defined as the highest total among all winters.



### Identification of the Precipitable Water leading to Snowfall

[Klein et al. (2016)](https://doi.org/10.1016/j.jhydrol.2016.03.031) proposed that only precipitable water values from time steps with at least 0.25 mm/6h of solid precipitation (expressed as water equivalent) and less than 0.1 mm/6h of rain should be considered. Since in this example we are working with daily time steps, these thresholds were adjusted to 1 mm/day and 0.4 mm/day, respectively.
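The adjustment from 6-hourly to daily thresholds amounts to a simple scaling, shown below. It assumes that the thresholds scale linearly with the length of the time step, and the variable names are chosen for this example only.

```python
steps_per_day = 24 // 6  # number of 6-hour steps in one day

prsn_threshold_6h = 0.25  # mm per 6 h (snowfall, water equivalent)
prra_threshold_6h = 0.1  # mm per 6 h (rain)

# Linear scaling to a daily time step (assumption)
prsn_threshold_daily = prsn_threshold_6h * steps_per_day
prra_threshold_daily = prra_threshold_6h * steps_per_day
```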

In [None]:
prsn_threshold = "1 mm"
prra_threshold = "0.4 mm"

In some cases, snowfall and rainfall occur during the same time step, and it is typically not possible to tell how much of the precipitable water contributed to each. To avoid including precipitable water values associated with rainfall events, the authors proposed the following three methods:

   - **m1**: Only snowfall events that are not accompanied by rain are considered for maximization. For the precipitable water corresponding to a given event, only time steps with at least the minimum amount of snowfall and less than the minimum amount of rain are considered.
   - **m2**: Snowfall events that exceed the established threshold for maximization (i.e., the minimum amount of snowfall) are maximized, regardless of whether they are accompanied by rain. For the precipitable water corresponding to a given event, only time steps with at least the minimum amount of snowfall are considered.
   - **m3**: Same events as those selected for m2, but if more than the minimum amount of rain also occurs, the precipitable water is multiplied by the ratio of snowfall (mm water equivalent) to total precipitation. This multiplication provides a means of estimating the amount of precipitable water that leads to snowfall.
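The three selection rules can be sketched with NumPy as follows. This is one reading of the descriptions above, not the `xHydro` implementation. The helper `pw_leading_to_snowfall` and the sample values are hypothetical.

```python
import numpy as np


def pw_leading_to_snowfall(pw, snow, rain, snow_min, rain_min, method):
    """Select precipitable water associated with snowfall; other steps become NaN."""
    pw, snow, rain = (np.asarray(a, dtype=float) for a in (pw, snow, rain))
    snowing = snow >= snow_min
    if method == "m1":
        # Snowfall with less than the minimum amount of rain
        return np.where(snowing & (rain < rain_min), pw, np.nan)
    if method == "m2":
        # Snowfall, whether or not it is accompanied by rain
        return np.where(snowing, pw, np.nan)
    if method == "m3":
        # As m2, but scaled by the snow fraction when rain exceeds its threshold
        total = snow + rain
        safe_total = np.where(total > 0, total, 1.0)
        fraction = np.where(rain > rain_min, snow / safe_total, 1.0)
        return np.where(snowing, pw * fraction, np.nan)
    raise ValueError(f"Unknown method: {method}")


pw_demo = [10.0, 12.0, 8.0, 15.0]  # precipitable water (mm)
snow_demo = [2.0, 3.0, 0.5, 4.0]  # snowfall (mm water equivalent)
rain_demo = [0.0, 1.0, 0.0, 0.2]  # rainfall (mm)

results = {
    m: pw_leading_to_snowfall(pw_demo, snow_demo, rain_demo, 1.0, 0.4, m)
    for m in ("m1", "m2", "m3")
}
```

The second time step is the only one with mixed precipitation: m1 discards it, m2 keeps its full precipitable water, and m3 keeps three quarters of it.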

We start by calculating the precipitation events and their rain and snow components for a given accumulation. 

<div class="alert alert-info"> <b>INFO</b>
    
To keep the example simple, the duration of the precipitation event is set to one time step (`windows=[1]`). When computing the PMSA, users should pay close attention to the window parameter, as it can significantly influence the results. [Klein et al. (2016)](https://doi.org/10.1016/j.jhydrol.2016.03.031) suggest using an algorithm that determines the actual duration of every simulated snowfall rather than relying on predefined durations.

</div>

In [None]:
precipitation_events = xh.indicators.pmp.major_precipitation_events(ds.pr, windows=[1])
prra_events = xh.indicators.pmp.major_precipitation_events(ds.rf, windows=[1])
prsn_events = xh.indicators.pmp.major_precipitation_events(ds.prsn, windows=[1])

In [None]:
pw_snowfall_m1 = xh.indicators.pmp.pw_snowfall(
    pw,
    method="m1",
    prsn_events=prsn_events,
    prsn_threshold=prsn_threshold,
    prra_events=prra_events,
    prra_threshold=prra_threshold,
)
pw_snowfall_m2 = xh.indicators.pmp.pw_snowfall(
    pw, method="m2", prsn_events=prsn_events, prsn_threshold=prsn_threshold
)
pw_snowfall_m3 = xh.indicators.pmp.pw_snowfall(
    pw,
    method="m3",
    prsn_events=prsn_events,
    prsn_threshold=prsn_threshold,
    prra_events=prra_events,
    prra_threshold=prra_threshold,
    pr_events=precipitation_events,
)

pw.isel(x=1, y=1, window=0).hvplot(label="pw") * pw_snowfall_m1.isel(
    x=1, y=1, window=0
).hvplot.scatter(color="purple", marker="o", size=75, label="M1") * pw_snowfall_m2.isel(
    x=1, y=1, window=0
).hvplot.scatter(
    color="green", marker="v", size=50, label="M2"
) * pw_snowfall_m3.isel(
    x=1, y=1, window=0
).hvplot.scatter(
    color="red", marker="*", label="M3"
)

As suggested by [Klein et al. (2016)](https://doi.org/10.1016/j.jhydrol.2016.03.031), this example uses method m3.

### Computation of Monthly 100-Year Precipitable Water leading to Snowfall (pw<sub>100</sub>)

[Klein et al. (2016)](https://doi.org/10.1016/j.jhydrol.2016.03.031) recommend fitting a non-stationary Gamma distribution. Currently, only the stationary Gamma distribution is implemented, but a non-stationary version is planned for a future release. The GEV distribution can also be used for extrapolation purposes, and a non-stationary version of it is available in `xHydro`.

[Klein et al. (2016)](https://doi.org/10.1016/j.jhydrol.2016.03.031) also suggest using the maximum observed precipitable water as pw<sub>100</sub> when there are fewer than 20 data points in a month (i.e., `n=20`). Furthermore, in line with their methodology, pw<sub>100</sub> is not capped (i.e., `mf=None`). However, this assumption should be reviewed for each specific case.

In [None]:
pw100_snow_events_m3 = xh.indicators.pmp.precipitable_water_100y(
    pw_snowfall_m3.sel(window=1).chunk(dict(time=-1)),
    dist="gamma",
    method="ML",
    mf=None,
    n=20,
    rebuild_time=True,
).compute()

pw.isel(x=0, y=1, window=0).hvplot(label="pw") * pw100_snow_events_m3.isel(
    x=0, y=1
).hvplot(label="pw₁₀₀") * pw_snowfall_m3.isel(x=0, y=1, window=0).hvplot.scatter(
    color="red", marker="*", label="m3"
)

### Maximization of Snowfall Events

With the information gathered so far, we can now proceed to maximize the snowfall events. Although `xHydro` does not provide an explicit function for this step, it can be accomplished by following these steps:

1. **Compute the maximization ratio (r)**: Calculate the ratio between the 100-year monthly precipitable water leading to snowfall and the precipitable water during snowfall events.

2. **Limit the ratio (r)**: Limit the ratio to 1≤r≤2.5 as suggested by [Klein et al. (2016)](https://doi.org/10.1016/j.jhydrol.2016.03.031).
   
3. **Apply the ratio**:

    [Klein et al. (2016)](https://doi.org/10.1016/j.jhydrol.2016.03.031) suggest maximizing only those snowfall events that exceed a given threshold for a specified duration. Therefore, only events that meet this criterion are multiplied by the corresponding maximization ratio (r). If the criterion is not met, the snowfall event is not maximized.

    The authors used different thresholds depending on the duration of the event:

    - 6-hour duration: 3, 4, 5, 6 mm
    - 12-hour duration: 4, 5, 6, 7 mm
    - 24-hour duration: 5, 6, 7, 8 mm

    In this example, a 6-mm threshold was selected for the 1-day duration.
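The three steps above can be illustrated with a few made-up events. The values and variable names are hypothetical. Events below the threshold are left unchanged.

```python
import numpy as np

limit_demo = 2.5
threshold_demo = 6.0  # mm, maximization threshold for a 1-day duration

snowfall = np.array([7.0, 4.0, 10.0, 8.0])  # mm water equivalent
ratio = np.array([0.8, 1.5, 3.0, 1.4])  # raw maximization ratios

# Limit the ratio to 1 <= r <= 2.5
ratio_limited = np.clip(ratio, 1.0, limit_demo)

# Maximize only the events that reach the threshold
maximized = np.where(snowfall >= threshold_demo, ratio_limited * snowfall, snowfall)
```

The first event keeps its original value because its ratio is raised to 1, the second is too small to be maximized, and the third is scaled by the upper limit of 2.5 instead of its raw ratio of 3.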


In [None]:
limit = 2.5
maximization_threshold = "6 mm"
maximization_threshold_converted = xclim.core.units.convert_units_to(
    maximization_threshold, prsn_events
)  # Convert thresholds to match the data's units

r = pw100_snow_events_m3 / pw_snowfall_m3
# Limit the ratio to 1 <= r <= limit; NaN values are left untouched
r_limited = r.clip(min=1, max=limit)

max_snow_events = xr.where(
    prsn_events >= maximization_threshold_converted, r_limited * prsn_events, np.nan
)
max_snow_events.name = "max_snow_events"
non_max_snow_events = prsn_events.where(prsn_events < maximization_threshold_converted)
non_max_snow_events.name = "non_max_snow_events"

The maximized snow accumulation (MSA) is the sum of the maximized and non-maximized events throughout a winter.

In [None]:
snow_sum = xr.concat([non_max_snow_events, max_snow_events], dim="variable").sum(
    "variable"
)

snow_sum.isel(x=1, y=1, window=0).hvplot(
    label="Maximized snowfall", color="red"
) * max_snow_events.isel(x=1, y=1, window=0).hvplot.scatter(
    color="red", label="Maximized events"
) * prsn_events.isel(
    x=1, y=1, window=0
).hvplot(
    label="Snowfall", color="#30a2da", line_dash="dashed"
)

### PMSA estimation

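The PMSA is the highest seasonal total among all winters. Here, each winter is delimited by a year starting in July (`"YS-JUL"` resampling), so that a snow season is not split across two calendar years.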
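The aggregation per winter followed by a maximum can be sketched with NumPy on a short synthetic series. The winter labels and values are hypothetical.

```python
import numpy as np

# Hypothetical snowfall series (mm) and the winter each time step belongs to
snow_series = np.array([10.0, 25.0, 5.0, 30.0, 20.0, 15.0])
winter_index = np.array([0, 0, 0, 1, 1, 1])

# Total accumulation for each winter
winter_totals = np.bincount(winter_index, weights=snow_series)

# The PMSA is the largest winter total
pmsa_demo = winter_totals.max()
```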

In [None]:
pmsa = snow_sum.resample(time="YS-JUL").sum(dim="time").max(dim="time")
pmsa

In [None]:
fig, ax = plt.subplots(figsize=(6, 5))
pmsa.sel(window=1).plot(ax=ax)
ax.set_title("PMSA")