# Hydrological modelling - Raven (distributed)

`xHydro` provides a collection of functions designed to facilitate hydrological modelling, focusing on two key models: [HYDROTEL](https://github.com/INRS-Modelisation-hydrologique/hydrotel) and a suite of models emulated by the [Raven Hydrological Framework](https://raven.uwaterloo.ca/). Note that Raven already has an extensive Python library, [RavenPy](https://github.com/CSHS-CWRA/RavenPy), which enables users to build, calibrate, and execute models. `xHydro` wraps some of these functions to support multi-model assessments with HYDROTEL, though users seeking advanced functionalities may prefer to use `RavenPy` directly.

The primary contribution of `xHydro` to hydrological modelling is thus its support for HYDROTEL, a model that previously lacked a dedicated Python library. This Notebook covers `RavenPy` models, but a similar notebook for `HYDROTEL` is available [here](hydrological_modelling_hydrotel.ipynb).

## Basic information

In [None]:
from IPython.display import clear_output

import xhydro as xh
import xhydro.modelling as xhm

clear_output(wait=False)

In [None]:
# Workaround for determining the notebook folder within a running notebook
# This cell is not visible when the documentation is built.

from __future__ import annotations

try:
    from _finder import _find_current_folder

    notebook_folder = _find_current_folder()
except ImportError:
    from pathlib import Path

    notebook_folder = Path().cwd()

import logging

logger = logging.getLogger()
logger.setLevel(logging.CRITICAL)

The `xHydro` modelling framework is based on a `model_config` dictionary, which is meant to contain all necessary information to execute a given hydrological model. For example, depending on the model, it can store meteorological datasets directly, paths to datasets (netCDF files or other), CSV configuration files, parameters, and anything else required to configure and execute a hydrological model.

The list of required inputs for the dictionary can be obtained in one of two ways. The first is to look at the hydrological model's class, such as `xhydro.modelling.RavenpyModel`. The second is to use the `xh.modelling.get_hydrological_model_inputs` function to get a list of the required keys for a given model, as well as the documentation.

In [None]:
help(xhm.get_hydrological_model_inputs)

In [None]:
# This function can be called to get a list of the keys for a given model, as well as its documentation.
inputs, docs = xhm.get_hydrological_model_inputs("HBVEC", required_only=False)
inputs

In [None]:
print(docs)
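
Once the expected keys are known, a draft `model_config` can be compared against them before instantiating a model. The snippet below is a minimal sketch of that idea using only the standard library. The helper `find_missing_keys` and the key names in `required` are hypothetical and used for illustration only; in practice, the keys would come from the output of `get_hydrological_model_inputs`.

```python
def find_missing_keys(model_config, required_keys):
    """Return the required keys that are absent from model_config, in order.

    `required_keys` can be any iterable of key names (a list, or a dict whose
    keys are the expected entries).
    """
    return [key for key in required_keys if key not in model_config]


# Hypothetical key names, for illustration only.
required = ["model_name", "parameters", "start_date", "end_date"]
draft_config = {"model_name": "HBVEC", "start_date": "2010-01-02"}

missing = find_missing_keys(draft_config, required)
print(missing)
```

An empty list means that every expected key is present. It says nothing about whether the values themselves are valid.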

HYDROTEL and Raven vary in terms of required inputs and available functions, but an effort will be made to standardize the outputs as much as possible. Currently, all models include the following three functions:

- `.run()`: Executes the model, reformats the outputs to be compatible with analysis tools in `xHydro`, and returns the simulated streamflow as an `xarray.Dataset`.
  - The streamflow variable will be named `q` and will have units of `m3 s-1`.
  - For 1D data (such as hydrometric stations), the corresponding dimension in the dataset will be identified by the `cf_role: timeseries_id` attribute.
  
- `.get_inputs()`: Retrieves the meteorological inputs used by the model.

- `.get_streamflow()`: Retrieves the simulated streamflow output from the model.

## Initializing and running a calibrated model
Raven requires several `.rv*` files to control various aspects such as meteorological inputs, watershed characteristics, and more. If the project directory already exists and contains data, `xHydro` will prepare the model for execution without overwriting existing `.rv*` files, unless the `overwrite` argument is explicitly set to `True`. To force these files to be overwritten, you can either:

- Set `overwrite=True` in the `model_config` when instantiating the model
- Use the `.create_rv(overwrite=True)` method on the instantiated model.

This Notebook will focus on distributed RavenPy models. For lumped models, refer to the [Raven lumped modelling notebook](hydrological_modelling_raven.ipynb).

### Formatting HRU Data for distributed models

Raven relies on Hydrological Response Units (HRUs) for its hydrological simulations. Distributed models require a long list of HRU attributes, which is not yet fully supported by `xHydro`. Users are encouraged to consult the [BasinMaker documentation](https://hydrology.uwaterloo.ca/basinmaker/) for a complete list of HRU attributes, and specifically the 'data specification file' available on their homepage. For distributed modelling, `xHydro` calls upon the `BasinMakerExtractor` class of `RavenPy` to extract HRU data from a shapefile, so users must ensure that their shapefile is formatted correctly.

Additionally, while BasinMaker will produce attributes such as `Landuse_ID`, these will not be passed on to the `RavenPy` model. Instead, the HRU should contain relevant land use attributes that can be directly mapped to the hydrological model's arguments. For example, for the HBV-EC model, which is currently the only distributed model available in `Raven`, the following attributes are used instead: `LAND_USE_C`, `VEG_C`, and `SOIL_PROF`, which represent land use, vegetation, and soil profile respectively.

<div class="alert alert-info"> <b>INFO</b>

By default, HBV-EC as defined in `RavenPy` only understands a unique `LAND_USE_C` (`LU_ALL`), `VEG_C` (`VEG_ALL`), and `SOIL_PROF` (`DEFAULT_P`). If you want to use different classes, you will need to modify the `model_config` dictionary to include the relevant keys. There is currently no good documentation on how to do this, but you can refer to the class definition of the `HBVEC` model in `ravenpy.config.emulators.hbvec.py`.

As an example, new vegetation classes can be added by modifying the `VegetationClasses` and `VegetationParameterList` keys with new entries detailing all the vegetation classes and their parameters. The same applies to land use and soil profile classes.

</div>


In [None]:
from pathlib import Path

import geopandas as gpd
import matplotlib.pyplot as plt
import pooch

from xhydro.testing.helpers import (  # In-house function to get data from the xhydro-testdata repo
    deveraux,
)

df = gpd.read_file(
    Path(
        deveraux().fetch(
            "ravenpy/hru_subset.zip",
            pooch.Unzip(),
        )[0]
    ).parents[0]
)

# Plot the subbasins and land use
f, ax = plt.subplots(1, 2, figsize=(10, 10))
df.plot(column="SubId", ax=ax[0])
ax[0].set_title("Subbasins")
df.plot(
    column="LAND_USE_C",
    ax=ax[1],
    legend=True,
    legend_kwds={"bbox_to_anchor": (1.05, 1), "loc": "upper left"},
)
ax[1].set_title("Land Use")
plt.tight_layout()

In [None]:
# To keep this example simple, and until cleaner methods are incorporated in xHydro, we will revert to the default HBVEC model configuration.
# This is not recommended for real applications, as you will likely want to modify the model configuration to suit your needs.
df.loc[:, "LAND_USE_C"] = "LU_ALL"
df.loc[:, "VEG_C"] = "VEG_ALL"
df.loc[:, "SOIL_PROF"] = "DEFAULT_P"

### Formatting Meteorological Data

<div class="alert alert-info"> <b>INFO</b>

If using multiple meteorological stations, it is recommended to add the `Interpolation` argument to `model_config` or the `RavenpyModel` call to control the interpolation algorithm. Raven uses the nearest neighbour method by default, but other options are available:

- `INTERP_NEAREST_NEIGHBOR` (default) — Nearest neighbor (Voronoi) method  
- `INTERP_INVERSE_DISTANCE` — Inverse distance weighting  
- `INTERP_INVERSE_DISTANCE_ELEVATION` — Inverse distance weighting with consideration of elevation  
- `INTERP_AVERAGE_ALL` — Averages all specified gauge readings  
- `INTERP_FROM_FILE [filename]` — Weights for each gauge at each HRU are specified in an external file.  This method should work via `xHydro`, but it has not been fully tested.

</div>
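
To make the difference between the first two options more concrete, the sketch below computes gauge weights for a single HRU centroid with a nearest neighbour rule and with inverse distance weighting. It is a conceptual illustration only and not Raven's implementation: the planar Euclidean distance, the exponent `power=2.0`, and all function names are assumptions made for this example.

```python
import math


def gauge_distances(hru_xy, gauges_xy):
    """Planar distance between an HRU centroid and each gauge."""
    return [math.dist(hru_xy, gauge) for gauge in gauges_xy]


def nearest_neighbour_weights(hru_xy, gauges_xy):
    """All the weight goes to the closest gauge (first one on ties)."""
    distances = gauge_distances(hru_xy, gauges_xy)
    nearest = distances.index(min(distances))
    return [1.0 if i == nearest else 0.0 for i in range(len(distances))]


def inverse_distance_weights(hru_xy, gauges_xy, power=2.0):
    """Weights proportional to 1 / distance**power, normalised to sum to 1."""
    distances = gauge_distances(hru_xy, gauges_xy)
    if min(distances) == 0.0:
        # A gauge sits exactly on the centroid: avoid dividing by zero.
        return nearest_neighbour_weights(hru_xy, gauges_xy)
    inverse = [1.0 / d**power for d in distances]
    total = sum(inverse)
    return [value / total for value in inverse]


hru = (0.0, 0.0)
gauges = [(1.0, 0.0), (2.0, 0.0), (0.0, 4.0)]  # distances of 1, 2 and 4
print(nearest_neighbour_weights(hru, gauges))
print(inverse_distance_weights(hru, gauges))
```

With nearest neighbour, distant gauges contribute nothing. With inverse distance weighting they still contribute, but their influence decays quickly with distance.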

<div class="alert alert-info"> <b>INFO</b>

When using gridded meteorological data, `xHydro` uses functions from `RavenPy` to compute weights for each grid cell based on the HRU's geometry.  
Ensure that the domain of the grid completely covers the watershed.

</div>
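
A quick way to catch an undersized grid is to compare the grid coordinates with the bounding box of the HRU layer. The helper below is a minimal sketch under stated assumptions: `grid_covers_bounds` is a hypothetical function, the bounds follow the `(minx, miny, maxx, maxy)` order returned by `GeoDataFrame.total_bounds` in `geopandas`, and both inputs are assumed to share the same coordinate system and longitude convention (for example, -180 to 180 rather than 0 to 360).

```python
def grid_covers_bounds(grid_lon, grid_lat, bounds):
    """Check that grid cell centres bracket a bounding box on all four sides.

    This is a conservative test: if the outermost cell centres lie on or
    beyond the bounding box, the grid cells cover the whole box.
    """
    minx, miny, maxx, maxy = bounds
    return (
        min(grid_lon) <= minx
        and max(grid_lon) >= maxx
        and min(grid_lat) <= miny
        and max(grid_lat) >= maxy
    )


# Illustrative values only, not taken from the test dataset.
lon = [-80.0, -77.5, -75.0]
lat = [45.0, 47.5, 50.0]
print(grid_covers_bounds(lon, lat, (-79.0, 46.0, -76.0, 49.0)))
print(grid_covers_bounds(lon, lat, (-79.0, 46.0, -74.0, 49.0)))
```

A bounding box test cannot detect gaps inside an irregular grid, so it complements rather than replaces a visual check of the domain.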

The acquisition of raw meteorological data is covered in the [GIS](gis.ipynb) and [Use Case Example](use_case.ipynb) notebooks. Therefore, this notebook will use a test dataset.

In [None]:
import xarray as xr

ds = xr.open_zarr(
    Path(
        deveraux().fetch(
            "pmp/CMIP.CCCma.CanESM5.historical.r1i1p1f1.day.gn.zarr.zip",
            pooch.Unzip(),
        )[0]
    ).parents[0]
)
ds_fx = xr.open_zarr(
    Path(
        deveraux().fetch(
            "pmp/CMIP.CCCma.CanESM5.historical.r1i1p1f1.fx.gn.zarr.zip",
            pooch.Unzip(),
        )[0]
    ).parents[0]
)

ds["orog"] = ds_fx["orog"]
ds = ds.drop_vars(["height"])
ds["pr"].attrs = {"units": "mm", "long_name": "precipitation"}
ds = ds[["pr", "tas", "orog"]]
ds

Every hydrological model has different requirements for its input data. In this example, the data above has several issues that make it incompatible with Raven's requirements. For reference on the default units expected by Raven, consult [this link](https://ravenpy.readthedocs.io/en/latest/_modules/ravenpy/config/defaults.html#).

The function `xh.modelling.format_input` can be used to reformat CF-compliant datasets for use in hydrological models.

In [None]:
help(xh.modelling.format_input)

In [None]:
# You can also use the 'save_as' argument to save the new file(s) in your project folder.
ds_reformatted, config = xh.modelling.format_input(
    ds,
    "HBVEC",
    save_as=notebook_folder / "_data" / "meteo_hmr_distributed.nc",
)
ds_reformatted

While RavenPy does not require a configuration file to accompany the meteorological file, several pieces of information must be given to `model_config` to properly instantiate the model. The second output of `format_input` contains the "meteo_file", "data_type", "alt_names_meteo", and "meteo_station_properties" entries based on the provided file.


In [None]:
config_copy = config.copy()
config_copy["meteo_file"] = "/path/to/your/save_as/argument.nc"
config_copy

### Initializing the Model

The model can now be initialized using the information acquired so far.  
Additional entries can be provided to the `model_config` dictionary, as long as they are supported by the emulated Raven model. In particular:

- The `output_subbasins` key can be used to specify which subbasins to output.
- The `global_parameter` key must have a value for `AVG_ANNUAL_RUNOFF`, which is the average annual runoff in mm/year (with a range of 0-1000 according to Raven's documentation). This value is required for distributed models.

Refer to the [Raven documentation](https://raven.uwaterloo.ca/Downloads.html) for the most up-to-date information.  
Model templates are currently listed in Appendix F, while the available options are described in various chapters.


In [None]:
# The HBVEC model has 21 parameters
parameters = [
    -0.15,
    3.5,
    3.0,
    0.07,
    0.4,
    0.8,
    1,
    4.0,
    0.5,
    0.1,
    1,
    5.0,
    4.8,
    0.1,
    1.0,
    22.0,
    0.5,
    0.1,
    0.0,
    1.0,
    1.0,
]

model_config = {
    "model_name": "HBVEC",
    "parameters": parameters,
    "global_parameter": {
        "AVG_ANNUAL_RUNOFF": 597
    },  # Distributed models require an average annual runoff value at each HRU
    "hru": df,
    "output_subbasins": "all",  # Use "all" to output all subbasins
    "start_date": "2010-01-02",
    "end_date": "2010-12-31",
    **config,
}
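
Because a wrong parameter count or a missing runoff value only surfaces once Raven is invoked, a small sanity check on the dictionary can save time. The function below is a hypothetical helper, not part of `xHydro` or `RavenPy`. It only encodes the two constraints stated above: 21 parameters for HBVEC, and an `AVG_ANNUAL_RUNOFF` between 0 and 1000 mm/year.

```python
def check_distributed_config(config, n_parameters=21, runoff_range=(0.0, 1000.0)):
    """Return a list of human-readable problems; the list is empty if none are found."""
    problems = []

    parameters = config.get("parameters", [])
    if len(parameters) != n_parameters:
        problems.append(
            f"expected {n_parameters} parameters, got {len(parameters)}"
        )

    runoff = config.get("global_parameter", {}).get("AVG_ANNUAL_RUNOFF")
    if runoff is None:
        problems.append("global_parameter['AVG_ANNUAL_RUNOFF'] is missing")
    elif not runoff_range[0] <= runoff <= runoff_range[1]:
        problems.append(
            f"AVG_ANNUAL_RUNOFF={runoff} is outside {runoff_range} mm/year"
        )

    return problems


# Placeholder parameter values, for illustration only.
example_config = {
    "parameters": [0.0] * 21,
    "global_parameter": {"AVG_ANNUAL_RUNOFF": 597},
}
print(check_distributed_config(example_config))
```

Returning a list of problems rather than raising on the first one makes it possible to report every issue in a single pass.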

With `model_config` on hand, an instance of the hydrological model can be initialized using `xhydro.modelling.hydrological_model` or the `xhydro.modelling.RavenpyModel` class directly.

In [None]:
ht = xhm.hydrological_model(model_config)
ht

### Validating the Meteorological Data

Before executing hydrological models, a few basic checks will be performed automatically. However, users may want to conduct more advanced health checks on the meteorological inputs (e.g., identifying unrealistic values). This can be done using `xhydro.utils.health_checks`. For the full list of available checks, refer to [the 'xscen' documentation](https://xscen.readthedocs.io/en/latest/notebooks/3_diagnostics.html#Health-checks).

We can use `.get_inputs()` to automatically retrieve the meteorological data. In this example, we'll ensure there are no abnormal meteorological values or sequences of values.


In [None]:
health_checks = {
    "raise_on": [],  # If an entry is not here, it will warn the user instead of raising an exception.
    "flags": {
        "pr": {  # You can have specific flags per variable.
            "negative_accumulation_values": {},
            "very_large_precipitation_events": {},
            "outside_n_standard_deviations_of_climatology": {"n": 5},
            "values_repeating_for_n_or_more_days": {"n": 5},
        },
        "tas": {
            "temperature_extremely_low": {},
            "temperature_extremely_high": {},
            "outside_n_standard_deviations_of_climatology": {"n": 5},
            "values_repeating_for_n_or_more_days": {"n": 5},
        },
    },
}
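
As an illustration of what a flag such as `values_repeating_for_n_or_more_days` looks for, the sketch below detects runs of identical consecutive values in a plain Python sequence. It is a simplified stand-in written for this notebook: the actual checks are implemented in `xclim` and may handle tolerances, thresholds, and missing values differently. The function names are hypothetical.

```python
from itertools import groupby


def longest_run(values):
    """Length of the longest run of identical consecutive values (0 if empty)."""
    return max((sum(1 for _ in group) for _, group in groupby(values)), default=0)


def has_repeating_values(values, n=5):
    """True if any value repeats for n or more consecutive steps."""
    return longest_run(values) >= n


suspicious = [2.1, 3.4, 3.4, 3.4, 3.4, 3.4, 1.0]  # 3.4 repeated five times
normal = [2.1, 3.4, 3.0, 2.8, 3.4, 3.1, 1.0]
print(has_repeating_values(suspicious, n=5))
print(has_repeating_values(normal, n=5))
```

Long runs of identical temperatures often point to gap filling or sensor issues, which is why such sequences are worth flagging before running a model.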

In [None]:
from xclim.core.units import amount2rate

with ht.get_inputs() as ds_in:
    ds_in["pr"] = amount2rate(ds_in["pr"])  # Precipitation in xclim needs to be a flux.

    xh.utils.health_checks(ds_in, **health_checks)
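
The amount-to-rate conversion used above amounts to dividing each accumulated value by the length of its sampling period. The arithmetic can be sketched as follows for daily totals. This assumes a fixed daily time step and a water density of 1000 kg m-3, so that 1 mm of water over 1 m2 weighs 1 kg. The function name is hypothetical, and the units produced by `amount2rate` itself may be expressed differently.

```python
import math

SECONDS_PER_DAY = 86_400


def daily_amount_to_flux(amount_mm):
    """Convert a daily precipitation total in mm to a flux in kg m-2 s-1."""
    return amount_mm / SECONDS_PER_DAY


flux = daily_amount_to_flux(8.64)  # 8.64 mm accumulated over one day
print(math.isclose(flux, 1e-4))
```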

### Executing the Model

A few basic checks are performed when the `.run()` function is called, before executing the model itself. However, since both RavenPy and Raven perform a series of checks themselves, those in `xHydro` are kept to a minimum. If required, a `RavenpyModel.executable` class attribute can be used to point to your own Raven executable instead of the one provided by the `raven-hydro` library in the active Python environment.

Once the model is executed, `xHydro` will automatically reformat the NetCDF file to bring it closer to CF conventions, ensuring compatibility with other `xHydro` modules. Note that, at this time, this reformatting only supports the outgoing streamflow.


In [None]:
ds_out = ht.run()
ds_out

In [None]:
ht.get_streamflow().isel(subbasin_id=0)["q"].plot()

## Updating the rv* files

Currently, `RavenPy` provides no straightforward way to open and modify the Raven `.rv*` files. For instance, changing simulation dates or meteorological data directly through the files is not yet supported. Until this feature is added, some basic functions have been integrated into `xHydro`, but should be used with care.

Basic information, such as `start_date`, `end_date`, and `parameters`, is stored directly in the `RavenpyModel` class and can be manually updated. Similarly, if additional arguments were given to the model during initialization, they are stored within a dictionary under `RavenpyModel.kwargs`, which can be accessed and modified as needed.

The observed streamflow, HRU characteristics and meteorological data are stored under the `.qobs`, `.hru` and `.meteo` attributes respectively, but can be much trickier to update, since the associated `RavenPy` commands must be reconstructed. Therefore, it is strongly recommended to use the `.update_data` method to update these. This function calls upon a subset of the same arguments used when initializing a Raven model:


In [None]:
help(ht.update_data)

That function will only update the `RavenpyModel` class itself, not the files. If possible, it is strongly recommended to use the `create_rv` function to overwrite the existing `.rv*` files with the updated information.

If this is not possible, some aspects of the model can still be updated using the `.update_config` method:

In [None]:
help(ht.update_config)

Be aware that not all updates will be reflected in the `.rv*` files. The last two options especially should be used with caution, as HRU characteristics, such as the subbasin IDs, will *not* be updated. If the HRU within the model has changed, there is currently no way to modify existing files. They should be deleted and recreated using the `.create_rv()` method.

## Model Calibration

Calibrating distributed models is not yet supported by `xHydro` if multiple hydrometric stations are used. Users are encouraged to use the `RavenPy` library directly for this purpose. For single-station calibration, `xHydro` can be used. Refer to the [lumped RavenPy documentation](hydrological_modelling_raven.ipynb) for more details.