# Hydrological modelling - Raven

`xHydro` provides a collection of functions designed to facilitate hydrological modelling, focusing on two key models: [HYDROTEL](https://github.com/INRS-Modelisation-hydrologique/hydrotel) and a suite of models emulated by the [Raven Hydrological Framework](https://raven.uwaterloo.ca/). It is important to note that Raven already possesses an extensive Python library, [RavenPy](https://github.com/CSHS-CWRA/RavenPy), which enables users to build, calibrate, and execute models. `xHydro` wraps some of these functions to support multi-model assessments with HYDROTEL, though users seeking advanced functionalities may prefer to use `RavenPy` directly. 

The primary contribution of `xHydro` to hydrological modelling is thus its support for HYDROTEL, a model that previously lacked a dedicated Python library. This notebook covers `RavenPy` models; a similar notebook for `HYDROTEL` is available [here](hydrological_modelling_hydrotel.ipynb).

## Basic information

In [None]:
import xhydro as xh
import xhydro.modelling as xhm

In [None]:
# Workaround for determining the notebook folder within a running notebook
# This cell is not visible when the documentation is built.

from __future__ import annotations

try:
    from _finder import _find_current_folder

    notebook_folder = _find_current_folder()
except ImportError:
    from pathlib import Path

    notebook_folder = Path().cwd()

import logging

logger = logging.getLogger()
logger.setLevel(logging.CRITICAL)

The `xHydro` modelling framework is based on a `model_config` dictionary, which is meant to contain all the information needed to execute a given hydrological model. Depending on the model, it can store meteorological datasets directly, paths to datasets (netCDF files or other), CSV configuration files, parameters, and anything else required to configure and execute a hydrological model.

The list of required inputs for the dictionary can be obtained in one of two ways. The first is to look at the hydrological model's class, such as `xhydro.modelling.RavenpyModel`. The second is to use the `xh.modelling.get_hydrological_model_inputs` function, which returns the list of keys for a given model along with its documentation.
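As a minimal sketch of how such a list of keys might be used, the snippet below compares a draft configuration against a set of expected keys before any model is built. The `find_missing_keys` helper and the `expected_keys` tuple are hypothetical and purely illustrative; in practice, the keys would come from `get_hydrological_model_inputs`, assuming its first output can be iterated over as key names.

```python
def find_missing_keys(model_config, expected_keys):
    """Return the expected keys that are absent from model_config, sorted by name."""
    return sorted(key for key in expected_keys if key not in model_config)


# Illustrative subset of keys only, NOT the authoritative list for any model.
expected_keys = ("model_name", "parameters", "start_date", "end_date", "meteo_file")

draft_config = {
    "model_name": "GR4JCN",
    "start_date": "1990-01-01",
    "end_date": "1991-12-31",
}

missing = find_missing_keys(draft_config, expected_keys)
print(missing)  # ['meteo_file', 'parameters']
```

Catching missing entries this way can give a clearer message than an error raised deep inside the model setup.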

In [None]:
help(xhm.get_hydrological_model_inputs)

In [None]:
import xhydro as xh
import xhydro.modelling as xhm

# This function can be called to get a list of the keys for a given model, as well as its documentation.
inputs, docs = xhm.get_hydrological_model_inputs("GR4JCN", required_only=False)
inputs

In [None]:
print(docs)

HYDROTEL and Raven differ in their required inputs and available functions, but an effort will be made to standardize the outputs as much as possible. Currently, all models include the following three methods:

- `.run()`: Executes the model, reformats the outputs to be compatible with analysis tools in `xHydro`, and returns the simulated streamflow as a `xarray.Dataset`.
  - The streamflow variable will be named `q` and will have units of `m3 s-1`.
  - For 1D data (such as hydrometric stations), the corresponding dimension in the dataset will be identified by the `cf_role: timeseries_id` attribute.
  
- `.get_inputs()`: Retrieves the meteorological inputs used by the model.

- `.get_streamflow()`: Retrieves the simulated streamflow output from the model.

## Initializing and running a calibrated model
Raven requires several `.rv*` files to control various aspects such as meteorological inputs, watershed characteristics, and more. Currently, `RavenPy` provides no straightforward way to open and modify these files. For instance, changing simulation dates or meteorological data directly through the files is not yet supported. 

Until this feature is added, all relevant information must be provided to `RavenPy` via the `model_config` dictionary in order to successfully run the model. For examples of how to obtain many of the required inputs, such as the watershed characteristics and meteorological data, consult the [GIS](gis.ipynb) and [Use Case Example](use_case.ipynb) notebooks. To keep things simple, this notebook uses a test dataset instead. All RavenPy models currently available in `xHydro` are lumped.


In [None]:
import xarray as xr
import xclim

from xhydro.testing.helpers import (  # In-house function to get data from the xhydro-testdata repo
    deveraux,
)

D = deveraux()

meteo_file = D.fetch("ravenpy/ERA5_Riviere_Rouge_global.nc")
ds = xr.open_dataset(meteo_file)
ds = ds.rename({"tmin": "tasmin", "tmax": "tasmax"})

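# Make sure the output folder exists before writing the file.
(notebook_folder / "_data").mkdir(parents=True, exist_ok=True)
ds.to_netcdf(notebook_folder / "_data" / "meteo_hmr.nc")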

In [None]:
model_config = {
    "model_name": "GR4JCN",
    "parameters": [
        0.529,
        -3.396,
        407.29,
        1.072,
        16.9,
        0.947,
    ],  # GR4JCN has 6 parameters, others might have more
    "drainage_area": 500,
    "elevation": 120,
    "latitude": 45.5,
    "longitude": -71.8,
    "start_date": "1990-01-01",
    "end_date": "1991-12-31",
    "meteo_file": notebook_folder / "_data" / "meteo_hmr.nc",
    "data_type": ["TEMP_MAX", "TEMP_MIN", "PRECIP"],
    "alt_names_meteo": {"TEMP_MIN": "tasmin", "TEMP_MAX": "tasmax", "PRECIP": "pr"},
    "meteo_station_properties": {
        "ALL": {"elevation": 500, "latitude": 45.5, "longitude": -71.8}
    },
    "rain_snow_fraction": "RAINSNOW_DINGMAN",
    "evaporation": "PET_HARGREAVES_1985",
    "global_parameter": {"AVG_ANNUAL_SNOW": 100.00},
}

With `model_config` on hand, an instance of the hydrological model can be initialized using `xhydro.modelling.hydrological_model` or the `xhydro.modelling.RavenpyModel` class directly.

In [None]:
ht = xhm.hydrological_model(model_config)
ht

### Validating the Meteorological Data

Before executing hydrological models, a few basic checks will be performed automatically. However, users may want to conduct more advanced health checks on the meteorological inputs (e.g., identifying unrealistic values). This can be done using `xhydro.utils.health_checks`. For the full list of available checks, refer to [the 'xscen' documentation](https://xscen.readthedocs.io/en/latest/notebooks/3_diagnostics.html#Health-checks).

We can use `.get_inputs()` to automatically retrieve the meteorological data. In this example, we'll ensure there are no abnormal meteorological values or sequences of values.
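To give an idea of what a flag such as `values_repeating_for_n_or_more_days` looks for, here is a plain-Python sketch that finds runs of identical consecutive values. It is not the `xclim` implementation, and `find_repeated_runs` is a hypothetical helper; the real check may differ in how it treats thresholds, missing values, or dry days.

```python
from itertools import groupby


def find_repeated_runs(values, n):
    """Return (start_index, run_length) for each run of n or more identical values."""
    runs = []
    start = 0
    for _, group in groupby(values):
        length = len(list(group))
        if length >= n:
            runs.append((start, length))
        start += length
    return runs


# Made-up daily maximum temperatures with a suspicious five-day plateau.
tasmax = [1.2, 3.4, 3.4, 3.4, 3.4, 3.4, 2.0, 2.0, 5.1]
print(find_repeated_runs(tasmax, n=5))  # [(1, 5)]
```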


In [None]:
health_checks = {
    "raise_on": [],  # If an entry is not here, it will warn the user instead of raising an exception.
    "flags": {
        "pr": {  # You can have specific flags per variable.
            "negative_accumulation_values": {},
            "very_large_precipitation_events": {},
            "outside_n_standard_deviations_of_climatology": {"n": 5},
            "values_repeating_for_n_or_more_days": {"n": 5},
        },
        "tasmax": {
            "tasmax_below_tasmin": {},
            "temperature_extremely_low": {},
            "temperature_extremely_high": {},
            "outside_n_standard_deviations_of_climatology": {"n": 5},
            "values_repeating_for_n_or_more_days": {"n": 5},
        },
        "tasmin": {
            "temperature_extremely_low": {},
            "temperature_extremely_high": {},
            "outside_n_standard_deviations_of_climatology": {"n": 5},
            "values_repeating_for_n_or_more_days": {"n": 5},
        },
    },
}

In [None]:
from xclim.core.units import amount2rate

ds_in = ht.get_inputs()
ds_in["pr"] = amount2rate(ds_in["pr"])  # Precipitation in xclim needs to be a flux.

xh.utils.health_checks(ds_in, **health_checks)
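The conversion in the cell above matters because a precipitation total and a precipitation flux differ by the length of the time step. As a back-of-the-envelope sketch, assuming daily totals in millimetres and a water density of 1000 kg m-3 (so that 1 mm over 1 m2 weighs 1 kg), the arithmetic looks like this. `daily_amount_to_flux` is a hypothetical helper, not part of `xclim`, and `amount2rate` may express its result in other rate units such as mm per day.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400 s


def daily_amount_to_flux(amount_mm):
    """Convert a daily precipitation total [mm] to a mean flux [kg m-2 s-1]."""
    return amount_mm / SECONDS_PER_DAY


flux = daily_amount_to_flux(8.64)
print(f"{flux:.1e} kg m-2 s-1")  # 1.0e-04 kg m-2 s-1
```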

### Executing the Model

A few basic checks are performed when the `.run()` function is called, before executing the model itself. For `RavenPy`, the following checks are made:

- The model name is valid: ["Blended", "GR4JCN", "HBVEC", "HMETS", "HYPR", "Mohyse", "SACSMA"]

Only if these checks pass will the function proceed to execute the model. Note that Raven itself performs its own series of checks, which is why those in `xHydro` are kept to a minimum.
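A check of this kind can be pictured as follows. This is only a sketch: `check_model_name` is a hypothetical function, not the code used by `xHydro`, and whether the real check is case-sensitive is an assumption made here.

```python
SUPPORTED_MODELS = ("Blended", "GR4JCN", "HBVEC", "HMETS", "HYPR", "Mohyse", "SACSMA")


def check_model_name(model_config):
    """Return the model name if it is supported, otherwise raise a ValueError."""
    name = model_config.get("model_name")
    if name not in SUPPORTED_MODELS:
        raise ValueError(
            f"Unknown model name {name!r}; expected one of {', '.join(SUPPORTED_MODELS)}."
        )
    return name


print(check_model_name({"model_name": "GR4JCN"}))  # GR4JCN

try:
    check_model_name({"model_name": "gr4j"})
except ValueError as err:
    print(err)
```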

Once the model is executed, `xHydro` will automatically reformat the NetCDF file to bring it closer to CF conventions, ensuring compatibility with other `xHydro` modules. Note that, at this time, this reformatting only supports the outgoing streamflow.


In [None]:
ds_out = ht.run()
ds_out

In [None]:
ht.get_streamflow()["q"].plot()

## Model Calibration

When building a model from scratch, a calibration step is necessary to find the optimal set of parameters. Model calibration is an iterative loop in which model parameters are selected, the model is run, and the results are compared to observed data. In `xHydro`, the calibration function uses `SPOTPY` to carry out the optimization process.

The calibration function still uses the `model_config` dictionary created earlier, but now within the `xh.modelling.perform_calibration` function.
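The select, run, and compare loop described above can be sketched in a few lines of plain Python. Everything here is a stand-in chosen for brevity: `toy_model` is a hypothetical two-parameter model rather than a Raven emulator, the search is a uniform random search rather than a `SPOTPY` algorithm, and the score is the Nash-Sutcliffe efficiency.

```python
import random


def toy_model(parameters, precipitation):
    """Hypothetical stand-in for a model: flow = gain * precipitation + baseflow."""
    gain, baseflow = parameters
    return [gain * p + baseflow for p in precipitation]


def nse(simulated, observed):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, lower is worse."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((s - o) ** 2 for s, o in zip(simulated, observed))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / variance


def random_search(precipitation, observed, bounds_low, bounds_high, evaluations, seed=0):
    """Sample parameters uniformly within the bounds and keep the best-scoring set."""
    rng = random.Random(seed)
    best_parameters, best_score = None, float("-inf")
    for _ in range(evaluations):
        # 1. Select a candidate parameter set within the bounds.
        candidate = [rng.uniform(lo, hi) for lo, hi in zip(bounds_low, bounds_high)]
        # 2. Run the model with that candidate.
        simulated = toy_model(candidate, precipitation)
        # 3. Compare the simulation with the observations.
        score = nse(simulated, observed)
        if score > best_score:
            best_parameters, best_score = candidate, score
    return best_parameters, best_score


precipitation = [0.0, 5.0, 12.0, 3.0, 0.0, 8.0]
observed = toy_model([0.6, 1.5], precipitation)  # synthetic "observations"

best_parameters, best_score = random_search(
    precipitation, observed, bounds_low=[0.0, 0.0], bounds_high=[1.0, 3.0], evaluations=200
)
print(best_parameters, best_score)
```

Dedicated optimization algorithms generally explore the parameter space far more efficiently than uniform sampling, which is one reason to rely on an optimization library for real calibrations.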


In [None]:
help(xh.modelling.perform_calibration)

We can now prepare the additional arguments required by the calibration function. A good calibration process should always exclude some data from the computation of the objective function, so that a validation period remains available. This can be achieved using the `mask` argument, which takes an array of 0s and 1s.
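To illustrate how a mask changes the score, the sketch below computes the Kling-Gupta efficiency (KGE) only over the time steps where the mask equals 1. It assumes the original 2009 formulation of the KGE and assumes that 1 means "included" and 0 means "excluded"; the `kge` function is a hypothetical plain-Python version, not the one used internally by `xHydro`.

```python
import math


def kge(simulated, observed, mask=None):
    """Kling-Gupta efficiency (2009 formulation) over the time steps where mask == 1."""
    if mask is None:
        mask = [1] * len(observed)
    sim = [s for s, m in zip(simulated, mask) if m == 1]
    obs = [o for o, m in zip(observed, mask) if m == 1]

    mean_sim = sum(sim) / len(sim)
    mean_obs = sum(obs) / len(obs)
    std_sim = math.sqrt(sum((s - mean_sim) ** 2 for s in sim) / len(sim))
    std_obs = math.sqrt(sum((o - mean_obs) ** 2 for o in obs) / len(obs))
    covariance = sum((s - mean_sim) * (o - mean_obs) for s, o in zip(sim, obs)) / len(sim)

    r = covariance / (std_sim * std_obs)  # linear correlation
    alpha = std_sim / std_obs  # variability ratio
    beta = mean_sim / mean_obs  # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


observed = [10.0, 12.0, 30.0, 18.0, 11.0, 9.0]
simulated = [50.0, 2.0, 30.0, 18.0, 11.0, 9.0]  # poor fit on the first two steps only
mask = [0, 0, 1, 1, 1, 1]  # exclude the first two steps from the score

print(kge(simulated, observed))  # penalized by the first two steps
print(kge(simulated, observed, mask))  # close to 1: the masked steps are ignored
```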

This example uses only 10 evaluations to cut down on computing time, but a real calibration should rely on at least 500 evaluations, even with simple models such as GR4JCN.

In [None]:
qobs_file = D.fetch("ravenpy/Debit_Riviere_Rouge.nc")
ds_obs = xr.open_dataset(qobs_file)

# Reformat the data
ds_obs = ds_obs.rename({"qobs": "q"}).sel(time=slice("1990", "1991"))

# Create the mask
mask = xr.where(ds_obs.time.dt.year.isin([1990]), 0, 1)

In [None]:
# Parameter bounds for GR4JCN
bounds_low = [0.01, -15.0, 10.0, 0.0, 1.0, 0.0]
bounds_high = [2.5, 10.0, 700.0, 7.0, 30.0, 1.0]

In [None]:
# Run the calibration
best_parameters, best_simulation, best_objfun = xhm.perform_calibration(
    model_config,
    obj_func="kge",
    bounds_low=bounds_low,
    bounds_high=bounds_high,
    qobs=ds_obs,
    evaluations=10,
    algorithm="DDS",
    mask=mask,
    sampler_kwargs={"trials": 1},
)

In [None]:
# The first output corresponds to the best set of parameters
best_parameters

In [None]:
# The second output corresponds to the timeseries for the best set of parameters
best_simulation

In [None]:
# The third output is the value of the objective function for the best set of parameters
best_objfun