# Modelling module
## HYDROTEL
The `xhydro.modelling.Hydrotel` class is used to prepare and update configuration files for HYDROTEL, validate the project directory and the meteorological inputs, execute the model, and reformat the outputs to be more in line with CF conventions and with other functionalities of `xhydro`.

<div class="alert alert-info"> <b>INFO</b>
`xhydro` does not distribute the HYDROTEL executable. If you want to use this module, you will have to get access to it first.
</div>

### Initialising the model
`xhydro` does not prepare the project directory itself; this should be done beforehand. When a new instance of `xhydro.modelling.Hydrotel` is initialised, the class instead allows modifying the entries located in the three main configuration files: `projet.csv`, `simulation.csv`, and `output.csv`. At minimum, the project folder must already exist, and either `default_options` must be True or SIMULATION COURANTE (current simulation) must be specified as a keyword argument in `project_options`.

The arguments of the class are:

- `project`: *str* or *os.PathLike*. The full path to the project directory.
- `default_options`: *bool*. Whether to use default configuration options taken from `xhydro/modelling/data/hydrotel_defaults.yml`, or read them from the files on disk.
- `project_options`, `simulation_options`, `output_options`: *dict*. Configuration options to overwrite with new information.

At any time after initialising the class, `Hydrotel.update_options()` can be called to update the three configuration files. When called, this function will overwrite the CSV files written on disk.
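The layering of options can be pictured with a minimal, hypothetical sketch: a base layer (standing for either the packaged defaults or the files read from disk) is copied, then each set of user-supplied options overwrites the matching keys. `resolve_options` is a made-up name for illustration and is not part of the `xhydro` API.

```python
# Hypothetical sketch of layered configuration, where later layers win.
# It only illustrates the idea behind `default_options` and the `*_options`
# arguments; it is not xhydro's actual implementation.
from typing import Mapping


def resolve_options(base: Mapping[str, object], *overrides: Mapping[str, object]) -> dict:
    """Merge ``base`` with each mapping in ``overrides``, from left to right."""
    resolved = dict(base)  # copy, so the base layer is left untouched
    for layer in overrides:
        resolved.update(layer)
    return resolved


# `base` stands for either the packaged defaults or the options read from disk.
base = {"TRONCONS": 0, "DEBITS_AVAL": 0}
print(resolve_options(base, {"DEBITS_AVAL": 1}))
```

Only the keys that differ from the base layer need to be supplied, which is why the options passed to `Hydrotel` can stay short.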

In [None]:
from pathlib import Path
from xhydro.modelling import Hydrotel
import xhydro as xh

# Folder where to put the data
project_folder = Path().absolute() / "_data"
project_name = "example_hydrotel"

In [None]:
# This is a hidden cell. We'll create a fake Hydrotel directory for the purpose of this example.
from xclim.testing.helpers import test_timeseries as timeseries
import xhydro.testing.utils
import numpy as np

# Fake meteorological data
meteo = timeseries(
    np.zeros(365 * 2),
    start="2001-01-01",
    freq="D",
    variable="tasmin",
    as_dataset=True,
    units="degC",
)
meteo["tasmax"] = timeseries(
    np.ones(365 * 2),
    start="2001-01-01",
    freq="D",
    variable="tasmax",
    units="degC",
)
meteo["pr"] = timeseries(
    np.ones(365 * 2) * 10,
    start="2001-01-01",
    freq="D",
    variable="pr",
    units="mm",
)
meteo = meteo.expand_dims("stations").assign_coords(stations=["010101"])
meteo = meteo.assign_coords(
    coords={"lat": 46, "lon": -77, "z": 0}
)
for c in ["lat", "lon", "z"]:
    meteo[c] = meteo[c].expand_dims("stations")

# Fake output
debit_aval = timeseries(
    np.zeros(365 * 2),
    start="2001-01-01",
    freq="D",
    variable="streamflow",
    as_dataset=True,
)
debit_aval = debit_aval.expand_dims("troncon").assign_coords(troncon=[0])
debit_aval = debit_aval.assign_coords(coords={"idtroncon": 0})
debit_aval["idtroncon"] = debit_aval["idtroncon"].expand_dims("troncon")
debit_aval = debit_aval.rename({"streamflow": "debit_aval"})
debit_aval["debit_aval"].attrs = {
    "units": "m3/s",
    "description": "Debit en aval du troncon",
}

xhydro.testing.utils.fake_hydrotel_project(
    project_folder, project_name, meteo=meteo, debit_aval=debit_aval
)

DATE DEBUT (start date), DATE FIN (end date), and PAS DE TEMPS (timestep frequency) must always be specified for HYDROTEL to run, so these need to be added to `simulation_options` if they don't already exist in `simulation.csv`. Similarly, either FICHIER STATIONS METEO (meteorological stations file) or FICHIER GRILLE METEO (meteorological grid file) must be specified.

By default, all output options are turned off, so the required outputs must also be turned on if they are not already.
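As a rough illustration of these requirements, the following hypothetical check merges the options found on disk with the user-supplied ones and reports what is still missing. `missing_simulation_options` is not an `xhydro` function, and `xhydro` may validate these entries differently.

```python
# Hypothetical pre-flight check mirroring the requirements described above.
REQUIRED_KEYS = ("DATE DEBUT", "DATE FIN", "PAS DE TEMPS")
METEO_KEYS = ("FICHIER STATIONS METEO", "FICHIER GRILLE METEO")


def _is_set(options: dict, key: str) -> bool:
    # Treat missing keys, None, and empty strings as "not specified".
    return options.get(key) not in (None, "")


def missing_simulation_options(on_disk: dict, overrides: dict) -> list:
    """List the required entries that are still missing after applying overrides."""
    options = {**on_disk, **overrides}  # user-supplied values win
    missing = [key for key in REQUIRED_KEYS if not _is_set(options, key)]
    # At least one of the two meteorological input files must be given.
    if not any(_is_set(options, key) for key in METEO_KEYS):
        missing.append(" or ".join(METEO_KEYS))
    return missing


print(missing_simulation_options({"PAS DE TEMPS": 24}, {"DATE DEBUT": "2001-01-01"}))
```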

In [None]:
# The options only need to be those that differ from the defaults or those on-disk. These will generally include the simulation dates and output requested.
simulation_options = {
    "DATE DEBUT": "2001-01-01",
    "DATE FIN": "2001-12-31",
    "FICHIER STATIONS METEO": "meteo/SLNO_meteo_GC3H.nc",
    "PAS DE TEMPS": 24,
}
output_options = {"TRONCONS": 1, "DEBITS_AVAL": 1}
ht = Hydrotel(project_folder / project_name, default_options=False, simulation_options=simulation_options, output_options=output_options)

print(f"Simulation name, taken from 'projet.csv': '{ht.simulation_name}'\n")
print(f"Project configuration: '{ht.project_options}'\n")
print(f"Simulation configuration: '{ht.simulation_options}'\n")
print(f"Output configuration: '{ht.output_options}'")

### Validating the meteorological data
A few basic checks will be automatically performed prior to executing the model, but a user might want to perform more advanced health checks (missing values, unrealistic meteorological inputs, etc.). This is possible through the use of `xhydro.utils.health_checks`. Consult [the 'xscen' documentation](https://xscen.readthedocs.io/en/latest/notebooks/3_diagnostics.html#Health-checks) for the full list of possible checks.

In this example, we'll check that:
- There is no missing data.
- There are no abnormal meteorological values or sequences of values. Some of these checks will fail, to showcase what failures look like.
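To give an idea of what a flag such as `values_repeating_for_n_or_more_days` looks for, here is a simplified, stdlib-only sketch that measures the longest run of identical consecutive values. It illustrates the idea only and is not the `xclim` implementation used by `health_checks`; the function names are hypothetical. Since the fake `tasmin` series used in this example is constant, the sketch flags it.

```python
def longest_constant_run(values) -> int:
    """Length of the longest run of consecutive identical values."""
    longest = current = 0
    previous = object()  # sentinel that compares unequal to any real value
    for value in values:
        current = current + 1 if value == previous else 1
        previous = value
        longest = max(longest, current)
    return longest


def repeats_for_n_or_more(values, n: int) -> bool:
    """Simplified stand-in for a 'values repeating for n or more days' flag."""
    return longest_constant_run(values) >= n


# Two years of zeros, like the fake `tasmin` series in this example.
print(repeats_for_n_or_more([0.0] * 730, n=5))
print(repeats_for_n_or_more([1.0, 2.0, 2.0, 3.0, 3.0, 3.0], n=5))
```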

In [None]:
health_checks = {
    "raise_on": [],  # Checks that are not listed here will emit a warning instead of raising an exception.
    "missing": {"missing_any": {"freq": "YS"}},
    "flags": {
        "pr": {  # You can have specific flags per variable.
            "negative_accumulation_values": {},
            "very_large_precipitation_events": {},
            "outside_n_standard_deviations_of_climatology": {"n": 5},
            "values_repeating_for_n_or_more_days": {"n": 5},
        },
        "tasmax": {
            "tasmax_below_tasmin": {},
            "temperature_extremely_low": {},
            "temperature_extremely_high": {},
            "outside_n_standard_deviations_of_climatology": {"n": 5},
            "values_repeating_for_n_or_more_days": {"n": 5},
        },
        "tasmin": {
            "temperature_extremely_low": {},
            "temperature_extremely_high": {},
            "outside_n_standard_deviations_of_climatology": {"n": 5},
            "values_repeating_for_n_or_more_days": {"n": 5},
        },
    },
}

In [None]:
ds_in = ht.get_input()
ds_in["pr"].attrs["units"] = "mm d-1"  # Hydrotel-to-xclim compatibility: HYDROTEL uses "mm" per time step, xclim expects a rate (the time step is daily here).

xh.utils.health_checks(ds_in, **health_checks)
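Why changing the `units` attribute is enough here can be shown with a small worked conversion. Assuming PAS DE TEMPS is expressed in hours (24 in this example, i.e. a daily time step), a depth accumulated over one time step converts to a daily rate as `depth * 24 / step_hours`, which leaves daily values unchanged. The helper name is hypothetical.

```python
def depth_per_step_to_daily_rate(depth_mm: float, step_hours: float) -> float:
    """Convert a precipitation depth accumulated over one time step to mm/day.

    Assumes ``step_hours`` is the model time step in hours (an assumption
    about how PAS DE TEMPS is expressed).
    """
    return depth_mm * 24.0 / step_hours


print(depth_per_step_to_daily_rate(10.0, 24))  # daily step: value unchanged
print(depth_per_step_to_daily_rate(1.5, 3))    # 3-hourly step: 1.5 mm per 3 h
```

For a sub-daily time step, the values themselves would need converting, not just the attribute.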

### Executing HYDROTEL
A few basic checks will always be performed when executing `Hydrotel.run()`:

- All files mentioned in the configuration exist.
- The dataset has the TIME and STATION (optional) dimensions, and the LONGITUDE, LATITUDE, and ELEVATION coordinates.
- The dataset has TMIN (degC), TMAX (degC), and PRECIP (mm) variables.
- The dataset has a standard calendar.
- The frequency is uniform (i.e. all time steps are equally spaced).
- The start and end dates are contained in the dataset.

The names of the dimensions, coordinates, and variables are checked against the configuration file (e.g. `SLNO_meteo_GC3H.nc.config`, in this example).

The model is executed only when these checks pass. The function also accepts the following arguments:

- `hydrotel_console`: *str* or *os.PathLike*. For Windows, this is the path to the Hydrotel executable.
- `id_as_dim`: *bool*. By default, HYDROTEL generates a dataset with `troncon` as the spatial dimension and `idtroncon` as a coordinate. If True, they are swapped so that `idtroncon` becomes the dimension, and `troncon` is removed altogether.
- `dry_run`: *bool*. True by default, to prevent accidental executions. Set to False to execute the model.
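The `dry_run` behaviour follows a common pattern: build the command, and only hand it to the operating system when explicitly asked to. The sketch below shows that pattern with `subprocess`; `run_command` is a hypothetical helper, and the command passed to it is a placeholder, not the actual HYDROTEL command line.

```python
import shlex
import subprocess


def run_command(cmd: list, dry_run: bool = True) -> str:
    """Return the shell-quoted command; execute it only if ``dry_run`` is False."""
    command_line = shlex.join(cmd)
    if not dry_run:
        # check=True raises CalledProcessError if the command exits with an error.
        subprocess.run(cmd, check=True)
    return command_line


# Placeholder executable and path: nothing is executed because dry_run=True.
print(run_command(["placeholder-hydrotel", "path/to/my project"]))
```

Returning the command string in both modes makes it easy to log exactly what was (or would have been) run.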

Once HYDROTEL has been run, `xhydro` automatically reformats the NetCDF output to bring it closer to CF conventions (although no standard exists for hydrological data) and into a format that can be used by other `xhydro` modules. Currently, only the DEBITS_AVAL (outgoing streamflow) output option is supported.

In [None]:
# For the purpose of this example, we'll leave 'dry_run' as True.
print("Command that would be run in the terminal:")
ht.run(id_as_dim=True, dry_run=True)

In [None]:
ht.get_streamflow()  # This is what raw HYDROTEL outputs would look like.

In [None]:
# dry_run=True skips the reformatting, but it can be called explicitly.
ht._standardise_outputs()
ht.get_streamflow()  # These will be the reformatted outputs.

If we examine the streamflow file after executing HYDROTEL, we'll see that it has been reformatted.

- `troncon` and `idtroncon` are renamed to `station` and `station_id`, respectively.
- If `id_as_dim` is True, then `station_id` is the dimension.
- `debit_aval` is renamed `streamflow` and is given standard attributes.

In [None]:
# This cell is hidden.
import os
import shutil

if os.path.isdir(project_folder):
    shutil.rmtree(project_folder)