Commit 04d923f

Merge 73f4bbb into e6df64d
2 parents e6df64d + 73f4bbb
RondeauG committed Oct 4, 2023
Showing 16 changed files with 1,293 additions and 11 deletions.
1 change: 1 addition & 0 deletions MANIFEST.in
@@ -6,6 +6,7 @@ include README.rst
 include .zenodo.json
 
 recursive-include docs *.rst conf.py Makefile make.bat *.jpg *.png *.gif
+recursive-include docs *.ipynb
 recursive-include tests *
 recursive-exclude * __pycache__
 recursive-exclude * *.py[co]
12 changes: 7 additions & 5 deletions docs/index.rst
@@ -4,17 +4,19 @@ Welcome to xHydro's documentation!
 .. toctree::
    :maxdepth: 2
    :caption: Contents:
+   :hidden:
 
    readme
    installation
    usage
+   notebooks/local_frequency_analysis
    apidoc/modules
    contributing
    authors
    history
 
-Indices and tables
-==================
-* :ref:`genindex`
-* :ref:`modindex`
-* :ref:`search`
+.. Indices and tables
+.. ==================
+.. * :ref:`genindex`
+.. * :ref:`modindex`
+.. * :ref:`search`
347 changes: 347 additions & 0 deletions docs/notebooks/local_frequency_analysis.ipynb
@@ -0,0 +1,347 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Frequency analysis module"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Basic imports\n",
"import xarray as xr\n",
"import hvplot.xarray\n",
"import numpy as np\n",
"import xdatasets as xd\n",
"\n",
"import xhydro as xh\n",
"import xhydro.frequency_analysis as xhfa"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Extracting and preparing the data\n",
"For this example, we'll conduct a frequency analysis using historical time series from various sites. We begin by obtaining a dataset comprising hydrological information. Here, we use the [xdataset](https://hydrologie.github.io/xdatasets/notebooks/getting_started.html) library to acquire hydrological data from the [Ministère de l'Environnement, de la Lutte contre les changements climatiques, de la Faune et des Parcs](https://www.cehq.gouv.qc.ca/atlas-hydroclimatique/stations-hydrometriques/index.htm) in Quebec, Canada. Specifically, our query focuses on stations with IDs beginning with `020`, possessing a natural flow pattern and limited to streamflow data. \n",
"\n",
"Users may prefer to generate their own `xarray.DataArray` using their individual dataset. At a minimum, the `xarray.DataArray` used for frequency analysis has to follow these principles:\n",
"\n",
"- The dataset needs a `time` dimension.\n",
"- If there is a spatial dimension, such as `id` in the example below, it needs an attribute `cf_role` with `timeseries_id` as its value.\n",
"- The variable will at the very least need a `units` attribute, although other attributes such as `long_name` and `cell_methods` are also expected by `xclim` (which is called at various points during the frequency analysis) and warnings will be generated if they are missing."
]
},
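{
"cell_type": "markdown",
"metadata": {},
"source": [
"For illustration only, here is a minimal sketch of a dataset built from scratch that satisfies these principles. The station names and streamflow values are synthetic and purely hypothetical:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# A hypothetical, minimal DataArray that follows the principles listed above.\n",
"import pandas as pd\n",
"\n",
"time = pd.date_range(\"1970-01-01\", \"1999-12-31\", freq=\"D\")\n",
"da = xr.DataArray(\n",
"    np.random.rand(time.size, 2),  # synthetic streamflow values\n",
"    coords={\"time\": time, \"id\": [\"station_a\", \"station_b\"]},\n",
"    dims=[\"time\", \"id\"],\n",
"    name=\"streamflow\",\n",
")\n",
"da[\"id\"].attrs[\"cf_role\"] = \"timeseries_id\"  # identifies the spatial dimension\n",
"da.attrs = {\"long_name\": \"Streamflow\", \"units\": \"m3 s-1\", \"cell_methods\": \"time: mean\"}"
]
},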
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ds = xd.Query(\n",
" **{\n",
" \"datasets\":{\n",
" \"deh\":{\n",
" \"id\" :[\"020*\"],\n",
" \"regulated\":[\"Natural\"],\n",
" \"variables\":[\"streamflow\"],\n",
" }\n",
" }, \"time\":{\"start\": \"1970-01-01\", \n",
" \"minimum_duration\":(15*365, 'd')},\n",
"\n",
" }\n",
").data.squeeze()\n",
"\n",
"# This dataset lacks some of the aforementioned attributes, so we need to add them.\n",
"ds[\"id\"].attrs[\"cf_role\"] = \"timeseries_id\"\n",
"ds[\"streamflow\"].attrs = {\"long_name\": \"Streamflow\", \"units\": \"m3 s-1\", \"standard_name\": \"water_volume_transport_in_river_channel\", \"cell_methods\": \"time: mean\"}\n",
"\n",
"ds"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ds.streamflow.dropna('time', 'all').hvplot(x='time', grid=True, widget_location='bottom', groupby='id')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Customizing the analysis settings\n",
"\n",
"### a) Defining seasons\n",
"We can define seasons using indexers that are compatible with `xclim.core.calendar.select_time`. There are currently four accepted types of indexers:\n",
"\n",
"- `month`, followed by a sequence of month numbers.\n",
"- `season`, followed by one or more of 'DJF', 'MAM', 'JJA' and 'SON'.\n",
"- `doy_bounds`, followed by a sequence representing the inclusive bounds of the period to be considered (start, end).\n",
"- `date_bounds`, which is the same as above, but using a month-day (%m-%d) format.\n",
"\n",
"For the purpose of getting block maxima through `xhydro.indicators.get_yearly_op`, the indexers need to be grouped within a dictionary, with the key being the label to be given to the requested period of the year. A second key can be used to instruct on the resampling frequency, for example to wrap around the year for winter."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Some examples\n",
"timeargs = {\n",
" \"spring\": {\"date_bounds\": [\"02-11\", \"06-19\"]},\n",
" \"summer\": {\"doy_bounds\": [152, 243]},\n",
" \"fall\": {\"month\": [9, 10, 11]},\n",
" \"winter\": {\"season\": ['DJF'], \"freq\": \"AS-DEC\"}, # To correctly wrap around the year, we need to specify the resampling frequency.\n",
" \"august\": {\"month\": [8]},\n",
" \"annual\": {}\n",
" }"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### b) Getting block maxima\n",
"Upon selecting each desired season, we can extract block maxima timeseries from every station using `xhydro.indicators.get_yearly_op`. The main arguments are:\n",
"\n",
"- `op`: the operation to compute. One of \"max\", \"min\", \"mean\", \"sum\".\n",
"- `input_var`: the name of the variable. Defaults to \"streamflow\".\n",
"- `window`: the size of the rolling window. A \"mean\" is performed on the rolling window prior to the `op` operation.\n",
"- `timeargs`: as defined previously. Leave at `None` to get the annual maxima.\n",
"- `missing` and `missing_options`: to define tolerances for missing data. See [this page](https://xclim.readthedocs.io/en/stable/checks.html#missing-values-identification) for more information.\n",
"- `interpolate_na`: whether to interpolate missing data prior to the `op` operation. Only used for `sum`."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The function returns a `xarray.Dataset` with 1 variable per indexer."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Here, we hide years with more than 15% of missing data.\n",
"ds_4fa = xh.indicators.get_yearly_op(ds, op=\"max\", timeargs=timeargs, missing=\"pct\", missing_options={\"tolerance\": 0.15})\n",
"\n",
"ds_4fa"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"ds_4fa.streamflow_max_summer.dropna('time', 'all').hvplot(x='time', grid=True, widget_location='bottom', groupby='id')"
]
},
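{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a sketch of the `window` argument described above, the call below would extract the yearly maxima of the 7-day rolling-mean streamflow rather than of the daily values. Everything else mirrors the previous call; `ds_7day` is just an illustrative variable name:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: a 7-day rolling mean is applied to the streamflow before taking the block maxima.\n",
"ds_7day = xh.indicators.get_yearly_op(ds, op=\"max\", window=7, timeargs=timeargs, missing=\"pct\", missing_options={\"tolerance\": 0.15})\n",
"\n",
"ds_7day"
]
},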
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### c) Using custom seasons per year or per station\n",
"\n",
"Using individualized date ranges for each year or each catchment is not explicitely supported, so users should instead mask their data prior to calling `get_yearly_op`. Note that when doing this, `missing` should be adjusted accordingly."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Create a mask beforehand\n",
"import random\n",
"nyears = np.unique(ds.time.dt.year).size\n",
"dom_start = xr.DataArray(np.random.randint(1, 30, size=(nyears, )).astype(\"str\"), dims=(\"year\"), coords={\"year\": np.unique(ds.time.dt.year)})\n",
"dom_end = xr.DataArray(np.random.randint(1, 30, size=(nyears, )).astype(\"str\"), dims=(\"year\"), coords={\"year\": np.unique(ds.time.dt.year)})\n",
"\n",
"mask = xr.zeros_like(ds[\"streamflow\"])\n",
"for y in dom_start.year.values:\n",
" # Random mask of dates per year, between April and June.\n",
" mask.loc[{\"time\": slice(str(y)+\"-04-\"+str(dom_start.sel(year=y).item()), str(y)+\"-06-\"+str(dom_end.sel(year=y).item()))}] = 1"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"mask.hvplot(x='time', grid=True, widget_location='bottom', groupby='id')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# The name of the indexer will be used to identify the variable created here\n",
"timeargs_custom = {\"custom\": {}}\n",
"\n",
"# We use where() to mask the data that we want to ignore\n",
"masked = ds.where(mask==1)\n",
"# Since we masked almost all of the year, our tolerance for missing data should be changed accordingly\n",
"missing = \"at_least_n\"\n",
"missing_options = {\"n\": 45}\n",
"\n",
"# We use xr.merge() to combine the results with the previous dataset.\n",
"ds_4fa = xr.merge([ds_4fa, xh.indicators.get_yearly_op(masked, op=\"max\", timeargs=timeargs_custom, missing=missing, missing_options=missing_options)])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ds_4fa.streamflow_max_custom.dropna('time', 'all').hvplot(x='time', grid=True, widget_location='bottom', groupby='id')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### d) Computing volumes\n",
"\n",
"The frequency analysis can also be performed on volumes, using a similar workflow. The main difference is that if we're starting from streamflow, we'll first need to convert them into volumes using `xhydro.indicators.compute_volume`. Also, if required, `get_yearly_op` has an argument `interpolate_na` that can be used to interpolate missing data prior to the sum."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Get a daily volume from a daily streamflow\n",
"ds[\"volume\"] = xh.indicators.compute_volume(ds[\"streamflow\"], out_units=\"hm3\")\n",
"\n",
"# We'll take slightly different indexers\n",
"timeargs_vol = {\n",
" \"spring\": {\"date_bounds\": [\"04-30\", \"06-15\"]},\n",
" \"annual\": {}\n",
" }\n",
"\n",
"# The operation that we want here is the sum, not the max.\n",
"ds_4fa = xr.merge([ds_4fa, xh.indicators.get_yearly_op(ds, op=\"sum\", input_var=\"volume\", timeargs=timeargs_vol, missing=\"pct\", missing_options={\"tolerance\": 0.15}, interpolate_na=True)])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ds_4fa.volume_sum_spring.dropna('time', 'all').hvplot(x='time', grid=True, widget_location='bottom', groupby='id')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Local frequency analysis\n",
"\n",
"Once we have our yearly maximums (or volumes/minimums), the first step in a local frequency analysis is to call `xhfa.local.fit` to obtain distribution parameters. The options are:\n",
"\n",
"- `distributions`: list of [SciPy distributions](https://docs.scipy.org/doc/scipy/reference/stats.html#continuous-distributions). Defaults to [\"expon\", \"gamma\", \"genextreme\", \"genpareto\", \"gumbel_r\", \"pearson3\", \"weibull_min\"].\n",
"- `min_years`: minimum number of years required to fit the data.\n",
"- `method`: fitting method. Defaults to the maximum likelihood."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# To speed up the Notebook, we'll only perform the analysis on a subset of variables\n",
"params = xhfa.local.fit(ds_4fa[[\"streamflow_max_spring\", \"volume_sum_spring\"]], min_years=15)\n",
"\n",
"params"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Information Criteria such as the AIC, BIC, and AICC are useful to determine which statistical distribution is better suited to a given location. These three criteria can be computed using `xhfa.local.criteria`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"criteria = xhfa.local.criteria(ds_4fa[[\"streamflow_max_spring\", \"volume_sum_spring\"]], params)\n",
"\n",
"criteria"
]
},
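{
"cell_type": "markdown",
"metadata": {},
"source": [
"For intuition, the sketch below computes the AIC by hand for a single station and distribution, using $\\mathrm{AIC} = 2k - 2\\ln(\\hat{L})$, where $k$ is the number of fitted parameters and $\\hat{L}$ the maximized likelihood. This is illustrative only; the values may differ slightly from those of `xhfa.local.criteria` depending on its fitting options."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Illustrative only: AIC by hand for one station, using a Gumbel distribution.\n",
"from scipy import stats\n",
"\n",
"data = ds_4fa[\"streamflow_max_spring\"].isel(id=0).dropna(\"time\").values\n",
"fitted = stats.gumbel_r.fit(data)  # maximum likelihood estimates of (loc, scale)\n",
"log_likelihood = stats.gumbel_r.logpdf(data, *fitted).sum()\n",
"aic = 2 * len(fitted) - 2 * log_likelihood\n",
"aic"
]
},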
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, return periods can be obtained using `xhfa.local.parametric_quantiles`. The options are:\n",
"\n",
"- `t`: the return period(s) in years.\n",
"- `mode`: whether the return period is the probability of exceedance (max) or non-exceedance (min). Defaults to `max`."
]
},
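{
"cell_type": "markdown",
"metadata": {},
"source": [
"For intuition, in `max` mode the $T$-year return level is the quantile of the fitted distribution with a non-exceedance probability of $1 - 1/T$. A minimal sketch with SciPy and made-up Gumbel parameters (not the actual implementation, which works on the parameters fitted above):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: the 100-year return level of a Gumbel distribution with illustrative parameters.\n",
"from scipy import stats\n",
"\n",
"T = 100\n",
"p = 1 - 1 / T  # non-exceedance probability of the 100-year event\n",
"stats.gumbel_r.ppf(p, loc=100, scale=25)"
]
},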
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"rp = xhfa.local.parametric_quantiles(params, t=[20, 100])\n",
"\n",
"rp"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
},
"vscode": {
"interpreter": {
"hash": "e28391989cdb8b31df72dd917935faad186df3329a743c813090fc6af977a1ca"
}
}
},
"nbformat": 4,
"nbformat_minor": 4
}
8 changes: 8 additions & 0 deletions environment-dev.yml
@@ -5,6 +5,10 @@ dependencies:
   - python >=3.9,<3.12
   # Don't forget to sync the changes here with environment.yml!
   # Main packages
+  - numpy
+  - scipy
+  - statsmodels
+  - xarray
   - xclim >=0.45.0
   - xscen >=0.7.1
   # Dev
@@ -33,3 +37,7 @@ dependencies:
   # Packaging
   - build
   - wheel
+  # Notebooks
+  - hvplot
+  - pip:
+    - xdatasets
