
Machine Learning Models

Martin Gauch edited this page Jul 30, 2020 · 13 revisions

A suite of Machine Learning models built by Martin Gauch and Jimmy Lin (both University of Waterloo).

Data

The input data mimic the setups of several models:

  1. LBRM: Lumped forcing data (independent variables) and streamflow data at 21 USACE sub-watersheds (dependent variable) (here)
  2. GR4J-Raven-lp: Lumped forcing data (independent variables) and streamflow data at 46 sub-watersheds draining to streamflow gauges of objectives 1 and 2 (dependent variable) (here)
  3. VIC, MESH, etc.: Gridded forcing data (independent variables; sent upon request), sub-basin shapefiles (to aggregate forcings to sub-watersheds) and streamflow data at 46 sub-watersheds draining to streamflow gauges of objectives 1 and 2 (dependent variable) (here)
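
For setup 3, the gridded forcings have to be averaged over each sub-watershed before they can serve as lumped inputs. The sketch below shows one way to compute an area-weighted average with NumPy. It assumes the overlap areas between grid cells and sub-basin polygons have already been computed from the shapefiles (for example with a GIS tool); the function name `aggregate_to_subbasins` and the toy numbers are hypothetical and not taken from the GRIP-E code.

```python
import numpy as np


def aggregate_to_subbasins(grid_forcing, overlap_area):
    """Area-weighted mean of gridded forcings for each sub-watershed.

    grid_forcing: (time, n_cells) forcing value in each grid cell.
    overlap_area: (n_basins, n_cells) area of each grid cell that falls inside
        each sub-basin polygon (hypothetical, precomputed from the shapefiles).
    Returns an array of shape (time, n_basins).
    """
    area = np.asarray(overlap_area, dtype=float)
    # Normalise each row to sum to 1 so the result is a weighted mean, not a sum.
    weights = area / area.sum(axis=1, keepdims=True)
    return np.asarray(grid_forcing, dtype=float) @ weights.T


# Toy example: 2 days, 3 grid cells, 2 sub-basins (all numbers hypothetical).
precip = np.array([[1.0, 2.0, 4.0],
                   [0.0, 3.0, 3.0]])
overlap = np.array([[1.0, 1.0, 0.0],   # basin A: equal parts of cells 0 and 1
                    [0.0, 0.5, 1.0]])  # basin B: some of cell 1, more of cell 2
basin_precip = aggregate_to_subbasins(precip, overlap)
print(basin_precip)
```

Normalising by the total overlap area per basin keeps the result in the forcing's own units (e.g. mm/day) regardless of basin size.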

Models

ML-LSTM

In this kind of architecture, the model passes the previous hidden state to the next step of the sequence, "remembering" information about data the network has already seen and using it to make decisions. The model uses meteorological forcings and static catchment attributes (["Area2", "Lat_outlet", "Lon_outlet", "RivSlope", "Rivlen", "BasinSlope", "BkfWidth", "BkfDepth", "MeanElev", "FloodP_n", "Q_Mean", "Ch_n", "Perim_m", "Regulation"]) as independent variables. The "Regulation" attribute merges the information from Regulation and Reference (ref -> natural, no-ref -> regulated) into a single one-hot-encoded attribute.
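
A minimal sketch of how the hidden state is carried from one time step to the next, written as a single-layer LSTM in plain NumPy. The weights are random placeholders and the sizes (365 daily steps, 5 inputs, 8 hidden units) are hypothetical; this illustrates the standard LSTM recurrence, not the trained GRIP-E model.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_forward(x_seq, W, U, b, hidden_size):
    """Run a single-layer LSTM over x_seq of shape (T, n_inputs).

    W: (4*H, n_inputs), U: (4*H, H), b: (4*H,) hold the stacked weights of the
    input, forget, cell-candidate and output gates.
    Returns the hidden states for all T steps, shape (T, H).
    """
    H = hidden_size
    h = np.zeros(H)  # hidden state, carried from step to step
    c = np.zeros(H)  # cell state ("memory")
    outputs = []
    for x_t in x_seq:
        z = W @ x_t + U @ h + b          # the previous hidden state enters here
        i = sigmoid(z[0:H])              # input gate
        f = sigmoid(z[H:2 * H])          # forget gate
        g = np.tanh(z[2 * H:3 * H])      # candidate cell update
        o = sigmoid(z[3 * H:4 * H])      # output gate
        c = f * c + i * g                # keep part of old memory, add new
        h = o * np.tanh(c)               # new hidden state, passed to next step
        outputs.append(h)
    return np.stack(outputs)


rng = np.random.default_rng(0)
T, n_in, H = 365, 5, 8                   # hypothetical sizes
x = rng.normal(size=(T, n_in))
W = rng.normal(scale=0.1, size=(4 * H, n_in))
U = rng.normal(scale=0.1, size=(4 * H, H))
b = np.zeros(4 * H)
h_all = lstm_forward(x, W, U, b, H)
print(h_all.shape)  # (365, 8)
```

A trained model would learn `W`, `U` and `b` and map the hidden state to streamflow with an output layer; that head is omitted here.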

  1. EA-LSTM: Machine Learning model derived from the Long Short-Term Memory (LSTM) network, as described in Kratzert et al. (2019).
  2. LSTM: Vanilla LSTM model. The static catchment attributes are concatenated to each daily forcing step.
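
The two variants differ mainly in where the static attributes enter the network. The NumPy sketch below, with hypothetical sizes, shows the vanilla-LSTM input (static attributes tiled onto every daily step) and an EA-LSTM forward pass in which, following Kratzert et al. (2019), the input gate is computed once from the static attributes while the forget gate, cell candidate and output gate see only the dynamic forcings and the previous hidden state. Function names and the parameter layout are illustrative assumptions.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def concat_static(dynamic, static):
    """Vanilla LSTM input: repeat the static attributes at every daily step.
    dynamic: (T, n_dyn), static: (n_stat,) -> (T, n_dyn + n_stat)."""
    T = dynamic.shape[0]
    return np.concatenate([dynamic, np.tile(static, (T, 1))], axis=1)


def ea_lstm_forward(dynamic, static, params, H):
    """EA-LSTM: static-only input gate, dynamic-only remaining gates."""
    W_s, b_s, W_d, U_d, b_d = params
    i = sigmoid(W_s @ static + b_s)      # input gate, identical for every t
    h = np.zeros(H)
    c = np.zeros(H)
    out = []
    for x_t in dynamic:
        z = W_d @ x_t + U_d @ h + b_d    # (3H,): forget, candidate, output
        f = sigmoid(z[:H])
        g = np.tanh(z[H:2 * H])
        o = sigmoid(z[2 * H:])
        c = f * c + i * g
        h = o * np.tanh(c)
        out.append(h)
    return np.stack(out)


rng = np.random.default_rng(1)
T, n_dyn, n_stat, H = 30, 5, 4, 8        # hypothetical sizes
dyn = rng.normal(size=(T, n_dyn))
stat = rng.normal(size=n_stat)
x_vanilla = concat_static(dyn, stat)
print(x_vanilla.shape)  # (30, 9)
params = (rng.normal(scale=0.1, size=(H, n_stat)), np.zeros(H),
          rng.normal(scale=0.1, size=(3 * H, n_dyn)),
          rng.normal(scale=0.1, size=(3 * H, H)), np.zeros(3 * H))
h_ea = ea_lstm_forward(dyn, stat, params, H)
print(h_ea.shape)  # (30, 8)
```

Because the EA-LSTM input gate depends only on the catchment attributes, it can be read as a per-catchment filter on which memory cells get updated, whereas the vanilla LSTM has to learn such behaviour from the concatenated inputs.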
ML-XGBoost

XGBoost is a Machine Learning approach that trains gradient-boosted regression trees (GBRTs). GBRTs iteratively train K regression trees f_k, each fitted to correct the errors of the trees before it, and generate the overall prediction as the sum of their outputs. This model uses lumped forcings and static catchment attributes (["Area2", "Lat_outlet", "Lon_outlet", "RivSlope", "Rivlen", "BasinSlope", "BkfWidth", "BkfDepth", "MeanElev", "FloodP_n", "Q_Mean", "Ch_n", "Perim_m"]) as independent variables.
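
In formula form, the GBRT prediction for an input x is ŷ(x) = f_1(x) + f_2(x) + ... + f_K(x), where each tree f_k is fitted to the residuals left by the trees before it. The from-scratch sketch below illustrates this with depth-1 trees (stumps), squared-error loss and a learning rate that shrinks each tree's contribution. It is a simplified stand-in, not XGBoost itself: XGBoost additionally uses second-order gradient information, regularises tree complexity and grows deeper trees. The toy data and all names are hypothetical.

```python
import numpy as np


def fit_stump(X, r):
    """Fit a depth-1 regression tree (one split) to targets r by least squares."""
    best = None
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j])[:-1]:           # candidate split points
            left = X[:, j] <= thr
            lv, rv = r[left].mean(), r[~left].mean()  # leaf values
            sse = ((r[left] - lv) ** 2).sum() + ((r[~left] - rv) ** 2).sum()
            if best is None or sse < best[0]:
                best = (sse, j, thr, lv, rv)
    if best is None:                                   # no split possible
        m = r.mean()
        return lambda Z: np.full(len(Z), m)
    _, j, thr, lv, rv = best
    return lambda Z: np.where(Z[:, j] <= thr, lv, rv)


def gradient_boost(X, y, K=100, learning_rate=0.1):
    """Train K stumps; each one fits the residuals of the current ensemble."""
    trees = []
    pred = np.zeros(len(y))
    for _ in range(K):
        residual = y - pred              # negative gradient of squared error
        f_k = fit_stump(X, residual)
        trees.append(f_k)
        pred = pred + learning_rate * f_k(X)
    # Overall prediction = sum of the (shrunken) tree outputs.
    return lambda Z: sum(learning_rate * f(Z) for f in trees)


rng = np.random.default_rng(42)
X = rng.uniform(size=(100, 2))           # hypothetical features
y = 3.0 * X[:, 0] + (X[:, 1] > 0.5)      # hypothetical target
model = gradient_boost(X, y)
mse = np.mean((model(X) - y) ** 2)
print(f"training MSE {mse:.4f} vs. target variance {np.var(y):.4f}")
```

In practice one would use the `xgboost` package (for example `xgboost.XGBRegressor`) rather than this loop; the sketch only shows why the final prediction is a sum of tree outputs.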

Other Models

See The Museum of Ancient Models below.

Results

See https://github.com/julemai/GRIP-E/wiki/Results.

Appendix

Literature

Some interesting publications on this topic are:

  • Best et al. (2015): "The Plumbing of Land Surface Models: Benchmarking Model Performance", comparing several land surface models and demonstrating that simple regression models can outperform all of them (pdf)
  • Papacharalampous & Tyralis (2018): "Evaluation of random forests and Prophet for daily streamflow forecasting", comparing random forests and Prophet for daily streamflow forecasting (pdf)
  • Shen et al. (2018): "HESS Opinions: Incubating deep-learning-powered hydrologic science advances as a community", Hydrol. Earth Syst. Sci., 22, 5639-5656 (pdf)
  • Kratzert et al. (2019): "Benchmarking a Catchment-Aware Long Short-Term Memory Network (LSTM) for Large-Scale Hydrological Modeling", Hydrol. Earth Syst. Sci. Discuss., in review, 2019 (doi, pdf)
  • Feel free to add...

The Museum of Ancient Models

This section describes models we explored at some point but later abandoned.

ML-LinReg-Erie

Machine Learning model based on linear regression.

Results for Lake Erie can be found here (objective 1) and here (objective 2).
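
A minimal sketch of the linear-regression idea: streamflow is fitted as a weighted sum of input features plus an intercept, using NumPy's least-squares solver. The feature matrix is hypothetical toy data; which forcings (and lags) the abandoned Lake Erie model actually used is not stated on this page.

```python
import numpy as np


def fit_linear(X, y):
    """Ordinary least squares with an intercept: y ≈ X @ w + b."""
    A = np.column_stack([X, np.ones(len(X))])      # append a column of ones
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[:-1], coef[-1]


# Hypothetical toy data: two input features per day -> streamflow.
X = np.array([[0.0, 1.0],
              [1.0, 0.0],
              [2.0, 1.0],
              [3.0, 2.0]])
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5
w, b = fit_linear(X, y)
y_hat = X @ w + b
print(w, b)
```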

ML-ConvLSTM-Erie

Machine Learning model based on Convolutional Long Short-Term Memory (ConvLSTM). In this kind of architecture, the model passes the previous hidden state to the next step of the sequence, thereby retaining information about data the network has already seen and using it to make decisions. Unlike a standard LSTM, the ConvLSTM computes its gates with convolutions, so the hidden state keeps a spatial (grid) layout. The model uses meteorological forcings as independent variables.
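
ConvLSTM keeps the LSTM gate structure but computes each gate with 2-D convolutions over the grid instead of matrix products, so hidden and cell states are maps rather than vectors. The NumPy sketch below assumes a single input variable, a single hidden channel, 3x3 kernels and zero padding, and omits optional peephole connections; all names and sizes are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def conv2d_same(img, kernel):
    """2-D cross-correlation with zero padding; output has the shape of img.
    Assumes an odd-sized kernel."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)))     # zero padding
    out = np.zeros(img.shape)
    for r in range(img.shape[0]):
        for c in range(img.shape[1]):
            out[r, c] = np.sum(padded[r:r + kh, c:c + kw] * kernel)
    return out


def convlstm_forward(x_seq, Wx, Wh, b):
    """Single-channel ConvLSTM over x_seq of shape (T, rows, cols).

    Wx, Wh: (4, k, k) kernels for the input, forget, candidate and output
    gates; b: (4,) biases. Returns hidden-state maps of shape (T, rows, cols).
    """
    h = np.zeros(x_seq.shape[1:])   # hidden state is a map, not a vector
    c = np.zeros(x_seq.shape[1:])   # cell state map
    states = []
    for x_t in x_seq:
        z = [conv2d_same(x_t, Wx[k]) + conv2d_same(h, Wh[k]) + b[k]
             for k in range(4)]
        i, f, o = sigmoid(z[0]), sigmoid(z[1]), sigmoid(z[3])
        g = np.tanh(z[2])
        c = f * c + i * g
        h = o * np.tanh(c)          # passed to the next time step
        states.append(h)
    return np.stack(states)


rng = np.random.default_rng(0)
T, rows, cols = 10, 6, 5                      # hypothetical grid and length
x = rng.normal(size=(T, rows, cols))          # e.g. one daily forcing grid
Wx = rng.normal(scale=0.1, size=(4, 3, 3))
Wh = rng.normal(scale=0.1, size=(4, 3, 3))
b = np.zeros(4)
h_maps = convlstm_forward(x, Wx, Wh, b)
print(h_maps.shape)  # (10, 6, 5)
```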

  1. Not using DEM or Land Cover data: Results for Lake Erie can be found here (objective 1) and here (objective 2).

  2. Using Digital Elevation Model (DEM) data as independent variables: Results for Lake Erie can be found here (objective 1) and here (objective 2).

  3. Using Land Cover data as independent variables: Results for Lake Erie can be found here (objective 1) and here (objective 2).

  4. Using Digital Elevation Model (DEM) and Land Cover data as independent variables: Results for Lake Erie can be found here (objective 1) and here (objective 2).
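
The page does not say how the DEM and land-cover data were fed to the ConvLSTM. One common option, shown here purely as an assumption, is to treat each static raster as an extra input channel that is repeated at every time step alongside the gridded forcings. Array shapes, the land-cover encoding and the function name are hypothetical.

```python
import numpy as np


def stack_static_channels(forcings, static_layers):
    """Append static rasters as extra channels at every time step.

    forcings: (T, n_vars, rows, cols) gridded daily forcings.
    static_layers: list of (rows, cols) rasters, e.g. elevation or a
        land-cover class fraction (hypothetical encodings).
    Returns (T, n_vars + len(static_layers), rows, cols).
    """
    T = forcings.shape[0]
    static = np.stack(static_layers)                         # (n_static, rows, cols)
    static_t = np.broadcast_to(static, (T,) + static.shape)  # repeat over time
    return np.concatenate([forcings, static_t], axis=1)


rng = np.random.default_rng(3)
forcings = rng.normal(size=(20, 3, 6, 5))     # 20 days, 3 forcing variables
dem = rng.uniform(150.0, 400.0, size=(6, 5))  # hypothetical elevations (m)
forest = rng.uniform(0.0, 1.0, size=(6, 5))   # hypothetical forest fraction
inputs = stack_static_channels(forcings, [dem, forest])
print(inputs.shape)  # (20, 5, 6, 5)
```

In practice the static channels would usually be standardised (for example to zero mean and unit variance) so that elevations in metres do not dominate the forcings.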

Funded under the IMPC project of the Global Water Futures program.
