Datasets

Julie edited this page Mar 8, 2021 · 66 revisions

Content:

  1. Modelling domain
  2. Digital elevation model
  3. Flow direction and accumulation
  4. Soil data
  5. Groundwater table depth
  6. Land cover data
  7. Meteorologic forcings
  8. Data to calibrate and validate models

Modelling domain

We used the modelling domain of Lake Erie as specified by the Great Lakes Aquatic Habitat Framework (GLAHF). The domain is shown in the figure below. It covers 103,666 km² including the lake area and 76,352 km² excluding all water bodies.

Figure 1: The modelling domain of the Lake Erie watershed is displayed in grey together with major cities.


There are several shape files available for your convenience here.

  1. Shapefile of whole modeling domain (1 shape: land and lake together)
    Lake Erie: data/shapefiles/1_LEB_boundary_mainland+lake.zip
  2. Shapefile of only land area (2 shapes: land split into US and Canadian portion)
    Lake Erie: data/shapefiles/2_LEB_boundary_mainland_only.zip
  3. Shapefile of only lake area (1 shape: Lake Erie and St. Clair combined)
    Lake Erie: data/shapefiles/3_LEB_boundary_lake_only.zip
  4. Shapefile of subwatersheds (621 shapes: 621 subwatersheds)
    Lake Erie: data/shapefiles/4_LEB_boundary_subwatershed.zip

Figure 2: Visualization of the four shape files available in the data directory. Differently coloured domains indicate separate shapes within one file. The title of each subplot indicates the filename.



Digital elevation model

We used the conditioned SRTM Digital Elevation Model (DEM) from HydroSHEDS.

Common dataset

This dataset is used for all models for Phase 2 and onwards.

The HydroSHEDS DEM has a 3'' resolution, which corresponds to about 90 m at the equator. The data can be downloaded here. Documentation can be found here. Data for the Lake Erie Basin are available on GitHub in Esri raster format and NetCDF on a regular lat-lon grid for the domain of latitudes between (40.00041580200195, 44.99958419799805) [degrees north] and longitudes between (-85.99958038330078,-77.00041961669922) [degrees east].
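The relation between angular and metric resolution can be checked with a short calculation. The sketch below assumes a spherical Earth with the WGS84 equatorial radius; the function name is illustrative and not part of any dataset tooling. It also shows that the east-west cell width shrinks with the cosine of latitude, using 42° N as an example latitude within the domain.

```python
import math

# WGS84 semi-major axis in metres; a spherical Earth is assumed for simplicity.
EQUATORIAL_RADIUS_M = 6_378_137.0


def arcsec_to_metres(arcsec, latitude_deg=0.0):
    """East-west length in metres of an arc of `arcsec` arc-seconds at a latitude."""
    metres_per_arcsec = 2.0 * math.pi * EQUATORIAL_RADIUS_M / (360 * 3600)
    return arcsec * metres_per_arcsec * math.cos(math.radians(latitude_deg))


# Resolutions used on this page: 3'' (DEM), 15'' (upscaled DEM), 30'' (soil data).
for res in (3, 15, 30):
    print(f"{res}'' at the equator: {arcsec_to_metres(res):.0f} m, "
          f"at 42 N: {arcsec_to_metres(res, 42.0):.0f} m")
```

At the equator a 3'' cell comes out at roughly 93 m, consistent with the "about 90 m" quoted above; at 42° N the east-west width is closer to 69 m.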

Individual dataset

This dataset is used for VIC, VIC-GRU, HYPE, GR4J-lp, and GR4J-sd in Phase 0 and 1.

The HydroSHEDS DEM is also available at a 15'' resolution, which corresponds to about 500 m at the equator. This dataset is upscaled in a hydrologically consistent way from the 3'' dataset. Hence, the two essentially count as the same dataset at different resolutions. Data are available on GitHub in Esri raster format and NetCDF on a regular lat-lon grid for the domain of latitudes between (40.381248474121094, 44.23125076293945) [degrees north] and longitudes between (-85.32765197753906,-78.27348327636719) [degrees east].
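As a consistency check, the number of grid cells along each axis follows from the listed bounds and the resolution. The sketch below assumes that the quoted bounds are cell-centre coordinates (an assumption, not stated on this page); the function name is illustrative.

```python
def n_cells(lower, upper, resolution_arcsec):
    """Number of cell centres from `lower` to `upper` (inclusive) on a regular grid."""
    spacing_deg = resolution_arcsec / 3600.0
    # Rounding absorbs the single-precision noise in the quoted bounds.
    return round((upper - lower) / spacing_deg) + 1


# Bounds as quoted for the 3'' and the 15'' DEM.
rows_3 = n_cells(40.00041580200195, 44.99958419799805, 3)
cols_3 = n_cells(-85.99958038330078, -77.00041961669922, 3)
rows_15 = n_cells(40.381248474121094, 44.23125076293945, 15)
cols_15 = n_cells(-85.32765197753906, -78.27348327636719, 15)

print("3'' grid:", rows_3, "x", cols_3)
print("15'' grid:", rows_15, "x", cols_15)
```

Under that assumption the 3'' grid has 6000 x 10800 cells, that is, 5° x 9° at 1200 cells per degree, and the 15'' grid has 925 x 1694 cells.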

Figure 3: The Digital Elevation Model used in this study highlighting the Lake Erie watershed modelling domain.

Flow direction and accumulation

The above-mentioned DEM was used to derive the flow direction and flow accumulation with ArcGIS.
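For readers unfamiliar with these two quantities, the sketch below illustrates the D8 idea on a toy grid: each cell drains to one of its eight neighbours, and the flow accumulation of a cell is the number of cells upstream of it. It uses the Esri direction coding (1 = east, 2 = south-east, 4 = south, and so on up to 128 = north-east). This is only a minimal illustration, not the ArcGIS implementation used to produce the dataset; the function and variable names are hypothetical, and the grid is assumed to contain no cycles.

```python
from collections import deque

# Esri D8 codes mapped to (row offset, column offset); rows increase southward.
D8_OFFSETS = {
    1: (0, 1),     # east
    2: (1, 1),     # south-east
    4: (1, 0),     # south
    8: (1, -1),    # south-west
    16: (0, -1),   # west
    32: (-1, -1),  # north-west
    64: (-1, 0),   # north
    128: (-1, 1),  # north-east
}


def flow_accumulation(flow_dir):
    """Count the upstream cells of every cell in a D8 flow-direction grid."""
    nrows, ncols = len(flow_dir), len(flow_dir[0])

    def downstream(r, c):
        # Return the receiving cell, or None if the flow leaves the grid
        # or the code is not a valid D8 direction.
        offset = D8_OFFSETS.get(flow_dir[r][c])
        if offset is None:
            return None
        rr, cc = r + offset[0], c + offset[1]
        return (rr, cc) if 0 <= rr < nrows and 0 <= cc < ncols else None

    # Number of direct neighbours draining into each cell.
    pending = [[0] * ncols for _ in range(nrows)]
    for r in range(nrows):
        for c in range(ncols):
            target = downstream(r, c)
            if target is not None:
                pending[target[0]][target[1]] += 1

    # Process cells from the headwaters downstream (topological order).
    acc = [[0] * ncols for _ in range(nrows)]
    queue = deque((r, c) for r in range(nrows) for c in range(ncols)
                  if pending[r][c] == 0)
    while queue:
        r, c = queue.popleft()
        target = downstream(r, c)
        if target is None:
            continue
        tr, tc = target
        acc[tr][tc] += acc[r][c] + 1  # the cell itself plus everything upstream
        pending[tr][tc] -= 1
        if pending[tr][tc] == 0:
            queue.append(target)
    return acc


# Toy example: everything flows east, the last column drains south.
toy_directions = [
    [1, 1, 4],
    [1, 1, 4],
    [1, 1, 4],
]
toy_accumulation = flow_accumulation(toy_directions)
for row in toy_accumulation:
    print(row)
```

In the toy grid the south-east corner is the outlet and receives flow from all eight other cells.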

Common dataset

This dataset is used for all models for Phase 2 and onwards.

The 3'' data for the Lake Erie basin are available on GitHub in Esri raster format and NetCDF on a regular lat-lon grid for the domain of latitudes between (40.00041580200195, 44.99958419799805) [degrees north] and longitudes between (-85.99958038330078,-77.00041961669922) [degrees east].

Individual dataset

This dataset is used for VIC, VIC-GRU, HYPE, GR4J-lp, and GR4J-sd in Phase 0 and 1.

The 15'' data for the Lake Erie Basin are available on GitHub in Esri raster format and NetCDF on a regular lat-lon grid for the domain of latitudes between (40.381248474121094, 44.23125076293945) [degrees north] and longitudes between (-85.32765197753906,-78.27348327636719) [degrees east].

Figure 4: The ArcGIS derived flow direction (left) and flow accumulation (right) used in this study highlighting the Lake Erie watershed modelling domain.

Soil data

Common dataset

This dataset is used for all models for Phase 2 and onwards.

As a common soil dataset for all models we used the Global Soil Dataset for Earth System Models (GSDE) (30'' ≈ 1 km), which contains 8 soil layers down to a depth of 2.3 m. The data can be downloaded here. A publication on this product is available here. It also contains a comparison of the HWSD and GSDE (see Figures 4 to 6).

Individual dataset

This dataset is used for VIC, VIC-GRU, and HYPE in Phase 0 and 1.

These models used the FAO Harmonized World Soil Database (HWSD) v1.2 as their soil database. These data have a 30'' resolution, which corresponds to about 1 km at the equator. Data are available for the Lake Erie Basin on GitHub in Esri raster format and NetCDF on a regular lat-lon grid for the domain of latitudes between (40.00416564941406,44.99583435058594) [degrees north] and longitudes between (-85.99583435058594,-77.00416564941406) [degrees east].

Figure 5: The soil classes based on the Harmonized World Soil Database are shown highlighting the Lake Erie watershed modelling domain.

Groundwater table depth data

Common dataset

This dataset is used for all models for Phase 2 and onwards. However, currently only HYPE is using this data.

As the common dataset for groundwater table depths we used the Global patterns of groundwater table depth (30'' ≈ 1 km), available for download here. Documentation can be found in a publication by Fan et al. (2013) here. The information provided is the average of observed values at well sites and the simulated water table depth at 30'' and 0.25° resolution.

Land cover data

Common dataset

This dataset is used for all models for Phase 2 and onwards.

As a common land cover dataset for all models we used the NALCMS product, which includes 19 land cover classes for North America (30 m, Landsat, 2010 for Mexico and Canada, 2011 for the U.S.). The data can be downloaded here and documentation can be found here.

Individual dataset

This dataset is used for VIC in Phase 0 and 1.

The land cover classes were derived from the MODIS/Terra+Aqua Land Cover Type Yearly L3 Global 500m SIN Grid V006 (MCD12Q1_006) provided by the USGS. This global product has a 500 m resolution. Only the images of 2016 were used for the initial land cover parametrization. Data are available for the Lake Erie basin on GitHub in Esri raster format and NetCDF on a regular lat-lon grid for the domain of latitudes between (40.00264358520508,44.996395111083984) [degrees north] and longitudes between (-85.99735260009766,-77.0043716430664) [degrees east].

Hongren Shen (Waterloo) did a rough comparison of the MODIS land cover dataset with similar products, such as GlobCover 2009, NALCMS 2010, UMD land cover 1990s, and GLAHF 2011-2012. He decided to use MODIS for two main reasons:

  1. It has an acceptable spatial resolution and is the most recent product, which keeps it usable if we want to extend our study to more recent years.
  2. MODIS provides six land cover classification schemes, making it versatile in parameterization procedures.

Figure 6: The land cover classes based on a 500m resolved MODIS product.

Meteorologic forcings

One key point of this study is that all participating models had to use the same forcing dataset, here the Regional Deterministic Reanalysis System version 1 (RDRS-v1).

The RDRS dataset is a preliminary sample of an atmospheric reforecast and precipitation/land-surface reanalysis dataset that has recently been developed and released by Environment and Climate Change Canada (Gasset et al. 2017, Gasset et al. 2021). The data were provided to this project before their public release. The dataset was chosen because of its high spatial and temporal resolution and the availability of all variables required to set up hydrologic and land-surface models. A full list of variables including units and vertical level can be found in the table below. The data are available on CaSPAr (Mai et al. 2020).

This dataset was obtained from short-term (6-h to 18-h lead time) meso-scale (15 km) integrations of the Global Environmental Multiscale (GEM) atmospheric model coupled to the Canadian Land Data Assimilation System (CaLDAS) and to the Canadian Precipitation Analysis (CaPA), launched every 12 hours from initial atmospheric conditions provided by the ERA-Interim reanalysis (Gasset et al. 2021). A technical report is available here.

The data have an hourly temporal resolution and a spatial resolution of about 15 km. They are available for the 5 years from 2010 to 2014. The dataset originally covers North and Central America. It was cropped to the Lake Erie domain for this project. Data are available as NetCDF on a rotated lat-lon grid for the domain of latitudes between (38.91452407836914,45.38602828979492) [degrees north] and longitudes between (-85.86407470703125,-77.44696044921875) [degrees east]. The variables available in the dataset are:

| Variable | Var. name | Long name | Unit | Level |
|---|---|---|---|---|
| Precipitation Rate | PR0 | Quantity of precipitation | [m] | SFC |
| Air Temperature | TT | Air temperature | [°C] | 40m |
| Inc. Shortwave Radiation | FB | Downward solar flux | [W/m2] | SFC |
| Inc. Longwave Radiation | FI | Surface inc. infrared flux | [W/m2] | SFC |
| Atmospheric Pressure | P0 | Surface pressure | [mb] | SFC |
| Specific Humidity | HU | Specific humidity | [kg/kg] | 40m |
| Wind Components | UU,VV | U/V-component of wind (along grid X/Y) | [kts] | 40m |
| Corrected Wind Components | UUC,VVC | U/V-component of wind (along W-E/S-N direct.) | [kts] | 40m |
| Wind Speed | UVC | Wind Modulus | [kts] | 40m |
| Wind Direction | WDC | Meteorol. wind direction | [degree] | 40m |
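
Several of the variables listed above are stored in units that differ from the SI units many hydrologic models expect (knots, millibar, degrees Celsius). The sketch below shows the standard conversions, together with the usual relation between W-E/S-N wind components, wind speed, and meteorological wind direction (the direction the wind blows from, clockwise from north). The function names are hypothetical, and it is an assumption, not something documented on this page, that UVC and WDC were derived from UUC and VVC in exactly this way.

```python
import math

# One knot is one nautical mile (1852 m) per hour.
KNOT_IN_M_PER_S = 1852.0 / 3600.0


def knots_to_m_per_s(speed_kts):
    """Convert a speed from knots to metres per second."""
    return speed_kts * KNOT_IN_M_PER_S


def celsius_to_kelvin(temp_c):
    """Convert a temperature from degrees Celsius to kelvin."""
    return temp_c + 273.15


def millibar_to_pascal(pressure_mb):
    """Convert a pressure from millibar (hPa) to pascal."""
    return pressure_mb * 100.0


def wind_speed(u, v):
    """Wind modulus from west-east (u) and south-north (v) components."""
    return math.hypot(u, v)


def wind_direction_deg(u, v):
    """Meteorological wind direction in degrees (where the wind comes from)."""
    return (180.0 + math.degrees(math.atan2(u, v))) % 360.0


# Hypothetical sample values: a wind blowing from the west at 10 kts.
u_kts, v_kts = 10.0, 0.0
print(knots_to_m_per_s(wind_speed(u_kts, v_kts)))  # speed in m/s
print(wind_direction_deg(u_kts, v_kts))            # direction in degrees
```

The conversions are applied per variable, so a model that expects wind speed in m/s would convert UVC directly rather than recomputing it from the components.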

Figure 7: The subbasins used in this study for (A) objective 1 (low-human impact watersheds) and (B) objective 2 (most downstream gauges) are highlighted with their colored delineated shapes. Blue shapes indicate subbasins used for calibration and red shapes indicate validation basins. The Lake Erie watershed is shown for reference as a light gray shaded area. (C) The 15 km gridded forcing data are shown for one point in time (Jan 1, 2012 6pm UTC). (D) Some models processed the gridded forcings into, for example, lumped forcings per subbasin.
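
One common way to turn gridded forcings into a lumped forcing per subbasin, as mentioned for panel (D), is an area-weighted average over the grid cells that overlap the subbasin. The sketch below assumes the overlap areas have already been computed (for example with a GIS intersection); it is an illustration with hypothetical names and values, not necessarily the procedure used by the participating models.

```python
def lumped_forcing(cell_values, overlap_areas):
    """Area-weighted mean of grid-cell values over one subbasin.

    cell_values   -- forcing value of each overlapping grid cell
    overlap_areas -- area of each cell that lies inside the subbasin
    """
    if len(cell_values) != len(overlap_areas):
        raise ValueError("need one overlap area per grid cell")
    total_area = sum(overlap_areas)
    if total_area <= 0.0:
        raise ValueError("subbasin does not overlap any grid cell")
    weighted_sum = sum(v * a for v, a in zip(cell_values, overlap_areas))
    return weighted_sum / total_area


# Hypothetical subbasin overlapping three 15 km grid cells.
temperatures_degc = [2.0, 3.0, 5.0]
overlap_km2 = [100.0, 50.0, 50.0]
basin_temperature = lumped_forcing(temperatures_degc, overlap_km2)
print(basin_temperature)  # 3.0
```

The same weights can be reused for every time step and every variable, since they depend only on the geometry of the grid and the subbasin.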

Data to calibrate and validate models

Objective 1: Modelling Every Location of Lake Erie Watershed (naturalized monitoring points)

Lake Erie domain

Objective #1 used only stations monitoring daily discharge of naturalized watersheds. In total, these are 15 Canadian and 13 US stations (gauge info (cal) and gauge info (val)).

The data are available in the raw data format (csv/txt (cal), csv/txt (val)) and in NetCDF format (NetCDF (cal), NetCDF (val)). The latter includes a conversion of all data into [m³/s] and a merging of all data and their metadata into one single NetCDF file.
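
The unit conversion itself is a simple scaling. As an illustration, the sketch below converts a discharge from cubic feet per second to cubic metres per second. That any of the raw records are reported in cubic feet per second is an assumption made for this example; this page does not state the raw units used by each agency. The function name is hypothetical.

```python
# One international foot is exactly 0.3048 m.
FOOT_IN_M = 0.3048
CUBIC_FOOT_IN_M3 = FOOT_IN_M ** 3


def cfs_to_cms(discharge_cfs):
    """Convert discharge from cubic feet per second to cubic metres per second."""
    return discharge_cfs * CUBIC_FOOT_IN_M3


# Hypothetical daily discharge values in ft3/s.
raw_cfs = [1000.0, 250.0, 0.0]
converted_cms = [cfs_to_cms(q) for q in raw_cfs]
print([round(q, 3) for q in converted_cms])
```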

Figure 8: Daily discharge data for all gauging stations of Lake Erie watershed including Lake St Clair of objective #1 for the period of 2010 to 2014 (GRIP-E study).

Objective 2: Modelling only inflows to Lake Erie watershed

Lake Erie domain

Objective #2 focusses only on getting the inflows to Lake Erie correct. Therefore, stations of both naturalized and managed watersheds are used. Only the most downstream gauging stations are considered here. These are 10 Canadian and 21 US stations (gauge info (cal) and gauge info (val)).

The data are available in the raw data format (csv/txt (cal), csv/txt (val)) and in NetCDF format (NetCDF (cal), NetCDF (val)). The latter includes a conversion of all data into [m³/s] and a merging of all data and their metadata into one single NetCDF file.

Figure 9: Daily discharge data for all gauging stations of objective #2 for the period of 2010 to 2014.