Last updated Jul 12, 2018 (commit 2df1fc1)

ERA5 Data on S3 via AWS Public Dataset Program

To provide cloud-based access to ERA5 reanalysis data, Planet OS is working in conjunction with the AWS Public Dataset Program to publish and maintain regular updates of ERA5 data in S3.

This documentation outlines the dataset's details, available parameters, location and structure on S3, and includes examples of how to access and work with the data.

Please refer to the ECMWF website for the official ERA5 data documentation.

Introduction

The ERA5 climate reanalysis provides a numerical assessment of the modern climate. It is produced by a process similar to that of a regular numerical weather forecast, a data assimilation and forecast loop: most of the available meteorological observations are taken into account and analysed with a state-of-the-art numerical model, producing a continuous, spatially consistent and homogeneous dataset.

The dataset provides all essential atmospheric meteorological parameters, including but not limited to air temperature, pressure and wind at different altitudes, along with surface parameters such as rainfall and soil moisture content, and sea parameters such as sea-surface temperature and wave height. ERA5 provides data at considerably higher spatial and temporal resolution than its predecessor, ERA-Interim. ERA5 consists of a high-resolution version with 31 km horizontal resolution and a reduced-resolution ensemble version with 10 members.

Data is currently available from 2008 onwards and will be extended backwards in time, first to 1979 and then to 1950.

Overview

Source: ECMWF WebAPI
Category: Climate Reanalysis
Format: NetCDF
License: Generated using Copernicus Climate Change Service Information 2018. See http://apps.ecmwf.int/datasets/licences/copernicus/ for additional information.
Storage: Amazon S3
Location:
  Amazon Resource Name (ARN): arn:aws:s3:::era5-pds
  AWS Region: us-east-1
  URL: http://era5-pds.s3.amazonaws.com/
Update Frequency: New data is published monthly. The ERA5 Public Release Plan is available at http://climate.copernicus.eu/products/climate-reanalysis

Variables

The table below lists the 18 ERA5 variables that are available on S3. All variables are surface or single level parameters sourced from the HRES sub-daily forecast stream.

| Variable Name | File Name |
| --- | --- |
| 10 metre U wind component | eastward_wind_at_10_metres.nc |
| 10 metre V wind component | northward_wind_at_10_metres.nc |
| 100 metre U wind component | eastward_wind_at_100_metres.nc |
| 100 metre V wind component | northward_wind_at_100_metres.nc |
| 2 metre dew point temperature | dew_point_temperature_at_2_metres.nc |
| 2 metre temperature | air_temperature_at_2_metres.nc |
| 2 metres maximum temperature since previous post-processing | air_temperature_at_2_metres_1hour_Maximum.nc |
| 2 metres minimum temperature since previous post-processing | air_temperature_at_2_metres_1hour_Minimum.nc |
| Mean sea level pressure | air_pressure_at_mean_sea_level.nc |
| Sea surface temperature | sea_surface_temperature.nc |
| Mean wave period | sea_surface_wave_mean_period.nc |
| Mean direction of wind waves | sea_surface_wind_wave_from_direction.nc |
| Significant height of combined wind waves and swell | significant_height_of_wind_and_swell_waves.nc |
| Snow density | snow_density.nc |
| Snow depth | lwe_thickness_of_surface_snow_amount.nc |
| Surface pressure | surface_air_pressure.nc |
| Surface solar radiation downwards | integral_wrt_time_of_surface_direct_downwelling_shortwave_flux_in_air_1hour_Accumulation.nc |
| Total precipitation | precipitation_amount_1hour_Accumulation.nc |

Timestamps in the data are valid times. The mapping from forecast time to valid time follows Table 0 of the ECMWF ERA5 documentation: from each forecast run, initialised at 06:00 and 18:00 UTC, the first 12 forecast hours (steps 1 to 12) are used. A sample highlighting key times of this mapping is included below for reference.

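| Valid Date | Valid Time (UTC) | Forecast Run Date | Forecast Run (UTC) | Step (hours) |
| --- | --- | --- | --- | --- |
| date | 00:00 | date - 1 | 18:00 | 6 |
| date | 06:00 | date - 1 | 18:00 | 12 |
| date | 07:00 | date | 06:00 | 1 |
| date | 18:00 | date | 06:00 | 12 |
| date | 19:00 | date | 18:00 | 1 |
| date | 23:00 | date | 18:00 | 5 |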
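The mapping in the table above can be reproduced with a short helper, which is handy when checking which forecast run and step a given timestamp came from. This is a minimal sketch, assuming hourly valid times in UTC represented as naive Python `datetime` objects; `forecast_run_for` is a hypothetical name, not part of any ERA5 or Planet OS tooling.

```python
from datetime import datetime, timedelta
from typing import Tuple


def forecast_run_for(valid_time: datetime) -> Tuple[datetime, int]:
    """Return (forecast run start, step in hours) for an hourly UTC valid time.

    Hypothetical helper: it only encodes the 06:00/18:00 runs and the
    steps 1 to 12 described in this document.
    """
    if valid_time.minute or valid_time.second or valid_time.microsecond:
        raise ValueError("valid_time must fall exactly on the hour")
    # 07:00 is step 1 of the 06:00 run and 19:00 is step 1 of the 18:00 run,
    # so the step cycles through 1..12 with a 12-hour period anchored at 07:00.
    step = (valid_time.hour - 7) % 12 + 1
    # Subtracting the step recovers the run start, crossing midnight if needed.
    return valid_time - timedelta(hours=step), step
```

For example, `forecast_run_for(datetime(2008, 1, 2, 0))` returns the 18:00 run of 1 January 2008 with step 6, matching the first row of the table.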

If there are specific variables you would like to recommend for future inclusion, please contact datahub@intertrust.com.

Data Structure

The ERA5 dataset has been transformed to optimize access by variable and time range. To this end, the data is divided into distinct NetCDF granules organized by year, month, and variable name.

The data is structured as follows:

/{year}/{month}/main.nc
               /data/{var1}.nc
                    /{var2}.nc
                    /{....}.nc
                    /{varN}.nc

where year is expressed as four digits (YYYY) and month as two zero-padded digits (MM). Individual data variable files (var1 through varN) are named using the NetCDF CF standard name convention, plus any applicable additional information such as the vertical coordinate.

Granule variable structure and metadata attributes are stored in main.nc. This file contains coordinate and auxiliary variable data, and is also annotated using NetCDF CF metadata conventions.

A sample path for air temperature would take the following form:

/2008/01/data/air_temperature_at_2_metres.nc
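The key layout above is simple to generate programmatically. A minimal sketch in Python, assuming integer years and months and a variable name taken from the File Name column of the Variables table without its `.nc` suffix; all helper names here are hypothetical.

```python
BUCKET_URL = "http://era5-pds.s3.amazonaws.com"


# Hypothetical helpers illustrating the {year}/{month}/... key layout.
def main_key(year: int, month: int) -> str:
    """Key of the month's main.nc file (coordinates and metadata)."""
    return f"{year:04d}/{month:02d}/main.nc"


def granule_key(year: int, month: int, variable: str) -> str:
    """Key of one variable's granule; the month is zero-padded to two digits."""
    return f"{year:04d}/{month:02d}/data/{variable}.nc"


def granule_url(year: int, month: int, variable: str) -> str:
    """Public HTTP URL of the same granule."""
    return f"{BUCKET_URL}/{granule_key(year, month, variable)}"
```

For example, `granule_url(2008, 1, "air_temperature_at_2_metres")` yields http://era5-pds.s3.amazonaws.com/2008/01/data/air_temperature_at_2_metres.nc. Note that S3 keys carry no leading slash, unlike the path shown above.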

Note that because the ERA5 forecasts are run twice daily, at 06:00 and 18:00 UTC, each monthly data file begins with data valid at 07:00 UTC on the first of the month and continues through 06:00 UTC on the first of the following month. As a result, data valid from 00:00 through 06:00 UTC on the first day of each month is contained in the previous month’s file.
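Because of this offset, the monthly file holding a given timestamp is not always the timestamp's own calendar month. A minimal sketch for locating the right file, assuming UTC timestamps as naive Python `datetime` objects; `file_month_for` is a hypothetical helper name.

```python
from datetime import datetime, timedelta
from typing import Tuple


def file_month_for(valid_time: datetime) -> Tuple[int, int]:
    """Return (year, month) of the monthly granule containing valid_time (UTC).

    Hypothetical helper based on the 07:00-to-06:00 file boundaries above.
    """
    # Each file spans 07:00 on the 1st through 06:00 on the 1st of the next
    # month, so shifting back 7 hours puts every timestamp in its file's month.
    shifted = valid_time - timedelta(hours=7)
    return shifted.year, shifted.month
```

For instance, data valid at 06:00 UTC on 1 February 2008 falls in the 2008/01 file, while 07:00 UTC on the same day falls in the 2008/02 file.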

Versioning

To allow potential processing errors in individual granule files to be corrected, S3 bucket versioning is used. This keeps S3 file paths consistent for end users of the data while also allowing previous file versions to be recovered if necessary. Should an issue require data granules to be rewritten, we will publish details of the incident, along with the affected files, on the ERA5 dataset page.
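With versioning enabled, earlier versions of a granule can be listed through the S3 API. A minimal sketch using boto3, assuming the bucket's policy permits listing object versions for your credentials or for anonymous requests (this is not confirmed here); `list_versions` is a hypothetical helper that accepts any boto3 S3 client, for example one created with `boto3.client("s3", region_name="us-east-1")`.

```python
from typing import List, Tuple

BUCKET = "era5-pds"


def list_versions(client, key: str) -> List[Tuple[str, bool]]:
    """Return (VersionId, IsLatest) for each stored version of one object key.

    Hypothetical helper; `client` is assumed to be a boto3 S3 client.
    """
    response = client.list_object_versions(Bucket=BUCKET, Prefix=key)
    # Prefix matching can also return longer keys that merely start with
    # `key`, so keep exact matches only.
    return [
        (version["VersionId"], version["IsLatest"])
        for version in response.get("Versions", [])
        if version["Key"] == key
    ]
```

A specific earlier version can then be fetched with `client.get_object(Bucket="era5-pds", Key=key, VersionId=version_id)`.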

In the unlikely event that a major update affecting the data structure or its dimensionality is required, such changes will be published as a distinct version of the dataset.

Data Access

The data is publicly available in the ERA5 S3 bucket (era5-pds) and may be directly accessed there. Please note that the best transfer speeds will be achieved by accessing the data from an EC2 instance located in the same AWS region as the S3 bucket (us-east-1).

Data may be accessed over HTTP using the S3 REST API. To make a GET request, use the bucket name and the full key name of the object. For example, to download air temperature at 2 metres for January 2008, submit a GET request to the following URL: http://era5-pds.s3.amazonaws.com/2008/01/data/air_temperature_at_2_metres.nc
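The same GET request can be issued with Python's standard library. A minimal sketch, assuming the public URL is reachable without credentials and that the response should be streamed straight to disk; `download` is a hypothetical helper name.

```python
import shutil
import urllib.request


def download(url: str, dest: str) -> str:
    """Stream the object at `url` into the local file `dest` and return `dest`.

    Hypothetical helper; it performs a plain HTTP GET with no authentication.
    """
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        # Copy in chunks so large granules are not held in memory at once.
        shutil.copyfileobj(response, out)
    return dest
```

For example, `download("http://era5-pds.s3.amazonaws.com/2008/01/data/air_temperature_at_2_metres.nc", "air_temperature_at_2_metres.nc")` saves the January 2008 air temperature granule to the current directory.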

Another option is to use the AWS SDK or CLI. We’ve published a Jupyter notebook on GitHub that provides an example of how to access ERA5 data in Python using boto.
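As a compact alternative, the sketch below uses boto3, the current AWS SDK for Python (an assumption; the notebook is described as using boto). It creates an S3 client in the bucket's region (us-east-1) that sends unsigned requests, assuming the bucket allows anonymous listing and reads, and lists one month's objects. Both helper names are hypothetical.

```python
from typing import List

BUCKET = "era5-pds"


def make_anonymous_client():
    """Create a boto3 S3 client that sends unsigned (anonymous) requests.

    Hypothetical helper; boto3 is imported lazily so that list_month_keys
    can be used with any client-like object.
    """
    import boto3
    from botocore import UNSIGNED
    from botocore.config import Config

    return boto3.client(
        "s3", region_name="us-east-1", config=Config(signature_version=UNSIGNED)
    )


def list_month_keys(client, year: int, month: int) -> List[str]:
    """Return the object keys stored under one {year}/{month}/ prefix."""
    prefix = f"{year:04d}/{month:02d}/"
    # A month holds main.nc plus one file per variable, well under the
    # 1,000 keys a single list_objects_v2 call can return.
    response = client.list_objects_v2(Bucket=BUCKET, Prefix=prefix)
    return [obj["Key"] for obj in response.get("Contents", [])]
```

For example, `list_month_keys(make_anonymous_client(), 2008, 1)` lists the January 2008 keys, and `make_anonymous_client().download_file("era5-pds", "2008/01/data/air_temperature_at_2_metres.nc", "air_temperature_at_2_metres.nc")` saves one granule locally. From the CLI, `aws s3 ls --no-sign-request s3://era5-pds/2008/01/` lists the same prefix.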

This dataset is also accessible via the Planet OS Datahub, which provides a RESTful API that supports JSON and CSV responses to point- and polygon-based queries.

Use Cases & Examples