- Target: Structured gridded data
- Version: Draft 3
- Author/POC: Hayley Song (haejinso@usc.edu)
- Last Modified: Sep 25, 2019
The purpose of this document is to propose a self-describing data format for structured gridded datasets, based on NetCDF and the CF conventions, for the MINT data catalog and visualization.
NetCDF (Network Common Data Form) is a file format for storing multidimensional scientific data (variables) such as pressure, surface temperature, soil moisture content, and wind speed. It has been adopted as a standard way to represent scientific data because it facilitates data access. Its main advantages are[1]:
- self-describing: it includes information about the data it contains, so no external tables are needed to interpret it
- machine-independent: portable across different platforms, e.g., macOS, Windows, Linux
- scalable: a small subset of a large netCDF file can be accessed efficiently without reading the entire file
We would like to start a discussion on a common convention for structured gridded data in MINT by proposing the following specifications. We have consulted the Unidata group's recommendations on netCDF attributes for data discovery, as well as the CF Conventions, the Attribute Convention for Data Discovery (ACDD), and the relevant Open Geospatial Consortium (OGC) document.
The purpose of this specification is to establish a unified NetCDF format within MINT (and, in the near future, among World Modelers) for efficient data exchange and knowledge discovery. In addition, we have created an interactive tool for exploring datasets that conform to this specification (see Figure 1 for a quick demo). We welcome your comments, questions, and suggestions. Please submit your comments here.
NetCDF convention for MINT structured gridded datasets
- Required dimensions
  - X, Y, time, bnds
  - units
- Attribute convention
  - Global attributes: per data file
  - Variable attributes: per variable
- Examples
- Related materials
MINT NetCDF Visualization requires the input NetCDF file to specify the dimensions listed below. Any violation of this naming convention will likely cause an error in the visualization.
Name | Description | Required fields |
---|---|---|
X | Longitude | units (e.g., km)* |
Y | Latitude | units (e.g., km)* |
time | Time | units (a valid udunits string)* |
bnds | Number of bands, i.e., the dimensionality of a value in each grid cell (e.g., 1 for a scalar, 3 for RGB, 4 for RGBA) | |
units*: units of the data contained by the variable; must be a valid udunits string. For example, “m” (meter), “km” (kilometer), “degrees_north” (for latitude), and “degrees_east” (for longitude).
- Time
Two conventions for storing a date/time in a netCDF variable are:
- CF-compliant: as a numeric value with a udunits time unit, such as "seconds since 1992-10-8 15:00:00"
- ISO-compliant: as a string using ISO 8601 encoding, such as "2010-10-25T12:00:00Z".
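The two conventions above are interchangeable once the units string is parsed. The following is a minimal standard-library sketch, not part of the MINT tooling: the helper names (parse_cf_units, cf_to_iso, iso_to_cf) are hypothetical, only seconds/minutes/hours/days are handled (calendar-dependent units such as months are rejected), and the reference time is assumed to be UTC.

```python
from datetime import datetime, timedelta, timezone

# Seconds per udunits time unit supported by this sketch. Calendar-dependent
# units such as "months" and "years" are deliberately not handled.
SECONDS_PER_UNIT = {"seconds": 1, "minutes": 60, "hours": 3600, "days": 86400}
ISO_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def parse_cf_units(units):
    """Split e.g. "seconds since 1992-10-8 15:00:00" into (seconds per unit, reference)."""
    unit, sep, reference = units.partition(" since ")
    if not sep or unit not in SECONDS_PER_UNIT:
        raise ValueError(f"unsupported time units: {units!r}")
    for fmt in ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d"):
        try:
            # strptime accepts non-zero-padded fields such as "1992-10-8".
            ref = datetime.strptime(reference.strip(), fmt)
        except ValueError:
            continue
        # Assumption: the reference time is UTC (no time-zone suffix is parsed).
        return SECONDS_PER_UNIT[unit], ref.replace(tzinfo=timezone.utc)
    raise ValueError(f"unsupported reference time: {reference!r}")


def cf_to_iso(value, units):
    """CF-compliant numeric time -> ISO 8601 string with a "Z" suffix."""
    scale, ref = parse_cf_units(units)
    return (ref + timedelta(seconds=value * scale)).strftime(ISO_FORMAT)


def iso_to_cf(text, units):
    """ISO 8601 string (UTC, "Z" suffix) -> CF-compliant numeric time."""
    scale, ref = parse_cf_units(units)
    moment = datetime.strptime(text, ISO_FORMAT).replace(tzinfo=timezone.utc)
    return (moment - ref).total_seconds() / scale
```

For instance, cf_to_iso(3600, "seconds since 1992-10-8 15:00:00") yields the ISO string one hour after the reference time. In practice, libraries such as xarray decode CF times automatically; this sketch only makes the arithmetic explicit.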
In addition to the units field, each dimension is strongly encouraged to have coordinate values. For example:
```
Dimensions:                (X: 294, Y: 348, bnds: 2, time: 1)
Coordinates:
  * time                   (time) datetime64[ns] 2017-01-01
  * X                      (X) float64 22.05 22.15 22.25 ... 51.15 51.25 51.35
  * Y                      (Y) float64 -11.75 -11.65 -11.55 ... 22.85 22.95
Dimensions without coordinates: bnds
Data variables:
    Evap_tavg              (time, Y, X) float32 ...
    LWdown_f_tavg          (time, Y, X) float32 ...
    Lwnet_tavg             (time, Y, X) float32 ...
    Psurf_f_tavg           (time, Y, X) float32 ...
    Qair_f_tavg            (time, Y, X) float32 ...
    Qg_tavg                (time, Y, X) float32 ...
    Qh_tavg                (time, Y, X) float32 ...
    Qle_tavg               (time, Y, X) float32 ...
    Qs_tavg                (time, Y, X) float32 ...
    Qsb_tavg               (time, Y, X) float32 ...
    RadT_tavg              (time, Y, X) float32 ...
    Rainf_f_tavg           (time, Y, X) float32 ...
    SM01_Percentile        (time, Y, X) float32 ...
    SWdown_f_tavg          (time, Y, X) float32 ...
    SoilMoi00_10cm_tavg    (time, Y, X) float32 ...
    SoilMoi100_200cm_tavg  (time, Y, X) float32 ...
    SoilMoi10_40cm_tavg    (time, Y, X) float32 ...
    SoilMoi40_100cm_tavg   (time, Y, X) float32 ...
    SoilTemp00_10cm_tavg   (time, Y, X) float32 ...
    SoilTemp100_200cm_tavg (time, Y, X) float32 ...
    SoilTemp10_40cm_tavg   (time, Y, X) float32 ...
    SoilTemp40_100cm_tavg  (time, Y, X) float32 ...
    Swnet_tavg             (time, Y, X) float32 ...
    Tair_f_tavg            (time, Y, X) float32 ...
    Wind_f_tavg            (time, Y, X) float32 ...
    time_bnds              (time, bnds) datetime64[ns] ...
```
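The X and Y values in this listing appear to advance in steps of 0.1 (an inference from the printed numbers, not something the specification states): 294 X values from 22.05 end at 51.35, and 348 Y values from -11.75 end at 22.95. Under that assumption, coordinate values for such a regular grid can be regenerated from an origin, a step, and a count. regular_axis is a hypothetical helper shown only as a sketch.

```python
def regular_axis(start, step, size, ndigits=6):
    """Return `size` evenly spaced coordinate values beginning at `start`.

    Rounding removes floating-point drift (e.g. 22.05 + 0.1 gives
    22.150000000000002 before rounding).
    """
    return [round(start + i * step, ndigits) for i in range(size)]


# Assumption: 0.1-degree spacing, inferred from the example listing above.
x = regular_axis(22.05, 0.1, 294)
y = regular_axis(-11.75, 0.1, 348)

print(x[:3], x[-3:])
print(y[:3], y[-2:])
```

The printed values match the first and last coordinates shown in the listing, which is a quick consistency check that the dimension sizes and coordinate ranges agree.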
Note that the attribute names link to the Unidata definitions, and each element is marked with M, R, O, or C, depending on its requirement level in our specification:
- M: Mandatory
- R: Recommended
- O: Optional
- C: Mandatory under certain conditions
  - e.g., if the dataset has a time dimension, the time_coverage_start field is mandatory. In this case we mark time_coverage_start as C.
  - Similarly, if the dataset has a spatial dimension, the geospatial_bounds_crs field is mandatory and is marked as C.
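The conditional (C) rule can be checked mechanically: look at which dimensions a file has, then verify that the attributes those dimensions trigger are present. A minimal sketch that encodes only the two examples given above; it assumes "spatial dimension" means X or Y, and the names CONDITIONAL_ATTRIBUTES and missing_conditional are hypothetical.

```python
# Conditional ("C") rules from the examples above: the attributes on the
# right become mandatory when the dataset has the dimension on the left.
# Assumption: a "spatial dimension" is X or Y; other C rules are not encoded.
CONDITIONAL_ATTRIBUTES = {
    "time": ("time_coverage_start",),
    "X": ("geospatial_bounds_crs",),
    "Y": ("geospatial_bounds_crs",),
}


def missing_conditional(dimensions, global_attrs):
    """Return the triggered-but-absent conditional attributes, sorted by name."""
    triggered = set()
    for dim in dimensions:
        triggered.update(CONDITIONAL_ATTRIBUTES.get(dim, ()))
    return sorted(name for name in triggered if name not in global_attrs)
```

For a file with dimensions (time, Y, X, bnds) and no global attributes, both geospatial_bounds_crs and time_coverage_start would be reported. Keeping the rules in a table makes it easy to add further C attributes as the specification evolves.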
- Global attributes
Attribute | Requirement | Description | Example |
---|---|---|---|
title | M | a short description of the dataset | |
summary | R | a paragraph describing the dataset | |
naming_authority | M | the organization that provides the dataset id (below). We recommend using URIs or reverse-DNS naming | edu.isi.workflow |
id | M | a UUID generated by MINT's data catalog system, with the MINT Workflow ID appended | |
keywords | R | a comma-separated list of keywords and phrases | |
comment | O | miscellaneous information about the data | |
date_created | M | the date on which the data was created | |
date_modified | M | the date on which this data was last modified | |
date_issued | R | the date on which this data was formally issued | |
creator_name | R | the data creator's name | |
creator_email | M | the email address of the data creator | |
institution | R | institution in charge of the dataset | |
project | R | the scientific project that produced the data | |
history | R | a static value, "created by MINT workflow" | |
convention | R | the MINT convention version the file follows | MINT-{versionNumber} |
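The requirement levels in the table above translate directly into a validation pass over a file's global attributes. The sketch below reports missing M attributes as errors and missing R attributes as warnings; it uses the attribute name convention exactly as the table spells it, assumes {versionNumber} is a dotted number such as "1.0", and does not check values such as date formats. check_globals and the constant names are hypothetical.

```python
import re

# Attributes marked M in the global-attribute table above.
MANDATORY = ("title", "naming_authority", "id", "date_created",
             "date_modified", "creator_email")
# Attributes marked R; a missing one is only reported as a warning.
RECOMMENDED = ("summary", "keywords", "date_issued", "creator_name",
               "institution", "project", "history", "convention")
# Assumption: {versionNumber} is a dotted number such as "1" or "1.0".
CONVENTION = re.compile(r"MINT-\d+(?:\.\d+)*")


def check_globals(attrs):
    """Return (errors, warnings) for a dict of global attributes."""
    errors = [f"missing mandatory attribute: {name}"
              for name in MANDATORY if name not in attrs]
    warnings = [f"missing recommended attribute: {name}"
                for name in RECOMMENDED if name not in attrs]
    value = attrs.get("convention")
    if value is not None and not CONVENTION.fullmatch(str(value)):
        errors.append(f"convention should look like MINT-{{versionNumber}}, got {value!r}")
    return errors, warnings
```

Separating errors from warnings mirrors the M/R distinction: a file missing an R attribute is still usable, while a missing M attribute should block ingestion into the catalog.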
Values shall be formatted as specified by ISO 8601:2004.
Attribute | Requirement | Description | Example |
---|---|---|---|
time_coverage_start | C | ||
time_coverage_end | C | ||
time_coverage_duration | R | ||
time_coverage_resolution | C | ||
time_units | C | a string | “units since YYYY-MM-DD hh:mm:ss” |
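The time-coverage values can usually be derived from the time coordinate itself. The following sketch computes start/end as ISO 8601 timestamps and duration/resolution as ISO 8601 durations. Assumptions: timestamps are timezone-aware UTC datetimes, a day is treated as exactly 24 hours, fractions of a second are dropped, and a resolution is reported only when the steps are regular. The helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

ISO_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def iso_duration(delta):
    """Format a non-negative timedelta as an ISO 8601 duration such as "P1DT6H"."""
    total = int(delta.total_seconds())
    if total < 0:
        raise ValueError("duration must be non-negative")
    days, rest = divmod(total, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, seconds = divmod(rest, 60)
    date_part = f"{days}D" if days else ""
    time_part = "".join(
        f"{amount}{unit}"
        for amount, unit in ((hours, "H"), (minutes, "M"), (seconds, "S"))
        if amount
    )
    if not date_part and not time_part:
        return "PT0S"
    return "P" + date_part + ("T" + time_part if time_part else "")


def time_coverage(times):
    """Derive time_coverage_* attribute values from a list of UTC datetimes."""
    times = sorted(times)
    attrs = {
        "time_coverage_start": times[0].strftime(ISO_FORMAT),
        "time_coverage_end": times[-1].strftime(ISO_FORMAT),
        "time_coverage_duration": iso_duration(times[-1] - times[0]),
    }
    steps = {later - earlier for earlier, later in zip(times, times[1:])}
    if len(steps) == 1:  # only regularly spaced series get a resolution
        attrs["time_coverage_resolution"] = iso_duration(steps.pop())
    return attrs


# Usage with hypothetical data: five 6-hourly steps starting 2017-01-01.
start = datetime(2017, 1, 1, tzinfo=timezone.utc)
coverage = time_coverage([start + timedelta(hours=6 * i) for i in range(5)])
```

A single-timestamp file, like the example listing with time: 1, gets a zero duration and no resolution.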
- Coordinate Reference System Format
There are numerous formats used to document a CRS. Three common formats are proj.4, EPSG, and Well-known Text (WKT). Refer to this tutorial for details on conversions among these formats. Following the OGC, we require the CRS of the geospatial bounds to be specified as an EPSG code, a 4- or 5-digit number that identifies a particular CRS definition.
  - List of EPSG codes
  - epsg.io: a useful service for searching EPSG codes
  - e.g., "EPSG:4326", "urn:ogc:def:crs:EPSG::4326"
  - Use the string format "+init=epsg:<your_code>" as the value
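Because EPSG codes circulate in several notations, a small normalizer can turn any of the notations listed above into the "+init=epsg:<your_code>" string this specification asks for. A sketch under the stated 4- to 5-digit assumption; it does not verify that the code actually exists in the EPSG registry, and to_init_string is a hypothetical name.

```python
import re

# Notations accepted by this sketch: "+init=epsg:4326", "EPSG:4326",
# "urn:ogc:def:crs:EPSG::4326" (with or without a version field), or a bare code.
_EPSG_PATTERNS = (
    re.compile(r"(?i)^\+init=epsg:(\d{4,5})$"),
    re.compile(r"(?i)^epsg:(\d{4,5})$"),
    re.compile(r"(?i)^urn:ogc:def:crs:epsg:[^:]*:(\d{4,5})$"),
    re.compile(r"^(\d{4,5})$"),
)


def to_init_string(crs):
    """Normalize an EPSG code in any supported notation to "+init=epsg:<code>"."""
    text = str(crs).strip()
    for pattern in _EPSG_PATTERNS:
        match = pattern.match(text)
        if match:
            return f"+init=epsg:{match.group(1)}"
    raise ValueError(f"not a recognized EPSG code: {crs!r}")
```

Both "EPSG:4326" and "urn:ogc:def:crs:EPSG::4326" normalize to "+init=epsg:4326". Names such as "WGS84" or WKT strings are rejected rather than guessed, since mapping them to a code requires a CRS library.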
- Other attributes (e.g., geospatial_lat_min)
  - Value types (except for geospatial_bounds_crs) should be floating point.
  - Units of measurement should be degrees, with positive latitudes in the Northern Hemisphere and longitude values increasing toward the east.
  - The minimum latitude/longitude must be less than or equal to the maximum latitude/longitude.
Attribute | Requirement | Description | Example |
---|---|---|---|
geospatial_bounds_crs | C | EPSG code | "+init=epsg:4326" |
geospatial_lat_min | O | southernmost latitude covered by the dataset | |
geospatial_lat_max | O | northernmost latitude | |
geospatial_lon_min | O | westernmost longitude | |
geospatial_lon_max | O | easternmost longitude | |
geospatial_bounds | R | bounding box, given as lon_min, lat_min, lon_max, lat_max | |
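The rules for the geospatial attributes (floating-point values, degrees, minimum not greater than maximum) can be checked and the geospatial_bounds string composed from them. A sketch assuming longitudes in [-180, 180] (a 0 to 360 convention would need adjusting) and no antimeridian crossing; check_geospatial and geospatial_bounds are hypothetical helper names. The usage values are taken from the coordinate listing earlier, which shows cell centres, so true cell edges may extend half a cell further.

```python
LIMITS = {"lat": 90.0, "lon": 180.0}


def check_geospatial(attrs):
    """Return a list of problems with the geospatial_{lat,lon}_{min,max} attributes."""
    problems = []
    for axis, limit in LIMITS.items():
        values = {}
        for end in ("min", "max"):
            name = f"geospatial_{axis}_{end}"
            if name not in attrs:
                continue  # these attributes are optional (O)
            value = attrs[name]
            if not isinstance(value, float):
                problems.append(f"{name} should be a floating-point value")
                continue
            if not -limit <= value <= limit:
                problems.append(f"{name} is outside [-{limit}, {limit}] degrees")
            values[end] = value
        if len(values) == 2 and values["min"] > values["max"]:
            problems.append(f"geospatial_{axis}_min exceeds geospatial_{axis}_max")
    return problems


def geospatial_bounds(attrs):
    """Compose the geospatial_bounds value: lon_min, lat_min, lon_max, lat_max."""
    return ", ".join(
        str(attrs[f"geospatial_{name}"])
        for name in ("lon_min", "lat_min", "lon_max", "lat_max")
    )


example = {
    "geospatial_lat_min": -11.75, "geospatial_lat_max": 22.95,
    "geospatial_lon_min": 22.05, "geospatial_lon_max": 51.35,
}
```

With these values, geospatial_bounds(example) gives "22.05, -11.75, 51.35, 22.95".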
- Variable attributes
Attribute | Requirement | Description | Example |
---|---|---|---|
title* | M | a brief description of the dataset | |
standard_name* | R | a name for the variable from a standard list of names listed in the Scientific Variables Ontology | |
long_name* | R | a long descriptive name for the variable | |
units* | M | units of the data contained by the variable; must be a valid udunits string | “m” (meter), “km” (kilometer), “degrees_north” (for latitude), “degrees_east” (for longitude), “K” (temperature in Kelvin), “Pa” (pressure in Pascal) |
valid_min | M | ||
valid_max | M | ||
valid_range | M | ||
missing_value | M | ||
fill_value | M | | |
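The valid_* and missing/fill attributes together tell a reader which stored values to ignore. A minimal sketch of that masking on a plain list, using the attribute names exactly as the table spells them (note that the netCDF libraries themselves use the name _FillValue for the fill attribute). It assumes valid_range, when present, takes precedence over valid_min/valid_max; mask_invalid is a hypothetical name.

```python
import math


def mask_invalid(values, attrs):
    """Return values with invalid entries replaced by None.

    An entry is invalid if it equals fill_value or missing_value, or if it
    lies outside the valid range. Assumption: valid_range, when present,
    takes precedence over valid_min/valid_max.
    """
    if "valid_range" in attrs:
        low, high = attrs["valid_range"]
    else:
        low = attrs.get("valid_min", -math.inf)
        high = attrs.get("valid_max", math.inf)
    sentinels = {attrs[key] for key in ("fill_value", "missing_value") if key in attrs}
    # NaN fails every comparison, so it is treated as invalid as well.
    return [
        None if value in sentinels or not low <= value <= high else value
        for value in values
    ]
```

For array data, the same rule is what libraries like xarray apply when decoding CF attributes; the list version here only illustrates the logic.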
- Related materials
  - Unidata's CF Convention: Official doc, Overview slides
  - CF Metadata Conventions: Official doc
  - List of netCDF Conventions: link
  - CF standard name table (for the standard_name attribute): link
  - Attribute Convention for Data Discovery: link
  - udunits (for the units attribute): Official doc
  - CRS (for the CRS's EPSG code): EPSG.io
  - Example netCDF file: link
  - https://github.com/mintproject/MINT-GeoViz/blob/master/examples/notebooks/01_xarray_intro.ipynb
    - How to use xarray (Python library):
      - Read netCDF files as a Dataset and manipulate data
      - Create a Dataset representing a netCDF file
      - Define labelled dimensions (optionally with coordinate data)
      - Write to a file