There are a large and growing number of publicly-available datasets that are loadable into xarray from buckets in the Cloud. Currently, however, there is no effective way to discover these datasets.
Using standards like OGC Catalog Service for the Web (CSW) and OpenSearch, it would be possible to discover these xarray datasets via sites like data.gov (and data.gov.uk, data.gov.au, etc.), but that requires producing the ISO metadata which these sites consume.
It would also be possible to discover xarray datasets via sites like Google's dataset search, but it would be necessary to produce the json-ld metadata that these sites consume.
Since xarray preserves the content of datasets which follow the CF and ACDD metadata conventions, it should be possible to generate both types of metadata in a straightforward way from the xarray dataset object, using metadata tools that have already been developed for datasets that adhere to the CF conventions. The ncISO tool already generates ISO records from netCDF or OPeNDAP endpoints, so its mapping from CF/ACDD attributes to ISO could be reused for records from xarray. Similarly, work has already been done to create nco-json metadata from netCDF files, a complete metadata representation from which the json-ld content could be extracted.
Proposed Work:
Develop code that integrates the nco-json spec into the xarray package, so that it represents the complete metadata of the xarray object.
Develop code that, from the complete nco-json metadata associated with xarray objects, generates the more restrictive ISO and json-ld metadata formats.
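To make the second item more concrete, here is a minimal sketch of deriving a json-ld record from ACDD-style global attributes. It works on a plain dict, such as the `attrs` dict of an xarray dataset, so it needs nothing beyond the standard library. The function name `acdd_to_jsonld`, the choice of attributes, and the mapping onto schema.org `Dataset` properties are all assumptions made for illustration; they are not part of xarray, ncISO, or nco-json, and a real implementation would probably want to reuse the existing ncISO mapping instead.

```python
import json

# Hypothetical one-to-one mapping from ACDD attribute names to schema.org
# Dataset properties. This table is an assumption, not an agreed standard.
SIMPLE_FIELDS = {
    "title": "name",
    "summary": "description",
    "license": "license",
}

# ACDD bounding-box attributes, ordered south, west, north, east.
BBOX_KEYS = (
    "geospatial_lat_min",
    "geospatial_lon_min",
    "geospatial_lat_max",
    "geospatial_lon_max",
)


def acdd_to_jsonld(attrs):
    """Build a schema.org Dataset record from a dict of ACDD-style attributes."""
    record = {"@context": "https://schema.org/", "@type": "Dataset"}

    # Copy the attributes that map directly onto a single property.
    for acdd_name, schema_name in SIMPLE_FIELDS.items():
        if acdd_name in attrs:
            record[schema_name] = attrs[acdd_name]

    # ACDD keywords are a comma-separated string; emit them as a list.
    if "keywords" in attrs:
        record["keywords"] = [
            k.strip() for k in attrs["keywords"].split(",") if k.strip()
        ]

    # Assumption: the creator is a person; ACDD also allows institutions.
    if "creator_name" in attrs:
        record["creator"] = {"@type": "Person", "name": attrs["creator_name"]}

    # Express the time range as an ISO 8601 interval, "start/end".
    start = attrs.get("time_coverage_start")
    end = attrs.get("time_coverage_end")
    if start and end:
        record["temporalCoverage"] = f"{start}/{end}"

    # Emit a bounding box only when all four corners are present.
    if all(k in attrs for k in BBOX_KEYS):
        south, west, north, east = (attrs[k] for k in BBOX_KEYS)
        record["spatialCoverage"] = {
            "@type": "Place",
            "geo": {"@type": "GeoShape", "box": f"{south} {west} {north} {east}"},
        }
    return record


# Made-up attributes standing in for the attrs dict of a real dataset.
example_attrs = {
    "title": "Example ocean model output",
    "summary": "Illustrative record only.",
    "keywords": "ocean, temperature, model",
    "time_coverage_start": "2010-01-01",
    "time_coverage_end": "2010-12-31",
}
print(json.dumps(acdd_to_jsonld(example_attrs), indent=2))
```

Attributes that are missing are simply skipped, so a dataset with sparse metadata still yields a valid, if minimal, record. An ISO generator could follow the same pattern with a different target mapping.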
The current problem with hosting xarray data in the cloud is that hdf does not play well with cloud storage. This is a technical obstacle that is being discussed in many places across xarray, zarr, netCDF, etc. That's why I'm curious about your claim that there are already a large number of publicly available cloud datasets that play well with xarray.
All that said, I am supportive of this idea in general.
We were actually thinking about the Pangeo datasets. The term "large" is subjective, of course, but the collection is large enough to warrant a catalog, as in pangeo-data/pangeo#39. We experimented with something along these lines a few weeks ago at the Pangeo workshop: https://gist.github.com/rsignell-usgs/88cfae22896bf9fed5bd36a6689e7210. The goal would be to facilitate discovery of these datasets through their attributes/metadata.