Leverage pyessv #1

huard · 2020-01-22T19:55:42Z

I'm guessing catalog building and vocabulary validation could leverage https://github.com/ES-DOC/pyessv. See in particular parsers such as https://github.com/ES-DOC/pyessv/blob/master/pyessv/parsers/cmip6_dataset_id.py

I have no experience with it myself, so maybe not a good fit.

sherimickelson · 2020-04-20T23:41:00Z

@huard it looks like intake-esm-datastore has similar parsing methods. The main difference is that pyessv taps into the controlled vocabulary to check for valid values.

@andersy005, here's a working example that uses pyessv based on an example notebook they provide:

import pyessv

# Set template.
template = '/glade/collections/cmip/CMIP6/{}/{}/{}/{}/{}/{}/{}/{}/tas_Amon_CESM2_historical_r1i1p1f1_gn_185001-201412.nc'

# Set separator.
separator = '/'

# Set collections.
collections = (
    'wcrp:cmip6:activity-id',
    'wcrp:cmip6:institution-id',
    'wcrp:cmip6:source-id',
    'wcrp:cmip6:experiment-id',
    'wcrp:cmip6:member-id',
    'wcrp:cmip6:table-id',
    'wcrp:cmip6:variable-id',
    'wcrp:cmip6:grid-label'
    )

# Set parsing strictness = 1 (raw-name).
strictness = pyessv.PARSING_STRICTNESS_1

# Create parser (arguments are passed positionally).
parser = pyessv.create_template_parser(template, collections, strictness, separator)

# Parsing
print(parser.parse('/glade/collections/cmip/CMIP6/CMIP/NCAR/CESM2/historical/r1i1p1f1/Amon/tas/gn/tas_Amon_CESM2_historical_r1i1p1f1_gn_185001-201412.nc'))

The code returns:

{wcrp:cmip6:activity-id:cmip,
 wcrp:cmip6:experiment-id:historical,
 wcrp:cmip6:grid-label:gn,
 wcrp:cmip6:institution-id:ncar,
 wcrp:cmip6:source-id:cesm2,
 wcrp:cmip6:table-id:amon}

If you change Amon in the path to Gmon, it will let you know that this isn't a valid value. It will also require extra code to get the member_id and version in the directory structure working, but I figure this would give us enough of a base for discussion.
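To make the validation idea concrete for discussion, here is a minimal sketch that does not use pyessv at all. It assumes the directory layout from the example path above, uses tiny hand-written vocabularies (hypothetical and heavily truncated, not the real CMIP6 controlled vocabulary), and matches member_id with a regular expression because its values are open-ended. The names `VOCAB`, `PATTERNS`, `FACETS`, and `parse_path` are made up for this illustration and are not part of pyessv or intake-esm-datastore.

```python
import re

# Hypothetical, heavily truncated vocabularies for illustration only.
VOCAB = {
    "activity_id": {"CMIP"},
    "institution_id": {"NCAR"},
    "source_id": {"CESM2"},
    "experiment_id": {"historical"},
    "table_id": {"Amon", "Omon"},
    "variable_id": {"tas"},
    "grid_label": {"gn"},
}

# Open-ended facets are checked against a pattern instead of a fixed set.
PATTERNS = {
    "member_id": re.compile(r"r\d+i\d+p\d+f\d+"),
}

# Directory levels expected below the root, in order (assumed layout).
FACETS = (
    "activity_id", "institution_id", "source_id", "experiment_id",
    "member_id", "table_id", "variable_id", "grid_label",
)


def parse_path(path, root="/glade/collections/cmip/CMIP6"):
    """Return a facet dict for `path`; raise ValueError on an invalid value."""
    if not path.startswith(root + "/"):
        raise ValueError(f"path is not under {root}")
    parts = path[len(root) + 1:].split("/")
    if len(parts) != len(FACETS) + 1:  # facet directories plus the file name
        raise ValueError(
            f"expected {len(FACETS)} directory levels, got {len(parts) - 1}"
        )
    facets = dict(zip(FACETS, parts))  # zip drops the trailing file name
    for name, value in facets.items():
        if name in VOCAB:
            valid = value in VOCAB[name]
        else:
            valid = PATTERNS[name].fullmatch(value) is not None
        if not valid:
            raise ValueError(f"invalid {name}: {value!r}")
    return facets


path = (
    "/glade/collections/cmip/CMIP6/CMIP/NCAR/CESM2/historical/r1i1p1f1/"
    "Amon/tas/gn/tas_Amon_CESM2_historical_r1i1p1f1_gn_185001-201412.nc"
)
print(parse_path(path))

try:
    # Swap only the table_id directory, not the file name.
    parse_path(path.replace("/Amon/", "/Gmon/"))
except ValueError as err:
    print(err)  # invalid table_id: 'Gmon'
```

A version directory, if present, could probably be handled the same way with one more entry in `PATTERNS` and `FACETS`. The point of using pyessv instead of something like this is that the vocabularies come from the maintained archive rather than from hand-written sets.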

@andersy005 do you see any extra benefits for using this library to help with the parsing?

andersy005 · 2020-04-21T00:35:52Z

> If you change Amon in the path to Gmon, it will let you know that this isn't a valid value. It will also require extra code to get the member_id and version in the directory structure working, but I figure this would give us enough of a base for discussion.

👍 I like the fact that you can enforce the validity of the parsed attributes.

> @andersy005 do you see any extra benefits for using this library to help with the parsing?

Our current parsers are fragile because they don't verify that the parsed attributes are valid. I think taking advantage of pyessv's features would make it easy to guarantee that the built catalogs only contain attributes that comply with the controlled vocabularies.

andersy005 · 2020-06-01T23:18:17Z

I am transferring this issue to a new repo: https://github.com/NCAR/ecg

sherimickelson · 2020-06-09T22:53:56Z

@andersy005 Since you've started work on getting the information out of the CMIP6 files themselves instead of from file paths, I'm wondering if this is still needed. File paths change, and that is where this library helps: it verifies the file path assumptions that were built in. File attributes take more effort to change and are more likely to be correct. Before a file can be added to ESGF, all of these attributes are verified against the same controlled vocabulary that pyessv uses. If a file attribute is inconsistent with the controlled vocabulary, or if one of these attributes is missing, the file cannot be published to ESGF. I feel confident proceeding with harvesting the attributes from the files themselves and not adding the extra check. What do you think?

sherimickelson self-assigned this Apr 27, 2020

andersy005 transferred this issue from NCAR/intake-esm-datastore Jun 1, 2020
