Support for YAML as input format for metadata control file #54

Closed
alexandreleroux opened this Issue Sep 15, 2016 · 7 comments

Contributor

alexandreleroux commented Sep 15, 2016

Long term feature request, no rush to implement this on my end.

It would be useful if pygeometa supported YAML as an input format for metadata, in addition to the MCF format.

tomkralidis Oct 28, 2016

Member

@alexandreleroux questions/comments:

  • what's the value proposition around moving to YAML?
  • we can certainly do more with YAML as a config format (nesting, better includes, cardinality, etc.)
  • should we support both, or just make a hard move to YAML (with a migration script for MCF to YAML)? If we want to support both, then the YAML possibilities are bound by the limitations of the MCF (i.e. the ini file) format
  • here's a sample YAML representation of https://github.com/geopython/pygeometa/blob/master/sample.mcf
- metadata
    identifier: 3f342f64-9348-11df-ba6a-0014c2c00eab
    language: en
    language_alternate: fr
    charset: utf8
    parentidentifier: someparentid
    hierarchylevel: dataset
    datestamp: 2014-11-11
    dataseturi: http://some/minted/uri

- spatial
    datatype: vector
    geomtype: point
    crs: 4326
    bbox: -141,42,-52,84

- identification
    language: eng; CAN
    charset: utf8
    title_en: title in English
    title_fr: title in French
    abstract_en: abstract in English
    abstract_fr: abstract in French
    keywords_en: kw1 in English,kw2 in English,kw3 in English
    keywords_fr: kw1 in French,kw2 in French,kw3 in French
    keywords_wmo: FOO,BAR
    keywords_type: theme
    keywords_gc_cst_en: kw1,kw2
    keywords_gc_cst_fr: kw1,kw2
    topiccategory: climatologyMeteorologyAtmosphere
    publication_date: 2000-09-01T00:00:00Z
    fees: None
    accessconstraints: otherRestrictions
    rights_en: Copyright (c) 2010 Her Majesty the Queen in Right of Canada
    rights_fr: Copyright (c) 2010 Her Majesty the Queen in Right of Canada
    url: http://geogratis.ca/geogratis/en/product/search.do?id=08DB5E85-7405-FE3A-2860-CC3663245625
    temporal_begin: 1950-07-31
    temporal_end: now
    status: onGoing
    maintenancefrequency: continual

- contact:main
    organization: Environment Canada
    url: http://www.ec.gc.ca/
    individualname: Tom Kralidis
    positionname: Senior Systems Scientist
    phone: +01-123-456-7890
    fax: +01-123-456-7890
    address: 4905 Dufferin Street
    city: Toronto
    administrativearea: Ontario
    postalcode: M3H 5T4
    country: Canada
    email: foo@bar.tld
    hoursofservice: 0700h - 1500h EST
    contactinstructions: email

- contact:distribution
    #ref: contact:main
    organization: Environment Canada
    url: http://www.ec.gc.ca/
    individualname: Tom Kralidis
    positionname: Senior Systems Scientist
    phone: +01-123-456-7890
    fax: +01-123-456-7890
    address: 4905 Dufferin Street
    city: Toronto
    administrativearea: Ontario
    postalcode: M3H 5T4
    country: Canada
    email: foo@bar.tld
    hoursofservice: 0700h - 1500h EST
    contactinstructions: email

- distribution:waf
    url: http://dd.meteo.gc.ca
    type: WWW:LINK
    name: my waf
    description_en: description in English
    description_fr: description in French
    function: download

- distribution:wms
    url: http://dd.meteo.gc.ca
    type: OGC:WMS
    name_en: roads
    name_fr: routes
    description_en: description in English
    description_fr: description in French
    function: download

alexandreleroux Nov 15, 2016

Contributor

My non-expert understanding:

  • yaml's indentation makes it much easier to read than the mcf / ini format
    • this is significant given that's what data stewards in an organisation leveraging pygeometa will use to view and edit metadata records
  • like you pointed out already, yaml and its libraries support nesting / includes / cardinality
  • I'm fine with supporting only yaml in the future. A tool to migrate from mcf to yaml would indeed be nice for existing pygeometa users, but not necessary?

Have I forgotten anything, or am I wrong about anything? Thx

alexandreleroux Feb 9, 2017

Contributor

@tomkralidis in your YAML-MCF example, is there a reason both Contact and Distribution are split into distinct yaml elements? For example, why not this for distribution:

- distribution:
    - waf
        url: http://dd.meteo.gc.ca
        type: WWW:LINK
        name: my waf
        description_en: description in English
        description_fr: description in French
        function: download
    - wms
        url: http://dd.meteo.gc.ca
        type: OGC:WMS
        name_en: roads
        name_fr: routes
        description_en: description in English
        description_fr: description in French
        function: download

Similar comment for keywords under identification: could they be grouped together?

- identification
    language: eng; CAN
    charset: utf8
    title_en: title in English
    title_fr: title in French
    abstract_en: abstract in English
    abstract_fr: abstract in French
    keywords:
        keywords_en: kw1 in English,kw2 in English,kw3 in English
        keywords_fr: kw1 in French,kw2 in French,kw3 in French
        keywords_wmo: FOO,BAR
        keywords_type: theme
        keywords_gc_cst_en: kw1,kw2
        keywords_gc_cst_fr: kw1,kw2

Do such groupings make sense? Thx! -- Alex

tomkralidis Feb 9, 2017

Member

Makes sense, and we can apply it to different sections as well. This will be nice as it will allow clean grouping that wasn't possible before given the ini format.

Updated example:

metadata:
    identifier: 3f342f64-9348-11df-ba6a-0014c2c00eab
    language: en
    language_alternate: fr
    charset: utf8
    parentidentifier: someparentid
    hierarchylevel: dataset
    datestamp: 2014-11-11
    dataseturi: http://some/minted/uri

spatial:
    datatype: vector
    geomtype: point
    crs: 4326
    bbox: -141,42,-52,84

identification:
    language: eng; CAN
    charset: utf8
    title_en: title in English
    title_fr: title in French
    abstract_en: abstract in English
    abstract_fr: abstract in French
    keywords:
        default:
            keywords_en: [kw1 in English,kw2 in English,kw3 in English]
            keywords_fr: [kw1 in French,kw2 in French,kw3 in French]
        wmo:
            keywords_en: [FOO,BAR]
            keywords_type: theme
        gc_cst:
            keywords_en: [kw1,kw2]
            keywords_fr: [kw1,kw2]
    topiccategory: climatologyMeteorologyAtmosphere
    publication_date: 2000-09-01T00:00:00Z
    fees: None
    accessconstraints: otherRestrictions
    rights_en: Copyright (c) 2010 Her Majesty the Queen in Right of Canada
    rights_fr: Copyright (c) 2010 Her Majesty the Queen in Right of Canada
    url: http://geogratis.ca/geogratis/en/product/search.do?id=08DB5E85-7405-FE3A-2860-CC3663245625
    temporal_begin: 1950-07-31
    temporal_end: now
    status: onGoing
    maintenancefrequency: continual

contact:
    main: &contact_main
        organization: Environment Canada
        url: http://www.ec.gc.ca/
        individualname: Tom Kralidis
        positionname: Senior Systems Scientist
        phone: +01-123-456-7890
        fax: +01-123-456-7890
        address: 4905 Dufferin Street
        city: Toronto
        administrativearea: Ontario
        postalcode: M3H 5T4
        country: Canada
        email: foo@bar.tld
        hoursofservice: 0700h - 1500h EST
        contactinstructions: email

    distribution: *contact_main

distribution:
    waf:
        url: http://dd.meteo.gc.ca
        type: WWW:LINK
        name: my waf
        description_en: description in English
        description_fr: description in French
        function: download

    wms:
        url: http://dd.meteo.gc.ca
        type: OGC:WMS
        name_en: roads
        name_fr: routes
        description_en: description in English
        description_fr: description in French
        function: download
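
To illustrate what the grouped keywords buy us on the consuming side, here is a minimal Python sketch. It assumes a YAML loader has already turned the `identification.keywords` block above into plain dicts and lists; the dict literal is written by hand for illustration, and `keywords_by_language` is a hypothetical helper, not part of pygeometa.

```python
# Hand-written stand-in for what a YAML loader would return for the
# `identification.keywords` block of the updated example (assumption).
keywords = {
    "default": {
        "keywords_en": ["kw1 in English", "kw2 in English", "kw3 in English"],
        "keywords_fr": ["kw1 in French", "kw2 in French", "kw3 in French"],
    },
    "wmo": {
        "keywords_en": ["FOO", "BAR"],
        "keywords_type": "theme",
    },
    "gc_cst": {
        "keywords_en": ["kw1", "kw2"],
        "keywords_fr": ["kw1", "kw2"],
    },
}


def keywords_by_language(groups, language):
    """Return {group_name: [keywords]} for one language.

    Groups that do not define keywords for that language are skipped.
    """
    key = f"keywords_{language}"
    return {name: group[key] for name, group in groups.items() if key in group}


french = keywords_by_language(keywords, "fr")
print(sorted(french))  # ['default', 'gc_cst']
```

With the flat ini layout, the same lookup would mean parsing key names such as `keywords_gc_cst_fr` to recover the group and the language.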

tomkralidis added a commit to tomkralidis/pygeometa that referenced this issue Feb 10, 2017

tomkralidis Feb 10, 2017

Member

@alexandreleroux for review and comment: Initial changes are in https://github.com/geopython/pygeometa/tree/issue-54. Notes:

  • see the updated sample.mcf for updates to the config representation now in YAML
    • contacts are now dicts
    • distributions are now dicts
    • we can now use YAML's native reference/dereferencing features (&ref, *ref) instead of a ref option to reuse other parts of the YAML config
  • HNAP-based keywords are now just sections of the main keywords group (keys called hnap_category_information, hnap_category_geography, hnap_category_content, etc.)
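
As a rough illustration of "contacts are now dicts" and "distributions are now dicts", the sketch below folds ini sections named `group:name` into nested dicts using only the Python standard library. It is not the code in the issue-54 branch: `nest_sections` is a hypothetical helper, the trimmed `OLD_MCF` text is made up from the sample values above, and it assumes old MCF files use `key=value` lines.

```python
import configparser
import json

# Trimmed, hand-written old-style MCF (assumption: key=value ini syntax).
OLD_MCF = """
[contact:main]
organization=Environment Canada
email=foo@bar.tld

[distribution:waf]
url=http://dd.meteo.gc.ca
type=WWW:LINK

[spatial]
datatype=vector
"""


def nest_sections(mcf_text):
    """Turn `[group:name]` ini sections into {group: {name: {...}}}."""
    # Only '=' is a delimiter so that values such as WWW:LINK stay intact;
    # interpolation is disabled so '%' in values is taken literally.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.read_string(mcf_text)

    nested = {}
    for section in parser.sections():
        values = dict(parser[section])
        group, sep, name = section.partition(":")
        if sep:
            # e.g. "contact:main" becomes nested["contact"]["main"]
            nested.setdefault(group, {})[name] = values
        else:
            # plain sections such as "spatial" stay one level deep
            nested[group] = values
    return nested


print(json.dumps(nest_sections(OLD_MCF), indent=2))
```

The result is printed as JSON only to keep the sketch free of third-party dependencies; a YAML library would be used to write the actual file.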

tomkralidis Feb 16, 2017

Member

@alexandreleroux more updates in branch issue-54:

  • MCF now has a section called mcf, which initially has a version field to denote the version of the MCF
  • the cli has been refactored to support a single pygeometa script with subcommands. Examples:
$ pygeometa
Usage: pygeometa [OPTIONS] COMMAND [ARGS]...

Options:
  --help  Show this message and exit.

Commands:
  generate_metadata
  migrate
$ pygeometa generate_metadata --help
Usage: pygeometa generate_metadata [OPTIONS]

Options:
  --mcf PATH                      Path to metadata control file (.yml)
  --output FILENAME               Name of output file
  --schema [wmo-cmp|iso19139|iso19139-hnap]
                                  Metadata schema
  --schema_local DIRECTORY        Locally defined metadata schema
  --help                          Show this message and exit.
$ pygeometa generate_metadata --mcf=path/to/file.yml --schema=iso19139  # to stdout
# migrate an old MCF file to YAML
$ pygeometa migrate --mcf=path/to/file.mcf
  • added migration script to convert old MCF files to YAML
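
For readers unfamiliar with the single-script-plus-subcommands layout, here is a small sketch of the same command structure built with the standard library's `argparse`. This is a stand-in for illustration only and makes no claim about how pygeometa implements its CLI; the option names and schema choices are copied from the help output above, and `build_parser` is a hypothetical name.

```python
import argparse

# Schema choices as listed in the `generate_metadata --help` output above.
SCHEMAS = ["wmo-cmp", "iso19139", "iso19139-hnap"]


def build_parser():
    """Build a `pygeometa`-style parser with two subcommands."""
    parser = argparse.ArgumentParser(prog="pygeometa")
    subcommands = parser.add_subparsers(dest="command", required=True)

    generate = subcommands.add_parser("generate_metadata")
    generate.add_argument("--mcf", help="Path to metadata control file (.yml)")
    generate.add_argument("--output", help="Name of output file")
    generate.add_argument("--schema", choices=SCHEMAS, help="Metadata schema")
    generate.add_argument("--schema_local", help="Locally defined metadata schema")

    migrate = subcommands.add_parser("migrate")
    migrate.add_argument("--mcf", help="Path to old-style MCF file")
    return parser


# Parse the same arguments as the generate_metadata example above.
args = build_parser().parse_args(
    ["generate_metadata", "--mcf=path/to/file.yml", "--schema=iso19139"]
)
print(args.command, args.mcf, args.schema)
```

Each subcommand gets its own options, so `migrate` can accept `--mcf` without exposing the schema flags.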

tomkralidis added a commit that referenced this issue Feb 16, 2017

@tomkralidis tomkralidis added this to the 1.0.0 milestone Mar 23, 2017

@tomkralidis tomkralidis self-assigned this Mar 23, 2017

tomkralidis added a commit that referenced this issue Mar 23, 2017

Merge pull request #67 from geopython/issue-54
Support for YAML as input format for metadata control file (#54)

tomkralidis Mar 23, 2017

Member

Implemented in master.
