utilities for coverage metadata #9
@mbjones What is the best way to connect information in a coverage node to that in an attributeList? For example, if I have a column header that is a species, or a column consisting of different species names, it seems logical that I would want to tie this information to the taxonomic coverage section of the node. I gather I can do this with …

Also, does each EML file have a single coverage node? Does it always appear as a child of a dataset (or equivalent module), or can it appear at deeper levels (e.g. at the attributeList level)? I imagine the same would apply to cases where time or geographic coordinates appear in attribute elements as well.

So I'm also trying to imagine the best user interface for this, e.g. declaring that a column contains species names automatically fills out the taxonomic coverage.
@cboettig That's correct: you give an id to the attribute you want to describe, then put a coverage element in the additionalMetadata section, and use the …

You can repeat the items under the coverage tree if you have discontinuous coverages, such as a temporal coverage from 1990-1993 and 1998-2000, or disjoint spatial areas. Many groups use this to provide bounding boxes for discrete sampling areas that are not contiguous.
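The pattern described above can be sketched in Python with the standard-library `xml.etree.ElementTree`. This is a minimal illustration under stated assumptions, not the package's implementation: the helper names `add_date_range` and `coverage_for_attribute` are hypothetical, EML namespaces and the surrounding document are omitted, and the years are written as bare `YYYY` calendar dates. It builds an `additionalMetadata` block whose `describes` element carries an attribute id and whose coverage repeats `temporalCoverage` for two discontinuous periods.

```python
import xml.etree.ElementTree as ET


def add_date_range(coverage, begin, end):
    """Append one temporalCoverage/rangeOfDates element to a coverage node."""
    temporal = ET.SubElement(coverage, "temporalCoverage")
    date_range = ET.SubElement(temporal, "rangeOfDates")
    ET.SubElement(ET.SubElement(date_range, "beginDate"), "calendarDate").text = begin
    ET.SubElement(ET.SubElement(date_range, "endDate"), "calendarDate").text = end
    return temporal


def coverage_for_attribute(attribute_id, date_ranges):
    """Build additionalMetadata whose coverage describes the attribute with this id."""
    extra = ET.Element("additionalMetadata")
    # describes holds the id assigned to the <attribute> element
    ET.SubElement(extra, "describes").text = attribute_id
    coverage = ET.SubElement(ET.SubElement(extra, "metadata"), "coverage")
    # Discontinuous coverage: repeat temporalCoverage once per period
    for begin, end in date_ranges:
        add_date_range(coverage, begin, end)
    return extra


block = coverage_for_attribute(
    "1838f0b53178056585f5bda86818ca30",
    [("1990", "1993"), ("1998", "2000")],
)
print(ET.tostring(block, encoding="unicode"))
```

Disjoint spatial areas would follow the same shape, with one `geographicCoverage` per sampling area in place of the repeated `temporalCoverage`.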
@mbjones Thanks, that implementation makes sense. Just to make sure I understand: after assigning an id to the attribute, the coverage would go in

```xml
<additionalMetadata>
  <describes>1838f0b53178056585f5bda86818ca30</describes>
  <metadata>
    <coverage>
      ...
    </coverage>
  </metadata>
</additionalMetadata>
```

How about if the column values were species names (instead of the column header)? Would I denote them as a nominal/enumerated domain and put the id attribute on the `definition` element? For example:

```xml
<attribute id="1838f0b53178056585f5bda86818ca30">
  <attributeName>Species</attributeName>
  <attributeDefinition>Species name</attributeDefinition>
  <measurementScale>
    <nominal>
      <nonNumericDomain>
        <enumeratedDomain>
          <codeDefinition>
            <code>coho</code>
            <definition>Oncorhynchus kisutch</definition>
          </codeDefinition>
        </enumeratedDomain>
      </nonNumericDomain>
    </nominal>
  </measurementScale>
</attribute>
```

and so on for all species in the attribute list? Or would I do better to document at the level of the attribute and list all species? (The latter seems more concise, but it also seems to leave the mapping between the species names used in the column and the semantically precise species names in the coverage ambiguous.)

It's a good point that this might all be rather academic if existing software ignores it, but it would be good to keep in mind at least. Is there a list of software that consumes EML? I've only come across Metacat and Morpho. Harvard Forest tells me they generate their EML using a combination of the generic XML editor Oxygen (for stuff above the …). We could generate the …

In light of the 'standard usage', perhaps it would be better to handle this kind of thing using the semantic annotation approaches discussed in #5 (also see issue #13)?

I've always been a bit confused by the division. For some terms EML appears to give us semantically precise vocabulary, e.g. the taxonomic, geospatial, and temporal terms in "coverage", as well as the units. On the other hand, perhaps the EML notion of "gram" is less powerful than the OBOE notion http://ecoinformatics.org/oboe/oboe.1.0/oboe-standards.owl#Gram, because the latter is part of an OWL ontology that enables further reasoning? Perhaps we should wait and write a separate package that provides such semantic annotation automatically when you declare that column "Species" contains species names, etc.
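If the enumerated-domain option is used, one way to keep the mapping from column values to scientific names explicit is to generate the `codeDefinition` entries from a lookup table and reject any column value that has no definition. This is a minimal sketch, assuming the lookup is a Python dict. The helper `enumerated_domain` and the example column are hypothetical, and namespaces are omitted.

```python
import xml.etree.ElementTree as ET


def enumerated_domain(column_values, code_to_taxon):
    """Build an EML enumeratedDomain with one codeDefinition per distinct code.

    Raises ValueError if the column uses a code with no definition, so the
    mapping from column values to scientific names is never left implicit.
    """
    missing = sorted(set(column_values) - set(code_to_taxon))
    if missing:
        raise ValueError(f"no definition for codes: {missing}")
    domain = ET.Element("enumeratedDomain")
    for code in sorted(set(column_values)):
        entry = ET.SubElement(domain, "codeDefinition")
        ET.SubElement(entry, "code").text = code
        ET.SubElement(entry, "definition").text = code_to_taxon[code]
    return domain


# Hypothetical column values and lookup table
species_column = ["coho", "chinook", "coho"]
lookup = {"coho": "Oncorhynchus kisutch", "chinook": "Oncorhynchus tshawytscha"}
domain = enumerated_domain(species_column, lookup)
print(ET.tostring(domain, encoding="unicode"))
```

The same lookup table could also feed the attribute-level taxonomic coverage, so both representations are generated from a single source.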
Coverage classes are now defined.
Still need to write tests for the coverage constructors. Also working on extractors that turn S4 objects into nicer R objects; still need to write extractors for each measurementScale type, plus unit tests. Some minor tweaks added; we should be passing all tests at the moment.
EML coverage nodes specify taxonomic, geographic, and temporal coverage.
They can be attached to a dataset node, but can also be used to define the coverage of individual columns (e.g. a species column) or of individual cells in a column (e.g. a species name). The latter is much richer but less commonly implemented.
taxonomic coverage
See the EML taxa documentation.
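A minimal sketch of building `taxonomicCoverage` from a column of binomial names, recording only genus and species as nested `taxonomicClassification` elements and leaving higher ranks out. Assumptions: names are already cleaned (name resolution is not shown), each name has exactly the form "Genus species", and the species rank value holds the full binomial, which is only one possible convention. `taxonomic_coverage` is a hypothetical helper, and namespaces are omitted.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def taxonomic_coverage(binomials):
    """Build taxonomicCoverage with one Genus node per genus and Species nodes beneath it."""
    by_genus = defaultdict(list)
    for name in sorted(set(binomials)):
        parts = name.split()
        if len(parts) != 2:
            raise ValueError(f"expected 'Genus species', got {name!r}")
        by_genus[parts[0]].append(name)

    coverage = ET.Element("taxonomicCoverage")
    for genus, names in sorted(by_genus.items()):
        genus_node = ET.SubElement(coverage, "taxonomicClassification")
        ET.SubElement(genus_node, "taxonRankName").text = "Genus"
        ET.SubElement(genus_node, "taxonRankValue").text = genus
        for name in names:
            # Higher ranks are deliberately omitted; only genus and species are recorded
            species_node = ET.SubElement(genus_node, "taxonomicClassification")
            ET.SubElement(species_node, "taxonRankName").text = "Species"
            ET.SubElement(species_node, "taxonRankValue").text = name
    return coverage


cov = taxonomic_coverage(
    ["Oncorhynchus kisutch", "Oncorhynchus tshawytscha", "Oncorhynchus kisutch"]
)
print(ET.tostring(cov, encoding="unicode"))
```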
@schamberlain I think taxonomic coverage would ideally make use of `taxize` to help identify and correct species names. While higher taxonomic information can be specified, it would probably be best reserved for cases that do not refer to a particular species, since (a) we can already programmatically recover the rest of the classification given the genus and species, and (b) higher taxonomy may be inconsistent anyway.
temporal coverage
See the EML temporal coverage documentation.
We'll want to decide automatically whether the coverage is a specific range of calendar dates or an estimated timescale (e.g. a geological timescale), how to express approximate uncertainty, and whether to include citations to literature describing the dating method (e.g. carbon dating). This could be a whole wizard/module.
Meanwhile, just supporting manual definition of this structure would be a good start.
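As a small first step toward deciding the temporal structure automatically, a column of dates could be summarised as either a single date or a range of dates. This is a minimal sketch, assuming the dates are Python `datetime.date` values. Gaps within the range, geological timescales, uncertainty, and dating-method citations are not handled. `temporal_coverage` is a hypothetical helper, and namespaces are omitted.

```python
import datetime as dt
import xml.etree.ElementTree as ET


def temporal_coverage(dates):
    """Summarise a list of dates as EML temporalCoverage.

    Uses singleDateTime when every observation falls on one date, and
    rangeOfDates (earliest to latest) otherwise. Gaps are not detected.
    """
    if not dates:
        raise ValueError("need at least one date")
    first, last = min(dates), max(dates)
    temporal = ET.Element("temporalCoverage")
    if first == last:
        single = ET.SubElement(temporal, "singleDateTime")
        ET.SubElement(single, "calendarDate").text = first.isoformat()
    else:
        date_range = ET.SubElement(temporal, "rangeOfDates")
        ET.SubElement(ET.SubElement(date_range, "beginDate"), "calendarDate").text = first.isoformat()
        ET.SubElement(ET.SubElement(date_range, "endDate"), "calendarDate").text = last.isoformat()
    return temporal


# Hypothetical observation dates
column = [dt.date(1993, 5, 2), dt.date(1990, 1, 15), dt.date(1991, 7, 30)]
print(ET.tostring(temporal_coverage(column), encoding="unicode"))
```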
geographic coverage
This can be a bounding box, a polygon, or a geographicDescription (e.g. "Oregon"). It is tempting to process natural-language descriptions into coordinates, but that replaces true data with estimated data; such processing is best left to the read-EML side rather than write-EML.
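For point data, a bounding box can be computed directly from the coordinate columns. This keeps the coverage tied to the observed data rather than to an estimate derived from a place name. This is a minimal sketch, assuming decimal degrees and points that do not straddle the antimeridian. The user-supplied description is written as-is and is not geocoded. `geographic_coverage` and the example coordinates are hypothetical, and namespaces are omitted.

```python
import xml.etree.ElementTree as ET


def geographic_coverage(description, lons, lats):
    """Build geographicCoverage whose bounding box encloses all sampled points."""
    if not lons or len(lons) != len(lats):
        raise ValueError("need matching, non-empty longitude and latitude lists")
    geo = ET.Element("geographicCoverage")
    # Description is passed through unchanged; no geocoding is attempted
    ET.SubElement(geo, "geographicDescription").text = description
    box = ET.SubElement(geo, "boundingCoordinates")
    # Children are written in west, east, north, south order
    bounds = {
        "westBoundingCoordinate": min(lons),
        "eastBoundingCoordinate": max(lons),
        "northBoundingCoordinate": max(lats),
        "southBoundingCoordinate": min(lats),
    }
    for tag, value in bounds.items():
        ET.SubElement(box, tag).text = str(value)
    return geo


# Hypothetical sampling sites
geo = geographic_coverage(
    "Hypothetical stream sites in Oregon",
    lons=[-123.5, -122.1, -124.0],
    lats=[44.2, 45.0, 43.8],
)
print(ET.tostring(geo, encoding="unicode"))
```

For disjoint sampling areas, this function could be called once per area, with each resulting `geographicCoverage` placed in the same coverage node.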