Gap Analysis GoMexSI file format with EOL's DarwinCore export format #16

Closed
jhpoelen opened this issue Sep 4, 2013 · 7 comments

@jhpoelen
Member

jhpoelen commented Sep 4, 2013

As discussed with @jsimons9 today:

GoMexSI might be able to leverage the DarwinCore associations export format that is proposed by EOL. The advantage would be that we'd only have to support a single export format (less work). To see whether GoMexSI can re-use the DarwinCore export format proposed by EOL, we need to make a list of missing data elements / concepts that would need to be added.

@ghost ghost assigned jsimons9 Sep 4, 2013
@jhpoelen
Member Author

jhpoelen commented Sep 4, 2013

@jsimons9 let me know if you need more info

@jhammock please let us know if there are updates on the DarwinCore format that you shared with us some time ago.

@jsimons9
Collaborator

Are the DarwinCore formats you speak of at this website? http://rs.tdwg.org/dwc/terms/index.htm

If so, I see some major difficulties. There are certain terms that we have in common, but in some cases they are used differently. I don't know how to tackle this. I think conforming completely to Darwin Core may not be possible at this time. Looking at what they have done, it appears to be a daunting task that would require some experts and a lot of time. Darwin Core seems to be designed specifically for taxonomic collections. I will continue to study it, but I would like a Darwin Core expert to look at our data categories and comment or make suggestions before we go any further. They may see it in a different light than I do.

I was once part of an effort to transform the CF database to an SQL database. That took an SQL expert and a number of CF biologists in a room together, going through item by item to work out how to convert or group each one, and it took a number of sessions to complete.

@jhammock
Collaborator

Hi, Jim! No, the extension we are roughing up at EOL is not yet documented in the DarwinCore docs as far as I know. I think we're maybe considering publishing it eventually, but not before you guys have gotten your hands on it and had your say. We don't even really have documentation for it yet, but I do have a template file. (some new metadata fields since last you saw it, Jorrit). These are the elements we have so far:

AssociationID: http://eol.org/schema/associationID
Occurrence ID (required): http://rs.tdwg.org/dwc/terms/occurrenceID
Association Type (required): http://eol.org/schema/associationType
Target Occurrence ID (required): http://eol.org/schema/targetOccurrenceID
Determined Date: http://rs.tdwg.org/dwc/terms/measurementDeterminedDate
Determined By: http://rs.tdwg.org/dwc/terms/measurementDeterminedBy
Measurement Method: http://rs.tdwg.org/dwc/terms/measurementMethod
Remarks: http://rs.tdwg.org/dwc/terms/measurementRemarks
Source: http://purl.org/dc/terms/source
Citation: http://purl.org/dc/terms/bibliographicCitation
Contributor: http://purl.org/dc/terms/contributor
ReferenceID: http://eol.org/schema/reference/referenceID

We have draft definitions and so forth, but still no documentation, and nothing is published. (And the eol.org/schema URIs don't resolve, sorry!) Still feeling our way.
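
For readers who want to experiment with the draft, here is a minimal Python sketch of checking a single association record against the three fields marked "(required)" in the list above. The function name `missing_required`, the record layout (a plain dict keyed by the last segment of each term URI), and the sample values are assumptions made for illustration; none of them come from EOL's template.

```python
# Field names are the last path segment of the term URIs listed above.
# Only the three fields marked "(required)" in that list are enforced.
REQUIRED_ASSOCIATION_FIELDS = (
    "occurrenceID",
    "associationType",
    "targetOccurrenceID",
)


def missing_required(record):
    """Return the required field names that are absent or blank in `record`."""
    missing = []
    for name in REQUIRED_ASSOCIATION_FIELDS:
        value = record.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


# Hypothetical records: the IDs and the "eats" value are made up.
complete = {
    "associationID": "assoc-1",
    "occurrenceID": "occ-1",
    "associationType": "eats",
    "targetOccurrenceID": "occ-2",
}
incomplete = {"associationID": "assoc-2", "occurrenceID": "occ-1"}

print(missing_required(complete))
print(missing_required(incomplete))
```

Because the extension is still a draft, the set of required fields may change; keeping it in one tuple makes that easy to adjust.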

@jsimons9
Collaborator

Hi Jen,

Thanks for that information. Have you seen a copy of the column headers I am using for the GoMexSI data? I would like to start serving all that data in CSV outputs, but I am not sure about waiting until we have compliance with Darwin Core or EOL. It would really be great if a group of us could get together in the same place to hash this out, but I don’t see that happening any time soon.

Are those live links in your email? I tried the first one and got the EOL website, but it said “Not Found” on the page.

Take care,

Jim

@jhpoelen
Member Author

@jhammock thanks for jumping in

@jsimons9 You can find the archive that uses Jen's template in the usual data access section on the GloBI wiki: https://github.com/jhpoelen/eol-globi-data/wiki#accessing-species-interaction-data . Although the data format is pretty self-explanatory, I'd be happy to spend some time with you to help you understand the format that the EOL team has formulated. Actually, this is perfect timing to suggest changes and enhancements, since the format is still very much evolving. Having two separate formats is sometimes pragmatic (but somewhat expensive in the long term), but I think we should at least understand the differences between the two by answering two questions: a) how does one format translate to the other? and b) what is lost in translation? The gap analysis should answer both.
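
The two questions can be made concrete with a small Python sketch. It assumes a hand-written mapping from source columns to target terms; the source column names (`interaction`, `method`, `gut_fullness`) are hypothetical placeholders, not the actual GoMexSI headers, and `gap_analysis` is an illustrative helper rather than anything that exists in GloBI.

```python
def gap_analysis(source_columns, mapping, target_terms):
    """Compare a source format with a target format.

    mapping: dict of source column -> target term, or None if no counterpart.
    Returns (translated, lost, unused):
      translated: source columns with a valid target term (question a)
      lost: source columns with no counterpart (question b)
      unused: target terms that no source column fills
    """
    translated = {}
    lost = []
    for column in source_columns:
        term = mapping.get(column)
        if term in target_terms:
            translated[column] = term
        else:
            lost.append(column)
    unused = sorted(set(target_terms) - set(translated.values()))
    return translated, lost, unused


# Hypothetical source columns; the target terms are a subset of the
# association terms mentioned earlier in this thread.
source_columns = ["interaction", "method", "gut_fullness"]
target_terms = {"associationType", "measurementMethod", "measurementRemarks"}
mapping = {
    "interaction": "associationType",
    "method": "measurementMethod",
    "gut_fullness": None,  # no counterpart in the target format
}

translated, lost, unused = gap_analysis(source_columns, mapping, target_terms)
print("translated:", translated)
print("lost in translation:", lost)
print("unused target terms:", unused)
```

The `lost` list is the starting point for the "missing data elements / concepts" mentioned in the opening comment of this issue.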

Thanks for being patient!

@jhammock
Collaborator

Sorry for the delay, gents, I just took a manufactured long weekend. Jorrit, in case it matters, we just added some new optional elements to the Associations extension, mostly attribution stuff, all fields that we have for other data and had neglected to make available directly in Associations. And a few optional fields have also been added to the Occurrences sheet, to better accommodate museum collection data. The full list is now:

Occurrences:
http://rs.tdwg.org/dwc/terms/occurrenceID
http://rs.tdwg.org/dwc/terms/taxonID
http://rs.tdwg.org/dwc/terms/eventID
http://rs.tdwg.org/dwc/terms/institutionCode
http://rs.tdwg.org/dwc/terms/collectionCode
http://rs.tdwg.org/dwc/terms/catalogNumber
http://rs.tdwg.org/dwc/terms/sex
http://rs.tdwg.org/dwc/terms/lifeStage
http://rs.tdwg.org/dwc/terms/reproductiveCondition
http://rs.tdwg.org/dwc/terms/behavior
http://rs.tdwg.org/dwc/terms/establishmentMeans
http://rs.tdwg.org/dwc/terms/occurrenceRemarks
http://rs.tdwg.org/dwc/terms/individualCount
http://rs.tdwg.org/dwc/terms/preparations
http://rs.tdwg.org/dwc/terms/fieldNotes
http://rs.tdwg.org/dwc/terms/samplingProtocol
http://rs.tdwg.org/dwc/terms/samplingEffort
http://rs.tdwg.org/dwc/terms/recordedBy
http://rs.tdwg.org/dwc/terms/identifiedBy
http://rs.tdwg.org/dwc/terms/dateIdentified
http://rs.tdwg.org/dwc/terms/eventDate
http://purl.org/dc/terms/modified
http://rs.tdwg.org/dwc/terms/locality
http://rs.tdwg.org/dwc/terms/decimalLatitude
http://rs.tdwg.org/dwc/terms/decimalLongitude
http://rs.tdwg.org/dwc/terms/verbatimLatitude
http://rs.tdwg.org/dwc/terms/verbatimLongitude
http://rs.tdwg.org/dwc/terms/verbatimElevation

Associations:
http://eol.org/schema/associationID
http://rs.tdwg.org/dwc/terms/occurrenceID
http://eol.org/schema/associationType
http://eol.org/schema/targetOccurrenceID
http://rs.tdwg.org/dwc/terms/measurementDeterminedDate
http://rs.tdwg.org/dwc/terms/measurementDeterminedBy
http://rs.tdwg.org/dwc/terms/measurementMethod
http://rs.tdwg.org/dwc/terms/measurementRemarks
http://purl.org/dc/terms/source
http://purl.org/dc/terms/bibliographicCitation
http://purl.org/dc/terms/contributor
http://eol.org/schema/reference/referenceID

This need not have any effect on your data, unless there are cases where you felt detail was missing previously. Let me know if you're curious about any of the new fields.
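
Since both sheets carry `occurrenceID`, one plausible sanity check is that every association points at occurrences that actually exist. The Python sketch below assumes that `occurrenceID` and `targetOccurrenceID` in Associations both refer to rows in Occurrences; that reading is inferred from the field names, not from any published definition. The helper name `dangling_references` and the sample rows are illustrative.

```python
def dangling_references(occurrences, associations):
    """List association references that match no occurrence row.

    Both arguments are lists of dicts keyed by term name (the last
    segment of each term URI). Returns (associationID, field, value)
    tuples, one per unresolved reference.
    """
    known_ids = {row["occurrenceID"] for row in occurrences}
    problems = []
    for row in associations:
        for field in ("occurrenceID", "targetOccurrenceID"):
            if row[field] not in known_ids:
                problems.append((row.get("associationID"), field, row[field]))
    return problems


# Made-up rows: "occ-9" is deliberately missing from the occurrences.
occurrences = [{"occurrenceID": "occ-1"}, {"occurrenceID": "occ-2"}]
associations = [
    {"associationID": "assoc-1", "occurrenceID": "occ-1",
     "associationType": "eats", "targetOccurrenceID": "occ-2"},
    {"associationID": "assoc-2", "occurrenceID": "occ-1",
     "associationType": "eats", "targetOccurrenceID": "occ-9"},
]

for problem in dangling_references(occurrences, associations):
    print(problem)
```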

Jen

@jhpoelen
Member Author

After some in-person discussion with Jim, I figured that the GoMexSI format is optimized for data entry, while the Darwin Core archive is optimized for exchange of large datasets between bioinformatics systems. This is why I believe that a gap analysis between the two formats is not necessary. Instead, we'll continue our effort to ensure the quality of the data mapping from the GoMexSI format to the normalized data elements that GloBI produces. Closing issue.
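
To illustrate the difference between a data-entry layout and a normalized one, here is a minimal Python sketch that splits one wide row into two occurrence records and one association record. It is not GloBI's actual mapping code: the input column names (`predator_taxon_id`, `prey_taxon_id`, `interaction_type`) and the ID scheme are hypothetical, and only the output field names come from the term lists earlier in this thread.

```python
def normalize(row, row_number):
    """Split one wide data-entry row into normalized records.

    Returns (occurrences, association): two occurrence dicts, one per
    organism in the row, and one association dict that links them.
    """
    source_id = f"occ-{row_number}-source"
    target_id = f"occ-{row_number}-target"
    occurrences = [
        {"occurrenceID": source_id, "taxonID": row["predator_taxon_id"]},
        {"occurrenceID": target_id, "taxonID": row["prey_taxon_id"]},
    ]
    association = {
        "associationID": f"assoc-{row_number}",
        "occurrenceID": source_id,
        "associationType": row["interaction_type"],
        "targetOccurrenceID": target_id,
    }
    return occurrences, association


# Hypothetical data-entry row; the taxon IDs and "eats" are placeholders.
entry_row = {
    "predator_taxon_id": "T1",
    "prey_taxon_id": "T2",
    "interaction_type": "eats",
}
occurrences, association = normalize(entry_row, 7)
print(occurrences)
print(association)
```

One wide row is convenient to type, while the three normalized records are easier for another system to join and validate, which matches the trade-off described in the comment above.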


3 participants