Gap Analysis GoMexSI file format with EOL's DarwinCore export format #16
Comments
Are the DarwinCore formats you speak of at this website? http://rs.tdwg.org/dwc/terms/index.htm If so, I see some major difficulties. There are certain terms that we do have in common, but in some cases they are using them differently. I don't know how to tackle this. I think conforming completely to Darwin Core may not be possible at this time. Looking at what they have done, it appears to be a daunting task that would really require some experts and a lot of time. Darwin Core seems to be designed specifically for taxonomic collections. I will continue to study it, but I would like a Darwin Core expert to look at our data categories and comment or make suggestions before we go any further. They may see it in a different light than I do. I was once part of an effort to transform the CF database to an SQL database. That took an SQL expert and a number of CF biologists in a room together, going through item by item to work out how to convert or group each one, and it took a number of sessions to complete.
Hi, Jim! No, the extension we are roughing out at EOL is not yet documented in the DarwinCore docs as far as I know. I think we're maybe considering publishing it eventually, but not before you guys have gotten your hands on it and had your say. We don't even really have documentation for it yet, but I do have a template file (some new metadata fields since you last saw it, Jorrit). These are the elements we have so far, each with its term URI:
AssociationID: http://eol.org/schema/associationID
Occurrence ID (required): http://rs.tdwg.org/dwc/terms/occurrenceID
Association Type (required): http://eol.org/schema/associationType
Target Occurrence ID (required): http://eol.org/schema/targetOccurrenceID
Determined Date: http://rs.tdwg.org/dwc/terms/measurementDeterminedDate
Determined By: http://rs.tdwg.org/dwc/terms/measurementDeterminedBy
Measurement Method: http://rs.tdwg.org/dwc/terms/measurementMethod
Remarks: http://rs.tdwg.org/dwc/terms/measurementRemarks
Source: http://purl.org/dc/terms/source
Citation: http://purl.org/dc/terms/bibliographicCitation
Contributor: http://purl.org/dc/terms/contributor
ReferenceID: http://eol.org/schema/reference/referenceID
We have draft definitions and so forth, but still, no documentation yet, and not published (and the eol.org/schema URIs don't resolve, sorry!). Still feeling our way.
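For anyone who wants to poke at the template programmatically, here is a minimal sketch in Python. The term labels, URIs, and the three "required" flags are taken from the list above; the helper name `missing_required` and the example row values are made up for illustration and are not part of any EOL tooling.

```python
# Labels, URIs, and required flags as listed in the comment above.
# The structure and helper below are illustrative assumptions only.
ASSOCIATION_TERMS = [
    ("AssociationID", "http://eol.org/schema/associationID", False),
    ("Occurrence ID", "http://rs.tdwg.org/dwc/terms/occurrenceID", True),
    ("Association Type", "http://eol.org/schema/associationType", True),
    ("Target Occurrence ID", "http://eol.org/schema/targetOccurrenceID", True),
    ("Determined Date", "http://rs.tdwg.org/dwc/terms/measurementDeterminedDate", False),
    ("Determined By", "http://rs.tdwg.org/dwc/terms/measurementDeterminedBy", False),
    ("Measurement Method", "http://rs.tdwg.org/dwc/terms/measurementMethod", False),
    ("Remarks", "http://rs.tdwg.org/dwc/terms/measurementRemarks", False),
    ("Source", "http://purl.org/dc/terms/source", False),
    ("Citation", "http://purl.org/dc/terms/bibliographicCitation", False),
    ("Contributor", "http://purl.org/dc/terms/contributor", False),
    ("ReferenceID", "http://eol.org/schema/reference/referenceID", False),
]


def missing_required(row):
    """Return the URIs of required terms that are absent or blank in `row`.

    `row` is a dict keyed by term URI (hypothetical representation).
    """
    return [
        uri
        for _label, uri, required in ASSOCIATION_TERMS
        if required and not str(row.get(uri, "")).strip()
    ]


# Made-up example row: the target occurrence is deliberately left out.
example_row = {
    "http://rs.tdwg.org/dwc/terms/occurrenceID": "occ-1",
    "http://eol.org/schema/associationType": "preys on",
}
print(missing_required(example_row))
```

Keying rows by term URI rather than by display label is just one possible choice; it avoids ambiguity between labels such as "Remarks" and "Source" that could appear in other extensions.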
Hi Jen, Thanks for that information. Have you seen a copy of the column headers I am using for the GoMexSI data? I would like to start serving all that data as CSV output, but I am not sure about waiting until we have compliance with Darwin Core or EOL. It would really be great if a group of us could get together in the same place to hash this out, but I don't see that happening any time soon. Are those live links in your email? I tried the first one and got the EOL website, but it said "Not Found" on the page. Take care, Jim
@jhammock thanks for jumping in. @jsimons9 You can find the archive that uses Jen's template in the usual data access section on the GloBI wiki: https://github.com/jhpoelen/eol-globi-data/wiki#accessing-species-interaction-data . Although the data format is pretty self-explanatory, I'd be happy to spend some time with you to help you understand the format that the EOL team has formulated. Actually, this is perfect timing to suggest changes and enhancements, since the format is still very much evolving. Having two separate formats is sometimes pragmatic (but somewhat expensive in the long term), but I think we should at least understand the differences between the two by answering these questions: a) how does the one format translate to the other? and b) what is lost in translation? These two questions should be answered in the gap analysis. Thanks for being patient!
After some in-person discussion with Jim, I figured that the GoMexSI format is optimized for data entry, while the Darwin Core archive is optimized for exchange of large datasets between bioinformatics systems. This is why I believe that a gap analysis between the two formats is not necessary. Instead, we'll continue our effort to ensure the quality of the data mapping from the GoMexSI format to the normalized data elements that GloBI produces. Closing issue.
As discussed with @jsimons9 today:
GoMexSI might be able to leverage the DarwinCore associations export format that is proposed by EOL. The advantage would be that we'd only have to support a single export format (less work). To see whether GoMexSI can re-use the DarwinCore export format proposed by EOL, we need to make a list of missing data elements / concepts that would need to be added.
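The list of missing data elements could be drafted along these lines. This is only a sketch: the EOL term URIs are the ones from the draft template quoted in this thread, but the GoMexSI column names and the mapping are invented placeholders (the real column headers are not reproduced here), and `gap_analysis` is a hypothetical helper.

```python
# Term URIs from the EOL draft template quoted in this thread.
EOL_TERMS = {
    "http://eol.org/schema/associationID",
    "http://rs.tdwg.org/dwc/terms/occurrenceID",
    "http://eol.org/schema/associationType",
    "http://eol.org/schema/targetOccurrenceID",
    "http://rs.tdwg.org/dwc/terms/measurementDeterminedDate",
    "http://rs.tdwg.org/dwc/terms/measurementDeterminedBy",
    "http://rs.tdwg.org/dwc/terms/measurementMethod",
    "http://rs.tdwg.org/dwc/terms/measurementRemarks",
    "http://purl.org/dc/terms/source",
    "http://purl.org/dc/terms/bibliographicCitation",
    "http://purl.org/dc/terms/contributor",
    "http://eol.org/schema/reference/referenceID",
}

# HYPOTHETICAL column names and mapping, invented for illustration only.
# None means "no counterpart found among the EOL terms above".
EXAMPLE_MAPPING = {
    "example_predator_id": "http://rs.tdwg.org/dwc/terms/occurrenceID",
    "example_prey_id": "http://eol.org/schema/targetOccurrenceID",
    "example_interaction": "http://eol.org/schema/associationType",
    "example_reference_tag": "http://eol.org/schema/reference/referenceID",
    "example_predator_length": None,
    "example_depth": None,
}


def gap_analysis(mapping, target_terms):
    """Return (source columns with no target term, target terms never used)."""
    unmapped_columns = sorted(
        column for column, term in mapping.items() if term not in target_terms
    )
    used_terms = {term for term in mapping.values() if term is not None}
    unused_terms = sorted(target_terms - used_terms)
    return unmapped_columns, unused_terms


unmapped, unused = gap_analysis(EXAMPLE_MAPPING, EOL_TERMS)
print("Columns that would need new elements:", unmapped)
print("Export terms with no source column:", unused)
```

The first list is the set of elements or concepts that would have to be added to the export format; the second shows which export terms the source data would leave empty.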