add trait dataset registry entries #4
@jmadin created an earlier mock-up version of the registry in March using Ruby on Rails https://afternoon-tor-83256.herokuapp.com/ |
I'd be happy to make this mock-up registry reflect the Google Sheets fields that @caterinap created if this helps. And move the rails project over here. The beauty of these modern web app approaches is that they are built on APIs, and so easy to pipe information onto maps, into R, directly into html text, etc. (e.g., "Currently there are XX registered trait datasets."). My only concern is, and someone else mentioned this earlier, that there may already be existing registries out there, and no point re-inventing the wheel. Plus it would take some time to develop properly - although, perhaps the barebones is okay for this stage. |
The Google Sheet fields correspond to Table 1 in the paper and were created based on a discussion where all authors were invited to contribute, so maybe it would be worth updating the mock-up registry with those. But first a question: on the website, should we point to the web app or to the Google Form/CSV? If it's possible to add new datasets via the web app then it would be way better than the Google solution! (right now it requires a log-in). |
While I can see a webapp being useful in later stages of the project, I'd suggest keeping things as simple as possible at this stage and making it easy for most people, not just Ruby/web developers, to contribute. Rather than introducing a webserver and a database, I'd suggest sticking with static, version-controlled lists of files (or tables) managed in GitHub (or Google Sheets) for now. These lists can then be rendered in HTML using Jekyll templates (also see https://jekyllrb.com/docs/datafiles/) or some JavaScript. The cost of maintaining a webapp and managing associated data in a Rails app on Heroku should not be underestimated. Also, since there are relatively few folks coding in Rails compared to those who can edit tables or make HTML pages, I expect that the number of folks who can review, suggest, and contribute functionality will be limited. Instead, I'd favor the approach taken by https://github.com/OBOFoundry/OBOFoundry.github.io in which individual datasets (in this case ontologies) get their own file (see https://github.com/OBOFoundry/OBOFoundry.github.io/tree/master/ontology) and can be managed individually by those that maintain the datasets. In my mind, this promotes a sense of ownership and allows for delegation of maintenance of dataset info across a large group of folks. |
@jhpoelen The structure used by the OBOFoundry seems easy enough, and it's easy enough to create a new file in Github by clicking the "create new file" button that we shouldn't have too much trouble getting folks that aren't Github-savvy to contribute. I agree that a webapp would be great, but maybe that's something we build into funding applications? |
Yes Jorrit and Brian - I think that we need to keep it simple for now and apply for funds to employ a developer to support better solutions as we grow.
|
Could we also adopt a similar infrastructure for a registry for scientists? |
+1 totally! @bmaitner would it help to work on some examples? |
@jhpoelen I think so. Perhaps we should start a new issue for that though? It would be good to discuss what information to include, format, etc. Or perhaps this goes under the OTN member profiles and map issue? |
A separate issue sounds like a good plan. Should we re-use #3 or create a new one? |
I think #3 is fine |
I've added placeholders for dataset entries at https://github.com/open-traits-network/open-traits-network.github.io/tree/master/_datasets . Also, I've created a placeholder dataset list page at https://github.com/open-traits-network/open-traits-network.github.io/tree/master/datasets.md which can be reached at https://opentraits.org/datasets . I hope that others can help:
|
Regarding metadata elements, the current trait registry mockup that @jmadin put together had these elements: Dataset name To this list I think I'd add: I'd suggest that a small subset of those fields would be mandatory, though. Perhaps: ID |
@bmaitner sounds good to me. Just curious - did you consider using an EML-inspired structure https://knb.ecoinformatics.org/external//emlparser/docs/index.html ? What would really help me to provide feedback is a few examples along the lines of #3 . Let me know how I can help. |
Hi everyone, trying to register a dataset on the Google Form and have a couple of questions about traitList (a required field):
I had a look at the mockup (https://afternoon-tor-83256.herokuapp.com/) but could not find an example trait list. Thanks a lot! Hervé P.S.: I am totally new to GitHub and the dataset was part of this paper: https://www.nature.com/articles/ncomms16047 |
I like the idea of using existing standards (and the associated documentation) where possible, but I think we have to consider the trade-off of ease of entering data vs. ease of parsing/searching data. I wonder if relying on especially rigid formatting might discourage users from uploading datasets. However, perhaps some of the less-strict EML fields would be sufficient for our purposes? e.g. generalTaxonomicCoverage, geographicDescription, boundingCoordinates, etc? Possibly with a link to a full EML description? |
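To make the trade-off above concrete, here is a minimal Python sketch of a "loose" coverage record. The three field names (`generalTaxonomicCoverage`, `geographicDescription`, `boundingCoordinates`) come from the comment above; the class names, the choice of which fields are required, and the validation rules are assumptions for illustration only, not an agreed registry schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BoundingCoordinates:
    """Bounding box in decimal degrees (hypothetical helper class)."""
    west: float
    east: float
    north: float
    south: float

    def problems(self) -> List[str]:
        issues = []
        if not (-180 <= self.west <= 180 and -180 <= self.east <= 180):
            issues.append("longitude outside [-180, 180]")
        if not (-90 <= self.south <= 90 and -90 <= self.north <= 90):
            issues.append("latitude outside [-90, 90]")
        if self.south > self.north:
            issues.append("south bound is greater than north bound")
        # west > east is not flagged: a box may cross the antimeridian.
        return issues


@dataclass
class Coverage:
    """Free-text fields are required; the bounding box is optional (an assumption)."""
    generalTaxonomicCoverage: str
    geographicDescription: str
    boundingCoordinates: Optional[BoundingCoordinates] = None

    def problems(self) -> List[str]:
        issues = []
        if not self.generalTaxonomicCoverage.strip():
            issues.append("generalTaxonomicCoverage is empty")
        if not self.geographicDescription.strip():
            issues.append("geographicDescription is empty")
        if self.boundingCoordinates is not None:
            issues.extend(self.boundingCoordinates.problems())
        return issues
```

Keeping the bounding box optional lets a contributor register a dataset with two short free-text answers, while entries that do supply coordinates still get a basic sanity check.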
Just wanted to say that we had a first "discussion" (it was more a collaborative doc) about the fields of the registry. These are the ones reported in Table 1 of the paper. Concerning standards, the level of standardization is currently pretty low, but I based all the fields I could on Darwin Core (e.g. https://dwc.tdwg.org/terms/#decimalLongitude). Happy to change anything you want. The app is not linked to the Google Form (they were created independently). |
@caterinap thanks for your very helpful reply. Those example records are very useful. I agree we may need to clarify instructions for standardizing trait names further down the line. Unfortunately, most of my traits do not map to existing plant ontologies (this is a bigger issue that I'll have to solve separately). |
Hi @caterinap. I was just autogenerating the dataset markdown files for the website from your Google link (above). There doesn't seem to be a field for dataset name. I wanted to name the files with this name, and also display it on the dataset webpages. Is this something that you have? Or perhaps should add? |
I'll use the data set URL for now. |
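For readers curious what this kind of autogeneration might look like, below is a rough Python sketch that turns rows of a downloaded CSV into one markdown file per dataset, naming each file after the dataset URL as in the workaround above. It is not the script that was actually used. The column name `datasetURL` and the function names are hypothetical; `traitList` is the only field name taken from this thread.

```python
import csv
import io
import re
from typing import Dict
from urllib.parse import urlparse


def slug_from_url(url: str) -> str:
    """Turn a dataset URL into a filesystem-friendly file stem."""
    parsed = urlparse(url)
    raw = f"{parsed.netloc}{parsed.path}"
    # Collapse every run of characters other than ASCII letters and digits into one hyphen.
    return re.sub(r"[^A-Za-z0-9]+", "-", raw).strip("-").lower()


def render_entry(row: Dict[str, str]) -> str:
    """Render one spreadsheet row as a markdown file body with quoted front matter."""
    lines = ["---"]
    for key, value in row.items():
        # Quote values so that colons inside URLs do not break the front matter.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'{key}: "{escaped}"')
    lines.append("---")
    return "\n".join(lines) + "\n"


def entries_from_csv(csv_text: str, url_field: str = "datasetURL") -> Dict[str, str]:
    """Map output file names to rendered markdown, one entry per CSV row.

    `url_field` is an assumed column name, not one confirmed by the form.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {f"{slug_from_url(row[url_field])}.md": render_entry(row) for row in reader}
```

Two rows with the same URL would map to the same file name and the later one would win, which is one reason a dedicated dataset name or identifier field is worth having.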
Should we change the menu item and page name from "Datasets" to "Registry" (or "Dataset Registry")? We can continue to call instances of this registry a "Dataset". Thoughts? |
@jmadin I added a |
Excellent, thanks @caterinap |
Agree! I added |
One question: when the dataset does not have an official name, should we set a "standard" (format and content)? Something like Geography Taxon Author (trait type? year?). Do we want spaces or underscores? (I agree that it's very frustrating to fill in datasets without a standard!) |
+1 for a standard, even if it is one that we expect will need revision in the future. And I always prefer underscores. The dataset name is just a unique identifier, right? So it doesn't much matter if it sounds a bit odd. I think the more pressing concern is that it be clear so that folks don't accidentally add the same thing twice. For example, someone might also call this "England Decapod Pearse". So perhaps Author_Year_Geography_Taxon<_letter if more than one database present?>, with the constraints that 1) only the first author is used; 2) the smallest political division/taxonomic rank that encompasses all the records is used. I suggest placing author and year first since it might make it easier for folks to scroll through and see if a dataset has been entered already, since author name is less ambiguous than the geographical or taxonomic fields. |
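The proposed Author_Year_Geography_Taxon pattern could be sketched in Python roughly as follows. The function and parameter names are hypothetical, and the decision to strip spaces and punctuation inside each part (so that underscores only ever separate parts) is an assumption rather than something agreed in this thread. The example values are made up.

```python
import re
from typing import Optional


def _token(text: str) -> str:
    """Remove whitespace, punctuation, and underscores; keep letters (including accented ones) and digits."""
    return re.sub(r"[\W_]+", "", text)


def dataset_name(first_author: str, year: int, geography: str, taxon: str,
                 letter: Optional[str] = None) -> str:
    """Build Author_Year_Geography_Taxon, with an optional trailing letter.

    Only the first author's surname should be passed in; `geography` and `taxon`
    should be the smallest political division / taxonomic rank covering all records.
    """
    parts = [_token(first_author), str(year), _token(geography), _token(taxon)]
    if letter:
        parts.append(letter)
    return "_".join(parts)


print(dataset_name("Smith", 2019, "New Zealand", "Aves"))       # Smith_2019_NewZealand_Aves
print(dataset_name("Smith", 2019, "New Zealand", "Aves", "b"))  # Smith_2019_NewZealand_Aves_b
```

Putting author and year first means a plain alphabetical sort of the names groups a given author's datasets together, which supports the duplicate-spotting goal described above.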
About dataset IDs: most major registries I know (e.g., iDigBio, GBIF) are pushing for using randomly generated UUIDs to identify specific datasets. Also, from Nelson et al. 2018 (https://doi.org/10.1002/aps3.1027): [...]
|
Can we have both transparent and opaque identifiers? @bmaitner I like your proposal. What about things like TRY? If we follow the standard we might lose the original name for them. Should we just keep the database name when there is one? |
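As a small illustration of carrying both kinds of identifier side by side, here is a Python sketch that pairs a human-readable name with a randomly generated UUID. The class and field names (`DatasetId`, `name`, `opaque_id`) are hypothetical and not part of any agreed registry format.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DatasetId:
    """A transparent label plus an opaque identifier for the same dataset."""
    # Transparent: readable and sortable, but may change if naming rules are revised.
    name: str
    # Opaque: a random (version 4) UUID assigned once and never derived from the name.
    opaque_id: str = field(default_factory=lambda: str(uuid.uuid4()))


first = DatasetId("Smith_2019_NewZealand_Aves")
second = DatasetId("Smith_2019_NewZealand_Aves")
# Two registrations with the same name still receive different opaque identifiers.
print(first.opaque_id != second.opaque_id)
```

In a real registry the UUID would be generated once at registration and then stored in the dataset's file, so that later renames leave links to the dataset intact.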
@caterinap but database name is an existing field, so there shouldn't be an issue with losing it (I think?). However, I think the issue of keeping different versions of a data set linked is one we should think about (especially if the main author changes between versions). It might also be worth considering keeping track of which data sets contain/are contained by other data sets. Do we need fields for these relationships? |
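One way to picture a version-linking field is sketched below in Python. It assumes each registry entry may carry an optional `previousVersion` key pointing at the identifier of the entry it supersedes; that key name, and the dictionary layout, are purely hypothetical. A containment relationship could follow the same pattern with a second key.

```python
from typing import Dict, List, Optional


def version_chain(records: Dict[str, dict], dataset_id: str) -> List[str]:
    """Follow previousVersion links from a dataset back to its oldest known version.

    Stops at an entry without a link, at a link to an unregistered dataset,
    or when a cycle is detected.
    """
    chain: List[str] = []
    seen = set()
    current: Optional[str] = dataset_id
    while current is not None and current in records and current not in seen:
        chain.append(current)
        seen.add(current)
        current = records[current].get("previousVersion")
    return chain


# Made-up entries: the first author changes between versions, but the link survives.
registry = {
    "Jones_2021_Global_Plantae": {"previousVersion": "Smith_2015_Global_Plantae"},
    "Smith_2015_Global_Plantae": {},
}
print(version_chain(registry, "Jones_2021_Global_Plantae"))
# ['Jones_2021_Global_Plantae', 'Smith_2015_Global_Plantae']
```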
@caterinap Is there some way to add the missing dataset names in the Google Form/Sheet that we download? I just re-downloaded the spreadsheet to transfer to the website, but don't want to have to re-enter the missing dataset names each time we do this. |
I've attempted to capture the current dataset registration process at https://github.com/open-traits-network/open-traits-network.github.io/tree/master/_datasets#readme . Please note that the Google Form is no longer used or referenced. Remaining Google Form entries have been copied into #45 . Thanks for all the discussion and input. Please open a new issue if you have suggestions or ideas. |
Excellent - thanks very much Jorrit
From: Jorrit Poelen <notifications@github.com>
Sent: Thursday, 10 October 2019 2:39 AM
I've attempted to capture the current dataset registration process at https://github.com/open-traits-network/open-traits-network.github.io/tree/master/_members#readme . Please note that the google form is no longer used or referenced. Remaining google form entries have been copied into #45<#45> .
Thanks for all the discussion and input. Please open a new issue if you have suggestions or ideas.
|
@caterinap suggested to re-use the existing google spreadsheet to populate the trait dataset registry entries.