R package to access dataset registry #109
Comments
I'd be happy to help with implementation and testing. Making the OTN dataset registry accessible programmatically would be valuable.
@iimog thanks for offering your help. One idea I had to make it easier to access the OTN registry programmatically was to render the OTN dataset registry as The R package would then take the feed and munch on it to make it available in appropriate R data slang (e.g., data frames, vectors). @iimog curious to hear your thoughts!
I think I could help with the munching. For rendering the data into standardised data.frames, the traitdataform package could be useful. It applies the ETS terms to an input dataset and returns a long-table output, while maintaining metadata elements in the attributes of the R object. The feed would need to provide input recipes for all trait datasets, i.e.
Those can be parsed into a script such as this for the Arthropod Traits Set. Then applying
I like these suggestions. Making the information from the registry available as JSON is a good first step. Using
Others who registered interest via email are Luca Santini, @willpearse, @hoganhaben, and Jerome Mathieu; I've encouraged them to get involved via GitHub instead.
This is super cool; thanks for the ping Rachael. I would love to help with this! My two quick thoughts are: For making the registry accessible, @iimog, I think your idea of working directly with what we've got now from Jekyll is a Very Good Idea. That will mean we aren't doing things twice, which is going to make everyone's lives easier, even if the initial set-up is hard. For loading actual trait datasets... I'm perhaps a bit biased, but given my experiences with doing something like this with MADtraits (https://github.com/willpearse/MADtraits) I think focusing on only those datasets that use
Yes, thank you @rachaelgallagher. I am happy to help with coding. I like @fdschneider's idea of using traitdataform::standardize to standardize the datasets. Please let me know how I can help.
Great to see excitement about making the OTN datasets easy to work with in R. Obviously there's a lot of work to do to make this happen (translation scripts, metadata improvements, OTN R package repository, picking a package name, etc.) . . . but to get the party started, I just added the official(TM) OTN dataset feed via https://opentraits.org/datasets.json.
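As a rough illustration of the "take the feed and munch on it" step, here is a minimal sketch of turning such a feed into rectangular, data-frame-like rows. It is written in Python purely to show the shape of the transformation; an R package would do the equivalent with its own JSON tooling. It assumes the feed is a JSON array of objects, which has not been checked against the real feed, and the field names in the sample (`id`, `title`, `url`) and the helper names `fetch_feed` and `feed_to_rows` are hypothetical.

```python
import json
from urllib.request import urlopen

FEED_URL = "https://opentraits.org/datasets.json"  # feed URL from the comment above


def fetch_feed(url=FEED_URL):
    """Download the feed and return its raw JSON text (needs network access)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def feed_to_rows(feed_text):
    """Turn a JSON array of dataset records into a column list plus row dicts.

    Assumption: the feed is a JSON array of objects; the real schema may differ.
    """
    records = json.loads(feed_text)
    # Collect every key seen, in first-seen order, so all rows share one set of columns.
    columns = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)
    # Fill missing fields with None so the result is rectangular.
    rows = [{col: record.get(col) for col in columns} for record in records]
    return columns, rows


# Hypothetical sample with made-up field names, standing in for the real feed.
SAMPLE = '[{"id": "a", "title": "Dataset A"}, {"id": "b", "url": "https://example.org/b"}]'
columns, rows = feed_to_rows(SAMPLE)
print(columns)
print(rows[1])
```

Keeping the download (`fetch_feed`) separate from the reshaping (`feed_to_rows`) means the reshaping can be tested offline against a saved copy of the feed.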
@rachaelgallagher can I make an R package called 'ROpenTraits'? If so, I'll get @jhpoelen's JSON into an R package and we will have started "something". Apologies for asking before doing something so trivial, but as there are only two repos on the GitHub account right now I didn't want to do anything naughty! :D
I think that's fine and can't see any issues @willpearse (though others may flag something!)
It's alive! It's alliiiiiiiive! Link to quick feature requests thread: open-traits-network/ROpenTraits#2 Get started with the package (and, simultaneously, learn everything you need to know about the package :p):
Could someone help me set up Travis?
(I appreciate that this is a trivial package, and so thank you all for doing the hard work so I could just swoop in!)
Great! We have a package. The feature request thread is a very good idea, as it prompted me to think about the intention of the package. A question that comes to my mind is: what is the relationship between ROpenTraits, MADtraits (which also makes lots of public trait data available computationally, right?), and traitdataform? Are the packages competing or duplicating work because many of the OpenTraits datasets are already in MADtraits (and harmonized)? Could the download and harmonization steps developed for MADtraits be useful for ROpenTraits? Or should we exclude any harmonization from ROpenTraits and just return the raw data as provided by the authors? Then users (including the maintainers of MAD) must develop harmonization themselves (although traitdataform can help with that). My take on this would be: As a primary goal, ROpenTraits should provide all the registered data in their raw form as an R object (data.frame), automated through the JSON feed. A secondary goal could be to provide recipes for harmonization with
Very nice to see the ideas around the R package floating around. Some minor suggestions / comments:
Once this is in place, I think we are in a great position to do data integrations driven by individual research interests.
Also . . . there's some issue backlog that we might want to look at first . . . it contains a bunch of legacy dataset registrations that have not been added yet.
@fdschneider Thanks for bringing up overlaps et al. because these are important things. My understanding was that the main idea of OTN was to bring people together and get people sharing data; thus I feel like your primary and secondary goals are more secondary and tertiary, if that makes sense? Most importantly, though, even with those goals I don't think MADtraits and OTN need to compete. To my eyes, the main value This would allow MADtraits to focus on finding other, older datasets that 'fall through the cracks' and aren't compliant, and (R)OTN can focus on getting people to properly format all their new data. Eventually, MADtraits will have no new data to load in because it's all in (R)OTN and that will be fine by me! Perhaps, as we discussed at the meeting in New Orleans and as you mention above, MADtraits might then load some of those data in as well, but honestly I think the idea would be to get everyone so standardised that it doesn't really matter anymore because it's so trivial. If we can get data collectors on board with OTN et al. then, fingers crossed, life becomes easier for everyone, right? What do you think @rachaelgallagher? I realise this has spiralled into a bit of a wider discussion!
In discussion with @rachaelgallagher, @jmadin, and @bmaitner, the creation of an R package came up.
Please share your thoughts on what should be included in the first basic version of the R package and who is interested in making the OTN dataset registry accessible programmatically . . .