Join GitHub today
GitHub is home to over 50 million developers working together to host and review code, manage projects, and build software together.
Sign upccafs - Client for CCAFS General Circulation Models Data #82
Comments
Editor checks:
Editor commentsCurrently seeking reviewers. I find the README and documentation a little sparse. I try to follow the rule-of-thumb that a user may encounter the data and APIs we wrap for the first time through the package help or README. Package-level documentation should at least briefly describe the data source and format, and provide links to more detailed documentation at the source. Below is
Reviewers: @mikoontz @manuramon |
|
Reviewers: @mikoontz @manuramon |
|
Here is my review. I hope that helps. Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
DocumentationThe package includes all the following forms of documentation:
Functionality
Final approval (post-review)
Estimated hours spent reviewing: 5 hours Review CommentsThe CCAFS package has been developed to have a first sight of Climate Change, Agriculture, and Food Security (CCAFS) General Circulation Models (GCM) data in a fast and easy manner. This initial version is written to gather data from CCAFS data from Amazon S3, but I think it would be of great interest to make it more general allowing the use of data stored in other servers/places (for instance, including an URL argument in cc_data_flech). Anyway, the package is well designed and fits its purpose. Here are some specific comments:
|
|
|
Thank you for your reviews, @mikoontz and @manuramon! @sckott When responding to Mike's point, I think we should consider the evolving standard of "the package may be the users' first encounter with this data/API". This doesn't mean one has to document everything in the package, but the README, vignettes, and other documentation should point the reader to adequate online documentation if necessary. I also suggest that one possible solution to the dealing with the raster dependencies is to re-export essential functions and methods from that package (for print/plot etc.), rather than attaching its whole namespace via I will take a closer look at the testing suite in a bit, as well. Mike, in general the Automated Tests category doesn't refer to the tests we run, but the package tests housed in |
|
Oh great, thanks for that. In that case, I can update my review: All tests in
The badges suggest that the automatic tests by AppVeyor and Travis-CI work. I'm not clear on whether reporting that fulfills my reviewer duties. I'm not sure how to get My intuition is to pass the
But I can't image that's correct, since it returns a bunch of failed tests that worked fine when I copied the code directly from within the
|
|
@mikoontz @sckott I took a look at your testing and see there isn't coverage for stuff in |
Good idea.
makes sense
good ideas, will have a go at adding cache testing |
|
thanks for the reviews @manuramon @mikoontz ! |
|
responses to @manuramon
Hmm, thanks for the idea, but i think it's important to keep this pkg specific to CCAFS data, and make new pkgs (following similar pattern) for other data sources. specific comments:
Okay, can do! (see ropensci/ccafs#6)
Thanks! Will expand on what each function does. (see ropensci/ccafs#7)
i guess
Good point, will add that specific thing, and will fill out the vignette more. (see ropensci/ccafs#8)
will fix it. thanks! (see ropensci/ccafs#9)
will fix the type. thanks! (see ropensci/ccafs#9) for reference to
good catch. hmm, that's done via
thanks for catching that, I'll just add
Nice, thanks, will try that out, at least for possibly including for examples. (see ropensci/ccafs#11) |
|
responses to @mikoontz
by I'm not sure what is best. Do you have a preference as a (i assume) potential user?
as @noamross said i'd prefer to re-export essential fxns over putting in
The reasoning behind that is to keep the package as light weight as possible, e.g., https://github.com/ropenscilabs/ccafs/blob/master/NAMESPACE#L23-L24 shows we only import two of the
Thanks! Will definitely improve the pkg level file see ropensci/ccafs#6
No rule per se for CRAN but yeah, good idea, i'll add that in the pkg level man file and in readme (see ropensci/ccafs#13)
it is available like
It seems to be common now to not put
yeah, makes sense to add more explanation. a bucket is the top level thing that files are organized in within S3 (see ropensci/ccafs#15)
right, thanks! noted by other reviewer as well see ropensci/ccafs#5
Probably not the expected behavior. Perhaps returning a
Right, should include a simple explanation about that. When the pkg manual is made they don't wrap lines, so that's why i try to make sure all my code and docs are 80 line width or less. Although in the vignette its md/html, so I guess doesn't matter there so much. (see ropensci/ccafs#17)
Yeah, you got it. The single bracket method is for indexing like
thanks, will fix (see ropensci/ccafs#9) |
|
I'll just respond to the points where you have a question for me. Let me know what other feedback would be helpful.
Whoops, yes I meant the http://ccafs-climate.org website. I would be comfortable working with both platforms in parallel (your package in conjunction with the official CCAFS website) as long as there is some more direction from your package about how to leverage the official websites search capabilities to find the key strings that would be necessary. The package would still provide a valuable service to a novice user. I don't think there is a need to build in a spatial/temporal/data type subsetting engine directly into the package.
Re-exporting essential functions sounds great! I didn't know that was possible. I am on board with keeping the package lightweight, and allowing for potential use cases of the keys that may not require the entire |
|
@mikoontz thanks for your feedback! I would like to add functionality for searching for what data is available, but they sure aren't making it easy to do programmatically - will keep looking into that re-exporting it is |
|
Issues linked in my responses above - dealing with those now |
|
All issues raised by reviewers have been addressed - Anything else to do? There still isn't a way to search for data - for a future package version, I'll see if I can scrape the search page for CCAFS, but i don't think that's a good long-term plan. |
|
Editor comments have been addressed. @mikoontz and @manuramon, could you please take a look and let us know that @sckott's changes have addressed your concerns? |
|
Hi @sckott @noamross and @mikoontz. For me, @sckott has addressed all our concerns in a pretty way. I think the resulted package is nice and useful and for sure that in a (not so far) future this package will be improved with new functionalities (such us search for data). PS: the example in the cc_data_fetch using the progress bar results in one line printed for each percentage point in the vignette. This is a not important issue (in fact, there is not an issue) |
|
Hi @sckott @noamross and @manuramon, I think the package is much improved, largely thanks to the expanded documentation. Thanks for the extra information! I have 4 concerns still: First, I can't get the vignette to run using either of these lines of code:
Second, I still think that the vignette could use an example workflow to get the key from the website, even given the current state of the search functionality with the R interface (i.e. not present, but perhaps in development). As an example, I used the CCAFS website to try to get a key like a user might. Specifically, I tried to recreate getting the key that the vignette uses. Here was my workflow:
Is this the workflow that you would use? If so, can you include it with the vignette or is that outside the scope of the vignette? Third, is this package unable to get the regional datasets? I tried to get the region A1 version of the above key (with a 30 second resolution, since that's all that is available) and the package can fetch but not read. I tried a few other combinations of regional datasets and had the same problem. Are there other data sets you know aren't retrievable? I couldn't figure out how to get a key from the "Bias Correction" or the "Weather Station" parts of the website (I could only retrieve data from the "Spatial Downscaling" section); does that mean those data can't be accessed by the
When I run Fourth, have you had any communication with the CCAFS/CGIAR folks about accessing their data via R instead of their interface? From their "Terms and Conditions" paragraph (copied below) that pops up before you can get a download link, it seems like they'd be interested in assessing the impact of their work.
|
|
@mikoontz to get the vignette working I run the I also have tried to get a regional data set (the example @mikoontz pointed out above) and received the following error (Error in .rasterObjectFromFile(x, band=... : Cannot create a RasterLayer from this file) when running |
|
Thanks @manuramon, you're right. I can run the vignettes now using the I also see that I can call the
|
right - it can be surpressed, which I'll do for the vignette probably |
|
thanks for the help on vignette @manuramon
I don't know, I'll have a look
Yes. |
|
AFAICT "Bias Correction" or the "Weather Station" sections are somehow private data for people that have access. |
|
@mikoontz Try again after reinstalling |
|
working on this a bit trying to search via the website - it's sort of a hot dumpster fire, but there's some progress |
|
if you two have time @manuramon @mikoontz try out new function |
|
i think the dumpster fire has been tamed, let me know what you think |
|
@sckott I have found the |
|
@manuramon thanks, agree that there's not a better way to do it, for the few parameters that only have two options i just use a character string, but most parameters have at least 6 options many of which are quite long strings with spaces and such, |
|
Hi @sckott, @manuramon, and @noamross. Nice package edits! There are three sections to this updated review: the vignette, the regional datasets, and the Vignette Can you add the "amazon_s3_keys" vignette to the package help
Regional Datasets I'm now able to read the regional datasets that incorporate their associated .prj files. It looks like the ccafs_files object print method isn't reporting the correct number of files. When I read the same regional key that we've been using, the resulting object from the
The Just 2 comments on it: First, the exhaustive list of possible argument values in
An example use case would turn the directly coded search of the file key like you had in the
...into something like this, where tab complete helped every step of the way.
I have no idea whether it is good practice to hard code a lookup table like this, but maybe there is a better way to implement the concept. It may also make it easier to for the package maintainer to update the Second, the cc_search() function returns errors if the format= or extent= arguments are integer values, but the |
yeah, will do I'll check on the regional datasets problem cc_search
|
added new vigentte link to pkg level man file ropensci/software-review#82
|
@mikoontz Okay, I think I addressed all your points - changes made - anything else I should do folks? |
|
Awesome @sckott. I see all the points as addressed: vignette is in the help file, good accounting of the files coming from the regional keys, and a way to search with helpful prompts. Green light from me! This is great! cc @manuramon |
|
Great job @sckott. I think this final version of the Green light from me, as well. PS. For me this has been the first time I have reviewed an R package and I have learned a lot and enjoyed it, so thanks to all of you! I hope to have the opportunity to review more packages in the future. |
|
thanks @mikoontz and @manuramon glad you enjoyed it @manuramon ! |
|
Thanks @mikoontz and @manuramon for your reviews and excellent follow-up! @sckott, I've done final checks and we're good to go. |
|
cool, thanks everyone ( @mikoontz @manuramon @noamross ) for improving the package |
Summary
An interface to data from Climate Change, Agriculture, and Food Security (CCAFS) General Circulation Models (GCM) data. Data is stored in Amazon S3, from which we provide functions to fetch data.
https://github.com/ropenscilabs/ccafs
Scientists, particularly those studying climate change, forecasting crop production, etc.
Noppers.
Requirements
Confirm each of the following by checking the box. This package:
Publication options
paper.mdwith a high-level description in the package root or ininst/.Detail
R CMD check(ordevtools::check()) succeed? Paste and describe any errors or warnings: