ccafs - Client for CCAFS General Circulation Models Data #82
An interface to data from Climate Change, Agriculture, and Food Security (CCAFS) General Circulation Models (GCM) data. Data is stored in Amazon S3, from which we provide functions to fetch data.
Scientists, particularly those studying climate change, forecasting crop production, etc.
Confirm each of the following by checking the box. This package:
Currently seeking reviewers.
I find the README and documentation a little sparse. I try to follow the rule-of-thumb that a user may encounter the data and APIs we wrap for the first time through the package help or README. Package-level documentation should at least briefly describe the data source and format, and provide links to more detailed documentation at the source.
Here is my review. I hope that helps.
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide.
The package includes all the following forms of documentation:
Final approval (post-review)
Estimated hours spent reviewing: 5 hours
The ccafs package has been developed to give a first look at Climate Change, Agriculture, and Food Security (CCAFS) General Circulation Models (GCM) data in a fast and easy manner. This initial version gathers CCAFS data from Amazon S3, but I think it would be of great interest to make it more general, allowing the use of data stored on other servers/places (for instance, by including a URL argument in `cc_data_fetch`).
Anyway, the package is well designed and fits its purpose.
Here are some specific comments:
@sckott When responding to Mike's point, I think we should consider the evolving standard of "the package may be the users' first encounter with this data/API". This doesn't mean one has to document everything in the package, but the README, vignettes, and other documentation should point the reader to adequate online documentation if necessary.
I also suggest that one possible solution to dealing with the raster dependency is to re-export essential functions and methods from that package (for print/plot etc.), rather than attaching its whole namespace.
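For what it's worth, a re-export can be sketched with a couple of roxygen2 tags (illustrative only: the file name is made up, `raster::raster` stands in for whichever functions are worth exposing, and S4 methods such as plot/print need `exportMethods` handling rather than this pattern):

```r
# R/reexports.R (hypothetical file): re-export a selected function from
# raster so users can call it without attaching the whole raster namespace

#' @importFrom raster raster
#' @export
raster::raster
```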
I will take a closer look at the testing suite in a bit, as well. Mike, in general the Automated Tests category doesn't refer to the tests we run as reviewers, but to the package's own tests, housed in
Oh great, thanks for that. In that case, I can update my review:
All tests in
The badges suggest that the automatic tests by AppVeyor and Travis-CI work. I'm not clear on whether reporting that fulfills my reviewer duties. I'm not sure how to get
My intuition is to pass the
But I can't imagine that's correct, since it returns a bunch of failed tests that worked fine when I copied the code directly from within the
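For reference, a common way to run a package's own suite locally, rather than sourcing the test files by hand (a sketch, assuming devtools is installed and the working directory is a checkout of the package source):

```r
# Load the package code and run the testthat suite under tests/testthat/
devtools::test()

# Or run the full battery that Travis/AppVeyor run, including the tests
devtools::check()
```

`devtools::test()` calls `load_all()` first, which is why tests that fail when their files are sourced in a bare session can still pass here.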
@sckott I took a look at your testing and see there isn't coverage for stuff in
good ideas, will have a go at adding cache testing
responses to @manuramon
Hmm, thanks for the idea, but i think it's important to keep this pkg specific to CCAFS data, and make new pkgs (following similar pattern) for other data sources.
Okay, can do! (see ropensci/ccafs#6)
Thanks! Will expand on what each function does. (see ropensci/ccafs#7)
Good point, will add that specific thing, and will fill out the vignette more. (see ropensci/ccafs#8)
will fix it. thanks! (see ropensci/ccafs#9)
will fix the typo. thanks! (see ropensci/ccafs#9) for reference to
good catch. hmm, that's done via
thanks for catching that, I'll just add
Nice, thanks, will try that out, at least for possibly including for examples. (see ropensci/ccafs#11)
responses to @mikoontz
I'm not sure what is best. Do you have a preference as a (i assume) potential user?
as @noamross said i'd prefer to re-export essential fxns over putting in
The reasoning behind that is to keep the package as lightweight as possible, e.g., https://github.com/ropenscilabs/ccafs/blob/master/NAMESPACE#L23-L24 shows we only import two of the
Thanks! Will definitely improve the pkg level file (see ropensci/ccafs#6)
No rule per se for CRAN but yeah, good idea, i'll add that in the pkg level man file and in readme (see ropensci/ccafs#13)
it is available like
It seems to be common now to not put
yeah, makes sense to add more explanation. a bucket is the top level thing that files are organized in within S3 (see ropensci/ccafs#15)
right, thanks! noted by other reviewer as well (see ropensci/ccafs#5)
Probably not the expected behavior. Perhaps returning a
Right, should include a simple explanation about that. When the pkg manual is made, lines don't wrap, so that's why i try to make sure all my code and docs are 80 characters wide or less. In the vignette it's md/html though, so I guess it doesn't matter there so much. (see ropensci/ccafs#17)
Yeah, you got it. The single bracket method is for indexing like
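A minimal base-R sketch of the idea (a generic illustration, not the package's actual implementation; the class name is invented):

```r
# Single-bracket method for a hypothetical "ccafs_key" class: subset the
# underlying character vector but keep the class attribute, so the result
# still dispatches to the class's print and other methods
`[.ccafs_key` <- function(x, i) {
  structure(unclass(x)[i], class = "ccafs_key")
}

keys <- structure(c("a/file1.zip", "b/file2.zip"), class = "ccafs_key")
sub <- keys[1]
class(sub)  # still "ccafs_key", not plain character
```

Without a method like this, the default `[` would drop the class attribute and indexed results would print as bare character vectors.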
thanks, will fix (see ropensci/ccafs#9)
I'll just respond to the points where you have a question for me. Let me know what other feedback would be helpful.
Whoops, yes I meant the http://ccafs-climate.org website.
I would be comfortable working with both platforms in parallel (your package in conjunction with the official CCAFS website) as long as there is some more direction from your package about how to leverage the official website's search capabilities to find the key strings that would be necessary. The package would still provide a valuable service to a novice user. I don't think there is a need to build a spatial/temporal/data type subsetting engine directly into the package.
Re-exporting essential functions sounds great! I didn't know that was possible. I am on board with keeping the package lightweight, and allowing for potential use cases of the keys that may not require the entire
Hi @sckott @noamross and @mikoontz. For me, @sckott has addressed all our concerns nicely. I think the resulting package is nice and useful, and I'm sure that in the (not so distant) future this package will be improved with new functionality (such as search for data).
PS: in the vignette, the `cc_data_fetch` example using the progress bar results in one line printed for each percentage point. This is not an important issue (in fact, it is not really an issue at all).
I think the package is much improved, largely thanks to the expanded documentation. Thanks for the extra information!
I still have 4 concerns:
First, I can't get the vignette to run using either of these lines of code:
Second, I still think that the vignette could use an example workflow to get the key from the website, even given the current state of the search functionality with the R interface (i.e. not present, but perhaps in development).
As an example, I used the CCAFS website to try to get a key like a user might. Specifically, I tried to recreate getting the key that the vignette uses. Here was my workflow:
Is this the workflow that you would use? If so, can you include it with the vignette or is that outside the scope of the vignette?
Third, is this package unable to get the regional datasets? I tried to get the region A1 version of the above key (with a 30 second resolution, since that's all that is available) and the package can fetch but not read. I tried a few other combinations of regional datasets and had the same problem. Are there other data sets you know aren't retrievable? I couldn't figure out how to get a key from the "Bias Correction" or the "Weather Station" parts of the website (I could only retrieve data from the "Spatial Downscaling" section); does that mean those data can't be accessed by the
When I run
Fourth, have you had any communication with the CCAFS/CGIAR folks about accessing their data via R instead of their interface? From their "Terms and Conditions" paragraph (copied below) that pops up before you can get a download link, it seems like they'd be interested in assessing the impact of their work.
@mikoontz to get the vignette working I run the
I also tried to get a regional data set (the example @mikoontz pointed out above) and received the following error (`Error in .rasterObjectFromFile(x, band=... : Cannot create a RasterLayer from this file`) when running
Thanks @manuramon, you're right. I can run the vignettes now using the
I also see that I can call the
thanks for the help on vignette @manuramon
I don't know, I'll have a look
@mikoontz Try again after reinstalling
@sckott I have found the
Can you add the "amazon_s3_keys" vignette to the package help?
I'm now able to read the regional datasets that incorporate their associated `.prj` files. It looks like the `ccafs_files` object print method isn't reporting the correct number of files. When I read the same regional key that we've been using, the resulting object from the
Just 2 comments on it:
First, the exhaustive list of possible argument values in
An example use case would turn the directly coded search of the file key like you had in the
...into something like this, where tab complete helped every step of the way.
I have no idea whether it is good practice to hard code a lookup table like this, but maybe there is a better way to implement the concept. It may also make it easier for the package maintainer to update the
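A rough base-R sketch of what I mean (every name and code below is invented for illustration; the real values would come from CCAFS's file-naming scheme):

```r
# Hypothetical lookup tables: named vectors mapping human-readable labels
# to the coded values embedded in file keys. Tab completion on the names
# makes valid options discoverable without memorizing the codes.
cc_params <- list(
  scenario   = c(rcp_4_5 = "7", rcp_8_5 = "8"),
  resolution = c(sec30 = "30s", min2_5 = "2_5min")
)

# Compose a key fragment from readable choices instead of raw codes
key_part <- paste(cc_params$scenario[["rcp_4_5"]],
                  cc_params$resolution[["sec30"]], sep = "/")
key_part  # "7/30s"
```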
Second, the `cc_search()` function returns errors if the `format=` or `extent=` arguments are integer values, but the
yeah, will do
I'll check on the regional datasets problem
Great job @sckott. I think this final version of the
Green light from me, as well.
PS. This has been my first time reviewing an R package; I have learned a lot and enjoyed it, so thanks to all of you! I hope to have the opportunity to review more packages in the future.