cmipr - Client for Coupled Model Intercomparison Project (CMIP) data #99

Closed
sckott opened this Issue Feb 16, 2017 · 37 comments

sckott commented Feb 16, 2017

Summary

  • What does this package do? (explain in 50 words or less):

Client to work with CMIP data (downscaled climate and hydrology projections). The package lists available files, downloads and caches them, and reads them into raster objects.

  • Paste the full DESCRIPTION file inside a code block below:
Package: cmipr
Type: Package
Title: Client for Coupled Model Intercomparison Project (CMIP) Data
Description: Client for Coupled Model Intercomparison Project (CMIP) data
		(<http://gdo-dcp.ucllnl.org/downscaled_cmip_projections/>).
		Data is stored on an FTP server, from which we provide
		functions to fetch data and return tidy data.
Version: 0.0.8.9310
Authors@R: person("Scott", "Chamberlain", role = c("aut", "cre"),
    email = "myrmecocystus@gmail.com")
License: MIT + file LICENSE
URL: https://github.com/ropenscilabs/cmipr
BugReports: https://github.com/ropenscilabs/cmipr/issues
VignetteBuilder: knitr
Imports:
    hoardr (>= 0.0.3),
    raster (>= 2.5-8),
    ncdf4 (>= 1.15),
    curl (>= 2.3),
    crul (>= 0.2.1.9100),
    tibble
Suggests:
    roxygen2 (>= 6.0.1),
    testthat,
    knitr,
    covr
RoxygenNote: 6.0.1
Remotes: ropensci/crul, ropensci/hoardr

  • URL for the package (the development repository, not a stylized html page):

https://github.com/ropenscilabs/cmipr

  • Who is the target audience?

People using climate projections for research - or studying climate change itself.

  • Are there other R packages that accomplish the same thing? If so, what is different about yours?

https://github.com/JGCRI/RCMIP5 deals with CMIP data, but AFAICT only handles data after you have it on your machine. So that package and this one could be used together.

Requirements

Confirm each of the following by checking the box. This package:

  • does not violate the Terms of Service of any service it interacts with.
  • has a CRAN and OSI accepted license.
  • contains a README with instructions for installing the development version.
  • includes documentation with examples for all functions.
  • contains a vignette with examples of its essential functions and uses.
  • has a test suite.
  • has continuous integration with Travis CI and/or another service.

Publication options

  • Do you intend for this package to go on CRAN?
  • Do you wish to automatically submit to the Journal of Open Source Software? If so:
    • The package contains a paper.md with a high-level description in the package root or in inst/.
    • The package is deposited in a long-term repository with the DOI:
    • (Do not submit your package separately to JOSS)

Detail

  • Does R CMD check (or devtools::check()) succeed? Paste and describe any errors or warnings:

  • Does the package conform to rOpenSci packaging guidelines? Please describe any exceptions:

  • If this is a resubmission following rejection, please explain the change in circumstances:

  • If possible, please provide recommendations of reviewers - those with experience with similar packages and/or likely users of your package - and their GitHub user names:

Maybe https://github.com/bpbond, as the author of the package mentioned above (RCMIP5).

maelle commented Feb 17, 2017

Editor checks:

  • Fit: The package meets criteria for fit and overlap
  • Automated tests: Package has a testing suite and is tested via Travis-CI or another CI service.
  • License: The package has a CRAN or OSI accepted license
  • Repository: The repository link resolves correctly
  • Archive (JOSS only, may be post-review): The repository DOI resolves correctly
  • Version (JOSS only, may be post-review): Does the release version given match the GitHub release (v1.0.0)?

Editor comments

Thank you for your submission @sckott!

  • RCMIP5 doc states "This package does not handle downloading (i.e. from nodes in the Earth System Grid Federation, http://esgf.org) the NetCDF data themselves. Sorry."
    So AFAICT too there's no overlap.

  • Here is the output from goodpractice::gp().

It is good practice to

  ? write unit tests for all functions, and all package code in general.
    80% of code lines are covered by test cases.

    R/cmip_fetch.R:24:NA
    R/cmip_fetch.R:25:NA
    R/cmip_fetch.R:26:NA
    R/cmip_read.R:41:NA
    R/cmip_read.R:43:NA
    ... and 8 more lines

  ? avoid long code lines, it is bad for readability. Also, many people
    prefer editor windows that are about 80 characters wide. Try make your
    lines shorter than 80 characters

    tests\testthat\test-cmip_fetch.R:4:1
    tests\testthat\test-cmip_read.R:4:1

The long lines are only due to paths, so you could simply add "# nolint" at the end of these lines. 😉

  • I identified one typo using devtools::spell_check(). "Reaa" in cmip_read.Rd:5,17 should be "Read".

Reviewers: @cvitolo @bpbond
Due date: 2017-03-13

sckott commented Feb 17, 2017

Thanks @masalmon, will fix those things.

sckott added a commit to ropenscilabs/cmipr that referenced this issue Feb 17, 2017

sckott commented Feb 17, 2017

@masalmon okay, those 2 things are fixed

maelle commented Feb 19, 2017

Thanks for agreeing to review this package, @cvitolo and @bpbond!

As a reminder here are links to the recently updated reviewing and packaging guides and to the review template

maelle commented Mar 6, 2017

Hi @cvitolo and @bpbond ! Just a friendly reminder that your reviews are due on Monday, March 13.

cvitolo commented Mar 11, 2017

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (such as being a major contributor to the software).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README
  • Installation instructions: for the development version of package and any non-standard dependencies in README

I think the README should mention the NetCDF library as an external dependency.

  • Vignette(s) demonstrating major functionality that runs successfully locally
  • Function Documentation: for all exported functions in R help
  • Examples for all exported functions in R Help that run successfully locally
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING, and URL, Maintainer and BugReports fields in DESCRIPTION

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software have been confirmed.
  • Performance: Any performance claims of the software have been confirmed.
  • Automated tests: Unit tests cover essential functions of the package
    and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 4


Review Comments

@masalmon @sckott in my opinion this package is well written and it does what it says. I only have very few comments which I filed as issues, also reporting specific examples.

The main comment is that all the functions contain basic documentation but it would be great to have some more information. Once there is a more in-depth documentation, you could use pkgdown to automatically create a website for your package (or alternatively create a github-page). @masalmon suggested that for one of my packages not long ago and I think it's a great idea because your users might find that more visually appealing than the standard R help pages.

Also, please consider mentioning the NetCDF library as an external dependency, as ncdf4 is required by the raster package to handle the .nc files in your examples.

maelle commented Mar 12, 2017

Thanks a lot for your review @cvitolo !

sckott commented Mar 12, 2017

thanks for your review @cvitolo ! 😸 😃

maelle commented Mar 13, 2017

Also @cvitolo, the separate issues are fine this time, but note for future reviews that rOpenSci reviews are usually kept in one thread, unlike at JOSS (rOpenSci package authors sometimes open issues themselves when dealing with the review, as you can see in #94). 😊

cvitolo commented Mar 13, 2017

@masalmon @sckott my apologies!
When I had my packages reviewed, I found it very convenient to work on issues because they can develop in separate threads. This is extremely useful when an issue is complex and/or has unexpected consequences for the rest of the code.
Anyway, I'll follow best practice next time :)

bpbond commented Mar 13, 2017

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (such as being a major contributor to the software).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README
  • Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s) demonstrating major functionality that runs successfully locally
  • Function Documentation: for all exported functions in R help
  • Examples for all exported functions in R Help that run successfully locally
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING, and URL, Maintainer and BugReports fields in DESCRIPTION

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software have been confirmed.
  • Performance: Any performance claims of the software have been confirmed.
  • Automated tests: Unit tests cover essential functions of the package
    and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 2


Review Comments

This package provides an interface to download (a subset of) Coupled Model Intercomparison Project (CMIP, see Taylor et al. 2013) files from ftp://gdo-dcp.ucllnl.org/pub/dcp/archive/. This could be quite useful for R users interested in a scriptable way to download such files.

Overall, this package is well designed and well structured. The R code is clean and clearly written and formatted, although a bit light on comments. Test coverage is good (although see ropenscilabs/cmipr#15). In general the package complies with the rOpenSci and Mozilla guides. Documentation is thin (especially the function descriptions, which are extremely short and should be fleshed out), but what is there is clear and well written. R CMD check passes with no errors, warnings, or notes.

A few suggestions:

  • It would be useful if cmip_list_files listed file sizes along with the date and filename.
  • The cmip_fetch overwrite parameter doesn't seem to work?
  • An optional progress bar during fetching might be useful.
  • There's no CONTRIBUTING.md included.

The biggest problem is that the package needs to make clear it's using ftp://gdo-dcp.ucllnl.org/pub/dcp/archive/, which as far as I can see provides only a tiny subset of CMIP data--only temperature and precipitation? I'm concerned that people will be confused that they can't download, e.g., ocean pH data. This also means that the package is vulnerable if LLNL's ftp service is disrupted or changes, as there's no failover or backup source of data. Note that there are much more complete CMIP archives available online (e.g. the atmos.ethz.ch rsync server, although this is a private server and you'd need to contact them for access).

My test environment:

R version 3.3.3 (2017-03-06)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X El Capitan 10.11.6

maelle commented Mar 13, 2017

Thanks a lot for your review @bpbond 😃

sckott commented Mar 14, 2017

thanks for your review @bpbond !

So I can get a complete view of CMIP data:

  • I'm getting data in this pkg from the ftp server
  • There's more complete data available via rsync from atmos.ethz.ch
  • Are there any other places to get CMIP data?

Do you think the folks behind atmos.ethz.ch would allow public access, or is it a "fill out a form, wait for an email" kind of thing?

bpbond commented Mar 14, 2017

Other places to get data - well, the canonical 'place' is the Earth System Grid Federation but that's been a source of huge frustration for lots of people (which is probably why LLNL established its ftp server). I'm sure there are other private repositories too (we have one, but it's sealed behind a firewall that I can't control).

The atmos.ethz.ch server is amazing, a huge data store. I have no idea whether they'd allow public access, but it doesn't hurt to ask. The contact I have for that server (hopefully not outdated) is Urs Beyerle.

sckott commented Mar 14, 2017

thanks! that info is really helpful. I'll follow up on the rsync source

bpbond commented Mar 14, 2017

Just occurred to me - one other alternative would be to ask LLNL if they'd be willing to expand their CMIP data holdings.

sckott commented Mar 14, 2017

Ah, good point, I'll ask them that

sckott commented Mar 20, 2017

I was told that public access isn't possible, but they pointed me towards https://github.com/Prodiguer/synda. Synda itself isn't usable here, but it sounds like there might be some open web services that synda uses that we could use, e.g., at http://esgf-index1.ceda.ac.uk/esg-search/search

sckott commented Mar 22, 2017

@bpbond heard back from ATMOS folks

Have you ever used the Synda command line tool? https://github.com/Prodiguer/synda They suggested having users run that on the command line and then just read the data into R, as opposed to doing the downloads ourselves here. Thoughts?

He said:

Have you ever looked into original CMIP5 data? It can be quite a mess. The archive you mentioned (ftp://gdo-dcp.ucllnl.org/pub/dcp/archive/cmip5/) looks already like a processed data archive (http://gdo-dcp.ucllnl.org). This data is in much better shape than the original CMIP5 data. And therefore much easier to process.

That may be true, but you said it's only partial data, yes?

bpbond commented Mar 23, 2017

Hey Scott. No, I don't know Synda–huh, interesting, thanks.

I agree that the original CMIP5 data can be a mess...but yeah, LLNL seems (as far as I could see) to be extremely partial: as in, out of the hundreds of CMIP5 output variables, they only seem to have two or three.

cmipr would still bring value if it just handles downloads from there, I guess, but it's pretty limited, and so as to not waste folks' time you'll want to make it extremely clear what's available and what's not. Bummer.

sckott commented Mar 23, 2017

@bpbond I think we really just need to re-create what synda does in R, because a "do X in the CLI, then Y in R" workflow isn't ideal. I'll play around; I'll probably need to put this review on hold for a while 😸

maelle commented Jun 19, 2017

@sckott any update on this package? 😉

sckott commented Jun 19, 2017

Now that charlatan is through review, I'll return to this one, though it may still be a few weeks or so. The reviewer pointed out that we should really be leveraging much more of the available data, so I'll be working on that soon. There's not a clear, easy path with the data available though :(

maelle commented Jun 19, 2017

Great, I hope it will be a fruitful and interesting process! 🍀

maelle commented Jul 25, 2017

@sckott any progress?

In any case you can add the peer-review badge to cmipr README via

[![](http://badges.ropensci.org/99_status.svg)](https://github.com/ropensci/onboarding/issues/99) 🐱

sckott commented Jul 25, 2017

@maelle will do the badge

Blarg, no, sorry, no progress. Will try to get to it very soon.

maelle commented Sep 30, 2017

@sckott any news? 😉

sckott commented Oct 2, 2017

Sorry @maelle about the delay! I have it on my to-do list to look into the new data source options, but have not done so yet. Will try to get it done asap.

maelle commented Oct 2, 2017

Thanks for the update! 😸

maelle commented Jun 30, 2018

👋 @sckott re-visiting the issue, any news? 😺

sckott commented Jul 3, 2018

Ooofffff, not yet. It's on my to-do list but keeps getting pushed off the end.

sckott commented Jul 13, 2018

@bpbond Finally trying to get back to this. I'm looking at the API and the Python client http://esgf-pyclient.readthedocs.io/en/latest/ and trying the same in R (using the base URL https://esgf-index1.ceda.ac.uk so far). It seems I can use the search API freely, but many of the data files I've tried so far are behind login walls. Do you know what it takes to get access to ESGF data?

Is it a valuable contribution to provide an R interface to the search API at least?
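For illustration only, here is a minimal sketch of what a query against the ESGF search endpoint might look like. It is written in Python rather than R and is not the cmipr interface. It assumes that the ESGF search REST API accepts the facet parameters `project`, `variable`, `type`, `limit`, `offset` and `format=application/solr+json`, and that the JSON reply has a Solr-style `response.numFound` / `response.docs` layout. The base URL is the CEDA index node named in this comment; index nodes may since have moved. The helper names `esgf_search_url` and `esgf_search` are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the CEDA index node named in this thread. ESGF index nodes
# have moved over time, so pass a different base_url if this one is gone.
ESGF_SEARCH_URL = "https://esgf-index1.ceda.ac.uk/esg-search/search"


def esgf_search_url(base_url=ESGF_SEARCH_URL, limit=10, offset=0, **facets):
    """Build a search URL for an ESGF index node (hypothetical helper).

    Extra keyword arguments are sent as search facets, e.g. project="CMIP5"
    or variable="tas"; passing type="File" overrides the default type.
    """
    params = {
        "type": "Dataset",                  # search datasets unless overridden
        "format": "application/solr+json",  # request JSON instead of XML
        "limit": limit,                     # page size
        "offset": offset,                   # index of the first result
    }
    params.update(facets)
    return base_url + "?" + urlencode(params)


def esgf_search(**kwargs):
    """Run a search and return (total number of hits, list of result docs).

    Assumes a Solr-style JSON reply: {"response": {"numFound": ..., "docs": [...]}}.
    Only metadata is returned; downloading the files usually needs an ESGF login.
    """
    with urlopen(esgf_search_url(**kwargs), timeout=60) as resp:
        data = json.load(resp)
    return data["response"]["numFound"], data["response"]["docs"]
```

For example, `esgf_search(project="CMIP5", variable="tas", limit=5)` would ask for the first five matching CMIP5 near-surface air temperature datasets. Only the search (metadata) step is covered here; as discussed in this thread, fetching the data files themselves generally requires an ESGF login.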

bpbond commented Jul 13, 2018

Hey @sckott . That's true as far as I know--downloading data requires an ESGF login account; note that ESGF is a source of huge frustration to many researchers.

I think even exposing the search API through R would be valuable, yes! Anything to make the system easier to use...

sckott commented Jul 13, 2018

Thanks @bpbond. Okay, it sounds like the best approach moving forward is to give access to any freely available CMIP data I can, and to provide a search API interface. Then if folks with access to the actual data files want to contribute later on, that's great.

sckott commented Nov 9, 2018

@maelle I think I'm going to stop working on this package: I have too many packages and this one is tough to sort out, so we can close this and I'll mark the repo as abandoned.

sckott commented Nov 9, 2018

maelle commented Dec 10, 2018

Oops I had missed this!

maelle closed this Dec 10, 2018
