New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Application to add rBBS package #98

Closed
oharar opened this Issue Feb 16, 2017 · 18 comments

Comments

Projects
None yet
4 participants
@oharar
Copy link

oharar commented Feb 16, 2017

Summary

  • What does this package do? (explain in 50 words or less):

Retrieves North American Breeding Bird survey data from the USGS

  • Paste the full DESCRIPTION file inside a code block below:
Package: rBBS
Type: Package
Title: Imports North American Breeding Bird Survey data
Version: 0.1.1
Date: 2015-01-09
Authors@R: person("Bob", "O'Hara", role = c("aut", "cre"),
                     email = "bob.ohara@ntnu.no")
Maintainer: Bob O'Hara <bob.ohara@ntnu.no>
Description: Imports North American Breeding Bird Survey data from the USGS' ftp
    server, or a local copy.
Depends: R (>= 2.10), RCurl, plyr
Suggests: testthat
License: GPL-3
LazyData: TRUE
RoxygenNote: 6.0.1
  • URL for the package (the development repository, not a stylized html page):
    https://github.com/oharar/rBBS

  • Who is the target audience?
    Ecologists and ornithologists wanting to use the BBS data

  • Are there other R packages that accomplish the same thing? If so, what is different about yours?

I think the spocc package can access the data (through BISON), but rBBS should have more flexibility in what it can retrieve (I hope!).

Requirements

Confirm each of the following by checking the box. This package:

  • does not violate the Terms of Service of any service it interacts with.
  • has a CRAN and OSI accepted license.
  • contains a README with instructions for installing the development version.
  • includes documentation with examples for all functions.
  • contains a vignette with examples of its essential functions and uses.
  • has a test suite.
  • has continuous integration with Travis CI and/or another service.

Publication options

  • Do you intend for this package to go on CRAN?
  • Do you wish to automatically submit to the Journal of Open Source Software? If so:
    • The package contains a paper.md with a high-level description in the package root or in inst/.
    • The package is deposited in a long-term repository with the DOI:
    • (Do not submit your package separately to JOSS)

Detail

  • Does R CMD check (or devtools::check()) succeed? Paste and describe any errors or warnings:

  • Does the package conform to rOpenSci packaging guidelines? Please describe any exceptions:
    Uses RCurl, not httr (which I have not worked with)

  • If this is a resubmission following rejection, please explain the change in circumstances:

  • If possible, please provide recommendations of reviewers - those with experience with similar packages and/or likely users of your package - and their GitHub user names:
    Scott Chamberlain (sckott), creator of spocc

@sckott sckott added the package label Feb 16, 2017

@sckott sckott self-assigned this Feb 17, 2017

@sckott

This comment has been minimized.

Copy link
Member

sckott commented Feb 17, 2017

hi @oharar thanks for your submission!

We discussed internally a bit and under our rules about overlap, we'd like to ask you to explain a bit how your pkg does different things than rbison does - or better - or more flexibly as you said above.

library(rbison)
bison_solr(resourceID = '440,100043')

gets to Breeding Bird Survey data as far as I can tell. And thus users could do that from spocc as well.

Even if the above duplicates what your pkg does, there is still good reason sometimes to have a route to FTP data since that often supports the common use case where the user wants a lot of data - web APIs usually aren't a good fit for the i want all the data use cases.

@sckott

This comment has been minimized.

Copy link
Member

sckott commented Feb 27, 2017

@oharar Did you get the above message?

@oharar

This comment has been minimized.

Copy link
Author

oharar commented Feb 28, 2017

Sorry. I did, but haven't had time to poke around spocc. Hopefully this week...

@oharar

This comment has been minimized.

Copy link
Author

oharar commented Mar 2, 2017

Finally got passed a few other things (innumerate evolutionary biologists whimper), and had a look.

rbison can extract the BBS data, but as far as I can see it doesn't have access to all of the meta-data (e.g. the weather data). Although it provides the raw counts, this would need more work to extract them - they're text in one field of the output (at the moment everything is hidden from me by a bad gateway - I can only think my requests are being sent through Cirith Ungol).

So, overall my package should provide access to more BBS (meta-)data, and also also provide abundance data (in particular) in a more usable form. Oh, and people using the data a lot can download the full dataset, and then query it from the local copy, so their requests can avoid even having to think about going via Minas Morgul.

@sckott

This comment has been minimized.

Copy link
Member

sckott commented Mar 3, 2017

Editor checks:

  • Fit: The package meets criteria for fit and overlap
  • Automated tests: Package has a testing suite and is tested via Travis-CI or another CI service.
  • License: The package has a CRAN or OSI accepted license
  • Repository: The repository link resolves correctly
  • Archive (JOSS only, may be post-review): The repository DOI resolves correctly
  • Version (JOSS only, may be post-review): Does the release version given match the GitHub release (v1.0.0)?

Editor comments

Currently seeking reviewers. It's a good fit and not overlapping.

  • You didn't check the CRAN box above. Do you not intend to send to CRAN?
  • I see that package has CI integration with Travis - please do add a badge to your readme for that
  • tests take a long time to run - seems like you could easily test the functionality of the package without downloading so many files
  • Some tests fail
  • LICENSE.md file and data-raw directories should have entries in .Rbuildignore
  • Ran goodpractice::gp(), some things for reviewers and submitter to think about going forward - you don't necessarily have to do everything to the "T" below - just some suggestions
It is good practice towrite unit tests for all functions, and all package code in general. 91% of code lines are covered by test cases.

    R/GetRouteData.R:22:NA
    R/GetRouteData.R:23:NA
    R/GetRouteData.R:24:NA
    R/GetRouteData.R:25:NA
    R/GetRouteData.R:36:NA
    ... and 7 more linesnot use "Depends" in DESCRIPTION, as it can cause name clashes, and poor interaction with other packages. Use "Imports" instead.
  ✖ omit "Date" in DESCRIPTION. It is not required and it gets invalid quite often. A build date will be added to the package when
    you perform `R CMD build` on it.
  ✖ add a "URL" field to DESCRIPTION. It helps users find information about your package online. If your package does not have a
    homepage, add an URL to GitHub, or the CRAN package package page.
  ✖ add a "BugReports" field to DESCRIPTION, and point it to a bug tracker. Many online code hosting services provide bug trackers
    for free, https://github.com, https://gitlab.com, etc.
  ✖ use '<-' for assignment instead of '='. '<-' is the standard, and R users and developers are used it and it is easier to read
    your code for them if you use '<-'.

    R/GetRouteData.R:5:13
    R/GetRouteData.R:20:8
    R/GetRouteData.R:29:10
    R/GetRouteData.R:61:31
    R/GetRouteData.R:67:29
    ... and 9 more linesavoid long code lines, it is bad for readability. Also, many people prefer editor windows that are about 80 characters wide. Try
    make your lines shorter than 80 characters

    R/GetRegions.R:3:1
    R/GetRegions.R:5:1
    R/GetRegions.R:6:1
    R/GetRegions.R:9:1
    R/GetRegions.R:11:1
    ... and 36 more linesavoid sapply(), it is not type safe. It might return a vector, or a list, depending on the input data. Consider using vapply()
    instead.

    R/GetRegions.R:19:30
    R/GetRegions.R:34:35
    R/GetRegions.R:38:27
    R/GetRegions.R:44:20
    R/GetRouteData.R:57:15
    ... and 1 more linesnot import packages as a whole, as this can cause name clashes between the imported packages. Instead, import only the specific
    functions you need.
  ✖ avoid 'T' and 'F', as they are just variables which are set to the logicals 'TRUE' and 'FALSE' by default, but are not reserved
    words and hence can be overwritten by the user.  Hence, one should always use 'TRUE' and 'FALSE' for the logicals.

    R/GetRegions.R:NA:NA
    R/GetRegions.R:NA:NA
    R/GetRegions.R:NA:NA
    R/GetRegions.R:NA:NA
    R/GetRegions.R:NA:NA
    ... and 5 more lines

Reviewers: @soodoku @FvD
Due date: 2017-03-31

@sckott

This comment has been minimized.

Copy link
Member

sckott commented Mar 3, 2017

@oharar reviewers assigned.

@soodoku @FvD let me know if you have any questions about the review process.

oharar pushed a commit to oharar/rBBS that referenced this issue Mar 7, 2017

oharar
Update descrption, Readme, GetRegions.R & GetRouteData.R for ROpenSci…
…, after editor comments here: ropensci/software-review#98 (comment) (note that this is not a full response!)
@FvD

This comment has been minimized.

Copy link

FvD commented Mar 20, 2017

rBBS Package Review

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (such as being a major contributor to the software).

Documentation

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README
  • Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s) demonstrating major functionality that runs successfully locally
  • Function Documentation: for all exported functions in R help
  • Function Documentation: for all exported functions in R help
  • Examples for all exported functions in R Help that run successfully locally
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING, and URL, Maintainer and BugReports fields in DESCRIPTION

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software been confirmed.
  • Performance: Any performance claims of the software been confirmed.
  • Automated tests: Unit tests cover essential functions of the package
    and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 3 hrs.

Review Comments

Documentation

The package includes all the following forms of documentation:

  • A statement of need

The README is clear about the need for the package but perhaps could be improved on the following points:

  1. Help people not familiar with USA wildlife surveys and non-ornithologists understand what this is about. For example mention that BBS stands for North American Breeding Bird Survey and explain that the data can be accessed from the USGS at URI: www.pwrc.usgs.gov/bbS/.
  2. Include the gist of the discussion about overlap with rBison and spocc in de README (for clarity, and to highlight how rBBS is more practical in its use).
  3. Spell check the README.
  • Vignette(s)

There are no vignettes available. I do hope the author will consider writing one, because rBBS is exactly the kind of package that helps people to put aside their bias against programming, and gets to them to tinker in R simply because they like the data set. It also makes for fun and digestible examples in classroom / training settings.

  • Examples

All exported functions have examples, and all examples run successfully when I repeat them. However, the documentation contains a note to self about function get10RouteData that is not exported yet:

## Would love to run this but function disappears
## CollaredDove <- Get10RouteData(ColDoveAOU, weather = NULL, routes = NULL, 
##                      year=c(1995,2000,2005, 2010))

Either decide to export get10RouteData or remove this comment.

For some of the other examples I offer some minor, cosmetic remarks:

GetRegions()

  1. Remove the remark "Here you go ...."

GetSpNames()

  1. Remove the remark "Here you go ...."

Getweather()

  1. The heading "Get woodpeckers and friends" seams out of place with this function.
  2. In all other examples the results are assigned to a variable. Making it read Weather <- GetWeather() seems to be more consistent.

The documentation for rBBS and rBBS-package is identical.

Functionality

  • Performance:

Yes (the claim being that working with local files speeds up working with the data).

  • Automated tests:

The tests are there, but I wonder if it adds to the strength of the test to download so much data. It takes a long time to run the tests locally because the downloads vary in speed to the point where they can stall, perhaps because I'm not in the USA. It appears that your Travis-CI run #7 also had problems with the downloads in the test and failed for that reason.

Maybe you could use your current testing strategy, but assign random download targets. This would speed up the testing process in the short term, but in the long term still target a wider variety of download targets so that you will see (or be informed about) any issues with particular data sets in on the BBS ftp server.

  • Packaging guidelines:

The package does not follow the rOpenSci packaging guidelines to the letter, but I do not know how strict rOpenSci is about this. The issues that cause conflict are:

  1. The package name has uppercase characters
  2. Variable names follow CamelCase instead of snake_case style
  3. Function names follow CamelCase instead of snake_case style
  4. The package imports and uses RCurl instead of the recommended httr package.
  5. The package does not use Roxygen for documentation.

Furthermore, there are some files that warrant removal from the root of the repository:

  • Read-and-delete-me
  • README-unnamed-chunk-5-1.png
  • README-unnamed-chunk-6-1.png

Other comments

  1. There are chunks of code commented out that should be removed. For instance the commented out version of GetSpNames() in GetSpNames.R.
  2. There is a file in your R directory called sysdata.rda that contains the RegionsForZipFiles object, but you have also set up a data-raw folder to create this same object. I've not examined this in detail, but if you need the rda file, consider putting it in an inst/ folder.
@sckott

This comment has been minimized.

Copy link
Member

sckott commented Mar 20, 2017

Thanks for your review @FvD !

@sckott

This comment has been minimized.

Copy link
Member

sckott commented Apr 3, 2017

@soodoku deadline was 31 March, plz get your review in soon, thanks!

@soodoku

This comment has been minimized.

Copy link

soodoku commented Apr 3, 2017

@sckott

This comment has been minimized.

Copy link
Member

sckott commented Apr 3, 2017

thanks

@oharar

This comment has been minimized.

Copy link
Author

oharar commented Apr 4, 2017

Don't rush on my account! I won't be able to get to do much for a few days (and I still have to finish making changes from Scott's comments).

@soodoku

This comment has been minimized.

Copy link

soodoku commented Apr 4, 2017

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (such as being a major contributor to the software).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README
  • [X ] Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s) demonstrating major functionality that runs successfully locally
  • Function Documentation: for all exported functions in R help
  • Examples for all exported functions in R Help that run successfully locally
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING, and URL, Maintainer and BugReports fields in DESCRIPTION
Paper (for packages co-submitting to JOSS)

The package contains a paper.md with:

  • A short summary describing the high-level functionality of the software
  • Authors: A list of authors with their affiliations
  • A statement of need clearly stating problems the software is designed to solve and its target audience.
  • References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

  • [ X] Installation: Installation succeeds as documented.
  • [ X] Functionality: Any functional claims of the software been confirmed.
  • [ X] Performance: Any performance claims of the software been confirmed.
  • [ X] Automated tests: Unit tests cover essential functions of the package
    and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing:

3

Review Comments

Dear Bob,

Thank you for making this great data more easily accessible! And many apologies again for the delay in submitting the review.

The aim of the review is to flag problems, and to suggest improvements.

Coding Style

  • More readable code with better spacing:

In a variety of places, spacing is missing. For instance,

```
if(!is.null(countrynum) & any(!(countrynum%in%c(124, 484, 840))))
```

in https://github.com/oharar/rBBS/blob/master/R/GetRouteData.R

or

Generally, a good idea to insert spacing before and after %in% etc.

download.file(ZipName,temp, quiet=FALSE)

Generally, a good idea to insert spacing after the comma.

  • Standard case for variable names:

    I know that in lots of places you defer to the variable names style in the output data and it is completely understandable. But I think it would make users' lives easier if we standardized case (you can just use `tolower()`` to convert names to lower case, and ideally, for compound words, move to snake case.)

    • tolower will change stuff like AOU and Dir in say

    https://github.com/oharar/rBBS/blob/master/R/GetRouteData.R

    • In other places, a variable name like countrynum can be rewritten as country_num. But since that is a bit harder, it is your call.
  • Where function arguments have limited categorical options, it is good to specify all the options in the function signature and then use match.args(). So for instance, countrynum in GetRouteData could be changed thus:
    For instance, countrynum in https://github.com/oharar/rBBS/blob/master/R/GetRouteData.R

  • Functions like GetRegions can be decomposed into smaller internal functions, which you can call to accomplish simple tasks like read_50stop_data to read data from ftp://ftpext.usgs.gov/pub/er/md/laurel/BBS/DataFiles/50-StopData/1997ToPresent_SurveyWide/ which you have in https://github.com/oharar/rBBS/blob/master/R/GetRegions.R To be clear when I say internal functions, I do mean functions within functions such as GetDat in https://github.com/oharar/rBBS/blob/master/R/GetRouteData.R but to give GetDat a separate file and if you choose, not export it.

  • In GetRegions, it may be a good idea to change the col. name of State/Prov/TerrName to a col. name that follows R standard. People are liable to have issues.

Documentation

  • Better documentation for arguments:

    Vector of state codes. If they are unique to one country, countrynum can be NULL could be improved by guiding people to a list of acceptable state codes via a URL.

  • Description of GetUnzip is basically not there:

  • May be better to refer to 'Data Frame' as '\code{data.fame}' in documentation.

  • Deterministic output filename. It may be preferable to set a default and also allow some user discretion?
    FileName="weather.csv" in https://github.com/oharar/rBBS/blob/master/R/GetWeather.R

  • The Dir="ftp://ftpext.usgs.gov/pub/er/md/laurel/BBS/DataFiles/" is in many places and it gave me some pause. If there are mirrors for this, it would be a good idea to list the mirrors. If there aren't mirrors, a global variable perhaps? With option to overwrite it?

  • Fix minor spelling errors like teh in GetRegions or teh speciew in readme.md etc.

Code

  • curl_download is a bit (a lot?) more reliable than download.file. Issue in

  • Lots of things need to go right for the package to produce stuff. So it needs robust and informative error handling. One simple addition would be adding 'file.exists()' for cases where you read from a file. There are various other checks that could be instituted elsewhere.

@sckott

This comment has been minimized.

Copy link
Member

sckott commented Apr 4, 2017

Awesome, thanks for your review @soodoku 😺

Continue discussion here @oharar

@sckott

This comment has been minimized.

Copy link
Member

sckott commented Jun 7, 2017

@oharar any progress on responses to reviewers?

@sckott

This comment has been minimized.

Copy link
Member

sckott commented Aug 11, 2017

@oharar Any thoughts?

@sckott sckott added the holding label Aug 11, 2017

@sckott sckott added pulled and removed holding labels Oct 12, 2017

@sckott

This comment has been minimized.

Copy link
Member

sckott commented Oct 12, 2017

Given discussion over email - closing this now, you are welcome to resubmit. If software hasn't changed drastically, we can reuse reviews

@sckott sckott closed this Oct 12, 2017

@oharar

This comment has been minimized.

Copy link
Author

oharar commented Oct 13, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment