/ software-review Public

# submission: outcomerate#213

Closed
opened this issue Apr 28, 2018 · 26 comments
Closed

# submission: outcomerate #213

opened this issue Apr 28, 2018 · 26 comments
Assignees
Labels

### Summary

• What does this package do? (explain in 50 words or less):
outcomerate is a lightweight R package that implements the standard outcome rates for surveys, as defined in the Standard Definitions of the American Association of Public Opinion Research (AAPOR).

• Paste the full DESCRIPTION file inside a code block below:

Package: outcomerate
Version: 0.0.1
Title: AAPOR Survey Outcome Rates
Description: Standardized survey outcome rate function, including the response rate, contact rate, cooperation rate, and refusal rate.
Authors@R: person("Rafael", "Pilliard Hellwig", email = "rafael.taph@gmail.com", role = c("aut", "cre"))
Encoding: UTF-8
LazyData: true
ByteCompile: true
RoxygenNote: 6.0.1
Depends:
R (>= 2.10)
Suggests:
dplyr,
forcats,
knitr,
survey,
testthat,
covr
Imports:
Rdpack (>= 0.7)
RdMacros:
Rdpack
URL: https://github.com/rtaph/outcomerate
BugReports: https://github.com/rtaph/outcomerate/issues

• URL for the package (the development repository, not a stylized html page): https://github.com/rtaph/outcomerate

• Please indicate which category or categories from our package fit policies this package falls under *and why(? (e.g., data retrieval, reproducibility. If you are unsure, we suggest you make a pre-submission inquiry.):

• data munging: because it allows researchers to compute rates of interest given specific input data
• reproducibility: because it implements standardized definitions so that the research community can better compare the quality of survey data
•   Who is the target audience and what are scientific applications of this package?
Survey analysts are the primary target. The package has applications for any survey that uses standard disposition codes and requires the calculation of AAPOR outcome rates.

• Are there other R packages that accomplish the same thing? If so, how does
yours differ or meet our criteria for best-in-category?
Not to my knowledge.

•   If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.

### Requirements

Confirm each of the following by checking the box. This package:

• has a CRAN and OSI accepted license.
• contains a README with instructions for installing the development version.
• includes documentation with examples for all functions.
• contains a vignette with examples of its essential functions and uses.
• has a test suite.
• has continuous integration, including reporting of test coverage, using services such as Travis CI, Coveralls and/or CodeCov.
• I agree to abide by ROpenSci's Code of Conduct during the review process and in maintaining my package should it be accepted.

#### Publication options

• Do you intend for this package to go on CRAN?
• Do you wish to automatically submit to the Journal of Open Source Software? If so:
• The package has an obvious research application according to JOSS's definition.
• The package contains a paper.md matching JOSS's requirements with a high-level description in the package root or in inst/.
• The package is deposited in a long-term repository with the DOI:
• (Do not submit your package separately to JOSS)
• Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:
• The package is novel and will be of interest to the broad readership of the journal.
• The manuscript describing the package is no longer than 3000 words.
• You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see MEE's Policy on Publishing Code)
• (Scope: Do consider MEE's Aims and Scope for your manuscript. We make no gaurantee that your manuscript willl be within MEE scope.)
• (Although not required, we strongly recommend having a full manuscript prepared when you submit here.)
• (Please do not submit your package separately to Methods in Ecology and Evolution)

### Detail

• Does R CMD check (or devtools::check()) succeed? Paste and describe any errors or warnings:

• Does the package conform to rOpenSci packaging guidelines? Please describe any exceptions:

• If this is a resubmission following rejection, please explain the change in circumstances:
Not applicable.

• If possible, please provide recommendations of reviewers - those with experience with similar packages and/or likely users of your package - and their GitHub user names:

changed the title outcomerate submission submission: outcomerate Apr 28, 2018

### Editor checks:

• Fit: The package meets criteria for fit and overlap
• Automated tests: Package has a testing suite and is tested via Travis-CI or another CI service.
• Repository: The repository link resolves correctly
• Archive (JOSS only, may be post-review): The repository DOI resolves correctly
• Version (JOSS only, may be post-review): Does the release version given match the GitHub release (v1.0.0)?

👋 @rtaph. Thanks for this submission. I've run goodpractice::gp() on your package and everything comes out clean. The package meets all of our recommended guidelines really well so thanks! 🚀
I'll start working on finding two reviewers who are able to put the package through a bunch more testing, user interface review etc so you can get additional feedback.

Reviewers: @carlganz, @nealrichardson
Due date: May 22, 2018

### karthik commented Apr 30, 2018

 🙏@carlganz for agreeing to review. Note tentative due date above.

### karthik commented May 1, 2018

 Thanks @nealrichardson for agreeing to review. I've also tagged you in the issue above. Note review date and let me know if you need more time.

### karthik commented May 3, 2018 • edited

 @rtaph You can add a badge to your readme right away. Badge location is: https://badges.ropensci.org/213_status.svg  It will update as your review progresses and change to green once accepted. 🙏

## Package Review

• As the reviewer I confirm that there are no conflicts of interest for me to review this work

#### Documentation

The package includes all the following forms of documentation:

• A statement of need clearly stating problems the software is designed to solve and its target audience in README
• Installation instructions: for the development version of package and any non-standard dependencies in README
• Vignette(s) demonstrating major functionality that runs successfully locally
• Function Documentation: for all exported functions in R help
• Examples for all exported functions in R Help that run successfully locally
• Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

#### Functionality

• Installation: Installation succeeds as documented.
• Functionality: Any functional claims of the software been confirmed.
• Performance: Any performance claims of the software been confirmed.
• Automated tests: Unit tests cover essential functions of the package
and a reasonable range of inputs and conditions. All tests pass on the local machine.
• Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

#### Final approval (post-review)

• The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 3

This package is very focused on a specific problem, and it solves it well. It wraps and automates what would otherwise be tedious arithmetic and formulas you'd have to keep looking up.

Here are a few notes on ways I think you can improve the code and documentation of the package.

#### Code functionality

• It seems that I can't get "all the rates" unless I provide a value for e. I think I should be able to easily get all of the rates that don't depend on e just by calling outcomerate(my_data). Instead, I get the error "The parameter e must be provided for RR3, RR4, REF2, CON2". It would be nice if instead you excluded those 4 metrics and calculated the rest.
• Why would I ever want return_nd = TRUE? I don't see any documentation or examples that do that. In my experience it's wise not to add extra arguments and features with no application or demand, even if it's trivial to do---they clutter your interface and create a liability for you to maintain them. So I'd advocate removing it. But, if you determine that it has value, consider returning numden before even bothering to calculate rates (outcomerate.R#L151).
• In asserters.R, assert_weight: what if the weight vector contains missing values? (It errors, but not helpfully.)

#### Code style

• NULL is generally a better default value for function arguments than NA. You're less likely to accidentally provide NULL, NA is actually type logical and that can be confusing, and is.null() is a much more natural check than is.na(), which will break your if() statements if the input is length > 1.
• outcomerate.factor <- outcomerate.character works--no need to define a different method and coerce the data as.character().
• Likewise, outcomerate.table <- outcomerate.numeric passes the tests too.
• Doing that, then outcomerate.R#L108-L114 simplifies a bit too because you don't have to munge the table objects you get: freq <- table(x) (perhaps multiplied by weight)
• Speaking of multiplying by weight, what's the point of a scalar (length 1) weight when you're calculating proportions? Doesn't it difference out? (Empirically, it does, just experimenting with altering a scalar weight in the tests.)
• outcomerate.R#L116 might be more readable as freq <- xtabs(weight ~ x). This would require adding stats to Imports, though stats is a base package so it's "free" from a dependency perspective.

#### Docs

• I recommend the spelling package to automatically check your spelling. (It will catch for you a few misspellings in the readme and the vignette.) You can have it run every time you run CHECK. To do so, add this to your DESCRIPTION, and then a tests/spelling.R file with this. CHECK will then report all of your misspellings. Once you've fixed them, you can do spelling::update_wordlist() to whitelist the other words that you use (like "middleearth") that it will flag as misspelled but that are ok. Note that that test file I linked has a special extra check I like: by default, spelling errors won't fail CHECK, but I have it check to see if it's running on Travis-CI and fail the build there if there are misspellings. I find that to be a good way to get me to notice spelling errors without risking a CRAN check failure.
• roxygen: I recommend using the markdown support added in roxygen2 v6.0 (see note in "DESCRIPTION" below for how to add). That way, your \item{}s can just become *  lists. Will make the inline docs more readable.
• Also would help to offer some more words to explain what the different rates are beyond just "RR1" etc. https://www.aapor.org/AAPOR_Main/media/publications/Standard-Definitions20169theditionfinal.pdf starting at p. 60 is very readable and helped me. (That might be a better link to use in your docs, too.) Some summary e.g. that the "RR" are "Response rates", which are (explain that in your words), and that the differences among RR1-6 are in whether partial interviews are included and how the unknown types are factored in. Something like that.
• What is fmat and where does it come from? That seems to be where all of the real business is. It should probably be documented and more visible.

#### Vignette

• I'm a bit more cautious about adding package dependencies than some, and it's only in Suggests, but tidyverse is a lot, and you probably don't need all of that for your vignette.
• Why do you assume e = 0.8 in this example? Seems that if the goal is to prevent "claims regarding survey quality to become opaque", this shouldn't be opaque either. Some discussion around this, around how one might estimate or reason about e and not just make up a number that suits their goals, would be helpful.
• The "small wrapper function" or() has an ambiguous sounding name. Perhaps something like get_rates() instead?

#### Tests

• context() can be more useful as a descriptive string rather than the file name, which any test failure will report anyway
• (test-params.R) It's good to assert the error message in expect_error() since helpful error messages are an important part of a user interface and can serve as a source of documentation, too. (cf. http://style.tidyverse.org/error-messages.html). As I mentioned above, I found at least one condition that errored but not in a way that would have helped me to understand what I had done wrong.

#### DESCRIPTION

• "Description" needs to be more than one complete sentence. This is a good place for CRAN human checks to hold up accepting your package submission, so you should try to give them the prose they want. Here's a few of the gotchas I remind myself of each time I submit a new package: https://github.com/nealrichardson/skeletor/blob/master/inst/pkg/DESCRIPTION#L4-L9
• add Roxygen: list(markdown = TRUE), as suggested above, so you can use the markdown syntax in your roxygen docs.
• "Suggests" includes survey but I don't see it used anywhere, though perhaps I'm missing it.

### rtaph commented May 24, 2018

 Thank you @karthik, @nealrichardson, and @carlganz for your support. I will review these comments over the weekend and get back to you shortly.

### carlganz commented May 24, 2018

 Sorry that I have not written the review yet. I will get done by tomorrow I promise.

## Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

• As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

#### Documentation

The package includes all the following forms of documentation:

• A statement of need clearly stating problems the software is designed to solve and its target audience in README
• Installation instructions: for the development version of package and any non-standard dependencies in README
• Vignette(s) demonstrating major functionality that runs successfully locally
• Function Documentation: for all exported functions in R help
• Examples for all exported functions in R Help that run successfully locally
• Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

#### Functionality

• Installation: Installation succeeds as documented.
• Functionality: Any functional claims of the software been confirmed.
• Performance: Any performance claims of the software been confirmed.
• Automated tests: Unit tests cover essential functions of the package
and a reasonable range of inputs and conditions. All tests pass on the local machine.
• Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

#### Final approval (post-review)

• The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 2

The goal of the outcomerate package is to simplify the calculation of AAPOR's different definitions for response rates of surveys. The package is lightweight, and accomplishes its primary goal. The vignette does a good job explaining response rates, and how to use the package. The package also makes good use of 3-dimensional matrices. There are a few areas where the package can be improved most of which @nealrichardson has already mentioned. Here is a list in no particular order:

• Definitely use NULL instead of NA for the default parameters.

• Similarly, for assert_weights consider giving a warning if any of the weights are NA

• When I was reading source code I initially didn't understand what fmat was. Even though the data structure is not exported, I encourage you to include source code that generated fmat in the "data-raw" folder as Hadley suggests here. I think it makes reading the source code easier if the code for all the data structures used is available.

• I don't think the apply used here is necessary. numden[,1]/numden[,2] gives you what you want but drops names. Up to you if you want to keep apply approach.

• Replacing tidyverse with the specific packages used in the vignette (i.e. dplyr, tidyr, ggplot2) is a good idea.

• In the documentation for the fake middleearth dataset you should include a description of each variable.

• Please change name of or function in vignette to something more informative.

• As Neal already said some of the outcome.* definitions can be simplified.

Good work!

### karthik commented Jun 1, 2018

 👋 @rtaph both reviews are now in. Please let us know when you're had a chance to read and respond to these comments. 🙏

### rtaph commented Jun 3, 2018

 Gentlemen-- thank you all for the feedback and sorry it has taken a while to respond. I am integrating your points and hope to be able to share a revised version soon. Aiming for end of week. Here are my replies to each point: It seems that I can't get "all the rates" unless I provide a value for e. I think I should be able to easily get all of the rates that don't depend on e just by calling outcomerate(my_data). Instead, I get the error "The parameter e must be provided for RR3, RR4, REF2, CON2". It would be nice if instead you excluded those 4 metrics and calculated the rest. Yes, I see where you are coming from and I agree in the case of the default where rate = NULL. I would like to further address this with an eligibility_rate() function to estimate ‘e’ based on ineligible respondents, “NE”, in the x vector. The reason not to integrate it into the other collection of rates is that if you calculate grouped outcome rates (one of the motivations for the package), ‘e’ should not be calculated using a split-apply-combine approach, but globally. Why would I ever want return_nd = TRUE? I don't see any documentation or examples that do that. In my experience it's wise not to add extra arguments and features with no application or demand, even if it's trivial to do---they clutter your interface and create a liability for you to maintain them. So I'd advocate removing it. But, if you determine that it has value, consider returning numden before even bothering to calculate rates (outcomerate.R#L151). I find it useful to calculate grouped outcome rates while fieldwork is happening and when counts are small. For example, computing outcome rates by researcher helps spot early problems with field work. However, when counts are low at the start of a project, the rates are unstable. Having numerators and denominators can be useful in combination with empirical Bayes methods to smooth out the raw estimates. I plan to write a vignette on this but have not had time yet. In asserters.R, assert_weight: what if the weight vector contains missing values? (It errors, but not helpfully.) Good catch. I will add a case for this. NULL is generally a better default value for function arguments than NA. You're less likely to accidentally provide NULL, NA is actually type logical and that can be confusing, and is.null() is a much more natural check than is.na(), which will break your if() statements if the input is length > 1. Great point. I will make the change. outcomerate.factor <- outcomerate.character works--no need to define a different method and coerce the data as.character(). Clever. I’ve made the change. Likewise, outcomerate.table <- outcomerate.numeric passes the tests too. One thing I am concerned about with this change is that the table would be coerced to an unnamed numeric vector, and that the order of the input then suddenly matters to the function output. A table object whose first element is the counts for “I” would not give the same results as a table whose first element is, say, counts for “NC”. Doing that, then outcomerate.R#L108-L114 simplifies a bit too because you don't have to munge the table objects you get: freq <- table(x) (perhaps multiplied by weight) So if I understand you, you are suggesting to have the different methods resolve to the table method. Is that correct? I will look into this. Speaking of multiplying by weight, what's the point of a scalar (length 1) weight when you're calculating proportions? Doesn't it difference out? (Empirically, it does, just experimenting with altering a scalar weight in the tests.) Yes, it does, unless you return the numerator and denominator. Otherwise it makes no difference. outcomerate.R#L116 might be more readable as freq <- xtabs(weight ~ x). This would require adding stats to Imports, though stats is a base package so it's "free" from a dependency perspective. Interesting—I am open to the idea. Will see how this works when I try to address the point about using the table method as the ultimate method called. I recommend the spelling package.... Great suggestion. roxygen: I recommend using the markdown support added in roxygen2 v6.0 (see note in "DESCRIPTION" below for how to add). That way, your \item{}s can just become * lists. Will make the inline docs more readable. Much better, yes. Thanks for the suggestion. Also would help to offer some more words to explain what the different rates are beyond just "RR1" etc. Good point. Will add more text to briefly explain what each rate represents. What is fmat and where does it come from? That seems to be where all of the real business is. It should probably be documented and more visible. I will write a separate doc for fmat. It is a formula matrix that defines all of the equations from the AAPOR definitions in one place (instead of writing them out separately as code). I'm a bit more cautious about adding package dependencies than some, and it's only in Suggests, but tidyverse is a lot, and you probably don't need all of that for your vignette. Agreed. Will simplify it down to only those packages used. Why do you assume e = 0.8 in this example? Seems that if the goal is to prevent "claims regarding survey quality to become opaque", this shouldn't be opaque either. Some discussion around this, around how one might estimate or reason about e and not just make up a number that suits their goals, would be helpful. You make a fair point. I think once I add the function eligibility_rate() and show how to calculate it based on ineligible respondents, this will no longer be an issue. The 0.8 in the example is purely illustrative for the fictional case. The "small wrapper function" or() has an ambiguous sounding name. Perhaps something like get_rates() instead? Yes, sure. context() can be more useful as a descriptive string rather than the file name, which any test failure will report anyway Yep, good point. (test-params.R) It's good to assert the error message in expect_error() since helpful error messages are an important part of a user interface and can serve as a source of documentation, too. (cf. http://style.tidyverse.org/error-messages.html). As I mentioned above, I found at least one condition that errored but not in a way that would have helped me to understand what I had done wrong. Yes, good suggestion. "Description" needs to be more than one complete sentence. This is a good place for CRAN human checks to hold up accepting your package submission, so you should try to give them the prose they want. Agreed. Thanks for the catch. add Roxygen: list(markdown = TRUE), as suggested above, so you can use the markdown syntax in your roxygen docs. Thanks. "Suggests" includes survey but I don't see it used anywhere, though perhaps I'm missing it. I was originally going to use a dataset from the package but then changed my mind and made my own middleearth dataset. I will remove it. Definitely use NULL instead of NA for the default parameters. Sure. Similarly, for assert_weights consider giving a warning if any of the weights are NA Sure. When I was reading source code I initially didn't understand what fmat was. Even though the data structure is not exported, I encourage you to include source code that generated fmat in the "data-raw" folder as Hadley suggests here. I think it makes reading the source code easier if the code for all the data structures used is available. Yes, sure. I will add a raw data folder and docs for the fmat object. I don't think the apply used here is necessary. numden[,1]/numden[,2] gives you what you want but drops names. Up to you if you want to keep apply approach. I think I had implemented it this way to keep the names. I'll take a look at it again. Replacing tidyverse with the specific packages used in the vignette (i.e. dplyr, tidyr, ggplot2) is a good idea. Yes, will do. In the documentation for the fake middleearth dataset you should include a description of each variable. Good suggestion. Please change name of or function in vignette to something more informative. Yes, will do. As Neal already said some of the outcome.* definitions can be simplified. Sure

### nealrichardson commented Jun 4, 2018

 Likewise, outcomerate.table <- outcomerate.numeric passes the tests too. One thing I am concerned about with this change is that the table would be coerced to an unnamed numeric vector, and that the order of the input then suddenly matters to the function output. As far as I know, this method definition doesn't coerce anything--it just says "use this function". A table behaves like a numeric vector with names, so the code in outcomerate.numeric just worked when given a table. When I tried it, all of your tests passed, so if there's some behavior that this changes or alters, it's not anything you're asserting in tests. If it's important, I suggest writing a test and seeing if it fails with this change.

### rtaph commented Jun 7, 2018

 Dear @nealrichardson, @carlganz, and @karthik, I have implemented all recommendations to the package (ropensci/outcomerate@2a178ba), save for writing another vignette to explain the use of return_nd = TRUE which I hope to do soon. In the mean time, please feel free to review the changes and let me know if you are happy with the current state of the package. Thanks,

### karthik commented Aug 28, 2018

 Thanks @rtaph! @carlganz and @nealrichardson can you please respond and sign off on this or raise any other concerns?

 LGTM

### nealrichardson commented Aug 30, 2018

 Thanks for the reminder. Looks good to me too; I checked the final approval box on the checklist above.

### karthik commented Aug 31, 2018 • edited

 Congrats @rtaph, your submission has been approved! 🎉 Thank you for submitting and @carlganz and @nealrichardson for thorough and timely reviews. To-dos: Transfer the repo to the rOpenSci organization under "Settings" in your repo. I have invited you to a team that should allow you to do so. You'll be made admin once you do. Add the rOpenSci footer to the bottom of your README [![ropensci_footer](https://ropensci.org/public_images/ropensci_footer.png)](https://ropensci.org)  Fix any links in badges for CI and coverage to point to the ropensci URL. (We'll turn on the services on our end as needed) Welcome aboard! We'd also love a blog post about your package, either a short-form intro to it (https://ropensci.org/technotes/) or long-form post with more narrative about its development. ((https://ropensci.org/blog/). If you are, @stefaniebutland will be in touch about content and timing.

### rtaph commented Sep 1, 2018

 Thank you @karthik, I will do so shortly. Regarding a blog post, I would be happy to participate. @nealrichardson and @carlganz, with your permission, I would like to add your names to the DESCRIPTION file as reviewers. Do you have an ORCID you would like me to include?

### nealrichardson commented Sep 4, 2018

 Thanks @rtaph. Sure, you can add me to DESCRIPTION as a reviewer. I don't have an ORCID.

### rtaph commented Sep 8, 2018

 Done. Thanks again, @karthik, @nealrichardson, and @carlganz!

closed this as completed Sep 8, 2018

### stefaniebutland commented Sep 10, 2018

 @rtaph Glad to hear you're interested in contributing a blog post about outcomerate. (Thanks for your patience with my delayed response) This link will give you many examples of blog posts by authors of onboarded packages so you can get an idea of the style and length you prefer: https://ropensci.org/tags/review/. Here are some technical and editorial guidelines: https://github.com/ropensci/roweb2#contributing-a-blog-post. We ask that you submit your draft post via pull request a week before the planned publication date so we can give you some feedback. Deadline? We can publish 2018-10-02 if you submit your draft by 2018-09-25. Let me know if you prefer a later date. Happy to answer any questions.

### rtaph commented Sep 12, 2018

 @stefaniebutland The 2018-09-25 sounds good. I'll get started on something and let you know if I have any questions. Thanks :)

### stefaniebutland commented Sep 25, 2018

 Hi @rtaph. Still hoping to publish your post next Tuesday Oct 2nd. Will you be able to submit a draft by Thurs Sep 27?

### rtaph commented Sep 25, 2018

 Hi @stefaniebutland ! Yes, sorry for being a bit late. Will try to submit by Thursday :)

### rtaph commented Sep 29, 2018

 @karthik, can you help me initiate a gh-pages branch to host the \docs directory as a pkgdown site? I am writing the rOpenSci onboarding blog and would like to hyperlink examples in the vignette.

### karthik commented Oct 1, 2018

 @rtaph Done. You also have admin now to take care of other housekeeping. 🙏

### rtaph commented Oct 1, 2018

 Much appreciated, @karthik !