Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

excluder: checks for exclusion criteria in online data #455

Closed
11 of 27 tasks
JeffreyRStevens opened this issue Jul 26, 2021 · 69 comments
Closed
11 of 27 tasks

excluder: checks for exclusion criteria in online data #455

JeffreyRStevens opened this issue Jul 26, 2021 · 69 comments
Assignees

Comments

@JeffreyRStevens
Copy link
Member

JeffreyRStevens commented Jul 26, 2021

Date accepted: 2021-11-04
Submitting Author Name: Jeffrey Stevens
Submitting Author Github Handle: @JeffreyRStevens
Repository: https://github.com/JeffreyRStevens/excluder
Version submitted: 0.2.2
Submission type: Standard
Editor: @maurolepore
Reviewers: @juliasilge, @jmobrien

Due date for @juliasilge: 2021-09-20

Due date for @jmobrien: 2021-09-20
Archive: TBD
Version accepted: TBD


  • Paste the full DESCRIPTION file inside a code block below:
Package: excluder
Title: Checks for Exclusion Criteria in Online Data
Version: 0.2.2
Authors@R: 
    person(given = "Jeffrey R.",
           family = "Stevens",
           role = c("aut", "cre"),
           email = "jeffrey.r.stevens@gmail.com",
           comment = c(ORCID = "0000-0003-2375-1360"))
Description: Data that are collected through online sources such as Mechanical 
            Turk may require excluding data because of IP address duplication, 
            geolocation, or completion duration. This package facilitates
            exclusion of these data for Qualtrics datasets.
License: GPL (>= 3)
Encoding: UTF-8
LazyData: true
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.1.1
URL: https://jeffreyrstevens.github.io/excluder/, https://github.com/jeffreyrstevens/excluder/
BugReports: https://github.com/jeffreyrstevens/excluder/issues/
Imports: 
    dplyr,
    iptools,
    janitor,
    lubridate,
    maps,
    tidyr,
    magrittr,
    rlang
Depends: 
    R (>= 3.5.0)
Suggests: 
    testthat (>= 3.0.0),
    readr,
    stringr,
    covr,
    knitr,
    rmarkdown,
    lifecycle
Config/testthat/edition: 3
VignetteBuilder: knitr

Scope

  • Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):

    • data retrieval
    • data extraction
    • data munging
    • data deposition
    • workflow automation
    • version control
    • citation management and bibliometrics
    • scientific software wrappers
    • field and lab reproducibility tools
    • database software bindings
    • geospatial data
    • text analysis
  • Explain how and why the package falls under these categories (briefly, 1-2 sentences):

The package falls under data munging because it processes data from Qualtrics or other online sources by checking for, marking, and excluding rows of data frames for common exclusion criteria (e.g., IP addresses outside of the United States or duplicate entries from the same location/IP address).

  • Who is the target audience and what are scientific applications of this package?

The target audience is data scientists using Qualtrics or other online systems to collect data from participants (e.g., Mechanical Turk workers). Ensuring good data quality from these participants can be tricky. For instance, while Mechanical Turk in theory screens workers based on location (e.g., if you want to restrict your participant pool to workers in the United States), this is not necessarily represented in the data. Finding the tools to screen for IP address location can be tricky, and this package simplifies checking for and excluding participants based on common data that Qualtrics reports such as geolocation, IP address, duplicate records from the same location, participant screen resolution, participant progress through the survey, and survey completion duration.

There are no similar packages to my knowledge. The {qualtRics} package at rOpenSci focuses on importing data from Qualtrics. The {MTurkR} package directly interfaces with the MTurk Requestor API, but the APIs have been deprecated and the package has been removed from CRAN.

Yes, it seems to comply with this guidance. Depending on the data that the user collects, there could be personally identifiable information accessed by this package. In particular, IP addresses that are recorded by Qualtrics can be processed with this package. Note that the package only works with personally identifiable information from data sets that already exist on the users' local file system, and the package does not collect or transmit data in any way. The package also includes a function deidentify() that the user can use to strip location, IP address, language and even participant computer information (e.g., operating system, web browser, screen resolution) from the data frames to deidentify them.

  • If you made a pre-submission inquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.

#454

Technical checks

Confirm each of the following by checking the box.

This package:

Publication options

  • Do you intend for this package to go on CRAN?

  • Do you intend for this package to go on Bioconductor?

  • Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:

MEE Options
  • The package is novel and will be of interest to the broad readership of the journal.
  • The manuscript describing the package is no longer than 3000 words.
  • You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see MEE's Policy on Publishing Code)
  • (Scope: Do consider MEE's Aims and Scope for your manuscript. We make no guarantee that your manuscript will be within MEE scope.)
  • (Although not required, we strongly recommend having a full manuscript prepared when you submit here.)
  • (Please do not submit your package separately to Methods in Ecology and Evolution)

Code of conduct

@noamross
Copy link
Contributor

@ropensci-review-bot check package

@ropensci-review-bot
Copy link
Collaborator

Thanks, about to send the query.

@ropensci-review-bot
Copy link
Collaborator

ropensci-review-bot commented Jul 29, 2021

Checks for excluder (v0.2.2)

git hash: 1d8446c9

  • ✔️ Package uses 'roxygen2'
  • ✔️ Package has a 'contributing.md' file
  • ✔️ Package has a 'CITATION' file
  • ✔️ Package has a 'codemeta.json' file
  • ✔️ All functions have examples
  • ✔️ Package 'DESCRIPTION' has a URL field
  • ✔️ Package 'DESCRIPTION' has a BugReports field
  • ✔️ Package name is available
  • ✔️ Package has continuous integration checks
  • ✔️ Package coverage is 81.8%
  • ✔️ R CMD check found no errors
  • ✔️ R CMD check found no warnings

Important: All failing checks above must be addressed prior to proceeding

Package License: GPL (>= 3)


1. Statistical Properties

This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.

Details of statistical properties (click to open)

The package has:

  • code in R (100% in 35 files) and
  • 1 authors
  • 1 vignette
  • 3 internal data files
  • 8 imported packages
  • 54 exported functions (median 16 lines of code)
  • no non-exported function in R (median 20 lines of code)

Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages
The following terminology is used:

  • loc = "Lines of Code"
  • fn = "function"
  • exp/not_exp = exported / not exported

The final measure (fn_call_network_size) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile.

measure value percentile noteworthy
files_R 35 91.1
files_vignettes 1 64.8
files_tests 6 81.5
loc_R 607 50.7
loc_vignettes 118 57.1
loc_tests 252 57.1
num_vignettes 1 60.7
data_size_total 18483 74.3
data_size_median 6139 75.9
n_fns_r 54 50.8
n_fns_r_exported 54 88.4
n_fns_r_not_exported 0 0.0 TRUE
n_fns_per_file_r 1 0.0 TRUE
num_params_per_fn 4 54.3
loc_per_fn_r 18 69.6
loc_per_fn_r_exp 16 39.6
loc_per_fn_r_not_exp 20 77.1
rel_whitespace_R 19 53.9
rel_whitespace_vignettes 34 75.0
rel_whitespace_tests 15 78.7
doclines_per_fn_exp 48 61.4
doclines_per_fn_not_exp 0 0.0 TRUE
fn_call_network_size 40 57.0

1a. Network visualisation

Interactive network visualisation of calls between objects in package can be viewed by clicking here


2. goodpractice and other checks

Details of goodpractice and other checks (click to open)

3a. Continuous Integration Badges

github

GitHub Workflow Results

name conclusion sha date
pkgdown success 1d8446 2021-07-26
R-CMD-check success 1d8446 2021-07-26
test-coverage success 1d8446 2021-07-26

3b. goodpractice results

R CMD check with rcmdcheck

R CMD check generated the following note:

  1. checking Rd cross-references ... NOTE
    Packages unavailable to check Rd xrefs: ‘qualtRics’, ‘rgeolocate’

Test coverage with covr

Package coverage: 81.79

Cyclocomplexity with cyclocomp

The following function have cyclocomplexity >= 15:

function cyclocomplexity
check_duplicates 19

Static code analyses with lintr

lintr found the following 201 potential issues:

message number of times
Lines should not be more than 80 characters. 201


Package Versions

package version
pkgstats 0.0.0.265
pkgcheck 0.0.1.367


Editor-in-Chief Instructions:

This package may be submitted

@JeffreyRStevens
Copy link
Member Author

The only check that failed was "Package does not have a 'contributing.md' file". Does this mean that the package should not have a contributing.md file? So should I just remove the file and remove all references to Contributing to this package?

@noamross
Copy link
Contributor

Thanks for the submission @JeffreyRStevens! Sorry for a bit of a delay as we were working out our new automated diagnostics bot, which your review is the first to use, and you just found a bug in! Your CONTRIBUTING.md file is fine, we are just failing to check the right subfolder. We'll move ahead with this :)

@JeffreyRStevens
Copy link
Member Author

Ah, OK--no problem. Glad to be a guinea pig to help debug!

@maurolepore
Copy link
Member

maurolepore commented Aug 7, 2021

@JeffreyRStevens, it's my pleasure to be the handling editor of your submission.

Editor checks:

  • Documentation: The package has sufficient documentation available online (README, pkgdown docs) to allow for an assessment of functionality and scope without installing the package.
  • Fit: The package meets criteria for fit and overlap
  • Automated tests: Package has a testing suite and is tested via a CI service.
  • License: The package has a CRAN or OSI accepted license
  • Repository: The repository link resolves correctly

Editor comments

Congratulations! The bot and I are very happy to see the package meets rOpenSci's guidelines 👍. I'll start looking for reviewers.

Note the following minor issues. You might want to consider before the review:

  • devtools::check() shows:
checking Rd cross-references ... NOTE
  Packages unavailable to check Rd xrefs:qualtRics’, ‘rgeolocate
  • check_deduplicate() seems to have a relatively complex logic. It may be justified by the goal of the function (to check for various conditions) but might be worth ensuring there is no more logic than necessary.

  • Some if() statements seem fragile as they could get inputs of length greater than 1. Consider the argument dupl_ip to check_duplicates():

# Good
dupl_ip <- TRUE
if (identical(dupl_ip, TRUE)) {
  message("Do something.")
}
#> Do something.

# Good
dupl_ip <- TRUE
stopifnot(length(dupl_ip) == 1L)
if (dupl_ip) {
  message("Do something.")
}
#> Do something.

# Fragile
dupl_ip <- c(TRUE, FALSE)  # This might be accidentally non-atomic
if (dupl_ip == TRUE) {
  message("Do something.")
}
#> Warning in if (dupl_ip == TRUE) {: the condition has length > 1 and only the
#> first element will be used
#> Do something.

Created on 2021-08-07 by the reprex package (v2.0.0)

  • Some lines are longer than 80 characters. Wrapping at 80 characters makes the code easier to review, without having to scroll right when navigating the script.

  • I see two methods to avoid the R CMD check NOTE about undefined global variables: var <- NULL (e.g.) and .data$var. Maybe best to use only one method?

  • In R/ I see a subdirectory (R/deprecated). I find this strange. How is it used?

  • devtools::test() shows:

check-mark-exclude: 
Note: Using an external vector in selections is ambiguous.
i Use `all_of(location_col)` instead of `location_col` to silence this message.
i See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
  • The structure of the tests impairs the use of the convenient workflow with usethis::use_r() and usethis::use_test(). Consider structuring tests so that for each exported fun() in R/fun.R you have a test in tests/testthat/test-fun.R.

Reviewers:

Due date:

@maurolepore
Copy link
Member

@ropensci-review-bot seeking reviewers

@ropensci-review-bot
Copy link
Collaborator

Please add this badge to the README of your package repository:

[![Status at rOpenSci Software Peer Review](https://badges.ropensci.org/455_status.svg)](https://github.com/ropensci/software-review/issues/455)

Furthermore, if your package does not have a NEWS.md file yet, please create one to capture the changes made during the review process. See https://devguide.ropensci.org/releasing.html#news

@maurolepore
Copy link
Member

@JeffreyRStevens, could you suggest two or three potential reviewers? Although I wouldn't pick more than one, your list will inform the type of expertise you think would be useful when reviewing {excluder}.

I'll also use other criteria as described in How to look for reviewers.

@JeffreyRStevens
Copy link
Member Author

@maurolepore thank you for your speedy initial review of {excluder}. I have pushed some changes to address your comments.

  • In running usethis::check(), I am not able to replicate your error regarding Packages unavailable to check Rd xrefs: ‘qualtRics’, ‘rgeolocate’. Would you mind checking if this is still an issue in the new version? If so, does this mean that you need to have the packages installed on your local machine to pass the check?
  • I'm not sure if it is possible to simplify the logic of check_duplicates(). I'm open to ideas from you or the reviewers, though.
  • Regarding the fragile if() statements, are you suggesting that I check if the dupl_id argument is identical to TRUE or of length 1. And, if so, I assume you'd like me to look at other if() statements to see if the same issue applies? Just want to clarify the action items for this issue.
  • I trimmed lines longer than 80 characters. The remaining lines > 80 characters are primarily messages/warnings or tests. Should I break those across multiple lines? Is the aim to reduce the number of lines > 80 or minimize the number of characters > 80? For example, in remove_label_rows.R lines 53-59, I broke this over multiple lines to minimize characters > 80, but it resulted in severa lines slightly larger that 80 characters rather than just line much larger. Which is preferred?
  • Regarding undefined global variables, I used exclusions <- NULL in collapse_exclusions.R because I can't figure out how to use .data in a way to avoid the R CMD note. If I use .data$exclusions to replace exclusions in the unite() function (line 55) or the column definition in the mutate() function (line 60), I receive a compiling error when running the check. The current version has removed the exclusions <- NULL, which results in the R CMD note. One option would be to reinsert the NULL definition and remove all .data from this particular function. But would I need to replace all .data with NULL definitions in all functions or just this one?
  • LOL--the directory R/deprecated was supposed to be added to the .gitignore file. It has now been removed.
  • I have addressed the all_of(location_col) warning in the test.
  • I have restructured the tests to have a test for each function.
  • I have added the rOpenSci peer review badge to the README.
  • Two potential reviewers would be the authors of {qualtRics}: Julia Silge and Jasper Ginn.

Thank you again for your careful review, and let me know if I missed something.

@maurolepore
Copy link
Member

Thanks @JeffreyRStevens for responding quickly.

  • Yes, I still see that R CMD NOTE and two more. FYI I updated all my packages then run devtools::check(). Here is a reprex -- collapsed to save space.
Output
devtools::load_all()
#> ℹ Loading excluder
packageVersion("excluder")
#> [1] '0.2.2'
gert::git_log(max = 1)
#> # A tibble: 1 × 6
#>   commit         author          time                files merge message        
#> * <chr>          <chr>           <dttm>              <int> <lgl> <chr>          
#> 1 d99bc23850c3f… Jeffrey R. Ste… 2021-08-08 15:06:45    29 FALSE "Change all te…

devtools::check()
#> ℹ Updating excluder documentation
#> ℹ Loading excluder
#> Warning: [/home/mauro/git/excluder/R/check_duplicates.R:9] @details Link
#> to unavailable package: qualtRics::fetch_survey. there is no package called
#> 'qualtRics'
#> Warning: [/home/mauro/git/excluder/R/check_duration.R:9] @details Link to
#> unavailable package: qualtRics::fetch_survey. there is no package called
#> 'qualtRics'
#> Warning: [/home/mauro/git/excluder/R/check_ip.R:9] @details Link to unavailable
#> package: qualtRics::fetch_survey. there is no package called 'qualtRics'
#> Warning: [/home/mauro/git/excluder/R/check_location.R:9] @details Link to
#> unavailable package: qualtRics::fetch_survey. there is no package called
#> 'qualtRics'
#> Warning: [/home/mauro/git/excluder/R/check_preview.R:9] @details Link to
#> unavailable package: qualtRics::fetch_survey. there is no package called
#> 'qualtRics'
#> Warning: [/home/mauro/git/excluder/R/check_progress.R:9] @details Link to
#> unavailable package: qualtRics::fetch_survey. there is no package called
#> 'qualtRics'
#> Warning: [/home/mauro/git/excluder/R/check_resolution.R:10] @details Link
#> to unavailable package: qualtRics::fetch_survey. there is no package called
#> 'qualtRics'
#> Warning: [/home/mauro/git/excluder/R/qualtrics_numeric.R:3] @description Link
#> to unavailable package: rgeolocate::ip2location. there is no package called
#> 'rgeolocate'
#> Warning: [/home/mauro/git/excluder/R/qualtrics_raw.R:3] @description Link
#> to unavailable package: rgeolocate::ip2location. there is no package called
#> 'rgeolocate'
#> Warning: [/home/mauro/git/excluder/R/qualtrics_text.R:3] @description Link
#> to unavailable package: rgeolocate::ip2location. there is no package called
#> 'rgeolocate'
#> Warning: [/home/mauro/git/excluder/R/remove_label_rows.R:7] @details Link
#> to unavailable package: qualtRics::fetch_survey. there is no package called
#> 'qualtRics'
#> Writing NAMESPACE
#> Writing NAMESPACE
#> ── Building ──────────────────────────────────────────────────────── excluder ──
#> Setting env vars:
#> • CFLAGS    : -Wall -pedantic
#> • CXXFLAGS  : -Wall -pedantic
#> • CXX11FLAGS: -Wall -pedantic
#> ────────────────────────────────────────────────────────────────────────────────
#>      checking for file ‘/home/mauro/git/excluder/DESCRIPTION’ ...  ✓  checking for file ‘/home/mauro/git/excluder/DESCRIPTION’
#>   ─  preparing ‘excluder’:
#>        checking DESCRIPTION meta-information ...  ✓  checking DESCRIPTION meta-information
#>   ─  installing the package to build vignettes
#>      creating vignettes ...  ✓  creating vignettes (2.6s)
#>   ─  checking for LF line-endings in source and make files and shell scripts
#>   ─  checking for empty or unneeded directories
#>   ─  building ‘excluder_0.2.2.tar.gz’
#>      
#> ── Checking ──────────────────────────────────────────────────────── excluder ──
#> Setting env vars:
#> • _R_CHECK_CRAN_INCOMING_USE_ASPELL_: TRUE
#> • _R_CHECK_CRAN_INCOMING_REMOTE_    : FALSE
#> • _R_CHECK_CRAN_INCOMING_           : FALSE
#> • _R_CHECK_FORCE_SUGGESTS_          : FALSE
#> • NOT_CRAN                          : true
#> ── R CMD check ─────────────────────────────────────────────────────────────────
#> * using log directory ‘/tmp/Rtmpcusbks/excluder.Rcheck’
#> * using R version 4.1.0 (2021-05-18)
#> * using platform: x86_64-pc-linux-gnu (64-bit)
#> * using session charset: UTF-8
#> * using options ‘--no-manual --as-cran’
#> * checking for file ‘excluder/DESCRIPTION’ ... OK
#> * this is package ‘excluder’ version ‘0.2.2’
#> * package encoding: UTF-8
#> * checking package namespace information ... OK
#> * checking package dependencies ... OK
#> * checking if this is a source package ... OK
#> * checking if there is a namespace ... OK
#> * checking for executable files ... OK
#> * checking for hidden files and directories ... OK
#> * checking for portable file names ... OK
#> * checking for sufficient/correct file permissions ... OK
#> * checking whether package ‘excluder’ can be installed ... OK
#> * checking installed package size ... OK
#> * checking package directory ... OK
#> * checking for future file timestamps ... OK
#> * checking ‘build’ directory ... OK
#> * checking DESCRIPTION meta-information ... OK
#> * checking top-level files ... NOTE
#> Non-standard files/directories found at top level:
#>   ‘mid-guppy_reprex.R’ ‘mid-guppy_reprex.md’ ‘ok-coqui_reprex.R’
#>   ‘ok-coqui_reprex.spin.R’ ‘ok-coqui_reprex.spin.Rmd’
#> * checking for left-over files ... OK
#> * checking index information ... OK
#> * checking package subdirectories ... OK
#> * checking R files for non-ASCII characters ... OK
#> * checking R files for syntax errors ... OK
#> * checking whether the package can be loaded ... OK
#> * checking whether the package can be loaded with stated dependencies ... OK
#> * checking whether the package can be unloaded cleanly ... OK
#> * checking whether the namespace can be loaded with stated dependencies ... OK
#> * checking whether the namespace can be unloaded cleanly ... OK
#> * checking loading without being on the library search path ... OK
#> * checking dependencies in R code ... OK
#> * checking S3 generic/method consistency ... OK
#> * checking replacement functions ... OK
#> * checking foreign function calls ... OK
#> * checking R code for possible problems ... NOTE
#> collapse_exclusions: no visible binding for global variable
#>   ‘exclusions’
#> Undefined global functions or variables:
#>   exclusions
#> * checking Rd files ... OK
#> * checking Rd metadata ... OK
#> * checking Rd line widths ... OK
#> * checking Rd cross-references ... NOTE
#> Packages unavailable to check Rd xrefs: ‘qualtRics’, ‘rgeolocate’
#> * checking for missing documentation entries ... OK
#> * checking for code/documentation mismatches ... OK
#> * checking Rd \usage sections ... OK
#> * checking Rd contents ... OK
#> * checking for unstated dependencies in examples ... OK
#> * checking contents of ‘data’ directory ... OK
#> * checking data for non-ASCII characters ... OK
#> * checking LazyData ... OK
#> * checking data for ASCII and uncompressed saves ... OK
#> * checking installed files from ‘inst/doc’ ... OK
#> * checking files in ‘vignettes’ ... OK
#> * checking examples ... OK
#> * checking for unstated dependencies in ‘tests’ ... OK
#> * checking tests ...
#>   Running ‘testthat.R’
#>  OK
#> * checking for unstated dependencies in vignettes ... OK
#> * checking package vignettes in ‘inst/doc’ ... OK
#> * checking re-building of vignette outputs ... OK
#> * checking for non-standard things in the check directory ... OK
#> * checking for detritus in the temp directory ... OK
#> * DONE
#> 
#> Status: 3 NOTEs
#> See
#>   ‘/tmp/Rtmpcusbks/excluder.Rcheck/00check.log’
#> for details.
#> ── R CMD check results ───────────────────────────────────── excluder 0.2.2 ────
#> Duration: 56.5s
#> 
#> > checking top-level files ... NOTE
#>   Non-standard files/directories found at top level:
#>     ‘mid-guppy_reprex.R’ ‘mid-guppy_reprex.md’ ‘ok-coqui_reprex.R’
#>     ‘ok-coqui_reprex.spin.R’ ‘ok-coqui_reprex.spin.Rmd’
#> 
#> > checking R code for possible problems ... NOTE
#>   collapse_exclusions: no visible binding for global variable
#>     ‘exclusions’
#>   Undefined global functions or variables:
#>     exclusions
#> 
#> > checking Rd cross-references ... NOTE
#>   Packages unavailable to check Rd xrefs: ‘qualtRics’, ‘rgeolocate’
#> 
#> 0 errors ✓ | 0 warnings ✓ | 3 notes x
Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 4.1.0 (2021-05-18)
#>  os       Ubuntu 20.04.2 LTS          
#>  system   x86_64, linux-gnu           
#>  ui       X11                         
#>  language (EN)                        
#>  collate  en_US.UTF-8                 
#>  ctype    en_US.UTF-8                 
#>  tz       America/Mazatlan            
#>  date     2021-08-10                  
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  ! package     * version  date       lib source        
#>    AsioHeaders   1.16.1-1 2020-07-07 [1] RSPM (R 4.1.0)
#>    askpass       1.1      2019-01-13 [1] CRAN (R 4.1.0)
#>    assertthat    0.2.1    2019-03-21 [1] CRAN (R 4.1.0)
#>    backports     1.2.1    2020-12-09 [1] CRAN (R 4.1.0)
#>    cachem        1.0.5    2021-05-15 [1] CRAN (R 4.1.0)
#>    callr         3.7.0    2021-04-20 [1] CRAN (R 4.1.0)
#>    cli           3.0.1    2021-07-17 [1] CRAN (R 4.1.0)
#>    commonmark    1.7      2018-12-01 [1] CRAN (R 4.1.0)
#>    crayon        1.4.1    2021-02-08 [1] CRAN (R 4.1.0)
#>    credentials   1.3.1    2021-07-25 [1] RSPM (R 4.1.0)
#>    DBI           1.1.1    2021-01-15 [1] CRAN (R 4.1.0)
#>    desc          1.3.0    2021-03-05 [1] CRAN (R 4.1.0)
#>    devtools      2.4.2    2021-06-07 [1] CRAN (R 4.1.0)
#>    digest        0.6.27   2020-10-24 [1] CRAN (R 4.1.0)
#>    dplyr         1.0.7    2021-06-18 [1] CRAN (R 4.1.0)
#>    ellipsis      0.3.2    2021-04-29 [1] CRAN (R 4.1.0)
#>    evaluate      0.14     2019-05-28 [1] CRAN (R 4.1.0)
#>  P excluder    * 0.2.2    2021-08-07 [?] local         
#>    fansi         0.5.0    2021-05-25 [1] CRAN (R 4.1.0)
#>    fastmap       1.1.0    2021-01-25 [1] CRAN (R 4.1.0)
#>    fs            1.5.0    2020-07-31 [1] RSPM (R 4.1.0)
#>    generics      0.1.0    2020-10-31 [1] CRAN (R 4.1.0)
#>    gert          1.3.1    2021-06-23 [1] CRAN (R 4.1.0)
#>    glue          1.4.2    2020-08-27 [1] CRAN (R 4.1.0)
#>    highr         0.9      2021-04-16 [1] CRAN (R 4.1.0)
#>    hms           1.1.0    2021-05-17 [1] CRAN (R 4.1.0)
#>    htmltools     0.5.1.1  2021-01-22 [1] CRAN (R 4.1.0)
#>    iptools       0.6.1    2018-12-09 [1] RSPM (R 4.1.0)
#>    janitor       2.1.0    2021-01-05 [1] CRAN (R 4.1.0)
#>    knitr         1.33     2021-04-24 [1] CRAN (R 4.1.0)
#>    lifecycle     1.0.0    2021-02-15 [1] CRAN (R 4.1.0)
#>    lubridate     1.7.10   2021-02-26 [1] CRAN (R 4.1.0)
#>    magrittr      2.0.1    2020-11-17 [1] CRAN (R 4.1.0)
#>    maps          3.3.0    2018-04-03 [1] RSPM (R 4.0.3)
#>    memoise       2.0.0    2021-01-26 [1] CRAN (R 4.1.0)
#>    openssl       1.4.4    2021-04-30 [1] CRAN (R 4.1.0)
#>    pillar        1.6.2    2021-07-29 [1] CRAN (R 4.1.0)
#>    pkgbuild      1.2.0    2020-12-15 [1] CRAN (R 4.1.0)
#>    pkgconfig     2.0.3    2019-09-22 [1] CRAN (R 4.1.0)
#>    pkgload       1.2.1    2021-04-06 [1] CRAN (R 4.1.0)
#>    prettyunits   1.1.1    2020-01-24 [1] CRAN (R 4.1.0)
#>    processx      3.5.2    2021-04-30 [1] CRAN (R 4.1.0)
#>    ps            1.6.0    2021-02-28 [1] CRAN (R 4.1.0)
#>    purrr         0.3.4    2020-04-17 [1] CRAN (R 4.1.0)
#>    R6            2.5.0    2020-10-28 [1] CRAN (R 4.1.0)
#>    rcmdcheck     1.3.3    2019-05-07 [1] CRAN (R 4.1.0)
#>    Rcpp          1.0.7    2021-07-07 [1] CRAN (R 4.1.0)
#>    readr         2.0.1    2021-08-10 [1] CRAN (R 4.1.0)
#>    remotes       2.4.0    2021-06-02 [1] CRAN (R 4.1.0)
#>    reprex        2.0.1    2021-08-05 [1] RSPM (R 4.1.0)
#>    rlang         0.4.11   2021-04-30 [1] CRAN (R 4.1.0)
#>    rmarkdown     2.10     2021-08-06 [1] CRAN (R 4.1.0)
#>    roxygen2      7.1.1    2020-06-27 [1] CRAN (R 4.1.0)
#>    rprojroot     2.0.2    2020-11-15 [1] CRAN (R 4.1.0)
#>    rstudioapi    0.13     2020-11-12 [1] CRAN (R 4.1.0)
#>    sessioninfo   1.1.1    2018-11-05 [1] CRAN (R 4.1.0)
#>    snakecase     0.11.0   2019-05-25 [1] CRAN (R 4.1.0)
#>    stringi       1.7.3    2021-07-16 [1] CRAN (R 4.1.0)
#>    stringr       1.4.0    2019-02-10 [1] CRAN (R 4.1.0)
#>    styler        1.5.1    2021-07-13 [1] CRAN (R 4.1.0)
#>    sys           3.4      2020-07-23 [1] CRAN (R 4.1.0)
#>    testthat    * 3.0.4    2021-07-01 [1] CRAN (R 4.1.0)
#>    tibble        3.1.3    2021-07-23 [1] RSPM (R 4.1.0)
#>    tidyr         1.1.3    2021-03-03 [1] CRAN (R 4.1.0)
#>    tidyselect    1.1.1    2021-04-30 [1] CRAN (R 4.1.0)
#>    triebeard     0.3.0    2016-08-04 [1] RSPM (R 4.1.0)
#>    tzdb          0.1.2    2021-07-20 [1] RSPM (R 4.1.0)
#>    usethis       2.0.1    2021-02-10 [1] CRAN (R 4.1.0)
#>    utf8          1.2.2    2021-07-24 [1] RSPM (R 4.1.0)
#>    vctrs         0.3.8    2021-04-29 [1] CRAN (R 4.1.0)
#>    withr         2.4.2    2021-04-18 [1] CRAN (R 4.1.0)
#>    xfun          0.25     2021-08-06 [1] CRAN (R 4.1.0)
#>    xml2          1.3.2    2020-04-23 [1] CRAN (R 4.1.0)
#>    xopen         1.0.0    2018-09-17 [1] CRAN (R 4.1.0)
#>    yaml          2.2.1    2020-02-01 [1] CRAN (R 4.1.0)
#> 
#> [1] /home/mauro/R/x86_64-pc-linux-gnu-library/4.1
#> [2] /usr/local/lib/R/site-library
#> [3] /usr/lib/R/site-library
#> [4] /usr/lib/R/library
#> 
#>  P ── Loaded and on-disk path mismatch.
  • Inside if() I suggest you use identical() not == because if() only takes objects of length 1. identical() always returns an object of length 1, but the output of == can be longer. If you do use == inside if() then be sure to first assert the input is indeed of length 1 (e.g. with stopifnot()).

  • When styling your code I suggest you stick to lines of 80 characters or less. The goal is to be friendly with the reviewers, who may adjust their screen to show the typical 80 characters. Longer lines force them to scroll right, which sounds like a minor issue but after a while can get annoying. The image below shows how the first line of code overflows my 80 characters marker. Under that I have the same code styled in different ways -- all fit in my screen.

  • In line 55 I think you need to quote "exclusions". In ?tidyr::unite() I see the argument col is The name of the new column, as a string or symbol. I thinks symbols (e.g. exclusions) are allowed for backward compatibility and I believe the tidyverse now recommnds reserving symbols for existing columns and strings (e.g. "exclusions") for new columns. Note the examples section of unite()'s helpflie uses strings ("z" and "xy") -- not symbols (z or xy). I tried "exclusions" and successfully removed the NOTE. Line 60 looks good to me.

Thanks for your work and suggestions for reviewers.

@JeffreyRStevens
Copy link
Member Author

Many thanks for the clarifications and help, @maurolepore.

  • I believe that I fixed the R CMD note about qualtRics and rgeolocate. I was referencing their documentation, and if you don't have those packages installed, the documentation cannot be linked. So I changed the references to external URLs rather than internal documentation.
  • I have now changed all == operators to identical() in if() statements that use user inputted arguments. I have added stopifnot() statements when other operators (e.g., %in%) are used in if() statements with user inputted arguments.
  • I greatly reduced the number of lines over 80 characters but did not eliminate them all (e.g., when breaking a line would cause a message to print on two lines or would make readability much worse).
  • Thank you for this! The quotes fixed it.

@maurolepore
Copy link
Member

@ropensci-review-bot add @juliasilge to reviewers

@ropensci-review-bot
Copy link
Collaborator

@juliasilge added to the reviewers list. Review due date is 2021-09-20. Thanks @juliasilge for accepting to review! Please refer to our reviewer guide.

@ropensci-review-bot
Copy link
Collaborator

@juliasilge: If you haven't done so, please fill this form for us to update our reviewers records.

@maurolepore
Copy link
Member

@ropensci-review-bot add @jmobrien to reviewers

@ropensci-review-bot
Copy link
Collaborator

@jmobrien added to the reviewers list. Review due date is 2021-09-20. Thanks @jmobrien for accepting to review! Please refer to our reviewer guide.

@ropensci-review-bot
Copy link
Collaborator

@jmobrien: If you haven't done so, please fill this form for us to update our reviewers records.

@maurolepore
Copy link
Member

@JeffreyRStevens, I'm thrilled that @juliasilge and @jmobrien accepted to review the excluder package.

Note @ropensci-review-bot set the due date following rOpenSci's guidelines to 2021-09-20. However, these are difficult times and I would like to allow reviewers 1 extra week. Feel free to submit your review whenever ready but if you don't I'll touch base on 2021-09-27.

I look forward to working with you all.

@juliasilge
Copy link

Package Review

Congratulations to the author on this useful package for folks handling Qualtrics surveys. It will be convenient to have all these common checks in a consistent set of functions, and the messaging in the console is very nice. 🙌

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README
  • Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s) demonstrating major functionality that runs successfully locally
  • Function Documentation: for all exported functions
  • Examples (that run successfully locally) for all exported functions
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

On documentation, I find some of the way the documentation works a bit confusing, especially when things go wrong for users. For example, if I have a dataset with no IP addresses or location information and I try to run exclude_duplicates(), I get this error:

Error in check_duplicates(x, quiet = TRUE, ...) : 
The column specifying location ('location_col') was not found.

However, when I look at the documentation for the function I used, I don't see anything about location_col and it's not clear which link I should click to find the relevant info. Some options might be to change the documentation for the dots or write a more clear error message.

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software been confirmed.
  • Performance: Any performance claims of the software been confirmed.
  • Automated tests: Unit tests cover essential functions of the package
    and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

Estimated hours spent reviewing: 2

  • Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.

Review Comments

🎯 The function name collapse_exclusions() reminds me a bit too much of dplyr::collapse() when what it does is much more like tidyr::unite(). What about a name like unite_exclusions()? Along those lines, did you consider names like filter_*() instead of exclude_*()? A benefit to using names that align with other names in the ecosystem is that these functions then slot right in to people's mental model of what they are doing.

🎯 None of my real surveys have IP addresses so I was not able to check the iptools integration beyond the example survey data included in the package.

🎯 In the future, if you wanted to increase the polish of the console messages (once the nicest features of this package that I think will draw folks to use it), you might check out using cli.

@maurolepore
Copy link
Member

@juliasilge thanks for your review.
@jmobrien when do you think you'll be able to submit yours?

@jmobrien
Copy link

jmobrien commented Sep 27, 2021 via email

@maurolepore
Copy link
Member

@JeffreyRStevens,

I'm watching your discussion with the reviewers. Do you feel the ball is no longer on your court? If so I would ask the reviewers once again whether your changes sufficiently address any issues they raised in the review. Please confirm and I'll write a new comment with a @mention to them.

We no longer review submissions to JOSS, but do mention the review -- I think you might get fast-tracking.

@JeffreyRStevens
Copy link
Member Author

@maurolepore, yes, I think that I've addressed the reviewers concerns, and I'm waiting for their final approval. Thanks.

@maurolepore
Copy link
Member

Dear reviewers,

@jmobrien, do you have an estimate about when you might be able to confirm whether the changes made are sufficient to approve this package?

@juliasilge, it seems your last concern (return visibly) has been addressed. Is there anything else you'd like to request or do you confirm the changes made are sufficient to approve this package?

@jmobrien
Copy link

jmobrien commented Oct 31, 2021 via email

@juliasilge
Copy link

For me, the changes have addressed the issues I was interested in, and I approve it moving forward. ✅

@jmobrien
Copy link

jmobrien commented Nov 1, 2021

This looks much better. The modularization achieves a lot of the code streamlining I was hoping for, especially.

I think I'm ready to approve, except for one maybe-bug and maybe 1 or 2 suggestions:

1. Small bug(?):

mark_ip() checks for missing IPs (previews, mostly) and notes them, but doesn't actually mark those rows. Thus, they aren't kept in check_ip() and they're left in with exclude_ip(). If this is intentional (because mark_preview() exists), then fine, but otherwise this should be fixed.

2. marking columns in check_*() functions

Currently check_() functions drop the marking variable columns. But since check_()* seems intended for manual review, and the marks sometimes have useful summary notes (e.g., duration). Would it make sense to keep them in, or at least have an argument for that? If so, I also might move them to the first column in that case for easier viewing if a survey is long.

3. Exclusions for exclude_*()?

Is the philosophy for having check_*() functions that, following review, you might sometimes see cases you still want to keep? If so and that occurs, it seems like exclude_*() stops working well. There's no way to note which cases you want to protect (other than modifying the actual data, which presumably you don't want to do).

I'm not sure how you would want to approach an explicit feature for this, so I don't think adding one should hold up approval. Maybe something to consider for the future. Still, if this does fit the intended use for check_*(), maybe a simple addition to the vignette covering how to do this with a mark --> [adjustments] --> filter workflow in lieu of exclude would be good for users to see.

I think all of the above should be pretty trivial to implement, and ignore them if they don't fit your goals. Other than that, I think this is ready to go! Great work!

@JeffreyRStevens
Copy link
Member Author

@jmobrien, @juliasilge, @maurolepore, I have a question for you all.
I'm working on the issues raised above and have created a new function (in utils.R) called keep_marked_column() that is used in the check_*() functions. This function reads in the data, the name of the exclusion column that is created by the mark_*() function, and a flag for whether to keep the column in the output (as per item 2 above). It then simply removes or relocates the exclusion column based on the flag. However, I'm running into a problem with passing R CMD checks.
When I run the R CMD check on the package with the function as it is currently written (and pushed), I receive the dreaded no visible binding for global variable note. This seems to result from when I embrace the column name with {{ }} in line 31: x %>% dplyr::select(-{{ column }}). If I switch to an indirect call of the column name with .data[[column]] (as used in line 33), I receive an error that the column name is not found (e.g., when running check_duplicates(), I get Error: object 'exclusion_duplicates' not found). I can get it to work with .data[[!!rlang::enquo(column)]], but then I get the check error again.
Is there a more elegant way of resolving this besides assigning all of the columns to a globalVariables() call?

@maurolepore
Copy link
Member

@JeffreyRStevens, I'm willing to help. Can you share the link to a branch or PR so I can pull it locally and reproduce the issue?

@JeffreyRStevens
Copy link
Member Author

@maurolepore, many thanks! It is just on the main branch at https://github.com/JeffreyRStevens/excluder.

@jmobrien
Copy link

jmobrien commented Nov 2, 2021

@JeffreyRStevens took a glance at it on your page. Just a thought, but I'm actually wondering if simplest solution to what you're facing may be to just make the column argument of keep_marked_column() take a string instead of a bare name. Then, just use all_of() inside each of the select and relocate functions?

The name being passed into keep_marked_column() is known and fixed for each check function, so I don't think you need anything more flexible. And the name itself is really all you'll ever need out of that argument, so a string seems like it could be fully adequate?

(You could also do something similar using deparse(substitute(column)), or some more "tidy" equivalent, inside the function)

@jmobrien
Copy link

jmobrien commented Nov 2, 2021

Working with the current approach, might this work?

I haven't tested it in a build and @JeffreyRStevens might know something better/more best-practices, but I think it's an option.

keep_marked_column <- function(x, column, keep) {
  col <- rlang::ensym(column)
  if (identical(keep, FALSE)) {
    x %>% dplyr::select(!!col)
  } else {
    x %>% dplyr::relocate(!!col)
  }
}

@maurolepore
Copy link
Member

@JeffreyRStevens,

My local experiment suggests that adding .data$ to your column names avoids the R CMD check NOTE. To confirm the checks on GitHub Actions I believe you'll need to approve the workflows somehow (apparently because I'm a first-time contributor to your package).

https://github.com/JeffreyRStevens/excluder/pull/5/files

My suggestion is independent to the advice that @jmobrien shared.

--

Also, while I'm here, you may simplify the if() clause:

keep <- FALSE
out <- "it's true"
if (!keep) {
  out <- "it's false"
} 
out
#> [1] "it's false"

keep <- TRUE
out <- "it's true"
if (!keep) {
  out <- "it's false"
} 
out
#> [1] "it's true"

See https://speakerdeck.com/jennybc/code-smells-and-feels?slide=44

@JeffreyRStevens
Copy link
Member Author

Many thanks, @maurolepore and @jmobrien. I have merged @maurolepore's PR. I will tidy up a few things and then report back with a final summary soon.

@JeffreyRStevens
Copy link
Member Author

@jmobrien

  1. Small bug(?):
    I have now added a flag in mark_ip() that tracks whether to include NAs.

  2. marking columns in check_*() functions
    I have now added a flag in the check*_() functions that tracks whether to keep (and move to the first column) the exclusion column or remove it.

  3. Exclusions for exclude_*()?
    This is an interesting idea that I would like to think about a bit before doing anything. If you're OK with it @jmobrien, I'd like to wait to deal with this until after approval.

Is there anything else?

@JeffreyRStevens
Copy link
Member Author

I have increased testing coverage with some additional tests. @jmobrien, does this meet your final approval?

@maurolepore
Copy link
Member

I think all of the above should be pretty trivial to implement, and ignore them if they don't fit your goals. Other than that, I think this is ready to go! Great work!
-- #455 (comment)

The way I interpret @jmobrien's comment is that he didn't need another round of review. You have addressed all his latest comments, so congratulations @JeffreyRStevens -- excluder is approved!

And once again thanks @juliasilge and @jmobrien for your amazing work!

I'll follow up soon with some more comments and instructions.

@maurolepore
Copy link
Member

@ropensci-review-bot approve excluder

@ropensci-review-bot
Copy link
Collaborator

Approved! Thanks @JeffreyRStevens for submitting and @juliasilge, @jmobrien for your reviews! 😁

To-dos:

  • Transfer the repo to rOpenSci's "ropensci" GitHub organization under "Settings" in your repo. I have invited you to a team that should allow you to do so. You'll be made admin once you do.
  • Fix all links to the GitHub repo to point to the repo under the ropensci organization.
  • Delete your current code of conduct file if you had one since rOpenSci's default one will apply, see https://devguide.ropensci.org/collaboration.html#coc-file
  • If you already had a pkgdown website and are ok relying only on rOpenSci central docs building and branding,
    • deactivate the automatic deployment you might have set up
    • remove styling tweaks from your pkgdown config but keep that config file
    • replace the whole current pkgdown website with a redirecting page
    • replace your package docs URL with https://docs.ropensci.org/package_name
    • In addition, in your DESCRIPTION file, include the docs link in the URL field alongside the link to the GitHub repository, e.g.: URL: https://docs.ropensci.org/foobar (website) https://github.com/ropensci/foobar
  • Fix any links in badges for CI and coverage to point to the new repository URL.
  • Increment the package version to reflect the changes you made during review. In NEWS.md, add a heading for the new version and one bullet for each user-facing change, and each developer-facing change that you think is relevant.
  • We're starting to roll out software metadata files to all rOpenSci packages via the Codemeta initiative, see https://docs.ropensci.org/codemetar/for how to include it in your package, after installing the package - should be easy as running codemetar::write_codemeta() in the root of your package.

Should you want to acknowledge your reviewers in your package DESCRIPTION, you can do so by making them "rev"-type contributors in the Authors@R field (with their consent).

Welcome aboard! We'd love to host a post about your package - either a short introduction to it with an example for a technical audience or a longer post with some narrative about its development or something you learned, and an example of its use for a broader readership. If you are interested, consult the blog guide, and tag @stefaniebutland in your reply. She will get in touch about timing and can answer any questions.

We maintain an online book with our best practice and tips, this chapter starts the 3d section that's about guidance for after onboarding. Please tell us what could be improved.

Last but not least, you can volunteer as a reviewer via filling a short form.

@maurolepore
Copy link
Member

@JeffreyRStevens I think the comment above includes everything you except two things:

  1. As you intend to submit to CRAN, please see https://devdevguide.netlify.app/building.html#crangotchas

  2. See the links under Package promotion.

@JeffreyRStevens
Copy link
Member Author

This is fantastic news! Many thanks @maurolepore, @juliasilge, and @jmobrien for the time and effort you have put into reviewing this package. It is soooo much better after your review, and I have learned a ton.

@JeffreyRStevens
Copy link
Member Author

@maurolepore, I believe I have fully transitioned the package over to rOpenSci. The one remaining issue that I can't sort out is how to change the URL link in the About section of the GitHub page (https://github.com/ropensci/excluder) to https://docs.ropensci.org/excluder/. It is currently still linking to jeffreyrstevens.github.io/excluder even though I have changed all of the links in the repo. Do you know how to change this or will it just resolve on its own at some point?

@maelle
Copy link
Member

maelle commented Nov 4, 2021

@JeffreyRStevens I've made you admin of your repo again, so you should be able to make changes now.

Admin rights are always lost during transfer, but we're working on a bot command for allowing package authors themselves to reclaim them. You'll get a smoother experience if you submit a second package. 😉

@jmobrien
Copy link

jmobrien commented Nov 4, 2021

Sorry @JeffreyRStevens and @maurolepore, my internet was out almost all of yesterday and I was dealing with that. But @maurolepore was correct about my stance. Just so it's explicit: I'm happy about the changes and am glad to see this approved! Congrats!

@JeffreyRStevens
Copy link
Member Author

Many thanks @maelle and @maurolepore. I have now changed the URL.
And thanks, @jmobrien!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants