Description
Date accepted: 2021-11-04
Submitting Author Name: Jeffrey Stevens
Submitting Author GitHub Handle: @JeffreyRStevens
Repository: https://github.com/JeffreyRStevens/excluder
Version submitted: 0.2.2
Submission type: Standard
Editor: @maurolepore
Reviewers: @juliasilge, @jmobrien
Due date for @jmobrien: 2021-09-20
Archive: TBD
Version accepted: TBD
- Paste the full DESCRIPTION file inside a code block below:
Package: excluder
Title: Checks for Exclusion Criteria in Online Data
Version: 0.2.2
Authors@R:
    person(given = "Jeffrey R.",
           family = "Stevens",
           role = c("aut", "cre"),
           email = "jeffrey.r.stevens@gmail.com",
           comment = c(ORCID = "0000-0003-2375-1360"))
Description: Data that are collected through online sources such as Mechanical
    Turk may require excluding data because of IP address duplication,
    geolocation, or completion duration. This package facilitates
    exclusion of these data for Qualtrics datasets.
License: GPL (>= 3)
Encoding: UTF-8
LazyData: true
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.1.1
URL: https://jeffreyrstevens.github.io/excluder/, https://github.com/jeffreyrstevens/excluder/
BugReports: https://github.com/jeffreyrstevens/excluder/issues/
Imports:
    dplyr,
    iptools,
    janitor,
    lubridate,
    maps,
    tidyr,
    magrittr,
    rlang
Depends:
    R (>= 3.5.0)
Suggests:
    testthat (>= 3.0.0),
    readr,
    stringr,
    covr,
    knitr,
    rmarkdown,
    lifecycle
Config/testthat/edition: 3
VignetteBuilder: knitr
Scope
- Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):
- data retrieval
- data extraction
- data munging
- data deposition
- workflow automation
- version control
- citation management and bibliometrics
- scientific software wrappers
- field and lab reproducibility tools
- database software bindings
- geospatial data
- text analysis
- Explain how and why the package falls under these categories (briefly, 1-2 sentences):
The package falls under data munging because it processes data from Qualtrics or other online sources by checking for, marking, and excluding rows of data frames that meet common exclusion criteria (e.g., IP addresses outside of the United States or duplicate entries from the same location or IP address).
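As a rough illustration of the check, mark, and exclude pattern described above, here is a minimal base R sketch for one criterion (duplicate IP addresses). It is not the package's implementation: the function names (`check_dupes`, `mark_dupes`, `exclude_dupes`), the `ip_address` column, and the example data are all hypothetical and chosen only for this example.

```r
# Hypothetical survey data; column names are assumptions for illustration only.
responses <- data.frame(
  id = 1:5,
  ip_address = c("10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.3", "10.0.0.2"),
  stringsAsFactors = FALSE
)

# TRUE for every row whose IP address appears more than once in the data.
is_duplicate_ip <- function(data) {
  data$ip_address %in% data$ip_address[duplicated(data$ip_address)]
}

# "Check": return only the rows that meet the exclusion criterion.
check_dupes <- function(data) data[is_duplicate_ip(data), , drop = FALSE]

# "Mark": keep all rows and add a column flagging the ones that meet it.
mark_dupes <- function(data) {
  data$exclusion_duplicate <- ifelse(is_duplicate_ip(data), "duplicate", "")
  data
}

# "Exclude": drop the rows that meet the criterion.
exclude_dupes <- function(data) data[!is_duplicate_ip(data), , drop = FALSE]

checked <- check_dupes(responses)
marked <- mark_dupes(responses)
kept <- exclude_dupes(responses)
```

This sketch flags every copy of a repeated IP address. Keeping the first occurrence and dropping only later copies would be a different policy and a different design choice.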
- Who is the target audience and what are scientific applications of this package?
The target audience is data scientists using Qualtrics or other online systems to collect data from participants (e.g., Mechanical Turk workers). Ensuring good data quality from these participants can be difficult. For instance, while Mechanical Turk in theory screens workers based on location (e.g., if you want to restrict your participant pool to workers in the United States), this is not necessarily represented in the data. Tools to screen for IP address location can be hard to find, and this package simplifies checking for and excluding participants based on common data that Qualtrics reports, such as geolocation, IP address, duplicate records from the same location, participant screen resolution, participant progress through the survey, and survey completion duration.
- Are there other R packages that accomplish the same thing? If so, how does yours differ or meet our criteria for best-in-category?
To my knowledge, there are no similar packages. The {qualtRics} package at rOpenSci focuses on importing data from Qualtrics. The {MTurkR} package directly interfaces with the MTurk Requester API, but the APIs have been deprecated and the package has been removed from CRAN.
- (If applicable) Does your package comply with our guidance around Ethics, Data Privacy and Human Subjects Research?
Yes, it seems to comply with this guidance. Depending on the data that the user collects, there could be personally identifiable information accessed by this package. In particular, IP addresses that are recorded by Qualtrics can be processed with this package. Note that the package only works with personally identifiable information from data sets that already exist on the user's local file system, and the package does not collect or transmit data in any way. The package also includes a function deidentify() that the user can use to strip location, IP address, language, and even participant computer information (e.g., operating system, web browser, screen resolution) from the data frames to deidentify them.
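To make the idea of stripping identifying columns concrete, the following base R sketch removes a fixed set of columns from a data frame. It is only an illustration of the concept, not how deidentify() is implemented: the helper `strip_identifiers`, the column names, and the example data are assumptions made for this example.

```r
# Hypothetical data frame; the column names below are assumptions, not a
# guaranteed Qualtrics export layout.
responses <- data.frame(
  ResponseId = c("R_1", "R_2"),
  IPAddress = c("10.0.0.1", "10.0.0.2"),
  LocationLatitude = c(40.8, 41.3),
  LocationLongitude = c(-96.7, -95.9),
  UserLanguage = c("EN", "EN"),
  Answer = c(3, 5),
  stringsAsFactors = FALSE
)

# Drop whichever of the listed identifying columns are present in the data.
strip_identifiers <- function(data,
                              cols = c("IPAddress", "LocationLatitude",
                                       "LocationLongitude", "UserLanguage")) {
  data[, setdiff(names(data), cols), drop = FALSE]
}

deidentified <- strip_identifiers(responses)
names(deidentified)
```

Using `setdiff()` on the column names means the helper does not fail when one of the listed columns is absent from the data.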
- If you made a pre-submission inquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.
Technical checks
Confirm each of the following by checking the box.
- I have read the guide for authors and rOpenSci packaging guide.
This package:
- does not violate the Terms of Service of any service it interacts with.
- has a CRAN and OSI accepted license.
- contains a README with instructions for installing the development version.
- includes documentation with examples for all functions, created with roxygen2.
- contains a vignette with examples of its essential functions and uses.
- has a test suite.
- has continuous integration, including reporting of test coverage using services such as Travis CI, Coveralls and/or CodeCov.
Publication options
- Do you intend for this package to go on CRAN?
- Do you intend for this package to go on Bioconductor?
- Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:
MEE Options
- The package is novel and will be of interest to the broad readership of the journal.
- The manuscript describing the package is no longer than 3000 words.
- You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see MEE's Policy on Publishing Code)
- (Scope: Do consider MEE's Aims and Scope for your manuscript. We make no guarantee that your manuscript will be within MEE scope.)
- (Although not required, we strongly recommend having a full manuscript prepared when you submit here.)
- (Please do not submit your package separately to Methods in Ecology and Evolution)
Code of conduct
- I agree to abide by rOpenSci's Code of Conduct during the review process and in maintaining my package should it be accepted.