excluder: checks for exclusion criteria in online data #455

Closed
@JeffreyRStevens

Date accepted: 2021-11-04
Submitting Author Name: Jeffrey Stevens
Submitting Author Github Handle: @JeffreyRStevens
Repository: https://github.com/JeffreyRStevens/excluder
Version submitted: 0.2.2
Submission type: Standard
Editor: @maurolepore
Reviewers: @juliasilge, @jmobrien

Due date for @juliasilge: 2021-09-20
Due date for @jmobrien: 2021-09-20
Archive: TBD
Version accepted: TBD


  • Paste the full DESCRIPTION file inside a code block below:
Package: excluder
Title: Checks for Exclusion Criteria in Online Data
Version: 0.2.2
Authors@R: 
    person(given = "Jeffrey R.",
           family = "Stevens",
           role = c("aut", "cre"),
           email = "jeffrey.r.stevens@gmail.com",
           comment = c(ORCID = "0000-0003-2375-1360"))
Description: Data that are collected through online sources such as Mechanical 
            Turk may require excluding data because of IP address duplication, 
            geolocation, or completion duration. This package facilitates
            exclusion of these data for Qualtrics datasets.
License: GPL (>= 3)
Encoding: UTF-8
LazyData: true
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.1.1
URL: https://jeffreyrstevens.github.io/excluder/, https://github.com/jeffreyrstevens/excluder/
BugReports: https://github.com/jeffreyrstevens/excluder/issues/
Imports: 
    dplyr,
    iptools,
    janitor,
    lubridate,
    maps,
    tidyr,
    magrittr,
    rlang
Depends: 
    R (>= 3.5.0)
Suggests: 
    testthat (>= 3.0.0),
    readr,
    stringr,
    covr,
    knitr,
    rmarkdown,
    lifecycle
Config/testthat/edition: 3
VignetteBuilder: knitr

Scope

  • Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):

    • data retrieval
    • data extraction
    • data munging
    • data deposition
    • workflow automation
    • version control
    • citation management and bibliometrics
    • scientific software wrappers
    • field and lab reproducibility tools
    • database software bindings
    • geospatial data
    • text analysis
  • Explain how and why the package falls under these categories (briefly, 1-2 sentences):

The package falls under data munging because it processes data from Qualtrics or other online sources by checking for, marking, and excluding rows of data frames for common exclusion criteria (e.g., IP addresses outside of the United States or duplicate entries from the same location/IP address).

  • Who is the target audience and what are scientific applications of this package?

The target audience is data scientists using Qualtrics or other online systems to collect data from participants (e.g., Mechanical Turk workers). Ensuring good data quality from these participants can be tricky. For instance, although Mechanical Turk in theory screens workers based on location (e.g., if you want to restrict your participant pool to workers in the United States), the data do not always reflect this restriction. Tools for screening IP address location can be hard to find, and this package simplifies checking for and excluding participants based on common data that Qualtrics reports, such as geolocation, IP address, duplicate records from the same location, participant screen resolution, participant progress through the survey, and survey completion duration.
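
As a rough, language-neutral illustration of the check / mark / exclude idea described above, here is a minimal sketch in plain Python. It is not the package's R API: the function names, the column names (`ResponseId`, `IPAddress`, `Duration`), the sample rows, and the 60-second threshold are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical survey rows; the column names are assumptions for this sketch.
rows = [
    {"ResponseId": "R_1", "IPAddress": "192.0.2.10", "Duration": 412},
    {"ResponseId": "R_2", "IPAddress": "192.0.2.10", "Duration": 388},
    {"ResponseId": "R_3", "IPAddress": "198.51.100.7", "Duration": 35},
    {"ResponseId": "R_4", "IPAddress": "203.0.113.5", "Duration": 540},
]


def check_duplicate_ip(rows):
    """Return the rows whose IP address appears more than once."""
    counts = Counter(r["IPAddress"] for r in rows)
    return [r for r in rows if counts[r["IPAddress"]] > 1]


def check_duration(rows, min_seconds):
    """Return the rows completed faster than min_seconds."""
    return [r for r in rows if r["Duration"] < min_seconds]


def mark(rows, flagged, label):
    """Return copies of all rows with a boolean column for one criterion."""
    flagged_ids = {r["ResponseId"] for r in flagged}
    return [
        {**r, "exclusion_" + label: r["ResponseId"] in flagged_ids}
        for r in rows
    ]


def exclude(rows, flagged):
    """Return only the rows that were not flagged."""
    flagged_ids = {r["ResponseId"] for r in flagged}
    return [r for r in rows if r["ResponseId"] not in flagged_ids]


dupes = check_duplicate_ip(rows)            # inspect the flagged rows
fast = check_duration(rows, min_seconds=60)
marked = mark(rows, dupes, "duplicate_ip")  # keep every row, add a flag column
kept = exclude(exclude(rows, dupes), fast)  # drop rows failing either criterion
print([r["ResponseId"] for r in kept])
```

Keeping the three steps separate lets a user inspect the flagged rows first, carry the flags along as columns, or drop the rows outright, depending on the analysis plan.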

To my knowledge, there are no similar packages. The {qualtRics} package at rOpenSci focuses on importing data from Qualtrics. The {MTurkR} package directly interfaces with the MTurk Requester API, but the APIs have been deprecated and the package has been removed from CRAN.

Yes, it seems to comply with this guidance. Depending on the data that the user collects, this package may access personally identifiable information. In particular, IP addresses that are recorded by Qualtrics can be processed with this package. Note that the package only works with personally identifiable information from data sets that already exist on the user's local file system, and the package does not collect or transmit data in any way. The package also includes a function, deidentify(), that the user can use to strip location, IP address, language, and even participant computer information (e.g., operating system, web browser, screen resolution) from the data frames to deidentify them.
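
To make the deidentification step concrete, the following is a minimal sketch in plain Python of dropping identifying columns from survey rows. It does not reproduce deidentify() itself: the function name, the column names, and the sample row are hypothetical and only modelled loosely on the kinds of fields listed above.

```python
# Hypothetical column names for identifying fields; treat them as assumptions.
IDENTIFYING_COLUMNS = frozenset({
    "IPAddress",
    "LocationLatitude",
    "LocationLongitude",
    "UserLanguage",
    "Browser",
    "OperatingSystem",
    "Resolution",
})

# One made-up survey row that mixes identifying fields with a survey answer.
rows = [
    {
        "ResponseId": "R_1",
        "IPAddress": "192.0.2.10",
        "LocationLatitude": 10.0,
        "LocationLongitude": 20.0,
        "UserLanguage": "EN",
        "Resolution": "1920x1080",
        "Q1": 4,
    },
]


def strip_identifying_columns(rows, columns=IDENTIFYING_COLUMNS):
    """Return copies of the rows without the identifying columns."""
    return [{k: v for k, v in r.items() if k not in columns} for r in rows]


clean = strip_identifying_columns(rows)
print(sorted(clean[0]))
```

The function returns new rows rather than modifying its input, so the original data stay available locally while only the stripped copy is shared.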

  • If you made a pre-submission inquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.

#454

Technical checks

Confirm each of the following by checking the box.

This package:

Publication options

  • Do you intend for this package to go on CRAN?

  • Do you intend for this package to go on Bioconductor?

  • Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:

MEE Options
  • The package is novel and will be of interest to the broad readership of the journal.
  • The manuscript describing the package is no longer than 3000 words.
  • You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see MEE's Policy on Publishing Code)
  • (Scope: Do consider MEE's Aims and Scope for your manuscript. We make no guarantee that your manuscript will be within MEE scope.)
  • (Although not required, we strongly recommend having a full manuscript prepared when you submit here.)
  • (Please do not submit your package separately to Methods in Ecology and Evolution)

Code of conduct
