
daiquiri: Data quality reporting for temporal datasets #535

Closed
12 of 29 tasks
phuongquan opened this issue May 16, 2022 · 80 comments

@phuongquan
Member

phuongquan commented May 16, 2022

Date accepted: 2022-10-25

Submitting Author Name: T. Phuong Quan
Submitting Author Github Handle: @phuongquan
Other Package Authors Github handles: (comma separated, delete if none)
Repository: https://github.com/phuongquan/daiquiri
Version submitted: 0.7.1
Submission type: Standard
Editor: @maurolepore
Reviewers: @brad-cannell, @elinw

Due date for @brad-cannell: 2022-07-27

Due date for @elinw: 2022-08-17
Archive: TBD
Version accepted: TBD
Language: en

  • Paste the full DESCRIPTION file inside a code block below:
Package: daiquiri
Type: Package
Title: Data Quality Reporting for Temporal Datasets
Version: 0.7.1
Authors@R: c(
    person(c("T.", "Phuong"), "Quan", email = "phuong.quan@ndm.ox.ac.uk",
        role = c("aut", "cre"), comment = c(ORCID = "0000-0001-8566-1817")),
    person("Jack", "Cregan", role = "ctb"),
    person(family = "University of Oxford", role = "cph"),
    person(family = "National Institute for Health Research (NIHR)", role = "fnd")
    )
Description: Generate reports that enable quick visual review of 
    temporal shifts in record-level data. Time series plots showing aggregated 
    values are automatically created for each data field (column) depending on its 
    contents (e.g. min/max/mean values for numeric data, no. of distinct 
    values for categorical data), as well as overviews for missing values, 
    non-conformant values, and duplicated rows. The resulting reports are sharable 
    and can contribute to forming a transparent record of the entire analysis process. 
    It is designed with Electronic Health Records in mind, but can be used for 
    any type of record-level temporal data (i.e. tabular data where each row represents 
    a single “event”, one column contains the "event date", and other columns 
    contain any associated values for the event).
URL: https://github.com/phuongquan/daiquiri
BugReports: https://github.com/phuongquan/daiquiri/issues
License: GPL (>=3)
Encoding: UTF-8
Imports:
    data.table (>= 1.12.8),
    readr (>= 1.3.1),
    ggplot2 (>= 3.1.0),
    scales (>= 1.1.0),
    cowplot (>= 0.9.3),
    rmarkdown,
    reactable (>= 0.2.3),
    utils,
    stats
RoxygenNote: 7.1.2
Suggests:
    covr,
    knitr,
    testthat (>= 3.0.0),
    codemetar
VignetteBuilder: knitr
Config/testthat/edition: 3

Scope

  • Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):

    • data retrieval
    • data extraction
    • data munging
    • data deposition
    • data validation and testing
    • workflow automation
    • version control
    • citation management and bibliometrics
    • scientific software wrappers
    • field and lab reproducibility tools
    • database software bindings
    • geospatial data
    • text analysis
  • Explain how and why the package falls under these categories (briefly, 1-2 sentences):

It takes a generic data frame containing raw, record-level, temporal data, and generates a data quality report that enables quick visual review of any unexpected temporal shifts in measures such as missingness, min/max/mean/distinct values, and non-conformance.
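As a rough illustration of that workflow (a hedged sketch only — the column names are invented and exact function signatures should be checked against the package documentation; the function names here are taken from the package's exported functions listed later in this thread):

```r
library(daiquiri)

# Read the record-level data, keeping everything as character so that
# daiquiri can report non-conformant values rather than silently coerce them
rawdata <- read_data("admissions.csv")

# Declare what each column contains; ft_timepoint() marks the event date
fts <- fieldtypes(
  PatientID     = ft_uniqueidentifier(),
  AdmissionDate = ft_timepoint(),
  Age           = ft_numeric(),
  Specialty     = ft_categorical()
)

# Generate the HTML data quality report
create_report(rawdata, fieldtypes = fts)
```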

  • Who is the target audience and what are scientific applications of this package?

The target audience is all researchers who analyse data from large, temporal datasets, particularly routinely-collected data such as electronic health records. The package helps them to quickly check for temporal biases in their data before embarking on their main analyses. It also helps them to do this in a thorough, consistent and transparent way (since the reports are shareable), hence increasing the quality of their studies as well as trust in the scientific process.

To my knowledge, a small number of R packages generate summary statistics and/or data quality reports (the two most similar being dataquieR and DQAstats), but none of them help identify temporal changes in the data, and none are as lightweight to use and consume.

Yes

  • If you made a pre-submission inquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.

#527

  • Explain reasons for any pkgcheck items which your package is unable to pass.

Here is the list of items from version 0.0.3.11. The latest development version (0.0.3.13) is not currently working on Windows.

  1. These functions do not have examples: [daiquiri] - This is not actually a function; daiquiri is the package help page, which links to the "most useful" function and vignette.
  2. Package uses global assignment operator (‘<<-’). - The global assignment is necessary inside a withCallingHandlers() call.
  3. Package has no continuous integration checks. - The package uses GitHub Actions for R-CMD-check and for covr, so I don't know why this is reported as a failure.
  4. Namespace in Imports field not imported from: ‘reactable’. All declared Imports should be used. - The 'reactable' package is in fact used, but is probably being missed because it is called from an Rmd file in the inst/rmd folder.
  5. There are two functions with cyclocomplexity above 15: a) validate_params_type() and b) aggregatefield(). These functions are essentially large switch statements covering a) all arguments to all exported functions (validating what a user passes in, where some arguments are shared across multiple functions), and b) all the different ways that the package can aggregate a data field (e.g. mean, min, max). The only obvious way I can see to reduce the cyclocomplexity score would be to separate the items into subfunctions, but I judge that would make the code harder to follow rather than easier. I am open to other suggestions.
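To illustrate item 2 with a minimal, self-contained sketch (not daiquiri's actual code): withCallingHandlers() runs its handler in a separate frame, so recording a condition into an enclosing variable requires `<<-`.

```r
# Minimal sketch (not daiquiri's actual code): capture warnings raised while
# evaluating an expression. The handler runs in its own frame, so a plain
# `<-` would only create a local variable; `<<-` updates the enclosing one.
collect_warnings <- function(expr) {
  warnings_seen <- character(0)
  result <- withCallingHandlers(
    expr,
    warning = function(w) {
      warnings_seen <<- c(warnings_seen, conditionMessage(w))
      invokeRestart("muffleWarning")  # carry on without printing the warning
    }
  )
  list(result = result, warnings = warnings_seen)
}

out <- collect_warnings(as.numeric(c("1", "x", "3")))
out$warnings  # "NAs introduced by coercion"
```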

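On item 5, one alternative sometimes suggested for large switch statements is table-driven dispatch, where each branch becomes an entry in a named list of functions. A hypothetical sketch (names invented; not daiquiri's code):

```r
# Hypothetical sketch: a lookup table of aggregation functions replaces a
# large switch statement; each entry has cyclomatic complexity of 1-2.
aggregators <- list(
  mean     = function(x) mean(x, na.rm = TRUE),
  min      = function(x) min(x, na.rm = TRUE),
  max      = function(x) max(x, na.rm = TRUE),
  distinct = function(x) length(unique(x[!is.na(x)]))
)

aggregate_field_sketch <- function(x, type) {
  f <- aggregators[[type]]
  if (is.null(f)) stop("Unknown aggregation type: ", type)
  f(x)
}

aggregate_field_sketch(c(2, 5, 5, NA), "distinct")  # 2
```

Whether this reads more clearly than the existing switch statements is a judgement call, as the submitter notes.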
Technical checks

Confirm each of the following by checking the box.

This package:

Publication options

  • Do you intend for this package to go on CRAN?

  • Do you intend for this package to go on Bioconductor?

  • Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:

MEE Options
  • The package is novel and will be of interest to the broad readership of the journal.
  • The manuscript describing the package is no longer than 3000 words.
  • You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see MEE's Policy on Publishing Code)
  • (Scope: Do consider MEE's Aims and Scope for your manuscript. We make no guarantee that your manuscript will be within MEE scope.)
  • (Although not required, we strongly recommend having a full manuscript prepared when you submit here.)
  • (Please do not submit your package separately to Methods in Ecology and Evolution)

Code of conduct

@ropensci-review-bot
Collaborator

Thanks for submitting to rOpenSci, our editors and @ropensci-review-bot will reply soon. Type @ropensci-review-bot help for help.

@ropensci-review-bot
Collaborator

🚀

Editor check started

👋

@ropensci-review-bot
Collaborator

Checks for daiquiri (v0.7.1)

git hash: c316aeb1

  • ✔️ Package name is available
  • ✔️ has a 'codemeta.json' file.
  • ✔️ has a 'contributing' file.
  • ✔️ uses 'roxygen2'.
  • ✔️ 'DESCRIPTION' has a URL field.
  • ✔️ 'DESCRIPTION' has a BugReports field.
  • ✔️ Package has at least one HTML vignette
  • ✖️ These functions do not have examples: [daiquiri].
  • ✔️ Package has continuous integration checks.
  • ✖️ Package coverage failed
  • ✔️ R CMD check found no errors.
  • ✔️ R CMD check found no warnings.

Important: All failing checks above must be addressed prior to proceeding

Package License: GPL (>=3)


1. Package Dependencies

Details of Package Dependency Usage (click to open)

The table below tallies all function calls to all packages ('ncalls'), both internal (r-base + recommended, along with the package itself), and external (imported and suggested packages). 'NA' values indicate packages to which no identified calls to R functions could be found. Note that these results are generated by an automated code-tagging system which may not be entirely accurate.

type         package      ncalls
----------   ----------   ------
internal     base            465
internal     daiquiri        120
internal     graphics          6
imports      ggplot2          63
imports      stats            26
imports      data.table       17
imports      utils             9
imports      scales            8
imports      readr             5
imports      cowplot           5
imports      rmarkdown         1
imports      reactable        NA
suggests     covr             NA
suggests     knitr            NA
suggests     testthat         NA
suggests     codemetar        NA
linking_to   NA               NA

Click below for tallies of functions used in each package. Locations of each call within this package may be generated locally by running 's <- pkgstats::pkgstats(<path/to/repo>)', and examining the 'external_calls' table.

base

list (83), c (29), format (23), sum (22), length (20), names (20), vapply (17), structure (16), is.na (15), for (12), inherits (11), with (11), character (10), is.nan (10), max (9), suppressWarnings (8), seq_along (7), class (6), labels (6), min (6), paste0 (6), which (6), as.character (5), call (5), logical (5), message (5), ncol (5), options (5), anyNA (4), comment (4), lapply (4), mean (4), quote (4), unique (4), as.integer (3), file (3), nrow (3), vector (3), by (2), ceiling (2), data.frame (2), file.path (2), formals (2), grep (2), nchar (2), strsplit (2), which.max (2), as.Date (1), as.list (1), as.numeric (1), col (1), do.call (1), dQuote (1), emptyenv (1), floor (1), gsub (1), is.symbol (1), log10 (1), missing (1), new.env (1), open (1), paste (1), q (1), rle (1), row.names (1), save (1), seq (1), sort (1), substring (1), sys.calls (1), system.file (1), typeof (1), unlist (1), warning (1), withCallingHandlers (1)

daiquiri

fieldtypes (15), datafield (14), fieldtype (11), get_datafield_max (4), get_datafield_min (4), is.fieldtype_timepoint (3), timepoint_as_aggregationunit (3), aggregate_data (2), aggtype_friendlyname (2), fieldtypes_to_cols (2), ft_allfields (2), ft_ignore (2), ft_timepoint (2), get_datafield_missing (2), identify_duplicaterows (2), is.fieldtype (2), log_initialise (2), plot_overview_heatmap_static (2), plot_overview_totals_static (2), aggregateallfields (1), aggregatefield (1), create_report (1), export_aggregated_data (1), fieldtypes_template (1), fieldtypes_to_string (1), ft_categorical (1), ft_datetime (1), ft_duplicates (1), ft_freetext (1), ft_numeric (1), ft_simple (1), ft_uniqueidentifier (1), get_aggfunctions (1), get_collector (1), get_dataclass (1), get_datafield_basetype (1), get_datafield_count (1), get_datafield_fieldtype_name (1), get_datafield_validation_warnings_n (1), get_datafield_vector (1), get_fieldtype_name (1), is.aggregatedata (1), is.aggregatefield (1), is.datafield (1), is.fieldtype_calculated (1), is.fieldtype_datetime (1), is.fieldtype_ignore (1), is.fieldtype_numeric (1), is.fieldtypes (1), is.sourcedata (1), log_close (1), log_function_end (1), log_function_start (1), log_message (1), plot_overview_combo_static (1), plot_timeseries_static (1), prepare_data (1), report_data (1), summarise_aggregated_data (1), summarise_source_data (1), yscale_breaks (1)

ggplot2

element_text (17), theme (7), ggplot (6), element_blank (5), labs (5), aes_string (4), a (3), element_rect (3), scale_fill_gradient (3), scale_x_date (3), facet_grid (2), geom_point (2), geom_line (1), scale_y_continuous (1), unit (1)

stats

df (10), dt (7), heatmap (6), median (3)

data.table

data.table (10), fifelse (4), as.data.table (2), copy (1)

utils

data (7), object.size (1), packageNa (1)

scales

label_date_short (5), breaks_pretty (3)

graphics

title (6)

cowplot

plot_grid (5)

readr

read_delim (2), col_character (1), cols (1), type_convert (1)

rmarkdown

render (1)


2. Statistical Properties

This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.

Details of statistical properties (click to open)

The package has:

  • code in R (100% in 8 files) and
  • 1 authors
  • 1 vignette
  • no internal data file
  • 9 imported packages
  • 18 exported functions (median 10 lines of code)
  • 130 non-exported functions in R (median 8 lines of code)

Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages
The following terminology is used:

  • loc = "Lines of Code"
  • fn = "function"
  • exp/not_exp = exported / not exported

All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by the checks_to_markdown() function

The final measure (fn_call_network_size) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile.

measure                       value   percentile   noteworthy
-------------------------   -------   ----------   ----------
files_R                           8         50.7
files_vignettes                   1         68.4
files_tests                       7         86.4
loc_R                          1314         75.1
loc_vignettes                   143         37.3
loc_tests                       690         80.9
num_vignettes                     1         64.8
n_fns_r                         148         84.9
n_fns_r_exported                 18         64.2
n_fns_r_not_exported            130         88.2
n_fns_per_file_r                 11         88.4
num_params_per_fn                 2         10.4
loc_per_fn_r                      8         20.0
loc_per_fn_r_exp                 10         22.2
loc_per_fn_r_not_exp              8         22.6
rel_whitespace_R                 18         74.6
rel_whitespace_vignettes         45         47.5
rel_whitespace_tests             21         80.2
doclines_per_fn_exp              68         78.4
doclines_per_fn_not_exp           0          0.0   TRUE
fn_call_network_size            141         84.4

2a. Network visualisation

Click to see the interactive network visualisation of calls between objects in package


3. goodpractice and other checks

Details of goodpractice and other checks (click to open)

3a. Continuous Integration Badges

R-CMD-check

GitHub Workflow Results

id           name            conclusion   sha      run_number   date
----------   -------------   ----------   ------   ----------   ----------
2332879957   R-CMD-check     success      c316ae           17   2022-05-16
2332879944   test-coverage   success      c316ae           16   2022-05-16

3b. goodpractice results

R CMD check with rcmdcheck

R CMD check generated the following note:

  1. checking dependencies in R code ... NOTE
    Namespace in Imports field not imported from: ‘reactable’
    All declared Imports should be used.

R CMD check generated the following check_fails:

  1. cyclocomp
  2. rcmdcheck_imports_not_imported_from

Test coverage with covr

ERROR: Test Coverage Failed

Cyclocomplexity with cyclocomp

The following functions have cyclocomplexity >= 15:

function cyclocomplexity
validate_params_type 84
aggregatefield 56

Static code analyses with lintr

lintr found the following 550 potential issues:

message number of times
Lines should not be more than 80 characters. 550


Package Versions

package version
pkgstats 0.0.4.30
pkgcheck 0.0.3.19


Editor-in-Chief Instructions:

Processing may not proceed until the items marked with ✖️ have been resolved.

@phuongquan
Member Author

Hi, I think there is a problem with pkgcheck.

As you can see from the badge on https://github.com/phuongquan/daiquiri, code coverage is 96%, so I don't know why the Test Coverage check has failed your side.

Also, as I mentioned in the submission, the item "These functions do not have examples: [daiquiri]" does not appear to be appropriate since daiquiri is not a function but is the package itself (following https://r-pkgs.org/man.html#man-packages). If you want me to add examples for a selection of functions in the package I can do so, but it is not clear to me if that is appropriate.
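For readers unfamiliar with the convention referenced above: a package-level help page documents the sentinel `"_PACKAGE"` rather than a function. It is typically generated by usethis::use_package_doc(), which produces roughly:

```r
# R/daiquiri-package.R -- package-level documentation, not a function.
# roxygen2 turns this block into the ?daiquiri help page.
#' @keywords internal
"_PACKAGE"
```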

Thanks.

@mpadge
Member

mpadge commented May 16, 2022

@phuongquan You're right about the functions not having examples. That was a bug which has now been fixed, so thanks! You can ignore that result. The coverage issue is, however, more problematic. I've tried to run covr::package_coverage() (as well as codecov()) in a couple of different docker containers and always encounter this error:

simpleWarning in utils::install.packages(repos = NULL, lib = tmp_lib, pkg$path, : installation of package ‘/daiquiri’ had non-zero exit status

simpleWarning in file(con, "r"): cannot open file '/tmp/RtmpRuQwPt/R_LIBS14f26d8825f2/daiquiri/R/daiquiri': No such file or directory

Error in file(con, "r"): cannot open the connection

This is clearly what causes the failures on our system. You should be able to reproduce by cloning your repo in a clean docker container and trying for yourself. I nevertheless see that the GitHub action works, and that your coverage is indeed as you claim, so you may also choose to ignore that if you like. My guess is that it's related to {renv}. covr installs the package in a temporary library location, but our check system loads the package prior to that and so populates your renv directory. That then does not get copied across to the temporary library location used by covr, and so installation fails. It works on GitHub actions because that uses a fresh clone of the repo, which populates renv in the temporary library location.
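A hedged sketch of the local reproduction being suggested (the path is hypothetical; the package's dependencies must be installed first, and it should be run outside any renv-activated session):

```r
# Run covr against a fresh clone, mirroring what the check system attempts.
# covr installs the package into a temporary library before measuring
# coverage, which is the step that renv appears to interfere with.
cov <- covr::package_coverage("path/to/fresh/clone/of/daiquiri")
covr::percent_coverage(cov)  # the GitHub Action reports ~96
```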

@phuongquan
Member Author

Hi, I just wanted to check whether there is anything you are waiting on from me.

@emilyriederer

Hi @phuongquan - I apologize for the delay on my end. Thank you again for your submission. I will work on finding an editor and follow up shortly.

@emilyriederer

emilyriederer commented May 30, 2022

@ropensci-review-bot assign @maurolepore as editor

@emilyriederer

@ropensci-review-bot assign @maurolepore as editor

@ropensci-review-bot
Collaborator

Assigned! @maurolepore is now the editor

@maurolepore
Member

Hi @phuongquan, I'm pleased to be the handling editor of this submission.

I'll re-run checks now. I see the comments above suggest some checks may fail for reasons beyond your control. I'll explore those issues and walk through the editor's template by the end of this week.

@maurolepore
Member

@ropensci-review-bot check package

@ropensci-review-bot
Collaborator

Thanks, about to send the query.

@ropensci-review-bot
Collaborator

🚀

Editor check started

👋

@ropensci-review-bot
Collaborator

Checks for daiquiri (v0.7.1)

git hash: c316aeb1

  • ✔️ Package name is available
  • ✔️ has a 'codemeta.json' file.
  • ✔️ has a 'contributing' file.
  • ✔️ uses 'roxygen2'.
  • ✔️ 'DESCRIPTION' has a URL field.
  • ✔️ 'DESCRIPTION' has a BugReports field.
  • ✔️ Package has at least one HTML vignette
  • ✔️ All functions have examples.
  • ✖️ Function names are duplicated in other packages
  • ✔️ Package has continuous integration checks.
  • ✖️ Package coverage failed
  • ✔️ R CMD check found no errors.
  • ✔️ R CMD check found no warnings.

Important: All failing checks above must be addressed prior to proceeding

Package License: GPL (>=3)


1. Package Dependencies

Details of Package Dependency Usage (click to open)

The table below tallies all function calls to all packages ('ncalls'), both internal (r-base + recommended, along with the package itself), and external (imported and suggested packages). 'NA' values indicate packages to which no identified calls to R functions could be found. Note that these results are generated by an automated code-tagging system which may not be entirely accurate.

type         package      ncalls
----------   ----------   ------
internal     base            465
internal     daiquiri        120
internal     graphics          6
imports      ggplot2          63
imports      stats            26
imports      data.table       17
imports      utils             9
imports      scales            8
imports      readr             5
imports      cowplot           5
imports      rmarkdown         1
imports      reactable        NA
suggests     covr             NA
suggests     knitr            NA
suggests     testthat         NA
suggests     codemetar        NA
linking_to   NA               NA

Click below for tallies of functions used in each package. Locations of each call within this package may be generated locally by running 's <- pkgstats::pkgstats(<path/to/repo>)', and examining the 'external_calls' table.

base

list (83), c (29), format (23), sum (22), length (20), names (20), vapply (17), structure (16), is.na (15), for (12), inherits (11), with (11), character (10), is.nan (10), max (9), suppressWarnings (8), seq_along (7), class (6), labels (6), min (6), paste0 (6), which (6), as.character (5), call (5), logical (5), message (5), ncol (5), options (5), anyNA (4), comment (4), lapply (4), mean (4), quote (4), unique (4), as.integer (3), file (3), nrow (3), vector (3), by (2), ceiling (2), data.frame (2), file.path (2), formals (2), grep (2), nchar (2), strsplit (2), which.max (2), as.Date (1), as.list (1), as.numeric (1), col (1), do.call (1), dQuote (1), emptyenv (1), floor (1), gsub (1), is.symbol (1), log10 (1), missing (1), new.env (1), open (1), paste (1), q (1), rle (1), row.names (1), save (1), seq (1), sort (1), substring (1), sys.calls (1), system.file (1), typeof (1), unlist (1), warning (1), withCallingHandlers (1)

daiquiri

fieldtypes (15), datafield (14), fieldtype (11), get_datafield_max (4), get_datafield_min (4), is.fieldtype_timepoint (3), timepoint_as_aggregationunit (3), aggregate_data (2), aggtype_friendlyname (2), fieldtypes_to_cols (2), ft_allfields (2), ft_ignore (2), ft_timepoint (2), get_datafield_missing (2), identify_duplicaterows (2), is.fieldtype (2), log_initialise (2), plot_overview_heatmap_static (2), plot_overview_totals_static (2), aggregateallfields (1), aggregatefield (1), create_report (1), export_aggregated_data (1), fieldtypes_template (1), fieldtypes_to_string (1), ft_categorical (1), ft_datetime (1), ft_duplicates (1), ft_freetext (1), ft_numeric (1), ft_simple (1), ft_uniqueidentifier (1), get_aggfunctions (1), get_collector (1), get_dataclass (1), get_datafield_basetype (1), get_datafield_count (1), get_datafield_fieldtype_name (1), get_datafield_validation_warnings_n (1), get_datafield_vector (1), get_fieldtype_name (1), is.aggregatedata (1), is.aggregatefield (1), is.datafield (1), is.fieldtype_calculated (1), is.fieldtype_datetime (1), is.fieldtype_ignore (1), is.fieldtype_numeric (1), is.fieldtypes (1), is.sourcedata (1), log_close (1), log_function_end (1), log_function_start (1), log_message (1), plot_overview_combo_static (1), plot_timeseries_static (1), prepare_data (1), report_data (1), summarise_aggregated_data (1), summarise_source_data (1), yscale_breaks (1)

ggplot2

element_text (17), theme (7), ggplot (6), element_blank (5), labs (5), aes_string (4), a (3), element_rect (3), scale_fill_gradient (3), scale_x_date (3), facet_grid (2), geom_point (2), geom_line (1), scale_y_continuous (1), unit (1)

stats

df (10), dt (7), heatmap (6), median (3)

data.table

data.table (10), fifelse (4), as.data.table (2), copy (1)

utils

data (7), object.size (1), packageNa (1)

scales

label_date_short (5), breaks_pretty (3)

graphics

title (6)

cowplot

plot_grid (5)

readr

read_delim (2), col_character (1), cols (1), type_convert (1)

rmarkdown

render (1)


2. Statistical Properties

This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.

Details of statistical properties (click to open)

The package has:

  • code in R (100% in 8 files) and
  • 1 authors
  • 1 vignette
  • no internal data file
  • 9 imported packages
  • 18 exported functions (median 10 lines of code)
  • 130 non-exported functions in R (median 8 lines of code)

Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages
The following terminology is used:

  • loc = "Lines of Code"
  • fn = "function"
  • exp/not_exp = exported / not exported

All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by the checks_to_markdown() function

The final measure (fn_call_network_size) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile.

measure                       value   percentile   noteworthy
-------------------------   -------   ----------   ----------
files_R                           8         50.7
files_vignettes                   1         68.4
files_tests                       7         86.4
loc_R                          1314         75.1
loc_vignettes                   143         37.3
loc_tests                       690         80.9
num_vignettes                     1         64.8
n_fns_r                         148         84.9
n_fns_r_exported                 18         64.2
n_fns_r_not_exported            130         88.2
n_fns_per_file_r                 11         88.4
num_params_per_fn                 2         10.4
loc_per_fn_r                      8         20.0
loc_per_fn_r_exp                 10         22.2
loc_per_fn_r_not_exp              8         22.6
rel_whitespace_R                 18         74.6
rel_whitespace_vignettes         45         47.5
rel_whitespace_tests             21         80.2
doclines_per_fn_exp              68         78.4
doclines_per_fn_not_exp           0          0.0   TRUE
fn_call_network_size            141         84.4

2a. Network visualisation

Click to see the interactive network visualisation of calls between objects in package


3. goodpractice and other checks

Details of goodpractice checks (click to open)

3a. Continuous Integration Badges

R-CMD-check

GitHub Workflow Results

id           name            conclusion   sha      run_number   date
----------   -------------   ----------   ------   ----------   ----------
2332879957   R-CMD-check     success      c316ae           17   2022-05-16
2332879944   test-coverage   success      c316ae           16   2022-05-16

3b. goodpractice results

R CMD check with rcmdcheck

R CMD check generated the following note:

  1. checking dependencies in R code ... NOTE
    Namespace in Imports field not imported from: ‘reactable’
    All declared Imports should be used.

R CMD check generated the following check_fails:

  1. cyclocomp
  2. rcmdcheck_imports_not_imported_from

Test coverage with covr

ERROR: Test Coverage Failed

Cyclocomplexity with cyclocomp

The following functions have cyclocomplexity >= 15:

function cyclocomplexity
validate_params_type 84
aggregatefield 56

Static code analyses with lintr

lintr found the following 550 potential issues:

message number of times
Lines should not be more than 80 characters. 550


4. Other Checks

Details of other checks (click to open)

✖️ The following 5 function names are duplicated in other packages:

    • aggregate_data from simITS
    • create_report from DataExplorer, prodigenr, reporter
    • log_close from logr
    • prepare_data from bbsBayes, bigstep, childsds, corporaexplorer, disaggregation, fHMM, ggasym, multigroup, multimorbidity, mutualinf, nmm, parsnip, PLNmodels, sglOptim, shapr, ssMousetrack
    • read_data from creditmodel, deaR, deforestable, diverse, ecocomDP, GeodesiCL, logib, metrix, prepdat, qtlpoly, RTextTools, sjlabelled, whippr


Package Versions

package version
pkgstats 0.0.4.55
pkgcheck 0.0.3.40


Editor-in-Chief Instructions:

Processing may not proceed until the items marked with ✖️ have been resolved.

@mpadge
Member

mpadge commented May 31, 2022

Note: We're still working on a fix for the package coverage failure there - see the package's GitHub workflow for actual coverage in the meantime: 96%. Sorry for any inconvenience @phuongquan.

@maurolepore
Member

maurolepore commented Jun 3, 2022

Dear @phuongquan,

Thanks again for this awesome submission! I'm here to help you pass review smoothly. I looked for opportunities to make the package follow rOpenSci's guidelines as closely as possible, and to make the reviewers' job as easy as possible. After the section "Editor checks" you'll see my comments:

  • I listed some issues I need you to please address before I start searching for reviewers. You may refer to them using the labels ml01, ml02, and so on.

  • I also listed some other issues for you to consider. Those have no label and won't block the process, but addressing them now ensures they won't be picked up by the reviewers.

Please let me know what questions you have.

Editor checks:

  • Documentation: The package has sufficient documentation available online (README, pkgdown docs) to allow for an assessment of functionality and scope without installing the package. In particular,

    • Is the case for the package well made?
    • Is the reference index page clear (grouped by topic if necessary)?
    • Are vignettes readable, sufficiently detailed and not just perfunctory?
  • Fit: The package meets criteria for fit and overlap.

  • Installation instructions: Are installation instructions clear enough for human users?

  • Tests: If the package has some interactivity / HTTP / plot production etc. are the tests using state-of-the-art tooling?

  • Contributing information: Is the documentation for contribution clear enough e.g. tokens for tests, playgrounds?

  • License: The package has a CRAN or OSI accepted license.

  • Project management: Are the issue and PR trackers in a good shape, e.g. are there outstanding bugs, is it clear when feature requests are meant to be tackled?


Editor comments

TODO

Documentation:

  • ml01. In README please add a basic example of the package's usage (e.g. dplyr's usage).

Installation instructions:

  • ml02. Please simplify the installation instructions. This code should install daiquiri v0.7.0 and all its dependencies (details about the repo argument):
# install.packages("devtools")
devtools::install_github("phuongquan/daiquiri@v0.7.0")

Or

# install.packages("pak")
pak::pkg_install("phuongquan/daiquiri@v0.7.0")
CONSIDER

Documentation:

Fit:

Installation instructions:

  • It's common to consider the code on GitHub as "in development" and the code on CRAN as "released". If you embrace that release model, then users installing from GitHub only need the default branch.

  • If you stick to using GitHub releases, then consider upgrading them from "pre-release" to "release" (https://github.com/phuongquan/daiquiri/releases).

  • The installation instructions are under the heading "Get started". Many popular packages instead use the heading "Installation" (e.g. dplyr, ggplot2), mainly because it's the template you get with usethis::use_readme_rmd().

Tests:

  • In my local environment devtools::test() results in 24 warnings with this message:
semi-transparency is not supported on this device: reported only once per page
devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.2.0 (2022-04-22)
#>  os       Ubuntu 20.04.4 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       America/Guatemala
#>  date     2022-06-03
#>  pandoc   2.17.1.1 @ /usr/lib/rstudio/bin/quarto/bin/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  brio          1.1.3   2021-11-30 [1] RSPM
#>  cachem        1.0.6   2021-08-19 [1] RSPM
#>  callr         3.7.0   2021-04-20 [1] RSPM
#>  cli           3.3.0   2022-04-25 [1] RSPM
#>  crayon        1.5.1   2022-03-26 [1] RSPM
#>  desc          1.4.1   2022-03-06 [1] RSPM
#>  devtools      2.4.3   2021-11-30 [1] RSPM
#>  digest        0.6.29  2021-12-01 [1] RSPM
#>  ellipsis      0.3.2   2021-04-29 [1] RSPM
#>  evaluate      0.15    2022-02-18 [1] RSPM
#>  fansi         1.0.3   2022-03-24 [1] RSPM
#>  fastmap       1.1.0   2021-01-25 [1] RSPM
#>  fs            1.5.2   2021-12-08 [1] RSPM
#>  glue          1.6.2   2022-02-24 [1] RSPM
#>  highr         0.9     2021-04-16 [1] RSPM
#>  htmltools     0.5.2   2021-08-25 [1] RSPM
#>  knitr         1.39    2022-04-26 [1] RSPM
#>  lifecycle     1.0.1   2021-09-24 [1] RSPM
#>  magrittr      2.0.3   2022-03-30 [1] RSPM
#>  memoise       2.0.1   2021-11-26 [1] RSPM
#>  pillar        1.7.0   2022-02-01 [1] RSPM
#>  pkgbuild      1.3.1   2021-12-20 [1] RSPM
#>  pkgconfig     2.0.3   2019-09-22 [1] RSPM
#>  pkgload       1.2.4   2021-11-30 [1] RSPM
#>  prettyunits   1.1.1   2020-01-24 [1] RSPM
#>  processx      3.5.3   2022-03-25 [1] RSPM
#>  ps            1.7.0   2022-04-23 [1] RSPM
#>  purrr         0.3.4   2020-04-17 [1] RSPM
#>  R.cache       0.15.0  2021-04-30 [1] RSPM
#>  R.methodsS3   1.8.1   2020-08-26 [1] RSPM
#>  R.oo          1.24.0  2020-08-26 [1] RSPM
#>  R.utils       2.11.0  2021-09-26 [1] RSPM
#>  R6            2.5.1   2021-08-19 [1] RSPM
#>  remotes       2.4.2   2021-11-30 [1] RSPM
#>  reprex        2.0.1   2021-08-05 [1] RSPM
#>  rlang         1.0.2   2022-03-04 [1] RSPM
#>  rmarkdown     2.14    2022-04-25 [1] RSPM
#>  rprojroot     2.0.3   2022-04-02 [1] RSPM
#>  rstudioapi    0.13    2020-11-12 [1] RSPM
#>  sessioninfo   1.2.2   2021-12-06 [1] RSPM
#>  stringi       1.7.6   2021-11-29 [1] RSPM
#>  stringr       1.4.0   2019-02-10 [1] RSPM
#>  styler        1.7.0   2022-03-13 [1] RSPM
#>  testthat      3.1.4   2022-04-26 [1] RSPM
#>  tibble        3.1.7   2022-05-03 [1] RSPM (R 4.2.0)
#>  usethis       2.1.6   2022-05-25 [1] RSPM (R 4.2.0)
#>  utf8          1.2.2   2021-07-24 [1] RSPM
#>  vctrs         0.4.1   2022-04-13 [1] RSPM
#>  withr         2.5.0   2022-03-03 [1] RSPM
#>  xfun          0.31    2022-05-10 [1] RSPM (R 4.2.0)
#>  yaml          2.3.5   2022-02-21 [1] RSPM
#> 
#>  [1] /home/mauro/R/x86_64-pc-linux-gnu-library/4.2
#>  [2] /opt/R/4.2.0/lib/R/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

Created on 2022-06-03 by the reprex package (v2.0.1)

License:

  • Consider running usethis::use_gpl3_license(). This is a good way to ensure the license and related infrastructure are set up correctly.

Other:

* Project '/cloud/project' loaded. [renv 0.12.2]
Warning message:
Project requested R version '4.1.2' but '4.2.0' is currently being used 
* The project may be out of sync -- use `renv::status()` for more details.
  • Consider running usethis::use_tidy_style(). This will style the code in a way that is more likely to be familiar to the reviewers.

  • Consider running spelling::spell_check_package().

  • Consider running goodpractice(). The most frequent suggestion I see is this:

"avoid long code lines, it's bad for readability. Also, many people prefer editor windows that are about 80 characters wide. Try make your lines shorter than 80 characters"

# devtools::install_github("mangothecat/goodpractice") 
goodpractice::goodpractice()
  • Consider removing the package reactable from DESCRIPTION or importing one of its functions (maybe via usethis::use_import_from()). This should avoid this R CMD check note:
checking dependencies in R code ... NOTE
  Namespace in Imports field not imported from: ‘reactable’
  All declared Imports should be used.
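A sketch of one way to silence this NOTE, assuming usethis (>= 2.0) is available — the particular function imported here is just an illustration; any reactable function referenced from the package namespace would do:

```r
# Declare an explicit import of a reactable function in the package NAMESPACE,
# so the declared Imports entry counts as "used" in R CMD check's eyes.
usethis::use_import_from("reactable", "reactable")
```

This adds an `@importFrom reactable reactable` roxygen tag to the package documentation file and regenerates NAMESPACE.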
  • Consider replacing the current package-documentation with the one generated automatically with usethis::use_package_doc(). This might help avoid the issue with the bot.

  • Consider automating package-development tasks with usethis (https://usethis.r-lib.org/). For a quick tour see https://r-pkgs.org/whole-game.html.

@phuongquan
Copy link
Member Author

Hi @maurolepore,

Many thanks for your comments. I have pushed the following changes:

  • ml01: I have added a Usage section to the README
  • ml02: I encountered a problem with your suggestion of devtools::install_github("phuongquan/daiquiri@v0.7.0") in that it failed for me when I added the option build_vignettes = TRUE. I think it might be an issue with renv and I don't know how to fix it, or whether other people would get the same error. Since I feel the vignette is very important, I have updated the installation instructions to continue to use the github release but also included a line to explicitly install the dependencies first. Please let me know if this is sufficient. Also, this should no longer be an issue once it's on CRAN.
  • ml03: I have renamed all the test files to use dashes instead of underscores

Other items:

  • I have published a pkgdown site to github pages (https://phuongquan.github.io/daiquiri/index.html)
  • I have recreated the Licence using usethis::use_gpl3_license()
  • In the README, I have renamed the Get Started section to Installation, and moved the reference to the vignette to the Usage section
  • I have reformatted much of the code and reduced line lengths where possible

Lastly, The R CMD check note

Namespace in Imports field not imported from: ‘reactable’ All declared Imports should be used

is erroneous. The 'reactable' package is in fact used (and so needs to be in the DESCRIPTION file) but is probably being missed because it is called from an Rmd file in the inst/rmd folder.

Please let me know if you need anything more from me.

Kind regards,

Phuong

@maurolepore
Copy link
Member

maurolepore commented Jun 10, 2022

Thanks! We are almost there :-)

  • ml02: I thought about this a bit more and I now think we can't ask the reviewers to consider both the HEAD of the repo and the latest release. If you want them to review the latest release, then please make it the HEAD. Then the simpler installation instructions should be sufficient. After the review you are welcome to again point users to a GitHub release if you still think that's the best idea. In general I think it's good to see what other popular packages do. That will give you a good sense of what most users will expect. Users tend to like packages that are consistent with other packages because they can reuse what they already know, lowering the overall effort they need to use the new package.

  • ml04: Please deactivate renv from the version you submit for review. I thought a bit more about this issue and I now think it's important to address it before review. Reviewers will have to deactivate renv to access packages they need for the review (e.g. devtools) and renv blocks that. Please be kind to the voluntary reviewers and do it yourself. After the review you are welcome to activate renv again.
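For reference, deactivating renv is a one-line, reversible step run from the project root (a sketch; adjust to your setup):

```r
# Remove renv's autoloader from the project .Rprofile so reviewers
# use their own library paths rather than the project-local one.
renv::deactivate()

# After the review, renv can be switched back on with:
# renv::activate()
```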

  • ml05: Please ensure the examples for create_report() and report_data() run in 5s or less. This is required for CRAN and kind to the reviewers who may run R CMD check multiple times. Also it may distract them from other more interesting issues. R CMD check shows this:

checking examples (36.8s)
   Examples with CPU (user + system) or elapsed time > 5s
                   user system elapsed
   create_report 17.695  0.679  18.859
   report_data   13.871  0.165  14.737
  • Please address the NOTE about reactable. You can do it by moving "reactable" from Imports: to Suggests, or by using any of its functions in any file in the R/ directory. If you use a function only for this purpose please write a comment explaining so -- else the reviewers might wonder what's going on. I want to avoid having this relatively simple NOTE distract reviewers from more interesting issues. In any case, you'll need to address it before you submit to CRAN.
checking dependencies in R code ... NOTE
  Namespace in Imports field not imported from: ‘reactable’
  All declared Imports should be used.

@mpadge
Copy link
Member

mpadge commented Jun 11, 2022

@maurolepore We've updated the automated checking in the meantime, so it now flags any submissions which have renv in activated mode, and require that to be de-activated for reviews to proceed.

@maurolepore
Copy link
Member

@ropensci-review-bot check package

@ropensci-review-bot
Copy link
Collaborator

Thanks, about to send the query.

@ropensci-review-bot
Copy link
Collaborator

🚀

Editor check started

👋

@ropensci-review-bot
Copy link
Collaborator

Checks for daiquiri (v0.7.2)

git hash: 089bd85e

  • ✔️ Package name is available
  • ✔️ has a 'codemeta.json' file.
  • ✔️ has a 'contributing' file.
  • ✔️ uses 'roxygen2'.
  • ✔️ 'DESCRIPTION' has a URL field.
  • ✔️ 'DESCRIPTION' has a BugReports field.
  • ✔️ Package has at least one HTML vignette
  • ✔️ All functions have examples.
  • ✖️ Function names are duplicated in other packages
  • ✔️ Package has continuous integration checks.
  • ✔️ Package coverage is 96.1%.
  • ✖️ Package has renv activated
  • ✔️ R CMD check found no errors.
  • ✔️ R CMD check found no warnings.

Important: All failing checks above must be addressed prior to proceeding

Package License: GPL (>= 3)


1. Package Dependencies

Details of Package Dependency Usage (click to open)

The table below tallies all function calls to all packages ('ncalls'), both internal (r-base + recommended, along with the package itself), and external (imported and suggested packages). 'NA' values indicate packages to which no identified calls to R functions could be found. Note that these results are generated by an automated code-tagging system which may not be entirely accurate.

type package ncalls
internal base 402
internal daiquiri 102
internal graphics 6
internal mgcv 1
imports ggplot2 37
imports stats 19
imports data.table 14
imports scales 8
imports utils 4
imports readr 3
imports cowplot NA
imports rmarkdown NA
imports reactable NA
suggests covr NA
suggests knitr NA
suggests testthat NA
suggests codemetar NA
linking_to NA NA

Click below for tallies of functions used in each package. Locations of each call within this package may be generated locally by running 's <- pkgstats::pkgstats(<path/to/repo>)', and examining the 'external_calls' table.

base

list (74), c (28), sum (21), format (20), names (19), vapply (16), for (12), is.na (12), length (12), inherits (11), with (11), message (9), structure (9), character (8), suppressWarnings (8), seq_along (7), class (6), labels (6), max (6), as.character (5), call (5), logical (5), min (5), paste0 (5), which (5), as.double (4), is.nan (4), mean (4), options (4), unique (4), by (3), lapply (3), nrow (3), vector (3), anyNA (2), as.integer (2), ceiling (2), data.frame (2), file.path (2), formals (2), grep (2), nchar (2), which.max (2), as.Date (1), as.list (1), as.numeric (1), col (1), do.call (1), dQuote (1), emptyenv (1), file (1), gsub (1), is.symbol (1), missing (1), ncol (1), new.env (1), open (1), paste (1), q (1), rle (1), row.names (1), seq (1), sort (1), strsplit (1), substring (1), sys.calls (1), system.file (1), typeof (1), unlist (1), warning (1)

daiquiri

fieldtypes (12), datafield (11), fieldtype (11), get_datafield_max (4), get_datafield_min (4), is.fieldtype_timepoint (3), timepoint_as_aggregationunit (3), aggregate_data (2), ft_allfields (2), ft_ignore (2), ft_timepoint (2), get_datafield_missing (2), identify_duplicaterows (2), is.fieldtype (2), aggregateallfields (1), aggregatefield (1), aggtype_friendlyname (1), create_report (1), export_aggregated_data (1), fieldtypes_template (1), fieldtypes_to_string (1), ft_categorical (1), ft_datetime (1), ft_duplicates (1), ft_freetext (1), ft_numeric (1), ft_simple (1), ft_uniqueidentifier (1), get_aggfunctions (1), get_collector (1), get_dataclass (1), get_datafield_basetype (1), get_datafield_count (1), get_datafield_fieldtype_name (1), get_datafield_validation_warnings_n (1), get_datafield_vector (1), get_fieldtype_name (1), is.aggregatedata (1), is.aggregatefield (1), is.datafield (1), is.fieldtype_calculated (1), is.fieldtype_datetime (1), is.fieldtype_ignore (1), is.fieldtype_numeric (1), is.fieldtypes (1), is.sourcedata (1), log_initialise (1), plot_overview_heatmap_static (1), plot_overview_totals_static (1), prepare_data (1), report_data (1), summarise_aggregated_data (1), summarise_source_data (1), yscale_breaks (1)

ggplot2

element_text (10), element_blank (5), ggplot (5), labs (4), aes_string (3), facet_grid (2), geom_point (2), theme (2), element_rect (1), geom_line (1), scale_y_continuous (1), unit (1)

stats

df (7), dt (7), median (3), heatmap (2)

data.table

data.table (9), as.data.table (2), fifelse (2), copy (1)

scales

label_date_short (5), breaks_pretty (3)

graphics

title (5), axis (1)

utils

data (3), object.size (1)

readr

cols (3)

mgcv

s (1)

NOTE: Some imported packages appear to have no associated function calls; please ensure with author that these 'Imports' are listed appropriately.


2. Statistical Properties

This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.

Details of statistical properties (click to open)

The package has:

  • code in R (100% in 8 files) and
  • 1 authors
  • 1 vignette
  • no internal data file
  • 9 imported packages
  • 18 exported functions (median 10 lines of code)
  • 130 non-exported functions in R (median 9 lines of code)

Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages
The following terminology is used:

  • loc = "Lines of Code"
  • fn = "function"
  • exp/not_exp = exported / not exported

All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by the checks_to_markdown() function

The final measure (fn_call_network_size) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile.

measure value percentile noteworthy
files_R 8 50.7
files_vignettes 1 68.4
files_tests 7 86.4
loc_R 1914 83.4
loc_vignettes 143 37.3
loc_tests 878 85.1
num_vignettes 1 64.8
n_fns_r 148 84.9
n_fns_r_exported 18 64.2
n_fns_r_not_exported 130 88.2
n_fns_per_file_r 11 88.4
num_params_per_fn 2 10.4
loc_per_fn_r 10 27.7
loc_per_fn_r_exp 10 22.2
loc_per_fn_r_not_exp 9 27.1
rel_whitespace_R 12 75.0
rel_whitespace_vignettes 45 47.5
rel_whitespace_tests 17 79.9
doclines_per_fn_exp 68 78.4
doclines_per_fn_not_exp 0 0.0 TRUE
fn_call_network_size 141 84.4

2a. Network visualisation

Click to see the interactive network visualisation of calls between objects in package


3. goodpractice and other checks

Details of goodpractice checks (click to open)

3a. Continuous Integration Badges

R-CMD-check

GitHub Workflow Results

id name conclusion sha run_number date
2462421281 pages build and deployment success 089bd8 3 2022-06-08
2462421408 R-CMD-check success 089bd8 19 2022-06-08
2462421399 test-coverage success 089bd8 18 2022-06-08

3b. goodpractice results

R CMD check with rcmdcheck

R CMD check generated the following note:

  1. checking dependencies in R code ... NOTE
    Namespace in Imports field not imported from: ‘reactable’
    All declared Imports should be used.

R CMD check generated the following check_fails:

  1. cyclocomp
  2. rcmdcheck_imports_not_imported_from

Test coverage with covr

Package coverage: 96.12

Cyclocomplexity with cyclocomp

The following functions have cyclocomplexity >= 15:

function cyclocomplexity
validate_params_type 84
aggregatefield 56

Static code analyses with lintr

lintr found the following 277 potential issues:

message number of times
Lines should not be more than 80 characters. 277


4. Other Checks

Details of other checks (click to open)

✖️ The following 5 function names are duplicated in other packages:

    • aggregate_data from simITS
    • create_report from DataExplorer, prodigenr, reporter
    • log_close from logr
    • prepare_data from bbsBayes, bigstep, childsds, corporaexplorer, disaggregation, fHMM, ggasym, multigroup, multimorbidity, mutualinf, nmm, parsnip, PLNmodels, sglOptim, shapr, ssMousetrack
    • read_data from creditmodel, deaR, deforestable, diverse, ecocomDP, GeodesiCL, logib, metrix, prepdat, qtlpoly, RTextTools, sjlabelled, whippr


Package Versions

package version
pkgstats 0.0.4.75
pkgcheck 0.0.3.60


Editor-in-Chief Instructions:

Processing may not proceed until the items marked with ✖️ have been resolved.

@phuongquan
Copy link
Member Author

Hi @maurolepore,

I have pushed the following changes:

  • ml01: I have added an example report to the package website and added a link to it in the Usage section of the README
  • ml02: I want the reviewers to consider only the HEAD of the repo so have removed the reference to the latest 'release' from the installation instructions
  • ml04: I have deactivated renv
  • ml05: I have wrapped the long-running examples in \donttest{} and they no longer appear in R CMD check. (I can't make them run any quicker as it is the rmarkdown rendering that is taking the time)
  • ml06: I have added a dummy function to stop the NOTE about reactable

Please let me know if you need anything more.

Thanks,
Phuong

@maurolepore
Copy link
Member

Thanks! I'll run checks one last time and start searching for reviewers shortly.

Could you please name three people you think might be good reviewers and explain why?

(I would appoint only one but having more helps understand what kinds of skill you think are useful.)

Here are some more non-blocking comments.

  • ml01. Maybe you could minimize your example to something like this:
library(daiquiri)

path <- daiquiri_example("raw_data.csv")
raw_data <- readr::read_csv(path)
raw_data

if (interactive()) {
  create_report(raw_data)
}

Here is a version with comments for you

library(daiquiri)

# Maybe write this helper (inspired by `readr::readr_example`)
path <- daiquiri_example("raw_data.csv")
# Maybe create a basic .csv that reads well directly with read_csv()?
# I think read_data() might be helpful but not the main goal of the package
raw_data <- readr::read_csv(path)

# Do show what the data looks like. Users may want to know if the data they
# have is a good fit for this package. Ensure to show just a bit. If it doesn't print
# as a tibble, then use `head()` or `str()`.
raw_data

# It won't run when you render README. It will run if you copy-paste this code
if (interactive()) {
  # If you add this default: `fieldtypes = fieldtypes()`
  create_report(raw_data)
}
  • ml05: Would it run faster if you make the example raw_data smaller, like 1/4 or 1/10 of the size it has now?

@maurolepore
Copy link
Member

Dear @phuongquan,

Both reviews are now ready. Looking forward to your response.

@phuongquan
Copy link
Member Author

Hi @maurolepore,

Sorry for the delayed response. I've just come back from holiday today and will take a look at all the comments asap.

Thanks.

@phuongquan
Copy link
Member Author

@ropensci-review-bot check package

@ropensci-review-bot
Copy link
Collaborator

Thanks, about to send the query.

@ropensci-review-bot
Copy link
Collaborator

🚀

The following problem was found in your submission template:

  • HTML variable [due-dates-list] is missing
    Editors: Please ensure these problems with the submission template are rectified. Package checks have been started regardless.

👋

@ropensci-review-bot
Copy link
Collaborator

Checks for daiquiri (v0.7.3.9000)

git hash: 90d8b074

  • ✔️ Package name is available
  • ✔️ has a 'codemeta.json' file.
  • ✔️ has a 'contributing' file.
  • ✔️ uses 'roxygen2'.
  • ✔️ 'DESCRIPTION' has a URL field.
  • ✔️ 'DESCRIPTION' has a BugReports field.
  • ✔️ Package has at least one HTML vignette
  • ✔️ All functions have examples.
  • ✔️ Package has continuous integration checks.
  • ✔️ Package coverage is 95.1%.
  • ✔️ R CMD check found no errors.
  • ✔️ R CMD check found no warnings.
  • 👀 Function names are duplicated in other packages

(Checks marked with 👀 may be optionally addressed.)

Package License: GPL (>= 3)


1. Package Dependencies

Details of Package Dependency Usage (click to open)

The table below tallies all function calls to all packages ('ncalls'), both internal (r-base + recommended, along with the package itself), and external (imported and suggested packages). 'NA' values indicate packages to which no identified calls to R functions could be found. Note that these results are generated by an automated code-tagging system which may not be entirely accurate.

type package ncalls
internal base 394
internal daiquiri 100
internal graphics 6
imports ggplot2 33
imports stats 25
imports data.table 14
imports scales 8
imports readr 3
imports utils 3
imports reactable 1
imports cowplot NA
imports rmarkdown NA
suggests covr NA
suggests knitr NA
suggests testthat NA
suggests codemetar NA
linking_to NA NA

Click below for tallies of functions used in each package. Locations of each call within this package may be generated locally by running 's <- pkgstats::pkgstats(<path/to/repo>)', and examining the 'external_calls' table.

base

list (69), c (24), sum (21), length (19), names (19), format (18), vapply (16), is.na (14), for (12), is.nan (10), with (10), character (8), inherits (8), structure (8), suppressWarnings (8), seq_along (7), as.character (6), class (6), labels (6), max (6), call (5), logical (5), message (5), paste0 (5), which (5), mean (4), min (4), options (4), unique (4), by (3), lapply (3), nrow (3), vector (3), anyNA (2), as.double (2), as.integer (2), ceiling (2), file.path (2), formals (2), grep (2), nchar (2), paste (2), which.max (2), as.Date (1), as.list (1), as.numeric (1), col (1), data.frame (1), do.call (1), dQuote (1), emptyenv (1), file (1), gsub (1), is.symbol (1), missing (1), ncol (1), new.env (1), open (1), q (1), rle (1), row.names (1), seq (1), sort (1), substring (1), summary (1), sys.calls (1), system.file (1), typeof (1), warning (1)

daiquiri

field_types (13), field_type (11), data_field (7), data_field_max (3), data_field_min (3), is_field_type (3), timepoint_as_timepoint_group (3), aggregate_data (2), create_timepoint_groups (2), data_field_missing (2), data_field_vector (2), ft_allfields (2), ft_timepoint (2), identify_duplicate_rows (2), initialise_log (2), is_ft_timepoint (2), agg_fun_friendly_name (1), aggregate_combined_fields (1), aggregate_field (1), close_log (1), create_report (1), data_field_basetype (1), data_field_count (1), data_field_type (1), data_field_validation_warnings_n (1), dummy_reactable_call (1), export_aggregated_data (1), field_type_aggregation_functions (1), field_type_collector (1), field_type_data_class (1), field_type_type (1), field_types_to_string (1), ft_categorical (1), ft_datetime (1), ft_duplicates (1), ft_freetext (1), ft_ignore (1), ft_numeric (1), ft_simple (1), ft_uniqueidentifier (1), is_aggregated_data (1), is_aggregated_field (1), is_data_field (1), is_field_types (1), is_ft_calculated (1), is_ft_datetime (1), is_ft_ignore (1), is_ft_numeric (1), plot_overview_heatmap_static (1), plot_overview_totals_static (1), prepare_data (1), report_data (1), summarise_aggregated_data (1), summarise_source_data (1), yscale_breaks (1)

ggplot2

element_text (10), element_blank (5), labs (4), ggplot (3), aes_string (2), facet_grid (2), geom_point (2), theme (2), element_rect (1), geom_line (1), unit (1)

stats

dt (15), df (5), median (3), heatmap (2)

data.table

data.table (9), as.data.table (2), fifelse (2), copy (1)

scales

label_date_short (5), breaks_pretty (3)

graphics

title (5), axis (1)

readr

cols (3)

utils

data (2), object.size (1)

reactable

colDef (1)

NOTE: Some imported packages appear to have no associated function calls; please ensure with author that these 'Imports' are listed appropriately.


2. Statistical Properties

This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.

Details of statistical properties (click to open)

The package has:

  • code in R (100% in 8 files) and
  • 1 authors
  • 1 vignette
  • no internal data file
  • 9 imported packages
  • 18 exported functions (median 11 lines of code)
  • 134 non-exported functions in R (median 10 lines of code)

Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages
The following terminology is used:

  • loc = "Lines of Code"
  • fn = "function"
  • exp/not_exp = exported / not exported

All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by the checks_to_markdown() function

The final measure (fn_call_network_size) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile.

measure value percentile noteworthy
files_R 8 50.7
files_vignettes 1 68.4
files_tests 7 86.4
loc_R 2186 85.8
loc_vignettes 145 37.9
loc_tests 1169 89.1
num_vignettes 1 64.8
n_fns_r 152 85.4
n_fns_r_exported 18 64.2
n_fns_r_not_exported 134 88.6
n_fns_per_file_r 11 88.6
num_params_per_fn 2 30.8
loc_per_fn_r 10 31.6
loc_per_fn_r_exp 11 25.1
loc_per_fn_r_not_exp 10 31.3
rel_whitespace_R 11 75.9
rel_whitespace_vignettes 44 47.5
rel_whitespace_tests 13 80.0
doclines_per_fn_exp 76 82.9
doclines_per_fn_not_exp 0 0.0 TRUE
fn_call_network_size 143 84.6

2a. Network visualisation

Click to see the interactive network visualisation of calls between objects in package


3. goodpractice and other checks

Details of goodpractice checks (click to open)

3a. Continuous Integration Badges

R-CMD-check

GitHub Workflow Results

id name conclusion sha run_number date
3256586257 pages build and deployment success 90d8b0 26 2022-10-15
3256586295 R-CMD-check success 90d8b0 42 2022-10-15
3256586291 test-coverage success 90d8b0 41 2022-10-15

3b. goodpractice results

R CMD check with rcmdcheck

R CMD check generated the following check_fail:

  1. cyclocomp

Test coverage with covr

Package coverage: 95.13

Cyclocomplexity with cyclocomp

The following functions have cyclocomplexity >= 15:

function cyclocomplexity
validate_params_type 84
aggregate_field 56

Static code analyses with lintr

lintr found the following 210 potential issues:

message number of times
Avoid library() and require() calls in packages 1
Lines should not be more than 80 characters. 209


4. Other Checks

Details of other checks (click to open)

✖️ The following 4 function names are duplicated in other packages:

    • aggregate_data from simITS
    • create_report from DataExplorer, prodigenr, reporter
    • prepare_data from bbsBayes, bigstep, childsds, corporaexplorer, disaggregation, fHMM, ggasym, multigroup, multimorbidity, mutualinf, nmm, parsnip, PLNmodels, sglOptim, shapr, ssMousetrack
    • read_data from creditmodel, deaR, deforestable, diverse, ecocomDP, GeodesiCL, logib, metrix, prepdat, qtlpoly, RTextTools, sjlabelled, whippr


Package Versions

package version
pkgstats 0.1.1.54
pkgcheck 0.1.0.24


Editor-in-Chief Instructions:

This package is in top shape and may be passed on to a handling editor

@phuongquan
Copy link
Member Author

Dear @mbcann01 and @maurolepore,

Many thanks for your time and effort and patience in reviewing this package, I really appreciate it. Please see my responses to your comments below, with links to the github commit where relevant. (And apologies for some loss of formatting)

Response to @mbcann01

The example in the "why should I use it" section of the README. The figures are nice, but I'm not 100% sure what connections I'm supposed to be making. Perhaps showing the Y-axis would help me to interpret the figures. Some users may even appreciate you spelling it out for them. Something like, "in the first figure... As you can see, that is reflected in the second figure as...".
Response: I have updated the text describing the figures and increased the size of the x-axis labels to try to make it clearer what is going on. (ropensci/daiquiri@8b2c981)

Example data. This is certainly optional, but I've started including example data in my packages. In the README, it looks as though you are including an example csv file in inst/extdata. To make testing/following along even easier for your users, you may want to consider converting this to exported data that users can load with data(daiquiri_data) or something like that. Here are some instructions if you are interested: https://r-pkgs.org/data.html#sec-data-data
Response: I have deliberately included a csv file rather than an R-specific format in order to be able to demonstrate all the functionality of the package, which includes reading in data from a csv.

I didn't see instructions for contributing anywhere on the README. I apologize if I just overlooked them.
Response: The contributing instructions are in the CONTRIBUTING file in the .github folder.

Response to @maurolepore

o [ ] (r02-01): In the vignette consider removing the literal reference to
"NOTE". The text itself is noteworthy and the announcement "NOTE" can be
dropped and yet be clear. Similarly, the headings "Example:" seems redundant
when it's followed by the text "For example, ...". You may drop the heading as
the sections are short and read well without subdivision.
Response: I have considered each appearance of "NOTE" in the vignette, and have removed/consolidated those instances where I feel they do not add anything, while I have left them in where I judge they serve to re-focus the attention of the reader (ropensci/daiquiri@7fd3143). I have left in the Example headings as I generally find it useful to highlight the locations of examples so that readers can easily find them when skimming the text.

o [ ] (r2-02): Please ensure to document the return value of every function,
even if it is NULL. CRAN enforces it. For example, fix this in log_close().
Response: I have updated functions that previously returned NULL to return invisibly instead, and documented (ropensci/daiquiri@b66ca9a). I have updated close_log() to now return a path to the closed log file, and documented (ropensci/daiquiri@f05f70b).

o [ ] (r2-03): Ensure to format code and links as such.
With roxygen2 you do this with backticks and square brackets, respectively.
For example, the documentation of fieldtypes_template() mentions
"fieldtype()" -- which is not formatted as code nor linked to its helpfile.
Response: I have checked all documentation and updated code and link formats where found (ropensci/daiquiri@ac191e1)

o [ ] (r2-04): The package documentation typically lives in R/packagename.R
not R/main.R. See ?usethis::use_package_doc() and
?usethis::use_import_from().
Response: I have moved the package documentation (ropensci/daiquiri@7f52bca, ropensci/daiquiri@0526c90)

o [ ] (r2-05): Consider using more of the convenient features of roxygen2,
e.g. use [some_link] instead of \link{some_link} and `some_code` instead of
\code{some_code}.
Response: I have enabled markdown and updated code and link formats where found (ropensci/daiquiri@ac191e1)

o (r2-06): Ensure the use of \donttest{}, \dontrun{}, and \dontshow{} as
per CRAN policies. They are common sources of rejection.
Response: I have spent quite some time reading about this and am still not 100% clear on what correct practice is. I have opted to use \donttest{} to wrap (unavoidably) long-running code as it has been suggested that \dontrun{} may cause problems with initial CRAN submission. (r-lib/devtools#2216)
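The pattern I have settled on looks roughly like this (illustrative only — the example body below is a placeholder, not the actual documented example):

```r
#' @examples
#' \donttest{
#' # Rendering the rmarkdown report takes well over 5s, so wrapping it in
#' # \donttest{} means CRAN's timed example checks skip it by default,
#' # while users can still run it themselves.
#' create_report(raw_data)
#' }
```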

o [ ] (r2-07): Whenever possible, print the result of the examples. For
example, the result of fieldtypes() is assigned to fts but not printed.
Printed results help the reader see the output without the need to run the
code themselves. Also, please show one example of a ft_*() function (e.g.
ft_simple()) so the reader can see its structure.
Response: I have done this (ropensci/daiquiri@c3d5fb7)

o [ ] (r2-08): Performance might be an issue. I see tests and examples that
take a long time to run. Consider using smaller data or removing performance
bottlenecks.
Response: It is the knitting of the rmarkdown document (which I have no control over) that takes up all the time for the main and reports tests so I do not see how I can make this any quicker unless I don't test the creation of the markdown document, which I cannot do as it is the main purpose of the package. User feedback I have received so far has indicated that performance is not a problem. Please also see my response to point (r2-09).

o [ ] (r2-09): I fail to run test-sourcedata.R -- it takes too long or freezes.
After removing test-sourcedata.R I successfully run all tests but they still
take too long. The slower your tests, the less frequently you'll run them, and
the more your code becomes prone to errors. For comparison, run the tests
of other popular packages on CRAN.
ℹ Testing daiquiri
✔ | F W S OK | Context
✔ | 133 | aggregatedata [13.4s]
✔ | 15 | fieldtypes [1.0s]
✔ | 24 | main [240.8s]
✔ | 7 | reports [110.8s]
✔ | 45 | utilities [1.4s]
Response: I don't understand why the tests take so long when you run them, as they run in a fraction of the time for me (on a Windows laptop, Intel i7 processor, 16GB RAM), and sourcedata is not nearly the longest running one:
ℹ Testing daiquiri
✔ | F W S OK | Context
✔ | 133 | aggregatedata [1.4s]
✔ | 15 | fieldtypes
✔ | 24 | main [39.2s]
✔ | 7 | reports [19.4s]
✔ | 38 | sourcedata [5.1s]
✔ | 45 | utilities [0.2s]
Even on an underpowered Windows laptop (Intel i5 processor, 8GB RAM) the tests only took the following amounts of time:
ℹ Testing daiquiri
✔ | F W S OK | Context
✔ | 133 | aggregatedata [3.5s]
✔ | 15 | fieldtypes [0.2s]
✔ | 24 | main [64.6s]
✔ | 7 | reports [24.8s]
✔ | 38 | sourcedata [8.6s]
✔ | 45 | utilities [0.4s]
Unfortunately I don't have access to any other operating systems to be able to test them.
And as mentioned before, it is the knitting of the rmarkdown document that takes up all the time for the main and reports tests. I do not see how I can make this any quicker, short of not testing the creation of the markdown document at all, which I cannot do as it is the main purpose of the package. Please advise if there is anything further you expect me to do.

o [ ] (r2-10) Adapt the tests or error-class so that each test is sensitive to
a specific error. I see a number of cases where different errors are tested
against the same class and no other property of the error.
Response: I have split tests that tested multiple behaviours into smaller units (ropensci/daiquiri@e3b1f82)

o [ ] (r2-11): In general, try to use one test file for each function you test
(e.g. usually each exported function). This plays well with development
functions in the usethis package like use_test() and rename_files().
Because testthat automatically prints the file name in the
test report, this approach allows you to not type the name of the function
inside the call to test_that() -- making tests easier to write and maintain.

 # good
 # tests/testthat/test-f.R
 test_that("works as expected", { code })

 # bad
 # tests/testthat/test-main.R

 test_that("f() works as expected", { code })

Response: I have followed the file structure recommended in https://style.tidyverse.org/tests.html, i.e.

"The organisation of test files should match the organisation of R/ files: if a function lives in R/foofy.R, then its tests should live in tests/testthat/test-foofy.R."

I have therefore chosen not to change this.

o [ ] (r2-12): Make your tests granular enough so that a specific failure would
point to a specific expectation, and thus a specific piece of code. This will
make debugging easier. In validate_columnnames() I see a single call to
test_that() has multiple expectations, some expecting good behavior and
others expecting errors; they will be more informative if those different
expectations live in different tests.
Response: I have split tests that tested multiple behaviours into smaller units (ropensci/daiquiri@e3b1f82)
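
As an illustration of the granular pattern now used (hypothetical function and error class, not actual daiquiri code):

```r
library(testthat)

# toy function that signals a classed error
divide <- function(x, y) {
  if (y == 0) {
    stop(errorCondition("cannot divide by zero", class = "divide_by_zero_error"))
  }
  x / y
}

test_that("divide() returns the quotient for valid input", {
  expect_equal(divide(10, 2), 5)
})

test_that("divide() signals a classed error when dividing by zero", {
  # matching on a specific class keeps the test sensitive to this error only
  expect_error(divide(1, 0), class = "divide_by_zero_error")
})
```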

o [ ] (r2-13): Please resolve the items marked with "X" by the command
@ropensci-review-bot check package (see
#535 (comment)).
The section Editor-in-chief instructions state "Processing may not proceed
until the items marked with ✖️ have been resolved."
Response: I have re-run the bot and there are no more ✖️

o [ ] (r2-14): Please note all results by the command @ropensci-review-bot check package and look for opportunities to improve the package. For example,
the goodpractice section reports that many lines are longer than the
recommended width of 80 characters. You may reflow lines relatively easy (and
semi-automatically) and improve readability by a lot.
Response: I really find the 80 character limit to be near impossible to achieve and applying styler did not reduce the number of these. I have manually re-flowed cases where I could see the benefit, but didn't do those which are only slightly over 80 characters. (ropensci/daiquiri@8c8b29d)
Re the "Function names are duplicated in other packages" check, I am considering renaming create_report() to daiquiri_report(), which violates the verb_noun format but increases uniqueness. I also wonder whether read_data() would benefit from a suffix (e.g. read_data_raw(), read_data_as_char(), read_data_unconverted(), read_data_plain()) but cannot decide if any of these are actually any better. For the other functions, I cannot see how duplication can be avoided unless I prefix every exported function with the package name (which I find ugly and distracting). And even if it were possible to find globally unique yet concise and descriptive function names, there's nothing to stop someone else creating a function with that same name at a later date.

o [ ] (r2-15): See rOpenSci's guidelines about code
style.
Using a style that is widely used in R and particularly in rOpenSci would be
more friendly for contributors.
For more information on how to style your code, name functions, and R scripts
inside the R/ folder, we recommend reading the code chapter in The R Packages
book. We recommend the styler package for automating part of the code styling.
We suggest reading the Tidyverse style guide.
For example, please use snake case consistently everywhere -- including the
names of functions, files, arguments, and variables:
INCONSISTENT CONSISTENT
override_columnnames -> override_column_names
showprogress -> show_progress
save_directory -> save_directory
Response: I have run styler on the whole package (ropensci/daiquiri@4aeb7a9). I have also implemented snake case across the package (ropensci/daiquiri@4c60f78, ropensci/daiquiri@628b19f, ropensci/daiquiri@f26164a, ropensci/daiquiri@1501580, ropensci/daiquiri@4d653ef, ropensci/daiquiri@244a156). I have also reorganised the functions within each file to put the exported ones first and added separators between functions (ropensci/daiquiri@cb4c703).

o (r2-17): See rOpenSci's guidelines about naming your
package. In a Google search the name "daiquiri" seems to be too common to help
discover the package.
Response: I have thought hard about the name and have selected it for phonetic reasons (daiquiri as shorthand for Data Quality Reporting) and for ease of remembering it. When Googling "daiquiri package" it appears on the first page.

o [ ] (r2-18): Ensure exported functions use the format verb_noun()
consistently. For example I see this format in create_report() but not in
log_close().
Response: I have renamed log_close() to close_log(), log_initialise() to initialise_log(), and fieldtypes_template() to print_field_types_template() (ropensci/daiquiri@9ee0373, ropensci/daiquiri@52d302b)

o [ ] (r2-19): Use related prefixes consistently. For example fieldtype() and
fieldtype_template() are consistent, but ft_timepoint() is not. (But note that
field_type_*() is better because it's also consistent with other names
using snake case.)
Response: I have replaced fieldtype()/fieldtypes() with field_type()/field_types() (ropensci/daiquiri@300ec04). However, while I can see that field_type_timepoint() etc is more consistent than ft_timepoint() etc, I find the ft_ prefix much more concise and easier to read. e.g.

fts <- fieldtypes(
	PatientID = ft_uniqueidentifier(),
	TestID = ft_ignore(),
	TestDate = ft_timepoint(),
	TestName = ft_categorical(aggregate_by_each_category = FALSE),
	TestResult = ft_numeric(),
	ResultDate = ft_datetime(),
	ResultComment = ft_freetext(),
	Location = ft_categorical())

versus:

fts <- fieldtypes(
	PatientID = field_type_uniqueidentifier(),
	TestID = field_type_ignore(),
	TestDate = field_type_timepoint(),
	TestName = field_type_categorical(aggregate_by_each_category = FALSE),
	TestResult = field_type_numeric(),
	ResultDate = field_type_datetime(),
	ResultComment = field_type_freetext(),
	Location = field_type_categorical())

I therefore have a preference to not change this unless it is considered absolutely necessary.

• [ ] Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.
Response: I have added both reviewers to the DESCRIPTION file. (ropensci/daiquiri@d7511a1)

• (r2-20) Recently I have seen a push towards quarto. Consider the pros and cons
of quarto versus rmarkdown.
Response: I have seen and read a little bit about quarto and my understanding is that currently it doesn't add very much to existing rmarkdown functionality but rather is more valuable for use with other languages such as python. The main drawback is that it requires a separate installation which can be problematic for users in tightly-controlled environments. I will bear it in mind for the future but at present I do not think that it is beneficial to change to using quarto.

• [ ] (r2-21): Rename the example dataset more specifically to preserve the
option of adding more datasets in the future. For example, if later you have two
datasets, the names example_dataset.csv and example_dataset2.csv are less
descriptive than prescriptions.csv and surgeries.csv.
Response: I have renamed it to example_prescriptions.csv (ropensci/daiquiri@0858915)

• [ ] (r2-22): Remove unused code. For example, I see commented-out code in
tests/testthat/test-utilities.R and R/changepoints.R. Should you need that code
later, you can find it with git log -S "some code". And if you plan to work
with that code soon, it may be best to move it to a GitHub issue and follow up
in a GitHub pull requests when you are ready.
Response: Commented-out code has been removed (ropensci/daiquiri@fc55cd9, ropensci/daiquiri@db31870, ropensci/daiquiri@d986e84, ropensci/daiquiri@e4f5b4e, ropensci/daiquiri@579fd90)

• [ ] (r2-23): Consider removing the function read_data(). It seems to
be a thin wrapper around readr::read_delim(). If you keep it please do the
following:

  1. Explain why users should not read the data with readr::read_delim() and
    friends directly.
  2. Consider preserving the name of the arguments to readr::read_delim()
    and their original defaults to avoid confusing users of that popular
    function.
  3. Allow users to pass arguments to readr::read_delim() via ....

Response: I created the read_data() function deliberately, after user feedback showed that the default parameters used by read_delim() etc. led to unintended behaviour (i.e. they interfere with daiquiri's ability to check for non-conformant values), so I feel it is important to keep it.

  1. I have reworded the documentation title and description to try to make the "why" clearer (ropensci/daiquiri@86536cc)
  2. I have prioritised naming consistency within the daiquiri package where there is a conflict, and otherwise have left the read_delim parameter names unchanged. The defaults have been deliberately changed to work well with daiquiri
  3. While this initially looked like it would be a straightforward addition, it has turned out to be far more complicated than expected.
    While I can see the benefit of allowing users to pass in additional parameters via ..., I am unsure how to do this without creating conflicts between the parameters I have deliberately preset (which should not be changed, otherwise the function loses its purpose) and any attempt by the user to (re-)set those same parameters. For example, I have preset col_types so that all columns are read in as character values, which is necessary for daiquiri to detect non-conformant values. If I leave this parameter out of the function definition (the current behaviour), the user might try to specify it via ... ; if I make it available in the function definition, the user may try to update it. Either way I would need to ignore what they pass in, but silently accepting and then ignoring a parameter would be confusing to the user. I have come up with the following options:

Option 1: add custom parameter checking to ensure users don't pass in certain parameters (e.g. by raising an error if they try to pass in col_types, or by ignoring the parameter and returning a warning). If I return warnings I also have to remove the undesired params from ... and then (presumably) use do.call to call read_delim() which adds complexity.

Option 2: don't include ... and state in the documentation that the read_data() function is intended as a helper for beginner users, and if the user wants to use more advanced features of read_delim() then they should use read_delim() directly.

Option 3: include ... but state in the documentation that any additional parameters are set at the user's own risk and may lead to unintended behaviour.

I have a preference for options 2 or 3 since I do not have the resources to test the potential effects of setting other read_delim() parameters such as locale(). Please let me know how you would like me to proceed.
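
For reference, Option 1 could be sketched roughly as follows (the function shape and the reserved list are illustrative, not the actual daiquiri implementation):

```r
read_data <- function(file, delim = ",", ...) {
  dots <- list(...)
  # parameters daiquiri presets deliberately; overriding them would defeat
  # the purpose of the function
  reserved <- c("col_types", "col_names")
  clashes <- intersect(names(dots), reserved)
  if (length(clashes) > 0) {
    warning("Ignoring reserved parameter(s): ", paste(clashes, collapse = ", "))
    dots <- dots[setdiff(names(dots), clashes)]
  }
  args <- c(
    list(file = file, delim = delim,
         col_types = readr::cols(.default = "c")),  # read everything as character
    dots
  )
  do.call(readr::read_delim, args)
}
```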

• (r2-24): Consider printing objects of class "fieldtypes" as a tibble. That
would be easy for the reader to understand and for you to maintain (instead of
creating a new print method). For example, this is what this idea could look
like:

 fieldtypes(
 	PrescriptionID = ft_uniqueidentifier(), 
 	PrescriptionDate = ft_timepoint()
 )

 #> # A tibble: 2 × 3
 #>   column           field_type        options      
 #>   <chr>            <chr>             <chr>        
 #> 1 PrescriptionID   unique identifier <NA>         
 #> 2 PrescriptionDate time point        includes time

Response: I experimented with converting fieldtypes to data frames for printing, which worked well when displaying in the console, but worked poorly when writing to the log file. In order not to hold up the review process I have added this as a github issue.
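
For reference, the kind of conversion I experimented with looks roughly like this (simplified; deriving the field_type label from the object's class is illustrative):

```r
# turn a named list of field_type objects into a data frame for printing
field_types_to_df <- function(field_types) {
  data.frame(
    column = names(field_types),
    field_type = vapply(field_types, function(ft) class(ft)[1], character(1)),
    row.names = NULL
  )
}
```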

• [ ] (r2-25): For a first release consider exporting the minimum code possible.
For example: a) fieldtypes(), b) all the ft_*() functions, c)
prepare_data(), d) aggregate_data(), and e) report_data(). Here are some
benefits:
o The surface for bugs is smaller.
o Less documentation.
o Fewer features to test, so you can test them more intensively.
o More likely the package will pass the first, human inspected, CRAN
submission. The first CRAN submission is the hardest because it's the only
submission that gets inspected by a human. The more code you have, the more
issues CRAN might find. All other submissions skip human inspection -- unless
the automated checks detect some problem.
o Learn from the first CRAN submission before you work on a bigger submission.
You will learn from the rigor that CRAN enforces, and from the feedback you'll
gather from the first users of the package. It's always easier to add features
than it is to remove them ( see Package evolution - changing stuff in your
package).
Response: I will consider excluding the "functions for advanced usage" when it is time to submit to CRAN, but would like to include all current functionality for this review.

• [ ] (r2-26): In report_data() allow users to pass arguments to
rmarkdown::render() via the ... argument.
Response: This is similar to point (r2-23, item 3) above, except that only options 1 and 3 are possible. Please let me know how you would like me to proceed.

• [ ] (r2-27): In prepare_data(), aggregate_data(), and report_data(),
print messages with dedicated functions like message() or warning() -- not
with print(), cat() or similar. This allows users to handle messages with
condition-handling features like tryCatch(), suppressMessages(), and
friends. Also, the primary goal of these functions is to compute something, so
printing a message is only a side effect. Thus by default messages should likely
be quiet. See Side effect
soup.
Response: The only "messages" currently returned from prepare_data(), aggregate_data(), and report_data() are progress updates, and can be easily switched off using the show_progress parameter. I understand that message() is preferred to print() or cat() for diagnostic messages, but I feel that printing progress serves a different purpose to informational messages and so should be kept distinct from them. In particular, since messages print in red, if the progress updates print in red it would then make it much harder to see any real conditions. Of note, rmarkdown prints its progress updates using cat().
Re defaulting to quiet, I initially set the show_progress parameter to default to FALSE, but deliberately changed it to TRUE after user feedback. The printing of the progress tends to be reassuring, particularly when processing large datasets. It is also the default behaviour of rmarkdown::render() to show the progress.
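
For context, the behavioural trade-off here is that message() output can be silenced or handled by callers, whereas cat() output cannot:

```r
progress_message <- function() message("progress: step 1 of 3")
progress_cat     <- function() cat("progress: step 1 of 3\n")

suppressMessages(progress_message())  # prints nothing
suppressMessages(progress_cat())      # still prints the progress line
```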

• [ ] (r2-28): Consider handling missing arguments using R's default. The custom
code for this seems to add complexity and maintenance code with relatively
little value.

# Good enough?
f <- function(required) {
  required
}

f()
#> Error in f(): argument "required" is missing, with no default

# Overly complicated?
g <- function(required) {
  validate_params_required(match.call())
  required
}

g()
#> Error in validate_params_required(match.call()): could not find function "validate_params_required"

Response: The purpose of adding the validate_params_required() function is to check the input parameters right at the start (as a guard clause), rather than only at the point they are accessed. I find this useful in potentially-long-running functions, so that the user is informed of (and can correct) any (user-side) issues immediately, rather than only being told there is a problem half way through processing the data. This becomes more and more valuable the larger the datasets being processed.
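
A rough sketch of the guard-clause idea (simplified; the required-parameter list is illustrative and not the actual validate_params_required() implementation):

```r
validate_params_required <- function(call, required = c("df", "field_types")) {
  # names of the arguments the caller actually supplied
  supplied <- names(as.list(call))[-1]
  missing_args <- setdiff(required, supplied)
  if (length(missing_args) > 0) {
    stop("Required argument(s) missing: ",
         paste(missing_args, collapse = ", "), call. = FALSE)
  }
}

prepare_data <- function(df, field_types, show_progress = TRUE) {
  # fail fast, before any long-running processing begins
  validate_params_required(match.call())
  # ... long-running processing follows ...
}
```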

• (r2-29): I see a heavy use of custom classes. In R this is relatively
uncommon. Ensure their value justify the complexity they add. Consider using
the vctrs package (https://vctrs.r-lib.org/articles/s3-vector.html).
Response: I currently use custom classes for the purpose of parameter validation (i.e. checking the correct types of objects are being passed into the functions), though plan to refactor the fieldtypes conditional logic into OO in the future. I have removed one instance where a custom class was unnecessary. I have also renamed the classes to be prefixed with the package name. (ropensci/daiquiri@2d2d988)

• (r2-30): I see deep nested logic (see Jenny Bryan's Code smells and
feels). In the
goodpractice section of the rOpenSci's bot checks I see that the following
functions have cyclocomplexity >= 15:
function cyclocomplexity

validate_params_type 84

aggregatefield 56
Response: Thank you for pointing me to Jenny Bryan's video, I have watched it and can see some things I can do to try to reduce the complexity of these functions. This is something I don't currently have the time to commit to but will certainly aim to do in the future.

• (r2-31): In report_data() I see a call to package = utils::packageName().
Please ensure to test this works as you expect when the user calls the function
from anywhere -- particularly from outside the source of the daiquiri package.
Response: I have tested this outside the source of the package and can confirm it works as expected

• (r2-31): Ensure the code complies with the general principles in the Mozilla
reviewing guide.
Response: I have done my best!

@maurolepore
Member

Dear @phuongquan, many thanks to you for your hard work!

@mbcann01, the ball is now on our side of the court. Please check whether the changes were made to your satisfaction and indicate whether you request more changes or recommend approving this package. If you approve, please use this template.

@phuongquan, I'll do my best to respond within one week.

@maurolepore
Member

maurolepore commented Oct 23, 2022

Thanks @phuongquan,

  • r2-08: As long as you've considered this issue and understand, I'm happy.

You may place slow tests in tests/testthat/slow/ (or similar) so that your
frequent calls to devtools::test() ignore them. And you may run them
occasionally on demand with testthat::test_dir(testthat::test_path("slow")) (or a
test_slow() helper).

CRAN (or external collaborators) may be unwilling/unable to use as much
resources as you have. To test this package in a limited system you may
use a free project on https://rstudio.cloud/.

  • r2-14: RE "Function names are duplicated in other packages", the bot
    now says addressing this issue is optional.

  • r2-23: As you considered this deeply, I'm confident you'll make a good
    decision. Go ahead with what you think is more user friendly and easier
    to maintain, or the best balance between the two.

Please remove me from DESCRIPTION. I added my agreement by accident. It's
our policy that editors shouldn't be acknowledged. Your contribution is
enough to make us all happy. Thank you!

Reviewer Response

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 12

@maurolepore
Member

@mbcann01,

Please remember to check if the changes were made to your satisfaction and indicate if you request more changes or recommend approving this package. If you approve please use this template.

@mbcann01

Hi @maurolepore and @phuongquan,

I'm so sorry for the delay!

Reviewer Response

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 3

@phuongquan
Member Author

Hi @mbcann01 and @maurolepore,

That's great news, many thanks!

@maurolepore
Member

@ropensci-review-bot approve daiquiri

@ropensci-review-bot
Collaborator

Approved! Thanks @phuongquan for submitting and @brad-cannell, @elinw for your reviews! 😁

To-dos:

  • Transfer the repo to rOpenSci's "ropensci" GitHub organization under "Settings" in your repo. I have invited you to a team that should allow you to do so. You will need to enable two-factor authentication for your GitHub account.
    This invitation will expire after one week. If it happens write a comment @ropensci-review-bot invite me to ropensci/<package-name> which will re-send an invitation.
  • After transfer write a comment @ropensci-review-bot finalize transfer of <package-name> where <package-name> is the repo/package name. This will give you admin access back.
  • Fix all links to the GitHub repo to point to the repo under the ropensci organization.
  • Delete your current code of conduct file if you had one since rOpenSci's default one will apply, see https://devguide.ropensci.org/collaboration.html#coc-file
  • If you already had a pkgdown website and are ok relying only on rOpenSci central docs building and branding,
    • deactivate the automatic deployment you might have set up
    • remove styling tweaks from your pkgdown config but keep that config file
    • replace the whole current pkgdown website with a redirecting page
    • replace your package docs URL with https://docs.ropensci.org/package_name
    • In addition, in your DESCRIPTION file, include the docs link in the URL field alongside the link to the GitHub repository, e.g.: URL: https://docs.ropensci.org/foobar, https://github.com/ropensci/foobar
  • Fix any links in badges for CI and coverage to point to the new repository URL.
  • Increment the package version to reflect the changes you made during review. In NEWS.md, add a heading for the new version and one bullet for each user-facing change, and each developer-facing change that you think is relevant.
  • We're starting to roll out software metadata files to all rOpenSci packages via the Codemeta initiative, see https://docs.ropensci.org/codemetar/ for how to include it in your package, after installing the package - should be easy as running codemetar::write_codemeta() in the root of your package.
  • You can add this installation method to your package README install.packages("<package-name>", repos = "https://ropensci.r-universe.dev") thanks to R-universe.

Should you want to acknowledge your reviewers in your package DESCRIPTION, you can do so by making them "rev"-type contributors in the Authors@R field (with their consent).

Welcome aboard! We'd love to host a post about your package - either a short introduction to it with an example for a technical audience or a longer post with some narrative about its development or something you learned, and an example of its use for a broader readership. If you are interested, consult the blog guide, and tag @ropensci/blog-editors in your reply. They will get in touch about timing and can answer any questions.

We maintain an online book with our best practices and tips; this chapter starts the 3rd section, which gives guidance for after onboarding (with advice on releases, package marketing, GitHub grooming); the guide also features CRAN gotchas. Please tell us what could be improved.

Last but not least, you can volunteer as a reviewer by filling in a short form.

@phuongquan
Member Author

@ropensci-review-bot finalize transfer of daiquiri

@ropensci-review-bot
Collaborator

Transfer completed.
The daiquiri team is now owner of the repository and the author has been invited to the team

@maurolepore
Member

maurolepore commented Nov 2, 2022

@phuongquan, I think you can't check the boxes in the "To-dos" task-list, right? If so, here is the source of that task-list. Can you please copy-paste it into a new comment and check the boxes? That will help me see if there is anything else I could help you with.

To-dos:
- [ ] Transfer the repo to rOpenSci's "ropensci" GitHub organization under "Settings" in your repo.  I have invited you to a team that should allow you to do so. You will need to [enable two-factor authentication](https://docs.github.com/en/authentication/securing-your-account-with-two-factor-authentication-2fa/configuring-two-factor-authentication) for your GitHub account.
**This invitation will expire after one week. If it happens write a comment `@ropensci-review-bot invite me to ropensci/<package-name>` which will re-send an invitation.**
- [ ] After transfer write a comment `@ropensci-review-bot finalize transfer of <package-name>` where `<package-name>` is the repo/package name. This will give you admin access back.
- [ ] Fix all links to the GitHub repo to point to the repo under the ropensci organization.
- [ ] Delete your current code of conduct file if you had one since rOpenSci's default one will apply, see https://devguide.ropensci.org/collaboration.html#coc-file
- [ ] If you already had a `pkgdown` website **and are ok relying only on [rOpenSci central docs building and branding](https://devguide.ropensci.org/ci.html#even-more-ci-ropensci-docs)**,
    * deactivate the automatic deployment you might have set up
    * remove styling tweaks from your pkgdown config but keep that config file
    * replace the whole current `pkgdown` website with a [redirecting page](https://devguide.ropensci.org/redirect.html)
    * replace your package docs URL with `https://docs.ropensci.org/package_name`
    * In addition, in your DESCRIPTION file, include the docs link in the `URL` field alongside the link to the GitHub repository, e.g.: `URL: https://docs.ropensci.org/foobar, https://github.com/ropensci/foobar`
- [ ] Fix any links in badges for CI and coverage to point to the new repository URL. 
- [ ] Increment the package version to reflect the changes you made during review. In NEWS.md, add a heading for the new version and one bullet for each user-facing change, and each developer-facing change that you think is relevant.
- [ ] We're starting to roll out software metadata files to all rOpenSci packages via the Codemeta initiative, see https://docs.ropensci.org/codemetar/ for how to include it in your package, after installing the package - should be easy as running `codemetar::write_codemeta()` in the root of your package.
- [ ] You can add this installation method to your package README `install.packages("<package-name>", repos = "https://ropensci.r-universe.dev")` thanks to [R-universe](https://ropensci.org/blog/2021/06/22/setup-runiverse/).

@phuongquan
Member Author

Hi @maurolepore, I have done everything except the central docs item, please see the comment below.

To-dos:

  • Transfer the repo to rOpenSci's "ropensci" GitHub organization under "Settings" in your repo. I have invited you to a team that should allow you to do so. You will need to enable two-factor authentication for your GitHub account.
    This invitation will expire after one week. If it happens write a comment @ropensci-review-bot invite me to ropensci/<package-name> which will re-send an invitation.
  • After transfer write a comment @ropensci-review-bot finalize transfer of <package-name> where <package-name> is the repo/package name. This will give you admin access back.
  • Fix all links to the GitHub repo to point to the repo under the ropensci organization.
  • Delete your current code of conduct file if you had one since rOpenSci's default one will apply, see https://devguide.ropensci.org/collaboration.html#coc-file
  • If you already had a pkgdown website and are ok relying only on rOpenSci central docs building and branding,
    • deactivate the automatic deployment you might have set up
    • remove styling tweaks from your pkgdown config but keep that config file
    • replace the whole current pkgdown website with a redirecting page
    • replace your package docs URL with https://docs.ropensci.org/package_name
    • In addition, in your DESCRIPTION file, include the docs link in the URL field alongside the link to the GitHub repository, e.g.: URL: https://docs.ropensci.org/foobar, https://github.com/ropensci/foobar
  • Fix any links in badges for CI and coverage to point to the new repository URL.
  • Increment the package version to reflect the changes you made during review. In NEWS.md, add a heading for the new version and one bullet for each user-facing change, and each developer-facing change that you think is relevant.
  • We're starting to roll out software metadata files to all rOpenSci packages via the Codemeta initiative, see https://docs.ropensci.org/codemetar/ for how to include it in your package, after installing the package - should be easy as running codemetar::write_codemeta() in the root of your package.
  • You can add this installation method to your package README install.packages("<package-name>", repos = "https://ropensci.r-universe.dev") thanks to R-universe.

I found an issue with the rOpenSci central docs. I have a link to an example report in the menu bar which works in the github pages site https://ropensci.github.io/daiquiri/misc/example_data_report.html but returns a 404 error in the central docs https://docs.ropensci.org/daiquiri/misc/example_data_report.html. I have therefore kept the pkgdown site for the time being.

@maurolepore
Member

maurolepore commented Nov 3, 2022

@phuongquan, that's great progress, thanks!

A good place for the "Example report" is under vignettes/ as you did with the "Walkthrough for the daiquiri package". You may place it there with usethis::use_vignette() or usethis::use_article(). Or you can host it elsewhere and point to it from the navbar section of your _pkgdown.yml (note the last line below has a full URL, not a relative path).

navbar:
  structure:
    left:  [intro, reference, example-report, news]
    right: [search, github]
  components:
    example-report:
      text: Example report
      href: https://path/to/example_data_report.html

@phuongquan
Member Author

@maurolepore @mbcann01,

I'm currently in the process of preparing a JOSS paper, are you happy for me to acknowledge you as reviewers in it?

@phuongquan
Member Author

Hi @maurolepore @mbcann01,

I just wanted to ask again if you are happy to be acknowledged as reviewers in a JOSS paper? I'd like to try to submit it by the end of the week so a quick response either way would be much appreciated. I won't include you if I don't hear from you.

Thanks

@maurolepore
Member

Oh, sorry for the delay. Your contribution is enough, thanks! rOpenSci's guideline is to not acknowledge editors -- which I think applies here because I was both reviewer and editor.
