Review {cfr} v0.1.0 for CRAN release #78

pratikunterwegs · 2023-08-21T16:34:43Z

This PR is for a full package review prior to a CRAN release, and is also working towards items in the release checklist in #68.

This PR will be used to collect and address reviwer comments. Either this branch, or issue-specific branches will be used to push changes. The changes will be merged into main prior to release.

The proposed timeline is for all reviews to be in by 8th September 2023. The goal is to have {cfr} released to CRAN soon after. This schedule is contingent on releasing {epiparameter} to CRAN.

I would encourage/request two reviewers to sign up to do the full review. Please tick a box and assign yourself in the Reviewers panel on the right to indicate that you have taken on the review.

Reviewer 1
Reviewer 2

Following review this PR also fixes #80 and fixes #81.

pratikunterwegs · 2023-08-24T13:52:49Z

Thanks to @abdoelnaser-mahmood-degoot for sending me these comments over Slack - posting them here for all others to read.

I went through the {cfr} package and have some comments that I'd want to share with you.

1- I see an entanglement between two arguments: correct_for_delays and epidist, across several functions in the package resulting from how you designed/implemented the code. If correct_for_delays is set to TRUE, then epidist must be specified. The question is, what are the chances that the user will provide a value for the epidist parameter while ignoring to correct for the delays? When a user enters a value for epidist, he or she implicitly intends to account for delays. The correct_for_delays parameter may aid in the design and implementation of the package, but it may confuse the user.
2- In the quickest example, remove the comma after the data argument on the rolling_cfr_naive function.
3- In the readme, it would be better to offer the reader a brief code chunk that generates the ggplot2 figure.
4- Include a link to the source Camacho et al. in the documentation of ebola1976 data.
5- Mention the data type of severity_baseline and severity_baseline in the documentation of the estimate_ascertainment function parameters, just as you did for the other parameters. Also, what is the default value for the type argument, and why does max_date accept a string instead of a Date?
6- The default value for poisson_threshold, in many functions, is 100 and not 200.
7- Is it correct that you set the default value for the epidist argument to NULL in the known_outcomes and estimate_ascertainment functions? Likewise, for severity_baseline to 0.014; is this true for all diseases?

R/estimate_ascertainment.R

DESCRIPTION

pratikunterwegs · 2023-08-31T15:45:57Z

Thanks @Bisaloo, will wait for other reivews to come in and move to fix in the week of 11th Sept.

adamkucharski

Looking good overall. There are few issues I had with pulling from epiparameter – it may be a version problem, but wanted to flag.

General comment: the documentation uses a mix of 'case fatality rate', 'case fatality risk' and 'case fatality ratio', sometimes within the same document (e.g. vignette). We should pick one for consistency. 'Case fatality rate' is most commonly used, even though it is technically inaccurate (the value calculated is a risk rather than a rate). I would therefore go with 'case fatality risk' (or 'case fatality ratio' if this requires less refactoring).

Similarly, 'ascertainment rate' isn't really a rate - perhaps 'ascertainment ratio' or 'ascertainment probability' would be better?

vignettes/cfr.Rmd

vignettes/estimate_ascertainment.Rmd

vignettes/estimate_time_varying_severity.Rmd

jamesmbaazam

{cfr} full package review

I think {cfr} is in good shape and will be ready to be submitted to CRAN following this full package review. I have added general comments below as well as line-by-line comments on most of the changed files. Feel free to request my replies under the conversations I've started.

Routine automatic checks

The following routine checks were performed on the package and did not raise any issues.

README

General comments

The README looks fine to me. It's clear and concise, and all the badges are appropriate.

Improvements

I have added line specific comments to the README file.
I suggest to provide a brief overview of the core functions and what they do. This will help users to quickly understand what the package does and how to use it.

Functions

General comments

estimate_static():
- One major comment is for you to add details on the methods and references for the CFR estimation. This is important for people who are not familiar with CFR and want to learn more about it. Same comment goes for known_outcomes().
- Additionally, I would suggest adding a _cfr suffix to this function and others, i.e. estimate_static_cfr(). This will help users to quickly identify the functions that estimate CFRs.
known_outcomes():
estimate_ascertainment(): I would be interested to see the methods for this function as I am a little worried about its inner workings. For example, its results are highly sensitive to the severity_baseline argument, which is user supplied rather than estimated. I also have a little quibble about replacing all ascertainment ratios > 1 with 1. It seems hacky and I wonder why it is not dealt with by the model but rather as a post-processing step.

Improvements

I have added line specific comments to each function.

Vignettes

General comments

The vignettes lay out the core functionality in the package and are insightful. I have added line specific comments.
As a general comment, since vignettes are basically literate programming, I usually prefer not to use redundant phrases like "using the code below" or "the code below", "the following code", etc. The accompanying code should reinforce the written words.

Specific comments

"Estimating the proportion of cases that are ascertained during an outbreak":
- Another way to visualize the results to remove the effect of the severity_baseline argument is to plot the relative ratios between the countries so that the baseline severity cancels out, ruling out its effect.
"Calculating a static, delay-adjusted estimate of disease severity":
- I'm not convinced that Fig 3 and 7 are necessary. It's not clear what it adds to the text. I would suggest to remove it. It looks like something that belongs in {epiparameter} rather than {cfr}.
- I would suggest moving the section "Details: Adjusting for delays between two time series" to a theory vignette like in bpmodels and the details section of the respective functions. To be honest, it looks good here and seems to fit the context. Another way to go about it would be to hyperlink to the theory vignette at the beginning and end of this vignette like you have for the Getting Started vignette.
- I would suggest replacing the generic use of "known outcomes" with the specifics. I think in this vignette, it is the known death outcomes.
"Estimating how disease severity varies over the course of an outbreak":
- Title seems long. Suggest shortening to, for example, "Estimating time-varying CFRs"
- Same as before, I'm not convinced that Fig 3 is necessary. It's not clear what it adds to the text. I would suggest to remove it.
- Suggest adding a legend to Fig 4 and 5 indicate what the grey and tomato lines/ribbons represent.
- Also, what is going on before April in Figs 4 and 5? The ribbons look weird and are hard to interpret.
- Refer to my earlier comment above on moving the section "Details: Adjusting for delays between two time series".

Improvements

I have added line specific comments to each vignette.

Other infrastructure

DESCRIPTION

Package version is missing the patch number, i.e., currently 0.1 instead of 0.1.0. I would suggest to addding it to conform with the rest of our packages.

Tests

Might be worthwhile to provide more descriptive contexts for the tests, for example, test_that("multiplication works", ....
I did not inspect the tests in detail so do not have detailed comments.

README.Rmd

vignettes/estimate_static_severity.Rmd

vignettes/estimate_time_varying_severity.Rmd

joshwlambert · 2023-09-10T10:36:14Z

Full package review for {cfr}.

Overall the package looks good. It is functionally to the point which should help new users get to grips with the aims and applications of the package. The documentation is clearly written and provides a comprehensive overview of the functions and data requirements.

I like that the package is minimal in terms of namespace but contains a good amount of functionality. The inclusion of two datasets provides two clear use-cases for the package.

I've left some comments below. Some of these overlap with those already made by other reviewers but I thought I'd leave them in just to ensure they weren't missed.

Functions

Why does the severity_baseline argument in estimate_ascertainment() have a default of 0.014? This seems to me to be an argument you want to ensure the user specifies to prevent accidentally leaving the default value.
In estimate_ascertainment() is the apply() on L127 needed. From what I can see it is will always be a single row data.frame and can also have vectorised operations if it has more than one row.
In estimate_ascertainment() (and other functions) there seems to be an interaction of function arguments which is not documentation or caught by input checking. The setup appears to give type precedence so if type = "static" and an <epidist> is supplied it is assumed the user wants to use the "static" estimate. Not sure if this is a problem, and if it is I'm not sure I have a solution. One option (not sure how I feel about as it removes some of the clarity/verbosity of the function arguments) is to remove the type argument and instead control the flow of the function with is.null(epidist).
I think he first part of stopifnot() in estimate_rolling() (L61) is already checked by checkmate::assert_names() on L54-57
In estimate_static() is there a reason severity_estimate is assigned as an empty numeric? It seems that the if else that follows means that severity_estimate will always be assigned before the function returns
In estimate_time_varying() there are two input checks on correct_for_delays
The ordering of arguments seems to varying across the estimate_*() functions. Would be good to provide users with consistent argument ordering
estimate_time_varying() has a # TODO input checking comment but it seems that all input is checked
epidist is checked twice in input checking in estimate_time_varying()
In get_default_burn_in() it is possible to have an <epidist> object with a mean of NA. I would try and account for this. If you think it would be useful I can write an S3 method for mean() (thankfully it's a S3 generic in base) which can try and calculate a mean even if it is not given in the summary stats of the object.
The epidist default in known_outcomes is NULL. However, if an epidist is not supplied the function will error on L60 at assert_class(epidist, "epidist"). I would suggest removing the argument default.
The prepare_data.incidence2() function definition contains .... I suggest adding chkDots(...) to the function which will prevent users supplying arguments to be silently swallowed by .... I would also document that the dots are not used as currently the documentation is somewhat misleading and suggests these arguments will be used by the method.

Documentation

estimate_ascertainment() and estimate_time_varying() example fails due to changes in {epiparameter}. Adding single_epidist = TRUE to epidist_db() will solve the issue
It would be good to add the full reference of Camacho et al. (2014) to the ebola1976.R file, as you did for the covid_data.R file.
Check consistent use of imperative function description and full sentences (e.g. estimate_ascertainment.R)
Inconsistent use of data frame, data.frame & <data.frame> in documentation
Is estimate_rolling() using a method described or used in a paper, if so could it be referenced in the documentation?
known_outcomes can @inheritParams from estimate_static() for @param epidist
My assumption is that when users want to read the documentation for a method they use ?generic, in this case ?prepare_data, as this is the function they are interacting with and the dispatch takes place under the hood. I think the documentation for prepare_data.incidence2() is better and should be merged with the documentation for the generic so the user can find it all in one place. As this generic currently only has one method I would reword the documentation as it currently states: "it has methods for classes commonly found in epidemiological data", instead you can say that more methods for common epidemiological data classes will be added over time.
Due to documenting both the generic and the method for prepare_data() it appears twice in the package reference on the pkgdown site. If you want to keep both that is perfectly fine. I think the usual practise is to not document the method and just have the generic in the reference (e.g dplyr), but if you plan to add more methods you may want each of these to be shown. It can be quiet useful to have the class name in the function to show which classes the function applies to, in this case document the methods and just export the generic.
The code for the plot in the README is completely hidden. Is it worth having it hidden but with an option to reveal the code if users are interested in using it?
Is there a reason for the double asterisk in the figure caption of Fig 3 in estimate_static_severity.Rmd
Why does library(outbreaks) require a # nolint flag in the data_from_incidence2.Rmd vignette?

Testing

Testing coverage for all functions is high except for prepare_data() which has 72.41% coverage.

Package infrastructure

update GitHub action workflows to match those in {packagetemplate}
Add citation (.cff) file, CITATION file, and update-citation-cff.yml workflow, (this will require an addition to the Rbuildignore which is automatic if I remember correctly)
There are some folders in the .gitignore that are missing from the .gitignore from the {packagetemplate} and vice versa, would be good to check if these should be added to.
Update .lintr to match {packagetemplate}
In data-raw you save the ebola data with usethis::use_data(..., version = 3) and save the covid data with usethis::use_data(..., version = 2). Is there a reason for the difference?
A styler run updates a few files

Speculative idea: what if prepare_data() was moved internally to the estimate_*() functions? The estimate_*() functions remain normal (i.e. not S3 generics) but can accept multiple data types in data and prepate_data() does the work for the user. The way I see it the benefits to this approach are: 1) one less exported function (prepare_data() becomes internal); 2) less documentation around wrangling different data sources as the user does not need to know all the details. Downsides to this approach is potentially more arguments in the estimate_*() functions. But the current defaults of prepare_data() would match the requires documented to have a "cases" and a "deaths" column name.

pratikunterwegs · 2023-09-11T08:19:33Z

Thanks all for your feedback - I shall work on reconciling your suggestions and making changes this week. I would say the aim is to be ready for a CRAN submission as soon as possible, ideally within this week.

This commit adds a number of edits bundled together in a batch. Co-authored-by: Adam Kucharski <adam.kucharski@lshtm.ac.uk> Co-authored-by: James Azam <james.m.azam@gmail.com>

Co-authored-by: James Azam <james.m.azam@gmail.com>

Fix get_default_burn_in() Use {epiparameter} fns in get_default_burn_in()

jamesmbaazam · 2023-09-14T12:05:49Z

This is an interesting idea, and I think it would not take too much effort to implement. Still, I think it's good to have the conceptual separation between preparing data for CFR calculation, and the estimation step, reflected in the code for now. Happy to hear more thoughts on this from others.

The following linked issue in {serofoi} might be relevant to this conversation. @zmcucunuba also thinks that it makes sense to have two separate routines for preparing the data and modelling. See epiverse-trace/serofoi#61.

pratikunterwegs · 2023-09-14T15:32:09Z

Thanks again everyone for your feedback. The proposed and (some) scheduled changes have now been implemented. This PR will be retargeted to main and merged tomorrow morning. Please leave any final suggestions by then.

Bisaloo · 2023-09-14T16:03:06Z

These functions now have the epidist argument set to missing by default, the previous default was NULL

What's the motivation for this change? NULL probably signals more clearly that this argument is optional.

This is discussed in the tidyverse design guide as well: https://design.tidyverse.org/required-no-defaults.html

pratikunterwegs · 2023-09-15T10:15:54Z

These functions now have the epidist argument set to missing by default, the previous default was NULL

What's the motivation for this change? NULL probably signals more clearly that this argument is optional.

This is discussed in the tidyverse design guide as well: https://design.tidyverse.org/required-no-defaults.html

This has been changed to NULL by default in a5cdca3

Fix #78

pratikunterwegs self-assigned this Aug 21, 2023

pratikunterwegs added the Full-package review label Aug 21, 2023

joshwlambert self-requested a review August 21, 2023 16:39

adamkucharski self-requested a review August 23, 2023 10:13

jamesmbaazam self-requested a review August 29, 2023 10:10

Bisaloo reviewed Aug 31, 2023

View reviewed changes

R/estimate_ascertainment.R Outdated Show resolved Hide resolved

R/estimate_ascertainment.R Outdated Show resolved Hide resolved

DESCRIPTION Outdated Show resolved Hide resolved

adamkucharski reviewed Sep 8, 2023

View reviewed changes

jamesmbaazam reviewed Sep 8, 2023

View reviewed changes

pratikunterwegs and others added 18 commits September 11, 2023 12:22

Edits to vignettes from code review

e2720bb

This commit adds a number of edits bundled together in a batch. Co-authored-by: Adam Kucharski <adam.kucharski@lshtm.ac.uk> Co-authored-by: James Azam <james.m.azam@gmail.com>

Add further vignette suggestions from review

b5cffaf

Co-authored-by: James Azam <james.m.azam@gmail.com>

Automatic readme update

aa744f3

Remove dev packages from Suggests

4b42fb7

Account for changes to epidist_db(), WIP #80

b4db0ca

Remove creation of empty objects

cebac16

Style vignettes and Readme, also WIP #80

0201b75

Update fn get_default_burn_in() for better param access

0a1ae05

Fix get_default_burn_in() Use {epiparameter} fns in get_default_burn_in()

CFR: case fatality risk, use 'ascertainment ratio'

240f022

Update Readme.Rmd with details on {epiparameter}

6ba4d69

Correct for delays is default, epidist has no default value

6ed85d9

Change outputs to use full names, e.g. 'mean' instead of 'me'

bc59789

Update documentation and miscellaneous fixes

ea1567a

Updates to vignettes

040ef87

Update Readme.Rmd and DESCRIPTION

0b88c8f

Correct Readme.Rmd for correction by default

706b4de

Automatic readme update

d855f2c

Update pkg data prep and documentation

e0a96e6

pratikunterwegs and others added 3 commits September 14, 2023 15:32

Rm logical function args

c0eb9f2

Update Readme and add text to vignettes

b390dfc

Automatic readme update

1dfbdfc

pratikunterwegs added 2 commits September 15, 2023 11:14

Fn arg epidist defaults to NULL

a5cdca3

Update fn arg documentation and maths

683b893

This was linked to issues Sep 15, 2023

Account for changes to epiparameter::epidist_db() #80

Closed

Get epidist mean using S3 method #81

Closed

pratikunterwegs changed the base branch from empty to main September 15, 2023 10:28

pratikunterwegs added 10 commits September 15, 2023 11:47

Update math formatting in vignettes

4faa35b

Rename burn_in_value to burn_in, allow 0s

9e755ff

Minor formatting

8774741

Update wordlist

b576c3c

Update documentation for burn_in arg

eb02092

Update example for cfr_time_varying

c00d74e

Update formatting for cfr_time_varying

3dc8a45

Small possible speed up to cfr_time_varying

636522d

Test that known_outcomes gives no NAs

63a6759

Vectorise loop in cfr_time_varying()

c5c6527

pratikunterwegs mentioned this pull request Sep 15, 2023

Fill NAs with 0s by default in prepare_data.incidence2 #82

Closed

pratikunterwegs added 4 commits September 15, 2023 14:14

Remove DS_Store file

4135d2d

Rename MLE to maximum likelihood estimate

78f0bf7

Add citation for Covid-19 CFR estimate, add to ref library

eebc2c4

Update Rd files with new documentation

3a40679

pratikunterwegs merged commit 2b6171a into main Sep 15, 2023
8 checks passed

pratikunterwegs deleted the review branch September 15, 2023 13:44

pratikunterwegs pushed a commit that referenced this pull request Nov 10, 2023

Run testthat in parallel

7f3835c

Fix #78

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Review {cfr} v0.1.0 for CRAN release #78

Review {cfr} v0.1.0 for CRAN release #78

pratikunterwegs commented Aug 21, 2023 •

edited

pratikunterwegs commented Aug 24, 2023

pratikunterwegs commented Aug 31, 2023

adamkucharski left a comment

jamesmbaazam left a comment •

edited

joshwlambert commented Sep 10, 2023

pratikunterwegs commented Sep 11, 2023

jamesmbaazam commented Sep 14, 2023

pratikunterwegs commented Sep 14, 2023

Bisaloo commented Sep 14, 2023

pratikunterwegs commented Sep 15, 2023

Review {cfr} v0.1.0 for CRAN release #78

Review {cfr} v0.1.0 for CRAN release #78

Conversation

pratikunterwegs commented Aug 21, 2023 • edited

pratikunterwegs commented Aug 24, 2023

pratikunterwegs commented Aug 31, 2023

adamkucharski left a comment

Choose a reason for hiding this comment

jamesmbaazam left a comment • edited

Choose a reason for hiding this comment

{cfr} full package review

Routine automatic checks

README

General comments

Improvements

Functions

General comments

Improvements

Vignettes

General comments

Specific comments

Improvements

Other infrastructure

DESCRIPTION

Tests

joshwlambert commented Sep 10, 2023

Full package review for {cfr}.

Functions

Documentation

Testing

Package infrastructure

pratikunterwegs commented Sep 11, 2023

jamesmbaazam commented Sep 14, 2023

pratikunterwegs commented Sep 14, 2023

Bisaloo commented Sep 14, 2023

pratikunterwegs commented Sep 15, 2023

pratikunterwegs commented Aug 21, 2023 •

edited

jamesmbaazam left a comment •

edited