{superspreading} v.0.1.0 full pkg review #31

joshwlambert · 2023-06-14T09:01:48Z

This PR is to provide a platform to review the entirety of the package.

This PR is unconventional as it is not intended for merging or for additional commits (unless minor) and instead comments will be converted to issues and these will be addressed in their own PRs.

Add probability of epidemic in single-type branching process, and example based on Ebola data

since it is now handled at the org level

…piparameter as remote dep, WIP #2

Co-authored-by: Adam Kucharski <adam.kucharski@lshtm.ac.uk>

Co-authored-by: James Azam <james.m.azam@gmail.com>

pratikunterwegs

Hi @joshwlambert thanks for opening this and trying out this alternative format for full package reviews, looks like a nice job overall. I've just put down some comments here, and I'm happy to discuss them as well.

General

This is a full package review for v0.1.0 of {superspreading} and my first review of this package. I think the use cases for the package are clear, and the exported functions are suitable to the use cases.

On the technical side, there could be better integration with {epiparameter} in terms of passing <epidist>s directly to some functions rather than, or as well as $R$ and $k$. Some of the input checking could be improved, especially regarding limits for numeric values.

The vignettes could be a bit clearer and better organised, and the latter half of the "Epidemic risk" vignette could probably be split off into a separate vignette on policy decisions. Both in the vignettes and the Readme, there could be a better explanation of how the {fitdistrplus} and {distributional} packages integrate with {superspreading} as they are prominently featured there.

One minor point: the package hex logo appears a bit flattened, while the letters package name in the logo looks a bit squashed together - it might help to compare against one the other logos we have.

Most of my comments are on the vignettes. I've been happy to look at the tests as a whole, but have not really gone into them in detail to check if some functionality is not explicitly tested, so this could be a good place for other reviewers to begin.

I will add more line-by-line comments on the files for issues for which there are small and easy fixes.

General technical

Great that code coverage is at 100%, and the package website has already been prepared,
Interesting use of Vectorise() --- looks alright to me, but not something I'm familiar with.
I see that there's a dependency on {bpmodels} for chain_sim() - something to flag for when it is superseded by {epichains}
I like the names of the functions and the use cases which are clear; make sure that function titles are imperatives (see e.g. proportion_cluster_size())
Suggest a {styler} run over the source and vignette files
In probability_*() and proportion_*(), should assert_number() and assert_numeric() (where appropriate) specify a lower limit on $R$ and $k$? Similarly, consider len = 1L for assert_logical(), and assert_count() for case_threshold in probability_contain()
Consider whether arguments (such as a in probability_*()) could/should probably be renamed to something more informative, such as "initial_cases". An informative name for k would be great too.
It would probably be good to allow users to pass an offspring distribution as an <epidist> in place of R and k where relevant This would reduce the burden on users to find which of the members of an <epidist> should be passed to R and which to k.

Functions and files

Readme

Not sure that "superspreading" in the Readme needs to be linked to Github - there's already a GH button on the top bar of the pkgdown website.
~~3. Not sure we need to link to {pak} either - we don't really want people to click away from the text unless it's a key dependency (e.g. {epiparameter} for {datadelay}).~~
Opening sentence could have a bit more detail - see e.g. the opening sections for {finalsize} and {epidemics} - this is a biased view admittedly. E.g. "...provides a set of functions to estimate and understand individual-level variation in the transmission of infectious diseases from data on secondary cases."
Suggest also adding key references such as Lloyd-Smith.
In the Quick start: Calculate the heterogeneity of transmission, consider using full sentences - current text is a minimal set of descriptions of three datasets. Suggest writing what this example is doing, and which functions are showcased here.
1. Where in this example is a function from {superspreading} actually used?
2. What models is {fitdistrplus} being used to fit?
3. What are index_case_transmission and secondary_case_transmission --- add an informative comment.
4. Suggest unifying terminology --- use one of "non-index" or "secondary", if these are the same, and explain (and link) the two either way.
5. Clarify code comments - what is R and k in the example? Suggest explaining what {fitdistrplus} is being used for here - we know, but new users might not.
In the Quick start: Calculate the probability of large epidemic, consider whether if this is the actual example of using {superspreading}, it should not be better explained or more prominent that the section above, which does not really seem to be an example of {superspreading} per se.
1. Add an "a" before "epidemic",
2. Add a few sentences explaining the use case and the function used.
3. Add comments explaining how the fitted nbinom distribution is being used.
Suggest using numbered or bulleted list for the differences with {bpmodels} under "Related projects" --- but also consider whether we want to direct users there since it is being superseded.

Vignettes

General

Repeat yourself, rather than referring to a concept as "that" etc. --- this makes it clearer for new users who are unfamiliar with the subject.
Break sentences after 2 -- 3 clauses, and split paragraphs so that it is easier to read little chunks of information.
Explain the epidemiological or public health implications of statements to make the maths clearer for readers. E.g. "Values of k less than one indicate overdispersion of disease transmission, a signature of superspreading." --- this could be explained in lay terms to make the concept clearer. Refer to papers as appropriate for more advanced or adventurous readers.
Also explain terms encountered for the first time, e.g. "offspring distribution" --- more welcoming of non-domain users.
Check the code for references, as you have likely used (@reference) leading to the year being in braces (as for in-line ref.), and remember to add a "References" section at the end
Hide code to generate tables, it's not super important imo.
I would suggest using Tidyverse code for use-cases such as counting the frequency of secondary cases (currently using tally()) --- I appreciate this adds extra dependencies, but this appears to be the background from most people will approach this material.
Unify how you refer to packages, as there are currently superspreading (Readme), {superspreading} (Get started, other vignettes), and superspreading (also Readme).

Get started

Make section or subsection of the Definition --- I liked that there is a definition provided. It could even be a little bit more detailed.
When loading libraries, explain what they are for --- users have not encountered the {epiparameter} or {distributional} packages before in the Readme.
Explain the difference between $R$ and $R_0$ since both are written here; and use mathematical notation for $k$ and $a$ (if appropriate).
Link to {epiparameter} so readers understand what that project is about, esp. how the params are collected

Estimate individual-level transmission

I liked the opening sentences of this vignette - it would be worth copying them to the "Get started" vignette as well.
"Transmission data" heading should probably come before the paragraphs describing the data.
Might be worth explaining model selection using AIC in one sentence, with a reference.
Fig.2 - suggest distinct colours for the two distributions

Epidemic risk

I liked the explanation of the use-case, and it might be good to explicitly state that under a heading, as in {finalsize}.
I did not like the first code chunk - it's a dense block of code with no explanation - ideally this would be separated into small chunks with some explanation so users can follow along and quickly adapt it to their own data.
Fig. 1 - suggest placing points for SARS and MERS, the dashed lines are confusing and suggest rather a range - not sure if that was the intention.
Suggest additional (sub)sections on the effect of the number of initial introductions, and the levels of heterogeneity in transmission, and the probability of outbreak extinction.
Link to CMMID Shiny app might be iffy if they decide to take the app down at some point - this has already been discussed before so there's a good chance it will happen sooner rather than later.
I would separate "Superspreading in decision making" and "Controls on transmission" into a separate vignette (together) on decision-making and policy; with more explanation as to the use-case, the tie-in with {epiparameter}, and the explanations of how such methods have previously been used in policy.
I could not understand very much about the proportion_cluster_size() function from the code chunk in the vignette - suggest explaining more how it is being used and how to interpret the inputs and results.

Bisaloo

Looks good to me!

Two general comments:

you have # nolint comments in many places next to the function argument definition. As far as I can tell, this is not necessary, can we remove them?
in the Roxygen2 docs, you can use \doi{} for the references instead of having the doi as a link.

R/offspring_distributions.R

R/probability_contain.R

R/probability_epidemic.R

R/probability_contain.R

DESCRIPTION

vignettes/superspreading.Rmd

joshwlambert · 2023-07-28T14:43:30Z

Thanks to both reviewers for taking the time to look through the package and making helpful suggestions.

joshwlambert · 2023-07-28T14:43:49Z

The vignettes could be a bit clearer and better organised, and the latter half of the "Epidemic risk" vignette could probably be split off into a separate vignette on policy decisions.

I would separate "Superspreading in decision making" and "Controls on transmission" into a separate vignette (together) on decision-making and policy; with more explanation as to the use-case, the tie-in with {epiparameter}, and the explanations of how such methods have previously been used in policy.

For now there is not much information on policy decisions as there is minimal functionality on containment. As the containment functionality is expanded I will keep in mind that splitting the vignettes could be beneficial.

Both in the vignettes and the Readme, there could be a better explanation of how the {fitdistrplus} and {distributional} packages integrate with {superspreading} as they are prominently featured there.

I have improved the explanation of {fitdistrplus}. {distributional} has been removed with updates to {epiparameter} so does not need explaining.

One minor point: the package hex logo appears a bit flattened, while the letters package name in the logo looks a bit squashed together - it might help to compare against one the other logos we have.

Thanks, it was different from the dimensions of the {epiparameter} logo so I've resized it to match.

I see that there's a dependency on {bpmodels} for chain_sim() - something to flag for when it is superseded by {epichains}

The plan is to transition to using {epichains} once it is stable.

Not sure that "superspreading" in the Readme needs to be linked to Github - there's already a GH button on the top bar of the pkgdown website.

This seems to be an automatic feature on pkgdown as I do not explicitly link it in the README and this is only there on the website and not when reading the README.md on github.

In the Quick start: Calculate the probability of large epidemic, consider whether if this is the actual example of using {superspreading}, it should not be better explained or more prominent that the section above, which does not really seem to be an example of {superspreading} per se.

The Calculate probability of large epidemic follows on from Calculate heterogeneity of transmission as it logically follows that after estimating $R$ and $k$ that these can be used in {superspreading} functions. Therefore I think inverting the order would be more confusing.

Fig. 1 - suggest placing points for SARS and MERS, the dashed lines are confusing and suggest rather a range - not sure if that was the intention.

This figure is a reproduction of a figure from Kucharski et al. (2020) (see fig legend), so I will leave as is.

Suggest additional (sub)sections on the effect of the number of initial introductions, and the levels of heterogeneity in transmission, and the probability of outbreak extinction.

I think the plots make the interpretation of the results relatively straightforward. Will consider splitting if others think it could improve readability.

Link to CMMID Shiny app might be iffy if they decide to take the app down at some point - this has already been discussed before so there's a good chance it will happen sooner rather than later.

Yes, I've considered this and will monitor if it remains active and will remove the link if it gets taken down. For now I think it provides a nice example of the application of these methods in a real outbreak scenario.

Suggest using numbered or bulleted list for the differences with {bpmodels} under "Related projects" --- but also consider whether we want to direct users there since it is being superseded.

I will keep referring to {bpmodels} as long as it is used by the package. Once {epichains} is stable and is taken on as a dependency I will remove {bpmodels} from the related packages and add {epichains}.

joshwlambert · 2023-07-28T14:52:47Z

Closing this PR (without merging) now as comments have been addressed in the PRs and issues linked above and from the responses I've posted.

The package is now ready for v0.1.0 release (after going through the release checklist). 🚀

adamkucharski and others added 30 commits November 13, 2022 16:31

Update file names

ab0df41

Update dependencies and README

910a9ad

Automatic readme update

99e63ff

Add P(epidemic) function

cad9238

Add probability of epidemic in single-type branching process, and example based on Ebola data

Update refs and dependency functions

b5d2983

Update README.md

9f00f97

add Rproj file

c999ae7

deleted test_read_stuff file from package template

7b9cc31

added files to Rbuildignore

bf89763

roxygenize docs

2358c3d

changed version to dev and updated author list in DESCRIPTION

34cdae4

linting probability_epidemic

06eaa02

added tests for probability_epidemic

15373ff

updated github action workflows

1181aa9

Automatic readme update

b75bfd3

fixed old package refs from package template

a313f91

Automatic readme update

0dcd8ca

Delete lint-changed-files.yaml

5a41d4d

since it is now handled at the org level

Delete .DS_Store

05b0af9

added superspreading getting started vignette, references file, and e…

d27fc54

…piparameter as remote dep, WIP #2

added distributional to suggests

d753c4d

added empirical superspreading section to vignette, closes #2

6dc1db7

added alert to vignette, ref #2

304098b

fixed typo in probability_epidemic doc

4427e88

linting vignette

d1c90c5

Update vignettes/superspreading.Rmd

9728695

Co-authored-by: Adam Kucharski <adam.kucharski@lshtm.ac.uk>

Update vignettes/superspreading.Rmd

453ab97

Co-authored-by: Adam Kucharski <adam.kucharski@lshtm.ac.uk>

Update vignettes/superspreading.Rmd

865687b

Co-authored-by: Adam Kucharski <adam.kucharski@lshtm.ac.uk>

Update vignettes/superspreading.Rmd

f3e8777

Co-authored-by: Adam Kucharski <adam.kucharski@lshtm.ac.uk>

Update vignettes/superspreading.Rmd

2b948ce

Co-authored-by: Adam Kucharski <adam.kucharski@lshtm.ac.uk>

joshwlambert and others added 9 commits June 13, 2023 13:30

Fix typo in epidemic_risk code chunk

0156697

Co-authored-by: James Azam <james.m.azam@gmail.com>

fixed spelling

80b0eb7

updated pkgdown reference

7295f2f

add info on vignettes plotting code chunks

0089626

Automatic readme update

75b81b4

added subheader to vignette plotting in readme

bfd1720

Automatic readme update

d340bad

changed assertions from camel case to snake case

1431048

added contributing guide

bf55034

joshwlambert added the Pkg review Full package review label Jun 14, 2023

joshwlambert added this to the v.0.1.0 milestone Jun 14, 2023

joshwlambert requested a review from Bisaloo June 14, 2023 09:13

pratikunterwegs reviewed Jun 15, 2023

View reviewed changes

Bisaloo reviewed Jul 4, 2023

View reviewed changes

joshwlambert closed this Jul 28, 2023

joshwlambert deleted the review branch January 9, 2024 14:35

Bisaloo mentioned this pull request Jan 23, 2024

Add types of review section epiverse-trace/blueprints#67

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

{superspreading} v.0.1.0 full pkg review #31

{superspreading} v.0.1.0 full pkg review #31

joshwlambert commented Jun 14, 2023

pratikunterwegs left a comment •

edited

Bisaloo left a comment

joshwlambert commented Jul 28, 2023

joshwlambert commented Jul 28, 2023

joshwlambert commented Jul 28, 2023

{superspreading} v.0.1.0 full pkg review #31

{superspreading} v.0.1.0 full pkg review #31

Conversation

joshwlambert commented Jun 14, 2023

pratikunterwegs left a comment • edited

Choose a reason for hiding this comment

General

General technical

Functions and files

Readme

Vignettes

General

Get started

Estimate individual-level transmission

Epidemic risk

Bisaloo left a comment

Choose a reason for hiding this comment

joshwlambert commented Jul 28, 2023

joshwlambert commented Jul 28, 2023

joshwlambert commented Jul 28, 2023

pratikunterwegs left a comment •

edited