Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

{superspreading} v.0.1.0 full pkg review #31

Closed
wants to merge 114 commits into from
Closed

{superspreading} v.0.1.0 full pkg review #31

wants to merge 114 commits into from

Conversation

joshwlambert
Copy link
Member

This PR is to provide a platform to review the entirety of the package.

This PR is unconventional as it is not intended for merging or for additional commits (unless minor) and instead comments will be converted to issues and these will be addressed in their own PRs.

adamkucharski and others added 30 commits November 13, 2022 16:31
Add probability of epidemic in single-type branching process, and example based on Ebola data
since it is now handled at the org level
Co-authored-by: Adam Kucharski <adam.kucharski@lshtm.ac.uk>
Co-authored-by: Adam Kucharski <adam.kucharski@lshtm.ac.uk>
Co-authored-by: Adam Kucharski <adam.kucharski@lshtm.ac.uk>
Co-authored-by: Adam Kucharski <adam.kucharski@lshtm.ac.uk>
Co-authored-by: Adam Kucharski <adam.kucharski@lshtm.ac.uk>
@joshwlambert joshwlambert added the Pkg review Full package review label Jun 14, 2023
@joshwlambert joshwlambert added this to the v.0.1.0 milestone Jun 14, 2023
Copy link
Member

@pratikunterwegs pratikunterwegs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @joshwlambert thanks for opening this and trying out this alternative format for full package reviews, looks like a nice job overall. I've just put down some comments here, and I'm happy to discuss them as well.

General

This is a full package review for v0.1.0 of {superspreading} and my first review of this package. I think the use cases for the package are clear, and the exported functions are suitable to the use cases.

On the technical side, there could be better integration with {epiparameter} in terms of passing <epidist>s directly to some functions rather than, or as well as $R$ and $k$. Some of the input checking could be improved, especially regarding limits for numeric values.

The vignettes could be a bit clearer and better organised, and the latter half of the "Epidemic risk" vignette could probably be split off into a separate vignette on policy decisions. Both in the vignettes and the Readme, there could be a better explanation of how the {fitdistrplus} and {distributional} packages integrate with {superspreading} as they are prominently featured there.

One minor point: the package hex logo appears a bit flattened, while the letters package name in the logo looks a bit squashed together - it might help to compare against one the other logos we have.

Most of my comments are on the vignettes. I've been happy to look at the tests as a whole, but have not really gone into them in detail to check if some functionality is not explicitly tested, so this could be a good place for other reviewers to begin.

I will add more line-by-line comments on the files for issues for which there are small and easy fixes.

General technical

  1. Great that code coverage is at 100%, and the package website has already been prepared,
  2. Interesting use of Vectorise() --- looks alright to me, but not something I'm familiar with.
  3. I see that there's a dependency on {bpmodels} for chain_sim() - something to flag for when it is superseded by {epichains}
  4. I like the names of the functions and the use cases which are clear; make sure that function titles are imperatives (see e.g. proportion_cluster_size())
  5. Suggest a {styler} run over the source and vignette files
  6. In probability_*() and proportion_*(), should assert_number() and assert_numeric() (where appropriate) specify a lower limit on $R$ and $k$? Similarly, consider len = 1L for assert_logical(), and assert_count() for case_threshold in probability_contain()
  7. Consider whether arguments (such as a in probability_*()) could/should probably be renamed to something more informative, such as "initial_cases". An informative name for k would be great too.
  8. It would probably be good to allow users to pass an offspring distribution as an <epidist> in place of R and k where relevant This would reduce the burden on users to find which of the members of an <epidist> should be passed to R and which to k.

Functions and files

Readme

  1. Not sure that "superspreading" in the Readme needs to be linked to Github - there's already a GH button on the top bar of the pkgdown website.
    3. Not sure we need to link to {pak} either - we don't really want people to click away from the text unless it's a key dependency (e.g. {epiparameter} for {datadelay}).
  2. Opening sentence could have a bit more detail - see e.g. the opening sections for {finalsize} and {epidemics} - this is a biased view admittedly. E.g. "...provides a set of functions to estimate and understand individual-level variation in the transmission of infectious diseases from data on secondary cases."
  3. Suggest also adding key references such as Lloyd-Smith.
  4. In the Quick start: Calculate the heterogeneity of transmission, consider using full sentences - current text is a minimal set of descriptions of three datasets. Suggest writing what this example is doing, and which functions are showcased here.
    1. Where in this example is a function from {superspreading} actually used?
    2. What models is {fitdistrplus} being used to fit?
    3. What are index_case_transmission and secondary_case_transmission --- add an informative comment.
    4. Suggest unifying terminology --- use one of "non-index" or "secondary", if these are the same, and explain (and link) the two either way.
    5. Clarify code comments - what is R and k in the example? Suggest explaining what {fitdistrplus} is being used for here - we know, but new users might not.
  5. In the Quick start: Calculate the probability of large epidemic, consider whether if this is the actual example of using {superspreading}, it should not be better explained or more prominent that the section above, which does not really seem to be an example of {superspreading} per se.
    1. Add an "a" before "epidemic",
    2. Add a few sentences explaining the use case and the function used.
    3. Add comments explaining how the fitted nbinom distribution is being used.
  6. Suggest using numbered or bulleted list for the differences with {bpmodels} under "Related projects" --- but also consider whether we want to direct users there since it is being superseded.

Vignettes

General

  1. Repeat yourself, rather than referring to a concept as "that" etc. --- this makes it clearer for new users who are unfamiliar with the subject.
  2. Break sentences after 2 -- 3 clauses, and split paragraphs so that it is easier to read little chunks of information.
  3. Explain the epidemiological or public health implications of statements to make the maths clearer for readers. E.g. "Values of k less than one indicate overdispersion of disease transmission, a signature of superspreading." --- this could be explained in lay terms to make the concept clearer. Refer to papers as appropriate for more advanced or adventurous readers.
  4. Also explain terms encountered for the first time, e.g. "offspring distribution" --- more welcoming of non-domain users.
  5. Check the code for references, as you have likely used (@reference) leading to the year being in braces (as for in-line ref.), and remember to add a "References" section at the end
  6. Hide code to generate tables, it's not super important imo.
  7. I would suggest using Tidyverse code for use-cases such as counting the frequency of secondary cases (currently using tally()) --- I appreciate this adds extra dependencies, but this appears to be the background from most people will approach this material.
  8. Unify how you refer to packages, as there are currently superspreading (Readme), {superspreading} (Get started, other vignettes), and superspreading (also Readme).

Get started

  1. Make section or subsection of the Definition --- I liked that there is a definition provided. It could even be a little bit more detailed.
  2. When loading libraries, explain what they are for --- users have not encountered the {epiparameter} or {distributional} packages before in the Readme.
  3. Explain the difference between $R$ and $R_0$ since both are written here; and use mathematical notation for $k$ and $a$ (if appropriate).
  4. Link to {epiparameter} so readers understand what that project is about, esp. how the params are collected

Estimate individual-level transmission

  1. I liked the opening sentences of this vignette - it would be worth copying them to the "Get started" vignette as well.
  2. "Transmission data" heading should probably come before the paragraphs describing the data.
  3. Might be worth explaining model selection using AIC in one sentence, with a reference.
  4. Fig.2 - suggest distinct colours for the two distributions

Epidemic risk

  1. I liked the explanation of the use-case, and it might be good to explicitly state that under a heading, as in {finalsize}.
  2. I did not like the first code chunk - it's a dense block of code with no explanation - ideally this would be separated into small chunks with some explanation so users can follow along and quickly adapt it to their own data.
  3. Fig. 1 - suggest placing points for SARS and MERS, the dashed lines are confusing and suggest rather a range - not sure if that was the intention.
  4. Suggest additional (sub)sections on the effect of the number of initial introductions, and the levels of heterogeneity in transmission, and the probability of outbreak extinction.
  5. Link to CMMID Shiny app might be iffy if they decide to take the app down at some point - this has already been discussed before so there's a good chance it will happen sooner rather than later.
  6. I would separate "Superspreading in decision making" and "Controls on transmission" into a separate vignette (together) on decision-making and policy; with more explanation as to the use-case, the tie-in with {epiparameter}, and the explanations of how such methods have previously been used in policy.
  7. I could not understand very much about the proportion_cluster_size() function from the code chunk in the vignette - suggest explaining more how it is being used and how to interpret the inputs and results.

Copy link
Member

@Bisaloo Bisaloo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

Two general comments:

  • you have # nolint comments in many places next to the function argument definition. As far as I can tell, this is not necessary, can we remove them?
  • in the Roxygen2 docs, you can use \doi{} for the references instead of having the doi as a link.

R/offspring_distributions.R Show resolved Hide resolved
R/probability_contain.R Show resolved Hide resolved
R/probability_epidemic.R Show resolved Hide resolved
R/probability_contain.R Show resolved Hide resolved
DESCRIPTION Show resolved Hide resolved
vignettes/superspreading.Rmd Show resolved Hide resolved
@joshwlambert
Copy link
Member Author

Thanks to both reviewers for taking the time to look through the package and making helpful suggestions.

@joshwlambert
Copy link
Member Author

The vignettes could be a bit clearer and better organised, and the latter half of the "Epidemic risk" vignette could probably be split off into a separate vignette on policy decisions.

I would separate "Superspreading in decision making" and "Controls on transmission" into a separate vignette (together) on decision-making and policy; with more explanation as to the use-case, the tie-in with {epiparameter}, and the explanations of how such methods have previously been used in policy.

For now there is not much information on policy decisions as there is minimal functionality on containment. As the containment functionality is expanded I will keep in mind that splitting the vignettes could be beneficial.

Both in the vignettes and the Readme, there could be a better explanation of how the {fitdistrplus} and {distributional} packages integrate with {superspreading} as they are prominently featured there.

I have improved the explanation of {fitdistrplus}. {distributional} has been removed with updates to {epiparameter} so does not need explaining.

One minor point: the package hex logo appears a bit flattened, while the letters package name in the logo looks a bit squashed together - it might help to compare against one the other logos we have.

Thanks, it was different from the dimensions of the {epiparameter} logo so I've resized it to match.

I see that there's a dependency on {bpmodels} for chain_sim() - something to flag for when it is superseded by {epichains}

The plan is to transition to using {epichains} once it is stable.

Not sure that "superspreading" in the Readme needs to be linked to Github - there's already a GH button on the top bar of the pkgdown website.

This seems to be an automatic feature on pkgdown as I do not explicitly link it in the README and this is only there on the website and not when reading the README.md on github.

In the Quick start: Calculate the probability of large epidemic, consider whether if this is the actual example of using {superspreading}, it should not be better explained or more prominent that the section above, which does not really seem to be an example of {superspreading} per se.

The Calculate probability of large epidemic follows on from Calculate heterogeneity of transmission as it logically follows that after estimating $R$ and $k$ that these can be used in {superspreading} functions. Therefore I think inverting the order would be more confusing.

Fig. 1 - suggest placing points for SARS and MERS, the dashed lines are confusing and suggest rather a range - not sure if that was the intention.

This figure is a reproduction of a figure from Kucharski et al. (2020) (see fig legend), so I will leave as is.

Suggest additional (sub)sections on the effect of the number of initial introductions, and the levels of heterogeneity in transmission, and the probability of outbreak extinction.

I think the plots make the interpretation of the results relatively straightforward. Will consider splitting if others think it could improve readability.

Link to CMMID Shiny app might be iffy if they decide to take the app down at some point - this has already been discussed before so there's a good chance it will happen sooner rather than later.

Yes, I've considered this and will monitor if it remains active and will remove the link if it gets taken down. For now I think it provides a nice example of the application of these methods in a real outbreak scenario.

Suggest using numbered or bulleted list for the differences with {bpmodels} under "Related projects" --- but also consider whether we want to direct users there since it is being superseded.

I will keep referring to {bpmodels} as long as it is used by the package. Once {epichains} is stable and is taken on as a dependency I will remove {bpmodels} from the related packages and add {epichains}.

@joshwlambert
Copy link
Member Author

Closing this PR (without merging) now as comments have been addressed in the PRs and issues linked above and from the responses I've posted.

The package is now ready for v0.1.0 release (after going through the release checklist). 🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Pkg review Full package review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants