New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
{superspreading} v.0.1.0 full pkg review #31
Conversation
Add probability of epidemic in single-type branching process, and example based on Ebola data
since it is now handled at the org level
…piparameter as remote dep, WIP #2
Co-authored-by: Adam Kucharski <adam.kucharski@lshtm.ac.uk>
Co-authored-by: Adam Kucharski <adam.kucharski@lshtm.ac.uk>
Co-authored-by: Adam Kucharski <adam.kucharski@lshtm.ac.uk>
Co-authored-by: Adam Kucharski <adam.kucharski@lshtm.ac.uk>
Co-authored-by: Adam Kucharski <adam.kucharski@lshtm.ac.uk>
Co-authored-by: James Azam <james.m.azam@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @joshwlambert thanks for opening this and trying out this alternative format for full package reviews, looks like a nice job overall. I've just put down some comments here, and I'm happy to discuss them as well.
General
This is a full package review for v0.1.0 of {superspreading} and my first review of this package. I think the use cases for the package are clear, and the exported functions are suitable to the use cases.
On the technical side, there could be better integration with {epiparameter} in terms of passing <epidist>
s directly to some functions rather than, or as well as
The vignettes could be a bit clearer and better organised, and the latter half of the "Epidemic risk" vignette could probably be split off into a separate vignette on policy decisions. Both in the vignettes and the Readme, there could be a better explanation of how the {fitdistrplus} and {distributional} packages integrate with {superspreading} as they are prominently featured there.
One minor point: the package hex logo appears a bit flattened, while the letters package name in the logo looks a bit squashed together - it might help to compare against one the other logos we have.
Most of my comments are on the vignettes. I've been happy to look at the tests as a whole, but have not really gone into them in detail to check if some functionality is not explicitly tested, so this could be a good place for other reviewers to begin.
I will add more line-by-line comments on the files for issues for which there are small and easy fixes.
General technical
- Great that code coverage is at 100%, and the package website has already been prepared,
- Interesting use of
Vectorise()
--- looks alright to me, but not something I'm familiar with. - I see that there's a dependency on {bpmodels} for
chain_sim()
- something to flag for when it is superseded by {epichains} - I like the names of the functions and the use cases which are clear; make sure that function titles are imperatives (see e.g.
proportion_cluster_size()
) - Suggest a {styler} run over the source and vignette files
- In
probability_*()
andproportion_*()
, shouldassert_number()
andassert_numeric()
(where appropriate) specify a lower limit on$R$ and$k$ ? Similarly, considerlen = 1L
forassert_logical()
, andassert_count()
forcase_threshold
inprobability_contain()
- Consider whether arguments (such as
a
inprobability_*()
) could/should probably be renamed to something more informative, such as "initial_cases". An informative name fork
would be great too. - It would probably be good to allow users to pass an offspring distribution as an
<epidist>
in place ofR
andk
where relevant This would reduce the burden on users to find which of the members of an<epidist>
should be passed toR
and which tok
.
Functions and files
Readme
- Not sure that "superspreading" in the Readme needs to be linked to Github - there's already a GH button on the top bar of the pkgdown website.
3. Not sure we need to link to {pak} either - we don't really want people to click away from the text unless it's a key dependency (e.g. {epiparameter} for {datadelay}). - Opening sentence could have a bit more detail - see e.g. the opening sections for {finalsize} and {epidemics} - this is a biased view admittedly. E.g. "...provides a set of functions to estimate and understand individual-level variation in the transmission of infectious diseases from data on secondary cases."
- Suggest also adding key references such as Lloyd-Smith.
- In the Quick start: Calculate the heterogeneity of transmission, consider using full sentences - current text is a minimal set of descriptions of three datasets. Suggest writing what this example is doing, and which functions are showcased here.
- Where in this example is a function from {superspreading} actually used?
- What models is {fitdistrplus} being used to fit?
- What are
index_case_transmission
andsecondary_case_transmission
--- add an informative comment. - Suggest unifying terminology --- use one of "non-index" or "secondary", if these are the same, and explain (and link) the two either way.
- Clarify code comments - what is
R
andk
in the example? Suggest explaining what {fitdistrplus} is being used for here - we know, but new users might not.
- In the Quick start: Calculate the probability of large epidemic, consider whether if this is the actual example of using {superspreading}, it should not be better explained or more prominent that the section above, which does not really seem to be an example of {superspreading} per se.
- Add an "a" before "epidemic",
- Add a few sentences explaining the use case and the function used.
- Add comments explaining how the fitted
nbinom
distribution is being used.
- Suggest using numbered or bulleted list for the differences with {bpmodels} under "Related projects" --- but also consider whether we want to direct users there since it is being superseded.
Vignettes
General
- Repeat yourself, rather than referring to a concept as "that" etc. --- this makes it clearer for new users who are unfamiliar with the subject.
- Break sentences after 2 -- 3 clauses, and split paragraphs so that it is easier to read little chunks of information.
- Explain the epidemiological or public health implications of statements to make the maths clearer for readers. E.g. "Values of
k
less than one indicate overdispersion of disease transmission, a signature of superspreading." --- this could be explained in lay terms to make the concept clearer. Refer to papers as appropriate for more advanced or adventurous readers. - Also explain terms encountered for the first time, e.g. "offspring distribution" --- more welcoming of non-domain users.
- Check the code for references, as you have likely used
(@reference)
leading to the year being in braces (as for in-line ref.), and remember to add a "References" section at the end - Hide code to generate tables, it's not super important imo.
- I would suggest using Tidyverse code for use-cases such as counting the frequency of secondary cases (currently using
tally()
) --- I appreciate this adds extra dependencies, but this appears to be the background from most people will approach this material. - Unify how you refer to packages, as there are currently superspreading (Readme), {superspreading} (Get started, other vignettes), and superspreading (also Readme).
Get started
- Make section or subsection of the Definition --- I liked that there is a definition provided. It could even be a little bit more detailed.
- When loading libraries, explain what they are for --- users have not encountered the {epiparameter} or {distributional} packages before in the Readme.
- Explain the difference between
$R$ and$R_0$ since both are written here; and use mathematical notation for$k$ and$a$ (if appropriate). - Link to {epiparameter} so readers understand what that project is about, esp. how the params are collected
Estimate individual-level transmission
- I liked the opening sentences of this vignette - it would be worth copying them to the "Get started" vignette as well.
- "Transmission data" heading should probably come before the paragraphs describing the data.
- Might be worth explaining model selection using AIC in one sentence, with a reference.
- Fig.2 - suggest distinct colours for the two distributions
Epidemic risk
- I liked the explanation of the use-case, and it might be good to explicitly state that under a heading, as in {finalsize}.
- I did not like the first code chunk - it's a dense block of code with no explanation - ideally this would be separated into small chunks with some explanation so users can follow along and quickly adapt it to their own data.
- Fig. 1 - suggest placing points for SARS and MERS, the dashed lines are confusing and suggest rather a range - not sure if that was the intention.
- Suggest additional (sub)sections on the effect of the number of initial introductions, and the levels of heterogeneity in transmission, and the probability of outbreak extinction.
- Link to CMMID Shiny app might be iffy if they decide to take the app down at some point - this has already been discussed before so there's a good chance it will happen sooner rather than later.
- I would separate "Superspreading in decision making" and "Controls on transmission" into a separate vignette (together) on decision-making and policy; with more explanation as to the use-case, the tie-in with {epiparameter}, and the explanations of how such methods have previously been used in policy.
- I could not understand very much about the
proportion_cluster_size()
function from the code chunk in the vignette - suggest explaining more how it is being used and how to interpret the inputs and results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me!
Two general comments:
- you have
# nolint
comments in many places next to the function argument definition. As far as I can tell, this is not necessary, can we remove them? - in the Roxygen2 docs, you can use
\doi{}
for the references instead of having the doi as a link.
Thanks to both reviewers for taking the time to look through the package and making helpful suggestions. |
For now there is not much information on policy decisions as there is minimal functionality on containment. As the containment functionality is expanded I will keep in mind that splitting the vignettes could be beneficial.
I have improved the explanation of {fitdistrplus}. {distributional} has been removed with updates to {epiparameter} so does not need explaining.
Thanks, it was different from the dimensions of the {epiparameter} logo so I've resized it to match.
The plan is to transition to using {epichains} once it is stable.
This seems to be an automatic feature on pkgdown as I do not explicitly link it in the README and this is only there on the website and not when reading the README.md on github.
The Calculate probability of large epidemic follows on from Calculate heterogeneity of transmission as it logically follows that after estimating
This figure is a reproduction of a figure from Kucharski et al. (2020) (see fig legend), so I will leave as is.
I think the plots make the interpretation of the results relatively straightforward. Will consider splitting if others think it could improve readability.
Yes, I've considered this and will monitor if it remains active and will remove the link if it gets taken down. For now I think it provides a nice example of the application of these methods in a real outbreak scenario.
I will keep referring to {bpmodels} as long as it is used by the package. Once {epichains} is stable and is taken on as a dependency I will remove {bpmodels} from the related packages and add {epichains}. |
Closing this PR (without merging) now as comments have been addressed in the PRs and issues linked above and from the responses I've posted. The package is now ready for v0.1.0 release (after going through the release checklist). 🚀 |
This PR is to provide a platform to review the entirety of the package.
This PR is unconventional as it is not intended for merging or for additional commits (unless minor) and instead comments will be converted to issues and these will be addressed in their own PRs.