Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possibility to reintroduce ... to pass arguments to branching process model #43

Open
joshwlambert opened this issue Jan 4, 2024 · 9 comments

Comments

@joshwlambert
Copy link
Member

As part of the full package review (#33) the dots (...) were removed from the simulation functions (sim_linelist(), sim_outbreak() and sim_contacts()), as they were not used and would warn if used (by chkDots()).

The dots could be added to the functions in order to pass arguments to the epidemiological simulation model used to simulate cases (currently bpmodels::chain_sim()). Currently it can only simulate with a Poisson offspring distribution.

The other option for this functionality is to include these parameters to toggle the simulation model within the config.

@joshwlambert
Copy link
Member Author

joshwlambert commented Apr 25, 2024

When this issue was written bpmodels::chain_sim() was being used internally to simulate the branching process. Since #60 and subsequent developments, {simulist} has implemented it's own branching process model.

The original idea was the allow several simulation engines to produce infections and outcomes which {simulist} would append clinical and other contextual information to, in order to produce line list and contact data. However, with the current setup, I think this approach would be difficult and complicate the user-interface for sim_linelist(), sim_contacts() and sim_outbreak() (which is where the current issue proposes using ... to pass arguments to these internal simulation functions).

My current idea to leave the currently exported simulation functions in the {simulist} using the internal branching process and instead add a new function to the package which takes the output of a simulation function, e.g. epichains::simulate_chains() or epidemics::model_default(), and convert them into line list data. This could happen by S3 dispatch if these functions output a recognisable class, or could match columns names, or other mechanism once I've given it some thought. This function could be called as_linelist(), convert_to_linelist(), or something similar.

This issue relates to #34 and if we can agree on which approach is optimal I will close one of these issues and put the development on the package roadmap (likely v0.5.0 after the package is on CRAN, see #100).

Tagging @CarmenTamayo, @Bisaloo, @adamkucharski and @chartgerink for input.

@chartgerink
Copy link
Member

Thanks for the tag @joshwlambert - I leave my comments for your consideration 😊

I would be inclined to agree with leaving the exported functions but would disagree with adding functionality to harmonize the output of other simulation functions at this time. simulist has a clearly defined scope: make simulating linelists easier.

Processing different ways of simulating linelist data into a harmonized format would be beyond that scope, from my understanding. Standardizing the output of other simulation models may certainly be helpful, but a scope expansion needs proportional justification in my experience.

In other words, what amount of work would such a harmonization implementation be estimated to be? Are the use cases concrete enough to warrant that? If these are up to now hypothetical use cases, I would especially caution against expanding scope.

If the scope is expanded, new questions also arise for me - if epichains output would be harmonized, why not utilize it as the base implementation to begin with, as it is the successor of bpmodels? How does epidemics come into play? Are there other packages whose output would be relevant to harmonize too then?


If I missed the mark, I will not be offended to get feedback on that and improve my understanding of simulist's role in the ecosystem 😊

@joshwlambert
Copy link
Member Author

Thanks for the input Chris. I agree that scope creep for this relatively vaguely defined feature needs to be guarded against. To quote from the initial package proposal by @CarmenTamayo:

This R package would provide tools to simulate epidemic's raw case data in the form of line lists, using model outputs, such as compartmental models, branching processes, or network models.

For the MVP of {simulist} we decided to use a single simulation model (bpmodels::chain_sim()), which was then removed for an internal implementation given evolving requirements.

Are the use cases concrete enough to warrant that?

Currently I do not have explicit cases that require the use of another simulation model (e.g. {epidemics}).

Having a method to convert output from models we do not intend to incorporate into our internal simulation model (e.g. fixed network models) might be useful, but it might also be worth waiting for someone to raise this feature request before developing it. I'm open to fixing the scope of {simulist} around a single internal simulation engine and making this explicit in the design-principles.Rmd vignette.

I'll wait for others tagged to input before making a final decision.

@CarmenTamayo
Copy link
Contributor

hi @joshwlambert thanks for tagging me in this conversation, as you mentioned on your previous response, the original idea for the package was to offer functionality for users to simulate linelist data from different model outputs, in fact initially I was using the output of an SIR model to simulate linelist data for the 100 days workshop, but I decided to use bpmodels instead because I also wanted contact data, which I could easily obtain with bpmodels.
I think in the future it would be beneficial for users to be able to simulate data using different methods to suit their needs and expertise, or to simulate a linelist easily from their own models as well
If this is too complex or not a priority for now, the current functionality is adequate for users- maybe we could focus on reaching out to potential users for them to test the package and provide feedback on whether other methods are needed and, if so, which would be the priority

@joshwlambert
Copy link
Member Author

Thanks for input and context @CarmenTamayo.

I think we're all in agreement that having the ability to utilise other epi simulation models to generate line list (and potentially contacts) data would be ideal. However, some points raised in the thread thus far are suggesting that we should not pursue generalising the package to other simulations until we have a stable version with a user base that can inform development direction and request the ability to use other models.

@chartgerink's points:

simulist has a clearly defined scope: make simulating linelists easier.

... but a scope expansion needs proportional justification in my experience.

If these are up to now hypothetical use cases, I would especially caution against expanding scope.

@CarmenTamayo' point:

we could focus on reaching out to potential users for them to test the package and provide feedback on whether other methods are needed and, if so, which would be the priority

I'll leave this issue open for a few more days in case anyone else has input. If not I will close this issue and tag it in in #34 where we can continue this discussion around other epi simulation models (given the initial scope of this issue was around reimplementing ...).

@joshwlambert
Copy link
Member Author

To answer an unresolved question raised

If the scope is expanded, new questions also arise for me - if epichains output would be harmonized, why not utilize it as the base implementation to begin with, as it is the successor of bpmodels? How does epidemics come into play? Are there other packages whose output would be relevant to harmonize too then?

The internal simulation model would not be directly replaced by {epichains} as this package has the same single-type branching process as {bpmodels} and is parameterised in the same way. Therefore, it does not have the network effect, or parameterised with the infectious period like the current simulist:::.sim_network_bp(). @jamesmbaazam please correct me if this is incorrect.

If the package were generalised to accept other simulation models, {epichains} would likely be the first package we would look to harmonise with, likely followed by {epidemics} to document to user how to use either a branching process or compartmental model. (Note neither of these would be taken on as dependencies, we would write methods that recognise the output without requiring the packages).

In terms of what other packages would be integrated I haven't given it much thought but it definitely needs to be considered as it might not scale well if lots of data wrangling code needs adding to {simulist} for compatibility with many other simulation packages.

@adamkucharski
Copy link
Member

Agree {epichains} would be a natural next option (I guess it's just a matter of aligning inputs, e.g. a data.frame with 'infector' and 'infectee' and timing of infection, that could feed into {simulist}?)

Once we get the upcoming round of packages released, will be good to get {ringbp} updated and ready for wider use, given its ability to implement contact tracing etc. (@sbfnk has created issues for the main edits required, which overlap quite a bit with {epichains})

@jamesmbaazam
Copy link
Member

The internal simulation model would not be directly replaced by {epichains} as this package has the same single-type branching process as {bpmodels} and is parameterised in the same way. Therefore, it does not have the network effect, or parameterised with the infectious period like the current simulist:::.sim_network_bp(). @jamesmbaazam please correct me if this is incorrect.

You're right. We don't implement the network effects nor use the infectious period parameterisation although it is being considered for a future version. See epiverse-trace/epichains#204.

@jamesmbaazam
Copy link
Member

If the scope is expanded, new questions also arise for me - if epichains output would be harmonized, why not utilize it as the base implementation to begin with, as it is the successor of bpmodels? How does epidemics come into play? Are there other packages whose output would be relevant to harmonize too then?

From a user perspective, when I've worked on analyses that required some kind of simulated linelist, we've started from a case time series that provides exact counts per date. This has then been disaggregated into a linelist with more variables to describe each individual. That is what I would want to use {simulist} for and that's where {epidemics} comes in because it can simulate case time series. An as_linelist_from_timeseries() function (not necessarily a method) could be provided here that takes in a time series with fixed names and extra arguments for customisation (names, age, sex, ct values, etc).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants