Clarify `stat_max` #237

jamesmbaazam · 2024-03-22T18:11:53Z

This PR closes #193 by renaming stat_max to stat_threshold and improves the documentation to help clarify some nuances between stat_threshold in simulate_chains() and simulate_summary(). It also renames the infinite argument (old name of stat_max) in rborel() to censor_at and improves its documentation. In particular,

stat_threshold in simulate_chains() is a stopping criterion for chains. When the cumulative statistic of chains reaches or surpasses it, they end
stat_threshold in simulate_summary() is both a stopping criterion for chains and a censoring limit. When the cumulative statistics of chains reach or surpass it, they end. Additionally, chain statistic values that are >= stat_threshold are set to Inf.
censor_at in the borel functions is passed to stat_threshold in simulate_summary() as a stopping criterion and for censoring chain sizes but does not refer to chain sizes as used in simulate_summary(), hence the name change and dedicated documentation.

jamesmbaazam · 2024-05-02T11:57:44Z

The need to clarify this argument came up today in the following tutorial. A participant asked, "Why do some chains produce more cumulative cases than what stat_max is?" My answer was roughly, "Chains may still produce more cases than stat_max near the stopping criterion but will not continue to produce more cases after the stopping criterion is met. For example, if the stopping criterion is 10 cases, a chain may produce 12 cases near the stopping criterion." This raises the need to clarify stat_max further as it suggests that chains cannot produce outcomes above it.

@joshwlambert and @avallecam Do you have thoughts on how to clarify this further? I'm tagging you because we had a brief discussion about it after the tutorial session.

R/simulate.r

sbfnk

Why do some chains produce more cumulative cases than what stat_max is?

Would an alternative be to ensure it doesn't, i.e. select cases in the final generation such that the stat is stat_max?

R/borel.r

R/simulate.r

joshwlambert · 2024-05-03T10:06:43Z

Would an alternative be to ensure it doesn't, i.e. select cases in the final generation such that the stat is stat_max?

My preference would be to either 1) keep the name stat_max and make it a hard limit for the simulation so the user cannot exceed this number, or 2) change the argument name to replace max with another word that doesn't imply a hard limit and then improve documentation to how this works.

This is where I think the confusion stemmed from in the tutorial because it is a max that does not behave as a max.

joshwlambert · 2024-05-03T10:10:12Z

FYI we have the same functionality dilemma in {simulist} and have chosen to have a soft limit so the "maximum" can be exceeded and the function returns a warning to let the user know the "maximum" (either default 1e4 or user-specified) has been exceeded and states the number of cases returned.

https://github.com/epiverse-trace/simulist/blob/main/R/sim_linelist.R#L70

This discussion also makes it clear that the documentation in sim_linelist() could also be improved.

jamesmbaazam · 2024-05-03T10:30:05Z

Would an alternative be to ensure it doesn't, i.e. select cases in the final generation such that the stat is stat_max?

I think this might be the desirable behaviour. We could treat the last generation of offspring as potential offspring and sample the number needed to reach stat_max by finding the difference between the cumulative number before the last generation and stat_max.

jamesmbaazam · 2024-05-03T10:50:31Z

Would an alternative be to ensure it doesn't, i.e. select cases in the final generation such that the stat is stat_max?

My preference would be to either 1) keep the name stat_max and make it a hard limit for the simulation so the user cannot exceed this number, or 2) change the argument name to replace max with another word that doesn't imply a hard limit and then improve documentation to how this works.

This is where I think the confusion stemmed from in the tutorial because it is a max that does not behave as a max.

Thanks, Josh.

I think option 1 is better because it can be used to simulate desirable behavior. Allowing simulations to go beyond the limit set will make the results difficult to explain.

jamesmbaazam · 2024-05-03T10:53:23Z

FYI we have the same functionality dilemma in {simulist} and have chosen to have a soft limit so the "maximum" can be exceeded and the function returns a warning to let the user know the "maximum" (either default 1e4 or user-specified) has been exceeded and states the number of cases returned.

epiverse-trace/simulist@main/R/sim_linelist.R#L70

This discussion also makes it clear that the documentation in sim_linelist() could also be improved.

I may be misinterpreting your definition of "soft" but from the code & output, it does seem like you don't return values above the max, so I would still interpret it as a "hard" limit.

joshwlambert · 2024-05-03T11:00:49Z

I think option 1 is better because it can be used to simulate desirable behavior. Allowing simulations to go beyond the limit set will make the results difficult to explain.

Sounds good to me.

I may be misinterpreting your definition of "soft" but from the code & output, it does seem like you don't return values above the max, so I would still interpret it as a "hard" limit.

Here's a reprex to show what I mean. The outbreak_size argument takes a vector of two elements a hard lower limit, and a soft upper limit for the outbreak size.

set.seed(2)
library(simulist)
linelist <- sim_linelist(
  contact_distribution = function(x) dpois(x = x, lambda = 2),
  infect_period = function(x) dgamma(x = x, shape = 3, scale = 3),
  prob_infect = 0.55,
  onset_to_hosp = function(x) dgamma(x = x, shape = 2, scale = 2),
  onset_to_death = function(x) dgamma(x = x, shape = 2, scale = 2),
  outbreak_size = c(5, 20)
)
#> Warning: Number of cases exceeds maximum outbreak size. 
#> Returning data early with 29 cases and 44 total contacts (including cases).

^{Created on 2024-05-03 with reprex v2.1.0}

jamesmbaazam · 2024-05-03T11:08:29Z

I think option 1 is better because it can be used to simulate desirable behavior. Allowing simulations to go beyond the limit set will make the results difficult to explain.

Sounds good to me.

I may be misinterpreting your definition of "soft" but from the code & output, it does seem like you don't return values above the max, so I would still interpret it as a "hard" limit.

Here's a reprex to show what I mean. The outbreak_size argument takes a vector of two elements a hard lower limit, and a soft upper limit for the outbreak size.
set.seed(2)
library(simulist)
linelist <- sim_linelist(
  contact_distribution = function(x) dpois(x = x, lambda = 2),
  infect_period = function(x) dgamma(x = x, shape = 3, scale = 3),
  prob_infect = 0.55,
  onset_to_hosp = function(x) dgamma(x = x, shape = 2, scale = 2),
  onset_to_death = function(x) dgamma(x = x, shape = 2, scale = 2),
  outbreak_size = c(5, 20)
)
#> Warning: Number of cases exceeds maximum outbreak size. 
#> Returning data early with 29 cases and 44 total contacts (including cases).
Created on 2024-05-03 with reprex v2.1.0

Ah! Thanks for this. That makes sense. I don't think we should have a hard lower limit in {epichains} as it is by definition the seeing cases, if the outbreak doesn't take off. The soft upper limit is what is already implemented as stat_max, so it's currently a matter of documentation which can be solved with a name change and a proper wording.

My concern is more about whether it is desirable behaviour. For example, if a user sets stat_max to 500 cases and gets 800 cases, that's quite undesirable. I think the solution would be what I suggested in #237 (comment), where we sample the remainder from the excess cases and return exactly stat_max.

R/borel.r

jamesmbaazam · 2024-05-15T12:46:55Z

Upon further discussion with Seb, we've decided it might be better to rename stat_max to stat_threshold and improve the wording to indicate that it is not a hard limit and can be exceeded due to stochastic effects. The discussion about making it a hard limit has been logged in #243 for further brainstorming.

jamesmbaazam added the documentation Improvements or additions to documentation label Mar 22, 2024

jamesmbaazam marked this pull request as ready for review March 22, 2024 18:13

jamesmbaazam commented May 2, 2024

View reviewed changes

R/simulate.r Outdated Show resolved Hide resolved

sbfnk reviewed May 2, 2024

View reviewed changes

R/borel.r Outdated Show resolved Hide resolved

R/simulate.r Outdated Show resolved Hide resolved

jamesmbaazam force-pushed the change-truncation-to-censoring branch from b517fee to 1a20941 Compare May 7, 2024 13:11

jamesmbaazam commented May 7, 2024

View reviewed changes

R/borel.r Outdated Show resolved Hide resolved

jamesmbaazam mentioned this pull request May 8, 2024

Make chains stop at a hard limit using the stat_* argument #243

Open

jamesmbaazam force-pushed the change-truncation-to-censoring branch 2 times, most recently from 9e3627e to d7d32e9 Compare May 15, 2024 15:40

jamesmbaazam closed this May 15, 2024

jamesmbaazam reopened this May 15, 2024

jamesmbaazam and others added 10 commits May 15, 2024 17:02

Clarify stat_max

d208463

Rename infinite to stat_max in borel functions

400c1bf

Rename stat_max to censor_at for clarity

c919bbe

Fix documentation of pop and percent_immune

0d51d88

Fix return value in borel function

12ee08a

Clarify stat_max documentation

09c58c0

Regenerate Rd files

bfa6e71

Break long line

471f101

Fix test: replace stat_max with censor_at + break long line

55c7394

Add note about potential for stat > stat_max at ending point

3a68fc8

jamesmbaazam and others added 7 commits May 15, 2024 17:02

Inherit stat_max from simulate_summary not simulate_chains

eca3cc3

Improve documentation of censor_at

32136ac

Break long line

8f84d6a

Fix reference to old function post rebase

c00df42

Inherit stat_max definition from simulate_chain_stats()

50dbbcc

Rename stat_max to stat_threshold to reflect code

a4cc3c4

Automatic readme update

5e12628

jamesmbaazam force-pushed the change-truncation-to-censoring branch from 91920fb to 5e12628 Compare May 15, 2024 16:02

jamesmbaazam requested a review from sbfnk May 15, 2024 16:03

sbfnk approved these changes May 16, 2024

View reviewed changes

jamesmbaazam merged commit a86b53d into main May 16, 2024
9 checks passed

jamesmbaazam deleted the change-truncation-to-censoring branch May 16, 2024 16:09

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Clarify `stat_max` #237

Clarify `stat_max` #237

jamesmbaazam commented Mar 22, 2024 •

edited

Loading

jamesmbaazam commented May 2, 2024 •

edited

Loading

sbfnk left a comment

joshwlambert commented May 3, 2024

joshwlambert commented May 3, 2024

jamesmbaazam commented May 3, 2024

jamesmbaazam commented May 3, 2024

jamesmbaazam commented May 3, 2024

joshwlambert commented May 3, 2024

jamesmbaazam commented May 3, 2024

jamesmbaazam commented May 15, 2024

Clarify stat_max #237

Clarify stat_max #237

Conversation

jamesmbaazam commented Mar 22, 2024 • edited Loading

jamesmbaazam commented May 2, 2024 • edited Loading

sbfnk left a comment

Choose a reason for hiding this comment

joshwlambert commented May 3, 2024

joshwlambert commented May 3, 2024

jamesmbaazam commented May 3, 2024

jamesmbaazam commented May 3, 2024

jamesmbaazam commented May 3, 2024

joshwlambert commented May 3, 2024

jamesmbaazam commented May 3, 2024

jamesmbaazam commented May 15, 2024

Clarify `stat_max` #237

Clarify `stat_max` #237

jamesmbaazam commented Mar 22, 2024 •

edited

Loading

jamesmbaazam commented May 2, 2024 •

edited

Loading