implement `dist_spec` interface #363

sbfnk · 2023-02-08T08:55:00Z

Updates the distribution interface to use the `dist_spec` S3 class. In the process, this also implements nonparametric distribution specifications (indicated by `_np`, vs. `_p` for parametric) and multiple generation times (e.g. for vector-borne diseases). All fixed delay distributions are now calculated in R using the existing functionality in `dist_skel`.
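As a rough illustration of the parametric/nonparametric split: once its parameters are fixed, a parametric delay can be turned into a nonparametric PMF, which is what makes it possible to precompute fixed delays outside Stan. The Python sketch below is hypothetical. The `DistSpec` class, the `discretise_lognormal()` helper, the `max_value` field and the discretisation scheme (CDF differences over unit intervals, renormalised at `max_value`) are all assumptions for illustration, not the R `dist_spec`/`dist_skel` code in this PR.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DistSpec:
    """Hypothetical stand-in for a delay specification.

    Parametric ("_p") specs carry distribution parameters;
    nonparametric ("_np") specs carry a PMF directly.
    """
    max_value: int
    meanlog: Optional[float] = None  # parametric lognormal parameters
    sdlog: Optional[float] = None
    pmf: Optional[List[float]] = None  # nonparametric PMF

    @property
    def parametric(self) -> bool:
        return self.pmf is None


def lognormal_cdf(x: float, meanlog: float, sdlog: float) -> float:
    """CDF of a lognormal distribution, via the error function."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - meanlog) / (sdlog * math.sqrt(2))
    return 0.5 * (1 + math.erf(z))


def discretise_lognormal(spec: DistSpec) -> DistSpec:
    """Turn a fixed parametric spec into a nonparametric one.

    P(delay = d) is taken as F(d + 1) - F(d) for d = 0 .. max_value - 1,
    then renormalised so the truncated PMF sums to one (assumed scheme).
    Nonparametric specs are returned unchanged.
    """
    if not spec.parametric:
        return spec
    cdf = [lognormal_cdf(d, spec.meanlog, spec.sdlog)
           for d in range(spec.max_value + 1)]
    probs = [cdf[d + 1] - cdf[d] for d in range(spec.max_value)]
    total = sum(probs)
    return DistSpec(max_value=spec.max_value, pmf=[p / total for p in probs])


fixed = DistSpec(max_value=10, meanlog=1.0, sdlog=0.5)
np_spec = discretise_lognormal(fixed)
print(len(np_spec.pmf), round(sum(np_spec.pmf), 6))  # prints: 10 1.0
```

Only specs with fixed parameters can be collapsed like this; a spec with uncertain parameters would still need its PMF to be built inside the model.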

Open questions:

Is `dist_spec` a good name (maybe clearer as `delay_dist` or perhaps just `dist`)?
Should the mean and sd for the lognormal be interpreted as the natural mean/sd? I think this would be less confusing, and we could add `logmean` and `logsd` or `mu` and `sigma` as additional ways of specifying this in `lognormal_dist_def`, just as it is in `gamma_dist_def`.
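For reference, the mapping between the natural-scale mean and sd of a lognormal and its log-scale parameters (`meanlog`/`sdlog`, i.e. mu and sigma) is closed-form, so accepting either set of inputs only needs a small conversion step. A minimal Python sketch; the function names are hypothetical and not part of the package.

```python
import math


def lognormal_from_natural(mean: float, sd: float) -> tuple:
    """Convert the natural-scale mean/sd of a lognormal to (meanlog, sdlog).

    Uses sigma^2 = log(1 + sd^2 / mean^2) and mu = log(mean) - sigma^2 / 2.
    """
    if mean <= 0 or sd <= 0:
        raise ValueError("mean and sd must be positive")
    sigma2 = math.log(1 + (sd / mean) ** 2)
    return math.log(mean) - sigma2 / 2, math.sqrt(sigma2)


def lognormal_to_natural(meanlog: float, sdlog: float) -> tuple:
    """Inverse mapping: mean = exp(mu + sigma^2 / 2),
    sd = mean * sqrt(exp(sigma^2) - 1)."""
    mean = math.exp(meanlog + sdlog ** 2 / 2)
    return mean, mean * math.sqrt(math.exp(sdlog ** 2) - 1)


# Round trip: natural -> log scale -> natural.
mu, sigma = lognormal_from_natural(mean=5.0, sd=2.0)
mean, sd = lognormal_to_natural(mu, sigma)
print(round(mean, 6), round(sd, 6))  # prints: 5.0 2.0
```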

This fixes #356 and #357. It should also pave the way to implement #313 in the future (as delay distributions can be named and could then be referenced by different observation specifications).

sbfnk · 2023-02-08T19:38:08Z

The remaining error is the one that would be fixed by #366, but it didn't occur before, so it may point to an inadvertent change to the model.

seabbs · 2023-02-08T20:44:49Z

Now that all the explained issues are resolved, I'll have a proper look through and see if I can identify what is leading to the change in behaviour.

sbfnk · 2023-02-09T14:11:19Z

Now that all the explained issues are resolved, I'll have a proper look through and see if I can identify what is leading to the change in behaviour.

Should be all fixed now.

github-actions · 2023-02-11T11:21:07Z

This is how benchmark results would change (along with a 95% confidence interval in relative change) if 0043b32 is merged into develop:

☑️ default: 48.6s -> 51.4s [NANaN%, NANaN%]
☑️ no_delays: 30.9s -> 51s [NANaN%, NANaN%]
☑️ random_walk: 13.4s -> 14.2s [NANaN%, NANaN%]
☑️ stationary: 29.4s -> 31.9s [NANaN%, NANaN%]
☑️ uncertain: 51s -> 48.5s [NANaN%, NANaN%]
Further explanation regarding interpretation and methodology can be found in the documentation.

github-actions · 2023-02-11T13:59:00Z

This is how benchmark results would change (along with a 95% confidence interval in relative change) if f6a0334 is merged into develop:

☑️ default: 50.9s -> 58.5s [-0.09%, +30.03%]
❗🐌 no_delays: 39.1s -> 54.5s [+24.63%, +53.86%]
❗🐌 random_walk: 12s -> 16.1s [+20.95%, +47.79%]
☑️ stationary: 31.6s -> 32.7s [-6.39%, +13.3%]
❗🐌 uncertain: 45.3s -> 54.1s [+1.28%, +37.71%]
Further explanation regarding interpretation and methodology can be found in the documentation.

sbfnk · 2023-02-13T09:43:26Z

Looks like performance is worsening with this - will need to investigate.

github-actions · 2023-02-13T16:07:06Z

This is how benchmark results would change (along with a 95% confidence interval in relative change) if f6a0334 is merged into develop:

❗🐌 default: 46.6s -> 54.4s [+6.94%, +26.69%]
☑️ no_delays: 39.1s -> 46.5s [-5.07%, +42.5%]
❗🐌 random_walk: 13.7s -> 16.7s [+2.2%, +42.16%]
☑️ stationary: 29.9s -> 32.9s [-2.43%, +22.41%]
❗🐌 uncertain: 48.4s -> 59.4s [+14.31%, +31.07%]
Further explanation regarding interpretation and methodology can be found in the documentation.

github-actions · 2023-02-13T18:12:53Z

This is how benchmark results would change (along with a 95% confidence interval in relative change) if f6a0334 is merged into develop:

☑️ default: 50.4s -> 53.6s [-10.81%, +23.67%]
❗🐌 no_delays: 38.4s -> 54.1s [+28.3%, +53.05%]
❗🐌 random_walk: 12.4s -> 15.4s [+16.17%, +33.24%]
☑️ stationary: 30.8s -> 31.9s [-11.19%, +18.26%]
❗🐌 uncertain: 46.4s -> 54.5s [+6.11%, +29.03%]
Further explanation regarding interpretation and methodology can be found in the documentation.

github-actions · 2023-02-20T15:36:15Z

This is how benchmark results would change (along with a 95% confidence interval in relative change) if f6a0334 is merged into develop:

☑️ default: 49.9s -> 54.7s [-7.09%, +26.28%]
❗🐌 no_delays: 37.4s -> 57.6s [+31.99%, +76.11%]
❗🐌 random_walk: 11.9s -> 15.6s [+13.92%, +48.79%]
☑️ stationary: 30.6s -> 32.3s [-4.48%, +15.97%]
☑️ uncertain: 48.8s -> 55.1s [-3.14%, +29.27%]
Further explanation regarding interpretation and methodology can be found in the documentation.

seabbs · 2023-02-20T23:30:11Z

Is `dist_spec` a good name (maybe clearer as `delay_dist` or perhaps just `dist`)?

I like `dist`, and I can't think of a reason why using it would be a problem.

Should the mean and sd for the lognormal be interpreted as the natural mean/sd? I think this would be less confusing, and we could add `logmean` and `logsd` or `mu` and `sigma` as additional ways of specifying this in `lognormal_dist_def`, just as it is in `gamma_dist_def`.

Yes, I think this is fine to add, but it needs to be very clearly communicated that this change has happened or it will break lots of code. As I have said many times, I think the default should be to use the actual parameters of the distribution, as this will limit the information loss in a standard modelling pipeline. This should really be the case for the gamma distribution as well, and perhaps we can make that transition in this release too.
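For the gamma distribution, the analogous mapping between a natural-scale mean/sd and the native parameters is shape = mean^2 / sd^2 and rate = mean / sd^2. A minimal Python sketch with hypothetical function names; it only shows that the two parameterisations are interchangeable for point values and says nothing about how uncertainty in the parameters would propagate.

```python
def gamma_from_natural(mean: float, sd: float) -> tuple:
    """Convert a natural-scale mean/sd to gamma (shape, rate)."""
    if mean <= 0 or sd <= 0:
        raise ValueError("mean and sd must be positive")
    shape = (mean / sd) ** 2
    rate = mean / sd ** 2
    return shape, rate


def gamma_to_natural(shape: float, rate: float) -> tuple:
    """Inverse mapping: mean = shape / rate, sd = sqrt(shape) / rate."""
    return shape / rate, shape ** 0.5 / rate


shape, rate = gamma_from_natural(mean=4.0, sd=2.0)
print(shape, rate)  # prints: 4.0 1.0
```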

Updates the distribution interface to use the `dist_spec` S3 class. In the process, this also implements nonparametric distribution specifications (indicated by `_np`, vs. `_p` for parametric) and multiple generation times (e.g. for vector-borne diseases). All fixed delay distributions are now calculated in R using the existing functionality in `dist_skel`.

This is all nice. I need to go through the code in detail and kick some tires (so far I have just sight-read it), but in principle I am a big fan of the proposed changes. As I mention in the comments below, we could go further in unifying distributions across delays, generation times, truncation, etc., and if we did this now it could save us pain in the future.

I also really think we want to be combining fixed PMFs in R rather than in Stan, now that we have code to generate them, as this will reduce model complexity and make it easier to test that everything is working as expected (is this already happening? I think not, but I may have missed it). epinowcast has a function we can pillage (https://github.com/epinowcast/epinowcast/blob/e101de003b2ad611245bb73c0697ecbb329398c4/R/model-module-helpers.R#L173) to do this (it was nicely refactored by @pearsonca and @pratikunterwegs, whom we should remember to credit as we steal their work). We could of course import it and use it directly, but given the evolving nature of epinowcast and the simplicity of this function, that doesn't seem like a good idea. I do think a lightweight 📦 for this kind of job (and the convolution matrix, etc.) would be nice (so maybe it will be an epiverse project in a month or so 😉).
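To make the suggestion of combining fixed PMFs concrete: two independent discrete delays with PMFs p and q sum to a delay whose PMF is their discrete convolution, (p * q)[k] = sum over i of p[i] q[k - i]. The Python sketch below only illustrates that idea; it is not the linked epinowcast helper, and `convolve_pmfs`/`combine_fixed` are hypothetical names.

```python
from typing import List


def convolve_pmfs(p: List[float], q: List[float]) -> List[float]:
    """PMF of the sum of two independent discrete delays.

    p[i] is P(X = i) and q[j] is P(Y = j); the result r has length
    len(p) + len(q) - 1, with r[k] = sum over i + j = k of p[i] * q[j].
    """
    r = [0.0] * (len(p) + len(q) - 1)
    for i, p_i in enumerate(p):
        for j, q_j in enumerate(q):
            r[i + j] += p_i * q_j
    return r


def combine_fixed(pmfs: List[List[float]]) -> List[float]:
    """Fold a list of fixed PMFs into one PMF before handing it to the model."""
    combined = [1.0]  # a point mass at zero is the identity for convolution
    for pmf in pmfs:
        combined = convolve_pmfs(combined, pmf)
    return combined


incubation = [0.2, 0.5, 0.3]  # illustrative values only
reporting = [0.6, 0.4]
print(combine_fixed([incubation, reporting]))
```

Doing this once outside the sampler, rather than inside the Stan model at every iteration, is what would keep the model simpler and let the combined PMF be unit tested directly.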

R/create.R

R/dist.R

R/estimate_infections.R

R/opts.R

inst/stan/data/delays.stan

inst/stan/data/observation_model.stan

inst/stan/data/simulation_delays.stan

inst/stan/estimate_infections.stan

inst/stan/functions/pmfs.stan

seabbs · 2023-06-12T22:38:05Z

@sbfnk waiting on you now, I think. I have had to crank up `adapt_delta` on some of the unit tests, which isn't a great sign in terms of numerical stability. Did you see that in your testing?

R/create.R

R/estimate_infections.R

R/get.R

R/opts.R

R/dist.R

sbfnk · 2023-06-13T06:43:30Z

@sbfnk waiting on you now, I think. I have had to crank up `adapt_delta` on some of the unit tests, which isn't a great sign in terms of numerical stability. Did you see that in your testing?

Not really - does this cause an error or just a warning somewhere? I've looked through this again and it looks good to me, apart of course from the performance hit.

sbfnk · 2023-06-13T06:44:48Z

I'm also very unhappy about the synthetic recovery of infections, as that is definitely a new issue vs. legacy. To me, that indicates that a bug has crept in somewhere, and that is not ideal.

I agree it's not ideal, though I think it could also be down to the specific scenario I've used for testing this. Definitely worth exploring further.

Co-authored-by: Sebastian Funk <sebastian.funk@lshtm.ac.uk>

seabbs · 2023-06-13T14:22:18Z

does this cause an error or just a warning somewhere?

The issue I was seeing was a failure to recover the mean, which appeared to be due to low effective sample size.

Right, I think this is ready to merge if we are happy with the potential performance impact (which I am, hoping it gets resolved).

seabbs · 2023-06-14T08:19:49Z

LGTM - merging!

This reverts commit d0bea23.

* implement `dist_spec` interface * Add skipping of stan tests in the expected places * fix checks * fix epinow example * update syntax in more places * update another example * move parenthesis to the right place * update uncertainty in estimate_infections example * fix use of generation_time_opts as resource to call get_generation_time() * break line to make R CMD CHECK happy * fix uses of `trunc_opts` * fix updating of `cur_len` in ragged convolution * remove bounds on mean parameters * use dist_spec syntax in `estimate_delays` * add print function for `dist_spec` * add names to printing if given * clarify printouts * reduce unnecessary function calls * Revert "reduce unnecessary function calls" This reverts commit 4d7e5666ddf054652de5c18a060aabf161be0214. * fix typo * add default option for generation time * simplify delay inits * update pmf doc * dist -> distribution * fix max of np dist logic * simplify pmf truncation syntax * fix typos and use `is` * extract function for stan code conversion * fix variable name * fix function name * do truncnorm with appropriate lengths * fix initial condition sampling * update `to_stan` documentation * fix typo * stan model with unified delays * update R access to unified dist interface * update tests * ensure arrays are arrays * simplify stan seq (and avoid conflict with R) * fix test * fix simulation models * fix final tests * update usage of c -> + * Automatic readme update * update examples/doc and re-doc * linting * update docs * final requested lint * update return type of bootstrapped_dist_fit * redoc * update estimate_delay to reflect changes * dot product for all convolutions * report gt mean and var * bug fix in calculation of max delays * Automatic readme update * update tests * clean whitespace * reduce number of calculations by precomputing len * optional head/tail * Revert "optional head/tail" This reverts commit 8c59db1. 
* don't convolve first pmf * reduce vector copying * fix reversing * fix printing of combined distributions * add exampples, export, and add basic dist plotting * Automatic readme update * add some tests for dist_spec * add tests for +.dist_spec * add tests for mean.dist_spec * add some basic additional tests and docs * linting * fix linting * export c * fix plotting to work with c() method for dist_spec * more linting fixes * remove extract line in generation_time.stan * add a check in convolve_rev_pmf when len >= xlen + ylen and update tests * be more efficient when calc discrete pmfs * catch missing indexes in omf calc * clarify comments * add tolerance for +.dist_spec * don't load testthat * trigger benchmarking * remove benchmark trigger * linting * Automatic readme update * trigger benchmarking * remove benchmark trigger * refine tolerance checks for convolution * fix example * add an internal function * trigger benchmarking * benchmarking off * add back in missing tolerance docs * fix edge case check for length 1 pmfs * whitespace linting * test more carefully * use commas like a smart boy * crank that adapt delta handle * Update R/create.R Co-authored-by: Sebastian Funk <sebastian.funk@lshtm.ac.uk> * Update R/get.R Co-authored-by: Sebastian Funk <sebastian.funk@lshtm.ac.uk> * Update R/opts.R Co-authored-by: Sebastian Funk <sebastian.funk@lshtm.ac.uk> * Update R/dist.R Co-authored-by: Sebastian Funk <sebastian.funk@lshtm.ac.uk> * fixed @internal and brackets + fcase * don't export c.dist_spec * drop c() examle from plot --------- Co-authored-by: Sam Abbott <s.e.abbott12@gmail.com> Co-authored-by: GitHub Action <action@github.com>

* implement `dist_spec` interface * Add skipping of stan tests in the expected places * fix checks * fix epinow example * update syntax in more places * update another example * move parenthesis to the right place * update uncertainty in estimate_infections example * fix use of generation_time_opts as resource to call get_generation_time() * break line to make R CMD CHECK happy * fix uses of `trunc_opts` * fix updating of `cur_len` in ragged convolution * remove bounds on mean parameters * use dist_spec syntax in `estimate_delays` * add print function for `dist_spec` * add names to printing if given * clarify printouts * reduce unnecessary function calls * Revert "reduce unnecessary function calls" This reverts commit 9037e21. * fix typo * add default option for generation time * simplify delay inits * update pmf doc * dist -> distribution * fix max of np dist logic * simplify pmf truncation syntax * fix typos and use `is` * extract function for stan code conversion * fix variable name * fix function name * do truncnorm with appropriate lengths * fix initial condition sampling * update `to_stan` documentation * fix typo * stan model with unified delays * update R access to unified dist interface * update tests * ensure arrays are arrays * simplify stan seq (and avoid conflict with R) * fix test * fix simulation models * fix final tests * update usage of c -> + * Automatic readme update * update examples/doc and re-doc * linting * update docs * final requested lint * update return type of bootstrapped_dist_fit * redoc * update estimate_delay to reflect changes * dot product for all convolutions * report gt mean and var * bug fix in calculation of max delays * Automatic readme update * update tests * clean whitespace * reduce number of calculations by precomputing len * optional head/tail * Revert "optional head/tail" This reverts commit 8c59db1. 
* don't convolve first pmf * reduce vector copying * fix reversing * fix printing of combined distributions * add exampples, export, and add basic dist plotting * Automatic readme update * add some tests for dist_spec * add tests for +.dist_spec * add tests for mean.dist_spec * add some basic additional tests and docs * linting * fix linting * export c * fix plotting to work with c() method for dist_spec * more linting fixes * remove extract line in generation_time.stan * add a check in convolve_rev_pmf when len >= xlen + ylen and update tests * be more efficient when calc discrete pmfs * catch missing indexes in omf calc * clarify comments * add tolerance for +.dist_spec * don't load testthat * trigger benchmarking * remove benchmark trigger * linting * Automatic readme update * trigger benchmarking * remove benchmark trigger * refine tolerance checks for convolution * fix example * add an internal function * trigger benchmarking * benchmarking off * add back in missing tolerance docs * fix edge case check for length 1 pmfs * whitespace linting * test more carefully * use commas like a smart boy * crank that adapt delta handle * Update R/create.R Co-authored-by: Sebastian Funk <sebastian.funk@lshtm.ac.uk> * Update R/get.R Co-authored-by: Sebastian Funk <sebastian.funk@lshtm.ac.uk> * Update R/opts.R Co-authored-by: Sebastian Funk <sebastian.funk@lshtm.ac.uk> * Update R/dist.R Co-authored-by: Sebastian Funk <sebastian.funk@lshtm.ac.uk> * fixed @internal and brackets + fcase * don't export c.dist_spec * drop c() examle from plot --------- Co-authored-by: Sam Abbott <s.e.abbott12@gmail.com> Co-authored-by: GitHub Action <action@github.com>

seabbs changed the base branch from main to develop February 8, 2023 10:38

sbfnk marked this pull request as draft February 8, 2023 19:38

sbfnk marked this pull request as ready for review February 9, 2023 14:09

This was referenced Feb 17, 2023

Fit to multiple time series and fold in secondary model #313

Closed

deal with zeroes in generation times #370

Closed