Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implement dist_spec interface #363

Merged
merged 106 commits into from
Jun 14, 2023
Merged

implement dist_spec interface #363

merged 106 commits into from
Jun 14, 2023

Conversation

sbfnk
Copy link
Contributor

@sbfnk sbfnk commented Feb 8, 2023

Updates the distribution interface to use the dist_spec S3 class. In the process also implements nonparametric distribution specifications (indicated _np vs. _p for parametric) and multiple generation times (e.g. for vector-bourne diseases). All fixed delay distributions are now calculated in R using the existing functionality in dist_skel.

Open questions:

  • is dist_spec a good title (maybe clearer as delay_dist or perhaps just dist)?
  • should mean, and sd for the lognormal be interpreted as the natural means/sd? I think this would be less confusing, and we could add logmean and logsd or mu and sigma as additional ways of specifying this in lognormal_dist_def, just as it is in gamma_dist_def.

This fixes #356 and #357. It should also pave the way to implement #313 in the future (as delay distributions can be named and could then be referenced by different observation specifications).

@seabbs seabbs changed the base branch from main to develop February 8, 2023 10:38
@sbfnk
Copy link
Contributor Author

sbfnk commented Feb 8, 2023

Remaining error is the one that would be fixed by #366 but this didn't occur before so it may point to an inadverted change to the model.

@sbfnk sbfnk marked this pull request as draft February 8, 2023 19:38
@seabbs
Copy link
Contributor

seabbs commented Feb 8, 2023

Now all the explained things are resolved I'll have a proper look through + see if I can identify what is leading to the change in behaviour.

@sbfnk sbfnk marked this pull request as ready for review February 9, 2023 14:09
@sbfnk
Copy link
Contributor Author

sbfnk commented Feb 9, 2023

Now all the explained things are resolved I'll have a proper look through + see if I can identify what is leading to the change in behaviour.

Should be all fixed now.

@github-actions
Copy link
Contributor

This is how benchmark results would change (along with a 95% confidence interval in relative change) if 0043b32 is merged into develop:

  •   :ballot_box_with_check:default: 48.6s -> 51.4s [NANaN%, NANaN%]
  •   :ballot_box_with_check:no_delays: 30.9s -> 51s [NANaN%, NANaN%]
  •   :ballot_box_with_check:random_walk: 13.4s -> 14.2s [NANaN%, NANaN%]
  •   :ballot_box_with_check:stationary: 29.4s -> 31.9s [NANaN%, NANaN%]
  •   :ballot_box_with_check:uncertain: 51s -> 48.5s [NANaN%, NANaN%]
    Further explanation regarding interpretation and methodology can be found in the documentation.

@github-actions
Copy link
Contributor

This is how benchmark results would change (along with a 95% confidence interval in relative change) if f6a0334 is merged into develop:

  •   :ballot_box_with_check:default: 50.9s -> 58.5s [-0.09%, +30.03%]
  • ❗🐌no_delays: 39.1s -> 54.5s [+24.63%, +53.86%]
  • ❗🐌random_walk: 12s -> 16.1s [+20.95%, +47.79%]
  •   :ballot_box_with_check:stationary: 31.6s -> 32.7s [-6.39%, +13.3%]
  • ❗🐌uncertain: 45.3s -> 54.1s [+1.28%, +37.71%]
    Further explanation regarding interpretation and methodology can be found in the documentation.

@sbfnk
Copy link
Contributor Author

sbfnk commented Feb 13, 2023

Looks like performance is worsening with this - will need to investigate.

@github-actions
Copy link
Contributor

This is how benchmark results would change (along with a 95% confidence interval in relative change) if f6a0334 is merged into develop:

  • ❗🐌default: 46.6s -> 54.4s [+6.94%, +26.69%]
  •   :ballot_box_with_check:no_delays: 39.1s -> 46.5s [-5.07%, +42.5%]
  • ❗🐌random_walk: 13.7s -> 16.7s [+2.2%, +42.16%]
  •   :ballot_box_with_check:stationary: 29.9s -> 32.9s [-2.43%, +22.41%]
  • ❗🐌uncertain: 48.4s -> 59.4s [+14.31%, +31.07%]
    Further explanation regarding interpretation and methodology can be found in the documentation.

@github-actions
Copy link
Contributor

This is how benchmark results would change (along with a 95% confidence interval in relative change) if f6a0334 is merged into develop:

  •   :ballot_box_with_check:default: 50.4s -> 53.6s [-10.81%, +23.67%]
  • ❗🐌no_delays: 38.4s -> 54.1s [+28.3%, +53.05%]
  • ❗🐌random_walk: 12.4s -> 15.4s [+16.17%, +33.24%]
  •   :ballot_box_with_check:stationary: 30.8s -> 31.9s [-11.19%, +18.26%]
  • ❗🐌uncertain: 46.4s -> 54.5s [+6.11%, +29.03%]
    Further explanation regarding interpretation and methodology can be found in the documentation.

@github-actions
Copy link
Contributor

This is how benchmark results would change (along with a 95% confidence interval in relative change) if f6a0334 is merged into develop:

  •   :ballot_box_with_check:default: 49.9s -> 54.7s [-7.09%, +26.28%]
  • ❗🐌no_delays: 37.4s -> 57.6s [+31.99%, +76.11%]
  • ❗🐌random_walk: 11.9s -> 15.6s [+13.92%, +48.79%]
  •   :ballot_box_with_check:stationary: 30.6s -> 32.3s [-4.48%, +15.97%]
  •   :ballot_box_with_check:uncertain: 48.8s -> 55.1s [-3.14%, +29.27%]
    Further explanation regarding interpretation and methodology can be found in the documentation.

@seabbs
Copy link
Contributor

seabbs commented Feb 20, 2023

is dist_spec a good title (maybe clearer as delay_dist or perhaps just dist)?

I like dist and I can't think of a reason that using it would be a problem.

should mean, and sd for the lognormal be interpreted as the natural means/sd? I think this would be less confusing, and we could add logmean and logsd or mu and sigma as additional ways of specifying this in lognormal_dist_def, just as it is in gamma_dist_def.

Yes I think this is fine to add but it needs to be very clearly communicated that this change has happened or it will break lots of code. As I have said many times I think the default should be to use the actual parameters of the distribution as the default as this will limit the information loss in a standard modelling pipeline. This should also be the case for the gamma distribution as well really and perhaps we can make that transition in this release as well.

pdates the distribution interface to use the dist_spec S3 class. In the process also implements nonparametric distribution specifications (indicated _np vs. _p for parametric) and multiple generation times (e.g. for vector-bourne diseases). All fixed delay distributions are now calculated in R using the existing functionality in dist_skel.

This is all nice. I need to go through the code in detail and kick some tires (so far I have just sight read) but in principle I am a big fan of the proposed changes. As I mention in the comments below we could go further in unifying distributions across delays, generation times, and truncations etc and if we did this now it could save us pain in the future. I also really think we want to be combining fixed PMFs in R vs in stan now we have code to generate them as this will reduce model complexity and make it easier to test everything is working as expected (is this already happening? I think not but may have missed). epinowcast has a function we can pillage (https://github.com/epinowcast/epinowcast/blob/e101de003b2ad611245bb73c0697ecbb329398c4/R/model-module-helpers.R#L173) to do this (which was nicely refactored by @pearsonca and @pratikunterwegs who we should remember to credit as we steal their work)). Note the obvious that we could import this and use it directly but given the evolving nature of epinowcast and the simplicity of this function it doesn't seem like a good idea. I do think a light weight 📦 for this kind of job (and the convolution matrix etc) would be nice (so maybe it will be a epiverse project in a month or so 😉 ).

R/create.R Outdated Show resolved Hide resolved
R/dist.R Outdated Show resolved Hide resolved
R/dist.R Outdated Show resolved Hide resolved
R/dist.R Show resolved Hide resolved
R/dist.R Show resolved Hide resolved
R/opts.R Outdated Show resolved Hide resolved
R/opts.R Outdated Show resolved Hide resolved
R/opts.R Outdated Show resolved Hide resolved
@seabbs
Copy link
Contributor

seabbs commented Jun 12, 2023

@sbfnk waiting on you now I think. I have had to crank adapt delta on some of the unit tests which isn't a great sign in terms of numerical stability. Did you see that in your testing?

R/create.R Show resolved Hide resolved
R/estimate_infections.R Outdated Show resolved Hide resolved
R/get.R Show resolved Hide resolved
R/opts.R Outdated Show resolved Hide resolved
R/dist.R Outdated Show resolved Hide resolved
R/dist.R Outdated Show resolved Hide resolved
@sbfnk
Copy link
Contributor Author

sbfnk commented Jun 13, 2023

@sbfnk waiting on you now I think. I have had to crank adapt delta on some of the unit tests which isn't a great sign in terms of numerical stability. Did you see that in your testing?

Not really - does this cause an error or just a warning somewhere? I've looked through this again and it looks good to me, apart of course from the performance hit.

@sbfnk
Copy link
Contributor Author

sbfnk commented Jun 13, 2023

I'm also very unhappy about the synthetic recovery of infections as that is definitely a new issue vs legacy. That indicates that a bug has crept in somewhere to me and that is not ideal.

I agree it's not ideal - though I think could also be down to the specific scenario I've used for testing this. Definitely worth exploring further.

seabbs and others added 6 commits June 13, 2023 08:16
Co-authored-by: Sebastian Funk <sebastian.funk@lshtm.ac.uk>
Co-authored-by: Sebastian Funk <sebastian.funk@lshtm.ac.uk>
Co-authored-by: Sebastian Funk <sebastian.funk@lshtm.ac.uk>
Co-authored-by: Sebastian Funk <sebastian.funk@lshtm.ac.uk>
@seabbs
Copy link
Contributor

seabbs commented Jun 13, 2023

does this cause an error or just a warning somewhere?

The issue I was seeing is failure to recover the mean which appeared to be due to low effective sample size.

Right I think this is ready to merge if we are happy with the potential performance impact (which I am - hoping it gets resolved).

@seabbs
Copy link
Contributor

seabbs commented Jun 14, 2023

LGTM - merging!

@seabbs seabbs merged commit d0bea23 into main Jun 14, 2023
10 of 11 checks passed
@seabbs seabbs deleted the update_dist_interface branch June 14, 2023 08:19
@sbfnk sbfnk mentioned this pull request Jul 3, 2023
sbfnk added a commit that referenced this pull request Jul 17, 2023
sbfnk added a commit that referenced this pull request May 3, 2024
* implement `dist_spec` interface

* Add skipping of stan tests in the expected places

* fix checks

* fix epinow example

* update syntax in more places

* update another example

* move parenthesis to the right place

* update uncertainty in estimate_infections example

* fix use of generation_time_opts as resource to call get_generation_time()

* break line to make R CMD CHECK happy

* fix uses of `trunc_opts`

* fix updating of `cur_len` in ragged convolution

* remove bounds on mean parameters

* use dist_spec syntax in `estimate_delays`

* add print function for `dist_spec`

* add names to printing if given

* clarify printouts

* reduce unnecessary function calls

* Revert "reduce unnecessary function calls"

This reverts commit 4d7e5666ddf054652de5c18a060aabf161be0214.

* fix typo

* add default option for generation time

* simplify delay inits

* update pmf doc

* dist -> distribution

* fix max of np dist logic

* simplify pmf truncation syntax

* fix typos and use `is`

* extract function for stan code conversion

* fix variable name

* fix function name

* do truncnorm with appropriate lengths

* fix initial condition sampling

* update `to_stan` documentation

* fix typo

* stan model with unified delays

* update R access to unified dist interface

* update tests

* ensure arrays are arrays

* simplify stan seq (and avoid conflict with R)

* fix test

* fix simulation models

* fix final tests

* update usage of c -> +

* Automatic readme update

* update examples/doc and re-doc

* linting

* update docs

* final requested lint

* update return type of bootstrapped_dist_fit

* redoc

* update estimate_delay to reflect changes

* dot product for all convolutions

* report gt mean and var

* bug fix in calculation of max delays

* Automatic readme update

* update tests

* clean whitespace

* reduce number of calculations by precomputing len

* optional head/tail

* Revert "optional head/tail"

This reverts commit 8c59db1.

* don't convolve first pmf

* reduce vector copying

* fix reversing

* fix printing of combined distributions

* add exampples, export, and add basic dist plotting

* Automatic readme update

* add some tests for dist_spec

* add tests for +.dist_spec

* add tests for mean.dist_spec

* add some basic additional tests and docs

* linting

* fix linting

* export c

* fix plotting to work with c() method for dist_spec

* more linting fixes

* remove extract line in generation_time.stan

* add a check in convolve_rev_pmf when len >= xlen + ylen and update tests

* be more efficient when calc discrete pmfs

* catch missing indexes in omf calc

* clarify comments

* add tolerance for +.dist_spec

* don't load testthat

* trigger benchmarking

* remove benchmark trigger

* linting

* Automatic readme update

* trigger benchmarking

* remove benchmark trigger

* refine tolerance checks for convolution

* fix example

* add an internal function

* trigger benchmarking

* benchmarking off

* add back in missing tolerance docs

* fix edge case check for length 1 pmfs

* whitespace linting

* test more carefully

* use commas like a smart boy

* crank that adapt delta handle

* Update R/create.R

Co-authored-by: Sebastian Funk <sebastian.funk@lshtm.ac.uk>

* Update R/get.R

Co-authored-by: Sebastian Funk <sebastian.funk@lshtm.ac.uk>

* Update R/opts.R

Co-authored-by: Sebastian Funk <sebastian.funk@lshtm.ac.uk>

* Update R/dist.R

Co-authored-by: Sebastian Funk <sebastian.funk@lshtm.ac.uk>

* fixed @internal and brackets + fcase

* don't export c.dist_spec

* drop c() examle from plot

---------

Co-authored-by: Sam Abbott <s.e.abbott12@gmail.com>
Co-authored-by: GitHub Action <action@github.com>
sbfnk added a commit that referenced this pull request May 3, 2024
* implement `dist_spec` interface

* Add skipping of stan tests in the expected places

* fix checks

* fix epinow example

* update syntax in more places

* update another example

* move parenthesis to the right place

* update uncertainty in estimate_infections example

* fix use of generation_time_opts as resource to call get_generation_time()

* break line to make R CMD CHECK happy

* fix uses of `trunc_opts`

* fix updating of `cur_len` in ragged convolution

* remove bounds on mean parameters

* use dist_spec syntax in `estimate_delays`

* add print function for `dist_spec`

* add names to printing if given

* clarify printouts

* reduce unnecessary function calls

* Revert "reduce unnecessary function calls"

This reverts commit 9037e21.

* fix typo

* add default option for generation time

* simplify delay inits

* update pmf doc

* dist -> distribution

* fix max of np dist logic

* simplify pmf truncation syntax

* fix typos and use `is`

* extract function for stan code conversion

* fix variable name

* fix function name

* do truncnorm with appropriate lengths

* fix initial condition sampling

* update `to_stan` documentation

* fix typo

* stan model with unified delays

* update R access to unified dist interface

* update tests

* ensure arrays are arrays

* simplify stan seq (and avoid conflict with R)

* fix test

* fix simulation models

* fix final tests

* update usage of c -> +

* Automatic readme update

* update examples/doc and re-doc

* linting

* update docs

* final requested lint

* update return type of bootstrapped_dist_fit

* redoc

* update estimate_delay to reflect changes

* dot product for all convolutions

* report gt mean and var

* bug fix in calculation of max delays

* Automatic readme update

* update tests

* clean whitespace

* reduce number of calculations by precomputing len

* optional head/tail

* Revert "optional head/tail"

This reverts commit 8c59db1.

* don't convolve first pmf

* reduce vector copying

* fix reversing

* fix printing of combined distributions

* add exampples, export, and add basic dist plotting

* Automatic readme update

* add some tests for dist_spec

* add tests for +.dist_spec

* add tests for mean.dist_spec

* add some basic additional tests and docs

* linting

* fix linting

* export c

* fix plotting to work with c() method for dist_spec

* more linting fixes

* remove extract line in generation_time.stan

* add a check in convolve_rev_pmf when len >= xlen + ylen and update tests

* be more efficient when calc discrete pmfs

* catch missing indexes in omf calc

* clarify comments

* add tolerance for +.dist_spec

* don't load testthat

* trigger benchmarking

* remove benchmark trigger

* linting

* Automatic readme update

* trigger benchmarking

* remove benchmark trigger

* refine tolerance checks for convolution

* fix example

* add an internal function

* trigger benchmarking

* benchmarking off

* add back in missing tolerance docs

* fix edge case check for length 1 pmfs

* whitespace linting

* test more carefully

* use commas like a smart boy

* crank that adapt delta handle

* Update R/create.R

Co-authored-by: Sebastian Funk <sebastian.funk@lshtm.ac.uk>

* Update R/get.R

Co-authored-by: Sebastian Funk <sebastian.funk@lshtm.ac.uk>

* Update R/opts.R

Co-authored-by: Sebastian Funk <sebastian.funk@lshtm.ac.uk>

* Update R/dist.R

Co-authored-by: Sebastian Funk <sebastian.funk@lshtm.ac.uk>

* fixed @internal and brackets + fcase

* don't export c.dist_spec

* drop c() examle from plot

---------

Co-authored-by: Sam Abbott <s.e.abbott12@gmail.com>
Co-authored-by: GitHub Action <action@github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Distribution interface
4 participants