Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

distribution interface to dist_spec #504

Merged
merged 136 commits into from Mar 12, 2024
Merged

distribution interface to dist_spec #504

merged 136 commits into from Mar 12, 2024

Conversation

sbfnk
Copy link
Contributor

@sbfnk sbfnk commented Nov 10, 2023

The aim of this PR is to introduce a new distribution interface to replace the current dist_spec interface. This should make user interaction more intuitive and be more easily extensible if developers want to add more distributions (Weibull seems an obvious candidate). Eventually (following a deprecation cycle as dist_spec became part of a release necessitated by updates to rstan) dist_spec would become an internal function and the user would only interact with it via the distribution functions.

Examples of the new interface:

library("EpiNow2")
#> 
#> Attaching package: 'EpiNow2'
#> The following object is masked from 'package:stats':
#> 
#>     Gamma
Gamma(mean = 3, sd = 1)
#> - gamma distribution:
#>   shape:
#>     9
#>   rate:
#>     3
Gamma(mean = 3, sd = 1, max = 14)
#> - gamma distribution (max: 14):
#>   shape:
#>     9
#>   rate:
#>     3
Gamma(mean = Normal(mean = 3, sd = 1), sd = 4, max = 14)
#> Warning in new_dist_spec(params, "gamma"): Uncertain gamma distribution
#> specified in terms of parameters that are not the "natural" parameters of the
#> distribution (shape, rate). Converting using a crude and very approximate
#> method that is likely to produce biased results. If possible, it is preferable
#> to specify the distribution directly in terms of the natural parameters.
#> - gamma distribution (max: 14):
#>   shape:
#>     - normal distribution:
#>       mean:
#>         0.56
#>       sd:
#>         0.31
#>   rate:
#>     - normal distribution:
#>       mean:
#>         0.19
#>       sd:
#>         0.18
Gamma(shape = Normal(mean = 3, sd = 1), rate = Normal(mean = 2, sd = 0.5), max = 14)
#> - gamma distribution (max: 14):
#>   shape:
#>     - normal distribution:
#>       mean:
#>         3
#>       sd:
#>         1
#>   rate:
#>     - normal distribution:
#>       mean:
#>         2
#>       sd:
#>         0.5
Gamma(shape = 2, rate = 1, max = 14)
#> - gamma distribution (max: 14):
#>   shape:
#>     2
#>   rate:
#>     1
LogNormal(meanlog = 1.5, sdlog = 1)
#> - lognormal distribution:
#>   meanlog:
#>     1.5
#>   sdlog:
#>     1
LogNormal(mean = Normal(1.5, 0.1), sd = Normal(1, 0.1))
#> Warning in new_dist_spec(params, "lognormal"): Uncertain lognormal distribution
#> specified in terms of parameters that are not the "natural" parameters of the
#> distribution (meanlog, sdlog). Converting using a crude and very approximate
#> method that is likely to produce biased results. If possible, it is preferable
#> to specify the distribution directly in terms of the natural parameters.
#> - lognormal distribution:
#>   meanlog:
#>     - normal distribution:
#>       mean:
#>         0.22
#>       sd:
#>         0.043
#>   sdlog:
#>     - normal distribution:
#>       mean:
#>         0.61
#>       sd:
#>         0.071
pmf(c(0.2, 0.4, 0.4))
#> - nonparametric distribution
#>   PMF: [0.2 0.4 0.4]
pmf(c(0.2, 0.4, 0.2, 0.05))
#> - nonparametric distribution
#>   PMF: [0.24 0.47 0.24 0.059]

Created on 2024-01-11 with reprex v2.0.2

Somewhat sad to be masking base::gamma and stats::sd but perhaps unavoidable - or of course could name them differently.

To do following initial review:

  • finalise changes in the stan models to support delay distributions with arbitrary numbers of parameters
  • document all new functions
  • add appropriate input and other checks (e.g. everything passed to the stan model needs to have a max)
  • add warnings when using non-natural uncertain parameterisations
  • update tests
  • update all uses of dist_spec in examples / vignettes
  • update adjust_infection_to_report to use this and deprecate gamma_dist_def and lnorm_dist_def
  • write a brief guide for developers on how to add other distributions
  • add news item

Tests etc are expected to fail at initial state.

Closes #382
Closes #498

@sbfnk

This comment was marked as outdated.

@sbfnk

This comment was marked as outdated.

This comment was marked as outdated.

@sbfnk

This comment was marked as outdated.

@sbfnk sbfnk marked this pull request as ready for review November 27, 2023 15:22

This comment was marked as outdated.

seabbs

This comment was marked as outdated.

@sbfnk

This comment was marked as outdated.

@sbfnk

This comment was marked as outdated.

@seabbs

This comment was marked as outdated.

@sbfnk sbfnk force-pushed the dist-interface branch 2 times, most recently from 053f3e4 to 2135b4b Compare January 11, 2024 16:51
@sbfnk

This comment was marked as outdated.

@sbfnk

This comment was marked as outdated.

@sbfnk sbfnk requested a review from seabbs January 12, 2024 12:46
@sbfnk

This comment was marked as outdated.

This comment was marked as outdated.

@sbfnk

This comment was marked as outdated.

This comment was marked as outdated.

@sbfnk

This comment was marked as outdated.

@seabbs

This comment was marked as outdated.

@sbfnk

This comment was marked as outdated.

This comment was marked as outdated.

@sbfnk

This comment was marked as outdated.

@sbfnk

This comment was marked as outdated.

This comment was marked as outdated.

inst/dev/design_dist.md Outdated Show resolved Hide resolved
R/checks.R Outdated Show resolved Hide resolved
R/create.R Show resolved Hide resolved
NEWS.md Outdated Show resolved Hide resolved
@seabbs seabbs enabled auto-merge (squash) March 12, 2024 10:52
@seabbs seabbs self-requested a review March 12, 2024 10:56
seabbs
seabbs previously approved these changes Mar 12, 2024
Copy link
Contributor

@seabbs seabbs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is good to be merged in after both another site read and a functionality review. A few minor comments outstanding.

I do think that any porting of this for use elsewhere needs to come with a refactor/redesign as the current implementation seems quite brittle (all the ifelse and magic string based detection of distributions).

I also expect there to be a few edge cases this review has missed that will need fixing once this is on main but we will see (and 🤞🏻) .

Nice work on this - a lot of work but worth it as much nicer to use!

Copy link

This is how benchmark results would change (along with a 95% confidence interval in relative change) if 3a3d8d8 is merged into main:

  • ✔️default: 32.9s -> 32.4s [-21.11%, +18.35%]
  • ✔️no_delays: 33.5s -> 1.92m [-280.87%, +768.08%]
  • ✔️random_walk: 9.05s -> 10.2s [-0.02%, +24.76%]
  • ✔️stationary: 19.4s -> 18.7s [-19.75%, +11.86%]
  • ✔️uncertain: 57.5s -> 59.6s [-31.31%, +38.81%]
    Further explanation regarding interpretation and methodology can be found in the documentation.

Co-authored-by: Sam Abbott <s.e.abbott12@gmail.com>
@seabbs seabbs merged commit 19b5707 into main Mar 12, 2024
11 checks passed
@seabbs seabbs deleted the dist-interface branch March 12, 2024 13:54
Copy link

This is how benchmark results would change (along with a 95% confidence interval in relative change) if 3a3d8d8 is merged into main:

  • ✔️default: 32.8s -> 32.2s [-16.71%, +13.3%]
  • ✔️no_delays: 34.9s -> 41.2s [-8.22%, +44.68%]
  • ✔️random_walk: 9.41s -> 9.35s [-15.52%, +14.39%]
  • ✔️stationary: 19.9s -> 20.3s [-24.85%, +28.5%]
  • ✔️uncertain: 55.5s -> 53.6s [-20.51%, +13.72%]
    Further explanation regarding interpretation and methodology can be found in the documentation.

@sbfnk sbfnk mentioned this pull request Mar 26, 2024
7 tasks
sbfnk added a commit that referenced this pull request May 3, 2024
* add distribution functions

* deprecate "empty" distribution

* make sd S3

* only generate samples if any params aren't natural

* update stan model with new dist interface

* update lognormal parameters

* return mean function to previous functionality

* update data

* deprecate dist_def functions

* use natural parametrisations in dist_def functions

* deprecate dist_spec

* extract_single_dist function

* update fix_dist to work with compsosite dists

* extract squash

* update parameters to extract

* specify lower bounds in function

* pass lower bounds to stan model

* update sample/report functions

* max squash adjust report

* update dist functions to new syntax

* re-create data

* update get_dist to new syntax

* fully deprecate get fnuctions

* create delay inits separately

* max squash again

* return correct dist in estimate_truncation

* few more examples/docs

* fix tests

* add documentation to dist interface

* add input checks

* sd function to work with composite dists

* warn when not using natural parameters

* ensure bounds are respected in stan

* add empty distribution for legacy reasons

* add checks to dist_skel

* use lapply for parameters

* don't calculate sd if length 1

* use uncertain reporting in example

* don't add one to sd

* return correct parameters

* dist_skel: calculate rate everywhere

* update dist_skel examples

* add missing man file

* don't run internal examples

* demote warning to message

* update syntax everywhere

* add news item

* turn sd into an internal function

* fix distribution documentation

* remove obselete default

* spell checking

* use correct sd function

* linting

* remove obsolete tests

* loop over all parameters

* update touchstone arguments

* linting

* fix regex search/replace gone wrong

* remove obsolete space

* update strategy for estimating uncertainty

* update uncertain parameter transformations

* add missing sd to parameter sampling

* update / recompile vignettes

* update var names

* rename argument in docs

* update man pages

* update test result

* add reviewer

Co-authored-by: Sam Abbott <s.e.abbott12@gmail.com>

* base scaling on variance, not sd

* re-render vignettes

* full text capitalisation of distributions

* separate dist_spec from stan model

* adjust tests/code for new dist_spec set up

* re-create examples

* re-doc

* update tests

* new dist_spec in estimate_truncation example

* update get_seeding_time with updated dist_spec

* estimate_truncation and seeding time tests

* update truncation dist in estimate_truncation

* remove more uses of old dist_spec

* SD explicitly to zero for fixed

* give names

* fix typo

* fix indent

* fix another typo

* squash bugs highlighted by tests

* remove missing variable

* linting

* add missing docs

* import transpose

* ensure sd is positive

* fix estimate_truncation example

* make tolerance user-settable

* use purrr::map instead of lapply

* fix stan dist test

* fix plotting

* Apply suggestions from code review

Co-authored-by: Sam Abbott <s.e.abbott12@gmail.com>

* rate and scale examples for Gamma

Co-authored-by: Sam Abbott <s.e.abbott12@gmail.com>

* capitalise gamma and lognormal

Co-authored-by: Sam Abbott <s.e.abbott12@gmail.com>

* change to single hash

* use bar in normal_cdf

* remove estraneous backticks

* remove space before left parenthesis

* split up dist.R

* move deprecated `dist_spec` function

* add examples

* initial design sketch

* make parameter conversion more flexible

* add test for alternative gama params

* update syntax in simulate_infections

* add missing tag

* update man pages

* update estimate_secondary tests

* update simulate_infections for new interface

* udpate snapshots

* get_dist deprecation test with natural params

* update phi syntax

* hide internal example

* update deprecations

* use toString

* pmf -> NonParametric

* add american spelling

* fix gamma deprecation

* add new functions to pkgdown

* update vignette

* recompile vignettes

---------

Co-authored-by: Sam Abbott <s.e.abbott12@gmail.com>
sbfnk added a commit that referenced this pull request May 3, 2024
* add distribution functions

* deprecate "empty" distribution

* make sd S3

* only generate samples if any params aren't natural

* update stan model with new dist interface

* update lognormal parameters

* return mean function to previous functionality

* update data

* deprecate dist_def functions

* use natural parametrisations in dist_def functions

* deprecate dist_spec

* extract_single_dist function

* update fix_dist to work with compsosite dists

* extract squash

* update parameters to extract

* specify lower bounds in function

* pass lower bounds to stan model

* update sample/report functions

* max squash adjust report

* update dist functions to new syntax

* re-create data

* update get_dist to new syntax

* fully deprecate get fnuctions

* create delay inits separately

* max squash again

* return correct dist in estimate_truncation

* few more examples/docs

* fix tests

* add documentation to dist interface

* add input checks

* sd function to work with composite dists

* warn when not using natural parameters

* ensure bounds are respected in stan

* add empty distribution for legacy reasons

* add checks to dist_skel

* use lapply for parameters

* don't calculate sd if length 1

* use uncertain reporting in example

* don't add one to sd

* return correct parameters

* dist_skel: calculate rate everywhere

* update dist_skel examples

* add missing man file

* don't run internal examples

* demote warning to message

* update syntax everywhere

* add news item

* turn sd into an internal function

* fix distribution documentation

* remove obselete default

* spell checking

* use correct sd function

* linting

* remove obsolete tests

* loop over all parameters

* update touchstone arguments

* linting

* fix regex search/replace gone wrong

* remove obsolete space

* update strategy for estimating uncertainty

* update uncertain parameter transformations

* add missing sd to parameter sampling

* update / recompile vignettes

* update var names

* rename argument in docs

* update man pages

* update test result

* add reviewer

Co-authored-by: Sam Abbott <s.e.abbott12@gmail.com>

* base scaling on variance, not sd

* re-render vignettes

* full text capitalisation of distributions

* separate dist_spec from stan model

* adjust tests/code for new dist_spec set up

* re-create examples

* re-doc

* update tests

* new dist_spec in estimate_truncation example

* update get_seeding_time with updated dist_spec

* estimate_truncation and seeding time tests

* update truncation dist in estimate_truncation

* remove more uses of old dist_spec

* SD explicitly to zero for fixed

* give names

* fix typo

* fix indent

* fix another typo

* squash bugs highlighted by tests

* remove missing variable

* linting

* add missing docs

* import transpose

* ensure sd is positive

* fix estimate_truncation example

* make tolerance user-settable

* use purrr::map instead of lapply

* fix stan dist test

* fix plotting

* Apply suggestions from code review

Co-authored-by: Sam Abbott <s.e.abbott12@gmail.com>

* rate and scale examples for Gamma

Co-authored-by: Sam Abbott <s.e.abbott12@gmail.com>

* capitalise gamma and lognormal

Co-authored-by: Sam Abbott <s.e.abbott12@gmail.com>

* change to single hash

* use bar in normal_cdf

* remove estraneous backticks

* remove space before left parenthesis

* split up dist.R

* move deprecated `dist_spec` function

* add examples

* initial design sketch

* make parameter conversion more flexible

* add test for alternative gama params

* update syntax in simulate_infections

* add missing tag

* update man pages

* update estimate_secondary tests

* update simulate_infections for new interface

* udpate snapshots

* get_dist deprecation test with natural params

* update phi syntax

* hide internal example

* update deprecations

* use toString

* pmf -> NonParametric

* add american spelling

* fix gamma deprecation

* add new functions to pkgdown

* update vignette

* recompile vignettes

---------

Co-authored-by: Sam Abbott <s.e.abbott12@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

gamma_dist_def: scale is inverse scale? Reconsider distribution interface.
3 participants