
Issue 328: option to remove within-chain parallelisation #366

Merged
merged 20 commits into epinowcast:main on Nov 20, 2023

Conversation


@sbfnk sbfnk commented Nov 15, 2023

Description

This PR closes #328.

This removes within-chain parallelisation by calling the likelihood functions directly. Awaiting touchstone benchmarking results here (with multi-threading disabled); I will also test locally with multi-threading enabled. If there are performance improvements at any stage I will add an option for the user to disable within-chain parallelisation.

Update: this enables compilation with multi-threading by default (as this did not seem to affect performance negatively) and adds an explicit threads_per_chain option that can be used to enable within-chain parallelisation (if set to a value greater than 1). If it is set to 1 (the default) then the likelihood is calculated directly, without using reduce_sum.
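For illustration, this is roughly how the option would be used from R (a sketch only: `enw_fit_opts()` is the package's fitting-options helper, the `threads_per_chain` argument is as described above, and `pobs` is a placeholder for preprocessed observations):

```r
library(epinowcast)

# `pobs` stands in for data prepared with enw_preprocess_data().
# Default (threads_per_chain = 1): the likelihood is evaluated directly,
# without reduce_sum, i.e. no within-chain parallelisation.
nowcast <- epinowcast(pobs, fit = enw_fit_opts(parallel_chains = 2))

# Opt in to within-chain parallelisation by setting threads_per_chain > 1;
# here 2 chains x 2 threads uses 4 cores in total.
nowcast_mt <- epinowcast(
  pobs,
  fit = enw_fit_opts(parallel_chains = 2, threads_per_chain = 2)
)
```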

Checklist

  • My PR is based on a package issue and I have explicitly linked it.
  • I have included the target issue or issues in the PR title, in the form `Issue(s) issue-numbers: PR title`.
  • I have read the contribution guidelines.
  • I have tested my changes locally.
  • I have added or updated unit tests where necessary.
  • I have updated the documentation if required.
  • My code follows the established coding standards.
  • I have added a news item linked to this PR.
  • I have updated the package development version by one increment in both NEWS.md and the DESCRIPTION.
  • I have reviewed CI checks for this PR and addressed them as far as I am able.


This is how benchmark results would change (along with a 95% confidence interval in relative change) if d4a65aa is merged into main:

  •   🚀 day_of_week_model: 26.3s -> 20s [-28.51%, -18.91%]
  •   🚀 latent_renewal_model: 30.6s -> 23.7s [-29.46%, -15.41%]
  •   🚀 missingness_model: 1.35m -> 1.31m [-5.04%, -0.8%]
  •   ☑️ multi_group_latent_renewal_model: 7.32s -> 6.5s [-26.32%, +4.11%]
  •   ☑️ preprocessing: 706ms -> 714ms [-2.11%, +4.31%]
  •   🚀 simple_model: 5.85s -> 3.88s [-48.4%, -18.96%]
  •   ☑️ simple_negbin_model_with_pp: 5.58s -> 4.14s [-56.71%, +5.12%]

These benchmarks are based on package examples which are available here. Further explanation regarding interpretation and methodology can be found in the documentation of touchstone.


sbfnk commented Nov 15, 2023

Running everything locally (4 cores) with threads = TRUE and 2 chains:

threads_per_chain = 2, parallel_chains = 2:

This is how benchmark results would change (along with a 95% confidence interval in relative change) if 1e8ad04 is merged into main:

  •   ☑️ day_of_week_model: 33.6s -> 29.4s [-35.13%, +10.08%]
  •   ☑️ latent_renewal_model: 36.3s -> 45.3s [-3.59%, +53.51%]
  •   ☑️ missingness_model: 2.23m -> 2.37m [-34.56%, +46.87%]
  •   ☑️ multi_group_latent_renewal_model: 13.2s -> 9.73s [-59.75%, +6.77%]
  •   ☑️ preprocessing: 1.13s -> 902ms [-93.86%, +53.02%]
  •   ☑️ simple_model: 5.75s -> 4.13s [-103.93%, +47.47%]
  •   ☑️ simple_negbin_model_with_pp: 7.23s -> 6.53s [-49.44%, +30.1%]

threads_per_chain = 4, parallel_chains = 1:

This is how benchmark results would change (along with a 95% confidence interval in relative change) if 1e8ad04 is merged into main:

  •   ☑️ day_of_week_model: 35.8s -> 38.5s [-57.93%, +72.65%]
  •   ☑️ latent_renewal_model: 43.6s -> 49.6s [-40.98%, +68.36%]
  •   ☑️ missingness_model: 2.89m -> 2.76m [-72.64%, +63.87%]
  •   ☑️ multi_group_latent_renewal_model: 13.8s -> 14.4s [-91.09%, +99.42%]
  •   ☑️ preprocessing: 1.09s -> 921ms [-64.25%, +33.29%]
  •   ☑️ simple_model: 5.75s -> 4.34s [-72.54%, +23.62%]
  •   ☑️ simple_negbin_model_with_pp: 7.5s -> 6.79s [-36.25%, +17.57%]

I would conclude that there is a case for an option to remove within-chain parallelisation (perhaps even as a default?), unless I'm missing something on the kind of scenario where we're expecting to see improvement.


seabbs commented Nov 15, 2023

> I would conclude that there is a case for an option to remove within-chain parallelisation (perhaps even as a default?), unless I'm missing something on the kind of scenario where we're expecting to see improvement.

Yes, I agree with both. I think we should expect to see improvement for long-running models (such as those in the Germany example vignette) where setup costs are a smaller proportion of the total run time (unless reduce_sum has some kind of resource requirement that also scales with the number of groups?).

The other question is whether compiling the model with threads = TRUE and then not using that functionality has any performance impact. If it doesn't, we could always compile this way, and then the user wouldn't need to recompile to turn on multi-threading but could just change a few options.

I think the slightly more complex question is what gives you more effective samples per second if you have a fixed CPU budget (due to the warmup) and are happy to run only a few chains (i.e. 2) but have many cores.
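As a rough way to compare configurations on that metric, one could compute bulk effective samples per CPU-second from a fitted model. The helper below is hypothetical (not part of the package) and assumes a cmdstanr fit object and the posterior package:

```r
# Hypothetical helper: worst-case bulk ESS per CPU-second, for comparing
# e.g. 2 chains x 2 threads against 4 chains x 1 thread on 4 cores.
ess_per_cpu_second <- function(fit, cores_used) {
  summ <- posterior::summarise_draws(fit$draws(), posterior::ess_bulk)
  # Worst-mixing parameter divided by approximate total CPU time consumed
  # (wall time of the slowest chain times the number of cores in use).
  min(summ$ess_bulk, na.rm = TRUE) /
    (max(fit$time()$chains$total) * cores_used)
}
```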


sbfnk commented Nov 15, 2023

> The other question is whether compiling the model with threads = TRUE and then not using that functionality has any performance impact. If it doesn't, we could always compile this way, and then the user wouldn't need to recompile to turn on multi-threading but could just change a few options.

On that note, is there a particular reason that the models are all compiled with threads = FALSE for the touchstone benchmarks?

> I think the slightly more complex question is what gives you more effective samples per second if you have a fixed CPU budget (due to the warmup) and are happy to run only a few chains (i.e. 2) but have many cores.

Yes, and then there's also potentially https://mc-stan.org/cmdstanr/articles/opencl.html


This is how benchmark results would change (along with a 95% confidence interval in relative change) if ef6c955 is merged into main:

  •   🚀 day_of_week_model: 26.4s -> 19.5s [-30.05%, -22.1%]
  •   🚀 latent_renewal_model: 29.6s -> 25.6s [-23.73%, -3.56%]
  •   ☑️ missingness_model: 1.3m -> 1.31m [-2.17%, +4.59%]
  •   🚀 multi_group_latent_renewal_model: 7.57s -> 6.14s [-24.39%, -13.36%]
  •   ☑️ preprocessing: 676ms -> 681ms [-0.65%, +2.04%]
  •   ☑️ simple_model: 6.51s -> 5.98s [-39.58%, +23.04%]
  •   ☑️ simple_negbin_model_with_pp: 5.49s -> 4.08s [-58.24%, +6.99%]

These benchmarks are based on package examples which are available here. Further explanation regarding interpretation and methodology can be found in the documentation of touchstone.


codecov bot commented Nov 17, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison: base (60495c7) 96.85% vs head (438ab0b) 96.85%.
The report is 1 commit behind head on main.

❗ The current head 438ab0b differs from the pull request's most recent head 4ef9b86. Consider uploading reports for commit 4ef9b86 to get more accurate results.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #366   +/-   ##
=======================================
  Coverage   96.85%   96.85%           
=======================================
  Files          15       15           
  Lines        1875     1876    +1     
=======================================
+ Hits         1816     1817    +1     
  Misses         59       59           


@sbfnk sbfnk marked this pull request as ready for review November 17, 2023 12:12
@sbfnk sbfnk requested a review from seabbs November 17, 2023 12:12

This is how benchmark results would change (along with a 95% confidence interval in relative change) if ef6c955 is merged into main:

  •   🚀 day_of_week_model: 26s -> 19.9s [-27.29%, -19.71%]
  •   🚀 latent_renewal_model: 28.9s -> 23s [-33.68%, -7.08%]
  •   ☑️ missingness_model: 1.32m -> 1.32m [-4.87%, +5.34%]
  •   🚀 multi_group_latent_renewal_model: 7.72s -> 6.13s [-30.43%, -10.71%]
  •   ☑️ preprocessing: 673ms -> 674ms [-1.18%, +1.36%]
  •   ☑️ simple_model: 5.28s -> 4.51s [-37.24%, +8.11%]
  •   ☑️ simple_negbin_model_with_pp: 4.94s -> 4.09s [-37.7%, +3.5%]

These benchmarks are based on package examples which are available here. Further explanation regarding interpretation and methodology can be found in the documentation of touchstone.


@seabbs seabbs left a comment


Nice, thanks @sbfnk, this looks great. I think we can simplify how we approach the if/else setup in the Stan code, and perhaps communicate more clearly exactly what has happened in this PR.

Review comments (since resolved) on: R/model-modules.R, README.Rmd, inst/stan/epinowcast.stan, NEWS.md, tests/testthat/test-epinowcast.R
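The if/else setup under discussion can be sketched as follows (illustrative only, not the exact epinowcast Stan code; `threads`, `obs`, `other_data`, and the two likelihood functions are stand-ins):

```stan
model {
  if (threads > 1) {
    // Within-chain parallelisation: slice the observations across
    // threads. A grainsize of 1 leaves the split to the scheduler;
    // tuning it manually is the untested follow-up mentioned later
    // in this thread.
    target += reduce_sum(partial_log_lik, obs, 1, other_data);
  } else {
    // Direct evaluation, avoiding reduce_sum's overhead.
    target += full_log_lik(obs, other_data);
  }
}
```

Branching on a data flag like this means a single model compiled with threading support can serve both modes without recompilation.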

This is how benchmark results would change (along with a 95% confidence interval in relative change) if 60495c7 is merged into main:

  •   🚀 day_of_week_model: 26.2s -> 19.7s [-33.19%, -16.6%]
  •   🚀 latent_renewal_model: 29.7s -> 24.4s [-34.1%, -1.84%]
  •   ☑️ missingness_model: 1.33m -> 1.31m [-5.31%, +2.91%]
  •   🚀 multi_group_latent_renewal_model: 7.85s -> 5.8s [-42.26%, -10.08%]
  •   ☑️ preprocessing: 691ms -> 690ms [-1.98%, +1.7%]
  •   🚀 simple_model: 5.83s -> 4.1s [-54.24%, -4.89%]
  •   ☑️ simple_negbin_model_with_pp: 6.59s -> 4.08s [-87.73%, +11.65%]

These benchmarks are based on package examples which are available here. Further explanation regarding interpretation and methodology can be found in the documentation of touchstone.


This is how benchmark results would change (along with a 95% confidence interval in relative change) if d5798f5 is merged into main:

  •   🚀 day_of_week_model: 26.1s -> 20.3s [-26.85%, -17.42%]
  •   🚀 latent_renewal_model: 28.4s -> 22.7s [-27.35%, -12.65%]
  •   ☑️ missingness_model: 1.43m -> 1.3m [-30.2%, +12.16%]
  •   🚀 multi_group_latent_renewal_model: 7.66s -> 5.98s [-28.03%, -15.95%]
  •   ☑️ preprocessing: 673ms -> 674ms [-3.04%, +3.32%]
  •   ☑️ simple_model: 5.42s -> 5.04s [-23.08%, +9.1%]
  •   ☑️ simple_negbin_model_with_pp: 5.44s -> 4.37s [-61.43%, +22.17%]

These benchmarks are based on package examples which are available here. Further explanation regarding interpretation and methodology can be found in the documentation of touchstone.


sbfnk commented Nov 20, 2023

One thing we haven't done is check whether manually tuning the grain size makes any difference. But that's perhaps for another time.

@sbfnk sbfnk requested a review from seabbs November 20, 2023 11:13

@seabbs seabbs left a comment


LGTM. We can ignore the linting issues as they are unrelated to this PR and are resolved in #382. Thanks again for this @sbfnk!

@seabbs seabbs added this pull request to the merge queue Nov 20, 2023
Merged via the queue into epinowcast:main with commit fe3529a Nov 20, 2023
9 of 10 checks passed
Development

Successfully merging this pull request may close these issues.

Provide an option that does not use reduce_sum