
Issue 328: option to remove within-chain parallelisation #366

Merged
merged 20 commits into epinowcast:main on Nov 20, 2023

Conversation


@sbfnk sbfnk commented Nov 15, 2023

Description

This PR closes #328.

This removes within-chain parallelisation by calling the likelihood functions directly. Awaiting touchstone benchmarking results here (with multi-threading disabled); I will also test locally with multi-threading enabled. If there are performance improvements at any stage I will add an option for the user to disable within-chain parallelisation.

Update: this enables compilation with multi-threading by default (as this did not seem to affect performance negatively) and adds an explicit threads_per_chain option that can be used to enable within-chain parallelisation (if set to a value greater than 1). If it is set to 1 (the default) then the likelihood is calculated directly, without using reduce_sum.
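For illustration, this is roughly how the option would be used from R (a sketch only: `enw_fit_opts()` is the package's fitting-options helper, the `threads_per_chain` argument is as described above, and `pobs` is a placeholder for preprocessed observations):

```r
library(epinowcast)

# `pobs` stands in for data prepared with enw_preprocess_data().
# Default (threads_per_chain = 1): the likelihood is evaluated directly,
# without reduce_sum, i.e. no within-chain parallelisation.
nowcast <- epinowcast(pobs, fit = enw_fit_opts(parallel_chains = 2))

# Opt in to within-chain parallelisation by setting threads_per_chain > 1;
# here 2 chains x 2 threads uses 4 cores in total.
nowcast_mt <- epinowcast(
  pobs,
  fit = enw_fit_opts(parallel_chains = 2, threads_per_chain = 2)
)
```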

Checklist

  • My PR is based on a package issue and I have explicitly linked it.
  • I have included the target issue or issues in the PR title, in the form `Issue(s) issue-numbers: PR title`.
  • I have read the contribution guidelines.
  • I have tested my changes locally.
  • I have added or updated unit tests where necessary.
  • I have updated the documentation if required.
  • My code follows the established coding standards.
  • I have added a news item linked to this PR.
  • I have updated the package development version by one increment in both NEWS.md and the DESCRIPTION.
  • I have reviewed CI checks for this PR and addressed them as far as I am able.


This is how benchmark results would change (along with a 95% confidence interval in relative change) if d4a65aa is merged into main:

  •   🚀 day_of_week_model: 26.3s -> 20s [-28.51%, -18.91%]
  •   🚀 latent_renewal_model: 30.6s -> 23.7s [-29.46%, -15.41%]
  •   🚀 missingness_model: 1.35m -> 1.31m [-5.04%, -0.8%]
  •   ☑️ multi_group_latent_renewal_model: 7.32s -> 6.5s [-26.32%, +4.11%]
  •   ☑️ preprocessing: 706ms -> 714ms [-2.11%, +4.31%]
  •   🚀 simple_model: 5.85s -> 3.88s [-48.4%, -18.96%]
  •   ☑️ simple_negbin_model_with_pp: 5.58s -> 4.14s [-56.71%, +5.12%]

These benchmarks are based on package examples which are available here. Further explanation regarding interpretation and methodology can be found in the documentation of touchstone.


sbfnk commented Nov 15, 2023

Running everything locally (4 cores) with threads = TRUE and 2 chains:

threads_per_chain = 2, parallel_chains = 2:

This is how benchmark results would change (along with a 95% confidence interval in relative change) if 1e8ad04 is merged into main:

  •   ☑️ day_of_week_model: 33.6s -> 29.4s [-35.13%, +10.08%]
  •   ☑️ latent_renewal_model: 36.3s -> 45.3s [-3.59%, +53.51%]
  •   ☑️ missingness_model: 2.23m -> 2.37m [-34.56%, +46.87%]
  •   ☑️ multi_group_latent_renewal_model: 13.2s -> 9.73s [-59.75%, +6.77%]
  •   ☑️ preprocessing: 1.13s -> 902ms [-93.86%, +53.02%]
  •   ☑️ simple_model: 5.75s -> 4.13s [-103.93%, +47.47%]
  •   ☑️ simple_negbin_model_with_pp: 7.23s -> 6.53s [-49.44%, +30.1%]

threads_per_chain = 4, parallel_chains = 1:

This is how benchmark results would change (along with a 95% confidence interval in relative change) if 1e8ad04 is merged into main:

  •   ☑️ day_of_week_model: 35.8s -> 38.5s [-57.93%, +72.65%]
  •   ☑️ latent_renewal_model: 43.6s -> 49.6s [-40.98%, +68.36%]
  •   ☑️ missingness_model: 2.89m -> 2.76m [-72.64%, +63.87%]
  •   ☑️ multi_group_latent_renewal_model: 13.8s -> 14.4s [-91.09%, +99.42%]
  •   ☑️ preprocessing: 1.09s -> 921ms [-64.25%, +33.29%]
  •   ☑️ simple_model: 5.75s -> 4.34s [-72.54%, +23.62%]
  •   ☑️ simple_negbin_model_with_pp: 7.5s -> 6.79s [-36.25%, +17.57%]

I would conclude that there is a case for an option to remove within-chain parallelisation (perhaps even as a default?), unless I'm missing something on the kind of scenario where we're expecting to see improvement.


seabbs commented Nov 15, 2023

> I would conclude that there is a case for an option to remove within-chain parallelisation (perhaps even as a default?), unless I'm missing something on the kind of scenario where we're expecting to see improvement.

Yes, I agree with both. I think we should expect to see improvement for long-running models (such as those in the Germany example vignette) where setup costs are a smaller proportion of the total run time (unless reduce_sum has some kind of resource requirement that also scales with the number of groups?).

The other question is whether compiling the model with threads = TRUE and then not using that functionality has any performance impact. If it doesn't, we could always compile this way, and then the user wouldn't need to recompile to turn on multi-threading but could just change a few options.

I think the slightly more complex question is what gives you more effective samples per second if you have a fixed CPU budget (due to the warmup) and are happy to run only a few chains (i.e. 2) but have many cores.
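As a rough way to compare configurations on that metric, one could compute bulk effective samples per CPU-second from a fitted model. The helper below is hypothetical (not part of the package) and assumes a cmdstanr fit object and the posterior package:

```r
# Hypothetical helper: worst-case bulk ESS per CPU-second, for comparing
# e.g. 2 chains x 2 threads against 4 chains x 1 thread on 4 cores.
ess_per_cpu_second <- function(fit, cores_used) {
  summ <- posterior::summarise_draws(fit$draws(), posterior::ess_bulk)
  # Worst-mixing parameter divided by approximate total CPU time consumed
  # (wall time of the slowest chain times the number of cores in use).
  min(summ$ess_bulk, na.rm = TRUE) /
    (max(fit$time()$chains$total) * cores_used)
}
```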


sbfnk commented Nov 15, 2023

> The other question is whether compiling the model with threads = TRUE and then not using that functionality has any performance impact. If it doesn't, we could always compile this way, and then the user wouldn't need to recompile to turn on multi-threading but could just change a few options.

On that note, is there a particular reason that the models are all compiled with threads = FALSE for the touchstone benchmarks?

> I think the slightly more complex question is what gives you more effective samples per second if you have a fixed CPU budget (due to the warmup) and are happy to run only a few chains (i.e. 2) but have many cores.

Yes, and then there's also potentially https://mc-stan.org/cmdstanr/articles/opencl.html


This is how benchmark results would change (along with a 95% confidence interval in relative change) if ef6c955 is merged into main:

  •   🚀 day_of_week_model: 26.4s -> 19.5s [-30.05%, -22.1%]
  •   🚀 latent_renewal_model: 29.6s -> 25.6s [-23.73%, -3.56%]
  •   ☑️ missingness_model: 1.3m -> 1.31m [-2.17%, +4.59%]
  •   🚀 multi_group_latent_renewal_model: 7.57s -> 6.14s [-24.39%, -13.36%]
  •   ☑️ preprocessing: 676ms -> 681ms [-0.65%, +2.04%]
  •   ☑️ simple_model: 6.51s -> 5.98s [-39.58%, +23.04%]
  •   ☑️ simple_negbin_model_with_pp: 5.49s -> 4.08s [-58.24%, +6.99%]

These benchmarks are based on package examples which are available here. Further explanation regarding interpretation and methodology can be found in the documentation of touchstone.


codecov bot commented Nov 17, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison: base (60495c7) 96.85% vs head (438ab0b) 96.85%.
The report is 1 commit behind head on main.

❗ The current head 438ab0b differs from the pull request's most recent head 4ef9b86. Consider uploading reports for commit 4ef9b86 to get more accurate results.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #366   +/-   ##
=======================================
  Coverage   96.85%   96.85%           
=======================================
  Files          15       15           
  Lines        1875     1876    +1     
=======================================
+ Hits         1816     1817    +1     
  Misses         59       59           


@sbfnk sbfnk marked this pull request as ready for review November 17, 2023 12:12
@sbfnk sbfnk requested a review from seabbs November 17, 2023 12:12

This is how benchmark results would change (along with a 95% confidence interval in relative change) if ef6c955 is merged into main:

  •   🚀 day_of_week_model: 26s -> 19.9s [-27.29%, -19.71%]
  •   🚀 latent_renewal_model: 28.9s -> 23s [-33.68%, -7.08%]
  •   ☑️ missingness_model: 1.32m -> 1.32m [-4.87%, +5.34%]
  •   🚀 multi_group_latent_renewal_model: 7.72s -> 6.13s [-30.43%, -10.71%]
  •   ☑️ preprocessing: 673ms -> 674ms [-1.18%, +1.36%]
  •   ☑️ simple_model: 5.28s -> 4.51s [-37.24%, +8.11%]
  •   ☑️ simple_negbin_model_with_pp: 4.94s -> 4.09s [-37.7%, +3.5%]

These benchmarks are based on package examples which are available here. Further explanation regarding interpretation and methodology can be found in the documentation of touchstone.


@seabbs seabbs left a comment


Nice, thanks @sbfnk, this looks great. I think we can simplify how we approach the if/else setup in the Stan code, and perhaps communicate more clearly exactly what has happened in this PR.

Review comments (since resolved) on: R/model-modules.R, README.Rmd, inst/stan/epinowcast.stan, NEWS.md, tests/testthat/test-epinowcast.R
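The if/else setup under discussion can be sketched as follows (illustrative only, not the exact epinowcast Stan code; `threads`, `obs`, `other_data`, and the two likelihood functions are stand-ins):

```stan
model {
  if (threads > 1) {
    // Within-chain parallelisation: slice the observations across
    // threads. A grainsize of 1 leaves the split to the scheduler;
    // tuning it manually is the untested follow-up mentioned later
    // in this thread.
    target += reduce_sum(partial_log_lik, obs, 1, other_data);
  } else {
    // Direct evaluation, avoiding reduce_sum's overhead.
    target += full_log_lik(obs, other_data);
  }
}
```

Branching on a data flag like this means a single model compiled with threading support can serve both modes without recompilation.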

This is how benchmark results would change (along with a 95% confidence interval in relative change) if 60495c7 is merged into main:

  •   🚀 day_of_week_model: 26.2s -> 19.7s [-33.19%, -16.6%]
  •   🚀 latent_renewal_model: 29.7s -> 24.4s [-34.1%, -1.84%]
  •   ☑️ missingness_model: 1.33m -> 1.31m [-5.31%, +2.91%]
  •   🚀 multi_group_latent_renewal_model: 7.85s -> 5.8s [-42.26%, -10.08%]
  •   ☑️ preprocessing: 691ms -> 690ms [-1.98%, +1.7%]
  •   🚀 simple_model: 5.83s -> 4.1s [-54.24%, -4.89%]
  •   ☑️ simple_negbin_model_with_pp: 6.59s -> 4.08s [-87.73%, +11.65%]

These benchmarks are based on package examples which are available here. Further explanation regarding interpretation and methodology can be found in the documentation of touchstone.


This is how benchmark results would change (along with a 95% confidence interval in relative change) if d5798f5 is merged into main:

  •   🚀 day_of_week_model: 26.1s -> 20.3s [-26.85%, -17.42%]
  •   🚀 latent_renewal_model: 28.4s -> 22.7s [-27.35%, -12.65%]
  •   ☑️ missingness_model: 1.43m -> 1.3m [-30.2%, +12.16%]
  •   🚀 multi_group_latent_renewal_model: 7.66s -> 5.98s [-28.03%, -15.95%]
  •   ☑️ preprocessing: 673ms -> 674ms [-3.04%, +3.32%]
  •   ☑️ simple_model: 5.42s -> 5.04s [-23.08%, +9.1%]
  •   ☑️ simple_negbin_model_with_pp: 5.44s -> 4.37s [-61.43%, +22.17%]

These benchmarks are based on package examples which are available here. Further explanation regarding interpretation and methodology can be found in the documentation of touchstone.


sbfnk commented Nov 20, 2023

One thing we haven't done is check whether manually tuning the grain size makes any difference. But that's perhaps for another time.

@sbfnk sbfnk requested a review from seabbs November 20, 2023 11:13

@seabbs seabbs left a comment


LGTM. We can ignore the linting issues as they are unrelated to this PR and are resolved in #382. Thanks again for this @sbfnk!

@seabbs seabbs added this pull request to the merge queue Nov 20, 2023
Merged via the queue into epinowcast:main with commit fe3529a Nov 20, 2023
9 of 10 checks passed
Development

Successfully merging this pull request may close these issues.

Provide an option that does not use reduce_sum