Issue 328: option to remove within-chain parallelisation #366
This is how benchmark results would change (along with a 95% confidence interval in relative change) if d4a65aa is merged into main:
Running everything locally (4 cores) with
Yes, I agree with both. I think we should expect to see improvement for long-running models (such as those in the Germany example vignette) where setup costs are a smaller proportion of the total run time (unless The other question is whether compiling the model with I think the slightly more complex question is what gives you more effective samples per second if you have a fixed CPU budget (due to the warmup) and are happy to run only a few chains (i.e. 2) but have many cores.
On that note is there a particular reason the models are all compiled with
Yes, and then there's also potentially https://mc-stan.org/cmdstanr/articles/opencl.html
This is how benchmark results would change (along with a 95% confidence interval in relative change) if ef6c955 is merged into main:
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@           Coverage Diff           @@
##             main     #366   +/-   ##
=======================================
  Coverage   96.85%   96.85%
=======================================
  Files          15       15
  Lines        1875     1876    +1
=======================================
+ Hits         1816     1817    +1
  Misses         59       59
5347176 to 4e24659
Nice, thanks @sbfnk, this looks great. I think we can simplify the if/else setup in the Stan code and communicate a little more clearly exactly what this PR changes.
Co-authored-by: Sam Abbott <contact@samabbott.co.uk>
This is how benchmark results would change (along with a 95% confidence interval in relative change) if 60495c7 is merged into main:
This is how benchmark results would change (along with a 95% confidence interval in relative change) if d5798f5 is merged into main:
One thing we haven't done is check whether manually tuning the grain size makes any difference. But that's for another time perhaps.
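For readers unfamiliar with the grain size mentioned here, the following is a minimal sketch of how a grain size partitions a set of likelihood terms into contiguous slices. It is a simplified, deterministic model (closer in spirit to Stan's reduce_sum_static than to the dynamically scheduled reduce_sum), and the function name and the example of 10 terms are hypothetical, not taken from this PR.

```python
def slices(n_items, grainsize):
    """Split indices 0..n_items-1 into contiguous (start, end) slices.

    Each slice holds at most `grainsize` items; `end` is exclusive.
    This is an illustrative model only, not the scheduler Stan uses.
    """
    if grainsize < 1:
        raise ValueError("grainsize must be >= 1")
    return [
        (start, min(start + grainsize, n_items))
        for start in range(0, n_items, grainsize)
    ]


# Hypothetical example: 10 likelihood terms under three grain sizes.
for grainsize in (1, 4, 10):
    parts = slices(10, grainsize)
    print(grainsize, len(parts), parts)
```

A small grain size gives many small slices (more scheduling overhead, better load balancing), while a large one gives few large slices, which is why tuning it could plausibly change the benchmark results.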
Description
This PR closes #328.
This removes within-chain parallelisation by calling likelihood functions directly. Looking for touchstone benchmarking results here (without multi-threading disabled) and will test locally with multi-threading enabled. If there are performance improvements at any stage will add an option for the user to disable within-chain parallelisation.
This enables compilation with multithreading by default (as this did not seem to negatively affect performance) and adds an explicit threads_per_chain option that can be used to enable within-chain parallelisation (if set to >1). If it is set to 1 (the default) then the likelihood is calculated directly, without using reduce_sum.
Checklist
NEWS.md and the DESCRIPTION.
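The threads_per_chain behaviour described in the Description can be sketched conceptually as follows. This is not the package's Stan code: it is a Python illustration under stated assumptions, with a hypothetical Poisson likelihood and hypothetical function names, showing a direct likelihood call when threads_per_chain is 1 and a sum of partial likelihoods over slices (the idea behind reduce_sum) otherwise.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def poisson_lpmf(obs, expected):
    """Sum of Poisson log-probabilities for paired counts and means."""
    return sum(
        o * math.log(e) - e - math.lgamma(o + 1)
        for o, e in zip(obs, expected)
    )


def log_lik(obs, expected, threads_per_chain=1):
    """Total log-likelihood, optionally computed as a sum of partial sums.

    threads_per_chain == 1: call the likelihood directly (no slicing).
    threads_per_chain  > 1: split the data into slices, evaluate each
    slice separately, and add the partial results together.
    """
    if threads_per_chain == 1:
        return poisson_lpmf(obs, expected)

    n = len(obs)
    size = max(1, math.ceil(n / threads_per_chain))
    bounds = [(s, min(s + size, n)) for s in range(0, n, size)]

    def partial(bound):
        start, end = bound
        return poisson_lpmf(obs[start:end], expected[start:end])

    with ThreadPoolExecutor(max_workers=threads_per_chain) as pool:
        return sum(pool.map(partial, bounds))


# Hypothetical data: both paths give the same total up to rounding.
obs = [3, 0, 7, 2, 5]
expected = [2.5, 0.8, 6.1, 2.0, 4.4]
direct = log_lik(obs, expected, threads_per_chain=1)
sliced = log_lik(obs, expected, threads_per_chain=2)
print(abs(direct - sliced) < 1e-9)
```

Both paths return the same value, so the choice only affects run time. Python threads will not show a real speed-up for this pure-Python arithmetic; the sketch is meant to show the control flow, not the performance.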