Add OpenCL device selection at runtime and an OpenCL vignette #439
Conversation
Codecov Report
```
@@            Coverage Diff             @@
##           master     #439      +/-   ##
==========================================
- Coverage   93.41%   91.48%   -1.93%
==========================================
  Files          12       12
  Lines        2930     2948      +18
==========================================
- Hits         2737     2697      -40
- Misses        193      251      +58
==========================================
```
Continue to review full report at Codecov.
Ready for review.
Some comments. I am going to read/edit the vignette now so more comments incoming.
I made some comments and some edits. Hopefully they're not too confusing. I didn't feel super great about the edits I made so if you want to revert anything that's fine.
@rok-cesnovar This looks really good. I haven't gone through the vignette yet, but I made a few minor comments on the code.
# Conflicts: # R/model.R
Co-authored-by: Jonah Gabry <jgabry@gmail.com>
…_runtime_args # Conflicts: # R/model.R
@rok-cesnovar Are there still more review comments to address or other changes to make to this PR? Do you need me to do another review of this?
There are a few comments that I need to address. I have put this one on the backburner for a day or two to think it through. Will tag when ready.
Ok sounds good
Should be ready for another look. Not sure how to exclude the vignette from coverage tests.
vignettes/opencl.Rmd
Outdated
As of version 2.26.1, users can expect speedups with OpenCL when using vectorized
probability distribution/mass functions (functions with the `_lpdf` or `_lpmf`
suffix). You can expect speedups when the input variables contain 20,000 or more elements.

The actual speedup for a model will depend on whether the `lpdf/lpmf` functions
are the bottlenecks of the model and on which `lpdf/lpmf` function is used.
The more computationally complex the function is, the larger the expected speedup.
The biggest speedups are expected when using the GLM functions.
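For readers of this thread, the vignette excerpt above pairs with a short usage sketch. This is a minimal, illustrative example assuming the cmdstanr interface this PR targets (`cpp_options = list(stan_opencl = TRUE)` at compile time and an `opencl_ids` argument at sampling time); the model file name and `data_list` are placeholders:

```r
library(cmdstanr)

# Compile the model with OpenCL support enabled
# (stan_opencl is the CmdStan make flag for OpenCL builds).
mod <- cmdstan_model(
  "bernoulli_logit_glm.stan",          # hypothetical model file
  cpp_options = list(stan_opencl = TRUE)
)

# Select the OpenCL platform and device at runtime:
# opencl_ids = c(platform_id, device_id), the argument added in this PR.
fit_cl <- mod$sample(
  data = data_list,        # assumed prepared elsewhere
  opencl_ids = c(0, 0),    # first platform, first device
  refresh = 0
)
```

Running the same `mod$sample()` call without `opencl_ids` gives the CPU baseline to compare against.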
Man this is great!
Just one clarification: what are the 20,000 elements? Is that data or input parameters? I just want to make sure that the vignette has no ambiguities.
The 20k elements here are meant as the length N of the input vectors/row_vectors/arrays in the worst case.
If most of the inputs are data, the speedups will be bigger, and a GPU will typically beat a CPU at smaller N.
The worst case is when all of the inputs are vector parameters, as there are many more copies to be done.
The best case is when the non-data inputs are scalars.
The theoretical best case is all inputs being data, but an all-data lpdf/lpmf call in the model/transformed parameters blocks is just inefficient anyway. It's still shown in the above charts because that will be achievable once we turn on support for non-lpdf/lpmf functions as well (the inputs are then calculated on the GPU -> fewer copies).
This is a "ballpark" number; as mentioned, it will depend on the exact CPU/GPU and the lpdf/lpmf function.
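To make the data-vs-parameter distinction concrete, here is a sketch of the kind of model that benefits most: a GLM likelihood where the large inputs (`x`, `y`) are data and only the coefficients are parameters. The model below is illustrative, not taken from the PR:

```stan
data {
  int<lower=0> N;
  int<lower=0> K;
  matrix[N, K] x;                       // large input, data
  array[N] int<lower=0, upper=1> y;     // large input, data
}
parameters {
  real alpha;
  vector[K] beta;                       // only small parameters
}
model {
  // x and y are data, so only alpha and beta need to be
  // transferred to the device on each evaluation -- the
  // favorable case described in the comment above.
  y ~ bernoulli_logit_glm(x, alpha, beta);
}
```

In the worst case discussed above, `x` itself would be a matrix of parameters and the per-evaluation copies would dominate.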
Thank you!
Ready for review.
Code changes look good. I'll take a look at the vignette soon.
@rok-cesnovar This is really good. I pushed a few very minor edits and made just a few minor review comments, but this looks pretty much ready to go.
Crap, I just noticed one more thing. Can you regenerate it again but set `refresh=0` in the `fit_cl` line? Right now it's printing all the iteration updates for that one but has `refresh=0` for the CPU one. Sorry I didn't notice that before! Otherwise I think it's ready to go.
Good call. Fixed. Thanks!
Ok cool, thanks. Should we go ahead and merge this or is there anything else you wanted to add/change?
I am good, nothing else to add.
ok great, i'll merge now |
Summary
Add OpenCL device selection at runtime and an OpenCL vignette.
This isn't quite ready yet, but I wanted to get comments.
Copyright and Licensing
Please list the copyright holder for the work you are submitting
(this will be you or your assignee, such as a university or company):
Rok Češnovar, Uni. of Ljubljana
By submitting this pull request, the copyright holder is agreeing to
license the submitted work under the following licenses: