Add run time options for threads and gpu stuff #825
Comments
That would be fantastic. Let me know if you need any help or a review.
At least the threads part of this would be great to do for 2.24. For backward compatibility we still need to support the environment variable.
But we should deprecate the environment variable approach once we have this feature.
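To make the backward-compatibility point concrete, here is a minimal Python sketch of one possible precedence rule. An explicit thread count from the command line wins, the STAN_NUM_THREADS environment variable is still honored but flagged as deprecated, and the default is a single thread. The function name resolve_num_threads and the warning text are hypothetical, not CmdStan's implementation. Treating -1 as "use all available cores" is assumed to mirror how Stan currently interprets STAN_NUM_THREADS.

```python
import os
import warnings


def resolve_num_threads(cli_threads=None, environ=None):
    """Return the number of threads to use (hypothetical helper).

    Precedence: explicit command-line value, then the deprecated
    STAN_NUM_THREADS environment variable, then a default of 1.
    """
    if environ is None:
        environ = os.environ
    if cli_threads is not None:
        if cli_threads < 1:
            raise ValueError("threads must be a positive integer")
        return cli_threads
    env_value = environ.get("STAN_NUM_THREADS")
    if env_value is None:
        return 1  # single-threaded default
    # Backward-compatible path: still honored, but flagged as deprecated.
    warnings.warn(
        "STAN_NUM_THREADS is deprecated; pass threads=N instead",
        DeprecationWarning,
        stacklevel=2,
    )
    n = int(env_value)  # raises ValueError on non-numeric input
    if n == -1:
        # Assumed to mirror the existing convention: -1 means all cores.
        return os.cpu_count() or 1
    if n < 1:
        raise ValueError("STAN_NUM_THREADS must be positive or -1")
    return n
```

Keeping the environment variable as a fallback, rather than removing it, means existing scripts keep working while users see a warning pointing them at the new argument.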
The OpenCL side of this was fixed, so this just requires the threading runtime option, which is blocked by stan-dev/math#1949.
What exactly do you mean by "users could do something"? Via CmdStan's extremely clunky argument parser?
This was merged already. “opencl device=1 platform=0” will select the device if the model was compiled appropriately. Once we prepare the threading side downstream in Math, it should be “threads=4” instead of the clunkier environment variable.
Sorry, I'm confused here. I'm a user, I have a Stan model, and I want to take advantage of threading. This needs to be explained clearly in the CmdStan manual for similarly confused and naive users like me.
Currently (it's been this way since threading was introduced):
This has been documented in the CmdStan guide for some time, and I would say it's quite clear: https://mc-stan.org/docs/2_25/cmdstan-guide/parallelization.html
Once we figure out a way to close this issue:
So instead of an environment variable, it's a CmdStan argument. The environment variable approach will be deprecated, not removed. We haven't gotten to the stage where this is doable, but yes, this will be documented once it's closed. For OpenCL (GPU stuff), a chapter or section for at least the CmdStan guide is in the works, is part of the checklist here, and will be done for the release.
Alas, not clear enough. That's a documentation issue: what's needed are a few more examples on a case-by-case basis, the cases being:
is argument when editing together the CmdStan manual, I thought I understood parallelization - when Bob wanted to run CmdStan to use
I don't think that belongs in the CmdStan Guide. In my opinion, the role of the CmdStan Guide in this case is to answer the question: "I have a model that can use parallelization. How do I use that with CmdStan?" And I think the link answers that concisely and clearly. Answers to questions like:
should be part of Stan's User's Guide and Functions Reference. These are not limited to CmdStan. Some of the answers to these questions are already there, some are scattered in other places like case studies, and some are missing. The answers on the OpenCL side will be added in the form of a section or chapter before the release and will hopefully address all these questions. GPU and reduce_sum are not really related; more will be explained in the docs. For now, I can give a short description for anyone coming here later via Google/GitHub search:
Threading: Use reduce_sum or map_rect in your model. The former is recommended as it's easier to use. Then compile your model with STAN_THREADS and set the number of threads via the environment variable STAN_NUM_THREADS.
MPI: Use map_rect and MPI, which can be enabled with STAN_MPI. The number of MPI processes is set with the -n flag of the MPI launcher, such as mpirun or mpiexec.
GPU (OpenCL): For now, refer to the provisional and unofficial install documentation: https://github.com/bstatcomp/stan_gpu_install_docs
There is always a conflict between making things crystal clear and being concise. Also, looking at that link again now (https://mc-stan.org/docs/2_25/cmdstan-guide/parallelization.html), is there an unfinished edit? The sentence that ends with:
The next paragraph says the same thing?
OK, yeah, I re-read it again. We are maybe trying to do too much on that page. Not sure why we refer to both threading and OpenCL at the same time before we even introduce basic threading. It's unlikely anyone will use both. This page should just be about threading. I will fix that once we have a separate GPU section.
We can close this now with the new
Summary:
It would be nice to have a parallel option so users could do something like “parallel threads=4 opencl_device=0 opencl_platform=0” or something of that ilk. What should it look like?
Description:
Right now we need to set threading info as an environment variable, and the OpenCL settings have to be known at compile time. I think at the Math level we need opencl_context to set the platform and device dynamically. Then CmdStan users (and the other upstream interfaces) can pass in the options at runtime.
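A minimal Python sketch of how the proposed “parallel threads=4 opencl_device=0 opencl_platform=0” syntax could be parsed into structured options. ParallelOptions and parse_parallel are hypothetical names used only for illustration; CmdStan's real argument parser is a separate C++ component and is not modeled here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParallelOptions:
    """Hypothetical container for the proposed runtime options."""
    threads: Optional[int] = None
    opencl_device: Optional[int] = None
    opencl_platform: Optional[int] = None


KNOWN_KEYS = {"threads", "opencl_device", "opencl_platform"}


def parse_parallel(tokens):
    """Parse tokens like ["parallel", "threads=4", "opencl_device=0"]."""
    if not tokens or tokens[0] != "parallel":
        raise ValueError("expected the 'parallel' keyword first")
    opts = ParallelOptions()
    for token in tokens[1:]:
        key, sep, value = token.partition("=")
        # Reject anything that is not a known key=value pair.
        if not sep or key not in KNOWN_KEYS:
            raise ValueError(f"unrecognized option: {token!r}")
        setattr(opts, key, int(value))  # int() raises on non-numeric values
    return opts
```

Rejecting unknown keys up front keeps a typo like opencl_devcie=0 from being silently ignored; options that are not given stay None so a later step can fall back to defaults.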
The OpenCL kernels are compiled dynamically, so we'd need to set the device/platform before any kernel compilation happens. I think that will happen naturally if we do it at the CmdStan level before any of the kernels are called.
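The ordering constraint can be sketched as a context object that accepts a platform/device selection only until the first kernel is compiled. OpenCLContextSketch and its methods are hypothetical stand-ins, not the Stan Math opencl_context API; the defaults of platform 0 and device 0 are assumptions, and "compilation" is simulated by recording which platform and device were selected.

```python
class OpenCLContextSketch:
    """Hypothetical stand-in for a Math-level OpenCL context.

    The platform/device can be changed until the first kernel is
    compiled; after that the selection is locked, because compiled
    kernels are tied to the chosen device.
    """

    def __init__(self, platform=0, device=0):
        self.platform = platform
        self.device = device
        self._compiled = {}  # kernel name -> (platform, device) it was built for

    def select(self, platform, device):
        if self._compiled:
            raise RuntimeError(
                "device/platform must be selected before any kernel is compiled"
            )
        self.platform, self.device = platform, device

    def get_kernel(self, name):
        # Compile lazily on first use; here "compiling" only records the
        # platform/device that was active at that moment.
        if name not in self._compiled:
            self._compiled[name] = (self.platform, self.device)
        return self._compiled[name]
```

Doing the selection at the CmdStan level, before sampling starts, satisfies this constraint because no kernel has been requested yet.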
Current Version:
v2.22.0