multi card support #356
We should add documentation about how to use multiple devices. It wasn't clear to me at first how a Device slice could be set on anything but the default device.
Setting a new device in a context automatically invalidates the context, and setting a different stream also invalidates it if that stream was created for a different device. Given that, maybe we should redesign, or at least encapsulate, the context somehow? Though I guess that's technically a breaking change.
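To make the encapsulation idea concrete, here is a minimal sketch, not the project's actual API. `Stream`, `DeviceContext`, `DeviceMismatch`, `for_device` and `with_stream` are all hypothetical names, and the stream is a plain stand-in that only remembers which device it was created on. The point is that with private fields the device and the stream can never disagree silently: swapping in a stream from another device is rejected instead of invalidating the context.

```rust
/// Hypothetical stand-in for a CUDA stream; it only records its owning device.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Stream {
    device_id: usize,
}

impl Stream {
    /// Creates a stream bound to `device_id` (no real CUDA call is made here).
    pub fn create_on(device_id: usize) -> Self {
        Stream { device_id }
    }
}

/// Error returned when a stream belongs to a different device than the context.
#[derive(Debug, PartialEq, Eq)]
pub struct DeviceMismatch {
    pub context_device: usize,
    pub stream_device: usize,
}

/// Hypothetical context whose fields are private, so callers cannot
/// set a device and a stream that contradict each other.
#[derive(Debug)]
pub struct DeviceContext {
    device_id: usize,
    stream: Stream,
}

impl DeviceContext {
    /// Builds a context for `device_id` with a stream created on that device.
    pub fn for_device(device_id: usize) -> Self {
        DeviceContext {
            device_id,
            stream: Stream::create_on(device_id),
        }
    }

    /// Replaces the stream, but only if it was created on the same device.
    pub fn with_stream(self, stream: Stream) -> Result<Self, DeviceMismatch> {
        if stream.device_id != self.device_id {
            return Err(DeviceMismatch {
                context_device: self.device_id,
                stream_device: stream.device_id,
            });
        }
        Ok(DeviceContext { stream, ..self })
    }

    pub fn device_id(&self) -> usize {
        self.device_id
    }

    pub fn stream(&self) -> Stream {
        self.stream
    }
}
```

For example, `DeviceContext::for_device(1).with_stream(Stream::create_on(0))` returns an `Err`, while passing a stream created on device 1 succeeds. Making the fields private is exactly the breaking part of the change.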
Really like the idea of one CPU thread = one device. We may hit some inconveniences when integrating into third-party applications that already have their own multithreading, but stand-alone it seems very elegant and natural to me.
The only thing I don't really like is adding new functions for context and config creation that accept `device_id`. I feel like we can always just infer the device id using `cudaGetDevice`, especially considering that we want the device id to be set once for the entire duration of the CPU thread. More generally, I feel the need to push back against all non-essential free-floating functions being added to the public API. As discussed with @ChickenLover at some point, it feels like once there are too many of them, we should create a struct, a trait, etc.
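A rough sketch of what "one CPU thread = one device, infer the id instead of passing it" could look like. This is an illustration under assumptions, not the crate's API: a thread-local cell stands in for the CUDA runtime's per-host-thread current device, `set_device` and `get_device` are hypothetical analogues of `cudaSetDevice` and `cudaGetDevice`, and `InferredContext` and `run_on_devices` are made-up names.

```rust
use std::cell::Cell;
use std::thread;

thread_local! {
    // Stand-in for the CUDA runtime's current device, which is tracked per host thread.
    static CURRENT_DEVICE: Cell<usize> = Cell::new(0);
}

/// Hypothetical analogue of `cudaSetDevice`: affects the calling thread only.
pub fn set_device(device_id: usize) {
    CURRENT_DEVICE.with(|d| d.set(device_id));
}

/// Hypothetical analogue of `cudaGetDevice`: reads the calling thread's device.
pub fn get_device() -> usize {
    CURRENT_DEVICE.with(|d| d.get())
}

/// Hypothetical context that needs no `device_id` argument at construction.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct InferredContext {
    pub device_id: usize,
}

impl Default for InferredContext {
    /// Infers the device from the calling thread instead of taking a parameter.
    fn default() -> Self {
        InferredContext {
            device_id: get_device(),
        }
    }
}

/// Spawns one worker thread per device. Each worker sets its device once,
/// then builds its context without passing the id around.
pub fn run_on_devices(device_count: usize) -> Vec<usize> {
    let handles: Vec<_> = (0..device_count)
        .map(|id| {
            thread::spawn(move || {
                set_device(id); // set once for the lifetime of this thread
                InferredContext::default().device_id
            })
        })
        .collect();

    handles
        .into_iter()
        .map(|h| h.join().expect("worker thread panicked"))
        .collect()
}
```

With this shape the existing `Default`-style constructors keep working unchanged, and no `*_for_device(device_id)` variants are needed in the public API. The third-party integration concern remains: a host application that moves work between threads would have to call the setter itself.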
@vhnatyk okay last counter-proposal. Recently we moved
lgtm
## Contents of this release
- [FEAT]: support for multi-device execution: #356
- [FEAT]: full support for new mixed-radix NTT: #367, #368 and #371
- [FEAT]: examples for Poseidon hash and tree builder based on it (currently only on C++ side): #375
- [PERF]: MSM performance upgrades & zero point handling: #372
Describe the changes
This PR enables multi-GPU support.
Linked Issues
Resolves #135