Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

multi card support #356

Merged
merged 47 commits into from Feb 14, 2024
Merged

Conversation

vhnatyk
Copy link
Contributor

@vhnatyk vhnatyk commented Jan 30, 2024

Describe the changes

This PR enables multi-gpu support

Linked Issues

Resolves #135

@vhnatyk vhnatyk marked this pull request as ready for review January 31, 2024 02:47
@jeremyfelder
Copy link
Collaborator

jeremyfelder commented Jan 31, 2024

We should add documentation about how to use multiple devices. It wasn't clear to me at first how a Device slice could be set on anything but the default device.
Its also unclear how set_device() works with multithreading

wrappers/rust/icicle-core/src/ntt/mod.rs Outdated Show resolved Hide resolved
wrappers/rust/icicle-cuda-runtime/src/device_context.rs Outdated Show resolved Hide resolved
wrappers/rust/icicle-cuda-runtime/src/memory.rs Outdated Show resolved Hide resolved
wrappers/rust/icicle-core/src/field.rs Show resolved Hide resolved
@DmytroTym
Copy link
Contributor

DmytroTym commented Feb 4, 2024

Maybe because setting a new device in a context automatically invalidates the context and setting a different stream invalidates the context if this stream was created for a different device, we should re-design or at least encapsulate the context somehow? Though I guess it's a breaking change technically

Copy link
Contributor

@DmytroTym DmytroTym left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really like the idea of one CPU thread = one device. Maybe we'll have some inconveniences integrating into third party applications that already have their own multithreading but stand-alone it seems very elegant and natural to me.

The only thing I don't really like is adding new functions for context and config creation that accept device_id. I feel like we can always just infer device id using cudaGetDevice especially considering that we want device id to be set once for the entire duration of the CPU thread. More generally, I feel the need to push back against all non-essential free floating functions being added to the public API. As discussed with @ChickenLover at some point, it feels like once there are too many of them - we should create a struct, a trait etc.

icicle/appUtils/ntt/ntt.cu Outdated Show resolved Hide resolved
wrappers/rust/icicle-core/src/field.rs Show resolved Hide resolved
wrappers/rust/icicle-core/src/ntt/mod.rs Outdated Show resolved Hide resolved
wrappers/rust/icicle-core/src/ntt/mod.rs Show resolved Hide resolved
wrappers/rust/icicle-core/src/msm/tests.rs Outdated Show resolved Hide resolved
@DmytroTym
Copy link
Contributor

DmytroTym commented Feb 13, 2024

@vhnatyk okay last counter-proposal. Recently we moved get_default_ntt_config into Rust, this allowed us to actually implement it generically. Using this function seems to me much better than the free-floating functions as before. Can we from now on use NTTConfig::default_config and NTTConfig::default_config_for_device, not add get_default_ntt_config_for_device and schedule get_default_ntt_config for deprecation? Maybe the same can be done for the DeviceContext struct, not sure why would I put a free-floating function there, for consistency with configs I guess, but now since we added another initialiser I would definitely prefer DeviceContext::default_context and DeviceContext::default_context_for_device.

Copy link
Contributor

@DmytroTym DmytroTym left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@vhnatyk vhnatyk merged commit 7742509 into ingonyama-zk:dev Feb 14, 2024
14 checks passed
@DmytroTym DmytroTym mentioned this pull request Feb 15, 2024
DmytroTym added a commit that referenced this pull request Feb 15, 2024
## Contents of this release

[FEAT]: support for multi-device execution:
#356
[FEAT]: full support for new mixed-radix NTT:
#367,
#368 and
#371
[FEAT]: examples for Poseidon hash and tree builder based on it
(currently only on C++ side):
#375
[PERF]: MSM performance upgrades & zero point handling:
#372
@vhnatyk vhnatyk deleted the develop/vhnat/Multi-card-support branch February 16, 2024 07:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants