Distributed GEMM #302
Merged
Conversation
force-pushed from 985e5cb to 7e97598
force-pushed from 1491a50 to 8f64897
Hardcode84 (Contributor) reviewed Sep 25, 2025:
Any lit test for codegen changes? Also, please fix merge conflicts.
Contributor: Please fix the DCO and merge conflicts.
force-pushed from f9a9485 to 875041d
Signed-off-by: Sanket Pandit <sanketp@amd.com>
Signed-off-by: Sanket Pandit <sanket.pandit@amd.com>
Side-effect, device constraint on multiple dimensions functional
Signed-off-by: Sanket Pandit <sanket.pandit@amd.com>
Hardcode84 approved these changes Oct 2, 2025
Megan0704-1 pushed a commit to Megan0704-1/wave that referenced this pull request on Oct 28, 2025:
Enable distributed matrix multiplication operations across multiple GPU devices in wave, where the distribution factor for each dimension is managed through `DeviceConstraint`.

### Key Changes

- In `host_codegen.py`, we split the input tensors across devices based on the device constraint, dispatch computation to each device, and merge results back into the original tensor shape (see the sketch after this list).
- We now use a `HostSignature` class to manage full problem-size buffers (e.g., a 1024x8192 matrix), while `KernelSignature` handles per-device tile buffers (e.g., 512x4096 tiles for a 2x2 distribution). Non-distributed workloads are unaffected: in the absence of a `DeviceConstraint`, `HostSignature` is the same as `KernelSignature`.
- In `host_utils.py`, added functions that split input and output tensors per device and manage tensor distribution across devices using `device_constraint_map`.
- Updated the distributed GEMM template to accept `device_m` and `device_n` parameters, with test coverage for distribution factors from 1x1 up to 4x2 across problem sizes up to 4096x20480x2560.
- Use the `MultiDeviceLaunchable` class to orchestrate execution across multiple GPU devices.

---------

Signed-off-by: Sanket Pandit <sanketp@amd.com>
Signed-off-by: Sanket Pandit <sanket.pandit@amd.com>
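To make the split/dispatch/merge flow above concrete, here is a minimal host-side sketch in plain PyTorch. It is an illustration of the behavior described in the list, not the PR's implementation: the helper names `split_for_devices`, `merge_from_devices`, and `distributed_gemm` are hypothetical, and the real code compiles wave kernels and launches them through `MultiDeviceLaunchable` rather than calling eager matmuls.

```python
import torch

def split_for_devices(t, factor_m, factor_n):
    # Cut a full-problem tensor (the HostSignature shape) into a
    # factor_m x factor_n grid of per-device tiles (the KernelSignature
    # shape), e.g. 1024x8192 -> four 512x4096 tiles for a 2x2 split.
    rows = torch.chunk(t, factor_m, dim=0)
    return [list(torch.chunk(r, factor_n, dim=1)) for r in rows]

def merge_from_devices(tiles):
    # Inverse of split_for_devices: stitch per-device result tiles
    # back into the original full-problem shape.
    return torch.cat([torch.cat(row, dim=1) for row in tiles], dim=0)

def distributed_gemm(a, b, device_m, device_n):
    # A is split along M and B along N; device (i, j) computes the
    # full-K product of A's i-th row block with B's j-th column block.
    a_blocks = torch.chunk(a, device_m, dim=0)
    b_blocks = torch.chunk(b, device_n, dim=1)
    dev_type = "cuda" if torch.cuda.is_available() else "cpu"
    n_dev = max(torch.cuda.device_count(), 1)
    c_tiles = []
    for i in range(device_m):
        row = []
        for j in range(device_n):
            # Round-robin the tiles over the available devices.
            dev = torch.device(dev_type, (i * device_n + j) % n_dev)
            row.append((a_blocks[i].to(dev) @ b_blocks[j].to(dev)).cpu())
        c_tiles.append(row)
    return merge_from_devices(c_tiles)

# 2x2 distribution: each device sees a 512x8192 block of A and an
# 8192x256 block of B, and produces one 512x256 tile of C.
a = torch.randn(1024, 8192)
b = torch.randn(8192, 512)
c = distributed_gemm(a, b, device_m=2, device_n=2)
assert torch.allclose(c, a @ b, rtol=1e-3, atol=1e-3)
```

Note that in this scheme only M and N are distributed; each device keeps the full K extent, so the per-device tiles can be computed independently and merged without any cross-device reduction.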