
Conversation

@panditsa panditsa commented Sep 16, 2025

Enable distributed matrix multiplication across multiple GPU devices in wave, where the distribution factor for each dimension is managed through `DeviceConstraint`.

Key Changes

  • In `host_codegen.py`, we split the input tensors across devices based on the device constraint, dispatch computation to each device, and merge the results back into the original tensor shape (see the split/merge sketch after this list).
  • We now use a `HostSignature` class to manage full problem-size buffers (e.g., a 1024x8192 matrix), while `KernelSignature` handles per-device tile buffers (e.g., 512x4096 tiles for a 2x2 distribution). Non-distributed workloads are unaffected: in the absence of a `DeviceConstraint`, `HostSignature` is identical to `KernelSignature` (shape sketch below).
  • In `host_utils.py`, added helper functions that split input and output tensors per device and manage tensor distribution across devices using `device_constraint_map`.
  • Updated the distributed GEMM template to accept `device_m` and `device_n` parameters, with test coverage for distribution factors from 1x1 up to 4x2 across problem sizes up to 4096x20480x2560.
  • Use a `MultiDeviceLaunchable` class to orchestrate execution across multiple GPU devices (launch sketch below).
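
A minimal sketch of the split/dispatch/merge flow described above, assuming each distributed dimension divides evenly. The function names (`split_tensor_for_devices`, `merge_device_results`) and the factor-map encoding are illustrative stand-ins, not the actual `host_utils.py` API:

```python
import torch

def split_tensor_for_devices(t: torch.Tensor, factors: dict[int, int]) -> list[torch.Tensor]:
    """Split `t` into per-device tiles; `factors` maps a dim index to its
    distribution factor (e.g., {0: 2, 1: 2} for a 2x2 device grid)."""
    tiles = [t]
    for dim, factor in factors.items():
        tiles = [c for tile in tiles for c in torch.chunk(tile, factor, dim=dim)]
    return tiles

def merge_device_results(tiles: list[torch.Tensor], factors: dict[int, int]) -> torch.Tensor:
    """Inverse of the split: concatenate tiles back, undoing dims in reverse order."""
    for dim, factor in reversed(list(factors.items())):
        tiles = [torch.cat(tiles[i : i + factor], dim=dim)
                 for i in range(0, len(tiles), factor)]
    return tiles[0]

# 1024x8192 input under a 2x2 distribution -> four 512x4096 tiles, merged back losslessly.
a = torch.randn(1024, 8192)
tiles = split_tensor_for_devices(a, {0: 2, 1: 2})
assert all(tile.shape == (512, 4096) for tile in tiles)
assert torch.equal(merge_device_results(tiles, {0: 2, 1: 2}), a)
```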
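And a sketch of the host-vs-kernel signature relationship: the per-device tile shape divides each distributed dimension by its factor, collapsing to the host shape when no `DeviceConstraint` is present. `Signature` and `kernel_signature` are hypothetical names for illustration, not wave's classes:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Signature:
    shape: tuple[int, ...]

def kernel_signature(host: Signature, factors: dict[int, int]) -> Signature:
    """Per-device tile shape: each distributed dim is divided by its factor."""
    shape = list(host.shape)
    for dim, factor in factors.items():
        assert shape[dim] % factor == 0, "dimension must split evenly across devices"
        shape[dim] //= factor
    return Signature(tuple(shape))

host = Signature((1024, 8192))
assert kernel_signature(host, {0: 2, 1: 2}).shape == (512, 4096)  # 2x2 distribution
assert kernel_signature(host, {}) == host  # no DeviceConstraint: signatures coincide
```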
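Finally, a hypothetical stand-in for what a multi-device launchable does at run time: launch the compiled kernel once per device on that device's tiles, then synchronize before merging. This sketches the orchestration pattern only; `launch_on_devices` is not the actual `MultiDeviceLaunchable` interface:

```python
import torch

def launch_on_devices(kernel, per_device_args, device_ids):
    """Dispatch `kernel` once per GPU on that device's tile arguments,
    then synchronize so the partial results are safe to merge."""
    results = []
    for idx, args in zip(device_ids, per_device_args):
        moved = tuple(a.to(f"cuda:{idx}", non_blocking=True) for a in args)
        results.append(kernel(*moved))  # queued asynchronously on device idx
    for idx in device_ids:
        torch.cuda.synchronize(idx)  # wait for every device before merging
    return results

# Illustrative 2x1 distributed GEMM on two GPUs, with C tiles split along M:
# a_tiles = split_tensor_for_devices(a, {0: 2})
# c_tiles = launch_on_devices(torch.matmul, [(a_tiles[0], b), (a_tiles[1], b)], [0, 1])
# c = merge_device_results([t.cpu() for t in c_tiles], {0: 2})
```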

@panditsa panditsa force-pushed the sanket/dist_gemm branch 7 times, most recently from 985e5cb to 7e97598 on September 18, 2025 04:06
@panditsa panditsa marked this pull request as ready for review September 19, 2025 04:06
@panditsa panditsa requested a review from Hardcode84 September 19, 2025 04:06
@panditsa panditsa force-pushed the sanket/dist_gemm branch 9 times, most recently from 1491a50 to 8f64897 on September 22, 2025 16:58
@Hardcode84 Hardcode84 left a comment

Any lit test for codegen changes? Also, please fix merge conflicts.

@Hardcode84

Please fix the DCO and merge conflicts.

@panditsa panditsa force-pushed the sanket/dist_gemm branch 4 times, most recently from f9a9485 to 875041d on October 2, 2025 19:24
panditsa and others added 19 commits October 2, 2025 12:27
Side-effect, device constraint on multiple dimensions functional

@panditsa panditsa merged commit 6d60f87 into iree-org:main Oct 3, 2025
19 checks passed
Megan0704-1 pushed a commit to Megan0704-1/wave that referenced this pull request Oct 28, 2025
