
Conversation


@charithaintc charithaintc commented Sep 8, 2025

Add support for distributing the vector.multi_reduction operation across lanes in a warp. Currently only 2D to 1D reductions are supported. Given layouts for the source and accumulator vectors,

  • If the reduction dimension is distributed across lanes, the reduction is non-lane-local and is performed using warp shuffles. Here we simply rewrite the MultiDimReductionOp to a sequence of ReductionOps inside the warp op body; the actual distribution is done by the WarpOpReduction pattern.
  • If the reduction dimension is not distributed across lanes, the reduction is lane-local. In this case, we yield the source and accumulator vectors from the warp op and perform the lane-local reduction outside the warp op using a sequence of ReductionOps.

The PR also adds support for distributing vector.shape_cast based on layouts.
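The shuffle-based path (the first case above) can be illustrated with a small simulation. This is a conceptual sketch only, assuming a 16-lane warp and a sum reduction; the helper name `warp_reduce_sum` and the xor-butterfly schedule are illustrative stand-ins, not the actual WarpOpReduction lowering:

```python
# Conceptual simulation of a shuffle-based (non-lane-local) warp reduction.
# Each lane starts with one partial value; after log2(n) xor-shuffle rounds,
# every lane holds the full reduction result.
def warp_reduce_sum(lane_vals):
    vals = list(lane_vals)
    n = len(vals)  # warp size, assumed a power of two
    offset = n // 2
    while offset:
        # Models a shuffle_xor: lane i reads the value held by lane i ^ offset.
        vals = [vals[i] + vals[i ^ offset] for i in range(n)]
        offset //= 2
    return vals

print(warp_reduce_sum(range(16)))  # every lane ends with sum(0..15) = 120
```

In the lane-local case, by contrast, no shuffles are needed: each lane simply reduces the elements it privately owns after distribution.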

@charithaintc (Contributor, Author) commented:

@adam-smnk We decided to move the multi reduction distribution inside XeGPU subgroup distribution after some discussion. The reason is that certain cases of reduction require accessing the layouts of the reduction source, which upstream vector distribution does not allow us to do. I will close #154438.

@charithaintc charithaintc changed the title [mlir][xegpu] Add support for vector.multi_reduction SIMT distribution. [mlir][xegpu] Add support for vector.multi_reduction and vector.shape_cast SIMT distribution. Sep 8, 2025
// dimensions are not distributed.
- unsigned distributionStart = originalType.getRank() - laneLayout.size();
+ unsigned distributionStart =
+     originalType.getRank() - effectiveLaneLayout.size();
Contributor:

Should we assert `originalType.getRank() == effectiveLaneLayout.size()` here?

Contributor (Author) replied:

I think the caller should take care of that. This function simply distributes the innermost laneLayout.size() dimensions.
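The behavior described above can be sketched as follows. `distribute_shape` is a hypothetical Python stand-in for the shape logic of `getDistVectTypeBasedOnLaneLayout`, written purely for illustration:

```python
# Hypothetical sketch: distribute the innermost len(lane_layout) dimensions
# of a vector shape across lanes; outer dimensions are left untouched.
def distribute_shape(shape, lane_layout):
    start = len(shape) - len(lane_layout)
    out = list(shape)
    for i in range(start, len(shape)):
        lanes = lane_layout[i - start]
        # Check if the dimension can be distributed evenly.
        if out[i] % lanes != 0:
            return None  # caller (the pattern) must handle the failure
        out[i] //= lanes
    return out

print(distribute_shape([4, 32], [1, 16]))  # [4, 2]
print(distribute_shape([1, 2], [16, 1]))   # None: 1 is not divisible by 16
```

The second call models the unit-dim case discussed below: a dim of size 1 cannot be divided across 16 lanes, so the helper signals failure rather than asserting.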


// Check if the dimension can be distributed evenly.
- if (dim % laneLayout[i - distributionStart] != 0)
+ if (dim % effectiveLaneLayout[i - distributionStart] != 0)
Contributor:

How is this handled when the dim size is 1, e.g. as the result of a shape_cast?

Contributor (Author) replied:

Calling this function for such cases will result in failure. The caller (i.e., the pattern) should handle the error and decide how to proceed.

@adam-smnk (Contributor) commented:

@charithaintc Just as a side note, couldn't the shape_cast part of that PR still go through? It looked largely complete.
Either way, it's fine to have it here too if it's better, easier, etc.


charithaintc commented Sep 9, 2025

> @charithaintc Just as a side note, couldn't shape_cast part of that PR still go through? It looked largely complete. Either way, it's fine to have it here too if it's better, easier etc..

We found that shape_cast also needs to access xegpu layouts in most cases, so we cannot rely on the vector distribution infra (plus the pattern there is naive and does not fit our use cases). The shape_cast pattern here has a high pattern benefit, so it should go first. I think we will have to adopt this approach until we have something working.

if (!sourceLayout)
  return rewriter.notifyMatchFailure(
      warpOp, "the source of shape_cast op lacks distribution layout");
FailureOr<VectorType> sourceDistTypeOrFailure =
Contributor commented:

How does getDistVectTypeBasedOnLaneLayout work for layout_in0? Could you please add the following as a test case and handle it?

layout_in0 = #xegpu.layout<lane_layout = [16, 1], lane_data = [1, 1]>
%res0 = vector.shape_cast %in0 {layout_res0 = #xegpu.slice<#xegpu.layout<lane_layout = [1, 16], lane_data = [1, 1]>, dims = [0]> }
: vector<1x2xf32> to vector<2xf32>

Or maybe we should limit shape_cast to only support casting a shorter vector (with the slice attribute) to a wider vector (with the parent attribute)?

In any case, we should check that inputLayout is a slice of resultLayout, or the reverse if we allow shape_cast to a narrower vector.

Contributor (Author) replied:

In this code example, getDistVectTypeBasedOnLaneLayout will return a failure: 1x2 is not distributable with lane layout [16, 1].

I added a check with isSliceOf. Please have a look.

I am also planning to add a few more test cases.
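The slice relation discussed above can be modeled conceptually: slicing a parent lane layout along `dims` drops those dimensions, so for a rank-increasing shape_cast the lower-rank layout should be recoverable from the higher-rank one this way. The helper below is an illustrative model of that idea, not the actual xegpu `isSliceOf` implementation:

```python
# Illustrative model: a slice layout keeps only the parent's lane-layout
# entries whose dimensions are NOT listed in dims.
def slice_lane_layout(parent_lane_layout, dims):
    dropped = set(dims)
    return [d for i, d in enumerate(parent_lane_layout) if i not in dropped]

# e.g. #xegpu.slice<#xegpu.layout<lane_layout = [1, 16], ...>, dims = [0]>
# would correspond to a 1-D lane layout of [16].
print(slice_lane_layout([1, 16], [0]))  # [16]
```

Under this model, the earlier `layout_in0 = [16, 1]` example fails both checks: it is neither distributable over a 1x2 source nor a parent whose dims=[0] slice matches the [16] result layout.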

@Jianhui-Li (Contributor) left a comment:

LGTM

@charithaintc charithaintc merged commit 9b0d7dd into llvm:main Sep 12, 2025
7 of 9 checks passed
return rewriter.notifyMatchFailure(
    warpOp, "shape_cast is rank increasing but result layout is not a "
            "slice of source layout");

@nbpatel (Contributor) commented Sep 12, 2025:

The shape_cast pattern needs to check that only unit dims can be squeezed/expanded.
