[mlir][xegpu] Add support for vector.multi_reduction and vector.shape_cast SIMT distribution. #157560
Conversation
@adam-smnk After some discussion, we decided to move the multi-reduction distribution inside XeGPU subgroup distribution. The reason is that certain cases of reduction require accessing the layouts of the reduction source, which upstream vector distribution does not allow us to do. I will close #154438.
```diff
   // dimensions are not distributed.
-  unsigned distributionStart = originalType.getRank() - laneLayout.size();
+  unsigned distributionStart =
+      originalType.getRank() - effectiveLaneLayout.size();
```
Should there be an assert here that `originalType.getRank() == effectiveLaneLayout.size()`?
I think the caller should take care of that. This function simply distributes the innermost `laneLayout.size()` dimensions.
```diff
   // Check if the dimension can be distributed evenly.
-  if (dim % laneLayout[i - distributionStart] != 0)
+  if (dim % effectiveLaneLayout[i - distributionStart] != 0)
```
How is the case handled where the dim size is 1, e.g. as the result of a shape_cast?
Calling this function in such cases will result in failure. The caller (i.e. the pattern) should handle the error and decide how to proceed.
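For context, here is a minimal sketch of the helper-and-caller interaction being discussed; the names and structure are illustrative assumptions, not the PR's exact code:

```cpp
#include "mlir/IR/BuiltinTypes.h"
#include "mlir/Support/LogicalResult.h"
#include "llvm/ADT/STLExtras.h"

using namespace mlir;

// Sketch: divide only the innermost `effectiveLaneLayout.size()` dimensions
// by the lane layout; leading dimensions pass through unchanged.
static FailureOr<VectorType>
getDistVecTypeSketch(VectorType originalType,
                     ArrayRef<int64_t> effectiveLaneLayout) {
  auto shape = llvm::to_vector(originalType.getShape());
  unsigned distributionStart =
      originalType.getRank() - effectiveLaneLayout.size();
  for (auto [i, dim] : llvm::enumerate(shape)) {
    if (i < distributionStart)
      continue;
    // A dim of size 1 (e.g. produced by a shape_cast) fails this check
    // whenever the matching lane layout entry is > 1.
    if (dim % effectiveLaneLayout[i - distributionStart] != 0)
      return failure();
    shape[i] = dim / effectiveLaneLayout[i - distributionStart];
  }
  return VectorType::get(shape, originalType.getElementType());
}
```

A calling pattern would then turn the failure into a `rewriter.notifyMatchFailure(...)` rather than asserting, which is also why no rank assert is needed inside the helper itself.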
@charithaintc Just as a side note, couldn't …
We found that shape_cast also needs to access XeGPU layouts in most cases, so we cannot rely on the upstream vector distribution infra (plus the pattern there is naive and does not fit our use cases). The shape_cast pattern here has a high pattern benefit, so it goes first. I think we will have to adopt this approach until we have something working.
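To illustrate the ordering point, a hedged sketch of registering such a pattern with a higher `PatternBenefit`, so the driver tries it before generic patterns registered at the default benefit of 1 (the pattern name and benefit value here are placeholders, not the PR's):

```cpp
#include "mlir/Dialect/Vector/IR/VectorOps.h"
#include "mlir/IR/PatternMatch.h"

using namespace mlir;

// Placeholder pattern standing in for the XeGPU-aware shape_cast
// distribution; the real pattern lives in XeGPU subgroup distribution.
struct ShapeCastDistributionSketch
    : public OpRewritePattern<vector::ShapeCastOp> {
  using OpRewritePattern::OpRewritePattern;
  LogicalResult matchAndRewrite(vector::ShapeCastOp op,
                                PatternRewriter &rewriter) const override {
    // Layout-aware distribution logic would go here.
    return failure();
  }
};

void populateSketchPatterns(RewritePatternSet &patterns) {
  // Benefit 2 outranks the default of 1, so this pattern is tried first.
  patterns.add<ShapeCastDistributionSketch>(patterns.getContext(),
                                            /*benefit=*/PatternBenefit(2));
}
```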
```cpp
if (!sourceLayout)
  return rewriter.notifyMatchFailure(
      warpOp, "the source of shape_cast op lacks distribution layout");
FailureOr<VectorType> sourceDistTypeOrFailure =
```
How does `getDistVectTypeBasedOnLaneLayout` work for `layout_in0` below? Could you please add the following as a test case and handle it?

```mlir
layout_in0 = #xegpu.layout<lane_layout = [16, 1], lane_data = [1, 1]>
%res0 = vector.shape_cast %in0 {layout_res0 = #xegpu.slice<#xegpu.layout<lane_layout = [1, 16], lane_data = [1, 1]>, dims = [0]>}
  : vector<1x2xf32> to vector<2xf32>
```

Or maybe we should just limit shape_cast to only support casting from a shorter vector (with a slice attribute) to a wider vector (with the parent attribute)?

In any case, we should check that the input layout is a slice of the result layout, or the reverse if we allow shape_cast to a narrower vector.
In this code example `getDistVectTypeBasedOnLaneLayout` will return a failure: 1x2 is not distributable with [16, 1]. I added a check with `isSliceOf`; please have a look. Also planning to add a few more test cases.
LGTM
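A rough sketch of the direction-dependent check agreed on in this thread; `isSliceOf` follows the reply above, but the exact signatures are assumptions (the hunk quoted next shows the PR's actual diagnostic):

```cpp
#include "mlir/IR/BuiltinTypes.h"

using namespace mlir;

// `LayoutTy` stands in for the XeGPU layout attribute interface.
template <typename LayoutTy>
static bool compatibleShapeCastLayouts(VectorType srcTy, VectorType resTy,
                                       LayoutTy srcLayout,
                                       LayoutTy resLayout) {
  // Rank-increasing: result layout must be a slice of the source layout,
  // matching the diagnostic in the quoted hunk below.
  if (resTy.getRank() > srcTy.getRank())
    return resLayout.isSliceOf(srcLayout);
  // Rank-reducing: the reverse relation.
  if (resTy.getRank() < srcTy.getRank())
    return srcLayout.isSliceOf(resLayout);
  return true;
}
```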
```cpp
return rewriter.notifyMatchFailure(
    warpOp, "shape_cast is rank increasing but result layout is not a "
            "slice of source layout");
```
The shape_cast pattern needs to check that only unit dims are squeezed/expanded.
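A minimal sketch of such a guard, assuming a helper along these lines (not the PR's code): strip unit dims from both shapes and require the remainders to match, so e.g. `vector<1x16xf32> -> vector<16xf32>` passes while `vector<2x8xf32> -> vector<16xf32>` is rejected.

```cpp
#include "mlir/IR/BuiltinTypes.h"
#include "llvm/ADT/SmallVector.h"

using namespace mlir;

// True iff the shape_cast only squeezes/expands unit dimensions, i.e. the
// non-unit dims of source and result agree in order and size.
static bool onlyUnitDimsChanged(VectorType srcTy, VectorType resTy) {
  SmallVector<int64_t> srcDims, resDims;
  for (int64_t d : srcTy.getShape())
    if (d != 1)
      srcDims.push_back(d);
  for (int64_t d : resTy.getShape())
    if (d != 1)
      resDims.push_back(d);
  return srcDims == resDims;
}
```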
Add support for distributing the `vector.multi_reduction` operation across lanes in a warp. Currently only 2D to 1D reductions are supported. Given layouts for the source and accumulator vectors:

- If the reduction dimension is distributed across lanes, the `MultiDimReductionOp` is rewritten to a sequence of `ReductionOp`s inside the warp op body. Actual distribution will be done by the `WarpOpReduction` pattern.
- Otherwise the reduction is lane-local, and it is performed after distribution using a sequence of `ReductionOp`s.

The PR also adds support for distributing `vector.shape_cast` based on layouts.
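As a rough illustration of the warp-body rewrite described above (the helper and its names are hypothetical, and the combining kind is fixed to `add` for brevity):

```cpp
#include "mlir/Dialect/Vector/IR/VectorOps.h"
#include "mlir/IR/PatternMatch.h"

using namespace mlir;

// Sketch: lower a 2D -> 1D row reduction to one vector.reduction per row
// inside the warp op body; the WarpOpReduction pattern then distributes
// each scalar reduction across lanes.
static SmallVector<Value> lowerToRowReductions(PatternRewriter &rewriter,
                                               Location loc, Value source2D,
                                               int64_t numRows) {
  SmallVector<Value> rowResults;
  for (int64_t i = 0; i < numRows; ++i) {
    // Extract row i from the 2D source vector.
    Value row = rewriter.create<vector::ExtractOp>(loc, source2D,
                                                   ArrayRef<int64_t>{i});
    // Reduce the row to a scalar; the matching accumulator element would
    // be combined with this result afterwards.
    rowResults.push_back(rewriter.create<vector::ReductionOp>(
        loc, vector::CombiningKind::ADD, row));
  }
  return rowResults;
}
```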