[mlir][gpu] Add gpu.subgroup_uniform op #157743
```tablegen
@@ -3255,4 +3255,37 @@ def GPU_SubgroupBroadcastOp : GPU_Op<"subgroup_broadcast",
  let hasVerifier = 1;
}

def GPU_SubgroupUniformOp : GPU_Op<"subgroup_uniform",
    [Pure, AllTypesMatch<["result", "src"]>,
```
It's not clear to me that this is all speculatable with respect to the property it provides?

See my other comment on uniformity, but yeah, I think we can either make it speculatable if we require all lanes to be active at definition, or make it non-speculatable but without requiring all lanes to be active.
```tablegen
    DeclareOpInterfaceMethods<InferIntRangeInterface, ["inferResultRanges"]>] #
    ElementwiseMappable.traits>,
    Arguments<(ins AnyType:$src)> {
```
NIT: use …

… Since when?

Since never, but I would like to standardize on a single form of expressing these upstream. So maybe NIT is not the right request, but rather: would you consider it?

Ah. Yeah, we might want to discuss that somewhere other than PR comments.

The …

A lot of the code I've seen upstream uses …
```tablegen
  let summary = "Assumes value is uniform across the lanes in subgroup";
  let description = [{
    The "subgroup_uniform" op assumes that the value is uniform across all lanes
```
I'd include "assume" in the op name to make it clear what this op is about.
```tablegen
    in a subgroup. This means that all active lanes in the subgroup are expected
    to have the same value.
```
Can you clarify whether the uniformity applies to the input or to the output, and whether it is for all lanes or only for all active lanes?
````tablegen
    This op can be used to inform the compiler that a value is uniform across
    the subgroup, enabling optimizations. The result is poison if the value
    is not actually uniform.

    This op is functionally a no-op, as no valid program should change its
    semantics if this op is removed. Backends can choose to ignore it or to
    apply optimizations (e.g. put the value into scalar registers).

    This op can be freely speculated across structured control flow, as the
    parent active mask is always a superset of the current mask, and if the
    input calculation can be hoisted, the operation itself can be hoisted as
    well.

    Example:

    ```mlir
    %1 = gpu.subgroup_uniform %0 : f32
    ```
  }];
  let results = (outs AnyType:$result);
  let assemblyFormat = "$src attr-dict `:` type($result)";
}
````
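To illustrate the speculation rule stated in the description, here is a hedged sketch (the values `%base` and `%cond` and the surrounding IR are my assumptions, not part of the patch) of hoisting the op out of an `scf.if` together with its input:

```mlir
// Before: the annotation and its input both live inside the branch.
%r = scf.if %cond -> (f32) {
  %v = arith.mulf %base, %base : f32
  %u = gpu.subgroup_uniform %v : f32
  scf.yield %u : f32
} else {
  scf.yield %base : f32
}

// After: %v's computation is itself speculatable, so per the description
// the annotation may be hoisted along with it; the parent's active mask
// is a superset of the branch's mask.
%v = arith.mulf %base, %base : f32
%u = gpu.subgroup_uniform %v : f32
%r2 = scf.if %cond -> (f32) {
  scf.yield %u : f32
} else {
  scf.yield %base : f32
}
```

Whether this hoisting is actually sound is exactly what the review thread below debates.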
Please add a verifier checking that the input value has a single use.

I don't think this fits the local-verification-only rule: https://mlir.llvm.org/getting_started/DeveloperGuide/#ir-verifier

And we don't actually need a has-one-use constraint? A value can be made/marked uniform but also not; that's not an error, that's just a missed optimization opportunity.

To me, … However, I do agree this is stretching what the guide says about the verifier being local, because while we are not looking at the producer of an operand, nor traversing the use-def chain, the effect of the constraint is not local.
Has the exact same one-of-these-is-underannotated semantics as …

Here, … One could, of course, have a rewrite that maximizes assumptions, replacing … In terms of analysis, in your example, the value … That same interchangeability logic from … (Operationally, …)
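A hedged sketch of the interchangeability argument being made here (the IR and names are mine, not from the thread):

```mlir
%u = gpu.subgroup_uniform %v : f32
// %u and %v carry the same bits at runtime, so a rewrite may replace
// uses of %v with %u ("maximizing" the assumption) or uses of %u with
// %v (dropping it -- a missed optimization, not an error).
%w = arith.addf %u, %v : f32
```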
Thank you for the explanation, that makes it clear, and I'm dropping the has-one-use proposal.

I argue that … Taking the case of `memref.assume_alignment`, let's take:

1.

```mlir
%1 = memref.assume_alignment %0, 8 : memref<4x4xf16>
%2 = memref.assume_alignment %0, 32 : memref<4x4xf16>
```

2.

```mlir
memref.assume_alignment %0, 8 : memref<4x4xf16>
memref.assume_alignment %0, 32 : memref<4x4xf16>
```

On 2, it's clear the assumption is on `%0`. On the other hand, on 1, I don't think one should combine them. They have different effects that should be scoped to the uses of each of the operations; thus they are more like value promotions. Just to be clear, I'm not saying we can't have that, I'm saying that's not the description of the op. However, since it appears there's consensus on accepting assumes in such form, I'm not blocking the PR on this issue.
I don't see the argument here? You seem to be repeatedly making claims that I don't understand the rationale behind, and I don't see much explanation of the fundamentals behind them.

Actually, no: …

How so? We encode an assumption on a value; I don't quite see why this makes it "not a compiler assumption". You yourself identify later that the operation is a "nop": its only effect is on the compiler itself, as metadata inserted into the dataflow.

Right, one probably shouldn't: the assumption is dataflow-based instead of path-based. …, which isn't an immediate UB when violated; you can generate later: … at a point where `%1` is used in a manner that triggers UB. This dataflow vs. path sensitivity, alongside deferred UB, is pretty key to understanding the design space here.

I think the property we want to capture is something like this: … If the operand is not uniform at definition, you should probably use `gpu.subgroup_broadcast` instead.
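To make the suggested contrast concrete, a hedged sketch (the `first_active_lane` mode is assumed from the broadcast op's tests; this is not code from the patch):

```mlir
// Assert an existing fact: %v must already be uniform across the active
// lanes, otherwise the result is poison.
%u1 = gpu.subgroup_uniform %v : f32

// Establish the fact instead: every active lane receives the first active
// lane's value, whether or not %v was uniform beforehand.
%u2 = gpu.subgroup_broadcast %v, first_active_lane : f32
```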
... or poison.

After some internal discussion, even a uniform-at-definition requirement may not be enough. Consider the following example: …

We can, conservatively, require all lanes to be active at the definition in addition to the input being uniform at the definition (and that even covers the original use case I intended this op for, as it requires all lanes active anyway for unrelated reasons), but I feel there should be a less restrictive definition which is still useful.
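A hypothetical sketch of the kind of case that motivates the all-lanes-active requirement (this IR is my reconstruction, not the example from the thread): the input is uniform among the lanes active at its definition, yet hoisting exposes the claim to lanes where it never held.

```mlir
%tid = gpu.thread_id x
%c16 = arith.constant 16 : index
%cond = arith.cmpi ult, %tid, %c16 : index  // divergent: only lanes 0..15 enter
scf.if %cond {
  // %v is uniform among the *active* lanes here (always 0)...
  %v = arith.divui %tid, %c16 : index
  %u = gpu.subgroup_uniform %v : index
  "test.use"(%u) : (index) -> ()
}
// ...but if %u (and %v, which is speculatable) is hoisted above the
// scf.if, %v is 0 or 1 depending on the lane, so the uniformity claim
// now covers lanes where it is false.
```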
```tablegen
#endif // GPU_OPS
```
```mlir
@@ -552,3 +552,11 @@ func.func @subgroup_broadcast(%arg0 : f32, %arg1 : i32) -> (f32, f32) {
  %1 = gpu.subgroup_broadcast %arg0, specific_lane %arg1 : f32
  func.return %0, %1 : f32, f32
}
```
```mlir
// CHECK-LABEL: func @subgroup_uniform
// CHECK-SAME: (%[[ARG:.*]]: f32)
func.func @subgroup_uniform(%arg0 : f32) -> f32 {
  // CHECK: gpu.subgroup_uniform %[[ARG]] : f32
```
Could you please add the following before the operation, just as a reminder of what a typical use case looks like: `%arg0 = [thread ID x] / [subgroup size]`, instead of receiving the value from an argument.
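A hedged sketch of what that suggestion might look like in the test (the exact op spellings and casts are my assumptions):

```mlir
%tid = gpu.thread_id x
%sz = gpu.subgroup_size : index
// The quotient is the same for every lane of a subgroup (assuming lanes
// of a subgroup have consecutive thread IDs).
%quot = arith.divui %tid, %sz : index
%int = arith.index_cast %quot : index to i32
%val = arith.sitofp %int : i32 to f32
%0 = gpu.subgroup_uniform %val : f32
```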
```mlir
  %0 = gpu.subgroup_uniform %arg0 : f32
  func.return %0 : f32
```
Also: could you add one more test showing the effects of speculation in the related pass test?
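Since the op is marked `Pure`, such a test could plausibly exercise the generic LICM pass; a hedged sketch (the pass choice, file layout, and `test.use` op are my assumptions):

```mlir
// RUN: mlir-opt %s -allow-unregistered-dialect --loop-invariant-code-motion | FileCheck %s

// CHECK-LABEL: func @hoist_subgroup_uniform
//       CHECK: gpu.subgroup_uniform
//  CHECK-NEXT: scf.for
func.func @hoist_subgroup_uniform(%arg0 : f32, %lb : index, %ub : index, %step : index) {
  scf.for %i = %lb to %ub step %step {
    // Loop-invariant and side-effect free, so LICM may hoist it.
    %0 = gpu.subgroup_uniform %arg0 : f32
    "test.use"(%0) : (f32) -> ()
  }
  return
}
```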
```mlir
}
```
Consider:

```mlir
%ret_val = assume_uniform @scope=subgroup %uniform_val
```

Rationale: 1) the `assume*` prefix implies a compiler hint, consistent with other dialects like `memref.assume*`; 2) an extensible scope for potential future needs.