
[mlir][GPU] Add RecursiveMemoryEffects to gpu.launch #75315

Merged

Conversation

matthias-springer
Member

Infer the side effects of gpu.launch from its body.

llvmbot (Collaborator) commented Dec 13, 2023

@llvm/pr-subscribers-mlir

@llvm/pr-subscribers-mlir-gpu

Author: Matthias Springer (matthias-springer)

Changes

Infer the side effects of gpu.launch from its body.


Full diff: https://github.com/llvm/llvm-project/pull/75315.diff

1 file affected:

  • (modified) mlir/include/mlir/Dialect/GPU/IR/GPUOps.td (+2-1)
diff --git a/mlir/include/mlir/Dialect/GPU/IR/GPUOps.td b/mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
index 7cad1cd89fd63..d7ab397fb771c 100644
--- a/mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
+++ b/mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
@@ -672,7 +672,8 @@ def GPU_LaunchFuncOp :GPU_Op<"launch_func", [
 
 def GPU_LaunchOp : GPU_Op<"launch", [
       AutomaticAllocationScope, AttrSizedOperandSegments, GPU_AsyncOpInterface,
-      DeclareOpInterfaceMethods<InferIntRangeInterface>]>,
+      DeclareOpInterfaceMethods<InferIntRangeInterface>,
+      RecursiveMemoryEffects]>,
     Arguments<(ins Variadic<GPU_AsyncToken>:$asyncDependencies,
                Index:$gridSizeX, Index:$gridSizeY, Index:$gridSizeZ,
                Index:$blockSizeX, Index:$blockSizeY, Index:$blockSizeZ,

grypp (Member) commented Dec 13, 2023

Shall we also have a test that shows the motivation of this PR?

Infer the side effects of `gpu.launch` from its body.
matthias-springer (Member, Author)

Added a test case.


// -----

// The GPU kernel does not have any side effecting ops, so the entire
// gpu.launch op can be folded away.
Member

I'm not very familiar with RecursiveMemoryEffects; what exactly does it do?

Is it safe with managed memory or unified memory? In these cases, a pointer address is the same between GPU and CPU. One can imagine there is an llvm.ptr that can be used to build a memref descriptor. It could be used inside the gpu.launch body, or somewhere else; is it safe in these cases?

matthias-springer (Member, Author) commented Dec 15, 2023

RecursiveMemoryEffects indicates that the side effects of this op are defined by the side effects of nested ops.

In essence: gpu.launch by itself does not have any side effects. E.g., it does not read/write to memory etc. But the GPU kernel may have ops that read/write to memory. When we use the C++ API to query the side effects of a gpu.launch, it will automatically check the side effects of the ops inside the kernel and return those.

If the ops in the kernel have no side effects, then the gpu.launch has no side effects. That's why the gpu.launch in the newly added test case folds away. It does not read/write to memory etc. It can just be removed.
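A minimal sketch of such a foldable launch (illustrative names and shapes, assuming the standard custom syntax of gpu.launch; not necessarily the exact test case added in this PR):

```mlir
func.func @fold_away(%sz : index) {
  // The body contains only a pure arith op and the terminator, so the
  // launch as a whole reports no memory effects and can be removed.
  gpu.launch blocks(%bx, %by, %bz) in (%gdx = %sz, %gdy = %sz, %gdz = %sz)
             threads(%tx, %ty, %tz) in (%bdx = %sz, %bdy = %sz, %bdz = %sz) {
    %c0 = arith.constant 0 : index
    gpu.terminator
  }
  return
}
```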

One can imagine there is an llvm.ptr and it can be used to build a memref descriptor. It could be used inside the gpu.launch body, or somewhere else; is it safe in these cases?

When you use that memref value, e.g., with a memref.load, you have an op with a side effect and the entire kernel will be considered to have that side effect.
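For contrast, a hypothetical body containing a load (again with illustrative names) would keep the launch alive:

```mlir
func.func @keep_launch(%m : memref<?xf32>, %sz : index) {
  gpu.launch blocks(%bx, %by, %bz) in (%gdx = %sz, %gdy = %sz, %gdz = %sz)
             threads(%tx, %ty, %tz) in (%bdx = %sz, %bdy = %sz, %bdz = %sz) {
    // memref.load has a read side effect; with RecursiveMemoryEffects,
    // the surrounding gpu.launch inherits it and is not folded away.
    %v = memref.load %m[%tx] : memref<?xf32>
    gpu.terminator
  }
  return
}
```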

grypp (Member) commented Dec 17, 2023

Thanks for your patience and explanation.

gpu.launch doesn't write to memory directly; it performs the following actions. It will still need to read kernel configurations even with an empty body. However, I think it's safe to fold.

  1. Reads host memory for kernel configuration and parameters.
  2. Copies kernel parameters to the device memory.
  3. Waits for stream(s).

Thinking about the 3rd action with the IR below: if %taskC (which has an empty body) is folded, could %taskD run before %taskB because it no longer waits for %stream2? (Currently, MLIR uses a single stream, but one can imagine implementing support for multiple streams.)

%taskA = gpu.launch async [%stream1] ...
%taskB = gpu.launch async [%stream2] ...
%taskC = gpu.launch async [%stream1, %stream2] ...
%taskD = gpu.launch async [%stream1] ...

task execution graph:

a   b
 \ /
  c
  |
  d

Is it overkill to think about this case?

Member Author

The kernel configuration and parameters (e.g., number of blocks, etc.) are passed to gpu.launch as integer operands, so the op does not read anything from memory (memref) at this point, right? (The op may be lowered to something that reads from memory; that op would then have a memory side effect.)

gpu.launch takes async token arguments and produces an async token result. If there is a particular order in which kernels must be launched (and waited for), I would expect this to be modeled with async tokens. IMO, execution order due to "stream is available or not" does not have to be preserved when folding gpu.launch ops.

%taskA = gpu.launch async [] ...
%taskB = gpu.launch async [] ...
%taskC = gpu.launch async [%taskA, %taskB] ...
%taskD = gpu.launch async [%taskC] ...

In the above example, %taskC can be canonicalized away if it has no side effects, but %taskD will then depend on %taskA and %taskB.
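A sketch of the expected IR after such a canonicalization (following the elided `...` style of the example above):

```mlir
// %taskC has been removed; its async token operands are forwarded so
// that %taskD still waits on both predecessors.
%taskA = gpu.launch async [] ...
%taskB = gpu.launch async [] ...
%taskD = gpu.launch async [%taskA, %taskB] ...
```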

Member Author

We discussed offline. The way CUDA streams are expressed in the GPU dialect is via async tokens. It would not make sense for gpu.launch and other ops to take an additional %stream operand in the future. We use async tokens (%task) for that.

Coming back to RecursiveMemoryEffects, this trait should be safe to attach. There are no side effects wrt. streams, because streams are encoded with async tokens.

Empty GPU kernels like the one in the test case will not simply DCE away if they impose a dependency via async tokens. (A folder or canonicalization pattern may find a way to remove empty kernels in a way that preserves the dependencies expressed by the async tokens.)

@matthias-springer matthias-springer merged commit f709642 into llvm:main Dec 20, 2023
3 of 4 checks passed