
[LinalgExt] Generalize attention tiling interface implementation #17408

Merged
6 commits merged into iree-org:main on May 22, 2024

Conversation

@Groverkss (Contributor) commented on May 15, 2024:

This patch generalizes the tiling implementation for AttentionOp. Previously, only the batch and M dimensions of attention could be tiled. This patch allows tiling of the N dimension as well, and allows transposition based on indexing maps (hardcoded for now).

Tiling the N dimension is disabled in the CPU backend for now, because the TileAndDecomposeAttention pass is hardcoded to specific dimensions. This will be fixed once we implement the reduction tiling interface for it (after llvm/llvm-project#92624)
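For reference, here is a minimal sketch of what the hardcoded indexing maps could look like; the helper name, the exact dimension ordering, and the direct use of MLIR's AffineMap API are assumptions for illustration, not code from this PR. It assumes the usual attention operand shapes Q: (batch, m, k1), K: (batch, k2, k1), V: (batch, k2, n), output: (batch, m, n) over the iteration domain (batch, m, n, k1, k2).

```cpp
#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/SmallVector.h"
#include "mlir/IR/AffineExpr.h"
#include "mlir/IR/AffineMap.h"

// Hedged sketch: one indexing map per attention operand, expressed over the
// iteration domain (batch, m, n, k1, k2). The helper name is hypothetical.
static llvm::SmallVector<mlir::AffineMap>
getAttentionIndexingMaps(mlir::MLIRContext *ctx) {
  auto d = [&](unsigned i) { return mlir::getAffineDimExpr(i, ctx); };
  auto map = [&](llvm::ArrayRef<mlir::AffineExpr> results) {
    return mlir::AffineMap::get(/*dimCount=*/5, /*symbolCount=*/0, results, ctx);
  };
  return {map({d(0), d(1), d(3)}),   // Q:   (batch, m,  k1)
          map({d(0), d(4), d(3)}),   // K:   (batch, k2, k1)
          map({d(0), d(4), d(2)}),   // V:   (batch, k2, n)
          map({d(0), d(1), d(2)})};  // Out: (batch, m,  n)
}
```

With maps like these, tiling the N dimension means slicing along d2 wherever it appears in an operand's map, and a fused transpose is just a permutation of a map's results.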

@hanhanW (Contributor) left a comment:

Some high-level questions:

  1. Why do we need indexing_maps for the attention op?
  2. Could we have more than 1 dimension for batch/m/n/k1/k2?
  3. It looks like tiling on reduction dims in getTiledImplementation() is hard. Do you plan to implement PartialReductionOpInterface for the attention op?

@Groverkss (Contributor, PR author) replied:

  1. The indexing maps are there so we can do more fusions with the attention op. One of the fusions we want is fusing transposes into AttentionOp. This PR just hardcodes the maps, but eventually they will be replaced by indexing maps carried on the op itself (see the sketch after this list).

  2. Once I add indexing maps, it should be possible to have more than one dimension for each of batch/m/n/k1/k2. So eventually yes, but not right now. The PyTorch SDPA op actually has two batch dimensions, so it is common to have multiple dimensions for these.

  3. Eventually, yes. The plan is to split TileAndDecomposeAttention into tiling (PartialReductionOpInterface) and decomposition (AggregateOpInterface). This needs some changes upstream, but eventually it will happen.
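To make point 1 concrete: fusing a transpose of K into attention amounts to nothing more than swapping result positions in K's indexing map. A hedged sketch follows, assuming the same (batch, m, n, k1, k2) iteration domain as the sketch above; getKMap is a hypothetical helper, not IREE's API.

```cpp
#include "llvm/ADT/SmallVector.h"
#include "mlir/IR/AffineExpr.h"
#include "mlir/IR/AffineMap.h"

// Hypothetical helper: K's indexing map with and without a fused transpose.
// Untransposed K has shape (batch, k2, k1); a K stored pre-transposed as
// (batch, k1, k2) only flips the last two results of the map.
static mlir::AffineMap getKMap(mlir::MLIRContext *ctx, bool kIsTransposed) {
  auto d = [&](unsigned i) { return mlir::getAffineDimExpr(i, ctx); };
  llvm::SmallVector<mlir::AffineExpr> results =
      kIsTransposed ? llvm::SmallVector<mlir::AffineExpr>({d(0), d(3), d(4)})
                    : llvm::SmallVector<mlir::AffineExpr>({d(0), d(4), d(3)});
  return mlir::AffineMap::get(/*dimCount=*/5, /*symbolCount=*/0, results, ctx);
}
```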

@Groverkss force-pushed the generalize-attention-tiling branch from adf5386 to 07da396 on May 16, 2024 at 11:20.
@hanhanW (Contributor) left a comment:

Thanks for filling me in on the context; it looks more reasonable to me now, and the idea sounds okay. Is the PR ready for review? I'm seeing some commented-out code in AttentionOp::verify.

@Groverkss requested a review from hanhanW on May 16, 2024 at 19:34.
@Groverkss force-pushed the generalize-attention-tiling branch 2 times, most recently from 98bc9c9 to db189bc, on May 16, 2024 at 19:44.
@Groverkss (Contributor, PR author) replied:


Fixed. It should be ready for review now.

outputOffsets.push_back(offsets[dim.getPosition()]);
outputSizes.push_back(sizes[dim.getPosition()]);
}
return {outputOffsets, outputSizes, outputStrides};
Review comment from hanhanW (Contributor):

nit: use std::make_tuple()

Reply from Groverkss (Contributor, PR author):

I'm not sure about this. I prefer the braced return since the tuple type is already explicitly spelled out in the function signature, so there is no ambiguity. I can change it if there is a preference.
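For illustration, the two styles under discussion are equivalent when the function's declared return type already spells out the tuple. The snippet below uses hypothetical std::vector<int> element types rather than the actual helper's signature.

```cpp
#include <tuple>
#include <vector>

// Hypothetical signature mirroring the shape of the helper being reviewed.
std::tuple<std::vector<int>, std::vector<int>, std::vector<int>>
getOffsetsSizesStrides() {
  std::vector<int> offsets, sizes, strides;
  // Braced return: the tuple type is deduced from the declared return type.
  return {offsets, sizes, strides};
  // Equivalent alternative suggested in the review:
  // return std::make_tuple(offsets, sizes, strides);
}
```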

}
if (failed(checkShapeRank(op, "output", outputType, rankToCompareWith))) {
// Check shape compatibility based on indexing maps.
SmallVector<int64_t> shape(getIterationDomainRank(), -1);
Review comment from hanhanW (Contributor):

The concern is the magic number -1; we'd expect it to be distinct from ShapedType::kDynamic, but they could end up with the same value. (ShapedType::kDynamic was -1 in the past, and today it is internalized to something else.) What I'd suggest is having a method that computes the shape first, and then comparing the result of affineMap.compose(shape) against the target tensor shape. That avoids the -1 magic number.

Reply from Groverkss (Contributor, PR author):

I don't think I have a way of computing the shape without using these indexing maps. I added a foundDims array so there is no magic number.
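A hedged sketch of the foundDims idea (names, signature, and error handling are assumptions, not the actual AttentionOp::verify code): track which iteration-domain dimensions have already been seen instead of seeding the shape with a sentinel value, so nothing can collide with ShapedType::kDynamic.

```cpp
#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SmallVector.h"
#include "mlir/IR/AffineExpr.h"
#include "mlir/IR/AffineMap.h"
#include "mlir/IR/BuiltinTypes.h"

// Hedged sketch: verify that operand shapes agree with their indexing maps.
// Assumes every map result is a plain AffineDimExpr, as in attention's maps.
static bool shapesMatchIndexingMaps(llvm::ArrayRef<mlir::AffineMap> maps,
                                    llvm::ArrayRef<mlir::ShapedType> operands,
                                    unsigned iterationRank) {
  llvm::SmallVector<int64_t> shape(iterationRank, 0);
  llvm::SmallVector<bool> foundDims(iterationRank, false);
  for (auto [map, type] : llvm::zip_equal(maps, operands)) {
    for (auto [expr, dimSize] :
         llvm::zip_equal(map.getResults(), type.getShape())) {
      unsigned pos = llvm::cast<mlir::AffineDimExpr>(expr).getPosition();
      if (mlir::ShapedType::isDynamic(dimSize))
        continue; // This sketch only checks static sizes.
      if (!foundDims[pos]) {
        foundDims[pos] = true;
        shape[pos] = dimSize;
        continue;
      }
      if (shape[pos] != dimSize)
        return false; // Two operands disagree on the same dimension's size.
    }
  }
  return true;
}
```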

@Groverkss merged commit 9fe159d into iree-org:main on May 22, 2024.
56 of 57 checks passed
gglangg pushed two commits referencing this pull request to gglangg/iree on Jun 4, 2024 (commit messages mirror the PR description above).
bangtianliu pushed a commit referencing this pull request to bangtianliu/iree on Jun 5, 2024 (commit message mirrors the PR description above).
LLITCHEV pushed a commit referencing this pull request to LLITCHEV/iree on Jul 30, 2024 (commit message mirrors the PR description above; Signed-off-by: Lubo Litchev <lubol@google.com>).