Add support for tile interchange #8664
Conversation
- Introduce a tile_interchange attribute in lowering_config.
- Plumb it through the IREE compilation stack, which includes:
  - Enhance the distribution logic to consider tile interchange.
  - Make LinalgFuse get tile_interchange from the lowering config.
- Add an E2E matmul test.

This can only be controlled by a compilation attribute, for experimental purposes. It is not used in the heuristic configuration pipeline. A small illustration of what the interchange does to the tile loops follows.
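Editorial note: a minimal plain-C++ illustration of the interchange semantics, assuming `interchange` is a permutation of the loop indices. The helper name is hypothetical; in the PR this information lives in the lowering_config attribute.

```cpp
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <vector>

// Applies a tile interchange: position i of the result takes the tile
// size of loop interchange[i]. `interchange` must be a permutation of
// [0, tileSizes.size()). Hypothetical standalone helper.
std::vector<int64_t> applyInterchange(const std::vector<int64_t> &tileSizes,
                                      const std::vector<int64_t> &interchange) {
  std::vector<int64_t> result(tileSizes.size());
  for (std::size_t i = 0; i < interchange.size(); ++i)
    result[i] = tileSizes[interchange[i]];
  return result;
}

int main() {
  // A matmul's (i, j, k) loops tiled with [64, 128, 0]; the interchange
  // [1, 0, 2] swaps the i and j tile loops.
  for (int64_t ts : applyInterchange({64, 128, 0}, {1, 0, 2}))
    std::cout << ts << " ";  // prints: 128 64 0
}
```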
MaheshRavishankar
left a comment
Some minor comments. Overall looks good.
```cpp
/// operation.
void eraseCompilationInfo(Operation *op);

struct IREETileConfig {
```
I see why you need this. Wondering if there is a way to avoid a struct. The Attribute itself is a struct. Maybe just return the relevant LoweringConfigAttr and extract the information from there.
For tile_interchange, I think the answer is yes; we can extract it from the attribute. For the tiling sizes, we may have to do something more (like what's done in this method), since we only consider distributable loops. If we return the LoweringConfigAttr here, the caller still needs to extract the distributable loops, and that computation would be duplicated everywhere.
I'm actually wondering whether having more than one lowering config is valid. If not, we can enforce that in a verifier; then we don't need the complicated logic here and could split this into two methods.
If it is valid, splitting into two methods may still be fine, because different lowering config attrs should result in the same tiling sizes and interchange config. The trade-off is either longer compilation time or complicated logic before calling the methods, because we have to make sure they result in the same config.
I just realized that we don't need the tile_interchange attribute when inserting distribution information; the workloadPerWorkgroup is not affected by the attribute. We may return the tiling sizes and tile_interchange separately, but that still requires clarifying whether it is valid to have two lowering configs in one dispatch. The two-query alternative is sketched below.
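Editorial note: for illustration only, a plain-C++ sketch of the "two separate queries" alternative. All names here are hypothetical; the real code would read the information off a LoweringConfigAttr.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the information stored on the attribute.
struct LoweringConfig {
  std::vector<int64_t> tileSizes;
  std::vector<int64_t> tileInterchange;
};

// The tile sizes still need the distributable-loop filtering discussed
// above; only the leading distributable loops carry distribution sizes.
std::vector<int64_t>
getDistributedTileSizes(const LoweringConfig &config,
                        std::size_t numDistributableLoops) {
  std::size_t n = std::min(numDistributableLoops, config.tileSizes.size());
  return std::vector<int64_t>(config.tileSizes.begin(),
                              config.tileSizes.begin() + n);
}

// The interchange can be returned verbatim; it needs no filtering.
std::vector<int64_t> getTileInterchange(const LoweringConfig &config) {
  return config.tileInterchange;
}
```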
We want to be deliberate in the number of attributes we use. I have carefully settled at two: one at the op level, and the other at the translation-unit level (put these two together to get the third). We would rather have a single attribute carry all the information needed here.
I am not sure I understand why the workload per workgroup is not affected by interchange. It should be. Interchange in general can change the axes along which the tiles are distributed. Maybe the particular interchange you are looking at doesn't change the relative ordering of the distributed dimensions.
I was wrong; it is affected by interchange. There is a bug in createRemoveSingleIterationLoopPass which results in a segfault. The loop ranges and the queried workload_per_workgroup look fine to me in the pass, so there may be deeper issues.
MaheshRavishankar
left a comment
I am not sure this works. I think the way to make this work is to keep the same order for the tile sizes and, when generating the region for the hal.executable.entry_point, apply the interchange to the yielded values.
So if your tile sizes are [ts1, ts2, ts3] and your loop extents are [w1, w2, w3] (which is what the workload is set to at the Flow level), then without interchange the number of workgroups is [ceildiv(w1, ts1), ceildiv(w2, ts2), ceildiv(w3, ts3)]. If you apply an interchange of [1, 0, 2], the number of workgroups becomes [ceildiv(w2, ts2), ceildiv(w1, ts1), ceildiv(w3, ts3)].
(Note that the actual values yielded by the region on the entry point op are the reverse of what I wrote above, but I didn't account for that for the sake of clarity.)
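Editorial note: to make the arithmetic above concrete, a small worked example in plain C++. The extents and tile sizes are made up for illustration; this is not code from the PR.

```cpp
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <vector>

int64_t ceilDiv(int64_t a, int64_t b) { return (a + b - 1) / b; }

int main() {
  // Loop extents [w1, w2, w3] (the workload set at the Flow level) and
  // per-loop tile sizes [ts1, ts2, ts3]. All values are made up.
  std::vector<int64_t> workload = {128, 512, 64};
  std::vector<int64_t> tileSizes = {32, 64, 16};
  std::vector<int64_t> interchange = {1, 0, 2};

  // Without interchange: workgroups[i] = ceildiv(w_i, ts_i) -> [4, 8, 4].
  for (std::size_t i = 0; i < workload.size(); ++i)
    std::cout << ceilDiv(workload[i], tileSizes[i]) << " ";
  std::cout << "\n";

  // With interchange [1, 0, 2]: distributed dimension i walks loop
  // interchange[i], so the counts are permuted the same way -> [8, 4, 4].
  for (int64_t loop : interchange)
    std::cout << ceilDiv(workload[loop], tileSizes[loop]) << " ";
  std::cout << "\n";
}
```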
```cpp
interchangedTileSizes.resize(tileSizes.size());
for (auto en : llvm::enumerate(interchange)) {
  // The check is needed because only the distributable dims are set.
  if (en.value() < tileSizes.size()) {
```
It's not guaranteed that only the innermost dimensions are reductions. You need to look at the partitionable loops interface here to get the partitioned loops.
Something I would like to fix: instead of constantly invoking the partitionable loops interface, make partitionable_loops also part of the lowering config. Then you have all the information you need here.
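Editorial note: a hedged sketch (plain C++, not the PR's code) of using partitioned-loop information instead of the `en.value() < tileSizes.size()` check. Here `partitionedLoops` stands in for whatever the partitionable loops interface, or a partitionable_loops field in the lowering config, would return.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Permute only the distributed tile sizes. Assumes `tileSizes` has one
// entry per loop, with 0 for loops that are not distributed.
std::vector<int64_t>
getInterchangedDistributedTileSizes(const std::vector<int64_t> &tileSizes,
                                    const std::vector<int64_t> &interchange,
                                    const std::set<int64_t> &partitionedLoops) {
  std::vector<int64_t> result(tileSizes.size(), 0);
  std::size_t pos = 0;
  for (int64_t loop : interchange) {
    // Reduction (non-partitioned) loops keep tile size 0 and are skipped,
    // instead of relying on "only the innermost dims are reductions".
    if (!partitionedLoops.count(loop)) continue;
    result[pos++] = tileSizes[loop];
  }
  return result;
}
```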
Thanks for the great input, Mahesh! I now understand the distribution mechanism better. This works without deleting RemoveSingleIterationLoopPass. I found that we decide which loops to partition at the Flow level, so anything related to the first level of tiling has to take that into account. In this context, the partitioned loops have to be the leading loops after interchange. I agree that it would be better to have this be part of the lowering config, but that needs much more cleanup. The PR is already large, so I don't want to add heavy stuff to it. I added a hard check (roughly the shape sketched below), left a TODO, and would like to clean it up later. WDYT?
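Editorial note: a hypothetical version of the "hard check" mentioned above, not the PR's exact code. It assumes the partitioned loops are loops [0, numPartitionedLoops) before interchange, per the Flow-level decision.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// After applying `interchange`, every partitioned loop must land in a
// leading position.
bool partitionedLoopsAreLeading(const std::vector<int64_t> &interchange,
                                int64_t numPartitionedLoops) {
  assert(numPartitionedLoops <= static_cast<int64_t>(interchange.size()));
  // The first numPartitionedLoops entries of the interchange vector must
  // themselves be partitioned loop indices.
  return std::all_of(
      interchange.begin(), interchange.begin() + numPartitionedLoops,
      [&](int64_t loop) { return loop < numPartitionedLoops; });
}
```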
Yeah, that makes sense. I'll take a look again.
MaheshRavishankar
left a comment
This looks good. The way it is set up now, though, you can only interchange the "parallel" loops. Have you tried interchanging the reduction loop with the parallel loops? The getDistributed...TileSizes method might need some changes to work with that.
I did not manage to interchange the reduction loop at the distribution level; I ran into issues. IMO, we're not able to apply distribution to reduction loops at this level.
After exploring tile interchange for a while, I think we'll do tile interchange at the L1 tiling level (which is WIP on my plate).
These methods are used externally by the auto-tuner and were accidentally removed in #8664.
This can only be controlled by a compilation attribute for experimental purposes for now. It is not used in the heuristic configuration pipeline.