Enable more vectorizations in LLVMCPUTileFuseAndVectorizePass #7652

hanhanW · 2021-11-12T22:35:52Z

Vectorization turns Linalg ops into vector ops and arith ops. Since we
don't propagate information through arith ops during bufferization, we
can't unconditionally vectorize all the ops. To avoid creating extra
memref.alloc ops, we can't tile along reduction dims.
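
As an illustration of the "don't tile along reduction dims" constraint, here is a minimal standalone sketch. It is not the code in this PR: `IteratorKind` and `dropReductionTiling` are hypothetical names and the snippet does not use MLIR types. It relies only on the Linalg tiling convention that a tile size of 0 leaves that loop untiled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a Linalg iterator type; not the MLIR type.
enum class IteratorKind { Parallel, Reduction };

// Returns a copy of `tileSizes` where every reduction loop gets tile size 0.
// In Linalg tiling, a tile size of 0 means "do not tile this loop".
std::vector<int64_t> dropReductionTiling(
    const std::vector<IteratorKind> &iterators,
    const std::vector<int64_t> &tileSizes) {
  assert(iterators.size() == tileSizes.size() && "one tile size per loop");
  std::vector<int64_t> result(tileSizes);
  for (std::size_t i = 0; i < iterators.size(); ++i) {
    if (iterators[i] == IteratorKind::Reduction) {
      result[i] = 0;  // Leave the reduction loop untiled.
    }
  }
  return result;
}
```

For a matmul with loops (parallel, parallel, reduction) and requested sizes {8, 32, 16}, this keeps the two parallel sizes and zeroes the last one.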

The next step is to get the unroll vector pass in, so we can vectorize
more ops.

This PR improves the performance of transformer-benchmark from 58 ms to
33 ms.

hanhanW · 2021-11-12T22:36:52Z

We are able to pass all the tests when turning the flag on by default.

I'd like to do it after we're able to tile the reduction loop in the first level of tiling and enable the unrolling vector passes.

MaheshRavishankar · 2021-11-13T00:37:13Z

iree/compiler/Codegen/LLVMCPU/LLVMCPUTileFuseAndVectorizeLinalgTensorOps.cpp

@@ -171,10 +149,36 @@ void LLVMCPUTileFuseAndVectorizePass::runOnOperation() {

  // Tile and fuse for vector sizes, then tile reduction loops. We don't rely on
  // unroll vector pass because it could introduce register pressure.
+  bool hasMatmulAndIsVectorizable = true;


Just an FYI, this should move into KernelDispatch.cpp so that we can enable this path for cases where this is true within a dispatch region.

MaheshRavishankar · 2021-11-13T00:38:11Z

> We are able to pass all the tests when turning the flag on by default.
>
> I'd like to do it after we're able to tile reduction loop in first level of tiling and enable unrolling vector passes.

That's a good idea. At that point we will probably also have to reduce the default L1 tile size used here to reduce register pressure. (Of course, a search would probably find that out too, but it's good to have a decent default.)
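
To make the register-pressure concern concrete, here is a rough back-of-the-envelope sketch. It is an assumed model, not anything from this PR: `estimateLiveVectorRegisters` and `fitsInRegisterFile` are hypothetical helpers, and the model simply assumes a fully unrolled M x N matmul tile keeps one accumulator register per row and column vector, one register per loaded RHS column vector, and one for a broadcast LHS element.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical estimate of the vector registers kept live by a fully
// unrolled tileM x tileN matmul tile (assumed model, see lead-in).
int64_t estimateLiveVectorRegisters(int64_t tileM, int64_t tileN,
                                    int64_t lanes) {
  assert(tileM > 0 && tileN > 0 && lanes > 0);
  int64_t columnVectors = (tileN + lanes - 1) / lanes;  // ceil(tileN / lanes)
  int64_t accumulators = tileM * columnVectors;
  // Accumulators + one RHS register per column vector + one LHS broadcast.
  return accumulators + columnVectors + 1;
}

// True if the estimate fits in a register file of the given size.
bool fitsInRegisterFile(int64_t tileM, int64_t tileN, int64_t lanes,
                        int64_t numVectorRegisters) {
  return estimateLiveVectorRegisters(tileM, tileN, lanes) <=
         numVectorRegisters;
}
```

With illustrative numbers (8 f32 lanes and 16 vector registers, as on x86-64 with AVX2), an 8x32 tile is estimated at 37 live registers and would spill, while a 4x16 tile is estimated at 11 and fits. These tile sizes are examples only, not the defaults used in the pass.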

hanhanW requested review from antiagainst, MaheshRavishankar and ThomasRaoux November 12, 2021 22:35

google-cla bot added the cla: yes label Nov 12, 2021

MaheshRavishankar approved these changes Nov 13, 2021

ThomasRaoux approved these changes Nov 13, 2021

MaheshRavishankar merged commit 9072698 into iree-org:main Nov 13, 2021

hanhanW deleted the tfv branch November 13, 2021 21:00

GMNGeoffrey mentioned this pull request Nov 15, 2021

Merge main -> google #7664

Merged
