
Conversation

sdalvi-quic commented Sep 10, 2025

The pattern MoveInitOperandsToInput in LinalgFoldUnitExtentDimsPass moves init operands to the inputs to ensure that element-wise ops are canonicalized. I need to run the split-reduction optimization before LinalgFoldUnitExtentDimsPass. Split reduction optimizes the Linalg op by reading from the destination operand, but the MoveInitOperandsToInput pattern reverts that and canonicalizes the element-wise op instead. This leads to a 10x performance regression. I propose guarding the MoveInitOperandsToInput pattern to allow more control over enabling/disabling it.

IR after Split-Reduction and before LinalgFoldUnitExtentDimsPass:
[image]

IR after LinalgFoldUnitExtentDimsPass:
[image]

…entDimsPass, as it causes regression in performance.

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

sdalvi-quic (Author) commented Sep 10, 2025

Hi @MaheshRavishankar, @matthias-springer, @hanhanW,
Can you help me understand why the pattern MoveInitOperandsToInput was needed?
Why was it included in the LinalgFoldUnitExtentDimsPass rather than in a separate transformation pass?

CC: @jverma-quic, @javedabsar1, @aankit-quic, @muthubaskaran

llvmbot (Member) commented Sep 10, 2025

@llvm/pr-subscribers-mlir

@llvm/pr-subscribers-mlir-linalg

Author: None (sdalvi-quic)

Changes



Full diff: https://github.com/llvm/llvm-project/pull/157941.diff

2 Files Affected:

  • (modified) mlir/include/mlir/Dialect/Linalg/Passes.td (+4-1)
  • (modified) mlir/lib/Dialect/Linalg/Transforms/DropUnitDims.cpp (+3-1)
diff --git a/mlir/include/mlir/Dialect/Linalg/Passes.td b/mlir/include/mlir/Dialect/Linalg/Passes.td
index 44da2965e6892..7cd8da6643400 100644
--- a/mlir/include/mlir/Dialect/Linalg/Passes.td
+++ b/mlir/include/mlir/Dialect/Linalg/Passes.td
@@ -145,7 +145,10 @@ def LinalgFoldUnitExtentDimsPass : Pass<"linalg-fold-unit-extent-dims", ""> {
   let options = [
     Option<"useRankReducingSlices", "use-rank-reducing-slices", "bool",
            /*default=*/"false",
-           "Generate rank-reducing slices instead of reassociative reshapes">
+           "Generate rank-reducing slices instead of reassociative reshapes">,
+    Option<"enableMoveInitOperandsToInput", "enable-move-init-operands-to-input", "bool",
+           /*default=*/"true",
+           "Enable MoveInitOperandsToInputPattern transformation">
   ];
   let dependentDialects = [
     "linalg::LinalgDialect", "affine::AffineDialect", "memref::MemRefDialect"
diff --git a/mlir/lib/Dialect/Linalg/Transforms/DropUnitDims.cpp b/mlir/lib/Dialect/Linalg/Transforms/DropUnitDims.cpp
index 22690daa4f9e1..ba466ed3df5cd 100644
--- a/mlir/lib/Dialect/Linalg/Transforms/DropUnitDims.cpp
+++ b/mlir/lib/Dialect/Linalg/Transforms/DropUnitDims.cpp
@@ -868,7 +868,9 @@ struct LinalgFoldUnitExtentDimsPass
           RankReductionStrategy::ExtractInsertSlice;
     }
     linalg::populateFoldUnitExtentDimsPatterns(patterns, options);
-    populateMoveInitOperandsToInputPattern(patterns);
+    if (enableMoveInitOperandsToInput) {
+      populateMoveInitOperandsToInputPattern(patterns);
+    }
     (void)applyPatternsGreedily(op, std::move(patterns));
   }
 };
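
For reference, a lit-style usage sketch of the new option (a hypothetical test line, not part of the patch; the flag names follow the pass and Option declarations in the diff above):

// RUN: mlir-opt %s --linalg-fold-unit-extent-dims="enable-move-init-operands-to-input=false"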

MaheshRavishankar (Contributor) commented

I'll take a look in more detail soon, but the general advice is: don't use the passes as-is. From my perspective, passes in MLIR are mostly for testing, especially this pass. It is better to write your own pass using the patterns you want.

MaheshRavishankar (Contributor) left a comment

Need time to review

sdalvi-quic (Author) commented

“Ping”: just a gentle follow-up on the PR. Could you please take a look when you get a chance? I'd also appreciate your help in clarifying the query I've raised in the comments.

Hi @MaheshRavishankar, @matthias-springer, @hanhanW, can you help me understand why the pattern MoveInitOperandsToInput was needed? Why was it included in the LinalgFoldUnitExtentDimsPass rather than in a separate transformation pass?

CC: @jverma-quic, @javedabsar1, @aankit-quic, @muthubaskaran

MaheshRavishankar (Contributor) commented

What happened here? Now I just see the changes to add a pass option...

sdalvi-quic (Author) commented

Hi @MaheshRavishankar, I might have misunderstood your comment.

This PR adds an option to guard the MoveInitOperandsToInput pattern in the LinalgFoldUnitExtentDimsPass. The reason is that the pattern attempts to canonicalize element-wise ops, thereby reverting prior optimizations, which has led to performance regressions in some cases.
By introducing this option, we gain the flexibility to enable or disable the pattern depending on the use case. I've added more context in the PR description.


javedabsar1 (Contributor) left a comment

LGTM. Please give a couple of days for others to agree/disagree before landing.

Option<"enableMoveInitOperandsToInput", "enable-move-init-operands-to-input", "bool",
/*default=*/"true",
"Enable MoveInitOperandsToInputPattern transformation">
];
Comment on this diff (Contributor):

As you have made the default 'true', this is NFC for if (enableMoveInitOperandsToInput) {.
With that context, LGTM.

sdalvi-quic (Author) replied:

Yes, thank you!

@javedabsar1 javedabsar1 changed the title Flag to guard MoveInitOperandsToInput in LinalgFoldUnitExtendDimsPass [mlir][linalg] Flag to guard MoveInitOperandsToInput in LinalgFoldUnitExtendDimsPass Sep 19, 2025
MaheshRavishankar (Contributor) commented

Ok, actually this is also related to another PR I was reviewing. First of all, this input is wrong:

%0 = linalg.generic {
    indexing_maps = [affine_map<(d0) -> (d0)>, affine_map<(d0) -> (d0)>],
    iterator_types = ["parallel"]}
    ins(%input : tensor<?xf32>) outs(%output : tensor<?xf32>) {
  ^bb0(%b0 : f32, %b1 : f32):
    %1 = arith.maximumf %b0, %b1 : f32
    linalg.yield %1 : f32
} -> tensor<?xf32>

You are reading the outs value for a parallel operation. In my book, that is undefined behavior. If anything, the pattern you are masking off is actually trying to be too smart by rewriting this into the correct representation:

%empty = tensor.empty(...) : tensor<?xf32>
%0 = linalg.generic {
    indexing_maps = [affine_map<(d0) -> (d0)>, affine_map<(d0) -> (d0)>,
                     affine_map<(d0) -> (d0)>],
    iterator_types = ["parallel"]}
    ins(%input, %output : tensor<?xf32>, tensor<?xf32>) outs(%empty : tensor<?xf32>) {
  ^bb0(%b0 : f32, %b1 : f32, %b2 : f32):
    %1 = arith.maximumf %b0, %b1 : f32
    linalg.yield %1 : f32
} -> tensor<?xf32>

So before we go ahead with this change... I'd strongly suggest looking at your input. It seems off to me.

sdalvi-quic (Author) commented

Thank you for raising the concern.
For the input IR, we're initializing the outs operand (%output) outside the linalg.generic op before using it, to ensure value correctness is maintained; see the sketch below.
I hope this aligns with what you were referring to in your comment.
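
For illustration, a minimal hypothetical sketch of that setup (the zero fill value, %input, and the 1-D shapes are assumptions, mirroring the snippets above): %output is materialized and initialized before being passed as the outs operand.

%c0 = arith.constant 0 : index
%d0 = tensor.dim %input, %c0 : tensor<?xf32>  // size of the dynamic dim
%cst = arith.constant 0.0 : f32               // assumed init value
%init = tensor.empty(%d0) : tensor<?xf32>
%output = linalg.fill ins(%cst : f32) outs(%init : tensor<?xf32>) -> tensor<?xf32>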


MaheshRavishankar (Contributor) commented

Of course, I understand that's how it works for your full program, but from the operation's perspective there is no guarantee that it will maintain/use the value of outs. It basically conflates the "meaning" of outs: it makes the outs always read the output, while it might be useful for analysis to say that, for an operation with all parallel iterator types, only the ins are read. At the very least, it's not best practice.

sdalvi-quic (Author) commented

I see.

I checked a few MLIR tests and noticed that we do read from 'outs'. For example, in mlir/test/Dialect/Linalg/transform-tile-reduction.mlir, the generated IR reads from the 'outs' operand inside a linalg.generic with parallel iterators.

I see another test that does the same: mlir/test/Dialect/Linalg/transform-op-specialize.mlir.

Please let me know if I am missing something; otherwise, can we proceed with the merge, as it is NFC?

sdalvi-quic (Author) commented Oct 1, 2025

After discussion with the team, we are planning to change the IR so that, instead of reading from the 'outs', we add the variable to both ins and outs and read it from the ins. This doesn't break the semantics, doesn't add redundant new tensors (memory allocation and copy) in the loop, and doesn't hurt the performance numbers. Adding an example for reference; see the sketch below.

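A hypothetical reconstruction of that example, based on the description above and mirroring the earlier snippets (the op and shapes are assumptions): %output appears in both ins and outs, and the body reads the ins block argument %b1, never the outs argument %b2.

%0 = linalg.generic {
    indexing_maps = [affine_map<(d0) -> (d0)>, affine_map<(d0) -> (d0)>,
                     affine_map<(d0) -> (d0)>],
    iterator_types = ["parallel"]}
    ins(%input, %output : tensor<?xf32>, tensor<?xf32>) outs(%output : tensor<?xf32>) {
  ^bb0(%b0 : f32, %b1 : f32, %b2 : f32):
    // %b2 (the outs argument) is intentionally unused; the value is read via %b1.
    %1 = arith.maximumf %b0, %b1 : f32
    linalg.yield %1 : f32
} -> tensor<?xf32>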

I will close this ticket. Thank you for the valuable input.

sdalvi-quic closed this Oct 1, 2025