[LoopFusion] Assert failure in the issue 80301 #167837

sushgokh · 2025-11-13T07:31:37Z

Concerning the test 'IR.ll' mentioned in the issue(same has been added as test case), we have the following scenario before fusion:

GC1 --> { L1 --> L2} -- 
|                     |
|<-------------------- 
V 
GC2 --> {L3 --> L4} --
|                    |
|<--------------------
V
exit

The candidate set before fusion, in the same sequence is {L1, L2, L3, L4}.

L1 and L2 get fused thus giving us the single loop L12 with guard condition GC1. L3 and L4 are considered not to have a guard condition.

When inserting L12 back into the candidate set again, there was an assert that if entry block of L12(i.e. GC1) dominates entry block of {L3-->L4} (i.e. L3's preheader) then entry block of {L3-->L4} must postdominate entry block of L12. While this is true for fusing candidates, this is not true for inserting candidates into the candidate set(or preparing the new candidate set)(or setting the order of the candidates for the next fusion). This was not happening for the above case and hence the assert failure.

This patch tries to resolve this issue by removing the assert and using the assert's condition for deciding the relative order of candidates in the set.

Fixes #80301

Concerning the test 'IR.ll' mentioned in the issue(same has been added as test case), we have the following scenario before fusion: GC1 --> { L1 --> L2} -- | | |<-------------------- V GC2 --> {L3 --> L4} -- | | |<-------------------- V exit The candidate set before fusion, in the same sequence is {L1, L2, L3, L4}. L1 and L2 get fused thus giving us the single loop L12 with guard condition GC1. L3 and L4 are considered not to have a guard condition. When inserting L12 back into the candidate set again, there was an assert that if entry block of L12(i.e. GC1) dominates entry block of {L3-->L4} (i.e. L3's preheader) then entry block of {L3-->L4} must postdominate entry block of L12. While this is true for fusing candidates, this is not true for inserting candidates into the candidate set(or preparing the new candidate set)(or setting the order of the candidates for the next fusion). This was not happening for the above case and hence the assert failure. This patch tries to resolve this issue by removing the assert and using the assert's condition for deciding the relative order of candidates in the set.

llvmbot · 2025-11-13T07:32:16Z

@llvm/pr-subscribers-llvm-transforms

Author: Sushant Gokhale (sushgokh)

Changes

Concerning the test 'IR.ll' mentioned in the issue(same has been added as test case), we have the following scenario before fusion:

GC1 --&gt; { L1 --&gt; L2} -- 
|                     |
|&lt;-------------------- 
V 
GC2 --&gt; {L3 --&gt; L4} --
|                    |
|&lt;--------------------
V
exit

The candidate set before fusion, in the same sequence is {L1, L2, L3, L4}.

L1 and L2 get fused thus giving us the single loop L12 with guard condition GC1. L3 and L4 are considered not to have a guard condition.

When inserting L12 back into the candidate set again, there was an assert that if entry block of L12(i.e. GC1) dominates entry block of {L3-->L4} (i.e. L3's preheader) then entry block of {L3-->L4} must postdominate entry block of L12. While this is true for fusing candidates, this is not true for inserting candidates into the candidate set(or preparing the new candidate set)(or setting the order of the candidates for the next fusion). This was not happening for the above case and hence the assert failure.

This patch tries to resolve this issue by removing the assert and using the assert's condition for deciding the relative order of candidates in the set.

Full diff: https://github.com/llvm/llvm-project/pull/167837.diff

2 Files Affected:

(modified) llvm/lib/Transforms/Scalar/LoopFuse.cpp (+5-7)
(modified) llvm/test/Transforms/LoopFusion/guarded.ll (+139)

diff --git a/llvm/lib/Transforms/Scalar/LoopFuse.cpp b/llvm/lib/Transforms/Scalar/LoopFuse.cpp
index 9ffa602416b05..e3a44a4976b7d 100644
--- a/llvm/lib/Transforms/Scalar/LoopFuse.cpp
+++ b/llvm/lib/Transforms/Scalar/LoopFuse.cpp
@@ -406,16 +406,14 @@ struct FusionCandidateCompare {
 
     // Do this compare first so if LHS == RHS, function returns false.
     if (DT->dominates(RHSEntryBlock, LHSEntryBlock)) {
-      // RHS dominates LHS
-      // Verify LHS post-dominates RHS
-      assert(LHS.PDT->dominates(LHSEntryBlock, RHSEntryBlock));
-      return false;
+      // RHS dominates LHS.
+      // Check if LHS post-dominates RHS.
+      return !LHS.PDT->dominates(LHSEntryBlock, RHSEntryBlock);
     }
 
     if (DT->dominates(LHSEntryBlock, RHSEntryBlock)) {
-      // Verify RHS Postdominates LHS
-      assert(LHS.PDT->dominates(RHSEntryBlock, LHSEntryBlock));
-      return true;
+      // Check if RHS postdominates LHS.
+      return LHS.PDT->dominates(RHSEntryBlock, LHSEntryBlock);
     }
 
     // If two FusionCandidates are in the same level of dominator tree,
diff --git a/llvm/test/Transforms/LoopFusion/guarded.ll b/llvm/test/Transforms/LoopFusion/guarded.ll
index 863d9b1bb4e86..9d27adfe1fe8f 100644
--- a/llvm/test/Transforms/LoopFusion/guarded.ll
+++ b/llvm/test/Transforms/LoopFusion/guarded.ll
@@ -390,3 +390,142 @@ for.2:                                       ; preds = %for.cond13, %for.body6,
 exit:                                          ; preds = %for.cond13
   ret void
 }
+
+define void @foo(ptr noalias %A, ptr noalias %B, i64 %N) {
+; CHECK-LABEL: @foo(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[CMP4:%.*]] = icmp slt i64 0, [[N:%.*]]
+; CHECK-NEXT:    br i1 [[CMP4]], label [[BB3:%.*]], label [[BB14:%.*]]
+; CHECK:       bb3:
+; CHECK-NEXT:    br label [[BB5:%.*]]
+; CHECK:       bb5:
+; CHECK-NEXT:    [[I_05:%.*]] = phi i64 [ [[INC:%.*]], [[BB5]] ], [ 0, [[BB3]] ]
+; CHECK-NEXT:    [[I_0510:%.*]] = phi i64 [ [[INC10:%.*]], [[BB5]] ], [ 0, [[BB3]] ]
+; CHECK-NEXT:    [[SUB:%.*]] = sub nsw i64 [[I_05]], 3
+; CHECK-NEXT:    [[ADD:%.*]] = add nsw i64 [[I_05]], 3
+; CHECK-NEXT:    [[MUL:%.*]] = mul nsw i64 [[SUB]], [[ADD]]
+; CHECK-NEXT:    [[REM:%.*]] = srem i64 [[MUL]], [[I_05]]
+; CHECK-NEXT:    [[CONV:%.*]] = trunc i64 [[REM]] to i32
+; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr [[A:%.*]], i64 [[I_05]]
+; CHECK-NEXT:    store i32 [[CONV]], ptr [[ARRAYIDX]], align 4
+; CHECK-NEXT:    [[INC]] = add nsw i64 [[I_05]], 1
+; CHECK-NEXT:    [[CMP:%.*]] = icmp slt i64 [[INC]], [[N]]
+; CHECK-NEXT:    [[ARRAYIDX10:%.*]] = getelementptr inbounds i32, ptr [[B:%.*]], i64 [[I_0510]]
+; CHECK-NEXT:    [[LOAD_B:%.*]] = load i32, ptr [[ARRAYIDX10]], align 4
+; CHECK-NEXT:    [[ARRAYIDX11:%.*]] = getelementptr inbounds i32, ptr [[A]], i64 [[I_0510]]
+; CHECK-NEXT:    store i32 0, ptr [[ARRAYIDX11]], align 4
+; CHECK-NEXT:    [[INC10]] = add nsw i64 [[I_0510]], 1
+; CHECK-NEXT:    [[CMP10:%.*]] = icmp slt i64 [[INC10]], [[N]]
+; CHECK-NEXT:    br i1 [[CMP10]], label [[BB5]], label [[BB1010:%.*]]
+; CHECK:       bb1010:
+; CHECK-NEXT:    br label [[BB14]]
+; CHECK:       bb14:
+; CHECK-NEXT:    br i1 [[CMP4]], label [[BB8:%.*]], label [[BB12:%.*]]
+; CHECK:       bb8:
+; CHECK-NEXT:    br label [[BB9:%.*]]
+; CHECK:       bb9:
+; CHECK-NEXT:    [[I1_02:%.*]] = phi i64 [ [[INC14:%.*]], [[BB9]] ], [ 0, [[BB8]] ]
+; CHECK-NEXT:    [[SUB7:%.*]] = sub nsw i64 [[I1_02]], 3
+; CHECK-NEXT:    [[ADD8:%.*]] = add nsw i64 [[I1_02]], 3
+; CHECK-NEXT:    [[MUL9:%.*]] = mul nsw i64 [[SUB7]], [[ADD8]]
+; CHECK-NEXT:    [[REM10:%.*]] = srem i64 [[MUL9]], [[I1_02]]
+; CHECK-NEXT:    [[CONV11:%.*]] = trunc i64 [[REM10]] to i32
+; CHECK-NEXT:    [[ARRAYIDX12:%.*]] = getelementptr inbounds [1024 x i32], ptr @B, i64 0, i64 [[I1_02]]
+; CHECK-NEXT:    store i32 [[CONV11]], ptr [[ARRAYIDX12]], align 4
+; CHECK-NEXT:    [[INC14]] = add nsw i64 [[I1_02]], 1
+; CHECK-NEXT:    [[CMP3:%.*]] = icmp slt i64 [[INC14]], [[N]]
+; CHECK-NEXT:    br i1 [[CMP3]], label [[BB9]], label [[BB15:%.*]]
+; CHECK:       bb15:
+; CHECK-NEXT:    br label [[BB80:%.*]]
+; CHECK:       bb80:
+; CHECK-NEXT:    br label [[BB90:%.*]]
+; CHECK:       bb90:
+; CHECK-NEXT:    [[I1_0210:%.*]] = phi i64 [ [[INC1410:%.*]], [[BB90]] ], [ 0, [[BB80]] ]
+; CHECK-NEXT:    [[CONV12:%.*]] = trunc i64 [[I1_0210]] to i32
+; CHECK-NEXT:    [[ARRAYIDX1210:%.*]] = getelementptr inbounds [1024 x i32], ptr @B, i64 0, i64 [[I1_0210]]
+; CHECK-NEXT:    store i32 [[CONV12]], ptr [[ARRAYIDX1210]], align 4
+; CHECK-NEXT:    [[INC1410]] = add nsw i64 [[I1_0210]], 1
+; CHECK-NEXT:    [[CMP310:%.*]] = icmp slt i64 [[INC1410]], [[N]]
+; CHECK-NEXT:    br i1 [[CMP310]], label [[BB90]], label [[BB150:%.*]]
+; CHECK:       bb150:
+; CHECK-NEXT:    br label [[BB12]]
+; CHECK:       bb12:
+; CHECK-NEXT:    ret void
+;
+entry:
+  %cmp4 = icmp slt i64 0, %N
+  br i1 %cmp4, label %bb3, label %bb14
+
+bb3:                               ; preds = %entry
+  br label %bb5
+
+bb5:                                         ; preds = %bb3, %bb5
+  %i.05 = phi i64 [ %inc, %bb5 ], [ 0, %bb3 ]
+  %sub = sub nsw i64 %i.05, 3
+  %add = add nsw i64 %i.05, 3
+  %mul = mul nsw i64 %sub, %add
+  %rem = srem i64 %mul, %i.05
+  %conv = trunc i64 %rem to i32
+  %arrayidx = getelementptr inbounds i32, ptr %A, i64 %i.05
+  store i32 %conv, ptr %arrayidx, align 4
+  %inc = add nsw i64 %i.05, 1
+  %cmp = icmp slt i64 %inc, %N
+  br i1 %cmp, label %bb5, label %bb103
+
+bb103:                               ; preds = %bb103
+  br label %bb105
+
+bb105:                                         ; preds = %bb103, %bb105
+  %i.0510 = phi i64 [ %inc10, %bb105 ], [ 0, %bb103 ]
+  %arrayidx10 = getelementptr inbounds i32, ptr %B, i64 %i.0510
+  %load.b = load i32, ptr %arrayidx10, align 4
+  %arrayidx11 = getelementptr inbounds i32, ptr %A, i64 %i.0510
+  store i32 0, ptr %arrayidx11, align 4
+  %inc10 = add nsw i64 %i.0510, 1
+  %cmp10 = icmp slt i64 %inc10, %N
+  br i1 %cmp10, label %bb105, label %bb1010
+
+bb1010:                                 ; preds = %bb105
+  br label %bb14
+
+bb14:                                          ; preds = %bb1010, %entry
+  br i1 %cmp4, label %bb8, label %bb12
+
+bb8:                              ; preds = %bb14
+  br label %bb9
+
+bb9:                                        ; preds = %bb8, %bb9
+  %i1.02 = phi i64 [ %inc14, %bb9 ], [ 0, %bb8 ]
+  %sub7 = sub nsw i64 %i1.02, 3
+  %add8 = add nsw i64 %i1.02, 3
+  %mul9 = mul nsw i64 %sub7, %add8
+  %rem10 = srem i64 %mul9, %i1.02
+  %conv11 = trunc i64 %rem10 to i32
+  %arrayidx12 = getelementptr inbounds [1024 x i32], ptr @B, i64 0, i64 %i1.02
+  store i32 %conv11, ptr %arrayidx12, align 4
+  %inc14 = add nsw i64 %i1.02, 1
+  %cmp3 = icmp slt i64 %inc14, %N
+  br i1 %cmp3, label %bb9, label %bb15
+
+bb15:                               ; preds = %bb9
+  br label %bb80
+
+bb80:                              ; preds = %bb15
+  br label %bb90
+
+bb90:                                        ; preds = %bb80, %bb90
+  %i1.0210 = phi i64 [ %inc1410, %bb90 ], [ 0, %bb80 ]
+  %conv12 = trunc i64 %i1.0210 to i32
+  %arrayidx1210 = getelementptr inbounds [1024 x i32], ptr @B, i64 0, i64 %i1.0210
+  store i32 %conv12, ptr %arrayidx1210, align 4
+  %inc1410 = add nsw i64 %i1.0210, 1
+  %cmp310 = icmp slt i64 %inc1410, %N
+  br i1 %cmp310, label %bb90, label %bb150
+
+bb150:                               ; preds = %bb90
+  br label %bb12
+
+
+bb12:                                        ; preds = %bb15, %bb14
+  ret void
+}

amehsan · 2025-11-13T14:33:46Z

llvm/lib/Transforms/Scalar/LoopFuse.cpp

-      return false;
+      // RHS dominates LHS.
+      // Check if LHS post-dominates RHS.
+      return !LHS.PDT->dominates(LHSEntryBlock, RHSEntryBlock);


I am not sure what are the implication of violating control flow equivalence of the loops in the candidate set. There is a comment in the code that says members of FusionCandidateSet are control flow equivalent. but I haven't looked into it enough to know what happens if we violate the condition. @1997alireza @CongzheUalberta Do you know about this?

your comment says , while post-domination is required for fusing candidates, it is not required for adding something to the candidate set. Does this mean there is another check of post-dominance relationship later on?

I was going to post a comment on this issue and the one extracted from it, but have been very busy with DA stuff. Will try to post a comment soon (But I may need to read some parts of fusion code, if DA stuff leaves enough time for me).

your comment says , while post-domination is required for fusing candidates, it is not required for adding something to the candidate set. Does this mean there is another check of post-dominance relationship later on?

'isAdjacent' is doing the same thing I believe ? In fact, 'isAdjacent' is more stricter than saying 2 loops are control flow equivalent.

CongzheUalberta · 2025-11-13T15:16:02Z

Please correct me if I'm wrong, but my first impression on this patch is that, it allows loops that are not control flow equivalent (e.g., loop1 dominates loop2 but loop2 does not post dominate loop1) to be added into the same fusion candidate set hence those loops possibly could be fused. Would it cause functional problem when they are fused? Have you tested the patch on some benchmarks like SPEC?

sushgokh · 2025-11-13T17:25:21Z

Please correct me if I'm wrong, but my first impression on this patch is that, it allows loops that are not control flow equivalent (e.g., loop1 dominates loop2 but loop2 does not post dominate loop1) to be added into the same fusion candidate set hence those loops possibly could be fused. Would it cause functional problem when they are fused? Have you tested the patch on some benchmarks like SPEC?

There are 2 scenarios:

We are preparing the candidate set for the first time
If the situation is ( L1 dom L2 ) but (L2 !postdom L1) then they wont get added to the candidate set and I verified this with the same test example(by removing loop bb105 and its preheader/exit ).
We are adding the fused candidate to the candidate set(as being done by this patch)
The fusion doesnt happen in this case because legality checks(such as adjacency) fail.

Now, coming back to the example in the issue. L1/L2/L3/L4 are in the same candidate set. There is one obvious flaw in the function 'isControlFlowEquivalent' in the sense that L1/L2 and L3/L4 are equivalent pairwise but there is no dominance relation between L2 and L3. So, I think this function is not very useful other than as a legality check for fusion.

I havent tested this with SPEC for the reason that this patch is just placing the candidate into the set and doesnt change anything wrt legality checks for fusion.

sushgokh · 2025-11-13T17:35:37Z

With this patch, we could have fusion candidates even if they do not have the proper control flow. In this case, we need to check the control flow of the candidates on the fusion step. Does loop fusion do this check on the fusion step?

Yes, 'isAdjacent' should be the function doing the check. I cant think of scenario where they are not control flow equivalent yet adjacent.

amehsan · 2025-11-13T19:00:23Z

With this patch, we could have fusion candidates even if they do not have the proper control flow. In this case, we need to check the control flow of the candidates on the fusion step. Does loop fusion do this check on the fusion step?

Yes, 'isAdjacent' should be the function doing the check. I cant think of scenario where they are not control flow equivalent yet adjacent.

If our codegen is limited to adjacent loops, why do we put non-adjacent loops in a candidate set? If we just put adjacent loops in the candidate set, do we lose any opportunity?

We can always extend fusion in the future, and make it to catch more candidates.

Do I miss something here? (@CongzheUalberta what do you think?)

CongzheUalberta · 2025-11-14T13:53:22Z

There are 2 scenarios:

We are preparing the candidate set for the first time
If the situation is ( L1 dom L2 ) but (L2 !postdom L1) then they wont get added to the candidate set and I verified this with the same test example(by removing loop bb105 and its preheader/exit ).

ok, thanks for the clarification.

We are adding the fused candidate to the candidate set(as being done by this patch)
The fusion doesnt happen in this case because legality checks(such as adjacency) fail.

I have not done a deep debugging so correct me if I'm wrong, but I think the reason fusion does not happen is not because adjacency check fails, it is because with your ordering, after the 1st fusion the candidate set becomes bb8, bb80, bb3 which likely messed up with the for-loop for (auto FC0 = CandidateSet.begin(); FC0 != CandidateSet.end(); ++FC0) and caused the code to immediately exit the for-loop, without even considering that bb8 and bb80 can be fused. That ordering does not seem entirely correct anyways (should bb3 appeared in the end? Before fuse it appeared at the beginning).

Now, coming back to the example in the issue. L1/L2/L3/L4 are in the same candidate set. There is one obvious flaw in the function 'isControlFlowEquivalent' in the sense that L1/L2 and L3/L4 are equivalent pairwise but there is no dominance relation between L2 and L3. So, I think this function is not very useful other than as a legality check for fusion.

I'm not sure if that is correct. LLVM is smart enough to determine when L2 is executed then L3 must gets executed, and vice versa.

One more thing: this PR gives non-deterministic to the ordering in the candidate set, e.g., when DT->dominates(LHSEntryBlock, RHSEntryBlock), it could happen that LHS will be ordered before or after RHS. This seems a bit counter-intuitive to me. Maybe one way to fix the motivative issue is just to drop the fused loop from the candidate set when the ordering cannot be determined, but I need to think more.

CongzheUalberta · 2025-11-14T13:57:16Z

With this patch, we could have fusion candidates even if they do not have the proper control flow. In this case, we need to check the control flow of the candidates on the fusion step. Does loop fusion do this check on the fusion step?

Yes, 'isAdjacent' should be the function doing the check. I cant think of scenario where they are not control flow equivalent yet adjacent.

If our codegen is limited to adjacent loops, why do we put non-adjacent loops in a candidate set? If we just put adjacent loops in the candidate set, do we lose any opportunity?

We can always extend fusion in the future, and make it to catch more candidates.

Do I miss something here? (@CongzheUalberta what do you think?)

Currently llvm does not have the capability to fuse non-adjacent loops that are control flow equivalent (fusing non-adjacent loops would be something nice to have). However I think it is still good to put non-adjacent loops that are control flow equivalent in the candidate set, which opens the door and could be used as the stepping stone for future extension of fusing non-adjacent loops.

If we only put adjacent loops in the set, we would lose the opportunity extending to fusing non-adjacent loops. In that case when we want to extend to non-adjacent loops down the road, we would have to revert the pass to the state as of now and then we essentially would end up with the same issue as what we are facing now.

amehsan · 2025-11-14T21:30:14Z

Currently llvm does not have the capability to fuse non-adjacent loops that are control flow equivalent (fusing non-adjacent loops would be something nice to have). However I think it is still good to put non-adjacent loops that are control flow equivalent in the candidate set, which opens the door and could be used as the stepping stone for future extension of fusing non-adjacent loops.

If we only put adjacent loops in the set, we would lose the opportunity extending to fusing non-adjacent loops. In that case when we want to extend to non-adjacent loops down the road, we would have to revert the pass to the state as of now and then we essentially would end up with the same issue as what we are facing now.

If our goal is to turn on fusion by default then we need to simplify the code. Both in terms of maintainability and also compile time cost. Right now I believe we do three different check: isControlFlowEquivalent which is even more relaxed than dominance check, then there is dominance check and then there is adjacency check. This really needs to be simplified.

sjoerdmeijer · 2025-11-15T09:41:17Z

If our goal is to turn on fusion by default

Yes, I think we should make this a goal.

We chose interchange as the first loop optimisation to enable and run by default, but we are very interested in fusion as well. For default enablement, we would need to give it the "investigate and fix compile time outliers, and fix all correctness issues" treatment as per the updated/clarified developers policy section on this topic.

Running testing and fuzzers on loop fusion was a first exercise to judge the quality of the pass, and see how much work it would be. As I mentioned, we are mostly focused on interchange, but we are happy to support you with fusion. For example, we have data where fusion improves workloads if we need to justify enabling this and if it is worth the compile-time. And we can help with point fixes like this, although you seem to suggest the pass may need a bit of an overhaul?

I am interested to hear your opinions on this. Maybe we can make default enablement of interchange and fusion recurring topics for the loop optimisation workgroup meetings (CC: @Meinersbur ), so am happy to talk more about this in the meeting, and in the meantime we can also continue here of course.

amehsan · 2025-11-15T17:47:36Z

Running testing and fuzzers on loop fusion was a first exercise to judge the quality of the pass, and see how much work it would be. As I mentioned, we are mostly focused on interchange, but we are happy to support you with fusion. For example, we have data where fusion improves workloads if we need to justify enabling this and if it is worth the compile-time. And we can help with point fixes like this, although you seem to suggest the pass may need a bit of an overhaul?

Yes, we are interested in loop fusion. This past week we have been busy with DA discussions and internal stuff but I expect we speed up on loop fusion soon.

With regard to major changes in fusion. yes, I think at least two major improvements is needed (and both should help with compile time). One is the comment above, the other the changes in how we check dependency that I previously commented on one of the issues. But I think before we finalize the second one, we need to wait for the current investigations in DA to conclude (basically it is a discussion of what are the conditions for correctness of DA)

If you are interested in merging this fix as is, it is fine with me. we can merge this and we will address the issue of multiple different control flow checks separately.

And sure, we can discuss the status both in the loop opt meeting and also here or elsewhere on github.

Meinersbur · 2025-11-17T13:11:53Z

I don't think this is a proper solution. Removing assertions and using them for a tie-breaker will eventually get us to completely symmetric instances such as #166560. There is no reason that two abritary loops in the CFG that are not even sequential have to be found a sequential order for.

amehsan · 2025-11-17T18:44:59Z

@sjoerdmeijer
@1997alireza will be back to fusion starting next week (possibly earlier)
I will be back as soon as DA investigation reaches a conclusion.

sushgokh · 2025-11-18T09:10:08Z

I don't think this is a proper solution. Removing assertions and using them for a tie-breaker will eventually get us to completely symmetric instances such as #166560. There is no reason that two abritary loops in the CFG that are not even sequential have to be found a sequential order for.

If we want to limit the scope of fusion to just adjacent loops, then the checks can be completely removed I think. But in the given example(or the issue IR), there is much scope for improvement at the same loop depth level. For instance, post first fusion, we have 3 loops in the same sequence in the candidate set:

L3
L4
L12

There is still scope for merging L3/L4 and then L34 can be merged with L12. So, basically, this is a recursive procedure. By relaxing the assertions, we would be allowing such fusions by inserting plausible candidates for fusion.

amehsan · 2025-11-18T14:39:32Z

If we want to limit the scope of fusion to just adjacent loops, then the checks can be completely removed I think. But in the given example(or the issue IR), there is much scope for improvement at the same loop depth level. For instance, post first fusion, we have 3 loops in the same sequence in the candidate set:
L3
L4
L12
There is still scope for merging L3/L4 and then L34 can be merged with L12. So, basically, this is a recursive procedure. By relaxing the assertions, we would be allowing such fusions by inserting plausible candidates for fusion.

I am trying to understand your concern here. IIUC you are saying isAdjacent condition is not enough because if we limit ourselves to that we may not insert all the loops above into the same CandidateSet and then we will lose the opportunity. Is that correct?

sushgokh · 2025-11-19T12:18:20Z

If we want to limit the scope of fusion to just adjacent loops, then the checks can be completely removed I think. But in the given example(or the issue IR), there is much scope for improvement at the same loop depth level. For instance, post first fusion, we have 3 loops in the same sequence in the candidate set:
L3
L4
L12
There is still scope for merging L3/L4 and then L34 can be merged with L12. So, basically, this is a recursive procedure. By relaxing the assertions, we would be allowing such fusions by inserting plausible candidates for fusion.
I am trying to understand your concern here. IIUC you are saying isAdjacent condition is not enough because if we limit ourselves to that we may not insert all the loops above into the same CandidateSet and then we will lose the opportunity. Is that correct?

Yes, right.

To reiterate what I would like to say, checking for adjacency is just for actual fusion. Dominance checking is for preparing set of plausible candidates for fusion iteratively (i.e. even the fused candidate can be candidate for further fusion as is the case here). There are other ways to prepare plausible candidate set at every step of the pass but for now, lets get rid of this assert. Lets proceed one step at a time.

amehsan · 2025-11-20T04:09:54Z

Yes, right.

To reiterate what I would like to say, checking for adjacency is just for actual fusion. Dominance checking is for preparing set of plausible candidates for fusion iteratively (i.e. even the fused candidate can be candidate for further fusion as is the case here). There are other ways to prepare plausible candidate set at every step of the pass but for now, lets get rid of this assert. Lets proceed one step at a time.

give me a couple of days. I will get back to you about this.

sushgokh requested review from CongzheUalberta, Meinersbur, amehsan, madhur13490 and sjoerdmeijer November 13, 2025 07:31

llvmbot added the llvm:transforms label Nov 13, 2025

amehsan reviewed Nov 13, 2025

View reviewed changes

[LoopFusion] Assert failure in the issue 80301 #167837

Are you sure you want to change the base?

[LoopFusion] Assert failure in the issue 80301 #167837

Uh oh!

Conversation

sushgokh commented Nov 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Nov 13, 2025

Uh oh!

amehsan Nov 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

sushgokh Nov 13, 2025

Choose a reason for hiding this comment

Uh oh!

CongzheUalberta commented Nov 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sushgokh commented Nov 13, 2025

Uh oh!

sushgokh commented Nov 13, 2025

Uh oh!

amehsan commented Nov 13, 2025

Uh oh!

CongzheUalberta commented Nov 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

CongzheUalberta commented Nov 14, 2025

Uh oh!

amehsan commented Nov 14, 2025

Uh oh!

sjoerdmeijer commented Nov 15, 2025

Uh oh!

amehsan commented Nov 15, 2025

Uh oh!

Meinersbur commented Nov 17, 2025

Uh oh!

amehsan commented Nov 17, 2025

Uh oh!

sushgokh commented Nov 18, 2025

Uh oh!

amehsan commented Nov 18, 2025

Uh oh!

sushgokh commented Nov 19, 2025

Uh oh!

amehsan commented Nov 20, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

sushgokh commented Nov 13, 2025 •

edited

Loading

amehsan Nov 13, 2025 •

edited

Loading

CongzheUalberta commented Nov 13, 2025 •

edited

Loading

CongzheUalberta commented Nov 14, 2025 •

edited

Loading