Skip to content

Conversation

@sushgokh
Copy link
Contributor

@sushgokh sushgokh commented Nov 13, 2025

Concerning the test 'IR.ll' mentioned in the issue(same has been added as test case), we have the following scenario before fusion:

GC1 --> { L1 --> L2} -- 
|                     |
|<-------------------- 
V 
GC2 --> {L3 --> L4} --
|                    |
|<--------------------
V
exit

The candidate set before fusion, in the same sequence is {L1, L2, L3, L4}.

L1 and L2 get fused thus giving us the single loop L12 with guard condition GC1. L3 and L4 are considered not to have a guard condition.

When inserting L12 back into the candidate set again, there was an assert that if entry block of L12(i.e. GC1) dominates entry block of {L3-->L4} (i.e. L3's preheader) then entry block of {L3-->L4} must postdominate entry block of L12. While this is true for fusing candidates, this is not true for inserting candidates into the candidate set(or preparing the new candidate set)(or setting the order of the candidates for the next fusion). This was not happening for the above case and hence the assert failure.

This patch tries to resolve this issue by removing the assert and using the assert's condition for deciding the relative order of candidates in the set.

Fixes #80301

Concerning the test 'IR.ll' mentioned in the issue(same has been added
as test case), we have the following scenario before fusion:

GC1 --> { L1 --> L2} --
 |                    |
 |<--------------------
 V
GC2 --> {L3 --> L4} --
 |                    |
 |<--------------------
 V
exit

The candidate set before fusion, in the same sequence is {L1, L2, L3, L4}.

L1 and L2 get fused thus giving us the single loop L12 with guard
condition GC1. L3 and L4 are considered not to have a guard condition.

When inserting L12 back into the candidate set again, there was an
assert that if entry block of L12(i.e. GC1) dominates entry block of
{L3-->L4} (i.e. L3's preheader) then entry block of {L3-->L4} must
postdominate entry block of L12. While this is true for fusing
candidates, this is not true for inserting candidates into the candidate
set(or preparing the new candidate set)(or setting the  order of the
candidates for the next fusion). This was not happening for the above
case and hence the assert failure.

This patch tries to resolve this issue by removing the assert and using
the assert's condition for deciding the relative order of candidates in
the set.
@llvmbot
Copy link
Member

llvmbot commented Nov 13, 2025

@llvm/pr-subscribers-llvm-transforms

Author: Sushant Gokhale (sushgokh)

Changes

Concerning the test 'IR.ll' mentioned in the issue(same has been added as test case), we have the following scenario before fusion:

GC1 --&gt; { L1 --&gt; L2} -- 
|                     |
|&lt;-------------------- 
V 
GC2 --&gt; {L3 --&gt; L4} --
|                    |
|&lt;--------------------
V
exit

The candidate set before fusion, in the same sequence is {L1, L2, L3, L4}.

L1 and L2 get fused thus giving us the single loop L12 with guard condition GC1. L3 and L4 are considered not to have a guard condition.

When inserting L12 back into the candidate set again, there was an assert that if entry block of L12(i.e. GC1) dominates entry block of {L3-->L4} (i.e. L3's preheader) then entry block of {L3-->L4} must postdominate entry block of L12. While this is true for fusing candidates, this is not true for inserting candidates into the candidate set(or preparing the new candidate set)(or setting the order of the candidates for the next fusion). This was not happening for the above case and hence the assert failure.

This patch tries to resolve this issue by removing the assert and using the assert's condition for deciding the relative order of candidates in the set.


Full diff: https://github.com/llvm/llvm-project/pull/167837.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/Scalar/LoopFuse.cpp (+5-7)
  • (modified) llvm/test/Transforms/LoopFusion/guarded.ll (+139)
diff --git a/llvm/lib/Transforms/Scalar/LoopFuse.cpp b/llvm/lib/Transforms/Scalar/LoopFuse.cpp
index 9ffa602416b05..e3a44a4976b7d 100644
--- a/llvm/lib/Transforms/Scalar/LoopFuse.cpp
+++ b/llvm/lib/Transforms/Scalar/LoopFuse.cpp
@@ -406,16 +406,14 @@ struct FusionCandidateCompare {
 
     // Do this compare first so if LHS == RHS, function returns false.
     if (DT->dominates(RHSEntryBlock, LHSEntryBlock)) {
-      // RHS dominates LHS
-      // Verify LHS post-dominates RHS
-      assert(LHS.PDT->dominates(LHSEntryBlock, RHSEntryBlock));
-      return false;
+      // RHS dominates LHS.
+      // Check if LHS post-dominates RHS.
+      return !LHS.PDT->dominates(LHSEntryBlock, RHSEntryBlock);
     }
 
     if (DT->dominates(LHSEntryBlock, RHSEntryBlock)) {
-      // Verify RHS Postdominates LHS
-      assert(LHS.PDT->dominates(RHSEntryBlock, LHSEntryBlock));
-      return true;
+      // Check if RHS postdominates LHS.
+      return LHS.PDT->dominates(RHSEntryBlock, LHSEntryBlock);
     }
 
     // If two FusionCandidates are in the same level of dominator tree,
diff --git a/llvm/test/Transforms/LoopFusion/guarded.ll b/llvm/test/Transforms/LoopFusion/guarded.ll
index 863d9b1bb4e86..9d27adfe1fe8f 100644
--- a/llvm/test/Transforms/LoopFusion/guarded.ll
+++ b/llvm/test/Transforms/LoopFusion/guarded.ll
@@ -390,3 +390,142 @@ for.2:                                       ; preds = %for.cond13, %for.body6,
 exit:                                          ; preds = %for.cond13
   ret void
 }
+
+define void @foo(ptr noalias %A, ptr noalias %B, i64 %N) {
+; CHECK-LABEL: @foo(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[CMP4:%.*]] = icmp slt i64 0, [[N:%.*]]
+; CHECK-NEXT:    br i1 [[CMP4]], label [[BB3:%.*]], label [[BB14:%.*]]
+; CHECK:       bb3:
+; CHECK-NEXT:    br label [[BB5:%.*]]
+; CHECK:       bb5:
+; CHECK-NEXT:    [[I_05:%.*]] = phi i64 [ [[INC:%.*]], [[BB5]] ], [ 0, [[BB3]] ]
+; CHECK-NEXT:    [[I_0510:%.*]] = phi i64 [ [[INC10:%.*]], [[BB5]] ], [ 0, [[BB3]] ]
+; CHECK-NEXT:    [[SUB:%.*]] = sub nsw i64 [[I_05]], 3
+; CHECK-NEXT:    [[ADD:%.*]] = add nsw i64 [[I_05]], 3
+; CHECK-NEXT:    [[MUL:%.*]] = mul nsw i64 [[SUB]], [[ADD]]
+; CHECK-NEXT:    [[REM:%.*]] = srem i64 [[MUL]], [[I_05]]
+; CHECK-NEXT:    [[CONV:%.*]] = trunc i64 [[REM]] to i32
+; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr [[A:%.*]], i64 [[I_05]]
+; CHECK-NEXT:    store i32 [[CONV]], ptr [[ARRAYIDX]], align 4
+; CHECK-NEXT:    [[INC]] = add nsw i64 [[I_05]], 1
+; CHECK-NEXT:    [[CMP:%.*]] = icmp slt i64 [[INC]], [[N]]
+; CHECK-NEXT:    [[ARRAYIDX10:%.*]] = getelementptr inbounds i32, ptr [[B:%.*]], i64 [[I_0510]]
+; CHECK-NEXT:    [[LOAD_B:%.*]] = load i32, ptr [[ARRAYIDX10]], align 4
+; CHECK-NEXT:    [[ARRAYIDX11:%.*]] = getelementptr inbounds i32, ptr [[A]], i64 [[I_0510]]
+; CHECK-NEXT:    store i32 0, ptr [[ARRAYIDX11]], align 4
+; CHECK-NEXT:    [[INC10]] = add nsw i64 [[I_0510]], 1
+; CHECK-NEXT:    [[CMP10:%.*]] = icmp slt i64 [[INC10]], [[N]]
+; CHECK-NEXT:    br i1 [[CMP10]], label [[BB5]], label [[BB1010:%.*]]
+; CHECK:       bb1010:
+; CHECK-NEXT:    br label [[BB14]]
+; CHECK:       bb14:
+; CHECK-NEXT:    br i1 [[CMP4]], label [[BB8:%.*]], label [[BB12:%.*]]
+; CHECK:       bb8:
+; CHECK-NEXT:    br label [[BB9:%.*]]
+; CHECK:       bb9:
+; CHECK-NEXT:    [[I1_02:%.*]] = phi i64 [ [[INC14:%.*]], [[BB9]] ], [ 0, [[BB8]] ]
+; CHECK-NEXT:    [[SUB7:%.*]] = sub nsw i64 [[I1_02]], 3
+; CHECK-NEXT:    [[ADD8:%.*]] = add nsw i64 [[I1_02]], 3
+; CHECK-NEXT:    [[MUL9:%.*]] = mul nsw i64 [[SUB7]], [[ADD8]]
+; CHECK-NEXT:    [[REM10:%.*]] = srem i64 [[MUL9]], [[I1_02]]
+; CHECK-NEXT:    [[CONV11:%.*]] = trunc i64 [[REM10]] to i32
+; CHECK-NEXT:    [[ARRAYIDX12:%.*]] = getelementptr inbounds [1024 x i32], ptr @B, i64 0, i64 [[I1_02]]
+; CHECK-NEXT:    store i32 [[CONV11]], ptr [[ARRAYIDX12]], align 4
+; CHECK-NEXT:    [[INC14]] = add nsw i64 [[I1_02]], 1
+; CHECK-NEXT:    [[CMP3:%.*]] = icmp slt i64 [[INC14]], [[N]]
+; CHECK-NEXT:    br i1 [[CMP3]], label [[BB9]], label [[BB15:%.*]]
+; CHECK:       bb15:
+; CHECK-NEXT:    br label [[BB80:%.*]]
+; CHECK:       bb80:
+; CHECK-NEXT:    br label [[BB90:%.*]]
+; CHECK:       bb90:
+; CHECK-NEXT:    [[I1_0210:%.*]] = phi i64 [ [[INC1410:%.*]], [[BB90]] ], [ 0, [[BB80]] ]
+; CHECK-NEXT:    [[CONV12:%.*]] = trunc i64 [[I1_0210]] to i32
+; CHECK-NEXT:    [[ARRAYIDX1210:%.*]] = getelementptr inbounds [1024 x i32], ptr @B, i64 0, i64 [[I1_0210]]
+; CHECK-NEXT:    store i32 [[CONV12]], ptr [[ARRAYIDX1210]], align 4
+; CHECK-NEXT:    [[INC1410]] = add nsw i64 [[I1_0210]], 1
+; CHECK-NEXT:    [[CMP310:%.*]] = icmp slt i64 [[INC1410]], [[N]]
+; CHECK-NEXT:    br i1 [[CMP310]], label [[BB90]], label [[BB150:%.*]]
+; CHECK:       bb150:
+; CHECK-NEXT:    br label [[BB12]]
+; CHECK:       bb12:
+; CHECK-NEXT:    ret void
+;
+entry:
+  %cmp4 = icmp slt i64 0, %N
+  br i1 %cmp4, label %bb3, label %bb14
+
+bb3:                               ; preds = %entry
+  br label %bb5
+
+bb5:                                         ; preds = %bb3, %bb5
+  %i.05 = phi i64 [ %inc, %bb5 ], [ 0, %bb3 ]
+  %sub = sub nsw i64 %i.05, 3
+  %add = add nsw i64 %i.05, 3
+  %mul = mul nsw i64 %sub, %add
+  %rem = srem i64 %mul, %i.05
+  %conv = trunc i64 %rem to i32
+  %arrayidx = getelementptr inbounds i32, ptr %A, i64 %i.05
+  store i32 %conv, ptr %arrayidx, align 4
+  %inc = add nsw i64 %i.05, 1
+  %cmp = icmp slt i64 %inc, %N
+  br i1 %cmp, label %bb5, label %bb103
+
+bb103:                               ; preds = %bb103
+  br label %bb105
+
+bb105:                                         ; preds = %bb103, %bb105
+  %i.0510 = phi i64 [ %inc10, %bb105 ], [ 0, %bb103 ]
+  %arrayidx10 = getelementptr inbounds i32, ptr %B, i64 %i.0510
+  %load.b = load i32, ptr %arrayidx10, align 4
+  %arrayidx11 = getelementptr inbounds i32, ptr %A, i64 %i.0510
+  store i32 0, ptr %arrayidx11, align 4
+  %inc10 = add nsw i64 %i.0510, 1
+  %cmp10 = icmp slt i64 %inc10, %N
+  br i1 %cmp10, label %bb105, label %bb1010
+
+bb1010:                                 ; preds = %bb105
+  br label %bb14
+
+bb14:                                          ; preds = %bb1010, %entry
+  br i1 %cmp4, label %bb8, label %bb12
+
+bb8:                              ; preds = %bb14
+  br label %bb9
+
+bb9:                                        ; preds = %bb8, %bb9
+  %i1.02 = phi i64 [ %inc14, %bb9 ], [ 0, %bb8 ]
+  %sub7 = sub nsw i64 %i1.02, 3
+  %add8 = add nsw i64 %i1.02, 3
+  %mul9 = mul nsw i64 %sub7, %add8
+  %rem10 = srem i64 %mul9, %i1.02
+  %conv11 = trunc i64 %rem10 to i32
+  %arrayidx12 = getelementptr inbounds [1024 x i32], ptr @B, i64 0, i64 %i1.02
+  store i32 %conv11, ptr %arrayidx12, align 4
+  %inc14 = add nsw i64 %i1.02, 1
+  %cmp3 = icmp slt i64 %inc14, %N
+  br i1 %cmp3, label %bb9, label %bb15
+
+bb15:                               ; preds = %bb9
+  br label %bb80
+
+bb80:                              ; preds = %bb15
+  br label %bb90
+
+bb90:                                        ; preds = %bb80, %bb90
+  %i1.0210 = phi i64 [ %inc1410, %bb90 ], [ 0, %bb80 ]
+  %conv12 = trunc i64 %i1.0210 to i32
+  %arrayidx1210 = getelementptr inbounds [1024 x i32], ptr @B, i64 0, i64 %i1.0210
+  store i32 %conv12, ptr %arrayidx1210, align 4
+  %inc1410 = add nsw i64 %i1.0210, 1
+  %cmp310 = icmp slt i64 %inc1410, %N
+  br i1 %cmp310, label %bb90, label %bb150
+
+bb150:                               ; preds = %bb90
+  br label %bb12
+
+
+bb12:                                        ; preds = %bb15, %bb14
+  ret void
+}

return false;
// RHS dominates LHS.
// Check if LHS post-dominates RHS.
return !LHS.PDT->dominates(LHSEntryBlock, RHSEntryBlock);
Copy link
Contributor

@amehsan amehsan Nov 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure what are the implication of violating control flow equivalence of the loops in the candidate set. There is a comment in the code that says members of FusionCandidateSet are control flow equivalent. but I haven't looked into it enough to know what happens if we violate the condition. @1997alireza @CongzheUalberta Do you know about this?

your comment says , while post-domination is required for fusing candidates, it is not required for adding something to the candidate set. Does this mean there is another check of post-dominance relationship later on?

I was going to post a comment on this issue and the one extracted from it, but have been very busy with DA stuff. Will try to post a comment soon (But I may need to read some parts of fusion code, if DA stuff leaves enough time for me).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

your comment says , while post-domination is required for fusing candidates, it is not required for adding something to the candidate set. Does this mean there is another check of post-dominance relationship later on?

'isAdjacent' is doing the same thing I believe ? In fact, 'isAdjacent' is more stricter than saying 2 loops are control flow equivalent.

@CongzheUalberta
Copy link
Contributor

CongzheUalberta commented Nov 13, 2025

Please correct me if I'm wrong, but my first impression on this patch is that, it allows loops that are not control flow equivalent (e.g., loop1 dominates loop2 but loop2 does not post dominate loop1) to be added into the same fusion candidate set hence those loops possibly could be fused. Would it cause functional problem when they are fused? Have you tested the patch on some benchmarks like SPEC?

@sushgokh
Copy link
Contributor Author

Please correct me if I'm wrong, but my first impression on this patch is that, it allows loops that are not control flow equivalent (e.g., loop1 dominates loop2 but loop2 does not post dominate loop1) to be added into the same fusion candidate set hence those loops possibly could be fused. Would it cause functional problem when they are fused? Have you tested the patch on some benchmarks like SPEC?

There are 2 scenarios:

  1. We are preparing the candidate set for the first time
    If the situation is ( L1 dom L2 ) but (L2 !postdom L1) then they wont get added to the candidate set and I verified this with the same test example(by removing loop bb105 and its preheader/exit ).
  2. We are adding the fused candidate to the candidate set(as being done by this patch)
    The fusion doesnt happen in this case because legality checks(such as adjacency) fail.

Now, coming back to the example in the issue. L1/L2/L3/L4 are in the same candidate set. There is one obvious flaw in the function 'isControlFlowEquivalent' in the sense that L1/L2 and L3/L4 are equivalent pairwise but there is no dominance relation between L2 and L3. So, I think this function is not very useful other than as a legality check for fusion.

I havent tested this with SPEC for the reason that this patch is just placing the candidate into the set and doesnt change anything wrt legality checks for fusion.

@sushgokh
Copy link
Contributor Author

With this patch, we could have fusion candidates even if they do not have the proper control flow. In this case, we need to check the control flow of the candidates on the fusion step. Does loop fusion do this check on the fusion step?

Yes, 'isAdjacent' should be the function doing the check. I cant think of scenario where they are not control flow equivalent yet adjacent.

@amehsan
Copy link
Contributor

amehsan commented Nov 13, 2025

With this patch, we could have fusion candidates even if they do not have the proper control flow. In this case, we need to check the control flow of the candidates on the fusion step. Does loop fusion do this check on the fusion step?

Yes, 'isAdjacent' should be the function doing the check. I cant think of scenario where they are not control flow equivalent yet adjacent.

If our codegen is limited to adjacent loops, why do we put non-adjacent loops in a candidate set? If we just put adjacent loops in the candidate set, do we lose any opportunity?

We can always extend fusion in the future, and make it to catch more candidates.

Do I miss something here? (@CongzheUalberta what do you think?)

@CongzheUalberta
Copy link
Contributor

CongzheUalberta commented Nov 14, 2025

There are 2 scenarios:

  1. We are preparing the candidate set for the first time
    If the situation is ( L1 dom L2 ) but (L2 !postdom L1) then they wont get added to the candidate set and I verified this with the same test example(by removing loop bb105 and its preheader/exit ).

ok, thanks for the clarification.

  1. We are adding the fused candidate to the candidate set(as being done by this patch)
    The fusion doesnt happen in this case because legality checks(such as adjacency) fail.

I have not done a deep debugging so correct me if I'm wrong, but I think the reason fusion does not happen is not because adjacency check fails, it is because with your ordering, after the 1st fusion the candidate set becomes bb8, bb80, bb3 which likely messed up with the for-loop for (auto FC0 = CandidateSet.begin(); FC0 != CandidateSet.end(); ++FC0) and caused the code to immediately exit the for-loop, without even considering that bb8 and bb80 can be fused. That ordering does not seem entirely correct anyways (should bb3 appeared in the end? Before fuse it appeared at the beginning).

Now, coming back to the example in the issue. L1/L2/L3/L4 are in the same candidate set. There is one obvious flaw in the function 'isControlFlowEquivalent' in the sense that L1/L2 and L3/L4 are equivalent pairwise but there is no dominance relation between L2 and L3. So, I think this function is not very useful other than as a legality check for fusion.

I'm not sure if that is correct. LLVM is smart enough to determine when L2 is executed then L3 must gets executed, and vice versa.

One more thing: this PR gives non-deterministic to the ordering in the candidate set, e.g., when DT->dominates(LHSEntryBlock, RHSEntryBlock), it could happen that LHS will be ordered before or after RHS. This seems a bit counter-intuitive to me. Maybe one way to fix the motivative issue is just to drop the fused loop from the candidate set when the ordering cannot be determined, but I need to think more.

@CongzheUalberta
Copy link
Contributor

With this patch, we could have fusion candidates even if they do not have the proper control flow. In this case, we need to check the control flow of the candidates on the fusion step. Does loop fusion do this check on the fusion step?

Yes, 'isAdjacent' should be the function doing the check. I cant think of scenario where they are not control flow equivalent yet adjacent.

If our codegen is limited to adjacent loops, why do we put non-adjacent loops in a candidate set? If we just put adjacent loops in the candidate set, do we lose any opportunity?

We can always extend fusion in the future, and make it to catch more candidates.

Do I miss something here? (@CongzheUalberta what do you think?)

Currently llvm does not have the capability to fuse non-adjacent loops that are control flow equivalent (fusing non-adjacent loops would be something nice to have). However I think it is still good to put non-adjacent loops that are control flow equivalent in the candidate set, which opens the door and could be used as the stepping stone for future extension of fusing non-adjacent loops.

If we only put adjacent loops in the set, we would lose the opportunity extending to fusing non-adjacent loops. In that case when we want to extend to non-adjacent loops down the road, we would have to revert the pass to the state as of now and then we essentially would end up with the same issue as what we are facing now.

@amehsan
Copy link
Contributor

amehsan commented Nov 14, 2025

Currently llvm does not have the capability to fuse non-adjacent loops that are control flow equivalent (fusing non-adjacent loops would be something nice to have). However I think it is still good to put non-adjacent loops that are control flow equivalent in the candidate set, which opens the door and could be used as the stepping stone for future extension of fusing non-adjacent loops.

If we only put adjacent loops in the set, we would lose the opportunity extending to fusing non-adjacent loops. In that case when we want to extend to non-adjacent loops down the road, we would have to revert the pass to the state as of now and then we essentially would end up with the same issue as what we are facing now.

If our goal is to turn on fusion by default then we need to simplify the code. Both in terms of maintainability and also compile time cost. Right now I believe we do three different check: isControlFlowEquivalent which is even more relaxed than dominance check, then there is dominance check and then there is adjacency check. This really needs to be simplified.

@sjoerdmeijer
Copy link
Collaborator

If our goal is to turn on fusion by default

Yes, I think we should make this a goal.

We chose interchange as the first loop optimisation to enable and run by default, but we are very interested in fusion as well. For default enablement, we would need to give it the "investigate and fix compile time outliers, and fix all correctness issues" treatment as per the updated/clarified developers policy section on this topic.

Running testing and fuzzers on loop fusion was a first exercise to judge the quality of the pass, and see how much work it would be. As I mentioned, we are mostly focused on interchange, but we are happy to support you with fusion. For example, we have data where fusion improves workloads if we need to justify enabling this and if it is worth the compile-time. And we can help with point fixes like this, although you seem to suggest the pass may need a bit of an overhaul?

I am interested to hear your opinions on this. Maybe we can make default enablement of interchange and fusion recurring topics for the loop optimisation workgroup meetings (CC: @Meinersbur ), so am happy to talk more about this in the meeting, and in the meantime we can also continue here of course.

@amehsan
Copy link
Contributor

amehsan commented Nov 15, 2025

Running testing and fuzzers on loop fusion was a first exercise to judge the quality of the pass, and see how much work it would be. As I mentioned, we are mostly focused on interchange, but we are happy to support you with fusion. For example, we have data where fusion improves workloads if we need to justify enabling this and if it is worth the compile-time. And we can help with point fixes like this, although you seem to suggest the pass may need a bit of an overhaul?

Yes, we are interested in loop fusion. This past week we have been busy with DA discussions and internal stuff but I expect we speed up on loop fusion soon.

With regard to major changes in fusion. yes, I think at least two major improvements is needed (and both should help with compile time). One is the comment above, the other the changes in how we check dependency that I previously commented on one of the issues. But I think before we finalize the second one, we need to wait for the current investigations in DA to conclude (basically it is a discussion of what are the conditions for correctness of DA)

If you are interested in merging this fix as is, it is fine with me. we can merge this and we will address the issue of multiple different control flow checks separately.

And sure, we can discuss the status both in the loop opt meeting and also here or elsewhere on github.

@Meinersbur
Copy link
Member

I don't think this is a proper solution. Removing assertions and using them for a tie-breaker will eventually get us to completely symmetric instances such as #166560. There is no reason that two abritary loops in the CFG that are not even sequential have to be found a sequential order for.

@amehsan
Copy link
Contributor

amehsan commented Nov 17, 2025

@sjoerdmeijer
@1997alireza will be back to fusion starting next week (possibly earlier)
I will be back as soon as DA investigation reaches a conclusion.

@sushgokh
Copy link
Contributor Author

I don't think this is a proper solution. Removing assertions and using them for a tie-breaker will eventually get us to completely symmetric instances such as #166560. There is no reason that two abritary loops in the CFG that are not even sequential have to be found a sequential order for.

If we want to limit the scope of fusion to just adjacent loops, then the checks can be completely removed I think. But in the given example(or the issue IR), there is much scope for improvement at the same loop depth level. For instance, post first fusion, we have 3 loops in the same sequence in the candidate set:

L3
L4
L12

There is still scope for merging L3/L4 and then L34 can be merged with L12. So, basically, this is a recursive procedure. By relaxing the assertions, we would be allowing such fusions by inserting plausible candidates for fusion.

@amehsan
Copy link
Contributor

amehsan commented Nov 18, 2025

If we want to limit the scope of fusion to just adjacent loops, then the checks can be completely removed I think. But in the given example(or the issue IR), there is much scope for improvement at the same loop depth level. For instance, post first fusion, we have 3 loops in the same sequence in the candidate set:

L3
L4
L12

There is still scope for merging L3/L4 and then L34 can be merged with L12. So, basically, this is a recursive procedure. By relaxing the assertions, we would be allowing such fusions by inserting plausible candidates for fusion.

I am trying to understand your concern here. IIUC you are saying isAdjacent condition is not enough because if we limit ourselves to that we may not insert all the loops above into the same CandidateSet and then we will lose the opportunity. Is that correct?

@sushgokh
Copy link
Contributor Author

If we want to limit the scope of fusion to just adjacent loops, then the checks can be completely removed I think. But in the given example(or the issue IR), there is much scope for improvement at the same loop depth level. For instance, post first fusion, we have 3 loops in the same sequence in the candidate set:

L3
L4
L12

There is still scope for merging L3/L4 and then L34 can be merged with L12. So, basically, this is a recursive procedure. By relaxing the assertions, we would be allowing such fusions by inserting plausible candidates for fusion.

I am trying to understand your concern here. IIUC you are saying isAdjacent condition is not enough because if we limit ourselves to that we may not insert all the loops above into the same CandidateSet and then we will lose the opportunity. Is that correct?

Yes, right.

To reiterate what I would like to say, checking for adjacency is just for actual fusion. Dominance checking is for preparing set of plausible candidates for fusion iteratively (i.e. even the fused candidate can be candidate for further fusion as is the case here). There are other ways to prepare plausible candidate set at every step of the pass but for now, lets get rid of this assert. Lets proceed one step at a time.

@amehsan
Copy link
Contributor

amehsan commented Nov 20, 2025

Yes, right.

To reiterate what I would like to say, checking for adjacency is just for actual fusion. Dominance checking is for preparing set of plausible candidates for fusion iteratively (i.e. even the fused candidate can be candidate for further fusion as is the case here). There are other ways to prepare plausible candidate set at every step of the pass but for now, lets get rid of this assert. Lets proceed one step at a time.

give me a couple of days. I will get back to you about this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[LoopFusion] [SPEC] Assert "LHS.PDT->dominates(LHSEntryBlock, RHSEntryBlock)"

6 participants