@luciechoi
Contributor

@luciechoi luciechoi commented Nov 4, 2025

Fixes bug #165642. A similar fix is being made in the IndVarSimplify pass to account for convergence tokens.

LLVM Spec states that only a single convergence intrinsic can be included in a basic block.

This PR fixes the issue in SimplifyCFG pass so that when a basic block and its predecessor both contain convergence intrinsics, it skips merging the two blocks.

@llvmbot
Member

llvmbot commented Nov 4, 2025

@llvm/pr-subscribers-llvm-transforms

Author: Lucie Choi (luciechoi)

Changes

Fixes a bug #165642

LLVM Spec states that only a single convergence intrinsic can be included in a basic block.

This PR fixes the issue in SimplifyCFG pass so that when a basic block and its predecessor both contain convergence intrinsics, it skips merging the two blocks.


Full diff: https://github.com/llvm/llvm-project/pull/166452.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/Utils/BasicBlockUtils.cpp (+9)
  • (added) llvm/test/Transforms/SimplifyCFG/skip-merging-duplicate-convergence-instrinsics.ll (+68)
diff --git a/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp b/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp
index 11db0ec487328..c1b6140abb471 100644
--- a/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp
+++ b/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp
@@ -230,6 +230,15 @@ bool llvm::MergeBlockIntoPredecessor(BasicBlock *BB, DomTreeUpdater *DTU,
   // Don't break self-loops.
   if (PredBB == BB) return false;
 
+  // Don't break if both the basic block and the predecessor contain convergent
+  // intrinsics.
+  for (Instruction &I : *BB)
+    if (isa<ConvergenceControlInst>(I)) {
+      for (Instruction &I : *PredBB)
+        if (isa<ConvergenceControlInst>(I))
+          return false;
+    }
+
   // Don't break unwinding instructions or terminators with other side-effects.
   Instruction *PTI = PredBB->getTerminator();
   if (PTI->isSpecialTerminator() || PTI->mayHaveSideEffects())
diff --git a/llvm/test/Transforms/SimplifyCFG/skip-merging-duplicate-convergence-instrinsics.ll b/llvm/test/Transforms/SimplifyCFG/skip-merging-duplicate-convergence-instrinsics.ll
new file mode 100644
index 0000000000000..d5ae64f6897e3
--- /dev/null
+++ b/llvm/test/Transforms/SimplifyCFG/skip-merging-duplicate-convergence-instrinsics.ll
@@ -0,0 +1,68 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; RUN: opt < %s -S -passes=simplifycfg | FileCheck %s
+
+declare token @llvm.experimental.convergence.entry() #0
+
+define void @nested(i32 %tidx, i32 %tidy, ptr %array) #0 {
+; CHECK-LABEL: @nested(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[TMP0:%.*]] = tail call token @llvm.experimental.convergence.entry()
+; CHECK-NEXT:    [[TMP1:%.*]] = or i32 [[TIDY:%.*]], [[TIDX:%.*]]
+; CHECK-NEXT:    [[OR_COND_I:%.*]] = icmp eq i32 [[TMP1]], 0
+; CHECK-NEXT:    br label [[FOR_COND_I:%.*]]
+; CHECK:       for.cond.i:
+; CHECK-NEXT:    [[TMP2:%.*]] = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token [[TMP0]]) ]
+; CHECK-NEXT:    br label [[FOR_COND1_I:%.*]]
+; CHECK:       for.cond1.i:
+; CHECK-NEXT:    [[CMP2_I:%.*]] = phi i1 [ false, [[FOR_BODY4_I:%.*]] ], [ true, [[FOR_COND_I]] ]
+; CHECK-NEXT:    [[TMP3:%.*]] = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token [[TMP2]]) ]
+; CHECK-NEXT:    br i1 [[CMP2_I]], label [[FOR_BODY4_I]], label [[EXIT:%.*]]
+; CHECK:       for.body4.i:
+; CHECK-NEXT:    br i1 [[OR_COND_I]], label [[IF_THEN_I:%.*]], label [[FOR_COND1_I]]
+; CHECK:       if.then.i:
+; CHECK-NEXT:    [[HLSL_WAVE_ACTIVE_MAX7_I:%.*]] = call spir_func i32 @llvm.spv.wave.reduce.umax.i32(i32 0) [ "convergencectrl"(token [[TMP3]]) ]
+; CHECK-NEXT:    [[TMP4:%.*]] = getelementptr inbounds i32, ptr [[ARRAY:%.*]], i32 0
+; CHECK-NEXT:    store i32 [[HLSL_WAVE_ACTIVE_MAX7_I]], ptr [[TMP4]], align 4
+; CHECK-NEXT:    br label [[EXIT]]
+; CHECK:       exit:
+; CHECK-NEXT:    ret void
+;
+entry:
+  %0 = tail call token @llvm.experimental.convergence.entry()
+  %2 = or i32 %tidy, %tidx
+  %or.cond.i = icmp eq i32 %2, 0
+  br label %for.cond.i
+
+for.cond.i:
+  %3 = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token %0) ]
+  br label %for.cond1.i
+
+for.cond1.i:
+  %cmp2.i = phi i1 [ false, %for.body4.i ], [ true, %for.cond.i ]
+  %4 = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token %3) ]
+  br i1 %cmp2.i, label %for.body4.i, label %cleanup.i.loopexit
+
+for.body4.i:
+  br i1 %or.cond.i, label %if.then.i, label %for.cond1.i
+
+if.then.i:
+  %hlsl.wave.active.max7.i = call spir_func i32 @llvm.spv.wave.reduce.umax.i32(i32 0) [ "convergencectrl"(token %4) ]
+  %5 = getelementptr inbounds i32, ptr %array, i32 0
+  store i32 %hlsl.wave.active.max7.i, ptr %5, align 4
+  br label %cleanup.i
+
+cleanup.i.loopexit:
+  br label %cleanup.i
+
+cleanup.i:
+  br label %exit
+
+exit:
+  ret void
+}
+
+declare token @llvm.experimental.convergence.loop() #0
+
+declare i32 @llvm.spv.wave.reduce.umax.i32(i32) #0
+
+attributes #0 = { convergent }

@luciechoi luciechoi requested a review from rnk November 4, 2025 21:53
@rnk
Collaborator

rnk commented Nov 5, 2025

It seems bad that the convergence control LangRef rules make it illegal to merge basic blocks connected by a single direct branch. If this ever happens, it's a sign that the loop is degenerate, i.e. it runs no more than once. Could the backends requiring convergence be taught to tolerate the possibility of multiple convergence control intrinsics, perhaps simply by running a simple cleanup pass that eliminates degenerate loops and replaces the loop token with the token used to create it? Does that transform preserve semantics, or can it change observable behavior?

It seems like the best outcome would be that we relax the LangRef rules so fewer passes are required to scan for this non-local information.

@s-perron
Contributor

s-perron commented Nov 5, 2025

It seems like the best outcome would be that we relax the LangRef rules so fewer passes are required to scan for this non-local information.

@ssahasra Do you know which backends use the convergence tokens? Do you have any thoughts on modifying the spec?

I think the SPIR-V backend could handle having two tokens defined in the same BB as long as the second is a loop that uses the first. We just have to do a type of copy propagation on it.
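
To make the "copy propagation" idea concrete, here is a minimal sketch on a toy model. Everything in it (`ToyInst`, `Kind`, `propagateTokens`) is hypothetical and is not LLVM or SPIR-V backend code. It only shows the mechanics of forwarding a loop token to the token it was created from when both sit in the same block; it makes no claim about whether that rewrite preserves convergence semantics, which is the open question in this thread.

```cpp
// Toy model only: these types are illustrative and are NOT LLVM APIs.
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

enum class Kind { Entry, Loop, Use };

struct ToyInst {
  Kind K;
  std::string Def;   // token defined by this instruction (empty for Use)
  std::string Token; // token consumed by this instruction (empty for Entry)
};

// Within one block, drop a loop intrinsic whose parent token is defined
// earlier in the same block, and rewrite later uses to the parent token.
std::vector<ToyInst> propagateTokens(const std::vector<ToyInst> &Block) {
  std::unordered_set<std::string> LocalDefs;              // tokens defined so far
  std::unordered_map<std::string, std::string> Forwarded; // dropped token -> parent
  std::vector<ToyInst> Out;
  for (ToyInst I : Block) {
    // Resolve the consumed token through any earlier forwarding.
    auto It = Forwarded.find(I.Token);
    if (It != Forwarded.end())
      I.Token = It->second;
    if (I.K == Kind::Loop && LocalDefs.count(I.Token)) {
      Forwarded[I.Def] = I.Token; // second token in the block: forward and drop
      continue;
    }
    if (!I.Def.empty())
      LocalDefs.insert(I.Def);
    Out.push_back(I);
  }
  return Out;
}
```

A loop token whose parent is defined in another block is left untouched, so only the merged-block case described above is rewritten.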

However, I'm not convinced many transforms will benefit from the rule change. We still need any transform that replicates code to be skipped if it replicates the token. How many transformations merge two blocks without replicating code? We will still need to add a legality check to other passes.

Contributor

@Keenuts Keenuts left a comment

An initial comment regarding the test file (waiting to see whether the discussion of the change makes progress).

@luciechoi
Contributor Author

@rnk @s-perron @Keenuts friendly ping for review, thanks!

@github-actions

github-actions bot commented Nov 18, 2025

🐧 Linux x64 Test Results

  • 166124 tests passed
  • 2839 tests skipped
  • 1 test failed

Failed Tests

LLVM

LLVM.Transforms/LoopUnroll/convergent.controlled.ll
Exit Code: 1

Command Output (stdout):
--
# RUN: at line 2
/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/opt < /home/gha/actions-runner/_work/llvm-project/llvm-project/llvm/test/Transforms/LoopUnroll/convergent.controlled.ll -passes=loop-unroll -unroll-runtime -unroll-allow-partial -S | /home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/FileCheck /home/gha/actions-runner/_work/llvm-project/llvm-project/llvm/test/Transforms/LoopUnroll/convergent.controlled.ll
# executed command: /home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/opt -passes=loop-unroll -unroll-runtime -unroll-allow-partial -S
# note: command had no output on stdout or stderr
# executed command: /home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/FileCheck /home/gha/actions-runner/_work/llvm-project/llvm-project/llvm/test/Transforms/LoopUnroll/convergent.controlled.ll
# .---command stderr------------
# | /home/gha/actions-runner/_work/llvm-project/llvm-project/llvm/test/Transforms/LoopUnroll/convergent.controlled.ll:310:15: error: CHECK-NEXT: expected string not found in input
# | ; CHECK-NEXT: [[X_0:%.*]] = phi i32 [ 0, [[ENTRY_NEW]] ], [ [[INC_1:%.*]], [[L3]] ]
# |               ^
# | <stdin>:194:4: note: scanning from here
# | l3: ; preds = %l3.1, %entry.new
# |    ^
# | <stdin>:194:4: note: with "ENTRY_NEW" equal to "%entry.new"
# | l3: ; preds = %l3.1, %entry.new
# |    ^
# | <stdin>:194:4: note: with "L3" equal to "%l3"
# | l3: ; preds = %l3.1, %entry.new
# |    ^
# | <stdin>:195:2: note: possible intended match here
# |  %x.0 = phi i32 [ 0, %entry.new ], [ %inc.1, %l3.1 ]
# |  ^
# | 
# | Input file: <stdin>
# | Check file: /home/gha/actions-runner/_work/llvm-project/llvm-project/llvm/test/Transforms/LoopUnroll/convergent.controlled.ll
# | 
# | -dump-input=help explains the following input dump.
# | 
# | Input was:
# | <<<<<<
# |             .
# |             .
# |             .
# |           189:  
# |           190: entry.new: ; preds = %entry 
# |           191:  %unroll_iter = sub i32 %0, %xtraiter 
# |           192:  br label %l3, !llvm.loop !4 
# |           193:  
# |           194: l3: ; preds = %l3.1, %entry.new 
# | next:310'0        X~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no match found
# | next:310'1                                      with "ENTRY_NEW" equal to "%entry.new"
# | next:310'2                                      with "L3" equal to "%l3"
# |           195:  %x.0 = phi i32 [ 0, %entry.new ], [ %inc.1, %l3.1 ] 
# | next:310'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# | next:310'3      ?                                                    possible intended match
# |           196:  %niter = phi i32 [ 0, %entry.new ], [ %niter.next.1, %l3.1 ] 
# | next:310'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           197:  %tok.loop = call token @llvm.experimental.convergence.anchor() 
# | next:310'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           198:  call void @f() [ "convergencectrl"(token %tok.loop) ] 
# | next:310'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           199:  br label %l3.1 
# | next:310'0     ~~~~~~~~~~~~~~~~
# |           200:  
# | next:310'0     ~
# |             .
# |             .
# |             .
# | >>>>>>
# `-----------------------------
# error: command failed with exit status: 1

--

If these failures are unrelated to your changes (for example tests are broken or flaky at HEAD), please open an issue at https://github.com/llvm/llvm-project/issues and add the infrastructure label.

Comment on lines +235 to +240
  for (Instruction &I : *BB)
    if (isa<ConvergenceControlInst>(I)) {
      for (Instruction &I : *PredBB)
        if (isa<ConvergenceControlInst>(I))
          return false;
    }
Contributor

This is a more expensive check because it may have to traverse two basic blocks. We should move it later. I would make it the last check before deciding to merge.

Also, the way this is written is a little unclear. Since we know there can be at most one convergence token per basic block, we could split this into two separate checks and take advantage of short-circuit evaluation:

bool HasConvergenceToken(const BasicBlock *BB) {
  for (const Instruction &I : *BB)
    if (isa<ConvergenceControlInst>(I))
      return true;
  return false;
}

Then the check becomes:

if (HasConvergenceToken(BB) && HasConvergenceToken(PredBB))
  return false;
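
For readers who want to try the short-circuit behavior without an LLVM build, the suggestion can be mirrored on a toy model. This is only a sketch: `ToyBlock`, `InstKind`, `hasConvergenceToken`, and `canMergeIntoPredecessor` are hypothetical stand-ins invented for illustration, not the LLVM API and not the code in this PR.

```cpp
// Toy model only: these types stand in for llvm::Instruction and
// llvm::BasicBlock and are NOT the LLVM API.
#include <algorithm>
#include <vector>

enum class InstKind { ConvergenceControl, Other };

struct ToyBlock {
  std::vector<InstKind> Insts;
};

// True if the block holds at least one convergence control instruction.
bool hasConvergenceToken(const ToyBlock &BB) {
  return std::any_of(BB.Insts.begin(), BB.Insts.end(), [](InstKind K) {
    return K == InstKind::ConvergenceControl;
  });
}

// Merging is refused only when both blocks define a token. The && operator
// short-circuits, so Pred is scanned only if BB already contains a token.
bool canMergeIntoPredecessor(const ToyBlock &BB, const ToyBlock &Pred) {
  return !(hasConvergenceToken(BB) && hasConvergenceToken(Pred));
}
```

In the common case where `BB` has no convergence token, the predecessor is never scanned, which is the cost saving the two-step check is after.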
