[SimplifyCFG] Fix `SimplifyCFG` pass to skip folding when both blocks contain convergence intrinsics. #166452

luciechoi · 2025-11-04T21:52:32Z

Fixes a bug #165642. Similar fix is being made in IndVarSimplify pass to account for convergence tokens.

LLVM Spec states that only a single convergence intrinsic can be included in a basic block.

This PR fixes the issue in SimplifyCFG pass so that when a basic block and its predecessor both contain convergence intrinsics, it skips merging the two blocks.

llvmbot · 2025-11-04T21:53:06Z

@llvm/pr-subscribers-llvm-transforms

Author: Lucie Choi (luciechoi)

Changes

Fixes a bug #165642

LLVM Spec states that only a single convergence intrinsic can be included in a basic block.

This PR fixes the issue in SimplifyCFG pass so that when a basic block and its predecessor both contain convergence intrinsics, it skips merging the two blocks.

Full diff: https://github.com/llvm/llvm-project/pull/166452.diff

2 Files Affected:

(modified) llvm/lib/Transforms/Utils/BasicBlockUtils.cpp (+9)
(added) llvm/test/Transforms/SimplifyCFG/skip-merging-duplicate-convergence-instrinsics.ll (+68)

diff --git a/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp b/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp
index 11db0ec487328..c1b6140abb471 100644
--- a/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp
+++ b/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp
@@ -230,6 +230,15 @@ bool llvm::MergeBlockIntoPredecessor(BasicBlock *BB, DomTreeUpdater *DTU,
   // Don't break self-loops.
   if (PredBB == BB) return false;
 
+  // Don't break if both the basic block and the predecessor contain convergent
+  // intrinsics.
+  for (Instruction &I : *BB)
+    if (isa<ConvergenceControlInst>(I)) {
+      for (Instruction &I : *PredBB)
+        if (isa<ConvergenceControlInst>(I))
+          return false;
+    }
+
   // Don't break unwinding instructions or terminators with other side-effects.
   Instruction *PTI = PredBB->getTerminator();
   if (PTI->isSpecialTerminator() || PTI->mayHaveSideEffects())
diff --git a/llvm/test/Transforms/SimplifyCFG/skip-merging-duplicate-convergence-instrinsics.ll b/llvm/test/Transforms/SimplifyCFG/skip-merging-duplicate-convergence-instrinsics.ll
new file mode 100644
index 0000000000000..d5ae64f6897e3
--- /dev/null
+++ b/llvm/test/Transforms/SimplifyCFG/skip-merging-duplicate-convergence-instrinsics.ll
@@ -0,0 +1,68 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; RUN: opt < %s -S -passes=simplifycfg | FileCheck %s
+
+declare token @llvm.experimental.convergence.entry() #0
+
+define void @nested(i32 %tidx, i32 %tidy, ptr %array) #0 {
+; CHECK-LABEL: @nested(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[TMP0:%.*]] = tail call token @llvm.experimental.convergence.entry()
+; CHECK-NEXT:    [[TMP1:%.*]] = or i32 [[TIDY:%.*]], [[TIDX:%.*]]
+; CHECK-NEXT:    [[OR_COND_I:%.*]] = icmp eq i32 [[TMP1]], 0
+; CHECK-NEXT:    br label [[FOR_COND_I:%.*]]
+; CHECK:       for.cond.i:
+; CHECK-NEXT:    [[TMP2:%.*]] = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token [[TMP0]]) ]
+; CHECK-NEXT:    br label [[FOR_COND1_I:%.*]]
+; CHECK:       for.cond1.i:
+; CHECK-NEXT:    [[CMP2_I:%.*]] = phi i1 [ false, [[FOR_BODY4_I:%.*]] ], [ true, [[FOR_COND_I]] ]
+; CHECK-NEXT:    [[TMP3:%.*]] = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token [[TMP2]]) ]
+; CHECK-NEXT:    br i1 [[CMP2_I]], label [[FOR_BODY4_I]], label [[EXIT:%.*]]
+; CHECK:       for.body4.i:
+; CHECK-NEXT:    br i1 [[OR_COND_I]], label [[IF_THEN_I:%.*]], label [[FOR_COND1_I]]
+; CHECK:       if.then.i:
+; CHECK-NEXT:    [[HLSL_WAVE_ACTIVE_MAX7_I:%.*]] = call spir_func i32 @llvm.spv.wave.reduce.umax.i32(i32 0) [ "convergencectrl"(token [[TMP3]]) ]
+; CHECK-NEXT:    [[TMP4:%.*]] = getelementptr inbounds i32, ptr [[ARRAY:%.*]], i32 0
+; CHECK-NEXT:    store i32 [[HLSL_WAVE_ACTIVE_MAX7_I]], ptr [[TMP4]], align 4
+; CHECK-NEXT:    br label [[EXIT]]
+; CHECK:       exit:
+; CHECK-NEXT:    ret void
+;
+entry:
+  %0 = tail call token @llvm.experimental.convergence.entry()
+  %2 = or i32 %tidy, %tidx
+  %or.cond.i = icmp eq i32 %2, 0
+  br label %for.cond.i
+
+for.cond.i:
+  %3 = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token %0) ]
+  br label %for.cond1.i
+
+for.cond1.i:
+  %cmp2.i = phi i1 [ false, %for.body4.i ], [ true, %for.cond.i ]
+  %4 = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token %3) ]
+  br i1 %cmp2.i, label %for.body4.i, label %cleanup.i.loopexit
+
+for.body4.i:
+  br i1 %or.cond.i, label %if.then.i, label %for.cond1.i
+
+if.then.i:
+  %hlsl.wave.active.max7.i = call spir_func i32 @llvm.spv.wave.reduce.umax.i32(i32 0) [ "convergencectrl"(token %4) ]
+  %5 = getelementptr inbounds i32, ptr %array, i32 0
+  store i32 %hlsl.wave.active.max7.i, ptr %5, align 4
+  br label %cleanup.i
+
+cleanup.i.loopexit:
+  br label %cleanup.i
+
+cleanup.i:
+  br label %exit
+
+exit:
+  ret void
+}
+
+declare token @llvm.experimental.convergence.loop() #0
+
+declare i32 @llvm.spv.wave.reduce.umax.i32(i32) #0
+
+attributes #0 = { convergent }

rnk · 2025-11-05T00:12:43Z

It seems bad that the convergence control LangRef rules make it illegal to merge basic blocks connected by a single direct branch. If this every happens, it's a sign that the loop is degenerate, i.e. it runs no more than once. Could the backends requiring convergence be taught to tolerate the possibility of multiple convergence control intrinsics, perhaps simply by running a simple cleanup pass that eliminates degenerate loops and replaces the loop token with the token used to create it? Does that transform preserve semantics, or can it change observable behavior?

It seems like the best outcome would be that we relax the LangRef rules so fewer passes are required to scan for this non-local information.

s-perron · 2025-11-05T20:47:40Z

It seems like the best outcome would be that we relax the LangRef rules so fewer passes are required to scan for this non-local information.

@ssahasra Do you know which backends use the convergence tokens? Do you have any thoughts on modifying the spec?

I think the SPIR-V backend could handle having two tokens defined in the same BB as long as the second is a loop that uses the first. We just have to do a type of copy propagation on it.

However, I'm not convinced many transforms will benefit from the rule change. We still need any transform that replicates code to be skipped if it replicates the token. How many transformations merge two block without replicating code? We will still need to add a legality check to other passes.

Keenuts

an initial comment regarding the test file (waiting to see if the discussion over the change makes progress)

llvm/test/Transforms/SimplifyCFG/skip-merging-duplicate-convergence-instrinsics.ll

luciechoi · 2025-11-18T20:00:25Z

@rnk @s-perron @Keenuts friendly ping for review, thanks!

github-actions · 2025-11-18T20:30:01Z

🐧 Linux x64 Test Results

166124 tests passed
2839 tests skipped
1 test failed

Failed Tests

(click on a test name to see its output)

LLVM

LLVM.Transforms/LoopUnroll/convergent.controlled.ll

Exit Code: 1

Command Output (stdout):
--
# RUN: at line 2
/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/opt < /home/gha/actions-runner/_work/llvm-project/llvm-project/llvm/test/Transforms/LoopUnroll/convergent.controlled.ll -passes=loop-unroll -unroll-runtime -unroll-allow-partial -S | /home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/FileCheck /home/gha/actions-runner/_work/llvm-project/llvm-project/llvm/test/Transforms/LoopUnroll/convergent.controlled.ll
# executed command: /home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/opt -passes=loop-unroll -unroll-runtime -unroll-allow-partial -S
# note: command had no output on stdout or stderr
# executed command: /home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/FileCheck /home/gha/actions-runner/_work/llvm-project/llvm-project/llvm/test/Transforms/LoopUnroll/convergent.controlled.ll
# .---command stderr------------
# | /home/gha/actions-runner/_work/llvm-project/llvm-project/llvm/test/Transforms/LoopUnroll/convergent.controlled.ll:310:15: error: CHECK-NEXT: expected string not found in input
# | ; CHECK-NEXT: [[X_0:%.*]] = phi i32 [ 0, [[ENTRY_NEW]] ], [ [[INC_1:%.*]], [[L3]] ]
# |               ^
# | <stdin>:194:4: note: scanning from here
# | l3: ; preds = %l3.1, %entry.new
# |    ^
# | <stdin>:194:4: note: with "ENTRY_NEW" equal to "%entry.new"
# | l3: ; preds = %l3.1, %entry.new
# |    ^
# | <stdin>:194:4: note: with "L3" equal to "%l3"
# | l3: ; preds = %l3.1, %entry.new
# |    ^
# | <stdin>:195:2: note: possible intended match here
# |  %x.0 = phi i32 [ 0, %entry.new ], [ %inc.1, %l3.1 ]
# |  ^
# | 
# | Input file: <stdin>
# | Check file: /home/gha/actions-runner/_work/llvm-project/llvm-project/llvm/test/Transforms/LoopUnroll/convergent.controlled.ll
# | 
# | -dump-input=help explains the following input dump.
# | 
# | Input was:
# | <<<<<<
# |             .
# |             .
# |             .
# |           189:  
# |           190: entry.new: ; preds = %entry 
# |           191:  %unroll_iter = sub i32 %0, %xtraiter 
# |           192:  br label %l3, !llvm.loop !4 
# |           193:  
# |           194: l3: ; preds = %l3.1, %entry.new 
# | next:310'0        X~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no match found
# | next:310'1                                      with "ENTRY_NEW" equal to "%entry.new"
# | next:310'2                                      with "L3" equal to "%l3"
# |           195:  %x.0 = phi i32 [ 0, %entry.new ], [ %inc.1, %l3.1 ] 
# | next:310'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# | next:310'3      ?                                                    possible intended match
# |           196:  %niter = phi i32 [ 0, %entry.new ], [ %niter.next.1, %l3.1 ] 
# | next:310'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           197:  %tok.loop = call token @llvm.experimental.convergence.anchor() 
# | next:310'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           198:  call void @f() [ "convergencectrl"(token %tok.loop) ] 
# | next:310'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           199:  br label %l3.1 
# | next:310'0     ~~~~~~~~~~~~~~~~
# |           200:  
# | next:310'0     ~
# |             .
# |             .
# |             .
# | >>>>>>
# `-----------------------------
# error: command failed with exit status: 1

--

If these failures are unrelated to your changes (for example tests are broken or flaky at HEAD), please open an issue at https://github.com/llvm/llvm-project/issues and add the infrastructure label.

s-perron · 2025-11-19T15:18:05Z

llvm/lib/Transforms/Utils/BasicBlockUtils.cpp

+  for (Instruction &I : *BB)
+    if (isa<ConvergenceControlInst>(I)) {
+      for (Instruction &I : *PredBB)
+        if (isa<ConvergenceControlInst>(I))
+          return false;
+    }


This is a move expensive check because it has to traverse potentially two basic blocks. We should move it later. I would make it the last check before deciding to merge.

Also the way this is written is a little unclear. First if we know that there can be at most one convergence token per basic block, we could make this two separate checks and take advantage of short-circuit evaluation:

bool HasConvergenceToken(const BasicBlock *BB) { for (const Instruction &I : *BB) if (isa<ConvergenceControlInst>(I)) return true; return false; }

Then the check becomes:

if (HasConvergenceToken(BB) && HasConvervenceToken(PredBB)) return false;

llvmbot added the llvm:transforms label Nov 4, 2025

luciechoi requested review from Keenuts and s-perron November 4, 2025 21:53

luciechoi requested a review from rnk November 4, 2025 21:53

Keenuts reviewed Nov 12, 2025

View reviewed changes

llvm/test/Transforms/SimplifyCFG/skip-merging-duplicate-convergence-instrinsics.ll Outdated Show resolved Hide resolved

luciechoi added 2 commits November 18, 2025 19:43

Fix simplifycfg pass

8d678c1

Remove hlsl&spv specific functions

0cad7f0

luciechoi force-pushed the cfgsimplify branch from 6228972 to 0cad7f0 Compare November 18, 2025 19:59

Merge branch 'main' into cfgsimplify

9047980

Keenuts approved these changes Nov 19, 2025

View reviewed changes

s-perron reviewed Nov 19, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[SimplifyCFG] Fix `SimplifyCFG` pass to skip folding when both blocks contain convergence intrinsics. #166452

[SimplifyCFG] Fix `SimplifyCFG` pass to skip folding when both blocks contain convergence intrinsics. #166452

luciechoi commented Nov 4, 2025 •

edited

Loading

Uh oh!

llvmbot commented Nov 4, 2025

Uh oh!

rnk commented Nov 5, 2025

Uh oh!

s-perron commented Nov 5, 2025

Uh oh!

Keenuts left a comment

Uh oh!

Uh oh!

luciechoi commented Nov 18, 2025

Uh oh!

github-actions bot commented Nov 18, 2025 •

edited

Loading

Uh oh!

s-perron Nov 19, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

[SimplifyCFG] Fix SimplifyCFG pass to skip folding when both blocks contain convergence intrinsics. #166452

Are you sure you want to change the base?

[SimplifyCFG] Fix SimplifyCFG pass to skip folding when both blocks contain convergence intrinsics. #166452

Conversation

luciechoi commented Nov 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Nov 4, 2025

Uh oh!

rnk commented Nov 5, 2025

Uh oh!

s-perron commented Nov 5, 2025

Uh oh!

Keenuts left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

luciechoi commented Nov 18, 2025

Uh oh!

github-actions bot commented Nov 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🐧 Linux x64 Test Results

Failed Tests

LLVM

Uh oh!

s-perron Nov 19, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

[SimplifyCFG] Fix `SimplifyCFG` pass to skip folding when both blocks contain convergence intrinsics. #166452

[SimplifyCFG] Fix `SimplifyCFG` pass to skip folding when both blocks contain convergence intrinsics. #166452

luciechoi commented Nov 4, 2025 •

edited

Loading

github-actions bot commented Nov 18, 2025 •

edited

Loading