Skip to content

Conversation

@PrasoonMishra
Copy link
Contributor

@PrasoonMishra PrasoonMishra commented Nov 24, 2025

When SIMULATED_TRAP is at the end of a block with no successors, insertSimulatedTrap incorrectly returns the original MBB despite adding HaltLoopBB to the CFG.

EmitInstrWithCustomInserter detects CFG changes by comparing the returned MBB with the original. When they match, it assumes no modification occurred and skips MachineLoopInfo invalidation. This causes stale loop information in subsequent passes, particularly when using the NPM which relies on accurate invalidation signals.

Fix: Return HaltLoopBB to properly signal the CFG modification.

When SIMULATED_TRAP is at the end of a block with no successors,
insertSimulatedTrap incorrectly returns the original MBB despite
adding HaltLoopBB to the CFG.

EmitInstrWithCustomInserter detects CFG changes by comparing the
returned MBB with the original. When they match, it assumes no
modification occurred and skips MachineLoopInfo invalidation.
This causes stale loop information in subsequent passes.

Fix: Return HaltLoopBB to properly signal the CFG modification.
@llvmbot
Copy link
Member

llvmbot commented Nov 24, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Prasoon Mishra (PrasoonMishra)

Changes

When SIMULATED_TRAP is at the end of a block with no successors, insertSimulatedTrap incorrectly returns the original MBB despite adding HaltLoopBB to the CFG.

EmitInstrWithCustomInserter detects CFG changes by comparing the returned MBB with the original. When they match, it assumes no modification occurred and skips MachineLoopInfo invalidation. This causes stale loop information in subsequent passes.

Fix: Return HaltLoopBB to properly signal the CFG modification.


Full diff: https://github.com/llvm/llvm-project/pull/169290.diff

1 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/SIInstrInfo.cpp (+4)
diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
index a7333e3373f38..9b05d99e265e6 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
@@ -1963,6 +1963,10 @@ MachineBasicBlock *SIInstrInfo::insertSimulatedTrap(MachineRegisterInfo &MRI,
     BuildMI(MBB, MI, DL, get(AMDGPU::S_CBRANCH_EXECNZ)).addMBB(TrapBB);
     MF->push_back(TrapBB);
     MBB.addSuccessor(TrapBB);
+  } else {
+    // Since we're adding HaltLoopBB and modifying the CFG, we must return a
+    // different block to signal the change.
+    ContBB = HaltLoopBB;
   }
 
   // Start with a `s_trap 2`, if we're in PRIV=1 and we need the workaround this

@PrasoonMishra PrasoonMishra changed the title [AMDGPU] Fix CFG invalidation detection in insertSimulatedTrap [AMDGPU][NPM] Fix CFG invalidation detection in insertSimulatedTrap Nov 24, 2025
@PrasoonMishra
Copy link
Contributor Author

PrasoonMishra commented Nov 24, 2025

This bug was discovered while enabling the NPM pipeline for AMDGPU. Without this patch, the existing test trap-abis.ll fails under NPM.

The trap function (lines 12-104) expects loop header comments:

define amdgpu_kernel void @trap(ptr addrspace(1) nocapture readonly %arg0) {
; HSA-TRAP-GFX1100-O0-NEXT:  .LBB0_1: ; =>This Inner Loop Header: Depth=1
; HSA-TRAP-GFX1100-O0-NEXT:    s_sethalt 5
; HSA-TRAP-GFX1100-O0-NEXT:    s_branch .LBB0_1
  store volatile i32 1, ptr addrspace(1) %arg0
  call void @llvm.trap()
  unreachable
  store volatile i32 2, ptr addrspace(1) %arg0
  ret void
}

Error signature:
# .---command stderr------------                                                                                                                                                                                                             
# | /.../git/llvm-project/llvm/test/CodeGen/AMDGPU/trap-abis.ll:96:29: error: HSA-TRAP-GFX1100-O0-NEXT: expected string not found in input                                                                               
# | ; HSA-TRAP-GFX1100-O0-NEXT: .LBB0_1: ; =>This Inner Loop Header: Depth=1                                                                                                                                                                 
# |                             ^                                                                                                                                                                                                            
# | <stdin>:23:21: note: scanning from here                                                                                                                                                                                                  
# |  s_mov_b32 m0, ttmp2  

Expected: .LBB0_1: ; =>This Inner Loop Header: Depth=1
Actual (without fix): .LBB0_1: (missing loop header comment)

This happens only when using NPM pipeline because legacy pass manager conservatively recomputes analyses, so MachineLoopInfo is refreshed before AsmPrinter, masking the bug. NPM relies on accurate invalidation signals hence when insertSimulatedTrap incorrectly signals “no CFG change,” MachineLoopInfo remains stale, causing missing loop headers.

This fix resolves the bug.

@vikramRH vikramRH merged commit 1cea4a0 into llvm:main Nov 28, 2025
10 checks passed
@llvm-ci
Copy link
Collaborator

llvm-ci commented Nov 28, 2025

LLVM Buildbot has detected a new failure on builder mlir-nvidia-gcc7 running on mlir-nvidia while building llvm at step 7 "test-build-check-mlir-build-only-check-mlir".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/116/builds/21528

Here is the relevant piece of the build log for the reference
Step 7 (test-build-check-mlir-build-only-check-mlir) failure: test (failure)
******************** TEST 'MLIR :: Integration/GPU/CUDA/async.mlir' FAILED ********************
Exit Code: 1

Command Output (stdout):
--
# RUN: at line 1
/vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.src/mlir/test/Integration/GPU/CUDA/async.mlir  | /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt -gpu-kernel-outlining  | /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt -pass-pipeline='builtin.module(gpu.module(strip-debuginfo,convert-gpu-to-nvvm),nvvm-attach-target)'  | /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt -gpu-async-region -gpu-to-llvm -reconcile-unrealized-casts -gpu-module-to-binary="format=fatbin"  | /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt -async-to-async-runtime -async-runtime-ref-counting  | /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt -convert-async-to-llvm -convert-func-to-llvm -convert-arith-to-llvm -convert-cf-to-llvm -reconcile-unrealized-casts  | /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-runner    --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/lib/libmlir_cuda_runtime.so    --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/lib/libmlir_async_runtime.so    --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/lib/libmlir_runner_utils.so    --entry-point-result=void -O0  | /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/FileCheck /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.src/mlir/test/Integration/GPU/CUDA/async.mlir
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.src/mlir/test/Integration/GPU/CUDA/async.mlir
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt -gpu-kernel-outlining
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt '-pass-pipeline=builtin.module(gpu.module(strip-debuginfo,convert-gpu-to-nvvm),nvvm-attach-target)'
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt -gpu-async-region -gpu-to-llvm -reconcile-unrealized-casts -gpu-module-to-binary=format=fatbin
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt -async-to-async-runtime -async-runtime-ref-counting
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt -convert-async-to-llvm -convert-func-to-llvm -convert-arith-to-llvm -convert-cf-to-llvm -reconcile-unrealized-casts
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-runner --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/lib/libmlir_cuda_runtime.so --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/lib/libmlir_async_runtime.so --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/lib/libmlir_runner_utils.so --entry-point-result=void -O0
# .---command stderr------------
# | 'cuStreamWaitEvent(stream, event, 0)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuEventDestroy(event)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuStreamWaitEvent(stream, event, 0)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuEventDestroy(event)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuStreamWaitEvent(stream, event, 0)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuStreamWaitEvent(stream, event, 0)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuEventDestroy(event)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuEventDestroy(event)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuEventSynchronize(event)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuEventDestroy(event)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# `-----------------------------
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/FileCheck /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.src/mlir/test/Integration/GPU/CUDA/async.mlir
# .---command stderr------------
# | /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.src/mlir/test/Integration/GPU/CUDA/async.mlir:68:12: error: CHECK: expected string not found in input
# |  // CHECK: [84, 84]
# |            ^
# | <stdin>:1:1: note: scanning from here
# | Unranked Memref base@ = 0x5a5d2f61ce80 rank = 1 offset = 0 sizes = [2] strides = [1] data = 
# | ^
# | <stdin>:2:1: note: possible intended match here
# | [42, 42]
# | ^
# | 
# | Input file: <stdin>
# | Check file: /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.src/mlir/test/Integration/GPU/CUDA/async.mlir
# | 
# | -dump-input=help explains the following input dump.
# | 
# | Input was:
# | <<<<<<
# |             1: Unranked Memref base@ = 0x5a5d2f61ce80 rank = 1 offset = 0 sizes = [2] strides = [1] data =  
# | check:68'0     X~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no match found
# |             2: [42, 42] 
# | check:68'0     ~~~~~~~~~
# | check:68'1     ?         possible intended match
...

@PrasoonMishra PrasoonMishra deleted the npm-upstream-patch branch November 28, 2025 09:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants