[AMDGPU][GlobalISel] Fix issue with copy_scc_vcc on gfx7 #165355

vangthao95 · 2025-10-28T08:04:10Z

When selecting for G_AMDGPU_COPY_SCC_VCC, we use S_CMP_LG_U64 or S_CMP_LG_U32 for wave64 and wave32 respectively. However, on gfx7 we do not have the S_CMP_LG_U64 instruction. Work around this issue by using S_OR_B64 instead.

llvmbot · 2025-10-28T08:04:36Z

@llvm/pr-subscribers-llvm-globalisel

@llvm/pr-subscribers-backend-amdgpu

Author: None (vangthao95)

Changes

When selecting for G_AMDGPU_COPY_SCC_VCC, we use S_CMP_LG_U64 or S_CMP_LG_U32 for wave64 and wave32 respectively. However, on gfx7 we do not have the S_CMP_LG_U64 instruction. Work around this issue by using S_OR_B64 instead.

Full diff: https://github.com/llvm/llvm-project/pull/165355.diff

2 Files Affected:

(modified) llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp (+16-5)
(added) llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-copy-scc-vcc.mir (+38)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp b/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
index 97c2c9c5316b3..1066601705d48 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
@@ -221,12 +221,23 @@ bool AMDGPUInstructionSelector::selectCOPY(MachineInstr &I) const {
 bool AMDGPUInstructionSelector::selectCOPY_SCC_VCC(MachineInstr &I) const {
   const DebugLoc &DL = I.getDebugLoc();
   MachineBasicBlock *BB = I.getParent();
+  Register VCCReg = I.getOperand(1).getReg();
+  MachineInstr *Cmp;
+
+  if (STI.getGeneration() >= AMDGPUSubtarget::VOLCANIC_ISLANDS) {
+    unsigned CmpOpc = STI.isWave64() ? AMDGPU::S_CMP_LG_U64 : AMDGPU::S_CMP_LG_U32;
+    Cmp = BuildMI(*BB, &I, DL, TII.get(CmpOpc))
+              .addReg(VCCReg)
+              .addImm(0);
+  } else {
+    // For gfx7 and earlier, S_CMP_LG_U64 doesn't exist, so we use S_OR_B64
+    // which sets SCC as a side effect.
+    Register DeadDst = MRI->createVirtualRegister(&AMDGPU::SReg_64RegClass);
+    Cmp = BuildMI(*BB, &I, DL, TII.get(AMDGPU::S_OR_B64), DeadDst)
+              .addReg(VCCReg)
+              .addReg(VCCReg);
+  }
 
-  unsigned CmpOpc =
-      STI.isWave64() ? AMDGPU::S_CMP_LG_U64 : AMDGPU::S_CMP_LG_U32;
-  MachineInstr *Cmp = BuildMI(*BB, &I, DL, TII.get(CmpOpc))
-                          .addReg(I.getOperand(1).getReg())
-                          .addImm(0);
   if (!constrainSelectedInstRegOperands(*Cmp, TII, TRI, RBI))
     return false;
 
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-copy-scc-vcc.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-copy-scc-vcc.mir
new file mode 100644
index 0000000000000..8b5fc565bdc1c
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-copy-scc-vcc.mir
@@ -0,0 +1,38 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
+# RUN: llc -mtriple=amdgcn -mcpu=gfx700 -run-pass=instruction-select -verify-machineinstrs %s -o - | FileCheck -check-prefix=GFX7 %s
+# RUN: llc -mtriple=amdgcn -mcpu=gfx803 -run-pass=instruction-select -verify-machineinstrs %s -o - | FileCheck -check-prefix=GFX8 %s
+# RUN: llc -mtriple=amdgcn -mcpu=gfx1100 -run-pass=instruction-select -verify-machineinstrs %s -o - | FileCheck -check-prefix=GFX11 %s
+
+---
+name: test_copy_scc_vcc
+legalized: true
+regBankSelected: true
+tracksRegLiveness: true
+body: |
+  bb.0:
+
+    ; GFX7-LABEL: name: test_copy_scc_vcc
+    ; GFX7: [[DEF:%[0-9]+]]:sreg_64_xexec = IMPLICIT_DEF
+    ; GFX7-NEXT: [[S_OR_B64_:%[0-9]+]]:sreg_64 = S_OR_B64 [[DEF]], [[DEF]], implicit-def $scc
+    ; GFX7-NEXT: [[COPY:%[0-9]+]]:sreg_32 = COPY $scc
+    ; GFX7-NEXT: $sgpr0 = COPY [[COPY]]
+    ; GFX7-NEXT: S_ENDPGM 0, implicit $sgpr0
+    ;
+    ; GFX8-LABEL: name: test_copy_scc_vcc
+    ; GFX8: [[DEF:%[0-9]+]]:sreg_64_xexec = IMPLICIT_DEF
+    ; GFX8-NEXT: S_CMP_LG_U64 [[DEF]], 0, implicit-def $scc
+    ; GFX8-NEXT: [[COPY:%[0-9]+]]:sreg_32 = COPY $scc
+    ; GFX8-NEXT: $sgpr0 = COPY [[COPY]]
+    ; GFX8-NEXT: S_ENDPGM 0, implicit $sgpr0
+    ;
+    ; GFX11-LABEL: name: test_copy_scc_vcc
+    ; GFX11: [[DEF:%[0-9]+]]:sreg_32_xm0_xexec = IMPLICIT_DEF
+    ; GFX11-NEXT: S_CMP_LG_U32 [[DEF]], 0, implicit-def $scc
+    ; GFX11-NEXT: [[COPY:%[0-9]+]]:sreg_32 = COPY $scc
+    ; GFX11-NEXT: $sgpr0 = COPY [[COPY]]
+    ; GFX11-NEXT: S_ENDPGM 0, implicit $sgpr0
+    %0:vcc(s1) = G_IMPLICIT_DEF
+    %1:sgpr(s32) = G_AMDGPU_COPY_SCC_VCC %0
+    $sgpr0 = COPY %1
+    S_ENDPGM 0, implicit $sgpr0
+...

github-actions · 2025-10-28T08:05:56Z

✅ With the latest revision this PR passed the C/C++ code formatter.

petar-avramovic · 2025-10-28T11:37:49Z

Can you add one ll test?
Also could we do the OR on newer targets also?

vangthao95 · 2025-10-28T17:54:47Z

Can you add one ll test? Also could we do the OR on newer targets also?

Sure, using OR for all targets makes sense. I also added one ll test.

jayfoad · 2025-10-29T09:41:10Z

Also could we do the OR on newer targets also?

Won't that waste an SGPR in some cases, since we need to allocate one for the dead dest of the OR?

petar-avramovic · 2025-10-29T09:58:00Z

Also could we do the OR on newer targets also?

Won't that waste an SGPR in some cases, since we need to allocate one for the dead dest of the OR?

True, @vangthao95 can you restore use of compare on new targets

llvm-ci · 2025-10-30T15:31:30Z

LLVM Buildbot has detected a new failure on builder llvm-clang-aarch64-darwin running on doug-worker-4 while building llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/190/builds/29994

Here is the relevant piece of the build log for the reference

Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'Clang-Unit :: ./AllClangUnitTests/25/49' FAILED ********************
Script(shard):
--
GTEST_OUTPUT=json:/Volumes/RAMDisk/buildbot-root/aarch64-darwin/build/tools/clang/unittests/./AllClangUnitTests-Clang-Unit-79788-25-49.json GTEST_SHUFFLE=0 GTEST_TOTAL_SHARDS=49 GTEST_SHARD_INDEX=25 /Volumes/RAMDisk/buildbot-root/aarch64-darwin/build/tools/clang/unittests/./AllClangUnitTests
--

Script:
--
/Volumes/RAMDisk/buildbot-root/aarch64-darwin/build/tools/clang/unittests/./AllClangUnitTests --gtest_filter=TimeProfilerTest.ConstantEvaluationCxx20
--
/Users/buildbot/buildbot-root/aarch64-darwin/llvm-project/clang/unittests/Support/TimeProfilerTest.cpp:247: Failure
Expected equality of these values:
  R"(
Frontend (test.cc)
| ParseDeclarationOrFunctionDefinition (test.cc:2:1)
| ParseDeclarationOrFunctionDefinition (test.cc:6:1)
| | ParseFunctionDefinition (slow_func)
| | | EvaluateAsRValue (<test.cc:8:21>)
| | | EvaluateForOverflow (<test.cc:8:21, col:25>)
| | | EvaluateForOverflow (<test.cc:8:30, col:32>)
| | | EvaluateAsRValue (<test.cc:9:14>)
| | | EvaluateForOverflow (<test.cc:9:9, col:14>)
| | | isPotentialConstantExpr (slow_namespace::slow_func)
| | | EvaluateAsBooleanCondition (<test.cc:8:21, col:25>)
| | | | EvaluateAsRValue (<test.cc:8:21, col:25>)
| | | EvaluateAsBooleanCondition (<test.cc:8:21, col:25>)
| | | | EvaluateAsRValue (<test.cc:8:21, col:25>)
| ParseDeclarationOrFunctionDefinition (test.cc:16:1)
| | ParseFunctionDefinition (slow_test)
| | | EvaluateAsInitializer (slow_value)
| | | EvaluateAsConstantExpr (<test.cc:17:33, col:59>)
| | | EvaluateAsConstantExpr (<test.cc:18:11, col:37>)
| ParseDeclarationOrFunctionDefinition (test.cc:22:1)
| | EvaluateAsConstantExpr (<test.cc:23:31, col:57>)
| | EvaluateAsRValue (<test.cc:22:14, line:23:58>)
| ParseDeclarationOrFunctionDefinition (test.cc:25:1)
| | EvaluateAsInitializer (slow_init_list)
| PerformPendingInstantiations
)"
    Which is: "\nFrontend (test.cc)\n| ParseDeclarationOrFunctionDefinition (test.cc:2:1)\n| ParseDeclarationOrFunctionDefinition (test.cc:6:1)\n| | ParseFunctionDefinition (slow_func)\n| | | EvaluateAsRValue (<test.cc:8:21>)\n| | | EvaluateForOverflow (<test.cc:8:21, col:25>)\n| | | EvaluateForOverflow (<test.cc:8:30, col:32>)\n| | | EvaluateAsRValue (<test.cc:9:14>)\n| | | EvaluateForOverflow (<test.cc:9:9, col:14>)\n| | | isPotentialConstantExpr (slow_namespace::slow_func)\n| | | EvaluateAsBooleanCondition (<test.cc:8:21, col:25>)\n| | | | EvaluateAsRValue (<test.cc:8:21, col:25>)\n| | | EvaluateAsBooleanCondition (<test.cc:8:21, col:25>)\n| | | | EvaluateAsRValue (<test.cc:8:21, col:25>)\n| ParseDeclarationOrFunctionDefinition (test.cc:16:1)\n| | ParseFunctionDefinition (slow_test)\n| | | EvaluateAsInitializer (slow_value)\n| | | EvaluateAsConstantExpr (<test.cc:17:33, col:59>)\n| | | EvaluateAsConstantExpr (<test.cc:18:11, col:37>)\n| ParseDeclarationOrFunctionDefinition (test.cc:22:1)\n| | EvaluateAsConstantExpr (<test.cc:23:31, col:57>)\n| | EvaluateAsRValue (<test.cc:22:14, line:23:58>)\n| ParseDeclarationOrFunctionDefinition (test.cc:25:1)\n| | EvaluateAsInitializer (slow_init_list)\n| PerformPendingInstantiations\n"
  buildTraceGraph(Json)
    Which is: "\nFrontend (test.cc)\n| ParseDeclarationOrFunctionDefinition (test.cc:2:1)\n| ParseDeclarationOrFunctionDefinition (test.cc:6:1)\n| | ParseFunctionDefinition (slow_func)\n| | | EvaluateAsRValue (<test.cc:8:21>)\n| | | EvaluateForOverflow (<test.cc:8:21, col:25>)\n| | | EvaluateForOverflow (<test.cc:8:30, col:32>)\n| | | EvaluateAsRValue (<test.cc:9:14>)\n| | | EvaluateForOverflow (<test.cc:9:9, col:14>)\n| | | isPotentialConstantExpr (slow_namespace::slow_func)\n| | | EvaluateAsBooleanCondition (<test.cc:8:21, col:25>)\n| | | | EvaluateAsRValue (<test.cc:8:21, col:25>)\n| | | | EvaluateAsBooleanCondition (<test.cc:8:21, col:25>)\n| | | | | EvaluateAsRValue (<test.cc:8:21, col:25>)\n| ParseDeclarationOrFunctionDefinition (test.cc:16:1)\n| | ParseFunctionDefinition (slow_test)\n| | | EvaluateAsInitializer (slow_value)\n| | | EvaluateAsConstantExpr (<test.cc:17:33, col:59>)\n| | | EvaluateAsConstantExpr (<test.cc:18:11, col:37>)\n| ParseDeclarationOrFunctionDefinition (test.cc:22:1)\n| | EvaluateAsConstantExpr (<test.cc:23:31, col:57>)\n| | EvaluateAsRValue (<test.cc:22:14, line:23:58>)\n| ParseDeclarationOrFunctionDefinition (test.cc:25:1)\n| | EvaluateAsInitializer (slow_init_list)\n| PerformPendingInstantiations\n"
With diff:
@@ -12,6 +12,6 @@
 | | | EvaluateAsBooleanCondition (<test.cc:8:21, col:25>)
 | | | | EvaluateAsRValue (<test.cc:8:21, col:25>)
-| | | EvaluateAsBooleanCondition (<test.cc:8:21, col:25>)
-| | | | EvaluateAsRValue (<test.cc:8:21, col:25>)
+| | | | EvaluateAsBooleanCondition (<test.cc:8:21, col:25>)
+| | | | | EvaluateAsRValue (<test.cc:8:21, col:25>)
...

When selecting for G_AMDGPU_COPY_SCC_VCC, we use S_CMP_LG_U64 or S_CMP_LG_U32 for wave64 and wave32 respectively. However, on gfx7 we do not have the S_CMP_LG_U64 instruction. Work around this issue by using S_OR_B64 instead.

arsenm · 2025-10-30T22:08:10Z

llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp

+  Register VCCReg = I.getOperand(1).getReg();
+  MachineInstr *Cmp;
+
+  if (STI.getGeneration() >= AMDGPUSubtarget::VOLCANIC_ISLANDS) {


Should be using ST.hasScalarCompareEq64()

arsenm · 2025-10-30T22:08:51Z

llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp

+    // For gfx7 and earlier, S_CMP_LG_U64 doesn't exist, so we use S_OR_B64
+    // which sets SCC as a side effect.


Both of these paths use sac

arsenm · 2025-10-30T22:09:02Z

llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-copy-scc-vcc.ll

+; RUN: llc -global-isel -new-reg-bank-select -mtriple=amdgcn -mcpu=gfx700 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX7 %s
+; RUN: llc -global-isel -new-reg-bank-select -mtriple=amdgcn -mcpu=gfx803 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX8 %s
+; RUN: llc -global-isel -new-reg-bank-select -mtriple=amdgcn -mcpu=gfx1100 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX11 %s


Don't need -verify-machineinstrs

arsenm · 2025-10-30T22:09:16Z

llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-copy-scc-vcc.mir

@@ -0,0 +1,37 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
+# RUN: llc -mtriple=amdgcn -mcpu=gfx700 -run-pass=instruction-select -verify-machineinstrs %s -o - | FileCheck -check-prefixes=GFX7 %s


Don't need -verify-machineinstrs

Follow-up patch to address the comments in llvm#165355.

Follow-up patch to address the comments in #165355.

…(#165797) Follow-up patch to address the comments in llvm/llvm-project#165355.

When selecting for G_AMDGPU_COPY_SCC_VCC, we use S_CMP_LG_U64 or S_CMP_LG_U32 for wave64 and wave32 respectively. However, on gfx7 we do not have the S_CMP_LG_U64 instruction. Work around this issue by using S_OR_B64 instead.

[AMDGPU][GlobalISel] Fix issue with copy_scc_vcc on gfx7

0e86009

When selecting for G_AMDGPU_COPY_SCC_VCC, we use S_CMP_LG_U64 or S_CMP_LG_U32 for wave64 and wave32 respectively. However, on gfx7 we do not have the S_CMP_LG_U64 instruction. Work around this issue by using S_OR_B64 instead.

vangthao95 requested review from arsenm and petar-avramovic October 28, 2025 08:04

vangthao95 added backend:AMDGPU llvm:globalisel labels Oct 28, 2025

Fix formatting

bbae7b3

Add ll test and change to OR instruction on newer targets also.

74a2bea

On newer targets, revert back to using s_cmp

47acd93

petar-avramovic approved these changes Oct 30, 2025

View reviewed changes

vangthao95 merged commit ba5cde7 into llvm:main Oct 30, 2025
10 checks passed

vangthao95 deleted the fix-instselect-copy-scc-vcc-gfx7 branch October 30, 2025 15:19

arsenm reviewed Oct 30, 2025

View reviewed changes

vangthao95 added a commit to vangthao95/llvm-project that referenced this pull request Oct 30, 2025

[AMDGPU][GlobalISel] Clean up selectCOPY_SCC_VCC function

8316b73

Follow-up patch to address the comments in llvm#165355.

vangthao95 mentioned this pull request Oct 30, 2025

[AMDGPU][GlobalISel] Clean up selectCOPY_SCC_VCC function #165797

Merged

vangthao95 added a commit that referenced this pull request Oct 31, 2025

[AMDGPU][GlobalISel] Clean up selectCOPY_SCC_VCC function (#165797)

d1d6350

Follow-up patch to address the comments in #165355.

llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Oct 31, 2025

Automerge: [AMDGPU][GlobalISel] Clean up selectCOPY_SCC_VCC function …

d2745a9

…(#165797) Follow-up patch to address the comments in llvm/llvm-project#165355.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[AMDGPU][GlobalISel] Fix issue with copy_scc_vcc on gfx7 #165355

[AMDGPU][GlobalISel] Fix issue with copy_scc_vcc on gfx7 #165355

vangthao95 commented Oct 28, 2025

Uh oh!

llvmbot commented Oct 28, 2025 •

edited

Loading

Uh oh!

github-actions bot commented Oct 28, 2025 •

edited

Loading

Uh oh!

petar-avramovic commented Oct 28, 2025

Uh oh!

vangthao95 commented Oct 28, 2025

Uh oh!

jayfoad commented Oct 29, 2025

Uh oh!

petar-avramovic commented Oct 29, 2025

Uh oh!

Uh oh!

llvm-ci commented Oct 30, 2025

Uh oh!

arsenm Oct 30, 2025

Uh oh!

arsenm Oct 30, 2025

Uh oh!

arsenm Oct 30, 2025

Uh oh!

arsenm Oct 30, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

		// For gfx7 and earlier, S_CMP_LG_U64 doesn't exist, so we use S_OR_B64
		// which sets SCC as a side effect.

		@@ -0,0 +1,37 @@
		# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
		# RUN: llc -mtriple=amdgcn -mcpu=gfx700 -run-pass=instruction-select -verify-machineinstrs %s -o - \| FileCheck -check-prefixes=GFX7 %s

[AMDGPU][GlobalISel] Fix issue with copy_scc_vcc on gfx7 #165355

[AMDGPU][GlobalISel] Fix issue with copy_scc_vcc on gfx7 #165355

Conversation

vangthao95 commented Oct 28, 2025

Uh oh!

llvmbot commented Oct 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions bot commented Oct 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

petar-avramovic commented Oct 28, 2025

Uh oh!

vangthao95 commented Oct 28, 2025

Uh oh!

jayfoad commented Oct 29, 2025

Uh oh!

petar-avramovic commented Oct 29, 2025

Uh oh!

Uh oh!

llvm-ci commented Oct 30, 2025

Uh oh!

arsenm Oct 30, 2025

Choose a reason for hiding this comment

Uh oh!

arsenm Oct 30, 2025

Choose a reason for hiding this comment

Uh oh!

arsenm Oct 30, 2025

Choose a reason for hiding this comment

Uh oh!

arsenm Oct 30, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

llvmbot commented Oct 28, 2025 •

edited

Loading

github-actions bot commented Oct 28, 2025 •

edited

Loading