GlobalISel: Relax verifier between physreg and typed vreg #159281

arsenm · 2025-09-17T08:17:37Z

Accept mismatched register size and type size if the type is legal
for the register class.

For AMDGPU boolean registers have 2 possible interpretations depending
on the use context type. e.g., these are both equally valid:

%0:(s1) = COPY $vcc
%1:(s64) = COPY $vcc

vcc is a 64-bit register, which can be interpreted as a 1-bit or 64-bit
value depending on the use context. SelectionDAG has never required exact
match between the register size and the used value type. You can assign
a type with a smaller size to a larger register class. Relax the verifier
to match. There are several hacks holding together these copies in
various places, and this is preparation to remove one of them.

The x86 test change is from what I would consider an X86 usage bug. X86
defines an FR32 register class and F16 register class, but the F16 register
class is functionally an alias of F32 with the same members and size. There's
no need to have the F16 class.

Accept mismatched register size and type size if the type is legal for the register class. For AMDGPU boolean registers have 2 possible interpretations depending on the use context type. e.g., these are both equally valid: %0:_(s1) = COPY $vcc %1:_(s64) = COPY $vcc vcc is a 64-bit register, which can be interpreted as a 1-bit or 64-bit value depending on the use context. SelectionDAG has never required exact match between the register size and the used value type. You can assign a type with a smaller size to a larger register class. Relax the verifier to match. There are several hacks holding together these copies in various places, and this is preparation to remove one of them. The x86 test change is from what I would consider an X86 usage bug. X86 defines an FR32 register class and F16 register class, but the F16 register class is functionally an alias of F32 with the same members and size. There's no need to have the F16 class.

arsenm · 2025-09-17T08:17:54Z

This stack of pull requests is managed by Graphite. Learn more about stacking.

llvmbot · 2025-09-17T08:18:14Z

@llvm/pr-subscribers-backend-amdgpu

@llvm/pr-subscribers-llvm-globalisel

Author: Matt Arsenault (arsenm)

Changes

Accept mismatched register size and type size if the type is legal
for the register class.

For AMDGPU boolean registers have 2 possible interpretations depending
on the use context type. e.g., these are both equally valid:

%0:(s1) = COPY $vcc
%1:(s64) = COPY $vcc

vcc is a 64-bit register, which can be interpreted as a 1-bit or 64-bit
value depending on the use context. SelectionDAG has never required exact
match between the register size and the used value type. You can assign
a type with a smaller size to a larger register class. Relax the verifier
to match. There are several hacks holding together these copies in
various places, and this is preparation to remove one of them.

The x86 test change is from what I would consider an X86 usage bug. X86
defines an FR32 register class and F16 register class, but the F16 register
class is functionally an alias of F32 with the same members and size. There's
no need to have the F16 class.

Full diff: https://github.com/llvm/llvm-project/pull/159281.diff

3 Files Affected:

(modified) llvm/lib/CodeGen/MachineVerifier.cpp (+10-6)
(added) llvm/test/MachineVerifier/AMDGPU/test_copy_physregs_llt_virtreg.mir (+58)
(modified) llvm/test/MachineVerifier/test_copy_physregs_x86.mir (+12-20)

diff --git a/llvm/lib/CodeGen/MachineVerifier.cpp b/llvm/lib/CodeGen/MachineVerifier.cpp
index 2b24fe49c970b..e911ce8a75828 100644
--- a/llvm/lib/CodeGen/MachineVerifier.cpp
+++ b/llvm/lib/CodeGen/MachineVerifier.cpp
@@ -2376,20 +2376,24 @@ void MachineVerifier::visitMachineInstrBefore(const MachineInstr *MI) {
 
     // If we have only one valid type, this is likely a copy between a virtual
     // and physical register.
-    TypeSize SrcSize = TRI->getRegSizeInBits(SrcReg, *MRI);
-    TypeSize DstSize = TRI->getRegSizeInBits(DstReg, *MRI);
+    TypeSize SrcSize = TypeSize::getZero();
+    TypeSize DstSize = TypeSize::getZero();
     if (SrcReg.isPhysical() && DstTy.isValid()) {
       const TargetRegisterClass *SrcRC =
           TRI->getMinimalPhysRegClassLLT(SrcReg, DstTy);
-      if (SrcRC)
-        SrcSize = TRI->getRegSizeInBits(*SrcRC);
+      if (!SrcRC)
+        SrcSize = TRI->getRegSizeInBits(SrcReg, *MRI);
+    } else {
+      SrcSize = TRI->getRegSizeInBits(SrcReg, *MRI);
     }
 
     if (DstReg.isPhysical() && SrcTy.isValid()) {
       const TargetRegisterClass *DstRC =
           TRI->getMinimalPhysRegClassLLT(DstReg, SrcTy);
-      if (DstRC)
-        DstSize = TRI->getRegSizeInBits(*DstRC);
+      if (!DstRC)
+        DstSize = TRI->getRegSizeInBits(DstReg, *MRI);
+    } else {
+      DstSize = TRI->getRegSizeInBits(DstReg, *MRI);
     }
 
     // The next two checks allow COPY between physical and virtual registers,
diff --git a/llvm/test/MachineVerifier/AMDGPU/test_copy_physregs_llt_virtreg.mir b/llvm/test/MachineVerifier/AMDGPU/test_copy_physregs_llt_virtreg.mir
new file mode 100644
index 0000000000000..0fd50391a7e3a
--- /dev/null
+++ b/llvm/test/MachineVerifier/AMDGPU/test_copy_physregs_llt_virtreg.mir
@@ -0,0 +1,58 @@
+# RUN: not --crash llc -mtriple=amdgcn -run-pass=none -filetype=null %s 2>&1 | FileCheck -implicit-check-not="Bad machine code" %s
+
+---
+name:            test_valid_copies
+tracksRegLiveness: true
+body:             |
+  bb.0:
+    liveins: $vgpr0, $vcc
+    %0:_(s32) = COPY $vgpr0
+    %1:_(s16) = COPY $vgpr0
+    %2:_(s64) = COPY $vcc
+    %3:_(s1) = COPY $vcc
+    $vgpr0 = COPY %0
+    $vgpr0 = COPY %0
+    $vcc = COPY %2
+    $vcc = COPY %3
+...
+
+---
+name:            test_invalid_copies
+tracksRegLiveness: true
+body:             |
+  bb.0:
+  liveins: $vgpr0_vgpr1, $vgpr2, $vcc
+
+  ; CHECK: *** Bad machine code: Copy Instruction is illegal with mismatching sizes ***
+  ; CHECK: - instruction: %0:_(s32) = COPY $vgpr0_vgpr1
+  %0:_(s32) = COPY $vgpr0_vgpr1
+
+  ; CHECK: *** Bad machine code: Copy Instruction is illegal with mismatching sizes ***
+  ; CHECK: - instruction: %1:_(s64) = COPY $vgpr2
+  %1:_(s64) = COPY $vgpr2
+
+  ; CHECK: *** Bad machine code: Copy Instruction is illegal with mismatching sizes ***
+  ; CHECK: - instruction: %2:_(s32) = COPY $vcc
+  %2:_(s32) = COPY $vcc
+
+  ; CHECK: *** Bad machine code: Copy Instruction is illegal with mismatching sizes ***
+  ; CHECK: - instruction: %3:_(s8) = COPY $vgpr2
+  %3:_(s8) = COPY $vgpr2
+
+  ; CHECK: *** Bad machine code: Copy Instruction is illegal with mismatching sizes ***
+  ; CHECK: - instruction: $vgpr0_vgpr1 = COPY %0:_(s32)
+  $vgpr0_vgpr1 = COPY %0
+
+  ; CHECK: *** Bad machine code: Copy Instruction is illegal with mismatching sizes ***
+  ; CHECK: - instruction: $vgpr2 = COPY %1:_(s64)
+  $vgpr2 = COPY %1
+
+  ; CHECK: *** Bad machine code: Copy Instruction is illegal with mismatching sizes ***
+  ; CHECK: - instruction: $vcc = COPY %2:_(s32)
+  $vcc = COPY %2
+
+  ; CHECK: *** Bad machine code: Copy Instruction is illegal with mismatching sizes ***
+  ; CHECK: - instruction: $vgpr2 = COPY %3:_(s8)
+  $vgpr2 = COPY %3
+
+...
diff --git a/llvm/test/MachineVerifier/test_copy_physregs_x86.mir b/llvm/test/MachineVerifier/test_copy_physregs_x86.mir
index a239379a34e62..f3323c4353142 100644
--- a/llvm/test/MachineVerifier/test_copy_physregs_x86.mir
+++ b/llvm/test/MachineVerifier/test_copy_physregs_x86.mir
@@ -29,34 +29,26 @@ body:             |
     liveins: $xmm0, $xmm1, $xmm2, $xmm3
 
     ; CHECK: *** Bad machine code: Copy Instruction is illegal with mismatching sizes ***
-    ; CHECK: - instruction: %0:_(s16) = COPY $xmm0
-    %0:_(s16) = COPY $xmm0
+    ; CHECK: - instruction: %0:_(<4 x s16>) = COPY $xmm1
+    %0:_(<4 x s16>) = COPY $xmm1
 
     ; CHECK: *** Bad machine code: Copy Instruction is illegal with mismatching sizes ***
-    ; CHECK: - instruction: %1:_(<4 x s16>) = COPY $xmm1
-    %1:_(<4 x s16>) = COPY $xmm1
+    ; CHECK: - instruction: %1:_(s256) = COPY $xmm2
+    %1:_(s256) = COPY $xmm2
 
     ; CHECK: *** Bad machine code: Copy Instruction is illegal with mismatching sizes ***
-    ; CHECK: - instruction: %2:_(s256) = COPY $xmm2
-    %2:_(s256) = COPY $xmm2
+    ; CHECK: - instruction: %2:_(<8 x s32>) = COPY $xmm3
+    %2:_(<8 x s32>) = COPY $xmm3
 
     ; CHECK: *** Bad machine code: Copy Instruction is illegal with mismatching sizes ***
-    ; CHECK: - instruction: %3:_(<8 x s32>) = COPY $xmm3
-    %3:_(<8 x s32>) = COPY $xmm3
+    ; CHECK: - instruction: $xmm1 = COPY %0:_(<4 x s16>)
+    $xmm1 = COPY %0
 
     ; CHECK: *** Bad machine code: Copy Instruction is illegal with mismatching sizes ***
-    ; CHECK: - instruction: $xmm0 = COPY %0:_(s16)
-    $xmm0 = COPY %0
-
-    ; CHECK: *** Bad machine code: Copy Instruction is illegal with mismatching sizes ***
-    ; CHECK: - instruction: $xmm1 = COPY %1:_(<4 x s16>)
-    $xmm1 = COPY %1
+    ; CHECK: - instruction: $xmm2 = COPY %1:_(s256)
+    $xmm2 = COPY %1
 
     ; CHECK: *** Bad machine code: Copy Instruction is illegal with mismatching sizes ***
-    ; CHECK: - instruction: $xmm2 = COPY %2:_(s256)
-    $xmm2 = COPY %2
-
-    ; CHECK: *** Bad machine code: Copy Instruction is illegal with mismatching sizes ***
-    ; CHECK: - instruction: $xmm3 = COPY %3:_(<8 x s32>)
-    $xmm3 = COPY %3
+    ; CHECK: - instruction: $xmm3 = COPY %2:_(<8 x s32>)
+    $xmm3 = COPY %2
 ...

petar-avramovic · 2025-09-17T08:48:13Z

%0:(s1) = COPY $vcc looks more convenient then s64, why is copy better then doing %0:(s1) = call i1 @llvm.amdgcn.inverse.ballot.i64 $vcc
Why do we need to use $vcc explicitly in GlobalISel?

arsenm · 2025-09-17T08:51:03Z

%0:(s1) = COPY $vcc looks more convenient then s64, why is copy better then doing %0:(s1) = call i1 @llvm.amdgcn.inverse.ballot.i64 $vcc Why do we need to use $vcc explicitly in GlobalISel?

Not vcc specifically, but this is to support lowering i1 argument values to lanemask SGPRs

petar-avramovic · 2025-09-17T09:04:31Z

Ok, looks good so far. Does uniformity analysis correctly calculate if argument is uniform or divergent? If not it would be easier to use ballot or inverse ballot in argument lowering and normal copy

arsenm · 2025-09-17T09:59:46Z

Ok, looks good so far. Does uniformity analysis correctly calculate if argument is uniform or divergent? If not it would be easier to use ballot or inverse ballot in argument lowering and normal copy

I'd need to check, but it's not important for the purposes of this patch. Right now I'm just trying to avoid breaking existing tests using these copies after I rip out one of the wavesize hacks supporting them

Accept mismatched register size and type size if the type is legal for the register class. For AMDGPU boolean registers have 2 possible interpretations depending on the use context type. e.g., these are both equally valid: %0:_(s1) = COPY $vcc %1:_(s64) = COPY $vcc vcc is a 64-bit register, which can be interpreted as a 1-bit or 64-bit value depending on the use context. SelectionDAG has never required exact match between the register size and the used value type. You can assign a type with a smaller size to a larger register class. Relax the verifier to match. There are several hacks holding together these copies in various places, and this is preparation to remove one of them. The x86 test change is from what I would consider an X86 usage bug. X86 defines an FR32 register class and F16 register class, but the F16 register class is functionally an alias of F32 with the same members and size. There's no need to have the F16 class.

arsenm added the llvm:globalisel label Sep 17, 2025 — with Graphite App

arsenm requested review from aemerson, davemgreen, gandhi56, jayfoad and petar-avramovic September 17, 2025 08:18

arsenm marked this pull request as ready for review September 17, 2025 08:18

llvmbot added backend:AMDGPU llvm:codegen labels Sep 17, 2025

arsenm mentioned this pull request Sep 17, 2025

AMDGPU: Fixes for regbankselecting copies of i1 physregs to sgprs #159283

Merged

petar-avramovic approved these changes Sep 17, 2025

View reviewed changes

arsenm merged commit 1dbb932 into main Sep 17, 2025
15 checks passed

arsenm deleted the users/arsenm/global-isel/relax-verifier-copy-physreg-mismatched-types branch September 17, 2025 10:43

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

GlobalISel: Relax verifier between physreg and typed vreg #159281

GlobalISel: Relax verifier between physreg and typed vreg #159281

Uh oh!

arsenm commented Sep 17, 2025

Uh oh!

arsenm commented Sep 17, 2025 •

edited

Loading

Uh oh!

llvmbot commented Sep 17, 2025 •

edited

Loading

Uh oh!

petar-avramovic commented Sep 17, 2025

Uh oh!

arsenm commented Sep 17, 2025

Uh oh!

petar-avramovic commented Sep 17, 2025

Uh oh!

arsenm commented Sep 17, 2025

Uh oh!

Uh oh!

Uh oh!

GlobalISel: Relax verifier between physreg and typed vreg #159281

GlobalISel: Relax verifier between physreg and typed vreg #159281

Uh oh!

Conversation

arsenm commented Sep 17, 2025

Uh oh!

arsenm commented Sep 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Sep 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

petar-avramovic commented Sep 17, 2025

Uh oh!

arsenm commented Sep 17, 2025

Uh oh!

petar-avramovic commented Sep 17, 2025

Uh oh!

arsenm commented Sep 17, 2025

Uh oh!

Uh oh!

Uh oh!

arsenm commented Sep 17, 2025 •

edited

Loading

llvmbot commented Sep 17, 2025 •

edited

Loading