Skip to content

Conversation

arsenm
Copy link
Contributor

@arsenm arsenm commented Sep 17, 2025

Accept mismatched register size and type size if the type is legal
for the register class.

For AMDGPU boolean registers have 2 possible interpretations depending
on the use context type. e.g., these are both equally valid:

%0:(s1) = COPY $vcc
%1:
(s64) = COPY $vcc

vcc is a 64-bit register, which can be interpreted as a 1-bit or 64-bit
value depending on the use context. SelectionDAG has never required exact
match between the register size and the used value type. You can assign
a type with a smaller size to a larger register class. Relax the verifier
to match. There are several hacks holding together these copies in
various places, and this is preparation to remove one of them.

The x86 test change is from what I would consider an X86 usage bug. X86
defines an FR32 register class and F16 register class, but the F16 register
class is functionally an alias of F32 with the same members and size. There's
no need to have the F16 class.

Accept mismatched register size and type size if the type is legal
for the register class.

For AMDGPU boolean registers have 2 possible interpretations depending
on the use context type. e.g., these are both equally valid:

  %0:_(s1) = COPY $vcc
  %1:_(s64) = COPY $vcc

vcc is a 64-bit register, which can be interpreted as a 1-bit or 64-bit
value depending on the use context. SelectionDAG has never required exact
match between the register size and the used value type. You can assign
a type with a smaller size to a larger register class. Relax the verifier
to match.  There are several hacks holding together these copies in
various places, and this is preparation to remove one of them.

The x86 test change is from what I would consider an X86 usage bug. X86
defines an FR32 register class and F16 register class, but the F16 register
class is functionally an alias of F32 with the same members and size. There's
no need to have the F16 class.
Copy link
Contributor Author

arsenm commented Sep 17, 2025

@llvmbot
Copy link
Member

llvmbot commented Sep 17, 2025

@llvm/pr-subscribers-backend-amdgpu

@llvm/pr-subscribers-llvm-globalisel

Author: Matt Arsenault (arsenm)

Changes

Accept mismatched register size and type size if the type is legal
for the register class.

For AMDGPU boolean registers have 2 possible interpretations depending
on the use context type. e.g., these are both equally valid:

%0:(s1) = COPY $vcc
%1:
(s64) = COPY $vcc

vcc is a 64-bit register, which can be interpreted as a 1-bit or 64-bit
value depending on the use context. SelectionDAG has never required exact
match between the register size and the used value type. You can assign
a type with a smaller size to a larger register class. Relax the verifier
to match. There are several hacks holding together these copies in
various places, and this is preparation to remove one of them.

The x86 test change is from what I would consider an X86 usage bug. X86
defines an FR32 register class and F16 register class, but the F16 register
class is functionally an alias of F32 with the same members and size. There's
no need to have the F16 class.


Full diff: https://github.com/llvm/llvm-project/pull/159281.diff

3 Files Affected:

  • (modified) llvm/lib/CodeGen/MachineVerifier.cpp (+10-6)
  • (added) llvm/test/MachineVerifier/AMDGPU/test_copy_physregs_llt_virtreg.mir (+58)
  • (modified) llvm/test/MachineVerifier/test_copy_physregs_x86.mir (+12-20)
diff --git a/llvm/lib/CodeGen/MachineVerifier.cpp b/llvm/lib/CodeGen/MachineVerifier.cpp
index 2b24fe49c970b..e911ce8a75828 100644
--- a/llvm/lib/CodeGen/MachineVerifier.cpp
+++ b/llvm/lib/CodeGen/MachineVerifier.cpp
@@ -2376,20 +2376,24 @@ void MachineVerifier::visitMachineInstrBefore(const MachineInstr *MI) {
 
     // If we have only one valid type, this is likely a copy between a virtual
     // and physical register.
-    TypeSize SrcSize = TRI->getRegSizeInBits(SrcReg, *MRI);
-    TypeSize DstSize = TRI->getRegSizeInBits(DstReg, *MRI);
+    TypeSize SrcSize = TypeSize::getZero();
+    TypeSize DstSize = TypeSize::getZero();
     if (SrcReg.isPhysical() && DstTy.isValid()) {
       const TargetRegisterClass *SrcRC =
           TRI->getMinimalPhysRegClassLLT(SrcReg, DstTy);
-      if (SrcRC)
-        SrcSize = TRI->getRegSizeInBits(*SrcRC);
+      if (!SrcRC)
+        SrcSize = TRI->getRegSizeInBits(SrcReg, *MRI);
+    } else {
+      SrcSize = TRI->getRegSizeInBits(SrcReg, *MRI);
     }
 
     if (DstReg.isPhysical() && SrcTy.isValid()) {
       const TargetRegisterClass *DstRC =
           TRI->getMinimalPhysRegClassLLT(DstReg, SrcTy);
-      if (DstRC)
-        DstSize = TRI->getRegSizeInBits(*DstRC);
+      if (!DstRC)
+        DstSize = TRI->getRegSizeInBits(DstReg, *MRI);
+    } else {
+      DstSize = TRI->getRegSizeInBits(DstReg, *MRI);
     }
 
     // The next two checks allow COPY between physical and virtual registers,
diff --git a/llvm/test/MachineVerifier/AMDGPU/test_copy_physregs_llt_virtreg.mir b/llvm/test/MachineVerifier/AMDGPU/test_copy_physregs_llt_virtreg.mir
new file mode 100644
index 0000000000000..0fd50391a7e3a
--- /dev/null
+++ b/llvm/test/MachineVerifier/AMDGPU/test_copy_physregs_llt_virtreg.mir
@@ -0,0 +1,58 @@
+# RUN: not --crash llc -mtriple=amdgcn -run-pass=none -filetype=null %s 2>&1 | FileCheck -implicit-check-not="Bad machine code" %s
+
+---
+name:            test_valid_copies
+tracksRegLiveness: true
+body:             |
+  bb.0:
+    liveins: $vgpr0, $vcc
+    %0:_(s32) = COPY $vgpr0
+    %1:_(s16) = COPY $vgpr0
+    %2:_(s64) = COPY $vcc
+    %3:_(s1) = COPY $vcc
+    $vgpr0 = COPY %0
+    $vgpr0 = COPY %0
+    $vcc = COPY %2
+    $vcc = COPY %3
+...
+
+---
+name:            test_invalid_copies
+tracksRegLiveness: true
+body:             |
+  bb.0:
+  liveins: $vgpr0_vgpr1, $vgpr2, $vcc
+
+  ; CHECK: *** Bad machine code: Copy Instruction is illegal with mismatching sizes ***
+  ; CHECK: - instruction: %0:_(s32) = COPY $vgpr0_vgpr1
+  %0:_(s32) = COPY $vgpr0_vgpr1
+
+  ; CHECK: *** Bad machine code: Copy Instruction is illegal with mismatching sizes ***
+  ; CHECK: - instruction: %1:_(s64) = COPY $vgpr2
+  %1:_(s64) = COPY $vgpr2
+
+  ; CHECK: *** Bad machine code: Copy Instruction is illegal with mismatching sizes ***
+  ; CHECK: - instruction: %2:_(s32) = COPY $vcc
+  %2:_(s32) = COPY $vcc
+
+  ; CHECK: *** Bad machine code: Copy Instruction is illegal with mismatching sizes ***
+  ; CHECK: - instruction: %3:_(s8) = COPY $vgpr2
+  %3:_(s8) = COPY $vgpr2
+
+  ; CHECK: *** Bad machine code: Copy Instruction is illegal with mismatching sizes ***
+  ; CHECK: - instruction: $vgpr0_vgpr1 = COPY %0:_(s32)
+  $vgpr0_vgpr1 = COPY %0
+
+  ; CHECK: *** Bad machine code: Copy Instruction is illegal with mismatching sizes ***
+  ; CHECK: - instruction: $vgpr2 = COPY %1:_(s64)
+  $vgpr2 = COPY %1
+
+  ; CHECK: *** Bad machine code: Copy Instruction is illegal with mismatching sizes ***
+  ; CHECK: - instruction: $vcc = COPY %2:_(s32)
+  $vcc = COPY %2
+
+  ; CHECK: *** Bad machine code: Copy Instruction is illegal with mismatching sizes ***
+  ; CHECK: - instruction: $vgpr2 = COPY %3:_(s8)
+  $vgpr2 = COPY %3
+
+...
diff --git a/llvm/test/MachineVerifier/test_copy_physregs_x86.mir b/llvm/test/MachineVerifier/test_copy_physregs_x86.mir
index a239379a34e62..f3323c4353142 100644
--- a/llvm/test/MachineVerifier/test_copy_physregs_x86.mir
+++ b/llvm/test/MachineVerifier/test_copy_physregs_x86.mir
@@ -29,34 +29,26 @@ body:             |
     liveins: $xmm0, $xmm1, $xmm2, $xmm3
 
     ; CHECK: *** Bad machine code: Copy Instruction is illegal with mismatching sizes ***
-    ; CHECK: - instruction: %0:_(s16) = COPY $xmm0
-    %0:_(s16) = COPY $xmm0
+    ; CHECK: - instruction: %0:_(<4 x s16>) = COPY $xmm1
+    %0:_(<4 x s16>) = COPY $xmm1
 
     ; CHECK: *** Bad machine code: Copy Instruction is illegal with mismatching sizes ***
-    ; CHECK: - instruction: %1:_(<4 x s16>) = COPY $xmm1
-    %1:_(<4 x s16>) = COPY $xmm1
+    ; CHECK: - instruction: %1:_(s256) = COPY $xmm2
+    %1:_(s256) = COPY $xmm2
 
     ; CHECK: *** Bad machine code: Copy Instruction is illegal with mismatching sizes ***
-    ; CHECK: - instruction: %2:_(s256) = COPY $xmm2
-    %2:_(s256) = COPY $xmm2
+    ; CHECK: - instruction: %2:_(<8 x s32>) = COPY $xmm3
+    %2:_(<8 x s32>) = COPY $xmm3
 
     ; CHECK: *** Bad machine code: Copy Instruction is illegal with mismatching sizes ***
-    ; CHECK: - instruction: %3:_(<8 x s32>) = COPY $xmm3
-    %3:_(<8 x s32>) = COPY $xmm3
+    ; CHECK: - instruction: $xmm1 = COPY %0:_(<4 x s16>)
+    $xmm1 = COPY %0
 
     ; CHECK: *** Bad machine code: Copy Instruction is illegal with mismatching sizes ***
-    ; CHECK: - instruction: $xmm0 = COPY %0:_(s16)
-    $xmm0 = COPY %0
-
-    ; CHECK: *** Bad machine code: Copy Instruction is illegal with mismatching sizes ***
-    ; CHECK: - instruction: $xmm1 = COPY %1:_(<4 x s16>)
-    $xmm1 = COPY %1
+    ; CHECK: - instruction: $xmm2 = COPY %1:_(s256)
+    $xmm2 = COPY %1
 
     ; CHECK: *** Bad machine code: Copy Instruction is illegal with mismatching sizes ***
-    ; CHECK: - instruction: $xmm2 = COPY %2:_(s256)
-    $xmm2 = COPY %2
-
-    ; CHECK: *** Bad machine code: Copy Instruction is illegal with mismatching sizes ***
-    ; CHECK: - instruction: $xmm3 = COPY %3:_(<8 x s32>)
-    $xmm3 = COPY %3
+    ; CHECK: - instruction: $xmm3 = COPY %2:_(<8 x s32>)
+    $xmm3 = COPY %2
 ...

@petar-avramovic
Copy link
Collaborator

%0:(s1) = COPY $vcc looks more convenient then s64, why is copy better then doing %0:(s1) = call i1 @llvm.amdgcn.inverse.ballot.i64 $vcc
Why do we need to use $vcc explicitly in GlobalISel?

@arsenm
Copy link
Contributor Author

arsenm commented Sep 17, 2025

%0:(s1) = COPY $vcc looks more convenient then s64, why is copy better then doing %0:(s1) = call i1 @llvm.amdgcn.inverse.ballot.i64 $vcc Why do we need to use $vcc explicitly in GlobalISel?

Not vcc specifically, but this is to support lowering i1 argument values to lanemask SGPRs

@petar-avramovic
Copy link
Collaborator

Ok, looks good so far. Does uniformity analysis correctly calculate if argument is uniform or divergent? If not it would be easier to use ballot or inverse ballot in argument lowering and normal copy

@arsenm
Copy link
Contributor Author

arsenm commented Sep 17, 2025

Ok, looks good so far. Does uniformity analysis correctly calculate if argument is uniform or divergent? If not it would be easier to use ballot or inverse ballot in argument lowering and normal copy

I'd need to check, but it's not important for the purposes of this patch. Right now I'm just trying to avoid breaking existing tests using these copies after I rip out one of the wavesize hacks supporting them

@arsenm arsenm merged commit 1dbb932 into main Sep 17, 2025
15 checks passed
@arsenm arsenm deleted the users/arsenm/global-isel/relax-verifier-copy-physreg-mismatched-types branch September 17, 2025 10:43
kimsh02 pushed a commit to kimsh02/llvm-project that referenced this pull request Sep 19, 2025
Accept mismatched register size and type size if the type is legal
for the register class.

For AMDGPU boolean registers have 2 possible interpretations depending
on the use context type. e.g., these are both equally valid:

  %0:_(s1) = COPY $vcc
  %1:_(s64) = COPY $vcc

vcc is a 64-bit register, which can be interpreted as a 1-bit or 64-bit
value depending on the use context. SelectionDAG has never required
exact
match between the register size and the used value type. You can assign
a type with a smaller size to a larger register class. Relax the
verifier
to match.  There are several hacks holding together these copies in
various places, and this is preparation to remove one of them.

The x86 test change is from what I would consider an X86 usage bug. X86
defines an FR32 register class and F16 register class, but the F16
register
class is functionally an alias of F32 with the same members and size.
There's
no need to have the F16 class.
SeongjaeP pushed a commit to SeongjaeP/llvm-project that referenced this pull request Sep 23, 2025
Accept mismatched register size and type size if the type is legal
for the register class.

For AMDGPU boolean registers have 2 possible interpretations depending
on the use context type. e.g., these are both equally valid:

  %0:_(s1) = COPY $vcc
  %1:_(s64) = COPY $vcc

vcc is a 64-bit register, which can be interpreted as a 1-bit or 64-bit
value depending on the use context. SelectionDAG has never required
exact
match between the register size and the used value type. You can assign
a type with a smaller size to a larger register class. Relax the
verifier
to match.  There are several hacks holding together these copies in
various places, and this is preparation to remove one of them.

The x86 test change is from what I would consider an X86 usage bug. X86
defines an FR32 register class and F16 register class, but the F16
register
class is functionally an alias of F32 with the same members and size.
There's
no need to have the F16 class.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants