[X86][AVX512] Add pseudos for `AVX512_*_SETALLONES` #169009

abhishek-kaushik22 · 2025-11-21T07:07:28Z

Introduce AVX512_128_SETALLONES, AVX512_256_SETALLONES pseudos to generate all-ones vectors.

Post-RA expansion:

- Use VEX `vpcmpeqd` for XMM/YMM0–15 when available (matches current codegen, as `AVX512_128/256_SETALLONES` will be preferred over `AVX1/2_SETALLONES` for AVX512VL targets).
- Use EVEX `vpternlogd imm=0xFF` for high regs.

Includes MIR tests for both VEX and EVEX paths.
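
For readers unfamiliar with the two idioms, the expansion rule described above can be modelled outside LLVM. The following is a minimal standalone sketch, not the patch's code: `pickAllOnesExpansion` and `ternlog32` are hypothetical helper names, registers are represented by their plain index (0 to 31), and the ternary-logic model covers a single 32-bit lane only.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical model of the post-RA choice (not LLVM code).
// RegNo is the XMM/YMM/ZMM register index (0..31), Bits is the vector width.
std::string pickAllOnesExpansion(unsigned RegNo, unsigned Bits) {
  // The 512-bit pseudo always expands to the EVEX form in the patch.
  if (Bits == 512)
    return "vpternlogd";
  // Registers 0..15 are encodable with a VEX prefix, so the
  // compare-with-self idiom can be used.
  if (RegNo < 16)
    return "vpcmpeqd";
  // Registers 16..31 are only encodable with EVEX.
  return "vpternlogd";
}

// Bitwise model of one 32-bit lane of VPTERNLOGD: at each bit position the
// three input bits form a 3-bit index into the 8-bit immediate.
uint32_t ternlog32(uint32_t A, uint32_t B, uint32_t C, uint8_t Imm) {
  uint32_t Result = 0;
  for (unsigned Bit = 0; Bit < 32; ++Bit) {
    unsigned Idx = (((A >> Bit) & 1u) << 2) | (((B >> Bit) & 1u) << 1) |
                   ((C >> Bit) & 1u);
    Result |= static_cast<uint32_t>((Imm >> Idx) & 1u) << Bit;
  }
  return Result;
}
```

With an immediate of `0xFF` every index selects a 1 bit, so the inputs do not matter, which is why the expansion can mark its source operands as undef. `vpcmpeqd` of a register with itself gives the same all-ones result because every lane compares equal.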

llvmbot · 2025-11-21T07:08:01Z

@llvm/pr-subscribers-backend-x86

Author: Abhishek Kaushik (abhishek-kaushik22)

Changes

Introduce AVX512_128_SETALLONES, AVX512_256_SETALLONES pseudos to generate all-ones vectors.

Post-RA expansion:

Use VEX vpcmpeqd for XMM/YMM0–15 when available (matches current codegen as AVX512_128/256_SETALLONES will be preferred over AVX1/2_SETALLONES for AVX512VL target).
Use EVEX vpternlogd imm=0xFF for high regs.

Includes MIR tests for both VEX and EVEX paths.

Full diff: https://github.com/llvm/llvm-project/pull/169009.diff

4 Files Affected:

(modified) llvm/lib/Target/X86/X86InstrAVX512.td (+6)
(modified) llvm/lib/Target/X86/X86InstrInfo.cpp (+31-1)
(added) llvm/test/CodeGen/X86/avx512-setallones-pseudo.mir (+30)
(modified) llvm/test/CodeGen/X86/eq-or-eq-range-of-2.ll (+2-1)

diff --git a/llvm/lib/Target/X86/X86InstrAVX512.td b/llvm/lib/Target/X86/X86InstrAVX512.td
index 1b748b7355716..efb9f1309a528 100644
--- a/llvm/lib/Target/X86/X86InstrAVX512.td
+++ b/llvm/lib/Target/X86/X86InstrAVX512.td
@@ -300,6 +300,12 @@ def AVX512_512_SET0 : I<0, Pseudo, (outs VR512:$dst), (ins), "",
                [(set VR512:$dst, (v16i32 immAllZerosV))]>;
 def AVX512_512_SETALLONES : I<0, Pseudo, (outs VR512:$dst), (ins), "",
                [(set VR512:$dst, (v16i32 immAllOnesV))]>;
+let AddedComplexity = 1, Predicates = [HasVLX] in {
+  def AVX512_128_SETALLONES : I<0, Pseudo, (outs VR128X:$dst), (ins),
+                                "", [(set VR128X:$dst, (v4i32 immAllOnesV))]>;
+  def AVX512_256_SETALLONES : I<0, Pseudo, (outs VR256X:$dst), (ins),
+                                "", [(set VR256X:$dst, (v8i32 immAllOnesV))]>;
+}
 }
 
 let Predicates = [HasAVX512] in {
diff --git a/llvm/lib/Target/X86/X86InstrInfo.cpp b/llvm/lib/Target/X86/X86InstrInfo.cpp
index 5c23f917d0530..3136ad36ca5cb 100644
--- a/llvm/lib/Target/X86/X86InstrInfo.cpp
+++ b/llvm/lib/Target/X86/X86InstrInfo.cpp
@@ -779,6 +779,8 @@ bool X86InstrInfo::isReMaterializableImpl(
   case X86::AVX512_128_SET0:
   case X86::AVX512_256_SET0:
   case X86::AVX512_512_SET0:
+  case X86::AVX512_128_SETALLONES:
+  case X86::AVX512_256_SETALLONES:
   case X86::AVX512_512_SETALLONES:
   case X86::AVX512_FsFLD0SD:
   case X86::AVX512_FsFLD0SH:
@@ -6253,9 +6255,31 @@ bool X86InstrInfo::expandPostRAPseudo(MachineInstr &MI) const {
     MIB.addReg(Reg, RegState::Undef).addReg(Reg, RegState::Undef).addImm(0xf);
     return true;
   }
+  case X86::AVX512_128_SETALLONES:
+  case X86::AVX512_256_SETALLONES:
   case X86::AVX512_512_SETALLONES: {
     Register Reg = MIB.getReg(0);
-    MIB->setDesc(get(X86::VPTERNLOGDZrri));
+    unsigned Opc;
+    switch (MI.getOpcode()) {
+    case X86::AVX512_128_SETALLONES: {
+      if (X86::VR128RegClass.contains(Reg))
+        return Expand2AddrUndef(MIB, get(X86::VPCMPEQDrr));
+
+      Opc = X86::VPTERNLOGDZ128rri;
+      break;
+    }
+    case X86::AVX512_256_SETALLONES: {
+      if (X86::VR256RegClass.contains(Reg))
+        return Expand2AddrUndef(MIB, get(X86::VPCMPEQDYrr));
+
+      Opc = X86::VPTERNLOGDZ256rri;
+      break;
+    }
+    case X86::AVX512_512_SETALLONES:
+      Opc = X86::VPTERNLOGDZrri;
+      break;
+    }
+    MIB->setDesc(get(Opc));
     // VPTERNLOGD needs 3 register inputs and an immediate.
     // 0xff will return 1s for any input.
     MIB.addReg(Reg, RegState::Undef)
@@ -8194,6 +8218,7 @@ MachineInstr *X86InstrInfo::foldMemoryOperandImpl(
     case X86::AVX1_SETALLONES:
     case X86::AVX_SET0:
     case X86::AVX512_256_SET0:
+    case X86::AVX512_256_SETALLONES:
       Alignment = Align(32);
       break;
     case X86::V_SET0:
@@ -8201,6 +8226,7 @@ MachineInstr *X86InstrInfo::foldMemoryOperandImpl(
     case X86::AVX512_128_SET0:
     case X86::FsFLD0F128:
     case X86::AVX512_FsFLD0F128:
+    case X86::AVX512_128_SETALLONES:
       Alignment = Align(16);
       break;
     case X86::MMX_SET0:
@@ -8259,6 +8285,8 @@ MachineInstr *X86InstrInfo::foldMemoryOperandImpl(
   case X86::AVX512_128_SET0:
   case X86::AVX512_256_SET0:
   case X86::AVX512_512_SET0:
+  case X86::AVX512_128_SETALLONES:
+  case X86::AVX512_256_SETALLONES:
   case X86::AVX512_512_SETALLONES:
   case X86::FsFLD0SH:
   case X86::AVX512_FsFLD0SH:
@@ -8319,6 +8347,7 @@ MachineInstr *X86InstrInfo::foldMemoryOperandImpl(
       break;
     case X86::AVX1_SETALLONES:
     case X86::AVX2_SETALLONES:
+    case X86::AVX512_256_SETALLONES:
       IsAllOnes = true;
       [[fallthrough]];
     case X86::AVX512_256_SET0:
@@ -8332,6 +8361,7 @@ MachineInstr *X86InstrInfo::foldMemoryOperandImpl(
                                 2);
       break;
     case X86::V_SETALLONES:
+    case X86::AVX512_128_SETALLONES:
       IsAllOnes = true;
       [[fallthrough]];
     case X86::V_SET0:
diff --git a/llvm/test/CodeGen/X86/avx512-setallones-pseudo.mir b/llvm/test/CodeGen/X86/avx512-setallones-pseudo.mir
new file mode 100644
index 0000000000000..7e5ddc4cd632f
--- /dev/null
+++ b/llvm/test/CodeGen/X86/avx512-setallones-pseudo.mir
@@ -0,0 +1,30 @@
+# NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 6
+# RUN: llc %s -mtriple=x86_64-- -start-before=postrapseudos -o - | FileCheck %s
+
+--- |
+  target triple = "x86_64-unknown-unknown"
+
+  define void @setallones() #0 {
+  ; CHECK-LABEL: setallones:
+  ; CHECK:       # %bb.0:
+  ; CHECK-NEXT:    vpcmpeqd %xmm14, %xmm14, %xmm14
+  ; CHECK-NEXT:    vpternlogd {{.*#+}} xmm16 = -1
+  ; CHECK-NEXT:    vpcmpeqd %ymm15, %ymm15, %ymm15
+  ; CHECK-NEXT:    vpternlogd {{.*#+}} ymm17 = -1
+  entry:
+  unreachable
+  }
+
+  attributes #0 = { "target-features"="+avx512f,+avx512vl" }
+---
+name:            setallones
+tracksRegLiveness: true
+liveins: []
+body:             |
+  bb.0:
+    $xmm14 = AVX512_128_SETALLONES
+    $xmm16 = AVX512_128_SETALLONES
+    $ymm15 = AVX512_256_SETALLONES
+    $ymm17 = AVX512_256_SETALLONES
+
+...
diff --git a/llvm/test/CodeGen/X86/eq-or-eq-range-of-2.ll b/llvm/test/CodeGen/X86/eq-or-eq-range-of-2.ll
index 3243d950740ca..e2400fbe2c4ff 100644
--- a/llvm/test/CodeGen/X86/eq-or-eq-range-of-2.ll
+++ b/llvm/test/CodeGen/X86/eq-or-eq-range-of-2.ll
@@ -106,7 +106,8 @@ define <4 x i32> @eq_or_eq_ult_2_fail_multiuse(<4 x i32> %x) {
 ; AVX512:       # %bb.0:
 ; AVX512-NEXT:    subq $24, %rsp
 ; AVX512-NEXT:    .cfi_def_cfa_offset 32
-; AVX512-NEXT:    vpaddd {{\.?LCPI[0-9]+_[0-9]+}}(%rip){1to4}, %xmm0, %xmm0
+; AVX512-NEXT:    vpcmpeqd %xmm1, %xmm1, %xmm1
+; AVX512-NEXT:    vpaddd %xmm1, %xmm0, %xmm0
 ; AVX512-NEXT:    vmovdqa %xmm0, (%rsp) # 16-byte Spill
 ; AVX512-NEXT:    callq use.v4.i32@PLT
 ; AVX512-NEXT:    vmovdqa (%rsp), %xmm0 # 16-byte Reload

github-actions · 2025-11-21T07:52:15Z

🐧 Linux x64 Test Results

186500 tests passed
4873 tests skipped

phoebewang · 2025-11-21T08:26:17Z

llvm/lib/Target/X86/X86InstrAVX512.td

Add isReMaterializable = 1, isAsCheapAsAMove = 1, canFoldAsLoad = 1, isPseudo = 1, SchedRW = [WriteZero]

Why is `AddedComplexity = 1` needed here?

Without the `AddedComplexity`, `AVX2_SETALLONES` is selected during isel instead of this one.
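
The effect of `AddedComplexity` can be illustrated with a toy model. This is only a sketch under stated assumptions: the `Pattern` struct, the `selectPattern` function, the numeric complexity values, and the rule that ties go to the earlier entry are all hypothetical and do not reproduce LLVM's actual pattern-ordering logic.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model only: names, numbers and tie-breaking are assumptions,
// not LLVM's real instruction selector.
struct Pattern {
  std::string Name;
  int Complexity;      // base complexity plus any AddedComplexity
  bool PredicateHolds; // e.g. HasAVX2 or HasVLX on the current subtarget
};

// Pick the pattern with the highest complexity among those whose predicate
// holds. On a tie the earlier entry wins (an assumption of this sketch).
std::string selectPattern(const std::vector<Pattern> &Patterns) {
  const Pattern *Best = nullptr;
  for (const Pattern &P : Patterns) {
    if (!P.PredicateHolds)
      continue;
    if (!Best || P.Complexity > Best->Complexity)
      Best = &P;
  }
  return Best ? Best->Name : "";
}
```

In this model, two candidates with equal complexity leave the choice to ordering, while bumping one of them by 1 makes it win whenever its predicate holds.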

RKSimon · 2025-11-21T10:46:12Z

Something we're struggling with is duplicated all ones/zeros expansions for different widths, I'm worried this could make this even worse if we don't address it.

abhishek-kaushik22 · 2025-11-21T11:18:39Z

> Something we're struggling with is duplicated all ones/zeros expansions for different widths, I'm worried this could make this even worse if we don't address it.

The motivation behind this patch is this crash during regalloc: https://godbolt.org/z/ern6PxqKj
When we try to rematerialize `AVX2_SETALLONES`, we end up with an incorrect register class (`VR256X`). We could constrain the destination register in `X86InstrInfo::reMaterialize`, but that seems more like a hack than a fix for the root cause, which is that we have no way to set an all-ones vector using AVX512VL.
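
The register-class mismatch can be pictured with a small standalone model. This is a sketch, not LLVM code: `RegClassModel` and `isSubClassOf` are hypothetical names, and each class is reduced to a contiguous range of YMM indices as they exist on x86-64 (`VR256` covering YMM0–15, `VR256X` covering YMM0–31).

```cpp
#include <cassert>

// Hypothetical stand-in for a register class: a contiguous range of
// YMM register indices.
struct RegClassModel {
  unsigned First;
  unsigned Last;
  constexpr bool contains(unsigned RegNo) const {
    return RegNo >= First && RegNo <= Last;
  }
};

// On x86-64, VR256 covers YMM0-15 and VR256X covers YMM0-31.
constexpr RegClassModel VR256{0, 15};
constexpr RegClassModel VR256X{0, 31};

// True if every register in Sub is also a member of Super.
constexpr bool isSubClassOf(RegClassModel Sub, RegClassModel Super) {
  return Sub.First >= Super.First && Sub.Last <= Super.Last;
}
```

A pseudo whose expansion is VEX-only is safe for any `VR256` register, but a destination constrained only to `VR256X` may land in YMM16–31, which the VEX expansion cannot encode. The new pseudos avoid this by accepting the wider class and choosing the encoding after register allocation.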

llvm/test/CodeGen/X86/avx512-setallones-pseudo.mir

RKSimon

LGTM

abhishek-kaushik22 requested review from RKSimon, arsenm, e-kud and phoebewang November 21, 2025 07:07

llvmbot added the backend:X86 label Nov 21, 2025

phoebewang reviewed Nov 21, 2025

RKSimon requested changes Nov 22, 2025

Add 32-bit test

d235eed

abhishek-kaushik22 requested review from RKSimon and phoebewang November 24, 2025 07:19

Merge branch 'main' into setallones

e89dfa4

RKSimon approved these changes Nov 24, 2025

abhishek-kaushik22 merged commit d148407 into llvm:main Nov 24, 2025
10 checks passed

abhishek-kaushik22 deleted the setallones branch November 24, 2025 13:32
