
[X86][APX] Avoid generating illegal MI_ND ndd instructions #78233

Closed
wants to merge 7 commits

Conversation

XinWang10 (Contributor) commented Jan 16, 2024

The instruction-size limit of 15 bytes still applies to APX instructions.

Note that it is possible for an EVEX-encoded legacy instruction to reach the 15-byte instruction length limit: 4
bytes of EVEX prefix + 1 byte of opcode + 1 byte of ModRM + 1 byte of SIB + 4 bytes of displacement + 4
bytes of immediate = 15 bytes in total, e.g.

addq    $184, -96, %rax   # encoding: [0x62,0xf4,0xfc,0x18,0x81,0x04,0x25,0xa0,0xff,0xff,0xff,0xb8,0x00,0x00,0x00]

If we added a segment prefix like fs, the length would be 16.

In such a case, no additional (ASIZE or segment override) prefix can be used. Since this limit is reached only when there is a long immediate, the compiler can first load the immediate into a register and then apply the desired prefix(es) to the shorter register-source version of the same instruction class.
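A sketch of that rewrite (the register choice and exact output are illustrative assumptions, not taken from the patch):

    # Illegal: adding an fs override to the 15-byte mi form gives 16 bytes:
    #   addq $184, %fs:-96, %rax
    # Instead, materialize the immediate and use the register-source NDD
    # form, which leaves room for the segment prefix:
    movl $184, %ecx              # load the 32-bit immediate
    addq %rcx, %fs:-96, %rax     # 1 (fs) + 4 (EVEX) + 1 (opcode) + 1 (ModRM)
                                 # + 1 (SIB) + 4 (disp) = 12 bytes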

llvmbot (Collaborator) commented Jan 16, 2024

@llvm/pr-subscribers-backend-x86

Author: None (XinWang10)

Changes

After the EVEX prefix was adopted for APX instructions, some instructions such as 'addq $184, -96, %rax' became 15 bytes long in their EVEX encoding ('0x62,0xf4,0xfc,0x18,0x81,0x84,0x8b,0xd2,0x04,0x00,0x00,0xb8,0x00,0x00,0x00'). If a segment register is set for such an instruction, it needs a segment prefix and exceeds the maximum instruction length of the x86 target.
Hence this patch adds a check during pattern matching to avoid selecting the MI_ND version when a segment prefix is required.
It also performs the same check when doing memory folding.
And it adds the missing test for and.
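For context, a hedged reproducer sketch (function name and constants are illustrative): on x86-64, address spaces 256, 257, and 258 map to the GS, FS, and SS segments, so IR like the following could previously select an FS-prefixed MI_ND form whose SIB + disp32 addressing plus 4-byte immediate pushes the encoding to 16 bytes.

    define i64 @add64mi32_fs(ptr addrspace(257) %p, i64 %i) {
    entry:
      %q = getelementptr inbounds i64, ptr addrspace(257) %p, i64 %i
      %r = getelementptr inbounds i8, ptr addrspace(257) %q, i64 4096
      %t = load i64, ptr addrspace(257) %r  ; becomes %fs:4096(%rdi,%rsi,8)
      %add = add i64 %t, 123456             ; needs a 32-bit immediate
      ret i64 %add
    }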


Patch is 40.28 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/78233.diff

7 Files Affected:

  • (modified) llvm/lib/Target/X86/X86ISelDAGToDAG.cpp (+35)
  • (modified) llvm/lib/Target/X86/X86InstrArithmetic.td (+4-4)
  • (modified) llvm/lib/Target/X86/X86InstrFragments.td (+3)
  • (modified) llvm/lib/Target/X86/X86InstrInfo.cpp (+35)
  • (modified) llvm/lib/Target/X86/X86InstrUtils.td (+2-2)
  • (added) llvm/test/CodeGen/X86/apx/and.ll (+595)
  • (added) llvm/test/CodeGen/X86/apx/invalid-ndd-encoding-with-segment-prefix.ll (+447)
diff --git a/llvm/lib/Target/X86/X86ISelDAGToDAG.cpp b/llvm/lib/Target/X86/X86ISelDAGToDAG.cpp
index 53ce720be2da4cc..44d42ce929797b5 100644
--- a/llvm/lib/Target/X86/X86ISelDAGToDAG.cpp
+++ b/llvm/lib/Target/X86/X86ISelDAGToDAG.cpp
@@ -222,6 +222,9 @@ namespace {
     bool selectAddr(SDNode *Parent, SDValue N, SDValue &Base,
                     SDValue &Scale, SDValue &Index, SDValue &Disp,
                     SDValue &Segment);
+    bool selectNoSegADDRAddr(SDNode *Parent, SDValue N, SDValue &Base,
+                             SDValue &Scale, SDValue &Index, SDValue &Disp,
+                             SDValue &Segment);
     bool selectVectorAddr(MemSDNode *Parent, SDValue BasePtr, SDValue IndexOp,
                           SDValue ScaleOp, SDValue &Base, SDValue &Scale,
                           SDValue &Index, SDValue &Disp, SDValue &Segment);
@@ -2923,6 +2926,38 @@ bool X86DAGToDAGISel::selectAddr(SDNode *Parent, SDValue N, SDValue &Base,
   return true;
 }
 
+/// Returns true if it is able to pattern match an addressing mode with no
+/// segment reg prefix. It returns the operands which make up the maximal
+/// addressing mode it can match by reference.
+bool X86DAGToDAGISel::selectNoSegADDRAddr(SDNode *Parent, SDValue N,
+                                          SDValue &Base, SDValue &Scale,
+                                          SDValue &Index, SDValue &Disp,
+                                          SDValue &Segment) {
+  X86ISelAddressMode AM;
+
+  if (Parent) {
+    unsigned AddrSpace =
+        cast<MemSDNode>(Parent)->getPointerInfo().getAddrSpace();
+    if (AddrSpace == X86AS::GS || AddrSpace == X86AS::FS ||
+        AddrSpace == X86AS::SS)
+      return false;
+  }
+
+  // Save the DL and VT before calling matchAddress, it can invalidate N.
+  SDLoc DL(N);
+  MVT VT = N.getSimpleValueType();
+
+  if (matchAddress(N, AM))
+    return false;
+
+  // TLS variable using FS prefix in linux.
+  if (AM.SymbolFlags == X86II::MO_TPOFF)
+    return false;
+
+  getAddressOperands(AM, DL, VT, Base, Scale, Index, Disp, Segment);
+  return true;
+}
+
 bool X86DAGToDAGISel::selectMOV64Imm32(SDValue N, SDValue &Imm) {
   // Cannot use 32 bit constants to reference objects in kernel code model.
   // Cannot use 32 bit constants to reference objects in large PIC mode since
diff --git a/llvm/lib/Target/X86/X86InstrArithmetic.td b/llvm/lib/Target/X86/X86InstrArithmetic.td
index 76b0fe5f5cad188..4bbae0af8115617 100644
--- a/llvm/lib/Target/X86/X86InstrArithmetic.td
+++ b/llvm/lib/Target/X86/X86InstrArithmetic.td
@@ -755,8 +755,8 @@ multiclass ArithBinOp_RF<bits<8> BaseOpc, bits<8> BaseOpc2, bits<8> BaseOpc4,
     def 64mi8_ND  : BinOpMI8_RF<mnemonic, Xi64, MemMRM>;
     def 8mi_ND    : BinOpMI_RF<0x80, mnemonic, Xi8 , opnode, MemMRM>;
     def 16mi_ND   : BinOpMI_RF<0x81, mnemonic, Xi16, opnode, MemMRM>, PD;
-    def 32mi_ND   : BinOpMI_RF<0x81, mnemonic, Xi32, opnode, MemMRM>;
-    def 64mi32_ND : BinOpMI_RF<0x81, mnemonic, Xi64, opnode, MemMRM>;
+    def 32mi_ND   : BinOpMI_RF<0x81, mnemonic, Xi32, opnode, MemMRM, nosegaddr>;
+    def 64mi32_ND : BinOpMI_RF<0x81, mnemonic, Xi64, opnode, MemMRM, nosegaddr>;
     def 16mi8_NF_ND  : BinOpMI8_R<mnemonic, Xi16, MemMRM>, NF, PD;
     def 32mi8_NF_ND  : BinOpMI8_R<mnemonic, Xi32, MemMRM>, NF;
     def 64mi8_NF_ND  : BinOpMI8_R<mnemonic, Xi64, MemMRM>, NF;
@@ -940,8 +940,8 @@ multiclass ArithBinOp_RFF<bits<8> BaseOpc, bits<8> BaseOpc2, bits<8> BaseOpc4,
     def 32mi8_ND  : BinOpMI8F_RF<mnemonic, Xi32, MemMRM>;
     def 64mi8_ND  : BinOpMI8F_RF<mnemonic, Xi64, MemMRM>;
     def 16mi_ND   : BinOpMIF_RF<0x81, mnemonic, Xi16, opnode, MemMRM>, PD;
-    def 32mi_ND   : BinOpMIF_RF<0x81, mnemonic, Xi32, opnode, MemMRM>;
-    def 64mi32_ND : BinOpMIF_RF<0x81, mnemonic, Xi64, opnode, MemMRM>;
+    def 32mi_ND   : BinOpMIF_RF<0x81, mnemonic, Xi32, opnode, MemMRM, nosegaddr>;
+    def 64mi32_ND : BinOpMIF_RF<0x81, mnemonic, Xi64, opnode, MemMRM, nosegaddr>;
   }
   let Predicates = [In64BitMode] in {
     def 8mi_EVEX    : BinOpMIF_MF<0x80, mnemonic, Xi8 , opnode, MemMRM>, PL;
diff --git a/llvm/lib/Target/X86/X86InstrFragments.td b/llvm/lib/Target/X86/X86InstrFragments.td
index adf527d72f5b43f..d3e847a3c6b9e36 100644
--- a/llvm/lib/Target/X86/X86InstrFragments.td
+++ b/llvm/lib/Target/X86/X86InstrFragments.td
@@ -362,6 +362,9 @@ def tls64addr : ComplexPattern<i64, 5, "selectTLSADDRAddr",
 def tls64baseaddr : ComplexPattern<i64, 5, "selectTLSADDRAddr",
                                [tglobaltlsaddr], []>;
 
+def nosegaddr : ComplexPattern<i64, 5, "selectNoSegADDRAddr",
+                               [], [SDNPWantParent]>;
+
 def vectoraddr : ComplexPattern<iPTR, 5, "selectVectorAddr", [],[SDNPWantParent]>;
 
 // A relocatable immediate is an operand that can be relocated by the linker to
diff --git a/llvm/lib/Target/X86/X86InstrInfo.cpp b/llvm/lib/Target/X86/X86InstrInfo.cpp
index bddda6891356e87..5ddd9fc6cc2ae1c 100644
--- a/llvm/lib/Target/X86/X86InstrInfo.cpp
+++ b/llvm/lib/Target/X86/X86InstrInfo.cpp
@@ -7251,6 +7251,41 @@ MachineInstr *X86InstrInfo::foldMemoryOperandImpl(
       MI.getOpcode() != X86::ADD64rr)
     return nullptr;
 
+  // MI_ND Instructions with 32 bit imm would exceed maximum code length if they
+  // need segment register prefix.
+  if (MOs.size() == X86::AddrNumOperands &&
+      (MOs[4].getReg() == X86::GS || MOs[4].getReg() == X86::FS ||
+       MOs[4].getReg() == X86::SS))
+    switch (MI.getOpcode()) {
+    default:
+      break;
+    case X86::ADD32ri_ND:
+    case X86::ADD64ri32_ND:
+    case X86::ADD32ri_NF_ND:
+    case X86::ADD64ri32_NF_ND:
+    case X86::SUB32ri_ND:
+    case X86::SUB64ri32_ND:
+    case X86::SUB32ri_NF_ND:
+    case X86::SUB64ri32_NF_ND:
+    case X86::AND32ri_ND:
+    case X86::AND64ri32_ND:
+    case X86::AND32ri_NF_ND:
+    case X86::AND64ri32_NF_ND:
+    case X86::XOR32ri_ND:
+    case X86::XOR64ri32_ND:
+    case X86::XOR32ri_NF_ND:
+    case X86::XOR64ri32_NF_ND:
+    case X86::OR32ri_ND:
+    case X86::OR64ri32_ND:
+    case X86::OR32ri_NF_ND:
+    case X86::OR64ri32_NF_ND:
+    case X86::ADC32ri_ND:
+    case X86::ADC64ri32_ND:
+    case X86::SBB32ri_ND:
+    case X86::SBB64ri32_ND:
+      return nullptr;
+    }
+
   // Don't fold loads into indirect calls that need a KCFI check as we'll
   // have to unfold these in X86TargetLowering::EmitKCFICheck anyway.
   if (MI.isCall() && MI.getCFIType())
diff --git a/llvm/lib/Target/X86/X86InstrUtils.td b/llvm/lib/Target/X86/X86InstrUtils.td
index f4ae15837fbf542..729b385b3767816 100644
--- a/llvm/lib/Target/X86/X86InstrUtils.td
+++ b/llvm/lib/Target/X86/X86InstrUtils.td
@@ -1182,7 +1182,7 @@ class BinOpMI_R<bits<8> o, string m, X86TypeInfo t, Format f>
     Sched<[WriteALU.Folded, WriteALU.ReadAfterFold]>, NDD<1>;
 // BinOpMI_R - Instructions that read "[mem], imm" and write "reg", EFLAGS.
 class BinOpMI_RF<bits<8> o, string m, X86TypeInfo t, SDPatternOperator node,
-                Format f>
+                Format f, ComplexPattern addr = addr>
   : BinOpMI<o, m, binop_ndd_args, t, f, (outs t.RegClass:$dst),
             [(set t.RegClass:$dst, EFLAGS, (node (t.LoadNode addr:$src1), t.ImmOperator:$src2))]>,
     Sched<[WriteALU.Folded, WriteALU.ReadAfterFold]>, DefEFLAGS, NDD<1>;
@@ -1201,7 +1201,7 @@ class BinOpMI_MF<bits<8> o, string m, X86TypeInfo t, SDPatternOperator node, For
 }
 // BinOpMIF_RF - Instructions that read "[mem], imm", write "reg" and
 // read/write EFLAGS.
-class BinOpMIF_RF<bits<8> o, string m, X86TypeInfo t, SDNode node, Format f>
+class BinOpMIF_RF<bits<8> o, string m, X86TypeInfo t, SDNode node, Format f, ComplexPattern addr = addr>
   : BinOpMI<o, m, binop_ndd_args, t, f, (outs t.RegClass:$dst),
             [(set t.RegClass:$dst, EFLAGS, (node (t.VT (load addr:$src1)),
              t.ImmOperator:$src2, EFLAGS))]>,
diff --git a/llvm/test/CodeGen/X86/apx/and.ll b/llvm/test/CodeGen/X86/apx/and.ll
new file mode 100644
index 000000000000000..2a8875d094ba25a
--- /dev/null
+++ b/llvm/test/CodeGen/X86/apx/and.ll
@@ -0,0 +1,595 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc < %s -mtriple=x86_64-unknown -mattr=+ndd -verify-machineinstrs | FileCheck %s
+
+define i8 @and8rr(i8 noundef %a, i8 noundef %b) {
+; CHECK-LABEL: and8rr:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    andl %esi, %edi, %eax
+; CHECK-NEXT:    # kill: def $al killed $al killed $eax
+; CHECK-NEXT:    retq
+entry:
+    %and = and i8 %a, %b
+    ret i8 %and
+}
+
+define i16 @and16rr(i16 noundef %a, i16 noundef %b) {
+; CHECK-LABEL: and16rr:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    andl %esi, %edi, %eax
+; CHECK-NEXT:    # kill: def $ax killed $ax killed $eax
+; CHECK-NEXT:    retq
+entry:
+    %and = and i16 %a, %b
+    ret i16 %and
+}
+
+define i32 @and32rr(i32 noundef %a, i32 noundef %b) {
+; CHECK-LABEL: and32rr:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    andl %esi, %edi, %eax
+; CHECK-NEXT:    retq
+entry:
+    %and = and i32 %a, %b
+    ret i32 %and
+}
+
+define i64 @and64rr(i64 noundef %a, i64 noundef %b) {
+; CHECK-LABEL: and64rr:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    andq %rsi, %rdi, %rax
+; CHECK-NEXT:    retq
+entry:
+    %and = and i64 %a, %b
+    ret i64 %and
+}
+
+define i8 @and8rm(i8 noundef %a, ptr %b) {
+; CHECK-LABEL: and8rm:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    andb (%rsi), %dil, %al
+; CHECK-NEXT:    retq
+entry:
+    %t = load i8, ptr %b
+    %and = and i8 %a, %t
+    ret i8 %and
+}
+
+define i16 @and16rm(i16 noundef %a, ptr %b) {
+; CHECK-LABEL: and16rm:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    andw (%rsi), %di, %ax
+; CHECK-NEXT:    retq
+entry:
+    %t = load i16, ptr %b
+    %and = and i16 %a, %t
+    ret i16 %and
+}
+
+define i32 @and32rm(i32 noundef %a, ptr %b) {
+; CHECK-LABEL: and32rm:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    andl (%rsi), %edi, %eax
+; CHECK-NEXT:    retq
+entry:
+    %t = load i32, ptr %b
+    %and = and i32 %a, %t
+    ret i32 %and
+}
+
+define i64 @and64rm(i64 noundef %a, ptr %b) {
+; CHECK-LABEL: and64rm:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    andq (%rsi), %rdi, %rax
+; CHECK-NEXT:    retq
+entry:
+    %t = load i64, ptr %b
+    %and = and i64 %a, %t
+    ret i64 %and
+}
+
+define i16 @and16ri8(i16 noundef %a) {
+; CHECK-LABEL: and16ri8:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    andl $123, %edi, %eax
+; CHECK-NEXT:    # kill: def $ax killed $ax killed $eax
+; CHECK-NEXT:    retq
+entry:
+    %and = and i16 %a, 123
+    ret i16 %and
+}
+
+define i32 @and32ri8(i32 noundef %a) {
+; CHECK-LABEL: and32ri8:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    andl $123, %edi, %eax
+; CHECK-NEXT:    retq
+entry:
+    %and = and i32 %a, 123
+    ret i32 %and
+}
+
+define i64 @and64ri8(i64 noundef %a) {
+; CHECK-LABEL: and64ri8:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    andl $123, %edi, %eax
+; CHECK-NEXT:    retq
+entry:
+    %and = and i64 %a, 123
+    ret i64 %and
+}
+
+define i8 @and8ri(i8 noundef %a) {
+; CHECK-LABEL: and8ri:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    andb $123, %dil, %al
+; CHECK-NEXT:    retq
+entry:
+    %and = and i8 %a, 123
+    ret i8 %and
+}
+
+define i16 @and16ri(i16 noundef %a) {
+; CHECK-LABEL: and16ri:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    andl $1234, %edi, %eax # imm = 0x4D2
+; CHECK-NEXT:    # kill: def $ax killed $ax killed $eax
+; CHECK-NEXT:    retq
+entry:
+    %and = and i16 %a, 1234
+    ret i16 %and
+}
+
+define i32 @and32ri(i32 noundef %a) {
+; CHECK-LABEL: and32ri:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    andl $123456, %edi, %eax # imm = 0x1E240
+; CHECK-NEXT:    retq
+entry:
+    %and = and i32 %a, 123456
+    ret i32 %and
+}
+
+define i64 @and64ri(i64 noundef %a) {
+; CHECK-LABEL: and64ri:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    andl $123456, %edi, %eax # imm = 0x1E240
+; CHECK-NEXT:    retq
+entry:
+    %and = and i64 %a, 123456
+    ret i64 %and
+}
+
+define i8 @and8mr(ptr %a, i8 noundef %b) {
+; CHECK-LABEL: and8mr:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    andb %sil, (%rdi), %al
+; CHECK-NEXT:    retq
+entry:
+  %t= load i8, ptr %a
+  %and = and i8 %t, %b
+  ret i8 %and
+}
+
+define i16 @and16mr(ptr %a, i16 noundef %b) {
+; CHECK-LABEL: and16mr:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    andw %si, (%rdi), %ax
+; CHECK-NEXT:    retq
+entry:
+  %t= load i16, ptr %a
+  %and = and i16 %t, %b
+  ret i16 %and
+}
+
+define i32 @and32mr(ptr %a, i32 noundef %b) {
+; CHECK-LABEL: and32mr:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    andl %esi, (%rdi), %eax
+; CHECK-NEXT:    retq
+entry:
+  %t= load i32, ptr %a
+  %and = and i32 %t, %b
+  ret i32 %and
+}
+
+define i64 @and64mr(ptr %a, i64 noundef %b) {
+; CHECK-LABEL: and64mr:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    andq %rsi, (%rdi), %rax
+; CHECK-NEXT:    retq
+entry:
+  %t= load i64, ptr %a
+  %and = and i64 %t, %b
+  ret i64 %and
+}
+
+define i16 @and16mi8(ptr %a) {
+; CHECK-LABEL: and16mi8:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    movzwl (%rdi), %eax
+; CHECK-NEXT:    andl $123, %eax, %eax
+; CHECK-NEXT:    # kill: def $ax killed $ax killed $eax
+; CHECK-NEXT:    retq
+entry:
+  %t= load i16, ptr %a
+  %and = and i16 %t, 123
+  ret i16 %and
+}
+
+define i32 @and32mi8(ptr %a) {
+; CHECK-LABEL: and32mi8:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    andl $123, (%rdi), %eax
+; CHECK-NEXT:    retq
+entry:
+  %t= load i32, ptr %a
+  %and = and i32 %t, 123
+  ret i32 %and
+}
+
+define i64 @and64mi8(ptr %a) {
+; CHECK-LABEL: and64mi8:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    movq (%rdi), %rax
+; CHECK-NEXT:    andl $123, %eax, %eax
+; CHECK-NEXT:    retq
+entry:
+  %t= load i64, ptr %a
+  %and = and i64 %t, 123
+  ret i64 %and
+}
+
+define i8 @and8mi(ptr %a) {
+; CHECK-LABEL: and8mi:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    andb $123, (%rdi), %al
+; CHECK-NEXT:    retq
+entry:
+  %t= load i8, ptr %a
+  %and = and i8 %t, 123
+  ret i8 %and
+}
+
+define i16 @and16mi(ptr %a) {
+; CHECK-LABEL: and16mi:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    movzwl (%rdi), %eax
+; CHECK-NEXT:    andl $1234, %eax, %eax # imm = 0x4D2
+; CHECK-NEXT:    # kill: def $ax killed $ax killed $eax
+; CHECK-NEXT:    retq
+entry:
+  %t= load i16, ptr %a
+  %and = and i16 %t, 1234
+  ret i16 %and
+}
+
+define i32 @and32mi(ptr %a) {
+; CHECK-LABEL: and32mi:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    andl $123456, (%rdi), %eax # imm = 0x1E240
+; CHECK-NEXT:    retq
+entry:
+  %t= load i32, ptr %a
+  %and = and i32 %t, 123456
+  ret i32 %and
+}
+
+define i64 @and64mi(ptr %a) {
+; CHECK-LABEL: and64mi:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    movq (%rdi), %rax
+; CHECK-NEXT:    andl $123456, %eax, %eax # imm = 0x1E240
+; CHECK-NEXT:    retq
+entry:
+  %t= load i64, ptr %a
+  %and = and i64 %t, 123456
+  ret i64 %and
+}
+
+@d64 = dso_local global i64 0
+
+define i1 @andflag8rr(i8 %a, i8 %b) {
+; CHECK-LABEL: andflag8rr:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    notb %sil, %al
+; CHECK-NEXT:    andb %al, %dil, %cl
+; CHECK-NEXT:    sete %al
+; CHECK-NEXT:    movb %cl, d64(%rip)
+; CHECK-NEXT:    retq
+  %xor = xor i8 %b, -1
+  %v0 = and i8 %a, %xor  ; 0xff << 50
+  %v1 = icmp eq i8 %v0, 0
+  store i8 %v0, ptr @d64
+  ret i1 %v1
+}
+
+define i1 @andflag16rr(i16 %a, i16 %b) {
+; CHECK-LABEL: andflag16rr:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    notl %esi, %eax
+; CHECK-NEXT:    andw %ax, %di, %cx
+; CHECK-NEXT:    sete %al
+; CHECK-NEXT:    movw %cx, d64(%rip)
+; CHECK-NEXT:    retq
+  %xor = xor i16 %b, -1
+  %v0 = and i16 %a, %xor  ; 0xff << 50
+  %v1 = icmp eq i16 %v0, 0
+  store i16 %v0, ptr @d64
+  ret i1 %v1
+}
+
+define i1 @andflag32rr(i32 %a, i32 %b) {
+; CHECK-LABEL: andflag32rr:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    andl %esi, %edi, %ecx
+; CHECK-NEXT:    sete %al
+; CHECK-NEXT:    movl %ecx, d64(%rip)
+; CHECK-NEXT:    retq
+  %v0 = and i32 %a, %b  ; 0xff << 50
+  %v1 = icmp eq i32 %v0, 0
+  store i32 %v0, ptr @d64
+  ret i1 %v1
+}
+
+define i1 @andflag64rr(i64 %a, i64 %b) {
+; CHECK-LABEL: andflag64rr:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    andq %rsi, %rdi, %rcx
+; CHECK-NEXT:    sete %al
+; CHECK-NEXT:    movq %rcx, d64(%rip)
+; CHECK-NEXT:    retq
+  %v0 = and i64 %a, %b  ; 0xff << 50
+  %v1 = icmp eq i64 %v0, 0
+  store i64 %v0, ptr @d64
+  ret i1 %v1
+}
+
+define i1 @andflag8rm(ptr %ptr, i8 %b) {
+; CHECK-LABEL: andflag8rm:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    notb %sil, %al
+; CHECK-NEXT:    andb (%rdi), %al, %cl
+; CHECK-NEXT:    sete %al
+; CHECK-NEXT:    movb %cl, d64(%rip)
+; CHECK-NEXT:    retq
+  %a = load i8, ptr %ptr
+  %xor = xor i8 %b, -1
+  %v0 = and i8 %a, %xor  ; 0xff << 50
+  %v1 = icmp eq i8 %v0, 0
+  store i8 %v0, ptr @d64
+  ret i1 %v1
+}
+
+define i1 @andflag16rm(ptr %ptr, i16 %b) {
+; CHECK-LABEL: andflag16rm:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    notl %esi, %eax
+; CHECK-NEXT:    andw (%rdi), %ax, %cx
+; CHECK-NEXT:    sete %al
+; CHECK-NEXT:    movw %cx, d64(%rip)
+; CHECK-NEXT:    retq
+  %a = load i16, ptr %ptr
+  %xor = xor i16 %b, -1
+  %v0 = and i16 %a, %xor  ; 0xff << 50
+  %v1 = icmp eq i16 %v0, 0
+  store i16 %v0, ptr @d64
+  ret i1 %v1
+}
+
+define i1 @andflag32rm(ptr %ptr, i32 %b) {
+; CHECK-LABEL: andflag32rm:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    andl (%rdi), %esi, %ecx
+; CHECK-NEXT:    sete %al
+; CHECK-NEXT:    movl %ecx, d64(%rip)
+; CHECK-NEXT:    retq
+  %a = load i32, ptr %ptr
+  %v0 = and i32 %a, %b  ; 0xff << 50
+  %v1 = icmp eq i32 %v0, 0
+  store i32 %v0, ptr @d64
+  ret i1 %v1
+}
+
+define i1 @andflag64rm(ptr %ptr, i64 %b) {
+; CHECK-LABEL: andflag64rm:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    andq (%rdi), %rsi, %rcx
+; CHECK-NEXT:    sete %al
+; CHECK-NEXT:    movq %rcx, d64(%rip)
+; CHECK-NEXT:    retq
+  %a = load i64, ptr %ptr
+  %v0 = and i64 %a, %b  ; 0xff << 50
+  %v1 = icmp eq i64 %v0, 0
+  store i64 %v0, ptr @d64
+  ret i1 %v1
+}
+
+define i1 @andflag8ri(i8 %a) {
+; CHECK-LABEL: andflag8ri:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    andb $-124, %dil, %cl
+; CHECK-NEXT:    sete %al
+; CHECK-NEXT:    movb %cl, d64(%rip)
+; CHECK-NEXT:    retq
+  %xor = xor i8 123, -1
+  %v0 = and i8 %a, %xor  ; 0xff << 50
+  %v1 = icmp eq i8 %v0, 0
+  store i8 %v0, ptr @d64
+  ret i1 %v1
+}
+
+define i1 @andflag16ri(i16 %a) {
+; CHECK-LABEL: andflag16ri:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    andw $-1235, %di, %cx # imm = 0xFB2D
+; CHECK-NEXT:    sete %al
+; CHECK-NEXT:    movw %cx, d64(%rip)
+; CHECK-NEXT:    retq
+  %xor = xor i16 1234, -1
+  %v0 = and i16 %a, %xor  ; 0xff << 50
+  %v1 = icmp eq i16 %v0, 0
+  store i16 %v0, ptr @d64
+  ret i1 %v1
+}
+
+define i1 @andflag32ri(i32 %a) {
+; CHECK-LABEL: andflag32ri:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    andl $123456, %edi, %ecx # imm = 0x1E240
+; CHECK-NEXT:    sete %al
+; CHECK-NEXT:    movl %ecx, d64(%rip)
+; CHECK-NEXT:    retq
+  %v0 = and i32 %a, 123456  ; 0xff << 50
+  %v1 = icmp eq i32 %v0, 0
+  store i32 %v0, ptr @d64
+  ret i1 %v1
+}
+
+define i1 @andflag64ri(i64 %a) {
+; CHECK-LABEL: andflag64ri:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    andq $123456, %rdi, %rcx # imm = 0x1E240
+; CHECK-NEXT:    sete %al
+; CHECK-NEXT:    movq %rcx, d64(%rip)
+; CHECK-NEXT:    retq
+  %v0 = and i64 %a, 123456  ; 0xff << 50
+  %v1 = icmp eq i64 %v0, 0
+  store i64 %v0, ptr @d64
+  ret i1 %v1
+}
+
+define i1 @andflag16ri8(i16 %a) {
+; CHECK-LABEL: andflag16ri8:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    andw $-124, %di, %cx
+; CHECK-NEXT:    sete %al
+; CHECK-NEXT:    movw %cx, d64(%rip)
+; CHECK-NEXT:    retq
+  %xor = xor i16 123, -1
+  %v0 = and i16 %a, %xor  ; 0xff << 50
+  %v1 = icmp eq i16 %v0, 0
+  store i16 %v0, ptr @d64
+  ret i1 %v1
+}
+
+define i1 @andflag32ri8(i32 %a) {
+; CHECK-LABEL: andflag32ri8:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    andl $123, %edi, %ecx
+; CHECK-NEXT:    sete %al
+; CHECK-NEXT:    movl %ecx, d64(%rip)
+; CHECK-NEXT:    retq
+  %v0 = and i32 %a, 123  ; 0xff << 50
+  %v1 = icmp eq i32 %v0, 0
+  store i32 %v0, ptr @d64
+  ret i1 %v1
+}
+
+define i1 @andflag64ri8(i64 %a) {
+; CHECK-LABEL: andflag64ri8:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    andq $123, %rdi, %rcx
+; CHECK-NEXT:    sete %al
+; CHECK-NEXT:    movq %rcx, d64(%rip)
+; CHECK-NEXT:    retq
+  %v0 = and i64 %a, 123  ; 0xff << 50
+  %v1 = icmp eq i64 %v0, 0
+  store i64 %v0, ptr @d64
+  ret i1 %v1
+}
+
+define void @and8mr_legacy(ptr %a, i8...
[truncated]

XinWang10 changed the title from "[X86] Avoid generating illegal MI_ND ndd instructions" to "[X86][APX] Avoid generating illegal MI_ND ndd instructions" on Jan 16, 2024
Comment on lines +7256 to +7286
if (MOs.size() == X86::AddrNumOperands &&
(MOs[4].getReg() == X86::GS || MOs[4].getReg() == X86::FS ||
MOs[4].getReg() == X86::SS))
switch (MI.getOpcode()) {
default:
break;
case X86::ADD32ri_ND:
case X86::ADD64ri32_ND:
case X86::ADD32ri_NF_ND:
case X86::ADD64ri32_NF_ND:
case X86::SUB32ri_ND:
case X86::SUB64ri32_ND:
case X86::SUB32ri_NF_ND:
case X86::SUB64ri32_NF_ND:
case X86::AND32ri_ND:
case X86::AND64ri32_ND:
case X86::AND32ri_NF_ND:
case X86::AND64ri32_NF_ND:
case X86::XOR32ri_ND:
case X86::XOR64ri32_ND:
case X86::XOR32ri_NF_ND:
case X86::XOR64ri32_NF_ND:
case X86::OR32ri_ND:
case X86::OR64ri32_ND:
case X86::OR32ri_NF_ND:
case X86::OR64ri32_NF_ND:
case X86::ADC32ri_ND:
case X86::ADC64ri32_ND:
case X86::SBB32ri_ND:
case X86::SBB64ri32_ND:
return nullptr;
Contributor

Which tests this?

Contributor Author

Not all of them are tested; the ADC and SUB tests could test this, since they generate MI_ND from memory folding.

Contributor

I don't quite follow. Do you mean there is no test for this yet?

Contributor Author

I mean that ADC and SBB tests like adc32mi_FS and sbb32mi_FS could test this.

Contributor

We need to test all the opcodes here, and add a string like "_fold" to the function names to indicate that.

Contributor Author

Will do.

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc < %s -mtriple=x86_64-unknown -mattr=+ndd -verify-machineinstrs | FileCheck %s

define i8 @and8rr(i8 noundef %a, i8 noundef %b) {
Contributor

Please create a separate PR for this.

Contributor Author

Sure.

Comment on lines +2941 to +2943
if (AddrSpace == X86AS::GS || AddrSpace == X86AS::FS ||
AddrSpace == X86AS::SS)
return false;
Contributor

Why don't we check the other segment prefixes here?

Contributor Author

They are missing; I haven't found a reproducer for the others so far. Let me try again.

Comment on lines +943 to +944
def 32mi_ND : BinOpMIF_RF<0x81, mnemonic, Xi32, opnode, MemMRM, nosegaddr>;
def 64mi32_ND : BinOpMIF_RF<0x81, mnemonic, Xi64, opnode, MemMRM, nosegaddr>;
Contributor

Currently, we always select the mi variant and compress it to mi8 if possible. This implementation prevents mi32_ND from being used for a segment address even when the immediate is 8-bit. Namely, mi8_ND cannot be used for a segment address either, which does not make sense.
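For concreteness, a rough byte count (my own arithmetic, not from the thread) of why the compressed imm8 form is never the problem:

    # mi8 form with a segment override stays well under the 15-byte limit:
    # 1 (fs) + 4 (EVEX) + 1 (opcode) + 1 (ModRM) + 1 (SIB) + 4 (disp32)
    # + 1 (imm8) = 13 bytes
    andl $123, %fs:4096(%rdi,%rsi,8), %eax
    # Only the imm32 forms can reach 16 bytes, so rejecting mi32_ND for
    # every segment address also blocks cases the compressor would have
    # shrunk to a legal mi8_ND.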

Contributor Author

I may add a condition on the immediate size.
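One possible shape for that on the memory-folding side — a hedged fragment (operand layout assumed; not part of this patch) that would sit inside the opcode switch quoted above and let imm8 folds through:

    // An imm8 keeps the encoding under 15 bytes even with a segment
    // override, so only reject folds whose immediate needs 4 bytes.
    const MachineOperand &Imm =
        MI.getOperand(MI.getNumExplicitOperands() - 1);
    if (Imm.isImm() && isInt<8>(Imm.getImm()))
      break;        // short immediate: the fold stays legal
    return nullptr; // 32-bit immediate: avoid the illegal MI_ND fold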

XinWang10 marked this pull request as draft on January 17, 2024 at 02:24
KanRobert (Contributor)

The instruction-size limit of 15 bytes still applies to APX instructions.

Note that it is possible for an EVEX-encoded legacy instruction to reach the 15-byte instruction length limit: 4 bytes of EVEX prefix + 1 byte of opcode + 1 byte of ModRM + 1 byte of SIB + 4 bytes of displacement + 4 bytes of immediate = 15 bytes in total, e.g.

addq    $184, -96, %rax   # encoding: [0x62,0xf4,0xfc,0x18,0x81,0x04,0x25,0xa0,0xff,0xff,0xff,0xb8,0x00,0x00,0x00]

If we added a segment prefix like fs, the length would be 16.

In such a case, no additional (ASIZE or segment override) prefix can be used. Since this limit is reached only when there is a long immediate, the compiler can first load the immediate into a register and then apply the desired prefix(es) to the shorter register-source version of the same instruction class.

@XinWang10 Updated the description for you. In the expand-pseudo solution (discussed offline), you also need to consider the address-size prefix.
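To illustrate the address-size point (my own arithmetic, not from the thread): the 0x67 ASIZE prefix costs one byte exactly like a segment override, so the same 15-byte mi form overflows either way.

    addq $184, -96, %rax   # 4 (EVEX) + 1 (opcode) + 1 (ModRM) + 1 (SIB)
                           # + 4 (disp) + 4 (imm) = 15 bytes
    # + 0x67 (ASIZE, for a 32-bit address) -> 16 bytes, illegal
    # + 0x64 (fs override)                 -> 16 bytes, illegal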

KanRobert (Contributor)

Conditional compare/test instructions have the same issue with the instruction-length limit. I will send out an RFC with possible solutions in one or two weeks.

We found this bug when building Node.js with APX features. Such illegal instructions were not emitted when building SPEC17, so maybe we don't need to rush to fix it.


KanRobert closed this on Jan 26, 2024