
Add support for x87 registers on GISel register selection #83528

Merged · 3 commits · Mar 5, 2024

Conversation

MalaySanghi
Contributor

@MalaySanghi MalaySanghi commented Mar 1, 2024

We handle 3 register classes for x87 - rfp32, rfp64 and rfp80.

  1. X87 registers are assigned a pseudo register class. We need a new register bank for these classes.
  2. We add bank information and enums for these.
  3. Legalizer is updated to allow 32-bit and 64-bit scalars even in the absence of SSE. This is required because with SSE enabled, x86 doesn't use the fp stack and instead uses SSE registers.
  4. Functions in X86RegisterBankInfo need to decide whether to use the pseudo classes or the SSE-enabled classes. To this end, I add a MachineInstr argument to the static helper getPartialMappingIdx.

Add/Update tests.

@llvmbot
Collaborator

llvmbot commented Mar 1, 2024

@llvm/pr-subscribers-backend-x86

@llvm/pr-subscribers-llvm-globalisel

Author: None (MalaySanghiIntel)

Changes

We handle 3 register classes for x87 - rfp32, rfp64 and rfp80.

  1. X87 registers are assigned a pseudo register class. We need a new register bank for these classes.
  2. We add bank information and enums for these.
  3. Legalizer is updated to allow 32b and 64b even in absence of SSE. This is required because with SSE enabled, x86 doesn't use fp stack and instead uses AVX registers.
  4. Functions in X86RegisterBankInfo need to decide whether to use the pseudo classes or SSE-enabled classes. I add MachineInstr as an argument to static helper function getPartialMappingIdx to this end.

Add/Update tests.


Patch is 24.05 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/83528.diff

7 Files Affected:

  • (modified) llvm/lib/Target/X86/GISel/X86LegalizerInfo.cpp (+8-5)
  • (modified) llvm/lib/Target/X86/GISel/X86RegisterBankInfo.cpp (+28-12)
  • (modified) llvm/lib/Target/X86/GISel/X86RegisterBankInfo.h (+2-1)
  • (modified) llvm/lib/Target/X86/X86GenRegisterBankInfo.def (+15-2)
  • (modified) llvm/lib/Target/X86/X86RegisterBanks.td (+3)
  • (added) llvm/test/CodeGen/X86/GlobalISel/regbankselect-x87.ll (+237)
  • (modified) llvm/test/CodeGen/X86/GlobalISel/x86_64-fallback.ll (+1-1)
diff --git a/llvm/lib/Target/X86/GISel/X86LegalizerInfo.cpp b/llvm/lib/Target/X86/GISel/X86LegalizerInfo.cpp
index 27381dff338e2d..2e33adaed7a847 100644
--- a/llvm/lib/Target/X86/GISel/X86LegalizerInfo.cpp
+++ b/llvm/lib/Target/X86/GISel/X86LegalizerInfo.cpp
@@ -39,6 +39,7 @@ X86LegalizerInfo::X86LegalizerInfo(const X86Subtarget &STI,
   bool HasVLX = Subtarget.hasVLX();
   bool HasDQI = Subtarget.hasAVX512() && Subtarget.hasDQI();
   bool HasBWI = Subtarget.hasAVX512() && Subtarget.hasBWI();
+  bool UseX87 = !Subtarget.useSoftFloat() && Subtarget.hasX87();
 
   const LLT p0 = LLT::pointer(0, TM.getPointerSizeInBits(0));
   const LLT s1 = LLT::scalar(1);
@@ -415,17 +416,19 @@ X86LegalizerInfo::X86LegalizerInfo(const X86Subtarget &STI,
   // fp constants
   getActionDefinitionsBuilder(G_FCONSTANT)
       .legalIf([=](const LegalityQuery &Query) -> bool {
-        return (HasSSE1 && typeInSet(0, {s32})(Query)) ||
-               (HasSSE2 && typeInSet(0, {s64})(Query));
+        return (typeInSet(0, {s32, s64})(Query)) ||
+               (UseX87 && typeInSet(0, {s80})(Query));
       });
 
   // fp arithmetic
   getActionDefinitionsBuilder({G_FADD, G_FSUB, G_FMUL, G_FDIV})
       .legalIf([=](const LegalityQuery &Query) {
-        return (HasSSE1 && typeInSet(0, {s32, v4s32})(Query)) ||
-               (HasSSE2 && typeInSet(0, {s64, v2s64})(Query)) ||
+        return (typeInSet(0, {s32, s64})(Query)) ||
+               (HasSSE1 && typeInSet(0, {v4s32})(Query)) ||
+               (HasSSE2 && typeInSet(0, {v2s64})(Query)) ||
                (HasAVX && typeInSet(0, {v8s32, v4s64})(Query)) ||
-               (HasAVX512 && typeInSet(0, {v16s32, v8s64})(Query));
+               (HasAVX512 && typeInSet(0, {v16s32, v8s64})(Query)) ||
+               (UseX87 && typeInSet(0, {s80})(Query));
       });
 
   // fp comparison
diff --git a/llvm/lib/Target/X86/GISel/X86RegisterBankInfo.cpp b/llvm/lib/Target/X86/GISel/X86RegisterBankInfo.cpp
index 72828f961f9317..ad6fb8a601315a 100644
--- a/llvm/lib/Target/X86/GISel/X86RegisterBankInfo.cpp
+++ b/llvm/lib/Target/X86/GISel/X86RegisterBankInfo.cpp
@@ -12,6 +12,7 @@
 
 #include "X86RegisterBankInfo.h"
 #include "X86InstrInfo.h"
+#include "X86Subtarget.h"
 #include "llvm/CodeGen/MachineRegisterInfo.h"
 #include "llvm/CodeGen/RegisterBank.h"
 #include "llvm/CodeGen/RegisterBankInfo.h"
@@ -59,11 +60,24 @@ X86RegisterBankInfo::getRegBankFromRegClass(const TargetRegisterClass &RC,
       X86::VR512RegClass.hasSubClassEq(&RC))
     return getRegBank(X86::VECRRegBankID);
 
+  if (X86::RFP80RegClass.hasSubClassEq(&RC) ||
+      X86::RFP32RegClass.hasSubClassEq(&RC) ||
+      X86::RFP64RegClass.hasSubClassEq(&RC))
+    return getRegBank(X86::PSRRegBankID);
+
   llvm_unreachable("Unsupported register kind yet.");
 }
 
 X86GenRegisterBankInfo::PartialMappingIdx
-X86GenRegisterBankInfo::getPartialMappingIdx(const LLT &Ty, bool isFP) {
+X86GenRegisterBankInfo::getPartialMappingIdx(const MachineInstr &MI,
+                                             const LLT &Ty, bool isFP) {
+  const MachineFunction *MF = MI.getMF();
+  const X86Subtarget *ST = &MF->getSubtarget<X86Subtarget>();
+  bool HasSSE1 = ST->hasSSE1();
+  bool HasSSE2 = ST->hasSSE2();
+  // 80 bits is only generated for X87 floating points.
+  if (Ty.getSizeInBits() == 80)
+    isFP = true;
   if ((Ty.isScalar() && !isFP) || Ty.isPointer()) {
     switch (Ty.getSizeInBits()) {
     case 1:
@@ -84,11 +98,13 @@ X86GenRegisterBankInfo::getPartialMappingIdx(const LLT &Ty, bool isFP) {
   } else if (Ty.isScalar()) {
     switch (Ty.getSizeInBits()) {
     case 32:
-      return PMI_FP32;
+      return HasSSE1 ? PMI_FP32 : PMI_PSR32;
     case 64:
-      return PMI_FP64;
+      return HasSSE2 ? PMI_FP64 : PMI_PSR64;
     case 128:
       return PMI_VEC128;
+    case 80:
+      return PMI_PSR80;
     default:
       llvm_unreachable("Unsupported register size.");
     }
@@ -118,7 +134,8 @@ void X86RegisterBankInfo::getInstrPartialMappingIdxs(
     if (!MO.isReg() || !MO.getReg())
       OpRegBankIdx[Idx] = PMI_None;
     else
-      OpRegBankIdx[Idx] = getPartialMappingIdx(MRI.getType(MO.getReg()), isFP);
+      OpRegBankIdx[Idx] =
+          getPartialMappingIdx(MI, MRI.getType(MO.getReg()), isFP);
   }
 }
 
@@ -156,7 +173,7 @@ X86RegisterBankInfo::getSameOperandsMapping(const MachineInstr &MI,
       (Ty != MRI.getType(MI.getOperand(2).getReg())))
     llvm_unreachable("Unsupported operand mapping yet.");
 
-  auto Mapping = getValueMapping(getPartialMappingIdx(Ty, isFP), 3);
+  auto Mapping = getValueMapping(getPartialMappingIdx(MI, Ty, isFP), 3);
   return getInstructionMapping(DefaultMappingID, 1, Mapping, NumOperands);
 }
 
@@ -190,9 +207,8 @@ X86RegisterBankInfo::getInstrMapping(const MachineInstr &MI) const {
     unsigned NumOperands = MI.getNumOperands();
     LLT Ty = MRI.getType(MI.getOperand(0).getReg());
 
-    auto Mapping = getValueMapping(getPartialMappingIdx(Ty, false), 3);
+    auto Mapping = getValueMapping(getPartialMappingIdx(MI, Ty, false), 3);
     return getInstructionMapping(DefaultMappingID, 1, Mapping, NumOperands);
-
   }
   default:
     break;
@@ -219,8 +235,8 @@ X86RegisterBankInfo::getInstrMapping(const MachineInstr &MI) const {
 
     bool FirstArgIsFP = Opc == TargetOpcode::G_SITOFP;
     bool SecondArgIsFP = Opc == TargetOpcode::G_FPTOSI;
-    OpRegBankIdx[0] = getPartialMappingIdx(Ty0, /* isFP */ FirstArgIsFP);
-    OpRegBankIdx[1] = getPartialMappingIdx(Ty1, /* isFP */ SecondArgIsFP);
+    OpRegBankIdx[0] = getPartialMappingIdx(MI, Ty0, /* isFP */ FirstArgIsFP);
+    OpRegBankIdx[1] = getPartialMappingIdx(MI, Ty1, /* isFP */ SecondArgIsFP);
     break;
   }
   case TargetOpcode::G_FCMP: {
@@ -234,7 +250,7 @@ X86RegisterBankInfo::getInstrMapping(const MachineInstr &MI) const {
     (void)Size;
     assert((Size == 32 || Size == 64) && "Unsupported size for G_FCMP");
 
-    auto FpRegBank = getPartialMappingIdx(Ty1, /* isFP */ true);
+    auto FpRegBank = getPartialMappingIdx(MI, Ty1, /* isFP */ true);
     OpRegBankIdx = {PMI_GPR8,
                     /* Predicate */ PMI_None, FpRegBank, FpRegBank};
     break;
@@ -288,9 +304,9 @@ X86RegisterBankInfo::getInstrAlternativeMappings(const MachineInstr &MI) const {
   case TargetOpcode::G_LOAD:
   case TargetOpcode::G_STORE:
   case TargetOpcode::G_IMPLICIT_DEF: {
-    // we going to try to map 32/64 bit to PMI_FP32/PMI_FP64
+    // we going to try to map 32/64/80 bit to PMI_FP32/PMI_FP64/PMI_FP80
     unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, TRI);
-    if (Size != 32 && Size != 64)
+    if (Size != 32 && Size != 64 && Size != 80)
       break;
 
     unsigned NumOperands = MI.getNumOperands();
diff --git a/llvm/lib/Target/X86/GISel/X86RegisterBankInfo.h b/llvm/lib/Target/X86/GISel/X86RegisterBankInfo.h
index 9a4e23d8b34d54..989c5956ad5917 100644
--- a/llvm/lib/Target/X86/GISel/X86RegisterBankInfo.h
+++ b/llvm/lib/Target/X86/GISel/X86RegisterBankInfo.h
@@ -32,7 +32,8 @@ class X86GenRegisterBankInfo : public RegisterBankInfo {
   static RegisterBankInfo::PartialMapping PartMappings[];
   static RegisterBankInfo::ValueMapping ValMappings[];
 
-  static PartialMappingIdx getPartialMappingIdx(const LLT &Ty, bool isFP);
+  static PartialMappingIdx getPartialMappingIdx(const MachineInstr &MI,
+                                                const LLT &Ty, bool isFP);
   static const RegisterBankInfo::ValueMapping *
   getValueMapping(PartialMappingIdx Idx, unsigned NumOperands);
 };
diff --git a/llvm/lib/Target/X86/X86GenRegisterBankInfo.def b/llvm/lib/Target/X86/X86GenRegisterBankInfo.def
index 0fdea9071c290d..11a28806710d74 100644
--- a/llvm/lib/Target/X86/X86GenRegisterBankInfo.def
+++ b/llvm/lib/Target/X86/X86GenRegisterBankInfo.def
@@ -25,6 +25,10 @@ RegisterBankInfo::PartialMapping X86GenRegisterBankInfo::PartMappings[]{
     {0, 128, X86::VECRRegBank}, // :6
     {0, 256, X86::VECRRegBank}, // :7
     {0, 512, X86::VECRRegBank}, // :8   
+    // RFP32/64/80
+    {0, 32, X86::PSRRegBank},   // :9
+    {0, 64, X86::PSRRegBank},   // :10
+    {0, 80, X86::PSRRegBank},   // :11
 };
 #endif // GET_TARGET_REGBANK_INFO_IMPL
 
@@ -39,7 +43,10 @@ enum PartialMappingIdx {
   PMI_FP64,
   PMI_VEC128,
   PMI_VEC256,
-  PMI_VEC512
+  PMI_VEC512,
+  PMI_PSR32,
+  PMI_PSR64,
+  PMI_PSR80
 };
 #endif // GET_TARGET_REGBANK_INFO_CLASS
 
@@ -61,6 +68,9 @@ RegisterBankInfo::ValueMapping X86GenRegisterBankInfo::ValMappings[]{
     INSTR_3OP(BREAKDOWN(PMI_VEC128, 1)) // 18: Vec128
     INSTR_3OP(BREAKDOWN(PMI_VEC256, 1)) // 21: Vec256
     INSTR_3OP(BREAKDOWN(PMI_VEC512, 1)) // 24: Vec512    
+    INSTR_3OP(BREAKDOWN(PMI_PSR32, 1))  // 25: Rfp32
+    INSTR_3OP(BREAKDOWN(PMI_PSR64, 1))  // 26: Rfp64
+    INSTR_3OP(BREAKDOWN(PMI_PSR80, 1))  // 27: Rfp80
 };
 #undef INSTR_3OP
 #undef BREAKDOWN
@@ -78,6 +88,9 @@ enum ValueMappingIdx {
   VMI_3OpsVec128Idx = PMI_VEC128 * 3,
   VMI_3OpsVec256Idx = PMI_VEC256 * 3,
   VMI_3OpsVec512Idx = PMI_VEC512 * 3,
+  VMI_3OpsPs32Idx = PMI_PSR32 * 3,
+  VMI_3OpsPs64Idx = PMI_PSR64 * 3,
+  VMI_3OpsPs80Idx = PMI_PSR80 * 3,
 };
 #undef GET_TARGET_REGBANK_INFO_CLASS
 #endif // GET_TARGET_REGBANK_INFO_CLASS
@@ -89,7 +102,7 @@ X86GenRegisterBankInfo::getValueMapping(PartialMappingIdx Idx,
                                         unsigned NumOperands) {
   
   // We can use VMI_3Ops Mapping for all the cases.
-  if (NumOperands <= 3 && (Idx >= PMI_GPR8 && Idx <= PMI_VEC512))
+  if (NumOperands <= 3 && (Idx >= PMI_GPR8 && Idx <= PMI_PSR80))
     return &ValMappings[(unsigned)Idx * 3];
   
   llvm_unreachable("Unsupported PartialMappingIdx.");
diff --git a/llvm/lib/Target/X86/X86RegisterBanks.td b/llvm/lib/Target/X86/X86RegisterBanks.td
index 91a49725259543..05aa880bc6cb28 100644
--- a/llvm/lib/Target/X86/X86RegisterBanks.td
+++ b/llvm/lib/Target/X86/X86RegisterBanks.td
@@ -14,3 +14,6 @@ def GPRRegBank : RegisterBank<"GPR", [GR64]>;
 
 /// Floating Point/Vector Registers
 def VECRRegBank : RegisterBank<"VECR", [VR512]>;
+
+/// Pseudo Registers: RFP80
+def PSRRegBank : RegisterBank<"PSR", [RFP32, RFP64, RFP80]>;
diff --git a/llvm/test/CodeGen/X86/GlobalISel/regbankselect-x87.ll b/llvm/test/CodeGen/X86/GlobalISel/regbankselect-x87.ll
new file mode 100644
index 00000000000000..49343022f4c0ee
--- /dev/null
+++ b/llvm/test/CodeGen/X86/GlobalISel/regbankselect-x87.ll
@@ -0,0 +1,237 @@
+; NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 4
+; RUN: llc < %s -mtriple=i686-- -mattr=+x87,-sse,-sse2 -global-isel -stop-after=regbankselect | FileCheck %s --check-prefix=X32
+; RUN: llc < %s -mtriple=x86_64-- -mattr=+x87,-sse,-sse2 -global-isel -stop-after=regbankselect | FileCheck %s --check-prefix=X64
+
+define x86_fp80 @f0(x86_fp80 noundef %a) {
+  ; X32-LABEL: name: f0
+  ; X32: bb.1.entry:
+  ; X32-NEXT:   [[FRAME_INDEX:%[0-9]+]]:gpr(p0) = G_FRAME_INDEX %fixed-stack.0
+  ; X32-NEXT:   [[LOAD:%[0-9]+]]:psr(s80) = G_LOAD [[FRAME_INDEX]](p0) :: (invariant load (s80) from %fixed-stack.0, align 4)
+  ; X32-NEXT:   [[C:%[0-9]+]]:psr(s80) = G_FCONSTANT x86_fp80 0xK400A8000000000000000
+  ; X32-NEXT:   [[FRAME_INDEX1:%[0-9]+]]:gpr(p0) = G_FRAME_INDEX %stack.0.a.addr
+  ; X32-NEXT:   [[FRAME_INDEX2:%[0-9]+]]:gpr(p0) = G_FRAME_INDEX %stack.1.x
+  ; X32-NEXT:   G_STORE [[LOAD]](s80), [[FRAME_INDEX1]](p0) :: (store (s80) into %ir.a.addr, align 16)
+  ; X32-NEXT:   G_STORE [[C]](s80), [[FRAME_INDEX2]](p0) :: (store (s80) into %ir.x, align 16)
+  ; X32-NEXT:   [[LOAD1:%[0-9]+]]:psr(s80) = G_LOAD [[FRAME_INDEX1]](p0) :: (dereferenceable load (s80) from %ir.a.addr, align 16)
+  ; X32-NEXT:   [[LOAD2:%[0-9]+]]:psr(s80) = G_LOAD [[FRAME_INDEX2]](p0) :: (dereferenceable load (s80) from %ir.x, align 16)
+  ; X32-NEXT:   [[FADD:%[0-9]+]]:psr(s80) = G_FADD [[LOAD1]], [[LOAD2]]
+  ; X32-NEXT:   $fp0 = COPY [[FADD]](s80)
+  ; X32-NEXT:   RET 0, implicit $fp0
+  ;
+  ; X64-LABEL: name: f0
+  ; X64: bb.1.entry:
+  ; X64-NEXT:   [[FRAME_INDEX:%[0-9]+]]:gpr(p0) = G_FRAME_INDEX %fixed-stack.0
+  ; X64-NEXT:   [[LOAD:%[0-9]+]]:psr(s80) = G_LOAD [[FRAME_INDEX]](p0) :: (invariant load (s80) from %fixed-stack.0, align 16)
+  ; X64-NEXT:   [[C:%[0-9]+]]:psr(s80) = G_FCONSTANT x86_fp80 0xK400A8000000000000000
+  ; X64-NEXT:   [[FRAME_INDEX1:%[0-9]+]]:gpr(p0) = G_FRAME_INDEX %stack.0.a.addr
+  ; X64-NEXT:   [[FRAME_INDEX2:%[0-9]+]]:gpr(p0) = G_FRAME_INDEX %stack.1.x
+  ; X64-NEXT:   G_STORE [[LOAD]](s80), [[FRAME_INDEX1]](p0) :: (store (s80) into %ir.a.addr, align 16)
+  ; X64-NEXT:   G_STORE [[C]](s80), [[FRAME_INDEX2]](p0) :: (store (s80) into %ir.x, align 16)
+  ; X64-NEXT:   [[LOAD1:%[0-9]+]]:psr(s80) = G_LOAD [[FRAME_INDEX1]](p0) :: (dereferenceable load (s80) from %ir.a.addr, align 16)
+  ; X64-NEXT:   [[LOAD2:%[0-9]+]]:psr(s80) = G_LOAD [[FRAME_INDEX2]](p0) :: (dereferenceable load (s80) from %ir.x, align 16)
+  ; X64-NEXT:   [[FADD:%[0-9]+]]:psr(s80) = G_FADD [[LOAD1]], [[LOAD2]]
+  ; X64-NEXT:   $fp0 = COPY [[FADD]](s80)
+  ; X64-NEXT:   RET 0, implicit $fp0
+entry:
+  %a.addr = alloca x86_fp80, align 16
+  %x = alloca x86_fp80, align 16
+  store x86_fp80 %a, ptr %a.addr, align 16
+  store x86_fp80 0xK400A8000000000000000, ptr %x, align 16
+  %0 = load x86_fp80, ptr %a.addr, align 16
+  %1 = load x86_fp80, ptr %x, align 16
+  %add = fadd x86_fp80 %0, %1
+  ret x86_fp80 %add
+}
+
+declare x86_fp80 @llvm.sqrt.f32(x86_fp80)
+
+define void @f1(ptr %a, ptr %b) {
+  ; X32-LABEL: name: f1
+  ; X32: bb.1 (%ir-block.0):
+  ; X32-NEXT:   [[FRAME_INDEX:%[0-9]+]]:gpr(p0) = G_FRAME_INDEX %fixed-stack.1
+  ; X32-NEXT:   [[LOAD:%[0-9]+]]:gpr(p0) = G_LOAD [[FRAME_INDEX]](p0) :: (invariant load (p0) from %fixed-stack.1)
+  ; X32-NEXT:   [[FRAME_INDEX1:%[0-9]+]]:gpr(p0) = G_FRAME_INDEX %fixed-stack.0
+  ; X32-NEXT:   [[LOAD1:%[0-9]+]]:gpr(p0) = G_LOAD [[FRAME_INDEX1]](p0) :: (invariant load (p0) from %fixed-stack.0)
+  ; X32-NEXT:   [[LOAD2:%[0-9]+]]:psr(s80) = G_LOAD [[LOAD]](p0) :: (load (s80) from %ir.a, align 4)
+  ; X32-NEXT:   [[LOAD3:%[0-9]+]]:psr(s80) = G_LOAD [[LOAD1]](p0) :: (load (s80) from %ir.b, align 4)
+  ; X32-NEXT:   [[FSUB:%[0-9]+]]:psr(s80) = G_FSUB [[LOAD2]], [[LOAD3]]
+  ; X32-NEXT:   G_STORE [[FSUB]](s80), [[LOAD]](p0) :: (store (s80) into %ir.a, align 4)
+  ; X32-NEXT:   RET 0
+  ;
+  ; X64-LABEL: name: f1
+  ; X64: bb.1 (%ir-block.0):
+  ; X64-NEXT:   liveins: $rdi, $rsi
+  ; X64-NEXT: {{  $}}
+  ; X64-NEXT:   [[COPY:%[0-9]+]]:gpr(p0) = COPY $rdi
+  ; X64-NEXT:   [[COPY1:%[0-9]+]]:gpr(p0) = COPY $rsi
+  ; X64-NEXT:   [[LOAD:%[0-9]+]]:psr(s80) = G_LOAD [[COPY]](p0) :: (load (s80) from %ir.a, align 4)
+  ; X64-NEXT:   [[LOAD1:%[0-9]+]]:psr(s80) = G_LOAD [[COPY1]](p0) :: (load (s80) from %ir.b, align 4)
+  ; X64-NEXT:   [[FSUB:%[0-9]+]]:psr(s80) = G_FSUB [[LOAD]], [[LOAD1]]
+  ; X64-NEXT:   G_STORE [[FSUB]](s80), [[COPY]](p0) :: (store (s80) into %ir.a, align 4)
+  ; X64-NEXT:   RET 0
+  %1 = load x86_fp80, ptr %a, align 4
+  %2 = load x86_fp80, ptr %b, align 4
+  %sub = fsub x86_fp80 %1, %2
+  store x86_fp80 %sub, ptr %a, align 4
+  ret void
+}
+
+define void @f2(ptr %a, ptr %b) {
+  ; X32-LABEL: name: f2
+  ; X32: bb.1 (%ir-block.0):
+  ; X32-NEXT:   [[FRAME_INDEX:%[0-9]+]]:gpr(p0) = G_FRAME_INDEX %fixed-stack.1
+  ; X32-NEXT:   [[LOAD:%[0-9]+]]:gpr(p0) = G_LOAD [[FRAME_INDEX]](p0) :: (invariant load (p0) from %fixed-stack.1)
+  ; X32-NEXT:   [[FRAME_INDEX1:%[0-9]+]]:gpr(p0) = G_FRAME_INDEX %fixed-stack.0
+  ; X32-NEXT:   [[LOAD1:%[0-9]+]]:gpr(p0) = G_LOAD [[FRAME_INDEX1]](p0) :: (invariant load (p0) from %fixed-stack.0)
+  ; X32-NEXT:   [[LOAD2:%[0-9]+]]:psr(s80) = G_LOAD [[LOAD]](p0) :: (load (s80) from %ir.a, align 16)
+  ; X32-NEXT:   [[LOAD3:%[0-9]+]]:psr(s80) = G_LOAD [[LOAD1]](p0) :: (load (s80) from %ir.b, align 16)
+  ; X32-NEXT:   [[FMUL:%[0-9]+]]:psr(s80) = G_FMUL [[LOAD2]], [[LOAD3]]
+  ; X32-NEXT:   G_STORE [[FMUL]](s80), [[LOAD]](p0) :: (store (s80) into %ir.a, align 16)
+  ; X32-NEXT:   RET 0
+  ;
+  ; X64-LABEL: name: f2
+  ; X64: bb.1 (%ir-block.0):
+  ; X64-NEXT:   liveins: $rdi, $rsi
+  ; X64-NEXT: {{  $}}
+  ; X64-NEXT:   [[COPY:%[0-9]+]]:gpr(p0) = COPY $rdi
+  ; X64-NEXT:   [[COPY1:%[0-9]+]]:gpr(p0) = COPY $rsi
+  ; X64-NEXT:   [[LOAD:%[0-9]+]]:psr(s80) = G_LOAD [[COPY]](p0) :: (load (s80) from %ir.a, align 16)
+  ; X64-NEXT:   [[LOAD1:%[0-9]+]]:psr(s80) = G_LOAD [[COPY1]](p0) :: (load (s80) from %ir.b, align 16)
+  ; X64-NEXT:   [[FMUL:%[0-9]+]]:psr(s80) = G_FMUL [[LOAD]], [[LOAD1]]
+  ; X64-NEXT:   G_STORE [[FMUL]](s80), [[COPY]](p0) :: (store (s80) into %ir.a, align 16)
+  ; X64-NEXT:   RET 0
+  %1 = load x86_fp80, ptr %a, align 16
+  %2 = load x86_fp80, ptr %b, align 16
+  %mul = fmul x86_fp80 %1, %2
+  store x86_fp80 %mul, ptr %a, align 16
+  ret void
+}
+
+define void @f3(ptr %a, ptr %b) {
+  ; X32-LABEL: name: f3
+  ; X32: bb.1 (%ir-block.0):
+  ; X32-NEXT:   [[FRAME_INDEX:%[0-9]+]]:gpr(p0) = G_FRAME_INDEX %fixed-stack.1
+  ; X32-NEXT:   [[LOAD:%[0-9]+]]:gpr(p0) = G_LOAD [[FRAME_INDEX]](p0) :: (invariant load (p0) from %fixed-stack.1)
+  ; X32-NEXT:   [[FRAME_INDEX1:%[0-9]+]]:gpr(p0) = G_FRAME_INDEX %fixed-stack.0
+  ; X32-NEXT:   [[LOAD1:%[0-9]+]]:gpr(p0) = G_LOAD [[FRAME_INDEX1]](p0) :: (invariant load (p0) from %fixed-stack.0)
+  ; X32-NEXT:   [[LOAD2:%[0-9]+]]:psr(s80) = G_LOAD [[LOAD]](p0) :: (load (s80) from %ir.a, align 4)
+  ; X32-NEXT:   [[LOAD3:%[0-9]+]]:psr(s80) = G_LOAD [[LOAD1]](p0) :: (load (s80) from %ir.b, align 4)
+  ; X32-NEXT:   [[FDIV:%[0-9]+]]:psr(s80) = G_FDIV [[LOAD2]], [[LOAD3]]
+  ; X32-NEXT:   G_STORE [[FDIV]](s80), [[LOAD]](p0) :: (store (s80) into %ir.a, align 4)
+  ; X32-NEXT:   RET 0
+  ;
+  ; X64-LABEL: name: f3
+  ; X64: bb.1 (%ir-block.0):
+  ; X64-NEXT:   liveins: $rdi, $rsi
+  ; X64-NEXT: {{  $}}
+  ; X64-NEXT:   [[COPY:%[0-9]+]]:gpr(p0) = COPY $rdi
+  ; X64-NEXT:   [[COPY1:%[0-9]+]]:gpr(p0) = COPY $rsi
+  ; X64-NEXT:   [[LOAD:%[0-9]+]]:psr(s80) = G_LOAD [[COPY]](p0) :: (load (s80) from %ir.a, align 4)
+  ; X64-NEXT:   [[LOAD1:%[0-9]+]]:psr(s80) = G_LOAD [[COPY1]](p0) :: (load (s80) from %ir.b, align 4)
+  ; X64-NEXT:   [[FDIV:%[0-9]+]]:psr(s80) = G_FDIV [[LOAD]], [[LOAD1]]
+  ; X64-NEXT:   G_STORE [[FDIV]](s80), [[COPY]](p0) :: (store (s80) into %ir.a, align 4)
+  ; X64-NEXT:   RET 0
+  %1 = load x86_fp80, ptr %a, align 4
+  %2 = load x86_fp80, ptr %b, align 4
+  %div = fdiv x86_fp80 %1, %2
+  store x86_fp80 %div, ptr %a, align 4
+  ret void
+}
+
+define float @f4(float %val) {
+  ; X32-LABEL: name: f4
+  ; X32: bb.1 (%ir-block.0):
+  ; X32-NEXT:   [[FRAME_INDEX:%[0-9]+]]:gpr(p0) = G_FRAME_INDEX %fixed-stack.0
+  ; X32-NEXT:   [[LOAD:%[0-9]+]]:gpr(s32) = G_LOAD [[FRAME_INDEX]](p0) :: (invariant load (s32) from %fixed-stack.0)
+  ; X32-NEXT:   $fp0 = COPY [[LOAD]](s32)
+  ; X32-NEXT:   RET 0, implicit $fp0
+  ;
+  ; X64-LABEL: name: f4
+  ; X64: bb.1 (%ir-block.0):
+  ; X64-NEXT:   [[FRAME_INDEX:%[0-9]+]]:gpr(p0) = G_FRAME_INDEX %fixed-stack.0
+  ; X64-NEXT:   [[LOAD:%[0-9]+]]:gpr(s32) = G_LOAD [[FRAME_INDEX]](p0) :: (invariant load (s32) from %fixed-stack.0, align 16)
+  ; X64-NEXT:   $xmm0 = COPY [[LOAD]](s32)
+  ; X64-NEXT:   RET 0, implicit $xmm0
+  ret float %val
+}
+
+define void @f5(ptr %a, ptr %b) {
+  ; X32-LABEL: name: f5
+  ; X32: bb.1 (%ir-block.0):
+  ; X32-NEXT:   [[FRAME_INDEX:%[0-9]+]]:gpr(p0) = G_FRAME_INDEX %fixed-stack.1
+  ; X32-NEXT:   [[LOAD:%[0-9]+]]:gpr(p0) = G_LOAD [[FRAME_INDEX]](p0) :: (invariant load (p0) from %fixed-stack.1)
+  ; X32-NEXT:   [[FRAME_INDEX1:%[0-9]+]]:gpr(p0) = G_FRAME_INDEX %fixed-stack.0
+  ; X32-NEXT:   [[LOAD1:%[0-9]+]]:gpr(p0) = G_LOAD [[FRAME_INDEX1]](p0) :: (invariant load (p0) from %fixed-stack.0)
+  ; X32-NEXT:   [[LOAD2:%[0-9]+]]:gpr(s32) = G_LOAD [[LOAD]](p0) :: (load (s32) from %ir.a, align 8)
+  ; X32-NEXT:   [[C:%[0-9]+]]:gpr(s32) = G_CONSTANT i32 4
+  ; X32-NEXT:   [[PTR_ADD:%[0-9]+]]:gpr(p0) = G_PTR_ADD [[LOAD]], [[C]](s32)
+  ; X32-NEXT:   [[COPY:%[0-9]+]]:gpr(p0) = COPY [[PTR_ADD]](p0)
+  ; X32-NEXT:   [[LOAD3:%[0-9]+]]:gpr(s32) = G_LOAD [[COPY]](p0) :: (load (s32) from %ir.a + 4, basealign 8)
+  ; X32-NEXT:   [[MV:%[0-9]+]]:gpr(s64) = G_MERGE_VALUES [[LOAD2]](s32), [[LOAD3]](s32)
+  ; X32-NEXT:   [[LOAD4:%[0-9]+]]:gpr(s32) = G_LOAD [[LOAD1]](p0) :: (load (s32) from %ir.b, align 8)
+  ; X32-NEXT:   [[PTR_ADD1:%[0-9]+]]:gpr(p0) = G_PTR_ADD [[LOAD1]], [[C]](s32)
+  ; X32-NEXT:   [[LOAD5:%[0-9]+]]:gpr(s32) = G_LOAD [[PTR_ADD1]](p0) :: (load (s32) from %ir.b + 4, basealign 8)
+  ; X32-NEXT:   [[MV1:%[0-9]+]]:gpr(s64) = G_MERGE_VALUES [[LOAD4]](s32), [[LOAD5]](s32)
+  ; X32-NEXT: ...
[truncated]

@MalaySanghi
Contributor Author

Tagging @e-kud for review. Please add others for opinion/review.

Review comment on llvm/lib/Target/X86/GISel/X86LegalizerInfo.cpp:

      // fp arithmetic
      getActionDefinitionsBuilder({G_FADD, G_FSUB, G_FMUL, G_FDIV})
          .legalIf([=](const LegalityQuery &Query) {
    -        return (HasSSE1 && typeInSet(0, {s32, v4s32})(Query)) ||
    -               (HasSSE2 && typeInSet(0, {s64, v2s64})(Query)) ||
    +        return (typeInSet(0, {s32, s64})(Query)) ||
Contributor

Existing issue but I think this is an overly verbose way of expressing the simplest type of operation with a single type operand

Review threads (resolved): llvm/lib/Target/X86/GISel/X86RegisterBankInfo.cpp, llvm/test/CodeGen/X86/GlobalISel/regbankselect-x87.ll
@tschuett
Member

tschuett commented Mar 1, 2024

This is required because with SSE enabled, x86 doesn't use fp stack and instead uses AVX registers.

Did you mean SSE registers?

@tschuett tschuett requested a review from RKSimon March 1, 2024 08:49
@MalaySanghi
Contributor Author

This is required because with SSE enabled, x86 doesn't use fp stack and instead uses AVX registers.

Did you mean SSE registers?

Yes, I did. Updated. Thanks for pointing this out

@MalaySanghi
Contributor Author

FYI, the windows failure is random. Related issue: #83534

@RKSimon RKSimon requested a review from e-kud March 1, 2024 10:33
Collaborator

@RKSimon RKSimon left a comment


Does GISel have support for the stackifier?

@MalaySanghi
Contributor Author

Does GISel have support for the stackifier?

The stackifier is an independent post-RA pass. It is enabled on the GISel path as well. However, it doesn't work as expected today, and many x87 programs fail at instruction selection in GISel.

I'm not sure what you mean by support here - the stackifier is pattern-based, i.e. it requires the opcodes from OpcodeTable in the IR. On the GISel path, these instructions are not yet generated (this is WIP). The current PR is the first step. I plan to have follow-up changes that will improve the porting of patterns from DAG to GISel. That will enable us to get instructions like LD_Fp32m, and we'll be able to evaluate the stackifier status better.

@MalaySanghi
Contributor Author

Does GISel have support for the stackifier?

When you say stackifier, do you mean the post-RA pass or something else? I don't recall seeing anything specific to the stackifier in the DAG passes. Maybe I missed something. Can you please explain if I missed it?

@RKSimon
Collaborator

RKSimon commented Mar 1, 2024

Does GISel have support for the stackifier?

Sorry that was a vague question - but I think you answered it :)

@e-kud
Contributor

e-kud commented Mar 5, 2024

LGTM. @MalaySanghiIntel thanks!

@e-kud e-kud merged commit 564b81d into llvm:main Mar 5, 2024
3 of 4 checks passed
@e-kud
Contributor

e-kud commented Mar 5, 2024

Ashes on my head, I haven't fixed the tags. I will be more attentive next time.

@MalaySanghi MalaySanghi deleted the ms_x87_regbank branch March 5, 2024 03:47
@MaskRay
Member

MaskRay commented Mar 5, 2024

This commit is associated with a users.noreply.github.com email address. Per
https://discourse.llvm.org/t/hidden-emails-on-github-should-we-do-something-about-it/74223/32 , proper email addresses are preferred.

@MalaySanghi
Contributor Author

This commit is associated with a users.noreply.github.com email address. Per https://discourse.llvm.org/t/hidden-emails-on-github-should-we-do-something-about-it/74223/32 , proper email addresses are preferred.

I've made my email public. Thanks for pointing this out.
