[llvm] Ensure that soft float targets don't use float/vector code for memops. #107022
Conversation
@llvm/pr-subscribers-llvm-selectiondag @llvm/pr-subscribers-backend-aarch64

Author: Alex Rønne Petersen (alexrp)

Changes

Callers of …

Note: The RISC-V backend still gets this very wrong, which is why I'm not adding a test for it. In its case, the reason appears to be that it doesn't support these attributes at all (yet?). This will require much deeper surgery to fix, which I might do in follow-up patches.

Closes #105978.

Patch is 68.85 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/107022.diff

28 Files Affected:
diff --git a/llvm/include/llvm/CodeGen/TargetLowering.h b/llvm/include/llvm/CodeGen/TargetLowering.h
index eda38cd8a564d6..fd44223f3d4ce0 100644
--- a/llvm/include/llvm/CodeGen/TargetLowering.h
+++ b/llvm/include/llvm/CodeGen/TargetLowering.h
@@ -1957,19 +1957,24 @@ class TargetLoweringBase {
unsigned *Fast = nullptr) const;
/// Returns the target specific optimal type for load and store operations as
- /// a result of memset, memcpy, and memmove lowering.
- /// It returns EVT::Other if the type should be determined using generic
- /// target-independent logic.
+ /// a result of memset, memcpy, and memmove lowering. It returns EVT::Other if
+ /// the type should be determined using generic target-independent logic.
+ ///
+ /// The PreferIntScalar parameter is a hint from the caller; if true *and* the
+ /// implementation returns a float or vector type, the caller will discard the
+ /// result and proceed as if EVT::Other had been returned.
virtual EVT
getOptimalMemOpType(const MemOp &Op,
- const AttributeList & /*FuncAttributes*/) const {
+ const AttributeList & /*FuncAttributes*/,
+ bool /*PreferIntScalar*/) const {
return MVT::Other;
}
/// LLT returning variant.
virtual LLT
getOptimalMemOpLLT(const MemOp &Op,
- const AttributeList & /*FuncAttributes*/) const {
+ const AttributeList & /*FuncAttributes*/,
+ bool /*PreferIntScalar*/) const {
return LLT();
}
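For illustration only (this snippet is not part of the patch): a target override under the new signature might look like the sketch below. MyTargetLowering, hasMyVectorExt(), and the size/alignment thresholds are invented names; the point is that an override can still return a float/vector type, since the generic caller now discards such a result and falls back to integer scalars when PreferIntScalar is set.

// Hypothetical target override under the new signature (illustration only;
// MyTargetLowering and hasMyVectorExt() are made-up names).
EVT MyTargetLowering::getOptimalMemOpType(const MemOp &Op,
                                          const AttributeList &FuncAttributes,
                                          bool PreferIntScalar) const {
  // The override may ignore PreferIntScalar: if it returns a vector or float
  // type while the hint is set, the caller proceeds as if it saw MVT::Other.
  if (Subtarget.hasMyVectorExt() && Op.size() >= 16 && Op.isAligned(Align(16)))
    return MVT::v16i8;
  return MVT::Other; // defer to generic target-independent selection
}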
diff --git a/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp b/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
index 3fece81df1f2fd..c3fe0c46f8d8d5 100644
--- a/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
@@ -8846,9 +8846,14 @@ static bool findGISelOptimalMemOpLowering(std::vector<LLT> &MemOps,
if (Op.isMemcpyWithFixedDstAlign() && Op.getSrcAlign() < Op.getDstAlign())
return false;
- LLT Ty = TLI.getOptimalMemOpLLT(Op, FuncAttributes);
-
- if (Ty == LLT()) {
+ bool WantIntScalar = TLI.useSoftFloat() ||
+ FuncAttributes.hasFnAttr(Attribute::NoImplicitFloat);
+ LLT Ty = TLI.getOptimalMemOpLLT(Op, FuncAttributes, WantIntScalar);
+
+ // The target may well report supporting vector operations to do the
+ // operation, but we don't want to use those as a matter of policy if we're
+ // using soft float or if implicit float operations are disallowed.
+ if (Ty == LLT() || (WantIntScalar && Ty.isVector())) {
// Use the largest scalar type whose alignment constraints are satisfied.
// We only need to check DstAlign here as SrcAlign is always greater or
// equal to DstAlign (or zero).
diff --git a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
index 4e796289cff0a1..0dac6085950dc3 100644
--- a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
@@ -203,9 +203,15 @@ bool TargetLowering::findOptimalMemOpLowering(
Op.getSrcAlign() < Op.getDstAlign())
return false;
- EVT VT = getOptimalMemOpType(Op, FuncAttributes);
-
- if (VT == MVT::Other) {
+ bool WantIntScalar = useSoftFloat() ||
+ FuncAttributes.hasFnAttr(Attribute::NoImplicitFloat);
+ EVT VT = getOptimalMemOpType(Op, FuncAttributes, WantIntScalar);
+
+ // The target may well report supporting float/vector operations to do the
+ // operation, but we don't want to use those as a matter of policy if we're
+ // using soft float or if implicit float operations are disallowed.
+ if (VT == MVT::Other ||
+ (WantIntScalar && (VT.isVector() || VT.isFloatingPoint()))) {
// Use the largest integer type whose alignment constraints are satisfied.
// We only need to check DstAlign here as SrcAlign is always greater or
// equal to DstAlign (or zero).
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index 5ac5b7f8a5ab18..6fb386e5e4bfbb 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -17657,10 +17657,10 @@ bool AArch64TargetLowering::lowerInterleaveIntrinsicToStore(
}
EVT AArch64TargetLowering::getOptimalMemOpType(
- const MemOp &Op, const AttributeList &FuncAttributes) const {
- bool CanImplicitFloat = !FuncAttributes.hasFnAttr(Attribute::NoImplicitFloat);
- bool CanUseNEON = Subtarget->hasNEON() && CanImplicitFloat;
- bool CanUseFP = Subtarget->hasFPARMv8() && CanImplicitFloat;
+ const MemOp &Op, const AttributeList &FuncAttributes,
+ bool PreferIntScalar) const {
+ bool CanUseNEON = Subtarget->hasNEON();
+ bool CanUseFP = Subtarget->hasFPARMv8();
// Only use AdvSIMD to implement memset of 32-byte and above. It would have
// taken one instruction to materialize the v2i64 zero and one store (with
// restrictive addressing mode). Just do i64 stores.
@@ -17679,18 +17679,15 @@ EVT AArch64TargetLowering::getOptimalMemOpType(
return MVT::v16i8;
if (CanUseFP && !IsSmallMemset && AlignmentIsAcceptable(MVT::f128, Align(16)))
return MVT::f128;
- if (Op.size() >= 8 && AlignmentIsAcceptable(MVT::i64, Align(8)))
- return MVT::i64;
- if (Op.size() >= 4 && AlignmentIsAcceptable(MVT::i32, Align(4)))
- return MVT::i32;
+
return MVT::Other;
}
LLT AArch64TargetLowering::getOptimalMemOpLLT(
- const MemOp &Op, const AttributeList &FuncAttributes) const {
- bool CanImplicitFloat = !FuncAttributes.hasFnAttr(Attribute::NoImplicitFloat);
- bool CanUseNEON = Subtarget->hasNEON() && CanImplicitFloat;
- bool CanUseFP = Subtarget->hasFPARMv8() && CanImplicitFloat;
+ const MemOp &Op, const AttributeList &FuncAttributes,
+ bool PreferIntScalar) const {
+ bool CanUseNEON = Subtarget->hasNEON();
+ bool CanUseFP = Subtarget->hasFPARMv8();
// Only use AdvSIMD to implement memset of 32-byte and above. It would have
// taken one instruction to materialize the v2i64 zero and one store (with
// restrictive addressing mode). Just do i64 stores.
@@ -17709,10 +17706,7 @@ LLT AArch64TargetLowering::getOptimalMemOpLLT(
return LLT::fixed_vector(2, 64);
if (CanUseFP && !IsSmallMemset && AlignmentIsAcceptable(MVT::f128, Align(16)))
return LLT::scalar(128);
- if (Op.size() >= 8 && AlignmentIsAcceptable(MVT::i64, Align(8)))
- return LLT::scalar(64);
- if (Op.size() >= 4 && AlignmentIsAcceptable(MVT::i32, Align(4)))
- return LLT::scalar(32);
+
return LLT();
}
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.h b/llvm/lib/Target/AArch64/AArch64ISelLowering.h
index 39d5df0de0eec7..0192012b717a74 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.h
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.h
@@ -721,10 +721,12 @@ class AArch64TargetLowering : public TargetLowering {
bool shouldConsiderGEPOffsetSplit() const override;
EVT getOptimalMemOpType(const MemOp &Op,
- const AttributeList &FuncAttributes) const override;
+ const AttributeList &FuncAttributes,
+ bool PreferIntScalar) const override;
LLT getOptimalMemOpLLT(const MemOp &Op,
- const AttributeList &FuncAttributes) const override;
+ const AttributeList &FuncAttributes,
+ bool PreferIntScalar) const override;
/// Return true if the addressing mode represented by AM is legal for this
/// target, for a load/store of the specified type.
diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index 1437f3d58b5e79..5d114cab93ce58 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -1862,7 +1862,8 @@ bool SITargetLowering::allowsMisalignedMemoryAccesses(
}
EVT SITargetLowering::getOptimalMemOpType(
- const MemOp &Op, const AttributeList &FuncAttributes) const {
+ const MemOp &Op, const AttributeList &FuncAttributes,
+ bool PreferIntScalar) const {
// FIXME: Should account for address space here.
// The default fallback uses the private pointer size as a guess for a type to
diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.h b/llvm/lib/Target/AMDGPU/SIISelLowering.h
index eed4b3e79cdeee..c80bd56a7cf48d 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.h
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.h
@@ -345,7 +345,8 @@ class SITargetLowering final : public AMDGPUTargetLowering {
unsigned *IsFast = nullptr) const override;
EVT getOptimalMemOpType(const MemOp &Op,
- const AttributeList &FuncAttributes) const override;
+ const AttributeList &FuncAttributes,
+ bool PreferIntScalar) const override;
bool isMemOpUniform(const SDNode *N) const;
bool isMemOpHasNoClobberedMemOperand(const SDNode *N) const;
diff --git a/llvm/lib/Target/ARM/ARMISelLowering.cpp b/llvm/lib/Target/ARM/ARMISelLowering.cpp
index 4ab0433069ae66..db163398755e33 100644
--- a/llvm/lib/Target/ARM/ARMISelLowering.cpp
+++ b/llvm/lib/Target/ARM/ARMISelLowering.cpp
@@ -19191,10 +19191,10 @@ bool ARMTargetLowering::allowsMisalignedMemoryAccesses(EVT VT, unsigned,
EVT ARMTargetLowering::getOptimalMemOpType(
- const MemOp &Op, const AttributeList &FuncAttributes) const {
+ const MemOp &Op, const AttributeList &FuncAttributes,
+ bool PreferIntScalar) const {
// See if we can use NEON instructions for this...
- if ((Op.isMemcpy() || Op.isZeroMemset()) && Subtarget->hasNEON() &&
- !FuncAttributes.hasFnAttr(Attribute::NoImplicitFloat)) {
+ if ((Op.isMemcpy() || Op.isZeroMemset()) && Subtarget->hasNEON()) {
unsigned Fast;
if (Op.size() >= 16 &&
(Op.isAligned(Align(16)) ||
diff --git a/llvm/lib/Target/ARM/ARMISelLowering.h b/llvm/lib/Target/ARM/ARMISelLowering.h
index a255e9b6fc365f..e8d8eab0c9a031 100644
--- a/llvm/lib/Target/ARM/ARMISelLowering.h
+++ b/llvm/lib/Target/ARM/ARMISelLowering.h
@@ -455,7 +455,8 @@ class VectorType;
unsigned *Fast) const override;
EVT getOptimalMemOpType(const MemOp &Op,
- const AttributeList &FuncAttributes) const override;
+ const AttributeList &FuncAttributes,
+ bool PreferIntScalar) const override;
bool isTruncateFree(Type *SrcTy, Type *DstTy) const override;
bool isTruncateFree(EVT SrcVT, EVT DstVT) const override;
diff --git a/llvm/lib/Target/BPF/BPFISelLowering.h b/llvm/lib/Target/BPF/BPFISelLowering.h
index 42707949e864cd..85b969a3edb6e1 100644
--- a/llvm/lib/Target/BPF/BPFISelLowering.h
+++ b/llvm/lib/Target/BPF/BPFISelLowering.h
@@ -114,7 +114,8 @@ class BPFTargetLowering : public TargetLowering {
SelectionDAG &DAG) const override;
EVT getOptimalMemOpType(const MemOp &Op,
- const AttributeList &FuncAttributes) const override {
+ const AttributeList &FuncAttributes,
+ bool PreferIntScalar) const override {
return Op.size() >= 8 ? MVT::i64 : MVT::i32;
}
diff --git a/llvm/lib/Target/Hexagon/HexagonISelLowering.cpp b/llvm/lib/Target/Hexagon/HexagonISelLowering.cpp
index 856c952e785dac..83023021d79d5a 100644
--- a/llvm/lib/Target/Hexagon/HexagonISelLowering.cpp
+++ b/llvm/lib/Target/Hexagon/HexagonISelLowering.cpp
@@ -3783,7 +3783,8 @@ bool HexagonTargetLowering::IsEligibleForTailCallOptimization(
/// does not need to be loaded. It returns EVT::Other if the type should be
/// determined using generic target-independent logic.
EVT HexagonTargetLowering::getOptimalMemOpType(
- const MemOp &Op, const AttributeList &FuncAttributes) const {
+ const MemOp &Op, const AttributeList &FuncAttributes,
+ bool PreferIntScalar) const {
if (Op.size() >= 8 && Op.isAligned(Align(8)))
return MVT::i64;
if (Op.size() >= 4 && Op.isAligned(Align(4)))
diff --git a/llvm/lib/Target/Hexagon/HexagonISelLowering.h b/llvm/lib/Target/Hexagon/HexagonISelLowering.h
index 3fd961f5a74623..a568d67c5fdf91 100644
--- a/llvm/lib/Target/Hexagon/HexagonISelLowering.h
+++ b/llvm/lib/Target/Hexagon/HexagonISelLowering.h
@@ -326,7 +326,8 @@ class HexagonTargetLowering : public TargetLowering {
bool isLegalICmpImmediate(int64_t Imm) const override;
EVT getOptimalMemOpType(const MemOp &Op,
- const AttributeList &FuncAttributes) const override;
+ const AttributeList &FuncAttributes,
+ bool PreferIntScalar) const override;
bool allowsMemoryAccess(LLVMContext &Context, const DataLayout &DL, EVT VT,
unsigned AddrSpace, Align Alignment,
diff --git a/llvm/lib/Target/Mips/MipsISelLowering.cpp b/llvm/lib/Target/Mips/MipsISelLowering.cpp
index 0f2047fcac640e..4448a502ec4db3 100644
--- a/llvm/lib/Target/Mips/MipsISelLowering.cpp
+++ b/llvm/lib/Target/Mips/MipsISelLowering.cpp
@@ -4331,14 +4331,6 @@ MipsTargetLowering::isOffsetFoldingLegal(const GlobalAddressSDNode *GA) const {
return false;
}
-EVT MipsTargetLowering::getOptimalMemOpType(
- const MemOp &Op, const AttributeList &FuncAttributes) const {
- if (Subtarget.hasMips64())
- return MVT::i64;
-
- return MVT::i32;
-}
-
bool MipsTargetLowering::isFPImmLegal(const APFloat &Imm, EVT VT,
bool ForCodeSize) const {
if (VT != MVT::f32 && VT != MVT::f64)
diff --git a/llvm/lib/Target/Mips/MipsISelLowering.h b/llvm/lib/Target/Mips/MipsISelLowering.h
index 84ad40d6bbbe26..e2a077bd9104af 100644
--- a/llvm/lib/Target/Mips/MipsISelLowering.h
+++ b/llvm/lib/Target/Mips/MipsISelLowering.h
@@ -664,9 +664,6 @@ class TargetRegisterClass;
bool isOffsetFoldingLegal(const GlobalAddressSDNode *GA) const override;
- EVT getOptimalMemOpType(const MemOp &Op,
- const AttributeList &FuncAttributes) const override;
-
/// isFPImmLegal - Returns true if the target can instruction select the
/// specified FP immediate natively. If false, the legalizer will
/// materialize the FP immediate as a load from a constant pool.
diff --git a/llvm/lib/Target/PowerPC/PPCISelLowering.cpp b/llvm/lib/Target/PowerPC/PPCISelLowering.cpp
index 459a96eca1ff20..17b26178f7960f 100644
--- a/llvm/lib/Target/PowerPC/PPCISelLowering.cpp
+++ b/llvm/lib/Target/PowerPC/PPCISelLowering.cpp
@@ -17400,8 +17400,10 @@ bool PPCTargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info,
/// It returns EVT::Other if the type should be determined using generic
/// target-independent logic.
EVT PPCTargetLowering::getOptimalMemOpType(
- const MemOp &Op, const AttributeList &FuncAttributes) const {
- if (getTargetMachine().getOptLevel() != CodeGenOptLevel::None) {
+ const MemOp &Op, const AttributeList &FuncAttributes,
+ bool PreferIntScalar) const {
+ if (getTargetMachine().getOptLevel() != CodeGenOptLevel::None &&
+ !PreferIntScalar) {
// We should use Altivec/VSX loads and stores when available. For unaligned
// addresses, unaligned VSX loads are only fast starting with the P8.
if (Subtarget.hasAltivec() && Op.size() >= 16) {
diff --git a/llvm/lib/Target/PowerPC/PPCISelLowering.h b/llvm/lib/Target/PowerPC/PPCISelLowering.h
index 0bdfdcd15441f4..978a2ed6a606de 100644
--- a/llvm/lib/Target/PowerPC/PPCISelLowering.h
+++ b/llvm/lib/Target/PowerPC/PPCISelLowering.h
@@ -1072,7 +1072,8 @@ namespace llvm {
/// It returns EVT::Other if the type should be determined using generic
/// target-independent logic.
EVT getOptimalMemOpType(const MemOp &Op,
- const AttributeList &FuncAttributes) const override;
+ const AttributeList &FuncAttributes,
+ bool PreferIntScalar) const override;
/// Is unaligned memory access allowed for the given type, and is it fast
/// relative to software emulation.
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index 670dee2edb1dfb..fdbf202946a093 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -21047,13 +21047,11 @@ bool RISCVTargetLowering::allowsMisalignedMemoryAccesses(
EVT RISCVTargetLowering::getOptimalMemOpType(const MemOp &Op,
- const AttributeList &FuncAttributes) const {
+ const AttributeList &FuncAttributes,
+ bool PreferIntScalar) const {
if (!Subtarget.hasVInstructions())
return MVT::Other;
- if (FuncAttributes.hasFnAttr(Attribute::NoImplicitFloat))
- return MVT::Other;
-
// We use LMUL1 memory operations here for a non-obvious reason. Our caller
// has an expansion threshold, and we want the number of hardware memory
// operations to correspond roughly to that threshold. LMUL>1 operations
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.h b/llvm/lib/Target/RISCV/RISCVISelLowering.h
index 1b91ab43a4637f..5a99d764190a7c 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.h
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.h
@@ -783,7 +783,8 @@ class RISCVTargetLowering : public TargetLowering {
unsigned *Fast = nullptr) const override;
EVT getOptimalMemOpType(const MemOp &Op,
- const AttributeList &FuncAttributes) const override;
+ const AttributeList &FuncAttributes,
+ bool PreferIntScalar) const override;
bool splitValueIntoRegisterParts(
SelectionDAG & DAG, const SDLoc &DL, SDValue Val, SDValue *Parts,
diff --git a/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp b/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
index 6f84bd6c6e4ff4..a5b0dc2d8ca268 100644
--- a/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
+++ b/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
@@ -1133,7 +1133,8 @@ bool SystemZTargetLowering::findOptimalMemOpLowering(
}
EVT SystemZTargetLowering::getOptimalMemOpType(const MemOp &Op,
- const AttributeList &FuncAttributes) const {
+ const AttributeList &FuncAttributes,
+ bool PreferIntScalar) const {
return Subtarget.hasVector() ? MVT::v2i64 : MVT::Other;
}
diff --git a/llvm/lib/Target/SystemZ/SystemZISelLowering.h b/llvm/lib/Target/SystemZ/SystemZISelLowering.h
index 1e7285e3e0fc53..0046078f829050 100644
--- a/llvm/lib/Target/SystemZ/SystemZISelLowering.h
+++ b/llvm/lib/Target/SystemZ/SystemZISelLowering.h
@@ -494,7 +494,8 @@ class SystemZTargetLowering : public TargetLowering {
const MemOp &Op, unsigned DstAS, unsigned SrcAS,
const AttributeList &FuncAttributes) const override;
EVT getOptimalMemOpType(const MemOp &Op,
- const AttributeList &FuncAttributes) const override;
+ const AttributeList &FuncAttributes,
+ bool PreferIntScalar) const override;
bool isTruncateFree(Type *, Type *) const override;
bool isTruncateFree(EVT, EVT) const override;
diff --git a/llvm/lib/Target/X86/X86ISelLowering.h b/llvm/lib/Target/X86/X86ISelLowering.h
index 93d2b3e65742b2..0f137a4f4a2e02 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.h
+++ b/llvm/lib/Target/X86/X86ISelLowering.h
@@ -1057,7 +1057,8 @@ namespace llvm {
const DataLayout &DL) const override;
EVT getOptimalMemOpType(const MemOp &Op,
- const AttributeList &FuncAttributes) const override;
+ const AttributeList &FuncAttributes,
+ bool PreferIntScalar) const override;
/// Returns true if it's safe to use load / store of the
/// specified type to expand memcpy / memset inline. This is mostly true
diff --git a/llvm/lib/Target/X86/X86ISelLoweringCall.cpp b/llvm/lib/Target/X86/X86ISelLoweringCall.cpp
index ab1eeb4111ccdb..75a8fd24078a56 100644
--- a/llvm/lib/Target/X86/X86ISelLoweringCall.cpp
+++ b/llvm/lib/Target/X86/X86ISelLoweringCall.cpp
@@ -280,8 +280,9 @@ uint64_t X86TargetLowerin...
[truncated]
✅ With the latest revision this PR passed the C/C++ code formatter.
Force-pushed from 7dc7e9a to ce5e25e
… memops. Note: The RISC-V backend still gets this very wrong, hence why I'm not adding a test for it. In its case, the reason appears to be that it doesn't support these attributes at all (yet?). This will require much deeper surgery to fix, which I might do in follow-up patches. Closes llvm#105978.
Force-pushed from ce5e25e to 8145b6f
call void @llvm.memcpy.p0.p0.i32(ptr null, ptr null, i32 32, i1 true)
call void @llvm.memset.p0.i32(ptr null, i8 0, i32 32, i1 true)
ret void
2 space indent. Also avoid the UB null accesses?
@@ -2023,6 +2023,11 @@ class TargetLoweringBase {
return LLT();
}

bool useIntScalarMemOps(const AttributeList &FuncAttributes) const {
Needs documentation. Also the name should reflect the meaning, not the implementation. This seems to me like it's just an extension of useSoftFloat which contains the function context. Can you just add the argument to useSoftFloat instead?
Also the name should reflect the meaning, not the implementation.
Hmm, I'm not sure I follow; it seems to me that's what it does already? What do you have in mind?
This seems to me like it's just an extension of useSoftFloat which contains the function context. Can you just add the argument to useSoftFloat instead?
The backends are inconsistent about whether they use TM.Options.FloatABIType or useSoftFloat() for ABI questions. The ideal (IMO) would be that they all agree on using the former for ABI questions and the latter for "can we use float/vector code within ABI boundaries?" (which would just be about the presence of noimplicitfloat, at least for now).
This PR is intended to be an incremental step towards that ideal. But if you think it makes more sense, I can try to tackle that whole inconsistency in one go.
"useIntScalarMemOps" Why int? Why scalar? What mem ops? In what context? The core change has nothing to do with any of these things, this is about handling of some fuzzy definition of floating point operations which I've never understood
useSoftFloat is better but also vague. use "soft" float for what?
"useIntScalarMemOps" Why int? Why scalar? What mem ops? In what context? The core change has nothing to do with any of these things
The IntScalar term is used in other places in the repo and just seems to mean non-vector integers. Granted, the Mem part is unnecessary because the function could equally well apply to other areas than just getOptimalMemOpType(), which I'm modifying in this PR.
Essentially the question being asked is: Are we able/allowed to use float/vector instructions? If useSoftFloat() is true (usually a result of CPU feature flags), then the answer is technically yes, but it would be pointless work because the instructions would just be converted to soft float equivalents immediately after, and likely result in worse code than had we just used the int-scalar path to begin with. If noimplicitfloat is set on the function, then the answer is a definite no, probably because we're compiling kernel code or similar.
So, perhaps avoidFloatOrVectorOps() or something like that?
useSoftFloat is better but also vague. use "soft" float for what?
Well, indeed, that's the problem I was alluding to above re: how backends are inconsistent about what they take it to mean.
I also forgot to mention the use-soft-float function attribute, which each backend typically transforms into a subtarget CPU feature flag (like soft-float or hard-float) which finally informs useSoftFloat(). It's all quite confusing and messy.
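To make that concrete, here is a minimal sketch of what such a predicate could look like, using the avoidFloatOrVectorOps name floated above. The name and placement are assumptions for illustration; the diff in this PR currently computes the equivalent condition (WantIntScalar) inline in the callers.

// Sketch only: should lowering avoid introducing float/vector operations for
// this function? Mirrors the WantIntScalar condition used by the callers.
bool TargetLoweringBase::avoidFloatOrVectorOps(
    const AttributeList &FuncAttributes) const {
  // Soft float: FP/vector ops would only be legalized into libcalls anyway,
  // so the integer-scalar path is usually the better starting point.
  if (useSoftFloat())
    return true;
  // noimplicitfloat: the compiler must not introduce FP/vector uses that were
  // not present in the source (e.g. kernel code).
  return FuncAttributes.hasFnAttr(Attribute::NoImplicitFloat);
}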
As I see it, the term "soft float" has traditionally been used for one of two distinct concepts:
- Hardware features associated with floating-point (and/or vector) are not available. This includes instructions, but more importantly also registers. These may not be available because the HW doesn't actually have them, or because they must not be used in the given context (e.g. Linux kernel code).
- The ABI for passing floating-point (and/or vector) types does not use any floating-point (vector) HW features (specifically, registers).
Now clearly if we have no float/vector registers, we cannot use an ABI that requires them. The inverse is not necessarily true - you could implement an ABI that does not use float registers, but the generated code is still allowed to use float registers within a function. That's the case that I think sometimes causes confusion.
At least on SystemZ, the intent of the -msoft-float flag in all its permutations has always been both 1) and 2) - i.e. it implements a non-standard ABI, and also prohibits any use of floating-point or vector registers.
The inverse is not necessarily true - you could implement an ABI that does not use float registers, but the generated code is still allowed to use float registers within a function. That's the case that I think sometimes causes confusion.
I think ARM is the only backend that supports this configuration right now, although it is pretty fragile if you don't massage the CPU feature flags and function attributes just right IIRC. I think ideally every backend should allow this configuration, though; not doing so seems mostly like an artificial restriction.
In my ideal world, you'd set TM.Options.FloatABIType to pick the soft/hard ABI, and in the case of soft ABI, optionally set the use-soft-float function attribute if you want to also lower all float/vector computation to soft float. noimplicitfloat would keep its current meaning, i.e. the compiler isn't allowed to introduce float/vector operations that weren't already there.
Clang options would then translate like this (a rough code sketch of the mapping follows the list):
- -mfloat-abi=soft / -msoft-float: FloatABI::ABIType::Soft + use-soft-float + noimplicitfloat
- -mfloat-abi=softfp: FloatABI::ABIType::Soft
- -mfloat-abi=hard / -mhard-float: FloatABI::ABIType::Hard
- -mno-implicit-float: noimplicitfloat
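Purely for illustration, the mapping could be wired up roughly as follows (FloatABIChoice and applyFloatABI are invented names; the real wiring would live in the frontend, not in this PR):

#include "llvm/IR/Attributes.h"
#include "llvm/IR/Function.h"
#include "llvm/Target/TargetOptions.h"

// Invented helper illustrating the proposed flag-to-attribute mapping.
enum class FloatABIChoice { Soft, SoftFP, Hard };

static void applyFloatABI(llvm::TargetOptions &Opts, llvm::Function &F,
                          FloatABIChoice Choice, bool NoImplicitFloat) {
  switch (Choice) {
  case FloatABIChoice::Soft:
    // -mfloat-abi=soft / -msoft-float: soft ABI, softened FP computation,
    // and no compiler-introduced FP/vector uses.
    Opts.FloatABIType = llvm::FloatABI::Soft;
    F.addFnAttr("use-soft-float", "true");
    F.addFnAttr(llvm::Attribute::NoImplicitFloat);
    break;
  case FloatABIChoice::SoftFP:
    // -mfloat-abi=softfp: soft ABI, but hard float allowed within functions.
    Opts.FloatABIType = llvm::FloatABI::Soft;
    break;
  case FloatABIChoice::Hard:
    // -mfloat-abi=hard / -mhard-float.
    Opts.FloatABIType = llvm::FloatABI::Hard;
    break;
  }
  // -mno-implicit-float can be requested independently of the ABI choice.
  if (NoImplicitFloat)
    F.addFnAttr(llvm::Attribute::NoImplicitFloat);
}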
Just one additional comment: "if you want to also lower all float/vector computation to soft float" is not the actual concern for the primary use case of -msoft-float on our platform: the Linux kernel. This code base already never uses any float/vector computation in the first place. The primary concern is to prevent the compiler from introducing uses of float/vector registers even for code that does not have any float/vector operations at the source level.
return Subtarget.hasVector() ? MVT::v2i64 : MVT::Other;
EVT SystemZTargetLowering::getOptimalMemOpType(
const MemOp &Op, const AttributeList &FuncAttributes) const {
return Subtarget.hasVector() && !useIntScalarMemOps(FuncAttributes)
Not sure why this is necessary. This is supposed to be handled once and for all in SystemZSubtarget::initializeSubtargetDependencies by switching off the vector facility completely for soft-float:
// -msoft-float implies -mno-vx.
if (HasSoftFloat)
  HasVector = false;
I think you might be right that it's not needed. I don't actually recall why I modified this function in the initial version of the PR.
I'll re-check and remove it if it's unnecessary.
The RISC-V backend still gets this very wrong, which is why I'm not adding a test for it. In its case, the reason appears to be that it doesn't support these attributes at all (yet?). This will require much deeper surgery to fix, which I might do in follow-up patches.
Closes #105978.