[llvm] Ensure that soft float targets don't use float/vector code for memops. #107022
Conversation
@llvm/pr-subscribers-llvm-selectiondag @llvm/pr-subscribers-backend-aarch64

Author: Alex Rønne Petersen (alexrp)

Changes

Callers of …

Note: The RISC-V backend still gets this very wrong, which is why I'm not adding a test for it. In its case, the reason appears to be that it doesn't support these attributes at all (yet?). This will require much deeper surgery to fix, which I might do in follow-up patches.

Closes #105978.

Patch is 68.85 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/107022.diff

28 Files Affected:
diff --git a/llvm/include/llvm/CodeGen/TargetLowering.h b/llvm/include/llvm/CodeGen/TargetLowering.h
index eda38cd8a564d6..fd44223f3d4ce0 100644
--- a/llvm/include/llvm/CodeGen/TargetLowering.h
+++ b/llvm/include/llvm/CodeGen/TargetLowering.h
@@ -1957,19 +1957,24 @@ class TargetLoweringBase {
unsigned *Fast = nullptr) const;
/// Returns the target specific optimal type for load and store operations as
- /// a result of memset, memcpy, and memmove lowering.
- /// It returns EVT::Other if the type should be determined using generic
- /// target-independent logic.
+ /// a result of memset, memcpy, and memmove lowering. It returns EVT::Other if
+ /// the type should be determined using generic target-independent logic.
+ ///
+ /// The PreferIntScalar parameter is a hint from the caller; if true *and* the
+ /// implementation returns a float or vector type, the caller will discard the
+ /// result and proceed as if EVT::Other had been returned.
virtual EVT
getOptimalMemOpType(const MemOp &Op,
- const AttributeList & /*FuncAttributes*/) const {
+ const AttributeList & /*FuncAttributes*/,
+ bool /*PreferIntScalar*/) const {
return MVT::Other;
}
/// LLT returning variant.
virtual LLT
getOptimalMemOpLLT(const MemOp &Op,
- const AttributeList & /*FuncAttributes*/) const {
+ const AttributeList & /*FuncAttributes*/,
+ bool /*PreferIntScalar*/) const {
return LLT();
}
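For illustration only (this snippet is not part of the patch): a target override under the new signature might look like the sketch below. MyTargetLowering, hasMyVectorExt(), and the size/alignment thresholds are invented names; the point is that an override can still return a float/vector type, since the generic caller now discards such a result and falls back to integer scalars when PreferIntScalar is set.

// Hypothetical target override under the new signature (illustration only;
// MyTargetLowering and hasMyVectorExt() are made-up names).
EVT MyTargetLowering::getOptimalMemOpType(const MemOp &Op,
                                          const AttributeList &FuncAttributes,
                                          bool PreferIntScalar) const {
  // The override may ignore PreferIntScalar: if it returns a vector or float
  // type while the hint is set, the caller proceeds as if it saw MVT::Other.
  if (Subtarget.hasMyVectorExt() && Op.size() >= 16 && Op.isAligned(Align(16)))
    return MVT::v16i8;
  return MVT::Other; // defer to generic target-independent selection
}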
diff --git a/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp b/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
index 3fece81df1f2fd..c3fe0c46f8d8d5 100644
--- a/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
@@ -8846,9 +8846,14 @@ static bool findGISelOptimalMemOpLowering(std::vector<LLT> &MemOps,
if (Op.isMemcpyWithFixedDstAlign() && Op.getSrcAlign() < Op.getDstAlign())
return false;
- LLT Ty = TLI.getOptimalMemOpLLT(Op, FuncAttributes);
-
- if (Ty == LLT()) {
+ bool WantIntScalar = TLI.useSoftFloat() ||
+ FuncAttributes.hasFnAttr(Attribute::NoImplicitFloat);
+ LLT Ty = TLI.getOptimalMemOpLLT(Op, FuncAttributes, WantIntScalar);
+
+ // The target may well report supporting vector operations to do the
+ // operation, but we don't want to use those as a matter of policy if we're
+ // using soft float or if implicit float operations are disallowed.
+ if (Ty == LLT() || (WantIntScalar && Ty.isVector())) {
// Use the largest scalar type whose alignment constraints are satisfied.
// We only need to check DstAlign here as SrcAlign is always greater or
// equal to DstAlign (or zero).
diff --git a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
index 4e796289cff0a1..0dac6085950dc3 100644
--- a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
@@ -203,9 +203,15 @@ bool TargetLowering::findOptimalMemOpLowering(
Op.getSrcAlign() < Op.getDstAlign())
return false;
- EVT VT = getOptimalMemOpType(Op, FuncAttributes);
-
- if (VT == MVT::Other) {
+ bool WantIntScalar = useSoftFloat() ||
+ FuncAttributes.hasFnAttr(Attribute::NoImplicitFloat);
+ EVT VT = getOptimalMemOpType(Op, FuncAttributes, WantIntScalar);
+
+ // The target may well report supporting float/vector operations to do the
+ // operation, but we don't want to use those as a matter of policy if we're
+ // using soft float or if implicit float operations are disallowed.
+ if (VT == MVT::Other ||
+ (WantIntScalar && (VT.isVector() || VT.isFloatingPoint()))) {
// Use the largest integer type whose alignment constraints are satisfied.
// We only need to check DstAlign here as SrcAlign is always greater or
// equal to DstAlign (or zero).
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index 5ac5b7f8a5ab18..6fb386e5e4bfbb 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -17657,10 +17657,10 @@ bool AArch64TargetLowering::lowerInterleaveIntrinsicToStore(
}
EVT AArch64TargetLowering::getOptimalMemOpType(
- const MemOp &Op, const AttributeList &FuncAttributes) const {
- bool CanImplicitFloat = !FuncAttributes.hasFnAttr(Attribute::NoImplicitFloat);
- bool CanUseNEON = Subtarget->hasNEON() && CanImplicitFloat;
- bool CanUseFP = Subtarget->hasFPARMv8() && CanImplicitFloat;
+ const MemOp &Op, const AttributeList &FuncAttributes,
+ bool PreferIntScalar) const {
+ bool CanUseNEON = Subtarget->hasNEON();
+ bool CanUseFP = Subtarget->hasFPARMv8();
// Only use AdvSIMD to implement memset of 32-byte and above. It would have
// taken one instruction to materialize the v2i64 zero and one store (with
// restrictive addressing mode). Just do i64 stores.
@@ -17679,18 +17679,15 @@ EVT AArch64TargetLowering::getOptimalMemOpType(
return MVT::v16i8;
if (CanUseFP && !IsSmallMemset && AlignmentIsAcceptable(MVT::f128, Align(16)))
return MVT::f128;
- if (Op.size() >= 8 && AlignmentIsAcceptable(MVT::i64, Align(8)))
- return MVT::i64;
- if (Op.size() >= 4 && AlignmentIsAcceptable(MVT::i32, Align(4)))
- return MVT::i32;
+
return MVT::Other;
}
LLT AArch64TargetLowering::getOptimalMemOpLLT(
- const MemOp &Op, const AttributeList &FuncAttributes) const {
- bool CanImplicitFloat = !FuncAttributes.hasFnAttr(Attribute::NoImplicitFloat);
- bool CanUseNEON = Subtarget->hasNEON() && CanImplicitFloat;
- bool CanUseFP = Subtarget->hasFPARMv8() && CanImplicitFloat;
+ const MemOp &Op, const AttributeList &FuncAttributes,
+ bool PreferIntScalar) const {
+ bool CanUseNEON = Subtarget->hasNEON();
+ bool CanUseFP = Subtarget->hasFPARMv8();
// Only use AdvSIMD to implement memset of 32-byte and above. It would have
// taken one instruction to materialize the v2i64 zero and one store (with
// restrictive addressing mode). Just do i64 stores.
@@ -17709,10 +17706,7 @@ LLT AArch64TargetLowering::getOptimalMemOpLLT(
return LLT::fixed_vector(2, 64);
if (CanUseFP && !IsSmallMemset && AlignmentIsAcceptable(MVT::f128, Align(16)))
return LLT::scalar(128);
- if (Op.size() >= 8 && AlignmentIsAcceptable(MVT::i64, Align(8)))
- return LLT::scalar(64);
- if (Op.size() >= 4 && AlignmentIsAcceptable(MVT::i32, Align(4)))
- return LLT::scalar(32);
+
return LLT();
}
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.h b/llvm/lib/Target/AArch64/AArch64ISelLowering.h
index 39d5df0de0eec7..0192012b717a74 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.h
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.h
@@ -721,10 +721,12 @@ class AArch64TargetLowering : public TargetLowering {
bool shouldConsiderGEPOffsetSplit() const override;
EVT getOptimalMemOpType(const MemOp &Op,
- const AttributeList &FuncAttributes) const override;
+ const AttributeList &FuncAttributes,
+ bool PreferIntScalar) const override;
LLT getOptimalMemOpLLT(const MemOp &Op,
- const AttributeList &FuncAttributes) const override;
+ const AttributeList &FuncAttributes,
+ bool PreferIntScalar) const override;
/// Return true if the addressing mode represented by AM is legal for this
/// target, for a load/store of the specified type.
diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index 1437f3d58b5e79..5d114cab93ce58 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -1862,7 +1862,8 @@ bool SITargetLowering::allowsMisalignedMemoryAccesses(
}
EVT SITargetLowering::getOptimalMemOpType(
- const MemOp &Op, const AttributeList &FuncAttributes) const {
+ const MemOp &Op, const AttributeList &FuncAttributes,
+ bool PreferIntScalar) const {
// FIXME: Should account for address space here.
// The default fallback uses the private pointer size as a guess for a type to
diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.h b/llvm/lib/Target/AMDGPU/SIISelLowering.h
index eed4b3e79cdeee..c80bd56a7cf48d 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.h
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.h
@@ -345,7 +345,8 @@ class SITargetLowering final : public AMDGPUTargetLowering {
unsigned *IsFast = nullptr) const override;
EVT getOptimalMemOpType(const MemOp &Op,
- const AttributeList &FuncAttributes) const override;
+ const AttributeList &FuncAttributes,
+ bool PreferIntScalar) const override;
bool isMemOpUniform(const SDNode *N) const;
bool isMemOpHasNoClobberedMemOperand(const SDNode *N) const;
diff --git a/llvm/lib/Target/ARM/ARMISelLowering.cpp b/llvm/lib/Target/ARM/ARMISelLowering.cpp
index 4ab0433069ae66..db163398755e33 100644
--- a/llvm/lib/Target/ARM/ARMISelLowering.cpp
+++ b/llvm/lib/Target/ARM/ARMISelLowering.cpp
@@ -19191,10 +19191,10 @@ bool ARMTargetLowering::allowsMisalignedMemoryAccesses(EVT VT, unsigned,
EVT ARMTargetLowering::getOptimalMemOpType(
- const MemOp &Op, const AttributeList &FuncAttributes) const {
+ const MemOp &Op, const AttributeList &FuncAttributes,
+ bool PreferIntScalar) const {
// See if we can use NEON instructions for this...
- if ((Op.isMemcpy() || Op.isZeroMemset()) && Subtarget->hasNEON() &&
- !FuncAttributes.hasFnAttr(Attribute::NoImplicitFloat)) {
+ if ((Op.isMemcpy() || Op.isZeroMemset()) && Subtarget->hasNEON()) {
unsigned Fast;
if (Op.size() >= 16 &&
(Op.isAligned(Align(16)) ||
diff --git a/llvm/lib/Target/ARM/ARMISelLowering.h b/llvm/lib/Target/ARM/ARMISelLowering.h
index a255e9b6fc365f..e8d8eab0c9a031 100644
--- a/llvm/lib/Target/ARM/ARMISelLowering.h
+++ b/llvm/lib/Target/ARM/ARMISelLowering.h
@@ -455,7 +455,8 @@ class VectorType;
unsigned *Fast) const override;
EVT getOptimalMemOpType(const MemOp &Op,
- const AttributeList &FuncAttributes) const override;
+ const AttributeList &FuncAttributes,
+ bool PreferIntScalar) const override;
bool isTruncateFree(Type *SrcTy, Type *DstTy) const override;
bool isTruncateFree(EVT SrcVT, EVT DstVT) const override;
diff --git a/llvm/lib/Target/BPF/BPFISelLowering.h b/llvm/lib/Target/BPF/BPFISelLowering.h
index 42707949e864cd..85b969a3edb6e1 100644
--- a/llvm/lib/Target/BPF/BPFISelLowering.h
+++ b/llvm/lib/Target/BPF/BPFISelLowering.h
@@ -114,7 +114,8 @@ class BPFTargetLowering : public TargetLowering {
SelectionDAG &DAG) const override;
EVT getOptimalMemOpType(const MemOp &Op,
- const AttributeList &FuncAttributes) const override {
+ const AttributeList &FuncAttributes,
+ bool PreferIntScalar) const override {
return Op.size() >= 8 ? MVT::i64 : MVT::i32;
}
diff --git a/llvm/lib/Target/Hexagon/HexagonISelLowering.cpp b/llvm/lib/Target/Hexagon/HexagonISelLowering.cpp
index 856c952e785dac..83023021d79d5a 100644
--- a/llvm/lib/Target/Hexagon/HexagonISelLowering.cpp
+++ b/llvm/lib/Target/Hexagon/HexagonISelLowering.cpp
@@ -3783,7 +3783,8 @@ bool HexagonTargetLowering::IsEligibleForTailCallOptimization(
/// does not need to be loaded. It returns EVT::Other if the type should be
/// determined using generic target-independent logic.
EVT HexagonTargetLowering::getOptimalMemOpType(
- const MemOp &Op, const AttributeList &FuncAttributes) const {
+ const MemOp &Op, const AttributeList &FuncAttributes,
+ bool PreferIntScalar) const {
if (Op.size() >= 8 && Op.isAligned(Align(8)))
return MVT::i64;
if (Op.size() >= 4 && Op.isAligned(Align(4)))
diff --git a/llvm/lib/Target/Hexagon/HexagonISelLowering.h b/llvm/lib/Target/Hexagon/HexagonISelLowering.h
index 3fd961f5a74623..a568d67c5fdf91 100644
--- a/llvm/lib/Target/Hexagon/HexagonISelLowering.h
+++ b/llvm/lib/Target/Hexagon/HexagonISelLowering.h
@@ -326,7 +326,8 @@ class HexagonTargetLowering : public TargetLowering {
bool isLegalICmpImmediate(int64_t Imm) const override;
EVT getOptimalMemOpType(const MemOp &Op,
- const AttributeList &FuncAttributes) const override;
+ const AttributeList &FuncAttributes,
+ bool PreferIntScalar) const override;
bool allowsMemoryAccess(LLVMContext &Context, const DataLayout &DL, EVT VT,
unsigned AddrSpace, Align Alignment,
diff --git a/llvm/lib/Target/Mips/MipsISelLowering.cpp b/llvm/lib/Target/Mips/MipsISelLowering.cpp
index 0f2047fcac640e..4448a502ec4db3 100644
--- a/llvm/lib/Target/Mips/MipsISelLowering.cpp
+++ b/llvm/lib/Target/Mips/MipsISelLowering.cpp
@@ -4331,14 +4331,6 @@ MipsTargetLowering::isOffsetFoldingLegal(const GlobalAddressSDNode *GA) const {
return false;
}
-EVT MipsTargetLowering::getOptimalMemOpType(
- const MemOp &Op, const AttributeList &FuncAttributes) const {
- if (Subtarget.hasMips64())
- return MVT::i64;
-
- return MVT::i32;
-}
-
bool MipsTargetLowering::isFPImmLegal(const APFloat &Imm, EVT VT,
bool ForCodeSize) const {
if (VT != MVT::f32 && VT != MVT::f64)
diff --git a/llvm/lib/Target/Mips/MipsISelLowering.h b/llvm/lib/Target/Mips/MipsISelLowering.h
index 84ad40d6bbbe26..e2a077bd9104af 100644
--- a/llvm/lib/Target/Mips/MipsISelLowering.h
+++ b/llvm/lib/Target/Mips/MipsISelLowering.h
@@ -664,9 +664,6 @@ class TargetRegisterClass;
bool isOffsetFoldingLegal(const GlobalAddressSDNode *GA) const override;
- EVT getOptimalMemOpType(const MemOp &Op,
- const AttributeList &FuncAttributes) const override;
-
/// isFPImmLegal - Returns true if the target can instruction select the
/// specified FP immediate natively. If false, the legalizer will
/// materialize the FP immediate as a load from a constant pool.
diff --git a/llvm/lib/Target/PowerPC/PPCISelLowering.cpp b/llvm/lib/Target/PowerPC/PPCISelLowering.cpp
index 459a96eca1ff20..17b26178f7960f 100644
--- a/llvm/lib/Target/PowerPC/PPCISelLowering.cpp
+++ b/llvm/lib/Target/PowerPC/PPCISelLowering.cpp
@@ -17400,8 +17400,10 @@ bool PPCTargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info,
/// It returns EVT::Other if the type should be determined using generic
/// target-independent logic.
EVT PPCTargetLowering::getOptimalMemOpType(
- const MemOp &Op, const AttributeList &FuncAttributes) const {
- if (getTargetMachine().getOptLevel() != CodeGenOptLevel::None) {
+ const MemOp &Op, const AttributeList &FuncAttributes,
+ bool PreferIntScalar) const {
+ if (getTargetMachine().getOptLevel() != CodeGenOptLevel::None &&
+ !PreferIntScalar) {
// We should use Altivec/VSX loads and stores when available. For unaligned
// addresses, unaligned VSX loads are only fast starting with the P8.
if (Subtarget.hasAltivec() && Op.size() >= 16) {
diff --git a/llvm/lib/Target/PowerPC/PPCISelLowering.h b/llvm/lib/Target/PowerPC/PPCISelLowering.h
index 0bdfdcd15441f4..978a2ed6a606de 100644
--- a/llvm/lib/Target/PowerPC/PPCISelLowering.h
+++ b/llvm/lib/Target/PowerPC/PPCISelLowering.h
@@ -1072,7 +1072,8 @@ namespace llvm {
/// It returns EVT::Other if the type should be determined using generic
/// target-independent logic.
EVT getOptimalMemOpType(const MemOp &Op,
- const AttributeList &FuncAttributes) const override;
+ const AttributeList &FuncAttributes,
+ bool PreferIntScalar) const override;
/// Is unaligned memory access allowed for the given type, and is it fast
/// relative to software emulation.
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index 670dee2edb1dfb..fdbf202946a093 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -21047,13 +21047,11 @@ bool RISCVTargetLowering::allowsMisalignedMemoryAccesses(
EVT RISCVTargetLowering::getOptimalMemOpType(const MemOp &Op,
- const AttributeList &FuncAttributes) const {
+ const AttributeList &FuncAttributes,
+ bool PreferIntScalar) const {
if (!Subtarget.hasVInstructions())
return MVT::Other;
- if (FuncAttributes.hasFnAttr(Attribute::NoImplicitFloat))
- return MVT::Other;
-
// We use LMUL1 memory operations here for a non-obvious reason. Our caller
// has an expansion threshold, and we want the number of hardware memory
// operations to correspond roughly to that threshold. LMUL>1 operations
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.h b/llvm/lib/Target/RISCV/RISCVISelLowering.h
index 1b91ab43a4637f..5a99d764190a7c 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.h
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.h
@@ -783,7 +783,8 @@ class RISCVTargetLowering : public TargetLowering {
unsigned *Fast = nullptr) const override;
EVT getOptimalMemOpType(const MemOp &Op,
- const AttributeList &FuncAttributes) const override;
+ const AttributeList &FuncAttributes,
+ bool PreferIntScalar) const override;
bool splitValueIntoRegisterParts(
SelectionDAG & DAG, const SDLoc &DL, SDValue Val, SDValue *Parts,
diff --git a/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp b/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
index 6f84bd6c6e4ff4..a5b0dc2d8ca268 100644
--- a/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
+++ b/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
@@ -1133,7 +1133,8 @@ bool SystemZTargetLowering::findOptimalMemOpLowering(
}
EVT SystemZTargetLowering::getOptimalMemOpType(const MemOp &Op,
- const AttributeList &FuncAttributes) const {
+ const AttributeList &FuncAttributes,
+ bool PreferIntScalar) const {
return Subtarget.hasVector() ? MVT::v2i64 : MVT::Other;
}
diff --git a/llvm/lib/Target/SystemZ/SystemZISelLowering.h b/llvm/lib/Target/SystemZ/SystemZISelLowering.h
index 1e7285e3e0fc53..0046078f829050 100644
--- a/llvm/lib/Target/SystemZ/SystemZISelLowering.h
+++ b/llvm/lib/Target/SystemZ/SystemZISelLowering.h
@@ -494,7 +494,8 @@ class SystemZTargetLowering : public TargetLowering {
const MemOp &Op, unsigned DstAS, unsigned SrcAS,
const AttributeList &FuncAttributes) const override;
EVT getOptimalMemOpType(const MemOp &Op,
- const AttributeList &FuncAttributes) const override;
+ const AttributeList &FuncAttributes,
+ bool PreferIntScalar) const override;
bool isTruncateFree(Type *, Type *) const override;
bool isTruncateFree(EVT, EVT) const override;
diff --git a/llvm/lib/Target/X86/X86ISelLowering.h b/llvm/lib/Target/X86/X86ISelLowering.h
index 93d2b3e65742b2..0f137a4f4a2e02 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.h
+++ b/llvm/lib/Target/X86/X86ISelLowering.h
@@ -1057,7 +1057,8 @@ namespace llvm {
const DataLayout &DL) const override;
EVT getOptimalMemOpType(const MemOp &Op,
- const AttributeList &FuncAttributes) const override;
+ const AttributeList &FuncAttributes,
+ bool PreferIntScalar) const override;
/// Returns true if it's safe to use load / store of the
/// specified type to expand memcpy / memset inline. This is mostly true
diff --git a/llvm/lib/Target/X86/X86ISelLoweringCall.cpp b/llvm/lib/Target/X86/X86ISelLoweringCall.cpp
index ab1eeb4111ccdb..75a8fd24078a56 100644
--- a/llvm/lib/Target/X86/X86ISelLoweringCall.cpp
+++ b/llvm/lib/Target/X86/X86ISelLoweringCall.cpp
@@ -280,8 +280,9 @@ uint64_t X86TargetLowerin...
[truncated]
✅ With the latest revision this PR passed the C/C++ code formatter.
Force-pushed from 7dc7e9a to ce5e25e
… memops. Note: The RISC-V backend still gets this very wrong, hence why I'm not adding a test for it. In its case, the reason appears to be that it doesn't support these attributes at all (yet?). This will require much deeper surgery to fix, which I might do in follow-up patches. Closes llvm#105978.
Force-pushed from ce5e25e to 8145b6f
call void @llvm.memcpy.p0.p0.i32(ptr null, ptr null, i32 32, i1 true)
call void @llvm.memset.p0.i32(ptr null, i8 0, i32 32, i1 true)
ret void
2 space indent. Also avoid the UB null accesses?
@@ -2023,6 +2023,11 @@ class TargetLoweringBase {
return LLT();
}

bool useIntScalarMemOps(const AttributeList &FuncAttributes) const {
Needs documentation. Also the name should reflect the meaning, not the implementation. This seems to me like it's just an extension of useSoftFloat which contains the function context. Can you just add the argument to useSoftFloat instead?
Also the name should reflect the meaning, not the implementation.
Hmm, I'm not sure I follow; it seems to me that's what it does already? What do you have in mind?
This seems to me like it's just an extension of useSoftFloat which contains the function context. Can you just add the argument to useSoftFloat instead?
The backends are inconsistent about whether they use TM.Options.FloatABIType or useSoftFloat() for ABI questions. The ideal (IMO) would be that they all agree on using the former for ABI questions and the latter for "can we use float/vector code within ABI boundaries?" (which would just be about the presence of noimplicitfloat, at least for now).
This PR is intended to be an incremental step towards that ideal. But if you think it makes more sense, I can try to tackle that whole inconsistency in one go.
"useIntScalarMemOps" Why int? Why scalar? What mem ops? In what context? The core change has nothing to do with any of these things, this is about handling of some fuzzy definition of floating point operations which I've never understood
useSoftFloat is better but also vague. use "soft" float for what?
"useIntScalarMemOps" Why int? Why scalar? What mem ops? In what context? The core change has nothing to do with any of these things
The IntScalar term is used in other places in the repo and just seems to mean non-vector integers. Granted, the Mem part is unnecessary because the function could equally well apply to other areas than just getOptimalMemOpType(), which I'm modifying in this PR.
Essentially the question being asked is: Are we able/allowed to use float/vector instructions? If useSoftFloat() is true (usually a result of CPU feature flags), then the answer is technically yes, but it would be pointless work because the instructions would just be converted to soft float equivalents immediately after, and likely result in worse code than had we just used the int-scalar path to begin with. If noimplicitfloat is set on the function, then the answer is a definite no, probably because we're compiling kernel code or similar.
So, perhaps avoidFloatOrVectorOps() or something like that?
useSoftFloat is better but also vague. use "soft" float for what?
Well, indeed, that's the problem I was alluding to above re: how backends are inconsistent about what they take it to mean.
I also forgot to mention the use-soft-float function attribute, which each backend typically transforms into a subtarget CPU feature flag (like soft-float or hard-float) which finally informs useSoftFloat(). It's all quite confusing and messy.
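To make that concrete, here is a minimal sketch of what such a predicate could look like, using the avoidFloatOrVectorOps name floated above. The name and placement are assumptions for illustration; the diff in this PR currently computes the equivalent condition (WantIntScalar) inline in the callers.

// Sketch only: should lowering avoid introducing float/vector operations for
// this function? Mirrors the WantIntScalar condition used by the callers.
bool TargetLoweringBase::avoidFloatOrVectorOps(
    const AttributeList &FuncAttributes) const {
  // Soft float: FP/vector ops would only be legalized into libcalls anyway,
  // so the integer-scalar path is usually the better starting point.
  if (useSoftFloat())
    return true;
  // noimplicitfloat: the compiler must not introduce FP/vector uses that were
  // not present in the source (e.g. kernel code).
  return FuncAttributes.hasFnAttr(Attribute::NoImplicitFloat);
}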
As I see it, the term "soft float" has traditionally been used for one of two distinct concepts:
- Hardware features associated with floating-point (and/or vector) are not available. This includes instructions, but more importantly also registers. These may not be available because the HW doesn't actually have them, or because they must not be used in the given context (e.g. Linux kernel code).
- The ABI for passing floating-point (and/or vector) types does not use any floating-point (vector) HW features (specifically, registers).
Now clearly if we have no float/vector registers, we cannot use an ABI that requires them. The inverse is not necessarily true - you could implement an ABI that does not use float registers, but the generated code is still allowed to use float registers within a function. That's the case that I think sometimes causes confusion.
At least on SystemZ, the intent of the -msoft-float flag in all its permutations has always been both 1) and 2) - i.e. it implements a non-standard ABI, and also prohibits any use of floating-point or vector registers.
The inverse is not necessarily true - you could implement an ABI that does not use float registers, but the generated code is still allowed to use float registers within a function. That's the case that I think sometimes causes confusion.
I think ARM is the only backend that supports this configuration right now, although it is pretty fragile if you don't massage the CPU feature flags and function attributes just right IIRC. I think ideally every backend should allow this configuration, though; not doing so seems mostly like an artificial restriction.
In my ideal world, you'd set TM.Options.FloatABIType to pick the soft/hard ABI, and in the case of soft ABI, optionally set the use-soft-float function attribute if you want to also lower all float/vector computation to soft float. noimplicitfloat would keep its current meaning, i.e. the compiler isn't allowed to introduce float/vector operations that weren't already there.
Clang options would then translate like this (a rough code sketch of the mapping follows the list):
- -mfloat-abi=soft / -msoft-float: FloatABI::ABIType::Soft + use-soft-float + noimplicitfloat
- -mfloat-abi=softfp: FloatABI::ABIType::Soft
- -mfloat-abi=hard / -mhard-float: FloatABI::ABIType::Hard
- -mno-implicit-float: noimplicitfloat
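Purely for illustration, the mapping could be wired up roughly as follows (FloatABIChoice and applyFloatABI are invented names; the real wiring would live in the frontend, not in this PR):

#include "llvm/IR/Attributes.h"
#include "llvm/IR/Function.h"
#include "llvm/Target/TargetOptions.h"

// Invented helper illustrating the proposed flag-to-attribute mapping.
enum class FloatABIChoice { Soft, SoftFP, Hard };

static void applyFloatABI(llvm::TargetOptions &Opts, llvm::Function &F,
                          FloatABIChoice Choice, bool NoImplicitFloat) {
  switch (Choice) {
  case FloatABIChoice::Soft:
    // -mfloat-abi=soft / -msoft-float: soft ABI, softened FP computation,
    // and no compiler-introduced FP/vector uses.
    Opts.FloatABIType = llvm::FloatABI::Soft;
    F.addFnAttr("use-soft-float", "true");
    F.addFnAttr(llvm::Attribute::NoImplicitFloat);
    break;
  case FloatABIChoice::SoftFP:
    // -mfloat-abi=softfp: soft ABI, but hard float allowed within functions.
    Opts.FloatABIType = llvm::FloatABI::Soft;
    break;
  case FloatABIChoice::Hard:
    // -mfloat-abi=hard / -mhard-float.
    Opts.FloatABIType = llvm::FloatABI::Hard;
    break;
  }
  // -mno-implicit-float can be requested independently of the ABI choice.
  if (NoImplicitFloat)
    F.addFnAttr(llvm::Attribute::NoImplicitFloat);
}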
Just one additional comment: "if you want to also lower all float/vector computation to soft float" is not the actual concern for the primary use case of -msoft-float on our platform: the Linux kernel. This code base already never uses any float/vector computation in the first place. The primary concern is to prevent the compiler from introducing uses of float/vector registers even for code that does not have any float/vector operations at the source level.
return Subtarget.hasVector() ? MVT::v2i64 : MVT::Other;
EVT SystemZTargetLowering::getOptimalMemOpType(
const MemOp &Op, const AttributeList &FuncAttributes) const {
return Subtarget.hasVector() && !useIntScalarMemOps(FuncAttributes)
Not sure why this is necessary. This is supposed to be handled once and for all in SystemZSubtarget::initializeSubtargetDependencies by switching off the vector facility completely for soft-float:
// -msoft-float implies -mno-vx.
if (HasSoftFloat)
  HasVector = false;
I think you might be right that it's not needed. I don't actually recall why I modified this function in the initial version of the PR.
I'll re-check and remove it if it's unnecessary.
The RISC-V backend still gets this very wrong, which is why I'm not adding a test for it. In its case, the reason appears to be that it doesn't support these attributes at all (yet?). This will require much deeper surgery to fix, which I might do in follow-up patches.
Closes #105978.