Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[LLVM] Slay undead copysign code #111269

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

workingjubilee
Copy link
Contributor

@workingjubilee workingjubilee commented Oct 6, 2024

Frontends that emit LLVMIR would like to be able to count on float operations that can be encoded as simple bit-operations as lowering to machine-code routines embedded into the binary. This is desired so that they do not affect what one might think of as "freestanding" binaries. SelectionDAG1 and GlobalISel2 both explicitly handle this lowering for copysign.

However, later code implies that it is, at least in theory, possible to lower to runtime libcalls, as indicated by its entry in llvm/include/llvm/IR/RuntimeLibcalls.def3! Yet this code is largely impotent. We seem to need to preserve runtime libcalls for architecture-specific variants of these float operations that have architecture-specific semantics. However, ordinary IEEE754 floats have nonesuch: these are bitops.

So, purge this dead code, that it may plague us no more with its revenant half-life.

Footnotes

  1. https://github.com/llvm/llvm-project/blob/46944b0cbc9a9d8daad0182c40fcd3560bc9ca35/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp#L7772-L7811

  2. https://github.com/llvm/llvm-project/blob/75103aae4a9d22e2c46068f2160f2dddd6ad2116/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp#L1646-L1707

  3. https://github.com/llvm/llvm-project/blob/46944b0cbc9a9d8daad0182c40fcd3560bc9ca35/llvm/include/llvm/IR/RuntimeLibcalls.def#L287-L291

@llvmbot
Copy link
Collaborator

llvmbot commented Oct 6, 2024

@llvm/pr-subscribers-backend-powerpc
@llvm/pr-subscribers-llvm-ir
@llvm/pr-subscribers-backend-systemz
@llvm/pr-subscribers-llvm-selectiondag

@llvm/pr-subscribers-backend-webassembly

Author: Jubilee (workingjubilee)

Changes

Frontends that emit LLVMIR would like to be able to count on float operations that can be encoded as simple bit-operations as lowering to machine-code routines embedded into the binary. This is desired so that they do not affect what one might think of as "freestanding" binaries. GlobalISel explicitly handles this lowering for copysign1.

Perhaps it is just my newness to the codebase but I was unable to find a case where SelectionDAG likewise guaranteed that copysign was lowered to machine code. Instead, it seemed to be, at least in theory, prone to lowering to runtime libcalls, as indicated by its entry in llvm/include/llvm/IR/RuntimeLibcalls.def2!

In removing this code, it may be necessary to preserve runtime libcalls only for architecture-specific variants of these float operations that may have architecture-specific semantics. However, ordinary IEEE754 floats have nonesuch, and these bitops are then guaranteed to be bitops.

As no tests seem to fail after doing so, it seems plausible to me this was actually-dead code that was only in-theory reachable. If that is so, purge it from the codebase, so this revenant may plague us no more.


Full diff: https://github.com/llvm/llvm-project/pull/111269.diff

6 Files Affected:

  • (modified) llvm/include/llvm/IR/RuntimeLibcalls.def (-3)
  • (modified) llvm/lib/CodeGen/IntrinsicLowering.cpp (+15-3)
  • (modified) llvm/lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp (+5-6)
  • (modified) llvm/lib/IR/RuntimeLibcalls.cpp (-1)
  • (modified) llvm/lib/Target/SystemZ/ZOSLibcallNames.def (-3)
  • (modified) llvm/lib/Target/WebAssembly/WebAssemblyRuntimeLibcallSignatures.cpp (-3)
diff --git a/llvm/include/llvm/IR/RuntimeLibcalls.def b/llvm/include/llvm/IR/RuntimeLibcalls.def
index 69cf43140ad4bd..b891dce07d60dd 100644
--- a/llvm/include/llvm/IR/RuntimeLibcalls.def
+++ b/llvm/include/llvm/IR/RuntimeLibcalls.def
@@ -284,10 +284,7 @@ HANDLE_LIBCALL(FLOOR_F64, "floor")
 HANDLE_LIBCALL(FLOOR_F80, "floorl")
 HANDLE_LIBCALL(FLOOR_F128, "floorl")
 HANDLE_LIBCALL(FLOOR_PPCF128, "floorl")
-HANDLE_LIBCALL(COPYSIGN_F32, "copysignf")
-HANDLE_LIBCALL(COPYSIGN_F64, "copysign")
 HANDLE_LIBCALL(COPYSIGN_F80, "copysignl")
-HANDLE_LIBCALL(COPYSIGN_F128, "copysignl")
 HANDLE_LIBCALL(COPYSIGN_PPCF128, "copysignl")
 HANDLE_LIBCALL(FMIN_F32, "fminf")
 HANDLE_LIBCALL(FMIN_F64, "fmin")
diff --git a/llvm/lib/CodeGen/IntrinsicLowering.cpp b/llvm/lib/CodeGen/IntrinsicLowering.cpp
index 256c081b46e262..61d5f75273fa2b 100644
--- a/llvm/lib/CodeGen/IntrinsicLowering.cpp
+++ b/llvm/lib/CodeGen/IntrinsicLowering.cpp
@@ -58,7 +58,9 @@ static Value *LowerBSWAP(LLVMContext &Context, Value *V, Instruction *IP) {
   IRBuilder<> Builder(IP);
 
   switch(BitSize) {
-  default: llvm_unreachable("Unhandled type size of value to byteswap!");
+  default:
+    assert("Unhandled type size of value to byteswap!");
+    break;
   case 16: {
     Value *Tmp1 = Builder.CreateShl(V, ConstantInt::get(V->getType(), 8),
                                     "bswap.2");
@@ -203,7 +205,9 @@ static void ReplaceFPIntrinsicWithCall(CallInst *CI, const char *Fname,
                                        const char *Dname,
                                        const char *LDname) {
   switch (CI->getArgOperand(0)->getType()->getTypeID()) {
-  default: llvm_unreachable("Invalid type in intrinsic");
+  default:
+    assert("Invalid type in intrinsic");
+    break;
   case Type::FloatTyID:
     ReplaceCallWith(Fname, CI, CI->arg_begin(), CI->arg_end(),
                     Type::getFloatTy(CI->getContext()));
@@ -438,7 +442,15 @@ void IntrinsicLowering::LowerIntrinsicCall(CallInst *CI) {
     break;
   }
   case Intrinsic::copysign: {
-    ReplaceFPIntrinsicWithCall(CI, "copysignf", "copysign", "copysignl");
+    switch (CI->getArgOperand(0)->getType()->getTypeID()) {
+      default:
+        assert("only need a copysign libcall for arch-specific floats");
+        break;
+      case Type::X86_FP80TyID:
+      case Type::PPC_FP128TyID:
+        ReplaceCallWith("copysignl", CI, CI->arg_begin(), CI->arg_end(),
+                        Type::getFloatTy(CI->getContext()));
+    }
     break;
   }
   case Intrinsic::get_rounding:
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp
index 2c81c829e75cbb..0e4d364f69f216 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp
@@ -1649,12 +1649,11 @@ void DAGTypeLegalizer::ExpandFloatRes_FCEIL(SDNode *N,
 
 void DAGTypeLegalizer::ExpandFloatRes_FCOPYSIGN(SDNode *N,
                                                 SDValue &Lo, SDValue &Hi) {
-  ExpandFloatRes_Binary(N, GetFPLibCall(N->getValueType(0),
-                                        RTLIB::COPYSIGN_F32,
-                                        RTLIB::COPYSIGN_F64,
-                                        RTLIB::COPYSIGN_F80,
-                                        RTLIB::COPYSIGN_F128,
-                                        RTLIB::COPYSIGN_PPCF128), Lo, Hi);
+
+  auto VT = N->getValueType(0);
+  ExpandFloatRes_Binary(N, (VT == MVT::f80 ? RTLIB::COPYSIGN_F80 :
+                            VT == MVT::ppcf128 ? RTLIB::COPYSIGN_PPCF128 :
+                            RTLIB::UNKNOWN_LIBCALL), Lo, Hi);
 }
 
 void DAGTypeLegalizer::ExpandFloatRes_FCOS(SDNode *N,
diff --git a/llvm/lib/IR/RuntimeLibcalls.cpp b/llvm/lib/IR/RuntimeLibcalls.cpp
index d806f8093459ee..ca78892ec78838 100644
--- a/llvm/lib/IR/RuntimeLibcalls.cpp
+++ b/llvm/lib/IR/RuntimeLibcalls.cpp
@@ -61,7 +61,6 @@ void RuntimeLibcallsInfo::initLibcalls(const Triple &TT) {
     setLibcallName(RTLIB::ROUND_F128, "roundf128");
     setLibcallName(RTLIB::ROUNDEVEN_F128, "roundevenf128");
     setLibcallName(RTLIB::FLOOR_F128, "floorf128");
-    setLibcallName(RTLIB::COPYSIGN_F128, "copysignf128");
     setLibcallName(RTLIB::FMIN_F128, "fminf128");
     setLibcallName(RTLIB::FMAX_F128, "fmaxf128");
     setLibcallName(RTLIB::LROUND_F128, "lroundf128");
diff --git a/llvm/lib/Target/SystemZ/ZOSLibcallNames.def b/llvm/lib/Target/SystemZ/ZOSLibcallNames.def
index 12a01522a7e643..a53c9618696fcc 100644
--- a/llvm/lib/Target/SystemZ/ZOSLibcallNames.def
+++ b/llvm/lib/Target/SystemZ/ZOSLibcallNames.def
@@ -87,9 +87,6 @@ HANDLE_LIBCALL(EXP2_F128, "@@LXP2@B")
 HANDLE_LIBCALL(COS_F64, "@@SCOS@B")
 HANDLE_LIBCALL(COS_F32, "@@FCOS@B")
 HANDLE_LIBCALL(COS_F128, "@@LCOS@B")
-HANDLE_LIBCALL(COPYSIGN_F64, "@@DCPY@B")
-HANDLE_LIBCALL(COPYSIGN_F32, "@@FCPY@B")
-HANDLE_LIBCALL(COPYSIGN_F128, "@@LCPY@B")
 HANDLE_LIBCALL(CEIL_F64, "@@SCEL@B")
 HANDLE_LIBCALL(CEIL_F32, "@@FCEL@B")
 HANDLE_LIBCALL(CEIL_F128, "@@LCEL@B")
diff --git a/llvm/lib/Target/WebAssembly/WebAssemblyRuntimeLibcallSignatures.cpp b/llvm/lib/Target/WebAssembly/WebAssemblyRuntimeLibcallSignatures.cpp
index ba3ab5164af267..13c8a6fe1524fc 100644
--- a/llvm/lib/Target/WebAssembly/WebAssemblyRuntimeLibcallSignatures.cpp
+++ b/llvm/lib/Target/WebAssembly/WebAssemblyRuntimeLibcallSignatures.cpp
@@ -261,9 +261,6 @@ struct RuntimeLibcallSignatureTable {
     Table[RTLIB::FLOOR_F32] = f32_func_f32;
     Table[RTLIB::FLOOR_F64] = f64_func_f64;
     Table[RTLIB::FLOOR_F128] = i64_i64_func_i64_i64;
-    Table[RTLIB::COPYSIGN_F32] = f32_func_f32_f32;
-    Table[RTLIB::COPYSIGN_F64] = f64_func_f64_f64;
-    Table[RTLIB::COPYSIGN_F128] = i64_i64_func_i64_i64_i64_i64;
     Table[RTLIB::FMIN_F32] = f32_func_f32_f32;
     Table[RTLIB::FMIN_F64] = f64_func_f64_f64;
     Table[RTLIB::FMIN_F128] = i64_i64_func_i64_i64_i64_i64;

Footnotes

  1. https://github.com/llvm/llvm-project/blob/46944b0cbc9a9d8daad0182c40fcd3560bc9ca35/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp#L7772-L7811

  2. https://github.com/llvm/llvm-project/blob/46944b0cbc9a9d8daad0182c40fcd3560bc9ca35/llvm/include/llvm/IR/RuntimeLibcalls.def#L287-L291

Copy link

github-actions bot commented Oct 6, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

llvm/lib/CodeGen/IntrinsicLowering.cpp Outdated Show resolved Hide resolved
llvm/lib/CodeGen/IntrinsicLowering.cpp Outdated Show resolved Hide resolved
llvm/lib/CodeGen/IntrinsicLowering.cpp Outdated Show resolved Hide resolved
@arsenm
Copy link
Contributor

arsenm commented Oct 6, 2024

Regardless of the usage, I think these should still be recorded in RuntimeLibcalls. Ideally it would be a complete database of all calls for the system, not just used

@workingjubilee
Copy link
Contributor Author

Regardless of the usage, I think these should still be recorded in RuntimeLibcalls. Ideally it would be a complete database of all calls for the system, not just used

Of all calls for... what? All function calls that are hypothetically possible at runtime? Used for what purpose? Based on what conditions? What are we assuming is linked and therefore imposing on frontends that they must guarantee is linked?

Can we have some way of marking the ones that aren't actually going to, and should not, get spontaneously lowered to?

@workingjubilee workingjubilee force-pushed the remove-dead-copysign-code branch 3 times, most recently from 0699d47 to 0606b62 Compare October 6, 2024 06:37
@RalfJung
Copy link
Contributor

RalfJung commented Oct 6, 2024

Regardless of the usage, I think these should still be recorded in RuntimeLibcalls. Ideally it would be a complete database of all calls for the system, not just used

It would be good to know which functions LLVM might actually use so that frontends have a chance of ensuring they are all provided. And it would be good for that list to be as short as possible so that frontends don't spend effort trying to provide some functions that LLVM will not actually use.

@workingjubilee
Copy link
Contributor Author

If a true database of all function calls that might be made at runtime is desired, @arsenm, and not "all calls that LLVM might generate", then the .def file in question is hopelessly short. You have approximately the entirety of libc (minus some of math.h), Win32, and POSIX APIs to include. I don't think setting this list back by even a few more functions than this would be appreciable.

It doesn't even have:

  • abs
  • fdim
  • nan
  • remainder
  • remquo
  • hypot
  • atan2
  • acosh
  • asinh
  • atanh
  • erf
  • erfc
  • lgamma
  • tgamma
  • modf (not to be confused with fmodf)
  • scalbn
  • log1p
  • logb
  • ilogb
  • expm1
  • nextafter
  • nexttoward

...or all their differently-sized variants.

@workingjubilee
Copy link
Contributor Author

@topperc This is using report_fatal_error now.

Also, good news: I seem to have found the spot where SelectionDAG expands copysign that catches this before we ever hit the libcall legalization:

SDValue SelectionDAGLegalize::ExpandFCOPYSIGN(SDNode *Node) const {
SDLoc DL(Node);
SDValue Mag = Node->getOperand(0);
SDValue Sign = Node->getOperand(1);
// Get sign bit into an integer value.
FloatSignAsInt SignAsInt;
getSignAsIntValue(SignAsInt, DL, Sign);
EVT IntVT = SignAsInt.IntValue.getValueType();
SDValue SignMask = DAG.getConstant(SignAsInt.SignMask, DL, IntVT);
SDValue SignBit = DAG.getNode(ISD::AND, DL, IntVT, SignAsInt.IntValue,
SignMask);
// If FABS is legal transform FCOPYSIGN(x, y) => sign(x) ? -FABS(x) : FABS(X)
EVT FloatVT = Mag.getValueType();
if (TLI.isOperationLegalOrCustom(ISD::FABS, FloatVT) &&
TLI.isOperationLegalOrCustom(ISD::FNEG, FloatVT)) {
SDValue AbsValue = DAG.getNode(ISD::FABS, DL, FloatVT, Mag);
SDValue NegValue = DAG.getNode(ISD::FNEG, DL, FloatVT, AbsValue);
SDValue Cond = DAG.getSetCC(DL, getSetCCResultType(IntVT), SignBit,
DAG.getConstant(0, DL, IntVT), ISD::SETNE);
return DAG.getSelect(DL, FloatVT, Cond, NegValue, AbsValue);
}
// Transform Mag value to integer, and clear the sign bit.
FloatSignAsInt MagAsInt;
getSignAsIntValue(MagAsInt, DL, Mag);
EVT MagVT = MagAsInt.IntValue.getValueType();
SDValue ClearSignMask = DAG.getConstant(~MagAsInt.SignMask, DL, MagVT);
SDValue ClearedSign = DAG.getNode(ISD::AND, DL, MagVT, MagAsInt.IntValue,
ClearSignMask);
// Get the signbit at the right position for MagAsInt.
int ShiftAmount = SignAsInt.SignBit - MagAsInt.SignBit;
EVT ShiftVT = IntVT;
if (SignBit.getScalarValueSizeInBits() <
ClearedSign.getScalarValueSizeInBits()) {
SignBit = DAG.getNode(ISD::ZERO_EXTEND, DL, MagVT, SignBit);
ShiftVT = MagVT;
}
if (ShiftAmount > 0) {
SDValue ShiftCnst = DAG.getConstant(ShiftAmount, DL, ShiftVT);
SignBit = DAG.getNode(ISD::SRL, DL, ShiftVT, SignBit, ShiftCnst);
} else if (ShiftAmount < 0) {
SDValue ShiftCnst = DAG.getConstant(-ShiftAmount, DL, ShiftVT);
SignBit = DAG.getNode(ISD::SHL, DL, ShiftVT, SignBit, ShiftCnst);
}
if (SignBit.getScalarValueSizeInBits() >
ClearedSign.getScalarValueSizeInBits()) {
SignBit = DAG.getNode(ISD::TRUNCATE, DL, MagVT, SignBit);
}
SDNodeFlags Flags;
Flags.setDisjoint(true);
// Store the part with the modified sign and convert back to float.
SDValue CopiedSign =
DAG.getNode(ISD::OR, DL, MagVT, ClearedSign, SignBit, Flags);
return modifySignAsInt(MagAsInt, DL, CopiedSign);
}

@workingjubilee
Copy link
Contributor Author

workingjubilee commented Oct 8, 2024

And as I suspected, trying to simplify the code further hits the PowerPC float case on some of the regression tests for that target, so this seems to be, unfortunately, exactly the correct scope.

arsenm pushed a commit that referenced this pull request Oct 8, 2024
I noticed this while following
#111269. It makes little sense
that FCOPYSIGN would look at the sign of `x`, right? Surely this must be
`y`. Also fix the inconsistency where it's sometimes `x` and sometimes
`X`.
@arsenm
Copy link
Contributor

arsenm commented Oct 10, 2024

Regardless of the usage, I think these should still be recorded in RuntimeLibcalls. Ideally it would be a complete database of all calls for the system, not just used

Of all calls for... what? All function calls that are hypothetically possible at runtime? Used for what purpose? Based on what conditions? What are we assuming is linked and therefore imposing on frontends that they must guarantee is linked?

The current TargetLibraryInfo + RuntimeLibcalls is an unworkable mess for doing LTO where the libc/libm/compiler-rt-esque builtins are all getting compiled in the same module. We have to have some way of knowing what function names might need to be retained for usage elsewhere in the compilation. e.g. see 615b7ee, which is trying to make use of this list in LTO

If a true database of all function calls that might be made at runtime is desired, @arsenm, and not "all calls that LLVM might generate", then the .def file in question is hopelessly short

Yes, this is a huge mess which needs work.

@workingjubilee
Copy link
Contributor Author

@arsenm I see. If you can coach me on some of the finer points of making a .td (I have the basic gist, but there will be some questions), I'd be happy to assemble something such that the data covered in those two files is in a tablegen, in a followup. Then I will be able to make sure that it or another file is supplied with the necessary information that frontends... frontends that aren't clang, anyways... need. And then it will cover libm properly, instead of the current state of affairs.

Copy link
Contributor

@jhuber6 jhuber6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of getting rid of target libcalls, linking in runtime libraries in the same TU is still an unsolved problem. Right now the first step was to only keep symbols alive if the backend actually emits them. Next step would probably have the IR transform LLVM IR intrinsics to libcalls during IR linking and then mark them as nobuiltin so they don't get transformed again.

Seems weird that this doesn't handle the x86 / ppc cases. They're weird but it should just be copying a bit I would think.

@workingjubilee
Copy link
Contributor Author

I believe the PowerPC target code has setOperationAction for ppc_f128 in a way that somehow still finds this codepath. As it is logically the only user of this codepath, it seemed harmless to just leave it. If you want, I can go muck about inside llvm/lib/Target/PowerPC/PPCISelLowering.cpp so that the regression tests pass and we can completely remove this, but my initial inclination was to not bother because the Rust frontend doesn't use these types at all and I will be hacking pretty wildly at that point.

I believe the x87 case might actually be handled... "correctly". At least in the sense that the code produced by LLVM should be no more incorrect than it was before, for this target.

@arsenm
Copy link
Contributor

arsenm commented Oct 11, 2024

I believe the PowerPC target code has setOperationAction for ppc_f128 in a way that somehow still finds this codepath.

It actually hits the default expand handling. ppc_fp128 surprisingly isn't a legal type:

Legalizing node: t20: ppcf128 = fcopysign t5, t4
Analyzing result type: ppcf128
Expand float result: t20: ppcf128 = fcopysign t5, t4
Creating new node: t23: i128 = bitcast t5
Creating new node: t24: i64 = extract_element t23, Constant:i64<1>
Creating new node: t25: i64 = extract_element t23, Constant:i64<0>
Creating new node: t26: f64 = bitcast t25
Creating new node: t27: f64 = bitcast t24

and it does result in the call:

	lfs 2, .LCPI0_0@toc@l(3)
	bl copysignl

@workingjubilee
Copy link
Contributor Author

lmao PowerPC doesn't treat its own float as a legal type

@workingjubilee
Copy link
Contributor Author

I checked and the x87 float seems fine if this codepath is pruned. No visible regressions in LLVM's test suite. I mean, clang-format whines about my formatting again, but

Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is fine, although unfortunate to leave ppc_fp128 behind.

@workingjubilee workingjubilee force-pushed the remove-dead-copysign-code branch 2 times, most recently from 02b1a52 to f68c342 Compare October 11, 2024 22:45
@workingjubilee
Copy link
Contributor Author

...is getOperand actually getOperands?

@topperc
Copy link
Collaborator

topperc commented Oct 12, 2024

...is getOperand actually getOperands?

No. What makes you think that?

@workingjubilee
Copy link
Contributor Author

@topperc Momentary delirium, it seems?

@arsenm re: your remarks about not leaving the PowerPC libcall behind, I figured out how to remove the code in question, after some wrangling, by doing the conversion in question. See the recently pushed commits.

Comment on lines +1655 to +1716
SDLoc DL = SDLoc(N);
SDValue Tmp = SDValue();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
SDLoc DL = SDLoc(N);
SDValue Tmp = SDValue();
SDLoc DL(N);
SDValue Tmp;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The LLVM style guide specifically said I should "Use auto Type Deduction to Make Code More Readable". So I did. Then I was advised to not. So I accepted that.

But I find these changes make the code significantly harder to read.

Are you aware of another programming language that implements this... most vexing constructor form?

llvm/test/CodeGen/PowerPC/copysignl.ll Outdated Show resolved Hide resolved
llvm/lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp Outdated Show resolved Hide resolved
// a double-double is Hi + Lo, so if Hi flips sign, so must Lo
Lo = DAG.getSelectCC(DL, Tmp, Hi, Lo,
DAG.getNode(ISD::FNEG, DL, Lo.getValueType(), Lo),
ISD::SETEQ);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using SETEQ for FP types is weird, since it's the case that may be oeq or ueq. Can this break the handling of nan sign bits?

Can this just be emitted as another copysign?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not aware of a rule that says that the two floats have to both be the same sign, which would be required for a second copysign to make sense. The only one that I am aware of that is relevant is that Hi has to be greater. That's why it has to be a flip of the existing sign of Lo. We are not trying to apply the logic of copysign to the individual parts, but to the "summed float".

For a given float it might go something like this (I suspect most real cases would not have these be so close in orders of magnitude but...):

dbldbl = { Hi = 27.0, Lo = -4.0 };
assert(dbldbl == 23.0);
dbldbl = copysign(dbldbl, -1.0);
assert(dbldbl.Hi == -27.0);
assert(dbldbl.Lo == 4.0);
assert(dbldbl == -23.0);

So no, it cannot be "another copysign", I think, as that would leave us with this possibility:

dbldbl = copysign(dbldbl, -1.0);
assert(dbldbl.Hi == -27.0);
assert(dbldbl.Lo == -4.0);
assert(dbldbl == -27.0);

This pattern is also used in ExpandFloatRes_FABS, here:

// Lo = Hi==fabs(Hi) ? Lo : -Lo;
Lo = DAG.getSelectCC(dl, Tmp, Hi, Lo,
DAG.getNode(ISD::FNEG, dl, Lo.getValueType(), Lo),
ISD::SETEQ);

We should only ever, when copying the sign from this, take the value from Hi, per here, since it is the sign that matters:

// The ppcf128 value is providing only the sign; take it from the
// higher-order double (which must have the larger magnitude).
return DAG.getNode(ISD::FCOPYSIGN, SDLoc(N),
N->getValueType(0), N->getOperand(0), Hi);

Thus it should not break NaN float sign handling, as the sign of Lo should be of no consequence to a NaN ppc_fp128, at least "externally" to the float.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

copysign needs to respect the sign bit of incoming nan values. This logic will flip the sign bit for Lo for a nan Hi. I don't really understand how ppcf128 is supposed to work for nans, but I would expect this logic to be inverted (e.g. Hi UEQ fabs(hi) ? Lo : neg(Lo))

I guess it's ok for fabs, or fabs is wrong?

Copy link
Contributor Author

@workingjubilee workingjubilee Oct 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. Because the rule is Hi + Lo == "the real float", if one is NaN then logically they both are, aren't they?

Though, to be clear:

  • If the double-double is NaN or Infinity, the Hi value must encode NaN or Infinity1.
  • If the double-double is Infinity, then Lo must not be NaN.
  • Otherwise, the value of Lo is not significant.

Footnotes

  1. This is basically an extension of the "Hi must be the greater magnitude" rule.

Copy link
Contributor Author

@workingjubilee workingjubilee Oct 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically, the answer of "how does ppc_fp128 work for this" is "mmmostly by not really caring too hard, honestly".

Which doesn't mean I am against using a different encoding of the logic if it is preferred, it just can't be two copysigns.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unlike a true floating point operation, the "not significant" bits of a nan shouldn't change. It would be better if we weren't flipping the bit of the insignificant half here (or for any of fneg/fabs/fcopysign)

@jayfoad
Copy link
Contributor

jayfoad commented Oct 14, 2024

Slay undead copysign code

Suggest writing this in plainer English, e.g. "Remove unused copysign code".

@@ -1,28 +0,0 @@
; RUN: llc -verify-machineinstrs < %s -mcpu=ppc | FileCheck %s
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lost this test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

according to 1cf48ab

Don't form PPC CTR-based loops around a copysignl call
copysign/copysignf never become function calls (because the SDAG expansion code
does not lower to the corresponding function call, but rather directly
implements the associated logic), but copysignl almost always is lowered into a
call to the requested libm functon (and, thus, might clobber CTR).

So it seemed like it was now irrelevant.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inserted a test with the baseline at start for this so you can examine the diff for it, if you want.

; CHECK-NEXT: lwz 4, 88(1)
; CHECK-NEXT: stw 4, 104(1)
; CHECK-NEXT: lfd 0, 104(1)
; CHECK-NEXT: mtctr 3
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when I said "the test seems irrelevant", we now emit an MTCTR instruction here, but the copysignl call we were guarding against is nowhere to be found, sooo

…alls

This makes real what is already true:
Copysign does not ever need to lower to runtime libcalls!
Its operation should be possible to always implement via bitops.
This reduces the burden on frontends that wish to support float ops
without needing a C compiler to build LLVM's compiler-rt for that target,
e.g. so that they can be a fully self-contained toolchain for bare-metal.

All other floats are expanded for all current architectures just fine.
PowerPC, however, does not efficiently legalize its very own float.
NoumanAmir657 pushed a commit to NoumanAmir657/llvm-project that referenced this pull request Nov 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants