-
Notifications
You must be signed in to change notification settings - Fork 12k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[LLVM] Slay undead copysign code #111269
base: main
Are you sure you want to change the base?
[LLVM] Slay undead copysign code #111269
Conversation
@llvm/pr-subscribers-backend-powerpc @llvm/pr-subscribers-backend-webassembly Author: Jubilee (workingjubilee) ChangesFrontends that emit LLVMIR would like to be able to count on float operations that can be encoded as simple bit-operations as lowering to machine-code routines embedded into the binary. This is desired so that they do not affect what one might think of as "freestanding" binaries. GlobalISel explicitly handles this lowering for copysign1. Perhaps it is just my newness to the codebase but I was unable to find a case where SelectionDAG likewise guaranteed that copysign was lowered to machine code. Instead, it seemed to be, at least in theory, prone to lowering to runtime libcalls, as indicated by its entry in llvm/include/llvm/IR/RuntimeLibcalls.def2! In removing this code, it may be necessary to preserve runtime libcalls only for architecture-specific variants of these float operations that may have architecture-specific semantics. However, ordinary IEEE754 floats have nonesuch, and these bitops are then guaranteed to be bitops. As no tests seem to fail after doing so, it seems plausible to me this was actually-dead code that was only in-theory reachable. If that is so, purge it from the codebase, so this revenant may plague us no more. Full diff: https://github.com/llvm/llvm-project/pull/111269.diff 6 Files Affected:
diff --git a/llvm/include/llvm/IR/RuntimeLibcalls.def b/llvm/include/llvm/IR/RuntimeLibcalls.def
index 69cf43140ad4bd..b891dce07d60dd 100644
--- a/llvm/include/llvm/IR/RuntimeLibcalls.def
+++ b/llvm/include/llvm/IR/RuntimeLibcalls.def
@@ -284,10 +284,7 @@ HANDLE_LIBCALL(FLOOR_F64, "floor")
HANDLE_LIBCALL(FLOOR_F80, "floorl")
HANDLE_LIBCALL(FLOOR_F128, "floorl")
HANDLE_LIBCALL(FLOOR_PPCF128, "floorl")
-HANDLE_LIBCALL(COPYSIGN_F32, "copysignf")
-HANDLE_LIBCALL(COPYSIGN_F64, "copysign")
HANDLE_LIBCALL(COPYSIGN_F80, "copysignl")
-HANDLE_LIBCALL(COPYSIGN_F128, "copysignl")
HANDLE_LIBCALL(COPYSIGN_PPCF128, "copysignl")
HANDLE_LIBCALL(FMIN_F32, "fminf")
HANDLE_LIBCALL(FMIN_F64, "fmin")
diff --git a/llvm/lib/CodeGen/IntrinsicLowering.cpp b/llvm/lib/CodeGen/IntrinsicLowering.cpp
index 256c081b46e262..61d5f75273fa2b 100644
--- a/llvm/lib/CodeGen/IntrinsicLowering.cpp
+++ b/llvm/lib/CodeGen/IntrinsicLowering.cpp
@@ -58,7 +58,9 @@ static Value *LowerBSWAP(LLVMContext &Context, Value *V, Instruction *IP) {
IRBuilder<> Builder(IP);
switch(BitSize) {
- default: llvm_unreachable("Unhandled type size of value to byteswap!");
+ default:
+ assert("Unhandled type size of value to byteswap!");
+ break;
case 16: {
Value *Tmp1 = Builder.CreateShl(V, ConstantInt::get(V->getType(), 8),
"bswap.2");
@@ -203,7 +205,9 @@ static void ReplaceFPIntrinsicWithCall(CallInst *CI, const char *Fname,
const char *Dname,
const char *LDname) {
switch (CI->getArgOperand(0)->getType()->getTypeID()) {
- default: llvm_unreachable("Invalid type in intrinsic");
+ default:
+ assert("Invalid type in intrinsic");
+ break;
case Type::FloatTyID:
ReplaceCallWith(Fname, CI, CI->arg_begin(), CI->arg_end(),
Type::getFloatTy(CI->getContext()));
@@ -438,7 +442,15 @@ void IntrinsicLowering::LowerIntrinsicCall(CallInst *CI) {
break;
}
case Intrinsic::copysign: {
- ReplaceFPIntrinsicWithCall(CI, "copysignf", "copysign", "copysignl");
+ switch (CI->getArgOperand(0)->getType()->getTypeID()) {
+ default:
+ assert("only need a copysign libcall for arch-specific floats");
+ break;
+ case Type::X86_FP80TyID:
+ case Type::PPC_FP128TyID:
+ ReplaceCallWith("copysignl", CI, CI->arg_begin(), CI->arg_end(),
+ Type::getFloatTy(CI->getContext()));
+ }
break;
}
case Intrinsic::get_rounding:
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp
index 2c81c829e75cbb..0e4d364f69f216 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp
@@ -1649,12 +1649,11 @@ void DAGTypeLegalizer::ExpandFloatRes_FCEIL(SDNode *N,
void DAGTypeLegalizer::ExpandFloatRes_FCOPYSIGN(SDNode *N,
SDValue &Lo, SDValue &Hi) {
- ExpandFloatRes_Binary(N, GetFPLibCall(N->getValueType(0),
- RTLIB::COPYSIGN_F32,
- RTLIB::COPYSIGN_F64,
- RTLIB::COPYSIGN_F80,
- RTLIB::COPYSIGN_F128,
- RTLIB::COPYSIGN_PPCF128), Lo, Hi);
+
+ auto VT = N->getValueType(0);
+ ExpandFloatRes_Binary(N, (VT == MVT::f80 ? RTLIB::COPYSIGN_F80 :
+ VT == MVT::ppcf128 ? RTLIB::COPYSIGN_PPCF128 :
+ RTLIB::UNKNOWN_LIBCALL), Lo, Hi);
}
void DAGTypeLegalizer::ExpandFloatRes_FCOS(SDNode *N,
diff --git a/llvm/lib/IR/RuntimeLibcalls.cpp b/llvm/lib/IR/RuntimeLibcalls.cpp
index d806f8093459ee..ca78892ec78838 100644
--- a/llvm/lib/IR/RuntimeLibcalls.cpp
+++ b/llvm/lib/IR/RuntimeLibcalls.cpp
@@ -61,7 +61,6 @@ void RuntimeLibcallsInfo::initLibcalls(const Triple &TT) {
setLibcallName(RTLIB::ROUND_F128, "roundf128");
setLibcallName(RTLIB::ROUNDEVEN_F128, "roundevenf128");
setLibcallName(RTLIB::FLOOR_F128, "floorf128");
- setLibcallName(RTLIB::COPYSIGN_F128, "copysignf128");
setLibcallName(RTLIB::FMIN_F128, "fminf128");
setLibcallName(RTLIB::FMAX_F128, "fmaxf128");
setLibcallName(RTLIB::LROUND_F128, "lroundf128");
diff --git a/llvm/lib/Target/SystemZ/ZOSLibcallNames.def b/llvm/lib/Target/SystemZ/ZOSLibcallNames.def
index 12a01522a7e643..a53c9618696fcc 100644
--- a/llvm/lib/Target/SystemZ/ZOSLibcallNames.def
+++ b/llvm/lib/Target/SystemZ/ZOSLibcallNames.def
@@ -87,9 +87,6 @@ HANDLE_LIBCALL(EXP2_F128, "@@LXP2@B")
HANDLE_LIBCALL(COS_F64, "@@SCOS@B")
HANDLE_LIBCALL(COS_F32, "@@FCOS@B")
HANDLE_LIBCALL(COS_F128, "@@LCOS@B")
-HANDLE_LIBCALL(COPYSIGN_F64, "@@DCPY@B")
-HANDLE_LIBCALL(COPYSIGN_F32, "@@FCPY@B")
-HANDLE_LIBCALL(COPYSIGN_F128, "@@LCPY@B")
HANDLE_LIBCALL(CEIL_F64, "@@SCEL@B")
HANDLE_LIBCALL(CEIL_F32, "@@FCEL@B")
HANDLE_LIBCALL(CEIL_F128, "@@LCEL@B")
diff --git a/llvm/lib/Target/WebAssembly/WebAssemblyRuntimeLibcallSignatures.cpp b/llvm/lib/Target/WebAssembly/WebAssemblyRuntimeLibcallSignatures.cpp
index ba3ab5164af267..13c8a6fe1524fc 100644
--- a/llvm/lib/Target/WebAssembly/WebAssemblyRuntimeLibcallSignatures.cpp
+++ b/llvm/lib/Target/WebAssembly/WebAssemblyRuntimeLibcallSignatures.cpp
@@ -261,9 +261,6 @@ struct RuntimeLibcallSignatureTable {
Table[RTLIB::FLOOR_F32] = f32_func_f32;
Table[RTLIB::FLOOR_F64] = f64_func_f64;
Table[RTLIB::FLOOR_F128] = i64_i64_func_i64_i64;
- Table[RTLIB::COPYSIGN_F32] = f32_func_f32_f32;
- Table[RTLIB::COPYSIGN_F64] = f64_func_f64_f64;
- Table[RTLIB::COPYSIGN_F128] = i64_i64_func_i64_i64_i64_i64;
Table[RTLIB::FMIN_F32] = f32_func_f32_f32;
Table[RTLIB::FMIN_F64] = f64_func_f64_f64;
Table[RTLIB::FMIN_F128] = i64_i64_func_i64_i64_i64_i64;
Footnotes |
✅ With the latest revision this PR passed the C/C++ code formatter. |
d971a20
to
fd73b72
Compare
Regardless of the usage, I think these should still be recorded in RuntimeLibcalls. Ideally it would be a complete database of all calls for the system, not just used |
Of all calls for... what? All function calls that are hypothetically possible at runtime? Used for what purpose? Based on what conditions? What are we assuming is linked and therefore imposing on frontends that they must guarantee is linked? Can we have some way of marking the ones that aren't actually going to, and should not, get spontaneously lowered to? |
0699d47
to
0606b62
Compare
It would be good to know which functions LLVM might actually use so that frontends have a chance of ensuring they are all provided. And it would be good for that list to be as short as possible so that frontends don't spend effort trying to provide some functions that LLVM will not actually use. |
If a true database of all function calls that might be made at runtime is desired, @arsenm, and not "all calls that LLVM might generate", then the .def file in question is hopelessly short. You have approximately the entirety of libc (minus some of math.h), Win32, and POSIX APIs to include. I don't think setting this list back by even a few more functions than this would be appreciable. It doesn't even have:
...or all their differently-sized variants. |
@topperc This is using Also, good news: I seem to have found the spot where SelectionDAG expands copysign that catches this before we ever hit the libcall legalization: llvm-project/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp Lines 1646 to 1707 in 75103aa
|
And as I suspected, trying to simplify the code further hits the PowerPC float case on some of the regression tests for that target, so this seems to be, unfortunately, exactly the correct scope. |
I noticed this while following #111269. It makes little sense that FCOPYSIGN would look at the sign of `x`, right? Surely this must be `y`. Also fix the inconsistency where it's sometimes `x` and sometimes `X`.
The current TargetLibraryInfo + RuntimeLibcalls is an unworkable mess for doing LTO where the libc/libm/compiler-rt-esque builtins are all getting compiled in the same module. We have to have some way of knowing what function names might need to be retained for usage elsewhere in the compilation. e.g. see 615b7ee, which is trying to make use of this list in LTO
Yes, this is a huge mess which needs work. |
@arsenm I see. If you can coach me on some of the finer points of making a |
0606b62
to
444c9f7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea of getting rid of target libcalls, linking in runtime libraries in the same TU is still an unsolved problem. Right now the first step was to only keep symbols alive if the backend actually emits them. Next step would probably have the IR transform LLVM IR intrinsics to libcalls during IR linking and then mark them as nobuiltin
so they don't get transformed again.
Seems weird that this doesn't handle the x86 / ppc cases. They're weird but it should just be copying a bit I would think.
I believe the PowerPC target code has setOperationAction for ppc_f128 in a way that somehow still finds this codepath. As it is logically the only user of this codepath, it seemed harmless to just leave it. If you want, I can go muck about inside llvm/lib/Target/PowerPC/PPCISelLowering.cpp so that the regression tests pass and we can completely remove this, but my initial inclination was to not bother because the Rust frontend doesn't use these types at all and I will be hacking pretty wildly at that point. I believe the x87 case might actually be handled... "correctly". At least in the sense that the code produced by LLVM should be no more incorrect than it was before, for this target. |
It actually hits the default expand handling. ppc_fp128 surprisingly isn't a legal type:
and it does result in the call:
|
lmao PowerPC doesn't treat its own float as a legal type |
I checked and the x87 float seems fine if this codepath is pruned. No visible regressions in LLVM's test suite. I mean, clang-format whines about my formatting again, but |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this is fine, although unfortunate to leave ppc_fp128 behind.
02b1a52
to
f68c342
Compare
...is getOperand actually getOperands? |
No. What makes you think that? |
SDLoc DL = SDLoc(N); | ||
SDValue Tmp = SDValue(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SDLoc DL = SDLoc(N); | |
SDValue Tmp = SDValue(); | |
SDLoc DL(N); | |
SDValue Tmp; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The LLVM style guide specifically said I should "Use auto Type Deduction to Make Code More Readable". So I did. Then I was advised to not. So I accepted that.
But I find these changes make the code significantly harder to read.
Are you aware of another programming language that implements this... most vexing constructor form?
// a double-double is Hi + Lo, so if Hi flips sign, so must Lo | ||
Lo = DAG.getSelectCC(DL, Tmp, Hi, Lo, | ||
DAG.getNode(ISD::FNEG, DL, Lo.getValueType(), Lo), | ||
ISD::SETEQ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using SETEQ for FP types is weird, since it's the case that may be oeq or ueq. Can this break the handling of nan sign bits?
Can this just be emitted as another copysign?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not aware of a rule that says that the two floats have to both be the same sign, which would be required for a second copysign
to make sense. The only one that I am aware of that is relevant is that Hi
has to be greater. That's why it has to be a flip of the existing sign of Lo
. We are not trying to apply the logic of copysign
to the individual parts, but to the "summed float".
For a given float it might go something like this (I suspect most real cases would not have these be so close in orders of magnitude but...):
dbldbl = { Hi = 27.0, Lo = -4.0 };
assert(dbldbl == 23.0);
dbldbl = copysign(dbldbl, -1.0);
assert(dbldbl.Hi == -27.0);
assert(dbldbl.Lo == 4.0);
assert(dbldbl == -23.0);
So no, it cannot be "another copysign", I think, as that would leave us with this possibility:
dbldbl = copysign(dbldbl, -1.0);
assert(dbldbl.Hi == -27.0);
assert(dbldbl.Lo == -4.0);
assert(dbldbl == -27.0);
This pattern is also used in ExpandFloatRes_FABS
, here:
llvm-project/llvm/lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp
Lines 1557 to 1560 in 4a0caf6
// Lo = Hi==fabs(Hi) ? Lo : -Lo; | |
Lo = DAG.getSelectCC(dl, Tmp, Hi, Lo, | |
DAG.getNode(ISD::FNEG, dl, Lo.getValueType(), Lo), | |
ISD::SETEQ); |
We should only ever, when copying the sign from this, take the value from Hi
, per here, since it is the sign that matters:
llvm-project/llvm/lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp
Lines 2205 to 2208 in 367c3c9
// The ppcf128 value is providing only the sign; take it from the | |
// higher-order double (which must have the larger magnitude). | |
return DAG.getNode(ISD::FCOPYSIGN, SDLoc(N), | |
N->getValueType(0), N->getOperand(0), Hi); |
Thus it should not break NaN float sign handling, as the sign of Lo
should be of no consequence to a NaN ppc_fp128
, at least "externally" to the float.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
copysign needs to respect the sign bit of incoming nan values. This logic will flip the sign bit for Lo for a nan Hi. I don't really understand how ppcf128 is supposed to work for nans, but I would expect this logic to be inverted (e.g. Hi UEQ fabs(hi) ? Lo : neg(Lo)
)
I guess it's ok for fabs, or fabs is wrong?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. Because the rule is Hi + Lo == "the real float", if one is NaN then logically they both are, aren't they?
Though, to be clear:
- If the double-double is NaN or Infinity, the Hi value must encode NaN or Infinity1.
- If the double-double is Infinity, then Lo must not be NaN.
- Otherwise, the value of Lo is not significant.
Footnotes
-
This is basically an extension of the "Hi must be the greater magnitude" rule. ↩
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically, the answer of "how does ppc_fp128
work for this" is "mmmostly by not really caring too hard, honestly".
Which doesn't mean I am against using a different encoding of the logic if it is preferred, it just can't be two copysigns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unlike a true floating point operation, the "not significant" bits of a nan shouldn't change. It would be better if we weren't flipping the bit of the insignificant half here (or for any of fneg/fabs/fcopysign)
Suggest writing this in plainer English, e.g. "Remove unused copysign code". |
4a0caf6
to
32d117c
Compare
32d117c
to
11a7cf8
Compare
@@ -1,28 +0,0 @@ | |||
; RUN: llc -verify-machineinstrs < %s -mcpu=ppc | FileCheck %s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lost this test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
according to 1cf48ab
Don't form PPC CTR-based loops around a copysignl call
copysign/copysignf never become function calls (because the SDAG expansion code
does not lower to the corresponding function call, but rather directly
implements the associated logic), but copysignl almost always is lowered into a
call to the requested libm functon (and, thus, might clobber CTR).
So it seemed like it was now irrelevant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inserted a test with the baseline at start for this so you can examine the diff for it, if you want.
11a7cf8
to
8c2297d
Compare
; CHECK-NEXT: lwz 4, 88(1) | ||
; CHECK-NEXT: stw 4, 104(1) | ||
; CHECK-NEXT: lfd 0, 104(1) | ||
; CHECK-NEXT: mtctr 3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when I said "the test seems irrelevant", we now emit an MTCTR instruction here, but the copysignl call we were guarding against is nowhere to be found, sooo
f4e8a10
to
65e3cc5
Compare
…alls This makes real what is already true: Copysign does not ever need to lower to runtime libcalls! Its operation should be possible to always implement via bitops.
This reduces the burden on frontends that wish to support float ops without needing a C compiler to build LLVM's compiler-rt for that target, e.g. so that they can be a fully self-contained toolchain for bare-metal. All other floats are expanded for all current architectures just fine. PowerPC, however, does not efficiently legalize its very own float.
65e3cc5
to
74adfd2
Compare
…ysignl This allows ripping out the remaining copysignl infra.
74adfd2
to
bd017cc
Compare
Frontends that emit LLVMIR would like to be able to count on float operations that can be encoded as simple bit-operations as lowering to machine-code routines embedded into the binary. This is desired so that they do not affect what one might think of as "freestanding" binaries. SelectionDAG1 and GlobalISel2 both explicitly handle this lowering for copysign.
However, later code implies that it is, at least in theory, possible to lower to runtime libcalls, as indicated by its entry in llvm/include/llvm/IR/RuntimeLibcalls.def3! Yet this code is largely impotent. We seem to need to preserve runtime libcalls for architecture-specific variants of these float operations that have architecture-specific semantics. However, ordinary IEEE754 floats have nonesuch: these are bitops.
So, purge this dead code, that it may plague us no more with its revenant half-life.
Footnotes
https://github.com/llvm/llvm-project/blob/46944b0cbc9a9d8daad0182c40fcd3560bc9ca35/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp#L7772-L7811 ↩
https://github.com/llvm/llvm-project/blob/75103aae4a9d22e2c46068f2160f2dddd6ad2116/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp#L1646-L1707 ↩
https://github.com/llvm/llvm-project/blob/46944b0cbc9a9d8daad0182c40fcd3560bc9ca35/llvm/include/llvm/IR/RuntimeLibcalls.def#L287-L291 ↩