-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[DAGCombiner] Extend FP-to-Int cast without requiring nsz #161093
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-backend-amdgpu @llvm/pr-subscribers-llvm-selectiondag Author: Yi-Chi Lee (yichi170) ChangesThis patch updates the FP-to-Int conversion handling:
This removes the previous dependence on I've tested the code generation of -mtriple=amdgcn. It seems that the assembly code is expected, but I'm not sure how to write a general testcase for every target. Fixes #160623. Full diff: https://github.com/llvm/llvm-project/pull/161093.diff 1 Files Affected:
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index a6ba6e518899f..65cea64e0982d 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -18862,27 +18862,57 @@ SDValue DAGCombiner::visitFPOW(SDNode *N) {
static SDValue foldFPToIntToFP(SDNode *N, const SDLoc &DL, SelectionDAG &DAG,
const TargetLowering &TLI) {
- // We only do this if the target has legal ftrunc. Otherwise, we'd likely be
- // replacing casts with a libcall. We also must be allowed to ignore -0.0
- // because FTRUNC will return -0.0 for (-1.0, -0.0), but using integer
- // conversions would return +0.0.
+ // We can fold the fpto[us]i -> [us]itofp pattern into a single ftrunc.
+ // If NoSignedZerosFPMath is enabled, this is a direct replacement.
+ // Otherwise, for strict math, we must handle edge cases:
+ // 1. For signed conversions, clamp out-of-range values to the valid
+ // integer range before the trunc.
+ // 2. For unsigned conversions, use FABS. A negative float becomes integer 0,
+ // which must convert back to +0.0. FTRUNC on its own could produce -0.0.
+
// FIXME: We should be able to use node-level FMF here.
- // TODO: If strict math, should we use FABS (+ range check for signed cast)?
EVT VT = N->getValueType(0);
- if (!TLI.isOperationLegal(ISD::FTRUNC, VT) ||
- !DAG.getTarget().Options.NoSignedZerosFPMath)
+ if (!TLI.isOperationLegal(ISD::FTRUNC, VT))
return SDValue();
// fptosi/fptoui round towards zero, so converting from FP to integer and
// back is the same as an 'ftrunc': [us]itofp (fpto[us]i X) --> ftrunc X
SDValue N0 = N->getOperand(0);
if (N->getOpcode() == ISD::SINT_TO_FP && N0.getOpcode() == ISD::FP_TO_SINT &&
- N0.getOperand(0).getValueType() == VT)
- return DAG.getNode(ISD::FTRUNC, DL, VT, N0.getOperand(0));
+ N0.getOperand(0).getValueType() == VT) {
+ if (DAG.getTarget().Options.NoSignedZerosFPMath)
+ return DAG.getNode(ISD::FTRUNC, DL, VT, N0.getOperand(0));
+
+ // Strict math: clamp to the signed integer range before truncating.
+ unsigned IntWidth = N0.getValueSizeInBits();
+ APInt APMax = APInt::getSignedMaxValue(IntWidth);
+ APInt APMin = APInt::getSignedMinValue(IntWidth);
+
+ APFloat MaxAPF(VT.getFltSemantics());
+ MaxAPF.convertFromAPInt(APMax, true, APFloat::rmTowardZero);
+ APFloat MinAPF(VT.getFltSemantics());
+ MinAPF.convertFromAPInt(APMin, true, APFloat::rmTowardZero);
+
+ SDValue MaxFP = DAG.getConstantFP(MaxAPF, DL, VT);
+ SDValue MinFP = DAG.getConstantFP(MinAPF, DL, VT);
+
+ SDValue Clamped = DAG.getNode(
+ ISD::FMINNUM, DL, VT,
+ DAG.getNode(ISD::FMAXNUM, DL, VT, N0->getOperand(0), MinFP), MaxFP);
+ return DAG.getNode(ISD::FTRUNC, DL, VT, Clamped);
+ }
if (N->getOpcode() == ISD::UINT_TO_FP && N0.getOpcode() == ISD::FP_TO_UINT &&
- N0.getOperand(0).getValueType() == VT)
- return DAG.getNode(ISD::FTRUNC, DL, VT, N0.getOperand(0));
+ N0.getOperand(0).getValueType() == VT) {
+ if (DAG.getTarget().Options.NoSignedZerosFPMath)
+ return DAG.getNode(ISD::FTRUNC, DL, VT, N0.getOperand(0));
+
+ // Strict math: use FABS to handle negative inputs correctly.
+ if (TLI.isFAbsFree(VT)) {
+ SDValue Abs = DAG.getNode(ISD::FABS, DL, VT, N0.getOperand(0));
+ return DAG.getNode(ISD::FTRUNC, DL, VT, Abs);
+ }
+ }
return SDValue();
}
|
if (DAG.getTarget().Options.NoSignedZerosFPMath) | ||
return DAG.getNode(ISD::FTRUNC, DL, VT, N0.getOperand(0)); | ||
|
||
// Strict math: use FABS to handle negative inputs correctly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uitofp
has nneg
flag support. Check nneg
flag before generating FABS
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, nneg flag does not help! The problematic cases are inputs like -0.5. In that case the input to uitofp would be 0 which is not negative.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, my fault, seems both nneg and nsz are requied but in current implementation it is impossible 🥲
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! would address this!
// Strict math: clamp to the signed integer range before truncating. | ||
unsigned IntWidth = N0.getValueSizeInBits(); | ||
APInt APMax = APInt::getSignedMaxValue(IntWidth); | ||
APInt APMin = APInt::getSignedMinValue(IntWidth); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NoSignedZerosFPMath
might be not enough to cover the range check part. BTW, the constrained floating point intrinsics are converted into the SDNode with prefix STRICT, which might cause confusion.
It's missing from the PR? |
@@ -16,8 +21,12 @@ entry: | |||
define float @t2(float %x) { | |||
; CHECK-LABEL: t2: | |||
; CHECK: // %bb.0: // %entry | |||
; CHECK-NEXT: fcvtzs s0, s0 | |||
; CHECK-NEXT: scvtf s0, s0 | |||
; CHECK-NEXT: movi v1.2s, #207, lsl #24 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regression, this should probably be guarded on an isFAbsFree check
ISD::FMINNUM, DL, VT, | ||
DAG.getNode(ISD::FMAXNUM, DL, VT, N0->getOperand(0), MinFP), MaxFP); | ||
return DAG.getNode(ISD::FTRUNC, DL, VT, Clamped); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be easier to handle the signed and unsigned cases in separate PRs, the signed case is less obviously profitable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, will focus on handling unsigned cases in this PR.
✅ With the latest revision this PR passed the C/C++ code formatter. |
In the newest commit, I only handle the unsigned cases as @arsenm suggested, and I also added the test for the AMDGPU backend. It seems that it doesn't generate the expected codegen when the fp type is |
71fddaa
to
b127081
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM but should regenerate the test with the suggested run lines
@@ -0,0 +1,209 @@ | |||
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 6 | |||
; RUN: llc < %s -mtriple=amdgcn | FileCheck %s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
; RUN: llc < %s -mtriple=amdgcn | FileCheck %s | |
; RUN: llc -mtriple=amdgcn -mcpu=gfx600 < %s | FileCheck -check-prefix=GFX6 %s | |
; RUN: llc -mtriple=amdgcn -mcpu=gfx900 < %s | FileCheck -check-prefix=GFX9 %s |
Should test with and without legal 16-bit operations, the 16-bit checks are missing the fabs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated the testcase. It looks like the results are correct. I found that the gfx900 also has different codegen with f64. Are f64 operations also illegal in gfx600?
(I don't have permission to merge the PR, so if everything looks good to you, could you please help merge it?)
fff0273
to
36447e7
Compare
This patch updates the FP-to-Int conversion handling:
ftrunc
followed by clamping to the target integer range.fabs
+ftrunc
, then clamp.This removes the previous dependence on
nsz
and ensures correct lowering for both signed and unsigned cases.I've tested the code generation of -mtriple=amdgcn. It seems that the assembly code is expected, but I'm not sure how to write a general testcase for every target.
Fixes #160623.