Skip to content

Conversation

yichi170
Copy link
Contributor

This patch updates the FP-to-Int conversion handling:

  • For signed integers: use ftrunc followed by clamping to the target integer range.
  • For unsigned integers: apply fabs + ftrunc, then clamp.

This removes the previous dependence on nsz and ensures correct lowering for both signed and unsigned cases.

I've tested the code generation of -mtriple=amdgcn. It seems that the assembly code is expected, but I'm not sure how to write a general testcase for every target.

Fixes #160623.

@llvmbot llvmbot added the llvm:SelectionDAG SelectionDAGISel as well label Sep 28, 2025
@llvmbot
Copy link
Member

llvmbot commented Sep 28, 2025

@llvm/pr-subscribers-backend-amdgpu
@llvm/pr-subscribers-backend-aarch64

@llvm/pr-subscribers-llvm-selectiondag

Author: Yi-Chi Lee (yichi170)

Changes

This patch updates the FP-to-Int conversion handling:

  • For signed integers: use ftrunc followed by clamping to the target integer range.
  • For unsigned integers: apply fabs + ftrunc, then clamp.

This removes the previous dependence on nsz and ensures correct lowering for both signed and unsigned cases.

I've tested the code generation of -mtriple=amdgcn. It seems that the assembly code is expected, but I'm not sure how to write a general testcase for every target.

Fixes #160623.


Full diff: https://github.com/llvm/llvm-project/pull/161093.diff

1 Files Affected:

  • (modified) llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (+41-11)
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index a6ba6e518899f..65cea64e0982d 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -18862,27 +18862,57 @@ SDValue DAGCombiner::visitFPOW(SDNode *N) {
 
 static SDValue foldFPToIntToFP(SDNode *N, const SDLoc &DL, SelectionDAG &DAG,
                                const TargetLowering &TLI) {
-  // We only do this if the target has legal ftrunc. Otherwise, we'd likely be
-  // replacing casts with a libcall. We also must be allowed to ignore -0.0
-  // because FTRUNC will return -0.0 for (-1.0, -0.0), but using integer
-  // conversions would return +0.0.
+  // We can fold the fpto[us]i -> [us]itofp pattern into a single ftrunc.
+  // If NoSignedZerosFPMath is enabled, this is a direct replacement.
+  // Otherwise, for strict math, we must handle edge cases:
+  // 1. For signed conversions, clamp out-of-range values to the valid
+  //    integer range before the trunc.
+  // 2. For unsigned conversions, use FABS. A negative float becomes integer 0,
+  //    which must convert back to +0.0. FTRUNC on its own could produce -0.0.
+
   // FIXME: We should be able to use node-level FMF here.
-  // TODO: If strict math, should we use FABS (+ range check for signed cast)?
   EVT VT = N->getValueType(0);
-  if (!TLI.isOperationLegal(ISD::FTRUNC, VT) ||
-      !DAG.getTarget().Options.NoSignedZerosFPMath)
+  if (!TLI.isOperationLegal(ISD::FTRUNC, VT))
     return SDValue();
 
   // fptosi/fptoui round towards zero, so converting from FP to integer and
   // back is the same as an 'ftrunc': [us]itofp (fpto[us]i X) --> ftrunc X
   SDValue N0 = N->getOperand(0);
   if (N->getOpcode() == ISD::SINT_TO_FP && N0.getOpcode() == ISD::FP_TO_SINT &&
-      N0.getOperand(0).getValueType() == VT)
-    return DAG.getNode(ISD::FTRUNC, DL, VT, N0.getOperand(0));
+      N0.getOperand(0).getValueType() == VT) {
+    if (DAG.getTarget().Options.NoSignedZerosFPMath)
+      return DAG.getNode(ISD::FTRUNC, DL, VT, N0.getOperand(0));
+
+    // Strict math: clamp to the signed integer range before truncating.
+    unsigned IntWidth = N0.getValueSizeInBits();
+    APInt APMax = APInt::getSignedMaxValue(IntWidth);
+    APInt APMin = APInt::getSignedMinValue(IntWidth);
+
+    APFloat MaxAPF(VT.getFltSemantics());
+    MaxAPF.convertFromAPInt(APMax, true, APFloat::rmTowardZero);
+    APFloat MinAPF(VT.getFltSemantics());
+    MinAPF.convertFromAPInt(APMin, true, APFloat::rmTowardZero);
+
+    SDValue MaxFP = DAG.getConstantFP(MaxAPF, DL, VT);
+    SDValue MinFP = DAG.getConstantFP(MinAPF, DL, VT);
+
+    SDValue Clamped = DAG.getNode(
+        ISD::FMINNUM, DL, VT,
+        DAG.getNode(ISD::FMAXNUM, DL, VT, N0->getOperand(0), MinFP), MaxFP);
+    return DAG.getNode(ISD::FTRUNC, DL, VT, Clamped);
+  }
 
   if (N->getOpcode() == ISD::UINT_TO_FP && N0.getOpcode() == ISD::FP_TO_UINT &&
-      N0.getOperand(0).getValueType() == VT)
-    return DAG.getNode(ISD::FTRUNC, DL, VT, N0.getOperand(0));
+      N0.getOperand(0).getValueType() == VT) {
+    if (DAG.getTarget().Options.NoSignedZerosFPMath)
+      return DAG.getNode(ISD::FTRUNC, DL, VT, N0.getOperand(0));
+
+    // Strict math: use FABS to handle negative inputs correctly.
+    if (TLI.isFAbsFree(VT)) {
+      SDValue Abs = DAG.getNode(ISD::FABS, DL, VT, N0.getOperand(0));
+      return DAG.getNode(ISD::FTRUNC, DL, VT, Abs);
+    }
+  }
 
   return SDValue();
 }

if (DAG.getTarget().Options.NoSignedZerosFPMath)
return DAG.getNode(ISD::FTRUNC, DL, VT, N0.getOperand(0));

// Strict math: use FABS to handle negative inputs correctly.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uitofp has nneg flag support. Check nneg flag before generating FABS?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, nneg flag does not help! The problematic cases are inputs like -0.5. In that case the input to uitofp would be 0 which is not negative.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, my fault, seems both nneg and nsz are requied but in current implementation it is impossible 🥲

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! would address this!

// Strict math: clamp to the signed integer range before truncating.
unsigned IntWidth = N0.getValueSizeInBits();
APInt APMax = APInt::getSignedMaxValue(IntWidth);
APInt APMin = APInt::getSignedMinValue(IntWidth);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NoSignedZerosFPMath might be not enough to cover the range check part. BTW, the constrained floating point intrinsics are converted into the SDNode with prefix STRICT, which might cause confusion.

@paperchalice paperchalice requested a review from arsenm September 29, 2025 06:32
@arsenm arsenm added the floating-point Floating-point math label Sep 29, 2025
@arsenm
Copy link
Contributor

arsenm commented Sep 29, 2025

I've tested the code generation of -mtriple=amdgcn. It seems that the assembly code is expected, but I'm not sure how to write a general testcase for every target.

It's missing from the PR?

@@ -16,8 +21,12 @@ entry:
define float @t2(float %x) {
; CHECK-LABEL: t2:
; CHECK: // %bb.0: // %entry
; CHECK-NEXT: fcvtzs s0, s0
; CHECK-NEXT: scvtf s0, s0
; CHECK-NEXT: movi v1.2s, #207, lsl #24
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regression, this should probably be guarded on an isFAbsFree check

ISD::FMINNUM, DL, VT,
DAG.getNode(ISD::FMAXNUM, DL, VT, N0->getOperand(0), MinFP), MaxFP);
return DAG.getNode(ISD::FTRUNC, DL, VT, Clamped);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be easier to handle the signed and unsigned cases in separate PRs, the signed case is less obviously profitable

Copy link
Contributor Author

@yichi170 yichi170 Sep 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, will focus on handling unsigned cases in this PR.

Copy link

github-actions bot commented Sep 29, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@yichi170
Copy link
Contributor Author

yichi170 commented Sep 29, 2025

In the newest commit, I only handle the unsigned cases as @arsenm suggested, and I also added the test for the AMDGPU backend. It seems that it doesn't generate the expected codegen when the fp type is double. I'm not sure why.

@yichi170 yichi170 force-pushed the extend-fp-to-int-combine branch from 71fddaa to b127081 Compare September 30, 2025 12:52
Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but should regenerate the test with the suggested run lines

@@ -0,0 +1,209 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 6
; RUN: llc < %s -mtriple=amdgcn | FileCheck %s
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
; RUN: llc < %s -mtriple=amdgcn | FileCheck %s
; RUN: llc -mtriple=amdgcn -mcpu=gfx600 < %s | FileCheck -check-prefix=GFX6 %s
; RUN: llc -mtriple=amdgcn -mcpu=gfx900 < %s | FileCheck -check-prefix=GFX9 %s

Should test with and without legal 16-bit operations, the 16-bit checks are missing the fabs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the testcase. It looks like the results are correct. I found that the gfx900 also has different codegen with f64. Are f64 operations also illegal in gfx600?

(I don't have permission to merge the PR, so if everything looks good to you, could you please help merge it?)

@yichi170 yichi170 force-pushed the extend-fp-to-int-combine branch from fff0273 to 36447e7 Compare October 10, 2025 14:33
@arsenm arsenm merged commit a9c8e94 into llvm:main Oct 10, 2025
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Extend FP-to-Int cast combine to use fabs instead of requiring no signed zeros

5 participants