Conversation

VedantParanjape
Contributor

When the result of an fadd is used by fabs, we can safely ignore the sign bit of a floating-point zero. This patch enables an instruction simplification that folds fadd x, 0 ==> x, which otherwise does not fire because the compiler cannot prove that x isn't -0.0. But if the result of the fadd is consumed by fabs, the sign of the zero is irrelevant, so we can still perform the fold.

Fixes #154238
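The hazard this fold works around can be demonstrated with ordinary IEEE-754 doubles. A minimal Python sketch, independent of LLVM, just illustrating the arithmetic:

```python
import math

# IEEE-754 addition: (-0.0) + (+0.0) rounds to +0.0, so "x + 0.0" is
# NOT always identical to x -- the sign of zero is lost when x == -0.0.
# This is exactly why the compiler cannot fold fadd x, 0 ==> x without
# proving that x is never -0.0.
neg_zero = -0.0
summed = neg_zero + 0.0
sign_before = math.copysign(1.0, neg_zero)  # -1.0: x carries a sign bit
sign_after = math.copysign(1.0, summed)     # +1.0: the add dropped it

# Once the result only feeds fabs, the difference is unobservable:
# fabs(x + 0.0) == fabs(x) for every x, including -0.0.
same_under_fabs = abs(neg_zero + 0.0) == abs(neg_zero)
```

This mirrors the test case below: with the fabs user, replacing the fadd result with its first operand is observationally safe.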

@llvmbot llvmbot added llvm:instcombine Covers the InstCombine, InstSimplify and AggressiveInstCombine passes llvm:analysis Includes value tracking, cost tables and constant folding llvm:transforms labels Sep 9, 2025
@llvmbot
Member

llvmbot commented Sep 9, 2025

@llvm/pr-subscribers-backend-amdgpu

@llvm/pr-subscribers-llvm-analysis

Author: Vedant Paranjape (VedantParanjape)

Changes

When the result of an fadd is used by fabs, we can safely ignore the sign bit of a floating-point zero. This patch enables an instruction simplification that folds fadd x, 0 ==> x, which otherwise does not fire because the compiler cannot prove that x isn't -0.0. But if the result of the fadd is consumed by fabs, the sign of the zero is irrelevant, so we can still perform the fold.

Fixes #154238


Full diff: https://github.com/llvm/llvm-project/pull/157757.diff

2 Files Affected:

  • (modified) llvm/lib/Analysis/InstructionSimplify.cpp (+3-1)
  • (added) llvm/test/Transforms/InstSimplify/fold-fadd-with-zero-gh154238.ll (+12)
diff --git a/llvm/lib/Analysis/InstructionSimplify.cpp b/llvm/lib/Analysis/InstructionSimplify.cpp
index ebe329aa1d5fe..7f555c24f71a8 100644
--- a/llvm/lib/Analysis/InstructionSimplify.cpp
+++ b/llvm/lib/Analysis/InstructionSimplify.cpp
@@ -5723,7 +5723,9 @@ simplifyFAddInst(Value *Op0, Value *Op1, FastMathFlags FMF,
   // fadd X, 0 ==> X, when we know X is not -0
   if (canIgnoreSNaN(ExBehavior, FMF))
     if (match(Op1, m_PosZeroFP()) &&
-        (FMF.noSignedZeros() || cannotBeNegativeZero(Op0, Q)))
+        (FMF.noSignedZeros() || cannotBeNegativeZero(Op0, Q) ||
+         (Q.CxtI && !Q.CxtI->use_empty() &&
+          canIgnoreSignBitOfZero(*(Q.CxtI->use_begin())))))
       return Op0;
 
   if (!isDefaultFPEnvironment(ExBehavior, Rounding))
diff --git a/llvm/test/Transforms/InstSimplify/fold-fadd-with-zero-gh154238.ll b/llvm/test/Transforms/InstSimplify/fold-fadd-with-zero-gh154238.ll
new file mode 100644
index 0000000000000..bb12328574dda
--- /dev/null
+++ b/llvm/test/Transforms/InstSimplify/fold-fadd-with-zero-gh154238.ll
@@ -0,0 +1,12 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt < %s -passes=instsimplify -S | FileCheck %s
+define float @src(float %arg1) {
+; CHECK-LABEL: define float @src(
+; CHECK-SAME: float [[ARG1:%.*]]) {
+; CHECK-NEXT:    [[V3:%.*]] = call float @llvm.fabs.f32(float [[ARG1]])
+; CHECK-NEXT:    ret float [[V3]]
+;
+  %v2 = fadd float %arg1, 0.000000e+00
+  %v3 = call float @llvm.fabs.f32(float %v2)
+  ret float %v3
+}

@llvmbot
Member

llvmbot commented Sep 9, 2025

@llvm/pr-subscribers-llvm-transforms

@VedantParanjape
Contributor Author

I reviewed the failing test case (CodeGen/AMDGPU/fcanonicalize-elimination.ll, test_fold_canonicalize_fabs_value_f32). Since fadd X, 0 folds to X after this optimization, the AMDGPU backend generates v_mul_f32_e64 instead of v_and_b32_e32 for fabs. I can fix the test case, but doesn't this look like an ISel problem? A mul is usually costlier than an and; does that not hold for AMDGPU?

When FAdd result is used by fabs, we can safely ignore the sign bit of
fp zero. This patch enables an instruction simplification optimization
that folds fadd x, 0 ==> x, which would otherwise not work as the
compiler cannot prove that the zero isn't -0. But if the result of the
fadd is used by fabs we can simply ignore this and still do the
optimization.

Fixes llvm#154238
@VedantParanjape
Contributor Author

I reviewed the failing test case (CodeGen/AMDGPU/fcanonicalize-elimination.ll, test_fold_canonicalize_fabs_value_f32). Since fadd X, 0 folds to X after this optimization, the AMDGPU backend generates v_mul_f32_e64 instead of v_and_b32_e32 for fabs. I can fix the test case, but doesn't this look like an ISel problem? A mul is usually costlier than an and; does that not hold for AMDGPU?

It seems that on older architectures it emits a v_max, and a v_mul on the newer ones. It does so to make sure fast-math flags are carried over correctly.

(FMF.noSignedZeros() || cannotBeNegativeZero(Op0, Q)))
(FMF.noSignedZeros() || cannotBeNegativeZero(Op0, Q) ||
(Q.CxtI && !Q.CxtI->use_empty() &&
canIgnoreSignBitOfZero(*(Q.CxtI->use_begin())))))
Contributor


You can't do use-based reasoning inside InstSimplify; this must happen in InstCombine.

@nikic nikic added the floating-point Floating-point math label Sep 10, 2025
@VedantParanjape
Contributor Author

Made the changes as proposed, also changed canIgnoreSignBitOfNaN for uniformity.
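For context on the NaN counterpart mentioned here: fabs also discards the sign bit of a NaN, which is the property canIgnoreSignBitOfNaN captures. A quick Python check against the raw binary64 encodings (the bit pattern below is a standard IEEE-754 quiet-NaN constant, not anything taken from the patch):

```python
import math
import struct

def bits(x: float) -> int:
    # Raw IEEE-754 binary64 encoding of x.
    return struct.unpack('<Q', struct.pack('<d', x))[0]

# Build a quiet NaN with the sign bit set (a "negative" NaN).
neg_qnan = struct.unpack('<d', struct.pack('<Q', 0xFFF8000000000000))[0]
assert math.isnan(neg_qnan)

nan_sign_before = bits(neg_qnan) >> 63       # 1: sign bit is set
nan_sign_after = bits(abs(neg_qnan)) >> 63   # 0: fabs clears it
```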

@dtcxzyw
Member

dtcxzyw commented Sep 10, 2025

Made the changes as proposed, also changed canIgnoreSignBitOfNaN for uniformity.

This reverts the previous change requested in #141015 (comment) :)
You should put your code in InstCombine instead of InstSimplify if it checks the users.

@VedantParanjape
Contributor Author

Made the changes as proposed, also changed canIgnoreSignBitOfNaN for uniformity.

This reverts the previous change requested in #141015 (comment) :) You should put your code in InstCombine instead of InstSimplify if it checks the users.

Okay, makes sense. So the complete optimization should move to InstCombine.

@VedantParanjape
Contributor Author

@dtcxzyw made the changes.

@VedantParanjape VedantParanjape changed the title [InstSimplify] Enable FAdd simplifications when user can ignore sign bit [InstCombine] Enable FAdd simplifications when user can ignore sign bit Sep 11, 2025
@VedantParanjape
Contributor Author

VedantParanjape commented Sep 11, 2025

@arsenm tracking here which visit callbacks this can be implemented for: operators whose identity element is zero.

  • visitFAdd
  • visitFSub

@dtcxzyw
Member

dtcxzyw commented Sep 12, 2025

@zyw-bot mfuzz

Co-authored-by: Yingwei Zheng <dtcxzyw@qq.com>
Member

@dtcxzyw dtcxzyw left a comment


LG

@VedantParanjape VedantParanjape merged commit 092de9b into llvm:main Sep 12, 2025
9 checks passed
Value *A;
if (match(&I, m_OneUse(m_FSub(m_Value(A), m_AnyZeroFP()))) &&
canIgnoreSignBitOfZero(*I.use_begin()))
return replaceInstUsesWith(I, A);
Contributor


This fold doesn't make sense, as fsub x, C will be canonicalized to fadd x, -C. The test case you added makes even less sense, because it uses fsub x, 0.0 (i.e. fadd x, -0.0), which is always a no-op and does not actually depend on the use-based logic.

Please remove this code again.
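The "always a no-op" claim is easy to confirm directly: under default rounding, x - 0.0 (equivalently x + (-0.0)) returns x unchanged for every x, including x = -0.0, so no use-based reasoning is needed. A small Python check:

```python
import math

# fsub x, 0.0 is fadd x, -0.0, and that IS a true identity: unlike
# fadd x, +0.0, it preserves the sign of zero.
for x in (0.0, -0.0, 1.5, -1.5, math.inf, -math.inf):
    assert math.copysign(1.0, x - 0.0) == math.copysign(1.0, x)
    assert x - 0.0 == x

# The problematic case from the fadd side does not arise here:
identity_holds = math.copysign(1.0, -0.0 - 0.0) == -1.0
```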

VedantParanjape added a commit to VedantParanjape/llvm-project that referenced this pull request Sep 12, 2025
Since FSub X, 0 gets canonicalised to FAdd X, -0, the optimization
didn't make much sense for FSub. Remove it from InstCombine and the
adjoining testcase.
VedantParanjape added a commit that referenced this pull request Sep 12, 2025
Since FSub X, 0 gets canonicalised to FAdd X, -0, the optimization
didn't make much sense for FSub. Remove it from InstCombine and the
adjoining testcase.
Successfully merging this pull request may close these issues.

Missed optimization: fold away neutral fadd 0.0 before fabs