[SROA] Propagate no-signed-zeros(nsz) fast-math flag on the phi node using function attribute #83381

yashssh · 2024-02-29T06:55:00Z

Its expected that the sequence return X > 0.0 ? X : -X, compiled with -Ofast, produces fabs intrinsic. However, at this point, LLVM is unable to do so.

The above sequence goes through the following transformation during the pass pipeline:

SROA pass generates the phi node. Here, it does not infer the fast-math flags on the phi node unlike clang frontend typically does.
Phi node eventually gets translated into select instruction.
Because of missing no-signed-zeros(nsz) fast-math flag on the select instruction, InstCombine pass fails to fold the sequence into fabs intrinsic.

This patch, as a part of SROA, tries to propagate nsz fast-math flag on the phi node using function attribute enabling this folding.

Closes #51601

Co-authored-by: Sushant Gokhale sgokhale@nvidia.com

llvmbot · 2024-02-29T06:55:32Z

@llvm/pr-subscribers-llvm-transforms

Author: Yashwant Singh (yashssh)

Changes

Based on these C/C++ patterns when compiled with 'Ofast' return X > 0.0 ? X : -X;
return X < 0.0 ? -X : X;

InstCombine tries to propogate FMF to 'select' instructions before attempting a fold, but it can't safely propgate 'nsz' hence wasn't performing the optimization. OOTH we should be able to do this optimization at 'Ofast' same as gcc(https://godbolt.org/z/c69fe5fa6).

Bit of a workaround but this patch allows us to query the "no-signed-zeroes" function attribute added to the function during 'Ofast' compilation. Allowing instcombine to safely match the idiom.

Full diff: https://github.com/llvm/llvm-project/pull/83381.diff

2 Files Affected:

(modified) llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp (+5-1)
(modified) llvm/test/Transforms/InstCombine/fabs.ll (+27)

diff --git a/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp b/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
index 71fa9b9ba41ebb..bd56d92d5d3d0f 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
@@ -2742,7 +2742,11 @@ static Instruction *foldSelectWithFCmpToFabs(SelectInst &SI,
     // Note: We require "nnan" for this fold because fcmp ignores the signbit
     //       of NAN, but IEEE-754 specifies the signbit of NAN values with
     //       fneg/fabs operations.
-    if (!SI.hasNoSignedZeros() || !SI.hasNoNaNs())
+    if (!SI.hasNoNaNs())
+      return nullptr;
+
+    bool functionHasNoSignedZeroes = SI.getParent()->getParent()->hasFnAttribute("no-signed-zeros-fp-math");
+    if(!functionHasNoSignedZeroes && !SI.hasNoSignedZeros())
       return nullptr;
 
     if (Swap)
diff --git a/llvm/test/Transforms/InstCombine/fabs.ll b/llvm/test/Transforms/InstCombine/fabs.ll
index 7e380c2e4590a0..88b02a852f3d74 100644
--- a/llvm/test/Transforms/InstCombine/fabs.ll
+++ b/llvm/test/Transforms/InstCombine/fabs.ll
@@ -547,6 +547,20 @@ define double @select_fcmp_nnan_nsz_ult_zero_unary_fneg(double %x) {
   ret double %fabs
 }
 
+
+define float @absfloat32f_olt_fast_no_signed_zeroes(float %x) "no-signed-zeros-fp-math" {
+; CHECK-LABEL: @absfloat32f_olt_fast_no_signed_zeroes(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[RETVAL_0:%.*]] = call nnan ninf float @llvm.fabs.f32(float [[X:%.*]])
+; CHECK-NEXT:    ret float [[RETVAL_0]]
+;
+entry:
+  %cmp = fcmp fast olt float %x, 0.000000e+00
+  %fneg = fneg fast float %x
+  %retval.0 = select i1 %cmp, float %fneg, float %x
+  ret float %retval.0
+}
+
 ; X < -0.0 ? -X : X --> fabs(X)
 
 define float @select_fcmp_nnan_nsz_olt_negzero(float %x) {
@@ -839,6 +853,19 @@ define <2 x float> @select_fcmp_nnan_nsz_ugt_zero_unary_fneg(<2 x float> %x) {
   ret <2 x float> %fabs
 }
 
+define float @absfloat32f_ogt_fast_no_signed_zeroes(float %x) "no-signed-zeros-fp-math" {
+; CHECK-LABEL: @absfloat32f_ogt_fast_no_signed_zeroes(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[RETVAL_0:%.*]] = call nnan ninf float @llvm.fabs.f32(float [[X:%.*]])
+; CHECK-NEXT:    ret float [[RETVAL_0]]
+;
+entry:
+  %cmp = fcmp fast ogt float %x, 0.000000e+00
+  %fneg = fneg fast float %x
+  %retval.0 = select i1 %cmp, float %x, float %fneg
+  ret float %retval.0
+}
+
 ; X > -0.0 ? X : (0.0 - X) --> fabs(X)
 
 define half @select_fcmp_nnan_nsz_ogt_negzero(half %x) {

github-actions · 2024-02-29T06:57:25Z

✅ With the latest revision this PR passed the C/C++ code formatter.

yashssh · 2024-02-29T06:58:40Z

The original inspiration for the missed optimization can be seen here https://godbolt.org/z/z3Es5oxqv.
This seemed to be the most straightforward way to achieve this optimization and a detailed discussion about the issue and why this fold was skipped by 'instcombine' is documented here #51601

llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp

llvm/test/Transforms/InstCombine/fabs.ll

nikic

I'm wondering whether we should be propagating the function-level attributes to instruction flags somewhere so that this is handled generically, rather than in individual folds?

llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp

llvm/test/Transforms/InstCombine/fabs.ll

dtcxzyw · 2024-03-01T09:06:41Z

I'm wondering whether we should be propagating the function-level attributes to instruction flags somewhere so that this is handled generically, rather than in individual folds?

TBH I think fast-math flags should only be function-level attributes :(

yashssh · 2024-03-01T09:53:40Z

I'm wondering whether we should be propagating the function-level attributes to instruction flags somewhere so that this is handled generically, rather than in individual folds?

TBH I think fast-math flags should only be function-level attributes :(

Will that be more efficient? What about function inlining when we end up losing these attributes?

yashssh · 2024-03-01T09:58:28Z

I'm wondering whether we should be propagating the function-level attributes to instruction flags somewhere so that this is handled generically, rather than in individual folds?

That can be done but will be a separate work altogether, my understanding is somewhere early in the pipeline. I was also suggested that it would require allowing loads and stores to have fast flags so that they are safely reflected in phis and selects.

dtcxzyw · 2024-03-01T11:19:46Z

What about function inlining when we end up losing these attributes?

Is there any real-world project whose translation units/functions are compiled with different fast math flags?

andykaylor · 2024-03-01T19:50:05Z

What about function inlining when we end up losing these attributes?

Is there any real-world project whose translation units/functions are compiled with different fast math flags?

We have pragmas that allow you to turn fast-math on an off locally, at a scope level, so even within a function the fast-math flags can change. I think this is an important capability. One of the top concerns I hear from customers who are using fast-math is that they need the ability to track down optimizations that are throwing their results too far out of range. A first step to tracking this down is to stop using fast-math for individual files or sets of files to bisect the problem down to which file is causing a failure. From there, it gets harder, but pragmas can help.

I'd like to introduce a change to the way we check fast-math flags that would let us insert some kind of debug counter to narrow it down in a way that would be similar to how opt-bisect works but effective at the individual transformation granularity. There are some obstacles to making that happen, but I think that would be an extremely beneficial capability. Once you find the place where an accuracy-related failure is being introduced, my idea is to have some way (probably based on pragmas) to turn fast-math off for that location only in the code without needing the debug counter. That's a bit of a pipe dream at this point, but I want to see us working towards that.

This is one of the reasons I don't like fast-math function attributes, but the way things currently work is that the attribute isn't set if the relevant fast-math flag is turned off anywhere in the function. The front end handles this with pragmas, but it means that the inliner needs to remove the attribute if inlining calls where the callee and caller don't both have it set.

andykaylor · 2024-03-01T20:03:33Z

I'm wondering whether we should be propagating the function-level attributes to instruction flags somewhere so that this is handled generically, rather than in individual folds?

The problem is in deciding where to do that. I don't like the function attributes and have made a few attempts to have them deprecated, but there are cases that couldn't be solved without them. This case, as detailed in #51601, is one such example. The front end can't set the fast-math flags on load instructions, so they never make it to the phi instruction that gets created. Propogating the flags from the attribute to instructions wouldn't help unless it was done after SROA and/or SimplifyCFG, but maybe those passes could add the flag based on the function-level attribute. That might be a better way to fix this.

yashssh · 2024-03-13T17:25:04Z

I'm leaving this pull request open for now while simultaneously looking if I can set FMF in SROA, will return back with whatever I find.

andykaylor · 2024-03-13T18:50:21Z

I'm leaving this pull request open for now while simultaneously looking if I can set FMF in SROA, will return back with whatever I find.

This is obviously a contrived example, but here is a case that will not be optimized if you rely on the function attributes but could be optimized if the flags were propagated by SROA.

https://godbolt.org/z/senb3bEbn

yashssh · 2024-03-18T07:00:38Z

Thanks for the example @andykaylor!

I tried a change that sets fast flag on phi nodes created by SROA

diff --git a/llvm/lib/Transforms/Utils/PromoteMemoryToRegister.cpp b/llvm/lib/Transforms/Utils/PromoteMemoryToRegister.cpp
index 88b05aab8db4..1d85c7e0d8a6 100644
--- a/llvm/lib/Transforms/Utils/PromoteMemoryToRegister.cpp
+++ b/llvm/lib/Transforms/Utils/PromoteMemoryToRegister.cpp
@@ -1056,6 +1056,22 @@ bool PromoteMem2Reg::QueuePhiNode(BasicBlock *BB, unsigned AllocaNo,
   // BasicBlock.
   PN = PHINode::Create(Allocas[AllocaNo]->getAllocatedType(), getNumPreds(BB),
                        Allocas[AllocaNo]->getName() + "." + Twine(Version++));
+
+  if(BB->getParent()->hasFnAttribute("unsafe-fp-math")){
+    PN->setFast(true);
+  }
+
   PN->insertBefore(BB->begin());
   ++NumPHIInsert;
   PhiToAllocaMap[PN] = AllocaNo;

Is this the direction you were hinting towards?

llvm/lib/Transforms/Utils/PromoteMemoryToRegister.cpp

llvm/test/Transforms/SROA/propagate-fast-math-flags-on-phi.ll

llvm/lib/Transforms/Utils/PromoteMemoryToRegister.cpp

llvm/test/Transforms/SROA/propagate-fast-math-flags-on-phi.ll

arsenm · 2024-06-13T11:52:34Z

Should add the Fixes: issue number to the description

sushgokh · 2024-06-19T07:00:54Z

ping @arsenm

arsenm · 2024-06-19T13:48:48Z

I do think we should have the phase ordering test

sushgokh · 2024-06-25T06:08:00Z

I do think we should have the phase ordering test

Although I didn't have a PhaseOrdering test, I had initially added a test that showed the transformation after the concerned passes here and @jcranmer-intel insisted there that this is not required.

jcranmer-intel · 2024-06-28T20:45:56Z

I do think we should have the phase ordering test

Although I didn't have a PhaseOrdering test, I had initially added a test that showed the transformation after the concerned passes here and @jcranmer-intel insisted there that this is not required.

The right way to do a phase ordering test is not the way you did it. There's a folder test/Transforms/PhaseOrdering directory where the phase-ordering tests go, and in general, you would probably want to check a test like opt -O1 to make sure that the passes are actually run in the order expected.

arsenm

It's not great but was the consensus on the ticket. The phase orderingtest could use a comment and rename though

llvm/test/Transforms/PhaseOrdering/fabs.ll

…using function attribute Its expected that the sequence return X > 0.0 ? X : -X, compiled with -Ofast, produces fabs intrinsic. However, at this point, LLVM is unable to do so. The above sequence goes through the following transformation during the pass pipeline: SROA pass generates the phi node. Here, it does not infer the fast-math flags on the phi node unlike clang frontend typically does. Phi node eventually gets translated into select instruction. Because of missing no-signed-zeros(nsz) fast-math flag on the select instruction, InstCombine pass fails to fold the sequence into fabs intrinsic. This patch, as a part of SROA, tries to propagate nsz fast-math flag on the phi node using function attribute enabling this folding. Co-authored-by: Sushant Gokhale <sgokhale@nvidia.com>

llvm-ci · 2024-07-02T06:55:48Z

LLVM Buildbot has detected a new failure on builder sanitizer-ppc64le-linux running on ppc64le-sanitizer while building llvm at step 2 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/72/builds/683

Here is the relevant piece of the build log for the reference:

Step 2 (annotate) failure: 'python ../sanitizer_buildbot/sanitizers/zorg/buildbot/builders/sanitizers/buildbot_selector.py' (failure)
...
PASS: MemorySanitizer-POWERPC64LE :: recover-dso.cpp (2428 of 2450)
PASS: MemorySanitizer-POWERPC64LE :: heap-origin.cpp (2429 of 2450)
PASS: MemorySanitizer-POWERPC64LE :: check_mem_is_initialized.cpp (2430 of 2450)
XFAIL: MemorySanitizer-POWERPC64LE :: vararg_shadow.cpp (2431 of 2450)
PASS: MemorySanitizer-POWERPC64LE :: stack-origin2.cpp (2432 of 2450)
PASS: ScudoStandalone-Unit :: ./ScudoUnitTest-powerpc64le-Test/138/267 (2433 of 2450)
PASS: SanitizerCommon-tsan-powerpc64le-Linux :: max_allocation_size.cpp (2434 of 2450)
PASS: MemorySanitizer-POWERPC64LE :: stack-origin.cpp (2435 of 2450)
PASS: ThreadSanitizer-powerpc64le :: bench_threads.cpp (2436 of 2450)
PASS: MemorySanitizer-POWERPC64LE :: recover.cpp (2437 of 2450)
FAIL: ThreadSanitizer-powerpc64le :: signal_reset.cpp (2438 of 2450)
******************** TEST 'ThreadSanitizer-powerpc64le :: signal_reset.cpp' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 1: /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/build_debug/./bin/clang  --driver-mode=g++ -fsanitize=thread -Wall  -m64 -fno-function-sections   -gline-tables-only -I/home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/../ -std=c++11 -I/home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/../ -nostdinc++ -I/home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/build_debug/runtimes/runtimes-bins/compiler-rt/lib/tsan/libcxx_tsan_powerpc64le/include/c++/v1 -O1 /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/signal_reset.cpp -o /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/build_debug/runtimes/runtimes-bins/compiler-rt/test/tsan/POWERPC64LEConfig/Output/signal_reset.cpp.tmp &&  /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/build_debug/runtimes/runtimes-bins/compiler-rt/test/tsan/POWERPC64LEConfig/Output/signal_reset.cpp.tmp 2>&1 | FileCheck /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/signal_reset.cpp
+ /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/build_debug/./bin/clang --driver-mode=g++ -fsanitize=thread -Wall -m64 -fno-function-sections -gline-tables-only -I/home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/../ -std=c++11 -I/home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/../ -nostdinc++ -I/home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/build_debug/runtimes/runtimes-bins/compiler-rt/lib/tsan/libcxx_tsan_powerpc64le/include/c++/v1 -O1 /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/signal_reset.cpp -o /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/build_debug/runtimes/runtimes-bins/compiler-rt/test/tsan/POWERPC64LEConfig/Output/signal_reset.cpp.tmp
+ /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/build_debug/runtimes/runtimes-bins/compiler-rt/test/tsan/POWERPC64LEConfig/Output/signal_reset.cpp.tmp
+ FileCheck /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/signal_reset.cpp
/home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/signal_reset.cpp:73:15: error: CHECK-NOT: excluded string found in input
// CHECK-NOT: WARNING: ThreadSanitizer:
              ^
<stdin>:2:1: note: found here
WARNING: ThreadSanitizer: signal handler spoils errno (pid=3639744)
^~~~~~~~~~~~~~~~~~~~~~~~~

Input file: <stdin>
Check file: /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/signal_reset.cpp

-dump-input=help explains the following input dump.

Input was:
<<<<<<
        1: ================== 
        2: WARNING: ThreadSanitizer: signal handler spoils errno (pid=3639744) 
not:73     !~~~~~~~~~~~~~~~~~~~~~~~~                                            error: no match expected
        3:  Signal 27 handler invoked at: 
        4:  #0 handler(int) /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/signal_reset.cpp:15 (signal_reset.cpp.tmp+0x1019d0) 
        5:  
        6: SUMMARY: ThreadSanitizer: signal handler spoils errno /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/signal_reset.cpp:15 in handler(int) 
        7: ================== 
        8: DONE 
        9: ThreadSanitizer: reported 1 warnings 
>>>>>>

--

********************
Step 11 (test compiler-rt debug) failure: test compiler-rt debug (failure)
...
PASS: MemorySanitizer-POWERPC64LE :: recover-dso.cpp (2428 of 2450)
PASS: MemorySanitizer-POWERPC64LE :: heap-origin.cpp (2429 of 2450)
PASS: MemorySanitizer-POWERPC64LE :: check_mem_is_initialized.cpp (2430 of 2450)
XFAIL: MemorySanitizer-POWERPC64LE :: vararg_shadow.cpp (2431 of 2450)
PASS: MemorySanitizer-POWERPC64LE :: stack-origin2.cpp (2432 of 2450)
PASS: ScudoStandalone-Unit :: ./ScudoUnitTest-powerpc64le-Test/138/267 (2433 of 2450)
PASS: SanitizerCommon-tsan-powerpc64le-Linux :: max_allocation_size.cpp (2434 of 2450)
PASS: MemorySanitizer-POWERPC64LE :: stack-origin.cpp (2435 of 2450)
PASS: ThreadSanitizer-powerpc64le :: bench_threads.cpp (2436 of 2450)
PASS: MemorySanitizer-POWERPC64LE :: recover.cpp (2437 of 2450)
FAIL: ThreadSanitizer-powerpc64le :: signal_reset.cpp (2438 of 2450)
******************** TEST 'ThreadSanitizer-powerpc64le :: signal_reset.cpp' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 1: /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/build_debug/./bin/clang  --driver-mode=g++ -fsanitize=thread -Wall  -m64 -fno-function-sections   -gline-tables-only -I/home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/../ -std=c++11 -I/home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/../ -nostdinc++ -I/home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/build_debug/runtimes/runtimes-bins/compiler-rt/lib/tsan/libcxx_tsan_powerpc64le/include/c++/v1 -O1 /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/signal_reset.cpp -o /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/build_debug/runtimes/runtimes-bins/compiler-rt/test/tsan/POWERPC64LEConfig/Output/signal_reset.cpp.tmp &&  /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/build_debug/runtimes/runtimes-bins/compiler-rt/test/tsan/POWERPC64LEConfig/Output/signal_reset.cpp.tmp 2>&1 | FileCheck /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/signal_reset.cpp
+ /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/build_debug/./bin/clang --driver-mode=g++ -fsanitize=thread -Wall -m64 -fno-function-sections -gline-tables-only -I/home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/../ -std=c++11 -I/home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/../ -nostdinc++ -I/home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/build_debug/runtimes/runtimes-bins/compiler-rt/lib/tsan/libcxx_tsan_powerpc64le/include/c++/v1 -O1 /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/signal_reset.cpp -o /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/build_debug/runtimes/runtimes-bins/compiler-rt/test/tsan/POWERPC64LEConfig/Output/signal_reset.cpp.tmp
+ /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/build_debug/runtimes/runtimes-bins/compiler-rt/test/tsan/POWERPC64LEConfig/Output/signal_reset.cpp.tmp
+ FileCheck /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/signal_reset.cpp
/home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/signal_reset.cpp:73:15: error: CHECK-NOT: excluded string found in input
// CHECK-NOT: WARNING: ThreadSanitizer:
              ^
<stdin>:2:1: note: found here
WARNING: ThreadSanitizer: signal handler spoils errno (pid=3639744)
^~~~~~~~~~~~~~~~~~~~~~~~~

Input file: <stdin>
Check file: /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/signal_reset.cpp

-dump-input=help explains the following input dump.

Input was:
<<<<<<
        1: ================== 
        2: WARNING: ThreadSanitizer: signal handler spoils errno (pid=3639744) 
not:73     !~~~~~~~~~~~~~~~~~~~~~~~~                                            error: no match expected
        3:  Signal 27 handler invoked at: 
        4:  #0 handler(int) /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/signal_reset.cpp:15 (signal_reset.cpp.tmp+0x1019d0) 
        5:  
        6: SUMMARY: ThreadSanitizer: signal handler spoils errno /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/signal_reset.cpp:15 in handler(int) 
        7: ================== 
        8: DONE 
        9: ThreadSanitizer: reported 1 warnings 
>>>>>>

--

********************

llvm-ci · 2024-07-02T07:10:42Z

LLVM Buildbot has detected a new failure on builder lldb-aarch64-windows running on linaro-armv8-windows-msvc-05 while building llvm at step 6 "test".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/141/builds/428

Here is the relevant piece of the build log for the reference:

Step 6 (test) failure: build (failure)
...
30.698 [1/2/288] Linking CXX executable tools\lldb\unittests\Thread\ThreadTests.exe
30.814 [1/1/289] Linking CXX executable tools\lldb\unittests\ValueObject\LLDBValueObjectTests.exe
llvm-lit.py: C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\llvm\utils\lit\lit\llvm\config.py:57: note: using lit tools: C:\Users\tcwg\scoop\apps\git\2.39.0.windows.2\usr\bin
llvm-lit.py: C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\llvm\utils\lit\lit\llvm\config.py:508: note: using clang: c:\users\tcwg\llvm-worker\lldb-aarch64-windows\build\bin\clang.exe
llvm-lit.py: C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\llvm\utils\lit\lit\llvm\config.py:508: note: using ld.lld: c:\users\tcwg\llvm-worker\lldb-aarch64-windows\build\bin\ld.lld.exe
llvm-lit.py: C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\llvm\utils\lit\lit\llvm\config.py:508: note: using lld-link: c:\users\tcwg\llvm-worker\lldb-aarch64-windows\build\bin\lld-link.exe
llvm-lit.py: C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\llvm\utils\lit\lit\llvm\config.py:508: note: using ld64.lld: c:\users\tcwg\llvm-worker\lldb-aarch64-windows\build\bin\ld64.lld.exe
llvm-lit.py: C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\llvm\utils\lit\lit\llvm\config.py:508: note: using wasm-ld: c:\users\tcwg\llvm-worker\lldb-aarch64-windows\build\bin\wasm-ld.exe
30.814 [0/1/289] Running lldb-- Testing: 1983 tests, 2 workers --
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 
FAIL: lldb-shell :: SymbolFile/DWARF/x86/dwp-foreign-type-units.cpp (1616 of 1983)
******************** TEST 'lldb-shell :: SymbolFile/DWARF/x86/dwp-foreign-type-units.cpp' FAILED ********************
Exit Code: 1

Command Output (stdout):
--
# RUN: at line 19
c:\users\tcwg\llvm-worker\lldb-aarch64-windows\build\bin\clang.exe --target=specify-a-target-or-use-a-_host-substitution -target x86_64-pc-linux -gdwarf-5 -gsplit-dwarf    -fdebug-types-section -gpubnames -c C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\lldb\test\Shell\SymbolFile\DWARF\x86\dwp-foreign-type-units.cpp -o C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\build\tools\lldb\test\Shell\SymbolFile\DWARF\x86\Output\dwp-foreign-type-units.cpp.tmp.main.o
# executed command: 'c:\users\tcwg\llvm-worker\lldb-aarch64-windows\build\bin\clang.exe' --target=specify-a-target-or-use-a-_host-substitution -target x86_64-pc-linux -gdwarf-5 -gsplit-dwarf -fdebug-types-section -gpubnames -c 'C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\lldb\test\Shell\SymbolFile\DWARF\x86\dwp-foreign-type-units.cpp' -o 'C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\build\tools\lldb\test\Shell\SymbolFile\DWARF\x86\Output\dwp-foreign-type-units.cpp.tmp.main.o'
# note: command had no output on stdout or stderr
# RUN: at line 21
c:\users\tcwg\llvm-worker\lldb-aarch64-windows\build\bin\clang.exe --target=specify-a-target-or-use-a-_host-substitution -target x86_64-pc-linux -gdwarf-5 -gsplit-dwarf -DVARIANT    -fdebug-types-section -gpubnames -c C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\lldb\test\Shell\SymbolFile\DWARF\x86\dwp-foreign-type-units.cpp -o C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\build\tools\lldb\test\Shell\SymbolFile\DWARF\x86\Output\dwp-foreign-type-units.cpp.tmp.foo.o
# executed command: 'c:\users\tcwg\llvm-worker\lldb-aarch64-windows\build\bin\clang.exe' --target=specify-a-target-or-use-a-_host-substitution -target x86_64-pc-linux -gdwarf-5 -gsplit-dwarf -DVARIANT -fdebug-types-section -gpubnames -c 'C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\lldb\test\Shell\SymbolFile\DWARF\x86\dwp-foreign-type-units.cpp' -o 'C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\build\tools\lldb\test\Shell\SymbolFile\DWARF\x86\Output\dwp-foreign-type-units.cpp.tmp.foo.o'
# note: command had no output on stdout or stderr
# RUN: at line 23
c:\users\tcwg\llvm-worker\lldb-aarch64-windows\build\bin\ld.lld.exe C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\build\tools\lldb\test\Shell\SymbolFile\DWARF\x86\Output\dwp-foreign-type-units.cpp.tmp.main.o C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\build\tools\lldb\test\Shell\SymbolFile\DWARF\x86\Output\dwp-foreign-type-units.cpp.tmp.foo.o -o C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\build\tools\lldb\test\Shell\SymbolFile\DWARF\x86\Output\dwp-foreign-type-units.cpp.tmp
# executed command: 'c:\users\tcwg\llvm-worker\lldb-aarch64-windows\build\bin\ld.lld.exe' 'C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\build\tools\lldb\test\Shell\SymbolFile\DWARF\x86\Output\dwp-foreign-type-units.cpp.tmp.main.o' 'C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\build\tools\lldb\test\Shell\SymbolFile\DWARF\x86\Output\dwp-foreign-type-units.cpp.tmp.foo.o' -o 'C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\build\tools\lldb\test\Shell\SymbolFile\DWARF\x86\Output\dwp-foreign-type-units.cpp.tmp'
# .---command stderr------------
# | ld.lld: warning: cannot find entry symbol _start; not setting start address
# `-----------------------------
# RUN: at line 26
rm -f C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\build\tools\lldb\test\Shell\SymbolFile\DWARF\x86\Output\dwp-foreign-type-units.cpp.tmp.dwp
# executed command: rm -f 'C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\build\tools\lldb\test\Shell\SymbolFile\DWARF\x86\Output\dwp-foreign-type-units.cpp.tmp.dwp'
# note: command had no output on stdout or stderr
# RUN: at line 27
c:\users\tcwg\llvm-worker\lldb-aarch64-windows\build\bin\lldb.exe --no-lldbinit -S C:/Users/tcwg/llvm-worker/lldb-aarch64-windows/build/tools/lldb\test\Shell\lit-lldb-init-quiet    -o "type lookup IntegerType"    -o "type lookup FloatType"    -o "type lookup CustomType"    -b C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\build\tools\lldb\test\Shell\SymbolFile\DWARF\x86\Output\dwp-foreign-type-units.cpp.tmp | c:\users\tcwg\llvm-worker\lldb-aarch64-windows\build\bin\filecheck.exe C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\lldb\test\Shell\SymbolFile\DWARF\x86\dwp-foreign-type-units.cpp --check-prefix=NODWP
# executed command: 'c:\users\tcwg\llvm-worker\lldb-aarch64-windows\build\bin\lldb.exe' --no-lldbinit -S 'C:/Users/tcwg/llvm-worker/lldb-aarch64-windows/build/tools/lldb\test\Shell\lit-lldb-init-quiet' -o 'type lookup IntegerType' -o 'type lookup FloatType' -o 'type lookup CustomType' -b 'C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\build\tools\lldb\test\Shell\SymbolFile\DWARF\x86\Output\dwp-foreign-type-units.cpp.tmp'
# note: command had no output on stdout or stderr
# executed command: 'c:\users\tcwg\llvm-worker\lldb-aarch64-windows\build\bin\filecheck.exe' 'C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\lldb\test\Shell\SymbolFile\DWARF\x86\dwp-foreign-type-units.cpp' --check-prefix=NODWP
# .---command stderr------------
# | C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\lldb\test\Shell\SymbolFile\DWARF\x86\dwp-foreign-type-units.cpp:35:11: error: NODWP: expected string not found in input
# | // NODWP: (lldb) type lookup FloatType
# |           ^
# | <stdin>:20:22: note: scanning from here
# |  typedef unsigned int IntegerType;
# |                      ^
# | <stdin>:21:3: note: possible intended match here
# |  typedef float FloatType;
# |   ^

llvm#83381 introduced a call to PHINode::isComplete() in Mem2Reg, which is O(n^2) in the number of predecessors, resulting in pathological compile-time regressions for cases with many predecessors. I believe this was intended as a compile-time optimization in the first place, so just drop the check, and always try to transfer the attribute.

#83381 introduced a call to PHINode::isComplete() in Mem2Reg, which is O(n^2) in the number of predecessors, resulting in pathological compile-time regressions for cases with many predecessors. Remove the isComplete() check and instead cache the attribute lookup, to only perform it once per function. Actually setting the FMF flag is cheap.

…using function attribute (llvm#83381) Its expected that the sequence `return X > 0.0 ? X : -X`, compiled with -Ofast, produces fabs intrinsic. However, at this point, LLVM is unable to do so. The above sequence goes through the following transformation during the pass pipeline: 1) SROA pass generates the phi node. Here, it does not infer the fast-math flags on the phi node unlike clang frontend typically does. 2) Phi node eventually gets translated into select instruction. Because of missing no-signed-zeros(nsz) fast-math flag on the select instruction, InstCombine pass fails to fold the sequence into fabs intrinsic. This patch, as a part of SROA, tries to propagate nsz fast-math flag on the phi node using function attribute enabling this folding. Closes llvm#51601 Co-authored-by: Sushant Gokhale <sgokhale@nvidia.com>

llvm#83381 introduced a call to PHINode::isComplete() in Mem2Reg, which is O(n^2) in the number of predecessors, resulting in pathological compile-time regressions for cases with many predecessors. Remove the isComplete() check and instead cache the attribute lookup, to only perform it once per function. Actually setting the FMF flag is cheap.

…using function attribute (llvm#83381) Its expected that the sequence `return X > 0.0 ? X : -X`, compiled with -Ofast, produces fabs intrinsic. However, at this point, LLVM is unable to do so. The above sequence goes through the following transformation during the pass pipeline: 1) SROA pass generates the phi node. Here, it does not infer the fast-math flags on the phi node unlike clang frontend typically does. 2) Phi node eventually gets translated into select instruction. Because of missing no-signed-zeros(nsz) fast-math flag on the select instruction, InstCombine pass fails to fold the sequence into fabs intrinsic. This patch, as a part of SROA, tries to propagate nsz fast-math flag on the phi node using function attribute enabling this folding. Closes llvm#51601 Co-authored-by: Sushant Gokhale <sgokhale@nvidia.com>

llvm#83381 introduced a call to PHINode::isComplete() in Mem2Reg, which is O(n^2) in the number of predecessors, resulting in pathological compile-time regressions for cases with many predecessors. Remove the isComplete() check and instead cache the attribute lookup, to only perform it once per function. Actually setting the FMF flag is cheap.

nunoplopes · 2024-07-07T15:20:41Z

Alive2 complains about this commit:

define double @fabs_fcmp_olt_nsz_func_attr(double %#0, double %#1) {
entry:
  %x = alloca i64 8, align 8
  store double %#0, ptr %x, align 8
  %cmp = fcmp nnan nsz olt double %#0, 0.000000
  br i1 %cmp, label %if.then, label %return

if.then:
  %fneg = fneg nnan nsz double %#0
  store double %fneg, ptr %x, align 8
  br label %return

return:
  %ret = load double, ptr %x, align 8
  ret double %ret
}
=>
define double @fabs_fcmp_olt_nsz_func_attr(double %#0, double %#1) {
entry:
  %cmp = fcmp nnan nsz olt double %#0, 0.000000
  br i1 %cmp, label %if.then, label %return

if.then:
  %fneg = fneg nnan nsz double %#0
  br label %return

return:
  %x.0 = phi nsz double [ %fneg, %if.then ], [ %#0, %entry ]
  ret double %x.0
}
Transformation doesn't verify! (unsound)
ERROR: Target's return value is more undefined

Example:
double %#0 = #x0000000000000000 (+0.0)
double %#1 = poison

Source:
ptr %x = pointer(local, block_id=0, offset=0)
i1 %cmp = #x0 (0)
  >> Jump to %return
double %ret = #x0000000000000000 (+0.0)

Target:
i1 %cmp = #x0 (0)
  >> Jump to %return
double %x.0 = #x8000000000000000 (-0.0)
Source value: #x0000000000000000 (+0.0)
Target value: #x8000000000000000 (-0.0)

nikic · 2024-07-07T15:26:23Z

@nunoplopes As far as I can tell, alive2 does not have support for the no-signed-zeros-fp-math attribute, so I think that's expected.

nunoplopes · 2024-07-07T15:59:17Z

@nunoplopes As far as I can tell, alive2 does not have support for the no-signed-zeros-fp-math attribute, so I think that's expected.

Ah, thanks, I missed that.
To my defense, it's not documented in LangRef 😅

yashssh requested review from rotateright, andykaylor and sjoerdmeijer February 29, 2024 06:55

yashssh requested a review from nikic as a code owner February 29, 2024 06:55

llvmbot added the llvm:transforms label Feb 29, 2024

goldsteinn requested a review from dtcxzyw February 29, 2024 18:22

goldsteinn assigned arsenm and unassigned arsenm Feb 29, 2024

goldsteinn requested a review from arsenm February 29, 2024 18:23

goldsteinn reviewed Feb 29, 2024

View reviewed changes

llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp Outdated Show resolved Hide resolved

goldsteinn reviewed Feb 29, 2024

View reviewed changes

llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp Outdated Show resolved Hide resolved

andykaylor reviewed Feb 29, 2024

View reviewed changes

llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp Outdated Show resolved Hide resolved

llvm/test/Transforms/InstCombine/fabs.ll Outdated Show resolved Hide resolved

dtcxzyw linked an issue Mar 1, 2024 that may be closed by this pull request

Optimizing functions that compute absolute values of x #61613

Closed

nikic reviewed Mar 1, 2024

View reviewed changes

arsenm reviewed Mar 1, 2024

View reviewed changes

llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp Outdated Show resolved Hide resolved

arsenm reviewed Mar 1, 2024

View reviewed changes

llvm/test/Transforms/InstCombine/fabs.ll Outdated Show resolved Hide resolved

yashssh added the llvm:instcombine label Mar 1, 2024

yashssh requested a review from madhur13490 March 4, 2024 03:38

sushgokh force-pushed the main branch from 0d9de59 to 6b1e806 Compare June 3, 2024 06:18

arsenm reviewed Jun 6, 2024

View reviewed changes

llvm/lib/Transforms/Utils/PromoteMemoryToRegister.cpp Outdated Show resolved Hide resolved

arsenm reviewed Jun 6, 2024

View reviewed changes

llvm/test/Transforms/SROA/propagate-fast-math-flags-on-phi.ll Show resolved Hide resolved

sushgokh force-pushed the main branch from 6b1e806 to 31bfff1 Compare June 11, 2024 07:38

arsenm reviewed Jun 12, 2024

View reviewed changes

llvm/test/Transforms/SROA/propagate-fast-math-flags-on-phi.ll Outdated Show resolved Hide resolved

llvm/lib/Transforms/Utils/PromoteMemoryToRegister.cpp Outdated Show resolved Hide resolved

llvm/test/Transforms/SROA/propagate-fast-math-flags-on-phi.ll Show resolved Hide resolved

sushgokh force-pushed the main branch from 31bfff1 to 67fbdf1 Compare June 13, 2024 04:59

sushgokh force-pushed the main branch from 67fbdf1 to 8c7d119 Compare July 1, 2024 12:50

arsenm approved these changes Jul 1, 2024

View reviewed changes

llvm/test/Transforms/PhaseOrdering/fabs.ll Outdated Show resolved Hide resolved

sushgokh force-pushed the main branch from 8c7d119 to 82009cc Compare July 2, 2024 05:26

arsenm approved these changes Jul 2, 2024

View reviewed changes

sushgokh merged commit cd1e6a5 into llvm:main Jul 2, 2024
5 of 7 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[SROA] Propagate no-signed-zeros(nsz) fast-math flag on the phi node using function attribute #83381

[SROA] Propagate no-signed-zeros(nsz) fast-math flag on the phi node using function attribute #83381

yashssh commented Feb 29, 2024 •

edited by sushgokh

Loading

llvmbot commented Feb 29, 2024

github-actions bot commented Feb 29, 2024 •

edited

Loading

yashssh commented Feb 29, 2024

nikic left a comment

dtcxzyw commented Mar 1, 2024

yashssh commented Mar 1, 2024

yashssh commented Mar 1, 2024

dtcxzyw commented Mar 1, 2024

andykaylor commented Mar 1, 2024

andykaylor commented Mar 1, 2024

yashssh commented Mar 13, 2024

andykaylor commented Mar 13, 2024

yashssh commented Mar 18, 2024

arsenm commented Jun 13, 2024

sushgokh commented Jun 19, 2024

arsenm commented Jun 19, 2024

sushgokh commented Jun 25, 2024 •

edited

Loading

jcranmer-intel commented Jun 28, 2024

arsenm left a comment

llvm-ci commented Jul 2, 2024

llvm-ci commented Jul 2, 2024

nunoplopes commented Jul 7, 2024

nikic commented Jul 7, 2024

nunoplopes commented Jul 7, 2024

[SROA] Propagate no-signed-zeros(nsz) fast-math flag on the phi node using function attribute #83381

[SROA] Propagate no-signed-zeros(nsz) fast-math flag on the phi node using function attribute #83381

Conversation

yashssh commented Feb 29, 2024 • edited by sushgokh Loading

llvmbot commented Feb 29, 2024

github-actions bot commented Feb 29, 2024 • edited Loading

yashssh commented Feb 29, 2024

nikic left a comment

Choose a reason for hiding this comment

dtcxzyw commented Mar 1, 2024

yashssh commented Mar 1, 2024

yashssh commented Mar 1, 2024

dtcxzyw commented Mar 1, 2024

andykaylor commented Mar 1, 2024

andykaylor commented Mar 1, 2024

yashssh commented Mar 13, 2024

andykaylor commented Mar 13, 2024

yashssh commented Mar 18, 2024

arsenm commented Jun 13, 2024

sushgokh commented Jun 19, 2024

arsenm commented Jun 19, 2024

sushgokh commented Jun 25, 2024 • edited Loading

jcranmer-intel commented Jun 28, 2024

arsenm left a comment

Choose a reason for hiding this comment

llvm-ci commented Jul 2, 2024

llvm-ci commented Jul 2, 2024

nunoplopes commented Jul 7, 2024

nikic commented Jul 7, 2024

nunoplopes commented Jul 7, 2024

yashssh commented Feb 29, 2024 •

edited by sushgokh

Loading

github-actions bot commented Feb 29, 2024 •

edited

Loading

sushgokh commented Jun 25, 2024 •

edited

Loading