Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SROA] Propagate no-signed-zeros(nsz) fast-math flag on the phi node using function attribute #83381

Merged
merged 1 commit into from
Jul 2, 2024

Conversation

yashssh
Copy link
Contributor

@yashssh yashssh commented Feb 29, 2024

Its expected that the sequence return X > 0.0 ? X : -X, compiled with -Ofast, produces fabs intrinsic. However, at this point, LLVM is unable to do so.

The above sequence goes through the following transformation during the pass pipeline:

  1. SROA pass generates the phi node. Here, it does not infer the fast-math flags on the phi node unlike clang frontend typically does.
  2. Phi node eventually gets translated into select instruction.
    Because of missing no-signed-zeros(nsz) fast-math flag on the select instruction, InstCombine pass fails to fold the sequence into fabs intrinsic.

This patch, as a part of SROA, tries to propagate nsz fast-math flag on the phi node using function attribute enabling this folding.

Closes #51601

Co-authored-by: Sushant Gokhale sgokhale@nvidia.com

@llvmbot
Copy link
Collaborator

llvmbot commented Feb 29, 2024

@llvm/pr-subscribers-llvm-transforms

Author: Yashwant Singh (yashssh)

Changes

Based on these C/C++ patterns when compiled with 'Ofast' return X > 0.0 ? X : -X;
return X < 0.0 ? -X : X;

InstCombine tries to propogate FMF to 'select' instructions before attempting a fold, but it can't safely propgate 'nsz' hence wasn't performing the optimization. OOTH we should be able to do this optimization at 'Ofast' same as gcc(https://godbolt.org/z/c69fe5fa6).

Bit of a workaround but this patch allows us to query the "no-signed-zeroes" function attribute added to the function during 'Ofast' compilation. Allowing instcombine to safely match the idiom.


Full diff: https://github.com/llvm/llvm-project/pull/83381.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp (+5-1)
  • (modified) llvm/test/Transforms/InstCombine/fabs.ll (+27)
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp b/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
index 71fa9b9ba41ebb..bd56d92d5d3d0f 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
@@ -2742,7 +2742,11 @@ static Instruction *foldSelectWithFCmpToFabs(SelectInst &SI,
     // Note: We require "nnan" for this fold because fcmp ignores the signbit
     //       of NAN, but IEEE-754 specifies the signbit of NAN values with
     //       fneg/fabs operations.
-    if (!SI.hasNoSignedZeros() || !SI.hasNoNaNs())
+    if (!SI.hasNoNaNs())
+      return nullptr;
+
+    bool functionHasNoSignedZeroes = SI.getParent()->getParent()->hasFnAttribute("no-signed-zeros-fp-math");
+    if(!functionHasNoSignedZeroes && !SI.hasNoSignedZeros())
       return nullptr;
 
     if (Swap)
diff --git a/llvm/test/Transforms/InstCombine/fabs.ll b/llvm/test/Transforms/InstCombine/fabs.ll
index 7e380c2e4590a0..88b02a852f3d74 100644
--- a/llvm/test/Transforms/InstCombine/fabs.ll
+++ b/llvm/test/Transforms/InstCombine/fabs.ll
@@ -547,6 +547,20 @@ define double @select_fcmp_nnan_nsz_ult_zero_unary_fneg(double %x) {
   ret double %fabs
 }
 
+
+define float @absfloat32f_olt_fast_no_signed_zeroes(float %x) "no-signed-zeros-fp-math" {
+; CHECK-LABEL: @absfloat32f_olt_fast_no_signed_zeroes(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[RETVAL_0:%.*]] = call nnan ninf float @llvm.fabs.f32(float [[X:%.*]])
+; CHECK-NEXT:    ret float [[RETVAL_0]]
+;
+entry:
+  %cmp = fcmp fast olt float %x, 0.000000e+00
+  %fneg = fneg fast float %x
+  %retval.0 = select i1 %cmp, float %fneg, float %x
+  ret float %retval.0
+}
+
 ; X < -0.0 ? -X : X --> fabs(X)
 
 define float @select_fcmp_nnan_nsz_olt_negzero(float %x) {
@@ -839,6 +853,19 @@ define <2 x float> @select_fcmp_nnan_nsz_ugt_zero_unary_fneg(<2 x float> %x) {
   ret <2 x float> %fabs
 }
 
+define float @absfloat32f_ogt_fast_no_signed_zeroes(float %x) "no-signed-zeros-fp-math" {
+; CHECK-LABEL: @absfloat32f_ogt_fast_no_signed_zeroes(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[RETVAL_0:%.*]] = call nnan ninf float @llvm.fabs.f32(float [[X:%.*]])
+; CHECK-NEXT:    ret float [[RETVAL_0]]
+;
+entry:
+  %cmp = fcmp fast ogt float %x, 0.000000e+00
+  %fneg = fneg fast float %x
+  %retval.0 = select i1 %cmp, float %x, float %fneg
+  ret float %retval.0
+}
+
 ; X > -0.0 ? X : (0.0 - X) --> fabs(X)
 
 define half @select_fcmp_nnan_nsz_ogt_negzero(half %x) {

Copy link

github-actions bot commented Feb 29, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@yashssh
Copy link
Contributor Author

yashssh commented Feb 29, 2024

The original inspiration for the missed optimization can be seen here https://godbolt.org/z/z3Es5oxqv.
This seemed to be the most straightforward way to achieve this optimization and a detailed discussion about the issue and why this fold was skipped by 'instcombine' is documented here #51601

@goldsteinn goldsteinn assigned arsenm and unassigned arsenm Feb 29, 2024
@dtcxzyw dtcxzyw linked an issue Mar 1, 2024 that may be closed by this pull request
Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering whether we should be propagating the function-level attributes to instruction flags somewhere so that this is handled generically, rather than in individual folds?

@dtcxzyw
Copy link
Member

dtcxzyw commented Mar 1, 2024

I'm wondering whether we should be propagating the function-level attributes to instruction flags somewhere so that this is handled generically, rather than in individual folds?

TBH I think fast-math flags should only be function-level attributes :(

@yashssh
Copy link
Contributor Author

yashssh commented Mar 1, 2024

I'm wondering whether we should be propagating the function-level attributes to instruction flags somewhere so that this is handled generically, rather than in individual folds?

TBH I think fast-math flags should only be function-level attributes :(

Will that be more efficient? What about function inlining when we end up losing these attributes?

@yashssh
Copy link
Contributor Author

yashssh commented Mar 1, 2024

I'm wondering whether we should be propagating the function-level attributes to instruction flags somewhere so that this is handled generically, rather than in individual folds?

That can be done but will be a separate work altogether, my understanding is somewhere early in the pipeline. I was also suggested that it would require allowing loads and stores to have fast flags so that they are safely reflected in phis and selects.

@dtcxzyw
Copy link
Member

dtcxzyw commented Mar 1, 2024

What about function inlining when we end up losing these attributes?

Is there any real-world project whose translation units/functions are compiled with different fast math flags?

@andykaylor
Copy link

What about function inlining when we end up losing these attributes?

Is there any real-world project whose translation units/functions are compiled with different fast math flags?

We have pragmas that allow you to turn fast-math on an off locally, at a scope level, so even within a function the fast-math flags can change. I think this is an important capability. One of the top concerns I hear from customers who are using fast-math is that they need the ability to track down optimizations that are throwing their results too far out of range. A first step to tracking this down is to stop using fast-math for individual files or sets of files to bisect the problem down to which file is causing a failure. From there, it gets harder, but pragmas can help.

I'd like to introduce a change to the way we check fast-math flags that would let us insert some kind of debug counter to narrow it down in a way that would be similar to how opt-bisect works but effective at the individual transformation granularity. There are some obstacles to making that happen, but I think that would be an extremely beneficial capability. Once you find the place where an accuracy-related failure is being introduced, my idea is to have some way (probably based on pragmas) to turn fast-math off for that location only in the code without needing the debug counter. That's a bit of a pipe dream at this point, but I want to see us working towards that.

This is one of the reasons I don't like fast-math function attributes, but the way things currently work is that the attribute isn't set if the relevant fast-math flag is turned off anywhere in the function. The front end handles this with pragmas, but it means that the inliner needs to remove the attribute if inlining calls where the callee and caller don't both have it set.

@andykaylor
Copy link

I'm wondering whether we should be propagating the function-level attributes to instruction flags somewhere so that this is handled generically, rather than in individual folds?

The problem is in deciding where to do that. I don't like the function attributes and have made a few attempts to have them deprecated, but there are cases that couldn't be solved without them. This case, as detailed in #51601, is one such example. The front end can't set the fast-math flags on load instructions, so they never make it to the phi instruction that gets created. Propogating the flags from the attribute to instructions wouldn't help unless it was done after SROA and/or SimplifyCFG, but maybe those passes could add the flag based on the function-level attribute. That might be a better way to fix this.

@yashssh yashssh requested a review from madhur13490 March 4, 2024 03:38
@yashssh
Copy link
Contributor Author

yashssh commented Mar 13, 2024

I'm leaving this pull request open for now while simultaneously looking if I can set FMF in SROA, will return back with whatever I find.

@andykaylor
Copy link

I'm leaving this pull request open for now while simultaneously looking if I can set FMF in SROA, will return back with whatever I find.

This is obviously a contrived example, but here is a case that will not be optimized if you rely on the function attributes but could be optimized if the flags were propagated by SROA.

https://godbolt.org/z/senb3bEbn

@yashssh
Copy link
Contributor Author

yashssh commented Mar 18, 2024

Thanks for the example @andykaylor!

I tried a change that sets fast flag on phi nodes created by SROA

diff --git a/llvm/lib/Transforms/Utils/PromoteMemoryToRegister.cpp b/llvm/lib/Transforms/Utils/PromoteMemoryToRegister.cpp
index 88b05aab8db4..1d85c7e0d8a6 100644
--- a/llvm/lib/Transforms/Utils/PromoteMemoryToRegister.cpp
+++ b/llvm/lib/Transforms/Utils/PromoteMemoryToRegister.cpp
@@ -1056,6 +1056,22 @@ bool PromoteMem2Reg::QueuePhiNode(BasicBlock *BB, unsigned AllocaNo,
   // BasicBlock.
   PN = PHINode::Create(Allocas[AllocaNo]->getAllocatedType(), getNumPreds(BB),
                        Allocas[AllocaNo]->getName() + "." + Twine(Version++));
+
+  if(BB->getParent()->hasFnAttribute("unsafe-fp-math")){
+    PN->setFast(true);
+  }
+
   PN->insertBefore(BB->begin());
   ++NumPHIInsert;
   PhiToAllocaMap[PN] = AllocaNo;

Is this the direction you were hinting towards?

@arsenm
Copy link
Contributor

arsenm commented Jun 13, 2024

Should add the Fixes: issue number to the description

@sushgokh
Copy link
Contributor

ping @arsenm

@arsenm
Copy link
Contributor

arsenm commented Jun 19, 2024

I do think we should have the phase ordering test

@sushgokh
Copy link
Contributor

sushgokh commented Jun 25, 2024

I do think we should have the phase ordering test

Although I didn't have a PhaseOrdering test, I had initially added a test that showed the transformation after the concerned passes here and @jcranmer-intel insisted there that this is not required.

@jcranmer-intel
Copy link
Contributor

I do think we should have the phase ordering test

Although I didn't have a PhaseOrdering test, I had initially added a test that showed the transformation after the concerned passes here and @jcranmer-intel insisted there that this is not required.

The right way to do a phase ordering test is not the way you did it. There's a folder test/Transforms/PhaseOrdering directory where the phase-ordering tests go, and in general, you would probably want to check a test like opt -O1 to make sure that the passes are actually run in the order expected.

Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not great but was the consensus on the ticket. The phase orderingtest could use a comment and rename though

llvm/test/Transforms/PhaseOrdering/fabs.ll Outdated Show resolved Hide resolved
…using function attribute

Its expected that the sequence return X > 0.0 ? X : -X, compiled with -Ofast, produces fabs intrinsic. However, at this point, LLVM is unable to do so.

The above sequence goes through the following transformation during the pass pipeline:

SROA pass generates the phi node. Here, it does not infer the fast-math flags on the phi node unlike clang frontend typically does.
Phi node eventually gets translated into select instruction.
Because of missing no-signed-zeros(nsz) fast-math flag on the select instruction, InstCombine pass fails to fold the sequence into fabs intrinsic.
This patch, as a part of SROA, tries to propagate nsz fast-math flag on the phi node using function attribute enabling this folding.

Co-authored-by: Sushant Gokhale <sgokhale@nvidia.com>
@sushgokh sushgokh merged commit cd1e6a5 into llvm:main Jul 2, 2024
5 of 7 checks passed
@llvm-ci
Copy link
Collaborator

llvm-ci commented Jul 2, 2024

LLVM Buildbot has detected a new failure on builder sanitizer-ppc64le-linux running on ppc64le-sanitizer while building llvm at step 2 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/72/builds/683

Here is the relevant piece of the build log for the reference:

Step 2 (annotate) failure: 'python ../sanitizer_buildbot/sanitizers/zorg/buildbot/builders/sanitizers/buildbot_selector.py' (failure)
...
PASS: MemorySanitizer-POWERPC64LE :: recover-dso.cpp (2428 of 2450)
PASS: MemorySanitizer-POWERPC64LE :: heap-origin.cpp (2429 of 2450)
PASS: MemorySanitizer-POWERPC64LE :: check_mem_is_initialized.cpp (2430 of 2450)
XFAIL: MemorySanitizer-POWERPC64LE :: vararg_shadow.cpp (2431 of 2450)
PASS: MemorySanitizer-POWERPC64LE :: stack-origin2.cpp (2432 of 2450)
PASS: ScudoStandalone-Unit :: ./ScudoUnitTest-powerpc64le-Test/138/267 (2433 of 2450)
PASS: SanitizerCommon-tsan-powerpc64le-Linux :: max_allocation_size.cpp (2434 of 2450)
PASS: MemorySanitizer-POWERPC64LE :: stack-origin.cpp (2435 of 2450)
PASS: ThreadSanitizer-powerpc64le :: bench_threads.cpp (2436 of 2450)
PASS: MemorySanitizer-POWERPC64LE :: recover.cpp (2437 of 2450)
FAIL: ThreadSanitizer-powerpc64le :: signal_reset.cpp (2438 of 2450)
******************** TEST 'ThreadSanitizer-powerpc64le :: signal_reset.cpp' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 1: /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/build_debug/./bin/clang  --driver-mode=g++ -fsanitize=thread -Wall  -m64 -fno-function-sections   -gline-tables-only -I/home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/../ -std=c++11 -I/home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/../ -nostdinc++ -I/home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/build_debug/runtimes/runtimes-bins/compiler-rt/lib/tsan/libcxx_tsan_powerpc64le/include/c++/v1 -O1 /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/signal_reset.cpp -o /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/build_debug/runtimes/runtimes-bins/compiler-rt/test/tsan/POWERPC64LEConfig/Output/signal_reset.cpp.tmp &&  /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/build_debug/runtimes/runtimes-bins/compiler-rt/test/tsan/POWERPC64LEConfig/Output/signal_reset.cpp.tmp 2>&1 | FileCheck /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/signal_reset.cpp
+ /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/build_debug/./bin/clang --driver-mode=g++ -fsanitize=thread -Wall -m64 -fno-function-sections -gline-tables-only -I/home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/../ -std=c++11 -I/home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/../ -nostdinc++ -I/home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/build_debug/runtimes/runtimes-bins/compiler-rt/lib/tsan/libcxx_tsan_powerpc64le/include/c++/v1 -O1 /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/signal_reset.cpp -o /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/build_debug/runtimes/runtimes-bins/compiler-rt/test/tsan/POWERPC64LEConfig/Output/signal_reset.cpp.tmp
+ /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/build_debug/runtimes/runtimes-bins/compiler-rt/test/tsan/POWERPC64LEConfig/Output/signal_reset.cpp.tmp
+ FileCheck /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/signal_reset.cpp
/home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/signal_reset.cpp:73:15: error: CHECK-NOT: excluded string found in input
// CHECK-NOT: WARNING: ThreadSanitizer:
              ^
<stdin>:2:1: note: found here
WARNING: ThreadSanitizer: signal handler spoils errno (pid=3639744)
^~~~~~~~~~~~~~~~~~~~~~~~~

Input file: <stdin>
Check file: /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/signal_reset.cpp

-dump-input=help explains the following input dump.

Input was:
<<<<<<
        1: ================== 
        2: WARNING: ThreadSanitizer: signal handler spoils errno (pid=3639744) 
not:73     !~~~~~~~~~~~~~~~~~~~~~~~~                                            error: no match expected
        3:  Signal 27 handler invoked at: 
        4:  #0 handler(int) /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/signal_reset.cpp:15 (signal_reset.cpp.tmp+0x1019d0) 
        5:  
        6: SUMMARY: ThreadSanitizer: signal handler spoils errno /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/signal_reset.cpp:15 in handler(int) 
        7: ================== 
        8: DONE 
        9: ThreadSanitizer: reported 1 warnings 
>>>>>>

--

********************
Step 11 (test compiler-rt debug) failure: test compiler-rt debug (failure)
...
PASS: MemorySanitizer-POWERPC64LE :: recover-dso.cpp (2428 of 2450)
PASS: MemorySanitizer-POWERPC64LE :: heap-origin.cpp (2429 of 2450)
PASS: MemorySanitizer-POWERPC64LE :: check_mem_is_initialized.cpp (2430 of 2450)
XFAIL: MemorySanitizer-POWERPC64LE :: vararg_shadow.cpp (2431 of 2450)
PASS: MemorySanitizer-POWERPC64LE :: stack-origin2.cpp (2432 of 2450)
PASS: ScudoStandalone-Unit :: ./ScudoUnitTest-powerpc64le-Test/138/267 (2433 of 2450)
PASS: SanitizerCommon-tsan-powerpc64le-Linux :: max_allocation_size.cpp (2434 of 2450)
PASS: MemorySanitizer-POWERPC64LE :: stack-origin.cpp (2435 of 2450)
PASS: ThreadSanitizer-powerpc64le :: bench_threads.cpp (2436 of 2450)
PASS: MemorySanitizer-POWERPC64LE :: recover.cpp (2437 of 2450)
FAIL: ThreadSanitizer-powerpc64le :: signal_reset.cpp (2438 of 2450)
******************** TEST 'ThreadSanitizer-powerpc64le :: signal_reset.cpp' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 1: /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/build_debug/./bin/clang  --driver-mode=g++ -fsanitize=thread -Wall  -m64 -fno-function-sections   -gline-tables-only -I/home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/../ -std=c++11 -I/home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/../ -nostdinc++ -I/home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/build_debug/runtimes/runtimes-bins/compiler-rt/lib/tsan/libcxx_tsan_powerpc64le/include/c++/v1 -O1 /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/signal_reset.cpp -o /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/build_debug/runtimes/runtimes-bins/compiler-rt/test/tsan/POWERPC64LEConfig/Output/signal_reset.cpp.tmp &&  /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/build_debug/runtimes/runtimes-bins/compiler-rt/test/tsan/POWERPC64LEConfig/Output/signal_reset.cpp.tmp 2>&1 | FileCheck /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/signal_reset.cpp
+ /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/build_debug/./bin/clang --driver-mode=g++ -fsanitize=thread -Wall -m64 -fno-function-sections -gline-tables-only -I/home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/../ -std=c++11 -I/home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/../ -nostdinc++ -I/home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/build_debug/runtimes/runtimes-bins/compiler-rt/lib/tsan/libcxx_tsan_powerpc64le/include/c++/v1 -O1 /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/signal_reset.cpp -o /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/build_debug/runtimes/runtimes-bins/compiler-rt/test/tsan/POWERPC64LEConfig/Output/signal_reset.cpp.tmp
+ /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/build_debug/runtimes/runtimes-bins/compiler-rt/test/tsan/POWERPC64LEConfig/Output/signal_reset.cpp.tmp
+ FileCheck /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/signal_reset.cpp
/home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/signal_reset.cpp:73:15: error: CHECK-NOT: excluded string found in input
// CHECK-NOT: WARNING: ThreadSanitizer:
              ^
<stdin>:2:1: note: found here
WARNING: ThreadSanitizer: signal handler spoils errno (pid=3639744)
^~~~~~~~~~~~~~~~~~~~~~~~~

Input file: <stdin>
Check file: /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/signal_reset.cpp

-dump-input=help explains the following input dump.

Input was:
<<<<<<
        1: ================== 
        2: WARNING: ThreadSanitizer: signal handler spoils errno (pid=3639744) 
not:73     !~~~~~~~~~~~~~~~~~~~~~~~~                                            error: no match expected
        3:  Signal 27 handler invoked at: 
        4:  #0 handler(int) /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/signal_reset.cpp:15 (signal_reset.cpp.tmp+0x1019d0) 
        5:  
        6: SUMMARY: ThreadSanitizer: signal handler spoils errno /home/buildbots/llvm-external-buildbots/workers/ppc64le-sanitizer/sanitizer-ppc64le/build/llvm-project/compiler-rt/test/tsan/signal_reset.cpp:15 in handler(int) 
        7: ================== 
        8: DONE 
        9: ThreadSanitizer: reported 1 warnings 
>>>>>>

--

********************

@llvm-ci
Copy link
Collaborator

llvm-ci commented Jul 2, 2024

LLVM Buildbot has detected a new failure on builder lldb-aarch64-windows running on linaro-armv8-windows-msvc-05 while building llvm at step 6 "test".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/141/builds/428

Here is the relevant piece of the build log for the reference:

Step 6 (test) failure: build (failure)
...
30.698 [1/2/288] Linking CXX executable tools\lldb\unittests\Thread\ThreadTests.exe
30.814 [1/1/289] Linking CXX executable tools\lldb\unittests\ValueObject\LLDBValueObjectTests.exe
llvm-lit.py: C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\llvm\utils\lit\lit\llvm\config.py:57: note: using lit tools: C:\Users\tcwg\scoop\apps\git\2.39.0.windows.2\usr\bin
llvm-lit.py: C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\llvm\utils\lit\lit\llvm\config.py:508: note: using clang: c:\users\tcwg\llvm-worker\lldb-aarch64-windows\build\bin\clang.exe
llvm-lit.py: C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\llvm\utils\lit\lit\llvm\config.py:508: note: using ld.lld: c:\users\tcwg\llvm-worker\lldb-aarch64-windows\build\bin\ld.lld.exe
llvm-lit.py: C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\llvm\utils\lit\lit\llvm\config.py:508: note: using lld-link: c:\users\tcwg\llvm-worker\lldb-aarch64-windows\build\bin\lld-link.exe
llvm-lit.py: C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\llvm\utils\lit\lit\llvm\config.py:508: note: using ld64.lld: c:\users\tcwg\llvm-worker\lldb-aarch64-windows\build\bin\ld64.lld.exe
llvm-lit.py: C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\llvm\utils\lit\lit\llvm\config.py:508: note: using wasm-ld: c:\users\tcwg\llvm-worker\lldb-aarch64-windows\build\bin\wasm-ld.exe
30.814 [0/1/289] Running lldb-- Testing: 1983 tests, 2 workers --
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 
FAIL: lldb-shell :: SymbolFile/DWARF/x86/dwp-foreign-type-units.cpp (1616 of 1983)
******************** TEST 'lldb-shell :: SymbolFile/DWARF/x86/dwp-foreign-type-units.cpp' FAILED ********************
Exit Code: 1

Command Output (stdout):
--
# RUN: at line 19
c:\users\tcwg\llvm-worker\lldb-aarch64-windows\build\bin\clang.exe --target=specify-a-target-or-use-a-_host-substitution -target x86_64-pc-linux -gdwarf-5 -gsplit-dwarf    -fdebug-types-section -gpubnames -c C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\lldb\test\Shell\SymbolFile\DWARF\x86\dwp-foreign-type-units.cpp -o C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\build\tools\lldb\test\Shell\SymbolFile\DWARF\x86\Output\dwp-foreign-type-units.cpp.tmp.main.o
# executed command: 'c:\users\tcwg\llvm-worker\lldb-aarch64-windows\build\bin\clang.exe' --target=specify-a-target-or-use-a-_host-substitution -target x86_64-pc-linux -gdwarf-5 -gsplit-dwarf -fdebug-types-section -gpubnames -c 'C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\lldb\test\Shell\SymbolFile\DWARF\x86\dwp-foreign-type-units.cpp' -o 'C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\build\tools\lldb\test\Shell\SymbolFile\DWARF\x86\Output\dwp-foreign-type-units.cpp.tmp.main.o'
# note: command had no output on stdout or stderr
# RUN: at line 21
c:\users\tcwg\llvm-worker\lldb-aarch64-windows\build\bin\clang.exe --target=specify-a-target-or-use-a-_host-substitution -target x86_64-pc-linux -gdwarf-5 -gsplit-dwarf -DVARIANT    -fdebug-types-section -gpubnames -c C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\lldb\test\Shell\SymbolFile\DWARF\x86\dwp-foreign-type-units.cpp -o C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\build\tools\lldb\test\Shell\SymbolFile\DWARF\x86\Output\dwp-foreign-type-units.cpp.tmp.foo.o
# executed command: 'c:\users\tcwg\llvm-worker\lldb-aarch64-windows\build\bin\clang.exe' --target=specify-a-target-or-use-a-_host-substitution -target x86_64-pc-linux -gdwarf-5 -gsplit-dwarf -DVARIANT -fdebug-types-section -gpubnames -c 'C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\lldb\test\Shell\SymbolFile\DWARF\x86\dwp-foreign-type-units.cpp' -o 'C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\build\tools\lldb\test\Shell\SymbolFile\DWARF\x86\Output\dwp-foreign-type-units.cpp.tmp.foo.o'
# note: command had no output on stdout or stderr
# RUN: at line 23
c:\users\tcwg\llvm-worker\lldb-aarch64-windows\build\bin\ld.lld.exe C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\build\tools\lldb\test\Shell\SymbolFile\DWARF\x86\Output\dwp-foreign-type-units.cpp.tmp.main.o C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\build\tools\lldb\test\Shell\SymbolFile\DWARF\x86\Output\dwp-foreign-type-units.cpp.tmp.foo.o -o C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\build\tools\lldb\test\Shell\SymbolFile\DWARF\x86\Output\dwp-foreign-type-units.cpp.tmp
# executed command: 'c:\users\tcwg\llvm-worker\lldb-aarch64-windows\build\bin\ld.lld.exe' 'C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\build\tools\lldb\test\Shell\SymbolFile\DWARF\x86\Output\dwp-foreign-type-units.cpp.tmp.main.o' 'C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\build\tools\lldb\test\Shell\SymbolFile\DWARF\x86\Output\dwp-foreign-type-units.cpp.tmp.foo.o' -o 'C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\build\tools\lldb\test\Shell\SymbolFile\DWARF\x86\Output\dwp-foreign-type-units.cpp.tmp'
# .---command stderr------------
# | ld.lld: warning: cannot find entry symbol _start; not setting start address
# `-----------------------------
# RUN: at line 26
rm -f C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\build\tools\lldb\test\Shell\SymbolFile\DWARF\x86\Output\dwp-foreign-type-units.cpp.tmp.dwp
# executed command: rm -f 'C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\build\tools\lldb\test\Shell\SymbolFile\DWARF\x86\Output\dwp-foreign-type-units.cpp.tmp.dwp'
# note: command had no output on stdout or stderr
# RUN: at line 27
c:\users\tcwg\llvm-worker\lldb-aarch64-windows\build\bin\lldb.exe --no-lldbinit -S C:/Users/tcwg/llvm-worker/lldb-aarch64-windows/build/tools/lldb\test\Shell\lit-lldb-init-quiet    -o "type lookup IntegerType"    -o "type lookup FloatType"    -o "type lookup CustomType"    -b C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\build\tools\lldb\test\Shell\SymbolFile\DWARF\x86\Output\dwp-foreign-type-units.cpp.tmp | c:\users\tcwg\llvm-worker\lldb-aarch64-windows\build\bin\filecheck.exe C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\lldb\test\Shell\SymbolFile\DWARF\x86\dwp-foreign-type-units.cpp --check-prefix=NODWP
# executed command: 'c:\users\tcwg\llvm-worker\lldb-aarch64-windows\build\bin\lldb.exe' --no-lldbinit -S 'C:/Users/tcwg/llvm-worker/lldb-aarch64-windows/build/tools/lldb\test\Shell\lit-lldb-init-quiet' -o 'type lookup IntegerType' -o 'type lookup FloatType' -o 'type lookup CustomType' -b 'C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\build\tools\lldb\test\Shell\SymbolFile\DWARF\x86\Output\dwp-foreign-type-units.cpp.tmp'
# note: command had no output on stdout or stderr
# executed command: 'c:\users\tcwg\llvm-worker\lldb-aarch64-windows\build\bin\filecheck.exe' 'C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\lldb\test\Shell\SymbolFile\DWARF\x86\dwp-foreign-type-units.cpp' --check-prefix=NODWP
# .---command stderr------------
# | C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\lldb\test\Shell\SymbolFile\DWARF\x86\dwp-foreign-type-units.cpp:35:11: error: NODWP: expected string not found in input
# | // NODWP: (lldb) type lookup FloatType
# |           ^
# | <stdin>:20:22: note: scanning from here
# |  typedef unsigned int IntegerType;
# |                      ^
# | <stdin>:21:3: note: possible intended match here
# |  typedef float FloatType;
# |   ^

nikic added a commit to nikic/llvm-project that referenced this pull request Jul 2, 2024
llvm#83381 introduced a call
to PHINode::isComplete() in Mem2Reg, which is O(n^2) in the number
of predecessors, resulting in pathological compile-time regressions
for cases with many predecessors.

I believe this was intended as a compile-time optimization in the
first place, so just drop the check, and always try to transfer
the attribute.
nikic added a commit that referenced this pull request Jul 2, 2024
#83381 introduced a call
to PHINode::isComplete() in Mem2Reg, which is O(n^2) in the number
of predecessors, resulting in pathological compile-time regressions
for cases with many predecessors.

Remove the isComplete() check and instead cache the attribute
lookup, to only perform it once per function. Actually setting
the FMF flag is cheap.
lravenclaw pushed a commit to lravenclaw/llvm-project that referenced this pull request Jul 3, 2024
…using function attribute (llvm#83381)

Its expected that the sequence `return X > 0.0 ? X : -X`, compiled with
-Ofast, produces fabs intrinsic. However, at this point, LLVM is unable
to do so.

The above sequence goes through the following transformation during the
pass pipeline:
1) SROA pass generates the phi node. Here, it does not infer the
fast-math flags on the phi node unlike clang frontend typically does.
2) Phi node eventually gets translated into select instruction. 
Because of missing no-signed-zeros(nsz) fast-math flag on the select
instruction, InstCombine pass fails to fold the sequence into fabs
intrinsic.

This patch, as a part of SROA, tries to propagate nsz fast-math flag on
the phi node using function attribute enabling this folding.

Closes llvm#51601

Co-authored-by: Sushant Gokhale <sgokhale@nvidia.com>
lravenclaw pushed a commit to lravenclaw/llvm-project that referenced this pull request Jul 3, 2024
llvm#83381 introduced a call
to PHINode::isComplete() in Mem2Reg, which is O(n^2) in the number
of predecessors, resulting in pathological compile-time regressions
for cases with many predecessors.

Remove the isComplete() check and instead cache the attribute
lookup, to only perform it once per function. Actually setting
the FMF flag is cheap.
kbluck pushed a commit to kbluck/llvm-project that referenced this pull request Jul 6, 2024
…using function attribute (llvm#83381)

Its expected that the sequence `return X > 0.0 ? X : -X`, compiled with
-Ofast, produces fabs intrinsic. However, at this point, LLVM is unable
to do so.

The above sequence goes through the following transformation during the
pass pipeline:
1) SROA pass generates the phi node. Here, it does not infer the
fast-math flags on the phi node unlike clang frontend typically does.
2) Phi node eventually gets translated into select instruction. 
Because of missing no-signed-zeros(nsz) fast-math flag on the select
instruction, InstCombine pass fails to fold the sequence into fabs
intrinsic.

This patch, as a part of SROA, tries to propagate nsz fast-math flag on
the phi node using function attribute enabling this folding.

Closes llvm#51601

Co-authored-by: Sushant Gokhale <sgokhale@nvidia.com>
kbluck pushed a commit to kbluck/llvm-project that referenced this pull request Jul 6, 2024
llvm#83381 introduced a call
to PHINode::isComplete() in Mem2Reg, which is O(n^2) in the number
of predecessors, resulting in pathological compile-time regressions
for cases with many predecessors.

Remove the isComplete() check and instead cache the attribute
lookup, to only perform it once per function. Actually setting
the FMF flag is cheap.
@nunoplopes
Copy link
Member

Alive2 complains about this commit:

define double @fabs_fcmp_olt_nsz_func_attr(double %#0, double %#1) {
entry:
  %x = alloca i64 8, align 8
  store double %#0, ptr %x, align 8
  %cmp = fcmp nnan nsz olt double %#0, 0.000000
  br i1 %cmp, label %if.then, label %return

if.then:
  %fneg = fneg nnan nsz double %#0
  store double %fneg, ptr %x, align 8
  br label %return

return:
  %ret = load double, ptr %x, align 8
  ret double %ret
}
=>
define double @fabs_fcmp_olt_nsz_func_attr(double %#0, double %#1) {
entry:
  %cmp = fcmp nnan nsz olt double %#0, 0.000000
  br i1 %cmp, label %if.then, label %return

if.then:
  %fneg = fneg nnan nsz double %#0
  br label %return

return:
  %x.0 = phi nsz double [ %fneg, %if.then ], [ %#0, %entry ]
  ret double %x.0
}
Transformation doesn't verify! (unsound)
ERROR: Target's return value is more undefined

Example:
double %#0 = #x0000000000000000 (+0.0)
double %#1 = poison

Source:
ptr %x = pointer(local, block_id=0, offset=0)
i1 %cmp = #x0 (0)
  >> Jump to %return
double %ret = #x0000000000000000 (+0.0)

Target:
i1 %cmp = #x0 (0)
  >> Jump to %return
double %x.0 = #x8000000000000000 (-0.0)
Source value: #x0000000000000000 (+0.0)
Target value: #x8000000000000000 (-0.0)

@nikic
Copy link
Contributor

nikic commented Jul 7, 2024

@nunoplopes As far as I can tell, alive2 does not have support for the no-signed-zeros-fp-math attribute, so I think that's expected.

@nunoplopes
Copy link
Member

@nunoplopes As far as I can tell, alive2 does not have support for the no-signed-zeros-fp-math attribute, so I think that's expected.

Ah, thanks, I missed that.
To my defense, it's not documented in LangRef 😅

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet