Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

llc aborts, clang error 70 on specific PPC64 CTR Loop bitcode at -O1 and beyond #36398

Closed
llvmbot opened this issue Apr 9, 2018 · 22 comments
Closed
Assignees
Labels
bugzilla Issues migrated from bugzilla

Comments

@llvmbot
Copy link
Collaborator

llvmbot commented Apr 9, 2018

Bugzilla Link 37050
Resolution FIXED
Resolved on May 07, 2018 19:52
Version trunk
OS Linux
Blocks #35997
Attachments Abort/error-inducing bitcode
Reporter LLVM Bugzilla Contributor
CC @echristo,@zmodem,@hfinkel,@nemanjai,@tstellar
Fixed by commit(s) r328400 r329335 r329355 r329573 r330186 r331714 r331715 r331716

Extended Description

While debugging an "Invalid PPC CTR loop!" problem with LLVM JIT codegen for Numba (numba/numba#2451 (comment)), I managed to isolate an ugly, 186KB LLVM IR module that caused both llc and clang-7.0.0svn trunk to abort() with that "Invalid PPC CTR loop!" message.

Bugpoint then stripped it down to 1.6KB of bitcode that causes Clang to error out (exit code 70) instead, but llc still aborts in the same way.

Bitcode is attached. Console printouts below. All this is also viewable at the Numba Github link above.

The Clang error printout:

$ clang -O3 -v -c crashir-reduced.bc -o crashir-reduced.bc.o
clang version 7.0.0 (https://git.llvm.org/git/clang.git/ f38834898ad991aec557e2a609c6b7c3b288bf10) (https://git.llvm.org/git/llvm.git/ 32d2ff0)
Target: powerpc64le-unknown-linux-gnu
Thread model: posix
InstalledDir: /.local/bin
Found candidate GCC installation: /usr/lib/gcc/ppc64le-redhat-linux/4.8.2
Found candidate GCC installation: /usr/lib/gcc/ppc64le-redhat-linux/4.8.5
Selected GCC installation: /usr/lib/gcc/ppc64le-redhat-linux/4.8.5
Candidate multilib: .;@m64
Selected multilib: .;@m64
"/.local/bin/clang-7" -cc1 -triple powerpc64le-unknown-linux-gnu -emit-obj -disable-free -main-file-name crashir-reduced.bc -mrelocation-model pic -pic-level 2 -mthread-model posix -fmath-errno -masm-verbose -mconstructor-aliases -fuse-init-array -target-cpu ppc64le -mfloat-abi hard -target-abi elfv2 -dwarf-column-info -debugger-tuning=gdb -momit-leaf-frame-pointer -v -coverage-notes-file /NUMBA_SOURCE_PATH/bugpointwdir/crashir-reduced.bc.gcno -resource-dir /.local/lib/clang/7.0.0 -O3 -fdebug-compilation-dir /NUMBA_SOURCE_PATH/bugpointwdir -ferror-limit 19 -fmessage-length 184 -fno-signed-char -fobjc-runtime=gcc -fdiagnostics-show-option -fcolor-diagnostics -vectorize-loops -vectorize-slp -o crashir-reduced.bc.o -x ir crashir-reduced.bc
clang -cc1 version 7.0.0 based upon LLVM 7.0.0svn default target powerpc64le-unknown-linux-gnu
fatal error: error in backend: Cannot select: intrinsic %llvm.ppc.is.decremented.ctr.nonzero
clang-7: error: clang frontend command failed with exit code 70 (use -v to see invocation)
clang version 7.0.0 (https://git.llvm.org/git/clang.git/ f38834898ad991aec557e2a609c6b7c3b288bf10) (https://git.llvm.org/git/llvm.git/ 32d2ff0)
Target: powerpc64le-unknown-linux-gnu
Thread model: posix
InstalledDir: /.local/bin
clang-7: note: diagnostic msg: PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace, preprocessed source, and associated run script.
clang-7: note: diagnostic msg: Error generating preprocessed source(s) - no preprocessable inputs.

The llc printout:

$ llc -O3 crashir-reduced.bc -o crashir-reduced.bc.o
Invalid PPC CTR loop!
UNREACHABLE executed at /LLVM_SOURCE_PATH/lib/Target/PowerPC/PPCCTRLoops.cpp:777!
LLVMSymbolizer: error reading file: No such file or directory
#​0 0x000000001271c7e0 (llc+0x1271c7e0)
#​1 0x000000001271c8e4 (llc+0x1271c8e4)
#​2 0x000000001271a568 (llc+0x1271a568)
#​3 0x000000001271bd88 (llc+0x1271bd88)
#​4 0x00003fff9ab50478 0x478 __GI_abort
#​5 0x00003fff9ab50478
#​6 0x00003fff9ab50478 (+0x478)
#​7 0x00003fff9a610d70 (/lib64/libc.so.6+0x40d70)
#​8 0x00000000126666e8 (llc+0x126666e8)
#​9 0x0000000010c4bcd4 (llc+0x10c4bcd4)
#​10 0x0000000011555c24 (llc+0x11555c24)
#​11 0x0000000011c87390 (llc+0x11c87390)
#​12 0x0000000011c875f4 (llc+0x11c875f4)
#​13 0x0000000011c87b80 (llc+0x11c87b80)
#​14 0x0000000011c884ec (llc+0x11c884ec)
#​15 0x0000000011c888c8 (llc+0x11c888c8)
#​16 0x0000000010b6ceac generic_start_main.isra.0 (llc+0x10b6ceac)
#​17 0x0000000010b6b3b8 __libc_start_main (llc+0x10b6b3b8)
llc(_ZN4llvm3sys15PrintStackTraceERNS_11raw_ostreamE+0x58)[0x1271c7e0]
llc[0x1271c8e4]
llc(_ZN4llvm3sys17RunSignalHandlersEv+0xb4)[0x1271a568]
llc[0x1271bd88]
[0x3fff9ab50478]
/lib64/libc.so.6(abort+0x280)[0x3fff9a610d70]
llc(_ZN4llvm25llvm_unreachable_internalEPKcS1_j+0x104)[0x126666e8]
llc[0x10c4bcd4]
llc(_ZN4llvm19MachineFunctionPass13runOnFunctionERNS_8FunctionE+0x22c)[0x11555c24]
llc(_ZN4llvm13FPPassManager13runOnFunctionERNS_8FunctionE+0x170)[0x11c87390]
llc(_ZN4llvm13FPPassManager11runOnModuleERNS_6ModuleE+0x74)[0x11c875f4]
llc[0x11c87b80]
llc(_ZN4llvm6legacy15PassManagerImpl3runERNS_6ModuleE+0x164)[0x11c884ec]
llc(_ZN4llvm6legacy11PassManager3runERNS_6ModuleE+0x38)[0x11c888c8]
llc[0x10b6ceac]
llc(main+0x580)[0x10b6b3b8]
/lib64/libc.so.6(+0x24700)[0x3fff9a5f4700]
/lib64/libc.so.6(__libc_start_main+0xc4)[0x3fff9a5f48f4]
Stack dump:
0. Program arguments: llc -O3 crashir-reduced.bc -o crashir-reduced.bc.o

  1.  Running pass 'Function Pass Manager' on module 'crashir-reduced.bc'.
    
  2.  Running pass 'PowerPC CTR Loops Verify' on function '@"_ZN5numba5tests7timsort12merge_lo$244E149MergeState$28int64$2c$20array$28int32$2c$201d$2c$20C$29$2c$20array$28int32$2c$201d$2c$20C$29$2c$20list$28MergeRun$28int64$20x$202$29$29$2c$20int64$295ArrayIiLi1E1C7mutable7alignedE5ArrayIiLi1E1C7mutable7alignedExxxx"'
    

Aborted

@llvmbot
Copy link
Collaborator Author

llvmbot commented Apr 9, 2018

assigned to @nemanjai

@nemanjai
Copy link
Member

nemanjai commented Apr 9, 2018

I'll have a look at this.

@nemanjai
Copy link
Member

This patch should fix the problem. Please let me know whether it does. Of course, it won't fix the attached test case because the attached test case is broken already (i.e. there's no mtctr for the bdnz). But it should prevent CTRLoops followed by the opt pipeline from producing such code.

Index: include/llvm/IR/IntrinsicsPowerPC.td

--- include/llvm/IR/IntrinsicsPowerPC.td (revision 329903)
+++ include/llvm/IR/IntrinsicsPowerPC.td (working copy)
@@ -36,8 +36,9 @@

// Intrinsics used to generate ctr-based loops. These should only be
// generated by the PowerPC backend!

  • def int_ppc_mtctr : Intrinsic<[], [llvm_anyint_ty], []>;
  • def int_ppc_is_decremented_ctr_nonzero : Intrinsic<[llvm_i1_ty], [], []>;
  • def int_ppc_mtctr : Intrinsic<[], [llvm_anyint_ty], [IntrNoDuplicate]>;

  • def int_ppc_is_decremented_ctr_nonzero : Intrinsic<[llvm_i1_ty], [],

  •                                                 [IntrNoDuplicate]>;
    

    // Intrinsics for [double]word extended forms of divide instructions
    def int_ppc_divwe : GCCBuiltin<"__builtin_divwe">,

@llvmbot
Copy link
Collaborator Author

llvmbot commented Apr 13, 2018

@​Nemanja Your proposed patch does indeed fix the testcase that produced this invalid bitcode. It will take several hours to verify that all the other testcases remain unbroken, but I don't expect any new failures and the odds are you've solved the problem.

Thank you!

@llvmbot
Copy link
Collaborator Author

llvmbot commented Apr 14, 2018

The entire Numba testsuite passes with no aborts, segfaults or (relevant) failures. Your change solves the problem for Numba.

@hfinkel
Copy link
Collaborator

hfinkel commented Apr 14, 2018

This patch should fix the problem. Please let me know whether it does. Of
course, it won't fix the attached test case because the attached test case
is broken already (i.e. there's no mtctr for the bdnz). But it should
prevent CTRLoops followed by the opt pipeline from producing such code.

Index: include/llvm/IR/IntrinsicsPowerPC.td

--- include/llvm/IR/IntrinsicsPowerPC.td (revision 329903)
+++ include/llvm/IR/IntrinsicsPowerPC.td (working copy)
@@ -36,8 +36,9 @@

// Intrinsics used to generate ctr-based loops. These should only be
// generated by the PowerPC backend!

  • def int_ppc_mtctr : Intrinsic<[], [llvm_anyint_ty], []>;
  • def int_ppc_is_decremented_ctr_nonzero : Intrinsic<[llvm_i1_ty], [], []>;
  • def int_ppc_mtctr : Intrinsic<[], [llvm_anyint_ty], [IntrNoDuplicate]>;

  • def int_ppc_is_decremented_ctr_nonzero : Intrinsic<[llvm_i1_ty], [],

  •                                                 [IntrNoDuplicate]>;
    

    // Intrinsics for [double]word extended forms of divide instructions
    def int_ppc_divwe : GCCBuiltin<"__builtin_divwe">,

Do we need both intrinsics marked as NoDuplicate, or just int_ppc_is_decremented_ctr_nonzero? (it's not obvious to me that mtctr is special in the same way).

@nemanjai
Copy link
Member

The reason I marked both with the property is in case something attempts to duplicate one and not the other in some unexpected way. I am not sure if this could ever happen but if it did, what would the expected outcome be?

I suppose if there is a guarantee that the new call to the mtctr intrinsic cannot possibly end up in the region between the branch target of the BDNZ and the BDNZ instruction itself, it is perfectly safe not to add the property to the mtctr. The provided test case certainly doesn't fail if the property is only on the bdnz intrinsic.

Hal, do you know if there's a specific advantage to not marking the mtctr intrinsic as NoDuplicate? Or is it just that we'd prefer not to mark it if there is no clear requirement to do so?

Olexa, do you think you'd be able to test out the patch below to see if it is adequate to fix the Numba failures?

Index: include/llvm/IR/IntrinsicsPowerPC.td

--- include/llvm/IR/IntrinsicsPowerPC.td (revision 329903)
+++ include/llvm/IR/IntrinsicsPowerPC.td (working copy)
@@ -37,7 +37,8 @@ let TargetPrefix = "ppc" in { // All intrinsics s
// Intrinsics used to generate ctr-based loops. These should only be
// generated by the PowerPC backend!
def int_ppc_mtctr : Intrinsic<[], [llvm_anyint_ty], []>;

  • def int_ppc_is_decremented_ctr_nonzero : Intrinsic<[llvm_i1_ty], [], []>;
  • def int_ppc_is_decremented_ctr_nonzero : Intrinsic<[llvm_i1_ty], [],

  •                                                 [IntrNoDuplicate]>;
    

    // Intrinsics for [double]word extended forms of divide instructions
    def int_ppc_divwe : GCCBuiltin<"__builtin_divwe">,

@llvmbot
Copy link
Collaborator Author

llvmbot commented Apr 15, 2018

I can try out that patch, yes. I take it it's just the second change of your first patch?

The PPCCTRLoopsVerify, at https://llvm.org/doxygen/PPCCTRLoops_8cpp_source.html , checks that, and I quote,

// Verify that all bdnz/bdz instructions are dominated by a loop mtctr before
// any other instructions that might clobber the ctr register.

From my foggy understanding of LLVM, it seems to me that the joint combination of a ppc_is_decremented_ctr_nonzero and a conditional branch are clearly intended to map down to a BDNZ. If so, then:

  • Duplication of ppc_mtctr cannot weaken domination of any given ppc_is_decremented_ctr_nonzero, while
  • Duplication of ppc_is_decremented_ctr_nonzero cannot strengthen domination by any given ppc_mtctr.

Observing that

  • Allowing both to be duplicated, as currently happens, causes aborts.
  • Allowing neither to be duplicated does not abort.

I reason that duplicating mtctr only, while not duplicating ppc_is_decremented_ctr_nonzero, results in strictly stronger domination than duplicating neither. Since duplicating neither does not abort, I would expect your proposed change not to abort either.

I will try this out immediately.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Apr 15, 2018

Early signs are encouraging. All of the specific testcases that used to abort when both were duplicated, and were fixed when neither were duplicated, still pass when only mtctr is duplicated.

I'll launch the entire testsuite now, but I am more-or-less convinced (as I wrote above) that this will work.

@hfinkel
Copy link
Collaborator

hfinkel commented Apr 15, 2018

The reason I marked both with the property is in case something attempts to
duplicate one and not the other in some unexpected way. I am not sure if
this could ever happen but if it did, what would the expected outcome be?

I suppose if there is a guarantee that the new call to the mtctr intrinsic
cannot possibly end up in the region between the branch target of the BDNZ
and the BDNZ instruction itself, it is perfectly safe not to add the
property to the mtctr. The provided test case certainly doesn't fail if the
property is only on the bdnz intrinsic.

Hal, do you know if there's a specific advantage to not marking the mtctr
intrinsic as NoDuplicate? Or is it just that we'd prefer not to mark it if
there is no clear requirement to do so?

I don't have anything specific in mind, so mostly the latter. Also, mtctr doesn't seem that special.

By the way, what transformation is duplicating the intrinsic? Is it something we'd like to prevent? Is the bug in the verifier? Setting NoDuplicate on these, for example, will prevent unrolling an outer loop with a ctr-based inner loop inside it, and at least that doesn't seem unsound.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Apr 16, 2018

I ran the entire Numba testsuite, and it finished with identical results as with your previous patch: No segfaults, aborts or (relevant) failures.

It appears that it is enough to prevent duplication of ppc_is_decremented_ctr_nonzero.

@nemanjai
Copy link
Member

I ran the entire Numba testsuite, and it finished with identical results as
with your previous patch: No segfaults, aborts or (relevant) failures.

It appears that it is enough to prevent duplication of
ppc_is_decremented_ctr_nonzero.

Excellent news. If Hal agrees, I'll commit that version of the patch. Do you need it backported to LLVM 6?

@nemanjai
Copy link
Member

I don't have anything specific in mind, so mostly the latter. Also, mtctr
doesn't seem that special.

By the way, what transformation is duplicating the intrinsic? Is it
something we'd like to prevent? Is the bug in the verifier? Setting
NoDuplicate on these, for example, will prevent unrolling an outer loop with
a ctr-based inner loop inside it, and at least that doesn't seem unsound.

It was loop rotation that was causing the break. I agree that unrolling should be fine to do (as long as it copies the mtctr in the preheader as well as the bdnz in the latch). However, I don't know how we would encapsulate the "OK to duplicate if the corresponding mtctr is duplicated as well" constraint.

@nemanjai
Copy link
Member

I can try out that patch, yes. I take it it's just the second change of your
first patch?

The PPCCTRLoopsVerify, at
https://llvm.org/doxygen/PPCCTRLoops_8cpp_source.html , checks that, and I
quote,

// Verify that all bdnz/bdz instructions are dominated by a loop mtctr
before
// any other instructions that might clobber the ctr register.

From my foggy understanding of LLVM, it seems to me that the joint
combination of a ppc_is_decremented_ctr_nonzero and a conditional branch are
clearly intended to map down to a BDNZ. If so, then:

  • Duplication of ppc_mtctr cannot weaken domination of any given
    ppc_is_decremented_ctr_nonzero, while
  • Duplication of ppc_is_decremented_ctr_nonzero cannot strengthen domination
    by any given ppc_mtctr.

Observing that

  • Allowing both to be duplicated, as currently happens, causes aborts.
  • Allowing neither to be duplicated does not abort.

I reason that duplicating mtctr only, while not duplicating
ppc_is_decremented_ctr_nonzero, results in strictly stronger domination than
duplicating neither. Since duplicating neither does not abort, I would
expect your proposed change not to abort either.

I will try this out immediately.

I agree that the dominance relation is guaranteed to hold. My initial thought was that since I don't know the loop optimization pipeline well enough, I cannot be sure that we won't produce invalid CTR loops. I was just afraid we would produce a situation such as this:

mtctr
...

BT:
...
mtctr
...
bdnz BT

But clearly this does not happen or you'd likely see run-time problems such as infinite loops, incorrect numbers of iterations, etc.

@hfinkel
Copy link
Collaborator

hfinkel commented Apr 16, 2018

I don't have anything specific in mind, so mostly the latter. Also, mtctr
doesn't seem that special.

By the way, what transformation is duplicating the intrinsic? Is it
something we'd like to prevent? Is the bug in the verifier? Setting
NoDuplicate on these, for example, will prevent unrolling an outer loop with
a ctr-based inner loop inside it, and at least that doesn't seem unsound.

It was loop rotation that was causing the break. I agree that unrolling
should be fine to do (as long as it copies the mtctr in the preheader as
well as the bdnz in the latch). However, I don't know how we would
encapsulate the "OK to duplicate if the corresponding mtctr is duplicated as
well" constraint.

Sounds good. Yes, please commit this (with some comment about preventing loop rotation, etc.) - and with a test case if possible. Thanks!

@llvmbot
Copy link
Collaborator Author

llvmbot commented Apr 16, 2018

Excellent news. If Hal agrees, I'll commit that version of the patch. Do you
need it backported to LLVM 6?

Yes please. In fact, the following need backporting to LLVM 6.0.1 in order to enable Numba to claim support for PowerPC:

  • This bug's proposed patch.
  • r328400 (Involves upgrading R_PPC64_REL32 relocations to longer-range R_PPC64_REL64 in .eh_frame sections under large code-model)
  • r329335, r329355, r329573 (Relates to incorrect relocations of R_PPC64_REL24 calls targeting the local, rather than global, entry point of a weakly-defined function in a different module).

This would mean a lot for the Numba guys, and PowerPC more broadly. At present they make the JITing and execution of linkonce_odr functions spread across several modules quite simply unusable.

Thanks ahead of time!

@nemanjai
Copy link
Member

Hi Hal, Hans,

This is one of a series of issues that the Numba PPC port has uncovered and that have now been fixed. It is rather important for us to port the fixes back to LLVM 6. Below is a list of commits and their quick descriptions.

r330186 - The fix for this bug
r328400 - Use 64-bit relocations with the large code model
r329335 - Use global entry points when calling a strong function definition
in another JIT module and the current module contains a weak definition
r329355 - Test case for the above
r329573 - Fix for the above test case

I could open separate PR's for the relocations and global entry points if you'd prefer or I can commit these to the release_60 branch if that helps.

@hfinkel
Copy link
Collaborator

hfinkel commented Apr 19, 2018

Hi Hal, Hans,

This is one of a series of issues that the Numba PPC port has uncovered and
that have now been fixed. It is rather important for us to port the fixes
back to LLVM 6. Below is a list of commits and their quick descriptions.

r330186 - The fix for this bug
r328400 - Use 64-bit relocations with the large code model
r329335 - Use global entry points when calling a strong function definition
in another JIT module and the current module contains a weak
definition
r329355 - Test case for the above
r329573 - Fix for the above test case

I could open separate PR's for the relocations and global entry points if
you'd prefer or I can commit these to the release_60 branch if that helps.

Makes sense to me.

@nemanjai
Copy link
Member

Hi Tom,

the patches listed in comment 16 are approved by the code owner (Hal). Thanks Hal.

Please let me know if there's anything else you'd like me to do with regard to merging these.

@llvmbot
Copy link
Collaborator Author

llvmbot commented May 7, 2018

Any movement on backporting the above 5 patches to 6.0.1?

Once we know that LLVM 6.0.1 will contain the fixes, it would enable Numba to claim full support for PPC64 as of LLVM 6.0.1. Otherwise, we will have to wait till LLVM 7.0.0 and retest against any of its API/ABI-breaking changes.

@tstellar
Copy link
Collaborator

tstellar commented May 8, 2018

I've merged these into the release_60 branch: r331714, r331715, r331716.

@llvmbot
Copy link
Collaborator Author

llvmbot commented May 8, 2018

Thank you so much Tom!

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugzilla Issues migrated from bugzilla
Projects
None yet
Development

No branches or pull requests

4 participants