Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RUST_BACKTRACE=full loop with -Cpanic=abort on aarch64-unknown-linux-gnu #123733

Open
cuviper opened this issue Apr 10, 2024 · 25 comments
Open

RUST_BACKTRACE=full loop with -Cpanic=abort on aarch64-unknown-linux-gnu #123733

cuviper opened this issue Apr 10, 2024 · 25 comments
Labels
A-runtime Area: std's runtime and "pre-main" init for handling backtraces, unwinds, stack overflows C-bug Category: This is a bug. O-AArch64 Armv8-A or later processors in AArch64 mode P-high High priority regression-from-stable-to-stable Performance or correctness regression from one stable version to another. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@cuviper
Copy link
Member

cuviper commented Apr 10, 2024

When compiled with -Cpanic=abort, a program as simple as fn main() { panic!(); } gets into a loop with RUST_BACKTRACE=full on aarch64-unknown-linux-gnu:

$ echo 'fn main() { panic!() }' | rustc - -Cpanic=abort && RUST_BACKTRACE=full ./rust_out
thread 'main' panicked at <anon>:1:13:
explicit panic
stack backtrace:
   0:     0xaaaad44ee198 - std::backtrace_rs::backtrace::libunwind::trace::hfa92c98074c33c6e
                               at /rustc/8b2459c1f21187f9792d99310171a15e64feb9cf/library/std/src/../../backtrace/src/backtrace/libunwind.rs:105:5
   1:     0xaaaad44ee198 - std::backtrace_rs::backtrace::trace_unsynchronized::h4fca41224fd18d13
                               at /rustc/8b2459c1f21187f9792d99310171a15e64feb9cf/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2:     0xaaaad44ee198 - std::sys_common::backtrace::_print_fmt::hbb1869018263cc3e
                               at /rustc/8b2459c1f21187f9792d99310171a15e64feb9cf/library/std/src/sys_common/backtrace.rs:68:5
   3:     0xaaaad44ee198 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::hfa121140e91bdb74
                               at /rustc/8b2459c1f21187f9792d99310171a15e64feb9cf/library/std/src/sys_common/backtrace.rs:44:22
   4:     0xaaaad45074b8 - core::fmt::rt::Argument::fmt::h0cc54cf87d70c86b
                               at /rustc/8b2459c1f21187f9792d99310171a15e64feb9cf/library/core/src/fmt/rt.rs:142:9
   5:     0xaaaad45074b8 - core::fmt::write::hb74970315812417b
                               at /rustc/8b2459c1f21187f9792d99310171a15e64feb9cf/library/core/src/fmt/mod.rs:1153:17
   6:     0xaaaad44ec398 - std::io::Write::write_fmt::h4028ee7f77ba6e46
                               at /rustc/8b2459c1f21187f9792d99310171a15e64feb9cf/library/std/src/io/mod.rs:1843:15
   7:     0xaaaad44edfe0 - std::sys_common::backtrace::_print::ha41833e087a85e12
                               at /rustc/8b2459c1f21187f9792d99310171a15e64feb9cf/library/std/src/sys_common/backtrace.rs:47:5
   8:     0xaaaad44edfe0 - std::sys_common::backtrace::print::h32ae40bc3201cfc9
                               at /rustc/8b2459c1f21187f9792d99310171a15e64feb9cf/library/std/src/sys_common/backtrace.rs:34:9
   9:     0xaaaad44ef1e8 - std::panicking::default_hook::{{closure}}::h46d22ee496490b96
  10:     0xaaaad44eee50 - std::panicking::default_hook::h8a05bff9b9ebb786
                               at /rustc/8b2459c1f21187f9792d99310171a15e64feb9cf/library/std/src/panicking.rs:291:9
  11:     0xaaaad44ef628 - std::panicking::rust_panic_with_hook::h72b4980bd319f2cc
                               at /rustc/8b2459c1f21187f9792d99310171a15e64feb9cf/library/std/src/panicking.rs:788:13
  12:     0xaaaad44d6bb4 - std::panicking::begin_panic::{{closure}}::h6b7a2e6dc54500da
  13:     0xaaaad44d6bb4 - std::panicking::begin_panic::{{closure}}::h6b7a2e6dc54500da
  14:     0xaaaad44d6bb4 - std::panicking::begin_panic::{{closure}}::h6b7a2e6dc54500da
  15:     0xaaaad44d6bb4 - std::panicking::begin_panic::{{closure}}::h6b7a2e6dc54500da
...

With RUST_BACKTRACE=1, it has no backtrace at all:

thread 'main' panicked at <anon>:1:13:
explicit panic
stack backtrace:
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
Aborted (core dumped)

Current toolchain:

rustc 1.79.0-nightly (8b2459c1f 2024-04-09)
binary: rustc
commit-hash: 8b2459c1f21187f9792d99310171a15e64feb9cf
commit-date: 2024-04-09
host: aarch64-unknown-linux-gnu
release: 1.79.0-nightly
LLVM version: 18.1.3

Using cargo bisect-rustc, this problem seems to be quite old:

********************************************************************************
Regression in nightly-2022-07-03
********************************************************************************

fetching https://static.rust-lang.org/dist/2022-07-02/channel-rust-nightly-git-commit-hash.txt
nightly manifest 2022-07-02: 40 B / 40 B [============================================================] 100.00 % 229.04 KB/s converted 2022-07-02 to 46b8c23f3eb5e4d0e0aa27eb3f20d5b8fc3ed51f
fetching https://static.rust-lang.org/dist/2022-07-03/channel-rust-nightly-git-commit-hash.txt
nightly manifest 2022-07-03: 40 B / 40 B [==============================================================] 100.00 % 1.88 MB/s converted 2022-07-03 to f2d93935ffba3ab9d7ccb5300771a2d29b4c8bf3
looking for regression commit between 2022-07-02 and 2022-07-03
fetching (via remote github) commits from max(46b8c23f3eb5e4d0e0aa27eb3f20d5b8fc3ed51f, 2022-06-30) to f2d93935ffba3ab9d7ccb5300771a2d29b4c8bf3
ending github query because we found starting sha: 46b8c23f3eb5e4d0e0aa27eb3f20d5b8fc3ed51f
get_commits_between returning commits, len: 8
  commit[0] 2022-07-01: Auto merge of #93967 - cjgillot:short-struct-span, r=petrochenkov
  commit[1] 2022-07-01: Auto merge of #98781 - GuillaumeGomez:rollup-798kb8u, r=GuillaumeGomez
  commit[2] 2022-07-02: Auto merge of #98791 - cuviper:rogue-binary, r=compiler-errors
  commit[3] 2022-07-02: Auto merge of #98802 - Dylan-DPC:rollup-u6mwx27, r=Dylan-DPC
  commit[4] 2022-07-02: Auto merge of #91743 - cjgillot:enable_mir_inlining_inline_all, r=oli-obk
  commit[5] 2022-07-02: Auto merge of #97235 - nbdd0121:unwind, r=Amanieu
  commit[6] 2022-07-02: Auto merge of #97585 - lqd:const-alloc-intern, r=RalfJung
  commit[7] 2022-07-02: Auto merge of #98820 - RalfJung:rollup-i3mip9a, r=RalfJung
ERROR: no CI builds available between 46b8c23f3eb5e4d0e0aa27eb3f20d5b8fc3ed51f and f2d93935ffba3ab9d7ccb5300771a2d29b4c8bf3 within last 167 days

Comparison: 46b8c23...f2d9393

#97235 looks like the most obvious candidate, cc @nbdd0121 @Amanieu

The reason I'm noticing this now is because there's a new cargo test panic_abort_doc_tests from rust-lang/cargo#13388, which I found hanging on Fedora aarch64 in a scratch build of 1.78-beta. After about 2 hours, rustdoc got a SIGKILL, which I assume is an OOM from capturing all that backtrace output. When reproducing this hang, I found that running its RUST_BACKTRACE=full /tmp/rustdoctestGHjZyP/rust_out myself showed the backtrace loop.


This is similar to #123686, but that proposed fix only applies to Windows.

@cuviper cuviper added A-runtime Area: std's runtime and "pre-main" init for handling backtraces, unwinds, stack overflows regression-from-stable-to-stable Performance or correctness regression from one stable version to another. C-bug Category: This is a bug. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Apr 10, 2024
@rustbot rustbot added I-prioritize Issue: Indicates that prioritization has been requested for this issue. needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Apr 10, 2024
@saethlin
Copy link
Member

I think this was already reported? #123686

@cuviper
Copy link
Member Author

cuviper commented Apr 10, 2024

@saethlin I mentioned that one at the end:

This is similar to #123686, but that proposed fix only applies to Windows.

and AFAICS that report doesn't depend on -Cpanic=abort either.

@cuviper
Copy link
Member Author

cuviper commented Apr 10, 2024

It might be a duplicate of #121817, cc @wesleywiser

@nbdd0121
Copy link
Contributor

#97235 adds a lint (which isn't even executed with panic=abort), and should have no codegen changes.

The current behaviour doesn't surprise me since we neither force frame pointers, nor force unwind tables. To generate the stack trace usually what happens is that eh_frame to unwind the frames is used if it exists, otherwise frame pointers are used to continue unwind the frames. We build the standard library with -Cpanic=unwind so the eh_frame is available, but past begin_panic nothing is available anymore.

Adding -Cforce-frame-pointers=yes allows termination (still no traces beyond begin_panic, and adding -Cforce-unwind-tables=yes produces a property stack trace.

@cuviper
Copy link
Member Author

cuviper commented Apr 10, 2024

#97235 adds a lint (which isn't even executed with panic=abort), and should have no codegen changes.

OK, I don't know. The bisected nightly works fine when I build it myself, which implies that it's sensitive to the particular configuration, but that makes it hard to try and narrow this further. Or if the bug only arose there because of happenstance code changes, the root cause might not be in that range at all.

@saethlin
Copy link
Member

I can reproduce the infinite backtrace with 2020-03-01 and the target is gone in 2020-01-01 so I suspect that this bug has been in the target since we started distributing artifacts for it.

I cannot reproduce the infinite backtrace printing with -Zbuild-std. I suspect the problem has something to do with the fact that the standard library is precompiled with -Cpanic=unwind then we're linking to it with -Cpanic=abort, and I don't know how to imitate that with -Zbuild-std.

The behavior of the MIR inliner is sensitive to the stable crate hash, so if the behavior of the MIR inliner can impact whether the fatal optimization happens, it may not be possible to bisect this.

@saethlin
Copy link
Member

saethlin commented Apr 10, 2024

The current behaviour doesn't surprise me since we neither force frame pointers, nor force unwind tables. To generate the stack trace usually what happens is that eh_frame to unwind the frames is used if it exists, otherwise frame pointers are used to continue unwind the frames. We build the standard library with -Cpanic=unwind so the eh_frame is available, but past begin_panic nothing is available anymore.

To be clear, the behavior seems wrong to me even using -Zbuild-std:

ubuntu@ip-172-31-13-98:~/demo$ RUSTFLAGS=-Cpanic=abort RUST_BACKTRACE=full cargo +nightly r -Zbuild-std=std,panic_abort --target=aarch64-unknown-linux-gnu
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.04s
     Running `target/aarch64-unknown-linux-gnu/debug/demo`
thread 'main' panicked at src/main.rs:1:13:
explicit panic
stack backtrace:
   0:     0xaaaabefe12a8 - std::backtrace_rs::backtrace::libunwind::trace::h3eb3677c648e214e
                               at /home/ubuntu/.rustup/toolchains/nightly-aarch64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/../../backtrace/src/backtrace/libunwind.rs:105:5
   1:     0xaaaabefe12a8 - std::backtrace_rs::backtrace::trace_unsynchronized::h6b663bf10ab98a13
                               at /home/ubuntu/.rustup/toolchains/nightly-aarch64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
Aborted (core dumped)

Buggy in a different way, but still buggy, so maybe it's not worth splitting hairs.

@cuviper cuviper added the O-AArch64 Armv8-A or later processors in AArch64 mode label Apr 10, 2024
@Amanieu
Copy link
Member

Amanieu commented Apr 10, 2024

The current behaviour doesn't surprise me since we neither force frame pointers, nor force unwind tables. To generate the stack trace usually what happens is that eh_frame to unwind the frames is used if it exists, otherwise frame pointers are used to continue unwind the frames. We build the standard library with -Cpanic=unwind so the eh_frame is available, but past begin_panic nothing is available anymore.

Shouldn't the unwinder just give up if it reaches a frame with no unwinding information? This seems to be the issue here since the loop start when the unwinder reaches the first frame compiled with panic=abort.

This target uses libgcc as the unwinder, this might be worth looking more deeply into.

A truncated backtrace is normal if everything is compiled with panic=abort since we don't emit eh_frame by default so the unwinder doesn't know how to unwind the stack. What shouldn't be happening is that the unwinder ends up in an infinite loop since that indicates invalid unwind information.

Also note that the call into panic is a non-returning tail call. This has caused issues in the past on the ARM target (#69231) since LLVM would assume the LR value is dead and leave it dangling. If this happens to hold a function pointer to std::panicking::begin_panic then this could lead to the infinite loop we are seeing.

@workingjubilee
Copy link
Member

Huh? I thought AArch64 code was being compiled with frame pointers?

@apiraino
Copy link
Contributor

WG-prioritization assigning priority (Zulip discussion).

@rustbot label -I-prioritize +P-high

@rustbot rustbot added P-high High priority and removed I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Apr 11, 2024
@nbdd0121
Copy link
Contributor

Shouldn't the unwinder just give up if it reaches a frame with no unwinding information?

For backtracing, libgcc does have a mechanism to use a fallback mechanism if unwind info is not available. Although I just checked that it is implemented for PPC only, so this wasn't the case for aarch64.

This unfortunately makes me unable to explain why the looping behaviour disappears when we force frame pointer though.

Huh? I thought AArch64 code was being compiled with frame pointers?

Doesn't look like it. https://github.com/rust-lang/rust/blob/master/compiler/rustc_target/src/spec/targets/aarch64_unknown_linux_gnu.rs

Apparently we have frame pointers enabled for non-leaf functions on Apple platforms, but not Linux.

@Amanieu
Copy link
Member

Amanieu commented Apr 11, 2024

Another interesting data point. When building for aarch64-unknown-linux-musl, I get this output:

stack backtrace:
   0:           0x419218 - std::backtrace_rs::backtrace::libunwind::trace::h41b2803078d25074
                               at /rustc/c67326b063bd27ed04f306ba2e372cd92e0a8751/library/std/src/../../backtrace/src/backtrace/libunwind.rs:105:5
   1:           0x419218 - std::backtrace_rs::backtrace::trace_unsynchronized::hb38469496ae402aa
                               at /rustc/c67326b063bd27ed04f306ba2e372cd92e0a8751/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2:           0x419218 - std::sys_common::backtrace::_print_fmt::h0cf53d01d01865f9
                               at /rustc/c67326b063bd27ed04f306ba2e372cd92e0a8751/library/std/src/sys_common/backtrace.rs:68:5
   3:           0x419218 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h2016a34b1ca538fd
                               at /rustc/c67326b063bd27ed04f306ba2e372cd92e0a8751/library/std/src/sys_common/backtrace.rs:44:22
   4:           0x44476c - core::fmt::rt::Argument::fmt::h96d6bfc4fa1065a9
                               at /rustc/c67326b063bd27ed04f306ba2e372cd92e0a8751/library/core/src/fmt/rt.rs:142:9
   5:           0x44476c - core::fmt::write::h3a1e5be1e1cfe579
                               at /rustc/c67326b063bd27ed04f306ba2e372cd92e0a8751/library/core/src/fmt/mod.rs:1153:17
   6:           0x4173e8 - std::io::Write::write_fmt::hcc53d7d63a73e855
                               at /rustc/c67326b063bd27ed04f306ba2e372cd92e0a8751/library/std/src/io/mod.rs:1843:15
   7:           0x419024 - std::sys_common::backtrace::_print::hae3e089c5d3e39b2
                               at /rustc/c67326b063bd27ed04f306ba2e372cd92e0a8751/library/std/src/sys_common/backtrace.rs:47:5
   8:           0x419024 - std::sys_common::backtrace::print::h1182a252cb3edf9c
                               at /rustc/c67326b063bd27ed04f306ba2e372cd92e0a8751/library/std/src/sys_common/backtrace.rs:34:9
   9:           0x41a474 - std::panicking::default_hook::{{closure}}::he5c426f80ff73911
  10:           0x41a114 - std::panicking::default_hook::h5274ad032677b37f
                               at /rustc/c67326b063bd27ed04f306ba2e372cd92e0a8751/library/std/src/panicking.rs:292:9
  11:           0x41a990 - std::panicking::rust_panic_with_hook::h45e7e3752affffd6
                               at /rustc/c67326b063bd27ed04f306ba2e372cd92e0a8751/library/std/src/panicking.rs:779:13
  12:           0x4022c8 - std::panicking::begin_panic::{{closure}}::h275b05efe23611e5
  13:           0x430254 - _Unwind_Backtrace

Note that the last line is clearly incorrect: _Unwind_Backtrace does not call begin_panic.

@Amanieu
Copy link
Member

Amanieu commented Apr 11, 2024

More clues:

  • std::panicking::rust_panic_with_hook is built as part of the standard library with -Cpanic=unwind.
  • std::panicking::begin_panic::{{closure}} is generic and built as part of the main program with -Cpanic=abort. I checked the binary with objdump and it does not have any eh_frame data.

What should have happened is that we stop the backtrace at std::panicking::begin_panic::{{closure}} because it has no unwind info to continue further. But for some reason we attempt to unwind despite the lack of unwind info, which causes this problem. Is the backtrace unwinder automatically falling back to frame pointers?

@saethlin saethlin removed the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Apr 11, 2024
bors added a commit to rust-lang-ci/rust that referenced this issue Apr 14, 2024
…mulacrum

ci: test cargo on `aarch64-gnu`

Since `aarch64-unknown-linux-gnu` is a tier-1 target, we should also test cargo on it, especially since cargo's own CI doesn't cover this yet. This might have helped us discover rust-lang#123733 sooner, which is not a cargo problem but was uncovered by a new cargo test (which we'll have to skip for now). Everything else passes in my local run, so at least we'll have a guard against future regressions.
@eggyal
Copy link
Contributor

eggyal commented Apr 14, 2024

https://github.com/libunwind/libunwind/blob/05afdabf38d3fa461b7a9de80c64a6513a564d81/src/aarch64/Gstep.c#L635-L638

	  /* Procedure Call Standard for the ARM 64-bit Architecture (AArch64)
	   * specifies that the end of the frame record chain is indicated by
	   * the address zero in the address for the previous frame.
	   */

@Amanieu
Copy link
Member

Amanieu commented Apr 14, 2024

OK I figured out the root cause of the problem. It's because LLVM is clobbering the link register on the tail call:

0000000000006924 <_ZN3std9panicking11begin_panic28_$u7b$$u7b$closure$u7d$$u7d$17h3fd05c3e3e7919b0E>:
    6924:	d100c3ff 	sub	sp, sp, #0x30
    6928:	aa0003e8 	mov	x8, x0
    692c:	f940010a 	ldr	x10, [x8]
    6930:	f9400509 	ldr	x9, [x8, #8]
    6934:	f9000bea 	str	x10, [sp, #16]
    6938:	f9000fe9 	str	x9, [sp, #24]
    693c:	f9400bea 	ldr	x10, [sp, #16]
    6940:	f9400fe9 	ldr	x9, [sp, #24]
    6944:	910003e0 	mov	x0, sp
    6948:	f90003ea 	str	x10, [sp]
    694c:	f90007e9 	str	x9, [sp, #8]
    6950:	f9400903 	ldr	x3, [x8, #16]
    6954:	f00002a1 	adrp	x1, 5d000 <GCC_except_table144+0x159b4>
    6958:	91248021 	add	x1, x1, #0x920
    695c:	aa1f03e2 	mov	x2, xzr
    6960:	52800028 	mov	w8, #0x1                   	// #1
    6964:	12000104 	and	w4, w8, #0x1
    6968:	2a1f03e8 	mov	w8, wzr
    696c:	12000105 	and	w5, w8, #0x1
    6970:	940060d9 	bl	1ecd4 <_ZN3std9panicking20rust_panic_with_hook17h06b7ef7eb6d8944bE>

Note how the incoming LR value is not saved anywhere and is later clobbered by the BL instruction.

Also, my earlier statement was incorrect, _ZN3std9panicking11begin_panic28_$u7b$$u7b$closure$u7d$$u7d$17h3fd05c3e3e7919b0E does have unwinding information:

00000088 0000000000000010 00000000 CIE
  Version:               1
  Augmentation:          "zR"
  Code alignment factor: 1
  Data alignment factor: -4
  Return address column: 30
  Augmentation data:     1b
  DW_CFA_def_cfa: r31 (sp) ofs 0

0000009c 0000000000000010 00000018 FDE cie=00000088 pc=0000000000006924..0000000000006974
  DW_CFA_advance_loc: 4 to 0000000000006928
  DW_CFA_def_cfa_offset: 48

This unwind info is claiming that the return address is present in the link register (this is the default if not overridden), but this is clearly incorrect in this case since the link register has been clobbered.

So the unwinder is behaving correctly, it's just that LLVM is emitting incorrect unwind info (or that it should be preserving the caller's link register).

@Amanieu
Copy link
Member

Amanieu commented Apr 14, 2024

Here is the LLVM IR for this function. I think the root of the problem is that we are assigning a personality function to this function, which forces LLVM to emit DWARF frame metadata for this function. However this is a nounwind function that only calls other nounwind functions, so LLVM clobbering the link register is perfectly valid.

I see 2 ways of fixing this:

  • Figure out why rustc is adding a personality to this function and stop doing that.
  • Change LLVM so that that attaching a personality to a function implies the uwtable attribute.
; ModuleID = '<stdin>'
source_filename = "rust_out.7d368e479ffa2ca5-cgu.0"
target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
target triple = "aarch64-unknown-linux-musl"

%"std::panicking::begin_panic::Payload<&str>" = type { %"core::option::Option<&str>" }
%"core::option::Option<&str>" = type { ptr, [1 x i64] }

@vtable.1 = external hidden unnamed_addr constant <{ ptr, [16 x i8], ptr, ptr }>, align 8

; Function Attrs: inlinehint noreturn nounwind
define hidden fastcc void @"_ZN3std9panicking11begin_panic28_$u7b$$u7b$closure$u7d$$u7d$17h275b05efe23611e5E"(ptr noalias nocapture noundef readonly align 8 dereferenceable(24) %_1) unnamed_addr #0 personality ptr @rust_eh_personality {
start:
  %_4 = alloca %"std::panicking::begin_panic::Payload<&str>", align 8
  call void @llvm.lifetime.start.p0(i64 16, ptr nonnull %_4)
  %inner.0 = load ptr, ptr %_1, align 8, !nonnull !3, !align !4, !noundef !3
  %0 = getelementptr inbounds i8, ptr %_1, i64 8
  %inner.1 = load i64, ptr %0, align 8, !noundef !3
  store ptr %inner.0, ptr %_4, align 8
  %1 = getelementptr inbounds i8, ptr %_4, i64 8
  store i64 %inner.1, ptr %1, align 8
  %2 = getelementptr inbounds i8, ptr %_1, i64 16
  %_6 = load ptr, ptr %2, align 8, !nonnull !3, !align !5, !noundef !3
  call void @_ZN3std9panicking20rust_panic_with_hook17h45e7e3752affffd6E(ptr noundef nonnull align 1 %_4, ptr noalias noundef nonnull readonly align 8 dereferenceable(24) @vtable.1, ptr noalias noundef readonly align 8 dereferenceable_or_null(48) null, ptr noalias noundef nonnull readonly align 8 dereferenceable(24) %_6, i1 noundef zeroext true, i1 noundef zeroext false) #4
  unreachable
}

; Function Attrs: nounwind
declare noundef i32 @rust_eh_personality(i32 noundef, i32 noundef, i64 noundef, ptr noundef, ptr noundef) unnamed_addr #1

; Function Attrs: noreturn nounwind
declare void @_ZN3std9panicking20rust_panic_with_hook17h45e7e3752affffd6E(ptr noundef nonnull align 1, ptr noalias noundef readonly align 8 dereferenceable(24), ptr noalias noundef readonly align 8 dereferenceable_or_null(48), ptr noalias noundef readonly align 8 dereferenceable(24), i1 noundef zeroext, i1 noundef zeroext) unnamed_addr #2

; Function Attrs: nocallback nofree nosync nounwind willreturn memory(argmem: readwrite)
declare void @llvm.lifetime.start.p0(i64 immarg, ptr nocapture) #3

attributes #0 = { inlinehint noreturn nounwind "probe-stack"="inline-asm" "target-cpu"="generic" "target-features"="+v8a" }
attributes #1 = { nounwind "probe-stack"="inline-asm" "target-cpu"="generic" "target-features"="+v8a" }
attributes #2 = { noreturn nounwind "probe-stack"="inline-asm" "target-cpu"="generic" "target-features"="+v8a" }
attributes #3 = { nocallback nofree nosync nounwind willreturn memory(argmem: readwrite) }
attributes #4 = { noreturn nounwind }

!llvm.module.flags = !{!0, !1}
!llvm.ident = !{!2}

!0 = !{i32 8, !"PIC Level", i32 2}
!1 = !{i32 7, !"PIE Level", i32 2}
!2 = !{!"rustc version 1.78.0-nightly (c67326b06 2024-03-15)"}
!3 = !{}
!4 = !{i64 1}
!5 = !{i64 8}

@nbdd0121
Copy link
Contributor

nbdd0121 commented Apr 14, 2024

I suppose we can also just emit the uwtable attribute whenever we attach a personality function?

bors added a commit to rust-lang-ci/rust that referenced this issue Apr 15, 2024
…mulacrum

ci: test cargo on `aarch64-gnu`

Since `aarch64-unknown-linux-gnu` is a tier-1 target, we should also test cargo on it, especially since cargo's own CI doesn't cover this yet. This might have helped us discover rust-lang#123733 sooner, which is not a cargo problem but was uncovered by a new cargo test (which we'll have to skip for now). Everything else passes in my local run, so at least we'll have a guard against future regressions.
@workingjubilee
Copy link
Member

Huh? I thought AArch64 code was being compiled with frame pointers?

Doesn't look like it. https://github.com/rust-lang/rust/blob/master/compiler/rustc_target/src/spec/targets/aarch64_unknown_linux_gnu.rs

Apparently we have frame pointers enabled for non-leaf functions on Apple platforms, but not Linux.

Mmm. We probably ought to be adopting frame pointers across the board for AAPCS64 reasons. It's technically acceptable for a platform to deviate, but it should be explicit.

@Amanieu Amanieu added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. and removed T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Apr 17, 2024
@davidlt
Copy link

davidlt commented May 14, 2024

I think this also affects riscv64 in Fedora/RISCV. Looking at the builds logs it seems to be panic_abort_doc_tests. It just runs forever slowly consuming the memory until it gets killed.

@decathorpe
Copy link

Huh? I thought AArch64 code was being compiled with frame pointers?

To clarify, Fedora does force-enable frame pointers for x86_64 and aarch64, so this might be affecting Rust builds on Fedora.

@cuviper
Copy link
Member Author

cuviper commented May 15, 2024

I reproduced the problem using nightlies for this report, so I don't think Fedora's configuration matters.

@Reflexe
Copy link

Reflexe commented May 31, 2024

@Amanieu I reproduced it. In my case, I saw that the LLVM IR output was:

; std::panicking::begin_panic::{{closure}}
; Function Attrs: inlinehint noreturn nounwind
define internal void @"_ZN3std9panicking11begin_panic28_$u7b$$u7b$closure$u7d$$u7d$17h4691076880adeaeeE"(ptr align 8 %_1) unnamed_addr #5 personality ptr @rust_eh_personality {
start: 
  %0 = alloca [16 x i8], align 8
  %_7 = alloca [16 x i8], align 8
  %_4 = alloca [16 x i8], align 8
  %inner.0 = load ptr, ptr %_1, align 8
  %1 = getelementptr inbounds i8, ptr %_1, i64 8
  %inner.1 = load i64, ptr %1, align 8
  store ptr %inner.0, ptr %_7, align 8            
  %2 = getelementptr inbounds i8, ptr %_7, i64 8
  store i64 %inner.1, ptr %2, align 8
  %3 = load ptr, ptr %_7, align 8
  %4 = getelementptr inbounds i8, ptr %_7, i64 8
  %5 = load i64, ptr %4, align 8
  store ptr %3, ptr %_4, align 8
  %6 = getelementptr inbounds i8, ptr %_4, i64 8
  store i64 %5, ptr %6, align 8
  %7 = getelementptr inbounds i8, ptr %_1, i64 16
  %_6 = load ptr, ptr %7, align 8
; call std::panicking::rust_panic_with_hook
  call void @_ZN3std9panicking20rust_panic_with_hook17h9e3397264ef8c828E(ptr align 1 %_4, ptr align 8 @vtable.1, ptr align 8 null, ptr align 8 %_6, i1 zeroext true, i1 zeroext false) #13
  unreachable
  
bb1:                                              ; No predecessors!
  %8 = load ptr, ptr %0, align 8
  %9 = getelementptr inbounds i8, ptr %0, i64 8
  %10 = load i32, ptr %9, align 8
  %11 = insertvalue { ptr, i32 } poison, ptr %8, 0
  %12 = insertvalue { ptr, i32 } %11, i32 %10, 1
  resume { ptr, i32 } %12
}

Due to the resume (which for some reason does not exist in your code), we need to add a personality to this function. Otherwise, we would get an error from LLVM.

In any case, if I understand correctlly, the condition that sets the personality function in this case is: https://github.com/rust-lang/rust/blob/master/compiler/rustc_codegen_ssa/src/mir/mod.rs#L182 which I don't fully understand TBH.

Also for everyone without access to a aarch64 machine and want to reproduce (paths are maybe a bit different):

echo 'fn main() { panic!() }' | rustc - -Cpanic=abort --target aarch64-unknown-linux-gnu  -C "linker=aarch64-linux-gnu-gcc" && qemu-aarch64  -E LD_LIBRARY_PATH="/usr/aarch64-linux-gnu/lib64" -E RUST_BACKTRACE=full -L /usr/aarch64-linux-gnu/   ./rust_out

@Amanieu
Copy link
Member

Amanieu commented May 31, 2024

@Reflexe Essentially, whenever we add a personality function we must ensure that we also add the uwtable attribute to the function, otherwise LLVM generates invalid unwinding information.

Reflexe added a commit to Reflexe/rust that referenced this issue Jun 1, 2024
Adding a personality function forces LLVM to generate unwinding info that might be incorrect.
To solve it, always apply the UWTable attribute when setting a personality function.

Fixes rust-lang#123733
@Reflexe
Copy link

Reflexe commented Jun 1, 2024

@Amanieu Thanks! this was pretty easy to implement, I created #125844.

There are two issues still

  1. I don't fully understand why adding uwtable has any effect here: LLVM already generates an unwinding table entry for this function, is it correct to say that it generates an incorrect unwinding table entry?
  2. The issue you saw at the beginning still exists:
> echo 'fn main() { panic!() }' | rustc - -Cpanic=abort --target aarch64-unknown-linux-gnu  -C "linker=aarch64-linux-gnu-gcc" && qemu-aarch64   -E LD_LIBRARY_PATH="/usr/aarch64-linux-gnu/lib64" -E RUST_BACKTRACE=full -L /usr/aarch64-linux-gnu/   ./rust_out
thread 'main' panicked at <anon>:1:13:
explicit panic
stack backtrace:
   0:     0x7cd4114333c8 - trace
                               at /home/reflexe/rust/rust/library/std/src/../../backtrace/src/backtrace/libunwind.rs:116:5
   1:     0x7cd4114333c8 - trace_unsynchronized<std::sys_common::backtrace::_print_fmt::{closure_env#1}>
                               at /home/reflexe/rust/rust/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2:     0x7cd4114333c8 - _print_fmt
                               at /home/reflexe/rust/rust/library/std/src/sys_common/backtrace.rs:68:5
   3:     0x7cd4114333c8 - fmt
                               at /home/reflexe/rust/rust/library/std/src/sys_common/backtrace.rs:44:22
   4:     0x7cd411448b80 - fmt
                               at /home/reflexe/rust/rust/library/core/src/fmt/rt.rs:165:63
   5:     0x7cd411448b80 - write
                               at /home/reflexe/rust/rust/library/core/src/fmt/mod.rs:1168:21
   6:     0x7cd41141a65c - write_fmt<std::sys::pal::unix::stdio::Stderr>
                               at /home/reflexe/rust/rust/library/std/src/io/mod.rs:1835:15
   7:     0x7cd41143324c - _print
                               at /home/reflexe/rust/rust/library/std/src/sys_common/backtrace.rs:47:5
   8:     0x7cd41143324c - print
                               at /home/reflexe/rust/rust/library/std/src/sys_common/backtrace.rs:34:9
   9:     0x7cd41142e548 - {closure#1}
  10:     0x7cd41142e22c - default_hook
                               at /home/reflexe/rust/rust/library/std/src/panicking.rs:298:9
  11:     0x7cd41142e964 - rust_panic_with_hook
                               at /home/reflexe/rust/rust/library/std/src/panicking.rs:795:13
  12:     0x7cd4114123f0 - std::panicking::begin_panic::{{closure}}::h4691076880adeaee
  13:     0x7cd4114122f0 - std::sys_common::backtrace::__rust_end_short_backtrace::h8c00d8b5eac77590

The last entry is incorrect here. Not sure if this is related or we should open another issue for this one.
Thanks for the help.

Reflexe added a commit to Reflexe/rust that referenced this issue Jun 1, 2024
Adding a personality function forces LLVM to generate unwinding info that might be incorrect.
To solve it, always apply the UWTable attribute when setting a personality function.

Fixes rust-lang#123733
Reflexe added a commit to Reflexe/rust that referenced this issue Jun 1, 2024
Adding a personality function forces LLVM to generate unwinding info that might be incorrect.
To solve it, always apply the UWTable attribute when setting a personality function.

Fixes rust-lang#123733
@Amanieu
Copy link
Member

Amanieu commented Jun 10, 2024

  1. I don't fully understand why adding uwtable has any effect here: LLVM already generates an unwinding table entry for this function, is it correct to say that it generates an incorrect unwinding table entry?

Yes, it generates an incorrect entry if the uwtable attribute is absent. Specifically, it fails to preserve some registers while the unwinding tables indicate that they are actually preserved.

  1. The issue you saw at the beginning still exists:

No, the last entry here is correct: the backtrace properly ends at __rust_end_short_backtrace which comes from user code and therefore has no unwinding metadata.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-runtime Area: std's runtime and "pre-main" init for handling backtraces, unwinds, stack overflows C-bug Category: This is a bug. O-AArch64 Armv8-A or later processors in AArch64 mode P-high High priority regression-from-stable-to-stable Performance or correctness regression from one stable version to another. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.