official nextest binaries could include symbols #1345

Closed
jclulow opened this issue Mar 1, 2024 · 3 comments

jclulow commented Mar 1, 2024

I was debugging a stuck CI worker recently, and while looking at the cargo-nextest process (which was indeed not at fault!) I discovered that there were no symbols for Rust program text in the binary:

root@ip-10-150-1-69:~# pstack 3879 | demangle
3879:   /home/build/.cargo/bin/cargo-nextest nextest run --profile ci --locked
--------------------- thread# 1 / lwp# 1 ---------------------
 fffffc7fef173777 lwp_park (0, 0, 0)
 fffffc7fef16cb85 cond_wait_queue (2c13c10, 2ddf970, 0) + 55
 fffffc7fef16d1ea __cond_wait (2c13c10, 2ddf970) + ba
 fffffc7fef16d22e cond_wait (2c13c10, 2ddf970) + 2e
 fffffc7fef16d275 pthread_cond_wait (2c13c10, 2ddf970) + 15
 0000000000a96108 ???????? ()
 0000000000a95f17 ???????? ()
 0000000000a9682c ???????? ()
 00000000004c4f2c ???????? ()
 00000000004b988d ???????? ()
 000000000049be4e ???????? ()
 000000000048e9d4 ???????? ()
 000000000048d80a ???????? ()
 0000000000439000 ???????? ()
 0000000000434916 ???????? ()
 0000000000434be5 ???????? ()
 0000000000430af3 ???????? ()
 0000000000430a58 ???????? ()
-------- thread# 16 / lwp# 16 [tokio-runtime-worker] ---------
 fffffc7fef173777 lwp_park (0, 0, 0)
 fffffc7fef16cb85 cond_wait_queue (2dde070, 2d789a0, 0) + 55
 fffffc7fef16d1ea __cond_wait (2dde070, 2d789a0) + ba
 fffffc7fef16d22e cond_wait (2dde070, 2d789a0) + 2e
 fffffc7fef16d275 pthread_cond_wait (2dde070, 2d789a0) + 15
 0000000000a96108 ???????? ()
 0000000000a9df67 ???????? ()
 0000000000a9cb04 ???????? ()
 0000000000ab296a ???????? ()
 0000000000a8ba54 ???????? ()
 0000000000a8b712 ???????? ()
 0000000000a5e259 ???????? ()
 fffffc7fef1733e7 _thrp_setup (fffffc7fee8a2240) + 77
 fffffc7fef173730 _lwp_start ()
-------------- thread# 3 / lwp# 3 [umem_update] --------------
 fffffc7fef173777 lwp_park (0, fffffc7fee5fee60, 0)
 fffffc7fef16cb85 cond_wait_queue (fffffc7fef2dd710, fffffc7fef2dd6f0, fffffc7fee5fee60) + 55
 fffffc7fef16cfb5 cond_wait_common (fffffc7fef2dd710, fffffc7fef2dd6f0, fffffc7fee5fee60) + 1b5
 fffffc7fef16d2f9 __cond_timedwait (fffffc7fef2dd710, fffffc7fef2dd6f0, fffffc7fee5fef50) + 69
 fffffc7fef16d3cc cond_timedwait (fffffc7fef2dd710, fffffc7fef2dd6f0, fffffc7fee5fef50) + 3c
 fffffc7fef299c06 umem_update_thread (0) + 1d6
 fffffc7fef1733e7 _thrp_setup (fffffc7fee8a0a40) + 77
 fffffc7fef173730 _lwp_start ()
-------- thread# 19 / lwp# 19 [tokio-runtime-worker] ---------
 fffffc7fef173777 lwp_park (0, 0, 0)
 fffffc7fef16cb85 cond_wait_queue (2de2f50, 2ddffa0, 0) + 55
 fffffc7fef16d1ea __cond_wait (2de2f50, 2ddffa0) + ba
 fffffc7fef16d22e cond_wait (2de2f50, 2ddffa0) + 2e
 fffffc7fef16d275 pthread_cond_wait (2de2f50, 2ddffa0) + 15
 0000000000a96108 ???????? ()
 0000000000a9df67 ???????? ()
 0000000000a9cb04 ???????? ()
 0000000000ab296a ???????? ()
 0000000000a8ba54 ???????? ()
 0000000000a8b712 ???????? ()
 0000000000a5e259 ???????? ()
 fffffc7fef1733e7 _thrp_setup (fffffc7fee8a1a40) + 77
 fffffc7fef173730 _lwp_start ()
-------- thread# 14 / lwp# 14 [tokio-runtime-worker] ---------
 fffffc7fef173777 lwp_park (0, 0, 0)
 fffffc7fef16cb85 cond_wait_queue (2b9b310, 2ba89a0, 0) + 55
 fffffc7fef16d1ea __cond_wait (2b9b310, 2ba89a0) + ba
 fffffc7fef16d22e cond_wait (2b9b310, 2ba89a0) + 2e
 fffffc7fef16d275 pthread_cond_wait (2b9b310, 2ba89a0) + 15
 0000000000a96108 ???????? ()
 0000000000a9df67 ???????? ()
 0000000000a9cb04 ???????? ()
 0000000000ab296a ???????? ()
 0000000000a8ba54 ???????? ()
 0000000000a8b712 ???????? ()
 0000000000a5e259 ???????? ()
 fffffc7fef1733e7 _thrp_setup (fffffc7fee8a1240) + 77
 fffffc7fef173730 _lwp_start ()
-------- thread# 17 / lwp# 17 [tokio-runtime-worker] ---------
 fffffc7fef173777 lwp_park (0, 0, 0)
 fffffc7fef16cb85 cond_wait_queue (2ced830, 2dd9970, 0) + 55
 fffffc7fef16d1ea __cond_wait (2ced830, 2dd9970) + ba
 fffffc7fef16d22e cond_wait (2ced830, 2dd9970) + 2e
 fffffc7fef16d275 pthread_cond_wait (2ced830, 2dd9970) + 15
 0000000000a96108 ???????? ()
 0000000000a9df67 ???????? ()
 0000000000a9cb04 ???????? ()
 0000000000ab296a ???????? ()
 0000000000a8ba54 ???????? ()
 0000000000a8b712 ???????? ()
 0000000000a5e259 ???????? ()
 fffffc7fef1733e7 _thrp_setup (fffffc7fee8a0240) + 77
 fffffc7fef173730 _lwp_start ()
-------- thread# 13 / lwp# 13 [tokio-runtime-worker] ---------
 fffffc7fef17ab2a write    (2, 11a8c10, 85)
 0000000000a506db ???????? ()
 0000000000814135 ???????? ()
 0000000000807e57 ???????? ()
 00000000004f4433 ???????? ()
 00000000004f23d9 ???????? ()
 00000000004f0919 ???????? ()
 0000000000a9ea9b ???????? ()
 0000000000a9c2c6 ???????? ()
 0000000000ab296a ???????? ()
 0000000000a8ba54 ???????? ()
 0000000000a8b712 ???????? ()
 0000000000a5e259 ???????? ()
 fffffc7fef1733e7 _thrp_setup (fffffc7fee8a3240) + 77
 fffffc7fef173730 _lwp_start ()
-------- thread# 20 / lwp# 20 [tokio-runtime-worker] ---------
 fffffc7fef173777 lwp_park (0, 0, 0)
 fffffc7fef16cb85 cond_wait_queue (16450d0, 28b21c0, 0) + 55
 fffffc7fef16d1ea __cond_wait (16450d0, 28b21c0) + ba
 fffffc7fef16d22e cond_wait (16450d0, 28b21c0) + 2e
 fffffc7fef16d275 pthread_cond_wait (16450d0, 28b21c0) + 15
 0000000000a96108 ???????? ()
 0000000000a9df67 ???????? ()
 0000000000a9cb04 ???????? ()
 0000000000ab296a ???????? ()
 0000000000a8ba54 ???????? ()
 0000000000a8b712 ???????? ()
 0000000000a5e259 ???????? ()
 fffffc7fef1733e7 _thrp_setup (fffffc7fee8a2a40) + 77
 fffffc7fef173730 _lwp_start ()
-------- thread# 15 / lwp# 15 [tokio-runtime-worker] ---------
 fffffc7fef173777 lwp_park (0, 0, 0)
 fffffc7fef16cb85 cond_wait_queue (2de2db0, 2cf4130, 0) + 55
 fffffc7fef16d1ea __cond_wait (2de2db0, 2cf4130) + ba
 fffffc7fef16d22e cond_wait (2de2db0, 2cf4130) + 2e
 fffffc7fef16d275 pthread_cond_wait (2de2db0, 2cf4130) + 15
 0000000000a96108 ???????? ()
 0000000000a9df67 ???????? ()
 0000000000a9cb04 ???????? ()
 0000000000ab296a ???????? ()
 0000000000a8ba54 ???????? ()
 0000000000a8b712 ???????? ()
 0000000000a5e259 ???????? ()
 fffffc7fef1733e7 _thrp_setup (fffffc7fee8a4240) + 77
 fffffc7fef173730 _lwp_start ()
-------- thread# 18 / lwp# 18 [tokio-runtime-worker] ---------
 fffffc7fef173777 lwp_park (0, 0, 0)
 fffffc7fef16cb85 cond_wait_queue (1149670, 114c850, 0) + 55
 fffffc7fef16d1ea __cond_wait (1149670, 114c850) + ba
 fffffc7fef16d22e cond_wait (1149670, 114c850) + 2e
 fffffc7fef16d275 pthread_cond_wait (1149670, 114c850) + 15
 0000000000a96108 ???????? ()
 0000000000a9df67 ???????? ()
 0000000000a9cb04 ???????? ()
 0000000000ab296a ???????? ()
 0000000000a8ba54 ???????? ()
 0000000000a8b712 ???????? ()
 0000000000a5e259 ???????? ()
 fffffc7fef1733e7 _thrp_setup (fffffc7fee8a3a40) + 77
 fffffc7fef173730 _lwp_start ()
-------------------- thread# 21 / lwp# 21 --------------------
 0000000000a5e230 ????????(), exit value = 0x0000000000000000
        ** zombie (exited, not detached, not yet joined) **

The binary appears to have been stripped not just of debuginfo but also of all symbols:

root@ip-10-150-1-69:~# file /home/build/.cargo/bin/cargo-nextest
/home/build/.cargo/bin/cargo-nextest:   ELF 64-bit LSB executable AMD64 Version 1, dynamically linked, stripped
root@ip-10-150-1-69:~# ls -lh /home/build/.cargo/bin/cargo-nextest
-rwxr-xr-x   1 build    build      10.7M Jan  9 20:35 /home/build/.cargo/bin/cargo-nextest
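
The "stripped" verdict above can be turned into a scripted check, for example before publishing a release binary. The sketch below is only an illustration: `is_stripped` is a hypothetical helper name, and it assumes file(1) describes the binary with either "stripped" or "not stripped", as in the output above.

```shell
# Hypothetical helper: classify the description printed by file(1).
# Returns 0 (true) when the description says the binary is stripped.
is_stripped() {
    case "$1" in
        # "not stripped" also contains "stripped", so test for it first.
        *"not stripped"*) return 1 ;;
        *stripped*)       return 0 ;;
        *)                return 1 ;;
    esac
}

desc='/home/build/.cargo/bin/cargo-nextest:   ELF 64-bit LSB executable AMD64 Version 1, dynamically linked, stripped'
if is_stripped "$desc"; then
    echo "stripped: debuggers will not resolve Rust frames"
fi
```

Against a real file this could be called as `is_stripped "$(file ~/.cargo/bin/cargo-nextest)"`.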

We download this binary using:

#...
NEXTEST_VERSION='0.9.67'
#...
curl -sSfL --retry 10 https://get.nexte.st/"$NEXTEST_VERSION"/"$1" | gunzip | tar -xvf - -C ~/.cargo/bin 

It would be awesome if we could at least leave enough symbols in there so that debuggers can get stack traces, even if we don't end up with the full DWARF. Thanks!

@sunshowers
Member

Looks like the issue was that the builder I'm using runs strip on the final binary. This is fixed with da972f9, part of cargo-nextest 0.9.68-b.2.

With the binary obtained by:

curl -LsSf https://get-preview.nexte.st/0.9.68-b.2/illumos | gunzip | tar xf - -C ${CARGO_HOME:-~/.cargo}/bin

I ran cargo nextest run, then printed out the stacks:

% pstack 13138     
13138:   /home/rain/.cargo/bin/cargo-nextest nextest run
--------------------- thread# 1 / lwp# 1 ---------------------
 fffffc7feee0ad2a waitid   (0, 3354, fffffc7fffdf67b0, 81)
 000000000068b625 _ZN4duct11HandleInner4wait17h5f5488f8d98e320aE () + 645
 000000000068b39f _ZN4duct11HandleInner4wait17h5f5488f8d98e320aE () + 3bf
 00000000006892b7 _ZN4duct10Expression3run17h81ae954e1a0ba6f3E () + 6f7
 00000000004a1b13 _ZN13cargo_nextest8dispatch7BaseApp17build_binary_list17h14186d9ea63938cdE () + 2763
 00000000004938dd _ZN13cargo_nextest8dispatch3App8exec_run17h8c0142218a61544dE () + 37d
 0000000000489b99 _ZN13cargo_nextest8dispatch7AppOpts4exec17h69cbf5f6fc8197b1E () + 8f9
 000000000048858e _ZN13cargo_nextest8dispatch15CargoNextestApp4exec17h21609dfe47db9fd9E () + 16e
 0000000000437201 _ZN13cargo_nextest4main17h395d283c29841421E () + 801
 00000000004324b6 _ZN3std10sys_common9backtrace28__rust_begin_short_backtrace17h23c403888486bcb0E () + 6
 00000000004377ae main () + 2ce
 000000000042e803 _start_crt () + 83
 000000000042e768 _start () + 18
--------------------- thread# 4 / lwp# 4 ---------------------
 fffffc7feee0a65a read     (3, 3d7ff6b, 2000)
 000000000068ecae _ZN3std10sys_common9backtrace28__rust_begin_short_backtrace17hb737e1594e8d40cdE () + 13e
 000000000068ea99 _ZN4core3ops8function6FnOnce40call_once$u7b$$u7b$vtable.shim$u7d$$u7d$17h2655f2e65b9604d1E () + 79
 0000000000a5bde9 _ZN3std3sys4unix6thread6Thread3new12thread_start17h1783cbcbbf061711E () + 29
 fffffc7feee03607 _thrp_setup (fffffc7feec90240) + 77
 fffffc7feee03950 _lwp_start ()
-------------- thread# 3 / lwp# 3 [umem_update] --------------
 fffffc7feee03997 lwp_park (0, fffffc7fee9d1e60, 0)
 fffffc7feedfcda5 cond_wait_queue (fffffc7fef20d710, fffffc7fef20d6f0, fffffc7fee9d1e60) + 55
 fffffc7feedfd1d5 cond_wait_common (fffffc7fef20d710, fffffc7fef20d6f0, fffffc7fee9d1e60) + 1b5
 fffffc7feedfd519 __cond_timedwait (fffffc7fef20d710, fffffc7fef20d6f0, fffffc7fee9d1f50) + 69
 fffffc7feedfd5ec cond_timedwait (fffffc7fef20d710, fffffc7fef20d6f0, fffffc7fee9d1f50) + 3c
 fffffc7fef1c9c06 umem_update_thread (0) + 1d6
 fffffc7feee03607 _thrp_setup (fffffc7feec90a40) + 77
 fffffc7feee03950 _lwp_start ()
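
One rough way to compare the two dumps is to count the frames that pstack could not resolve. This is a minimal sketch, assuming unresolved frames are printed as `????????` as in the first dump; `count_unresolved` is a hypothetical helper that reads a dump on standard input.

```shell
# Hypothetical helper: count stack frames without a symbol name.
# grep -c exits non-zero when nothing matches, so "|| true" keeps a
# zero count from being treated as a failure.
count_unresolved() {
    grep -c -F '????????' || true
}

# Three frames copied from the first dump: one resolved, two unresolved.
sample=' fffffc7fef16d275 pthread_cond_wait (2c13c10, 2ddf970) + 15
 0000000000a96108 ???????? ()
 0000000000a95f17 ???????? ()'
printf '%s\n' "$sample" | count_unresolved   # prints 2
```

On a live process this would be `pstack <pid> | count_unresolved`; a non-zero count for frames inside the binary's own text suggests the symbol table is missing.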

Going to get cargo-nextest 0.9.68 with this fix out in the next few days -- let's figure out the rest in the omicron issue.

@taiki-e
Contributor

taiki-e commented Mar 5, 2024

> This is fixed with da972f9

upload-rust-binary-action now aligns its default strip behavior with the default in Cargo 1.77+ (strip=debuginfo) (taiki-e/upload-rust-binary-action#66), so I think this workaround is no longer necessary.

@sunshowers
Member

Awesome, thanks as always @taiki-e!

sunshowers added a commit that referenced this issue Mar 5, 2024
As mentioned in
#1345 (comment), this
workaround is no longer required with the latest version of
upload-rust-binary-action.
sunshowers added a commit that referenced this issue Mar 16, 2024
Restore behavior changed in 4e259a7, because
it turns out that `strip = "debuginfo"` still strips too much.

This was observed with the cargo-nextest 0.9.68-rc.1 binary on illumos, where
`pstack` showed lots of ??? marks.

Part of #1345.
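
For a local build, stripping can be switched off entirely through Cargo's profile settings. The sketch below assumes a plain `cargo build --release` run inside a Cargo workspace and uses Cargo's environment override for `profile.release.strip`. It is not the project's actual release workflow, and `build_unstripped` is a hypothetical wrapper name.

```shell
# Hypothetical wrapper: build a release binary with stripping disabled.
# CARGO_PROFILE_RELEASE_STRIP overrides profile.release.strip for this
# one invocation; "none" leaves the symbol table (and any debuginfo the
# build produced) in the final binary.
build_unstripped() {
    CARGO_PROFILE_RELEASE_STRIP=none cargo build --release "$@"
}

# Example (not run here): build_unstripped --locked
```

The other accepted values are "debuginfo" and "symbols"; the commit message above reports that "debuginfo" still removed too much for `pstack` on illumos.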