Rust 1.34 generates significantly less debug information for libstd functions vs. Rust 1.33 #60020

Closed
froydnj opened this issue Apr 16, 2019 · 23 comments · Fixed by #109808
Labels
A-debuginfo Area: Debugging information in compiled programs (DWARF, PDB, etc.) P-high High priority regression-from-stable-to-stable Performance or correctness regression from one stable version to another. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@froydnj
Contributor

froydnj commented Apr 16, 2019

A Firefox developer filed a bug report about Rust symbols not being correctly represented in crash reports:

0 	XUL 	GeckoCrash 	toolkit/xre/nsAppRunner.cpp:5093 	context
1 	XUL 	gkrust_shared::panic_hook 	toolkit/library/rust/shared/lib.rs:240 	frame_pointer
2 	XUL 	core::ops::function::Fn::call 	src/libcore/ops/function.rs:69 	cfi
3 	XUL 	rust_panic_with_hook 	src/libstd/panicking.rs:482 	cfi
4 	XUL 	continue_panic_fmt 	src/libstd/panicking.rs:385 	cfi
5 	XUL 	rust_begin_panic 	src/libstd/panicking.rs:312 	cfi
6 	XUL 	panic_fmt 	src/libcore/panicking.rs:85 	cfi
7 	XUL 	panic 	src/libcore/panicking.rs:49 	cfi

The expectation was that, e.g. rust_panic_with_hook would have been std::panicking::rust_panic_with_hook.

We had recently upgraded to Rust 1.34, which led to comparing object files before and after the upgrade. I compared Linux x86-64 binaries; I don't see why the analysis doesn't apply to OS X's Mach-O files, but it's possible the results are different there. (The above crash is from OS X, and we have crashes using Rust 1.33 that do display std::panicking::rust_panic_with_hook and similar.) The ELF symbol table in both cases lists rust_panic_with_hook as _ZN3std9panicking20rust_panic_with_hook$UNIQUE_ID, so that wasn't the problem.

We then looked at the debug information. Rust 1.33 generated, according to readelf --debug-dump=info:

 <3><2155a724>: Abbrev Number: 301 (DW_TAG_subprogram)
    <2155a726>   DW_AT_low_pc      : 0x5738510
    <2155a72a>   DW_AT_high_pc     : 0x6b2
    <2155a72e>   DW_AT_frame_base  : 1 byte block: 54 	(DW_OP_reg4 (esp))
    <2155a730>   DW_AT_linkage_name: (indirect string, offset: 0x9d7fc09): _ZN3std9panicking20rust_panic_with_hook17he447c38467745511E
    <2155a734>   DW_AT_name        : (indirect string, offset: 0x9d7fc45): rust_panic_with_hook
    <2155a738>   DW_AT_decl_file   : 11
    <2155a739>   DW_AT_decl_line   : 447
    <2155a73b>   DW_AT_external    : 1
    <2155a73b>   DW_AT_noreturn    : 1

Notice the existence of both DW_AT_name and DW_AT_linkage_name. Rust 1.34, in contrast, generated:

 <1><216d7afa>: Abbrev Number: 293 (DW_TAG_subprogram)
    <216d7afc>   DW_AT_low_pc      : 0x5734990
    <216d7b00>   DW_AT_high_pc     : 0x6ae
    <216d7b04>   DW_AT_name        : (indirect string, offset: 0x9db1282): rust_panic_with_hook

which drops the DW_AT_linkage_name and is also significantly less informative than its predecessor.
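To illustrate why the missing DW_AT_linkage_name matters to a symbolicator: the module path can be recovered from the mangled linkage name, but not from the bare DW_AT_name. The following is only a minimal sketch of decoding the legacy `_ZN...E` mangling shown above. The function name `demangle_legacy` is hypothetical, and the sketch ignores `$...$` escapes and other details that a real demangler handles.

```rust
/// Decode a legacy-mangled Rust symbol such as
/// `_ZN3std9panicking20rust_panic_with_hook17h<hash>E` into a `::`-separated path.
/// Simplified sketch: handles only plain length-prefixed identifiers.
fn demangle_legacy(sym: &str) -> Option<String> {
    let mut rest = sym.strip_prefix("_ZN")?;
    let mut parts: Vec<&str> = Vec::new();

    // Each component is "<decimal length><identifier>"; the list ends with 'E'.
    while !rest.starts_with('E') {
        let digits = rest.bytes().take_while(|b| b.is_ascii_digit()).count();
        if digits == 0 {
            return None; // malformed, or input ended before the closing 'E'
        }
        let len: usize = rest[..digits].parse().ok()?;
        let end = digits.checked_add(len)?;
        parts.push(rest.get(digits..end)?);
        rest = &rest[end..];
    }

    // The last component is usually a hash: 'h' followed by 16 hex digits. Drop it.
    let has_hash = parts.last().map_or(false, |p| {
        p.len() == 17 && p.starts_with('h') && p[1..].bytes().all(|b| b.is_ascii_hexdigit())
    });
    if has_hash {
        parts.pop();
    }

    if parts.is_empty() {
        None
    } else {
        Some(parts.join("::"))
    }
}
```

Applied to the linkage name in the Rust 1.33 entry, this yields `std::panicking::rust_panic_with_hook`. Given only `rust_panic_with_hook`, as in the Rust 1.34 entry, there is nothing to decode and the function returns `None`.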

I'm not familiar enough with rustc to know what might have caused this regression. One of my colleagues pointed out #58208, which changed how various bits of panic infrastructure are imported into libstd. It's not clear to me whether it's that specific change, or how the compiler internally describes crate:: symbols to LLVM, or something else entirely.

cc @glandium @michaelwoerister

@jonas-schievink jonas-schievink added A-debuginfo Area: Debugging information in compiled programs (DWARF, PDB, etc.) regression-from-stable-to-stable Performance or correctness regression from one stable version to another. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. I-nominated labels Apr 16, 2019
@pnkfelix
Member

Some first steps that would be good here:

  1. Make an isolated test case demonstrating the problem, preferably something that just needs to be compiled against rustc itself. (Or a whole crate to feed to cargo, if necessary...)
  2. Bisect to find which bors commit (or, failing that, which nightly) introduced the problem.

In any case, I'm not 100% sure what priority to assign to this. Obviously we want our debug info to be high quality. But the first step is to find out whether this change was deliberate or not, and if it was deliberate, what issue motivated the change.

@froydnj
Contributor Author

froydnj commented Apr 18, 2019

A reasonably small testcase is:

$ rustc --version
rustc 1.33.0 (2aa4c46cf 2019-02-28)
$ cat > testcase.rs <<EOF
pub fn frob(a: &[usize]) -> usize { a[0] }
EOF
$ rustc --crate-type=staticlib -o librust133.a testcase.rs
$ readelf --debug-dump=info librust133.a |grep -C 5 panic_bounds_check
readelf: Warning: unable to apply unsupported reloc type 17 to section .debug_info
 <4><1b7fc>: Abbrev Number: 0
 <3><1b7fd>: Abbrev Number: 44 (DW_TAG_subprogram)
    <1b7fe>   DW_AT_low_pc      : 0x0
    <1b806>   DW_AT_high_pc     : 0x83
    <1b80a>   DW_AT_frame_base  : 1 byte block: 57 	(DW_OP_reg7 (rsp))
    <1b80c>   DW_AT_linkage_name: (indirect string, offset: 0x17710): _ZN4core9panicking18panic_bounds_check17h34b80e64d41db052E
    <1b810>   DW_AT_name        : (indirect string, offset: 0x1774b): panic_bounds_check
    <1b814>   DW_AT_decl_file   : 41
    <1b815>   DW_AT_decl_line   : 55
    <1b816>   DW_AT_external    : 1
    <1b816>   DW_AT_noreturn    : 1
 <4><1b816>: Abbrev Number: 11 (DW_TAG_inlined_subroutine)
$

Using rustc 1.34, you get a very different result:

$ rustc --version
rustc 1.34.0 (91856ed52 2019-04-10)
$ rustc --crate-type staticlib -o librust134.a testcase.rs
$ readelf --debug-dump=info librust134.a |grep -C 5 panic_bounds_check
readelf: Warning: unable to apply unsupported reloc type 17 to section .debug_info
    <7db7>   DW_AT_call_line   : 49
 <2><7db8>: Abbrev Number: 0
 <1><7db9>: Abbrev Number: 34 (DW_TAG_subprogram)
    <7dba>   DW_AT_low_pc      : 0x0
    <7dc2>   DW_AT_high_pc     : 0x77
    <7dc6>   DW_AT_name        : (indirect string, offset: 0x6d21): panic_bounds_check
 <2><7dca>: Abbrev Number: 33 (DW_TAG_inlined_subroutine)
    <7dcb>   DW_AT_abstract_origin: <0x626a>
    <7dcf>   DW_AT_low_pc      : 0x36
    <7dd7>   DW_AT_high_pc     : 0x36
    <7ddb>   DW_AT_call_file   : 44
$
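The comparison above can be automated, which is useful when scripting a bisection. The following is a rough sketch that scans the text output of `readelf --debug-dump=info` and reports whether the `DW_TAG_subprogram` entry for a given function carries a `DW_AT_linkage_name`. The names `Die`, `parse_dies`, and `has_linkage_name` are hypothetical, and the parser assumes the line layout shown in the dumps above rather than any documented readelf format.

```rust
/// Attributes of one debugging information entry (DIE), as printed by
/// `readelf --debug-dump=info`. Only the fields needed here are kept.
#[derive(Debug, Default)]
struct Die {
    tag: String,
    name: Option<String>,
    linkage_name: Option<String>,
}

/// Split the dump into DIEs. A line containing "Abbrev Number:" starts a new
/// entry; the attribute lines that follow belong to it.
fn parse_dies(dump: &str) -> Vec<Die> {
    let mut dies = Vec::new();
    let mut current: Option<Die> = None;
    for line in dump.lines().map(str::trim) {
        if line.contains("Abbrev Number:") {
            dies.extend(current.take());
            // The tag sits in trailing parentheses; "Abbrev Number: 0" has none.
            current = match (line.rfind('('), line.rfind(')')) {
                (Some(open), Some(close)) if open < close => Some(Die {
                    tag: line[open + 1..close].to_string(),
                    ..Default::default()
                }),
                _ => None,
            };
        } else if let Some(die) = current.as_mut() {
            // The attribute value is whatever follows the last ": ".
            let value = line.rsplit(": ").next().map(|v| v.trim().to_string());
            if line.contains("DW_AT_linkage_name") {
                die.linkage_name = value;
            } else if line.contains("DW_AT_name") {
                die.name = value;
            }
        }
    }
    dies.extend(current);
    dies
}

/// `Some(true)` if the subprogram named `function` has a linkage name,
/// `Some(false)` if it does not, `None` if no such subprogram was found.
fn has_linkage_name(dump: &str, function: &str) -> Option<bool> {
    parse_dies(dump)
        .iter()
        .find(|d| d.tag == "DW_TAG_subprogram" && d.name.as_deref() == Some(function))
        .map(|d| d.linkage_name.is_some())
}
```

Calling `has_linkage_name(dump, "panic_bounds_check")` on the Rust 1.33 output above gives `Some(true)`, and on the Rust 1.34 output it gives `Some(false)`.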

@nikomatsakis
Contributor

This is a fairly serious bug for the FF folk as it is breaking their crash report infrastructure. It would be good to do a bisection to verify if, indeed, #58208 is at fault (I don't know why that should be the case). Is there any connection between the use of Rust 2018 and debuginfo?

@Gankra
Contributor

Gankra commented Apr 23, 2019

I believe we don't currently use Rust 2018, as upgrading is low priority and requires fixes to our tooling.

@froydnj
Contributor Author

froydnj commented Apr 23, 2019

Some investigation showed that the dropping of module names is not limited to std symbols, which means that this is not (specifically) a Rust 2018 issue.

@froydnj
Contributor Author

froydnj commented Apr 26, 2019

I am in the process of bisecting this.

@froydnj
Contributor Author

froydnj commented Apr 27, 2019

OK, I have run into an issue that must involve some subtlety with rustc that I don't understand. I can use the above instructions with a released rustc to get an artifact I can analyze with readelf. But when I try to do the same with a freshly-built rustc from stage2, the resulting objects in the archive suddenly contain no debug information (even with -C debuginfo=2)...and significantly more .o files from core rust (e.g. compiler_builtins) than the released rustc.

My command for the stage2 rustc is ./build/x86_64-unknown-linux-gnu/stage2/bin/rustc --crate-type staticlib -o libdwarf-test.a ~/dwarf-test.rs -C debuginfo=2.

What is going on here? Why isn't the stage2 rustc acting like the rustc from releases? I don't think the number of object files is really a big issue, but the debug information bit makes it basically impossible to write a bisection script that is reasonably efficient.

@glandium
Contributor

Search for debuginfo in the rust section of config.toml.

@michaelwoerister
Member

michaelwoerister commented Apr 29, 2019

Try with the following settings in the [rust] section of config.toml:

codegen-units-std = 1
debuginfo = true
debuginfo-only-std = true
incremental = false

That should make sure that each crate of the standard library results in one object file, and that debuginfo is generated for the standard library (but not the compiler).

@froydnj
Contributor Author

froydnj commented Apr 29, 2019

debuginfo and debuginfo-only-std do seem to make the testcase work. They do not explain why -C debuginfo=2 doesn't produce any debuginfo, not even basic information about the function being compiled.

Anyway, several attempts at bisection later, I still haven't been able to reproduce the issue when compiling 1.34.0 on my machine and testing against that. I must be using the wrong flags somewhere along the way.

@pnkfelix
Member

@froydnj have you tried using a docker image to replicate the exact form used by our builders when they make the distribution artifacts?

@michaelwoerister
Member

> They do not explain why -C debuginfo=2 doesn't produce any debuginfo, not even basic information about the function being compiled.

This is really strange. I don't think we have support for disabling debuginfo generation, so any compiler should be able to produce it, regardless of how the compiler was built. What platform are you on?

@michaelwoerister
Member

@froydnj Did you check if the function makes it into the staticlib at all? If it is a Rust function then rustc or the linker might remove it along with the debuginfo at some point. Making the function pub extern "C" and adding #[no_mangle] to it should help.
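As a sketch of that suggestion, the test case could be rewritten roughly as follows. This is an illustration rather than the test case actually used in this thread. The pointer-plus-length signature is an assumption, chosen because a Rust slice is not an FFI-safe parameter type, and the attribute is spelled `#[unsafe(no_mangle)]` as newer toolchains expect (older toolchains, including those discussed here, use plain `#[no_mangle]`).

```rust
/// Exported with an unmangled name and the C ABI, so neither rustc nor the
/// linker should discard it (and its debuginfo) as unused.
#[unsafe(no_mangle)] // on older toolchains: #[no_mangle]
pub extern "C" fn frob(a: *const usize, len: usize) -> usize {
    // Assumption for this sketch: the caller passes a valid pointer to `len`
    // initialized elements.
    let slice = unsafe { std::slice::from_raw_parts(a, len) };
    // Indexing keeps the bounds check, and with it the reference to
    // core::panicking::panic_bounds_check that the readelf commands look for.
    slice[0]
}
```

Compiling this with `--crate-type staticlib` as before should leave a `frob` symbol in the archive that is easy to locate with readelf or nm.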

@froydnj
Contributor Author

froydnj commented Apr 29, 2019

cargo-bisect-rustc says that the 2019-01-27 nightly introduced the regression. That narrows down the commit range.

I'll try bisecting from there; my early (mostly ignorant of how rustc works) predictions are #57675 or #57407. Maybe #57908.
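Conceptually, this kind of bisection is a binary search for the first "bad" entry in an ordered list of nightlies or commits. The sketch below shows the idea only; it is not cargo-bisect-rustc's implementation. The name `first_bad` is hypothetical, and it assumes every entry before the regression is good and every entry from the regression onward is bad.

```rust
/// Return the index of the first item for which `is_bad` returns true,
/// assuming all good items precede all bad ones. Returns `None` if no item
/// is bad. Each call to `is_bad` stands in for "build and run the test case
/// with this toolchain".
fn first_bad<T>(items: &[T], mut is_bad: impl FnMut(&T) -> bool) -> Option<usize> {
    let (mut lo, mut hi) = (0, items.len());
    while lo < hi {
        let mid = lo + (hi - lo) / 2;
        if is_bad(&items[mid]) {
            hi = mid; // the regression is at `mid` or earlier
        } else {
            lo = mid + 1; // the regression is after `mid`
        }
    }
    if lo < items.len() {
        Some(lo)
    } else {
        None
    }
}
```

With a list of nightly dates and a predicate that checks for the missing DW_AT_linkage_name, the number of toolchains that need testing grows only logarithmically with the length of the range.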

@michaelwoerister
Member

cff0750 (from #57675) looks relevant.

@froydnj
Contributor Author

froydnj commented Apr 29, 2019

Commit bisection with cargo-bisect-rustc says that #57675 is responsible (I love bors merge artifact caching!). So @michaelwoerister's suggestion of cff0750 being relevant seems very reasonable; IIUC, this commit made rustc really treat debuginfo=1 as different from debuginfo=2. Before, I guess debuginfo=1 effectively meant debuginfo=2? And/or the rust.debuginfo-lines setting was basically ineffective?

That would also explain--I think--why I couldn't get the problem to reproduce locally; my config.toml was setting rust.debuginfo-lines to false.

It looks like LLVM enables you to tweak the linkage names emitted through the (-mllvm?) dwarf-linkage-names command-line option; I guess rustc would twiddle the -C llvm_args option? Twiddling the linkage names setting wouldn't resolve the issues of less debug information in general, but I think it would resolve the specific crash reporting issues in Firefox that led to this investigation.

Paths forward that I can see:

  1. Revert cff0750.
  2. Make debuginfo-lines/debuginfo=1 really mean "line numbers and 'real' linkage names".
  3. Same as 2, but only during bootstrap.
  4. Stop compiling with debuginfo-lines = true on all channels (currently it defaults to true on stable/beta/nightly, and false on dev).
  5. Some other option (ideas welcome!).

Options 2 and 3 are a bit gross, but presumably don't run into issues of generating overly-verbose debug information, like option 4 would.

@pnkfelix
Member

pnkfelix commented May 2, 2019

triage: Marking P-high, but I am not sure it actually warrants that prioritization.

Either @michaelwoerister or @cuviper, can one of you assign this to yourself and take charge of figuring out what we should do here?

@pnkfelix pnkfelix added the P-high High priority label May 2, 2019
@michaelwoerister michaelwoerister self-assigned this May 2, 2019
@michaelwoerister
Member

Assigning to myself for now.

@cuviper
Member

cuviper commented May 2, 2019

FWIW, I started setting the emission kind in an attempt to solve Windows debug errors, per #57675 (comment). I suggest expanding GitHub's hidden comments for more context. That didn't actually solve the problem -- needed a further LLVM fix for empty type info -- but I still think setting that kind is correct.

@pnkfelix
Member

Unnominating; the bug is assigned and the nomination is currently unwarranted.

@michaelwoerister
Member

So, I looked into this some more and here are my findings:

  • cff0750 (i.e. setting the DebugEmissionKind for DICompileUnits) is indeed what is causing the difference.
  • Clang emits even less debuginfo with -gline-tables-only. With Clang 8 none of my test programs contained any DW_TAG_subprogram entries. Stepping through the code with gdb still worked fine though.
  • Forcing the DebugEmissionKind to FullDebug seems to work fine on current master (although I haven't tested on Windows yet) and brings back the DW_AT_linkage_name attributes.

My suggested course of action is to

  • restore the previous behavior by always setting DebugEmissionKind to FullDebug,
  • open a GH issue that documents that we could do better here, and
  • implement this correctly in the future in a more coordinated way.
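The difference between the regressed behavior and the proposed fix can be modeled in a few lines. This is a simplified sketch and not rustc's actual code: the enum and function names are hypothetical, and only the mapping from the `-C debuginfo` level to the emission kind handed to LLVM is shown.

```rust
/// The three `-C debuginfo` levels (0, 1, 2) as they existed at the time.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DebugInfoLevel {
    Off,     // -C debuginfo=0
    Limited, // -C debuginfo=1
    Full,    // -C debuginfo=2
}

/// The emission kind set on each DICompileUnit.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum EmissionKind {
    NoDebug,
    LineTablesOnly,
    FullDebug,
}

/// Behavior after cff0750: "limited" debuginfo is passed to LLVM as
/// line-tables-only, which drops DW_AT_linkage_name and other attributes.
fn emission_kind_after_cff0750(level: DebugInfoLevel) -> EmissionKind {
    match level {
        DebugInfoLevel::Off => EmissionKind::NoDebug,
        DebugInfoLevel::Limited => EmissionKind::LineTablesOnly,
        DebugInfoLevel::Full => EmissionKind::FullDebug,
    }
}

/// Proposed fix: always ask LLVM for full emission and let rustc itself
/// decide how much debuginfo to generate for each level.
fn emission_kind_after_revert(_level: DebugInfoLevel) -> EmissionKind {
    EmissionKind::FullDebug
}
```

Under this model, the only level whose emission kind changes when debuginfo is enabled is `Limited`, which is the level the distributed standard library was built with.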

michaelwoerister added a commit to michaelwoerister/rust that referenced this issue May 21, 2019
rust-lang@cff075009 made LLVM emit
less debuginfo when compiling with "line-tables-only". The change
was essentially correct but the reduced amount of debuginfo broke
a number of tools.

This commit reverts the change so we get back the old behavior,
until we figure out how to do this properly and give external
tools time to adapt to the new format.

See rust-lang#60020 for more info.
bors added a commit that referenced this issue May 21, 2019
…hton

debuginfo: Revert to old/more verbose behavior for -Cdebuginfo=1

cff075009 made LLVM emit less debuginfo when compiling with "line-tables-only". The change was essentially correct but the reduced amount of debuginfo broke a number of tools.

This commit reverts the change so we get back the old behavior, until we figure out how to do this properly and give external tools time to adapt to the new format.

See #60020 for more info.

r? @cuviper
cc @jrmuizel & @froydnj
pietroalbini pushed a commit to pietroalbini/rust that referenced this issue May 28, 2019
@pnkfelix
Member

triage: Not sure where we currently stand on this bug. @michaelwoerister, given the changes you landed in PR #61007, is this resolved now? If not, could you point out what's changed, if anything, compared to your last comment? (Or just strike out the outdated parts of that comment?)

@michaelwoerister
Member

I opened #64405 to document that our current behavior could be improved. Closing this issue.

bors added a commit to rust-lang-ci/rust that referenced this issue Apr 4, 2023
…oerister

Extend -Cdebuginfo with new options and named aliases

This is a rebase of rust-lang#83947, along with my best guess at what the new options mean. I tried to follow the LLVM source code to get a better idea but ran into quite a lot of trouble (https://rust-lang.zulipchat.com/#narrow/stream/187780-t-compiler.2Fwg-llvm/topic/go-to-definition.20in.20src.2Fllvm-project.3F). The description for the original PR follows below.

Note that the changes in this PR have already been through FCP: rust-lang#83947 (comment)

Closes rust-lang#109311. Helps with rust-lang#104968.
r? `@michaelwoerister` cc `@cuviper`

---

The -Cdebuginfo=1 option was never line tables only and can't be due to backwards compatibility issues. This was clarified and an option for emitting line tables only was added. Additionally an option for emitting line info directives only was added, which is needed for some targets, e.g. nvptx. The debug info options should now behave similarly to clang's debug info options.

Fix rust-lang#60020
Fix rust-lang#64405
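For reference, the values that `-Cdebuginfo` accepts after this change, including the named aliases for the numeric levels, can be summarized as a small parser. This is an illustrative sketch with hypothetical names (`DebugInfo`, `parse_debuginfo`), not the code from the PR.

```rust
/// The debuginfo levels selectable with `-Cdebuginfo` after the change,
/// ordered from least to most information.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum DebugInfo {
    None,
    LineDirectivesOnly,
    LineTablesOnly,
    Limited,
    Full,
}

/// Map a `-Cdebuginfo=<value>` argument to a level. The numeric forms keep
/// their historical meaning; in particular, `1` stays `Limited` and does not
/// become line-tables-only.
fn parse_debuginfo(value: &str) -> Option<DebugInfo> {
    match value {
        "0" | "none" => Some(DebugInfo::None),
        "line-directives-only" => Some(DebugInfo::LineDirectivesOnly),
        "line-tables-only" => Some(DebugInfo::LineTablesOnly),
        "1" | "limited" => Some(DebugInfo::Limited),
        "2" | "full" => Some(DebugInfo::Full),
        _ => None,
    }
}
```

Keeping `1` and `line-tables-only` as distinct values is what lets tools that depend on linkage names keep working while still offering a smaller option to those who want it.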
saethlin pushed a commit to saethlin/miri that referenced this issue Apr 10, 2023
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Jun 11, 2023
…-Simulacrum

bootstrap: Don't override `debuginfo-level = 1` to mean `line-tables-only`

This has real differences in the effective debuginfo: in particular, it omits the module-level information and makes perf less useful (it can't distinguish "self" from "child" time anymore).

Allow passing `line-tables-only` directly in config.toml instead.

See https://rust-lang.zulipchat.com/#narrow/stream/131828-t-compiler/topic/debuginfo.20in.20try.20builds/near/365090631 and https://rust-lang.zulipchat.com/#narrow/stream/238009-t-compiler.2Fmeetings/topic/.5Bsteering.5D.202023-06-09/near/364883519 for more discussion. This effectively reverts the cargo half of rust-lang#110221 to avoid regressing rust-lang#60020 again in 1.72.
8 participants