Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use gimli instead of libbacktrace by default #189

Closed
alexcrichton opened this issue May 23, 2019 · 13 comments · Fixed by #324
Closed

Use gimli instead of libbacktrace by default #189

alexcrichton opened this issue May 23, 2019 · 13 comments · Fixed by #324
Labels
gimli Related to the gimli implementation

Comments

@alexcrichton
Copy link
Member

This issue is intended to track the switch from the C library libbacktrace by default on many platforms to using Gimli by default on these platforms to symbolicate a backtrace. The rationale for this is that the C library libbacktrace has had a number of security issues historically and we're including it in every single Rust binary by default on relevant platforms (in libstd), so correctness is quite important to us.

Platforms that use libbacktrace by default are:

  • MinGW Windows.
  • macOS (when built in libstd, in the crate coresymbolication is on by default).
  • Linux / other cfg(unix) targets.

The usage of libbacktrace is relatively simple. We're either consulting the debuginfo, if available, what the symbol name for an address is or we're consulting the dynamic symbol table (aka dladdr on Unix, but I don't think this is exactly how libbacktrace gets its).

There's one caveat in our usage of libbacktrace in that we have to manually feed in a filename for the current executable to libbacktrace for macOS/Windows platforms. For other Unix platforms I think it finds it through the dynamic library iteration APIs in libc.

Another final caveat is that to fully replace libbacktrace all of the support we use for gimli must be #![no_std]. That way it can be included into the standard library, and the standard library can't depend on itself!

Today the gimli-symbolize feature of this crate actually doesn't depend on gimli at all but rather goes through external crates like addr2line, findshlibs, and memmap. I suspect that we'll probably need to drop the dependencies on findshlibs and memmap in favor of manually calling the libc APIs here, but if we can work around that it would be great! Note, though, that we don't need full featured platform support in these crates.

I think the best thing to do is to probably read up on how libbacktrace acquires debug information to parse. I think it's quite different for Windows/Mac/Linux but I suspect it's relatively straightforward and shouldn't really require that much of libstd/crates.io.

@fitzgen
Copy link
Member

fitzgen commented May 23, 2019

Note that moria (a crate to find an artifact's DWARF debug info) is more functional than it was at the time I contributed the gimli-symbolize backend. The reason that gimli-symbolize is currently linux only is because the DWARF bits necessary for symbolication are generally not stripped on linux, and we didn't have a good way to find external DWARF otherwise.

Today the gimli-symbolize feature of this crate actually doesn't depend on gimli at all but rather goes through external crates like addr2line, findshlibs, and memmap

addr2line has #![no_std] support right now.

findshlibs should be relatively easy to add #![no_std] support to, I think. I'm curious why you think dropping that dep and using the raw APIs is better here.

Regarding memmap, it's probably simpler to fs::read the file into memory.

@fitzgen
Copy link
Member

fitzgen commented May 23, 2019

cc @philipc

@alexcrichton
Copy link
Member Author

I'm curious why you think dropping that dep and using the raw APIs is better here.

Oh sorry to be clear I think it's worse ergonomics and usability-wise to use raw APIs, but I was hoping to head off pain of crates trying to be #![no_std] when there' not really any use case other than "this may be included in libstd". It may just not make sense for a crate like findshlibs to have support for #![no_std], and the support needed by this crate may be small enough that it's just easier to roll by hand in the long run.

For memmap, it's true yeah that fs::read may be better, but we don't even have access to that :(

bors-servo pushed a commit to servo/servo that referenced this issue Aug 16, 2019
Fix build error when using the 'webgl_backtrace' feature

This patch fixes a build error that happens when building with `--features webgl_backtrace`.

However, while working on this I've noticed that calling `Backtrace::new()` instantly causes a segmentation fault. This didn't happen with the example code of [the package](https://crates.io/crates/backtrace) though, so I wasn't sure where to report this (maybe related: [backtrace-rs/#150](rust-lang/backtrace-rs#150), [backtrace-rs/#189](rust-lang/backtrace-rs#189)).

cc @alexcrichton @jdm

---

- [x] `./mach build -d` does not report any errors
- [x] `./mach test-tidy` does not report any errors
- [ ] These changes fix #___ (GitHub issue number if applicable)

<!-- Either: -->
- [ ] There are tests for these changes OR
- [ ] These changes do not require tests because ___
- [x] These changes fix an issue that happens with a compile time feature not tested by the CI

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/23966)
<!-- Reviewable:end -->
@luser
Copy link

luser commented Oct 22, 2019

Note that moria (a crate to find an artifact's DWARF debug info) is more functional than it was at the time I contributed the gimli-symbolize backend

I think I've mentioned this elsewhere but moria is probably not a good candidate for no-std since it relies on finding debug files on the filesystem. I guess you could make it work if you abstracted away the bits it actually needs from libstd like Path and File and let std instantiate it with its internal implementations?

@luser
Copy link

luser commented Oct 22, 2019

I guess this crate already has an implementation of roughly the same thing moria does anyway, but it'd need similar work to be no-std friendly:

// The loading path for OSX is is so different we just have a completely

tarcieri pushed a commit to iqlusioninc/abscissa that referenced this issue Nov 15, 2019
The `gimli` crate is an optional alternative backend for the `backtrace`
crate based on a pure-Rust DWARF parser as opposed to `libbacktrace`,
which is written in C. The long-term plan is to make `gimli` the default
backend for the `backtrace` crate (which will eventually be include in
`std` and is presently available on `nightly`).

For more information, see:

rust-lang/backtrace-rs#189

This adds considerably to Cargo.toml as `gimli` has quite a few
dependencies, and because of that, it's left off-by-default, but with a
commented-out option to activate it in the template's Cargo.toml.
@alexcrichton
Copy link
Member Author

I've recently gotten my once-every-six-months burst of motivation to work on this again. I've sent what's probably a-bit-to-intense flurry of activity to various crates. My goal has been reducing the dependency tree of the gimli-symbolize feature to a bare minimum:

After all that we're only left with stable_deref_trait left in gimli which is where I just ran out of steam for today. I also wanna make sure the other gimli PRs are ok before moving forward with trying to remove that.

With three trivial changes to these crates:

and a tiny diff to libstd:

diff --git a/Cargo.toml b/Cargo.toml
index 7b5e0fa1c28..3af8f542cb0 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -65,5 +65,12 @@ rustc-std-workspace-core = { path = 'src/tools/rustc-std-workspace-core' }
 rustc-std-workspace-alloc = { path = 'src/tools/rustc-std-workspace-alloc' }
 rustc-std-workspace-std = { path = 'src/tools/rustc-std-workspace-std' }
 
+backtrace = { path = "../backtrace-rs" }
+object = { git = 'https://github.com/alexcrichton/object', branch = 'dep-of-std' }
+addr2line = { git = 'https://github.com/alexcrichton/addr2line', branch = 'dep-of-std' }
+gimli = { git = 'https://github.com/alexcrichton/gimli', branch = 'dep-of-std' }
+
 [patch."https://github.com/rust-lang/rust-clippy"]
 clippy_lints = { path = "src/tools/clippy/clippy_lints" }
diff --git a/src/libstd/Cargo.toml b/src/libstd/Cargo.toml
index 923d5fa8cac..d36ac35e3fd 100644
--- a/src/libstd/Cargo.toml
+++ b/src/libstd/Cargo.toml
@@ -51,6 +51,7 @@ default = ["std_detect_file_io", "std_detect_dlsym_getauxval", "panic-unwind"]
 
 backtrace = [
   "backtrace_rs/dbghelp",          # backtrace/symbolize on MSVC
+  "backtrace_rs/gimli-symbolize",  # symbolize on most platforms
   "backtrace_rs/libbacktrace",     # symbolize on most platforms
   "backtrace_rs/libunwind",        # backtrace on most platforms
   "backtrace_rs/dladdr",           # symbolize on platforms w/o libbacktrace

I've got gimli almost building inside of libstd.

The last major remaining work item to actually hook all this up is to figure out how to get all the filesystem-related probing to work in the backtrace crate when it doesn't depend on the standard library. Right now the gimli-symblize feature requires the std feature since it uses items from the std crate, but when std depends on backtrace this naturally isn't available. The general idea is that somehow this crate needs to access the system bindings in std.

I tried today to hand-code everything in this crate, but it very quickly gets out of hand. While it's possible to duplicate much of std like reading directories, but the amount of support needed by backtrace is pretty nontrivial.


Basically the tl;dr; is that dependencies are no longer a problem. The only problem is how to give the backtrace crate, which std depends on, access to the system resources implemented in std itself.

@est31
Copy link
Member

est31 commented May 8, 2020

The only problem is how to give the backtrace crate, which std depends on, access to the system resources implemented in std itself.

Some solutions:

  • extract the OS dependent components that std uses into their own crate that backtrace depends on, and let std either reexport them or provide thin wrappers around (advantage of reexporting is backtrace using the same API)

  • listing everything backtrace needs inside a trait that std implements. probably quite a lot to write.

  • don't let std depend on backtrace as a separate crate but instead as module (#[path = "../../backtrace/lib.rs"]mod backtrace;)

@luser
Copy link

luser commented May 8, 2020

* listing everything backtrace needs inside a trait that std implements. probably quite a lot to write.

I poked at this a while ago and might have a half-baked patch laying around, it wasn't pretty but it was tractable.

@luser
Copy link

luser commented May 8, 2020

Here's what I had been hacking on:
luser@bd8b31c#diff-44ecd069c5ac9cb97fff0ec96402049cR188

I made a StdFeatures trait and was fleshing it out. I had been contemplating what it would take to make this easier if this is a direction we wanted to take for other things in libstd in the future. My best solution was to have libcore define traits that mapped to various parts of std and then std would have impls for them. (This would also provide a fairly standard way for mocking this functionality for tests.) The downside is that you'd wind up making literally everything that depended on any bit of std functionality generic over those traits which is a bummer. Maybe someone who has thought about it more could come up with a nicer solution, like having crate or module-level generics, I don't know.

@alexcrichton
Copy link
Member Author

When I tried to vendor everything inside of this crate there's basically two reasons that it got out of hand pretty quickly:

  • Primarily macOS does quite a lot in terms of filesystem traversals and inspecting strings. So much was needed here it got pretty hefty pretty quickly. Linux and Windows are generally much easier, all they need to do is open a file which is easy enough to do here by hand.
  • The definition of std::env::current_exe() is gnarly. For libbacktrace there's already a bunch of duplication of libstd and that's only covering a fraction of the platforms libstd has an implementation for. That's something I think is simply too burdensome to duplicate in multiple places.

This also isn't even starting down the road of "what about split debuginfo?" which is even more gnarly in terms of filesystem traversals. I don't really think there's a great solution to this problem today. A StdFeatures trait is pretty limiting because the set of features we need will be growing and likely unwieldy over time. Duplicating object-finding code between libstd and this crate defeats the purpose of having this crate be separately testable. Including this crate via a submodule places a pretty large burden on how to build it in inside of libstd (the features/dependencies/etc here are quite nontrivial).

The only halfway-feasible solution I can think of is to make a libstd_but_not_std crate which std depends on and reexports. The only purpose of that crate would be for crates that std itself depends on but only need that OS-specific part of std. The std crate would weave everything together into one facade. I say "halfway-feasible" because this would solve code-sharing and implementation issues but the core/alloc/std split is already extremely costly and introducing another is not a cost I think upstream wants to incur. Especially if it's solely for this one crate and this one crate alone.

@est31
Copy link
Member

est31 commented May 8, 2020

you'd wind up making literally everything that depended on any bit of std functionality generic over those traits which is a bummer.

You can also do dyn traits, although they limit you in what you can do (no types, etc) and are slower, but most I/O should be a one time thing anyways.

@alexcrichton alexcrichton added the gimli Related to the gimli implementation label May 8, 2020
@alexcrichton
Copy link
Member Author

My current plan is to first handle all active PRs and get everything settled. I'd like to do a final round of auditing to ensure that gimli doesn't have any obvious or major portability holes. After that I plan to update the crates.io version of this crate to enable gimli-symbolize by default, turning off libbacktrace by default. That'll probably have a back-and-forth for a bit while further bugs are weeded out.

After that I'll open a separate issue on this repository for inclusion into the standard library and the issues related with that.

alexcrichton added a commit that referenced this issue May 12, 2020
This commit switches this crate to using `gimli` by default for parsing
DWARF debug information. This is a long time coming and brings a number
of benefits:

* Primarily, Rust is safe. The libbacktrace library has been plagued
  with segfaults ever since we first started using it. Gimli, however,
  is almost entirely safe code. This should make us much more resilient
  in the face of buggy debuginfo.

* Secondarily, it means this library no longer needs a C compiler. Being
  an all-Rust crate generally makes it much easier to cross-compile,
  port, etc.

* Finally, this paves the road for future improvements such as
  split-debuginfo support by default. The libbacktrace library hasn't
  really changed since we started using it years ago, and this gives us
  more control over what's used and how now.

Closes #189
alexcrichton added a commit that referenced this issue May 12, 2020
This commit switches this crate to using `gimli` by default for parsing
DWARF debug information. This is a long time coming and brings a number
of benefits:

* Primarily, Rust is safe. The libbacktrace library has been plagued
  with segfaults ever since we first started using it. Gimli, however,
  is almost entirely safe code. This should make us much more resilient
  in the face of buggy debuginfo.

* Secondarily, it means this library no longer needs a C compiler. Being
  an all-Rust crate generally makes it much easier to cross-compile,
  port, etc.

* Finally, this paves the road for future improvements such as
  split-debuginfo support by default. The libbacktrace library hasn't
  really changed since we started using it years ago, and this gives us
  more control over what's used and how now.

Closes #189
alexcrichton added a commit that referenced this issue May 12, 2020
This commit switches this crate to using `gimli` by default for parsing
DWARF debug information. This is a long time coming and brings a number
of benefits:

* Primarily, Rust is safe. The libbacktrace library has been plagued
  with segfaults ever since we first started using it. Gimli, however,
  is almost entirely safe code. This should make us much more resilient
  in the face of buggy debuginfo.

* Secondarily, it means this library no longer needs a C compiler. Being
  an all-Rust crate generally makes it much easier to cross-compile,
  port, etc.

* Finally, this paves the road for future improvements such as
  split-debuginfo support by default. The libbacktrace library hasn't
  really changed since we started using it years ago, and this gives us
  more control over what's used and how now.

Closes #189
alexcrichton added a commit that referenced this issue May 12, 2020
This commit switches this crate to using `gimli` by default for parsing
DWARF debug information. This is a long time coming and brings a number
of benefits:

* Primarily, Rust is safe. The libbacktrace library has been plagued
  with segfaults ever since we first started using it. Gimli, however,
  is almost entirely safe code. This should make us much more resilient
  in the face of buggy debuginfo.

* Secondarily, it means this library no longer needs a C compiler. Being
  an all-Rust crate generally makes it much easier to cross-compile,
  port, etc.

* Finally, this paves the road for future improvements such as
  split-debuginfo support by default. The libbacktrace library hasn't
  really changed since we started using it years ago, and this gives us
  more control over what's used and how now.

Closes #189
alexcrichton added a commit that referenced this issue May 12, 2020
This commit switches this crate to using `gimli` by default for parsing
DWARF debug information. This is a long time coming and brings a number
of benefits:

* Primarily, Rust is safe. The libbacktrace library has been plagued
  with segfaults ever since we first started using it. Gimli, however,
  is almost entirely safe code. This should make us much more resilient
  in the face of buggy debuginfo.

* Secondarily, it means this library no longer needs a C compiler. Being
  an all-Rust crate generally makes it much easier to cross-compile,
  port, etc.

* Finally, this paves the road for future improvements such as
  split-debuginfo support by default. The libbacktrace library hasn't
  really changed since we started using it years ago, and this gives us
  more control over what's used and how now.

Closes #189
@alexcrichton
Copy link
Member Author

With this now done and shipped, I've opened a dedicated issue to including gimli support in libstd by default.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
gimli Related to the gimli implementation
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants