Project Metabug: Kill Breakpad #375
Comments
Under the first point we could add making rust-minidump print out as many error codes as possible in a readable format. We have patches on top of breakpad in bug 1704034 and bug 1703248. There's another one coming in bug 1705018 which would give complete coverage of Windows errors. The goal here would be to make all crash exception codes and |
Also yeah, AArch64/Windows unwinding information is special mostly because it didn't match Microsoft documentation. There were quite a few holes and errors that Nathan had to fill when he implemented it. Our implementation in bug 1529355 is probably the gold standard for it. |
One thing I didn't have a chance to mention in the meeting: does anyone know if gimli's dead unwind-rs has any meat to extract for our purposes? See also: Need to read through the former to get a sense of what to salvage from both our impls. Edit: oh wait this is just shelling out to breakpad, ok cool, thought you had started writing more actual stackwalking code. |
Oh fun the perf team has another unwinder for linux and android64 profiling: https://blog.mozilla.org/jseward/2014/05/13/lul-a-lightweight-unwinder-library-for-profiling-gecko/ Apparently it is really slow to start up: https://bugzilla.mozilla.org/show_bug.cgi?id=1635810 |
On the topic of forked-Breakpad-parity Steven Michaud just added code to read and write macOS-specific |
A few notes:
rust-minidump/rust-minidump#22 is one. rust-minidump/rust-minidump#25 would be another, but it's a big chunk of work. Probably OK to punt on it for a first pass.
https://github.com/jrmuizel/pdb-downloader
Yeah, don't do that. I think the right thing to do is to move |
Yeah those are just busywork. I'll probably tackle them myself in the coming months while I wait for builds and tests to complete. |
I think next week I'm going to start trying to rewrite minidump-stackwalk in rust just to properly use rust-minidump in anger and feel out what it's actually missing, and potentially get some preliminary performance numbers. |
Another goal that we have is to revise the symcache format btw. In the end, integrating a compact and fast format for CFI is another goal. IMO, Apple's compact unwind format is a good inspiration, and I will catch up on https://blog.mozilla.org/jseward/2013/09/03/how-compactly-can-cfiexidx-stack-unwinding-info-be-represented/ as well. |
FWIW, the |
So I roughed that out but the socorro format and the ProcessState's structure differ a fair amount as far as a naive |
I (Socorro maintainer) am happy to change Socorro as needed. If it turns out there are scripts downstream for Socorro that need a certain structure, Socorro's processor can deal with supporting and migrating them. |
I've made a metabug to specifically track what's needed for the first milestone: rust-minidump/rust-minidump#153 |
That's fair. I don't have a real answer here. The existing |
Something just occurred to me while I was thinking about the JSON output. Unfortunately Socorro is not the only downstream user. The minidump-analyzer which we run on user machines also emits Socorro-like JSON but with all the redundant fields stripped away. It's a subset basically. The generated stack trace is sent via crash pings and documented here. This format has downstream users: Jim Mathies wrote a tool to symbolicate the crash pings and used Socorro's signature generation library to aggregate crash pings, see here. |
And here's another thought I had in the context of the minidump-analyzer. Since we ship it on user machines it currently contains a lot of redundant code because Breakpad is all or nothing. For example we have to ship ARM unwinding support even in binaries that only run on x86 machines (and thus process only x86 minidumps). If support for the various architectures and debuginfo formats could be chosen at compile time it would make the binaries quite a bit smaller. Note this isn't a feature request, it's more like wishful thinking for when everything else will have already been done. |
It's possible that no one is looking at the minidump-related parts of crash pings other than Jimm's team. They're using fx-crash-sig (which I maintain) to take a crash ping, symbolicate it with Tecken (which I maintain), and run signature generation on it using siggen (which I maintain). So I can probably handle some amount of change depending on the details. Even so, @gabrielesvelto is right in that it's harder to absorb changes in the crash ping structure than what minidump-stackwalk generates that Socorro consumes. |
We've made a lot of progress, so I've taken the opportunity to clean up the metabug description and mark off everything that's done! We're getting surprisingly close to completing Milestone 1! I've also given every usecase a proper bold label so that subtasks can more clearly indicate that they are only needed for a particular usecase. |
We removed breakpad from our own symbolicator service 🎉 , and the code in this repo is now behind a feature flag, and will be removed completely on the next major. |
Awesome work, everyone! |
High-level goal: improve rust-minidump (and related libraries) to the point that it can replace all the uses of breakpad in Sentry and Firefox.
For now this should be restricted to the scope of:
Subtasks
NOTE: Larger tasks are checked off even if they have incomplete subtasks to indicate that they are complete for the purposes of the current milestone.
Ensure rust-minidump can parse and expose all the minidump details we rely on
Replace the derelict breakpad-symbols subcrate with a new symbolication implementation
symbolic into rust-minidump (Replace breakpad-symbols with something based on symbolic rust-minidump/rust-minidump#159)
Complete dump_syms support of the various unwinding info formats (via symbolic):
Implement an (offline) unwinder
panic! usecase (handling personality/lsdas to run dtors, catch_panic) (no use, just cool)
Implement client-side minidump generation (Bugzilla#1588530) (moz-breakpad-client, sentry-breakpad-client)
The Context
Minidumps are a Microsoft-designed format for more compact dumps of a process's state when it crashes, notably including full memory dumps of every thread's stacks/registers and mapped code modules (libraries that are linked in and what addresses they were mapped to).
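To make the format a little more concrete, here is a minimal sketch of reading the fixed 32-byte header at the start of a minidump, using only the standard library. The field layout follows Microsoft's MINIDUMP_HEADER structure; the `Header` type and the `read_u32` and `parse_header` names are invented for this illustration and are not rust-minidump's API.

```rust
/// The four bytes "MDMP" read as a little-endian u32.
const MINIDUMP_SIGNATURE: u32 = 0x504d_444d;

/// Hypothetical mirror of the fields in MINIDUMP_HEADER (32 bytes on disk).
#[derive(Debug, PartialEq)]
struct Header {
    version: u32,
    stream_count: u32,
    stream_directory_rva: u32,
    checksum: u32,
    time_date_stamp: u32,
    flags: u64,
}

/// Read a little-endian u32 at `offset`, or None if the buffer is too short.
fn read_u32(bytes: &[u8], offset: usize) -> Option<u32> {
    let b = bytes.get(offset..offset + 4)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Parse the header, rejecting truncated input or a wrong signature.
fn parse_header(bytes: &[u8]) -> Option<Header> {
    if read_u32(bytes, 0)? != MINIDUMP_SIGNATURE {
        return None;
    }
    let version = read_u32(bytes, 4)?;
    let stream_count = read_u32(bytes, 8)?;
    let stream_directory_rva = read_u32(bytes, 12)?;
    let checksum = read_u32(bytes, 16)?;
    let time_date_stamp = read_u32(bytes, 20)?;
    // Flags is a 64-bit field; assemble it from two little-endian halves.
    let flags_lo = u64::from(read_u32(bytes, 24)?);
    let flags_hi = u64::from(read_u32(bytes, 28)?);
    Some(Header {
        version,
        stream_count,
        stream_directory_rva,
        checksum,
        time_date_stamp,
        flags: flags_lo | (flags_hi << 32),
    })
}
```

Everything else in the file (thread lists, module lists, memory regions) is reached through the stream directory that `stream_directory_rva` points at, which is why a parser has to treat every offset as untrusted input.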
Windows has native APIs for generating minidumps, but this is a feature that's desirable on other platforms, so google-breakpad was created to generate "fake" minidumps on other platforms and process them all uniformly. The most important output of this process is backtraces for every thread, but additional context stored in minidumps may be useful for debugging weird stuff like "the user's antivirus DLL-injected itself into our process and messed everything up" or "oh look the last syscall failed right before we crashed".
Both Firefox and Sentry rely on breakpad for minidump generation and handling. Unfortunately, breakpad is written in dangerous C++ and basically abandoned by Google. Mozilla doesn't bother upstreaming our patches anymore, and it's too much work to maintain it.
Usecases
Here are the places where we use breakpad now that should work with a replacement. Each has a codename so that tasks/milestones can reference them.
Mozilla Usecases
minidump-stackwalk: On the server-side, Mozilla uses breakpad in minidump-stackwalk to process minidump-based crash reports for socorro.
moz-breakpad-client: On the client-side, Mozilla uses breakpad in our crash-reporter to generate minidumps. For content-process (~tab) crashes, the main process does this work out-of-crashing-process. For main-process (full browser) crashes, the main-process does this work in-crashing-process. Ideally we would have a separate crash-reporting process on the side that monitors the others so that all our handling can be out-of-crashing-process.
minidump-analyzer: On the client-side, Mozilla uses breakpad in our minidump-analyzer to try to analyze the contents of the minidump using the client machine's knowledge of its own system libraries and any local debuginfo we ship with Firefox. This allows us to get more accurate symbolication/unwinding. (This also includes some of our own ad hoc symbolication/unwinding code, which is buggy and ideally would be replaced.)
moz-stackwalk: As a stretch-goal, this work would ideally also replace the need for moz-stackwalk (our own runtime backtracer for debug build backtraces and profiler probing) and fix-stacks (cleans up moz-stackwalk's output using native symbols).
Sentry Usecases
symbolicator: On the server-side, Sentry uses breakpad inside of symbolicator to process minidumps and extract a meaningful stack trace.
sentry-breakpad-client: On the client-side, sentry-native uses crashpad or alternatively breakpad to create minidumps of the crashing process to send over for server-side post-processing.
Microsoft Usecases?
TBD!
Current Roadmap
Milestone 1 - minidump-stackwalk
Metabug: rust-minidump/rust-minidump#153
Mozilla would like to first get the minidump-stackwalk usecase working, as it's the simplest but also very high traffic (performance matters) and processes user-provided data on our servers (security matters).
minidump-stackwalk only needs to handle pre-processed symbols from our symbol servers (i.e. the breakpad text format), and is operating completely offline from where the minidump was generated.
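For readers who haven't seen the breakpad text format, a rough sketch of parsing two of its record types (MODULE and FUNC) might look like the following. The `Record` enum and `parse_line` function are hypothetical names for this example, not rust-minidump's or symbolic's API, and the other record types (FILE, PUBLIC, STACK CFI, line records, and so on) are simply skipped here.

```rust
/// Hypothetical, simplified view of two breakpad symbol-file records.
#[derive(Debug, PartialEq)]
enum Record {
    /// `MODULE <os> <arch> <id> <name>`
    Module { os: String, arch: String, id: String, name: String },
    /// `FUNC [m] <address> <size> <param_size> <name>` (numbers are hex)
    Func { address: u64, size: u64, param_size: u64, name: String },
}

/// Parse one line; returns None for record types this sketch ignores
/// and for lines that are malformed.
fn parse_line(line: &str) -> Option<Record> {
    let line = line.trim_end();
    if let Some(rest) = line.strip_prefix("MODULE ") {
        // The module name is last and may contain spaces, so split at most 4 ways.
        let mut parts = rest.splitn(4, ' ');
        Some(Record::Module {
            os: parts.next()?.to_string(),
            arch: parts.next()?.to_string(),
            id: parts.next()?.to_string(),
            name: parts.next()?.to_string(),
        })
    } else if let Some(rest) = line.strip_prefix("FUNC ") {
        // An optional "m" marks a function that appears at multiple addresses.
        let rest = rest.strip_prefix("m ").unwrap_or(rest);
        let mut parts = rest.splitn(4, ' ');
        let address = u64::from_str_radix(parts.next()?, 16).ok()?;
        let size = u64::from_str_radix(parts.next()?, 16).ok()?;
        let param_size = u64::from_str_radix(parts.next()?, 16).ok()?;
        let name = parts.next()?.to_string();
        Some(Record::Func { address, size, param_size, name })
    } else {
        None
    }
}
```

A real symbolication implementation would also index FUNC records by address range and keep the STACK CFI records around for the unwinder; this only shows the line-oriented shape of the input.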
The hardest part will be generating backtraces for all the threads, which requires a complete offline unwinder.
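As an illustration of what "offline" means here, the sketch below walks x86-64 frame pointers using only a captured copy of a thread's stack bytes, the way a minidump stores them. This is just one of the simpler unwinding techniques and assumes code compiled with frame pointers; a complete unwinder would prefer CFI from the symbol files and fall back to heuristics. `StackMemory` and `walk_frame_pointers` are made-up names for this example.

```rust
/// A captured region of stack memory: `bytes[0]` lived at address `base`.
struct StackMemory<'a> {
    base: u64,
    bytes: &'a [u8],
}

impl StackMemory<'_> {
    /// Read a little-endian u64 at an absolute address, if it lies in the region.
    /// (Sketch assumes a 64-bit host, so the offset fits in usize.)
    fn read_u64(&self, addr: u64) -> Option<u64> {
        let offset = addr.checked_sub(self.base)? as usize;
        let end = offset.checked_add(8)?;
        let b = self.bytes.get(offset..end)?;
        let mut raw = [0u8; 8];
        raw.copy_from_slice(b);
        Some(u64::from_le_bytes(raw))
    }
}

/// Return the instruction pointers of up to `max_frames` frames, innermost first.
/// Layout assumed at each frame pointer: [fp] = caller's fp, [fp + 8] = return address.
fn walk_frame_pointers(stack: &StackMemory<'_>, pc: u64, fp: u64, max_frames: usize) -> Vec<u64> {
    let mut frames = vec![pc];
    let mut fp = fp;
    while frames.len() < max_frames {
        let saved_fp = stack.read_u64(fp);
        let return_addr = fp.checked_add(8).and_then(|a| stack.read_u64(a));
        let (saved_fp, return_addr) = match (saved_fp, return_addr) {
            (Some(f), Some(r)) => (f, r),
            _ => break, // ran off the captured stack
        };
        if return_addr == 0 {
            break;
        }
        frames.push(return_addr);
        // The stack grows down, so a caller's frame must sit at a higher address.
        // Anything else means the chain is finished or corrupt.
        if saved_fp <= fp {
            break;
        }
        fp = saved_fp;
    }
    frames
}
```

The bounds checks and the "frame pointer must increase" rule matter because the stack contents come from a crashed, possibly corrupted process and are processed as untrusted input on the server.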
Milestone 2 - minidump-analyzer
TBD, may choose different goal based on how Milestone 1 goes
Milestone 3 - symbolicator
TBD, may choose different goal based on how Milestone 2 goes