Improve the mechanism used to suspend threads #28

gabrielesvelto · 2022-05-24T09:26:41Z

Suspending the target process's threads is currently an all-or-nothing operation: if it fails, no minidump is generated. We currently iterate over the threads we enumerated during initialization, which is inherently racy because new threads can be created in the meantime and existing ones can exit. This is partially handled in that we tolerate being unable to suspend individual threads, as long as we stop at least one.

There should be ways to improve this, but they're not clear-cut and will probably need some experimentation. One possibility is to send a SIGSTOP signal to the target process and wait for it to settle down before ptrace()-ing it. SIGSTOP is a big hammer that needs to be used with caution, so I don't know what kind of issues we might encounter, but it could make the process more reliable by getting rid of some races (or at least leaving the kernel to handle them).
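To make the SIGSTOP idea concrete, here is a minimal std-only sketch of "stop, then wait for the process to settle". It is an assumption about how this could look, not the crate's implementation: the names `parse_proc_state`, `wait_for_stop` and `stop_and_wait` are hypothetical, the signal is sent by shelling out to the `kill` utility to avoid extra dependencies, and "settled" is taken to mean that `/proc/<pid>/stat` reports state `T`.

```rust
use std::fs;
use std::process::Command;
use std::thread;
use std::time::{Duration, Instant};

/// Extracts the state character from the contents of /proc/<pid>/stat.
/// The comm field is wrapped in parentheses and may itself contain spaces
/// or parentheses, so we look for the last ')' and read the next field.
fn parse_proc_state(stat: &str) -> Option<char> {
    let after_comm = &stat[stat.rfind(')')? + 1..];
    after_comm.split_whitespace().next()?.chars().next()
}

/// Polls `read_stat` until it reports state 'T' (stopped) or `timeout`
/// elapses. Returns true only if the stopped state was observed.
fn wait_for_stop<F>(mut read_stat: F, timeout: Duration) -> bool
where
    F: FnMut() -> Option<String>,
{
    let deadline = Instant::now() + timeout;
    loop {
        if let Some(stat) = read_stat() {
            if parse_proc_state(&stat) == Some('T') {
                return true;
            }
        }
        if Instant::now() >= deadline {
            return false;
        }
        thread::sleep(Duration::from_millis(1));
    }
}

/// Hypothetical entry point: sends SIGSTOP via the `kill` utility, then
/// waits for the kernel to report the process as stopped. A `false` result
/// means the caller should fall back to the old per-thread behaviour.
fn stop_and_wait(pid: u32, timeout: Duration) -> bool {
    let sent = Command::new("kill")
        .arg("-STOP")
        .arg(pid.to_string())
        .status()
        .map(|status| status.success())
        .unwrap_or(false);
    if !sent {
        return false;
    }
    wait_for_stop(
        || fs::read_to_string(format!("/proc/{pid}/stat")).ok(),
        timeout,
    )
}
```

Passing the reader as a closure keeps the polling loop testable without a live process, and the timeout is a parameter rather than a hardcoded value. A process stopped this way stays stopped until it receives SIGCONT, so a real caller would need to resume it afterwards.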

Additionally, we might consider dumping threads we couldn't suspend. The data we'll get might be garbage, but I don't really mind: we get garbage all the time in crash reports, and it could still be useful. It would certainly be better than nothing. The reason why we never tried to do it with Breakpad is simple enough: handling errors is a chore! But here it's a lot easier. We could factor out the code for writing a single thread in thread_list_stream::write() and ignore individual failures (or even better, record them!).
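As a rough illustration of "ignore individual failures, or even better, record them", the per-thread loop could look something like the sketch below. Everything here is hypothetical: `ThreadFailure`, `write_threads` and the `write_one` callback stand in for whatever the factored-out single-thread writer would be, and the error type is simplified to a string.

```rust
/// Hypothetical record of a thread that could not be written.
#[derive(Debug, PartialEq)]
struct ThreadFailure {
    tid: u32,
    reason: String,
}

/// Attempts to write every thread. A failure on one thread is recorded
/// and does not stop the remaining threads from being written.
fn write_threads<F>(tids: &[u32], mut write_one: F) -> (Vec<u32>, Vec<ThreadFailure>)
where
    F: FnMut(u32) -> Result<(), String>,
{
    let mut written = Vec::new();
    let mut failures = Vec::new();
    for &tid in tids {
        match write_one(tid) {
            Ok(()) => written.push(tid),
            Err(reason) => failures.push(ThreadFailure { tid, reason }),
        }
    }
    (written, failures)
}
```

Returning the failures alongside the successes lets the caller decide what to do with them, for example attaching them to the dump or just logging them.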


cpeterso · 2022-05-24T18:09:09Z

This work is related to Firefox Android's "no crashing thread identified" bug where many crash reports are missing thread stack traces:

https://bugzilla.mozilla.org/show_bug.cgi?id=1644486


…ace things. (#108)

* Send SIGSTOP to linux/android processes before doing other procfs/ptrace things. If this fails, we continue as we used to. This is an attempt to get a consistent/static process state. Closes #28.
* Fix mac
* Fix warning about unused doc comment
* Allow user customization of timeout. Not sure this is needed, but better to allow users to customize this rather than rely on a hardcoded value
* Update nix

Co-authored-by: Jake Shadle <jake.shadle@embark-studios.com>

afranchuk · 2024-04-01T18:07:21Z

I've just tested this in the android emulator by running all test suites there.

I guess the easier way to do this might have been to install Rust in the emulator, but instead I messed around with cargo configuration to set up the NDK linker and a runner script which pushes files with adb. One additional code change was needed: support for an env variable that specifies the test helper binary used to create child processes. The test suites previously used cargo run or the CARGO_BIN_EXE_test env var. I added a bit of code to use a TEST_HELPER env var if set. There's also an extra step of running cargo run --bin test to get the helper copied to the emulator, since cargo test won't do that. I can upstream this if we think it useful, otherwise compiling and running in the emulator is very likely less setup/friction.
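For reference, the env var fallback is roughly of this shape. This is a simplified sketch rather than the actual change: `resolve_helper` and `test_helper_path` are hypothetical names, and the default path is passed in by the caller instead of coming from cargo.

```rust
use std::env;
use std::path::PathBuf;

/// Picks the helper binary: a non-empty override wins, otherwise the
/// default (e.g. the path cargo provides for the `test` binary) is used.
fn resolve_helper(override_path: Option<String>, default_path: &str) -> PathBuf {
    match override_path {
        Some(path) if !path.is_empty() => PathBuf::from(path),
        _ => PathBuf::from(default_path),
    }
}

/// Reads the override from the TEST_HELPER environment variable.
fn test_helper_path(default_path: &str) -> PathBuf {
    resolve_helper(env::var("TEST_HELPER").ok(), default_path)
}
```

With something like this, the runner script only has to export TEST_HELPER pointing at the copy of the helper it pushed to the device.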

All tests pass in the emulator! Which gives a bit more confidence that there's nothing weird around SIGSTOP on android.

Jake-Shadle · 2024-04-02T07:45:26Z

> I can upstream this if we think it useful, otherwise compiling and running in the emulator is very likely less setup/friction.

I'd very much appreciate Android tests being run in this repo's CI if you can get it working reliably.

afranchuk · 2024-04-05T17:14:03Z

@Jake-Shadle I see in the CI that cross is already being used, but the test lines are commented out. Was that due to flakiness? I was considering possibly using https://github.com/ReactiveCircus/android-emulator-runner.

Jake-Shadle · 2024-04-05T17:34:19Z

Cross uses qemu for running test binaries, which doesn't support ptrace at all, unfortunately.

afranchuk · 2024-04-05T17:52:57Z

Oh I see. I'll try out this other action, it seems to make it a bit easier than manually writing all the stuff to fetch and start the emulator.

gabrielesvelto mentioned this issue Oct 26, 2023

Add soft errors to minidump #31

Open

afranchuk mentioned this issue Mar 20, 2024

Send SIGSTOP to linux/android processes before doing other procfs/ptrace things. #108

Merged

Jake-Shadle closed this as completed in #108 Mar 21, 2024
