
crashpad_handler processes persist after app close #994

Open
1 of 3 tasks
tomfclarke opened this issue May 24, 2024 · 10 comments

Comments

@tomfclarke

Description

Using version 0.7.2 of sentry-native, we are seeing crashpad_handler processes remain running after the application is closed, but only on Apple silicon Mac devices; the problem does not occur on Intel Macs. Comparing debug sessions on ARM and Intel Macs, it appears that the MACH_NOTIFY_NO_SENDERS message doesn't make it through to the handler (or isn't sent) on ARM devices.

When does the problem happen

  • During build
  • During run-time
  • When capturing a hard crash

Environment

  • OS: macOS Ventura, macOS Sonoma
  • Arch: aarch64
  • Compiler: AppleClang 15
  • CMake version and config: 3.27.8

Steps To Reproduce

  • Open an application that uses Sentry.
  • Observe the crashpad_handler process created in Activity Monitor.
  • Close the application.
  • Wait for the crashpad_handler process to exit (on Apple silicon, it never does).

Log output

Looks sane:

"sending envelope"
"submitting task to background worker thread"
"shutting down backend"
"executing task on worker thread"
"shutting down transport"
"shutting down background worker thread"
"submitting task to background worker thread"
"executing task on worker thread"
"background worker thread shut down"
@supervacuus (Collaborator)

Hi @tomfclarke, thanks for the report.

I have a couple of questions:

  • did this behavior change with 0.7.2 specifically?
  • did you run the sentry_example and see the same behavior?

Usually, when the crashpad_handler doesn't receive a MACH_NOTIFY_NO_SENDERS message (which comes directly from the kernel), it means that some process is still connected to the crashpad_handler.

This can happen when you start other processes from your application (which inherit send/recv rights from your process) that might still be running after your application is terminated. It recently happened to another user who ran multiple crashpad instances from the same process.
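
For context, this is roughly the mechanism involved. The following is a minimal sketch, not Crashpad's actual code: a Mach server asks the kernel to deliver MACH_NOTIFY_NO_SENDERS once every send right to its receive port is gone. If any process (e.g. an inherited child) still holds a send right, the notification never fires:

#include <mach/mach.h>
#include <mach/notify.h>
#include <stdio.h>

int main(void) {
    mach_port_t port;
    mach_port_t previous = MACH_PORT_NULL;

    // Allocate a receive right; clients would hold send rights to this port.
    kern_return_t kr = mach_port_allocate(mach_task_self(),
                                          MACH_PORT_RIGHT_RECEIVE, &port);
    if (kr != KERN_SUCCESS) return 1;

    // Ask the kernel to post a no-senders notification to the port itself
    // once the number of extant send rights drops to zero.
    kr = mach_port_request_notification(mach_task_self(), port,
                                        MACH_NOTIFY_NO_SENDERS, 0,
                                        port, MACH_MSG_TYPE_MAKE_SEND_ONCE,
                                        &previous);
    if (kr != KERN_SUCCESS) return 1;

    printf("waiting for MACH_NOTIFY_NO_SENDERS on port 0x%x\n", port);
    // A real server would now run a mach_msg() receive loop and shut down
    // when it receives a message with msgh_id == MACH_NOTIFY_NO_SENDERS.
    return 0;
}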

You can use lsmp to list all open mach-port send/recv-rights on the running crashpad_handler instance:

sudo lsmp -p <crashpad_handler_pid>

In particular, check the right-most column to see whether any of the listed processes might have been started from your application. You should be looking for non-system processes that hold send rights. Be aware that the crashpad_handler will have two recv rights associated with it (but they should belong to the same pid).

@tomfclarke (Author)

Hey @supervacuus

  • did this behavior change with 0.7.2 specifically?

No. We've seen this at least as far back as 0.5.0.

  • did you run the sentry_example and see the same behavior?

We weren't able to reproduce with sentry_example (perhaps it is something on our side).

This is what lsmp looks like after the application has been closed:
[Screenshot: lsmp output, 2024-05-22 16:59:34]

It doesn't appear that any non-system processes are present with send rights but maybe you're able to spot something I can't?

@supervacuus (Collaborator)

supervacuus commented May 29, 2024

It doesn't appear that any non-system processes are present with send rights but maybe you're able to spot something I can't?

No, the list looks sensible. But the mask of the exception port object at the end of your lsmp log caught my interest: Crashpad certainly doesn't register the exception port for EXC_RESOURCE alone.

Can you verify with lsmp that the state of the exc_port_object in the crashpad_handler is the same when your program is still running?

Can you also check what lsmp prints for your Intel machine (while your application is still running)?

Do you run the program in a sandbox (or any other restrictions) or as an x86_64 executable via Rosetta2 on your aarch64 machine?
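
(One quick way to verify which architectures your binary was built for; the path is a placeholder:

file <path-to-your-app-binary>

This prints the Mach-O architecture slices, e.g. arm64 and/or x86_64.)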

You can also try building the Native SDK as a Debug project, in which case the crashpad_handler logs will be sent to your application's stdout. They may reveal a non-fatal issue after sentry_init() starts the handler.
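
With the standard CMake workflow, that would look something like this (a sketch; adjust the generator and options to your setup):

cmake -B build -D CMAKE_BUILD_TYPE=Debug
cmake --build build --parallel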

@tomfclarke (Author)

Can you verify with lsmp that the state of the exc_port_object in the crashpad_handler is the same when your program is still running?

Confirmed. exc_port_object state does not change once the application quits.

Can you also check what lsmp prints for your Intel machine (while your application is still running)?

On Intel, lsmp output is largely the same, other than the send and receive counts being 1 higher than on ARM.
The line for the extra recv,send right, present on Intel but not ARM, looks like:

  name      ipc-object    rights     flags   boost  reqs  recv  send sonce oref  qlimit  msgcount  context            identifier  type
---------   ----------  ----------  -------- -----  ---- ----- ----- ----- ----  ------  --------  ------------------ ----------- ------------
0x00001707  0x795a0a19  recv,send   --------     0  ---      1     1         Y        5         0  0x0000000000000000

You can also try building the Native SDK as a Debug project, in which case the crashpad_handler logs will be sent to your application's stdout. They may reveal a non-fatal issue after sentry_init() starts the handler.

Here's the Sentry output (and some other potentially relevant output) from our app, built in a debug configuration. Debug output from Sentry is enabled with sentry_options_set_debug(). This is immediately after opening the app:

"using database path \"<path-to-database>\""
"starting transport"
"starting background worker thread"
"starting backend"
"background worker thread started"
"starting crashpad backend with handler \"<path-to-handler>\""
"using minidump URL \"<minidump-url>\""
[8368:16810278:20240530,110107.934540:INFO retry_upload_thread.cc:39] Start
"started crashpad client handler"
"processing and pruning old runs"
[8358:16810019] WARNING: Secure coding is automatically enabled for restorable state! However, not on all supported macOS versions of this application. Opt-in to secure coding explicitly by implementing NSApplicationDelegate.applicationSupportsSecureRestorableState:.
[8358:16810019] [plugin] AddInstanceForFactory: No factory registered for id <CFUUID 0x600003945940> F8BB1C28-BAE8-11D6-9C31-00039315CD46
[8358:16810019] [si_destination_compare] send failed: Invalid argument
[8358:16810019] [si_destination_compare] send failed: Undefined error: 0
[8358:16810019] [si_destination_compare] send failed: Invalid argument
[8358:16810363] [AMCP] 103939          HALC_ProxyIOContext.cpp:1328  HALC_ProxyIOContext::IOWorkLoop: skipping cycle due to overload
[8358:16810363] [AMCP] 103939          HALC_ProxyIOContext.cpp:1328  HALC_ProxyIOContext::IOWorkLoop: skipping cycle due to overload
[8358:16810363] [AMCP] 103939          HALC_ProxyIOContext.cpp:1328  HALC_ProxyIOContext::IOWorkLoop: skipping cycle due to overload
[8358:16810019] [miscellany] FAULT: <NSRemoteView: 0x2d3190660 com.apple.TextInputUI.xpc.CursorUIViewService TUICursorUIViewService> determined it was necessary to configure <TUINSWindow: 0x2d2ba7c60> to support remote view vibrancy

Does anything here look concerning?
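
For reference, our debug-logging setup is essentially this (a minimal sketch; the DSN and handler path are placeholders, not our real values):

#include <sentry.h>

int main(void) {
    sentry_options_t *options = sentry_options_new();
    sentry_options_set_dsn(options, "<our-dsn>");
    sentry_options_set_handler_path(options, "<path-to-handler>");
    sentry_options_set_debug(options, 1);  // enables the SDK's debug logging
    sentry_init(options);

    /* ... application runs ... */

    sentry_close();
    return 0;
}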

@tomfclarke (Author)

Do you run the program in a sandbox (or any other restrictions) or as an x86_64 executable via Rosetta2 on your aarch64 machine?

No sandbox. Our app is compiled for aarch64; it doesn't require Rosetta.

@supervacuus (Collaborator)

supervacuus commented May 30, 2024

Can you verify with lsmp that the state of the exc_port_object in the crashpad_handler is the same when your program is still running?

Confirmed. exc_port_object state does not change once the application quits.

Thx!

Can you also check what lsmp prints for your Intel machine (while your application is still running)?

On Intel, lsmp output is largely the same, other than the send and receive counts being 1 higher than on ARM.

Sorry, I meant specifically the exc_port_object line at the end of the output. Does the mask on your Intel machine also say RESOURCE only?

I would also be interested to see what the lsmp of the crashpad_handler started from sentry_example says about the exc_port_object mask on your aarch64 machine.

Does anything here look concerning?

Yes and no. This log looks standard. None of the logs after "processing and pruning old runs" are related to the Native SDK or its backends. The only log line that looks like it could originate from Crashpad is this one:

[8368:16810278:20240530,110107.934540:INFO retry_upload_thread.cc:39] Start

But it refers to a non-existent file, and the message looks cut off. The crashpad_handler has a crash_report_upload_thread.cc, and while the crashpad_handler would start that thread when booting, it wouldn't log anything during startup. Could the logs be mangled by multiple threads writing to stdout?

@supervacuus (Collaborator)

Do you run the program in a sandbox (or any other restrictions) or as an x86_64 executable via Rosetta2 on your aarch64 machine?

No sandbox. Our app is compiled for aarch64, it doesn't require Rosetta.

Thx, I am just trying to narrow down any potential barriers to a successful NO_SENDERS delivery.

@tomfclarke (Author)

Does the mask on your Intel machine also say RESOURCE only?

On both ARM and Intel there is only RESOURCE. I should mention that if a debugger is attached to the application the exc_port_object mask shows BAD_ACCESS BAD_INSTRUCTION ARITHMETIC EMULATION SOFTWARE BREAKPOINT SYSCALL MACH_SYSCALL RPC_ALERT RESOURCE GUARD. I'm not sure why attaching a debugger to the app changes this.

I would also be interested to see what the lsmp of the crashpad_handler started from sentry_example says about the exc_port_object mask on your aarch64 machine.

I've yet to try on aarch64, but I ran lsmp on sentry_example's crashpad_handler process on Intel, and it shows the same exc_port_object mask as the app (different when debugging, as above).

Could the logs be mangled due to multiple threads writing to stdout?

Your comment about retry_upload_thread.cc not existing led me to realise that we've added at least one modification/extension to external/crashpad. That file was created by us. I'll see if I can build with the vanilla version of the Sentry SDK and crashpad and test again.

We weren't able to reproduce with sentry_example

I also need to double-check this as I'm not convinced it successfully started the handler when this was checked.

@supervacuus (Collaborator)

Does the mask on your Intel machine also say RESOURCE only?

On both ARM and Intel there is only RESOURCE. I should mention that if a debugger is attached to the application the exc_port_object mask shows BAD_ACCESS BAD_INSTRUCTION ARITHMETIC EMULATION SOFTWARE BREAKPOINT SYSCALL MACH_SYSCALL RPC_ALERT RESOURCE GUARD. I'm not sure why attaching a debugger to the app changes this.

Okay, this may be a red herring if it is consistent across all your runs/devices. I just found it curious that the mask only shows EXC_RESOURCE. There is no need to investigate further. The debugger will extend the exception-port mask to catch all Mach exceptions (similar to how it typically registers to break on all signals on POSIX systems).
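
If you ever want to inspect this from inside the process, here is a minimal sketch using task_get_exception_ports (purely a diagnostic aid, not something Crashpad requires; run it within the task you want to inspect):

#include <mach/mach.h>
#include <stdio.h>

int main(void) {
    exception_mask_t masks[EXC_TYPES_COUNT];
    exception_handler_t handlers[EXC_TYPES_COUNT];
    exception_behavior_t behaviors[EXC_TYPES_COUNT];
    thread_state_flavor_t flavors[EXC_TYPES_COUNT];
    mach_msg_type_number_t count = EXC_TYPES_COUNT;

    // Query which exception masks have handlers registered on this task;
    // an attached debugger typically widens these to cover all Mach exceptions.
    kern_return_t kr = task_get_exception_ports(mach_task_self(), EXC_MASK_ALL,
                                                masks, &count, handlers,
                                                behaviors, flavors);
    if (kr != KERN_SUCCESS) return 1;

    for (mach_msg_type_number_t i = 0; i < count; i++)
        printf("handler %u: mask 0x%08x port 0x%x\n", i, masks[i], handlers[i]);
    return 0;
}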

Your comment about retry_upload_thread.cc not existing led me to realise that we've added at least one modification/extension to external/crashpad. That file was created by us. I'll see if I can build with the vanilla version of the Sentry SDK and crashpad and test again.

We weren't able to reproduce with sentry_example

I also need to double-check this as I'm not convinced it successfully started the handler when this was checked.

Ok, thanks. Let us know what you find out.

@tomfclarke (Author)

Just a small update: sentry_example has no issues, even with our modifications to crashpad. We're continuing to investigate potential differences between it and our app. We do spawn some child processes, so the next step is to test without those.
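
(If the children turn out to be the culprit, the same lsmp check against a child pid should show it holding a send right to the handler's port; <child_pid> is a placeholder:

sudo lsmp -p <child_pid>)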
