Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mono_gc_finalize_notify spinning after process exit #16803

Closed
Therzok opened this issue Sep 12, 2019 · 4 comments · Fixed by #17002
Closed

mono_gc_finalize_notify spinning after process exit #16803

Therzok opened this issue Sep 12, 2019 · 4 comments · Fixed by #17002

Comments

@Therzok
Copy link
Contributor

Therzok commented Sep 12, 2019

Steps to Reproduce

  1. N/A, Chris Hardy might have some insight on slack
  2. https://devdiv.visualstudio.com/DevDiv/_workitems/edit/975740

Current Behavior

VSMac process hangs

Expected Behavior

VSMac process does not hang

On which platforms did you notice this

[*] macOS
[ ] Linux
[ ] Windows

Version Used:

=== Mono Framework MDK ===

Runtime:
Mono 6.4.0.179 (2019-06/3608a495413) (64-bit)
Package version: 604000179

Stacktrace

Something happens on process exit and the process spins indefinitely:

2246 objc_msgSend  (in libobjc.A.dylib) + 29  [0x7fff6f9a085d]
  2246 altstack_handle_and_restore  (in libmonosgen-2.0.1.dylib) + 71  [0x104917bc7]  exceptions-amd64.c:870
    2246 mono_handle_native_crash  (in libmonosgen-2.0.1.dylib) + 213  [0x1048b68d5]  mini-exceptions.c:3328
      2246 mono_dump_native_crash_info  (in libmonosgen-2.0.1.dylib) + 995  [0x10491dd93]  mini-posix.c:1083
        2246 ???  (in <unknown binary>)  [0x103ed5000]
          2246 _sigtramp  (in libsystem_platform.dylib) + 29  [0x7fff70f0ab1d]
            2246 mono_sigchld_signal_handler  (in libmonosgen-2.0.1.dylib) + 18  [0x104980252]  w32process-unix.c:1159
               2246 mono_gc_finalize_notify  (in libmonosgen-2.0.1.dylib) + 80  [0x104aa9be0]  mono-os-semaphore.h:167

Please paste the stack trace here if available.
@marek-safar
Copy link
Member

@BrzVlad any thoughts on this?

@BrzVlad
Copy link
Member

BrzVlad commented Sep 13, 2019

This doesn't look GC related. Looks like something else crashes, that I can't really tell from the stacktrace. Maybe MERP ? @alexischr Did you investigate the other vsmac crash related to MERP ?

@lambdageek
Copy link
Member

We set up a SIGCHLD signal handler mono_sigchld_signal_handler, but the crash reporting code used fork to start up a supervisor process (its job is to kill the parent if the crash reporting gets stuck), and I wonder if that process is dying and we get a SIGCHLD signal and then try to fun the GC finalizer thread from inside crash reporting.

@lambdageek
Copy link
Member

lambdageek commented Sep 19, 2019

I think this is the same issue as (the SIGABRT part of) #16572.

So here is what is supposed to happen when mono gets a SIGABRT when MERP is enabled:

  1. Process (call it "A") gets a SIGABRT
  2. It collects a crash report
  3. In the abort handler it forks off a process "B" which is supposed to invoke MERP to send the crash data:

    mono/mono/mini/mini-posix.c

    Lines 1028 to 1062 in d02c121

    pid = (pid_t) fork ();
    #else
    g_assert_not_reached ();
    #endif
    #if defined (HAVE_PRCTL) && defined(PR_SET_PTRACER)
    if (pid > 0) {
    // Allow gdb to attach to the process even if ptrace_scope sysctl variable is set to
    // a value other than 0 (the most permissive ptrace scope). Most modern Linux
    // distributions set the scope to 1 which allows attaching only to direct children of
    // the current process
    prctl (PR_SET_PTRACER, pid, 0, 0, 0);
    }
    #endif
    #if defined(TARGET_OSX) && !defined(DISABLE_CRASH_REPORTING)
    if (!double_faulted && mono_merp_enabled ()) {
    if (pid == 0) {
    if (output) {
    gboolean merp_upload_success = mono_merp_invoke (crashed_pid, signal, output, &hashes);
    if (!merp_upload_success) {
    g_async_safe_printf("\nThe MERP upload step has failed.\n");
    } else {
    // Remove
    g_async_safe_printf("\nThe MERP upload step has succeeded.\n");
    mono_summarize_timeline_phase_log (MonoSummaryDone);
    }
    mono_summarize_toggle_assertions (FALSE);
    } else {
    g_async_safe_printf("\nMerp dump step not run, no dump created.\n");
    }
    }
    }
  4. The process B forks off a process "C" that does exec("open -a MERPuploader")

    mono/mono/utils/mono-merp.c

    Lines 275 to 302 in d02c121

    pid_t pid = (pid_t) fork ();
    // Only one we define on OSX
    if (pid == 0) {
    const char *open_path = "/usr/bin/open";
    const char *argvOpen[] = {open_path, "-a", config.merpGUIPath, NULL};
    execv (open_path, (char**)argvOpen);
    exit (-1);
    } else {
    int status;
    waitpid (pid, &status, 0);
    gboolean exit_success = FALSE;
    int exit_status = FALSE;
    while (TRUE) {
    if (waitpid(pid, &status, WUNTRACED | WCONTINUED) == -1)
    break;
    if (WIFEXITED(status)) {
    exit_status = WEXITSTATUS(status);
    exit_success = TRUE;
    invoke_success = exit_status == TRUE;
    break;
    } else if (WIFSIGNALED(status)) {
    break;
    }
    }
    }
  5. "B" waits for "C" (the open process) and its child (the actual MERP uploader)

The problem is that "B" inherits "A"'s signal handler for SIGCHLD, so when C terminates (right away since it's just open) B gets a SIGCHLD and mono_sigchld_signal_handler runs

MONO_SIGNAL_HANDLER_FUNC (static, mono_sigchld_signal_handler, (int _dummy, siginfo_t *info, void *context))
{
/*
* Don't want to do any complicated processing here so just wake up the finalizer thread which will call
* mono_w32process_signal_finished ().
*/
int old_errno = errno;
mono_gc_finalize_notify ();
mono_set_errno (old_errno);

The problem is the mono_gc_finalize_notify which is a semaphore post is running on a semaphore in "B" where it hasn't been initialized. So (apparently) it ends up hanging.

Stack Trace of "B"
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x0000000107e8fbe0 libmonosgen-2.0.dylib`mono_gc_finalize_notify [inlined] mono_os_sem_post at mono-os-semaphore.h:167:3 [opt]
    frame #1: 0x0000000107e8fbbb libmonosgen-2.0.dylib`mono_gc_finalize_notify [inlined] mono_coop_sem_post at mono-coop-semaphore.h:65 [opt]
    frame #2: 0x0000000107e8fbbb libmonosgen-2.0.dylib`mono_gc_finalize_notify at gc.c:795 [opt]
    frame #3: 0x0000000107d66232 libmonosgen-2.0.dylib`mono_sigchld_signal_handler(_dummy=, info=, context=) at w32process-unix.c:1159:2 [opt]
    frame #4: 0x00007fff5c7c0b5d libsystem_platform.dylib`_sigtramp + 29
    frame #5: 0x00007fff5c70f355 libsystem_kernel.dylib`__wait4 + 9
    frame #6: 0x0000000107eedd31 libmonosgen-2.0.dylib`mono_merp_invoke at mono-merp.c:283:3 [opt]
    frame #7: 0x0000000107eedd12 libmonosgen-2.0.dylib`mono_merp_invoke(crashed_pid=, signal=, non_param_data=, hashes=) at mono-merp.c:498 [opt]
    frame #8: 0x0000000107d03d73 libmonosgen-2.0.dylib`mono_dump_native_crash_info at mini-posix.c:1083:37 [opt]
    frame #9: 0x0000000107d03d1d libmonosgen-2.0.dylib`mono_dump_native_crash_info(signal="SIGABRT", mctx=, info=) at mini-posix.c:1158 [opt]
    frame #10: 0x0000000107c9c8b5 libmonosgen-2.0.dylib`mono_handle_native_crash(signal="SIGABRT", mctx=0x00007ffeeb34a0c8, info=0x00007ffeeb34a6a8) at mini-exceptions.c:3328:2 [opt]
    frame #11: 0x0000000107d02f71 libmonosgen-2.0.dylib`sigabrt_signal_handler(_dummy=6, _info=0x00007ffeeb34a6a8, context=0x00007ffeeb34a710) at mini-posix.c:234:3 [opt]
    frame #12: 0x00007fff5c7c0b5d libsystem_platform.dylib`_sigtramp + 29
    frame #13: 0x00007fff5c70a22b libsystem_kernel.dylib`mach_msg_trap + 11
    frame #14: 0x00007fff5c70a76c libsystem_kernel.dylib`mach_msg + 60
    frame #15: 0x00007fff306801ee CoreFoundation`__CFRunLoopServiceMachPort + 328
    frame #16: 0x00007fff3067f75c CoreFoundation`__CFRunLoopRun + 1612
    frame #17: 0x00007fff3067eebe CoreFoundation`CFRunLoopRunSpecific + 455
    frame #18: 0x00007fff2f8de1ab HIToolbox`RunCurrentEventLoopInMode + 292
    frame #19: 0x00007fff2f8ddee5 HIToolbox`ReceiveNextEventCommon + 603
    frame #20: 0x00007fff2f8ddc76 HIToolbox`_BlockUntilNextEventMatchingListInModeWithFilter + 64
    frame #21: 0x00007fff2dc7679d AppKit`_DPSNextEvent + 1135
    frame #22: 0x00007fff2dc7548b AppKit`-[NSApplication(NSEvent) _nextEventMatchingEventMask:untilDate:inMode:dequeue:] + 1361
    frame #23: 0x000000010e4ab797 libgdk-quartz-2.0.0.dylib`poll_func(ufds=0x00007fd0c017ebf0, nfds=2, timeout_=699) at gdkeventloop-quartz.c:779:11
    frame #24: 0x000000010dd985ab libglib-2.0.0.dylib`g_main_context_iterate + 366
    frame #25: 0x000000010dd9885b libglib-2.0.0.dylib`g_main_loop_run + 207
    frame #26: 0x000000010dfebfe2 libgtk-quartz-2.0.0.dylib`gtk_main at gtkmain.c:1257:7
    frame #27: 0x0000000117812328
    frame #28: 0x00000001083bf4bb
    frame #29: 0x00000001083bef93
    frame #30: 0x0000000107bf8d69 libmonosgen-2.0.dylib`mono_jit_runtime_invoke(method=, obj=, params=0x00007ffeeb34c4d8, exc=0x00007fd0c0840210, error=) at mini-runtime.c:3184:12 [opt]
    frame #31: 0x0000000107e16354 libmonosgen-2.0.dylib`do_runtime_invoke(method=0x00007fd0b7667298, obj=0x0000000000000000, params=0x00007ffeeb34c4d8, exc=0x0000000000000000, error=0x00007ffeeb34c518) at object.c:3017:11 [opt]
    frame #32: 0x0000000107e1a07c libmonosgen-2.0.dylib`do_exec_main_checked [inlined] mono_runtime_invoke_checked(method=, obj=, params=, error=0x00007ffeeb34c518) at object.c:3185:9 [opt]
    frame #33: 0x0000000107e1a06d libmonosgen-2.0.dylib`do_exec_main_checked(method=0x00007fd0b7667298, args=, error=0x00007ffeeb34c518) at object.c:5119 [opt]
    frame #34: 0x0000000107c5a4dd libmonosgen-2.0.dylib`mono_jit_exec at driver.g.c:1329:13 [opt]
    frame #35: 0x0000000107c5a4cf libmonosgen-2.0.dylib`mono_jit_exec(domain=, assembly=, argc=1, argv=0x00007fd0b7660560) at driver.g.c:1274 [opt]
    frame #36: 0x0000000107c5d688 libmonosgen-2.0.dylib`mono_main [inlined] main_thread_handler at driver.g.c:1406:3 [opt]
    frame #37: 0x0000000107c5d653 libmonosgen-2.0.dylib`mono_main(argc=3, argv=) at driver.g.c:2663 [opt]
    frame #38: 0x00000001048b4e25 VisualStudio`main + 1781
    frame #39: 0x00007fff5c5d53d5 libdyld.dylib`start + 1
    frame #40: 0x00007fff5c5d53d5 libdyld.dylib`start + 1

So I think the solution is to mask SIGCHLD in "B" (and maybe in "A", too?) so that it does not interrupt crash reporting once it has started.

@marek-safar marek-safar added this to the 2019-08 (6.6.xx) milestone Sep 24, 2019
monojenkins added a commit that referenced this issue Sep 26, 2019
#17078)

[2019-08] [crashing] Remove Mono signal handlers when starting to handle a crash

Fixes #16803

Backport of #17002.

/cc @marek-safar @alexischr
lambdageek pushed a commit to lambdageek/mono that referenced this issue Oct 1, 2019
jonpryor added a commit to xamarin/xamarin-android that referenced this issue Oct 11, 2019
jonpryor added a commit to jonpryor/xamarin-android that referenced this issue Oct 16, 2019
jonpryor added a commit to xamarin/xamarin-android that referenced this issue Oct 16, 2019
jonpryor added a commit to xamarin/xamarin-android that referenced this issue Oct 28, 2019
jonpryor pushed a commit to xamarin/xamarin-android that referenced this issue Dec 3, 2019
Changes: mono/api-snapshot@fc50bc4...45a61d9

        $ git diff --shortstat fc50bc4f...45a61d93
         22 files changed, 775 insertions(+), 474 deletions(-)

Changes: dotnet/cecil@a6c8f5e...a6a7f5c

        $ git diff --shortstat a6c8f5e1...a6a7f5c0
         55 files changed, 818 insertions(+), 530 deletions(-)

Changes: mono/corefx@1f87de3...49f1c45

        $ git diff --shortstat e4f7102b...49f1c453
         38 files changed, 1171 insertions(+), 419 deletions(-)

Changes: dotnet/linker@ebe2a1f...e8d054b

        $ git diff --shortstat ebe2a1f4...e8d054bf
         137 files changed, 5360 insertions(+), 1781 deletions(-)

Changes: mono/mono@8946e49...18920a8

        $ git diff --shortstat 8946e49a...18920a83
         1811 files changed, 47240 insertions(+), 48331 deletions(-)

Changes: xamarin/xamarin-android-api-compatibility@a61271e...50a3c52

        $ git diff --shortstat a61271e0...50a3c52d
         1 file changed, 2 insertions(+), 791 deletions(-)

Fixes: #3619

Context: https://dev.azure.com/devdiv/DevDiv/_workitems/edit/1005448
Context: https://devdiv.visualstudio.com/DefaultCollection/DevDiv/_workitems/edit/967582
Context: https://github.com/dotnet/coreclr/issues/26370
Context: https://github.com/dotnet/coreclr/issues/26479
Context: https://github.com/dotnet/corefx/issues/40455
Context: https://github.com/dotnet/corefx/issues/40578
Context: mono/mono#7377
Context: mono/mono#12421
Context: mono/mono#12586
Context: mono/mono#14080
Context: mono/mono#14725
Context: mono/mono#14772
Context: mono/mono#15261
Context: mono/mono#15262
Context: mono/mono#15263
Context: mono/mono#15307
Context: mono/mono#15308
Context: mono/mono#15310
Context: mono/mono#15646
Context: mono/mono#15687
Context: mono/mono#15805
Context: mono/mono#15992
Context: mono/mono#15994
Context: mono/mono#15999
Context: mono/mono#16032
Context: mono/mono#16034
Context: mono/mono#16046
Context: mono/mono#16192
Context: mono/mono#16308
Context: mono/mono#16310
Context: mono/mono#16369
Context: mono/mono#16380
Context: mono/mono#16381
Context: mono/mono#16395
Context: mono/mono#16411
Context: mono/mono#16415
Context: mono/mono#16486
Context: mono/mono#16570
Context: mono/mono#16605
Context: mono/mono#16616
Context: mono/mono#16689
Context: mono/mono#16701
Context: mono/mono#16712
Context: mono/mono#16742
Context: mono/mono#16759
Context: mono/mono#16803
Context: mono/mono#16808
Context: mono/mono#16824
Context: mono/mono#16876
Context: mono/mono#16879
Context: mono/mono#16918
Context: mono/mono#16943
Context: mono/mono#16950
Context: mono/mono#16974
Context: mono/mono#17004
Context: mono/mono#17017
Context: mono/mono#17038
Context: mono/mono#17040
Context: mono/mono#17083
Context: mono/mono#17084
Context: mono/mono#17133
Context: mono/mono#17139
Context: mono/mono#17151
Context: mono/mono#17180
Context: mono/mono#17278
Context: mono/mono#17549
Context: mono/mono#17569
Context: mono/mono#17665
Context: mono/mono#17687
Context: mono/mono#17737
Context: mono/mono#17790
Context: mono/mono#17924
Context: mono/mono#17931
Context: https://github.com/mono/mono/issues/26758
Context: https://github.com/mono/mono/issues/37913
Context: xamarin/xamarin-macios#7005
jonpryor pushed a commit to xamarin/xamarin-android that referenced this issue Dec 3, 2019
Changes: mono/api-snapshot@fc50bc4...45a61d9

        $ git diff --shortstat fc50bc4f...45a61d93
         22 files changed, 775 insertions(+), 474 deletions(-)

Changes: dotnet/cecil@a6c8f5e...a6a7f5c

        $ git diff --shortstat a6c8f5e1...a6a7f5c0
         55 files changed, 818 insertions(+), 530 deletions(-)

Changes: mono/corefx@1f87de3...49f1c45

        $ git diff --shortstat e4f7102b...49f1c453
         38 files changed, 1171 insertions(+), 419 deletions(-)

Changes: dotnet/linker@ebe2a1f...e8d054b

        $ git diff --shortstat ebe2a1f4...e8d054bf
         137 files changed, 5360 insertions(+), 1781 deletions(-)

Changes: mono/mono@8946e49...18920a8

        $ git diff --shortstat 8946e49a...18920a83
         1811 files changed, 47240 insertions(+), 48331 deletions(-)

Changes: xamarin/xamarin-android-api-compatibility@a61271e...50a3c52

        $ git diff --shortstat a61271e0...50a3c52d
         1 file changed, 2 insertions(+), 791 deletions(-)

Fixes: #3619

Context: https://dev.azure.com/devdiv/DevDiv/_workitems/edit/1005448
Context: https://devdiv.visualstudio.com/DefaultCollection/DevDiv/_workitems/edit/967582
Context: https://github.com/dotnet/coreclr/issues/26370
Context: https://github.com/dotnet/coreclr/issues/26479
Context: https://github.com/dotnet/corefx/issues/40455
Context: https://github.com/dotnet/corefx/issues/40578
Context: mono/mono#7377
Context: mono/mono#12421
Context: mono/mono#12586
Context: mono/mono#14080
Context: mono/mono#14725
Context: mono/mono#14772
Context: mono/mono#15261
Context: mono/mono#15262
Context: mono/mono#15263
Context: mono/mono#15307
Context: mono/mono#15308
Context: mono/mono#15310
Context: mono/mono#15646
Context: mono/mono#15687
Context: mono/mono#15805
Context: mono/mono#15992
Context: mono/mono#15994
Context: mono/mono#15999
Context: mono/mono#16032
Context: mono/mono#16034
Context: mono/mono#16046
Context: mono/mono#16192
Context: mono/mono#16308
Context: mono/mono#16310
Context: mono/mono#16369
Context: mono/mono#16380
Context: mono/mono#16381
Context: mono/mono#16395
Context: mono/mono#16411
Context: mono/mono#16415
Context: mono/mono#16486
Context: mono/mono#16570
Context: mono/mono#16605
Context: mono/mono#16616
Context: mono/mono#16689
Context: mono/mono#16701
Context: mono/mono#16712
Context: mono/mono#16742
Context: mono/mono#16759
Context: mono/mono#16803
Context: mono/mono#16808
Context: mono/mono#16824
Context: mono/mono#16876
Context: mono/mono#16879
Context: mono/mono#16918
Context: mono/mono#16943
Context: mono/mono#16950
Context: mono/mono#16974
Context: mono/mono#17004
Context: mono/mono#17017
Context: mono/mono#17038
Context: mono/mono#17040
Context: mono/mono#17083
Context: mono/mono#17084
Context: mono/mono#17133
Context: mono/mono#17139
Context: mono/mono#17151
Context: mono/mono#17180
Context: mono/mono#17278
Context: mono/mono#17549
Context: mono/mono#17569
Context: mono/mono#17665
Context: mono/mono#17687
Context: mono/mono#17737
Context: mono/mono#17790
Context: mono/mono#17924
Context: mono/mono#17931
Context: https://github.com/mono/mono/issues/26758
Context: https://github.com/mono/mono/issues/37913
Context: xamarin/xamarin-macios#7005
@Therzok Therzok added this to Needs triage in VS4M Tracker via automation Dec 23, 2019
@Therzok Therzok moved this from Needs triage to Closed in VS4M Tracker Dec 23, 2019
ManickaP pushed a commit to ManickaP/runtime that referenced this issue Jan 20, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
VS4M Tracker
  
Closed
Development

Successfully merging a pull request may close this issue.

4 participants