Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible GC hang while building msbuild with mono/2018-08 and mono/master #10863

Closed
radical opened this issue Sep 27, 2018 · 9 comments

Comments

@radical
Copy link
Member

@radical radical commented Sep 27, 2018

Mono version: Mono JIT compiler version 5.18.0.103 (2018-08/56bed71f118 Tue Sep 18 07:06:42 EDT 2018)

Steps:

  1. This seemed to help with reproducing the issue: nuget locals all -Clear, but not always.
  2. Build msbuild from xplat-master branch (HEAD: 17218b611a97315eaef9fa4a8add5ef049a2ff8e) with
    $ make
  3. This gets stuck at the bootstrapped build while restoring packages. This is not reproducible every time though.

I attached to the process with lldb and ran monobt all and got https://gist.github.com/radical/b8130fcb125714737d5515f53b4c5031 .
IIUC, some of the threads are waiting *Threading.Wait*, and some seem to be waiting for the gc to resume them. Thread #19 seems to be
the one which causes the gc to kick in:

  thread #19
    frame #0: 0x00007fff7e735246 libsystem_kernel.dylib`semaphore_wait_trap + 10
    frame #1: 0x0000000108f1c678 mono`mono_os_sem_timedwait + 344
    frame #2: 0x0000000108f1bec5 mono`mono_threads_wait_pending_operations + 101
    frame #3: 0x0000000108ebff93 mono`sgen_client_stop_world + 563
    frame #4: 0x0000000108ed32d6 mono`sgen_stop_world + 70
    frame #5: 0x0000000108ed005d mono`sgen_perform_collection + 157
    frame #6: 0x0000000108ec5a40 mono`sgen_alloc_obj_nolock + 928
    frame #7: 0x0000000108ec2697 mono`mono_gc_alloc_vector + 119
    frame #8: 0x109176fd9 (wrapper managed-to-native) object:__icall_wrapper_mono_gc_alloc_vector (intptr,intptr,intptr) [{0x7fcbfd002958} + 0x89]  (0x109176f50 0x109177055) [0x7fcbfbd0dd60 - MSBuild.dll]
    frame #9: 0x000000010b14b6ad mscorlib.dll.dylib`System_Collections_Generic_Dictionary_2_TKey_REF_TValue_REF_Initialize_int + 157
    frame #10: 0x000000010b1496f9 mscorlib.dll.dylib`System_Collections_Generic_Dictionary_2_TKey_REF_TValue_REF__ctor_int_System_Collections_Generic_IEqualityComparer_1_TKey_REF + 73
    frame #11: 0x10e7e7bc3 Microsoft.Build.Internal.CommunicationsUtilities:GetEnvironmentVariables () [{0x7fcbfc8f59a0} + 0xb3]  (0x10e7e7b10 0x10e7e8192) [0x7fcbfbd0dd60 - MSBuild.dll]
    frame #12: 0x10ed05473 Microsoft.Build.BackEnd.RequestBuilder:SetEnvironmentVariableBlock (System.Collections.Generic.IDictionary`2<string, string>) [{0x7fcbfd162870} + 0x43]  (0x10ed05430 0x10ed054b9) [0x7fcbfbd0dd60 - MSBuild.dll]
    frame #13: 0x000000010affc651 mscorlib.dll.dylib`System_Runtime_CompilerServices_AsyncMethodBuilderCore_MoveNextRunner_InvokeMoveNext_object + 177
    frame #14: 0x000000010aeff4eb mscorlib.dll.dylib`System_Threading_ExecutionContext_RunInternal_System_Threading_ExecutionContext_System_Threading_ContextCallback_object_bool + 427
    frame #15: 0x000000010aeff333 mscorlib.dll.dylib`System_Threading_ExecutionContext_Run_System_Threading_ExecutionContext_System_Threading_ContextCallback_object_bool + 67
    frame #16: 0x000000010affc4fa mscorlib.dll.dylib`System_Runtime_CompilerServices_AsyncMethodBuilderCore_MoveNextRunner_Run + 186
    frame #17: 0x000000010af3d1b9 mscorlib.dll.dylib`System_Threading_Tasks_TaskSchedulerAwaitTaskContinuation__c__Runb__2_0_object + 89
    frame #18: 0x000000010af3346c mscorlib.dll.dylib`System_Threading_Tasks_Task_InnerInvoke + 156
    frame #19: 0x000000010af33188 mscorlib.dll.dylib`System_Threading_Tasks_Task_Execute + 56
    frame #20: 0x000000010af333bc mscorlib.dll.dylib`System_Threading_Tasks_Task_ExecutionContextCallback_object + 92
    frame #21: 0x000000010aeff4eb mscorlib.dll.dylib`System_Threading_ExecutionContext_RunInternal_System_Threading_ExecutionContext_System_Threading_ContextCallback_object_bool + 427
    frame #22: 0x000000010aeff333 mscorlib.dll.dylib`System_Threading_ExecutionContext_Run_System_Threading_ExecutionContext_System_Threading_ContextCallback_object_bool + 67
    frame #23: 0x000000010af39a37 mscorlib.dll.dylib`System_Threading_Tasks_Task_ExecuteWithThreadLocal_System_Threading_Tasks_Task_ + 247
    frame #24: 0x000000010af332f9 mscorlib.dll.dylib`System_Threading_Tasks_Task_ExecuteEntry_bool + 233
    frame #25: 0x000000010af42784 mscorlib.dll.dylib`System_Threading_Tasks_TaskScheduler_TryExecuteTask_System_Threading_Tasks_Task + 68
    frame #26: 0x10ed16d43 Microsoft.Build.BackEnd.RequestBuilder/DedicatedThreadsTaskScheduler:<InjectThread>b__6_0 () [{0x7fcbfd162d98} + 0xc3]  (0x10ed16c80 0x10ed16e23) [0x7fcbfbd0dd60 - MSBuild.dll]
    frame #27: 0x000000010af01f2b mscorlib.dll.dylib`System_Threading_ThreadHelper_ThreadStart_Context_object + 171
    frame #28: 0x000000010aeff4eb mscorlib.dll.dylib`System_Threading_ExecutionContext_RunInternal_System_Threading_ExecutionContext_System_Threading_ContextCallback_object_bool + 427
    frame #29: 0x000000010aeff333 mscorlib.dll.dylib`System_Threading_ExecutionContext_Run_System_Threading_ExecutionContext_System_Threading_ContextCallback_object_bool + 67
    frame #30: 0x000000010aeff2a8 mscorlib.dll.dylib`System_Threading_ExecutionContext_Run_System_Threading_ExecutionContext_System_Threading_ContextCallback_object + 104
    frame #31: 0x000000010af020b3 mscorlib.dll.dylib`System_Threading_ThreadHelper_ThreadStart + 67
    frame #32: 0x1093701a9 (wrapper runtime-invoke) object:runtime_invoke_void__this__ (object,intptr,intptr,intptr) [{0x7fcbfd049590} + 0xa9]  (0x109370100 0x109370278) [0x7fcbfbd0dd60 - MSBuild.dll]
    frame #33: 0x0000000108c3c203 mono`mono_jit_runtime_invoke + 1619
    frame #34: 0x0000000108e39254 mono`do_runtime_invoke + 84
    frame #35: 0x0000000108e3b01b mono`mono_runtime_delegate_try_invoke + 203
    frame #36: 0x0000000108e5ee6e mono`start_wrapper + 750
    frame #37: 0x00007fff7e906661 libsystem_pthread.dylib`_pthread_body + 340
    frame #38: 0x00007fff7e90650d libsystem_pthread.dylib`_pthread_start + 377
    frame #39: 0x00007fff7e905bf9 libsystem_pthread.dylib`thread_start + 13

The PR #10339 for building msbuild xplat-master with mono/master is failing on the CI. See the logs for OS X Extended build (https://jenkins.mono-project.com/job/test-mono-pull-request-mac-sdk/105/).

@lambdageek

This comment has been minimized.

Copy link
Member

@lambdageek lambdageek commented Sep 27, 2018

Might be hybrid suspend.

Could you try with MONO_THREADS_SUSPEND=preemptive a few times and see if that seems to make it work?

@lambdageek

This comment has been minimized.

Copy link
Member

@lambdageek lambdageek commented Sep 27, 2018

oh, one useful piece of information: try setting MONO_SLEEP_ABORT_LIMIT=5000 (or some other high enough number - it's a time in milliseconds). That will turn GC suspend hangs over 5 seconds into fatal errors and it will print out a scorecard of thread statuses - that would be really helpful to me to try to figure out why it's hanging.

@radical

This comment has been minimized.

Copy link
Member Author

@radical radical commented Sep 27, 2018

@lambdageek MONO_SLEEP_ABORT_LIMIT=n + MONO_THREADS_SUSPEND=preemptive?

@lambdageek

This comment has been minimized.

Copy link
Member

@lambdageek lambdageek commented Sep 27, 2018

@radical no, separately. if it works with preemptive, then we know it's a hybrid suspend bug. in that case, a sleep abort limit will help to understand what the threads are doing

@radical

This comment has been minimized.

Copy link
Member Author

@radical radical commented Sep 27, 2018

With just MONO_SLEEP_ABORT_LIMIT=5000 I get https://gist.github.com/radical/b2273f9a59c06501169fabf13f6c238a .

@lambdageek

This comment has been minimized.

Copy link
Member

@lambdageek lambdageek commented Sep 27, 2018

Aha, that helps. The finalizer thread is in blocking state which is suspicious. In your original backtrace it looks like it's trying to take a GC lock to change its thread state.

So I think we raced between the finalizer trying to set itself up to be ignored and the GC trying to suspend it. So definitely a hybrid/coop bug.

Thanks!

@marek-safar

This comment has been minimized.

Copy link
Member

@marek-safar marek-safar commented Sep 28, 2018

@lambdageek thanks for helping to track that down. Is my understanding correct that this will affect 2018-08 as well?

@marek-safar marek-safar added this to the 2018-08 (5.18.xx) milestone Sep 28, 2018
@lambdageek

This comment has been minimized.

Copy link
Member

@lambdageek lambdageek commented Sep 28, 2018

@marek-safar Correct.

@lambdageek

This comment has been minimized.

Copy link
Member

@lambdageek lambdageek commented Oct 1, 2018

Good news (in that it's a simpler bug to fix): it's not the finalizer (the fact that it's in BLOCKING is misleading because the finalizer sets a flag so that the GC will skip it while the finalizer is sleeping).

The actual problem is this thread:

--thread 0x7f9cc5037200 id 0x700012265000 [0x6503] (Thread Pool Worker) state 105

It's calling mono_w32handle_lock_handles which calls nanosleep in a loop without any thread state transitions. Fix coming shortly.

lambdageek added a commit to lambdageek/mono that referenced this issue Oct 1, 2018
If another thread locked at least one of the other handles, and there is a
cooperative GC STW happening, this function will loop forever.

Add GC Safe transitions around the call to sleep to give this thread a chance
to suspend.

Fixes mono#10863
monojenkins added a commit to monojenkins/mono that referenced this issue Oct 1, 2018
If another thread locked at least one of the other handles, and there is a
cooperative GC STW happening, this function will loop forever.

Add GC Safe transitions around the call to sleep to give this thread a chance
to suspend.

Fixes mono#10863
monojenkins added a commit to monojenkins/mono that referenced this issue Oct 2, 2018
If another thread locked at least one of the other handles, and there is a
cooperative GC STW happening, this function will loop forever.

Add GC Safe transitions around the call to sleep to give this thread a chance
to suspend.

Fixes mono#10863
lambdageek added a commit that referenced this issue Oct 2, 2018
If another thread locked at least one of the other handles, and there is a
cooperative GC STW happening, this function will loop forever.

Add GC Safe transitions around the call to sleep to give this thread a chance
to suspend.

Fixes #10863
akoeplinger added a commit that referenced this issue Oct 2, 2018
…#10911)

If another thread locked at least one of the other handles, and there is a
cooperative GC STW happening, this function will loop forever.

Add GC Safe transitions around the call to sleep to give this thread a chance
to suspend.

Fixes #10863
akoeplinger added a commit that referenced this issue Oct 2, 2018
…#10910)

If another thread locked at least one of the other handles, and there is a
cooperative GC STW happening, this function will loop forever.

Add GC Safe transitions around the call to sleep to give this thread a chance
to suspend.

Fixes #10863
jonpryor added a commit to xamarin/xamarin-android that referenced this issue Dec 6, 2018
Bumps to mono/api-snapshot@b99fc87.
Bumps to mono/bockbuild@5af573e.
Bumps to mono/boringssl@41221b4.
Bumps to mono/corefx@23d0b58.
Bumps to mono/corert@af496fc.
Bumps to mono/linker@7af03ce.
Bumps to mono/NUnitLite@00e259a.
Bumps to mono/reference-assemblies@9325826.
Bumps to mono/roslyn-binaries@249709f.
Bumps to mono/xunit-binaries@bb58347.

	$ git diff --shortstat b63e5378..23f2024a      # mono 
	 1630 files changed, 50926 insertions(+), 92212 deletions(-)

Fixes: mono/mono#6352
Fixes: mono/mono#6947
Fixes: mono/mono#6992
Fixes: mono/mono#7615
Fixes: mono/mono#8340
Fixes: mono/mono#8407
Fixes: mono/mono#8575
Fixes: mono/mono#8627
Fixes: mono/mono#8707
Fixes: mono/mono#8766
Fixes: mono/mono#8848
Fixes: mono/mono#8866
Fixes: mono/mono#8935
Fixes: mono/mono#9010
Fixes: mono/mono#9023
Fixes: mono/mono#9031
Fixes: mono/mono#9033
Fixes: mono/mono#9106
Fixes: mono/mono#9109
Fixes: mono/mono#9155
Fixes: mono/mono#9179
Fixes: mono/mono#9232
Fixes: mono/mono#9234
Fixes: mono/mono#9262
Fixes: mono/mono#9277
Fixes: mono/mono#9292
Fixes: mono/mono#9318
Fixes: mono/mono#9318
Fixes: mono/mono#9332
Fixes: mono/mono#9407
Fixes: mono/mono#9421
Fixes: mono/mono#9505
Fixes: mono/mono#9542
Fixes: mono/mono#9581
Fixes: mono/mono#9623
Fixes: mono/mono#9684
Fixes: mono/mono#9750
Fixes: mono/mono#9753
Fixes: mono/mono#9772
Fixes: mono/mono#9839
Fixes: mono/mono#9869
Fixes: mono/mono#9921
Fixes: mono/mono#9943
Fixes: mono/mono#9947
Fixes: mono/mono#9973
Fixes: mono/mono#9996
Fixes: mono/mono#10000
Fixes: mono/mono#10031
Fixes: mono/mono#10035
Fixes: mono/mono#10227
Fixes: mono/mono#10243
Fixes: mono/mono#10303
Fixes: mono/mono#10448
Fixes: mono/mono#10483
Fixes: mono/mono#10488
Fixes: mono/mono#10863
Fixes: mono/mono#11123
Fixes: mono/mono#11138
Fixes? mono/mono#11146
Fixes: mono/mono#11202
Fixes: mono/mono#11378
Fixes: mono/mono#11479
Fixes: mono/mono#11613
Fixes: #1951
Fixes: xamarin/xamarin-macios#4347
Fixes: xamarin/xamarin-macios#4617
Fixes: xamarin/xamarin-macios#4984
jonpryor added a commit to xamarin/xamarin-android that referenced this issue Apr 24, 2019
Bumps to mono/api-snapshot@ae01378
Bumps to mono/reference-assemblies@e5173a5
Bumps to mono/bockbuild@d30329d
Bumps to mono/boringssl@3d87996
Bumps to mono/corefx@72f7d76
Bumps to mono/corert@1b7d4a1
Bumps to mono/helix-binaries@7e893ea
Bumps to mono/illinker-test-assets@f21ff68
Bumps to mono/linker@13d864e
Bumps to mono/llvm@1aaaaa5 [mono]
Bumps to mono/llvm@2c2cffe [xamarin-android]
Bumps to mono/NUnitLite@0029561
Bumps to mono/roslyn-binaries@0bbc9b4
Bumps to mono/xunit-binaries@8f6e62e

	$ git diff --shortstat 886c4901..e66c7667      # mono
        3597 files changed, 350850 insertions(+), 91128 deletions(-)
	$ git diff --shortstat 349752c464c5fc93b32e7d45825f2890c85c8b7d..2c2cffedf01e0fe266b9aaad2c2563e05b750ff4
	 240 files changed, 18562 insertions(+), 6581 deletions(-)

Context: dotnet/coreclr#22046

Fixes: CVE 2018-8292 on macOS
Fixes: http://work.devdiv.io/737323
Fixes: dotnet/corefx#33965
Fixes: dotnet/standard#642
Fixes: mono/mono#6997
Fixes: mono/mono#7326
Fixes: mono/mono#7517
Fixes: mono/mono#7750
Fixes: mono/mono#7859
Fixes: mono/mono#8360
Fixes: mono/mono#8460
Fixes: mono/mono#8766
Fixes: mono/mono#8922
Fixes: mono/mono#9418
Fixes: mono/mono#9507
Fixes: mono/mono#9951
Fixes: mono/mono#10024
Fixes: mono/mono#10030
Fixes: mono/mono#10038
Fixes: mono/mono#10448
Fixes: mono/mono#10735
Fixes: mono/mono#10735
Fixes: mono/mono#10737
Fixes: mono/mono#10743
Fixes: mono/mono#10834
Fixes: mono/mono#10837
Fixes: mono/mono#10838
Fixes: mono/mono#10863
Fixes: mono/mono#10945
Fixes: mono/mono#11020
Fixes: mono/mono#11021
Fixes: mono/mono#11021
Fixes: mono/mono#11049
Fixes: mono/mono#11091
Fixes: mono/mono#11095
Fixes: mono/mono#11123
Fixes: mono/mono#11138
Fixes: mono/mono#11146
Fixes: mono/mono#11202
Fixes: mono/mono#11214
Fixes: mono/mono#11317
Fixes: mono/mono#11326
Fixes: mono/mono#11378
Fixes: mono/mono#11385
Fixes: mono/mono#11478
Fixes: mono/mono#11479
Fixes: mono/mono#11488
Fixes: mono/mono#11489
Fixes: mono/mono#11527
Fixes: mono/mono#11529
Fixes: mono/mono#11596
Fixes: mono/mono#11603
Fixes: mono/mono#11613
Fixes: mono/mono#11623
Fixes: mono/mono#11663
Fixes: mono/mono#11681
Fixes: mono/mono#11684
Fixes: mono/mono#11693
Fixes: mono/mono#11697
Fixes: mono/mono#11779
Fixes: mono/mono#11809
Fixes: mono/mono#11858
Fixes: mono/mono#11895
Fixes: mono/mono#11898
Fixes: mono/mono#11898
Fixes: mono/mono#11965
Fixes: mono/mono#12182
Fixes: mono/mono#12193
Fixes: mono/mono#12218
Fixes: mono/mono#12235
Fixes: mono/mono#12263
Fixes: mono/mono#12307
Fixes: mono/mono#12331
Fixes: mono/mono#12362
Fixes: mono/mono#12374
Fixes: mono/mono#12402
Fixes: mono/mono#12421
Fixes: mono/mono#12461
Fixes: mono/mono#12479
Fixes: mono/mono#12479
Fixes: mono/mono#12552
Fixes: mono/mono#12603
Fixes: mono/mono#12747
Fixes: mono/mono#12831
Fixes: mono/mono#12843
Fixes: mono/mono#12881
Fixes: mono/mono#13030
Fixes: mono/mono#13284
Fixes: mono/mono#13297
Fixes: mono/mono#13455
Fixes: mono/mono#13460
Fixes: mono/mono#13478
Fixes: mono/mono#13479
Fixes: mono/mono#13522
Fixes: mono/mono#13607
Fixes: mono/mono#13610
Fixes: mono/mono#13610
Fixes: mono/mono#13639
Fixes: mono/mono#13672
Fixes: mono/mono#13834
Fixes: mono/mono#13878
Fixes: mono/mono#6352
Fixes: mono/monodevelop#6898
Fixes: xamarin/maccore#1069
Fixes: xamarin/maccore#1407
Fixes: xamarin/maccore#604
Fixes: xamarin/xamarin-macios#4984
Fixes: xamarin/xamarin-macios#5289
Fixes: xamarin/xamarin-macios#5363
Fixes: xamarin/xamarin-macios#5381
Fixes: https://issuetracker.unity3d.com/issues/editor-crashes-with-g-logv-when-entering-play-mode-with-active-flowcanvas-script
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.