Backport "Don't check if the memory is in core (#64)" to v1.3-stable #155

Closed
k15tfu opened this issue Jan 28, 2020 · 3 comments

Comments

@k15tfu

k15tfu commented Jan 28, 2020

Hi!

I set up continuous integration on several Linux distros, and it turned out that our tests fail under .NET Core 1.1 on Fedora 30/31/32 but pass on CentOS, Debian, Ubuntu, Fedora 29, etc.

Sometimes it fails with an abort() called from here (if you're interested):

Program terminated with signal SIGABRT, Aborted.
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50        return ret;
[Current thread is 1 (Thread 0x7fa75c48e700 (LWP 16915))]
Missing separate debuginfos, use: dnf debuginfo-install compat-libicu60-60.2-2.fc29.x86_64 libgcc-9.2.1-1.fc32.x86_64 libstdc++-9.2.1-1.fc32.x86_64 libunwind-1.3.1-3.fc31.x86_64 libuuid-2.35-0.5.fc32.x86_64 sssd-client-2.2.2-1.fc32.x86_64
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007fa76197e899 in __GI_abort () at abort.c:79
#2  0x00007fa760f11f4b in PROCEndProcess (hProcess=<optimized out>, uExitCode=2148734214, bTerminateUnconditionally=1) at /root/coreclr/src/pal/src/thread/process.cpp:1385
#3  0x00007fa760b8ea3b in SafeExitProcess (exitCode=2148734214, fAbort=1, sca=SCA_ExitProcessWhenShutdownComplete) at /root/coreclr/src/vm/eepolicy.cpp:579
#4  0x00007fa760b8ff88 in EEPolicy::HandleFatalError (exitCode=2148734214, address=140356859828821, pszMessage=0x0, pExceptionInfo=0x7fa75c48bad0) at /root/coreclr/src/vm/eepolicy.cpp:1506
#5  0x00007fa760c59a66 in LazyMachState::unwindLazyState (baseState=<optimized out>, unwoundState=0x7fa75c48c1f0, threadId=<optimized out>, funCallDepth=<optimized out>, hostCallPreference=AllowHostCalls)
    at /root/coreclr/src/vm/amd64/gmsamd64.cpp:69
#6  0x00007fa760ac5d5e in HelperMethodFrame::InsureInit (this=0x7ffeab5528f0, initialInit=<optimized out>, unwindState=<optimized out>, hostCallPreference=AllowHostCalls) at /root/coreclr/src/vm/frames.cpp:1890
#7  0x00007fa760ac5b62 in HelperMethodFrame::GetFunction (this=0x7ffeab5528f0) at /root/coreclr/src/vm/frames.cpp:1808
#8  0x00007fa760b2332b in StackFrameIterator::ProcessCurrentFrame (this=0x7fa75c48c3d0) at /root/coreclr/src/vm/stackwalk.cpp:2993
#9  0x00007fa760b24d4d in StackFrameIterator::NextRaw (this=0x7fa75c48c3d0) at /root/coreclr/src/vm/stackwalk.cpp:2743
#10 0x00007fa760b22ab8 in StackFrameIterator::Next (this=<optimized out>) at /root/coreclr/src/vm/stackwalk.cpp:1615
#11 Thread::StackWalkFramesEx (this=0x121a600, pRD=<optimized out>, pCallback=0x7fa760ba1020 <GcStackCrawlCallBack(CrawlFrame*, void*)>, pData=0x7fa75c48d890, flags=34048, pStartFrame=0x0) at /root/coreclr/src/vm/stackwalk.cpp:966
#12 0x00007fa760b22ead in Thread::StackWalkFrames (this=0x121a600, pCallback=0x7fa760ba1020 <GcStackCrawlCallBack(CrawlFrame*, void*)>, pData=0x7fa75c48d890, flags=34048, pStartFrame=0x0) at /root/coreclr/src/vm/stackwalk.cpp:1043
#13 0x00007fa760ba1975 in ScanStackRoots (fn=<optimized out>, sc=<optimized out>, pThread=<optimized out>) at /root/coreclr/src/vm/gcenv.ee.cpp:544
#14 GCToEEInterface::GcScanRoots (fn=<optimized out>, condemned=<optimized out>, max_gen=<optimized out>, sc=0x7fa75c48d940) at /root/coreclr/src/vm/gcenv.ee.cpp:573
#15 0x00007fa760d9f74a in WKS::gc_heap::mark_phase (condemned_gen_number=2, mark_only_p=0) at /root/coreclr/src/gc/gc.cpp:19490
#16 0x00007fa760d9cb74 in WKS::gc_heap::gc1 () at /root/coreclr/src/gc/gc.cpp:15233
#17 0x00007fa760da709d in WKS::gc_heap::garbage_collect (n=<optimized out>) at /root/coreclr/src/gc/gc.cpp:16751
#18 0x00007fa760d9979f in WKS::GCHeap::GarbageCollectGeneration (this=<optimized out>, gen=<optimized out>, reason=WKS::reason_induced) at /root/coreclr/src/gc/gc.cpp:35231
#19 0x00007fa760dc1b89 in WKS::GCHeap::GarbageCollectTry (generation=<optimized out>, mode=<optimized out>, this=<optimized out>, low_memory_p=<optimized out>) at /root/coreclr/src/gc/gc.cpp:34846
#20 WKS::GCHeap::GarbageCollect (this=<optimized out>, generation=<optimized out>, low_memory_p=<optimized out>, mode=<optimized out>) at /root/coreclr/src/gc/gc.cpp:34786
#21 0x00007fa760c396ca in ETW::GCLog::ForceGCForDiagnostics () at /root/coreclr/src/vm/eventtrace.cpp:1036
#22 0x00007fa760be13ab in ProfToEEInterfaceImpl::ForceGC (this=<optimized out>) at /root/coreclr/src/vm/proftoeeinterfaceimpl.cpp:4823
#23 0x00007fa75e83c948
#24 0x00007fa75e83eabe
#25 0x00007fa75e83e972
#26 0x00007fa75e83c561
#27 0x00007fa75e83b384
#28 0x00007fa761e86482 in start_thread (arg=<optimized out>) at pthread_create.c:477
#29 0x00007fa761a5a583 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

This means CoreCLR was unable to unwind the stack. It happens regularly on Fedora 30 and later, which ship libunwind 1.3.1 as the default version, but not on the other distros, which use libunwind 1.2.1.

While trying to figure out the problem, I found that it had already been fixed in v1.4-rc1 by @ShutterQuick in "Don't check if the memory is in core" #64 (05d814b), after they ran into the same issue: https://github.com/dotnet/coreclr/issues/15840.

Now it looks pretty clear: the msync()/mincore() validation was added 9 years ago (28f33c8), but due to a bug in mincore() detection, msync() was used all the time until v1.3. Then this bug was fixed in v1.3-stable by ot in "[PATCH] x86_64: fix mincore_validate" (bc8698f), and from that point on access_mem was backed by mincore_validate, which fails if the address has been swapped out.

I have verified that v1.3.1 with 05d814b cherry-picked works fine and solves my issue. @djwatson, is there anything I can do to help get this fixed in v1.3?

@djwatson
Member

djwatson commented Mar 9, 2020

If you want to put up a pull request I am happy to merge it.

@djwatson
Member

v1.4.0 has been released, which contains this fix. Is this sufficient?

@k15tfu
Author

k15tfu commented Mar 31, 2020

@djwatson Sorry for the delay. Yes, I'd still be happy to port this to v1.3-stable, because some distros will continue to use that version. Here is PR #167.
