100% CPU with multithread real time on StaticSpinMutex::LockSlow #614
Comments
I do not recognize this as a known bug.
I learned something else since I posted: both threads have their CPU affinity set to the same single CPU. So I think a real-time kernel plus CPU affinity is mandatory to reproduce. I'll try to come up with a small reproducer. But in the meantime, looking at the code, do you think the deadlock on the StaticSpinMutex is plausible?
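For illustration, a minimal reproducer along those lines might look like the sketch below. This is hypothetical, untested code, not from the issue: the yield-based spinlock is a stand-in for the sanitizer's StaticSpinMutex, and it needs CAP_SYS_NICE (e.g. run as root) plus, per the observation above, both threads pinned to one CPU.

```cpp
// Hypothetical reproducer sketch: two SCHED_FIFO threads pinned to the
// same CPU contend on a yield-based spinlock mimicking StaticSpinMutex.
#include <atomic>
#include <pthread.h>
#include <sched.h>
#include <unistd.h>

static std::atomic<int> lock_word{0};

static void pin_to_cpu0_and_set_fifo(int prio) {
  cpu_set_t set;
  CPU_ZERO(&set);
  CPU_SET(0, &set);                        // both threads on CPU 0
  pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
  sched_param sp{};
  sp.sched_priority = prio;
  pthread_setschedparam(pthread_self(), SCHED_FIFO, &sp);
}

static void spin_lock() {
  while (lock_word.exchange(1, std::memory_order_acquire) != 0)
    sched_yield();  // useless: we are the highest-priority runnable thread
}

static void spin_unlock() {
  lock_word.store(0, std::memory_order_release);
}

static void *low_prio(void *) {            // plays the priority-35 thread
  pin_to_cpu0_and_set_fifo(35);
  for (;;) {
    spin_lock();
    for (volatile int i = 0; i < 1000000; ++i) {}  // long critical section
    spin_unlock();
  }
}

static void *high_prio(void *) {           // plays the priority-40 thread
  pin_to_cpu0_and_set_fifo(40);
  for (;;) {
    usleep(1000);  // wake periodically; eventually lands mid-critical-section
    spin_lock();   // then spins here forever at 100% CPU
    spin_unlock();
  }
}

int main() {
  pthread_t t1, t2;
  pthread_create(&t1, nullptr, low_prio, nullptr);
  pthread_create(&t2, nullptr, high_prio, nullptr);
  pthread_join(t1, nullptr);               // never returns once livelocked
  pthread_join(t2, nullptr);
}
```

The expected failure mode: the low-priority thread is preempted inside its critical section when the high-priority thread wakes from usleep; the high-priority thread then spins in spin_lock forever, because sched_yield never lets the lower-priority holder run on that CPU.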
I think sched_yield just tells the OS scheduler that it can look for a higher-priority thread to run. Since the thread trying to lock the StaticSpinMutex is the one with the highest priority, the scheduler gives the CPU back to the same thread again.
This issue has been open for a while and maybe nobody except me cares, but here is some advice:
there have been patches recently adding (more) RTOS support. |
I suspect this very negatively affects us even on plain Linux with normal priorities. We need a better mutex.
We have the same problem here. We cannot use ASan on a real-time application because of this. After a while, the application deadlocks: a high-priority thread spins for the lock while a lower-priority thread holds it but has been preempted. Specifically, two threads eventually enter malloc at the same time and both need a new batch (AllocateBatch), which is protected by a SpinMutex. The problem is that sched_yield will not let the lower-priority thread run, and the OS is not aware of the user-space mutex waiting, so it cannot do anything about the priority inversion. I see two possible solutions. One is to define SpinMutex as BlockingMutex on Linux, because it will then use futexes, which were built for exactly this: they stay in user space and are very fast when uncontended, and go to kernel space only when contended, allowing other threads to run. The other is to compile and offer a second version of the library as an option for real-time applications on Linux, in which SpinMutex is defined as BlockingMutex.
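To make the first suggestion concrete, here is a minimal sketch of a futex-based blocking lock in the style of Drepper's "Futexes Are Tricky" (my own illustrative code, not libasan's actual BlockingMutex; Linux-only):

```cpp
// Minimal futex-based blocking lock sketch. The waiter sleeps in the
// kernel, so a lower-priority lock holder can run instead of being
// starved by a spinning higher-priority thread.
#include <atomic>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

static long futex_call(std::atomic<int> *addr, int op, int val) {
  return syscall(SYS_futex, reinterpret_cast<int *>(addr), op, val,
                 nullptr, nullptr, 0);
}

class FutexMutexSketch {
  std::atomic<int> state_{0};  // 0 = free, 1 = locked, 2 = locked + waiters

 public:
  void Lock() {
    int expected = 0;
    if (state_.compare_exchange_strong(expected, 1, std::memory_order_acquire))
      return;  // fast path: uncontended, no syscall at all
    // Contended: advertise a waiter, then sleep until the word changes.
    while (state_.exchange(2, std::memory_order_acquire) != 0)
      futex_call(&state_, FUTEX_WAIT_PRIVATE, 2);
  }

  void Unlock() {
    if (state_.exchange(0, std::memory_order_release) == 2)
      futex_call(&state_, FUTEX_WAKE_PRIVATE, 1);  // wake one sleeper
  }
};
```

Note that a plain futex wait removes the livelock (the waiter blocks, so the lock holder can run) but does not by itself prevent priority inversion; for that the kernel also offers PI futexes (FUTEX_LOCK_PI), which add priority inheritance.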
Hello.
At work, we build our software with g++ 4.9.1, AddressSanitizer, and C++11. We run this software on a PREEMPT RT kernel (currently 3.18.21-rt19). Our software has several threads scheduled with the FIFO policy, and sometimes the highest-priority thread uses almost 100% CPU. I suspect this thread is trying to lock the StaticSpinMutex in an infinite loop while the mutex is already held by another thread with a lower real-time priority.
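For context, the spin loop in question looks roughly like this (a rough paraphrase of sanitizer_common's StaticSpinMutex slow path, not verbatim; check the actual source for the exact code):

```cpp
// Rough paraphrase of StaticSpinMutex's slow path: spin on the lock word,
// mixing CPU pause hints with an occasional sched_yield. Under SCHED_FIFO
// on a single CPU, sched_yield hands the CPU straight back to this
// (highest-priority) thread, so the loop can spin at 100% CPU forever.
void StaticSpinMutex::LockSlow() {
  for (int i = 0;; i++) {
    if (i % 10 == 0)
      internal_sched_yield();  // thin wrapper around the sched_yield syscall
    else
      proc_yield(10);          // a handful of PAUSE instructions on x86
    if (atomic_load(&state_, memory_order_relaxed) == 0 &&
        atomic_exchange(&state_, 1, memory_order_acquire) == 0)
      return;
  }
}
```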
Does this seem plausible? Do you know of any related issue, perhaps already closed?
I include the tail end of both backtraces so you can look at them.
Realtime thread scheduled by FIFO with priority 35:
#0 0xb15abb0a in __memset_sse2_rep () from /lib/libc.so.6
#1 0xb1cff49d in __asan::PoisonShadow(unsigned long, unsigned long, unsigned char) () from /lib/libasan.so.1
#2 0xb1cc99c7 in __sanitizer::SizeClassAllocator32<0ul, 4294967296ull, 16ul, __sanitizer::SizeClassMap<17ul, 64ul, 14ul>, 20ul, __sanitizer::FlatByteMap<4096ull>, __asan::AsanMapUnmapCallback>::AllocateBatch(__sanitizer::AllocatorStats *, __sanitizer::SizeClassAllocatorLocalCache<__sanitizer::SizeClassAllocator32<0ul, 4294967296ull, 16ul, __sanitizer::SizeClassMap<17ul, 64ul, 14ul>, 20ul, __sanitizer::FlatByteMap<4096ull>, __asan::AsanMapUnmapCallback> > *, unsigned long) () from /lib/libasan.so.1
#3 0xb1cc9e0f in __sanitizer::SizeClassAllocatorLocalCache<__sanitizer::SizeClassAllocator32<0ul, 4294967296ull, 16ul, __sanitizer::SizeClassMap<17ul, 64ul, 14ul>, 20ul, __sanitizer::FlatByteMap<4096ull>, __asan::AsanMapUnmapCallback> >::Refill(__sanitizer::SizeClassAllocator32<0ul, 4294967296ull, 16ul, __sanitizer::SizeClassMap<17ul, 64ul, 14ul>, 20ul, __sanitizer::FlatByteMap<4096ull>, __asan::AsanMapUnmapCallback> *, unsigned long) () from /lib/libasan.so.1
#4 0xb1cc8c05 in __asan::Allocate(unsigned long, unsigned long, __sanitizer::StackTrace, __asan::AllocType, bool) () from /lib/libasan.so.1
#5 0xb1cfeb70 in operator new(unsigned int) () from /lib/libasan.so.1
Realtime thread scheduled by FIFO with priority 40:
#0 0xb21b49e4 in __kernel_vsyscall ()
#1 0xb1569837 in syscall () from /lib/libc.so.6
#2 0xb1d0afb9 in __sanitizer::internal_sched_yield() () from /lib/libasan.so.1
#3 0xb1cc97c0 in __sanitizer::StaticSpinMutex::LockSlow() () from /lib/libasan.so.1
#4 0xb1cc9908 in __sanitizer::SizeClassAllocator32<0ul, 4294967296ull, 16ul, __sanitizer::SizeClassMap<17ul, 64ul, 14ul>, 20ul, __sanitizer::FlatByteMap<4096ull>, __asan::AsanMapUnmapCallback>::AllocateBatch(__sanitizer::AllocatorStats *, __sanitizer::SizeClassAllocatorLocalCache<__sanitizer::SizeClassAllocator32<0ul, 4294967296ull, 16ul, __sanitizer::SizeClassMap<17ul, 64ul, 14ul>, 20ul, __sanitizer::FlatByteMap<4096ull>, __asan::AsanMapUnmapCallback> > *, unsigned long) () from /lib/libasan.so.1
#5 0xb1cc9e0f in __sanitizer::SizeClassAllocatorLocalCache<__sanitizer::SizeClassAllocator32<0ul, 4294967296ull, 16ul, __sanitizer::SizeClassMap<17ul, 64ul, 14ul>, 20ul, __sanitizer::FlatByteMap<4096ull>, __asan::AsanMapUnmapCallback> >::Refill(__sanitizer::SizeClassAllocator32<0ul, 4294967296ull, 16ul, __sanitizer::SizeClassMap<17ul, 64ul, 14ul>, 20ul, __sanitizer::FlatByteMap<4096ull>, __asan::AsanMapUnmapCallback> *, unsigned long) () from /lib/libasan.so.1
#6 0xb1cc8c05 in __asan::Allocate(unsigned long, unsigned long, __sanitizer::StackTrace, __asan::AllocType, bool) () from /lib/libasan.so.1
#7 0xb1cfeb70 in operator new(unsigned int) () from /lib/libasan.so.1