
100% CPU with multithread real time on StaticSpinMutex::LockSlow #614

Open
leonarf opened this issue Oct 20, 2015 · 9 comments

Comments

@leonarf

leonarf commented Oct 20, 2015

Hello.
At work, we build our software with g++ 4.9.1, AddressSanitizer, and C++11, and run it on a PREEMPT_RT kernel (currently 3.18.21-rt19). The software has several threads scheduled with SCHED_FIFO, and sometimes the highest-priority thread uses almost 100% CPU. I suspect that thread is spinning in an infinite loop trying to lock the StaticSpinMutex while it is already held by another thread with a lower real-time priority.
Does that seem plausible? Do you know of any related issue, possibly already closed?

I am including the tail end of both backtraces so you can take a look.
Realtime thread scheduled by FIFO with priority 35:
#0 0xb15abb0a in __memset_sse2_rep () from /lib/libc.so.6
#1 0xb1cff49d in __asan::PoisonShadow(unsigned long, unsigned long, unsigned char) () from /lib/libasan.so.1
#2 0xb1cc99c7 in __sanitizer::SizeClassAllocator32<0ul, 4294967296ull, 16ul, __sanitizer::SizeClassMap<17ul, 64ul, 14ul>, 20ul, __sanitizer::FlatByteMap<4096ull>, __asan::AsanMapUnmapCallback>::AllocateBatch(__sanitizer::AllocatorStats*, __sanitizer::SizeClassAllocatorLocalCache<__sanitizer::SizeClassAllocator32<0ul, 4294967296ull, 16ul, __sanitizer::SizeClassMap<17ul, 64ul, 14ul>, 20ul, __sanitizer::FlatByteMap<4096ull>, __asan::AsanMapUnmapCallback> >*, unsigned long) () from /lib/libasan.so.1
#3 0xb1cc9e0f in __sanitizer::SizeClassAllocatorLocalCache<__sanitizer::SizeClassAllocator32<0ul, 4294967296ull, 16ul, __sanitizer::SizeClassMap<17ul, 64ul, 14ul>, 20ul, __sanitizer::FlatByteMap<4096ull>, __asan::AsanMapUnmapCallback> >::Refill(__sanitizer::SizeClassAllocator32<0ul, 4294967296ull, 16ul, __sanitizer::SizeClassMap<17ul, 64ul, 14ul>, 20ul, __sanitizer::FlatByteMap<4096ull>, __asan::AsanMapUnmapCallback>, unsigned long) () from /lib/libasan.so.1
#4 0xb1cc8c05 in __asan::Allocate(unsigned long, unsigned long, __sanitizer::StackTrace*, __asan::AllocType, bool) () from /lib/libasan.so.1
#5 0xb1cfeb70 in operator new(unsigned int) () from /lib/libasan.so.1

Realtime thread scheduled by FIFO with priority 40:
#0 0xb21b49e4 in __kernel_vsyscall ()
#1 0xb1569837 in syscall () from /lib/libc.so.6
#2 0xb1d0afb9 in __sanitizer::internal_sched_yield() () from /lib/libasan.so.1
#3 0xb1cc97c0 in __sanitizer::StaticSpinMutex::LockSlow() () from /lib/libasan.so.1
#4 0xb1cc9908 in __sanitizer::SizeClassAllocator32<0ul, 4294967296ull, 16ul, __sanitizer::SizeClassMap<17ul, 64ul, 14ul>, 20ul, __sanitizer::FlatByteMap<4096ull>, __asan::AsanMapUnmapCallback>::AllocateBatch(__sanitizer::AllocatorStats*, __sanitizer::SizeClassAllocatorLocalCache<__sanitizer::SizeClassAllocator32<0ul, 4294967296ull, 16ul, __sanitizer::SizeClassMap<17ul, 64ul, 14ul>, 20ul, __sanitizer::FlatByteMap<4096ull>, __asan::AsanMapUnmapCallback> >*, unsigned long) () from /lib/libasan.so.1
#5 0xb1cc9e0f in __sanitizer::SizeClassAllocatorLocalCache<__sanitizer::SizeClassAllocator32<0ul, 4294967296ull, 16ul, __sanitizer::SizeClassMap<17ul, 64ul, 14ul>, 20ul, __sanitizer::FlatByteMap<4096ull>, __asan::AsanMapUnmapCallback> >::Refill(__sanitizer::SizeClassAllocator32<0ul, 4294967296ull, 16ul, __sanitizer::SizeClassMap<17ul, 64ul, 14ul>, 20ul, __sanitizer::FlatByteMap<4096ull>, __asan::AsanMapUnmapCallback>, unsigned long) () from /lib/libasan.so.1
#6 0xb1cc8c05 in __asan::Allocate(unsigned long, unsigned long, __sanitizer::StackTrace*, __asan::AllocType, bool) () from /lib/libasan.so.1
#7 0xb1cfeb70 in operator new(unsigned int) () from /lib/libasan.so.1

@kcc
Contributor

kcc commented Dec 1, 2015

I do not recognize this as a known bug.
Please check whether a fresh Clang ASan still has it.
If so, please provide a reasonably small reproducer.
Is it at all possible to reproduce on a regular Linux kernel?

@leonarf
Author

leonarf commented Dec 2, 2015

I have learned something since I posted: both threads have their CPU affinity set to the same single CPU. So I think a real-time kernel plus CPU affinity is required to reproduce this. I'll try to come up with a small reproducer. In the meantime, looking at the code, do you think the deadlock on the StaticSpinMutex is plausible?

@dvyukov
Contributor

dvyukov commented Dec 2, 2015

On Wed, Dec 2, 2015 at 9:04 AM, Leonard notifications@github.com wrote:

I learned something else since I posted: both threads have a CPU affinity set to the same CPU, and only one. So I think real-time kernel and CPU affinity is mandatory to reproduce. I'll try to come up with a small reproducer. But in the meantime, looking at the code, do you think the deadlock on the StaticSpinMutex is plausible?

StaticSpinMutex is a spin mutex. It does not notify the kernel that it is waiting; it merely calls sched_yield. So I wonder whether sched_yield is effectively a no-op on a PREEMPT_RT kernel (a thread that has not exhausted its time slice won't switch to another thread). If so, that would explain the slowdown.

@leonarf
Author

leonarf commented Dec 2, 2015

I think sched_yield just tells the OS scheduler that it may look for a higher-priority thread to run. Since the thread trying to lock the StaticSpinMutex is the one with the highest priority, the scheduler gives the CPU back to that same thread.

@marchartung

This issue has been open for a while and maybe nobody except me cares, but here is some advice:
Don't use spinlocks if you intend to map two threads to the same CPU or hyperthread. The scheduler will likely prioritize the thread that fails to acquire the (spin-)lock, since it fully utilizes the CPU (no waits for cache misses or expensive instructions), and while spinning it is not guaranteed to call sched_yield. This massively slows down the other thread, which can look like a deadlock.
Background:
I worked on a statically scheduled program that ordered events through spinlocks. When two threads were mapped to the same CPU (different hyperthreads), performance dropped, and when they were mapped to the same hyperthread, they appeared to deadlock.

@morehouse
Contributor

When the scheduler is set to SCHED_FIFO, sched_yield will not switch to a thread with a lower priority -- only to one of the same or higher priority.

@kcc, @dvyukov: Do we intend to support RTOS for ASan?

@kcc
Contributor

kcc commented Jun 7, 2018

There have been patches recently adding (more) RTOS support.

@dvyukov
Contributor

dvyukov commented Jun 8, 2018

I suspect this very negatively affects us even on plain Linux with normal priorities. We need a better mutex.

@michelrdagenais

I suspect this very negatively affects us even on plain Linux with normal priorities. We need a better mutex.

We have the same problem here. We cannot use ASan on a real-time application because of this. After a while, the application deadlocks: a high-priority thread spins for the lock while a lower-priority thread holds it but is preempted. Two threads eventually enter malloc at the same time and need a new batch (AllocateBatch), which is protected by a SpinMutex. The problem is that sched_yield will not let the lower-priority thread run, and the OS is not aware of the user-space mutex waiting, so it cannot do anything about the priority inversion.

I see two possible solutions. One is to define SpinMutex as BlockingMutex on Linux, which would then use futexes, which were built for exactly this: futexes stay in user space and are very fast when uncontended, and go to kernel space only when contended, allowing other threads to run. The other is to compile and offer a second version of the library as an option for real-time applications on Linux (where SpinMutex is defined as BlockingMutex).
