
100% CPU with multithread real time on StaticSpinMutex::LockSlow #614

Open
leonarf opened this issue Oct 20, 2015 · 9 comments

Comments

@leonarf

leonarf commented Oct 20, 2015

Hello.
At work, we build our software with g++ 4.9.1, AddressSanitizer, and C++11, and run it on a PREEMPT_RT kernel (currently 3.18.21-rt19). The software has several threads scheduled with SCHED_FIFO, and sometimes the highest-priority thread uses almost 100% CPU. I suspect that thread is spinning in an infinite loop trying to lock the StaticSpinMutex while it is already held by another thread with a lower real-time priority.
Does that seem plausible? Do you know of any related issue, possibly already closed?

I am including the tail end of both backtraces so you can take a look.
Realtime thread scheduled by FIFO with priority 35:
#0 0xb15abb0a in __memset_sse2_rep () from /lib/libc.so.6
#1 0xb1cff49d in __asan::PoisonShadow(unsigned long, unsigned long, unsigned char) () from /lib/libasan.so.1
#2 0xb1cc99c7 in __sanitizer::SizeClassAllocator32<0ul, 4294967296ull, 16ul, __sanitizer::SizeClassMap<17ul, 64ul, 14ul>, 20ul, __sanitizer::FlatByteMap<4096ull>, __asan::AsanMapUnmapCallback>::AllocateBatch(__sanitizer::AllocatorStats*, __sanitizer::SizeClassAllocatorLocalCache<__sanitizer::SizeClassAllocator32<0ul, 4294967296ull, 16ul, __sanitizer::SizeClassMap<17ul, 64ul, 14ul>, 20ul, __sanitizer::FlatByteMap<4096ull>, __asan::AsanMapUnmapCallback> >*, unsigned long) () from /lib/libasan.so.1
#3 0xb1cc9e0f in __sanitizer::SizeClassAllocatorLocalCache<__sanitizer::SizeClassAllocator32<0ul, 4294967296ull, 16ul, __sanitizer::SizeClassMap<17ul, 64ul, 14ul>, 20ul, __sanitizer::FlatByteMap<4096ull>, __asan::AsanMapUnmapCallback> >::Refill(__sanitizer::SizeClassAllocator32<0ul, 4294967296ull, 16ul, __sanitizer::SizeClassMap<17ul, 64ul, 14ul>, 20ul, __sanitizer::FlatByteMap<4096ull>, __asan::AsanMapUnmapCallback>, unsigned long) () from /lib/libasan.so.1
#4 0xb1cc8c05 in __asan::Allocate(unsigned long, unsigned long, __sanitizer::StackTrace*, __asan::AllocType, bool) () from /lib/libasan.so.1
#5 0xb1cfeb70 in operator new(unsigned int) () from /lib/libasan.so.1

Realtime thread scheduled by FIFO with priority 40:
#0 0xb21b49e4 in __kernel_vsyscall ()
#1 0xb1569837 in syscall () from /lib/libc.so.6
#2 0xb1d0afb9 in __sanitizer::internal_sched_yield() () from /lib/libasan.so.1
#3 0xb1cc97c0 in __sanitizer::StaticSpinMutex::LockSlow() () from /lib/libasan.so.1
#4 0xb1cc9908 in __sanitizer::SizeClassAllocator32<0ul, 4294967296ull, 16ul, __sanitizer::SizeClassMap<17ul, 64ul, 14ul>, 20ul, __sanitizer::FlatByteMap<4096ull>, __asan::AsanMapUnmapCallback>::AllocateBatch(__sanitizer::AllocatorStats*, __sanitizer::SizeClassAllocatorLocalCache<__sanitizer::SizeClassAllocator32<0ul, 4294967296ull, 16ul, __sanitizer::SizeClassMap<17ul, 64ul, 14ul>, 20ul, __sanitizer::FlatByteMap<4096ull>, __asan::AsanMapUnmapCallback> >*, unsigned long) () from /lib/libasan.so.1
#5 0xb1cc9e0f in __sanitizer::SizeClassAllocatorLocalCache<__sanitizer::SizeClassAllocator32<0ul, 4294967296ull, 16ul, __sanitizer::SizeClassMap<17ul, 64ul, 14ul>, 20ul, __sanitizer::FlatByteMap<4096ull>, __asan::AsanMapUnmapCallback> >::Refill(__sanitizer::SizeClassAllocator32<0ul, 4294967296ull, 16ul, __sanitizer::SizeClassMap<17ul, 64ul, 14ul>, 20ul, __sanitizer::FlatByteMap<4096ull>, __asan::AsanMapUnmapCallback>, unsigned long) () from /lib/libasan.so.1
#6 0xb1cc8c05 in __asan::Allocate(unsigned long, unsigned long, __sanitizer::StackTrace*, __asan::AllocType, bool) () from /lib/libasan.so.1
#7 0xb1cfeb70 in operator new(unsigned int) () from /lib/libasan.so.1

@kcc
Contributor

kcc commented Dec 1, 2015

I do not recognize this as a known bug.
Please check whether a fresh Clang ASan still has it.
If so, please provide a reasonably small reproducer.
Is it at all possible to reproduce on a regular Linux kernel?

@leonarf
Author

leonarf commented Dec 2, 2015

I have learned something since I posted: both threads have their CPU affinity set to the same single CPU. So I think a real-time kernel plus CPU affinity is required to reproduce this. I'll try to come up with a small reproducer. In the meantime, looking at the code, do you think the deadlock on the StaticSpinMutex is plausible?

@dvyukov
Contributor

dvyukov commented Dec 2, 2015

On Wed, Dec 2, 2015 at 9:04 AM, Leonard notifications@github.com wrote:

I learned something else since I posted: both threads have a CPU affinity set to the same CPU, and only one. So I think real-time kernel and CPU affinity is mandatory to reproduce. I'll try to come up with a small reproducer. But in the meantime, looking at the code, do you think the deadlock on the StaticSpinMutex is plausible?

StaticSpinMutex is a spin mutex. It does not notify the kernel that it is waiting; it merely calls sched_yield. So I wonder whether sched_yield is effectively a no-op on a PREEMPT_RT kernel (a thread that has not exhausted its time slice won't switch to another thread). If so, that would explain the slowdown.

@leonarf
Author

leonarf commented Dec 2, 2015

I think sched_yield just tells the OS scheduler that it may look for a higher-priority thread to run. Since the thread trying to lock the StaticSpinMutex is the one with the highest priority, the scheduler gives the CPU back to that same thread.

@marchartung

This issue has been open for a while and maybe nobody except me cares, but here is some advice:
Don't use spinlocks if you intend to map two threads to the same CPU or hyperthread. The scheduler will likely prioritize the thread that fails to acquire the (spin-)lock, since it fully utilizes the CPU (no waits for cache misses or expensive instructions), and while spinning it is not guaranteed to call sched_yield. This massively slows down the other thread, which can look like a deadlock.
Background:
I worked on a statically scheduled program that ordered events through spinlocks. When two threads were mapped to the same CPU (different hyperthreads), performance dropped, and when they were mapped to the same hyperthread, they appeared to deadlock.

@morehouse
Contributor

When the scheduler is set to SCHED_FIFO, sched_yield will not switch to a thread with a lower priority -- only to one of the same or higher priority.

@kcc, @dvyukov: Do we intend to support RTOS for ASan?

@kcc
Contributor

kcc commented Jun 7, 2018

There have been patches recently adding (more) RTOS support.

@dvyukov
Contributor

dvyukov commented Jun 8, 2018

I suspect this very negatively affects us even on plain Linux with normal priorities. We need a better mutex.

@michelrdagenais

I suspect this very negatively affects us even on plain Linux with normal priorities. We need a better mutex.

We have the same problem here. We cannot use ASan on a real-time application because of this. After a while, the application deadlocks: a high-priority thread spins for the lock while a lower-priority thread holds it but is preempted. Two threads eventually enter malloc at the same time and need a new batch (AllocateBatch), which is protected by a SpinMutex. The problem is that sched_yield will not let the lower-priority thread run, and the OS is not aware of the user-space mutex waiting, so it cannot do anything about the priority inversion.

I see two possible solutions. One is to define SpinMutex as BlockingMutex on Linux, which would then use futexes, which were built for exactly this: futexes stay in user space and are very fast when uncontended, and go to kernel space only when contended, allowing other threads to run. The other is to compile and offer a second version of the library as an option for real-time applications on Linux (where SpinMutex is defined as BlockingMutex).
