deadlock with tbb #1402
Comments
Thanks for reporting this and for the investigation you have already done! One workaround is setting
Hmm. There might be a racy access to one of the non-TBB maps, and it got into a state that allowed an infinite loop? Just a note: we're not currently using the stat cache for anything at Facebook, so it's probably not very well stress-tested at this point. I'd probably recommend turning it off for now (unless you have huge numbers of requires/includes in each endpoint, it's probably not going to be a very large performance hit).
We just noticed we're defaulting StatCache to on in the OSS config. I'll change it to default off.
We're using the stat cache in production at Wikimedia and we hit this bug every so often. There are some notes in T89912 on our Phabricator instance. I think this should be re-opened.
Note we're still observing this bug up through at least HHVM 3.6.5. The TL;DR of the ticket we linked above last year (https://phabricator.wikimedia.org/T89912#1286874) is that a probable workaround or fix is to change one of the StatCache mutexes to recursive/re-entrant, but we haven't actually tested whether this alleviates the issue. The non-reentrant mutex in question is still present up through the latest master, here: https://github.com/facebook/hhvm/blob/master/hphp/runtime/base/stat-cache.cpp#L173
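For readers unfamiliar with the recursive/re-entrant mutex idea mentioned above, here is a minimal C++ sketch. It is not HHVM's code: `PathCache`, `add`, `expireAll`, and `size` are hypothetical names chosen for illustration, and only the general pattern (an outer operation that holds a lock and then calls an inner operation that takes the same lock) is assumed to resemble the suspected problem. Whether this pattern is the actual cause of the hang is untested, as noted in the comment above.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a lock-protected path cache; NOT HHVM's StatCache.
class PathCache {
 public:
  void add(const std::string& path) {
    std::lock_guard<std::recursive_mutex> guard(lock_);
    paths_.insert(path);
  }

  // Inner operation: takes the lock itself so it is safe to call directly.
  void removePath(const std::string& path) {
    std::lock_guard<std::recursive_mutex> guard(lock_);
    paths_.erase(path);
  }

  // Outer operation: holds the lock, then calls removePath(), which locks
  // again on the same thread. With std::recursive_mutex this nested
  // acquisition is allowed. With a plain std::mutex it would be undefined
  // behavior and typically hangs the calling thread.
  void expireAll() {
    std::lock_guard<std::recursive_mutex> guard(lock_);
    // Copy the keys first so erasing does not invalidate the iteration.
    const std::vector<std::string> snapshot(paths_.begin(), paths_.end());
    for (const std::string& path : snapshot) {
      removePath(path);
    }
  }

  std::size_t size() {
    std::lock_guard<std::recursive_mutex> guard(lock_);
    return paths_.size();
  }

 private:
  std::recursive_mutex lock_;
  std::set<std::string> paths_;
};
```

Adding two paths and then calling `expireAll()` leaves the cache empty without blocking. A recursive mutex only hides re-entrancy on a single thread; it does not address lock-ordering problems between different threads.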
I believe that I was hitting a similar bug. I had the stat cache enabled, but didn't have any issues with 3.12, 3.13, or 3.14. I started triggering this issue in 3.15. Attached is the trace with (I assume) Thread 7 being the offending thread holding the lock. As suggested in this thread, I have disabled the stat cache for now.
We've had the same issue with all 3.15 and 3.17 versions.
After #7567 got fixed, we were able to hit the same deadlock as before with the latest 3.18.0-dev (master branch):
Enabling hhvm.server.stat_cache leads to the deadlock.
'perf top' shows the hhvm process stuck in _ZN3tbb10interface519concurrent_hash_mapISsN4HPHP19AtomicSharedPtrImplINS2_9StatCache4NodeELb0EEENS2_17stringHashCompareENS_13tbb_allocatorISt
Compiled with latest TBB:
Same as before, disabling hhvm.server.stat_cache works around the problem but leads to significantly higher SYS CPU usage because of constant uncached stat() calls. So we have to stick with 3.14.5.
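For reference, the workaround discussed in this thread would look roughly like the fragment below in an INI-style HHVM configuration. The setting name is the one quoted above; which file it belongs in (for example a server INI file) is an assumption and depends on the installation.

```ini
; Workaround only: turns the stat cache off, so every lookup
; falls back to an uncached stat() call (higher SYS CPU usage).
hhvm.server.stat_cache = false
```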
Let me know if it's not fixed. It looks to be the same issue.
HHVM stops handling any events while we execute some parallel requests.
Threads info:
USER %CPU PRI SCNT WCHAN USER SYSTEM TID TIME
work 95.6 - - - - - - 00:06:42
work 0.0 19 - - - - 30713 00:00:00
work 0.0 19 - - - - 31667 00:00:00
work 0.0 19 - - - - 31668 00:00:00
work 0.0 19 - - - - 31669 00:00:00
work 0.0 19 - - - - 31670 00:00:00
work 0.0 19 - - - - 31671 00:00:00
work 0.0 19 - - - - 31672 00:00:00
work 0.0 19 - - - - 31673 00:00:00
work 0.0 19 - - - - 31674 00:00:00
work 95.1 19 - - - - 31675 00:06:38
work 0.0 19 - - - - 31676 00:00:00
work 0.0 19 - - - - 31677 00:00:00
work 0.0 19 - - - - 31678 00:00:00
work 0.0 19 - - - - 31679 00:00:00
work 0.1 19 - - - - 31680 00:00:00
work 0.0 19 - - - - 32612 00:00:00
I sent an abort signal to the thread consuming the most CPU and got this stack trace:
0 HPHP::bt_handler(int) at crash-reporter.cpp:0
1 restore_rt at sigaction.c:0
2 GI___sched_yield at :0
3 tbb::interface5::concurrent_hash_map<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, HPHP::AtomicSmartPtr<HPHP::StatCache::Node>, HPHP::stringHashCompare, tbb::tbb_allocator<std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, HPHP::AtomicSmartPtr<HPHP::StatCache::Node> > > >::lookup(bool, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, HPHP::AtomicSmartPtr<HPHP::StatCache::Node> const*, tbb::interface5::concurrent_hash_map<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, HPHP::AtomicSmartPtr<HPHP::StatCache::Node>, HPHP::stringHashCompare, tbb::tbb_allocator<std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, HPHP::AtomicSmartPtr<HPHP::StatCache::Node> > > >::const_accessor*, bool) at /home/work/hhvm/bin/hhvm:0
4 HPHP::StatCache::removePath(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, HPHP::StatCache::Node*) at /home/work/hhvm/bin/hhvm:0
5 void HPHP::StatCache::Node::touchLocked(bool) at /home/work/hhvm/bin/hhvm:0
6 HPHP::StatCache::Node::expirePaths(bool) at /home/work/hhvm/bin/hhvm:0
7 HPHP::StatCache::handleEvent(inotify_event const*) at /home/work/hhvm/bin/hhvm:0
8 HPHP::StatCache::refresh() at /home/work/hhvm/bin/hhvm:0
9 HPHP::hphp_session_init() at /home/work/hhvm/bin/hhvm:0
10 HPHP::HttpRequestHandler::handleRequest(HPHP::Transport*) at /home/work/hhvm/bin/hhvm:0
11 HPHP::ServerWorker<std::shared_ptr<HPHP::LibEventJob>, HPHP::LibEventTransportTraits>::doJobImpl(std::shared_ptr<HPHP::LibEventJob>, bool) at /home/work/hhvm/bin/hhvm:0
12 HPHP::ServerWorker<std::shared_ptr<HPHP::LibEventJob>, HPHP::LibEventTransportTraits>::doJob(std::shared_ptr<HPHP::LibEventJob>) at /home/work/hhvm/bin/hhvm:0
13 HPHP::JobQueueWorker<std::shared_ptr<HPHP::LibEventJob>, true, false, HPHP::JobQueueDropVMStack>::start() at /home/work//hhvm/bin/hhvm:0
14 HPHP::AsyncFuncImpl::ThreadFunc(void*) at /home/work/hhvm/bin/hhvm:0
15 start_thread at pthread_create.c:0
16 __clone at /opt/compiler/gcc-4.8.1/lib/libc.so.6:0
This looks like a deadlock. Can anybody help us?
We are running TBB 4.2 and HHVM 2.2.
One notable detail is that one or more PHP scripts may be refreshed frequently while requests are being served.