Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Threads stuck waiting for CurlMultiAwait [critical] #7076

Closed
wake-up-neo opened this issue May 18, 2016 · 4 comments
Closed

Threads stuck waiting for CurlMultiAwait [critical] #7076

wake-up-neo opened this issue May 18, 2016 · 4 comments

Comments

@wake-up-neo
Copy link
Contributor

HHVM Version

HipHop VM 3.13.1 (rel)
Compiler: tags/HHVM-3.13.1-0-g4f382ad928a6e2a0607a8dcb251002aca77f11f6
Repo schema: 655b5912cb8136e9df6f9be972153e38ac446e0f

The problem

I'm facing the problem of hanging of all hhvm threads without step-by-step simulation of the issue, this may happens once a day, or once a week on one of the servers without correlation of highload/traffic etc.

So what is really happens - hhvm(-cgi) stops responding with 0 CPU usage.
This process in the example below were running for 24 hours I guess with lot's of successfully completed tasks.

Then
strace -f -p {PID}

Process 11637 attached with 26 threads
[pid 20002] futex(0x7fc6967f9d64, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid 20000] futex(0x7fc69b3f9d64, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid 19999] futex(0x7fc69fff9d64, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid 19998] futex(0x7fc6a4bf9d64, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid 19997] futex(0x7fc6a97f9d64, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid 19996] futex(0x7fc6ae3f9d64, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid 19995] futex(0x7fc6b2bf9d64, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid 19994] futex(0x7fc6b73f9d64, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid 19993] futex(0x7fc6e7bf9d64, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid 19992] futex(0x7fc6f53f9d64, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid 19991] futex(0x7fc6eaff9d64, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid 19990] futex(0x7fc6ea3f9d64, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid 19988] futex(0x7fc6e8bf9d64, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid 19989] futex(0x7fc6e97f9d64, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid 19987] futex(0x7fc6e83f9d64, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid 19986] futex(0x7fc6f67f9d64, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid 19985] futex(0x7fc6f5bf9d64, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid 12096] futex(0x7fc6f77f9d64, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid 12095] futex(0x7fc6f6ff9d64, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid 11704] futex(0x7fc6f07f9d64, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid 11703] futex(0x7fc6f7ff9d64, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid 11669] futex(0x7fc6fd3f9d64, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid 11663] futex(0x7fc702ff9b34, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid 11658] futex(0x7fc707bf9d64, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid 11657] epoll_wait(6, <unfinished ...>
[pid 11637] futex(0x7fc71145720c, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid 11657] <... epoll_wait resumed> {}, 32, 7) = 0

gdb

Thread 26 (Thread 0x7fc7087ff700 (LWP 11657)):

0 0x00007fc718d66e53 in epoll_wait () at ../sysdeps/unix/syscall-template.S:81

1 0x00007fc71b234b08 in ?? () from /usr/lib/x86_64-linux-gnu/libevent-2.0.so.5

2 0x00007fc71b21fd5a in event_base_loop () from /usr/lib/x86_64-linux-gnu/libevent-2.0.so.5

3 0x00000000032e2ba6 in folly::EventBase::loopBody(int) ()

4 0x00000000031b3225 in folly::EventBase::loopForever() ()

5 0x000000000287600a in proxygen::WorkerThread::runLoop() ()

6 0x0000000002876380 in std::thread::_Impl<std::_Bind_simple<proxygen::WorkerThread::start()::{lambda()#1} ()> >::_M_run() ()

7 0x00007fc7195f6970 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6

8 0x00007fc7198530a4 in start_thread (arg=0x7fc7087ff700) at pthread_create.c:309

9 0x00007fc718d6687d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 25 (Thread 0x7fc707bff700 (LWP 11658)):

0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185

1 0x00007fc7195f2c7c in std::condition_variable::wait(std::unique_lockstd::mutex&) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6

2 0x00000000031b4140 in folly::EventBase::runInEventBaseThreadAndWait(std::function<void ()> const&) ()

3 0x0000000001de33e7 in HPHP::CurlMultiAwait::CurlMultiAwait(HPHP::req::ptrHPHP::CurlMultiResource, double) ()

4 0x0000000001def832 in HPHP::f_curl_multi_await(HPHP::Resource const&, double) ()

5 0x000000000503ef54 in ?? ()

6 0x00007fc70307faa0 in ?? ()

7 0x0000000005048c5b in ?? ()

8 0x00007fc70307fb60 in ?? ()

9 0x00000000050466a0 in ?? ()

10 0x00007fc70307fbf0 in ?? ()

11 0x000000000505a685 in ?? ()

12 0x0000000003214a5e in enterTCHelper ()

13 0x0000000001659953 in HPHP::jit::x64::enterTCImpl(unsigned char_, HPHP::ActRec_) ()

14 0x00000000032149cc in HPHP::jit::MCGenerator::enterTC(unsigned char_, HPHP::ActRec_) ()

15 0x000000000321263e in HPHP::ExecutionContext::invokeFunc(HPHP::TypedValue_, HPHP::Func const_, HPHP::Variant const&, HPHP::ObjectData_, HPHP::Class_, HPHP::VarEnv_, HPHP::StringData_, HPHP::ExecutionContext::InvokeFlags, bool) ()

16 0x00000000032b644d in HPHP::ExecutionContext::invokeUnit(HPHP::TypedValue_, HPHP::Unit const_) ()

17 0x00000000032b6277 in HPHP::invoke_file(HPHP::String const&, bool, char const*) ()

18 0x00000000032b6601 in HPHP::include_impl_invoke(HPHP::String const&, bool, char const*) ()

19 0x00000000032b42e4 in HPHP::hphp_invoke(HPHP::ExecutionContext*, std::string const&, bool, HPHP::Array const&, HPHP::VRefParamValue const&, std::string const&, std::string const&, bool&, std::string&, bool, bool, bool) ()

20 0x00000000032ac1f2 in HPHP::HttpRequestHandler::executePHPRequest(HPHP::Transport*, HPHP::RequestURI&, HPHP::SourceRootInfo&, bool) ()

21 0x00000000032a5aef in HPHP::HttpRequestHandler::handleRequest(HPHP::Transport*) ()

22 0x00000000010cd67f in HPHP::ServerWorkerstd::shared_ptr<HPHP::FastCGIJob, HPHP::FastCGITransportTraits>::doJobImpl(std::shared_ptrHPHP::FastCGIJob, bool) ()

23 0x00000000010ce2bf in HPHP::JobQueueWorkerstd::shared_ptr<HPHP::FastCGIJob, HPHP::Server*, true, false, HPHP::JobQueueDropVMStack>::start() ()

24 0x000000000251bc67 in HPHP::AsyncFuncImpl::ThreadFunc(void*) ()

25 0x0000000000d89ef7 in HPHP::start_routine_wrapper(void*) ()

26 0x00007fc7198530a4 in start_thread (arg=0x7fc707bff700) at pthread_create.c:309

27 0x00007fc718d6687d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 24 (Thread 0x7fc702fff700 (LWP 11663)):

0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185

1 0x00007fc7195f2c7c in std::condition_variable::wait(std::unique_lockstd::mutex&) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6

2 0x00000000031b4140 in folly::EventBase::runInEventBaseThreadAndWait(std::function<void ()> const&) ()

3 0x0000000001de33e7 in HPHP::CurlMultiAwait::CurlMultiAwait(HPHP::req::ptrHPHP::CurlMultiResource, double) ()

4 0x0000000001def832 in HPHP::f_curl_multi_await(HPHP::Resource const&, double) ()

---Type to continue, or q to quit---

5 0x000000000503ef54 in ?? ()

6 0x00007fc70333f550 in ?? ()

7 0x0000000000000000 in ?? ()

Thread 23 (Thread 0x7fc6fd3ff700 (LWP 11669)):

0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185

1 0x00007fc7195f2c7c in std::condition_variable::wait(std::unique_lockstd::mutex&) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6

2 0x00000000031b4140 in folly::EventBase::runInEventBaseThreadAndWait(std::function<void ()> const&) ()

3 0x0000000001de33e7 in HPHP::CurlMultiAwait::CurlMultiAwait(HPHP::req::ptrHPHP::CurlMultiResource, double) ()

4 0x0000000001def832 in HPHP::f_curl_multi_await(HPHP::Resource const&, double) ()

5 0x000000000503ef54 in ?? ()

6 0x00007fc70eebfab0 in ?? ()

7 0x0000000005048c5b in ?? ()

8 0x00007fc70eebfb70 in ?? ()

9 0x00000000050466a0 in ?? ()

10 0x00007fc70eebfc00 in ?? ()

11 0x0000000008c3892c in ?? ()

12 0x0000000003214a5e in enterTCHelper ()

13 0x0000000001659953 in HPHP::jit::x64::enterTCImpl(unsigned char_, HPHP::ActRec_) ()

14 0x00000000032149cc in HPHP::jit::MCGenerator::enterTC(unsigned char_, HPHP::ActRec_) ()

15 0x000000000321263e in HPHP::ExecutionContext::invokeFunc(HPHP::TypedValue_, HPHP::Func const_, HPHP::Variant const&, HPHP::ObjectData_, HPHP::Class_, HPHP::VarEnv_, HPHP::StringData_, HPHP::ExecutionContext::InvokeFlags, bool) ()

16 0x00000000032b644d in HPHP::ExecutionContext::invokeUnit(HPHP::TypedValue_, HPHP::Unit const_) ()

17 0x00000000032b6277 in HPHP::invoke_file(HPHP::String const&, bool, char const*) ()

18 0x00000000032b6601 in HPHP::include_impl_invoke(HPHP::String const&, bool, char const*) ()

19 0x00000000032b42e4 in HPHP::hphp_invoke(HPHP::ExecutionContext*, std::string const&, bool, HPHP::Array const&, HPHP::VRefParamValue const&, std::string const&, std::string const&, bool&, std::string&, bool, bool, bool) ()

20 0x00000000032ac1f2 in HPHP::HttpRequestHandler::executePHPRequest(HPHP::Transport*, HPHP::RequestURI&, HPHP::SourceRootInfo&, bool) ()

21 0x00000000032a5aef in HPHP::HttpRequestHandler::handleRequest(HPHP::Transport*) ()

22 0x00000000010cd67f in HPHP::ServerWorkerstd::shared_ptr<HPHP::FastCGIJob, HPHP::FastCGITransportTraits>::doJobImpl(std::shared_ptrHPHP::FastCGIJob, bool) ()

23 0x00000000010ce2bf in HPHP::JobQueueWorkerstd::shared_ptr<HPHP::FastCGIJob, HPHP::Server*, true, false, HPHP::JobQueueDropVMStack>::start() ()

24 0x000000000251bc67 in HPHP::AsyncFuncImpl::ThreadFunc(void*) ()

25 0x0000000000d89ef7 in HPHP::start_routine_wrapper(void*) ()

26 0x00007fc7198530a4 in start_thread (arg=0x7fc6fd3ff700) at pthread_create.c:309

27 0x00007fc718d6687d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 22 (Thread 0x7fc6f7fff700 (LWP 11703)):

0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185

1 0x00007fc7195f2c7c in std::condition_variable::wait(std::unique_lockstd::mutex&) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6

2 0x00000000031b4140 in folly::EventBase::runInEventBaseThreadAndWait(std::function<void ()> const&) ()

3 0x0000000001de33e7 in HPHP::CurlMultiAwait::CurlMultiAwait(HPHP::req::ptrHPHP::CurlMultiResource, double) ()

4 0x0000000001def832 in HPHP::f_curl_multi_await(HPHP::Resource const&, double) ()

5 0x000000000503ef54 in ?? ()

6 0x00007fc6fe7bfab0 in ?? ()

7 0x0000000005048c5b in ?? ()

8 0x00007fc6fe7bfb70 in ?? ()

9 0x00000000050466a0 in ?? ()

10 0x00007fc6fe7bfc00 in ?? ()

11 0x0000000008c3892c in ?? ()

12 0x0000000003214a5e in enterTCHelper ()

13 0x0000000001659953 in HPHP::jit::x64::enterTCImpl(unsigned char_, HPHP::ActRec_) ()

......

server.ini

cat /etc/hhvm/server.ini
; php options

pid = /var/run/hhvm/pid

; hhvm specific

hhvm.server.port = 9000
hhvm.server.type = fastcgi
hhvm.server.default_document = index.php
hhvm.log.use_log_file = true
hhvm.log.file = /var/log/hhvm/error.log
hhvm.repo.central.path = /var/run/hhvm/hhvm.hhbc
hhvm.keep_perf_pid_map = 0
hhvm.perf_pid_map = 0
hhvm.perf_data_map = 0

hhvm.server.enable_keep_alive = false
hhvm.server.expose_hphp = false
hhvm.server.graceful_shutdown_wait = 30
hhvm.server.thread_round_robin = true

php.ini

cat /etc/hhvm/php.ini
; php options
session.save_handler = files
session.save_path = /var/lib/hhvm/sessions
session.gc_maxlifetime = 1440

; hhvm specific
hhvm.log.level = Warning
hhvm.log.always_log_unhandled_exceptions = true
hhvm.log.runtime_error_reporting_level = 8191
hhvm.mysql.typed_results = false

Code

It's not suitable to publish code here since it doesn't reproduce the problem by itself.
Let's say I'm using code from the example (very similar)
https://docs.hhvm.com/hack/reference/function/curl_multi_await/

I tried to change
$select = await curl_multi_await($mh);
to
$select = curl_multi_select($master);
to make it sync, but it still do hang out all threads sometimes. I also tried all possible combinations of using await, incl. await AwaitAllWaitHandle::fromArray, \HH\Asio\join($awaitableTasks), ->getWaitHandle()->join() etc. and there were no difference for this problem.

I have all necessary timeouts, on curl level, on hhvm level, and also hard-coded "break" after "critical" timeout inside while (curl_multi_exec) which was added after facing the issue and doesn't solve it, with all necessary descriptors closing etc. and now I don't really know what to do since I'm already trying to find the problem for two months.

Here is example of similar server with the same code and configuration, and it doesn't have this problem at all since I'm using it. You may see it by the number "6126010". Another servers, that may stuck - showing the same strace as below but once they hangs - every thread except master thread starts showing FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>

Process 4726 attached with 11 threads
[pid 28753] futex(0x7f71eac72d64, FUTEX_WAIT_PRIVATE, 6126010, NULL <unfinished ...>
[pid 21694] futex(0x7f71eac72d64, FUTEX_WAIT_PRIVATE, 6126012, NULL <unfinished ...>
[pid 28753] <... futex resumed> ) = -1 EAGAIN (Resource temporarily unavailable)
[pid 21694] <... futex resumed> ) = -1 EAGAIN (Resource temporarily unavailable)
[pid 28753] futex(0x7f71eac72d64, FUTEX_WAIT_PRIVATE, 6126014, NULL <unfinished ...>
[pid 21694] futex(0x7f71eac72d64, FUTEX_WAIT_PRIVATE, 6126014, NULL <unfinished ...>
[pid 20979] futex(0x7f71eac72d64, FUTEX_WAIT_PRIVATE, 6126002, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[pid 17529] futex(0x7f71eac72d64, FUTEX_WAIT_PRIVATE, 6126000, NULL <unfinished ...>
[pid 20979] futex(0x7f71eac72d64, FUTEX_WAIT_PRIVATE, 6126014, NULL <unfinished ...>
[pid 17529] <... futex resumed> ) = -1 EAGAIN (Resource temporarily unavailable)
[pid 17529] futex(0x7f71eac72d64, FUTEX_WAIT_PRIVATE, 6126014, NULL <unfinished ...>
[pid 5309] epoll_wait(28, <unfinished ...>
[pid 5308] futex(0x7f71eac72d64, FUTEX_WAIT_PRIVATE, 6126008, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[pid 5307] futex(0x7f71eac72d64, FUTEX_WAIT_PRIVATE, 6126006, NULL <unfinished ...>
[pid 5308] futex(0x7f71eac72d64, FUTEX_WAIT_PRIVATE, 6126014, NULL <unfinished ...>
[pid 5307] <... futex resumed> ) = -1 EAGAIN (Resource temporarily unavailable)
[pid 5307] futex(0x7f71eac72d64, FUTEX_WAIT_PRIVATE, 6126014, NULL <unfinished ...>
[pid 4758] futex(0x7f71eac72d64, FUTEX_WAIT_PRIVATE, 6126014, NULL <unfinished ...>
[pid 4748] futex(0x7f71eac72d64, FUTEX_WAIT_PRIVATE, 6126004, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[pid 4746] epoll_wait(6, <unfinished ...>
[pid 4748] futex(0x7f71eac72d64, FUTEX_WAIT_PRIVATE, 6126014, NULL <unfinished ...>
[pid 4726] futex(0x7f71eac5720c, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid 4746] <... epoll_wait resumed> {{EPOLLIN, {u32=17, u64=17}}}, 32, -1) = 1
[pid 4746] accept4(17, {sa_family=AF_INET6, sin6_port=htons(64297), inet_pton(AF_INET6, "::ffff:127.0.0.1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28], SOCK_NONBLOCK) = 24
...

Will appreciate any help regarding this issue.

@wake-up-neo
Copy link
Contributor Author

wake-up-neo commented May 18, 2016

Here is the issue, it waits infinitely

#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x00007fc7195f2c7c in std::condition_variable::wait(std::unique_lockstd::mutex&) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#2 0x00000000031b4140 in folly::EventBase::runInEventBaseThreadAndWait(std::function const&) ()
#3 0x0000000001de33e7 in HPHP::CurlMultiAwait::CurlMultiAwait(HPHP::req::ptrHPHP::CurlMultiResource, double) ()

@wake-up-neo
Copy link
Contributor Author

Could this be related to a very large number of tiny (bytes) and very fast (milliseconds) http requests together with this https://bugs.php.net/bug.php?id=61141 ?

@wake-up-neo
Copy link
Contributor Author

@wake-up-neo
Copy link
Contributor Author

I'm re-openning this issue again since the pull request is stuck in discussion with no progress.

@wake-up-neo wake-up-neo reopened this Jul 31, 2016
hhvm-bot pushed a commit that referenced this issue Aug 16, 2016
Summary:
#7076
Closes #7205

Reviewed By: aorenste

Differential Revision: D3544325

Pulled By: aorenste

fbshipit-source-id: 35f3679809e98324ff8426e97b503e953d18d6bd
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant