Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IsolatedProcessTest::test_integration_concat_with_snapshots_stdout hangs ~1/5 the time #6242

Closed
jsirois opened this issue Jul 26, 2018 · 8 comments
Assignees
Labels

Comments

@jsirois
Copy link
Contributor

jsirois commented Jul 26, 2018

When I run ./pants test.pytest --no-timeouts --cache-ignore tests/python/pants_test/engine:isolated_process -- -ktest_integration_concat_with_snapshots_stdout -v repeatedly, I get a hang ~1/10 the time.

@jsirois
Copy link
Contributor Author

jsirois commented Jul 26, 2018

Some details:

The threads in play:

(gdb) info threads
  Id   Target Id         Frame 
* 1    Thread 0x7ff4137ad4c0 (LWP 19359) "python2.7" 0x00007ff4133b5ffc in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib/libpthread.so.0
  2    Thread 0x7ff3f2bf5700 (LWP 19441) "python2.7" 0x00007ff4130e585f in epoll_wait () from /usr/lib/libc.so.6
  3    Thread 0x7ff408a3c700 (LWP 19442) "tokio-runtime-w" 0x00007ff4133b5ffc in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib/libpthread.so.0
  4    Thread 0x7ff3f2ff7700 (LWP 19443) "tokio-runtime-w" 0x00007ff4133b5ffc in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib/libpthread.so.0
  5    Thread 0x7ff3f2df6700 (LWP 19444) "tokio-runtime-w" 0x00007ff4133b5ffc in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib/libpthread.so.0
  6    Thread 0x7ff40863a700 (LWP 19445) "tokio-runtime-w" 0x00007ff4133b5ffc in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib/libpthread.so.0
  7    Thread 0x7ff3f39fc700 (LWP 19446) "io-0" 0x00007ff4133b929c in __lll_lock_wait () from /usr/lib/libpthread.so.0
  8    Thread 0x7ff3f20d9700 (LWP 19447) "io-1" 0x00007ff4133b929c in __lll_lock_wait () from /usr/lib/libpthread.so.0
  9    Thread 0x7ff40883b700 (LWP 19448) "io-2" 0x00007ff4133b929c in __lll_lock_wait () from /usr/lib/libpthread.so.0
  10   Thread 0x7ff3f31f8700 (LWP 19449) "io-3" 0x00007ff4133b5ffc in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib/libpthread.so.0
  11   Thread 0x7ff408238700 (LWP 19450) "io-4" 0x00007ff4133b929c in __lll_lock_wait () from /usr/lib/libpthread.so.0
  12   Thread 0x7ff3f35fa700 (LWP 19451) "io-5" 0x00007ff4133b929c in __lll_lock_wait () from /usr/lib/libpthread.so.0
  13   Thread 0x7ff408439700 (LWP 19452) "io-6" 0x00007ff4133b929c in __lll_lock_wait () from /usr/lib/libpthread.so.0
  14   Thread 0x7ff3f3dfe700 (LWP 19453) "io-7" 0x00007ff4133b929c in __lll_lock_wait () from /usr/lib/libpthread.so.0
  15   Thread 0x7ff3f37fb700 (LWP 19454) "tokio-runtime-w" 0x00007ff4133b5ffc in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib/libpthread.so.0
(gdb) thread 1
[Switching to thread 1 (Thread 0x7ff4137ad4c0 (LWP 19359))]
#0  0x00007ff4133b5ffc in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib/libpthread.so.0
(gdb) bt
#0  0x00007ff4133b5ffc in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib/libpthread.so.0
#1  0x00007ff409357839 in std::sys::unix::condvar::Condvar::wait (self=<optimized out>, mutex=<optimized out>) at /checkout/src/libstd/sys/unix/condvar.rs:78
#2  std::sys_common::condvar::Condvar::wait (self=0x55bcbee4e1f0, mutex=0x55bcbee4e1c0) at /checkout/src/libstd/sys_common/condvar.rs:51
#3  std::sync::condvar::Condvar::wait (self=0x55bcbee4e258, guard=...) at /checkout/src/libstd/sync/condvar.rs:214
#4  futures::task_impl::std::ThreadNotify::park (self=0x55bcbee4e240) at /home/jsirois/.cache/pants/rust/cargo/registry/src/github.com-1ecc6299db9ec823/futures-0.1.21/src/task_impl/std/mod.rs:536
#5  0x00007ff408fd1c2b in futures::task_impl::std::<impl futures::task_impl::Spawn<F>>::wait_future::{{closure}} (notify=<optimized out>) at /home/jsirois/.cache/pants/rust/cargo/registry/src/github.com-1ecc6299db9ec823/futures-0.1.21/src/task_impl/std/mod.rs:244
#6  futures::task_impl::std::ThreadNotify::with_current::{{closure}} (notify=<optimized out>) at /home/jsirois/.cache/pants/rust/cargo/registry/src/github.com-1ecc6299db9ec823/futures-0.1.21/src/task_impl/std/mod.rs:506
#7  <std::thread::local::LocalKey<T>>::try_with (self=<optimized out>, f=...) at /checkout/src/libstd/thread/local.rs:294
#8  <std::thread::local::LocalKey<T>>::with (self=<optimized out>, f=...) at /checkout/src/libstd/thread/local.rs:248
#9  futures::task_impl::std::ThreadNotify::with_current (f=...) at /home/jsirois/.cache/pants/rust/cargo/registry/src/github.com-1ecc6299db9ec823/futures-0.1.21/src/task_impl/std/mod.rs:506
#10 futures::task_impl::std::<impl futures::task_impl::Spawn<F>>::wait_future (self=<optimized out>) at /home/jsirois/.cache/pants/rust/cargo/registry/src/github.com-1ecc6299db9ec823/futures-0.1.21/src/task_impl/std/mod.rs:240
#11 futures::future::Future::wait (self=...) at /home/jsirois/.cache/pants/rust/cargo/registry/src/github.com-1ecc6299db9ec823/futures-0.1.21/src/future/mod.rs:299
#12 engine::scheduler::Scheduler::execute (self=<optimized out>, request=<optimized out>, session=<optimized out>) at src/scheduler.rs:253
#13 engine::scheduler_execute::{{closure}}::{{closure}}::{{closure}} (session=<optimized out>) at src/lib.rs:338
#14 engine::with_session (session_ptr=<optimized out>, f=...) at src/lib.rs:754
#15 engine::scheduler_execute::{{closure}}::{{closure}} (execution_request=<optimized out>) at src/lib.rs:336
#16 engine::with_execution_request (execution_request_ptr=<optimized out>, f=...) at src/lib.rs:767
#17 engine::scheduler_execute::{{closure}} (scheduler=<optimized out>) at src/lib.rs:335
#18 engine::with_scheduler (scheduler_ptr=<optimized out>, f=...) at src/lib.rs:741
#19 scheduler_execute (scheduler_ptr=0x55bcbf0c65f0, session_ptr=<optimized out>, execution_request_ptr=0x55bcbf09df90) at src/lib.rs:334
#20 0x00007ff408ff3e10 in _cffi_f_scheduler_execute (self=<optimized out>, args=<optimized out>) at src/cffi/native_engine.c:2656
#21 0x00007ff412c903a9 in PyEval_EvalFrameEx () from /usr/lib/libpython2.7.so.1.0

I can't determine anything immediately useful except that probably epoll_wait has missed an io event from the cat process.

@illicitonion
Copy link
Contributor

I can't reproduce this on OSX... Will give it a go on a Linux machine when I have a chance...

@jsirois
Copy link
Contributor Author

jsirois commented Jul 31, 2018

Noting this is just as frequent on linux CI from an informal survey of PRs with !

@jsirois
Copy link
Contributor Author

jsirois commented Aug 1, 2018

This is hitting CI quite frequently so I'll try to dig a bit more here presently.

@jsirois jsirois self-assigned this Aug 1, 2018
@jsirois
Copy link
Contributor Author

jsirois commented Aug 1, 2018

Notes:

  • dropping the cat ExecuteProcessRequest timeout to 5 seconds does not lead to a timeout, hang still present.
    ...

@illicitonion
Copy link
Contributor

alexcrichton/tokio-process#42 may be related? That but describes deadlock when running multiple processes, I haven't looked at it in enough detail to know whether it could also trigger for single invocations.

Have you seen this in the rust shard, or only via python?

@stuhood
Copy link
Sponsor Member

stuhood commented Aug 1, 2018

Mm... good find.

jsirois added a commit to jsirois/pants that referenced this issue Aug 3, 2018
Previously we hand-rolled collection of process output streams; we now
leverage tokio-process to do this for us.

Fixes pantsbuild#6242
@jsirois
Copy link
Contributor Author

jsirois commented Aug 3, 2018

@illicitonion re

Have you seen this in the rust shard, or only via python?

Only via the ./pants test.pytest --no-timeouts --cache-ignore tests/python/pants_test/engine:isolated_process -- -ktest_integration_concat_with_snapshots_stdout -v mentioned in the issue description, but I had not tried to hit this state running rust tests either - nor have I witnessed a hang in rust tests in CI.

stuhood pushed a commit that referenced this issue Aug 3, 2018
Previously we hand-rolled collection of process output streams; we now
leverage tokio-process to do this for us.

Fixes #6242
jsirois added a commit to jsirois/pants that referenced this issue Aug 3, 2018
Previously streams were combined incorrectly, leading to deadlock
whenever a process exit event preceeded end of stream events. We no
longer assume stream shutdown events preceed exit events eliminating
deadlock.

Closes pantsbuild#6089
Fixes pantsbuild#6242
jsirois added a commit that referenced this issue Aug 3, 2018
Previously streams were combined incorrectly, leading to deadlock
whenever a process exit event preceeded end of stream events. We no
longer assume stream shutdown events preceed exit events eliminating
deadlock.

Closes #6089
Fixes #6242
CMLivingston pushed a commit to CMLivingston/pants that referenced this issue Aug 27, 2018
Previously we hand-rolled collection of process output streams; we now
leverage tokio-process to do this for us.

Fixes pantsbuild#6242
CMLivingston pushed a commit to CMLivingston/pants that referenced this issue Aug 27, 2018
Previously streams were combined incorrectly, leading to deadlock
whenever a process exit event preceeded end of stream events. We no
longer assume stream shutdown events preceed exit events eliminating
deadlock.

Closes pantsbuild#6089
Fixes pantsbuild#6242
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants