Unknown hang on Travis at termination of MRI stdlib suite #5718

Open
headius opened this issue Apr 26, 2019 · 5 comments

@headius
Member

commented Apr 26, 2019

Something in the MRI stdlib test suite (rake test:mri:stdlib) intermittently hangs while the suite is shutting down. The suite itself runs green, but after the last test completes the process sometimes never exits, which leads Travis to mark the build as a failure.

In 44d0e0c I have moved this suite to allowed failures, but we will want to figure out why this suite sometimes hangs and move it back out.

@headius headius added this to the JRuby 9.2.8.0 milestone Apr 26, 2019

@ahorek

Contributor

commented May 4, 2019

Hi @headius, I had a similar issue with https://github.com/rails/sprockets

so I decided to investigate a little, and here's a minimal repro:
https://github.com/ahorek/jrubyhang

The test passes, but on JRuby it hangs during shutdown.

@kares

Member

commented May 6, 2019

@ahorek you seem to be using Concurrent library, which isn't used by minitest in the MRI suite, right?
... your Promise will likely spin up an executor, which might need a proper (manual) shutdown.
I wonder whether that worked in previous versions. Either way, this sounds like an at_exit pool shutdown issue.
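
For illustration, here is a minimal sketch of what a manual executor shutdown can look like at the JVM level. It uses only `java.util.concurrent`; the class `ManualShutdownSketch` and the helper `shutdownAndWait` are hypothetical names, and this is not concurrent-ruby's own API or shutdown logic.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class ManualShutdownSketch {
    // Hypothetical helper: stop accepting new tasks, wait for running ones,
    // and interrupt the workers if they do not finish within the timeout.
    // Returns true if the pool terminated.
    static boolean shutdownAndWait(ExecutorService pool, long timeoutMillis) {
        pool.shutdown();
        try {
            if (pool.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS)) {
                return true;
            }
            pool.shutdownNow(); // interrupts workers that are still busy
            return pool.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        pool.submit(() -> System.out.println("task ran"));
        // Without this call the idle, non-daemon workers would keep the JVM alive.
        System.out.println("terminated: " + shutdownAndWait(pool, 1000));
    }
}
```

Calling something like this from an exit hook is one option; it only helps if the code that owns the pool actually runs it before the process is expected to exit.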

@ahorek

Contributor

commented May 6, 2019

> you seem to be using Concurrent library, which isn't used by minitest in the MRI suite, right?

right, but I think it might be the same problem.

> which might need some proper (manual) shutdown

Should Concurrent::Promise workers be terminated automatically? I couldn't find documentation on how to shut these workers down manually in code.
This issue might also be relevant: ruby-concurrency/concurrent-ruby#701. @pitr-ch?
The same test passes on MRI.

It looks like it's waiting for other threads to terminate during minitest's shutdown, but it seems to terminate properly in irb.

@headius

Member Author

commented May 8, 2019

Aha, are those executor threads being created as daemon threads? If not, they'd keep the JVM alive until (I believe) some timeout in the executor actually shuts them down. Or the executor could be eagerly shut down at some point.
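
To make the daemon question concrete, here is a small sketch in plain Java. A pool built with the JDK's default thread factory gets non-daemon workers, while a custom `ThreadFactory` can mark them as daemons. The names `DaemonPoolSketch`, `daemonFactory`, and `workerIsDaemon` are hypothetical and only serve the illustration; this is not how concurrent-ruby configures its pools.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

class DaemonPoolSketch {
    // Hypothetical helper: a ThreadFactory whose threads are all daemons,
    // so they never prevent the JVM from exiting.
    static ThreadFactory daemonFactory(String prefix) {
        AtomicInteger counter = new AtomicInteger();
        return runnable -> {
            Thread t = new Thread(runnable, prefix + "-" + counter.incrementAndGet());
            t.setDaemon(true);
            return t;
        };
    }

    // Runs a probe task on the pool and reports whether the worker is a daemon.
    static boolean workerIsDaemon(ExecutorService pool) {
        Callable<Boolean> probe = () -> Thread.currentThread().isDaemon();
        try {
            return pool.submit(probe).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("probe task failed", e);
        }
    }

    public static void main(String[] args) {
        ExecutorService defaultPool = Executors.newFixedThreadPool(1);
        ExecutorService daemonPool =
            Executors.newFixedThreadPool(1, daemonFactory("daemon-pool"));

        System.out.println("default pool worker is daemon: " + workerIsDaemon(defaultPool));
        System.out.println("daemon pool worker is daemon: " + workerIsDaemon(daemonPool));

        // The default pool must be shut down explicitly or the JVM will not exit.
        // The daemon pool is deliberately left running; it does not block exit.
        defaultPool.shutdown();
    }
}
```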

@headius

Member Author

commented May 14, 2019

So at least for the concurrent-ruby case, it does appear to be a thread pool created with non-daemon threads that never shut down when the Ruby app completes.

I see a stack trace like this (note it is not marked as a daemon thread):

"pool-1-thread-1" #16 prio=5 os_prio=31 tid=0x00007f916054c000 nid=0x5803 waiting on condition [0x000070000eff6000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x000000076de77188> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
	at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
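
The same check can be made from inside the process rather than from a thread dump. The sketch below lists the live non-daemon threads using `Thread.getAllStackTraces()`; the class and method names are hypothetical, and it is only meant as a debugging aid under the assumption that you can run code in the hanging JVM.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

class NonDaemonThreadsSketch {
    // Names of all live threads that are NOT daemons, i.e. the threads
    // that can keep the JVM running after main has returned.
    static List<String> liveNonDaemonThreadNames() {
        return Thread.getAllStackTraces().keySet().stream()
            .filter(t -> t.isAlive() && !t.isDaemon())
            .map(Thread::getName)
            .sorted()
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        pool.submit(() -> { }); // forces the pool to create its worker thread

        // Expect the pool's worker to be listed alongside the main thread.
        liveNonDaemonThreadNames().forEach(System.out::println);

        pool.shutdown(); // let the example itself exit
    }
}
```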

And if I dig into the heap to see who's holding references to that thread, I eventually get back to a thread pool held by:

varTable in com.concurrent_ruby.ext.SynchronizationLibrary$JRubyLockableObject#2

So that issue does appear to be a problem with the promise-running thread pool in concurrent-ruby not being configured with daemon threads. Thoughts on this, @pitr-ch?

I do not know if this relates to the JRuby test suite hang, since I would not expect anything in that suite to use concurrent-ruby.
