Thread#backtrace gets stuck in an infinite loop #1888

Closed
thedarkone opened this Issue Sep 5, 2012 · 8 comments

3 participants

@thedarkone

Rubinius version: rubinius 2.0.0dev (1.8.7 149a3642 yyyy-mm-dd JI) [x86_64-apple-darwin10.2.0] (1.9 mode doesn't help).

Rubinius gets stuck during a run of the test_cache_loops.rb stress test of the thread_safe gem. It looks like some kind of deadlock, as the process drops to 0% CPU usage. Since there's no way to get a thread dump from a running Rubinius, I'm using a poor man's one implemented in pure Ruby: here and here. This usually works, but in this case the VM goes into an infinite loop (using a single core at 100%) on the t.backtrace.join("\n") line.
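For readers without access to the linked code, a pure-Ruby thread dump of this kind might look roughly like the sketch below. The `thread_dump` helper name is hypothetical and this is not the gem's actual debug code; it only assumes the standard `Thread.list` and `Thread#backtrace` methods.

```ruby
# Hypothetical helper (not the thread_safe gem's code): build a textual
# dump of every live thread using only core Ruby methods.
def thread_dump
  Thread.list.map do |t|
    # Thread#backtrace returns nil for a dead thread, hence the Array() guard.
    # This is the call that spins forever in the bug reported here.
    "#{t.inspect}\n#{Array(t.backtrace).join("\n")}"
  end.join("\n\n")
end

puts thread_dump
```

On an affected Rubinius build the `t.backtrace` call is where the dump itself would hang, so a dump like this cannot diagnose that particular failure.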

I ran into this while refactoring a method, so in the end this might turn out to be a bug in my code (except for the Thread#backtrace infinite loop). I can make this go away by manually inlining a single iteration of AtomicReferenceCacheBackend#find_value_in_node_list into the compute_if_absent method like this. JRuby runs the same code without any issues though, so I'm thinking this might be an rbx bug after all.

To reproduce:
git clone git://github.com/thedarkone/thread_safe.git
cd thread_safe
git checkout rbx-debug
ruby -rubygems -Ilib ./test/test_cache_loops.rb

For a speedier run, the THREAD_COUNT constant can be bumped down somewhat; I'm still getting full reproducibility with THREAD_COUNT = 8 on a dual-core machine.

@dbussink
Rubinius member
dbussink commented Sep 6, 2012

This looks like it's related to the thread deadlock issues @ryoqun is working on atm. We should probably check if this is still a problem after those changes are merged in.

@dbussink
Rubinius member
dbussink commented Oct 2, 2012

@ryoqun I was wondering, did you use this too for working on those thread deadlocks you fixed?

@ryoqun
Rubinius member
ryoqun commented Oct 3, 2012

@dbussink No, I didn't. I was going to look at this after all of my thread deadlock fixes were merged in. So, I'll look at it now. Hopefully, I'll get a bug fix credit without working. :p

@ryoqun
Rubinius member
ryoqun commented Oct 3, 2012

Hmm. I couldn't reproduce this on ubuntu.. I'll test this on mac later.

@thedarkone

Alright I've messed up guys, I've pushed 2 more commits into the rbx-debug branch:

  • the first one actually disables the Rubinius workaround I'm using on master
  • the second commit: turns out there also needs to be this other debug code to trigger the infinite backtrace loops... without it I'm getting LocalJumpError: no block given errors, which I'm now going to look into.

This is still reproducible for me on the current 32c35a4f master.

@ryoqun
Rubinius member
ryoqun commented Oct 4, 2012

@thedarkone thanks for the additional info. Somehow, I probably managed to reproduce your bug on my mac (it failed to reproduce on my ubuntu...). I'll investigate this further later.

Just for confirmation, is this output similar to what you got when Rubinius gets stuck?: https://gist.github.com/3834961

@thedarkone

@ryoqun: yes, this is where it gets stuck (my bad, I should have posted the expected output).

What happens is that during a test run an exception is thrown by compute_if_absent (Bug A; I opened a separate ticket, #1940, for that). It is caught by this code, and as soon as that code tries to get e.backtrace, the thread deadlocks (Bug B). All the other worker threads then finish and Rubinius goes to 0% CPU usage. A watchdog thread wakes up every 20 seconds to check that the worker threads are making progress. It discovers that no progress is being made, prints _compute_if_absent_loop_outer: STUCK!!!!, dumps what the concurrent cache looks like, and then iterates over all existing threads trying to print their backtraces. It gets stuck right at the first thread (#<Thread:0x2f98 id=1 sleep>), going into an infinite loop that chews 100% of a single core (Bug C).
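The watchdog pattern described above can be sketched roughly as follows. This is a minimal illustration, not the test's actual code: the `Watchdog` class, its `tick` method, and the stall callback are all hypothetical names, and the interval is a parameter rather than the 20 seconds used in the test.

```ruby
require 'thread'

# Hypothetical sketch: workers bump a shared counter, and the watchdog
# reports a stall when the counter has not moved between two checks.
class Watchdog
  def initialize(interval, &on_stall)
    @interval = interval
    @on_stall = on_stall
    @progress = 0
    @lock     = Mutex.new
  end

  # Called by worker threads after each unit of work.
  def tick
    @lock.synchronize { @progress += 1 }
  end

  # Starts the watchdog thread and returns it.
  def start
    Thread.new do
      last = nil
      loop do
        sleep @interval
        current = @lock.synchronize { @progress }
        if current == last
          @on_stall.call # no progress since the previous check
          break
        end
        last = current
      end
    end
  end
end
```

In the scenario from this issue, the stall callback would be the place that prints the STUCK message and walks `Thread.list` to print backtraces, which is exactly where Bug C bites.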

This ticket should be about solving backtrace deadlocks/infinite loops (Bug B and Bug C). Unless all of the bugs are caused by the same issue, it is better to solve them in reverse order (C, B, A); this way all the rbx bugs get fixed.

@dbussink
Rubinius member

The backtrace issue is probably fixed in 59ab694. If you still see deadlock / problems with the code, please open a new issue for that.

@dbussink dbussink closed this Oct 10, 2012