Gathering global thread backtraces fails often on CentOS with JRuby 1.7 #406

Closed
ferrous26 opened this Issue Nov 23, 2012 · 3 comments

@ferrous26
Contributor

Using JRuby 1.7 on 64-bit CentOS, I get a lot of null pointer exceptions when I try to get backtraces for all running threads. I threw together a sample that triggers the exception every single time on the machine that I am using:

Thread.new do
  10.times do
    Thread.new do
      1_000_000_000.times do
        1 + 1
      end
    end
  end
end

100.times do
  Thread.list.map(&:backtrace)
end

The backtrace that I get from JRuby looks like this:

RubyThread.java:212:in `getContext': java.lang.NullPointerException
        from RubyThread.java:973:in `backtrace'
        from RubyThread$INVOKER$i$0$0$backtrace.gen:-1:in `call'
        from JavaMethod.java:861:in `call'
        from CachingCallSite.java:70:in `call'
        from RubySymbol.java:428:in `yieldInner'
        from RubySymbol.java:433:in `yield'
        from Block.java:130:in `yield'
        from RubyArray.java:2347:in `collect'
        from RubyArray.java:2360:in `map19'
        from RubyArray$INVOKER$i$0$0$map19.gen:-1:in `call'
        from CachingCallSite.java:143:in `callBlock'
        from CachingCallSite.java:149:in `call'
        from bug.rb:12:in `block_4$RUBY$__file__'
        from bug$block_4$RUBY$__file__:-1:in `call'
        from CompiledBlock19.java:139:in `yield'
        from Block.java:130:in `yield'
        from RubyFixnum.java:273:in `times'
        from RubyFixnum$INVOKER$i$0$0$times.gen:-1:in `call'
        from CachingCallSite.java:316:in `cacheAndCall'
        from CachingCallSite.java:145:in `callBlock'
        from CachingCallSite.java:154:in `callIter'
        from bug.rb:11:in `__file__'
        from bug.rb:-1:in `load'
        from Ruby.java:779:in `runScript'
        from Ruby.java:772:in `runScript'
        from Ruby.java:649:in `runNormally'
        from Ruby.java:498:in `runFromMain'
        from Main.java:375:in `doRunFromMain'
        from Main.java:264:in `internalRun'
        from Main.java:230:in `run'
        from Main.java:214:in `run'
        from Main.java:194:in `main'

java -version info is:

java version "1.7.0_09-icedtea"
OpenJDK Runtime Environment (rhel-2.3.3.el6_3.1-x86_64)
OpenJDK 64-Bit Server VM (build 23.2-b09, mixed mode)
@ferrous26
Contributor

I should have mentioned that I do not have this issue when I run the sample code on OS X 10.8.2 with JRuby 1.7.

@ferrous26
Contributor

Actually, it does occasionally happen on OS X now...hmm.

@benweint
benweint commented Jan 3, 2013

I'm also able to reproduce on OS X 10.8.2 with JRuby 1.7.1. Full version info:

jruby 1.7.1 (1.9.3p327) 2012-12-03 30a153b on Java HotSpot(TM) 64-Bit Server VM 1.6.0_37-b06-434-11M3909 [darwin-x86_64]

@headius added a commit that referenced this issue Sep 17, 2013
Make Thread#backtrace a bit more thread-safe.
The original NPE for #406 was due to the thread not having started
running yet; its "context" reference was null. I added logic to
check for that and not try to produce a backtrace.

In addition, I discovered that the building of the trace was also
not threadsafe, since the "backtrace" and "backtraceIndex" fields
in ThreadContext could be updated at the same time by the original
thread. My changes here should make it less likely that backtrace
building will walk off the end of the "backtrace" array, but there
will still be cases where the index and the array get out of sync
and the backtrace contains a couple bogus lines. It is unclear
to me whether we should forcibly prevent the target thread from
updating these fields while the backtrace-generating thread is
generating, since it would surely introduce overhead into the
normal backtrace updating process.

Fixes #406.
fec1af4
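
The two guards described in the commit message can be sketched in isolation. This is a minimal illustration only, not the actual JRuby change: `BacktraceSketch`, `SketchThread`, `Context`, and `push` are hypothetical stand-ins for `RubyThread` and `ThreadContext`, and only the field names `backtrace` and `backtraceIndex` come from the commit message. It assumes the reader returns null for a thread that has not started, and that it snapshots the array and index once and clamps the index so it cannot walk off the end of the array.

```java
import java.util.Arrays;

public class BacktraceSketch {
    /** Hypothetical stand-in for ThreadContext: a frame array plus the index of the top frame. */
    static final class Context {
        volatile String[] backtrace = new String[4];
        volatile int backtraceIndex = -1; // -1 means no frames have been pushed

        /** Called only by the owning thread as it enters a method. */
        void push(String frame) {
            int next = backtraceIndex + 1;
            if (next == backtrace.length) {
                backtrace = Arrays.copyOf(backtrace, backtrace.length * 2);
            }
            backtrace[next] = frame;
            backtraceIndex = next;
        }
    }

    /** Hypothetical stand-in for RubyThread: context stays null until the thread starts running. */
    static final class SketchThread {
        volatile Context context;

        /** May be called from any thread; returns null if the target has not started. */
        String[] backtrace() {
            Context ctx = context;        // read the field once so it cannot change under us
            if (ctx == null) return null; // guard 1: thread not running yet, nothing to report

            String[] frames = ctx.backtrace; // guard 2: snapshot both fields before using them
            int top = ctx.backtraceIndex;
            int count = Math.min(top + 1, frames.length); // clamp to the snapshot's length
            if (count <= 0) return new String[0];

            // The snapshot can still be slightly stale or contain unset slots,
            // which matches the "couple bogus lines" caveat in the commit message.
            return Arrays.copyOf(frames, count);
        }
    }

    public static void main(String[] args) {
        SketchThread t = new SketchThread();
        System.out.println(Arrays.toString(t.backtrace())); // no context yet

        Context ctx = new Context();
        ctx.push("bug.rb:11");
        ctx.push("bug.rb:12");
        t.context = ctx;
        System.out.println(Arrays.toString(t.backtrace()));
    }
}
```

The sketch deliberately takes no lock, mirroring the trade-off raised in the commit message: blocking the target thread while another thread reads its frames would make the result exact, at the cost of overhead on every frame update.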
@headius headius closed this in fec1af4 Sep 17, 2013