NullPointerException bug: openssl.Digest #1000
I'm having issues running Rails 4 on JRuby 1.7.5 head (2013-09-10).
After running some traffic against a vanilla Rails app, I start seeing a NullPointerException concerning org.jruby.ext.openssl.Digest.getAlgorithm. Full trace below.
In my quick test I could not reproduce this on JRuby master with Java 6. I have a few more questions for you:
I looked at the code referenced in the error and I can't see how it would be null. The Digest object gets initialized with a digest algorithm before ::new would return it, and nothing else sets it to null. The logic that loads the algorithm appears to always return either non-null or raise a different exception.
I suspect that Java 7u40, InvokeDynamic, or both are to blame.
On master with 7u25, I was able to complete 483 requests before it ran out of memory.
To be able to replicate, I'd go with code at https://github.com/greghuc/rails-jruby-1.7.5-bug.git mentioned above.
I had issues with ab (Apache Bench) on Mac, which were fixed by following: http://superuser.com/questions/323840/apache-bench-test-erroron-os-x-apr-socket-recv-connection-reset-by-peer-54
In answer to questions:
The bug manifests on Java 1.7.0_40 and 1.6.0_51 (the standard issue on my Mac).
It only happens in production mode, not development mode.
It happens with JRUBY_OPTS=-X-C applied, though the exception trace is slightly different. See below.
It doesn't happen in JRuby 1.7.4.
Exception from JRUBY_OPTS=-X-C mode:
So here are the minimum steps to reproduce this bug:
I just followed these steps and saw the exception above.
Here's my config:
Yeah, this has to be a concurrency bug. Since it happens with -X-C and also with Java 6, that rules out both the JIT and invokedynamic as the main cause. Either Rails 4 (or a dependent gem) makes a poor assumption and reuses something per-thread across threads, or we are doing something improperly internally. I am hoping it is us, since then we can fix the issue before release. Otherwise we will need to file an issue against an external project.
I am still looking at this, but it is just plain bizarre. I still believe this to be a concurrency issue (based on the explanation above), but from looking at the code it is difficult to see how. Occasionally we get a Digest instance whose digest name (SHA1) and MessageDigest algorithm are both null. Figuring out how we end up with an instance with null values has been elusive. Everywhere I print out information about this instance in the call chain, it is not null until the point where it NPEs.
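If this really is a concurrency problem in Digest creation, a small stress script outside of Rails might help narrow it down. The following is only a rough sketch under that assumption: nothing in this thread confirms the failure can be triggered without Rails, and the `stress_digest` helper, the thread count, and the iteration count are all made up for illustration. It creates SHA1 digests from several threads at once and records any instance whose name is nil or whose output does not match the well-known SHA-1 value for "abc".

```ruby
require "openssl"
require "thread"

# Known SHA-1 test vector for the input "abc".
EXPECTED_SHA1_ABC = "a9993e364706816aba3e25717850c26c9cd0d89d"

# Hypothetical helper (not part of JRuby or Rails): builds fresh SHA1
# digests on several threads and returns descriptions of any bad instances.
def stress_digest(thread_count = 8, iterations = 1000)
  failures = Queue.new

  workers = Array.new(thread_count) do |id|
    Thread.new do
      iterations.times do |i|
        digest = OpenSSL::Digest.new("SHA1")
        name = digest.name
        hex = digest.hexdigest("abc")
        # The symptom described in this issue is a digest with a null name,
        # so check for that as well as for a wrong hash value.
        if name.nil? || hex != EXPECTED_SHA1_ABC
          failures << "thread #{id}, iteration #{i}: name=#{name.inspect} hex=#{hex.inspect}"
        end
      end
    end
  end

  # join re-raises any exception that terminated a worker thread.
  workers.each(&:join)

  collected = []
  collected << failures.pop until failures.empty?
  collected
end

if __FILE__ == $0
  problems = stress_digest
  if problems.empty?
    puts "no bad Digest instances seen"
  else
    puts problems
  end
end
```

Running it under the same JRuby build and JRUBY_OPTS as the failing app would keep the comparison fair. A clean run would not prove much, since the real trigger may depend on how Rails or a gem shares objects across threads.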