New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Autoloading appears to lead to race conditions #3318

Closed
YorickPeterse opened this Issue Sep 10, 2015 · 3 comments

Comments

Projects
None yet
5 participants
@YorickPeterse

It appears that JRuby 9.0.1.0 still has problems autoloading constants under certain conditions. We noticed this issue quite a bit on JRuby 1.7.something, to a certain extend it still seems to occur on 9.0.1.0 (though less often).

The problem is basically as following: we run numerous Ruby applications on JRuby for sentiment analysis (various applications bind to Java code, hence JRuby). All these applications use the AWS SDK (a recurring source of headaches at this point). The AWS SDK in turn autoloads pretty much everything, it also dynamically defines a bunch of classes based on AWS API responses if I remember correctly (though this seems unrelated to the problem at hand). In certain instances the AWS code will throw a NameError when it can't find a certain constant. For example (stacktrace is from Rollbar, hence the somewhat odd formatting):

NameError: uninitialized constant Aws::Xml::Parser::Stack::Frame
File "org/jruby/RubyModule.java" line 3166 in const_missing
File "/usr/local/rvm/gems/jruby-9.0.1.0/gems/aws-sdk-core-2.1.19/lib/aws-sdk-core/xml/parser/stack.rb" line 48 in child_frame
File "/usr/local/rvm/gems/jruby-9.0.1.0/gems/aws-sdk-core-2.1.19/lib/aws-sdk-core/xml/parser/stack.rb" line 17 in start_element
File "/usr/local/rvm/gems/jruby-9.0.1.0/gems/aws-sdk-core-2.1.19/lib/aws-sdk-core/xml/parser/engines/oga.rb" line 21 in on_element
File "/usr/local/rvm/gems/jruby-9.0.1.0/gems/oga-1.3.1-java/lib/oga/xml/sax_parser.rb" line 170 in run_callback
File "/usr/local/rvm/gems/jruby-9.0.1.0/gems/oga-1.3.1-java/lib/oga/xml/sax_parser.rb" line 94 in on_element
File "/usr/local/rvm/gems/jruby-9.0.1.0/gems/oga-1.3.1-java/lib/oga/xml/parser.rb" line 507 in _rule_27
File "/usr/local/rvm/gems/jruby-9.0.1.0/gems/oga-1.3.1-java/lib/oga/xml/parser.rb" line 239 in block in each_token
File "org/jruby/RubyProc.java" line 308 in call
File "/usr/local/rvm/gems/jruby-9.0.1.0/gems/oga-1.3.1-java/lib/oga/xml/lexer.rb" line 237 in add_token
File "/usr/local/rvm/gems/jruby-9.0.1.0/gems/oga-1.3.1-java/lib/oga/xml/lexer.rb" line 398 in add_element
File "/usr/local/rvm/gems/jruby-9.0.1.0/gems/oga-1.3.1-java/lib/oga/xml/lexer.rb" line 369 in on_element_name
File "org/liboga/xml/Lexer.java" line 836 in advance_native
File "/usr/local/rvm/gems/jruby-9.0.1.0/gems/oga-1.3.1-java/lib/oga/xml/lexer.rb" line 190 in block in advance
File "/usr/local/rvm/gems/jruby-9.0.1.0/gems/oga-1.3.1-java/lib/oga/xml/lexer.rb" line 137 in read_data
File "/usr/local/rvm/gems/jruby-9.0.1.0/gems/oga-1.3.1-java/lib/oga/xml/lexer.rb" line 189 in advance
File "/usr/local/rvm/gems/jruby-9.0.1.0/gems/oga-1.3.1-java/lib/oga/xml/parser.rb" line 236 in each_token
File "org/libll/Driver.java" line 303 in parse
File "/usr/local/rvm/gems/jruby-9.0.1.0/gems/oga-1.3.1-java/lib/oga/xml/parser.rb" line 269 in parse
File "/usr/local/rvm/gems/jruby-9.0.1.0/gems/oga-1.3.1-java/lib/oga/oga.rb" line 35 in sax_parse_xml
File "/usr/local/rvm/gems/jruby-9.0.1.0/gems/aws-sdk-core-2.1.19/lib/aws-sdk-core/xml/parser/engines/oga.rb" line 14 in parse
File "/usr/local/rvm/gems/jruby-9.0.1.0/gems/aws-sdk-core-2.1.19/lib/aws-sdk-core/xml/parser.rb" line 29 in parse
File "/usr/local/rvm/gems/jruby-9.0.1.0/gems/aws-sdk-core-2.1.19/lib/aws-sdk-core/query/handler.rb" line 53 in parse_xml
File "/usr/local/rvm/gems/jruby-9.0.1.0/gems/aws-sdk-core-2.1.19/lib/aws-sdk-core/query/handler.rb" line 29 in block in call
File "org/jruby/RubyProc.java" line 308 in call
File "/usr/local/rvm/gems/jruby-9.0.1.0/gems/aws-sdk-core-2.1.19/lib/seahorse/client/response.rb" line 43 in block in on
File "/usr/local/rvm/gems/jruby-9.0.1.0/gems/aws-sdk-core-2.1.19/lib/seahorse/client/http/response.rb" line 139 in block in on_success
File "org/jruby/RubyProc.java" line 308 in call
File "/usr/local/rvm/gems/jruby-9.0.1.0/gems/aws-sdk-core-2.1.19/lib/seahorse/client/http/response.rb" line 166 in block in listener
File "org/jruby/RubyProc.java" line 308 in call
File "/usr/local/rvm/gems/jruby-9.0.1.0/gems/aws-sdk-core-2.1.19/lib/seahorse/client/http/response.rb" line 130 in on_done
File "/usr/local/rvm/gems/jruby-9.0.1.0/gems/aws-sdk-core-2.1.19/lib/seahorse/client/http/response.rb" line 137 in on_success
File "/usr/local/rvm/gems/jruby-9.0.1.0/gems/aws-sdk-core-2.1.19/lib/seahorse/client/response.rb" line 42 in on
File "/usr/local/rvm/gems/jruby-9.0.1.0/gems/aws-sdk-core-2.1.19/lib/seahorse/client/response.rb" line 51 in on_success
File "/usr/local/rvm/gems/jruby-9.0.1.0/gems/aws-sdk-core-2.1.19/lib/aws-sdk-core/query/handler.rb" line 27 in call
File "/usr/local/rvm/gems/jruby-9.0.1.0/gems/aws-sdk-core-2.1.19/lib/aws-sdk-core/plugins/user_agent.rb" line 12 in call
File "/usr/local/rvm/gems/jruby-9.0.1.0/gems/aws-sdk-core-2.1.19/lib/seahorse/client/plugins/endpoint.rb" line 34 in call
File "/usr/local/rvm/gems/jruby-9.0.1.0/gems/aws-sdk-core-2.1.19/lib/aws-sdk-core/plugins/param_validator.rb" line 21 in call
File "/usr/local/rvm/gems/jruby-9.0.1.0/gems/aws-sdk-core-2.1.19/lib/seahorse/client/plugins/raise_response_errors.rb" line 14 in call
File "/usr/local/rvm/gems/jruby-9.0.1.0/gems/aws-sdk-core-2.1.19/lib/aws-sdk-core/plugins/param_converter.rb" line 21 in call
File "/usr/local/rvm/gems/jruby-9.0.1.0/gems/aws-sdk-core-2.1.19/lib/seahorse/client/plugins/response_target.rb" line 21 in call
File "/usr/local/rvm/gems/jruby-9.0.1.0/gems/aws-sdk-core-2.1.19/lib/seahorse/client/request.rb" line 70 in send_request
File "/usr/local/rvm/gems/jruby-9.0.1.0/gems/aws-sdk-core-2.1.19/lib/seahorse/client/base.rb" line 207 in block in get_queue_url
File "/usr/local/rvm/gems/jruby-9.0.1.0/gems/oni-4.0.0/lib/oni/daemons/sqs.rb" line 63 in queue_url
File "/usr/local/rvm/gems/jruby-9.0.1.0/gems/oni-4.0.0/lib/oni/daemons/sqs.rb" line 55 in queue
File "/usr/local/rvm/gems/jruby-9.0.1.0/gems/oni-4.0.0/lib/oni/daemons/sqs.rb" line 34 in receive
File "/usr/local/rvm/gems/jruby-9.0.1.0/gems/oni-4.0.0/lib/oni/daemon.rb" line 174 in run_thread
File "/usr/local/rvm/gems/jruby-9.0.1.0/gems/oni-4.0.0/lib/oni/daemon.rb" line 158 in block in spawn_thread

The file in question: https://github.com/aws/aws-sdk-ruby/blob/9eeeb192565e4ed97a427e9e95d99b922274ea42/aws-sdk-core/lib/aws-sdk-core/xml/parser/stack.rb#L48

Autoload is defined here: https://github.com/aws/aws-sdk-ruby/blob/5929ddfb7266961f5acc8fb189685bc1178ba19f/aws-sdk-core/lib/aws-sdk-core/xml/parser.rb#L7-L8

The interesting thing here is that it seems the AWS code is trying to load the constant relative to the wrong root constant. That is, the Frame class is loaded from Aws::Xml::Parser::Stack while it's actually defined in Aws::Xml::Parser. Prior to updating this code to version 2 of the AWS SDK I had similar issues where the code was trying to load the class AWS::SQS::Queue but would end up using the top-level Queue class, resulting in an ArgumentError when the class was initialized (as AWS::SQS::Queue#initialize has an arity of 2).

Sadly Rollbar didn't actually track what application this error originated from (don't get me started). However, the exception occurs during bootup when it tries to connect to the AWS queue. Since all applications use the same bootup phase this should be possible to reproduce using any of them.

Thankfully the code is (for a change) open source. As such you should be able to reproduce this as following:

  1. You'll need to have an AWS account and set the environment variables AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_REGION
  2. Run the script setup.rb as noted below
  3. gem install opener-language-identifier, you'll need to have libarchive installed if I remember correctly
  4. run language-identifier-daemon, terminate it and repeat this process until the error occurs (this is basically where you'll need a lot of luck)

setup.rb:

# gem install aws-sdk
require 'aws-sdk'

sqs = Aws::SQS::Client.new

response = sqs.create_queue(:queue_name => 'opener-language-identifier')

puts "Created queue with URL #{response.queue_url}"

Alternatively, if you want to also test the runtime for any autoloading problems, use the following setup script instead:

# gem install aws-sdk
require 'aws-sdk'
require 'json'

sqs = Aws::SQS::Client.new

sqs.create_queue(:queue_name => 'opener-output')

response = sqs.create_queue(:queue_name => 'opener-language-identifier')

puts "Created queue with URL #{response.queue_url}"

payload = JSON.dump(
  :identifier => '123',
  :input_url  => 'https://s3-eu-west-1.amazonaws.com/storage.olery.com/olery-jruby-autoloading-review.txt',
  :callbacks  => ['sqs://opener-output']
)

1000.times do
  sqs.send_message(:queue_url => response.queue_url, :message_body => review)
end

Then, instead of terminating the language-identifier-daemon command right away just let it run. If you're on a laptop you might want to take it off your lap, because it's probably going to get very hot.

@YorickPeterse

This comment has been minimized.

Show comment
Hide comment
@YorickPeterse

YorickPeterse Sep 10, 2015

Output of ruby --version:

jruby 9.0.1.0 (2.2.2) 2015-09-02 583f336 OpenJDK 64-Bit Server VM 25.51-b03 on 1.8.0_51-b16 +jit [linux-amd64]. 

This is on Amazon Linux (basically CentOS) running on EC2. We're using Java 1.8, though invoke dynamic was not (explicitly) enabled.

Output of ruby --version:

jruby 9.0.1.0 (2.2.2) 2015-09-02 583f336 OpenJDK 64-Bit Server VM 25.51-b03 on 1.8.0_51-b16 +jit [linux-amd64]. 

This is on Amazon Linux (basically CentOS) running on EC2. We're using Java 1.8, though invoke dynamic was not (explicitly) enabled.

@y8

This comment has been minimized.

Show comment
Hide comment
@y8

y8 Sep 15, 2015

Same happens with rails master (rails 5) on the jruby 9.0.1.0 (jruby 9.0.1.0 (2.2.2) 2015-09-02 583f336 Java HotSpot(TM) 64-Bit Server VM 24.65-b04 on 1.7.0_67-b01 +jit [darwin-x86_64]).

You can reproduce this easily:

  1. git clone git://github.com/rails/rails.git
  2. cd rails
  3. bundle exec railties/exe/rails new ../rails-funny-race --edge --skip-test --skip-spring --api -M -O
  4. cd ../rails-funny-race
  5. bundle exec rails server
  6. Open the localhost:3000
  7. You will get ERROR LoadError: load error: action_controller/base -- java.lang.StackOverflowError: null in the log
  8. Refresh again, and you will get LoadError (load error: action_controller/api -- java.lang.StackOverflowError: null) error page with a stacktrace
  9. Refresh again, and you will see Welcome aboard page

It will stuck in the infinite loop somewhere around this method:
https://github.com/rails/rails/blob/master/actionpack/lib/action_controller/metal.rb#L215

I've wrapped this method with lock, like this

    class_attribute :inheritance_mutex
    self.inheritance_mutex = Mutex.new

    def self.inherited(base) # :nodoc:
      inheritance_mutex.synchronize do
        base.middleware_stack = middleware_stack.dup
        super
      end
    end

And that fixed the issue.

y8 commented Sep 15, 2015

Same happens with rails master (rails 5) on the jruby 9.0.1.0 (jruby 9.0.1.0 (2.2.2) 2015-09-02 583f336 Java HotSpot(TM) 64-Bit Server VM 24.65-b04 on 1.7.0_67-b01 +jit [darwin-x86_64]).

You can reproduce this easily:

  1. git clone git://github.com/rails/rails.git
  2. cd rails
  3. bundle exec railties/exe/rails new ../rails-funny-race --edge --skip-test --skip-spring --api -M -O
  4. cd ../rails-funny-race
  5. bundle exec rails server
  6. Open the localhost:3000
  7. You will get ERROR LoadError: load error: action_controller/base -- java.lang.StackOverflowError: null in the log
  8. Refresh again, and you will get LoadError (load error: action_controller/api -- java.lang.StackOverflowError: null) error page with a stacktrace
  9. Refresh again, and you will see Welcome aboard page

It will stuck in the infinite loop somewhere around this method:
https://github.com/rails/rails/blob/master/actionpack/lib/action_controller/metal.rb#L215

I've wrapped this method with lock, like this

    class_attribute :inheritance_mutex
    self.inheritance_mutex = Mutex.new

    def self.inherited(base) # :nodoc:
      inheritance_mutex.synchronize do
        base.middleware_stack = middleware_stack.dup
        super
      end
    end

And that fixed the issue.

@kares kares added this to the JRuby 9.1.0.0 milestone Feb 18, 2016

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Apr 28, 2016

Member

Rails 5 edge is able to boot and run on JRuby now, so if the Rails 5 case we indicative this appears to be working fine in 9.1. If someone can reproduce this again we'd be all over investigating it again.

Member

headius commented Apr 28, 2016

Rails 5 edge is able to boot and run on JRuby now, so if the Rails 5 case we indicative this appears to be working fine in 9.1. If someone can reproduce this again we'd be all over investigating it again.

@headius headius closed this Apr 28, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment