Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

concurrent loading does not resolve loaded constant #4091

Closed
kares opened this issue Aug 19, 2016 · 10 comments
Closed

concurrent loading does not resolve loaded constant #4091

kares opened this issue Aug 19, 2016 · 10 comments

Comments

@kares
Copy link
Member

@kares kares commented Aug 19, 2016

seems to be a regression compared to 1.7 or even 9.0.5

a bit tricky - so far I have been only able to do a reproduction case that 'sometimes' fails :

https://gist.github.com/kares/596f5a3648249181373dca2242332613

#<JRuby::Sample:0x1a09f207>
ERROR: #<NameError: uninitialized constant JRuby::Sample>
  org/jruby/RubyModule.java:3309:in `const_missing'
  repro/_run.rb:8:in `block in repro/_run.rb'
#<JRuby::Sample:0x3bf68c46>

HINT: loading seems to always fail from the same thread as the one first hitting the require.

@enebo enebo added this to the JRuby 9.1.3.0 milestone Aug 19, 2016
@headius
Copy link
Member

@headius headius commented Aug 19, 2016

It looks to me like there's a bug in how we lock down concurrent requires. I suspect the failing thread is passing over the require call before it is complete, proceeding to access constants that are not there.

I'll be poking at this today.

@kares
Copy link
Member Author

@kares kares commented Aug 19, 2016

that's what I thought but the thread doing the require than fails resolving the constant.

@headius
Copy link
Member

@headius headius commented Aug 19, 2016

Are you sure? Here's output from my run, logging each thread before it attempts to read or write Sample constant:

[] ~/projects/jruby $ jruby -X-C -I. blah.rb
main is #<Thread:0x536aaa8d>
#<Thread:0x138ef471> accessing Sample
#<Thread:0x267a6717> accessing Sample
#<Thread:0x138ef471> error#<Thread:0x267a6717> error
#<Thread:0x1cc58fd0> about to define Sample

#<Thread:0x1cc58fd0> accessing Sample
#<JRuby::Sample:0x18fa0890>

This shows three non-main threads attempting to load that file and access that constant. Notice that "71" and "17" say they're going to access Sample before "d0" says it's about to define it, and the two formers both error out because it's not there yet. "d0" successfully accesses sample after defining it.

The only way we'd see "71" and "17" proceed to access without trying to define is if require either told them the file was already required (concurrency problem with loaded features?) or if it allowed them to proceed while the require was still in progress. I suspect the latter, since loaded features shouldn't be updated until the load/require has completed.

@headius
Copy link
Member

@headius headius commented Aug 19, 2016

On a hunch, I ran with -w. The results are very interesting:

[] ~/projects/jruby $ jruby -w -X-C -I. blah.rb
main is #<Thread:0x536aaa8d>
/Users/headius/projects/jruby/lib/ruby/stdlib/rubygems/core_ext/kernel_require.rb:55: warning: loading in progress, circular require considered harmful - /Users/headius/projects/jruby/sample.rb
/Users/headius/projects/jruby/lib/ruby/stdlib/rubygems/core_ext/kernel_require.rb:55: warning: loading in progress, circular require considered harmful - /Users/headius/projects/jruby/sample.rb
/Users/headius/projects/jruby/sample.rb:4: warning: Useless use of + in void context.
          require at org/jruby/RubyKernel.java:955
          require at /Users/headius/projects/jruby/lib/ruby/stdlib/rubygems/core_ext/kernel_require.rb:55
  block in (root) at blah.rb:6
#<Thread:0x38f2831c> accessing Sample
          require at org/jruby/RubyKernel.java:955
          require at /Users/headius/projects/jruby/lib/ruby/stdlib/rubygems/core_ext/kernel_require.rb:55
  block in (root) at blah.rb:6
#<Thread:0x43c0b3f1> about to define Sample#<Thread:0x1199c7ba> accessing Sample

#<Thread:0x1199c7ba> error
#<Thread:0x38f2831c> error
#<Thread:0x43c0b3f1> accessing Sample
#<JRuby::Sample:0x47ff2f51>

It seems like the locking logic for require is treating concurrent requires as circular rather than conflicting, and sending the calling thread on its way with nothing but a stern warning.

@headius
Copy link
Member

@headius headius commented Aug 19, 2016

And a bit more logging around the require lock confirms it:

Thread[Ruby-0-Thread-1: blah.rb:4,5,main] locking /Users/headius/projects/jruby/sample.rb for require
Thread[Ruby-0-Thread-2: blah.rb:4,5,main] locking /Users/headius/projects/jruby/sample.rb for require
Thread[Ruby-0-Thread-2: blah.rb:4,5,main] could not acquire lock for /Users/headius/projects/jruby/sample.rb
Thread[Ruby-0-Thread-3: blah.rb:4,5,main] locking /Users/headius/projects/jruby/sample.rb for require
Thread[Ruby-0-Thread-3: blah.rb:4,5,main] could not acquire lock for /Users/headius/projects/jruby/sample.rb

Here we see thread 1 attempt to acquire the lock, and then 2 and 3 follow it. Both 2 and 3 then follow the path that leads to circular warning (requireLocks.lock returned false rather than blocking).

@headius
Copy link
Member

@headius headius commented Aug 19, 2016

I should have examined this commit more closely: 455bcf0

This is the culprit, and potentially the cause of many other require-time threading bugs we've seen.

This lock can't be a tryLock because we use a false return value here to indicate circular requires. It may have prevented some deadlocks, but only by allowing everyone to ignore the require lock altogether.

We obviously need better tests for this.

@headius
Copy link
Member

@headius headius commented Aug 19, 2016

Ref #3341 from the bad commit.

headius added a commit that referenced this issue Aug 19, 2016
This reverts commit 455bcf0.

See #4091 for long explanation, but basically this lock *must*
block, because a false return from this method means the lock had
already been acquired.
@headius
Copy link
Member

@headius headius commented Aug 19, 2016

This should be fixed now by reverting 455bcf0 in 93707ca.

NEEDS TESTS.

@headius headius added the needs tests label Aug 19, 2016
headius added a commit that referenced this issue Aug 19, 2016
@headius
Copy link
Member

@headius headius commented Aug 19, 2016

This also appears to be the cause of other circular require warnings, like those seen in test:jruby.

kares added a commit that referenced this issue Aug 20, 2016
@kares
Copy link
Member Author

@kares kares commented Aug 20, 2016

eh, sorry for the confusion I did not use puts as it affected my reproducibility but a concurrent queue of messages printed at the end - lead me wrong. I can confirm its doing good now, have added a test-case. thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
3 participants
You can’t perform that action at this time.