concurrent loading does not resolve loaded constant #4091

Closed
kares opened this Issue Aug 19, 2016 · 10 comments

Comments

Projects
None yet
3 participants
@kares
Member

kares commented Aug 19, 2016

seems to be a regression compared to 1.7 or even 9.0.5

a bit tricky - so far I have been only able to do a reproduction case that 'sometimes' fails :

https://gist.github.com/kares/596f5a3648249181373dca2242332613

#<JRuby::Sample:0x1a09f207>
ERROR: #<NameError: uninitialized constant JRuby::Sample>
  org/jruby/RubyModule.java:3309:in `const_missing'
  repro/_run.rb:8:in `block in repro/_run.rb'
#<JRuby::Sample:0x3bf68c46>

HINT: loading seems to always fail from the same thread as the one first hitting the require.

@enebo enebo added this to the JRuby 9.1.3.0 milestone Aug 19, 2016

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Aug 19, 2016

Member

It looks to me like there's a bug in how we lock down concurrent requires. I suspect the failing thread is passing over the require call before it is complete, proceeding to access constants that are not there.

I'll be poking at this today.

Member

headius commented Aug 19, 2016

It looks to me like there's a bug in how we lock down concurrent requires. I suspect the failing thread is passing over the require call before it is complete, proceeding to access constants that are not there.

I'll be poking at this today.

@kares

This comment has been minimized.

Show comment
Hide comment
@kares

kares Aug 19, 2016

Member

that's what I thought but the thread doing the require than fails resolving the constant.

Member

kares commented Aug 19, 2016

that's what I thought but the thread doing the require than fails resolving the constant.

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Aug 19, 2016

Member

Are you sure? Here's output from my run, logging each thread before it attempts to read or write Sample constant:

[] ~/projects/jruby $ jruby -X-C -I. blah.rb
main is #<Thread:0x536aaa8d>
#<Thread:0x138ef471> accessing Sample
#<Thread:0x267a6717> accessing Sample
#<Thread:0x138ef471> error#<Thread:0x267a6717> error
#<Thread:0x1cc58fd0> about to define Sample

#<Thread:0x1cc58fd0> accessing Sample
#<JRuby::Sample:0x18fa0890>

This shows three non-main threads attempting to load that file and access that constant. Notice that "71" and "17" say they're going to access Sample before "d0" says it's about to define it, and the two formers both error out because it's not there yet. "d0" successfully accesses sample after defining it.

The only way we'd see "71" and "17" proceed to access without trying to define is if require either told them the file was already required (concurrency problem with loaded features?) or if it allowed them to proceed while the require was still in progress. I suspect the latter, since loaded features shouldn't be updated until the load/require has completed.

Member

headius commented Aug 19, 2016

Are you sure? Here's output from my run, logging each thread before it attempts to read or write Sample constant:

[] ~/projects/jruby $ jruby -X-C -I. blah.rb
main is #<Thread:0x536aaa8d>
#<Thread:0x138ef471> accessing Sample
#<Thread:0x267a6717> accessing Sample
#<Thread:0x138ef471> error#<Thread:0x267a6717> error
#<Thread:0x1cc58fd0> about to define Sample

#<Thread:0x1cc58fd0> accessing Sample
#<JRuby::Sample:0x18fa0890>

This shows three non-main threads attempting to load that file and access that constant. Notice that "71" and "17" say they're going to access Sample before "d0" says it's about to define it, and the two formers both error out because it's not there yet. "d0" successfully accesses sample after defining it.

The only way we'd see "71" and "17" proceed to access without trying to define is if require either told them the file was already required (concurrency problem with loaded features?) or if it allowed them to proceed while the require was still in progress. I suspect the latter, since loaded features shouldn't be updated until the load/require has completed.

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Aug 19, 2016

Member

On a hunch, I ran with -w. The results are very interesting:

[] ~/projects/jruby $ jruby -w -X-C -I. blah.rb
main is #<Thread:0x536aaa8d>
/Users/headius/projects/jruby/lib/ruby/stdlib/rubygems/core_ext/kernel_require.rb:55: warning: loading in progress, circular require considered harmful - /Users/headius/projects/jruby/sample.rb
/Users/headius/projects/jruby/lib/ruby/stdlib/rubygems/core_ext/kernel_require.rb:55: warning: loading in progress, circular require considered harmful - /Users/headius/projects/jruby/sample.rb
/Users/headius/projects/jruby/sample.rb:4: warning: Useless use of + in void context.
          require at org/jruby/RubyKernel.java:955
          require at /Users/headius/projects/jruby/lib/ruby/stdlib/rubygems/core_ext/kernel_require.rb:55
  block in (root) at blah.rb:6
#<Thread:0x38f2831c> accessing Sample
          require at org/jruby/RubyKernel.java:955
          require at /Users/headius/projects/jruby/lib/ruby/stdlib/rubygems/core_ext/kernel_require.rb:55
  block in (root) at blah.rb:6
#<Thread:0x43c0b3f1> about to define Sample#<Thread:0x1199c7ba> accessing Sample

#<Thread:0x1199c7ba> error
#<Thread:0x38f2831c> error
#<Thread:0x43c0b3f1> accessing Sample
#<JRuby::Sample:0x47ff2f51>

It seems like the locking logic for require is treating concurrent requires as circular rather than conflicting, and sending the calling thread on its way with nothing but a stern warning.

Member

headius commented Aug 19, 2016

On a hunch, I ran with -w. The results are very interesting:

[] ~/projects/jruby $ jruby -w -X-C -I. blah.rb
main is #<Thread:0x536aaa8d>
/Users/headius/projects/jruby/lib/ruby/stdlib/rubygems/core_ext/kernel_require.rb:55: warning: loading in progress, circular require considered harmful - /Users/headius/projects/jruby/sample.rb
/Users/headius/projects/jruby/lib/ruby/stdlib/rubygems/core_ext/kernel_require.rb:55: warning: loading in progress, circular require considered harmful - /Users/headius/projects/jruby/sample.rb
/Users/headius/projects/jruby/sample.rb:4: warning: Useless use of + in void context.
          require at org/jruby/RubyKernel.java:955
          require at /Users/headius/projects/jruby/lib/ruby/stdlib/rubygems/core_ext/kernel_require.rb:55
  block in (root) at blah.rb:6
#<Thread:0x38f2831c> accessing Sample
          require at org/jruby/RubyKernel.java:955
          require at /Users/headius/projects/jruby/lib/ruby/stdlib/rubygems/core_ext/kernel_require.rb:55
  block in (root) at blah.rb:6
#<Thread:0x43c0b3f1> about to define Sample#<Thread:0x1199c7ba> accessing Sample

#<Thread:0x1199c7ba> error
#<Thread:0x38f2831c> error
#<Thread:0x43c0b3f1> accessing Sample
#<JRuby::Sample:0x47ff2f51>

It seems like the locking logic for require is treating concurrent requires as circular rather than conflicting, and sending the calling thread on its way with nothing but a stern warning.

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Aug 19, 2016

Member

And a bit more logging around the require lock confirms it:

Thread[Ruby-0-Thread-1: blah.rb:4,5,main] locking /Users/headius/projects/jruby/sample.rb for require
Thread[Ruby-0-Thread-2: blah.rb:4,5,main] locking /Users/headius/projects/jruby/sample.rb for require
Thread[Ruby-0-Thread-2: blah.rb:4,5,main] could not acquire lock for /Users/headius/projects/jruby/sample.rb
Thread[Ruby-0-Thread-3: blah.rb:4,5,main] locking /Users/headius/projects/jruby/sample.rb for require
Thread[Ruby-0-Thread-3: blah.rb:4,5,main] could not acquire lock for /Users/headius/projects/jruby/sample.rb

Here we see thread 1 attempt to acquire the lock, and then 2 and 3 follow it. Both 2 and 3 then follow the path that leads to circular warning (requireLocks.lock returned false rather than blocking).

Member

headius commented Aug 19, 2016

And a bit more logging around the require lock confirms it:

Thread[Ruby-0-Thread-1: blah.rb:4,5,main] locking /Users/headius/projects/jruby/sample.rb for require
Thread[Ruby-0-Thread-2: blah.rb:4,5,main] locking /Users/headius/projects/jruby/sample.rb for require
Thread[Ruby-0-Thread-2: blah.rb:4,5,main] could not acquire lock for /Users/headius/projects/jruby/sample.rb
Thread[Ruby-0-Thread-3: blah.rb:4,5,main] locking /Users/headius/projects/jruby/sample.rb for require
Thread[Ruby-0-Thread-3: blah.rb:4,5,main] could not acquire lock for /Users/headius/projects/jruby/sample.rb

Here we see thread 1 attempt to acquire the lock, and then 2 and 3 follow it. Both 2 and 3 then follow the path that leads to circular warning (requireLocks.lock returned false rather than blocking).

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Aug 19, 2016

Member

I should have examined this commit more closely: 455bcf0

This is the culprit, and potentially the cause of many other require-time threading bugs we've seen.

This lock can't be a tryLock because we use a false return value here to indicate circular requires. It may have prevented some deadlocks, but only by allowing everyone to ignore the require lock altogether.

We obviously need better tests for this.

Member

headius commented Aug 19, 2016

I should have examined this commit more closely: 455bcf0

This is the culprit, and potentially the cause of many other require-time threading bugs we've seen.

This lock can't be a tryLock because we use a false return value here to indicate circular requires. It may have prevented some deadlocks, but only by allowing everyone to ignore the require lock altogether.

We obviously need better tests for this.

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Aug 19, 2016

Member

Ref #3341 from the bad commit.

Member

headius commented Aug 19, 2016

Ref #3341 from the bad commit.

headius added a commit that referenced this issue Aug 19, 2016

Revert "improve RequireLock to better match its semantic"
This reverts commit 455bcf0.

See #4091 for long explanation, but basically this lock *must*
block, because a false return from this method means the lock had
already been acquired.
@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Aug 19, 2016

Member

This should be fixed now by reverting 455bcf0 in 93707ca.

NEEDS TESTS.

Member

headius commented Aug 19, 2016

This should be fixed now by reverting 455bcf0 in 93707ca.

NEEDS TESTS.

@headius headius added the needs tests label Aug 19, 2016

headius added a commit that referenced this issue Aug 19, 2016

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Aug 19, 2016

Member

This also appears to be the cause of other circular require warnings, like those seen in test:jruby.

Member

headius commented Aug 19, 2016

This also appears to be the cause of other circular require warnings, like those seen in test:jruby.

kares added a commit that referenced this issue Aug 20, 2016

@kares

This comment has been minimized.

Show comment
Hide comment
@kares

kares Aug 20, 2016

Member

eh, sorry for the confusion I did not use puts as it affected my reproducibility but a concurrent queue of messages printed at the end - lead me wrong. I can confirm its doing good now, have added a test-case. thanks.

Member

kares commented Aug 20, 2016

eh, sorry for the confusion I did not use puts as it affected my reproducibility but a concurrent queue of messages printed at the end - lead me wrong. I can confirm its doing good now, have added a test-case. thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment