New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Somehow super stopped working in jruby 9.0.5.0 #3643

Closed
godfat opened this Issue Feb 4, 2016 · 17 comments

Comments

Projects
None yet
3 participants
@godfat
Contributor

godfat commented Feb 4, 2016

Sorry I don't have simple reproducible code for this either.
Here's the trace:
https://travis-ci.org/godfat/pork/jobs/107027673
The line was:
https://github.com/godfat/pork/blob/pork-1.4.4/lib/pork/test.rb#L16
You can see that in line 13 that super works, but not the one in line 16.
It used to work on jruby 9.0.4.0.
You could reproduce this by cloning the repo and run jruby -S rake test whenever dependencies were all installed.

@enebo enebo added this to the JRuby 9.1.0.0 milestone Feb 4, 2016

@enebo enebo added the ir label Feb 4, 2016

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Apr 15, 2016

Member

Still failing on master today. Investigating.

Member

headius commented Apr 15, 2016

Still failing on master today. Investigating.

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Apr 15, 2016

Member

This appears to be a bug in the JIT. If I run your suite with only the interpreter, it passes.

Member

headius commented Apr 15, 2016

This appears to be a bug in the JIT. If I run your suite with only the interpreter, it passes.

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Apr 15, 2016

Member

Wow, this is really hard to narrow down. It seems to only fail when the JIT is enabled, but the code that fails does not appear to be jitting. I'm wondering if we're getting a bad stack trace here...

Member

headius commented Apr 15, 2016

Wow, this is really hard to narrow down. It seems to only fail when the JIT is enabled, but the code that fails does not appear to be jitting. I'm wondering if we're getting a bad stack trace here...

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Apr 15, 2016

Member

Ok...tiptoeing forward.

The execute method that fails to super can actually be reduced down to just two super calls. The second will fail, indicating something happening during the run of the test suite is causing the superclass (the unprepended version of Pork) to lose its execute method.

  def execute
    super
    super
  end
Member

headius commented Apr 15, 2016

Ok...tiptoeing forward.

The execute method that fails to super can actually be reduced down to just two super calls. The second will fail, indicating something happening during the run of the test suite is causing the superclass (the unprepended version of Pork) to lose its execute method.

  def execute
    super
    super
  end
@godfat

This comment has been minimized.

Show comment
Hide comment
@godfat

godfat Apr 16, 2016

Contributor

Thank you for looking into this!

This also happens on master of pork. In additionally, I tried to remove test files to reduce the scope. I found that just leave test/test_bacon.rb and test/test_stat.rb could reproduce. The other files located in test/* could be removed. However, running either of them alone would have no problems. Both jruby -Ilib test/test_bacon.rb and jruby -Ilib test/test_stat.rb would be fine.

Wait a minute... I just tried to run jruby -Ilib -r ./test/test_bacon.rb test/test_stat.rb and it should be the same as jruby -S rake test except some test loader codes. However that works. Tried rake again, now it also works. Tried again, now they didn't.

Not sure why it's not always failing now. Randomness?

Now this always works: both env PORK_MODE=sequential jruby -S rake test and env PORK_MODE=sequential jruby -Ilib -r ./test/test_bacon.rb test/test_stat.rb.

Contributor

godfat commented Apr 16, 2016

Thank you for looking into this!

This also happens on master of pork. In additionally, I tried to remove test files to reduce the scope. I found that just leave test/test_bacon.rb and test/test_stat.rb could reproduce. The other files located in test/* could be removed. However, running either of them alone would have no problems. Both jruby -Ilib test/test_bacon.rb and jruby -Ilib test/test_stat.rb would be fine.

Wait a minute... I just tried to run jruby -Ilib -r ./test/test_bacon.rb test/test_stat.rb and it should be the same as jruby -S rake test except some test loader codes. However that works. Tried rake again, now it also works. Tried again, now they didn't.

Not sure why it's not always failing now. Randomness?

Now this always works: both env PORK_MODE=sequential jruby -S rake test and env PORK_MODE=sequential jruby -Ilib -r ./test/test_bacon.rb test/test_stat.rb.

@godfat

This comment has been minimized.

Show comment
Hide comment
@godfat

godfat Apr 16, 2016

Contributor

Since you mentioned that Pork loses its execute method, I tried to observe this by inserting some monitoring code:

  def execute
    super
    begin
      super
    rescue NoMethodError
      require 'rib/config'
      Rib.anchor binding
    end
  end

I tried Pork.singleton_class.ancestors[1].instance_method(:execute), and it seems there's something: #<UnboundMethod: Module(#<Module:0x5656be13>)#execute>.

Could it be call site cache broke or something?

Contributor

godfat commented Apr 16, 2016

Since you mentioned that Pork loses its execute method, I tried to observe this by inserting some monitoring code:

  def execute
    super
    begin
      super
    rescue NoMethodError
      require 'rib/config'
      Rib.anchor binding
    end
  end

I tried Pork.singleton_class.ancestors[1].instance_method(:execute), and it seems there's something: #<UnboundMethod: Module(#<Module:0x5656be13>)#execute>.

Could it be call site cache broke or something?

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Apr 20, 2016

Member

I narrowed this failure down to test_stat and noticed the randomness also. I'm not sure where it comes from.

Member

headius commented Apr 20, 2016

I narrowed this failure down to test_stat and noticed the randomness also. I'm not sure where it comes from.

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Apr 22, 2016

Member

I still have no new information except to say that it definitely does seem timing-sensitive. I added some logging to test.rb and the problem went away :-\

Member

headius commented Apr 22, 2016

I still have no new information except to say that it definitely does seem timing-sensitive. I added some logging to test.rb and the problem went away :-\

@enebo

This comment has been minimized.

Show comment
Hide comment
@enebo

enebo Apr 22, 2016

Member

I bisected this to 27e6a64. I find it pretty weird this particular change caused an issue. One other thing I did notice is even with this reverted every so often (like 1 in 10 times) I see this CLI fail:

env PORK_TEST='test/test_readme.rb:9' PORK_MODE=parallel PORK_SEED=26094625828361438140512518840338868882 /Users/enebo/work/jruby/bin/jruby -S /Users/enebo/work/jruby/bin/rake test

This will also fail even if in -X-C with a threshold of -1. Seems like a race. Whether it is related to the above problem or not I don't know.

Anyways, reverting this definitely makes a big positive difference but it seems pretty critical for us to understand why. I also tried disabling BlotJITTask and that had no effect. Weird...

Member

enebo commented Apr 22, 2016

I bisected this to 27e6a64. I find it pretty weird this particular change caused an issue. One other thing I did notice is even with this reverted every so often (like 1 in 10 times) I see this CLI fail:

env PORK_TEST='test/test_readme.rb:9' PORK_MODE=parallel PORK_SEED=26094625828361438140512518840338868882 /Users/enebo/work/jruby/bin/jruby -S /Users/enebo/work/jruby/bin/rake test

This will also fail even if in -X-C with a threshold of -1. Seems like a race. Whether it is related to the above problem or not I don't know.

Anyways, reverting this definitely makes a big positive difference but it seems pretty critical for us to understand why. I also tried disabling BlotJITTask and that had no effect. Weird...

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Apr 25, 2016

Member

@enebo Yeah that commit seems really unlikely to cause this problem...I'm surprised.

Looking at this again today.

Member

headius commented Apr 25, 2016

@enebo Yeah that commit seems really unlikely to cause this problem...I'm surprised.

Looking at this again today.

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Apr 26, 2016

Member

Ok, after much fussing about, we've determined that the following method in pork is getting compiled incorrectly:

    def would desc=:default, opts={}, &test
      @tests << [:would   , desc    , test  || lambda{}, opts]
    end

Specifically, the lambda{} is missing logic to ensure its frame gets popped along exceptional paths. The final IR for this method is here:

2016-04-26T13:52:00.683-05:00: BasicCompilerPassListener: Starting Ensure Temporary Variables Assigned on scope CLOSURE would_CLOSURE_1[/Users/headius/projects/jruby/pork/lib/pork/imp.rb:27]
2016-04-26T13:52:00.683-05:00: BasicCompilerPassListener: 
Graph:
BB [1:CL1_LBL_1:-1]:>[2,6]
BB [2:CL1_LBL_2:-1]:>[6], <[1]
BB [6:CL1_LBL_3:-1]:<[1,2]

2016-04-26T13:52:00.683-05:00: BasicCompilerPassListener: 
Instructions[/Users/headius/projects/jruby/pork/lib/pork/imp.rb#27#,IRClosure,EnsureTempsAssigned]:
BB [1:CL1_LBL_1:-1]
    %cl_1_5 = copy(nil)
        %cl_1_6 = copy(%cl_1_5)
    %cl_1_5 = save_binding_viz()
    %cl_1_6 = push_block_frame()
    update_block_state(%self)
    prepare_no_block_args()
BB [2:CL1_LBL_2:-1]
    restore_binding_viz(%cl_1_5)
    pop_block_frame(%cl_1_6)
    return(nil)
BB [6:CL1_LBL_3:-1]


------ Rescue block map ------

:Instructions
2016-04-26T13:52:00.683-05:00: BasicCompilerPassListener: Finished Ensure Temporary Variables Assigned on scope in 0ms.

Note that prepare_no_block_args can raise an exception, but there's no handler for that exception to ensure the frame gets popped.

Seems to a problem constructing and wiring up the global ensure block properly; it's disappearing.

Member

headius commented Apr 26, 2016

Ok, after much fussing about, we've determined that the following method in pork is getting compiled incorrectly:

    def would desc=:default, opts={}, &test
      @tests << [:would   , desc    , test  || lambda{}, opts]
    end

Specifically, the lambda{} is missing logic to ensure its frame gets popped along exceptional paths. The final IR for this method is here:

2016-04-26T13:52:00.683-05:00: BasicCompilerPassListener: Starting Ensure Temporary Variables Assigned on scope CLOSURE would_CLOSURE_1[/Users/headius/projects/jruby/pork/lib/pork/imp.rb:27]
2016-04-26T13:52:00.683-05:00: BasicCompilerPassListener: 
Graph:
BB [1:CL1_LBL_1:-1]:>[2,6]
BB [2:CL1_LBL_2:-1]:>[6], <[1]
BB [6:CL1_LBL_3:-1]:<[1,2]

2016-04-26T13:52:00.683-05:00: BasicCompilerPassListener: 
Instructions[/Users/headius/projects/jruby/pork/lib/pork/imp.rb#27#,IRClosure,EnsureTempsAssigned]:
BB [1:CL1_LBL_1:-1]
    %cl_1_5 = copy(nil)
        %cl_1_6 = copy(%cl_1_5)
    %cl_1_5 = save_binding_viz()
    %cl_1_6 = push_block_frame()
    update_block_state(%self)
    prepare_no_block_args()
BB [2:CL1_LBL_2:-1]
    restore_binding_viz(%cl_1_5)
    pop_block_frame(%cl_1_6)
    return(nil)
BB [6:CL1_LBL_3:-1]


------ Rescue block map ------

:Instructions
2016-04-26T13:52:00.683-05:00: BasicCompilerPassListener: Finished Ensure Temporary Variables Assigned on scope in 0ms.

Note that prepare_no_block_args can raise an exception, but there's no handler for that exception to ensure the frame gets popped.

Seems to a problem constructing and wiring up the global ensure block properly; it's disappearing.

@godfat

This comment has been minimized.

Show comment
Hide comment
@godfat

godfat Apr 26, 2016

Contributor

Please let me know if there's anything I could do, or I've written some bad Ruby code that should be avoided.

I tried to change lambda{} to lambda{a=1} but it would still happen. Not sure if that would change anything though.

Contributor

godfat commented Apr 26, 2016

Please let me know if there's anything I could do, or I've written some bad Ruby code that should be avoided.

I tried to change lambda{} to lambda{a=1} but it would still happen. Not sure if that would change anything though.

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Apr 26, 2016

Member

@godfat You did nothing wrong, this was just buggy.

The trigger in your code was that the lambda{} you create for blockless-would calls is eventually used to instance_eval against the execution context. Your lambda takes no args and instance_eval passes one, so it would raise ArgumentError (confirmed in both JRuby and MRI) that you are presumably swallowing somewhere. When the exception was raised from the lambda, our buggy logic left a call frame on the stack. The result was that the second call to super for execute started supering from one class higher than it should have, missing the actual superclass's execute.

Two workarounds for you are to use proc instead of lambda, since it doesn't check arity, or make the lambda accept one (or rest) argument.

Fixes plus test coming shortly. Here's the smallest repro I have right now:

class A
  def x(proc)
    instance_eval(&proc) rescue nil
  end
end
A.prepend(Module.new { def x(proc); super; super; end })
def foo
  A.new.x(lambda{})
end
foo

edit: fixed code to rescue nil for the ArgumentError

Member

headius commented Apr 26, 2016

@godfat You did nothing wrong, this was just buggy.

The trigger in your code was that the lambda{} you create for blockless-would calls is eventually used to instance_eval against the execution context. Your lambda takes no args and instance_eval passes one, so it would raise ArgumentError (confirmed in both JRuby and MRI) that you are presumably swallowing somewhere. When the exception was raised from the lambda, our buggy logic left a call frame on the stack. The result was that the second call to super for execute started supering from one class higher than it should have, missing the actual superclass's execute.

Two workarounds for you are to use proc instead of lambda, since it doesn't check arity, or make the lambda accept one (or rest) argument.

Fixes plus test coming shortly. Here's the smallest repro I have right now:

class A
  def x(proc)
    instance_eval(&proc) rescue nil
  end
end
A.prepend(Module.new { def x(proc); super; super; end })
def foo
  A.new.x(lambda{})
end
foo

edit: fixed code to rescue nil for the ArgumentError

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Apr 26, 2016

Member

Solution is found and fix is on the way. The bug caused bare lambda to not properly clean up frame stack when an exception was raised in argument processing. Argument processing was added to the entryBB, which never gets GEB (global ensure block) treatment. In addition, a culled GEB from earlier in processing was not properly cleared, so a new one was never created.

Member

headius commented Apr 26, 2016

Solution is found and fix is on the way. The bug caused bare lambda to not properly clean up frame stack when an exception was raised in argument processing. Argument processing was added to the entryBB, which never gets GEB (global ensure block) treatment. In addition, a culled GEB from earlier in processing was not properly cleared, so a new one was never created.

@godfat

This comment has been minimized.

Show comment
Hide comment
@godfat

godfat Apr 26, 2016

Contributor

Oh wow, impressive. Debugging is really hard!

And actually that's also a bug on my side. I didn't know instance_eval would pass an argument at all. I'll change it to lambda{|_|} then, assuming older rubies would also pass an argument. I would certainly try hard to avoid proc at all.

The only reason the test would still pass on other rubies is that I didn't check which exception was raised exactly from the test case, namely here: https://github.com/godfat/pork/blob/pork-1.4.4/test/test_stat.rb#L53-L56

It's only checking if there's a backtrace. I tried to add this line:

expect(err).kind_of?(Pork::Error)

Now it's failing. I'll commit the fix along with test tomorrow. The exception should be swallowed for reporting, but it should really be Pork::Error for missing assertions rather than ArgumentError at all. To think about it, it probably makes no sense to give a default lambda which would fail anyway.

Interestingly, if I did right in the first place, then it would not trigger this JRuby bug. Once again I think buggy/sloppy codes are really capable to trigger a VM/server/platform bug.

Thank you so much for the time!

Contributor

godfat commented Apr 26, 2016

Oh wow, impressive. Debugging is really hard!

And actually that's also a bug on my side. I didn't know instance_eval would pass an argument at all. I'll change it to lambda{|_|} then, assuming older rubies would also pass an argument. I would certainly try hard to avoid proc at all.

The only reason the test would still pass on other rubies is that I didn't check which exception was raised exactly from the test case, namely here: https://github.com/godfat/pork/blob/pork-1.4.4/test/test_stat.rb#L53-L56

It's only checking if there's a backtrace. I tried to add this line:

expect(err).kind_of?(Pork::Error)

Now it's failing. I'll commit the fix along with test tomorrow. The exception should be swallowed for reporting, but it should really be Pork::Error for missing assertions rather than ArgumentError at all. To think about it, it probably makes no sense to give a default lambda which would fail anyway.

Interestingly, if I did right in the first place, then it would not trigger this JRuby bug. Once again I think buggy/sloppy codes are really capable to trigger a VM/server/platform bug.

Thank you so much for the time!

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Apr 26, 2016

Member

@godfat I'm very glad we were able to find this! We had received other reports of sporadic super issues and they were all more difficult than this to sort out. By working through this one step by step we may have fixed multiple bug reports.

Member

headius commented Apr 26, 2016

@godfat I'm very glad we were able to find this! We had received other reports of sporadic super issues and they were all more difficult than this to sort out. By working through this one step by step we may have fixed multiple bug reports.

@headius headius closed this in 6682704 Apr 26, 2016

@godfat

This comment has been minimized.

Show comment
Hide comment
@godfat

godfat Apr 27, 2016

Contributor

Awesome! :D

Contributor

godfat commented Apr 27, 2016

Awesome! :D

godfat added a commit to godfat/pork that referenced this issue Apr 27, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment