New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

using explicit return with external reference from define_method is 66x slower on 9k #3715

Closed
jsvd opened this Issue Mar 7, 2016 · 12 comments

Comments

Projects
None yet
4 participants
@jsvd

jsvd commented Mar 7, 2016

Environment

Darwin Joaos-MBP.lan 15.3.0 Darwin Kernel Version 15.3.0: Thu Dec 10 18:40:58 PST 2015; root:xnu-3248.30.4~1/RELEASE_X86_64 x86_64

java version "1.8.0_20"
Java(TM) SE Runtime Environment (build 1.8.0_20-b26)
Java HotSpot(TM) 64-Bit Server VM (build 25.20-b23, mixed mode)

Testing the following code:

require 'benchmark/ips'

class Slow
  m = :meh?
  define_method(m) do
    return m
  end
end

class Fast
  m = :meh?
  define_method(m) do
    m
  end
end

f = Fast.new
s = Slow.new

Benchmark.ips do |x|

  x.config(:time => 6, :warmup => 3)

  x.report("slow") { s.meh? }

  x.report("fast") { f.meh? }

  x.compare!
end

Expected Behavior

I expect the slowness of the Slow class to be consistent across ruby implementations

Actual Behavior

% rvm ruby-2.2.1,jruby-1.7.23,jruby-9.0.5.0 do ruby def_method.rb
# ruby 2.2.1
Warming up --------------------------------------
                slow   116.300k i/100ms
                fast   122.202k i/100ms
Calculating -------------------------------------
                slow      4.492M (± 7.4%) i/s -     26.865M
                fast      5.591M (± 8.8%) i/s -     33.239M

Comparison:
                fast:  5590653.4 i/s
                slow:  4492099.6 i/s - 1.24x slower

# jruby 1.7.23
Warming up --------------------------------------
                slow   154.333k i/100ms
                fast   189.819k i/100ms
Calculating -------------------------------------
                slow      2.793M (± 4.9%) i/s -     16.822M
                fast      6.102M (± 6.2%) i/s -     36.445M

Comparison:
                fast:  6101768.0 i/s
                slow:  2792631.3 i/s - 2.18x slower

# jruby 9.0.5.0
Warming up --------------------------------------
                slow     9.248k i/100ms
                fast   144.102k i/100ms
Calculating -------------------------------------
                slow    114.075k (± 6.7%) i/s -    684.352k
                fast      7.562M (± 8.3%) i/s -     44.816M

Comparison:
                fast:  7561944.8 i/s
                slow:   114074.7 i/s - 66.29x slower

@jsvd jsvd changed the title from using explicit return with external reference from define_method is 66x slower to using explicit return with external reference from define_method is 66x slower on 9k Mar 7, 2016

@jsvd

This comment has been minimized.

jsvd commented Mar 7, 2016

Updated numbers with :warmup => 10, :time => 60:

% rvm ruby-2.2.1,jruby-1.7.23,jruby-9.0.5.0 do ruby def_method.rb
Warming up --------------------------------------
                slow   117.598k i/100ms
                fast   122.533k i/100ms
Calculating -------------------------------------
                slow      4.503M (± 6.0%) i/s -    269.182M
                fast      5.509M (± 6.6%) i/s -    329.001M

Comparison:
                fast:  5509111.5 i/s
                slow:  4502861.3 i/s - 1.22x slower

Warming up --------------------------------------
                slow   187.674k i/100ms
                fast   200.813k i/100ms
Calculating -------------------------------------
                slow      2.913M (± 4.0%) i/s -    174.537M
                fast      6.401M (± 5.4%) i/s -    382.750M

Comparison:
                fast:  6401442.9 i/s
                slow:  2912786.3 i/s - 2.20x slower

Warming up --------------------------------------
                slow    10.374k i/100ms
                fast   152.612k i/100ms
Calculating -------------------------------------
                slow    118.201k (± 4.7%) i/s -      7.075M
                fast      8.066M (± 6.5%) i/s -    481.033M

Comparison:
                fast:  8065720.3 i/s
                slow:   118201.0 i/s - 68.24x slower
@jsvd

This comment has been minimized.

jsvd commented Mar 7, 2016

Still happens on jruby master (102a088)

@enebo

This comment has been minimized.

Member

enebo commented Mar 7, 2016

I have a pretty good idea on what the issue is. A slightly less good idea on how it should be fixed. The highest level description of the problem is that we cannot know at the time we generate the IR for this block whether it is a generic block or whether it is a method definition. In a generic block, return has extra potential behavior like in this snippet:

def foo
   [1].each { |e| retiurn e }
end

As a result of this scenario we generate extra IR instructions to check for this case. That extra checking is what is killing performance. Here is the IR for the two versions:

Slow

2016-03-07T13:52:45.765-06:00: JVMVisitor: Printing JIT IR for Slow_CLOSURE_1
begin CLOSURE<Slow_CLOSURE_1>
signature(pre=0,opt=0,post=0,rest=NONE,kwargs=0,kwreq=0,kwrest=false)
used variables
  m(1:0)

basic_block #1: CL1_LBL_1:-1
  00: %cl_1_6 := copy(nil<>)
  01: %cl_1_4 := copy(%cl_1_6)
  02: %cl_1_3 := copy(%cl_1_6)
  03:  %t_m_5 := copy(%cl_1_6)
  04: %cl_1_7 := copy(%cl_1_6)
  05: %cl_1_6 := save_binding_viz
  06: %cl_1_7 := push_block_frame(frameName: Slow_CLOSURE_1)
  07:            push_block_binding
  08:            update_block_state(self<%self>)
  09:  %t_m_5 := binding_load(*m, scope: CLOSURE Slow_CLOSURE_1[../snippets/perf6.rb:4])
  10:            prepare_no_block_args

basic_block #2: CL1_LBL_2:-1

basic_block #3: _CLOSURE_START_0:6
  00:            line_num(lineNumber: 5)
  01:            check_for_lje(maybeLambda: true)
  02:            nonlocal_return(%t_m_5, methodName: --none--)

basic_block #4: _GLOBAL_ENSURE_BLOCK__0:11
  00: %cl_1_3 := recv_jruby_exc
  01:            pop_binding
  02:            restore_binding_viz(%cl_1_6)
  03:            pop_block_frame(%cl_1_7)
  04: %cl_1_4 := runtime_helper(%cl_1_3, helperMethod: HANDLE_BREAK_AND_RETURNS_IN_LAMBDA)
  05:            return_or_rethrow_saved_exc(%cl_1_4)

basic_block #6: CL1_LBL_3:-1

fast:

2016-03-07T13:52:45.771-06:00: JVMVisitor: Printing JIT IR for Fast_CLOSURE_1
begin CLOSURE<Fast_CLOSURE_1>
signature(pre=0,opt=0,post=0,rest=NONE,kwargs=0,kwreq=0,kwrest=false)
used variables
  m(1:0)

basic_block #1: CL1_LBL_1:-1
  0: %cl_1_6 := copy(nil<>)
  1:  %t_m_5 := copy(%cl_1_6)
  2: %cl_1_7 := copy(%cl_1_6)
  3: %cl_1_6 := save_binding_viz
  4: %cl_1_7 := push_block_frame(frameName: Fast_CLOSURE_1)
  5:            push_block_binding
  6:            update_block_state(self<%self>)
  7:  %t_m_5 := binding_load(*m, scope: CLOSURE Fast_CLOSURE_1[../snippets/perf6.rb:11])
  8:            prepare_no_block_args

basic_block #2: CL1_LBL_2:-1
  0:            line_num(lineNumber: 12)
  1:            pop_binding
  2:            restore_binding_viz(%cl_1_6)
  3:            pop_block_frame(%cl_1_7)
  4:            return(%t_m_5)

basic_block #6: CL1_LBL_3:-1

As for solutions...When there is no capture at all we completely rewrite the block as if it was a method. This is crazy fast but we did not do the extra work to make this work with read-only captures. If we could then this particular case would also be super fast. We have no infrastructure to detect read-only captures so I am not sure how much work it would be to solve it this way.

A second possible solution would be to rebuild the source since IRClosure contains the original source for the block. We could assume the return in this case will only fall out of the method definition and it would be as fast as the 'fast' version.

I added a 'super fast' to represent our potential if we put in the extra lambda hoisting logic we need for the former solution vs minimal rewriting in the latter solution:

require 'benchmark/ips'

class Slow
  m = :meh?
  define_method(m) do
    return m
  end
end

class SuperFast
  define_method(:meh?) do
    :meh?
  end
end

class Fast
  m = :meh?
  define_method(m) do
    m
  end
end

sf = SuperFast.new
f = Fast.new
s = Slow.new

Benchmark.ips do |x|

  x.config(:time => 6, :warmup => 3)

  x.report("super fast") { sf.meh? }

  x.report("fast") { f.meh? }

  x.report("slow") { s.meh? }

  x.compare!
end
jruby ../snippets/perf6.rb 
Warming up --------------------------------------
          super fast    89.771k i/100ms
                fast    98.728k i/100ms
                slow     9.461k i/100ms
Calculating -------------------------------------
          super fast      9.556M (±10.7%) i/s -     56.286M
                fast      6.608M (±11.1%) i/s -     38.899M
                slow    119.572k (± 3.7%) i/s -    719.036k

Comparison:
          super fast:  9556172.2 i/s
                fast:  6608150.1 i/s - 1.45x slower
                slow:   119571.5 i/s - 79.92x slower

jsvd added a commit to jsvd/ruby-cabin that referenced this issue Mar 8, 2016

avoid explicit returns in define_method
in jruby 9k using explicit return in a define_method block makes the
code execution extremely slow due to its inability to optimize the code

see more: jruby/jruby#3715
@enebo

This comment has been minimized.

Member

enebo commented Mar 8, 2016

@headius actually bothered to profile our existing logic and found a gaping regression in 2066af4. This ups performance to nearly non-return:

jruby -d -Xir.debug=rue ../snippets/perf6.rb
Warming up --------------------------------------
          super fast    89.990k i/100ms
                fast    99.361k i/100ms
                slow    92.612k i/100ms
Calculating -------------------------------------
          super fast      9.302M (±10.8%) i/s -     54.894M
                fast      6.218M (±11.8%) i/s -     36.565M
                slow      5.363M (± 9.7%) i/s -     31.766M

Comparison:
          super fast:  9301788.5 i/s
                fast:  6218085.8 i/s - 1.50x slower
                slow:  5363297.7 i/s - 1.73x slower

I am also working on rebuilding the block in the presence of define_method so we can strip out unneeded instrs. Once that is done we should see slow match fast.

@colinsurprenant

This comment has been minimized.

colinsurprenant commented Mar 8, 2016

Great stuff! thanks for explanations & details, extremely useful learning!

@jsvd

This comment has been minimized.

jsvd commented Mar 10, 2016

Awesome feedback @enebo, thanks! Since the commit brings jruby back to previous performance levels, I'll close this issue.

@jsvd jsvd closed this Mar 10, 2016

@headius

This comment has been minimized.

Member

headius commented Mar 10, 2016

We'll leave this open until @enebo is happy with his define_method + return improvements.

@enebo

This comment has been minimized.

Member

enebo commented Mar 11, 2016

Yeah I want this on the radar since I believe I can have this particular test case scream

@enebo enebo modified the milestones: JRuby 9.1.1.0, JRuby 9.1.0.0 Apr 13, 2016

@enebo

This comment has been minimized.

Member

enebo commented Apr 13, 2016

Deferring this one. Most of the perf issue has been fixed but my attempts at rebuilding closures on the fly made me realize we need to internally refactor some aspects of IRs lifecycle again. Nearly everything I tried tripped over something we assumed we would not do.

@headius headius modified the milestones: JRuby 9.1.2.0, JRuby 9.1.1.0 May 11, 2016

@enebo enebo modified the milestones: JRuby 9.1.2.0, JRuby 9.1.3.0 May 23, 2016

@headius

This comment has been minimized.

Member

headius commented Aug 15, 2016

@enebo Up to you if you want to try to do more on this for 9.1.3.0. If not, punt to 9.1.4.0 or 9.2.

@enebo enebo added this to the JRuby 9.1.4.0 milestone Aug 24, 2016

@enebo enebo removed this from the JRuby 9.1.3.0 milestone Aug 24, 2016

@enebo

This comment has been minimized.

Member

enebo commented Aug 24, 2016

I was hoping to land the inlining work which is also grappling with lifecycle issues. I did not get it done though. Once it is done then converting this live will be trivial. Kicking down the road a bit longer.

@headius

This comment has been minimized.

Member

headius commented May 15, 2018

The original reported issue no longer seems to be a problem. The "slow" version is still slower than the "fast" version, but less than 2x.

Warming up --------------------------------------
          super fast   195.371k i/100ms
                fast   193.401k i/100ms
                slow   181.930k i/100ms
Calculating -------------------------------------
          super fast     17.181M (± 6.8%) i/s -    102.374M in   5.999028s
                fast      9.927M (± 7.6%) i/s -     59.181M in   6.009142s
                slow      7.210M (± 8.1%) i/s -     42.935M in   6.006843s

Comparison:
          super fast: 17181472.4 i/s
                fast:  9926705.7 i/s - 1.73x  slower
                slow:  7209949.9 i/s - 2.38x  slower

@enebo You can reopen this if you want, but perhaps open a different bug if you still want to have something open for the remaining perf difference.

@headius headius closed this May 15, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment