Optimized __method__ and __callee__ #8172

Merged
merged 7 commits into from Mar 29, 2024


@headius headius commented Mar 28, 2024

This PR optimizes the __method__ and __callee__ methods by doing the following:

  • Introduce the FrameNameInstr instruction that performs a lightweight, frameless operation when the target __method__ or __callee__ is the built-in version (similar to block_given? in Implement block_given? call as optimized instruction #8170).
  • Introduce a new mapping from AliasMethod compound names to the pair of symbols they would produce for __method__ and __callee__, avoiding extra allocation of strings while parsing that compound string.

As with #8170, if either of the target methods has been replaced by a user, we fall back to a normal invocation. If either method is called via a metaprogramming path, the call forces a frame and uses it as before.

A benchmark is included. It shows that both forms are now much faster (because they no longer need the caller's frame) and that neither slows down when used inside an aliased call:

[] jruby $ (chruby jruby-9.4.5.0 ; jruby bench/bench_frame_name_methods.rb)
Warming up --------------------------------------
     __method__ same   522.657k i/100ms
__method__ different   252.464k i/100ms
     __callee__ same   523.748k i/100ms
__callee__ different   264.330k i/100ms
Calculating -------------------------------------
     __method__ same      5.213M (± 1.8%) i/s -     26.133M in   5.014736s
__method__ different      2.565M (± 4.0%) i/s -     12.876M in   5.027256s
     __callee__ same      5.475M (± 1.8%) i/s -     27.759M in   5.071986s
__callee__ different      2.793M (± 1.9%) i/s -     14.009M in   5.017930s
[] jruby $ jruby bench/bench_frame_name_methods.rb                        
Warming up --------------------------------------
     __method__ same   885.744k i/100ms
__method__ different   936.830k i/100ms
     __callee__ same   926.174k i/100ms
__callee__ different   938.103k i/100ms
Calculating -------------------------------------
     __method__ same      9.313M (± 0.4%) i/s -     46.944M in   5.040932s
__method__ different      9.119M (± 0.5%) i/s -     45.905M in   5.034226s
     __callee__ same      9.187M (± 1.1%) i/s -     46.309M in   5.041351s
__callee__ different      9.056M (± 0.5%) i/s -     45.967M in   5.075968s
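For readers less familiar with the two methods, here is a minimal sketch of how they differ under an alias. The class and method names are made up for illustration, and reading the "same" and "different" benchmark labels as direct versus aliased calls is an assumption; the benchmark script itself is not reproduced in this conversation.

```ruby
class Greeter
  def hello
    # __method__ reports the name the method was defined with;
    # __callee__ reports the name it was actually called through.
    [__method__, __callee__]
  end

  alias_method :hi, :hello
end

greeter = Greeter.new
direct  = greeter.hello  # called by its original name
aliased = greeter.hi     # called through the alias

p direct   # [:hello, :hello]
p aliased  # [:hello, :hi]
```

Only the aliased call has to carry two names, which is why that path previously needed the compound name handling described in this PR.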

The __method__ and __callee__ methods currently force a frame due
to the default implementation needing to access the caller's frame
to get the method name or compound method name passed on the
stack. This optimization moves these calls to a specialized
instruction that can access the method/callee name directly
without needing a frame, improving performance of methods that use
this behavior.
Prior to this patch the compound name passed on the stack from an
AliasMethod was repeatedly parsed and split before acquiring the
associated symbol, leading to wasteful allocation of additional
String objects. The change here adds a new map to the symbol table
that tracks the two symbols associated with a compound name, so
that only a single lookup is needed and no allocation happens.
@headius headius added this to the JRuby 10.0.0.0 milestone Mar 28, 2024
Records have optimization characteristics that may be helpful
here.
* Cache the last "simple" name to pass through, which should
  usually be the same name every time. This avoids re-acquiring
  the symbol when we are looking for the same simple name.
* Inject the SymbolTable into the handle chain to avoid having to
  re-acquire it and the Ruby instance every time.

With these changes, simple name acquisition nearly doubles in
performance without indy, and all forms of acquisition improve by
around 5-6x.

This is in addition to the doubled performance from the original
optimization.
This mimics the JIT's logic to do the same when loading the
"frameName" and is necessary to avoid regressing __method__
behavior inside peculiar contexts (like define_method or eval).
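As a reminder of the behavior being protected, this small sketch shows what `__method__` returns in the two contexts mentioned. The class and method names are invented for the example.

```ruby
class Peculiar
  # Inside a define_method block, __method__ is the defined method's name.
  define_method(:defined_dynamically) { __method__ }

  # Inside eval, __method__ is the name of the enclosing method.
  def via_eval
    eval("__method__")
  end
end

from_define_method = Peculiar.new.defined_dynamically
from_eval          = Peculiar.new.via_eval

p from_define_method  # :defined_dynamically
p from_eval           # :via_eval
```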

headius commented Mar 28, 2024

Additional optimizations:

  • Use a record to hold the symbol pair.
  • Cache the last-seen simple method symbol in the site and use it when appropriate.
  • Inject the symbol table into the indy handle chain to avoid repeatedly acquiring it.
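The last two points can be modeled roughly like this. It is a Ruby sketch of the caching idea, not the actual indy call site; `CountingSymbolTable`, `FrameNameSite`, and their methods are hypothetical names used only for illustration.

```ruby
# Stand-in for the runtime's symbol table; it counts lookups so the effect of
# the cache is visible.
class CountingSymbolTable
  attr_reader :lookups

  def initialize
    @lookups = 0
  end

  def symbol_for(name)
    @lookups += 1
    name.to_sym
  end
end

# Models a call site that receives the symbol table once (the "injection")
# and remembers the last simple name it resolved.
class FrameNameSite
  def initialize(symbol_table)
    @symbol_table = symbol_table
    @last_name = nil
    @last_symbol = nil
  end

  def simple_name_symbol(name)
    # Fast path: same name as last time, so skip the table entirely.
    return @last_symbol if name == @last_name

    @last_name = name
    @last_symbol = @symbol_table.symbol_for(name)
  end
end

table = CountingSymbolTable.new
site  = FrameNameSite.new(table)
3.times { site.simple_name_symbol("hello") }
lookups_after_repeats = table.lookups
```

Since a given call site usually sits inside one method, the name rarely changes and the table is consulted once in the common case.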

With all changes, final bench numbers look like this:

[] jruby $ jruby -Xcompile.invokedynamic bench/bench_frame_name_methods.rb                         
Warming up --------------------------------------
     __method__ same     5.484M i/100ms
__method__ different     5.391M i/100ms
     __callee__ same     4.506M i/100ms
__callee__ different     4.501M i/100ms
Calculating -------------------------------------
     __method__ same     57.464M (± 0.7%) i/s -    290.649M in   5.058218s
__method__ different     57.374M (± 0.9%) i/s -    291.106M in   5.074304s
     __callee__ same     45.069M (± 1.0%) i/s -    225.302M in   4.999640s
__callee__ different     45.042M (± 1.4%) i/s -    225.070M in   4.997901s
[] jruby $ jruby bench/bench_frame_name_methods.rb
Warming up --------------------------------------
     __method__ same     1.620M i/100ms
__method__ different   943.069k i/100ms
     __callee__ same     1.604M i/100ms
__callee__ different   960.143k i/100ms
Calculating -------------------------------------
     __method__ same     16.377M (± 1.6%) i/s -     82.611M in   5.045839s
__method__ different      9.156M (± 1.3%) i/s -     46.210M in   5.047853s
     __callee__ same     15.847M (± 1.8%) i/s -     80.219M in   5.063688s
__callee__ different      9.183M (± 2.0%) i/s -     46.087M in   5.021116s

Note that the slow compound name path appears to have some volatility in the JVM JIT, and occasionally now drops about 25% of its performance:

[] jruby $ jruby bench/bench_frame_name_methods.rb 
Warming up --------------------------------------
     __method__ same     1.648M i/100ms
__method__ different   764.744k i/100ms
     __callee__ same     1.622M i/100ms
__callee__ different   754.497k i/100ms
Calculating -------------------------------------
     __method__ same     16.376M (± 2.2%) i/s -     82.403M in   5.034454s
__method__ different      7.017M (± 4.3%) i/s -     35.178M in   5.022898s
     __callee__ same     15.908M (± 6.5%) i/s -     79.478M in   5.034325s
__callee__ different      7.137M (± 0.7%) i/s -     36.216M in   5.074376s       

This may be due to the additional optimizations occasionally confounding HotSpot, or the volatility may have always been there and simply did not show up during the benchmark runs I performed.

There's no super and __method__ should return nil inside module
and class bodies, so we pass null to indicate this.
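The expected Ruby-level behavior for this case can be checked with a short snippet. The constant and module names are made up for the example.

```ruby
class BodyProbe
  # Evaluated directly in the class body, outside any method.
  IN_CLASS_BODY = __method__
end

module ModuleProbe
  # Evaluated directly in the module body, outside any method.
  IN_MODULE_BODY = __method__
end

p BodyProbe::IN_CLASS_BODY     # nil
p ModuleProbe::IN_MODULE_BODY  # nil
```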
@headius headius merged commit 4216efb into jruby:9.5-dev Mar 29, 2024
44 of 59 checks passed
@headius headius deleted the optimized_method_callee branch March 29, 2024 03:58