Remove bp and speed-up bmethod calls #8060

XrXr · 2023-07-12T21:48:42Z

This commit removes the __bp__ field from rb_control_frame_t. It was
introduced to help MJIT, but since MJIT was replaced by RJIT, we can use
vm_base_ptr() to compute it from the SP of the previous control frame
instead. Removing the field avoids needing to set it up when pushing new
frames.

Simply removing __bp__ would cause crashes since RJIT and YJIT used a
slightly different stack layout for bmethod calls than the interpreter.
At the moment of the call, the two layouts looked as follows:

               ┌────────────┐    ┌────────────┐
               │ frame_base │    │ frame_base │
               ├────────────┤    ├────────────┤
               │    ...     │    │    ...     │
               ├────────────┤    ├────────────┤
               │    args    │    │    args    │
               ├────────────┤    └────────────┘<─prev_frame_sp
               │  receiver  │
prev_frame_sp─>└────────────┘
                 RJIT & YJIT      interpreter

Essentially, vm_base_ptr() needs to compute the address to frame_base
given prev_frame_sp in the diagrams. The presence of the receiver
created an off-by-one situation.

Make the interpreter use the layout the JITs use for iseq-to-iseq
bmethod calls. Doing so removes unnecessary argument shifting and
vm_exec_core() re-entry from the interpreter, yielding a speed
improvement visible through benchmark/vm_defined_method.yml:

 patched:   7578743.1 i/s
  master:   4796596.3 i/s - 1.58x  slower

C-to-iseq bmethod calls now store one more VALUE than before, but that
should have negligible impact on overall performance.

Note that re-entering vm_exec_core() used to be necessary for firing
TracePoint events, but that's no longer the case since
9121e57.

Closes #6428

vm_core.h

k0kubun

Oh wow, this is amazing. I didn't expect you would change the interpreter as well, but you already had a patch 🙂 It's much faster too. Great job!

maximecb · 2023-07-13T15:55:47Z

A wrinkle to this is that RJIT and YJIT use a slightly different stack layout for bmethod calls than the interpreter.

Just wondering if having the interpreter side conform to the JIT layout with the receiver on the stack could help keep things simpler, if the performance cost is not too high? AFAIK the receiver is always on the stack for every other type of call?

k0kubun · 2023-07-13T16:05:10Z

Just wondering if having the interpreter side conform to the JIT layout with the receiver on the stack could help keep things simpler, if the performance cost is not too high?

I believe that's what the second commit does and it speeds up the interpreter.

maximecb · 2023-07-13T16:25:23Z

Alan did write:

C-to-ISeq bmethod calls continue to use a different layout (See invoke_iseq_block_from_c())

Just wondering if we could make it so everyone uses the same layout (same as the JITs), to avoid the complexity of managing different layouts with special flags, etc.

k0kubun · 2023-07-13T16:51:27Z

My bad, thanks for pointing that out.

XrXr · 2023-07-14T14:27:47Z

AFAIK the receiver is always on the stack for every other type of call?

Not quite. Blocks don't get their self from the call site like method calls, they typically inherit the self of the scope that encloses the block. They can also get self from instance_eval and friends.

Just wondering if having the interpreter side conform to the JIT layout with the receiver on the stack could help keep things simpler, if the performance cost is not too high?

It's a close call between adding an 8 byte dead store and having the flag. I went with the flag because it's optimal speed wise, but I agree not having the flag is maybe a better trade off. I will try the no flag approach today. (PS. I'd grouped these two commits together because of these two options)

maximecb · 2023-07-14T15:43:47Z

I will try the no flag approach today. (PS. I'd grouped these two commits together because of these two options)

Thanks for being willing to try it out. I know it's not the most fun thing to refactor things around, but it might make the code cleaner.

This commit removes the __bp__ field from rb_control_frame_t. It was introduced to help MJIT, but since MJIT was replaced by RJIT, we can use vm_base_ptr() to compute it from the SP of the previous control frame instead. Removing the field avoids needing to set it up when pushing new frames. Simply removing __bp__ would cause crashes since RJIT and YJIT used a slightly different stack layout for bmethod calls than the interpreter. At the moment of the call, the two layouts looked as follows: ┌────────────┐ ┌────────────┐ │ frame_base │ │ frame_base │ ├────────────┤ ├────────────┤ │ ... │ │ ... │ ├────────────┤ ├────────────┤ │ args │ │ args │ ├────────────┤ └────────────┘<─prev_frame_sp │ receiver │ prev_frame_sp─>└────────────┘ RJIT & YJIT interpreter Essentially, vm_base_ptr() needs to compute the address to frame_base given prev_frame_sp in the diagrams. The presence of the receiver created an off-by-one situation. Make the interpreter use the layout the JITs use for iseq-to-iseq bmethod calls. Doing so removes unnecessary argument shifting and vm_exec_core() re-entry from the interpreter, yielding a speed improvement visible through `benchmark/vm_defined_method.yml`: patched: 7578743.1 i/s master: 4796596.3 i/s - 1.58x slower C-to-iseq bmethod calls now store one more VALUE than before, but that should have negligible impact on overall performance. Note that re-entering vm_exec_core() used to be necessary for firing TracePoint events, but that's no longer the case since 9121e57. Closes ruby#6428

matzbot requested a review from a team July 12, 2023 21:49

k0kubun reviewed Jul 12, 2023

View reviewed changes

vm_core.h Show resolved Hide resolved

k0kubun approved these changes Jul 12, 2023

View reviewed changes

XrXr force-pushed the rm-bp branch from 7ff2953 to a4ddbff Compare July 17, 2023 16:04

XrXr requested a review from a team July 17, 2023 17:34

maximecb approved these changes Jul 17, 2023

View reviewed changes

k0kubun approved these changes Jul 17, 2023

View reviewed changes

maximecb merged commit f302e72 into ruby:master Jul 17, 2023
91 checks passed

XrXr deleted the rm-bp branch July 17, 2023 18:37

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove bp and speed-up bmethod calls #8060

Remove bp and speed-up bmethod calls #8060

XrXr commented Jul 12, 2023 •

edited

k0kubun left a comment

maximecb commented Jul 13, 2023

k0kubun commented Jul 13, 2023

maximecb commented Jul 13, 2023

k0kubun commented Jul 13, 2023

XrXr commented Jul 14, 2023 •

edited

maximecb commented Jul 14, 2023

Remove __bp__ and speed-up bmethod calls #8060

Remove __bp__ and speed-up bmethod calls #8060

Conversation

XrXr commented Jul 12, 2023 • edited

k0kubun left a comment

Choose a reason for hiding this comment

maximecb commented Jul 13, 2023

k0kubun commented Jul 13, 2023

maximecb commented Jul 13, 2023

k0kubun commented Jul 13, 2023

XrXr commented Jul 14, 2023 • edited

maximecb commented Jul 14, 2023

Remove bp and speed-up bmethod calls #8060

Remove bp and speed-up bmethod calls #8060

XrXr commented Jul 12, 2023 •

edited

XrXr commented Jul 14, 2023 •

edited