New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove __bp__ and speed-up bmethod calls #8060
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh wow, this is amazing. I didn't expect you would change the interpreter as well, but you already had a patch 🙂 It's much faster too. Great job!
Just wondering if having the interpreter side conform to the JIT layout with the receiver on the stack could help keep things simpler, if the performance cost is not too high? AFAIK the receiver is always on the stack for every other type of call? |
I believe that's what the second commit does and it speeds up the interpreter. |
Alan did write:
Just wondering if we could make it so everyone uses the same layout (same as the JITs), to avoid the complexity of managing different layouts with special flags, etc. |
My bad, thanks for pointing that out. |
Not quite. Blocks don't get their
It's a close call between adding an 8 byte dead store and having the flag. I went with the flag because it's optimal speed wise, but I agree not having the flag is maybe a better trade off. I will try the no flag approach today. (PS. I'd grouped these two commits together because of these two options) |
Thanks for being willing to try it out. I know it's not the most fun thing to refactor things around, but it might make the code cleaner. |
This commit removes the __bp__ field from rb_control_frame_t. It was introduced to help MJIT, but since MJIT was replaced by RJIT, we can use vm_base_ptr() to compute it from the SP of the previous control frame instead. Removing the field avoids needing to set it up when pushing new frames. Simply removing __bp__ would cause crashes since RJIT and YJIT used a slightly different stack layout for bmethod calls than the interpreter. At the moment of the call, the two layouts looked as follows: ┌────────────┐ ┌────────────┐ │ frame_base │ │ frame_base │ ├────────────┤ ├────────────┤ │ ... │ │ ... │ ├────────────┤ ├────────────┤ │ args │ │ args │ ├────────────┤ └────────────┘<─prev_frame_sp │ receiver │ prev_frame_sp─>└────────────┘ RJIT & YJIT interpreter Essentially, vm_base_ptr() needs to compute the address to frame_base given prev_frame_sp in the diagrams. The presence of the receiver created an off-by-one situation. Make the interpreter use the layout the JITs use for iseq-to-iseq bmethod calls. Doing so removes unnecessary argument shifting and vm_exec_core() re-entry from the interpreter, yielding a speed improvement visible through `benchmark/vm_defined_method.yml`: patched: 7578743.1 i/s master: 4796596.3 i/s - 1.58x slower C-to-iseq bmethod calls now store one more VALUE than before, but that should have negligible impact on overall performance. Note that re-entering vm_exec_core() used to be necessary for firing TracePoint events, but that's no longer the case since 9121e57. Closes ruby#6428
This commit removes the
__bp__
field from rb_control_frame_t. It wasintroduced to help MJIT, but since MJIT was replaced by RJIT, we can use
vm_base_ptr() to compute it from the SP of the previous control frame
instead. Removing the field avoids needing to set it up when pushing new
frames.
Simply removing
__bp__
would cause crashes since RJIT and YJIT used aslightly different stack layout for bmethod calls than the interpreter.
At the moment of the call, the two layouts looked as follows:
Essentially, vm_base_ptr() needs to compute the address to frame_base
given prev_frame_sp in the diagrams. The presence of the receiver
created an off-by-one situation.
Make the interpreter use the layout the JITs use for iseq-to-iseq
bmethod calls. Doing so removes unnecessary argument shifting and
vm_exec_core() re-entry from the interpreter, yielding a speed
improvement visible through
benchmark/vm_defined_method.yml
:C-to-iseq bmethod calls now store one more VALUE than before, but that
should have negligible impact on overall performance.
Note that re-entering vm_exec_core() used to be necessary for firing
TracePoint events, but that's no longer the case since
9121e57.
Closes #6428