Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

YJIT: Lazily push a frame for specialized C funcs #10080

Merged
merged 7 commits into from Feb 23, 2024

Conversation

k0kubun
Copy link
Member

@k0kubun k0kubun commented Feb 23, 2024

This PR adds the capability to lazily push a frame for specialized C methods. It saves the PC before a C call, remembers { PC => cme }, and lazily pushes a frame using it when it calls non-leaf functions like rb_raise or rb_to_int.

To demonstrate the capability, this PR also merges String#setbyte specialization #9767. While the JIT code never pushes a frame for String#setbyte, it gets a String#setbyte frame when a backtrace is obtained inside the C call.

before: ruby 3.4.0dev (2024-02-23T04:55:43Z master f403660805) +YJIT [x86_64-linux]
after: ruby 3.4.0dev (2024-02-23T05:38:59Z yjit-lazy-frame 0845cbddb6) +YJIT [x86_64-linux]

--------  -----------  ----------  ----------  ----------  -------------  ------------
bench     before (ms)  stddev (%)  after (ms)  stddev (%)  after 1st itr  before/after
ruby-xor  88.4         0.3         82.6        0.3         1.07           1.07
--------  -----------  ----------  ----------  ----------  -------------  ------------

@k0kubun k0kubun force-pushed the yjit-lazy-frame branch 2 times, most recently from 3204cc3 to d906d19 Compare February 23, 2024 05:32
Co-authored-by: Maxime Chevalier-Boisvert <maxime.chevalierboisvert@shopify.com>
@k0kubun k0kubun marked this pull request as ready for review February 23, 2024 08:42
@matzbot matzbot requested a review from a team February 23, 2024 08:42
@maximecb
Copy link
Contributor

Looking at this PR 👀

Nice results on the ruby-xor microbenchmark 👍

yjit/src/codegen.rs Outdated Show resolved Hide resolved
yjit/src/yjit.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@maximecb maximecb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is surprisingly simple and the results look good so far. Well done 👍

Do you think it would make sense to try to apply this optimization to more cfuncs before merging?

Also, can this be applied to all calls to cfuncs in gen_send_cfunc eventually? That would make a big performance difference if possible (doesn't need to be done in this PR).

@k0kubun
Copy link
Member Author

k0kubun commented Feb 23, 2024

Do you think it would make sense to try to apply this optimization to more cfuncs before merging?

I can apply this to String#getbyte and String#byteslice to get rid of all existing "give up because it's not leaf" scenarios. Other existing specialized cfuncs don't need this. This is meant to unblock the specialization of more C funcs that are leaf unless arguments are malformed, and I want to work on new specializations in separate PRs.

Also, can this be applied to all calls to cfuncs in gen_send_cfunc eventually? That would make a big performance difference if possible (doesn't need to be done in this PR).

Theoretically, it's possible. However, given that frame outlining ended up slowing down the interpreter, I'd like to minimize the number of hooks where we push a lazy frame. Right now, num_lazy_frame_check is about 20 on both railsbench and lobsters (0 on chunky-png), meaning it's not called during benchmark iterations.

So the benefit of the current design is that it has zero overhead to the C code, which likely doesn't hold when you apply this to every single C func. We currently apply the hook to rb_to_int with non-integers, which is fairly uncommon, and rb_raise and FrozenError, which can often be a bug that's not shipped to production. It doesn't touch rb_funcall.

But I think this is enough. On chunky-png, num_send_cfunc_inline % was pushed from 76.6% to 99.4% by inlining #setbyte (no significant impact on the speedup though). So it inlines almost everything. By dealing with such C funcs, I believe we can cover a large portion of C calls on most benchmarks. Not-usually-leaf C funcs should be rewritten in Ruby.

@k0kubun
Copy link
Member Author

k0kubun commented Feb 23, 2024

I can apply this to String#getbyte and String#byteslice to get rid of all existing "give up because it's not leaf" scenarios

Done 340cfb5 c4bd35e. For String#byteslice, I added -1 argc support too.

Copy link
Contributor

@maximecb maximecb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well done 👌

@maximecb maximecb enabled auto-merge (squash) February 23, 2024 18:57
@maximecb maximecb merged commit 8a6740c into ruby:master Feb 23, 2024
95 of 96 checks passed
@k0kubun k0kubun deleted the yjit-lazy-frame branch February 23, 2024 19:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
2 participants