-
Notifications
You must be signed in to change notification settings - Fork 5.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
YJIT: Enhance the String#<<
method substitution to handle integer codepoint values.
#11032
base: master
Are you sure you want to change the base?
Conversation
This comment has been minimized.
This comment has been minimized.
yjit/src/codegen.rs
Outdated
// In order to use the fast path (rb_str_buf_cat_byte), the string encoding must be ASCII-8BIT | ||
// and the codepoint must be in the byte range (0x00 - 0xff). | ||
// If either of those conditions are not met we must use the general string concat (str_buf_cat) | ||
// function with the original codepoint argument. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a general comment. I like to avoid generating branches inside the inline block because it can be taxing for branch prediction. This is another discussion, but I think it's useful to think of branch prediction as a finite resource. It works well when you don't have a lot of branches, but the code we generate is going to have (tens of) thousands of them, so the CPU can't always remember which direction a given branch went the last time.
Branches inside the inline block also tends to result in bigger inline code size (worse for instruction cache). I like to make it so that failing guards side-exit to the interpreter (outlined code) as much as possible. One thing you could potentially do is peek at the encoding at compilation time... This would involve speculating that the string being appended to will always have the same encoding. We can measure how often this is the case in practice.
We can continue this discussion in #yjit-internal though. Still open to merging this PR as is.
Hi Kevin. Commenting to give you as much feedback as possible.
I think that the idea of this PR is a good one. It seems like a sensible specialization to implement. I wrote some comments on the code itself. Two things I would like to see in a PR like this:
Seems sensible. Duplicating checks that occur at run-time would not be good but the checks you duplicated occur at compilation time. It makes sense to write the logic for this in a separate function to avoid ending up with a megafunction that is hard to follow. |
42ff8d8
to
f118dd1
Compare
477f936
to
00cc8e4
Compare
Before this commit Protoboeuf+YJIT is about 5.28x slower than Google's protobuf implementation, but after this commit it is 4.33x slower. Before:
After:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An alternative to what you have is to make a new Rust/C function that essentially does the logic that you currently inline into every site (if codepoint.is_fixnum() { rb_str_buf_cat_byte } else { rb_str_concat }
). That way you only generate one ccall at each site.
It should be easier to read than checking in assembler, perf should be about the same, and it's a code size win.
If you write it in C, it could go into string.c and you can additionally avoid removing static
from rb_str_buf_cat_byte().
I like Alan's idea of embedding some of the checks in a C function to save on code size and maybe make the code a bit more readable. Also thanks Kevin for persisting. You picked a hard problem for your first big PR! 😉 |
00cc8e4
to
869d076
Compare
Okay. I think you were simplifying, but for completeness the check is really (in pseudo-code)
I'll have to introduce a new function. I don't think updating |
Yes, you should make clear that the new function you add is a YJIT helper and by not putting it in headers, express that it has many preconditions that are hard to meet and is not for general usage. |
c89be19
to
edc7703
Compare
// Ensure the codepoint argument is a Fixnum. | ||
let arg = asm.stack_opnd(0); | ||
let comptime_arg = jit.peek_at_stack(&asm.ctx, 0); | ||
if comptime_arg.fixnum_p() { | ||
jit_guard_known_klass( | ||
jit, | ||
asm, | ||
comptime_arg.class_of(), | ||
arg, | ||
arg.into(), | ||
comptime_arg, | ||
SEND_MAX_DEPTH, | ||
Counter::guard_send_not_fixnums, | ||
); | ||
} else { | ||
return false; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here you already checked that comptime_arg
is a fixnum in the caller.
Maybe we can also fold the guard that the value is fixnum into rb_yjit_str_concat_codepoint
? Because presumably the fallback rb_str_concat(str, codepoint)
can handle any kind of input type? @XrXr would this be valid?
You could do if (RB_LIKELY(value is fixnum) && ENCODING_GET_INLINED(str) == rb_ascii8bit_encindex()) {...}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, rb_str_concat() handles everything. And I agree with folding the guard into the C function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original version with the assembler required the type to be a Fixnum
, but that's no longer the case. I can move this guard easily enough. I just want to confirm that if we see another type at the call site we don't want to try to cut over to jit_rb_str_concat
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're going to allow non-codepoint arguments into rb_yjit_str_concat_codepoint
, should I rename that function? Or is naming it after its primary intention okay?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just want to confirm that if we see another type at the call site we don't want to try to cut over to jit_rb_str_concat?
Seems like if it's already behind a FIXNUM_P
check in the caller (checking in a different way), so you could turn this into an assert.
Renaming the C function sounds good. Maybe rb_yjit_str_concat_likely_byte
? Also a short comment on the C side would be nice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was looking at this with @paracycle, too. Since we're talking about turning rb_yjit_str_concat_codepoint
into a generic fallback, should we collapse the jit_rb_str_concat
and jit_rb_str_concat_codepoint
codegen methods into a single handler that calls to this C function?
I don't know the history of jit_rb_str_concat
to know whether it's doing more in assembly for speed or because of necessity. That's what I used as the template for my original implementation of jit_rb_str_concat_codepoint
and we decided to move things out to C to keep the inline code block smaller and reduce branching. If we applied those same principals here, I suppose we could simplify the codegen a fair bit. But, if that's out of scope, that's fine with me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Collapsing them into a single C call sounds good to me. Most of today's jit_rb_str_concat
seems like something that could be done in the C function.
a350c88
to
4026c6d
Compare
4026c6d
to
e6e462c
Compare
This PR extends YJIT's method substitution for
String#<<
to handle integer codepoints as well. If the string isASCII-8BIT
and the codepoint is a byte value, YJIT will dispatch torb_str_buf_cat_byte
as a fast path for working with binary strings. Otherwise, it'll dispatch to the generalrb_str_concat
just asvm_opt_ltlt
would.rb_str_buf_cat_byte
currently works with bothASCII-8BIT
andUS-ASCII
, but this YJIT side only optimizes forASCII-8BIT
. It could be extended easily enough with an additional comparison. The encoding indices for a handful of encodings, including bothASCII-8BIT
andUS-ASCII
are fixed and sequential, so we could also do a range check. For the time being, I've omitted the handling ofUS-ASCII
. I'd like to get feedback on the simplified PR and extend it withUS-ASCII
handling if needed (which I'm also not convinced we do).Please advise if the mechanism I'm using to handle polymorphic dispatch is incorrect. We already have
jit_rb_str_concat
as a method substitution forString#<<
, but it deliberately only handled string arguments. I could have merged the two, but that struck me as being complicated. However, I also don't know if it's fine to call between methods like this. And, it ends up duplicating some of the type checks to ensure we dispatch to the correct type depending on what we see at compile time. I was unsure on how to handle the runtime guards, so please pay extra attention to that.