-
Notifications
You must be signed in to change notification settings - Fork 5.4k
vm_opt_ltlt: call rb_str_buf_append directly if RHS is a String #6095
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
be60ccc
to
0c42cb0
Compare
`rb_str_concat` does a lot of type checking we can easily bypass. ``` | |compare-ruby|built-ruby| |:--------------|-----------:|---------:| |string_concat | 362.007k| 398.965k| | | -| 1.10x| ```
… encodings don't match, as discussed with byroot
17e8aa4
to
7d7554b
Compare
@@ -5382,7 +5382,11 @@ vm_opt_ltlt(VALUE recv, VALUE obj) | |||
} | |||
else if (RBASIC_CLASS(recv) == rb_cString && | |||
BASIC_OP_UNREDEFINED_P(BOP_LTLT, STRING_REDEFINED_OP_FLAG)) { | |||
return rb_str_concat(recv, obj); | |||
if (LIKELY(RB_TYPE_P(obj, T_STRING))) { | |||
return rb_str_buf_append(recv, obj); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any insight why this is faster?
It seems only RB_INTEGER_TYPE_P(str2)
and StringValue(str2)
checks less, and I'd think those are fairly fast.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I remember correctly my profiling results (this patch is extracted from #6072, which I worked on last week), one of the main factor is the stupid amount of time we extract the encoding from the string.
So by jumping directly to rb_str_buf_append
, we save the two call you mentioned, but also 2 function calls, and a fairly slow call to get_encoding -> get_actual_encoding
.
Several of the other optimizations from #6072 that I need to cleanup also involve avoiding to go fetch the rb_encoding *
and instead do some shortcut with the encindex
for common encodings (utf-8, ascii, binary).
I think there's a bunch of low hanging fruits in string.c
if we assume that 99.99% of strings are one of these 3 encodings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yeah I didn't expect to have get_actual_encoding() used on the way.
That's used only in like 1 place in TruffleRuby which I guess means it's really not used in practice.
This kind of encoding resolution should happen on String creation probably, it seems crazy expensive semantics otherwise, or maybe it should just assume/alias native endian if not specified explicitly.
Also rb_enc_from_index() can take a lock for non-US-ASCII/UTF8/BINARY.
I wish there was a a fixed number of encodings, dummy and replicate encodings are a large overhead to support (in all Ruby impls) and have near zero value.
rb_str_concat
does a lot of type checking we can easily bypass.I have a bunch of other optimization, but I think it's worth submitting this one alone as it's easy to understand and very simple.