vm_opt_ltlt: call rb_str_buf_append directly if RHS is a String #6095

casperisfine · 2022-07-06T10:43:49Z

rb_str_concat does a lot of type checking we can easily bypass.


|               |compare-ruby|built-ruby|
|:--------------|-----------:|---------:|
|string_concat  |    362.007k|  398.965k|
|               |           -|     1.10x|

I have a bunch of other optimizations, but I think it's worth submitting this one on its own, as it's very simple and easy to understand.


eregon · 2022-07-06T18:42:38Z

vm_insnhelper.c

@@ -5382,7 +5382,11 @@ vm_opt_ltlt(VALUE recv, VALUE obj)
    }
    else if (RBASIC_CLASS(recv) == rb_cString &&
 	     BASIC_OP_UNREDEFINED_P(BOP_LTLT, STRING_REDEFINED_OP_FLAG)) {
-	return rb_str_concat(recv, obj);
+	if (LIKELY(RB_TYPE_P(obj, T_STRING))) {
+	    return rb_str_buf_append(recv, obj);


Any insight into why this is faster?
It seems to skip only the RB_INTEGER_TYPE_P(str2) and StringValue(str2) checks, and I'd think those are fairly fast.

byroot · 2022-07-06

If I remember my profiling results correctly (this patch is extracted from #6072, which I worked on last week), one of the main factors is the stupid amount of time we spend extracting the encoding from the string.

So by jumping directly to rb_str_buf_append, we save the two checks you mentioned, but also 2 function calls and a fairly slow call to get_encoding -> get_actual_encoding.
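To make the shape of that shortcut concrete, here is a minimal sketch in plain C. It is a toy model, not CRuby code: `toy_value`, `toy_concat`, `toy_buf_append`, `toy_opt_ltlt`, and the `generic_checks_run` counter are all hypothetical stand-ins, and it does not model encodings at all. It only shows the idea of testing for the common case first and calling the specialized append directly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: these types and functions are hypothetical stand-ins,
 * not CRuby's VALUE, rb_str_concat, or rb_str_buf_append. */
typedef enum { TOY_T_STRING, TOY_T_INTEGER } toy_type;

typedef struct {
    toy_type type;
    char buf[64];   /* used when type == TOY_T_STRING */
    long num;       /* used when type == TOY_T_INTEGER */
} toy_value;

/* Counts how often the generic entry point (and its checks) runs. */
static int generic_checks_run = 0;

/* Specialized append: assumes both arguments are already known to be strings. */
static toy_value *toy_buf_append(toy_value *recv, const toy_value *obj)
{
    size_t used = strlen(recv->buf);
    size_t room = sizeof(recv->buf) - used - 1;
    strncat(recv->buf, obj->buf, room);   /* truncates if the buffer is full */
    return recv;
}

/* Generic concat: inspects the argument type before delegating. */
static toy_value *toy_concat(toy_value *recv, const toy_value *obj)
{
    generic_checks_run++;
    if (obj->type == TOY_T_INTEGER) {
        /* Integer argument: append it as a single character. */
        size_t used = strlen(recv->buf);
        if (used + 1 < sizeof(recv->buf)) {
            recv->buf[used] = (char)obj->num;
            recv->buf[used + 1] = '\0';
        }
        return recv;
    }
    return toy_buf_append(recv, obj);
}

/* Shaped like the patched branch: handle the common case first. */
static toy_value *toy_opt_ltlt(toy_value *recv, const toy_value *obj)
{
    if (obj->type == TOY_T_STRING) {
        return toy_buf_append(recv, obj);   /* fast path, skips toy_concat */
    }
    return toy_concat(recv, obj);           /* everything else */
}
```

In this sketch a String right-hand side never enters `toy_concat`, so the counter stays at zero for the common case, while an Integer argument still goes through the generic path.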

Several of the other optimizations from #6072 that I need to clean up also involve avoiding the fetch of the rb_encoding * and instead taking a shortcut with the encindex for common encodings (utf-8, ascii, binary).

I think there's a lot of low-hanging fruit in string.c if we assume that 99.99% of strings are in one of these 3 encodings.
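A rough sketch of what such an encindex shortcut could look like, again as a toy model in plain C. Everything here is an assumption made for illustration: the `TOY_ENCIDX_*` values, the `toy_encoding` record, and `toy_enc_from_index` are hypothetical and do not match CRuby's real encoding table or constants. The point is only that a question about the three common encodings can be answered from the small integer index, without resolving the full encoding record.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: the names and index values below are hypothetical,
 * not CRuby's real encoding table or ENCINDEX_* constants. */
enum {
    TOY_ENCIDX_BINARY  = 0,
    TOY_ENCIDX_UTF8    = 1,
    TOY_ENCIDX_USASCII = 2,
    TOY_ENCIDX_OTHER   = 3,
    TOY_ENC_COUNT      = 4
};

typedef struct {
    const char *name;
    int ascii_compatible;
} toy_encoding;

static const toy_encoding toy_enc_table[TOY_ENC_COUNT] = {
    { "BINARY",   1 },
    { "UTF-8",    1 },
    { "US-ASCII", 1 },
    { "UTF-16LE", 0 }
};

/* Counts how often the full record had to be resolved. */
static int toy_table_lookups = 0;

/* Slow path: resolve the index to a full encoding record. */
static const toy_encoding *toy_enc_from_index(int idx)
{
    toy_table_lookups++;
    if (idx < 0 || idx >= TOY_ENC_COUNT) return NULL;
    return &toy_enc_table[idx];
}

/* Fast path: the three common encodings are answered from the index alone. */
static int toy_ascii_compatible(int idx)
{
    switch (idx) {
      case TOY_ENCIDX_BINARY:
      case TOY_ENCIDX_UTF8:
      case TOY_ENCIDX_USASCII:
        return 1;
      default: {
        const toy_encoding *enc = toy_enc_from_index(idx);
        return enc != NULL && enc->ascii_compatible;
      }
    }
}
```

Only the uncommon index falls through to the table lookup here; the common ones return straight from the `switch`.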

eregon · 2022-07-06

Oh yeah, I didn't expect get_actual_encoding() to be used along the way.

That's used in only about 1 place in TruffleRuby, which I guess means it's really not used in practice.
This kind of encoding resolution should probably happen on String creation; the semantics seem crazy expensive otherwise. Or maybe it should just assume/alias native endian if not specified explicitly.

Also, rb_enc_from_index() can take a lock for encodings other than US-ASCII/UTF-8/BINARY.
I wish there was a fixed number of encodings; dummy and replicate encodings are a large overhead to support (in all Ruby impls) and have near-zero value.

casperisfine force-pushed the faster-buffer-concat-3 branch 2 times, most recently from be60ccc to 0c42cb0 · July 6, 2022 10:54

casperisfine requested review from maximecb, XrXr and tenderlove as code owners July 6, 2022 11:47

nobu reviewed Jul 6, 2022

byroot and others added 2 commits July 6, 2022 16:20

vm_opt_ltlt: call rb_str_buf_append directly if RHS is a String

e7bbc67

`rb_str_concat` does a lot of type checking we can easily bypass.

```
|               |compare-ruby|built-ruby|
|:--------------|-----------:|---------:|
|string_concat  |    362.007k|  398.965k|
|               |           -|     1.10x|
```

Switch YJIT to using rb_str_buf_append rather than rb_str_append when encodings don't match, as discussed with byroot

7d7554b

casperisfine force-pushed the faster-buffer-concat-3 branch from 17e8aa4 to 7d7554b · July 6, 2022 14:20

maximecb approved these changes Jul 6, 2022

byroot merged commit a2e0815 into ruby:master Jul 6, 2022

XrXr deleted the faster-buffer-concat-3 branch July 6, 2022 15:29

eregon reviewed Jul 6, 2022

casperisfine mentioned this pull request Jul 21, 2022

Optimize String#<< #6072

Closed
