Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance of string interpolation #1626

Closed
wants to merge 5 commits into from

Conversation

@south37
Copy link

commented May 21, 2017

This patch will add pre-allocation in string interpolation.
By this, unnecessary capacity resizing is avoided.

For small strings, optimized rb_str_resurrect operation is faster, so pre-allocation is done only when concatenated strings are large.
MIN_PRE_ALLOC_SIZE was decided by experimenting with my local machine (x86_64-apple-darwin 16.5.0, Apple LLVM version 8.1.0 (clang - 802.0.42)).

String interpolation will be faster around 72% when large string is created.

  • Before
Calculating -------------------------------------
Large string interpolation
                          1.276M (± 5.9%) i/s -      6.358M in   5.002022s
Small string interpolation
                          5.156M (± 5.5%) i/s -     25.728M in   5.005731s
  • After
Calculating -------------------------------------
Large string interpolation
                          2.201M (± 5.8%) i/s -     11.063M in   5.043724s   <- 72% faster!!
Small string interpolation
                          5.192M (± 5.7%) i/s -     25.971M in   5.020516s   <- no degradation
  • Test code
require 'benchmark/ips'

Benchmark.ips do |x|
  x.report "Large string interpolation" do |t|
    a = "Hellooooooooooooooooooooooooooooooooooooooooooooooooooo"
    b = "Wooooooooooooooooooooooooooooooooooooooooooooooooooorld"

    t.times do
      "#{a}, #{b}!"
    end
  end

  x.report "Small string interpolation" do |t|
    a = "Hello"
    b = "World"

    t.times do
      "#{a}, #{b}!"
    end
  end
end

Issue
https://bugs.ruby-lang.org/issues/13587

This patch will add pre-allocation in string interpolation.
By this, unecessary capacity resizing is avoided.

For small strings, optimized `rb_str_resurrect` operation is faster, so pre-allocation is done only when concatenated strings are large.
MIN_PRE_ALLOC_SIZE was decided by experimenting with local machine (x86_64-apple-darwin 16.5.0, Apple LLVM version 8.1.0 (clang - 802.0.42)).

String interpolation will be faster around 72% when large string is created.

* Before
Calculating -------------------------------------
Large string interpolation
                          1.276M (± 5.9%) i/s -      6.358M in   5.002022s
Small string interpolation
                          5.156M (± 5.5%) i/s -     25.728M in   5.005731s

* After
Calculating -------------------------------------
Large string interpolation
                          2.201M (± 5.8%) i/s -     11.063M in   5.043724s
Small string interpolation
                          5.192M (± 5.7%) i/s -     25.971M in   5.020516s

* Test code
require 'benchmark/ips'

Benchmark.ips do |x|
  x.report "Large string interpolation" do |t|
    a = "Hellooooooooooooooooooooooooooooooooooooooooooooooooooo"
    b = "Wooooooooooooooooooooooooooooooooooooooooooooooooooorld"

    t.times do
      "#{a}, #{b}!"
    end
  end

  x.report "Small string interpolation" do |t|
    a = "Hello"
    b = "World"

    t.times do
      "#{a}, #{b}!"
    end
  end
end
Copy link
Member

left a comment

fix compile error

string.c Outdated
if (UNLIKELY(!num)) return rb_str_new(0, 0);
if (UNLIKELY(num == 1)) return rb_str_resurrect(strary[0]);

long len = 1;

This comment has been minimized.

Copy link
@nobu

nobu May 21, 2017

Member

https://travis-ci.org/ruby/ruby/builds/234558585#L1498

string.c:2887:5: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]

This comment has been minimized.

Copy link
@south37

south37 Jun 14, 2017

Author

Thanks for your comment. 😄
I fixed it ! 👍

string.c Outdated
if (LIKELY(len < MIN_PRE_ALLOC_SIZE)) {
str = rb_str_resurrect(strary[0]);
s = 1;
} else {

This comment has been minimized.

Copy link
@nobu

nobu May 21, 2017

Member

Adjust indentation and style to the rest.

This comment has been minimized.

Copy link
@south37

south37 Jun 14, 2017

Author

Thanks for your comment. 😄
I fixed it ! 👍

This patch fixes wrong coding style.
@south37

This comment has been minimized.

Copy link
Author

commented Jul 26, 2017

@nobu I fixed coding style! 😄

@nobu
nobu approved these changes Jul 27, 2017
long len = 1;

if (UNLIKELY(!num)) return rb_str_new(0, 0);
if (UNLIKELY(num == 1)) return rb_str_resurrect(strary[0]);

This comment has been minimized.

Copy link
@rhenium

rhenium Jul 27, 2017

Member

Would num ever be 0 or 1?

This comment has been minimized.

Copy link
@south37

south37 Jul 27, 2017

Author

Thanks for your comment! 😄

If rb_str_concat_literals is called from concatstrings(YARV insn) and it is executed as YARV insns generated by Ruby compiler, num seems not to be 0 nor 1.
But, if rb_str_concat_literals is called directly in ruby internal, num may be 0 or 1.
I considered that situation.

But, rb_str_concat_literals is used only in concatstrings now, so it may be better to reject them 💡

This comment has been minimized.

Copy link
@rhenium

rhenium Sep 16, 2017

Member

Thanks for explanation. I've just found that :"#{}" actually produces a concatstrings call with num==1 (possibly there are more cases that I didn't find). I think the concatstrings call can be eliminated in that case, but it would be out of scope of this PR.

s = 1;
}
else {
str = rb_str_buf_new(len);

This comment has been minimized.

Copy link
@rhenium

rhenium Jul 27, 2017

Member

The encoding is not preserved. s="."*50; p "#{s}x".encoding would result in ASCII-8BIT.

This comment has been minimized.

Copy link
@south37

south37 Jul 27, 2017

Author

Sorry, I didn't notice it...
Thanks for pointing it out!

I 'll fix it and add the test case! 💡

This comment has been minimized.

Copy link
@south37

south37 Jul 27, 2017

Author

@rhenium

I fixed it by 1f6ad9b ! 💡

I referenced the similar operation in array.c. https://github.com/ruby/ruby/blob/trunk/array.c#L1974

`s="."*50; p "#{s}x".encoding` should be UTF-8, but resulted in
ASCII-8bit.
This patch fixes it.
@@ -513,6 +513,11 @@ def test_concat
assert_raise(RuntimeError) { 'foo'.freeze.concat('bar') }
end

def test_concat_literals
s="."*50; "#{s}x".encoding
assert_equal("#{s}x".encoding, Encoding::UTF_8)

This comment has been minimized.

Copy link
@nobu

nobu Jul 27, 2017

Member

The argument order is "expected" and "actual"

This comment has been minimized.

Copy link
@south37

south37 Jul 27, 2017

Author

Sorry.... 🙈

I fixed it! 💡 ce64637

@south37

This comment has been minimized.

Copy link
Author

commented Sep 9, 2017

@rhenium ping

@@ -513,6 +513,11 @@ def test_concat
assert_raise(RuntimeError) { 'foo'.freeze.concat('bar') }
end

def test_concat_literals
s="."*50; "#{s}x".encoding

This comment has been minimized.

Copy link
@Maumagnaguagno

Maumagnaguagno Sep 17, 2017

The test repeats "#{s}x".encoding, which I believe to be unnecessary.
I would also add spaces for readability, so s="."*50; "#{s}x".encoding would become s = "." * 50.

This comment has been minimized.

Copy link
@south37

south37 Sep 18, 2017

Author

Thanks for your comment! 😄
I fixed it! abc4f2b

matzbot pushed a commit that referenced this pull request Sep 17, 2017
* compile.c (iseq_peephole_optimize): optimize away unnecessary
  concatenation of single string, following tostring which always
  puts a String instance.
  #1626 (comment)

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@59945 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
@matzbot matzbot closed this in 80c5030 Oct 21, 2017
mrkn pushed a commit to mrkn/ruby that referenced this pull request Dec 1, 2017
This patch will add pre-allocation in string interpolation.
By this, unecessary capacity resizing is avoided.

For small strings, optimized `rb_str_resurrect` operation is
faster, so pre-allocation is done only when concatenated strings
are large.  `MIN_PRE_ALLOC_SIZE` was decided by experimenting with
local machine (x86_64-apple-darwin 16.5.0, Apple LLVM version
8.1.0 (clang - 802.0.42)).

String interpolation will be faster around 72% when large string is created.

* Before
  ```
  Calculating -------------------------------------
  Large string interpolation
                            1.276M (± 5.9%) i/s -      6.358M in   5.002022s
  Small string interpolation
                            5.156M (± 5.5%) i/s -     25.728M in   5.005731s
  ```

* After
  ```
  Calculating -------------------------------------
  Large string interpolation
                            2.201M (± 5.8%) i/s -     11.063M in   5.043724s
  Small string interpolation
                            5.192M (± 5.7%) i/s -     25.971M in   5.020516s
  ```

* Test code
  ```ruby
  require 'benchmark/ips'

  Benchmark.ips do |x|
    x.report "Large string interpolation" do |t|
      a = "Hellooooooooooooooooooooooooooooooooooooooooooooooooooo"
      b = "Wooooooooooooooooooooooooooooooooooooooooooooooooooorld"

      t.times do
        "#{a}, #{b}!"
      end
    end

    x.report "Small string interpolation" do |t|
      a = "Hello"
      b = "World"

      t.times do
        "#{a}, #{b}!"
      end
    end
  end
  ```

[Fix rubyGH-1626]
From: Nao Minami <south37777@gmail.com>

git-svn-id: svn+ssh://svn.ruby-lang.org/ruby/trunk@60320 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
4 participants
You can’t perform that action at this time.