Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speedup ActionView::OutputBuffer #45614

Merged
merged 1 commit into from Jul 19, 2022
Merged

Conversation

casperisfine
Copy link
Contributor

MRI has a lot of optimizations for string concatenation that are only available when concatenating into a String instance.

Using a String subclass disable these optimizations.

The difference is even more important in Ruby 3.2 where both YJIT and the VM noticeably improve String#<< performance.

So ideally we want the buffer not to be a String subclass. Luckily, the Action View buffer is for internal use only, so we can replace inheritance by composition without much work.

Benchmark:

ActionView::OutputBuffer:   147644.2 i/s
    optimized buffer:   228001.4 i/s - 1.54x  (± 0.00) faster

Source: https://gist.github.com/casperisfine/7199579a138e268fda71d6a91366af49

NB: That 50% faster figure is to be contextualized, it can radically change
from one template to the other, but is always faster.

cc @jhawthorn @matthewd @noahgibbs @tenderlove

@rails-bot rails-bot bot added the actionview label Jul 18, 2022
@casperisfine casperisfine force-pushed the faster-output-buffer branch 5 times, most recently from 879c5d6 to 6ca8332 Compare July 18, 2022 12:37
fragment = output_buffer.slice!(pos..-1)
fragment = output_buffer.to_s.slice!(pos..-1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this change is quite right. Previously this was actually mutating the output buffer (it's a really strange thing to do). @seejohnrun and I ran into this as well when experimenting with modifying outputbuffer, and tried replacing what this method does capture https://github.com/jhawthorn/rails/pull/1/files#diff-7e8f6bc4873643df271aa007b513b993a8e424964b76e4eec2c1d156d9108ac3R252 (but I don't remember if it works)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll look more into it tomorrow, but I was under the impression that mutating the buffer was simply an unneeded side effect. I'm pretty sure that buffer instance is no longer used after that slice!.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok so I was wrong. This slice! is to handle formats such as .builder, as it doesn't access @output_buffer, it is initialized once with it.

So in cache do we can't just substitute the buffer, we have to mutate it.

That's really unfortunate.

I'm tempted to add some stack to the buffer, the problem is that it's not clear to be wether @output_buffer is guaranteed to be an OutputBuffer (or StreamBuffer) as there are some tests using ActiveSupport::SafeBuffer. I'd have to dig into that as well to see wether it's legit or not.

@jhawthorn
Copy link
Member

jhawthorn commented Jul 18, 2022

I like this change. I think the speedup is more avoiding the indirection than the concatenation itself (I don't think anything called by rb_str_concat checks the class, just the builtin type), but equally worthwhile! 🚀

I don't know if this happens in practice but here's another behaviour change we should guard against:

b1 = ActionView::OutputBuffer.new
b1 << "foo"
b2 = b1.dup
b1 << "bar"

p b1.to_s # => "foobar"
p b2.to_s # => "foobar"

Probably worth adding an initialize_copy definition to ActionView::OutputBuffer

@noahgibbs
Copy link
Contributor

John: one of many reasons we're hoping to avoid concatenating onto a non-basic string is that YJIT is significantly faster when the receiver is a string. That's not the only reason we're doing it, but it's a side benefit :-)

@jhawthorn
Copy link
Member

jhawthorn commented Jul 18, 2022

John: one of many reasons we're hoping to avoid concatenating onto a non-basic string is that YJIT is significantly faster when the receiver is a string. That's not the only reason we're doing it, but it's a side benefit :-)

As I said, I'm in favour of this change. But I don't think that's true. String concatenation onto a subclass can be more or less the same speed.

EDIT:

# frozen_string_literal: true
require 'benchmark/ips'

class SafeBuffer < String
end

class OutputBuffer < SafeBuffer
  def <<(value)
    return self if value.nil?
    super(value.to_s)
  end
end

str = +""
safe_buffer = SafeBuffer.new
output_buffer = OutputBuffer.new
safe_buffer_encoded = SafeBuffer.new("")
output_buffer_encoded = OutputBuffer.new("")

puts "Encodings:"
pp({str:, safe_buffer:, output_buffer:, safe_buffer_encoded:, output_buffer_encoded:}.transform_values(&:encoding))

Benchmark.ips do |x|
  x.report("String") { str << "" }
  x.report("SafeBuffer") { safe_buffer << "" }
  x.report("OutputBuffer") { output_buffer << "" }
  x.report("SafeBuffer (same encoding)") { safe_buffer_encoded << "" }
  x.report("OutputBuffer (same encoding)") { output_buffer_encoded << "" }
  x.compare!(order: :baseline)
end
$ ./ruby --yjit benchmark_string_concat.rb
Encodings:
{:str=>#<Encoding:UTF-8>,
 :safe_buffer=>#<Encoding:ASCII-8BIT>,
 :output_buffer=>#<Encoding:ASCII-8BIT>,
 :safe_buffer_encoded=>#<Encoding:UTF-8>,
 :output_buffer_encoded=>#<Encoding:UTF-8>}
Warming up --------------------------------------
              String   573.166k i/100ms
          SafeBuffer   366.353k i/100ms
        OutputBuffer   360.232k i/100ms
SafeBuffer (same encoding)
                       597.930k i/100ms
OutputBuffer (same encoding)
                       582.716k i/100ms
Calculating -------------------------------------
              String      5.989M (± 1.1%) i/s -     30.378M in   5.073130s
          SafeBuffer      3.787M (± 1.2%) i/s -     19.050M in   5.031701s
        OutputBuffer      3.712M (± 0.9%) i/s -     18.732M in   5.046107s
SafeBuffer (same encoding)
                          5.985M (± 1.5%) i/s -     30.494M in   5.096533s
OutputBuffer (same encoding)
                          5.915M (± 1.2%) i/s -     29.719M in   5.024993s

Comparison:
              String:  5988771.9 i/s
SafeBuffer (same encoding):  5984749.7 i/s - same-ish: difference falls within error
OutputBuffer (same encoding):  5915046.7 i/s - same-ish: difference falls within error
          SafeBuffer:  3786659.5 i/s - 1.58x  (± 0.00) slower
        OutputBuffer:  3712492.5 i/s - 1.61x  (± 0.00) slower

@noahgibbs
Copy link
Contributor

Okay. I don't think I have permissions on that repo. But at a minimum we would need to build more infrastructure for YJIT to make that true. So with current YJIT as it exists, I'm pretty certain that's true.

@jhawthorn
Copy link
Member

jhawthorn commented Jul 18, 2022

benchmark attached, but again, I am in favour of this change.

I also don't think YJIT needs extra infrastructure for that to be true, because Ruby does not special case rb_cString for concatenation (just T_STRING). Even if it did (it doesn't) we are going to want it because ActiveSupport::SafeBuffer is public and a string subclass.

Let's continue the YJIT part of this discussion elsewhere because I think this is a good change anyways.

@casperisfine
Copy link
Contributor Author

@jhawthorn this is the optimization I'm talking about: https://github.com/ruby/ruby/blob/85ea46730deff70172a9f50172f0011a7401f371/vm_insnhelper.c#L5374-L5380

static VALUE
vm_opt_ltlt(VALUE recv, VALUE obj)
{
    if (SPECIAL_CONST_P(recv)) {
	return Qundef;
    }
    else if (RBASIC_CLASS(recv) == rb_cString &&
	     BASIC_OP_UNREDEFINED_P(BOP_LTLT, STRING_REDEFINED_OP_FLAG)) {
	if (LIKELY(RB_TYPE_P(obj, T_STRING))) {
	    return rb_str_buf_append(recv, obj);
	} else {
	    return rb_str_concat(recv, obj);
	}
    }
    else if (RBASIC_CLASS(recv) == rb_cArray &&
	     BASIC_OP_UNREDEFINED_P(BOP_LTLT, ARRAY_REDEFINED_OP_FLAG)) {
	return rb_ary_push(recv, obj);
    }
    else {
	return Qundef;
    }
}

String subclasses are explictly excluded, and it's particularly true for SafeBuffer as << is redefined.

@casperisfine
Copy link
Contributor Author

Probably worth adding an initialize_copy

Good catch.

MRI has a lot of optimizations for string concatenation that
are only available when concatenating into a `String` instance.

Using a `String` subclass disable these optimizations.

The difference is even more important with YJIT.

So ideally we want the buffer not to be a String subclass.
Luckily, the Action View buffer is for internal use only, so
we can replace inheritance by composition without much work.

Benchmark:

```
ActionView::OutputBuffer:   147644.2 i/s
    optimized buffer:   228001.4 i/s - 1.54x  (± 0.00) faster
```

Source: https://gist.github.com/casperisfine/7199579a138e268fda71d6a91366af49

NB: That 50% faster figure is to be contextualized, it can radically change
from one template to the other, but is always faster.
@byroot byroot merged commit 06e9fbd into rails:main Jul 19, 2022
joelhawksley added a commit to ViewComponent/view_component that referenced this pull request Jul 21, 2022
rails/rails#45614 broke
our Global Output Buffer
joelhawksley added a commit to ViewComponent/view_component that referenced this pull request Jul 21, 2022
* Pin CI to non-broken rails main

rails/rails#45614 broke
our Global Output Buffer

* I don't know why standard failed here but 🤷

* add changelog
casperisfine pushed a commit to casperisfine/temple that referenced this pull request Aug 2, 2022
In rails/rails#45614 I modified
`ActionView::OutputBuffer` to no longer be a subclass of
`ActiveSupport::SafeBuffer` and as such it's not recommended
to use `SafeBuffer` for buffering views anymore.

Additionally in rails/rails#45731
I start doing some changes so that we stop mutating `@output_buffer`.
So template should no longer assign `@output_buffer` themselves.
casperisfine pushed a commit to casperisfine/temple that referenced this pull request Aug 2, 2022
In rails/rails#45614 I modified
`ActionView::OutputBuffer` to no longer be a subclass of
`ActiveSupport::SafeBuffer` and as such it's not recommended
to use `SafeBuffer` for buffering views anymore.

Additionally in rails/rails#45731
I start doing some changes so that we stop mutating `@output_buffer`.
So template should no longer assign `@output_buffer` themselves.
@jhawthorn
Copy link
Member

jhawthorn commented Aug 3, 2022

We ran into a pretty significant performance regression with this change.

In our app, we usually are able to call template.render with the buffer argument in order to re-use the same buffer of the parent template. This is desirable because it allows deeply nesting templates without paying any cost for re-copying large buffers between strings.

However, with this change OutputBuffer#to_s is no longer a no-op, and allocates a new string. Also, we configure Erubi to call to_s on the output buffer at the end of each template. When using a new output buffer for each template, this works great as we're able to share the internal String memory between the output_buffer internal buffer and the newly output SafeBuffer. However, if that buffer is continued to be used it will require fully copying the memory to a new string, causing significant GC pressure and spending a lot of time in memcpy.

Here's a benchmark demonstrating the issue:

# frozen_string_literal: true

require 'rails'
require 'action_view'
require 'benchmark/ips'

class BenchView < ActionView::Base
  def compiled_method_container
    self.class
  end

  def self.compiled_method_container
    self
  end
end

view = BenchView.new(
  ActionView::LookupContext,
  {},
  nil,
)
locals = { }
erb_template = <<~ERB
    Hello! #{"a"*1000}
ERB
template = ActionView::Template.new(
  erb_template,
  "hello template",
  ActionView::Template::Handlers::ERB.new,
  virtual_path: "hello",
  locals: locals.keys,
)

def render_many(view, locals, template)
  buffer = ActionView::OutputBuffer.new
  1000.times do
    template.render(view, locals, buffer)
  end
end

Benchmark.ips do |x|
  x.report("render_many") { render_many(view, locals, template) }
end

Before this change

$ be ruby benchmark_output_buffer.rb
Warming up --------------------------------------
         render_many    90.000  i/100ms
Calculating -------------------------------------
         render_many    907.139  (± 2.4%) i/s -      4.590k in   5.063373s

After this change

$ be ruby benchmark_output_buffer.rb
Warming up --------------------------------------
         render_many     1.000  i/100ms
Calculating -------------------------------------
         render_many     14.304  (± 7.0%) i/s -     72.000  in   5.068334s

We can't run this change in production as-is. The good news is that I think we might be able to fail-forward by avoiding calling to_s at the end of the ERB template. (I'll try to open a PR shortly)

@casperisfine
Copy link
Contributor Author

Thanks @jhawthorn I'll comment on #45756

judofyr pushed a commit to judofyr/temple that referenced this pull request Aug 7, 2022
In rails/rails#45614 I modified
`ActionView::OutputBuffer` to no longer be a subclass of
`ActiveSupport::SafeBuffer` and as such it's not recommended
to use `SafeBuffer` for buffering views anymore.

Additionally in rails/rails#45731
I start doing some changes so that we stop mutating `@output_buffer`.
So template should no longer assign `@output_buffer` themselves.
varyonic added a commit to varyonic/arbo that referenced this pull request Oct 6, 2023
kaspth added a commit to bullet-train-co/jbuilder-schema that referenced this pull request Oct 9, 2023
Rails 7.1 has sped up its internal OutputBuffer implementation (what handles << concatenation in templates),
which we rely on tricking a bit. rails/rails#45614

Then a follow up PR the `to_s` call was moved to after calling `view._run` rails/rails#45756

However, we rely on the return value being a Hash (which I'm not sure how to fix yet).

So since `target!` is the return value of our templates, we can wrap that and ensure `to_s` doesn't actually `to_s`, which Rails 7.1
will call for us.

On Rails < 7.1 we must call the compatibility `unwrap_target!` method ourselves to get our inner Hash return value.

This also adds a Rails version matrix so we can detect these things before release in the future.
kaspth added a commit to bullet-train-co/jbuilder-schema that referenced this pull request Oct 9, 2023
* Fix Rails 7.1 compatibility.

Rails 7.1 has sped up its internal OutputBuffer implementation (what handles << concatenation in templates),
which we rely on tricking a bit. rails/rails#45614

Then a follow up PR the `to_s` call was moved to after calling `view._run` rails/rails#45756

However, we rely on the return value being a Hash (which I'm not sure how to fix yet).

So since `target!` is the return value of our templates, we can wrap that and ensure `to_s` doesn't actually `to_s`, which Rails 7.1
will call for us.

On Rails < 7.1 we must call the compatibility `unwrap_target!` method ourselves to get our inner Hash return value.

This also adds a Rails version matrix so we can detect these things before release in the future.

* Allow us to run tests in multiple Rails versions

* Yep, I'm calling it, don't think it's worth it anymore

* Fix enum invocation for Rails 6.1 tests
varyonic added a commit to varyonic/arbo that referenced this pull request Oct 12, 2023
varyonic added a commit to varyonic/activeadmin-rails that referenced this pull request Oct 31, 2023
varyonic added a commit to varyonic/activeadmin-rails that referenced this pull request Nov 8, 2023
* rails-7.1:
  Rails 7.1 no longer populates redirect body (rails/rails#44554).
  Calling silence on ActiveSupport::Deprecation is deprecated and will be removed from Rails (use Rails.application.deprecators.silence instead)
  Deprecator setting has been deprecated.
  run db:drop, db:create and db:migrate in a separate commands (probably due to rails/rails#49349)
  Override and revert rails/rails#46699 for now, move test database from /storage back to /db
  Rails 7.1 replaces config.cache_classes with config.enable_reloading in template environment/test.rb
  Add Rails 7.1 test gem file.
  to_default_s is deprecated and will be removed from Rails 7.2 (use to_s instead)
  ActionView::OutputBuffer refactored by rails/rails#45614 (Rails 7.1)
  See rails/rails#36020
claudiob pushed a commit to claudiob/view_component that referenced this pull request Dec 22, 2023
* Pin CI to non-broken rails main

rails/rails#45614 broke
our Global Output Buffer

* I don't know why standard failed here but 🤷

* add changelog
claudiob pushed a commit to claudiob/view_component that referenced this pull request Jan 3, 2024
* Pin CI to non-broken rails main

rails/rails#45614 broke
our Global Output Buffer

* I don't know why standard failed here but 🤷

* add changelog
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants