Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize the no-escape case with strpbrk #29

Merged
merged 1 commit into from
Nov 5, 2022
Merged

Optimize the no-escape case with strpbrk #29

merged 1 commit into from
Nov 5, 2022

Conversation

k0kubun
Copy link
Member

@k0kubun k0kubun commented Nov 5, 2022

This PR makes ERB::Util.html_escape even more optimized towards the most common, no-escaped case.

Typically, strpbrk(3) is optimized pretty well with SIMD instructions. Just using it makes this as fast as a SIMD-based implementation for the no-escape case.

Not utilizing this for escaped cases because memory allocation would be a more significant bottleneck for many strings anyway. Also, there'll be some overhead in calling a C function (strpbrk) many times because we're not using SIMD instructions directly. So using strpbrk all the time might not necessarily be faster.

Benchmark

No escape ("abcde" * 300)

Based on a benchmark in ruby/ruby, this is how it defeats an SIMD-based implementation:

Benchmark script

Based on benchmark/cgi_escape_html.yml in ruby/ruby, but modified to compare different implementations:

require 'benchmark/ips'
require 'cgi'
require 'erb'
require 'hescape'

class << ERB::Util
  def html_escape_old(s)
    CGI.escapeHTML(s.to_s)
  end
end

Benchmark.ips do |x|
  long_no_escape = "abcde" * 300
  x.report('CGI.escapeHTML')        { CGI.escapeHTML(long_no_escape) }
  x.report('ERB::Util.html_escape') { ERB::Util.html_escape(long_no_escape) }
  x.report('Hescape.escape_html')   { Hescape.escape_html(long_no_escape) }
  x.compare!
end

Using hescape.gem 0.1.1.

Before

ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [x86_64-linux]
Warming up --------------------------------------
       CGI.escapeHTML   174.719k i/100ms
ERB::Util.html_escape   181.137k i/100ms
  Hescape.escape_html   685.019k i/100ms
Calculating -------------------------------------
       CGI.escapeHTML      1.751M (± 0.3%) i/s -      8.911M in   5.090031s
ERB::Util.html_escape      1.807M (± 0.4%) i/s -      9.057M in   5.012665s
  Hescape.escape_html      6.827M (± 0.3%) i/s -     34.251M in   5.016834s

Comparison:
  Hescape.escape_html:  6827269.6 i/s
ERB::Util.html_escape:  1806820.6 i/s - 3.78x  (± 0.00) slower
       CGI.escapeHTML:  1750631.0 i/s - 3.90x  (± 0.00) slower

After

ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [x86_64-linux]
Warming up --------------------------------------
       CGI.escapeHTML   173.410k i/100ms
ERB::Util.html_escape   863.558k i/100ms
  Hescape.escape_html   678.420k i/100ms
Calculating -------------------------------------
       CGI.escapeHTML      1.706M (± 0.1%) i/s -      8.670M in   5.082314s
ERB::Util.html_escape      8.656M (± 0.7%) i/s -     44.041M in   5.088013s
  Hescape.escape_html      6.780M (± 0.2%) i/s -     33.921M in   5.003409s

Comparison:
ERB::Util.html_escape:  8656326.2 i/s
  Hescape.escape_html:  6779603.6 i/s - 1.28x  (± 0.00) slower
       CGI.escapeHTML:  1706017.9 i/s - 5.07x  (± 0.00) slower
Other benchmarks

The following benchmarks are "After"-only. Only the string part of the "Benchmark script" is modified and reused. Please look at https://github.com/ruby/ruby/blob/f276d5a7fe28ae4fd2934af4befd920bb78cfa9e/benchmark/cgi_escape_html.yml to see the actual strings I used.

No escape ("")

ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [x86_64-linux]
Warming up --------------------------------------
      CGI.escapeHTML     1.514M i/100ms
ERB::Util.html_escape
                         2.053M i/100ms
 Hescape.escape_html     2.250M i/100ms
Calculating -------------------------------------
      CGI.escapeHTML     14.796M (± 0.5%) i/s -     74.200M in   5.014887s
ERB::Util.html_escape
                         20.478M (± 0.4%) i/s -    102.643M in   5.012474s
 Hescape.escape_html     22.689M (± 0.5%) i/s -    114.758M in   5.057952s

Comparison:
 Hescape.escape_html: 22689167.4 i/s
ERB::Util.html_escape: 20477837.6 i/s - 1.11x  (± 0.00) slower
      CGI.escapeHTML: 14796391.4 i/s - 1.53x  (± 0.00) slower

No escape ("abcde")

ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [x86_64-linux]
Warming up --------------------------------------
      CGI.escapeHTML     1.468M i/100ms
ERB::Util.html_escape
                         1.999M i/100ms
 Hescape.escape_html     2.148M i/100ms
Calculating -------------------------------------
      CGI.escapeHTML     14.655M (± 0.2%) i/s -     73.403M in   5.008763s
ERB::Util.html_escape
                         19.928M (± 0.4%) i/s -     99.968M in   5.016640s
 Hescape.escape_html     21.500M (± 0.5%) i/s -    109.547M in   5.095396s

Comparison:
 Hescape.escape_html: 21499787.0 i/s
ERB::Util.html_escape: 19927543.5 i/s - 1.08x  (± 0.00) slower
      CGI.escapeHTML: 14654908.2 i/s - 1.47x  (± 0.00) slower

Escaped ("abcd<")

ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [x86_64-linux]
Warming up --------------------------------------
      CGI.escapeHTML     1.265M i/100ms
ERB::Util.html_escape
                         1.158M i/100ms
 Hescape.escape_html     1.086M i/100ms
Calculating -------------------------------------
      CGI.escapeHTML     12.754M (± 0.5%) i/s -     64.495M in   5.056999s
ERB::Util.html_escape
                         11.978M (± 0.1%) i/s -     60.198M in   5.025796s
 Hescape.escape_html     11.059M (± 0.1%) i/s -     55.396M in   5.009280s

Comparison:
      CGI.escapeHTML: 12753901.8 i/s
ERB::Util.html_escape: 11977845.3 i/s - 1.06x  (± 0.00) slower
 Hescape.escape_html: 11058672.4 i/s - 1.15x  (± 0.00) slower

Escaped ("'&"<>")

ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [x86_64-linux]
Warming up --------------------------------------
      CGI.escapeHTML   858.765k i/100ms
ERB::Util.html_escape
                       800.670k i/100ms
 Hescape.escape_html   618.977k i/100ms
Calculating -------------------------------------
      CGI.escapeHTML      8.517M (± 0.6%) i/s -     42.938M in   5.041755s
ERB::Util.html_escape
                          8.212M (± 0.3%) i/s -     41.635M in   5.070247s
 Hescape.escape_html      6.357M (± 0.2%) i/s -     32.187M in   5.063507s

Comparison:
      CGI.escapeHTML:  8516816.9 i/s
ERB::Util.html_escape:  8211671.0 i/s - 1.04x  (± 0.00) slower
 Hescape.escape_html:  6356652.6 i/s - 1.34x  (± 0.00) slower

Escaped ("'&"<>" * 10)

Looking at a local variable str = "'&\"<>" * 10

ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [x86_64-linux]
Warming up --------------------------------------
      CGI.escapeHTML   647.790k i/100ms
ERB::Util.html_escape
                       613.185k i/100ms
 Hescape.escape_html   177.588k i/100ms
Calculating -------------------------------------
      CGI.escapeHTML      6.477M (± 0.2%) i/s -     32.390M in   5.000548s
ERB::Util.html_escape
                          6.171M (± 0.1%) i/s -     31.272M in   5.067473s
 Hescape.escape_html      1.803M (± 0.1%) i/s -      9.057M in   5.022656s

Comparison:
      CGI.escapeHTML:  6477204.1 i/s
ERB::Util.html_escape:  6171219.3 i/s - 1.05x  (± 0.00) slower
 Hescape.escape_html:  1803230.3 i/s - 3.59x  (± 0.00) slower

Escaped (example.com)

ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [x86_64-linux]
Warming up --------------------------------------
      CGI.escapeHTML   312.429k i/100ms
ERB::Util.html_escape
                       300.756k i/100ms
 Hescape.escape_html   282.806k i/100ms
Calculating -------------------------------------
      CGI.escapeHTML      3.024M (± 1.5%) i/s -     15.309M in   5.064231s
ERB::Util.html_escape
                          3.063M (± 0.1%) i/s -     15.339M in   5.007486s
 Hescape.escape_html      2.834M (± 0.6%) i/s -     14.423M in   5.089948s

Comparison:
ERB::Util.html_escape:  3063129.8 i/s
      CGI.escapeHTML:  3023646.5 i/s - same-ish: difference falls within error
 Hescape.escape_html:  2833738.0 i/s - 1.08x  (± 0.00) slower

Some results in "Other benchmarks" are improved in: #30

Typically, strpbrk(3) is optimized pretty well with SIMD instructions.
Just using it makes this as fast as a SIMD-based implementation for the
no-escape case.

Not utilizing this for escaped cases because memory allocation would be
a more significant bottleneck for many strings anyway. Also, there'll be
some overhead in calling a C function (strpbrk) many times because we're
not using SIMD instructions directly. So using strpbrk all the time
might not necessarily be faster.
@k0kubun k0kubun marked this pull request as ready for review November 5, 2022 06:57
@k0kubun k0kubun merged commit 0a70bbf into master Nov 5, 2022
@k0kubun k0kubun deleted the strpbrk branch November 5, 2022 06:58
matzbot pushed a commit to ruby/ruby that referenced this pull request Nov 5, 2022
(ruby/erb#29)

Typically, strpbrk(3) is optimized pretty well with SIMD instructions.
Just using it makes this as fast as a SIMD-based implementation for the
no-escape case.

Not utilizing this for escaped cases because memory allocation would be
a more significant bottleneck for many strings anyway. Also, there'll be
some overhead in calling a C function (strpbrk) many times because we're
not using SIMD instructions directly. So using strpbrk all the time
might not necessarily be faster.
@@ -38,6 +38,12 @@ escaped_length(VALUE str)
static VALUE
optimized_escape_html(VALUE str)
{
// Optimize the most common, no-escape case with strpbrk(3). Not using it after
// this because calling a C function many times could be slower for some cases.
if (strpbrk(RSTRING_PTR(str), "'&\"<>") == NULL) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get not calling strpbrk after that, but if it returned a pointer on the first call we could use that to directly skip to the first char to escape, no?

Maybe not worth the extra complexity?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried writing it for a bit and then stopped doing it for the extra complexity. If someone brings a simple enough patch and it's meaningfully better in benchmarks, I'd merge it though. Also note that there's f58476b too now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. Are your benchmarks committed anywhere? I may try that if I get bored in the plane.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use ruby/ruby@f276d5a for it.

@byroot
Copy link
Member

byroot commented Nov 5, 2022

Out of curiosity, have you ran that same benchmark on M1?

@k0kubun
Copy link
Member Author

k0kubun commented Nov 5, 2022

Ugh, actually it was way worse than I imagined. Perhaps it should have an #ifdef or be just reverted...

Before

ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [arm64-darwin22]
Warming up --------------------------------------
      CGI.escapeHTML   136.743k i/100ms
ERB::Util.html_escape
                       140.024k i/100ms
 Hescape.escape_html   182.243k i/100ms
Calculating -------------------------------------
      CGI.escapeHTML      1.365M (± 0.2%) i/s -      6.837M in   5.007130s
ERB::Util.html_escape
                          1.399M (± 0.3%) i/s -      7.001M in   5.002835s
 Hescape.escape_html      1.826M (± 0.2%) i/s -      9.294M in   5.091433s

Comparison:
 Hescape.escape_html:  1825502.1 i/s
ERB::Util.html_escape:  1399459.4 i/s - 1.30x  (± 0.00) slower
      CGI.escapeHTML:  1365489.6 i/s - 1.34x  (± 0.00) slower

After

ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [arm64-darwin22]
Warming up --------------------------------------
      CGI.escapeHTML   136.103k i/100ms
ERB::Util.html_escape
                        26.302k i/100ms
 Hescape.escape_html   182.422k i/100ms
Calculating -------------------------------------
      CGI.escapeHTML      1.367M (± 0.3%) i/s -      6.941M in   5.077736s
ERB::Util.html_escape
                        263.477k (± 0.2%) i/s -      1.341M in   5.091179s
 Hescape.escape_html      1.822M (± 0.7%) i/s -      9.121M in   5.005324s

Comparison:
 Hescape.escape_html:  1822372.9 i/s
      CGI.escapeHTML:  1367008.2 i/s - 1.33x  (± 0.00) slower
ERB::Util.html_escape:   263476.8 i/s - 6.92x  (± 0.00) slower

@byroot
Copy link
Member

byroot commented Nov 5, 2022

Ugh, actually it was way worse than I imagined.

Yeah that's the problem with SIMD and such, it's always a bit of a gamble when you are multi-platform.

Perhaps it should have an #ifdef or be just reverted...

M1 perf isn't as critical given that it's very unlikely to be a production plafform. But with other ARM targets such as graviton etc, that will become important (if not already).

I wonder if it's macOS or clang strpbrk that isn't optimized. Would be interesting to try in a aarch64 docker container to see if the glibc optimize it or not.

k0kubun added a commit that referenced this pull request Nov 5, 2022
because it's much slower on M1 #29.
It'd be too complicated to switch the implementation based on known
optimized platforms / versions.

Besides, short strings are the most common usages of this method and
SIMD doesn't really help that case. All in all, I can't justify the
existence of this code.
matzbot pushed a commit to ruby/ruby that referenced this pull request Nov 5, 2022
because it's much slower on M1 ruby/erb#29.
It'd be too complicated to switch the implementation based on known
optimized platforms / versions.

Besides, short strings are the most common usages of this method and
SIMD doesn't really help that case. All in all, I can't justify the
existence of this code.

ruby/erb@30691c8995
@k0kubun
Copy link
Member Author

k0kubun commented Nov 5, 2022

Here's my take for now 30691c8. Given that long_no_escape is not necessarily a common case and all other cases (including the most common pattern) are slower in ruby/cgi#28, I think this is very hard to justify.

We need a benchmark that is convincing enough to ignore all the other slowdown, a better patch that prevents a slowdown in all cases/environments, or SIMD-based code that is simple enough.

@byroot
Copy link
Member

byroot commented Nov 5, 2022

or SIMD-based code that is simple enough.

I know some people manage to get SIMD "for free" by writing their code in a way that compiler can easily vectorize. I've never done it myself though, so not too sure how hard it is.

@eregon
Copy link
Member

eregon commented Nov 26, 2022

I know some people manage to get SIMD "for free" by writing their code in a way that compiler can easily vectorize. I've never done it myself though, so not too sure how hard it is.

Actually using a Ruby Regexp on TruffleRuby gives you that :) (https://twitter.com/eregontp/status/1596493699160367104)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants