Optimize CGI.escapeHTML by reducing buffer extension and branches #2226

k0kubun · 2019-06-04T15:11:23Z

Benchmark

Intel 4.0GHz i7-4790K with 16GB memory under x86-64 Ubuntu 8 Cores, gcc 7.3.0

escape_utils.gem's benchmark

Here's the benchmark result of escape_utils/benchmark/html_escape.rb using this CGI.escapeHTML. Originally CGI.escapeHTML ran at about 0.72x the speed of EscapeUtils.escape_html; now it runs at 2.48x.

$ bundle exec ruby -v benchmark/html_escape.rb
ruby 2.7.0dev (2019-06-05 master 0a29dc87e6) [x86_64-linux]
last_commit=Optimize CGI.escapeHTML by reducing buffer extension
Escaping 154483 bytes of html from https://en.wikipedia.org/wiki/Succession_to_the_British_throne
Warming up --------------------------------------
Rack::Utils.escape_html
                        16.000  i/100ms
Haml::Helpers.html_escape
                       616.000  i/100ms
ERB::Util.html_escape
                       626.000  i/100ms
      CGI.escapeHTML   631.000  i/100ms
         String#gsub    24.000  i/100ms
fast_xs_extra#fast_xs_html
                       332.000  i/100ms
EscapeUtils.escape_html
                       255.000  i/100ms
Calculating -------------------------------------
Rack::Utils.escape_html
                        166.291  (± 2.4%) i/s -    832.000  in   5.006558s
Haml::Helpers.html_escape
                          6.376k (± 2.7%) i/s -     32.032k in   5.028389s
ERB::Util.html_escape
                          6.366k (± 3.6%) i/s -     31.926k in   5.022500s
      CGI.escapeHTML      6.386k (± 3.1%) i/s -     32.181k in   5.045185s
         String#gsub    240.854  (± 1.2%) i/s -      1.224k in   5.082920s
fast_xs_extra#fast_xs_html
                          3.345k (± 1.8%) i/s -     16.932k in   5.064190s
EscapeUtils.escape_html
                          2.572k (± 3.0%) i/s -     13.005k in   5.060726s

Comparison:
      CGI.escapeHTML:        6385.6 i/s
Haml::Helpers.html_escape:   6375.6 i/s - same-ish: difference falls within error
ERB::Util.html_escape:       6366.2 i/s - same-ish: difference falls within error
fast_xs_extra#fast_xs_html:  3344.6 i/s - 1.91x  slower
EscapeUtils.escape_html:     2572.2 i/s - 2.48x  slower
         String#gsub:        240.9 i/s - 26.51x  slower
Rack::Utils.escape_html:     166.3 i/s - 38.40x  slower

Note: Haml::Helpers.html_escape uses ERB::Util.html_escape which uses CGI.escapeHTML, so those 3 are the same.

Other scenarios

When at least one character is escaped (_one, _all, _real), it becomes 1.91x to 5.35x faster.
When there's nothing to escape (_blank, _none), it unfortunately becomes 1.08x to 1.12x slower.

$ benchmark-driver benchmark/cgi_escape_html.yml -v --rbenv 'before;after' --repeat-count=8
before: ruby 2.7.0dev (2019-06-05 master f3c877e8de) [x86_64-linux]
after: ruby 2.7.0dev (2019-06-05 master 0a29dc87e6) [x86_64-linux]
last_commit=Optimize CGI.escapeHTML by reducing buffer extension
Calculating -------------------------------------
                           before       after
     escape_html_blank    26.946M     25.011M i/s -     20.000M times in 0.742213s 0.799655s
escape_html_short_none    25.898M     23.174M i/s -     20.000M times in 0.772249s 0.863042s
 escape_html_short_one     8.046M     17.748M i/s -     20.000M times in 2.485578s 1.126892s
 escape_html_short_all     5.539M     10.561M i/s -      5.000M times in 0.902758s 0.473462s
 escape_html_long_none     1.373M      1.231M i/s -      1.000M times in 0.728400s 0.812565s
  escape_html_long_all     1.093M      5.849M i/s -      1.000M times in 0.914765s 0.170957s
      escape_html_real     1.121M      2.565M i/s -      1.000M times in 0.891941s 0.389921s

Comparison:
                  escape_html_blank
                before:  26946437.8 i/s
                 after:  25010784.2 i/s - 1.08x  slower

             escape_html_short_none
                before:  25898374.9 i/s
                 after:  23173849.3 i/s - 1.12x  slower

              escape_html_short_one
                 after:  17747937.4 i/s
                before:   8046417.1 i/s - 2.21x  slower

              escape_html_short_all
                 after:  10560503.5 i/s
                before:   5538580.2 i/s - 1.91x  slower

              escape_html_long_none
                before:   1372873.0 i/s
                 after:   1230671.0 i/s - 1.12x  slower

               escape_html_long_all
                 after:   5849414.4 i/s
                before:   1093177.0 i/s - 5.35x  slower

                   escape_html_real
                 after:   2564622.4 i/s
                before:   1121150.9 i/s - 2.29x  slower
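
The speedup factors quoted above can be recomputed from the i/s figures in the comparison. The following is a minimal sketch with the numbers copied from the output above; the constant and method names are made up for illustration and are not part of the PR.

```ruby
# i/s figures copied from the comparison above, as [before, after].
BENCH_RESULTS = {
  "escape_html_blank"      => [26_946_437.8, 25_010_784.2],
  "escape_html_short_none" => [25_898_374.9, 23_173_849.3],
  "escape_html_short_one"  => [8_046_417.1, 17_747_937.4],
  "escape_html_short_all"  => [5_538_580.2, 10_560_503.5],
  "escape_html_long_none"  => [1_372_873.0, 1_230_671.0],
  "escape_html_long_all"   => [1_093_177.0, 5_849_414.4],
  "escape_html_real"       => [1_121_150.9, 2_564_622.4]
}.freeze

# A ratio above 1.0 means "after" is faster than "before".
def speedup(before, after)
  (after / before).round(2)
end

BENCH_RESULTS.each do |name, (before, after)|
  puts format("%-24s %.2fx", name, speedup(before, after))
end
```

Ratios below 1.0 correspond to the no-escape cases, where the comparison reports "after" as slightly slower.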

and switch-case branches.

ext/cgi/escape/escape.c

mattn · 2019-06-04T15:49:40Z

ext/cgi/escape/escape.c

-	break;
-    }
+#define HTML_ESCAPE(c, str) do { \
+    html_escape_table[c] = str; \


Possibly define a variable int len = strlen(str); here and use it in the lines that follow?

k0kubun · Jun 4, 2019

That might be more robust and easier to read. I did so in 9bb706a.

rafaelfranca · 2019-06-04T22:44:36Z

This is great! I was trying to understand why escape_utils was still faster than CGI.escapeHTML, and buffer extensions seem to be the cause. Would it be good to compare whether your new implementation is close to or faster than escape_utils?

k0kubun · 2019-06-05T00:46:00Z

Thanks for your comment.

escape_utils/benchmark/html_escape.rb

Here's the benchmark result of escape_utils/benchmark/html_escape.rb using this CGI.escapeHTML on my machine (Intel 4.0GHz i7-4790K, 16GB memory, x86-64 Ubuntu 8 Cores, GCC 7.4.0):

$ bundle exec ruby -v benchmark/html_escape.rb
ruby 2.7.0dev (2019-06-05 master 0a29dc87e6) [x86_64-linux]
last_commit=Optimize CGI.escapeHTML by reducing buffer extension
Escaping 154483 bytes of html from https://en.wikipedia.org/wiki/Succession_to_the_British_throne
Warming up --------------------------------------
Rack::Utils.escape_html
                        16.000  i/100ms
Haml::Helpers.html_escape
                       616.000  i/100ms
ERB::Util.html_escape
                       626.000  i/100ms
      CGI.escapeHTML   631.000  i/100ms
         String#gsub    24.000  i/100ms
fast_xs_extra#fast_xs_html
                       332.000  i/100ms
EscapeUtils.escape_html
                       255.000  i/100ms
Calculating -------------------------------------
Rack::Utils.escape_html
                        166.291  (± 2.4%) i/s -    832.000  in   5.006558s
Haml::Helpers.html_escape
                          6.376k (± 2.7%) i/s -     32.032k in   5.028389s
ERB::Util.html_escape
                          6.366k (± 3.6%) i/s -     31.926k in   5.022500s
      CGI.escapeHTML      6.386k (± 3.1%) i/s -     32.181k in   5.045185s
         String#gsub    240.854  (± 1.2%) i/s -      1.224k in   5.082920s
fast_xs_extra#fast_xs_html
                          3.345k (± 1.8%) i/s -     16.932k in   5.064190s
EscapeUtils.escape_html
                          2.572k (± 3.0%) i/s -     13.005k in   5.060726s

Comparison:
      CGI.escapeHTML:        6385.6 i/s
Haml::Helpers.html_escape:   6375.6 i/s - same-ish: difference falls within error
ERB::Util.html_escape:       6366.2 i/s - same-ish: difference falls within error
fast_xs_extra#fast_xs_html:  3344.6 i/s - 1.91x  slower
EscapeUtils.escape_html:     2572.2 i/s - 2.48x  slower
         String#gsub:        240.9 i/s - 26.51x  slower
Rack::Utils.escape_html:     166.3 i/s - 38.40x  slower

Note: Haml::Helpers.html_escape uses ERB::Util.html_escape which uses CGI.escapeHTML (because I maintain both template engines), so those 3 are the same.

benchmark/cgi_escape_html.yml

Same environment as above, but with benchmarks in this PR:

$ benchmark-driver benchmark/cgi_escape_html.yml -v --rbenv 'before;after;escape_utils::before -rescape_utils -rescape_utils/html/cgi'
before: ruby 2.7.0dev (2019-06-05 master f3c877e8de) [x86_64-linux]
after: ruby 2.7.0dev (2019-06-05 master 0a29dc87e6) [x86_64-linux]
last_commit=Optimize CGI.escapeHTML by reducing buffer extension
escape_utils: ruby 2.7.0dev (2019-06-05 master f3c877e8de) [x86_64-linux]
Calculating -------------------------------------
                           before       after  escape_utils
     escape_html_blank    26.893M     24.954M       13.290M i/s -     20.000M times in 0.743694s 0.801490s 1.504855s
escape_html_short_none    25.234M     22.703M       21.202M i/s -     20.000M times in 0.792582s 0.880929s 0.943310s
 escape_html_short_one     7.966M     17.724M        7.958M i/s -     20.000M times in 2.510630s 1.128417s 2.513101s
 escape_html_short_all     5.494M     10.506M        4.656M i/s -      5.000M times in 0.910155s 0.475900s 1.073980s
 escape_html_long_none     1.373M      1.228M        1.897M i/s -      1.000M times in 0.728313s 0.814626s 0.527014s
  escape_html_long_all     1.094M      5.632M        1.386M i/s -      1.000M times in 0.914260s 0.177545s 0.721668s
      escape_html_real     1.099M      2.219M        1.212M i/s -      1.000M times in 0.910291s 0.450618s 0.825238s

Comparison:
                  escape_html_blank
                before:  26892772.9 i/s
                 after:  24953535.0 i/s - 1.08x  slower
          escape_utils:  13290316.8 i/s - 2.02x  slower

             escape_html_short_none
                before:  25233996.7 i/s
                 after:  22703310.4 i/s - 1.11x  slower
          escape_utils:  21201938.3 i/s - 1.19x  slower

              escape_html_short_one
                 after:  17723937.9 i/s
                before:   7966129.6 i/s - 2.22x  slower
          escape_utils:   7958295.4 i/s - 2.23x  slower

              escape_html_short_all
                 after:  10506411.8 i/s
                before:   5493570.4 i/s - 1.91x  slower
          escape_utils:   4655579.9 i/s - 2.26x  slower

              escape_html_long_none
          escape_utils:   1897481.2 i/s
                before:   1373035.5 i/s - 1.38x  slower
                 after:   1227557.7 i/s - 1.55x  slower

               escape_html_long_all
                 after:   5632366.9 i/s
          escape_utils:   1385678.6 i/s - 4.06x  slower
                before:   1093780.9 i/s - 5.15x  slower

                   escape_html_real
                 after:   2219174.4 i/s
          escape_utils:   1211772.1 i/s - 1.83x  slower
                before:   1098550.0 i/s - 2.02x  slower

CGI.escapeHTML is faster than escape_utils in every case except escape_html_long_none. See below for the reasons.

Why

Let me clarify my understanding of the characteristics of each implementation:

Number of buffer extensions:
- CGI.escapeHTML (before): When N characters are escaped, the String is extended N times.
- CGI.escapeHTML (after): It allocates str.length * 6 bytes (&quot; is 6 chars) on the stack, which avoids heap management and fragmentation (it uses the heap if the size is too big), so a buffer extension never happens.
- EscapeUtils.escape_html: It always speculatively extends a buffer to 1.5x. It grows a temporary buffer on the heap exponentially, so the number of buffer extensions would look like O(log N) if str is long.
- fast_xs_extra#fast_xs_html: It calculates the result buffer size beforehand, so a buffer extension never happens. The downside is that it needs to scan the entire string twice, unlike the new CGI.escapeHTML.

String object creation when nothing is escaped:
- CGI.escapeHTML (before/after): It creates another String object with rb_str_dup(str). This cannot be optimized, for backward compatibility with the old CGI.escapeHTML that used gsub. We would need to add an option or another method to optimize this.
- EscapeUtils.escape_html: While it allocates a temporary buffer on the heap, the buffer is not used and it returns the argument. (In a strict sense, it's not compatible with CGI.escapeHTML. But this behavior would be more suitable for template engines for sure...)
  - It outperforms CGI.escapeHTML in escape_html_long_none especially because CGI.escapeHTML uses the heap when a string is long. But I believe our HTML escape argument is usually short.
- fast_xs_extra#fast_xs_html: It always creates a new buffer and appends characters to it. It would be slightly slower than CGI.escapeHTML, which just copies the String with rb_str_dup(str).

Encoding / taint of a new escaped string:
- CGI.escapeHTML (before/after): Preserved. It would have an extra overhead, but maybe a trivial one.
- EscapeUtils.escape_html / fast_xs_extra#fast_xs_html: Not preserved. Maybe usually fine.

Non-ASCII-compatible encoding support:
- CGI.escapeHTML (before/after): It falls back to the gsub implementation. I think others should follow it.
- EscapeUtils.escape_html: It asserts ASCII compatibility and raises an error if not compatible.
- fast_xs_extra#fast_xs_html: It does not check the encoding, but the implementation seems to assume ASCII compatibility. Not sure if it's fine.
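
The two ideas behind the "after" implementation (a lookup table instead of a switch, and a buffer reserved at the worst-case size) can be sketched in plain Ruby. This is only an illustration of the technique, not the C code from the PR: the names are hypothetical, it assumes an ASCII-compatible encoding, and String.new(capacity:) is just a hint in Ruby rather than a stack allocation.

```ruby
# Illustrative escape table indexed by byte value (nil means "copy the byte as-is").
SKETCH_ESCAPE_TABLE = Array.new(256)
{
  "'" => "&#39;",
  "&" => "&amp;",
  '"' => "&quot;",
  "<" => "&lt;",
  ">" => "&gt;"
}.each { |char, entity| SKETCH_ESCAPE_TABLE[char.ord] = entity }
SKETCH_ESCAPE_TABLE.freeze

# The longest replacement ("&quot;") is 6 bytes, so 6x the input is the worst case.
SKETCH_ESCAPE_MAX_LEN = 6

# Hypothetical helper; assumes str uses an ASCII-compatible encoding.
def sketch_escape_html(str)
  # Reserve the worst-case size up front so appends should not need to grow the buffer.
  buf = String.new(capacity: str.bytesize * SKETCH_ESCAPE_MAX_LEN,
                   encoding: Encoding::BINARY)
  str.each_byte do |byte|
    entity = SKETCH_ESCAPE_TABLE[byte]
    if entity
      buf << entity # one table lookup replaces the per-character switch
    else
      buf << byte   # on a binary String, an Integer is appended as a single byte
    end
  end
  # Same size means nothing was escaped: return a copy, as described above.
  return str.dup if buf.bytesize == str.bytesize

  buf.force_encoding(str.encoding)
end

puts sketch_escape_html(%q(<a href='x'>T&J</a>))
```

Working on bytes keeps multibyte characters intact here because every byte that needs escaping is ASCII, which is also why the sketch does not cover non-ASCII-compatible encodings.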

patched by nobu.

k0kubun · 2019-06-05T02:02:42Z

(It included my wrong assumption about the ALLOCA_N macro. It's temporarily reverted in 71b14af, but I'll fix it and commit it again later.)

and switch-case branches. Buffer allocation optimization using `ALLOCA_N` would be the main benefit of patch. It eliminates the O(N) buffer extensions. It also reduces the number of branches using escape table like https://mattn.kaoriya.net/software/lang/c/20160817011915.htm. Closes: #2226 Co-authored-by: Nobuyoshi Nakada <nobu@ruby-lang.org> Co-authored-by: Yasuhiro MATSUMOTO <mattn.jp@gmail.com>

k0kubun · 2019-06-05T12:31:31Z

I committed the fixed version in 0a29dc8 and updated all above benchmark results again.

See #2226 for benchmark results.

rafaelfranca · 2019-06-05T16:32:36Z

Thank you for such a detailed explanation. I'm glad that we can either kill escape_utils or at least shrink its implementation in Ruby 2.7.

tenderlove · 2019-06-06T00:05:59Z

This is great! I wonder why Rack's implementation doesn't use CGI.

@rafaelfranca I think we use the JavaScript / URI escaping stuff in escape_utils still (I wanted to remove this dependency, but couldn't)

ioquatix · 2019-06-06T00:20:14Z

I didn't do all the optimisations here, but I did do this one which I'm not sure if it's implemented here or not: https://github.com/ioquatix/trenni/blob/master/ext/trenni/escape.c#L83-L84

Basically, search the string to see if there are any characters to be escaped. If you find index to escape, then start from there (you might be able to minimise stack allocation too if you know length of remainder). If you don't find any symbol to escape, you can return early with no overhead.

ioquatix · 2019-06-06T00:25:08Z

ext/cgi/escape/escape.c

+} while (0)
+    HTML_ESCAPE('\'', "&#39;");
+    HTML_ESCAPE('&', "&amp;");
+    HTML_ESCAPE('"', "&quot;");


You could also use &#34; which is one character less :p

k0kubun · Jun 6, 2019

I know there are some variations, but I intended to use exactly these characters for backward compatibility.

ioquatix · Jun 6, 2019

Yeah, I agree with you, it was half a joke :)

k0kubun · 2019-06-06T00:51:13Z

> Basically, search the string to see if there are any characters to be escaped. If you find index to escape, then start from there (you might be able to minimise stack allocation too if you know length of remainder). If you don't find any symbol to escape, you can return early with no overhead.

Before merging this, I experimented with skipping the buffer allocation entirely when nothing is escaped (k0kubun#16). It slightly improved the benchmark, but it also complicated the implementation (if we don't care about maintainability, why not use SIMD? :p). So I intentionally skipped that for now.

ioquatix · 2019-06-06T04:53:07Z

I don't think the implementation has to be much more complicated to avoid buffer allocation. Just make a "Find next token" method that takes the current char * and returns the next one, or null if not found. Then you just use a simple while loop, copying and substituting as required. You can do the first check outside the loop, and if it's null, just return the original string.

The reason why it's good optimisation is because it avoids allocation, GC pressure, etc. I would say many strings have no sequence that requires escape.
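
The scan-first approach described in this comment can be sketched in Ruby as follows. It is an illustration only: the names are hypothetical, String#index with a regexp stands in for the "Find next token" helper, and returning the original object on the fast path is exactly the behaviour change that is debated later in this thread.

```ruby
SCAN_ENTITIES = {
  "'" => "&#39;",
  "&" => "&amp;",
  '"' => "&quot;",
  "<" => "&lt;",
  ">" => "&gt;"
}.freeze
SCAN_PATTERN = /['&"<>]/

# Hypothetical helper: look for the first character to escape before allocating.
def scan_then_escape(str)
  pos = str.index(SCAN_PATTERN)
  return str if pos.nil? # fast path: nothing to escape, no new String

  out = str[0, pos] # copy the untouched prefix once
  while pos
    out << SCAN_ENTITIES.fetch(str[pos])
    next_pos = str.index(SCAN_PATTERN, pos + 1)
    # Copy the run of ordinary characters up to the next hit (or the end).
    out << str[(pos + 1)...(next_pos || str.length)]
    pos = next_pos
  end
  out
end

puts scan_then_escape("a<b>c")
```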

mame · 2019-06-06T08:07:04Z

For the record: my complicated patch.

- Word-aligned copy instead of byte-aligned
- On-demand allocation
- ALLOCA for smaller case

Calculating -------------------------------------
                              new         old 
     escape_html_blank    19.207M     16.935M i/s -     20.000M times in 1.041281s 1.181003s
escape_html_short_none    19.507M     17.095M i/s -     20.000M times in 1.025275s 1.169951s
 escape_html_short_one    13.790M     13.669M i/s -     20.000M times in 1.450324s 1.463170s
 escape_html_short_all     8.466M      7.823M i/s -      5.000M times in 0.590598s 0.639116s
 escape_html_long_none     1.371M    939.010k i/s -      1.000M times in 0.729490s 1.064951s
  escape_html_long_all     4.447M      4.397M i/s -      1.000M times in 0.224857s 0.227434s
      escape_html_real     2.148M      1.969M i/s -      1.000M times in 0.465648s 0.507745s

Comparison:
                  escape_html_blank
                   new:  19207104.9 i/s 
                   old:  16934763.6 i/s - 1.13x  slower

             escape_html_short_none
                   new:  19506956.1 i/s 
                   old:  17094734.8 i/s - 1.14x  slower

              escape_html_short_one
                   new:  13790021.5 i/s 
                   old:  13668951.6 i/s - 1.01x  slower

              escape_html_short_all
                   new:   8465991.7 i/s 
                   old:   7823300.2 i/s - 1.08x  slower

              escape_html_long_none
                   new:   1370821.1 i/s 
                   old:    939010.4 i/s - 1.46x  slower

               escape_html_long_all
                   new:   4447270.2 i/s 
                   old:   4396889.5 i/s - 1.01x  slower

                   escape_html_real
                   new:   2147546.9 i/s 
                   old:   1969491.5 i/s - 1.09x  slower

diff --git a/ext/cgi/escape/escape.c b/ext/cgi/escape/escape.c
index 76d8f0d067..5fa8463c75 100644
--- a/ext/cgi/escape/escape.c
+++ b/ext/cgi/escape/escape.c
@@ -34,35 +34,87 @@ preserve_original_state(VALUE orig, VALUE dest)
     RB_OBJ_INFECT_RAW(dest, orig);
 }
 
+static inline char *
+proceed_one_char(char *dest, const unsigned char c)
+{
+    uint8_t len = html_escape_table[c].len;
+    if (len) {
+        memcpy(dest, html_escape_table[c].str, len);
+        dest += len;
+    }
+    else {
+        *dest++ = c;
+    }
+    return dest;
+}
+
+#define FAST_EACH_CHAR() \
+    /* Manual loop unrolling to align word access */            \
+    for (; end - cstr >= 4; cstr += 4) {                        \
+        /* Prefetch four bytes */                               \
+        const unsigned char c0 = cstr[0];                       \
+        const unsigned char c1 = cstr[1];                       \
+        const unsigned char c2 = cstr[2];                       \
+        const unsigned char c3 = cstr[3];                       \
+        /* return cstr instead of cstr + N for alignment */     \
+        BLOCK(c0, cstr);                                        \
+        BLOCK(c1, cstr);                                        \
+        BLOCK(c2, cstr);                                        \
+        BLOCK(c3, cstr);                                        \
+    }                                                           \
+    /* The original loop */                                     \
+    while (cstr < end) {                                        \
+        const unsigned char c = cstr[0];                        \
+        BLOCK(c, cstr);                                         \
+        cstr++;                                                 \
+    }
+
+static inline const char *
+scout_escape_char(const char *cstr, const char *end) {
+#define BLOCK(c, p) if (html_escape_table[c].len) return (p);
+    FAST_EACH_CHAR();
+#undef BLOCK
+    return NULL;
+}
+
+static inline char *
+escape_cstr(char *dest, const char *cstr, const char *end)
+{
+#define BLOCK(c, p) dest = proceed_one_char(dest, (c));
+    FAST_EACH_CHAR();
+#undef BLOCK
+    return dest;
+}
+
 static VALUE
 optimized_escape_html(VALUE str)
 {
-    VALUE vbuf;
-    char *buf = ALLOCV_N(char, vbuf, RSTRING_LEN(str) * HTML_ESCAPE_MAX_LEN);
     const char *cstr = RSTRING_PTR(str);
-    const char *end = cstr + RSTRING_LEN(str);
-
-    char *dest = buf;
-    while (cstr < end) {
-        const unsigned char c = *cstr++;
-        uint8_t len = html_escape_table[c].len;
-        if (len) {
-            memcpy(dest, html_escape_table[c].str, len);
-            dest += len;
-        }
-        else {
-            *dest++ = c;
-        }
-    }
+    long len = RSTRING_LEN(str);
+    const char *end = cstr + len;
+
+    const char *first = scout_escape_char(cstr, end);
+
+    if (!first) return rb_str_dup(str);
+
+    if (len < 20) {
+        char *buf = ALLOCA_N(char, len * HTML_ESCAPE_MAX_LEN);
+        memcpy(buf, cstr, first - cstr);
+        char *dest = escape_cstr(buf + (first - cstr), first, end);
 
-    VALUE escaped;
-    if (RSTRING_LEN(str) < (dest - buf)) {
-        escaped = rb_str_new(buf, dest - buf);
+        VALUE escaped = rb_str_new(buf, dest - buf);
         preserve_original_state(str, escaped);
+        return escaped;
     }
-    else {
-        escaped = rb_str_dup(str);
-    }
+
+    VALUE vbuf;
+    char *buf = ALLOCV_N(char, vbuf, len * HTML_ESCAPE_MAX_LEN);
+
+    memcpy(buf, cstr, first - cstr);
+    char *dest = escape_cstr(buf + (first - cstr), first, end);
+
+    VALUE escaped = rb_str_new(buf, dest - buf);
+    preserve_original_state(str, escaped);
     ALLOCV_END(vbuf);
     return escaped;
 }

k0kubun · 2019-06-06T08:51:39Z

> I don't think the implementation has to be much more complicated to avoid buffer allocation. Just make a "Find next token" method that takes the current char * and returns the next one, or null if not found. Then you just use a simple while loop, copying and substituting as required. You can do the first check outside the loop, and if it's null, just return the original string.

My point is that having the "Find next token" is already not as simple as the current implementation, and the benefit should be big enough to accept it. My patch did not improve the no-escape performance that much, but yours might do 🙂

> The reason why it's good optimisation is because it avoids allocation, GC pressure, etc. I would say many strings have no sequence that requires escape.

The argument sounds fair, but I'd also say many strings are shorter than 170 characters (170 * 6 < RUBY_ALLOCV_LIMIT) and thus it just uses a stack (does not pressure GC) in RB_ALLOCV_N and is less harmful.
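
The 170-character figure follows from simple arithmetic. The sketch below assumes RUBY_ALLOCV_LIMIT is 1024 bytes and that the comparison is strict, as the inequality in the comment suggests; the constant and method names are hypothetical.

```ruby
ASSUMED_ALLOCV_LIMIT = 1024   # assumed value of RUBY_ALLOCV_LIMIT, in bytes
WORST_CASE_BYTES_PER_CHAR = 6 # length of "&quot;", the longest replacement

# True when the worst-case buffer for an input of this length stays under the limit.
def fits_on_stack?(byte_length)
  byte_length * WORST_CASE_BYTES_PER_CHAR < ASSUMED_ALLOCV_LIMIT
end

puts fits_on_stack?(170) # 170 * 6 = 1020
puts fits_on_stack?(171) # 171 * 6 = 1026
```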

ioquatix · 2019-06-06T09:07:05Z

What is the reason to call rb_str_dup on fast path?

mattn · 2019-06-06T09:26:31Z

I'm not sure, and I'm not a Rubyist, but calling rb_str_dup is required.

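require 'cgi'

a = "hello"
b = CGI.escapeHTML(a)  # nothing to escape, but b must still be a separate String
a.gsub!(/l/, 'L')      # mutate the original in place
puts b                 # prints "hello"; it would print "heLLo" if b were the same object as a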
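
The same point can be shown without CGI by comparing two stand-in "escapers" for the no-escape case. This is a hypothetical sketch: the lambdas below only model returning the argument versus returning a copy.

```ruby
# Stand-ins for the no-escape fast path (hypothetical, not CGI code).
RETURN_SAME_OBJECT = ->(str) { str }
RETURN_COPY        = ->(str) { str.dup }

# Escapes a string, then lets the caller mutate the original in place.
def result_after_caller_mutation(escaper)
  original = +"hello"
  result = escaper.call(original)
  original.gsub!(/l/, "L")
  result
end

puts result_after_caller_mutation(RETURN_SAME_OBJECT) # heLLo
puts result_after_caller_mutation(RETURN_COPY)        # hello
```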

ioquatix · 2019-06-06T10:01:38Z

I think that this is a great improvement and I think this implementation is fast enough.

I have existing benchmarks, so I added CGI.escapeHTML from Ruby 2.6.2. Here are the results.

Trenni::Markup
Warming up --------------------------------------
CGI.escapeHTML(general_string)
                       206.508k i/100ms
CGI.escapeHTML(code_string)
                       117.813k i/100ms
Trenni::Markup.escape_string(general_string)
                       234.318k i/100ms
Trenni::Markup.escape_string(code_string)
                       109.329k i/100ms
Calculating -------------------------------------
CGI.escapeHTML(general_string)
                          4.360M (± 1.1%) i/s -     21.890M in   5.021099s
CGI.escapeHTML(code_string)
                          1.733M (± 3.1%) i/s -      8.718M in   5.035770s
Trenni::Markup.escape_string(general_string)
                          5.367M (± 3.4%) i/s -     26.947M in   5.027060s
Trenni::Markup.escape_string(code_string)
                          1.520M (± 6.4%) i/s -      7.653M in   5.055731s

Comparison:
Trenni::Markup.escape_string(general_string):  5367197.1 i/s
CGI.escapeHTML(general_string):  4360162.9 i/s - 1.23x  slower
CGI.escapeHTML(code_string):  1732852.9 i/s - 3.10x  slower
Trenni::Markup.escape_string(code_string):  1520189.0 i/s - 3.53x  slower

This implementation probably doesn't beat Trenni's implementation, but I will test it once it's merged.

Ruby does have some basic string CoW so maybe performance hit is not so bad when calling rb_str_dup. If string is very big, it might be a bigger issue...

In my experience, the typical use case is appending to an output buffer. So, I think it's silly to duplicate a string in memory for the sole purpose of appending to another buffer. In my testing, avoiding this operation was a huge performance win, to the point where all my operations became appends:

https://github.com/ioquatix/trenni/blob/master/ext/trenni/escape.h#L11-L12

Overall, this and several other optimisations allow Trenni templates to be 10x or more faster than ERB, even while using escaping by default. While we can't utilise the append & escape operation without changing the existing method, maybe it's not silly to add it, e.g. CGI.escapeHTML(text, buffer) which appends the escaped text into the given buffer. I think this operation is a big performance win.
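
To make the proposal concrete, an append-style escape could look like the sketch below. The method name and signature are hypothetical: CGI.escapeHTML does not take a buffer argument, and this pure-Ruby version only illustrates the shape of the API, not its performance.

```ruby
APPEND_ENTITIES = {
  "'" => "&#39;",
  "&" => "&amp;",
  '"' => "&quot;",
  "<" => "&lt;",
  ">" => "&gt;"
}.freeze

# Hypothetical API: escape text directly into an existing output buffer.
def escape_html_into(text, buffer)
  text.each_char do |char|
    buffer << (APPEND_ENTITIES[char] || char)
  end
  buffer
end

output = +"<p>"
escape_html_into("Tom & Jerry", output)
output << "</p>"
puts output
```

No intermediate escaped String is created; the caller's buffer is the only object that grows.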

k0kubun · 2019-06-06T10:03:15Z

> What is the reason to call rb_str_dup on fast path?

mattn's comment is right. Also I already explained that in #2226 (comment):

> CGI.escapeHTML (before/after): It creates another String object by rb_str_dup(str). This cannot be optimized for backward compatibility with old CGI.escapeHTML using gsub.

ioquatix · 2019-06-06T10:04:24Z

@k0kubun sorry I didn't clearly read all your detailed notes. Thanks for such information.

ioquatix · 2019-06-06T10:08:13Z

> CGI.escapeHTML (after): By allocating str.length * 6 (&quot; is 6 chars) on stack (to avoid heap management and fragmentation. It uses a heap if the size is too big), buffer extension never happens.

On this point, do you think it makes sense to add something like rb_str_reserve(VALUE self, size_t n) where we expand capacity to support at least n additional bytes without any more memory allocations?

k0kubun · 2019-06-06T10:14:37Z

At least I think it does not help CGI.escapeHTML as long as we do not change any of its behavior. (For the no-escape case, rb_str_dup is mandatory and calling rb_str_reserve just doesn't help anything. For the escaped case, if it's for the result string, we'd need to resize again to avoid consuming too much memory.) It'd be helpful if there were a CGI.escapeHTML!.

Oh by the way, when fixing ALLOCA_N problem, nobu suggested to use rb_str_tmp_new with a large size first and call rb_str_resize to shrink it at the end, and I think it's close to your idea.
But calling them is much slower than rb_str_dup and so we need to lazily call it. The above discussion about complication applies to this too.

Anyway I think rb_str_resize is a different topic. Please file a ticket and discuss there.

and switch-case branches. Buffer allocation optimization using `ALLOCA_N` would be the main benefit of patch. It eliminates the O(N) buffer extensions. It also reduces the number of branches using escape table like https://mattn.kaoriya.net/software/lang/c/20160817011915.htm. Closes: ruby/ruby#2226 Co-authored-by: Nobuyoshi Nakada <nobu@ruby-lang.org> Co-authored-by: Yasuhiro MATSUMOTO <mattn.jp@gmail.com>

Optimize CGI.escapeHTML by reducing buffer extension

2599308

and switch-case branches.

nobu reviewed Jun 4, 2019

ext/cgi/escape/escape.c (outdated)

nobu reviewed Jun 4, 2019

ext/cgi/escape/escape.c (outdated)

k0kubun force-pushed the no-switch-html-escape branch from 751b4bf to 4d23502 Compare June 4, 2019 15:24

Avoid calling strlen at the end

0bc4478

k0kubun force-pushed the no-switch-html-escape branch from 4d23502 to 0bc4478 Compare June 4, 2019 15:32

mattn reviewed Jun 4, 2019

Define temporary len variable on initialization

9bb706a

k0kubun added 2 commits June 5, 2019 09:59

Simplify initialization

691642a

patched by nobu.

Fix compile error on variably modified 'str'

3396577

matzbot closed this in 8d81e59 Jun 5, 2019

k0kubun deleted the no-switch-html-escape branch June 5, 2019 01:25

matzbot pushed a commit that referenced this pull request Jun 5, 2019

NEWS: Note about CGI.escapeHTML change [ci skip]

6dc0541

See #2226 for benchmark results.

ioquatix reviewed Jun 6, 2019

ahorek mentioned this pull request Oct 22, 2019

port CGI.escapeHTML optimization jruby/jruby#5937

Closed

eregon mentioned this pull request Dec 30, 2019

Write specs for new Ruby 2.7 features and changes ruby/spec#745

Open

70 tasks

georgie84 mentioned this pull request Nov 13, 2020

Ruby 2.7 Support jruby/jruby#6464

Closed

k0kubun mentioned this pull request Feb 19, 2021

HAML+Rails can lead to doubly escaped attributes haml/haml#1051

Closed
