
Optimize CGI.escapeHTML by reducing buffer extension and branches #2226

Closed
wants to merge 5 commits into from

Conversation

@k0kubun
Member

k0kubun commented Jun 4, 2019

Benchmark

Intel 4.0GHz i7-4790K with 16GB memory under x86-64 Ubuntu 8 Cores, gcc 7.3.0

escape_utils.gem's benchmark

Here's the benchmark result of escape_utils/benchmark/html_escape.rb using this CGI.escapeHTML. Originally CGI.escapeHTML was about 0.72x as fast as EscapeUtils.escape_html, and now it's 2.48x as fast.

$ bundle exec ruby -v benchmark/html_escape.rb
ruby 2.7.0dev (2019-06-05 master 0a29dc87e6) [x86_64-linux]
last_commit=Optimize CGI.escapeHTML by reducing buffer extension
Escaping 154483 bytes of html from https://en.wikipedia.org/wiki/Succession_to_the_British_throne
Warming up --------------------------------------
Rack::Utils.escape_html
                        16.000  i/100ms
Haml::Helpers.html_escape
                       616.000  i/100ms
ERB::Util.html_escape
                       626.000  i/100ms
      CGI.escapeHTML   631.000  i/100ms
         String#gsub    24.000  i/100ms
fast_xs_extra#fast_xs_html
                       332.000  i/100ms
EscapeUtils.escape_html
                       255.000  i/100ms
Calculating -------------------------------------
Rack::Utils.escape_html
                        166.291  (± 2.4%) i/s -    832.000  in   5.006558s
Haml::Helpers.html_escape
                          6.376k (± 2.7%) i/s -     32.032k in   5.028389s
ERB::Util.html_escape
                          6.366k (± 3.6%) i/s -     31.926k in   5.022500s
      CGI.escapeHTML      6.386k (± 3.1%) i/s -     32.181k in   5.045185s
         String#gsub    240.854  (± 1.2%) i/s -      1.224k in   5.082920s
fast_xs_extra#fast_xs_html
                          3.345k (± 1.8%) i/s -     16.932k in   5.064190s
EscapeUtils.escape_html
                          2.572k (± 3.0%) i/s -     13.005k in   5.060726s

Comparison:
      CGI.escapeHTML:        6385.6 i/s
Haml::Helpers.html_escape:   6375.6 i/s - same-ish: difference falls within error
ERB::Util.html_escape:       6366.2 i/s - same-ish: difference falls within error
fast_xs_extra#fast_xs_html:  3344.6 i/s - 1.91x  slower
EscapeUtils.escape_html:     2572.2 i/s - 2.48x  slower
         String#gsub:        240.9 i/s - 26.51x  slower
Rack::Utils.escape_html:     166.3 i/s - 38.40x  slower

Note: Haml::Helpers.html_escape uses ERB::Util.html_escape which uses CGI.escapeHTML, so those 3 are the same.

Other scenarios

When at least one character is escaped (_one, _all, _real), it becomes 1.91~5.35x faster.
When there's nothing to be escaped (_blank, _none), unfortunately it becomes 1.08~1.12x slower.

$ benchmark-driver benchmark/cgi_escape_html.yml -v --rbenv 'before;after' --repeat-count=8
before: ruby 2.7.0dev (2019-06-05 master f3c877e8de) [x86_64-linux]
after: ruby 2.7.0dev (2019-06-05 master 0a29dc87e6) [x86_64-linux]
last_commit=Optimize CGI.escapeHTML by reducing buffer extension
Calculating -------------------------------------
                           before       after
     escape_html_blank    26.946M     25.011M i/s -     20.000M times in 0.742213s 0.799655s
escape_html_short_none    25.898M     23.174M i/s -     20.000M times in 0.772249s 0.863042s
 escape_html_short_one     8.046M     17.748M i/s -     20.000M times in 2.485578s 1.126892s
 escape_html_short_all     5.539M     10.561M i/s -      5.000M times in 0.902758s 0.473462s
 escape_html_long_none     1.373M      1.231M i/s -      1.000M times in 0.728400s 0.812565s
  escape_html_long_all     1.093M      5.849M i/s -      1.000M times in 0.914765s 0.170957s
      escape_html_real     1.121M      2.565M i/s -      1.000M times in 0.891941s 0.389921s

Comparison:
                  escape_html_blank
                before:  26946437.8 i/s
                 after:  25010784.2 i/s - 1.08x  slower

             escape_html_short_none
                before:  25898374.9 i/s
                 after:  23173849.3 i/s - 1.12x  slower

              escape_html_short_one
                 after:  17747937.4 i/s
                before:   8046417.1 i/s - 2.21x  slower

              escape_html_short_all
                 after:  10560503.5 i/s
                before:   5538580.2 i/s - 1.91x  slower

              escape_html_long_none
                before:   1372873.0 i/s
                 after:   1230671.0 i/s - 1.12x  slower

               escape_html_long_all
                 after:   5849414.4 i/s
                before:   1093177.0 i/s - 5.35x  slower

                   escape_html_real
                 after:   2564622.4 i/s
                before:   1121150.9 i/s - 2.29x  slower
and switch-case branches.
@k0kubun k0kubun force-pushed the k0kubun:no-switch-html-escape branch from 751b4bf to 4d23502 Jun 4, 2019
@k0kubun k0kubun force-pushed the k0kubun:no-switch-html-escape branch from 4d23502 to 0bc4478 Jun 4, 2019
break;
}
#define HTML_ESCAPE(c, str) do { \
html_escape_table[c] = str; \

@mattn

mattn Jun 4, 2019

Contributor

Perhaps define a variable int len = strlen(str); here and use it in the following lines?

@k0kubun

k0kubun Jun 4, 2019

Author Member

That might be more robust and easier to read. I did so in 9bb706a.
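For readers following the review thread, here is a minimal sketch of what the suggestion amounts to: compute the length once inside the macro and store it next to the string. The struct layout, the `len_` variable name, and `init_html_escape_table` are assumptions made for illustration; the diff later in this thread only confirms that the table has `.len` and `.str` fields.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical table layout: one entry per byte value. */
typedef struct {
    uint8_t len;   /* 0 means "this byte needs no escaping" */
    char str[8];   /* longest entity, "&quot;", is 6 chars plus NUL */
} html_escape_entry;

static html_escape_entry html_escape_table[256];

/* Compute strlen once per entry and reuse it for both fields. */
#define HTML_ESCAPE(c, s) do { \
    size_t len_ = strlen(s); \
    html_escape_table[(unsigned char)(c)].len = (uint8_t)len_; \
    memcpy(html_escape_table[(unsigned char)(c)].str, (s), len_ + 1); \
} while (0)

/* Hypothetical initializer: fills the five entries CGI.escapeHTML escapes. */
static void
init_html_escape_table(void)
{
    memset(html_escape_table, 0, sizeof(html_escape_table));
    HTML_ESCAPE('\'', "&#39;");
    HTML_ESCAPE('&', "&amp;");
    HTML_ESCAPE('"', "&quot;");
    HTML_ESCAPE('<', "&lt;");
    HTML_ESCAPE('>', "&gt;");
}
```

A lookup is then a single array index with no branching on the character value: a zero `len` means the byte is copied through unchanged.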

@rafaelfranca

Contributor

rafaelfranca commented Jun 4, 2019

This is great! I was trying to understand why escape_utils was still faster than CGI.escapeHTML, and buffer extensions seem to be the cause. Would it be good to compare whether your new implementation is close to or faster than escape_utils?

@k0kubun

Member Author

k0kubun commented Jun 5, 2019

Thanks for your comment.

escape_utils/benchmark/html_escape.rb

Here's the benchmark result of escape_utils/benchmark/html_escape.rb using this CGI.escapeHTML on my machine (Intel 4.0GHz i7-4790K, 16GB memory, x86-64 Ubuntu 8 Cores, GCC 7.4.0):

$ bundle exec ruby -v benchmark/html_escape.rb
ruby 2.7.0dev (2019-06-05 master 0a29dc87e6) [x86_64-linux]
last_commit=Optimize CGI.escapeHTML by reducing buffer extension
Escaping 154483 bytes of html from https://en.wikipedia.org/wiki/Succession_to_the_British_throne
Warming up --------------------------------------
Rack::Utils.escape_html
                        16.000  i/100ms
Haml::Helpers.html_escape
                       616.000  i/100ms
ERB::Util.html_escape
                       626.000  i/100ms
      CGI.escapeHTML   631.000  i/100ms
         String#gsub    24.000  i/100ms
fast_xs_extra#fast_xs_html
                       332.000  i/100ms
EscapeUtils.escape_html
                       255.000  i/100ms
Calculating -------------------------------------
Rack::Utils.escape_html
                        166.291  (± 2.4%) i/s -    832.000  in   5.006558s
Haml::Helpers.html_escape
                          6.376k (± 2.7%) i/s -     32.032k in   5.028389s
ERB::Util.html_escape
                          6.366k (± 3.6%) i/s -     31.926k in   5.022500s
      CGI.escapeHTML      6.386k (± 3.1%) i/s -     32.181k in   5.045185s
         String#gsub    240.854  (± 1.2%) i/s -      1.224k in   5.082920s
fast_xs_extra#fast_xs_html
                          3.345k (± 1.8%) i/s -     16.932k in   5.064190s
EscapeUtils.escape_html
                          2.572k (± 3.0%) i/s -     13.005k in   5.060726s

Comparison:
      CGI.escapeHTML:        6385.6 i/s
Haml::Helpers.html_escape:   6375.6 i/s - same-ish: difference falls within error
ERB::Util.html_escape:       6366.2 i/s - same-ish: difference falls within error
fast_xs_extra#fast_xs_html:  3344.6 i/s - 1.91x  slower
EscapeUtils.escape_html:     2572.2 i/s - 2.48x  slower
         String#gsub:        240.9 i/s - 26.51x  slower
Rack::Utils.escape_html:     166.3 i/s - 38.40x  slower

Note: Haml::Helpers.html_escape uses ERB::Util.html_escape which uses CGI.escapeHTML (because I maintain both template engines), so those 3 are the same.

benchmark/cgi_escape_html.yml

Same environment as above, but with benchmarks in this PR:

$ benchmark-driver benchmark/cgi_escape_html.yml -v --rbenv 'before;after;escape_utils::before -rescape_utils -rescape_utils/html/cgi'
before: ruby 2.7.0dev (2019-06-05 master f3c877e8de) [x86_64-linux]
after: ruby 2.7.0dev (2019-06-05 master 0a29dc87e6) [x86_64-linux]
last_commit=Optimize CGI.escapeHTML by reducing buffer extension
escape_utils: ruby 2.7.0dev (2019-06-05 master f3c877e8de) [x86_64-linux]
Calculating -------------------------------------
                           before       after  escape_utils
     escape_html_blank    26.893M     24.954M       13.290M i/s -     20.000M times in 0.743694s 0.801490s 1.504855s
escape_html_short_none    25.234M     22.703M       21.202M i/s -     20.000M times in 0.792582s 0.880929s 0.943310s
 escape_html_short_one     7.966M     17.724M        7.958M i/s -     20.000M times in 2.510630s 1.128417s 2.513101s
 escape_html_short_all     5.494M     10.506M        4.656M i/s -      5.000M times in 0.910155s 0.475900s 1.073980s
 escape_html_long_none     1.373M      1.228M        1.897M i/s -      1.000M times in 0.728313s 0.814626s 0.527014s
  escape_html_long_all     1.094M      5.632M        1.386M i/s -      1.000M times in 0.914260s 0.177545s 0.721668s
      escape_html_real     1.099M      2.219M        1.212M i/s -      1.000M times in 0.910291s 0.450618s 0.825238s

Comparison:
                  escape_html_blank
                before:  26892772.9 i/s
                 after:  24953535.0 i/s - 1.08x  slower
          escape_utils:  13290316.8 i/s - 2.02x  slower

             escape_html_short_none
                before:  25233996.7 i/s
                 after:  22703310.4 i/s - 1.11x  slower
          escape_utils:  21201938.3 i/s - 1.19x  slower

              escape_html_short_one
                 after:  17723937.9 i/s
                before:   7966129.6 i/s - 2.22x  slower
          escape_utils:   7958295.4 i/s - 2.23x  slower

              escape_html_short_all
                 after:  10506411.8 i/s
                before:   5493570.4 i/s - 1.91x  slower
          escape_utils:   4655579.9 i/s - 2.26x  slower

              escape_html_long_none
          escape_utils:   1897481.2 i/s
                before:   1373035.5 i/s - 1.38x  slower
                 after:   1227557.7 i/s - 1.55x  slower

               escape_html_long_all
                 after:   5632366.9 i/s
          escape_utils:   1385678.6 i/s - 4.06x  slower
                before:   1093780.9 i/s - 5.15x  slower

                   escape_html_real
                 after:   2219174.4 i/s
          escape_utils:   1211772.1 i/s - 1.83x  slower
                before:   1098550.0 i/s - 2.02x  slower

CGI.escapeHTML is faster except in escape_html_long_none. See below for reasons.

Why

Let me clarify my understanding of characteristics of each implementation:

  • Number of buffer extensions:
    • CGI.escapeHTML (before): When N characters are escaped, the String is extended N times.
    • CGI.escapeHTML (after): It allocates str.length * 6 bytes up front (&quot;, the escape for ", is 6 chars), so a buffer extension never happens. The buffer is on the stack to avoid heap management and fragmentation; it uses the heap only if the size is too big.
    • EscapeUtils.escape_html: It always speculatively extends a buffer to 1.5x, growing a temporary buffer on the heap exponentially. So the number of buffer extensions would be O(log N) if str is long.
    • fast_xs_extra#fast_xs_html: It calculates the result buffer size beforehand, so a buffer extension never happens. The downside is that it needs to scan the entire string twice, unlike the new CGI.escapeHTML.
  • String object creation when nothing is escaped:
    • CGI.escapeHTML (before/after): It creates another String object by rb_str_dup(str). This cannot be optimized for backward compatibility with old CGI.escapeHTML using gsub. We would need to add an option or another method to optimize this.
    • EscapeUtils.escape_html: While it allocates a temporary buffer on the heap, the buffer is not used and it returns the argument itself. (In a strict sense, it's not compatible with CGI.escapeHTML. But this behavior would be more suitable for template engines for sure...)
      • It outperforms CGI.escapeHTML in escape_html_long_none especially because CGI.escapeHTML uses the heap when a string is long. But I believe our HTML escape argument is usually short.
    • fast_xs_extra#fast_xs_html: It always creates a new buffer and appends characters to it. It would be slightly slower than CGI.escapeHTML, which just copies the String by rb_str_dup(str).
  • Encoding / Taint of a new escaped string:
    • CGI.escapeHTML (before/after): Preserved. It would have an extra overhead, but maybe trivial.
    • EscapeUtils.escape_html / fast_xs_extra#fast_xs_html: Not preserved. Maybe usually fine.
  • Non-ASCII-compatible encoding support:
    • CGI.escapeHTML (before/after): It falls back to the gsub implementation. I think others should follow it.
    • EscapeUtils.escape_html: It asserts ASCII compatibility and raises an error if not compatible.
    • fast_xs_extra#fast_xs_html: It does not check the encoding, but the implementation seems to assume ASCII compatibility. Not sure if it's fine.
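The "allocate the worst case once, then escape in a single pass" idea from the list above can be sketched in plain C, outside of Ruby. This is only an illustration under stated assumptions: `escape_html_worst_case` and `html_entities` are hypothetical names, and `malloc` stands in for the stack-or-heap temporary buffer that the real extension gets from `ALLOCV_N`.

```c
#include <stdlib.h>
#include <string.h>

#define HTML_ESCAPE_MAX_LEN 6  /* strlen("&quot;"), the longest entity */

/* Entity per byte value; NULL means the byte is copied unchanged. */
static const char *const html_entities[256] = {
    ['\''] = "&#39;",
    ['&']  = "&amp;",
    ['"']  = "&quot;",
    ['<']  = "&lt;",
    ['>']  = "&gt;",
};

/* Escapes src[0..len) into a freshly malloc'ed, NUL-terminated string.
 * The buffer is sized for the worst case up front, so it never grows.
 * Returns NULL only if the allocation fails. */
static char *
escape_html_worst_case(const char *src, size_t len)
{
    char *buf = malloc(len * HTML_ESCAPE_MAX_LEN + 1);
    char *dest = buf;
    size_t i;

    if (!buf) return NULL;
    for (i = 0; i < len; i++) {
        const char *entity = html_entities[(unsigned char)src[i]];
        if (entity) {
            size_t n = strlen(entity);
            memcpy(dest, entity, n);
            dest += n;
        }
        else {
            *dest++ = src[i];
        }
    }
    *dest = '\0';
    return buf;
}
```

The trade-off is memory for time: up to 6x the input is reserved temporarily, in exchange for one pass over the input and zero reallocations.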
k0kubun added 2 commits Jun 5, 2019
patched by nobu.
@matzbot matzbot closed this in 8d81e59 Jun 5, 2019
@k0kubun k0kubun deleted the k0kubun:no-switch-html-escape branch Jun 5, 2019
@k0kubun

Member Author

k0kubun commented Jun 5, 2019

(It included my wrong assumption about the ALLOCA_N macro. It's temporarily reverted in 71b14af, but I'll fix and commit it again later.)

matzbot pushed a commit that referenced this pull request Jun 5, 2019
and switch-case branches.

Buffer allocation optimization using `ALLOCA_N` would be the main
benefit of patch. It eliminates the O(N) buffer extensions.

It also reduces the number of branches using escape table like
https://mattn.kaoriya.net/software/lang/c/20160817011915.htm.

Closes: #2226

Co-authored-by: Nobuyoshi Nakada <nobu@ruby-lang.org>
Co-authored-by: Yasuhiro MATSUMOTO <mattn.jp@gmail.com>
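To make the O(N) versus O(log N) extension counts discussed in this thread concrete, here is a rough counting sketch. Both functions are hypothetical models, not code from CGI or escape_utils: the first models "one extension per escaped character", the second models geometric 1.5x growth (with a `+ 1` added so that very small capacities still grow, which is an assumption of this sketch).

```c
#include <stddef.h>

/* Model of the old strategy described in this thread:
 * one buffer extension per escaped character. */
static size_t
extensions_per_escape(size_t escaped_chars)
{
    return escaped_chars;
}

/* Model of geometric growth: multiply the capacity by roughly 1.5
 * until `needed` bytes fit, counting how many extensions that takes. */
static size_t
extensions_geometric(size_t initial_capacity, size_t needed)
{
    size_t capacity = initial_capacity ? initial_capacity : 1;
    size_t count = 0;

    while (capacity < needed) {
        capacity += capacity / 2 + 1;  /* +1 so tiny capacities still grow */
        count++;
    }
    return count;
}
```

Under this model, a 100-byte input where every character expands to 6 bytes costs 100 extensions with the first strategy and 5 with the second, while a worst-case preallocation costs none.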
@k0kubun

Member Author

k0kubun commented Jun 5, 2019

I committed the fixed version in 0a29dc8 and updated all above benchmark results again.

matzbot pushed a commit that referenced this pull request Jun 5, 2019
See #2226 for benchmark results.
@rafaelfranca

Contributor

rafaelfranca commented Jun 5, 2019

Thank you for such a detailed explanation. I'm glad that we can either kill escape_utils or at least shrink its implementation in Ruby 2.7.

@tenderlove

Member

tenderlove commented Jun 6, 2019

This is great! I wonder why Rack's implementation doesn't use CGI.

@rafaelfranca I think we use the JavaScript / URI escaping stuff in escape_utils still (I wanted to remove this dependency, but couldn't)

@ioquatix

Member

ioquatix commented Jun 6, 2019

I didn't do all the optimisations here, but I did do this one which I'm not sure if it's implemented here or not: https://github.com/ioquatix/trenni/blob/master/ext/trenni/escape.c#L83-L84

Basically, search the string to see if there are any characters to be escaped. If you find an index to escape, then start from there (you might be able to minimise stack allocation too if you know the length of the remainder). If you don't find any symbol to escape, you can return early with no overhead.
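A minimal sketch of that "scan first" step, written as standalone C rather than as the Trenni or CGI code: `needs_escape` and `find_first_escapable` are hypothetical names. The function takes an explicit length instead of relying on a NUL terminator, on the assumption that the input may contain embedded NUL bytes as Ruby strings can.

```c
#include <stddef.h>

/* 1 for the five bytes that HTML escaping rewrites, 0 otherwise. */
static const unsigned char needs_escape[256] = {
    ['\''] = 1, ['&'] = 1, ['"'] = 1, ['<'] = 1, ['>'] = 1,
};

/* Returns a pointer to the first byte that needs escaping,
 * or NULL if the whole range can be returned as-is. */
static const char *
find_first_escapable(const char *s, size_t len)
{
    const char *end = s + len;

    for (; s < end; s++) {
        if (needs_escape[(unsigned char)*s]) return s;
    }
    return NULL;
}
```

A caller would take the early-return path when this yields NULL, and otherwise copy everything before the returned pointer verbatim and escape only the remainder.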

} while (0)
HTML_ESCAPE('\'', "&#39;");
HTML_ESCAPE('&', "&amp;");
HTML_ESCAPE('"', "&quot;");

@ioquatix

ioquatix Jun 6, 2019

Member

You could also use &#34; which is one character less :p

@k0kubun

k0kubun Jun 6, 2019

Author Member

I know there are some variations, but I intended to use exactly these characters for backward compatibility.

@ioquatix

ioquatix Jun 6, 2019

Member

Yeah, I agree with you, it was half-joke :)

@k0kubun

Member Author

k0kubun commented Jun 6, 2019

Basically, search the string to see if there are any characters to be escaped. If you find an index to escape, then start from there (you might be able to minimise stack allocation too if you know the length of the remainder). If you don't find any symbol to escape, you can return early with no overhead.

Before merging this, I experimented with skipping the buffer allocation entirely when nothing is escaped (k0kubun#16). It slightly improved the benchmark, but it also complicates the implementation (if we don't care about maintainability, why not use SIMD? :p). So I intentionally skipped that for now.

@ioquatix

Member

ioquatix commented Jun 6, 2019

I don't think the implementation has to be much more complicated to avoid buffer allocation. Just make a "Find next token" method that takes the current char * and returns the next one, or null if not found. Then, you just use a simple while loop, copying and substituting in as required. You can do the first check outside the loop, and if it's null, just return the original string.

The reason why it's a good optimisation is that it avoids allocation, GC pressure, etc. I would say many strings have no sequence that requires escaping.
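One possible reading of the "Find next token" loop, as a self-contained C sketch. The names `next_token`, `escape_by_tokens`, and `entities` are hypothetical, the return-code convention is an assumption of this sketch, and `malloc` stands in for whatever buffer the real extension would use.

```c
#include <stdlib.h>
#include <string.h>

/* Entity per byte value; NULL means the byte needs no escaping. */
static const char *const entities[256] = {
    ['\''] = "&#39;", ['&'] = "&amp;", ['"'] = "&quot;",
    ['<'] = "&lt;", ['>'] = "&gt;",
};

/* Returns the next byte in [p, end) that needs escaping, or NULL. */
static const char *
next_token(const char *p, const char *end)
{
    for (; p < end; p++) {
        if (entities[(unsigned char)*p]) return p;
    }
    return NULL;
}

/* Returns 0 if nothing needs escaping (*out stays NULL, no allocation),
 * 1 if *out now holds a malloc'ed escaped copy, -1 on allocation failure. */
static int
escape_by_tokens(const char *src, size_t len, char **out)
{
    const char *end = src + len;
    const char *token = next_token(src, end);
    char *buf, *dest;

    *out = NULL;
    if (!token) return 0;  /* first check outside the loop */

    /* The prefix is copied verbatim; only the remainder can expand (6x). */
    buf = malloc((size_t)(token - src) + (size_t)(end - token) * 6 + 1);
    if (!buf) return -1;
    dest = buf;

    while (token) {
        const char *entity = entities[(unsigned char)*token];
        size_t run = (size_t)(token - src);
        size_t n = strlen(entity);

        memcpy(dest, src, run);     /* unescaped run before the token */
        dest += run;
        memcpy(dest, entity, n);    /* substitution for the token */
        dest += n;
        src = token + 1;
        token = next_token(src, end);
    }
    memcpy(dest, src, (size_t)(end - src));  /* trailing unescaped run */
    dest += end - src;
    *dest = '\0';
    *out = buf;
    return 1;
}
```

Compared with a byte-at-a-time loop, this copies unescaped runs with `memcpy` and skips allocation when the input is clean, at the cost of a second function and a slightly less obvious control flow.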

@mame

Member

mame commented Jun 6, 2019

For the record: my complicated patch.

  • Word-aligned copy instead of byte-aligned
  • On-demand allocation
  • ALLOCA for smaller case
Calculating -------------------------------------
                              new         old 
     escape_html_blank    19.207M     16.935M i/s -     20.000M times in 1.041281s 1.181003s
escape_html_short_none    19.507M     17.095M i/s -     20.000M times in 1.025275s 1.169951s
 escape_html_short_one    13.790M     13.669M i/s -     20.000M times in 1.450324s 1.463170s
 escape_html_short_all     8.466M      7.823M i/s -      5.000M times in 0.590598s 0.639116s
 escape_html_long_none     1.371M    939.010k i/s -      1.000M times in 0.729490s 1.064951s
  escape_html_long_all     4.447M      4.397M i/s -      1.000M times in 0.224857s 0.227434s
      escape_html_real     2.148M      1.969M i/s -      1.000M times in 0.465648s 0.507745s

Comparison:
                  escape_html_blank
                   new:  19207104.9 i/s 
                   old:  16934763.6 i/s - 1.13x  slower

             escape_html_short_none
                   new:  19506956.1 i/s 
                   old:  17094734.8 i/s - 1.14x  slower

              escape_html_short_one
                   new:  13790021.5 i/s 
                   old:  13668951.6 i/s - 1.01x  slower

              escape_html_short_all
                   new:   8465991.7 i/s 
                   old:   7823300.2 i/s - 1.08x  slower

              escape_html_long_none
                   new:   1370821.1 i/s 
                   old:    939010.4 i/s - 1.46x  slower

               escape_html_long_all
                   new:   4447270.2 i/s 
                   old:   4396889.5 i/s - 1.01x  slower

                   escape_html_real
                   new:   2147546.9 i/s 
                   old:   1969491.5 i/s - 1.09x  slower

diff --git a/ext/cgi/escape/escape.c b/ext/cgi/escape/escape.c
index 76d8f0d067..5fa8463c75 100644
--- a/ext/cgi/escape/escape.c
+++ b/ext/cgi/escape/escape.c
@@ -34,35 +34,87 @@ preserve_original_state(VALUE orig, VALUE dest)
     RB_OBJ_INFECT_RAW(dest, orig);
 }
 
+static inline char *
+proceed_one_char(char *dest, const unsigned char c)
+{
+    uint8_t len = html_escape_table[c].len;
+    if (len) {
+        memcpy(dest, html_escape_table[c].str, len);
+        dest += len;
+    }
+    else {
+        *dest++ = c;
+    }
+    return dest;
+}
+
+#define FAST_EACH_CHAR() \
+    /* Manual loop unrolling to align word access */            \
+    for (; end - cstr >= 4; cstr += 4) {                        \
+        /* Prefetch four bytes */                               \
+        const unsigned char c0 = cstr[0];                       \
+        const unsigned char c1 = cstr[1];                       \
+        const unsigned char c2 = cstr[2];                       \
+        const unsigned char c3 = cstr[3];                       \
+        /* return cstr instead of cstr + N for alignment */     \
+        BLOCK(c0, cstr);                                        \
+        BLOCK(c1, cstr);                                        \
+        BLOCK(c2, cstr);                                        \
+        BLOCK(c3, cstr);                                        \
+    }                                                           \
+    /* The original loop */                                     \
+    while (cstr < end) {                                        \
+        const unsigned char c = cstr[0];                        \
+        BLOCK(c, cstr);                                         \
+        cstr++;                                                 \
+    }
+
+static inline const char *
+scout_escape_char(const char *cstr, const char *end) {
+#define BLOCK(c, p) if (html_escape_table[c].len) return (p);
+    FAST_EACH_CHAR();
+#undef BLOCK
+    return NULL;
+}
+
+static inline char *
+escape_cstr(char *dest, const char *cstr, const char *end)
+{
+#define BLOCK(c, p) dest = proceed_one_char(dest, (c));
+    FAST_EACH_CHAR();
+#undef BLOCK
+    return dest;
+}
+
 static VALUE
 optimized_escape_html(VALUE str)
 {
-    VALUE vbuf;
-    char *buf = ALLOCV_N(char, vbuf, RSTRING_LEN(str) * HTML_ESCAPE_MAX_LEN);
     const char *cstr = RSTRING_PTR(str);
-    const char *end = cstr + RSTRING_LEN(str);
-
-    char *dest = buf;
-    while (cstr < end) {
-        const unsigned char c = *cstr++;
-        uint8_t len = html_escape_table[c].len;
-        if (len) {
-            memcpy(dest, html_escape_table[c].str, len);
-            dest += len;
-        }
-        else {
-            *dest++ = c;
-        }
-    }
+    long len = RSTRING_LEN(str);
+    const char *end = cstr + len;
+
+    const char *first = scout_escape_char(cstr, end);
+
+    if (!first) return rb_str_dup(str);
+
+    if (len < 20) {
+        char *buf = ALLOCA_N(char, len * HTML_ESCAPE_MAX_LEN);
+        memcpy(buf, cstr, first - cstr);
+        char *dest = escape_cstr(buf + (first - cstr), first, end);
 
-    VALUE escaped;
-    if (RSTRING_LEN(str) < (dest - buf)) {
-        escaped = rb_str_new(buf, dest - buf);
+        VALUE escaped = rb_str_new(buf, dest - buf);
         preserve_original_state(str, escaped);
+        return escaped;
     }
-    else {
-        escaped = rb_str_dup(str);
-    }
+
+    VALUE vbuf;
+    char *buf = ALLOCV_N(char, vbuf, len * HTML_ESCAPE_MAX_LEN);
+
+    memcpy(buf, cstr, first - cstr);
+    char *dest = escape_cstr(buf + (first - cstr), first, end);
+
+    VALUE escaped = rb_str_new(buf, dest - buf);
+    preserve_original_state(str, escaped);
     ALLOCV_END(vbuf);
     return escaped;
 }
@k0kubun

Member Author

k0kubun commented Jun 6, 2019

I don't think the implementation has to be much more complicated to avoid buffer allocation. Just make a "Find next token" method that takes the current char * and returns the next one, or null if not found. Then, you just use a simple while loop, copying and substituting in as required. You can do the first check outside the loop, and if it's null, just return the original string.

My point is that having the "Find next token" is already not as simple as the current implementation, and the benefit should be big enough to accept it. My patch did not improve the no-escape performance that much, but yours might do 🙂

The reason why it's a good optimisation is that it avoids allocation, GC pressure, etc. I would say many strings have no sequence that requires escaping.

The argument sounds fair, but I'd also say many strings are shorter than 170 characters (170 * 6 < RUBY_ALLOCV_LIMIT) and thus it just uses a stack (does not pressure GC) in RB_ALLOCV_N and is less harmful.
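The 170-character figure follows from simple arithmetic, sketched below. `SKETCH_ALLOCV_LIMIT` is set to 1024 on the assumption that this matches `RUBY_ALLOCV_LIMIT` in the Ruby headers of that era; `fits_on_stack` is a hypothetical helper, not part of the Ruby C API.

```c
#include <stddef.h>

#define SKETCH_ALLOCV_LIMIT 1024  /* assumed value of RUBY_ALLOCV_LIMIT */
#define HTML_ESCAPE_MAX_LEN 6     /* strlen("&quot;") */

/* Nonzero when the worst-case escape buffer for `len` input bytes
 * stays under the limit, i.e. would be taken from the stack. */
static int
fits_on_stack(size_t len)
{
    return len * HTML_ESCAPE_MAX_LEN < SKETCH_ALLOCV_LIMIT;
}
```

With these numbers, 170 * 6 = 1020 stays below the limit, while 171 * 6 = 1026 does not, so inputs of 171 bytes or more would take the heap path.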

@ioquatix

Member

ioquatix commented Jun 6, 2019

What is the reason to call rb_str_dup on fast path?

@mattn

Contributor

mattn commented Jun 6, 2019

I'm not sure, and I'm not a Rubyist, but calling rb_str_dup is required.

require 'cgi'

a = "hello"
b = CGI.escapeHTML(a)
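a.gsub!(/l/, 'L')  # mutate the original after escaping
puts b             # must still print "hello", so b cannot be the same object as a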
@ioquatix

Member

ioquatix commented Jun 6, 2019

I think that this is a great improvement and I think this implementation is fast enough.

I have existing benchmarks, so I added CGI.escapeHTML from Ruby 2.6.2. Here are the results.

Trenni::Markup
Warming up --------------------------------------
CGI.escapeHTML(general_string)
                       206.508k i/100ms
CGI.escapeHTML(code_string)
                       117.813k i/100ms
Trenni::Markup.escape_string(general_string)
                       234.318k i/100ms
Trenni::Markup.escape_string(code_string)
                       109.329k i/100ms
Calculating -------------------------------------
CGI.escapeHTML(general_string)
                          4.360M (± 1.1%) i/s -     21.890M in   5.021099s
CGI.escapeHTML(code_string)
                          1.733M (± 3.1%) i/s -      8.718M in   5.035770s
Trenni::Markup.escape_string(general_string)
                          5.367M (± 3.4%) i/s -     26.947M in   5.027060s
Trenni::Markup.escape_string(code_string)
                          1.520M (± 6.4%) i/s -      7.653M in   5.055731s

Comparison:
Trenni::Markup.escape_string(general_string):  5367197.1 i/s
CGI.escapeHTML(general_string):  4360162.9 i/s - 1.23x  slower
CGI.escapeHTML(code_string):  1732852.9 i/s - 3.10x  slower
Trenni::Markup.escape_string(code_string):  1520189.0 i/s - 3.53x  slower

This implementation probably doesn't beat Trenni's implementation, but I will test it once it's merged.

Ruby does have some basic string CoW, so maybe the performance hit is not so bad when calling rb_str_dup. If the string is very big, it might be a bigger issue...

In my experience, the typical use case is appending to an output buffer. So, I think it's silly to duplicate a string in memory for the sole purpose of appending to another buffer. In my testing, avoiding this operation was a huge performance win, to the point where all my operations became appends:

https://github.com/ioquatix/trenni/blob/master/ext/trenni/escape.h#L11-L12

Overall, this and several other optimisations allow Trenni templates to be 10x or more faster than ERB, even while using escaping by default. While we can't utilise the append & escape operation without changing the existing method, maybe it's not silly to add it, e.g. CGI.escapeHTML(text, buffer) which appends the escaped text into the given buffer. I think this operation is a big performance win.
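A rough plain-C analogue of such an "append and escape" operation is sketched here. It is not the Trenni code and not a proposed CGI API: `out_buffer`, `buffer_reserve`, and `buffer_append_escaped` are hypothetical names, and the doubling growth policy is an assumption of this sketch.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable output buffer, always NUL-terminated once used. */
typedef struct {
    char *data;
    size_t len;
    size_t cap;
} out_buffer;

static const char *const entities[256] = {
    ['\''] = "&#39;", ['&'] = "&amp;", ['"'] = "&quot;",
    ['<'] = "&lt;", ['>'] = "&gt;",
};

/* Ensures room for `extra` more bytes plus a NUL; returns 0 on failure. */
static int
buffer_reserve(out_buffer *b, size_t extra)
{
    size_t needed = b->len + extra + 1;

    if (needed > b->cap) {
        size_t cap = b->cap ? b->cap : 64;
        char *p;
        while (cap < needed) cap *= 2;
        p = realloc(b->data, cap);
        if (!p) return 0;
        b->data = p;
        b->cap = cap;
    }
    return 1;
}

/* Appends the escaped form of text[0..len) directly to b,
 * without building an intermediate escaped string. */
static int
buffer_append_escaped(out_buffer *b, const char *text, size_t len)
{
    size_t i;

    if (!buffer_reserve(b, len * 6)) return 0;  /* worst case, reserved once */
    for (i = 0; i < len; i++) {
        const char *entity = entities[(unsigned char)text[i]];
        if (entity) {
            size_t n = strlen(entity);
            memcpy(b->data + b->len, entity, n);
            b->len += n;
        }
        else {
            b->data[b->len++] = text[i];
        }
    }
    b->data[b->len] = '\0';
    return 1;
}
```

Because the destination is the long-lived output buffer, the over-reserved space is not wasted the way it would be for a one-off result string; later appends simply reuse it.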

@k0kubun

Member Author

k0kubun commented Jun 6, 2019

What is the reason to call rb_str_dup on fast path?

mattn's comment is right. Also I already explained that in #2226 (comment):

CGI.escapeHTML (before/after): It creates another String object by rb_str_dup(str). This cannot be optimized for backward compatibility with old CGI.escapeHTML using gsub.

@ioquatix

Member

ioquatix commented Jun 6, 2019

@k0kubun sorry, I didn't read all your detailed notes carefully. Thanks for the information.

@ioquatix

Member

ioquatix commented Jun 6, 2019

CGI.escapeHTML (after): It allocates str.length * 6 bytes up front (&quot;, the escape for ", is 6 chars), so a buffer extension never happens. The buffer is on the stack to avoid heap management and fragmentation; it uses the heap only if the size is too big.

On this point, do you think it makes sense to add something like rb_str_reserve(VALUE self, size_t n) where we expand capacity to support at least n additional bytes without any more memory allocations?

@k0kubun

Member Author

k0kubun commented Jun 6, 2019

At least I think it does not help CGI.escapeHTML as long as we do not change any of its behavior (for the no-escape case, rb_str_dup is mandatory and calling rb_str_reserve just doesn't help anything; for the escaped case, if it's for the result string, we'd need to resize again to avoid consuming too much memory). It'd be helpful if there were CGI.escapeHTML!.

Oh by the way, when fixing the ALLOCA_N problem, nobu suggested using rb_str_tmp_new with a large size first and calling rb_str_resize to shrink it at the end, and I think it's close to your idea.
But calling them is much slower than rb_str_dup, so we would need to call them lazily. The above discussion about complication applies to this too.

Anyway I think rb_str_resize is a different topic. Please file a ticket and discuss there.
