New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize CGI.escapeHTML for ASCII-compatible encodings #1164

Closed
wants to merge 4 commits into
base: trunk
from

Conversation

3 participants
@k0kubun
Member

k0kubun commented Dec 20, 2015

As commented in #156 (comment), I rewrote CGI.escapeHTML in C, which is used by ERB::Util#html_escape.
Since escaping HTML is expensive in rendering a template, I want it to be faster.
For now, I optimized it only for strings whose encoding is ASCII-compatible.

With this benchmark https://gist.github.com/k0kubun/b6af6062bc876190e280, it's about 7 times faster than original implementation in escaping html.

$ ruby bench_escape_html.rb
Calculating -------------------------------------
              before    11.448k i/100ms
               after    31.189k i/100ms
-------------------------------------------------
              before    216.403k (± 7.0%) i/s -      2.152M
               after      1.637M (±10.0%) i/s -     16.125M

Comparison:
               after:  1637408.5 i/s
              before:   216403.5 i/s - 7.57x slower
Show outdated Hide outdated ext/cgi/escape/escape.c
static VALUE rb_cCGI, rb_mUtil, rb_mEscape;
static VALUE
html_escaped_cat(VALUE str, char c)

This comment has been minimized.

@nobu

nobu Dec 20, 2015

Member

this function returns no values.

@nobu

nobu Dec 20, 2015

Member

this function returns no values.

This comment has been minimized.

@k0kubun

k0kubun Dec 20, 2015

Member

Thank you. Fixed in k0kubun@39cdd83.

@k0kubun

k0kubun Dec 20, 2015

Member

Thank you. Fixed in k0kubun@39cdd83.

Show outdated Hide outdated ext/cgi/escape/escape.c
if (modified) {
rb_str_cat(dest, cstr + beg, len - beg);
return dest;

This comment has been minimized.

@nobu

nobu Dec 20, 2015

Member

the encoding of str is ignored and dest is always ASCII-8BIT.

@nobu

nobu Dec 20, 2015

Member

the encoding of str is ignored and dest is always ASCII-8BIT.

This comment has been minimized.

@k0kubun

k0kubun Dec 20, 2015

Member

I changed it to associate original encoding and added test case for it in k0kubun@2162835.

@k0kubun

k0kubun Dec 20, 2015

Member

I changed it to associate original encoding and added test case for it in k0kubun@2162835.

Show outdated Hide outdated ext/cgi/escape/escape.c
static VALUE
optimized_escape_html(VALUE str)
{
long i, len, modified = 0, beg = 0, offset = 0;

This comment has been minimized.

@nobu

nobu Dec 20, 2015

Member

offset is unused.

@nobu

nobu Dec 20, 2015

Member

offset is unused.

This comment has been minimized.

@k0kubun
@k0kubun

@nobu nobu closed this in ce7f7f5 Dec 20, 2015

@k0kubun k0kubun deleted the k0kubun:c-escape-html branch Dec 20, 2015

@knu

This comment has been minimized.

Show comment
Hide comment
@knu

knu Dec 24, 2015

Member

How much is the final performance gain after fixing the return value not being duped when no replacement takes place?

Member

knu commented Dec 24, 2015

How much is the final performance gain after fixing the return value not being duped when no replacement takes place?

@k0kubun

This comment has been minimized.

Show comment
Hide comment
@k0kubun

k0kubun Dec 24, 2015

Member

You can check performance gain with this benchmark https://gist.github.com/k0kubun/8e1c7efb1e29991e1382.

This is the result of ruby compiled from latest revision b58b970.

$ ruby bench_escape.rb "'&\"<>"
Escape: '&"<>
Calculating -------------------------------------
              before    15.457k i/100ms
               after    63.931k i/100ms
-------------------------------------------------
              before    199.975k (± 5.1%) i/s -      1.005M
               after      1.453M (± 5.8%) i/s -      7.288M

Comparison:
               after:  1453098.6 i/s
              before:   199974.8 i/s - 7.27x slower

$ ruby bench_escape.rb "hello world"
Escape: hello world
Calculating -------------------------------------
              before    56.973k i/100ms
               after   100.344k i/100ms
-------------------------------------------------
              before      1.474M (± 6.1%) i/s -      7.350M
               after      4.419M (± 7.3%) i/s -     21.975M

Comparison:
               after:  4419281.2 i/s
              before:  1473860.3 i/s - 3.00x slower
Member

k0kubun commented Dec 24, 2015

You can check performance gain with this benchmark https://gist.github.com/k0kubun/8e1c7efb1e29991e1382.

This is the result of ruby compiled from latest revision b58b970.

$ ruby bench_escape.rb "'&\"<>"
Escape: '&"<>
Calculating -------------------------------------
              before    15.457k i/100ms
               after    63.931k i/100ms
-------------------------------------------------
              before    199.975k (± 5.1%) i/s -      1.005M
               after      1.453M (± 5.8%) i/s -      7.288M

Comparison:
               after:  1453098.6 i/s
              before:   199974.8 i/s - 7.27x slower

$ ruby bench_escape.rb "hello world"
Escape: hello world
Calculating -------------------------------------
              before    56.973k i/100ms
               after   100.344k i/100ms
-------------------------------------------------
              before      1.474M (± 6.1%) i/s -      7.350M
               after      4.419M (± 7.3%) i/s -     21.975M

Comparison:
               after:  4419281.2 i/s
              before:  1473860.3 i/s - 3.00x slower
@knu

This comment has been minimized.

Show comment
Hide comment
@knu

knu Dec 24, 2015

Member

Thanks. So, the new implementation is 3x faster even in the worst cases. Great job!

Member

knu commented Dec 24, 2015

Thanks. So, the new implementation is 3x faster even in the worst cases. Great job!

mrkn pushed a commit to mrkn/ruby that referenced this pull request Apr 17, 2016

cgi/escape: Optimize CGI.escapeHTML
* cgi/escape/escape.c: Optimize CGI.escapeHTML for
  ASCII-compatible encodings.  [Fix GH-1164]

git-svn-id: svn+ssh://svn.ruby-lang.org/ruby/trunk@53220 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

@headius headius referenced this pull request Apr 20, 2016

Open

Remaining 2.3 checklist items not in JRuby 9.1 #3816

8 of 23 tasks complete

k0kubun added a commit to k0kubun/haml that referenced this pull request Feb 8, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment