Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace the implementation of escapeTextContentForBrowser with escape-html #6862

Merged
merged 4 commits into from
Jun 2, 2016

Conversation

aickin
Copy link
Contributor

@aickin aickin commented May 25, 2016

While working on #6836, I found that escapeTextContentForBrowser can take up a significant amount of time in server rendering because it gets called for every string child and every attribute value.

By replacing the current implementation with escape-html, my server rendering test cases see a reduction of 8-20% in server rendering time, depending on how many attributes there are per element and whether or not the values have special characters.

I don't know what the react policy is on dependencies, but it looks like there are other MIT-licensed dependencies already, so I thought this might be possible.

'<div title="&#x27;&quot;&lt;&gt;&amp;" style="text-align:&#x27;&quot;&lt;&gt;&amp;;">' +
'&#x27;&quot;&lt;&gt;&amp;' +
'<div title="&#39;&quot;&lt;&gt;&amp;" style="text-align:&#39;&quot;&lt;&gt;&amp;;">' +
'&#39;&quot;&lt;&gt;&amp;' +
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

escape-html uses &#39 instead of &#x27, which should be a completely equivalent HTML entity for single quote (and it's one byte less, too).

@zpao
Copy link
Member

zpao commented May 25, 2016

I don't know what the react policy is on dependencies, but it looks like there are other MIT-licensed dependencies already, so I thought this might be possible.

object-assign & fbjs are the only actual prod deps and they're only added here as part of a transform. We don't do the same internally. Currently all requires resolve internally which will make this a little bit tricker until we revamp how we sync (which will happen but hasn't yet). So short version is that it should be fine but will need some internal FB work done so might take a little longer to make a final call here.

@syranide
Copy link
Contributor

syranide commented May 25, 2016

From previous benchmarking, pre-testing if there are any matches before doing replace would provide a measurable performance increase, but this was discarded in-favor of simplicity for the time being (presumed not to really matter). With that minor tweak I imagine the difference would then be negligible. I'm also suspecting your test-cases are heavily skewed towards values that need escaping or I fail to see how it could have such a massive impact, so in practice it shouldn't result in a significant difference anyway I would assume.

Also, there's a not-super-obvious twist here, escapeTextContentForHtml is meant to be explicitly about text content and not for attribute values (there is technically no need to escape <> in text content). However because it's a touchy subject it was decided to err on the side of safety for the time being at least, but in the future we may want to change that. So it may be make sense to keep our own implementation here.

PS. I also suspect changing to &#39; will cause a bunch of peoples tests to fail which may be unnecessary churn.

@aickin
Copy link
Contributor Author

aickin commented May 25, 2016

@zpao Thanks for the response!

Currently all requires resolve internally which will make this a little bit tricker until we revamp how we sync (which will happen but hasn't yet).

I'm curious; is this referring to things internal to Facebook or internal to React? I tried to cover what's needed in React to include the dependency, and it at least seems to work from tests and inspection of the output build.

@syranide Thanks for responding!

this was discarded in-favor of simplicity for the time being (presumed not to really matter)

Totally understood; premature optimization is uncool.

With that minor tweak I imagine the difference would then be negligible.

I think (though I'm not 100% sure) that short circuiting for numbers and booleans also helps significantly. This is true in all test cases because escapeTextContentForBrowser is called once per element for data-reactid, which always passes in a number.

I'm also suspecting your test-cases are heavily skewed towards values that need escaping

I understand your suspicion, but it's not the case. I ran a test where the leaf nodes both did and did not need escaping, and while the escaping tests showed a larger perf boost, every case was better with this code. The numbers were:

Leaf node JSX Perf Boost
<div tabIndex="2>" hidden={true} width="30>">{"abc<efg>ij"}</div> 20%
<div>{"abc<efg>ij"}</div> 13%
<div tabIndex="2" hidden={true} width="30">{"abcdefghij"}</div> 12%
<div>{"abcdefghij"}</div> 8%

escapeTextContentForHtml is meant to be explicitly about text content and not for attribute values (there is technically no need to escape <> in text content).

Yeah, I did some exploratory testing to see if an implementation that didn't check for <> would be faster for attribute values. It didn't seem like there was any real difference, so I stuck with a function that would work for both attribute values and text content.

in the future we may want to change that. So it may be make sense to keep our own implementation here.

I might be misreading you here, but I don't understand why using escape-html now forecloses the possibility of using a different implementation in the future.

I also suspect changing to &#39; will cause a bunch of peoples tests to fail which may be unnecessary churn.

Hmmmm... good point! I'm curious; did this happen when data-reactid was changed in React 15? Is there then a general aversion to ever changing the output of renderToString, or is it just that this change doesn't have much upside?

Overall, I totally take your point that the vast majority of real-world inputs to the function will not need to be escaped, so the majority of the perf increase could be achieved with a combo of type-check short-circuit and pretesting for the escapable characters. (In fact, I wrote and almost submitted that version, but then I decided to also test perf when encoding was needed, which led me to escape-html.)

If you would rather have that version, which trades off runtime perf of the unlikely cases for simplicity of dependency management, just let me know; I'd be happy to submit it!

@syranide
Copy link
Contributor

syranide commented May 25, 2016

I might be misreading you here, but I don't understand why using escape-html now forecloses the possibility of using a different implementation in the future.

@aickin Not at all, it just seems odd to switch to an external algorithm if there's reason to believe we may want to switch back in the future. Nitpick. I was just putting all the arguments out there really, not trying to shoot it down. :)

Since it seems you've done a bit of benchmarking already, could you do a test with:

function escapeTextContentForBrowser(text) {
  text = ('' + text);
  if (ESCAPE_REGEX.test(text)) {
    return text.replace(ESCAPE_REGEX, escaper);
  }
  return text;
}

That's the optimization I was referring to, I suspect that should close the gap considerably. Possibly try with the shortcircuits for boolean and number as well?

PS. This is not my decision, I'm not sure what preferences the others have when it comes to external dependencies.

@aickin
Copy link
Contributor Author

aickin commented May 25, 2016

That's the optimization I was referring to, I suspect that should close the gap considerably.

That's almost character-for-character what I was about to submit before I moved to escape-html; great minds think alike!

So, I went ahead and ran the benchmark I've been using on four scenarios (linked to the commit if you want to look at code):

Leaf node JSX typeof pretest typeof & pretest escape-html
<div tabIndex="2>" hidden={true} width="30>">{"abc<efg>ij"}</div> 5% 1% 2% 18%
<div>{"abc<efg>ij"}</div> 5% 2% 1% 12%
<div tabIndex="2" hidden={true} width="30">{"abcdefghij"}</div> 2% 8% 10% 9%
<div>{"abcdefghij"}</div> 4% 8% 8% 7%

For test cases that don't need any escaping, pretest, typeof & pretest, and escape-html are all about the same. For test cases that do need escaping, escape-html is the clear winner.

I'm happy to do whatever. I'd probably recommend typeof & pretest or escape-html, depending on whether we care about the escaping case. Thoughts?

@zpao
Copy link
Member

zpao commented May 25, 2016

is this referring to things internal to Facebook or internal to React?

Sorry, wasn't clear. I mean internal to Facebook. Since our current sync process is "copy files from src/" we need require to be resolvable. We'll just need to have a module internally that is @providesModule escape-html to make this work, which just means it's another thing to remember.

It looks like escape-html is either fastest or close enough that it doesn't matter, so unless we duplicate their work we'll likely want to just use it. It's strictly better than what we're doing today.

I'm curious; did this happen when data-reactid was changed in React 15? Is there then a general aversion to ever changing the output of renderToString,

This is one of those areas where "semver" gets kind of vague & icky. There are many who would say that changing the output of that (when not fixing a bug) requires a major version change. I don't really feel that way but I can understand the argument. We changed reactid in a major version so it was safe from scrutiny.

@aickin
Copy link
Contributor Author

aickin commented May 25, 2016

There are many who would say that changing the output of that (when not fixing a bug) requires a major version change. I don't really feel that way but I can understand the argument. We changed reactid in a major version so it was safe from scrutiny.

Oh, I was just assuming this would go in a major version. I asked the question because I thought that @syranide was objecting that the change might cause churn in over-literal unit tests, even in a major version, and I was wondering if that churn in fact happened when data-reactid changed.

@ericf
Copy link

ericf commented May 25, 2016

FWIW, looking at a macro level CPU profile (bottom-up view), for handling a complete server-side request, I'm seeing escapeTextContentForBrowser towards the bottom and only making up 0.1ms (total time), while renderToString is 128ms (total time). This is on Node 6 with React 14.

Happy to try testing something differently and try to reproduce your results.

@aickin
Copy link
Contributor Author

aickin commented May 26, 2016

@ericf Huh, that's definitely odd.

I just checked in my test setup, and the test functions can be found here. It's not the most elegant code, but it produces reproducible results. You run it with npm install and npm run benchmark.

I used Node v5.7.1 with react@master vs. the various branches in my repo, which I tested by changing the package.json to point to them.

If I may, I'd like to ask a few questions about your setup: What were you rendering? How many nodes did it have, and how big a document did it produce? And what tools did you use to profile the render?

Thanks!

@ericf
Copy link

ericf commented May 26, 2016

@aickin here's some more details on my tests:

I'm using Chrome Dev Tools to --inspect the Node process which is now possible via the Node PR above. I'm using curl to hit the Node process to capture the CPU profile of a single navigation request which renders a full page of the app using React. The profile shape/portions of time I'm seeing on my laptop checking a single request are in line with what we see on our prod servers.

  • Response size: 432 KB (uncompressed)
  • DOM Nodes: 711 (via document.body.querySelectorAll('*') in browser)

@ericf
Copy link

ericf commented May 26, 2016

EDIT: I ran your tests in the master branch, before realizing you had an escape-text-perf branch. I updated the CPU profiles and results based on this branch…


@aickin I ran your tests with my setup and captured a CPU profile of the complete run which you should be able to open (try Chrome Canary):

Results from running your tests with my Node setup:

No Attr No Escape
Mean:    68.6 ms
Std Dev: 12.7 ms

No Attr Escape
Mean:    76.2 ms
Std Dev: 10.9 ms

Attr No Escape
Mean:    97.3 ms
Std Dev: 13.6 ms

Attr Escape
Mean:    114.7 ms
Std Dev: 12.4 ms

In both dev and prod CPU profiles, I'm not seeing escapeTextContentForBrowser bubbling up enough to show up in the Bottom Up view.

@aickin
Copy link
Contributor Author

aickin commented May 26, 2016

Thanks for working on this! (And sorry about the branch confusion...)

In both dev and prod CPU profiles, I'm not seeing escapeTextContentForBrowser bubbling up enough to show up in the Bottom Up view.

The tests always use a minified version of React to get the best perf, so you aren't going to see escapeTextContentForBrowser in there; it's been renamed. I don't really know if it's possible to profile React while it's envified and minified; I'd have to think about it for a bit.

For what it's worth, you're seeing very, very similar times to what I'm seeing on my 2.8GHz mid 2014 MBP.

@aickin
Copy link
Contributor Author

aickin commented May 26, 2016

@ericf: I spent some more poking around with the production env CPU profile that you uploaded, and I have a theory for what's going on. (And apologies in advance for the length of this post... 😊)

First off, I went through react.min.js from react@15.1.0 and found that escapeTextContentForBrowser is minified as function o(e){...} and quoteAttributeValueForBrowser (which calls escapeTextContentForBrowser) is minified as function r(e){...}. Armed with this info, I spelunked through CPU-20160526T021311.cpuprofile, and to my surprise I found the function:

screen shot 2016-05-26 at 10 58 00 am

escapeTextContentForBrowser is called from those two methods (createMarkupForID and _createContentMarkup), so this looks like a perfect match. And given that renderToString (also called o in this profile) takes 17,389.7 ms, it seems like escapeTextContentForBrowser doesn't contribute at all to the overall time.

However, there are a few unexplained quirks here. First, escapeTextContentForBrowser is also called from createMarkupForProperty, and that doesn't show up at all in the callers. Second, both createMarkupForID and createMarkupForProperty take up a significant amount of time, which is odd if escapeTextContentForBrowser isn't a problem:

screen shot 2016-05-26 at 11 07 43 am

screen shot 2016-05-26 at 11 08 32 am

Together those two methods take 3,202.4 ms, or 18.4% of the 17,389.7 ms renderToString total. (Ignore the percentages in those images, as the denominator includes a lot of idle time when renderToString isn't running.)

This was really curious, because createMarkupForID and createMarkupForProperty are both really minimal functions. createMarkupForID, for example, basically just calls into quoteAttributeForBrowser:

createMarkupForID: function(id) {
  return DOMProperty.ID_ATTRIBUTE_NAME + '=' +
    quoteAttributeValueForBrowser(id);
},

How could that be taking up 875ms of self time? Literally all it's doing is looking up an object property and two string concats!

Then I realized: inlining. v8 uses a sampling profiler, and it can't report inlined functions on the call stack, because they aren't actually on the call stack. As a result, escapeTextContentForBrowser is only showing up in the profiler for the few times it's called before it gets inlined.

To test this theory, I wrote a page with the following code and profiled it in Chrome:

function match(matcher, text) {
    if (matcher.test(text)) {
      text.replace(matcher, function() { return "b"; });
    }
}
function run() {
  var text = "asfasdfasdfasdfasdfasdfasdfadsfasdfasdf";
  var matcher = /c/;
  for (var i = 0; i < 1000000; i++) {
    match(matcher, text);
  }
}

This is the profile I got:

screen shot 2016-05-26 at 11 26 00 am

Note that match doesn't appear at all. Then I added a try/catch block to match in order to make it un-inlineable:

function match(matcher, text) {
  try {
    if (matcher.test(text)) {
      text.replace(matcher, function() { return "b"; });
    }
  } catch (e) {
    console.log(e);
  }
}

which gave the following profile:

screen shot 2016-05-26 at 11 29 26 am

Now that the function isn't inlined, we can see that the vast majority of the time is spent in match.

So, I think this explains how you and I could get such a different view of how much time escapeTextContentForBrowser is taking. v8 is inlining the function, making it invisible to the profiler.

Does this make sense? Do you agree that this is what is happening here?

And thanks so much for your work on this; it's been unexpectedly fun to dive into profiling!

@ericf
Copy link

ericf commented May 26, 2016

@aickin yeah that makes sense to explain the differences. So it's likely that escapeHTML is being inlined which takes up all the work of escapeTextContentForBrowser (since the rest is a switch statement) and that's why I'm only seeing it take 0.1ms in my full page server render profile.

@sophiebits
Copy link
Collaborator

Can you try running this on the benchmark in the scripts/bench folder?

@aickin
Copy link
Contributor Author

aickin commented May 31, 2016

@spicyj sure!

I did two things to measure.py to run the benchmark:

  1. Substituted in the full path of jsc because jsc isn't on my path.
  2. Changed measure.py line 145 to trials = 30 so that it would run warm JIT trials
$ ./measure.py react-master.min.js master.txt react-use-escape-html.min.js use-escape-html.txt 
Measuring SSR for PE benchmark (30 trials)
______________________________
..............................
Measuring SSR for PE with warm JIT (30 slow trials)
______________________________
..............................
$ ./analyze.py master.txt use-escape-html.txt 
Comparing master.txt (control) vs use-escape-html.txt (test)
Significant differences marked by ***
% change from control to test, with 99% CIs:

* factory_ms_jsc_jit
    % change:  -0.34% [ -3.36%,  +2.68%]
    means: 16.3161 (control), 16.2621 (test)
* factory_ms_jsc_nojit
    % change:  +0.92% [ -1.94%,  +3.78%]
    means: 15.1536 (control), 15.2938 (test)
* factory_ms_node
    % change:  +0.91% [ -1.11%,  +2.94%]
    means: 48.6843 (control), 49.1311 (test)
* ssr_pe_cold_ms_jsc_jit
    % change:  -1.34% [ -3.70%,  +1.02%]
    means: 44.5324 (control), 43.9364 (test)
* ssr_pe_cold_ms_jsc_nojit
    % change:  -0.43% [ -2.96%,  +2.11%]
    means: 49.4186 (control), 49.2102 (test)
* ssr_pe_cold_ms_node
    % change:  -0.50% [ -2.77%,  +1.78%]
    means: 86.9631 (control), 86.5367 (test)
* ssr_pe_warm_ms_jsc_jit
    % change:  -5.28% [ -6.67%,  -3.89%]  ***
    means: 8.7489 (control), 8.287 (test)
* ssr_pe_warm_ms_jsc_nojit
    % change:  +1.03% [ -0.47%,  +2.52%]
    means: 24.5058 (control), 24.7579 (test)
* ssr_pe_warm_ms_node
    % change:  -4.78% [ -6.51%,  -3.05%]  ***
    means: 25.3768 (control), 24.1645 (test)

Looks to me like there's significant differences in the warm-JIT cases, but not in any of the other cases, which is pretty much what I would expect. It doesn't quite see the scale of improvement that my tests did, but I'm guessing that's because of particularities of the test cases.

@sophiebits
Copy link
Collaborator

How about we copy escape-html here but change it to output the same as our current version? See src/renderers/dom/client/utils/isEventSupported.js for an example of how to do the license.

@sophiebits
Copy link
Collaborator

sophiebits commented Jun 1, 2016

Follow-up: let's use this exact header:

/**
 * Copyright 2016-present, Facebook, Inc.
 * All rights reserved.
 *
 * This source code is licensed under the BSD-style license found in the
 * LICENSE file in the root directory of this source tree. An additional grant
 * of patent rights can be found in the PATENTS file in the same directory.
 * 
 * Based on the escape-html library, which is used under the MIT License below:
 *  
 * Copyright (c) 2012-2013 TJ Holowaychuk
 * Copyright (c) 2015 Andreas Lubbe
 * Copyright (c) 2015 Tiancheng "Timothy" Gu
 *  
 * Permission is hereby granted, free of charge, to any person obtaining
 * a copy of this software and associated documentation files (the
 * 'Software'), to deal in the Software without restriction, including
 * without limitation the rights to use, copy, modify, merge, publish,
 * distribute, sublicense, and/or sell copies of the Software, and to
 * permit persons to whom the Software is furnished to do so, subject to
 * the following conditions:
 *  
 * The above copyright notice and this permission notice shall be
 * included in all copies or substantial portions of the Software.
 *  
 * THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND,
 * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
 * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
 * IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
 * CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
 * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
 * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
 *
 * @providesModule escapeTextContentForBrowser
 */

…. Pulled the code of escape-html in to react and changed the encoding of single quote to &#x27.
@@ -32,7 +108,16 @@ function escaper(match) {
* @return {string} An escaped string.
*/
function escapeTextContentForBrowser(text) {
return ('' + text).replace(ESCAPE_REGEX, escaper);
switch (typeof text) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, can we inline if (typeof text === 'boolean' || typeof text === 'number')? V8, at least, inlines typeof foo === <string> in the bytecode.

@ghost
Copy link

ghost commented Jun 1, 2016

@aickin updated the pull request.

1 similar comment
@ghost
Copy link

ghost commented Jun 2, 2016

@aickin updated the pull request.

@sophiebits sophiebits added this to the 15-next milestone Jun 2, 2016
@sophiebits sophiebits merged commit d6e7058 into facebook:master Jun 2, 2016
@sophiebits
Copy link
Collaborator

Thanks!

zpao pushed a commit to zpao/react that referenced this pull request Jun 8, 2016
…-html (facebook#6862)

* Replacing the implementation of escapeTextContentForBrowser with escape-html for performance

* Addressing @spicyj's code review comment here: facebook#6862 (comment) . Pulled the code of escape-html in to react and changed the encoding of single quote to &#x27.

* Addressing code review comment facebook#6862 (comment) to make code more inlinable for v8. Thanks, @spicyj.

(cherry picked from commit d6e7058)
zpao pushed a commit that referenced this pull request Jun 14, 2016
…-html (#6862)

* Replacing the implementation of escapeTextContentForBrowser with escape-html for performance

* Addressing @spicyj's code review comment here: #6862 (comment) . Pulled the code of escape-html in to react and changed the encoding of single quote to &#x27.

* Addressing code review comment #6862 (comment) to make code more inlinable for v8. Thanks, @spicyj.

(cherry picked from commit d6e7058)
@zpao zpao modified the milestones: 15-next, 15.2.0 Jun 14, 2016
@aickin aickin mentioned this pull request Dec 4, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants