escapeTextContentForBrowser no longer escapes ' and ", quoteAttributeValueForBrowser no longer escapes ' #3152

syranide · 2015-02-14T14:59:37Z

After digging around and evaluating, i see the following possible rule changes:

A. escapeTextContentForBrowser ignore " and ', they have no special meaning in text content.
B. quoteAttributeValueForBrowser ignore ', can only be broken out of with " (OWASP).
C. quoteAttributeValueForBrowser ignore < and >, cannot break out of quoted attribute values.

The following safety observations are only guaranteed to hold for React generated markup, it does not hold when markup is introduced via dangerouslySetInnerHTML using different escaping/quoting or post-process manipulated.

Markup as a string in inline scripts
Proper escaping: JSON stringify + replace </script with <\/script.

With current rules (if no encoding):

Leads to XSS if inside a '-string if there are legitimate occurrences of </script> (breaks layout during load).
Leads to XSS if inside a "-string (throws error on load if markup has a quoted attribute value).

With rules A+B:

Leads to XSS if inside a '-string (throws error on load if markup has a ').

With rules A+B+C:

Leads to XSS if </script> is used as an attribute value (only observed if actively tested for).

Markup within a HTML comment
Proper encoding: HTML encode

With current rules (if no encoding):

Leads to XSS if there are legitimate occurrences of  (breaks layout during load).
Leads to XSS if comment is evaluted by a library (likely to throw error or break layout during load).

With rules A+B:

(nothing new)

With rules A+B+C:

Leads to XSS if --> is used as an attribute value (only observed if actively tested for).

The ruleset A+B is ever slightly more exploitable in the case of markup as a string within an inline script without proper encoding, but the flip-side benefit is that the lack of proper escaping is much more likely to be observed during development. I find this to be an acceptable trade-off and it's a dangerous situation to be in with or without the new rules so having a chance to catch it earlier is for the better.

So while the ruleset A+B+C is safe for HTML rendering it consistently elevates likely (relatively) minor safety issues to full-blown XSS and without increasing the chances of the lack of proper escaping being observed at development, although these are errors on behalf of the user and technically not our concern. This seems like a dangerous step that is not worth taking lightly, considering knowledge of proper escaping is far less common than it should be. < and > are also rarely used in attribute values so it would also have little practical impact. It would be easy to fix </script and --> but there's always the question of what else.

tl;dr I'm confident ruleset reduction A+B is, all things considered, as safe and perhaps even preferable due to earlier detection. This PR implements A+B.

document.body.innerHTML = '<div></div>';
document.body.firstChild.setAttribute('attr', '<>\'"&/'); // quoteAttributeValueForBrowser
document.body.firstChild.style.content = '\'<>\\\'"&/\''; // quoteAttributeValueForBrowser
document.body.firstChild.textContent = '<>\'"&/'; // escapeTextContentForBrowser

<div
  attr="<>'&quot;&amp;/"
  style="content: '<>\'&quot;&amp;/';">
  &lt;&gt;'"&amp;/
</div>

…ValueForBrowser no longer escapes '

yukinying · 2015-02-25T18:09:24Z

There are more optimizations can be done that is safe according to html specifications.

1 - For text content, the only characters that would change a browser state from text (DATA) to other state are < and &. Here are the references in the specification:

HTML5: as described in W3C HTML5 TR section 8.2.4.1 DATA state
HTML4: follows the SGML syntax
- > is used in MDC, PIC and TAGC. All of them are processed in non DATA (INITIAL) state.

Thus keeping > uncooked in escapeTextContentForBrowser does not cause any context state change.

yukinying · 2015-02-25T18:25:54Z

2 - For text content, & character would lead to character entity processing. The processing would only consume #, alphanumeric characters and \n. Here are the reference in the specification:

HTML5: W3C HTML5 TR section 8.2.4.2 Character reference in data state
HTML4: SGML syntax
- & is used in CRO and ERO only, follows with name, name_token, number and number_token. None of those contain < or &.

Thus escaping & would always result in double escaping. Keeping it intact would not cause any HTML context state change in escapeTextContentForBrowser.

Side note: Netscape historically supported a syntax called Javascript entities in the form of &{...} that javascript would be executed when one put it in HTML attribute. Yet this syntax is has been deprecated for long ago. Here are the references:

HTML: The Definitive Guide, 1997 edition
IE ends the support of such since IE 4.0 KB 190280

yukinying · 2015-02-25T18:31:02Z

3 - For attribute that is double quoted, the characters that would cause state changes are " and &. This is described in the following sections:

W3C HTML5 TR section 8.2.4.38 Attribute value (double-quoted) state
HTML4 SGML syntax
- 7.9.3 Attribute Value Literal - literal ({LIT}[^\"]*{LIT})|({LITA}[^\']*{LITA})

Thus escaping '<' is unnecessary. The logic of & follows the same as in text content thus it is not necessary to escape it as well.

yukinying · 2015-02-25T18:36:35Z

4 - Yet I would suggest considering escaping the \0 null byte. While the HTML5 specification clearly indicates that null byte would be replaced with replacement character \uFFFD and cause no changes in states for text (DATA) or attribute, it has special effect on other states.

yukinying · 2015-02-25T18:44:27Z

A bit background on the above suggestions: I have been studying the unsafe usage pattern in React in my company and public community. I identified a few scenarios that developers would just want to put html entities in an span element.

E.g. in https://github.com/JedWatson/react-select/blob/36a81c857b77fcc4be4bdc0e63eb5268061b9f41/src/Select.js#L549

<span className="Select-clear" ... dangerouslySetInnerHTML={{ __html: '&times;' }} />

We believe that we should not train developers to use unsafe practices in legitimate cases as it will reduce their security awareness. It appears that there is a growing trend of using dangerouslySetInnerHTML for cases that should not be. It would be great if we could stop this earlier than later.

Thanks!

vjeux · 2015-03-06T17:28:58Z

 <span className="Select-clear" ... dangerouslySetInnerHTML={{ __html: '&times;' }} />

You can write

<span>&times;</span>

It's even in the docs: http://facebook.github.io/react/docs/jsx-gotchas.html#html-entities

Demo: http://jsfiddle.net/z2rkqL6g/

Does it solve your issue? Or do you still want to do some modification to React escaping strategy?

yukinying · 2015-03-12T22:18:00Z

Thanks @vjeux , it should solve the issue for some developers. (I don't have that issue :) ).

I have also got in touch with the Facebook product security team and we figured out that React does not distinguish RAWTEXT or RCDATA nodes (e.g. title, textarea). That means React would allow some children node to be added to RAWTEXT or RCDATA nodes, and it would introduce vulnerability if < is not escaped even it is in an attribute state.

Also it appears that React assumes data are always come as unsanitized, and leaving ampersand uncooked would break some flow. For consistency reason, they recommend "uncook" the data and let React "cook" it again if the developers cannot change the server side sanitization flow.

I think, for consistency reason, I can take my comment back, and provide pull requests that 1) actually handle RAWTEXT and RCDATA nodes logic correctly (for '<' case) .

gaearon · 2016-03-26T20:27:12Z

This has been stale for quite some time. While this is important to get right, PRs tend to age worse than issues. We’re going to try an RFC-based approach to improvements in the future so features don’t get implemented unless there is a consensus on the approach and that it is high enough priority to be shipped.

I reopened #3879 as we are trying to hold discussions about intent in the issues now. Let’s keep track of this work there, and revisit if this is important to get it.

escapeTextContentForBrowser no longer escapes ' and ", quoteAttribute…

bcdcd82

…ValueForBrowser no longer escapes '

syranide added the GH Review: review-needed label Feb 14, 2015

facebook-github-bot added the CLA Signed label Apr 7, 2015

zpao mentioned this pull request May 14, 2015

Unnecessary encoding of element text content #3879

Closed

vlinder mentioned this pull request Aug 12, 2015

React renders differently on server and client. #4616

Closed

sebmarkbage added the HTML label Feb 27, 2016

gaearon closed this Mar 26, 2016

gaearon added the Resolution: Unsolved label Mar 26, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

escapeTextContentForBrowser no longer escapes ' and ", quoteAttributeValueForBrowser no longer escapes ' #3152

escapeTextContentForBrowser no longer escapes ' and ", quoteAttributeValueForBrowser no longer escapes ' #3152

Uh oh!

syranide commented Feb 14, 2015

Uh oh!

yukinying commented Feb 25, 2015

Uh oh!

yukinying commented Feb 25, 2015

Uh oh!

yukinying commented Feb 25, 2015

Uh oh!

yukinying commented Feb 25, 2015

Uh oh!

yukinying commented Feb 25, 2015

Uh oh!

vjeux commented Mar 6, 2015

Uh oh!

yukinying commented Mar 12, 2015

Uh oh!

gaearon commented Mar 26, 2016

Uh oh!

Uh oh!

escapeTextContentForBrowser no longer escapes ' and ", quoteAttributeValueForBrowser no longer escapes ' #3152

escapeTextContentForBrowser no longer escapes ' and ", quoteAttributeValueForBrowser no longer escapes ' #3152

Uh oh!

Conversation

syranide commented Feb 14, 2015

Uh oh!

yukinying commented Feb 25, 2015

Uh oh!

yukinying commented Feb 25, 2015

Uh oh!

yukinying commented Feb 25, 2015

Uh oh!

yukinying commented Feb 25, 2015

Uh oh!

yukinying commented Feb 25, 2015

Uh oh!

vjeux commented Mar 6, 2015

Uh oh!

yukinying commented Mar 12, 2015

Uh oh!

gaearon commented Mar 26, 2016

Uh oh!

Uh oh!