Skip to content

Conversation

syranide
Copy link
Contributor

cc @yungsters, @vjeux, @zpao

After digging around and evaluating, i see the following possible rule changes:

A. escapeTextContentForBrowser ignore " and ', they have no special meaning in text content.
B. quoteAttributeValueForBrowser ignore ', can only be broken out of with " (OWASP).
C. quoteAttributeValueForBrowser ignore < and >, cannot break out of quoted attribute values.


The following safety observations are only guaranteed to hold for React generated markup, it does not hold when markup is introduced via dangerouslySetInnerHTML using different escaping/quoting or post-process manipulated.

Markup as a string in inline scripts
Proper escaping: JSON stringify + replace </script with <\/script.

With current rules (if no encoding):

  • Leads to XSS if inside a '-string if there are legitimate occurrences of </script> (breaks layout during load).
  • Leads to XSS if inside a "-string (throws error on load if markup has a quoted attribute value).

With rules A+B:

  • Leads to XSS if inside a '-string (throws error on load if markup has a ').

With rules A+B+C:

  • Leads to XSS if </script> is used as an attribute value (only observed if actively tested for).

Markup within a HTML comment
Proper encoding: HTML encode

With current rules (if no encoding):

  • Leads to XSS if there are legitimate occurrences of <!-- --> (breaks layout during load).
  • Leads to XSS if comment is evaluted by a library (likely to throw error or break layout during load).

With rules A+B:

  • (nothing new)

With rules A+B+C:

  • Leads to XSS if --> is used as an attribute value (only observed if actively tested for).

The ruleset A+B is ever slightly more exploitable in the case of markup as a string within an inline script without proper encoding, but the flip-side benefit is that the lack of proper escaping is much more likely to be observed during development. I find this to be an acceptable trade-off and it's a dangerous situation to be in with or without the new rules so having a chance to catch it earlier is for the better.

So while the ruleset A+B+C is safe for HTML rendering it consistently elevates likely (relatively) minor safety issues to full-blown XSS and without increasing the chances of the lack of proper escaping being observed at development, although these are errors on behalf of the user and technically not our concern. This seems like a dangerous step that is not worth taking lightly, considering knowledge of proper escaping is far less common than it should be. < and > are also rarely used in attribute values so it would also have little practical impact. It would be easy to fix </script and --> but there's always the question of what else.

tl;dr I'm confident ruleset reduction A+B is, all things considered, as safe and perhaps even preferable due to earlier detection. This PR implements A+B.


document.body.innerHTML = '<div></div>';
document.body.firstChild.setAttribute('attr', '<>\'"&/'); // quoteAttributeValueForBrowser
document.body.firstChild.style.content = '\'<>\\\'"&/\''; // quoteAttributeValueForBrowser
document.body.firstChild.textContent = '<>\'"&/'; // escapeTextContentForBrowser
<div
  attr="<>'&quot;&amp;/"
  style="content: '<>\'&quot;&amp;/';">
  &lt;&gt;'"&amp;/
</div>

@yukinying
Copy link

There are more optimizations can be done that is safe according to html specifications.

1 - For text content, the only characters that would change a browser state from text (DATA) to other state are < and &. Here are the references in the specification:

Thus keeping > uncooked in escapeTextContentForBrowser does not cause any context state change.

@yukinying
Copy link

2 - For text content, & character would lead to character entity processing. The processing would only consume #, alphanumeric characters and \n. Here are the reference in the specification:

Thus escaping & would always result in double escaping. Keeping it intact would not cause any HTML context state change in escapeTextContentForBrowser.

Side note: Netscape historically supported a syntax called Javascript entities in the form of &{...} that javascript would be executed when one put it in HTML attribute. Yet this syntax is has been deprecated for long ago. Here are the references:

@yukinying
Copy link

3 - For attribute that is double quoted, the characters that would cause state changes are " and &. This is described in the following sections:

Thus escaping '<' is unnecessary. The logic of & follows the same as in text content thus it is not necessary to escape it as well.

@yukinying
Copy link

4 - Yet I would suggest considering escaping the \0 null byte. While the HTML5 specification clearly indicates that null byte would be replaced with replacement character \uFFFD and cause no changes in states for text (DATA) or attribute, it has special effect on other states.

@yukinying
Copy link

A bit background on the above suggestions: I have been studying the unsafe usage pattern in React in my company and public community. I identified a few scenarios that developers would just want to put html entities in an span element.

E.g. in https://github.com/JedWatson/react-select/blob/36a81c857b77fcc4be4bdc0e63eb5268061b9f41/src/Select.js#L549

<span className="Select-clear" ... dangerouslySetInnerHTML={{ __html: '&times;' }} />

We believe that we should not train developers to use unsafe practices in legitimate cases as it will reduce their security awareness. It appears that there is a growing trend of using dangerouslySetInnerHTML for cases that should not be. It would be great if we could stop this earlier than later.

Thanks!

@vjeux
Copy link
Contributor

vjeux commented Mar 6, 2015

 <span className="Select-clear" ... dangerouslySetInnerHTML={{ __html: '&times;' }} />

You can write

<span>&times;</span>

It's even in the docs: http://facebook.github.io/react/docs/jsx-gotchas.html#html-entities

Demo: http://jsfiddle.net/z2rkqL6g/

Does it solve your issue? Or do you still want to do some modification to React escaping strategy?

@yukinying
Copy link

Thanks @vjeux , it should solve the issue for some developers. (I don't have that issue :) ).

I have also got in touch with the Facebook product security team and we figured out that React does not distinguish RAWTEXT or RCDATA nodes (e.g. title, textarea). That means React would allow some children node to be added to RAWTEXT or RCDATA nodes, and it would introduce vulnerability if < is not escaped even it is in an attribute state.

Also it appears that React assumes data are always come as unsanitized, and leaving ampersand uncooked would break some flow. For consistency reason, they recommend "uncook" the data and let React "cook" it again if the developers cannot change the server side sanitization flow.

I think, for consistency reason, I can take my comment back, and provide pull requests that 1) actually handle RAWTEXT and RCDATA nodes logic correctly (for '<' case) .

@gaearon
Copy link
Collaborator

gaearon commented Mar 26, 2016

This has been stale for quite some time. While this is important to get right, PRs tend to age worse than issues. We’re going to try an RFC-based approach to improvements in the future so features don’t get implemented unless there is a consensus on the approach and that it is high enough priority to be shipped.

I reopened #3879 as we are trying to hold discussions about intent in the issues now. Let’s keep track of this work there, and revisit if this is important to get it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants