escapeTextContentForBrowser no longer escapes ' and ", quoteAttributeValueForBrowser no longer escapes ' #3152
There are more optimizations that can be done safely according to the HTML specification.
1 - For text content, the only characters that would change the browser state from text (DATA) to another state are
Thus keeping
2 - For text content,
Thus escaping
Side note: Netscape historically supported a syntax called JavaScript entities in the form of
3 - For an attribute value that is double-quoted, the characters that would cause state changes are
Thus escaping '<' is unnecessary. The logic of
4 - Yet I would suggest considering escaping the
A bit of background on the above suggestions: I have been studying unsafe usage patterns in React at my company and in the public community. I identified a few scenarios in which developers just want to put HTML entities in a span element.
We believe that we should not train developers to use unsafe practices in legitimate cases, as it will reduce their security awareness. It appears that there is a growing trend of using
Thanks!
You can write
It's even in the docs: http://facebook.github.io/react/docs/jsx-gotchas.html#html-entities
Demo: http://jsfiddle.net/z2rkqL6g/
Does it solve your issue? Or do you still want to modify React's escaping strategy?
Thanks @vjeux, it should solve the issue for some developers. (I don't have that issue :) ). I have also been in touch with the Facebook product security team, and we figured out that React does not distinguish RAWTEXT or RCDATA nodes (e.g. title, textarea). That means React would allow child nodes to be added to RAWTEXT or RCDATA nodes, and it would introduce a vulnerability if
Also, it appears that React assumes data always arrives unsanitized, and leaving the ampersand uncooked would break some flows. For consistency, they recommend "uncooking" the data and letting React "cook" it again if developers cannot change the server-side sanitization flow. I think, for consistency, I can take my comment back and provide pull requests that 1) actually handle the RAWTEXT and RCDATA node logic correctly (for the '<' case).
This has been stale for quite some time. While this is important to get right, PRs tend to age worse than issues. We’re going to try an RFC-based approach to improvements in the future so features don’t get implemented unless there is a consensus on the approach and that it is high enough priority to be shipped. I reopened #3879 as we are trying to hold discussions about intent in the issues now. Let’s keep track of this work there, and revisit if this is important to get it.
cc @yungsters, @vjeux, @zpao
After digging around and evaluating, I see the following possible rule changes:

A. `escapeTextContentForBrowser` ignore `"` and `'`, they have no special meaning in text content.
B. `quoteAttributeValueForBrowser` ignore `'`, can only be broken out of with `"` (OWASP).
C. `quoteAttributeValueForBrowser` ignore `<` and `>`, cannot break out of quoted attribute values.

The following safety observations are only guaranteed to hold for React generated markup, it does not hold when markup is introduced via `dangerouslySetInnerHTML` using different escaping/quoting or post-process manipulated.

Markup as a string in inline scripts
Proper escaping: JSON stringify + replace `</script` with `<\/script`.

With current rules (if no encoding):
- `'`-string if there are legitimate occurrences of `</script>` (breaks layout during load).
- `"`-string (throws error on load if markup has a quoted attribute value).

With rules A+B:
- `'`-string (throws error on load if markup has a `'`).

With rules A+B+C:
- `</script>` is used as an attribute value (only observed if actively tested for).

Markup within a HTML comment
Proper encoding: HTML encode

With current rules (if no encoding):
- `<!-- -->` (breaks layout during load).

With rules A+B:

With rules A+B+C:
- `-->` is used as an attribute value (only observed if actively tested for).

The ruleset A+B is ever so slightly more exploitable in the case of markup as a string within an inline script without proper encoding, but the flip-side benefit is that the lack of proper escaping is much more likely to be observed during development. I find this to be an acceptable trade-off: it's a dangerous situation to be in with or without the new rules, so having a chance to catch it earlier is for the better.
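The "proper escaping" named for the inline-script case (JSON stringify, then replace `</script` with `<\/script`) can be sketched roughly as follows. This is only an illustration: the helper name `serializeMarkupForInlineScript` is hypothetical and not part of React, and matching the tag name case-insensitively is an added assumption, based on the HTML parser treating end-tag names case-insensitively.

```javascript
// Hypothetical helper (not a React API): turn a markup string into a
// JavaScript string literal that can be placed inside an inline <script>.
function serializeMarkupForInlineScript(markup) {
  // JSON.stringify produces a double-quoted literal with quotes,
  // backslashes and control characters escaped.
  const literal = JSON.stringify(markup);
  // Rewrite every "</script" as "<\/script" so the HTML parser cannot end
  // the script element early. Inside a JS string, "\/" still means "/".
  // The capture group keeps the original letter case of "script".
  return literal.replace(/<\/(script)/gi, '<\\/$1');
}

const markup = '<p title="a">it\'s</p><script>x()</script>';
const literal = serializeMarkupForInlineScript(markup);
// The literal can be emitted as: <script>var html = LITERAL;</script>
console.log(literal);
```

Because `\/` is also a valid JSON escape, the resulting literal still round-trips through `JSON.parse` back to the original markup.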
So while the ruleset A+B+C is safe for HTML rendering, it consistently elevates likely (relatively) minor safety issues to full-blown XSS without increasing the chances of the lack of proper escaping being observed during development, although these are errors on behalf of the user and technically not our concern. This seems like a dangerous step that should not be taken lightly, considering knowledge of proper escaping is far less common than it should be. `<` and `>` are also rarely used in attribute values, so it would also have little practical impact. It would be easy to fix `</script` and `-->`, but there's always the question of what else.

tl;dr I'm confident ruleset reduction A+B is, all things considered, as safe and perhaps even preferable due to earlier detection. This PR implements A+B.
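For illustration, the A+B ruleset might look something like the sketch below. It is not React's actual source: the function names carry a `Sketch` suffix to make that clear, and the set of characters that stay escaped (`&`, `<`, `>` in text, plus `"` in attribute values) is an assumption inferred from rules A and B above.

```javascript
// Rule A: in text content only &, < and > are escaped; " and ' pass through.
const TEXT_ESCAPES = { '&': '&amp;', '<': '&lt;', '>': '&gt;' };
// Rule B: attribute values are wrapped in double quotes, so " must still be
// escaped, while ' passes through. < and > stay escaped (rule C not applied).
const ATTR_ESCAPES = { ...TEXT_ESCAPES, '"': '&quot;' };

// Hypothetical stand-in for escapeTextContentForBrowser under rule A.
function escapeTextContentSketch(text) {
  return String(text).replace(/[&<>]/g, (ch) => TEXT_ESCAPES[ch]);
}

// Hypothetical stand-in for quoteAttributeValueForBrowser under rule B.
function quoteAttributeValueSketch(value) {
  const escaped = String(value).replace(/[&<>"]/g, (ch) => ATTR_ESCAPES[ch]);
  return '"' + escaped + '"';
}

console.log(escapeTextContentSketch('a < b & "c" \'d\''));
console.log(quoteAttributeValueSketch('it\'s "x" </script>'));
```

Since `<` and `>` are still escaped in attribute values here, a literal `</script>` or `-->` in a value never appears verbatim in the generated markup, which is the property that rule C would give up.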