JSON escape rendered shoebox content #85
Once rendering is completed, the contents of the shoebox are converted into a string of JSON and inserted into the HTML output that is sent back to the browser. However, because JSON and HTML content are mixed, there is the potential for security vulnerabilities. Specifically, if an attacker can cause an application to place user-generated content into the shoebox, that content could trick the browser into thinking script parsing had ended, and evaluate arbitrary code in the origin of the host.

For example, if an untrusted user could supply an article with the title of `</script><script>alert("owned")</script>`, the naive interpolation of that into the shoebox might look like:

```html
<script type="fastboot/shoebox" id="shoebox-article">
{"article":{"title":"</script><script>alert("owned")</script>"}}
</script>
```

In this case, the browser would interpret the `</script>` inside the JSON string as a real closing `script` tag, and thus would allow the attacker's code to execute in the application's origin ("XSS").

Upon examining the HTML5 parser specification, [we can observe that there is one, and only one, way to exit the "script data" state][spec]: the appearance of a `<` character, which moves the state machine into the "script data less-than sign state". From the "script data less-than sign state", the parser must traverse several more states, and it must create a temporary buffer, before it can recognize an end tag.

[spec]: https://www.w3.org/TR/html5/syntax.html#script-data-state

Thus we can conclude that the simplest, most effective way to prevent inadvertent end-of-script situations is to prevent the `<` character from ever appearing in shoebox content. If you never leave the "script data" state, you can feel fairly certain that you have prevented this particular vector of XSS attacks. The good news is that this is easily accomplished.
Both the JavaScript specification and the JSON specification allow for [Unicode escape sequences](https://mathiasbynens.be/notes/javascript-escapes#unicode). Before insertion into the HTML document, we can replace characters that could be ambiguous to the HTML parser with Unicode escape sequences. To the JSON or JavaScript parser these are no different from the unescaped values, but they give us a high degree of confidence that the HTML parser will not attempt to treat them as anything other than script data.

This commit Unicode escapes the following characters:

* `<` and `>`, to prevent ambiguity with opening and closing tags.
* `&`, to prevent ambiguity with HTML entities.
* `\u2028` and `\u2029`, the Unicode line and paragraph separators, which the JSON parser and JavaScript parser treat differently and thus can lead to mismatched data if JavaScript is used as the JSON parser.
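A minimal sketch of this escaping step in JavaScript, assuming the shoebox data has already been serialized with `JSON.stringify`. The helper name `escapeShoeboxJSON` and its lookup table are illustrative and are not necessarily what this PR's diff uses. Because `<`, `>` and `&` can only occur inside JSON string literals, swapping each one for its `\uXXXX` form leaves the parsed value unchanged.

```javascript
// Hypothetical helper: the name is illustrative, not taken from the fastboot codebase.
// Each character that could be ambiguous to the HTML parser maps to the
// equivalent JSON/JavaScript Unicode escape sequence.
const ESCAPE_LOOKUP = {
  "&": "\\u0026",
  ">": "\\u003e",
  "<": "\\u003c",
  "\u2028": "\\u2028",
  "\u2029": "\\u2029",
};

const ESCAPE_REGEX = /[&><\u2028\u2029]/g;

function escapeShoeboxJSON(json) {
  return json.replace(ESCAPE_REGEX, (ch) => ESCAPE_LOOKUP[ch]);
}

// Serialize the untrusted data, then escape it before interpolating into HTML.
const data = { article: { title: '</script><script>alert("owned")</script>' } };
const escaped = escapeShoeboxJSON(JSON.stringify(data));

const html =
  '<script type="fastboot/shoebox" id="shoebox-article">' + escaped + "</script>";

console.log(escaped);
// {"article":{"title":"\u003c/script\u003e\u003cscript\u003ealert(\"owned\")\u003c/script\u003e"}}
```

With the escaped payload, the only `</script>` left in `html` is the one that closes the shoebox element.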
I have a son, he's 10 years old. He has computers, he's so good with these computers it's unbelievable. The security aspect of cyber is very very tough. And maybe it's hardly doable. But I think you have solved it @tomdale.
LGTM
LGTM as well
👍 from me.
@tomdale FWIW, I think escaping … Should we be unit-testing this, for example by actually passing the escaped string into `JSON.parse` and asserting that it comes out the same? We can probably just port these tests over from Rails: https://github.com/rails/rails/blob/b326e82dc012d81e9698cb1f402502af1788c1e9/actionview/test/template/erb_util_test.rb#L46-L62
This is correct. It doesn’t make sense to include U+2028 and U+2029 in this patch.
The same can be said for any symbol that is not a printable ASCII symbol. I use jsesc anywhere I generate JSON output at build time for this reason. In my experience, though, U+2028 and U+2029 are very rarely used; other unusual non-ASCII characters are much more common.
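jsesc handles this as a maintained library. The function below is a hand-rolled stand-in, not jsesc's API, that sketches the broader approach described in this comment: after `JSON.stringify`, escape every UTF-16 code unit outside printable ASCII (0x20 to 0x7E), plus `<`, `>` and `&`, as `\uXXXX`. It assumes compact JSON (no indentation argument), so every such character sits inside a string literal, where a `\u` escape is valid.

```javascript
// Hand-rolled sketch, not jsesc's API. Assumes compact JSON.stringify output,
// so non-ASCII characters can only occur inside string literals.
function escapeToASCIIJSON(json) {
  // Without the `u` flag the character class matches single UTF-16 code units,
  // so an astral symbol becomes two escaped surrogate halves.
  return json.replace(/[^\x20-\x7e]|[<>&]/g, (ch) =>
    "\\u" + ch.charCodeAt(0).toString(16).padStart(4, "0")
  );
}

const value = { title: "caf\u00e9 \u2028 \u{1F600} </script>" };
const escaped = escapeToASCIIJSON(JSON.stringify(value));

console.log(escaped);
// {"title":"caf\u00e9 \u2028 \ud83d\ude00 \u003c/script\u003e"}

// JSON.parse recombines the escaped surrogate pair into the original emoji.
console.log(JSON.parse(escaped).title === value.title); // true
```

The trade-off is a slightly larger payload in exchange for output that is pure ASCII, which sidesteps both the HTML ambiguity and any encoding mismatch between server and browser.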
FWIW, I’ve looked into this problem before, and wrote down my findings + the escaping requirements for each solution: “Hiding JSON-formatted data in the DOM with CSP enabled”.
From ember-fastboot/ember-cli-fastboot#275
I replied to the last comment in detail in ember-fastboot/ember-cli-fastboot#275. I also created this demo to show that applying this escaping before writing to the DOM does not affect the resulting parsed values when they are read back (as mentioned in this PR's description).
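The same point can be checked without a browser using a deliberately simplified model of how the HTML tokenizer ends a `<script>` element: its text runs up to the first `</script` (case-insensitive). The real script data states are more involved, and `renderShoebox`, `extractScriptText` and `escapeShoeboxJSON` are hypothetical helpers. Assuming the client reads the element's text and passes it to `JSON.parse`, naive interpolation cuts the payload off inside the JSON string, while the escaped payload survives intact and parses back to the original title.

```javascript
// Hypothetical escaping helper covering the characters listed in this PR.
const ESCAPE_LOOKUP = {
  "&": "\\u0026", ">": "\\u003e", "<": "\\u003c",
  "\u2028": "\\u2028", "\u2029": "\\u2029",
};
function escapeShoeboxJSON(json) {
  return json.replace(/[&><\u2028\u2029]/g, (ch) => ESCAPE_LOOKUP[ch]);
}

// Hypothetical renderer that interpolates a JSON string into a shoebox element.
function renderShoebox(id, json) {
  return `<script type="fastboot/shoebox" id="shoebox-${id}">${json}</script>`;
}

// Simplified tokenizer model: the element's text is everything between the end
// of the opening tag and the first "</script" (case-insensitive).
function extractScriptText(html) {
  const start = html.indexOf(">") + 1;
  const closeTag = /<\/script/gi;
  closeTag.lastIndex = start;
  const match = closeTag.exec(html);
  return html.slice(start, match ? match.index : html.length);
}

const data = { article: { title: '</script><script>alert("owned")</script>' } };
const json = JSON.stringify(data);

// Naive interpolation: the element ends inside the JSON string.
const naiveText = extractScriptText(renderShoebox("article", json));
console.log(naiveText); // {"article":{"title":"

// Escaped interpolation: the whole payload stays inside the element.
const safeText = extractScriptText(renderShoebox("article", escapeShoeboxJSON(json)));
console.log(JSON.parse(safeText).article.title === data.article.title); // true
```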
@tomdale What is blocking this? It seems to close a fairly important gap, and I'd like to land it ASAP if there are no objections...
@rwjblue This is high priority, and I agree it should have been merged already. There was some debate about whether or not we need to escape
I just downloaded the latest versions and I'm still getting incorrectly escaped JSON. I'm using |
@aexmachina Looks like a new release hasn't been cut yet for this package or for |
This PR is based on @pwfisher's #79 but uses an updated approach that the core team feels is more robust to potential attack vectors.