JSON escape rendered shoebox content #85

tomdale · 2016-09-30T14:15:04Z

Once rendering is completed, the contents of the shoebox is converted into a string of JSON and inserted into the HTML output that is sent back to the browser.

However, because JSON and HTML content are mixed, there is the potential for security vulnerabilities. Specifically, if an attacker can cause an application to place user-generated content into the shoebox, that content could trick the browser into thinking script parsing had ended, and evaluate arbitrary code in the origin of the host.

For example, if an untrusted user could supply an article with the title of </script><script>alert("owned")</script>, the naive interpolation of that into the shoebox might look like:

<script type="fastboot/shoebox" id="shoebox-article">
{"article":{"title":"</script><script>alert("owned")</script>"}}
</script>

In this case, the browser would interpret the </script> inside the JSON string as a real closing script tag, and thus would allow the attacker's code to execute in the application's origin ("XSS").

Upon examining the HTML5 parser specification, we can observe that there is one, and only one, way to exit the "script data" state: the existence of a < character, which moves the state machine into the "script data less-than sign state". From the "script data less-than sign state", there are several more states that can be traversed through, and it requires the creation of a temporary buffer.

Thus we can conclude that the simplest, most effective way to prevent inadvertent end-of-script situations is to prevent the < character from ever appearing in shoebox content. If you never leave the "script data" state, you can feel fairly certain that you have prevented this particular vector of XSS attacks.

The good news is that this is easily accomplished. Both the JavaScript specification and the JSON specification allow for Unicode escape sequences.

Before insertion into the HTML document, we can replace characters that could be ambiguous to the HTML parser and replace them with Unicode escape sequences. These are no different from the unescaped values to the eyes of the JSON or JavaScript parser, but give us a high degree of confidence that the HTML parser will not attempt to treat them as anything other than script data.

This commit Unicode escapes the following characters:

< and >, to prevent ambiguity with opening and closing tags.
&, to prevent ambiguity with HTML entities.
\u2028 and \u2029, Unicode line/paragraph separators, which the JSON parser and JavaScript parser treat differently and thus can lead to mismatched data if JavaScript is used as the JSON parser.

This PR is based on @pwfisher's #79 but uses an updated approach that the core team feels is more robust to potential attack vectors.

Once rendering is completed, the contents of the shoebox is converted into a string of JSON and inserted into the HTML output that is sent back to the browser. However, because JSON and HTML content are mixed, there is the potential for security vulnerabilities. Specifically, if an attacker can cause an application to place user-generated content into the shoebox, that content could trick the browser into thinking JSON parsing had ended, and evaluate arbitrary code in the origin of the host. For example, if an untrusted user could supply an article with the title of `</script><script>alert("owned")</script>`, the naive interpolation of that into the shoebox might look like: ```html <script type="fastboot/shoebox" id="shoebox-article"> {"article":{"title":"</script><script>alert("owned")</script>"}} </script> ``` In this case, the browser would interpret the `</script>` inside the JSON string as a real closing `script` tag, and thus would allow the attacker's code to execute in the application's origin ("XSS"). Upon examining the HTML5 parser specification, [we can observe that there is one, and only one, way to exit the "script data" state][spec]: the existence of a `<` character, which moves the state machine into the "script data less-than sign state". From the "script data less-than sign state", there are several more states that can be traversed through, and it requires the creation of a temporary buffer. [spec]: https://www.w3.org/TR/html5/syntax.html#script-data-state Thus we can conclude that the simplest, most effective way to prevent inadvertent end-of-script situations is to prevent the `<` character from ever appearing in shoebox content. If you never leave the "script data" state, you can feel fairly certain that you have prevented this particular vector of XSS attacks. The good news is that this is easily accomplished. Both the JavaScript specification and the JSON specification allow for [Unicode escape sequences](https://mathiasbynens.be/notes/javascript-escapes#unicode). Before insertion into the HTML document, we can replace characters that could be ambiguous to the HTML parser and replace them with Unicode escape sequences. These are no different from the unescaped values to the eyes of the JSON or JavaScript parser, but give us a high degree of confidence that the HTML parser will not attempt to treat them as anything other than script data. This commit Unicode escapes the following characters: * `<` and `>`, to prevent ambiguity with opening and closing tags. * `&`, to prevent ambiguity with HTML entities. * `\u2028` and `\u2029`, Unicode line/paragraph separators, which the JSON parser and JavaScript parser treat differently and thus can lead to mismatched data if JavaScript is used as the JSON parser.

habdelra · 2016-09-30T14:25:48Z

I have a son, he's 10 years old. He has computers, he's so good with these computers it's unbelievable. The security aspect of cyber is very very tough. And maybe it's hardly doable. But I think you have solved it @tomdale.

rwjblue

LGTM

danmcclain

LGTM as well

arjansingh

👍 from me.

chancancode · 2016-09-30T23:57:27Z

@tomdale FWIW, I think escaping \u2028 and \u2029 only matters if you eval the content (as opposed to passing it into JSON.parse), but it probably have the pleasant side-effect of making things less confusing looking in the HTML.

Should we be unit-testing this, like actually passing the string into JSON.parse and assert that it comes out the same? We can probably just port these tests from Rails over: https://github.com/rails/rails/blob/b326e82dc012d81e9698cb1f402502af1788c1e9/actionview/test/template/erb_util_test.rb#L46-L62

mathiasbynens · 2016-10-14T12:40:25Z

@chancancode

FWIW, I think escaping \u2028 and \u2029 only matters if you eval the content (as opposed to passing it into JSON.parse)

This is correct. It doesn’t make sense to include U+2028 and U+2029 in this patch.

, but it probably have the pleasant side-effect of making things less confusing looking in the HTML.

The same can be said for any symbol that is not a printable ASCII symbol. I use jsesc anywhere I generate JSON output at build time for this reason.

In my experience, U+2028 and U+2029 are very rarely used though — other weird non-ASCII characters are much more common.

mathiasbynens · 2016-10-14T12:44:19Z

FWIW, I’ve looked into this problem before, and wrote down my findings + the escaping requirements for each solution: “Hiding JSON-formatted data in the DOM with CSP enabled”.

danmcclain · 2016-10-14T13:05:54Z

From ember-fastboot/ember-cli-fastboot#275

The Result of a Fastboot request gets serialized by the SimpleDOM.HTMLSerializer before it is sent down to the client (https://github.com/ember-fastboot/fastboot/blob/master/src/result.js). This happens after the Shoebox is appended to the document, and has the unintended consequence of encoding any characters in the shoebox data that happen to be HTML entities. This includes characters such as &, <, >, ", `, and ', which are encoded into &, <, >, ", ` and ', respectively.

This results in a data payload fetched on the server and stored in the Shoebox to be different than the same data payload fetched on the client. This can cause issues when rendering any data provided by the Shoebox since Ember's rendering layer treats these encoded HTML entities as strings rather than HTML. For example my API may return Broadway & 1st Ave, but if that data is fetched on the server and passed to the client in the Shoebox it will be provided as Broadway & 1st Ave.

Seems like the shoebox#retrieve method ought to have a step that decodes any of these entities before the data is passed to the application.

It seems like there should be an idiomatic way to unescape these characters once they are fetched from the shoebox in the client. Characters &, <, and > will commonly occur in text data stored in the shoebox. When normally consumed from a REST API or other data source they won't be escaped and will display normally in a template, but when consumed from the shoebox they will be escaped and won't display correctly. This seems like a source of frustration and bugs. It seems like the escaping/unescaping of the shoebox content should be transparent to the developer and happen behind the scenes within the shoebox#retrieve method.

This, however, may conflict with shoebox content where the original data had escaped entities that shouldn't actually be unescaped. Considering this, maybe it makes more sense to leave it up to the developer to unescape shoebox content as needed. If so, this should be mentioned in the docs, and maybe there ought to be a standard/idiomatic way to do the unescaping such as an alternate method to shoebox#retrieve that unescapes the content.

rwjblue · 2016-10-14T13:45:35Z

I replied to the last comment in detail in ember-fastboot/ember-cli-fastboot#275.

Also, created this demo to show the result of this escaping before writing to the DOM does not affect the resulting parsed values when it is read (as mentioned in this PR's description).

rwjblue · 2016-10-14T14:11:42Z

@tomdale - What is blocking this? Seems like it closes a fairly important gap/issue, I'd like to land it ASAP if there are no objections...

tomdale · 2016-10-14T14:15:29Z

@rwjblue This is high priority and agree it should have been merged already. There was some debate about whether or not we need to escape \u2028 and u2029, but I don't think it does any harm to do. I've merged this PR as is but can remove those escapements in a future commit.

simonexmachina · 2016-10-27T06:27:31Z

I just downloaded the latest versions and I'm still getting incorrectly escaped JSON. I'm using "ember-cli-fastboot": "^1.0.0-beta.12" and "fastboot-app-server": "^1.0.0-rc.5".

acorncom · 2016-10-27T11:38:25Z

@aexmachina Looks like a new release hasn't been cut yet for this package or for fastboot-app-server

tomdale mentioned this pull request Sep 30, 2016

Shoebox encoding #79

Closed

rwjblue approved these changes Sep 30, 2016

View reviewed changes

danmcclain approved these changes Sep 30, 2016

View reviewed changes

arjansingh approved these changes Sep 30, 2016

View reviewed changes

danmcclain mentioned this pull request Oct 13, 2016

Encoded HTML entities in Shoebox data ember-fastboot/ember-cli-fastboot#275

Closed

tomdale merged commit 904f30d into master Oct 14, 2016

rwjblue deleted the shoebox-encoding branch October 14, 2016 14:14

rwjblue restored the shoebox-encoding branch October 14, 2016 14:14

tomdale deleted the shoebox-encoding branch October 14, 2016 14:15

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

JSON escape rendered shoebox content #85

JSON escape rendered shoebox content #85

tomdale commented Sep 30, 2016 •

edited

Loading

habdelra commented Sep 30, 2016 •

edited

Loading

rwjblue left a comment

danmcclain left a comment

arjansingh left a comment

chancancode commented Sep 30, 2016

mathiasbynens commented Oct 14, 2016

mathiasbynens commented Oct 14, 2016

danmcclain commented Oct 14, 2016

rwjblue commented Oct 14, 2016

rwjblue commented Oct 14, 2016

tomdale commented Oct 14, 2016

simonexmachina commented Oct 27, 2016

acorncom commented Oct 27, 2016

JSON escape rendered shoebox content #85

JSON escape rendered shoebox content #85

Conversation

tomdale commented Sep 30, 2016 • edited Loading

habdelra commented Sep 30, 2016 • edited Loading

rwjblue left a comment

Choose a reason for hiding this comment

danmcclain left a comment

Choose a reason for hiding this comment

arjansingh left a comment

Choose a reason for hiding this comment

chancancode commented Sep 30, 2016

mathiasbynens commented Oct 14, 2016

mathiasbynens commented Oct 14, 2016

danmcclain commented Oct 14, 2016

rwjblue commented Oct 14, 2016

rwjblue commented Oct 14, 2016

tomdale commented Oct 14, 2016

simonexmachina commented Oct 27, 2016

acorncom commented Oct 27, 2016

tomdale commented Sep 30, 2016 •

edited

Loading

habdelra commented Sep 30, 2016 •

edited

Loading