Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JSON escape rendered shoebox content #85

Merged
merged 1 commit into from
Oct 14, 2016
Merged

JSON escape rendered shoebox content #85

merged 1 commit into from
Oct 14, 2016

Conversation

tomdale
Copy link
Contributor

@tomdale tomdale commented Sep 30, 2016

Once rendering is completed, the contents of the shoebox is converted into a string of JSON and inserted into the HTML output that is sent back to the browser.

However, because JSON and HTML content are mixed, there is the potential for security vulnerabilities. Specifically, if an attacker can cause an application to place user-generated content into the shoebox, that content could trick the browser into thinking script parsing had ended, and evaluate arbitrary code in the origin of the host.

For example, if an untrusted user could supply an article with the title of </script><script>alert("owned")</script>, the naive interpolation of that into the shoebox might look like:

<script type="fastboot/shoebox" id="shoebox-article">
{"article":{"title":"</script><script>alert("owned")</script>"}}
</script>

In this case, the browser would interpret the </script> inside the JSON string as a real closing script tag, and thus would allow the attacker's code to execute in the application's origin ("XSS").

Upon examining the HTML5 parser specification, we can observe that there is one, and only one, way to exit the "script data" state: the existence of a < character, which moves the state machine into the "script data less-than sign state". From the "script data less-than sign state", there are several more states that can be traversed through, and it requires the creation of a temporary buffer.

Thus we can conclude that the simplest, most effective way to prevent inadvertent end-of-script situations is to prevent the < character from ever appearing in shoebox content. If you never leave the "script data" state, you can feel fairly certain that you have prevented this particular vector of XSS attacks.

The good news is that this is easily accomplished. Both the JavaScript specification and the JSON specification allow for Unicode escape sequences.

Before insertion into the HTML document, we can replace characters that could be ambiguous to the HTML parser and replace them with Unicode escape sequences. These are no different from the unescaped values to the eyes of the JSON or JavaScript parser, but give us a high degree of confidence that the HTML parser will not attempt to treat them as anything other than script data.

This commit Unicode escapes the following characters:

  • < and >, to prevent ambiguity with opening and closing tags.
  • &, to prevent ambiguity with HTML entities.
  • \u2028 and \u2029, Unicode line/paragraph separators, which the JSON parser and JavaScript parser treat differently and thus can lead to mismatched data if JavaScript is used as the JSON parser.

This PR is based on @pwfisher's #79 but uses an updated approach that the core team feels is more robust to potential attack vectors.

Once rendering is completed, the contents of the shoebox is converted
into a string of JSON and inserted into the HTML output that is sent
back to the browser.

However, because JSON and HTML content are mixed, there is the potential
for security vulnerabilities. Specifically, if an attacker can cause an
application to place user-generated content into the shoebox, that
content could trick the browser into thinking JSON parsing had ended,
and evaluate arbitrary code in the origin of the host.

For example, if an untrusted user could supply an article with the
title of `</script><script>alert("owned")</script>`, the naive
interpolation of that into the shoebox might look like:

```html
<script type="fastboot/shoebox" id="shoebox-article">
{"article":{"title":"</script><script>alert("owned")</script>"}}
</script>
```

In this case, the browser would interpret the `</script>` inside the
JSON string as a real closing `script` tag, and thus would allow the
attacker's code to execute in the application's origin ("XSS").

Upon examining the HTML5 parser specification, [we can observe that there
is one, and only one, way to exit the "script data" state][spec]: the existence
of a `<` character, which moves the state machine into the "script data
less-than sign state". From the "script data less-than sign state",
there are several more states that can be traversed through, and it
requires the creation of a temporary buffer.

[spec]: https://www.w3.org/TR/html5/syntax.html#script-data-state

Thus we can conclude that the simplest, most effective way to prevent
inadvertent end-of-script situations is to prevent the `<` character
from ever appearing in shoebox content. If you never leave the "script
data" state, you can feel fairly certain that you have prevented this
particular vector of XSS attacks.

The good news is that this is easily accomplished. Both the JavaScript
specification and the JSON specification allow for [Unicode escape
sequences](https://mathiasbynens.be/notes/javascript-escapes#unicode).

Before insertion into the HTML document, we can replace characters that
could be ambiguous to the HTML parser and replace them with Unicode
escape sequences. These are no different from the unescaped values to
the eyes of the JSON or JavaScript parser, but give us a high degree of
confidence that the HTML parser will not attempt to treat them as
anything other than script data.

This commit Unicode escapes the following characters:

* `<` and `>`, to prevent ambiguity with opening and closing tags.
* `&`, to prevent ambiguity with HTML entities.
* `\u2028` and `\u2029`, Unicode line/paragraph separators, which the
  JSON parser and JavaScript parser treat differently and thus can lead
  to mismatched data if JavaScript is used as the JSON parser.
@tomdale tomdale mentioned this pull request Sep 30, 2016
@habdelra
Copy link
Contributor

habdelra commented Sep 30, 2016

I have a son, he's 10 years old. He has computers, he's so good with these computers it's unbelievable. The security aspect of cyber is very very tough. And maybe it's hardly doable. But I think you have solved it @tomdale.

Copy link
Member

@rwjblue rwjblue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@danmcclain danmcclain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM as well

Copy link
Contributor

@arjansingh arjansingh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 from me.

@chancancode
Copy link
Contributor

@tomdale FWIW, I think escaping \u2028 and \u2029 only matters if you eval the content (as opposed to passing it into JSON.parse), but it probably have the pleasant side-effect of making things less confusing looking in the HTML.

Should we be unit-testing this, like actually passing the string into JSON.parse and assert that it comes out the same? We can probably just port these tests from Rails over: https://github.com/rails/rails/blob/b326e82dc012d81e9698cb1f402502af1788c1e9/actionview/test/template/erb_util_test.rb#L46-L62

@mathiasbynens
Copy link

@chancancode

FWIW, I think escaping \u2028 and \u2029 only matters if you eval the content (as opposed to passing it into JSON.parse)

This is correct. It doesn’t make sense to include U+2028 and U+2029 in this patch.

, but it probably have the pleasant side-effect of making things less confusing looking in the HTML.

The same can be said for any symbol that is not a printable ASCII symbol. I use jsesc anywhere I generate JSON output at build time for this reason.

In my experience, U+2028 and U+2029 are very rarely used though — other weird non-ASCII characters are much more common.

@mathiasbynens
Copy link

FWIW, I’ve looked into this problem before, and wrote down my findings + the escaping requirements for each solution: “Hiding JSON-formatted data in the DOM with CSP enabled”.

@danmcclain
Copy link
Member

From ember-fastboot/ember-cli-fastboot#275

The Result of a Fastboot request gets serialized by the SimpleDOM.HTMLSerializer before it is sent down to the client (https://github.com/ember-fastboot/fastboot/blob/master/src/result.js). This happens after the Shoebox is appended to the document, and has the unintended consequence of encoding any characters in the shoebox data that happen to be HTML entities. This includes characters such as &, <, >, ", `, and ', which are encoded into &, <, >, ", ` and ', respectively.

This results in a data payload fetched on the server and stored in the Shoebox to be different than the same data payload fetched on the client. This can cause issues when rendering any data provided by the Shoebox since Ember's rendering layer treats these encoded HTML entities as strings rather than HTML. For example my API may return Broadway & 1st Ave, but if that data is fetched on the server and passed to the client in the Shoebox it will be provided as Broadway & 1st Ave.

Seems like the shoebox#retrieve method ought to have a step that decodes any of these entities before the data is passed to the application.

It seems like there should be an idiomatic way to unescape these characters once they are fetched from the shoebox in the client. Characters &, <, and > will commonly occur in text data stored in the shoebox. When normally consumed from a REST API or other data source they won't be escaped and will display normally in a template, but when consumed from the shoebox they will be escaped and won't display correctly. This seems like a source of frustration and bugs. It seems like the escaping/unescaping of the shoebox content should be transparent to the developer and happen behind the scenes within the shoebox#retrieve method.

This, however, may conflict with shoebox content where the original data had escaped entities that shouldn't actually be unescaped. Considering this, maybe it makes more sense to leave it up to the developer to unescape shoebox content as needed. If so, this should be mentioned in the docs, and maybe there ought to be a standard/idiomatic way to do the unescaping such as an alternate method to shoebox#retrieve that unescapes the content.

@rwjblue
Copy link
Member

rwjblue commented Oct 14, 2016

I replied to the last comment in detail in ember-fastboot/ember-cli-fastboot#275.

Also, created this demo to show the result of this escaping before writing to the DOM does not affect the resulting parsed values when it is read (as mentioned in this PR's description).

@rwjblue
Copy link
Member

rwjblue commented Oct 14, 2016

@tomdale - What is blocking this? Seems like it closes a fairly important gap/issue, I'd like to land it ASAP if there are no objections...

@tomdale tomdale merged commit 904f30d into master Oct 14, 2016
@rwjblue rwjblue deleted the shoebox-encoding branch October 14, 2016 14:14
@rwjblue rwjblue restored the shoebox-encoding branch October 14, 2016 14:14
@tomdale
Copy link
Contributor Author

tomdale commented Oct 14, 2016

@rwjblue This is high priority and agree it should have been merged already. There was some debate about whether or not we need to escape \u2028 and u2029, but I don't think it does any harm to do. I've merged this PR as is but can remove those escapements in a future commit.

@tomdale tomdale deleted the shoebox-encoding branch October 14, 2016 14:15
@simonexmachina
Copy link

I just downloaded the latest versions and I'm still getting incorrectly escaped JSON. I'm using "ember-cli-fastboot": "^1.0.0-beta.12" and "fastboot-app-server": "^1.0.0-rc.5".

@acorncom
Copy link

@aexmachina Looks like a new release hasn't been cut yet for this package or for fastboot-app-server

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

10 participants