New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

jQuery HTML parser corrupts script content that looks like HTML tags (including strings) #2668

Closed
JakeQZ opened this Issue Oct 20, 2015 · 11 comments

Comments

Projects
None yet
4 participants
@JakeQZ

JakeQZ commented Oct 20, 2015

With functions like .append(html), .repaceWith(html) or even just $(html), if html contains javascript and some content of that script looks like an HTML tag, it will be modified undesirably.

E.g.

<script type='text/javascript'>
alert('<What?/>');
</script>

when used as argument for .append() or suchlike produces "<What?></What>" in the alert when it should produce "<What?/>"

Example at http://jsfiddle.net/fwaoj0eh/2/ (where you can also see benign [in this case] replacement of <p/> with <p></p>), with workaround (and possible route to fix?). Bug present in jQuery 1.11.3 on all Windows browsers tested (IE7-12, FF, Chrome, Safari, Opera).

@gibson042

This comment has been minimized.

Show comment
Hide comment
@gibson042

gibson042 Oct 20, 2015

Member

Thanks for the report! This is related to (and a near-duplicate of) trac-14370 and trac-14464. It is possible to address on your end with jQuery.htmlPrefilter in 3.0, and the implementation in PR #1374 would be a good starting point. I'm still not sure about putting something like it in the official build, though.

Member

gibson042 commented Oct 20, 2015

Thanks for the report! This is related to (and a near-duplicate of) trac-14370 and trac-14464. It is possible to address on your end with jQuery.htmlPrefilter in 3.0, and the implementation in PR #1374 would be a good starting point. I'm still not sure about putting something like it in the official build, though.

@JakeQZ

This comment has been minimized.

Show comment
Hide comment
@JakeQZ

JakeQZ Oct 20, 2015

Yes, 14370, but not 14464 (which is not a bug). Seems the fix for 14370 was considered too big. But it ought to be possible to exclude content within <script...</script> without too much more regex?

JakeQZ commented Oct 20, 2015

Yes, 14370, but not 14464 (which is not a bug). Seems the fix for 14370 was considered too big. But it ought to be possible to exclude content within <script...</script> without too much more regex?

@JakeQZ

This comment has been minimized.

Show comment
Hide comment
@JakeQZ

JakeQZ Oct 20, 2015

Above should read 'content within <script...</script>'

JakeQZ commented Oct 20, 2015

Above should read 'content within <script...</script>'

@dmethvin

This comment has been minimized.

Show comment
Hide comment
@dmethvin

dmethvin Oct 20, 2015

Member

This is definitely unintended behavior, but screwing up this case isn't necessarily that bad. Putting executable scripts in HTML is something we'd love to remove, or at least make it much more difficult to do, because of its security implications. It's way too easy to inject attacker content that has scripts in it. For now I think we need to just reinforce the advice that it's bad practice, and use this example as one place where it fails.

Member

dmethvin commented Oct 20, 2015

This is definitely unintended behavior, but screwing up this case isn't necessarily that bad. Putting executable scripts in HTML is something we'd love to remove, or at least make it much more difficult to do, because of its security implications. It's way too easy to inject attacker content that has scripts in it. For now I think we need to just reinforce the advice that it's bad practice, and use this example as one place where it fails.

@gibson042

This comment has been minimized.

Show comment
Hide comment
@gibson042

gibson042 Oct 20, 2015

Member

But it ought to be possible to exclude content within <script...</script> without too much more regex?

https://github.com/jquery/jquery/pull/1374/files#diff-169760a97de5c86a886842060321d2c8R44 , though I would probably update the first part per 85ffc6d and change the [^>]*s to (?:[\x20\t\r\n\f]+[^\0-\x20\x7f-\x9f="'/>]+(?:[\x20\t\r\n\f]*=[\x20\t\r\n\f]*(?:"[\w\W]*?"|'[\w\W]*?'|[^\x20\t\r\n\f>]+)|))* for maximum correctness.

Member

gibson042 commented Oct 20, 2015

But it ought to be possible to exclude content within <script...</script> without too much more regex?

https://github.com/jquery/jquery/pull/1374/files#diff-169760a97de5c86a886842060321d2c8R44 , though I would probably update the first part per 85ffc6d and change the [^>]*s to (?:[\x20\t\r\n\f]+[^\0-\x20\x7f-\x9f="'/>]+(?:[\x20\t\r\n\f]*=[\x20\t\r\n\f]*(?:"[\w\W]*?"|'[\w\W]*?'|[^\x20\t\r\n\f>]+)|))* for maximum correctness.

@JakeQZ

This comment has been minimized.

Show comment
Hide comment
@JakeQZ

JakeQZ Oct 20, 2015

FWIW, the use case in which I found the issue was obfuscation of contact details using AJAX with JavaScript that converts sequences of apparently random characters back into the original human readable form.

I don't see anything particularly bad practice about injecting content into the DOM that has come from a trusted source via AJAX and may by design contain script, more generally.

However, I can't think of a good case, other than encryption, where such sequences in string literals being modified unexpectedly would be anything other than benign (e.g. if the AJAX-returned script also contains DOM manipulation code, $('<image/>') and $('<image></image>') are equivalent, and if plain Javascript, the XML special characters are not used (though I don't know about other frameworks). I am also not aware of any case where /> is valid Javascript outside a string literal.

So I would concur that adding something to the documentation 'Additional Notes' which begin "By design, any jQuery constructor or method that accepts an HTML string..." would resolve this ticket.

Perhaps: "Also note that if the HTML contains script with string literals from cryptographic functions, unexpected results may occur, e.g. "<EX@mple./>" will become "<EX@mple.></EX>". This is also by design, as jQuery is intended to be lightweight, and it is not expected to be an issue other genuine situations."

Something like this would have saved me several hours debugging, as I'd already scanned the docs and scoured t'internet to see if I might be missing something in the way .append(), etc. work.

But I don't know what to suggest as a workaround other than what's in the jsFiddle I've updated: http://jsfiddle.net/fwaoj0eh/3/, which is working fine in my case.

(I haven't looked into whether things like $('<image title="For historic reasons <image/> tags must be self-closed in HTML"/>') have issues, but some hackery types, on seeing an additional note on the limitations, may seek other ways of breaking it, which is probably a Good Thing...)

JakeQZ commented Oct 20, 2015

FWIW, the use case in which I found the issue was obfuscation of contact details using AJAX with JavaScript that converts sequences of apparently random characters back into the original human readable form.

I don't see anything particularly bad practice about injecting content into the DOM that has come from a trusted source via AJAX and may by design contain script, more generally.

However, I can't think of a good case, other than encryption, where such sequences in string literals being modified unexpectedly would be anything other than benign (e.g. if the AJAX-returned script also contains DOM manipulation code, $('<image/>') and $('<image></image>') are equivalent, and if plain Javascript, the XML special characters are not used (though I don't know about other frameworks). I am also not aware of any case where /> is valid Javascript outside a string literal.

So I would concur that adding something to the documentation 'Additional Notes' which begin "By design, any jQuery constructor or method that accepts an HTML string..." would resolve this ticket.

Perhaps: "Also note that if the HTML contains script with string literals from cryptographic functions, unexpected results may occur, e.g. "<EX@mple./>" will become "<EX@mple.></EX>". This is also by design, as jQuery is intended to be lightweight, and it is not expected to be an issue other genuine situations."

Something like this would have saved me several hours debugging, as I'd already scanned the docs and scoured t'internet to see if I might be missing something in the way .append(), etc. work.

But I don't know what to suggest as a workaround other than what's in the jsFiddle I've updated: http://jsfiddle.net/fwaoj0eh/3/, which is working fine in my case.

(I haven't looked into whether things like $('<image title="For historic reasons <image/> tags must be self-closed in HTML"/>') have issues, but some hackery types, on seeing an additional note on the limitations, may seek other ways of breaking it, which is probably a Good Thing...)

@dmethvin

This comment has been minimized.

Show comment
Hide comment
@dmethvin

dmethvin Oct 20, 2015

Member

I don't see anything particularly bad practice about injecting content into the DOM that has come from a trusted source via AJAX and may by design contain script, more generally.

From a Content Security Policy standpoint, it's very common to disable inline scripts as a policy because it's such a common attack vector. The problem is that is is really hard for most web developers to strongly assert that the content is from a trusted source. There are so many ways to mess that up.

(I haven't looked into whether things like $('<image title="For historic reasons <image/> tags must be self-closed in HTML"/>') have issues

Characters inside the title attribute should be HTML-encoded. If you were building that title dynamically it's an example of the "so many ways to mess that up" mentioned above that could result in XSS.

Member

dmethvin commented Oct 20, 2015

I don't see anything particularly bad practice about injecting content into the DOM that has come from a trusted source via AJAX and may by design contain script, more generally.

From a Content Security Policy standpoint, it's very common to disable inline scripts as a policy because it's such a common attack vector. The problem is that is is really hard for most web developers to strongly assert that the content is from a trusted source. There are so many ways to mess that up.

(I haven't looked into whether things like $('<image title="For historic reasons <image/> tags must be self-closed in HTML"/>') have issues

Characters inside the title attribute should be HTML-encoded. If you were building that title dynamically it's an example of the "so many ways to mess that up" mentioned above that could result in XSS.

@JakeQZ

This comment has been minimized.

Show comment
Hide comment
@JakeQZ

JakeQZ Oct 26, 2015

On your first point, why [rhetorically] is Google search results page full of inline script?
On your second, my bad example. Was trying to come up with other possible related failures, and failed.
Would like to move this to documentation issue, but don't seem able to...

JakeQZ commented Oct 26, 2015

On your first point, why [rhetorically] is Google search results page full of inline script?
On your second, my bad example. Was trying to come up with other possible related failures, and failed.
Would like to move this to documentation issue, but don't seem able to...

@gibson042

This comment has been minimized.

Show comment
Hide comment
@gibson042

gibson042 Oct 26, 2015

Member

On your second, my bad example. Was trying to come up with other possible related failures, and failed.

Not at all, <image title="For historic reasons <image/> tags must be self-closed in HTML"/> is valid HTML5.

Would like to move this to documentation issue, but don't seem able to...

https://github.com/jquery/api.jquery.com/issues/new . Thanks!

Member

gibson042 commented Oct 26, 2015

On your second, my bad example. Was trying to come up with other possible related failures, and failed.

Not at all, <image title="For historic reasons <image/> tags must be self-closed in HTML"/> is valid HTML5.

Would like to move this to documentation issue, but don't seem able to...

https://github.com/jquery/api.jquery.com/issues/new . Thanks!

@timmywil

This comment has been minimized.

Show comment
Hide comment
@timmywil

timmywil Oct 26, 2015

Member

We will put together a htmlPrefilter example for the docs specifically for this issue. See jquery/api.jquery.com#727.

Member

timmywil commented Oct 26, 2015

We will put together a htmlPrefilter example for the docs specifically for this issue. See jquery/api.jquery.com#727.

@timmywil timmywil closed this Oct 26, 2015

@JakeQZ

This comment has been minimized.

Show comment
Hide comment
@JakeQZ

JakeQZ Oct 27, 2015

Thanks all for your input on this issue.
Now it is closed, I'd just like to point out that because JS is client-side only, if you have a security issue with inline JS, then you MUST HAVE A SECURITY BREACH SOMEWHERE ELSE, and really should be looking there.

JakeQZ commented Oct 27, 2015

Thanks all for your input on this issue.
Now it is closed, I'd just like to point out that because JS is client-side only, if you have a security issue with inline JS, then you MUST HAVE A SECURITY BREACH SOMEWHERE ELSE, and really should be looking there.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.