Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTML style comment breaks script #4904

Closed
ovarn opened this issue Jul 14, 2021 · 11 comments · Fixed by #4906
Closed

HTML style comment breaks script #4904

ovarn opened this issue Jul 14, 2021 · 11 comments · Fixed by #4906

Comments

@ovarn
Copy link

ovarn commented Jul 14, 2021

Description

$.append fails when I try to append script element with HTML style comment.

There is no errors when I try to do the same thing using vanila JS:

var script = document.createElement('script');
script.text = '<!-- html style comment in javascript -->';
document.querySelector('html').append(script); // no errors

When I do it using jQuery:

var script = document.createElement('script');
script.text = '<!--html style comment in javascript -->';
$('html').append(script); // Uncaught SyntaxError: Unexpected identifier

The root cause:
The domManip function does RegExp replacement before calling DOMEval and removes <!-- and -->:

rcleanScript = /^\s*<!(?:\[CDATA\[|--)|(?:\]\]|--)>\s*$/g;

DOMEval( node.textContent.replace( rcleanScript, "" ), node, doc );

Link to test case

https://codepen.io/ovarn/pen/vYmgjzP (open browser console to see the error)

@mgol mgol added Discuss in Meeting Reserved for Issues and PRs that anyone would like to discuss in the weekly meeting. Manipulation labels Jul 14, 2021
@mgol
Copy link
Member

mgol commented Jul 14, 2021

Thanks for the report. The purpose of this regex is to both clean HTML comments and CDATA sections in XML documents, e.g.:

<script>
<![CDATA[
console.log("<");
]]>
</script>

HTML comments themselves should not be an issue as ECMAScript now requires HTML-style comment delimiters to also serve as JS comment ones; a relevant test case from Compat Tables:

--> A comment
<!-- Another comment
var a = 3; <!-- Another comment
return a === 3;

Unfortunately, IE 11 doesn't support this feature. 😞

Using regexes to parse JS is always going to have edge cases, we've learned that the hard way when we had to release jQuery 3.5. However, we need to cut out CDATA & HTML comments to be able to execute inline scripts that may contain them and using a full parser would be too slow & huge.

If there's any other way to remove CDATA & HTML comments without hitting issues like this one, we'd gladly replace the current implementation.

@mgol
Copy link
Member

mgol commented Jul 14, 2021

Hmm, since the spec requires <!-- and --> to serve as single-line comment delimiters, maybe we should just tweak the regex to match not just those tokens but also what's after them until the end of the line?

@mgol
Copy link
Member

mgol commented Jul 15, 2021

Looking at it again, it seems we could skip using rcleanScript completely outside of IE and in IE fix the regex to match full lines.

I also don't see a point of having the CDATA token in the regex. In HTML mode CDATA sections are not recognized and have to be commented out, either via JS comments or HTML-style comments. In XML mode, CDATA sections are not put in the script element textContent so jQuery would also not pick it up (just in case, I tested it in both IE 9 & 11). Therefore, we should be able to just workaround the HTML comments issue for IE and leave the modern browsers without any replacement. This would also play nice with things like trusted types which may not allow manual modifications of evaluated strings taken from the page.

I'll submit a PR.

@mgol mgol self-assigned this Jul 15, 2021
@mgol
Copy link
Member

mgol commented Jul 15, 2021

I'm thinking about making the fix for IE just for the 3.x-stable branch and to remove that logic completely on main. The rationale is that CDATA behind regular single-line JS comments (//) would already work and the problem is just with script contents wrapped in HTML-style comments (<!-- & -->) and only in IE. Such wrapping was necessary for very old browsers without script support so that they don't render contents of script tags. AFAIK this stopped being necessary ages before jQuery came out.

There is some risk some tools will still generate script tags like that and the removal of that logic would break those scenarios but the replacement itself is not and cannot be ideal and there will always be cases it cannot match like HTML comments in the middle of a script. Trying to not mess with the content before processing it seems to be a golden standard to prevent XSS issues (apart from more substantial solutions like CSP).

@mgol
Copy link
Member

mgol commented Jul 15, 2021

Here's a good explanation of why HTML comments, JS comments & CDATA markers are sometimes used together in weird combinations: https://www.sitepoint.com/community/t/cdata-comment/2072/7.

@mgol
Copy link
Member

mgol commented Jul 15, 2021

Another update: while IE doesn't conform to the spec in regular scripts or in eval, jQuery.globalEval - used in dom manipulation for executing scritps - is now always using a script tag where these semantics are preserved even in IE 9. Therefore, we can cut the HTML comments escaping completely.

We still need to leave the CDATA part in 3.x as getting rid of that would be a breaking change but we can remove it in 4.x without breaking handling of HTML comments in HTML documents or CDATA in XHTML ones even in IE.

mgol added a commit to mgol/jquery that referenced this issue Jul 15, 2021
When evaluating scripts, jQuery strips out the possible wrapping HTML comment
and a CDATA section. However, all supported browsers are already doing that
when loading JS via appending a script tag to the DOM which is how we've been
doing `jQuery.globalEval` since jQuery 3.0.0. jQuery logic was imperfect, e.g.
it just stripped the `<!--` and `-->` markers, respectively at the beginning or
the end of the script contents. However, browsers are also stripping everything
following those markers in the same line, treating them as single-line comments
delimiters; this is now also mandated by ECMAScript 2015 in Annex B. Instead
of fixing the jQuery logic, just let the browser do its thing.

We still need to strip CDATA sections for backwards compatibility. This
shouldn't be needed as in XML documents they're already not visible when
inspecting element contents and in HTML documents they have no meaning but
we're preserving that logic for backwards compatibility. This will be removed
completely in 4.0.

Fixes jquerygh-4904
mgol added a commit to mgol/jquery that referenced this issue Jul 15, 2021
When evaluating scripts, jQuery strips out the possible wrapping HTML comment
and a CDATA section. However, all supported browsers are already doing that
when loading JS via appending a script tag to the DOM which is how we've been
doing `jQuery.globalEval` since jQuery 3.0.0. jQuery logic was imperfect, e.g.
it just stripped the `<!--` and `-->` markers, respectively at the beginning or
the end of the script contents. However, browsers are also stripping everything
following those markers in the same line, treating them as single-line comments
delimiters; this is now also mandated by ECMAScript 2015 in Annex B. Instead
of fixing the jQuery logic, just let the browser do its thing.

We also used to strip CDATA sections. However, this shouldn't be needed as in
XML documents they're already not visible when inspecting element contents and
in HTML documents they have no meaning. We've preserved that behavior for
backwards compatibility in 3.x but we're removing it for 4.0.

Fixes jquerygh-4904
@mgol
Copy link
Member

mgol commented Jul 15, 2021

@mgol mgol added this to the 3.6.1 milestone Jul 15, 2021
mgol added a commit that referenced this issue Jul 19, 2021
When evaluating scripts, jQuery strips out the possible wrapping HTML comment
and a CDATA section. However, all supported browsers are already doing that
when loading JS via appending a script tag to the DOM which is how we've been
doing `jQuery.globalEval` since jQuery 3.0.0. jQuery logic was imperfect, e.g.
it just stripped the `<!--` and `-->` markers, respectively at the beginning or
the end of the script contents. However, browsers are also stripping everything
following those markers in the same line, treating them as single-line comments
delimiters; this is now also mandated by ECMAScript 2015 in Annex B. Instead
of fixing the jQuery logic, just let the browser do its thing.

We also used to strip CDATA sections. However, this shouldn't be needed as in
XML documents they're already not visible when inspecting element contents and
in HTML documents they have no meaning. We've preserved that behavior for
backwards compatibility in 3.x but we're removing it for 4.0.

Fixes gh-4904
Closes gh-4906
mgol added a commit that referenced this issue Jul 19, 2021
When evaluating scripts, jQuery strips out the possible wrapping HTML comment
and a CDATA section. However, all supported browsers are already doing that
when loading JS via appending a script tag to the DOM which is how we've been
doing `jQuery.globalEval` since jQuery 3.0.0. jQuery logic was imperfect, e.g.
it just stripped the `<!--` and `-->` markers, respectively at the beginning or
the end of the script contents. However, browsers are also stripping everything
following those markers in the same line, treating them as single-line comments
delimiters; this is now also mandated by ECMAScript 2015 in Annex B. Instead
of fixing the jQuery logic, just let the browser do its thing.

We still need to strip CDATA sections for backwards compatibility. This
shouldn't be needed as in XML documents they're already not visible when
inspecting element contents and in HTML documents they have no meaning but
we're preserving that logic for backwards compatibility. This will be removed
completely in 4.0.

Fixes gh-4904
Closes gh-4905
Ref gh-4906
@mgol
Copy link
Member

mgol commented Jul 19, 2021

Landed in 2f8f39e on main and in 924b515 on 3.x-stable.

@mgol mgol removed the Discuss in Meeting Reserved for Issues and PRs that anyone would like to discuss in the weekly meeting. label Jul 19, 2021
@ovarn
Copy link
Author

ovarn commented Jul 20, 2021

Michał, thank you for the quick resolution and problem clarification!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging a pull request may close this issue.

3 participants
@mgol @ovarn and others