Security: Review our escaping of HTML #2545

joshgoebel · 2020-05-11T01:03:20Z

I guess it is fine if hljs already escapes dangerous characters in the source.

We currently escape >, <, and &... it looks like we should maybe add a few?

&   ------->   &amp;
<   ------->   &lt;
>   ------->   &gt;
"   ------->   &quot;
'   ------->   &#x27;
/   ------->   &#x2F;

Should that cover it? I'm not sure (off the top of my head) how the quotes or / could break out on their own though without the tag characters... I think the quotes are more about raw insertion anywhere (like in the middle of an HTML attribute) as shown here:

https://webmasters.stackexchange.com/questions/12335/should-i-escape-the-apostrophe-character-with-its-html-entity-39

That's not what we do... although now I'm wondering if there is a potential attack vector with an evilly coded grammar via className. Then again, grammars can already run any JS they want, so some sort of attack via className would be the HARD way to do it.
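For reference, the mapping listed above could be sketched roughly as follows. This is only an illustration, not the actual utils#escapeHTML implementation; the names `ENTITY_MAP` and `escapeHtmlSketch` are hypothetical. Doing all replacements in a single pass means the & inside a freshly produced entity is never escaped a second time.

```javascript
// Hypothetical sketch, not highlight.js's real utils#escapeHTML.
const ENTITY_MAP = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#x27;",
  "/": "&#x2F;",
};

// A single regex pass: each input character is looked at exactly once,
// so an "&" emitted by one replacement is never re-escaped.
function escapeHtmlSketch(value) {
  return String(value).replace(/[&<>"'\/]/g, (ch) => ENTITY_MAP[ch]);
}

console.log(escapeHtmlSketch(`<a href="x">Tom & 'Jerry'</a>`));
```

Whether / really needs to be in the map is exactly the open question in this thread; it is included here only because it appears in the list.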

Originally posted by @yyyc514 in #2537 (comment)

joshgoebel · 2020-05-11T01:05:38Z

The one place we mess with HTML attributes (merging embedded code) we always use " and we then also escape "... but I'm wondering if we should perhaps extend the regular utils#escapeHTML to escape all of this anyway, just to err on the side of caution? (Or else rename it to indicate its more limited scope.)
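To make the attribute concern concrete, here is a minimal sketch of what happens when a value lands inside a double-quoted attribute. The helpers `escapeTagsOnly` and `escapeWithQuotes` and the sample markup are assumptions made up for this illustration; they are not taken from the highlight.js source.

```javascript
// Hypothetical helpers for illustration only.
const escapeTagsOnly = (s) =>
  s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
const escapeWithQuotes = (s) =>
  escapeTagsOnly(s).replace(/"/g, "&quot;").replace(/'/g, "&#x27;");

// An untrusted value that ends up inside a double-quoted attribute.
const untrusted = 'x" onmouseover="alert(1)';

const weak = `<span class="${escapeTagsOnly(untrusted)}">code</span>`;
const safer = `<span class="${escapeWithQuotes(untrusted)}">code</span>`;

console.log(weak);  // the literal " closes class and starts a new attribute
console.log(safer); // the quote stays inside the class value as &quot;
```

None of the tag characters are needed for the first case to break out, which is why the quotes only matter when the value is inserted into an attribute rather than into element content.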

CC @allejo @egor-rogov

silverwind · 2020-05-11T17:40:41Z

For HTML strings, based on this, I think just escaping those 5 chars should be enough to prevent interpretation as non-string by the browser.

joshgoebel · 2020-05-12T00:02:43Z

Yeah I thought / was getting a little silly. Ampersand seems so easy to do (and well known) but what type of attack is centered around ampersand? Do you know?

If I escaped everything but & what would the risk be?

joshgoebel · 2020-05-12T04:40:46Z

Ah, another string escaping issue since you can use & to add an escaped quote instead of a literal quote:

https://erlend.oftedal.no/blog/static-124.html
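A rough way to see why & matters even when everything else is escaped: if the source text already contains an entity, it passes through untouched and the browser then decodes it into the real character. In the sketch below, `decodeEntities` is a deliberately tiny stand-in for a browser's entity decoding (real parsers handle far more), and both function names are hypothetical.

```javascript
// Hypothetical: escape everything in the list except "&".
const escapeAllButAmp = (s) =>
  s.replace(/[<>"']/g, (ch) =>
    ({ "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#x27;" }[ch]));

// Tiny stand-in for a browser's entity decoding (assumption: only
// these five entities matter for the demo).
const decodeEntities = (s) =>
  s.replace(/&(amp|lt|gt|quot|#x27);/g, (_, name) =>
    ({ amp: "&", lt: "<", gt: ">", quot: '"', "#x27": "'" }[name]));

// The source text contains an entity, not a literal quote.
const source = "say &quot;hi&quot;";
const rendered = decodeEntities(escapeAllButAmp(source));

console.log(rendered);            // what would end up in the page
console.log(rendered === source); // false: the entity became a real quote
```

So without escaping &, the output no longer round-trips to the original text, and an entity in the input can stand in for a character that the escaper was supposed to neutralize.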

joshgoebel · 2020-05-13T17:33:44Z

This would break 124 tests... and while I can find time to go through and fix them... I'm becoming less enthusiastic about this if it's not a real issue.

allejo · 2020-05-13T18:11:41Z

I want to say our current escaping is good enough since I think those are the only places where untrusted user input could be used to escape HTML tags that highlight.js is creating. Imagine a blog, a blog user's code block content should be escaped fully and be safe to render on the page. Blog users, however, shouldn't have access to run arbitrary JS nor include custom grammars. Like you said, grammars can run any arbitrary JS so site hosts should only use trusted grammars.

Could this still be exploited? I mean, sure; quotes should be escaped too according to "best practices." But I wouldn't consider this a security issue unless it affected the actual code that the parser was highlighting.

joshgoebel · 2020-05-13T18:15:14Z

> Blog users, however, shouldn't have access to run arbitrary JS nor include custom grammars.

Well pretty sure they could do that from the browser console - if someone is just using the simple global hljs object... I mean it is all client-side after all.

> Imagine a blog, a blog user's code block content should be escaped fully and be safe to render on the page.

Are you talking about highlightBlock here exclusively? Remember there is the browser side usage and also the server-side usage, where you're passing raw content.

I wonder now if innerHTML escapes quotes and such things in the middle of content as they are retrieved?

joshgoebel · 2020-05-13T18:19:04Z

Interesting: single and double quotes go in encoded, then come out as plain text... while an encoded & survives the round trip (Safari).
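That observation is consistent with how the HTML standard describes serializing text content, as far as I understand it: &, < and > (plus non-breaking spaces) are written back out as entities, while quotes are only escaped inside attribute values. A rough model of the text case is below; `serializeText` is a hypothetical name for this sketch and not a browser API.

```javascript
// Hypothetical model of how innerHTML serializes a text node,
// following the HTML standard's "escaping a string" steps (text mode).
function serializeText(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/\u00A0/g, "&nbsp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// After the parser has decoded `&quot;a&quot; &amp; &#x27;b&#x27;`,
// the text node holds the plain characters:
const parsedText = `"a" & 'b'`;

// Reading it back: & is re-encoded, the quotes are not.
console.log(serializeText(parsedText));
```

This would mean content read back via innerHTML can contain raw quotes regardless of how it was encoded in the page source.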

joshgoebel · 2020-05-13T18:20:31Z

> I want to say our current escaping is good enough since I think those are the only places where untrusted user input could be used to escape HTML tags that highlight.js is creating.

I think it's ok too, but security is often about layers. :-) I just realized I can probably just have the test suite write out all new files for those 126 cases... so it's not actually any manual work that would need to be done.

allejo · 2020-05-13T18:26:07Z

> I just realized I can probably just have the test suite write out all new files for those 126 cases... so it's not actually any manual work that would need to be done.

Are the tests breaking because now quotes are being escaped whereas before they weren't?

allejo · 2020-05-13T18:30:11Z

> Remember there is the browser side usage and also the server-side usage, where you're passing raw content.

I forgot about server-side usage. I was assuming something else was magically taking care of the sanitization before giving it to highlight.js on the browser to highlight.

If we're in charge of escaping/sanitizing raw user input, then I'd retract my statement and say we probably should update to escape quotes.

joshgoebel · 2020-05-13T18:30:48Z

> Are the tests breaking because now quotes are being escaped whereas before they weren't?

Yeah, I added escaping quotes and everything breaks in the markup tests since we are comparing the literal expected output vs input.

joshgoebel · 2020-05-13T18:31:42Z

> If we're in charge of escaping/sanitizing raw user input, then I'd retract my statement and say we probably should update to escape quotes.

Evidently we are in the browser too, since escaped quotes don't survive the HTML -> innerHTML -> variable transition anyway... at least in Safari.

joshgoebel added the parser label May 11, 2020

joshgoebel self-assigned this May 11, 2020

joshgoebel added this to the 10.1 milestone May 11, 2020

joshgoebel mentioned this issue May 18, 2020

(parser) properly escape ' and " in HTML output #2564

Merged

joshgoebel closed this as completed in #2564 May 22, 2020
