Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Security: Review our escaping of HTML #2545

Closed
joshgoebel opened this issue May 11, 2020 · 13 comments · Fixed by #2564
Closed

Security: Review our escaping of HTML #2545

joshgoebel opened this issue May 11, 2020 · 13 comments · Fixed by #2564
Assignees
Labels
Milestone

Comments

@joshgoebel
Copy link
Member

I guess it is fine if hljs already escapes dangerous characters in the source.

We currently do > < &... it looks like we should maybe add a few?

&   ------->   &amp;
<   ------->   &lt;
>   ------->   &gt;
"   ------->   &quot;
'   ------->   &#x27;
/   ------->   &#x2F;

Should that cover it? I'm not sure (off the top of my head) how the quotes or / could break out on their own though without the tag characters... I think the quotes are more about raw insertion anywhere (like in the middle of an HTML attribute) as shown here:

https://webmasters.stackexchange.com/questions/12335/should-i-escape-the-apostrophe-character-with-its-html-entity-39

That's not what we do... although now I'm wondering if there is a potential attack vector with a evilly coded grammar via className. Although grammars already run any JS they want freely though, so some sort of attack via className would be the HARD way to do it.

Originally posted by @yyyc514 in #2537 (comment)

@joshgoebel
Copy link
Member Author

joshgoebel commented May 11, 2020

The one place we mess with HTML attributes (merging embedded code) we always use " and we then also escape "... but I'm wondering if we should perhaps extend the regular utils#escapeHTML to escape all of this anyways - just to be on the side of caution? (or else rename it to indicate it's more limited scope)

CC @allejo @egor-rogov

@joshgoebel joshgoebel self-assigned this May 11, 2020
@joshgoebel joshgoebel added this to the 10.1 milestone May 11, 2020
@silverwind
Copy link

For HTML strings, based on this, I think just escaping those 5 chars should be enough to prevent interpretation as non-string by the browser.

@joshgoebel
Copy link
Member Author

Yeah I thought / was getting a little silly. Ampersand seems so easy to do (and well known) but what type of attack is centered around ampersand? Do you know?

If I escaped everything but & what would the risk be?

@joshgoebel
Copy link
Member Author

Ah, another string escaping issue since you can use & to add an escaped quote instead of a literal quote:

https://erlend.oftedal.no/blog/static-124.html

@joshgoebel
Copy link
Member Author

This would break 124 tests... and while I can find time to go thru and fix them... I'm becoming less enthusiastic about this if it's not a real issue.

@allejo
Copy link
Member

allejo commented May 13, 2020

I want to say our current escaping is good enough since I think that's the only places where untrusted user input could be used to escape HTML tags that highlight.js is creating. Imagine a blog, a blog user's code block content should be escaped fully and be safe to render on the page. Blog users, however, shouldn't have access to run arbitrary JS nor include custom grammars. Like you said, grammars can run any arbitrary JS so site hosts should only use trusted grammars.

In my opinion, could this still be exploited? I mean, sure; quotes should be escaped too according to "best practices." But I wouldn't consider this a security issue unless it affected the actual code that the parser was highlighting.

@joshgoebel
Copy link
Member Author

Blog users, however, shouldn't have access to run arbitrary JS nor include custom grammars.

Well pretty sure they could do that from the browser console - if someone is just using the simple global hljs object... I mean it is all client-side after all.

Imagine a blog, a blog user's code block content should be escaped fully and be safe to render on the page.

Are you talking about highlightBlock here exclusively? Remember there is the browser side usage and also the server-side usage, where you're passing raw content.

I wonder now if innerHTML escapes quotes and such things in the middle of content as they are retrieved?

@joshgoebel
Copy link
Member Author

joshgoebel commented May 13, 2020

Interesting single and double quotes go in encoded then come out text... while &amp; survives the round trip. Interesting (Safari).

@joshgoebel
Copy link
Member Author

I want to say our current escaping is good enough since I think that's the only places where untrusted user input could be used to escape HTML tags that highlight.js is creating.

I think it's ok too, but security is often about layers. :-) I just realized I can probably just have the test suite write out all new files for those 126 cases... so it's not actually any manual work that would need to be done.

@allejo
Copy link
Member

allejo commented May 13, 2020

I just realized I can probably just have the test suite write out all new files for those 126 cases... so it's not actually any manual work that would need to be done.

Are the tests breaking because now quotes are being escaped whereas before they weren't?

@allejo
Copy link
Member

allejo commented May 13, 2020

Remember there is the browser side usage and also the server-side usage, where you're passing raw content.

I forgot about server-side usage. I was assuming something else was magically taking care of the sanitization before giving it to highlight.js on the browser to highlight.

If we're in charge of escaping/sanitizing raw user input, then I'd retract my statement and say we probably should update to escape quotes.

@joshgoebel
Copy link
Member Author

Are the tests breaking because now quotes are being escaped whereas before they weren't?

Yeah, I added escaping quotes and everything breaks in the markup tests since we are comparing the literal expected output vs input.

@joshgoebel
Copy link
Member Author

If we're in charge of escaping/sanitizing raw user input, then I'd retract my statement and say we probably should update to escape quotes.

Evidentially in the browser we are also since escaped quotes don't survive the HTML -> innerHTML -> variable transition anyways... at least in Safari.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants