Carefully constructed markup sneaks tags through as "text" #105

boutell · 2014-10-14T18:08:10Z

This code:

<<img src="javascript:evil"/>img src="javascript:evil"/>

Results in the following sequence of onopentag/ontext/onclosetag events:

text: <
open: img (with the expected src attribute)
close: img
text: img src="javascript:evil"/>

The sanitize-html module trusts the "text" events coming from htmlparser2 and outputs them without further escaping, because htmlparser2 does not decode entities in text before delivering it. This creates an XSS attack vector whenever sanitize-html discards the img tag (according to user-configured filter rules) but passes the surrounding text through intact, as it must do to keep any text in documents: the two text fragments rejoin into a complete img tag in the output.
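To make the failure mode concrete, here is a minimal sketch that replays the reported event sequence through a hypothetical tag-dropping filter. It does not call htmlparser2: the `events` array is hard-coded from the sequence listed in this report, and `escapeText`, `sanitizeNaive`, and `sanitizeEscaped` are illustrative names, not sanitize-html's actual code.

```javascript
// Event sequence reported above for the input
//   <<img src="javascript:evil"/>img src="javascript:evil"/>
// Hard-coded for illustration (an assumption), not produced by htmlparser2 here.
const events = [
  { type: "text", data: "<" },
  { type: "open", name: "img", attribs: { src: "javascript:evil" } },
  { type: "close", name: "img" },
  { type: "text", data: 'img src="javascript:evil"/>' },
];

// Hypothetical helper: escape the characters that can start or break markup.
// Escaping "&" like this assumes the text holds no already-encoded entities.
function escapeText(s) {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical filter: drops every tag and passes text through unchanged.
function sanitizeNaive(evts) {
  return evts
    .filter((e) => e.type === "text")
    .map((e) => e.data)
    .join("");
}

// Same filter, but each text fragment is escaped before it is written out.
function sanitizeEscaped(evts) {
  return evts
    .filter((e) => e.type === "text")
    .map((e) => escapeText(e.data))
    .join("");
}

const naive = sanitizeNaive(events);
const escaped = sanitizeEscaped(events);

console.log(naive);   // the two text fragments rejoin into an img tag
console.log(escaped); // the same fragments, rendered inert
```

The naive variant reassembles the tag that the filter meant to remove, while the escaping variant emits only entity-encoded text. Escaping `&` unconditionally would double-encode entities that arrive undecoded, so this sketch only makes sense when the text has been decoded first.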

I have verified that the bug still exists as of version 3.7.3.


bkimminich · 2014-10-14T19:01:10Z

👍

fb55 · 2014-10-20T18:44:34Z

Sorry for the late response.

@boutell You can enable entity decoding using decodeEntities: true. Anyway, that looks like a bug; maybe switching to the tokenizer of high5 fixes it. It won't be fixed anytime soon, though.

boutell · 2014-10-20T18:46:09Z

Thanks for the update. A pity it won't be fixed soon, but hey, we're not paying you to fix it.




fb55 · 2015-01-11T12:45:51Z

As this seems to be confusing for a lot of people: This is not a vulnerability, but instead a bug in @boutell's module. The behavior is in line with the HTML spec (I wasn't sure about it in my previous comment).

Entities aren't decoded by default only to avoid breaking backwards compatibility, but they will be in the next major release (which will mainly consist of #114; I only need to take a day and add positional support to high5). It is recommended to always decode entities, then use e.g. entities to encode them again. Of course this has a performance penalty, but it eliminates this risk.

For now, I'll add a note to the wiki page recommending to always enable decodeEntities, which is pretty much everything that can be done here.

boutell · 2015-01-11T16:22:54Z

OK, version 1.5.1 of sanitize-html uses decodeEntities: true and passes its filter evasion tests without the need for recursive invocation. Thanks.

If the behavior with decodeEntities: false is inherently unsafe, I wonder if it should be offered at all in the next release. But maybe it has an application I'm not seeing.

AlynxZhou · 2019-05-19T04:39:53Z

Always depending on encode() is not a good option, because it also encodes CJK characters into entities, which makes string operations hard (the characters and the length change)...
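One possible way around this concern, sketched below under stated assumptions, is to re-encode only the characters that carry meaning in HTML and leave everything else as-is. Both `escapeMarkupOnly` and `encodeNonAscii` are hypothetical helpers written for this illustration; neither is claimed to match the behavior of any particular library's encode().

```javascript
// Hypothetical helper: escape only the characters with special meaning in HTML.
function escapeMarkupOnly(s) {
  const map = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };
  return s.replace(/[&<>"']/g, (ch) => map[ch]);
}

// Hypothetical helper, for contrast: additionally turns every non-ASCII
// code point into a hexadecimal numeric character reference.
function encodeNonAscii(s) {
  return Array.from(escapeMarkupOnly(s))
    .map((ch) => {
      const cp = ch.codePointAt(0);
      return cp > 0x7f ? `&#x${cp.toString(16)};` : ch;
    })
    .join("");
}

const input = "<b>中文</b>";
const minimal = escapeMarkupOnly(input);
const full = encodeNonAscii(input);

console.log(minimal); // CJK characters are left untouched
console.log(full);    // CJK characters become numeric references
```

With the markup-only variant the CJK characters stay intact, so substring searches on them still work, although the string length still changes wherever a markup character was escaped.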


boutell mentioned this issue Oct 14, 2014

Sanitization not applied recursively apostrophecms/sanitize-html#29

Closed

boutell pushed a commit to apostrophecms/sanitize-html that referenced this issue Oct 14, 2014

recursive invocation to protect against fb55/htmlparser2#105

762fbc7

fb55 added the Requires investigation label Oct 20, 2014

jfirebaugh added a commit to mapbox/jsskel that referenced this issue Nov 20, 2014

Retireignore jshint

31b1782

htmlparser2 3.8.2 has known vulnerabilities: fb55/htmlparser2#105 ↳ jshint 2.5.10 ↳ htmlparser2 3.8.2 But we don't care; it's only a development dependency.

aslamj mentioned this issue Dec 10, 2014

'grunt-retire' complaining about latest version of grunt-retire 0.3.6 RetireJS/grunt-retire#17

Closed

phun-ky mentioned this issue Dec 11, 2014

Please update dependency version for hmtlparser2 jshint/jshint#2029

Closed

This was referenced Jan 9, 2015

Grunt task to check the dependencies added skepticfx/subquest#9

Merged

Grunt task to look for vulnerabilities in dependencies added TryGhost/Ghost#4786

Closed

Grunt task to look for vulnerabilities in dependencies added TryGhost/Ghost#4787

Closed

fb55 removed the Requires investigation label Jan 11, 2015

fb55 closed this as completed Jan 11, 2015

thorn0 mentioned this issue Sep 3, 2018

Always escape < in text regardless of decodeEntities cheeriojs/dom-serializer#75

Closed

humphd mentioned this issue Oct 5, 2021

escape thing inside of code tag Seneca-CDOT/telescope#2337

Merged

8 tasks

AdamBarah pushed a commit to AdamBarah/retire.js that referenced this issue Jul 15, 2022

htmlparser2 below 3.8.3 still vulnerable

6a5c8a1

issue fb55/htmlparser2#105 has not been fixed yet

AdamBarah pushed a commit to AdamBarah/retire.js that referenced this issue Jul 15, 2022

removed htmlparser2

b6ee0ee

false positive (albeit it could use more secure default) see fb55/htmlparser2#105
