Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PermitScrubber treats ProcessingInstructions as Elements #115

dmpotter44 opened this issue May 24, 2021 · 2 comments · Fixed by #116

PermitScrubber treats ProcessingInstructions as Elements #115

dmpotter44 opened this issue May 24, 2021 · 2 comments · Fixed by #116


Copy link

The PermitScrubber class treats ProcessingInstructions as if they were Elements. (Actually, strictly speaking, it doesn't check the node type at all, beyond special handling for CDATA nodes, it just checks the name field.) This leads to some odd behavior:

require 'rails-html-sanitizer'
sanitizer =
# Sanitizer is using PermitScrubber with default properties
# The actual issue is within PermitScrubber, but for simplicity in showing the
# issue, these examples use a SafeListSanitizer
sanitizer.sanitize('<a href=example.html>Expected link</a><?a href=pi.html>not expected PI')
# => "<a href=\"example.html\">Expected link</a><?a href=pi.html>not expected PI"

However, this is because the PI has a white-listed name. So, for example:

sanitizer.sanitize('<b>Bold</b><?made up>')
# => "<b>Bold</b>"

I haven't figured out any way to do anything evil with this in modern browsers because modern browsers seem to correctly parse and ignore processing instructions. However, the contents of the PI are passed through to the final result unmodified, so:

sanitizer.sanitize('<?a <script>alert("Hello")//<?a </script>')
# => "<?a <script>alert(\"Hello\")//<?a </script>"

There was apparently an issue in older versions of Internet Explorer where PIs were not parsed correctly, potentially allowing the script to be run. It does not appear to run in modern browsers, though. (Chrome just changes the PIs into comments.)

It's worth noting that, yes, HTML has processing instructions, because SGML has processing instructions. <?a something> is a valid HTML processing instruction. It doesn't do anything, but it's valid SGML and may be parsed as such. (Unlike XML PIs, SGML PIs end with just > and not ?>.)

The solution is likely to update the scrub method in PermitScrubber to remove PIs. Alternatively, the allowed_node? method could be updated to determine if the node is really an element before checking it against the whitelist.

Copy link

@dmpotter44 Thanks for opening this issue! I've reproduced what you're seeing.

I'll take a deeper look when I have a bit more time, but I think I agree with your assessment that these nodes should be stripped when sanitizing.

Copy link

PR submitted at #116 to address this.

flavorjones added a commit that referenced this issue Aug 18, 2021
Some scrubbers want to allow comments through, but in v1.4.0 didn't
get the chance because only elements were passed through to

This change allows comments and elements through, but still omits
other non-elements like processing instructions (see #115).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
None yet
None yet

Successfully merging a pull request may close this issue.

2 participants