Scrub not fully applied on HTML::Document #80

chengguangnan opened this Issue Nov 24, 2014 · 3 comments



chengguangnan commented Nov 24, 2014

I noticed that some HTML comments are not removed.

Here is an example; my_scrub should remove all the comments.

Loofah.document("<!DOCTYPE html><!--[if IE 7]><!-- --><html><body><script></script></body></html><!--ww -->").scrub!(my_scrub).to_xml
=> "<!DOCTYPE html>\n<!--[if IE 7]><!-- --><html></html>\n"

I checked the code and think the problem is here:

        case self
        when Nokogiri::XML::Document
          scrubber.traverse(root) if root
        when Nokogiri::XML::DocumentFragment
          children.scrub! scrubber

So even an HTML::Document goes through scrubber.traverse(root) if root, which means that nodes outside the html element never pass through the scrubber.
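
To make the reasoning above concrete, here is a minimal sketch in plain Ruby, with no Loofah or Nokogiri dependency. Every name in it (Node, strip_comments, names, build_document, names_after_walk) is hypothetical and exists only for illustration. It models a document node that owns both the root element and sibling comments, and compares a walk that starts at the root element with a walk that starts at the document node.

```ruby
# Toy stand-in for a parsed tree; not a Loofah or Nokogiri class.
Node = Struct.new(:name, :children) do
  def initialize(name, children = [])
    super(name, children)
  end
end

# Top-down walk that drops comment nodes from each parent's child list.
def strip_comments(node)
  node.children.reject! { |child| child.name == "comment" }
  node.children.each { |child| strip_comments(child) }
end

# Node names in document order, used to inspect what survived.
def names(node)
  [node.name] + node.children.flat_map { |child| names(child) }
end

# A document node owning a leading comment, the root element, and a trailing comment.
def build_document
  html = Node.new("html", [Node.new("comment"), Node.new("body")])
  Node.new("document", [Node.new("comment"), html, Node.new("comment")])
end

# Walk a fresh document starting either at the root element or at the document node.
def names_after_walk(start)
  doc = build_document
  target = start == :root ? doc.children.find { |c| c.name == "html" } : doc
  strip_comments(target)
  names(doc)
end

p names_after_walk(:root)     # => ["document", "comment", "html", "body", "comment"]
p names_after_walk(:document) # => ["document", "html", "body"]
```

Starting at the root element leaves both document-level comments in place, which matches the behaviour reported here. In Loofah terms, an analogous workaround would presumably be to call scrubber.traverse on each of the document's children instead of only on root, but that is an untested assumption.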




flavorjones commented Dec 3, 2014


Thank you for reporting your issue. Can you please provide a working example that demonstrates this problem? In your example, you reference my_scrub but have not provided your implementation of that scrubber.




chengguangnan commented Dec 3, 2014

require "loofah"

class Scrubber < Loofah::Scrubber
  # Remove the DTD and any script, style, head, or comment node.
  def scrub(node)
    if node.class == Nokogiri::XML::DTD or %w[ script style head comment ].include?(node.name)
      node.remove
      Loofah::Scrubber::STOP # don't bother with the rest of the subtree
    end
  end
end

my_scrub = Scrubber.new

puts Loofah.document("<!DOCTYPE html><!--[if IE 7]><!-- --><html><body><script></script></body></html><!--ww -->").scrub!(my_scrub).to_xml



flavorjones commented Feb 11, 2018

Apologies for the atrociously long delay in responding. I understand what you're reporting, and acknowledge that the comments outside of the html tag are not being scrubbed.
