Scrub not fully applied on HTML::Document #80

Open
chengguangnan opened this Issue Nov 24, 2014 · 2 comments

@chengguangnan

I noticed that some HTML comment tags are not removed.

Here is an example; my_scrub should remove all the comments.

Loofah.document("<!DOCTYPE html><!--[if IE 7]><!-- --><html><body><script></script></body></html><!--ww -->").scrub!(my_scrub).to_xml
=> "<!DOCTYPE html>\n<!--[if IE 7]><!-- --><html></html>\n"

I checked the code and think the problem is here:

https://github.com/flavorjones/loofah/blob/master/lib/loofah/instance_methods.rb#L41

        case self
        when Nokogiri::XML::Document
          scrubber.traverse(root) if root
        when Nokogiri::XML::DocumentFragment
          children.scrub! scrubber
        else
          scrubber.traverse(self)
        end

So even an HTML::Document goes through scrubber.traverse(root) if root. As a result, nodes that sit outside the root html element (such as the leading comment in the example above) never pass through the scrubber.
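
To make the reasoning concrete, here is a toy model in plain Ruby. It is only a sketch: Node, traverse, and the tree layout are hypothetical stand-ins, not Nokogiri or Loofah classes, and the assumption that the doctype and the leading comment are siblings of the html element is a simplification of what the parser produces.

```ruby
# Hypothetical stand-in for a parsed node; NOT a Nokogiri class.
class Node
  attr_reader :name, :children

  def initialize(name, children = [])
    @name = name
    @children = children
  end
end

# Visit `node` and everything below it, top-down, collecting node names.
def traverse(node, visited = [])
  visited << node.name
  node.children.each { |child| traverse(child, visited) }
  visited
end

# Assumed layout: the doctype and the leading comment are siblings of <html>,
# i.e. children of the document rather than of the root element.
root = Node.new("html", [Node.new("body", [Node.new("script")])])
document = Node.new("document", [Node.new("dtd"), Node.new("comment"), root])

# Starting at the root only (what the quoted Document branch does).
root_only = traverse(root)

# Hypothetical alternative: start from every top-level child of the document.
all_top_level = document.children.flat_map { |child| traverse(child) }

puts root_only.inspect     # ["html", "body", "script"]
puts all_top_level.inspect # ["dtd", "comment", "html", "body", "script"]
```

In this model, starting at the root never visits "dtd" or "comment", which matches the output above where the doctype and the leading comment survive.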

@flavorjones
Owner

Hi,

Thank you for reporting your issue. Can you please provide a working example that demonstrates this problem? In your example, you reference my_scrub but have not provided your implementation of that scrubber.

-m

@chengguangnan
require "loofah"

class Scrubber < Loofah::Scrubber
  def scrub(node)
    # Remove the DTD node and any script, style, head, or comment node.
    if node.is_a?(Nokogiri::XML::DTD) || %w[script style head comment].include?(node.name)
      node.remove
      Loofah::Scrubber::STOP # don't bother with the rest of the subtree
    end
  end
end

my_scrub = Scrubber.new

puts Loofah.document("<!DOCTYPE html><!--[if IE 7]><!-- --><html><body><script></script></body></html><!--ww -->").scrub!(my_scrub).to_xml