New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scrub not fully applied on HTML::Document #80

Open
chengguangnan opened this Issue Nov 24, 2014 · 3 comments

Comments

Projects
None yet
2 participants
@chengguangnan

chengguangnan commented Nov 24, 2014

I noticed that some HTML comment tags are not removed.

Here is an example, my_scrub should remove all the comments.

Loofah.document("<!DOCTYPE html><!--[if IE 7]><!-- --><html><body><script></script></body></html><!--ww -->").scrub!(my_scrub).to_xml
=> "<!DOCTYPE html>\n<!--[if IE 7]><!-- --><html></html>\n"

I check the code and think the problem is here:

https://github.com/flavorjones/loofah/blob/master/lib/loofah/instance_methods.rb#L41

        case self
        when Nokogiri::XML::Document
          scrubber.traverse(root) if root
        when Nokogiri::XML::DocumentFragment
          children.scrub! scrubber
        else
          scrubber.traverse(self)
        end

So even a HTML::Document would went through scrubber.traverse(root) if root. So things outside of HTML will not went through this scrubber.

@flavorjones

This comment has been minimized.

Owner

flavorjones commented Dec 3, 2014

Hi,

Thank you for reporting your issue. Can you please provide a working example that demonstrates this problem? In your example, you reference my_scrub but have not provided your implementation of that scrubber.

-m

@chengguangnan

This comment has been minimized.

chengguangnan commented Dec 3, 2014

require "loofah"

class Scrubber < Loofah::Scrubber
  def scrub(node)
    if node.class == Nokogiri::XML::DTD or %w[ script style head comment ].include?(node.name)
      node.remove
      Loofah::Scrubber::STOP # don't bother with the rest of the subtree
    end
  end
end

my_scrub = Scrubber.new

puts Loofah.document("<!DOCTYPE html><!--[if IE 7]><!-- --><html><body><script></script></body></html><!--ww -->").scrub!(my_scrub).to_xml
@flavorjones

This comment has been minimized.

Owner

flavorjones commented Feb 11, 2018

Apologies for the atrociously long delay in responding. I understand what you're reporting, and acknowledge that the comments outside of the html tag are not being scrubbed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment