Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scrub not fully applied on HTML::Document #80

Closed
chengguangnan opened this issue Nov 24, 2014 · 4 comments
Closed

Scrub not fully applied on HTML::Document #80

chengguangnan opened this issue Nov 24, 2014 · 4 comments
Milestone

Comments

@chengguangnan
Copy link

I noticed that some HTML comment tags are not removed.

Here is an example, my_scrub should remove all the comments.

Loofah.document("<!DOCTYPE html><!--[if IE 7]><!-- --><html><body><script></script></body></html><!--ww -->").scrub!(my_scrub).to_xml
=> "<!DOCTYPE html>\n<!--[if IE 7]><!-- --><html></html>\n"

I check the code and think the problem is here:

https://github.com/flavorjones/loofah/blob/master/lib/loofah/instance_methods.rb#L41

        case self
        when Nokogiri::XML::Document
          scrubber.traverse(root) if root
        when Nokogiri::XML::DocumentFragment
          children.scrub! scrubber
        else
          scrubber.traverse(self)
        end

So even a HTML::Document would went through scrubber.traverse(root) if root. So things outside of HTML will not went through this scrubber.

@flavorjones
Copy link
Owner

Hi,

Thank you for reporting your issue. Can you please provide a working example that demonstrates this problem? In your example, you reference my_scrub but have not provided your implementation of that scrubber.

-m

@chengguangnan
Copy link
Author

require "loofah"

class Scrubber < Loofah::Scrubber
  def scrub(node)
    if node.class == Nokogiri::XML::DTD or %w[ script style head comment ].include?(node.name)
      node.remove
      Loofah::Scrubber::STOP # don't bother with the rest of the subtree
    end
  end
end

my_scrub = Scrubber.new

puts Loofah.document("<!DOCTYPE html><!--[if IE 7]><!-- --><html><body><script></script></body></html><!--ww -->").scrub!(my_scrub).to_xml

@flavorjones
Copy link
Owner

Apologies for the atrociously long delay in responding. I understand what you're reporting, and acknowledge that the comments outside of the html tag are not being scrubbed.

@flavorjones
Copy link
Owner

Fix will be in v2.5.0.

Note: in the fix I've chosen to remove any comments from a Loofah::HTML::Document that exist outside the html tag. I could have applied the scrubber to non-html root nodes, but unfortunately these nodes (or rather, the document itself) don't meet the contract expected by the scrubber (for example, these comments can't be replaced by an arbitrary node type, or have a sibling node added).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants