HTML Scanner parser regression #2550

Closed
titanous opened this Issue Aug 16, 2011 · 11 comments

Comments

Projects
None yet
7 participants

@tenderlove:

586a944 causes at least two different types of false-positive parse errors that did not exist in 3.0.9:

def test_xml_tag
  assert_nothing_raised { HTML::Document.new('<?xml version="1.0" encoding="UTF-8"?>', true, true) }
end

def test_js
  assert_nothing_raised { HTML::Document.new('<script>1<=2</script>', true, true) }
end
Loaded suite test/template/html-scanner/document_test
Started
......F........F
Finished in 0.016336 seconds.

  1) Failure:
test_js(DocumentTest)
    [test/template/html-scanner/document_test.rb:153:in `test_js']:
Exception raised:
Class: <RuntimeError>
Message: <"expected > (got \"\" for <=2, {})">
---Backtrace---
/Users/titanous/src/rails/actionpack/lib/action_controller/vendor/html-scanner/html/node.rb:194:in `parse'
/Users/titanous/src/rails/actionpack/lib/action_controller/vendor/html-scanner/html/document.rb:20:in `initialize'
test/template/html-scanner/document_test.rb:153:in `new'
test/template/html-scanner/document_test.rb:153:in `test_js'
test/template/html-scanner/document_test.rb:153:in `test_js'
---------------

  2) Failure:
test_xml_tag(DocumentTest)
    [test/template/html-scanner/document_test.rb:149:in `test_xml_tag']:
Exception raised:
Class: <RuntimeError>
Message: <"expected > (got \"?>\" for <?xml version=\"1.0\" encoding=\"UTF-8\"?>, {\"encoding\"=>\"UTF-8\", \"version\"=>\"1.0\"})">
---Backtrace---
/Users/titanous/src/rails/actionpack/lib/action_controller/vendor/html-scanner/html/node.rb:194:in `parse'
/Users/titanous/src/rails/actionpack/lib/action_controller/vendor/html-scanner/html/document.rb:20:in `initialize'
test/template/html-scanner/document_test.rb:149:in `new'
test/template/html-scanner/document_test.rb:149:in `test_xml_tag'
test/template/html-scanner/document_test.rb:149:in `test_xml_tag'
---------------

16 tests, 30 assertions, 2 failures, 0 errors
Owner

tenderlove commented Aug 16, 2011

I highly recommend you use a real HTML parser like loofah.

I will be replacing the html scanner with loofah in Rails 3.2.

That's a good idea, but assert_select uses HTML Scanner, which causes a bunch of false-positive warnings to spew while running our test suite (previous to 3.0.10 we had strict mode on, breaking our tests if things didn't parse properly).

Owner

tenderlove commented Aug 17, 2011

Yes. Your choices are:

  • Fix your code so that assert_select is happy
  • Provide a patch that you can guarantee against XSS attacks
  • Use loofah and write a different assert_select
  • wait until Rails 3.2.

Writing an HTML parser is a fools errand. I advise against it. Bottom line is that I won't fix this until 3.2, but I will apply patches.

Member

NZKoz commented Aug 17, 2011

I'd veto #2 in that list too, it's too risky for us to make those changes given the severity of the vulnerability that was fixed.

A patch to fix assert_select itself would be worth it, but the html scanner is a big risk to change.

@ghost ghost assigned tenderlove Dec 20, 2011

Member

steveklabnik commented Apr 30, 2012

@tenderlove Did you actually put loofah into Rails 3.2? Can this be closed?

Doesn't look like it: https://github.com/rails/rails/tree/master/actionpack/lib/action_controller/vendor

Owner

tenderlove commented Apr 30, 2012

Not yet. Maybe we can get some help with that? /cc @flavorjones

Serious work needs to be done to either a) deprecate parts of Rails's sanitization API which are both undocumented and untested (yes, srsly) or b) enhance Loofah to support Rails's sanitzation API.

I've got a side branch where some of b) is done, but if we can have an conversation about deprecating some of the API, that might be both easier and better.

Owner

rafaelfranca commented Jun 12, 2012

I can help to get this done.

@rafaelfranca - Let me ramp back up on what's partially done ... give me a day or two to find the code and reread my notes.

Owner

rafaelfranca commented Jun 12, 2012

@flavorjones sure!

@ghost ghost assigned rafaelfranca Jun 29, 2012

Owner

guilleiguaran commented Nov 2, 2013

Guys I'll close this since looks like this won't be fixed in 4.0.x and we have a patch to integrate Loofah in 4.1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment