fix(cruby): stop clobbering libxml2 error handler on SAX parser init
This was leading to loss of error capture on extremely short HTML docs
when encoding was not passed by the caller.

This call was introduced in d23fe2c (#87) for reasons that are
unclear, but we've come a long way with how we manage the global error
handlers and so I think we're OK to stop doing this now.
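
The failure mode described above can be illustrated with a toy model. This is only a sketch, not Nokogiri or libxml2 code: `ToyLib`, `allocate_sax_parser`, and `parse` are hypothetical names, and the single `error_handler` slot merely stands in for libxml2's process-wide structured error handler. It assumes the helper SAX parser is allocated after the document parse has already installed its error collector.

```ruby
# Toy model only: none of these names are Nokogiri or libxml2 APIs.
module ToyLib
  class << self
    # Process-wide error callback, standing in for libxml2's global
    # structured error handler.
    attr_accessor :error_handler

    # Report an error to whatever handler is installed. The error is
    # silently dropped when no handler is installed.
    def report(message)
      error_handler&.call(message)
    end
  end
end

# Stand-in for SAX parser allocation. With clobber: true it resets the
# global handler, like the call this commit removes.
def allocate_sax_parser(clobber:)
  ToyLib.error_handler = nil if clobber
  :sax_parser
end

# Stand-in for parsing a document: install a collector, allocate a helper
# SAX parser mid-parse (assumed here to model encoding detection), then
# hit a syntax error.
def parse(clobber:)
  errors = []
  ToyLib.error_handler = ->(message) { errors << message }
  allocate_sax_parser(clobber: clobber)
  ToyLib.report("unexpected end of input")
  errors
ensure
  # Always uninstall the collector so later parses start clean.
  ToyLib.error_handler = nil
end

before_fix = parse(clobber: true)
after_fix = parse(clobber: false)
puts "before fix: #{before_fix.length} error(s) captured"
puts "after fix:  #{after_fix.length} error(s) captured"
```

In this model the only difference between the two runs is whether allocation touches the shared handler, which mirrors the two-line removal in `xml_sax_parser.c`.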
flavorjones committed Dec 28, 2020
1 parent 14cd299 commit 74fa2f5
Showing 3 changed files with 12 additions and 2 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -112,6 +112,7 @@ See note below about CVE-2020-26247 in the "Changed" subsection entitled "XML::S
 * [JRuby] XML::Schema XSD validation errors are captured in `XML::Schema#errors`. These errors were previously ignored.
 * [JRuby] Standardize reading from IO like objects, including StringIO. [[#1888](https://github.com/sparklemotion/nokogiri/issues/1888), [#1897](https://github.com/sparklemotion/nokogiri/issues/1897)]
 * [JRuby] Comparison of Node to Document with `Node#<=>` now matches CRuby/libxml2 behavior.
+* [CRuby] Syntax errors are now correctly captured in `Document#errors` for short HTML documents. Previously the SAX parser used for encoding detection was clobbering libxml2's global error handler.
 * [CRuby] Fixed installation on AIX with respect to `vasprintf`. [[#1908](https://github.com/sparklemotion/nokogiri/issues/1908)]
 * [CRuby] On some platforms, avoid symbol name collision with glibc's `canonicalize`. [[#2105](https://github.com/sparklemotion/nokogiri/issues/2105)]
 * [Windows Visual C++] Fixed compiler warnings and errors. [[#2061](https://github.com/sparklemotion/nokogiri/issues/2061), [#2068](https://github.com/sparklemotion/nokogiri/issues/2068)]
2 changes: 0 additions & 2 deletions ext/nokogiri/xml_sax_parser.c
@@ -259,8 +259,6 @@ static VALUE allocate(VALUE klass)
 {
   xmlSAXHandlerPtr handler = calloc((size_t)1, sizeof(xmlSAXHandler));
 
-  xmlSetStructuredErrorFunc(NULL, NULL);
-
   handler->startDocument = start_document;
   handler->endDocument = end_document;
   handler->startElement = start_element;
11 changes: 11 additions & 0 deletions test/html/test_document_encoding.rb
@@ -142,6 +142,17 @@ def binopen(file)
           assert_equal(evil, ary_from_file)
         end
       end
+
+      describe "error handling" do
+        RAW = "<html><body><div"
+
+        {"read_memory" => RAW, "read_io" => StringIO.new(RAW)}.each do |flavor, input|
+          it "#{flavor} should handle errors" do
+            doc = Nokogiri::HTML.parse(input)
+            assert_operator(doc.errors.length, :>, 0)
+          end
+        end
+      end
     end
   end
 end