v1.11.0.rc4 / 2020-12-29
Latest is v1.11.0.rc4 (2020-12-29). To try out release candidates, use gem install --prerelease or gem install nokogiri -v1.11.0.rc4
If you're using bundler, try updating your Gemfile with:
gem "nokogiri", "~> 1.11.0.rc4"
Delta since v1.11.0.rc3:
Notes
- Added precompiled native gem support for the Darwin (OSX) platform arm64-darwin
Dependencies
Ruby
- End of support for Ruby 2.4, for which official support ended on 2020-04-05
Gems
Security
See note below about CVE-2020-26247 in the "Changed" subsection entitled "XML::Schema parsing treats input as untrusted by default".
Performance
- [CRuby] The CSS ~= operator and the class selector . are about 2x faster. [#2137, #2135]
- [CRuby] Patch libxml2 to call strlen from xmlStrlen rather than the naive implementation, because strlen is generally optimized for the architecture. [#2144] (Thanks, @ilyazub!)
- Always compile libxml2 and libxslt with '-O2' [#2022, #2100] (Thanks, @ilyazub!)
- [CRuby] RelaxNG.from_document no longer leaks memory. [#2114]
Improved
- [CRuby] Handle incorrectly-closed HTML comments as WHATWG recommends for browsers. [#2058] (Thanks to HackerOne user mayflower for reporting this!)
- {HTML,XML}::Document#parse now accept Pathname objects. Previously this worked only if the referenced file was less than 4096 bytes long; longer files resulted in undefined behavior because the read method would be repeatedly invoked. [#1821, #2110] (Thanks, @doriantaylor and @phokz!)
- [CRuby] Nokogumbo builds faster because it can now use header files provided by Nokogiri. [#1788] (Thanks, @stevecheckoway!)
- [JRuby] Clean up deprecated calls into JRuby. [#2027] (Thanks, @headius!)
Fixed
- HTML Parsing in "strict" mode (i.e., the RECOVER parse option not set) now correctly raises an XML::SyntaxError exception. Previously the value of the RECOVER bit was being ignored by CRuby and was misinterpreted by JRuby. [#2130]
- The CSS ~= operator now correctly handles non-space whitespace in the class attribute. commit e45dedd
- The Node methods add_previous_sibling, previous=, before, add_next_sibling, next=, after, replace, and swap now correctly use their parent as the context node for parsing markup. These methods now also raise a RuntimeError if they are called on a node with no parent. [nokogumbo#160]
- [JRuby] XML::Schema XSD validation errors are captured in XML::Schema#errors. These errors were previously ignored.
- [JRuby] Fix how custom XPath function namespaces are inferred to be less naive. [#1890, #2148]
- [JRuby] Clarify exception message when custom XPath functions can't be resolved.
- [JRuby] Comparison of Node to Document with Node#<=> now matches CRuby/libxml2 behavior.
- [CRuby] Syntax errors are now correctly captured in Document#errors for short HTML documents. Previously the SAX parser used for encoding detection was clobbering libxml2's global error handler.
- [CRuby] On some platforms, avoid symbol name collision with glibc's canonicalize. [#2105]
- [CRuby] Fixed Nokogumbo integration which broke in the v1.11.0 release candidates. [#1788] (Thanks, @stevecheckoway!)
- [JRuby] Fixed document encoding regression in v1.11.0 release candidates. [#2080, #2083] (Thanks, @thbar!)
Changed
XML::Schema input is now "untrusted" by default
Address CVE-2020-26247.
In Nokogiri versions <= 1.11.0.rc3, XML Schemas parsed by Nokogiri::XML::Schema were trusted by default, allowing external resources to be accessed over the network, potentially enabling XXE or SSRF attacks.
This behavior is counter to the security policy intended by Nokogiri maintainers, which is to treat all input as untrusted by default whenever possible.
Please note that this security fix was pushed into a new minor version, 1.11.x, rather than a patch release to the 1.10.x branch, because it is a breaking change for some schemas and the risk was assessed to be "Low Severity".
More information and instructions for enabling "trusted input" behavior in v1.11.0.rc4 and later are available in the public advisory.
HTML parser now obeys the strict or norecover parsing option
(Also noted above in the "Fixed" section) HTML Parsing in "strict" mode (i.e., the RECOVER parse option not set) now correctly raises an XML::SyntaxError exception. Previously the value of the RECOVER bit was being ignored by CRuby and was misinterpreted by JRuby.
If you're using the default parser options, you will be unaffected by this fix. If you're passing strict or norecover to your HTML parser call, you may be surprised to see that the parser now fails to recover and raises an XML::SyntaxError exception. Given the number of HTML documents on the internet that libxml2 would consider to be ill-formed, this is probably not what you want, and you can omit setting that parse option to restore the behavior that you have been relying upon.
Apologies to anyone inconvenienced by this breaking bugfix being present in a minor release, but I felt it was appropriate to introduce this fix because it's straightforward to fix any code that has been relying on this buggy behavior.
VersionInfo, the output of nokogiri -v, and related constants
This release changes the metadata provided in Nokogiri::VersionInfo which also affects the output of nokogiri -v. Some related constants have also been changed. If you're using VersionInfo programmatically, or relying on constants related to underlying library versions, please read the detailed changes for Nokogiri::VersionInfo at #2139 and accept our apologies for the inconvenience.