Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[bug] v1.14.0 introduces inconsistency between CRuby and JRuby #2761

Closed
dometto opened this issue Jan 13, 2023 · 3 comments
Closed

[bug] v1.14.0 introduces inconsistency between CRuby and JRuby #2761

dometto opened this issue Jan 13, 2023 · 3 comments

Comments

@dometto
Copy link

dometto commented Jan 13, 2023

First off, thanks for working on nokogiri! The latest release looks like it has some great improvements. Unfortunately, here's a bug. :)

Since v1.14.0, the following code has different results on CRuby and JRuby (where CRuby shows the correct, and previous, behaviour):

require 'nokogiri'

data = "<div>\n<pre><code>one\ntwo\nthree\nfour\n</code></pre>\n</div>"
doc = Nokogiri::XML::DocumentFragment.parse(data)

opts = { :save_with => Nokogiri::XML::Node::SaveOptions::DEFAULT_XHTML ^ 1, :indent => 0, :encoding => 'UTF-8' }
puts doc.css("div").children.to_xml(opts).inspect

Result on CRuby: "\n<pre><code>one\ntwo\nthree\nfour\n</code></pre>\n"
Result on JRuby: "\n<pre>\n <code>one\ntwo\nthree\nfour\n</code>\n</pre>\n\n"

That is: on JRuby, a newline and two spaces are mistakenly added inside the <pre> tag, despite using :indent => 0 and despite the fact that :save_with has FORMAT disabled in both cases.

@dometto dometto added the state/needs-triage Inbox for non-installation-related bug reports or help requests label Jan 13, 2023
@flavorjones
Copy link
Member

Thanks for reporting, I'll take a look after the weekend.

@flavorjones
Copy link
Member

@dometto Thanks again for reporting.

I think what's going on here is that you're toggling a bit in the SaveOptions bitmask, instead of explicitly turning it on or off, and because the underlying bits have changed, this is doing the opposite of what you want.

Let me explain what's going on, step by step ...

First, in JRuby Nokogiri v1.13.0:

RUBY_DESCRIPTION # => "jruby 9.4.0.0 (3.1.0) 2022-11-23 95c0ec159f OpenJDK 64-Bit Server VM 11.0.17+8-post-Ubuntu-1ubuntu220.04 on 11.0.17+8-post-Ubuntu-1ubuntu220.04 [x86_64-linux]"
Nokogiri::VERSION # => "1.13.10"

Nokogiri::XML::Node::SaveOptions::DEFAULT_XHTML # => 19
Nokogiri::XML::Node::SaveOptions::DEFAULT_XHTML ^ 1 # => 18

But in Nokogiri v1.14.0:

RUBY_DESCRIPTION # => "jruby 9.4.0.0 (3.1.0) 2022-11-23 95c0ec159f OpenJDK 64-Bit Server VM 11.0.17+8-post-Ubuntu-1ubuntu220.04 on 11.0.17+8-post-Ubuntu-1ubuntu220.04 [x86_64-linux]"
Nokogiri::VERSION # => "1.14.0"

Nokogiri::XML::Node::SaveOptions::DEFAULT_XHTML # => 18
Nokogiri::XML::Node::SaveOptions::DEFAULT_XHTML ^ 1 # => 19

The value of DEFAULT_XHTML changed from FORMAT | NO_DECLARATION | AS_XHTML to NO_DECLARATION | AS_XHTML in commit a0194a0 as an effort to clean up, standardize, and centralize how these flags are set and used.

Removing FORMAT seems to be what you were already doing with this code:

:save_with => Nokogiri::XML::Node::SaveOptions::DEFAULT_XHTML ^ 1

which with Nokogiri 1.13.10 turns off the FORMAT option, but with Nokogiri v1.14.0 turns on the FORMAT option.

A better way to do this would be to avoid toggling the bit with the ^ (bitwise XOR) and instead explicitly set the bit off:

Nokogiri::XML::Node::SaveOptions::DEFAULT_XHTML & (~Nokogiri::XML::Node::SaveOptions::FORMAT)

Putting it all together, here we see this is backwards-compatible with v1.13.x:

Nokogiri::VERSION # => "1.13.10"

data = "<div>\n<pre><code>one\ntwo\nthree\nfour\n</code></pre>\n</div>"
doc = Nokogiri::XML::DocumentFragment.parse(data)

opts = { :save_with => Nokogiri::XML::Node::SaveOptions::DEFAULT_XHTML ^ 1, :indent => 0, :encoding => 'UTF-8' }
# => {:save_with=>18, :indent=>0, :encoding=>"UTF-8"}
doc.css("div").children.to_xml(opts)
# => "\n" +
#    "<pre><code>one\n" +
#    "two\n" +
#    "three\n" +
#    "four\n" +
#    "</code></pre>\n"

opts = { :save_with => Nokogiri::XML::Node::SaveOptions::DEFAULT_XHTML & (~Nokogiri::XML::Node::SaveOptions::FORMAT), :indent => 0, :encoding => 'UTF-8' }
# => {:save_with=>18, :indent=>0, :encoding=>"UTF-8"}
doc.css("div").children.to_xml(opts)
# => "\n" +
#    "<pre><code>one\n" +
#    "two\n" +
#    "three\n" +
#    "four\n" +
#    "</code></pre>\n"

and works properly in Nokogiri v1.14.x:

Nokogiri::VERSION # => "1.14.0"

data = "<div>\n<pre><code>one\ntwo\nthree\nfour\n</code></pre>\n</div>"
doc = Nokogiri::XML::DocumentFragment.parse(data)

opts = { :save_with => Nokogiri::XML::Node::SaveOptions::DEFAULT_XHTML ^ 1, :indent => 0, :encoding => 'UTF-8' }
# => {:save_with=>19, :indent=>0, :encoding=>"UTF-8"}
doc.css("div").children.to_xml(opts)
# => "\n" +
#    "<pre>\n" +
#    "  <code>one\n" +
#    "two\n" +
#    "three\n" +
#    "four\n" +
#    "</code>\n" +
#    "</pre>\n" +
#    "\n"

opts = { :save_with => Nokogiri::XML::Node::SaveOptions::DEFAULT_XHTML & (~Nokogiri::XML::Node::SaveOptions::FORMAT), :indent => 0, :encoding => 'UTF-8' }
# => {:save_with=>18, :indent=>0, :encoding=>"UTF-8"}
doc.css("div").children.to_xml(opts)
# => "\n" +
#    "<pre><code>one\n" +
#    "two\n" +
#    "three\n" +
#    "four\n" +
#    "</code></pre>\n"

But really, since Nokogiri v1.14.0 is doing what you want already, you should be able to simplify to:

Nokogiri::VERSION # => "1.14.0"

opts = { :indent => 0, :encoding => 'UTF-8' }
# => {:indent=>0, :encoding=>"UTF-8"}
doc.css("div").children.to_xml(opts)
# => "\n" +
#    "<pre><code>one\n" +
#    "two\n" +
#    "three\n" +
#    "four\n" +
#    "</code></pre>\n"

I hope all that makes sense, and I apologize for the confusion caused by changing the value of this options mask. The intention behind the change to was to have better and more consistent defaults.

@dometto
Copy link
Author

dometto commented Jan 19, 2023

Thanks you for the very clear explanation and the help, @flavorjones! Much appreciated. ❤️ And apologies for not looking closer: I didn't notice the way I was setting the property was not robust relative to changes in the values of the constants nokogiri defines.

wip-sync pushed a commit to NetBSD/pkgsrc-wip that referenced this issue Nov 12, 2023
5.2.4 / 2023-03-22

* Bugfix release: address XSS vulnerability ( @6661620a, @dometto)

5.2.3 / 2023-03-13

* Bugfix release: update adapter dependencies for Ruby 3.2 support.

5.2.2 / 2023-01-18

* Bugfix release: set Nokogiri default XHTML conversion options more
  relaibly. See sparklemotion/nokogiri#2761

5.2.1 / 2022-09-13

* Fixed: 'controls' attribute in audio and video tags should not be
  sanitized. #430 (@dometto)

5.2 / 2022-05-28

* Conditionally render "editable" heading classes. Resolves
  gollum/gollum#1785 (@benjaminwil)
* Improvement: Allow escaping quotations in quoted macro
  arguments. #406. (@srbaker)
* Improvement: [Allow for extended PlantUML
  types](https://github.com/gollum/gollum/wiki#plantuml-diagrams).
  #413. (@manofstick)
* API Addition: allow defining handlers for specific languages in
codeblocks. #410. (@dometto)

v5.1.2

* SECURITY UPDATE: sanitize HTML generated by Macros.

v5.0.1

* Bugfix release: fix emoji when using base path. Thanks to
  @heavywatal.

v5.0

For a detailed overview of changes in 5.0 and a guide to migrating
your wiki, see https://github.com/gollum/gollum/wiki/5.0-release-notes

* Removed support for Web Sequence Diagroms, PlantUML now default.
  ** PlantUML users in 4.x please note: in this release PlantUML uses
  the server at https://plantuml.com by default, not `localhost`. Use
  the config option to keep using your own server.

v4.2.1

* Performances improvements
* Dependency updates

ruby-gollum-lib: 5.2.4
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants