Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[bug] Inconsistent behavior on JRuby/CRuby #2355

Closed
dometto opened this issue Nov 6, 2021 · 5 comments · Fixed by #2356
Closed

[bug] Inconsistent behavior on JRuby/CRuby #2355

dometto opened this issue Nov 6, 2021 · 5 comments · Fixed by #2356
Milestone

Comments

@dometto
Copy link

dometto commented Nov 6, 2021

The following snippet produces different results on JRuby and CRuby:

require 'nokogiri'
data = "<h1>First H1</h1>"
doc = Nokogiri::HTML::DocumentFragment.parse(data)
header = doc.css('h1').first
a = Nokogiri::XML::Node.new('a', doc)
header.children.before(a)
puts header.to_xml({:save_with => Nokogiri::XML::Node::SaveOptions::DEFAULT_XHTML ^ 1})

On CRuby, it produces what one would expect:

<h1><a></a>First H1</h1>

However, on JRuby, it producs the following invalid XHTML:

<h1><a>First H1</h1>

Environment

CRuby:

# Nokogiri (1.12.5)
    ---
    warnings: []
    nokogiri:
      version: 1.12.5
      cppflags:
      - "-I/Users/redacted/.rvm/gems/ruby-2.6.3/gems/nokogiri-1.12.5-x86_64-darwin/ext/nokogiri"
      - "-I/Users/redacted/.rvm/gems/ruby-2.6.3/gems/nokogiri-1.12.5-x86_64-darwin/ext/nokogiri/include"
      - "-I/Users/redacted/.rvm/gems/ruby-2.6.3/gems/nokogiri-1.12.5-x86_64-darwin/ext/nokogiri/include/libxml2"
      ldflags: []
    ruby:
      version: 2.6.3
      platform: x86_64-darwin18
      gem_platform: x86_64-darwin-18
      description: ruby 2.6.3p62 (2019-04-16 revision 67580) [x86_64-darwin18]
      engine: ruby
    libxml:
      source: packaged
      precompiled: true
      patches:
      - 0001-Remove-script-macro-support.patch
      - 0002-Update-entities-to-remove-handling-of-ssi.patch
      - 0003-libxml2.la-is-in-top_builddir.patch
      - 0004-use-glibc-strlen.patch
      - 0005-avoid-isnan-isinf.patch
      - 0006-update-automake-files-for-arm64.patch
      - 0007-Fix-XPath-recursion-limit.patch
      libxml2_path: "/Users/redacted/.rvm/gems/ruby-2.6.3/gems/nokogiri-1.12.5-x86_64-darwin/ext/nokogiri"
      memory_management: ruby
      iconv_enabled: true
      compiled: 2.9.12
      loaded: 2.9.12
    libxslt:
      source: packaged
      precompiled: true
      patches:
      - 0001-update-automake-files-for-arm64.patch
      - 0002-Fix-xml2-config-check-in-configure-script.patch
      datetime_enabled: true
      compiled: 1.1.34
      loaded: 1.1.34
    other_libraries:
      zlib: 1.2.11
      libiconv: '1.15'
      libgumbo: 1.0.0-nokogiri

JRuby:

# Nokogiri (1.12.5)
    ---
    warnings: []
    nokogiri:
      version: 1.12.5
    ruby:
      version: 2.5.8
      platform: java
      gem_platform: universal-java-11
      description: jruby 9.2.18.0 (2.5.8) 2021-06-08 d67cb7d6e0 OpenJDK 64-Bit Server
        VM 11.0.13+8 on 11.0.13+8 +jit [darwin-x86_64]
      engine: jruby
      jruby: 9.2.18.0
    other_libraries:
      xerces: Xerces-J 2.12.0
      nekohtml: NekoHTML 1.9.21

Additional context

This bug was not present on version 1.11.7

@dometto dometto added the state/needs-triage Inbox for non-installation-related bug reports or help requests label Nov 6, 2021
flavorjones added a commit that referenced this issue Nov 6, 2021
The JRuby code for SaveContextVisitor is brittle and clearly
buggy (since this bug exists for fragment nodes but not fragments or
document nodes?), but this seems like an easy fix.

Fixes #2355
@flavorjones
Copy link
Member

Thanks for reporting this, and sorry you ran into this problem.

The JRuby code for serialization is pretty old, apparently buggy, relatively unmaintained, and not well tested. I also don't have a ton of time or interest in keeping the JRuby implementation going.

That said, this seems like an easy fix and #2356 is a PR that should address it.

@flavorjones
Copy link
Member

I'm reading gollum/gollum#1779 and now I'm confused as to what could possibly have introduced this bug in Nokogiri since this code hasn't really been touched in a long time. I'll keep digging.

@flavorjones
Copy link
Member

A git bisect leads me to 7376cbd as the commit that introduced this behavior change. This is an interesting chain of events:

  • libxml 2.9.11 fixed a long-standing serialization bug in libxml2
  • which exposed a long-standing serialization bug (1d06b4f) in nokogiri's CRuby extension
  • which, when fixed by 7376cbd, exposed this long-standing serialization bug in nokogiri's JRuby extension

I think #2356 is the right fix here, to make JRuby's XHTML serialization follow the spec when NO_EMPTY_TAGS is not set.

@flavorjones
Copy link
Member

Note that 7376cbd (aka a0180c7) was introduced in v1.12.5, so v1.12.4 and earlier should not have this problem.

@dometto
Copy link
Author

dometto commented Nov 6, 2021

thanks for the quick explanation and fix @flavorjones!

@flavorjones flavorjones removed this from the v1.12.x patch releases milestone Dec 24, 2021
@flavorjones flavorjones removed the state/needs-triage Inbox for non-installation-related bug reports or help requests label Jan 9, 2022
@flavorjones flavorjones added this to the v1.13.0 milestone Jan 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants