XmlSaxPushParser on JRuby and MRI disagree on how to handle empty input #1758

Open
jvshahid opened this issue May 14, 2018 · 1 comment


@jvshahid
Member

What problems are you experiencing?

The output of the following script differs between JRuby and MRI.

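```ruby
#!/usr/bin/env ruby

require 'nokogiri'

# SAX handler that records every error the parser reports.
class Doc < Nokogiri::XML::SAX::Document
  attr_reader :errors
  def error(message)
    (@errors ||= []) << message
    super
  end
end

# Finish the push parser without ever writing any input.
parser = Nokogiri::XML::SAX::PushParser.new(Doc.new)
parser.finish
puts parser.document.errors.inspect
```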

On MRI, `finish` raises:

```
/home/jvshahid/codez/nokogiri/lib/nokogiri/xml/sax/push_parser.rb:47:in `native_write': 1:1: FATAL: Extra content at the end of the document (Nokogiri::XML::SyntaxError)
        from /home/jvshahid/codez/nokogiri/lib/nokogiri/xml/sax/push_parser.rb:47:in `write'
        from /home/jvshahid/codez/nokogiri/lib/nokogiri/xml/sax/push_parser.rb:55:in `finish'
```

On JRuby, no error is reported and the script prints:

```
nil
```

What's the output from nokogiri -v?

Below is the output on JRuby, followed by MRI:

```
# Nokogiri (1.8.2)
    ---
    warnings: []
    nokogiri: 1.8.2
    ruby:
      version: 2.3.3
      platform: java
      description: jruby 9.1.16.0 (2.3.3) 2018-02-21 8f3f95a OpenJDK 64-Bit Server VM
        9-internal+0-2016-04-21-232247.buildd.src on 9-internal+0-2016-04-21-232247.buildd.src
        +jit [linux-x86_64]
      engine: jruby
      jruby: 9.1.16.0
    xerces: Xerces-J 2.11.0
    nekohtml: NekoHTML 1.9.21
```

```
# Nokogiri (1.8.2)
    ---
    warnings:
    - Nokogiri was built against LibXML version 2.9.7, but has dynamically loaded 2.9.8
    nokogiri: 1.8.2
    ruby:
      version: 2.2.3
      platform: x86_64-linux
      description: ruby 2.2.3p173 (2015-08-18 revision 51636) [x86_64-linux]
      engine: ruby
    libxml:
      binding: extension
      source: packaged
      libxml2_path: "/home/jvshahid/codez/nokogiri/ports/x86_64-pc-linux-gnu/libxml2/2.9.8"
      libxslt_path: "/home/jvshahid/codez/nokogiri/ports/x86_64-pc-linux-gnu/libxslt/1.1.32"
      libxml2_patches:
      - 0001-Revert-Do-not-URI-escape-in-server-side-includes.patch
      libxslt_patches: []
      compiled: 2.9.7
      loaded: 2.9.8
```

Can you provide a self-contained script that reproduces what you're seeing?

Yes, the script above reproduces it.

@flavorjones
Member

@jvshahid I looked at the libxml2 implementation (xmlParseChunk), and there isn't any way to change that behavior. We could count how many bytes we've seen and, if the count is 0, suppress that particular error from xmlParseChunk, but that seems like a lot of work for an edge case. What do you think an alternative solution might look like?
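The byte-counting idea from the comment above can be sketched in plain Ruby as a wrapper, rather than as a change to the C extension. This is only an illustration under stated assumptions: `ByteCountingParser`, `FakeNativeParser` and `FakeSyntaxError` are hypothetical names, and the fake parser merely imitates the MRI behavior reported in this issue (a fatal error when finishing with no input) so the sketch runs without the nokogiri gem. Adapting it to Nokogiri would presumably mean wrapping `Nokogiri::XML::SAX::PushParser` and rescuing `Nokogiri::XML::SyntaxError` instead.

```ruby
# Hypothetical stand-ins so the sketch runs without the nokogiri gem.
# FakeSyntaxError plays the role of Nokogiri::XML::SyntaxError.
class FakeSyntaxError < StandardError; end

# Imitates the MRI behavior reported here: finishing a push parser
# that never received any bytes raises a fatal syntax error.
class FakeNativeParser
  def initialize
    @buffer = +""
  end

  def write(chunk, last_chunk = false)
    @buffer << chunk
    if last_chunk && @buffer.empty?
      raise FakeSyntaxError, "1:1: FATAL: Extra content at the end of the document"
    end
    self
  end

  def finish
    write("", true)
  end
end

# Hypothetical wrapper: counts the bytes written and suppresses the
# syntax error at finish time only if that count is still zero.
class ByteCountingParser
  attr_reader :bytes_seen

  def initialize(parser)
    @parser = parser
    @bytes_seen = 0
  end

  def write(chunk, last_chunk = false)
    @bytes_seen += chunk.bytesize
    @parser.write(chunk, last_chunk)
    self
  end

  def finish
    @parser.finish
  rescue FakeSyntaxError
    # Errors for non-empty input are genuine and must still propagate.
    raise unless @bytes_seen.zero?
    nil
  end
end

empty = ByteCountingParser.new(FakeNativeParser.new)
p empty.finish # prints nil instead of raising
```

Rescuing only when the count is zero keeps real syntax errors in non-empty documents visible, which would make MRI match the JRuby result for empty input without hiding anything else.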

flavorjones removed this from the v1.10.x patch releases milestone on Jan 13, 2019.