
replace_entities ignored for SAX parser (nokogiri 1.5.0-java / jruby 1.6.6) #614

@matadon


I'm attempting to use a SAX parser on a very large XML document with a custom DTD, and don't want to replace entities with their DTD-defined values.

I can't see a way to pass ParseOptions to Nokogiri::XML::SAX::Document (to control Nokogiri::XML::ParseOptions::NOENT), since the SAX::Document initializer doesn't seem to take any arguments. Setting replace_entities on the parser context instead just seems to be ignored:

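```ruby
require "nokogiri"

# Placeholder for my own Nokogiri::XML::SAX::Document subclass
class NokogiriXMLSAXDocumentSubclass < Nokogiri::XML::SAX::Document; end

file = "myfile.xml"
document = NokogiriXMLSAXDocumentSubclass.new
parser = Nokogiri::XML::SAX::Parser.new(document)
parser.parse_file(file) do |context|
  context.replace_entities = false
end
```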

Parsing the file this way fails with:

The entity "entityname" was referenced, but not declared.

That suggests entities are still being processed.

Digging into the extension code, I see a private IRubyObject replaceEntities field in ext/java/nokogiri/XmlSaxParserContext.java. It appears to be set by the replace_entities methods, but I can't see where replaceEntities is ever applied: there's nothing about it in the Xerces docs, and it isn't referenced in any other file.

Am I missing something?

If not, maybe the way to go about this is to call setFeature on the Xerces parser? Looking at https://xerces.apache.org/xerces2-j/features.html, one of these might do the trick:

http://xml.org/sax/features/use-entity-resolver2
http://apache.org/xml/features/validation/unparsed-entity-checking
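To experiment with the setFeature idea outside Nokogiri, a JRuby-only sketch could look like this. It configures the JDK's JAXP SAXParserFactory (backed by the JDK's internal Xerces fork), not the Xerces parser inside Nokogiri's Java extension, so it only shows how each feature URI behaves in isolation. CANDIDATE_FEATURES and build_sax_parser are hypothetical names, and the true/false values are placeholders to try, not a known fix. Under MRI the demo section is simply skipped.

```ruby
# Hypothetical feature settings to try; the boolean values are placeholders.
CANDIDATE_FEATURES = {
  "http://xml.org/sax/features/use-entity-resolver2" => true,
  "http://apache.org/xml/features/validation/unparsed-entity-checking" => false
}

# JRuby only: builds a JAXP SAX parser with the given features applied.
# Returns [parser, skipped], where skipped lists URIs the factory rejected.
def build_sax_parser(features)
  require "java"
  factory = javax.xml.parsers.SAXParserFactory.newInstance
  skipped = []
  features.each do |uri, value|
    begin
      factory.setFeature(uri, value)
    rescue Java::OrgXmlSax::SAXNotRecognizedException,
           Java::OrgXmlSax::SAXNotSupportedException,
           Java::JavaxXmlParsers::ParserConfigurationException
      skipped << uri
    end
  end
  [factory.newSAXParser, skipped]
end

if RUBY_ENGINE == "jruby"
  _parser, skipped = build_sax_parser(CANDIDATE_FEATURES)
  puts "rejected features: #{skipped.inspect}"
end
```

Rejected URIs are collected rather than aborting the run, so several candidates from the Xerces features page can be tried at once.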

I'll have a go with these later on, unless someone chimes in first to point out what I've missed...
