Description
I'm attempting to use a SAX parser on a very large XML document with a custom DTD, and I don't want entities replaced with their DTD-defined values.
I can't see a way to pass ParseOptions to Nokogiri::XML::SAX::Document (to try Nokogiri::XML::ParseOptions::NOENT), as the initializer for SAX::Document doesn't seem to take any arguments. When I try to set replace_entities on the parser context instead, it seems to be ignored:
file = "myfile.xml"
document = NokogiriXMLSAXDocumentSubclass.new
parser = Nokogiri::XML::SAX::Parser.new(document)
parser.parse_file(file) do |context|
  context.replace_entities = false
end
Attempting to parse the XML with this raises an error:
The entity "entityname" was referenced, but not declared.
This seems to indicate that entities are still being processed.
Digging into the extension code, I see a private IRubyObject replaceEntities; field in ext/java/nokogiri/XmlSaxParserContext.java, which appears to be set by the replace_entities methods. From there, though, I can't see how replaceEntities gets applied: there's nothing about it in the Xerces docs, and it's not referenced in any other file.
Am I missing something?
If not, maybe the way to go about this is to use setFeature on the Xerces parser? Looking at https://xerces.apache.org/xerces2-j/features.html, using one of these might do the trick:
http://xml.org/sax/features/use-entity-resolver2
http://apache.org/xml/features/validation/unparsed-entity-checking
I'll have a go with these later on, assuming nobody chimes in to point out where I missed something...
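For reference, a minimal Java sketch of what that experiment might look like. It uses the JDK's built-in JAXP SAX parser (a Xerces derivative) rather than the Xerces bundled with Nokogiri, so the results may not carry over. FeatureProbe, trySetFeature, newReader, and parseText are hypothetical names made up for this sketch, and whether either feature actually changes entity handling is exactly what remains to be tested.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class: probes SAX features and shows what the handler receives.
class FeatureProbe {
    // Tries to set a feature and reports how the parser reacted.
    public static String trySetFeature(XMLReader reader, String feature, boolean value) {
        try {
            reader.setFeature(feature, value);
            return "set";
        } catch (SAXNotRecognizedException e) {
            return "not recognized";
        } catch (SAXNotSupportedException e) {
            return "not supported";
        }
    }

    // Creates an XMLReader from the JDK's default SAX implementation.
    public static XMLReader newReader() {
        try {
            return SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException(e);
        }
    }

    // Parses the XML and returns the character data seen by the handler,
    // with any skipped entities marked inline as [skipped:name].
    public static String parseText(XMLReader reader, String xml) {
        StringBuilder seen = new StringBuilder();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                seen.append(ch, start, length);
            }

            @Override
            public void skippedEntity(String name) {
                seen.append("[skipped:").append(name).append("]");
            }
        });
        try {
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (IOException | SAXException e) {
            throw new IllegalStateException(e);
        }
        return seen.toString();
    }

    public static void main(String[] args) {
        String[] features = {
            "http://xml.org/sax/features/use-entity-resolver2",
            "http://apache.org/xml/features/validation/unparsed-entity-checking",
        };
        // Stand-in document; the real file and DTD are not shown in this issue.
        String xml = "<!DOCTYPE doc [<!ENTITY entityname \"value\">]>"
                   + "<doc>&entityname;</doc>";
        for (String feature : features) {
            XMLReader reader = newReader();
            String result = trySetFeature(reader, feature, false);
            System.out.println(feature + " -> " + result);
            System.out.println("  handler saw: " + parseText(reader, xml));
        }
    }
}
```

Comparing the "handler saw" output with each feature set to true and to false should show whether the setting has any effect on how entity references reach the handler.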