Reader outer_xml is decoding escaped entities in JRuby #1523

@jcronk

Example:

require 'nokogiri'

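reader = Nokogiri::XML::Reader(File.open('entities.xml'))
reader.each do |node|
  # Print the raw XML of every opening <Main> element (TYPE_ELEMENT == 1).
  puts node.outer_xml if node.node_type == Nokogiri::XML::Reader::TYPE_ELEMENT && node.name == 'Main'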
end

using this XML:

<Root>
    <Main>
        <Element>This has &amp; in it</Element>
    </Main>
    <Main>
        <Element>This has &gt; in it</Element>
    </Main>
    <Main>
        <Element>This has &lt; in it</Element>
    </Main>
</Root>

Produces this output:

<Main>
        <Element>This has & in it</Element>
    </Main>
<Main>
        <Element>This has > in it</Element>
    </Main>
<Main>
        <Element>This has < in it</Element>
    </Main>

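The entities come back decoded instead of escaped, so the output of outer_xml is no longer well-formed XML. This makes it impossible to pass outer_xml back to Nokogiri::XML(): with strict mode on, the parser throws an error, and with strict mode off, it drops parts of the XML. I tested this on jruby-1.7.24 and jruby-9.0.5.0.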
