Should unescaped XML raise an error? #40

Closed
burlesona opened this issue Mar 7, 2013 · 2 comments

@burlesona

I assume this is expected behavior, since XML requires ampersands to be escaped, but I was surprised by the output:

irb(main):007:0> xml = Nori.new
=> #<Nori:0x007fa2ab5caaf8 @options={:strip_namespaces=>false, :convert_tags_to=>nil, :advanced_typecasting=>true, :parser=>:nokogiri}>

irb(main):008:0> xml.parse "<outer><test>Hello&Goodbye</test></outer>"
=> {"test"=>"Hello"}

irb(main):009:0> xml = Nori.new :advanced_typecasting => false
=> #<Nori:0x007fa2ac85ef98 @options={:strip_namespaces=>false, :convert_tags_to=>nil, :advanced_typecasting=>false, :parser=>:nokogiri}>

irb(main):010:0> xml.parse "<outer><test>Hello&Goodbye</test></outer>"
=> {"test"=>"Hello"}

irb(main):011:0> xml = Nori.new :parser => :nokogiri
=> #<Nori:0x007fa2ab8f66a0 @options={:strip_namespaces=>false, :convert_tags_to=>nil, :advanced_typecasting=>true, :parser=>:nokogiri}>

irb(main):012:0> xml.parse "<outer><test>Hello&Goodbye</test></outer>"
=> {"test"=>"Hello"}

If the ampersand is escaped, it works:

irb(main):013:0> xml = Nori.new
=> #<Nori:0x007fa2abe82150 @options={:strip_namespaces=>false, :convert_tags_to=>nil, :advanced_typecasting=>true, :parser=>:nokogiri}>

irb(main):014:0> xml.parse "<outer><test>Hello&amp;Goodbye</test></outer>"
=> {"outer"=>{"test"=>"Hello&Goodbye"}}

I would have expected this to raise an error, rather than return a bad result. Would be interested to hear any comments. Thanks!

@robuye
Contributor

robuye commented Mar 8, 2013

It works as expected with REXML.

Nori.new(parser: :rexml).parse("<outer><test>Hello&Goodbye</test></outer>")
# => {"outer"=>{"test"=>"Hello&Goodbye"}}

It's Nokogiri that is not handling the text properly.

Nokogiri::XML('<outer><test>Hello&Goodbye</test></outer>')
=> #<Nokogiri::XML::Document:0x157ade0 name="document" children=[#<Nokogiri::XML::Element:0x1580790 name="outer" children=[#<Nokogiri::XML::Element:0x157f87c name="test" children=[#<Nokogiri::XML::Text:0x157eb84 "Hello">]>]>]>

@rubiii
Contributor

rubiii commented Mar 29, 2013

@robuye Interesting find! I just installed Nokogiri v1.5.9 and can verify the problem.
I would suggest opening an issue for Nokogiri.

@rubiii closed this as completed Mar 29, 2013