XML Parsing fails with unescaped ampersand in content (not tag) #71

Open
drsharp opened this issue Jul 15, 2015 · 1 comment

drsharp commented Jul 15, 2015

If I have XML like this:

<?xml version="1.0" encoding="UTF-8" ?>
<outer>
  <inner>
    <before>data before</before>
    <data>Some & More</data>
    <after>here is after</after>
  </inner>
</outer>

and try to parse it like this:

require "nori"

xml = File.read("bad.xml")
result = Nori.new.parse(xml)

I get this:

{
    "data" => "Some  More\n        here is after\n  \n"
}

Which is clearly wrong. If I change the & into &amp; it parses just fine:

<?xml version="1.0" encoding="UTF-8" ?>
<outer>
  <inner>
    <before>data before</before>
    <data>Some &amp; More</data>
    <after>here is after</after>
  </inner>
</outer>

{
    "outer" => {
        "inner" => {
            "before" => "data before",
              "data" => "Some & More",
             "after" => "here is after"
        }
    }
}

Why can't I use a raw & in the content? That seems to be a bug, right?

drsharp commented Jul 23, 2015

My bad... I didn't know my XML validation well enough. Apparently a "naked ampersand" is invalid. It either needs to be part of an entity reference (like &lt; for example) or it needs to be escaped itself (as in: &amp;). So this isn't a Nori issue at all.
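For anyone stuck with input they cannot fix at the source, here is a minimal sketch of escaping only the bare ampersands before parsing. The helper name `escape_bare_ampersands` and the regex are my own illustration, not part of Nori. It assumes anything shaped like `&name;`, `&#38;` or `&#x26;` is an intentional reference and leaves it alone, and it would also rewrite an `&` inside a CDATA section, where a raw ampersand is actually legal, so treat it as a workaround rather than a general fix.

```ruby
# Matches "&" unless it already starts an entity or character reference
# such as &amp;, &lt;, &#38; or &#x26;.
BARE_AMPERSAND = /&(?!(?:[A-Za-z_][\w.-]*|#\d+|#x\h+);)/

# Hypothetical helper (not part of Nori): escape only the bare ampersands.
def escape_bare_ampersands(xml)
  xml.gsub(BARE_AMPERSAND, "&amp;")
end

# The bare "&" is escaped; existing references are left untouched.
puts escape_bare_ampersands("<data>Some & More</data>")
puts escape_bare_ampersands("<data>Some &amp; More &lt; &#38;</data>")
```

The cleaned string could then be passed to `Nori.new.parse` as in the original report.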

However, I wonder if Nori should do something other than just try to parse, because the result was really not what it should be.
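Until then, one possible guard is to check the document with a stricter parser first and only hand it to Nori if that succeeds. The sketch below uses REXML, which ships with Ruby. `well_formed_xml?` is a hypothetical helper name, and the sketch assumes REXML's strictness is good enough for this case (it rejects a raw `&` in text, but it is not a validating parser).

```ruby
require "rexml/document"

# Hypothetical helper (not part of Nori): returns true when REXML can parse
# the string as XML, false when it raises a parse error.
def well_formed_xml?(xml)
  REXML::Document.new(xml)
  true
rescue REXML::ParseException
  false
end

good = "<outer><data>Some &amp; More</data></outer>"
bad  = "<outer><data>Some & More</data></outer>"

puts well_formed_xml?(good)
puts well_formed_xml?(bad)
```

In calling code this could wrap the parse, for example raising an error instead of calling `Nori.new.parse(xml)` when the check returns false.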
