New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

XML declaration is incorrectly handled as processing instruction #3

Closed
ghost91- opened this Issue Feb 13, 2018 · 8 comments

Comments

2 participants
@ghost91-

ghost91- commented Feb 13, 2018

The following program crashes:

#!/usr/bin/env dub
/+ dub.sdl:
	name "bug"
	dependency "dxml" version="~>0.2.3"
+/

import dxml.parser;

void main()
{
	`<?xml version="1.0" encoding="UTF-8"?><root></root>`.parseXML();
}

Compile it with dub run --single bug.d where bug.d is a file containing the above code. Here is the resulting error messages:

johannesloher@saiph:..lopment/xml/source> dub run --single bug.d
Performing "debug" build using /usr/bin/dmd for x86_64.
dxml 0.2.3: target for configuration "library" is up to date.
bug ~master: building configuration "application"...
Linking...
To force a rebuild of up-to-date targets, run again with --force.
Running ./bug 
dxml.parser.XMLParsingException@../../../.dub/packages/dxml-0.2.3/dxml/source/dxml/parser/package.d(1985): [2:5]: Processing instructions cannot be named xml
----------------
??:? pure @safe void dxml.parser.EntityRange!(dxml.parser.Config(0, 0, 0), immutable(char)[]).EntityRange._parsePI() [0xa7308887]
??:? @safe void dxml.parser.EntityRange!(dxml.parser.Config(0, 0, 0), immutable(char)[]).EntityRange._parseAtPrologMisc!(1)._parseAtPrologMisc() [0xa7314751]
??:? @safe void dxml.parser.EntityRange!(dxml.parser.Config(0, 0, 0), immutable(char)[]).EntityRange._parseDocumentStart() [0xa73085bd]
??:? @safe void dxml.parser.EntityRange!(dxml.parser.Config(0, 0, 0), immutable(char)[]).EntityRange.popFront() [0xa73082c7]
??:? ref @safe dxml.parser.EntityRange!(dxml.parser.Config(0, 0, 0), immutable(char)[]).EntityRange dxml.parser.EntityRange!(dxml.parser.Config(0, 0, 0), immutable(char)[]).EntityRange.__ctor(immutable(char)[]) [0xa7309c50]
??:? @safe dxml.parser.EntityRange!(dxml.parser.Config(0, 0, 0), immutable(char)[]).EntityRange dxml.parser.parseXML!(dxml.parser.Config(0, 0, 0), immutable(char)[]).parseXML(immutable(char)[]) [0xa7307825]
??:? _Dmain [0xa7306eb8]
Program exited with code 1

The problem disappears when using parseXML!simpleXML(), probably because processing instructions are then skipped alltogether.

@jmdavis

This comment has been minimized.

Owner

jmdavis commented Feb 13, 2018

Well, that's weird, given that there are tests with the XML declaration in them, and this loots pretty straightforward. Well, clearly, I'll have to dig into it and figure out what I screwed up.

@jmdavis

This comment has been minimized.

Owner

jmdavis commented Feb 13, 2018

I can't reproduce this locally - either with your dub example program or by taking the code and putting it in dxml's unit tests.

Based on the exception you're getting, it looks like the entity that the parser is complaining about is on the second line, with the name at the 5th column, whereas your example has it at the start of the first line. So, is the example you gave really what you used to generate that stack trace? And what kind of system are you on? In theory, that shouldn't matter, but there's clearly something different between what you have and what I have if we're running the same example, and it works for me but doesn't work for you. What version of dmd are you using? What version of dub?

@ghost91-

This comment has been minimized.

ghost91- commented Feb 13, 2018

Hm, after trying my minimal example again, it seems to work correctly. However, I have a more complicated example which still fails. I will try to narrow it down and post the result here, once I am done.

For reference, I am using Arch Linux (kernel 4.15.2-2-ARCH), dmd v2.078.2 and dub v1.7.2.

@jmdavis

This comment has been minimized.

Owner

jmdavis commented Feb 13, 2018

Well, the exception message does include the line and column number of where it thinks the XML is wrong, so at least in theory, it should be possible to create a fairly minimal XML document using that information - especially since this is at the start of the file. And with how dxml's parser works, it can't have parsed much past the point that it's referring to.

@ghost91-

This comment has been minimized.

ghost91- commented Feb 13, 2018

I think I figured out what the issue was: In my tests, I had some whitespace before the xml declaration (which, as I just learned, seems to be illegal). The failing code is actually the following:

#!/usr/bin/env dub
/+ dub.sdl:
	name "bug"
	dependency "dxml" version="~>0.2.3"
+/

import dxml.parser;

void main()
{
    `
    <?xml version="1.0" encoding="UTF-8"?>
    <root>
    </root>
    `.parseXML();
}

So the behaviour of dxml is actually correct, however I think the error message could be improved :) Something along the lines of "found xml declaration at an illegal location"

So sorry for taking up your time because of my lack of knowledge about XML :/

@jmdavis

This comment has been minimized.

Owner

jmdavis commented Feb 13, 2018

dxml actually has quite a few tests that verify that the TextPos is correct in error messages, but I'm sure that it could use more work towards improving the messages themselves (e.g. removing all of the cases where you get something like "prematurely reached end of document" is on the todo list), and I agree that this one could and should be better.

@jmdavis

This comment has been minimized.

Owner

jmdavis commented Feb 13, 2018

Okay. It's fixed in master now so that you should see "[1:1] Cannot have whitespace before the <?xml...?> declaration". It'll go in 0.3.0, and if there are more serious issues (or enough smaller improvements like this) such that I release a 0.2.4, then it will go in there.

@ghost91-

This comment has been minimized.

ghost91- commented Feb 13, 2018

Thanks a lot for that!

jmdavis added a commit that referenced this issue Apr 18, 2018

Rename dxml.util.normalize to dxml.util.decodeXML.
#3

normalize is now decodeXML and asNormalized is now asDecodedXML. The old
names are now deprecated aliases and are scheduled to be removed in dxml
0.4.0.

The new names fit better with the newly added writer functionality
(denormalize didn't seem like a great function name) and don't risk
conflicting with std.uni.normalize (which stands a fairly high chance of
being used in conjuction with dxml and thus would result in symbol
conflicts).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment