Need way to handle XML with entity references #5
Hi, I have an XML document that happens to contain references to entities in its DTD. For my use case, I don't care about interpreting them, but the references are still there. I get the tradeoff dxml makes in not supporting the DTD, but currently I can't use dxml to process this document at all because an
It would be useful to have a way to work around this case.
How about supporting a hook like
I don't mind submitting a PR, but I'd like to get your feedback first.
I didn't add support for it, because it wasn't clear to me from reading the XML spec that it was even possible to guarantee that skipping an entity when parsing it would result in a valid XML document (e.g. if it inserted a start tag but not an end tag). After some discussions about it in D.Announce, I think that it's guaranteed that any such entity has to be complete enough that skipping it won't screw up the rest of the document. And as such, what I'm probably going to do is add an option to
It's my intention to tackle this after I've finished the writer support, since that's almost done.
Either way, I don't see much point in adding support for actually processing entity references. If I made it possible to skip the entities, then in principle, a parser could parse the DTD, use dxml to parse the rest of the document, and then process the entities in the document itself, but if you're going that far, you might as well just write the full parser rather than using dxml. Given that dxml doesn't parse the DTD, I think that the only options that make sense are to either throw when it encounters an entity reference (like it does now) or to just skip them and let the program using dxml either ignore them or try to do something on its own to handle them if it really wants to. And I'm fine with making the second possible so long as it's not going to result in treating invalid XML documents as valid due to the fact that the entities weren't replaced with whatever they were supposed to be replaced with.
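To make the two options concrete, here is a rough, language-neutral sketch in Python. It is not dxml's API: the names `split_entity_refs`, `throw_on_entity_ref`, and `EntityRefError` are made up for illustration, and the name pattern is an ASCII-only simplification of the XML `Name` production. It either throws on a reference to a non-predefined entity or passes the reference through uninterpreted so the calling program can ignore or handle it.

```python
import re

# The five entities predefined by the XML spec; any other name needs a DTD.
PREDEFINED = frozenset({"amp", "lt", "gt", "apos", "quot"})

# General entity references such as &foo;. Character references (&#65;) do not
# match because a name cannot start with '#'. ASCII-only simplification.
_ENTITY_REF = re.compile(r"&([A-Za-z_:][A-Za-z0-9_.:\-]*);")


class EntityRefError(ValueError):
    """Hypothetical error for a reference that cannot be resolved without the DTD."""


def split_entity_refs(text, throw_on_entity_ref=True):
    """Split character data into ("text", str) and ("entity_ref", name) pieces.

    Predefined references (&amp; etc.) are left inside the text pieces.
    With throw_on_entity_ref=True, any other reference raises instead.
    """
    pieces = []
    start = 0
    for match in _ENTITY_REF.finditer(text):
        name = match.group(1)
        if name in PREDEFINED:
            continue  # stays part of the surrounding text
        if throw_on_entity_ref:
            raise EntityRefError(f"entity reference needs the DTD: &{name};")
        if match.start() > start:
            pieces.append(("text", text[start:match.start()]))
        # Skipped, not expanded: the caller decides what to do with it.
        pieces.append(("entity_ref", name))
        start = match.end()
    if start < len(text):
        pieces.append(("text", text[start:]))
    return pieces
```

With skipping enabled, `split_entity_refs("a &amp; b &foo; c", throw_on_entity_ref=False)` hands `foo` back as its own piece instead of failing. Note that this sketch does nothing to check that the skipped entity would have expanded to well-formed content, which is exactly the concern raised above.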