ConstructingParser does not tolerate start of file whitespace #532

Open
mbeckerle opened this issue Jun 1, 2021 · 1 comment

@mbeckerle

We use the constructing parser so as to get file/line/column information added to parsed XML, as well as for proper handling of CDATA regions.

However, we have run into cases where the parser is stricter than we need, and we have had to make it more tolerant.

In particular, we discovered that it requires the first character of an XML file to be "<", starting an XML prolog, a comment, a DTD, or the root element.

We have numerous XML files that begin with whitespace, e.g., a blank line followed by comments, other processing instructions, etc.
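To illustrate the kind of tolerance meant here, below is a minimal sketch of skipping leading whitespace before the first "<". It uses only the Scala standard library. The names `LeadingWhitespace`, `isXmlSpace`, and `skipLeadingSpace` are hypothetical and are not part of scala-xml or Daffodil. It returns the number of skipped characters because simply discarding them would shift the line/column information, so a real fix would presumably do the skipping inside the parser instead.

```scala
object LeadingWhitespace {
  // XML 1.0 whitespace (production S): space, tab, carriage return, line feed.
  def isXmlSpace(c: Char): Boolean =
    c == ' ' || c == '\t' || c == '\r' || c == '\n'

  // Hypothetical helper: returns the text with leading XML whitespace removed,
  // together with the count of characters skipped so a caller could still
  // account for them when reporting positions.
  def skipLeadingSpace(text: String): (String, Int) = {
    val skipped = text.takeWhile(isXmlSpace).length
    (text.drop(skipped), skipped)
  }
}
```

Note that XML 1.0 only allows whitespace at the start when no XML declaration follows, so this sketch is meant for files whose first markup is a comment, a processing instruction, a DTD, or the root element.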

We also have numerous XML files that begin with "<?xml" where that is NOT an XML prolog but an ordinary processing instruction whose target merely starts with "xml". For example:

<?xml-model href="...." ... ?>
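One way to tell the two cases apart, sketched here with only the Scala standard library: in XML 1.0 an XML declaration is "<?xml" followed by whitespace, whereas a processing instruction such as xml-model continues the target name directly. The names `XmlDeclCheck` and `startsWithXmlDeclaration` are hypothetical, and this is not how scala-xml or Daffodil actually implement the check.

```scala
object XmlDeclCheck {
  // XML 1.0 whitespace (production S): space, tab, carriage return, line feed.
  def isXmlSpace(c: Char): Boolean =
    c == ' ' || c == '\t' || c == '\r' || c == '\n'

  // True only when the text begins with "<?xml" immediately followed by
  // whitespace. Anything else starting with "<?xml" (for example
  // "<?xml-model") is treated as an ordinary processing instruction.
  def startsWithXmlDeclaration(text: String): Boolean =
    text.startsWith("<?xml") && text.length > 5 && isXmlSpace(text.charAt(5))
}
```

With this check, an input such as `<?xml-model href="schema.rng"?>` (the href value is made up for illustration) would not be routed to the prolog-parsing path.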

These things are all tolerated by standard Xerces.

So we've enhanced the constructing parser to be tolerant of these things.

Our constructing parser method overrides are all in this file:

project: https://github.com/apache/daffodil

file: daffodil-lib/src/main/scala/org/apache/daffodil/xml/DaffodilConstructingLoader.scala

I can create a PR with suggested changes, but before doing so I wanted to run the whole idea past the maintainers of scala-xml. Is there a reason it should not be enhanced in this way?

@ashawley
Member

ashawley commented Sep 9, 2021

I don't see why this couldn't be supported. Please feel free to submit a PR.
