XMLOutputter removes newlines between attributes #153

Closed
lobodpav opened this issue Apr 4, 2016 · 3 comments

lobodpav commented Apr 4, 2016

When reading an XML document that has newlines between attributes, the XMLOutputter does not preserve them, even when the RAW format is used.

XML source:

<Field name = "foo"
      label = "I am foo"
      width = "100px"/>

The resulting XML after parsing and outputting:

<Field name = "foo" label = "I am foo" width = "100px"/>

Sample Groovy code:

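// Assumes JDOM 2.x (org.jdom2 packages) on the classpath
import org.jdom2.Document
import org.jdom2.input.SAXBuilder
import org.jdom2.output.Format
import org.jdom2.output.XMLOutputter

File inFile = new File("test.xml")
File outFile = new File("test-JDOM.xml")

// Parse the source file into a JDOM document
SAXBuilder builder = new SAXBuilder()
Document document = builder.build(inFile)

// Raw format with text preservation: element text is written back unchanged
Format format = Format.getRawFormat()
format.setTextMode(Format.TextMode.PRESERVE)
XMLOutputter outputter = new XMLOutputter(format)

// Write the document back out
outFile.withWriter { fileWriter ->
    outputter.output(document, fileWriter)
}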
Collaborator

rolfl commented Apr 4, 2016

The XML Specification gives no special value to whitespace between attributes in an element start tag: https://www.w3.org/TR/xml/#sec-starttags (in fact, even the order of attributes is declared to be insignificant).

This carries through, in part, to both the SAX and DOM parsing specifications: XML parsers (like the Xerces parser built into Java) completely ignore this whitespace and do not report it when parsing an XML document. Note that the SAX startElement method simply lists the attributes, not the space between them: http://docs.oracle.com/javase/8/docs/api/org/xml/sax/ContentHandler.html#startElement-java.lang.String-java.lang.String-java.lang.String-org.xml.sax.Attributes-

The attributes are recorded with a simple Attributes instance: http://docs.oracle.com/javase/8/docs/api/org/xml/sax/Attributes.html
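
To make the point about startElement concrete, here is a minimal sketch using only the JDK's built-in SAX parser (no JDOM involved). The class name StartTagEvents and the method startTags are hypothetical names chosen for illustration. The handler receives only attribute names and values, so the multi-line and single-line versions of the Field element from the issue produce identical events. Attributes are sorted by name here so the comparison does not rely on the order in which the parser reports them.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: records what a SAX handler is told about each start tag.
class StartTagEvents {

    /** Returns one entry per start tag: the element name followed by its attributes, sorted by name. */
    static List<String> startTags(String xml) throws Exception {
        List<String> seen = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                // The Attributes object exposes names and values only;
                // it has no accessor for the whitespace that separated them.
                Map<String, String> sorted = new TreeMap<>();
                for (int i = 0; i < atts.getLength(); i++) {
                    sorted.put(atts.getQName(i), atts.getValue(i));
                }
                seen.add(qName + sorted);
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return seen;
    }

    public static void main(String[] args) throws Exception {
        String multiLine = "<Field name = \"foo\"\n      label = \"I am foo\"\n      width = \"100px\"/>";
        String oneLine = "<Field name = \"foo\" label = \"I am foo\" width = \"100px\"/>";
        // Both calls print the same list, because the newlines never reach the handler.
        System.out.println(startTags(multiLine));
        System.out.println(startTags(oneLine));
    }
}
```

Since a library like JDOM builds its tree from events like these, the original line breaks are already gone by the time any outputter runs.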

Because the XML specification gives no significance to space inside element start tags, and because no standard XML parsers exist which will actually report the space between attributes, the JDOM code has never been written to read this space. As a result, it does not output it either. Further, there is no way in JDOM to manipulate (add, remove, change) the space between attributes programmatically.

Note that the document is semantically identical whether there is a large amount of whitespace or just a single space between attributes. It is also semantically identical if the order of the attributes changes (JDOM does maintain the order of input attributes, although it will re-order the XML Namespace Declarations, if any).

There are no (standards-conforming) parsers, or any other Java XML libraries (like JDOM) I know of, that will report these specific spaces for you.

The "PRESERVE" format in JDOM refers specifically to the whitespace in element content, that is, the text between a start tag and its matching end tag. JDOM handles that (standard) process correctly.
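
By contrast, whitespace between a start tag and its end tag does reach the application. The following sketch, again using only the JDK's SAX parser, collects everything delivered to the characters callback. The names ElementTextEvents and characterData are hypothetical, and the input has no DTD, so all whitespace is reported as ordinary character data.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: shows that whitespace in element content is reported by SAX.
class ElementTextEvents {

    /** Concatenates all character data the parser reports for the document. */
    static String characterData(String xml) throws Exception {
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                // May be called several times per text run; appending handles any chunking.
                text.append(ch, start, length);
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<a>\n  <b>  text  </b>\n</a>";
        // Make the whitespace visible: newlines shown as \n, spaces as dots.
        System.out.println(characterData(xml).replace("\n", "\\n").replace(' ', '.'));
    }
}
```

This is the whitespace a text mode such as PRESERVE can keep, because it is present in the parsed data in the first place.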

rolfl closed this as completed Apr 4, 2016
Author

lobodpav commented Apr 4, 2016

Thanks for the explanation. This was something I wasn't sure about - what exactly was meant by PRESERVE.

Collaborator

rolfl commented Apr 4, 2016

"Preserve" has a special meaning in XML ( https://www.w3.org/TR/xml/#sec-white-space ): there is a special XML attribute, <sometag ..... xml:space="preserve" ...>, that can be set. A conforming parser/system should accurately maintain any whitespace inside an element with the "preserve" attribute set. JDOM honours this attribute. It can also be set (with the XMLOutputter's Preserve format) to do so for all elements, not just the ones that are specially marked.

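As a rough illustration of the xml:space attribute, the sketch below uses the JDK's DOM parser to work out whether an element falls under a "preserve" declaration. Per the XML specification, the value applies to the element and everything in its content unless a nested element overrides it. The class XmlSpaceLookup and its methods are hypothetical names for this example only; this is not how JDOM implements the feature.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class: resolves the effective xml:space setting for an element.
class XmlSpaceLookup {

    /** True if the nearest xml:space declaration on the element or an ancestor is "preserve". */
    static boolean preservesSpace(Element element) {
        // Walk up from the element; the loop ends at the Document node, which is not an Element.
        for (Node n = element; n instanceof Element; n = n.getParentNode()) {
            String value = ((Element) n).getAttribute("xml:space"); // "" when absent
            if (value.equals("preserve")) return true;
            if (value.equals("default")) return false;
        }
        return false; // no declaration anywhere: default handling applies
    }

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(
                "<root xml:space=\"preserve\"><a><b xml:space=\"default\"><c/></b></a><d/></root>");
        for (String name : new String[] {"root", "a", "b", "c", "d"}) {
            Element e = (Element) doc.getElementsByTagName(name).item(0);
            System.out.println(name + " preserves whitespace: " + preservesSpace(e));
        }
    }
}
```

In this input, a and d inherit "preserve" from root, while b switches back to "default" for itself and its child c.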