JDOM2 Feature Attribute Specified

paulk-asert edited this page Apr 9, 2012 · 2 revisions

JDOM Contrib previously included code that parses a DTD document and identifies the Attributes that the DTD supplies as 'defaulted' values in the input. Matching code hooks into the XMLOutputter and excludes an Attribute from the output when its value is simply the defaulted input value. The reasoning is that the output still references the DTD, so any application that processes the output XML will have the defaulted Attributes added back in automatically. The benefit is that the output XML is much smaller, cleaner, and more readable.

There are a number of drawbacks to this system:

  1. It requires a separate parse of the DTD document.
  2. It requires managing the Attributes on the output side.
  3. The indicator of whether an Attribute holds a defaulted value is not attached to the Attribute itself.
  4. It applies to DTD sources only.

JDOM2 introduces a new flag on the Attribute class, accessed through Attribute.isSpecified() and Attribute.setSpecified().

When a document is parsed through the SAX parsing process, the parser informs the JDOM handler whether each Attribute was specified in the XML (see Attributes2.isSpecified(int)). The handler sets the flag so that Attribute.isSpecified() returns true for values that were specified in the input XML and false for values 'inferred' from the DTD. The same appears to be true for Attributes inferred from XML Schema attribute specifications (default/fixed), but I cannot find authoritative documentation on whether all SAX2 parsers guarantee this, or whether it is only Xerces that sets the specified flag correctly for XSD-sourced attribute definitions.
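The SAX-level signal described above can be observed without JDOM at all. The following is a minimal sketch that uses only the JDK's own SAX classes, not JDOM's handler; the class name SpecifiedFlagDemo, the method specifiedFlags and the sample document are made up for illustration. It assumes the parser hands the handler an Attributes2 instance, and treats every attribute as specified when it does not.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.ext.Attributes2;
import org.xml.sax.helpers.DefaultHandler;

public class SpecifiedFlagDemo {

    // Hypothetical sample: 'id' is written in the document,
    // 'lang' is only supplied as a default by the internal DTD subset.
    static final String XML =
          "<!DOCTYPE greeting [\n"
        + "  <!ELEMENT greeting (#PCDATA)>\n"
        + "  <!ATTLIST greeting id CDATA #IMPLIED lang CDATA \"en\">\n"
        + "]>\n"
        + "<greeting id=\"g1\">hello</greeting>";

    // Maps each attribute name seen during the parse to its 'specified' flag.
    static Map<String, Boolean> specifiedFlags(String xml) {
        final Map<String, Boolean> flags = new LinkedHashMap<String, Boolean>();
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes atts) {
                    for (int i = 0; i < atts.getLength(); i++) {
                        // Without Attributes2 there is no way to tell, so assume specified.
                        boolean specified = !(atts instanceof Attributes2)
                                || ((Attributes2) atts).isSpecified(i);
                        flags.put(atts.getQName(i), specified);
                    }
                }
            });
        } catch (Exception e) {
            // Demo-only error handling: surface any parser problem unchecked.
            throw new IllegalStateException(e);
        }
        return flags;
    }

    public static void main(String[] args) {
        System.out.println(specifiedFlags(XML));
    }
}
```

With the parser bundled in the JDK I would expect id to be reported as specified and lang as not specified. Other parsers may behave differently, which is exactly the uncertainty noted above for XSD-sourced defaults.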

Finally, the JDOM2 output code has been adapted to optionally honour the isSpecified flag on the Attributes. When outputting a JDOM Document through one of the outputters, you can call Format.setSpecifiedAttributesOnly(true), and the outputter will then write only those Attributes for which isSpecified() returns true (that is, the Attribute was part of the XML rather than inferred from the DTD).
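Put together, usage might look like the sketch below. It assumes the JDOM2 jar is on the classpath and that the underlying SAX parser reports the specified flag as described earlier; the class name SpecifiedOnlyOutputDemo, the render helper and the sample document are hypothetical and exist only for this illustration.

```java
import java.io.StringReader;
import org.jdom2.Document;
import org.jdom2.Element;
import org.jdom2.input.SAXBuilder;
import org.jdom2.output.Format;
import org.jdom2.output.XMLOutputter;

public class SpecifiedOnlyOutputDemo {

    // Hypothetical sample: 'lang' comes only from the DTD default.
    static final String XML =
          "<!DOCTYPE greeting [\n"
        + "  <!ELEMENT greeting (#PCDATA)>\n"
        + "  <!ATTLIST greeting id CDATA #IMPLIED lang CDATA \"en\">\n"
        + "]>\n"
        + "<greeting id=\"g1\">hello</greeting>";

    // Parses the XML and renders the root element, optionally
    // dropping Attributes that were not specified in the input.
    static String render(String xml, boolean specifiedOnly) {
        try {
            Document doc = new SAXBuilder().build(new StringReader(xml));
            Format format = Format.getRawFormat();
            format.setSpecifiedAttributesOnly(specifiedOnly);
            return new XMLOutputter(format).outputString(doc.getRootElement());
        } catch (Exception e) {
            // Demo-only error handling.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        Element root = new SAXBuilder().build(new StringReader(XML)).getRootElement();
        // The flag is available directly on each Attribute instance.
        System.out.println("id specified:   " + root.getAttribute("id").isSpecified());
        System.out.println("lang specified: " + root.getAttribute("lang").isSpecified());

        System.out.println(render(XML, false)); // all Attributes
        System.out.println(render(XML, true));  // specified Attributes only
    }
}
```

The setting lives on the Format rather than on the outputter itself, so the same Format instance can be handed to any of the outputters.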

The end result is that:

  1. There is additional useful information on all Attribute instances.
  2. The existing DTD parsers inside the SAX parsers are used instead of relying on an external DTD parse.
  3. The code works for DTD sources and, subject to the parser support noted above, for XML Schema sources as well.
  4. There is no need for custom code to 'cull' the inferred Attributes from the output.