JDOM2 Feature Namespaces In Scope

paulk-asert edited this page Apr 9, 2012 · 3 revisions

Namespaces cause a fair amount of confusion, and the JDOM code contained a number of different mechanisms for locating and processing Namespace data. These internal mechanisms were not available to JDOM users, who consequently had to implement similar functionality in their own code. The disparate mechanisms can now be replaced by just two 'standard' ones:

  • There is a new interface org.jdom2.NamespaceAware with three methods: List<Namespace> getNamespacesInScope(), List<Namespace> getNamespacesInherited(), and List<Namespace> getNamespacesIntroduced(). They report, respectively, the Namespaces that are in scope, which of those in-scope Namespaces are inherited from the parent scope, and which are introduced by the content itself. All three methods are 'dynamic': they calculate their values each time they are called, so calling them often can be slow. The Document, Attribute, and Content classes all implement this interface, which means that every JDOM object (Element, Text, etc.) is Namespace-aware.
  • A new class, org.jdom2.util.NamespaceStack (which implements java.lang.Iterable), allows JDOM content to be processed in a batch. This is more efficient than repeated getNamespacesInScope() calls, but the stack must be maintained as the document is traversed, using two methods: push(Element) and pop(). The Namespaces currently in scope can then be queried with the iterator() method, as well as with the addedForward() and addedReverse() methods.
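To make the push/pop pattern concrete, here is a minimal, stdlib-only Java model of a namespace stack. It is not org.jdom2.util.NamespaceStack: the class ScopeStackSketch, its nested Ns type (a stand-in for org.jdom2.Namespace), and the push(Ns, Ns...) signature (a stand-in for push(Element)) are hypothetical, and seeding the stack with the no-prefix and xml namespaces is an assumption. It only illustrates why maintaining a stack during traversal avoids rescanning the ancestry.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.Objects;
import java.util.TreeMap;

// Hypothetical model of a namespace stack; this is NOT org.jdom2.util.NamespaceStack.
final class ScopeStackSketch implements Iterable<ScopeStackSketch.Ns> {

    // Minimal stand-in for org.jdom2.Namespace: a prefix bound to a URI.
    static final class Ns {
        final String prefix;
        final String uri;
        Ns(String prefix, String uri) { this.prefix = prefix; this.uri = uri; }
        @Override public boolean equals(Object o) {
            return o instanceof Ns && ((Ns) o).prefix.equals(prefix) && ((Ns) o).uri.equals(uri);
        }
        @Override public int hashCode() { return Objects.hash(prefix, uri); }
        @Override public String toString() { return prefix + "=" + uri; }
    }

    static final Ns XML = new Ns("xml", "http://www.w3.org/XML/1998/namespace");
    static final Ns NO_NS = new Ns("", "");

    // One level per pushed element: its own namespace, all bindings by prefix, and what it added.
    private static final class Level {
        final Ns first;
        final TreeMap<String, Ns> bindings;
        final List<Ns> added;
        Level(Ns first, TreeMap<String, Ns> bindings, List<Ns> added) {
            this.first = first; this.bindings = bindings; this.added = added;
        }
    }

    private final Deque<Level> levels = new ArrayDeque<>();

    ScopeStackSketch() {
        // Assumed seed level: the no-prefix namespace and the xml namespace.
        TreeMap<String, Ns> seed = new TreeMap<>();
        seed.put(NO_NS.prefix, NO_NS);
        seed.put(XML.prefix, XML);
        levels.push(new Level(NO_NS, seed, Collections.<Ns>emptyList()));
    }

    // Stand-in for push(Element): the element's namespace plus any others it uses or declares.
    void push(Ns elementNs, Ns... others) {
        TreeMap<String, Ns> bindings = new TreeMap<>(levels.peek().bindings);
        List<Ns> declared = new ArrayList<>();
        declared.add(elementNs);
        Collections.addAll(declared, others);
        List<Ns> added = new ArrayList<>();
        for (Ns ns : declared) {
            Ns previous = bindings.put(ns.prefix, ns);
            if (!ns.equals(previous)) {
                added.add(ns); // new prefix, or a prefix re-bound to a different URI
            }
        }
        levels.push(new Level(elementNs, bindings, ordered(elementNs, added)));
    }

    // Discard the most recently pushed element's scope.
    void pop() {
        if (levels.size() == 1) {
            throw new IllegalStateException("cannot pop the seed scope");
        }
        levels.pop();
    }

    // Everything in scope: the element's own namespace first, then the rest by prefix.
    @Override
    public Iterator<Ns> iterator() {
        Level top = levels.peek();
        return ordered(top.first, top.bindings.values()).iterator();
    }

    // Namespaces introduced at the current level, in the same order (compare addedForward()).
    List<Ns> addedForward() {
        return levels.peek().added;
    }

    private static List<Ns> ordered(Ns first, Collection<Ns> namespaces) {
        TreeMap<String, Ns> rest = new TreeMap<>();
        for (Ns ns : namespaces) {
            rest.put(ns.prefix, ns);
        }
        List<Ns> out = new ArrayList<>();
        if (first.equals(rest.get(first.prefix))) {
            rest.remove(first.prefix);
            out.add(first);
        }
        out.addAll(rest.values());
        return Collections.unmodifiableList(out);
    }
}
```

In a traversal you would push when entering an element, read the scope (or just the newly added Namespaces) while processing it, and pop when leaving it. Each push copies the parent level's bindings once, so querying the scope never walks back up the tree.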

In addition to centralizing the Namespace handling of JDOM content, JDOM now has a reliable and consistent mechanism for ordering Namespace values. Namespaces are always accessed from the perspective of some Content, for example an Element. For an Element, the Namespaces in scope are ordered as follows: first the Element's own Namespace, then the remaining Namespaces in alphabetical order by prefix.
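The ordering rule can be expressed as a Java Comparator. The sketch below is hypothetical (ScopeOrderSketch, Ns, scopeOrder, and order are illustrative names, not JDOM API) and assumes the input already holds one binding per prefix.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.Objects;

// Hypothetical sketch of the documented ordering rule; not JDOM's own code.
final class ScopeOrderSketch {

    // Minimal stand-in for org.jdom2.Namespace: a prefix bound to a URI.
    static final class Ns {
        final String prefix;
        final String uri;
        Ns(String prefix, String uri) { this.prefix = prefix; this.uri = uri; }
        @Override public boolean equals(Object o) {
            return o instanceof Ns && ((Ns) o).prefix.equals(prefix) && ((Ns) o).uri.equals(uri);
        }
        @Override public int hashCode() { return Objects.hash(prefix, uri); }
        @Override public String toString() { return prefix + "=" + uri; }
    }

    // The element's own namespace sorts first (false < true), then everything else by prefix.
    static Comparator<Ns> scopeOrder(Ns elementNs) {
        return Comparator.comparing((Ns ns) -> !ns.equals(elementNs))
                         .thenComparing((Ns ns) -> ns.prefix);
    }

    // Returns a new list holding the in-scope namespaces in the documented order.
    static List<Ns> order(Ns elementNs, Collection<Ns> inScope) {
        List<Ns> out = new ArrayList<>(inScope);
        out.sort(scopeOrder(elementNs));
        return out;
    }
}
```

For an Element in the p namespace whose scope holds p, z, a, xml, and the no-prefix namespace, order(...) yields p, then "", a, xml, z: the empty prefix sorts before every other prefix.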

All JDOM processes that expose Namespaces use this system, so all iterators, lists, and XML output present the Namespaces in that order.

When outputting JDOM in some other format, the Namespace declarations will always be output before any Attributes for the Element.
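A minimal sketch of that output rule follows. StartTagSketch and its parameters are hypothetical, and this is not how JDOM's outputters are written; the declarations map is assumed to be already in the scope order described above.

```java
import java.util.Map;

// Hypothetical start-tag writer illustrating the output rule; not a JDOM outputter.
final class StartTagSketch {

    // Escape characters that cannot appear literally inside a double-quoted attribute value.
    static String escape(String value) {
        return value.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }

    // declarations: prefix -> URI, assumed to be in scope order already.
    // attributes: qualified name -> value, written in the map's iteration order.
    static String startTag(String qname, Map<String, String> declarations, Map<String, String> attributes) {
        StringBuilder sb = new StringBuilder("<").append(qname);
        // 1. Namespace declarations always come first ...
        for (Map.Entry<String, String> d : declarations.entrySet()) {
            sb.append(d.getKey().isEmpty() ? " xmlns" : " xmlns:" + d.getKey())
              .append("=\"").append(escape(d.getValue())).append('"');
        }
        // 2. ... and only then the element's attributes.
        for (Map.Entry<String, String> a : attributes.entrySet()) {
            sb.append(' ').append(a.getKey()).append("=\"").append(escape(a.getValue())).append('"');
        }
        return sb.append('>').toString();
    }
}
```

With declarations p=urn:p and a=urn:a (in a LinkedHashMap) and a single attribute id="1", startTag("p:item", ...) produces `<p:item xmlns:p="urn:p" xmlns:a="urn:a" id="1">`.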

It is worth noting that in JDOM 1.x Namespaces are not centrally coordinated, so to identify the Namespaces in scope on an Element you would have to inspect five places:

  1. the Element's Namespace
  2. each of the Element's Attributes
  3. any additional Namespaces declared explicitly for the Element
  4. any Namespace in scope on the Element's parent that is not redeclared by something on this Element
  5. the built-in Namespaces (the default no-prefix namespace and the xml namespace).
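The five-place scan can be modelled in a few lines of Java. This is a hypothetical sketch, not JDOM's code: the El type (namespace, attribute namespaces, additional declarations, parent) and the Ns stand-in are assumptions, and adding the no-prefix namespace only at the root, when nothing else binds "", is an assumed reading of item 5.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;
import java.util.TreeMap;

// Hypothetical model of the five-place scan; El and Ns are illustrative stand-ins.
final class ScopeScanSketch {

    // Minimal stand-in for org.jdom2.Namespace: a prefix bound to a URI.
    static final class Ns {
        final String prefix;
        final String uri;
        Ns(String prefix, String uri) { this.prefix = prefix; this.uri = uri; }
        @Override public boolean equals(Object o) {
            return o instanceof Ns && ((Ns) o).prefix.equals(prefix) && ((Ns) o).uri.equals(uri);
        }
        @Override public int hashCode() { return Objects.hash(prefix, uri); }
        @Override public String toString() { return prefix + "=" + uri; }
    }

    static final Ns XML = new Ns("xml", "http://www.w3.org/XML/1998/namespace");
    static final Ns NO_NS = new Ns("", "");

    // Minimal element model: its namespace, its attributes' namespaces, extra declarations, parent.
    static final class El {
        final Ns ns;
        final List<Ns> attributeNs = new ArrayList<>();
        final List<Ns> additional = new ArrayList<>();
        final El parent;
        El(Ns ns, El parent) { this.ns = ns; this.parent = parent; }
    }

    static List<Ns> inScope(El e) {
        TreeMap<String, Ns> byPrefix = new TreeMap<>();
        byPrefix.put(XML.prefix, XML);                   // 5. the xml namespace
        byPrefix.put(e.ns.prefix, e.ns);                 // 1. the element's namespace
        for (Ns ns : e.attributeNs) {
            byPrefix.putIfAbsent(ns.prefix, ns);         // 2. attribute namespaces
        }
        for (Ns ns : e.additional) {
            byPrefix.putIfAbsent(ns.prefix, ns);         // 3. additional declarations
        }
        if (e.parent != null) {
            for (Ns ns : inScope(e.parent)) {            // 4. inherited unless re-bound here
                byPrefix.putIfAbsent(ns.prefix, ns);
            }
        } else {
            byPrefix.putIfAbsent(NO_NS.prefix, NO_NS);   // 5. the no-prefix default at the root
        }
        // Element's namespace first, then the remaining namespaces by prefix.
        List<Ns> out = new ArrayList<>();
        out.add(byPrefix.remove(e.ns.prefix));
        out.addAll(byPrefix.values());
        return Collections.unmodifiableList(out);
    }
}
```

Note the recursive inScope(e.parent) call: every query walks the entire ancestry, which is why this kind of scan should be kept out of tight loops.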

The getNamespacesInScope() method does exactly the above, so it requires a scan of the Element and its ancestry. Do not use it in 'tight' loops or performance-critical code; use NamespaceStack instead.

The getNamespacesIntroduced() method returns a subset of the Namespaces returned by getNamespacesInScope(). The subset contains only those Namespaces that are in scope on this content but not on the content's parent. The Namespaces appear in the same order as in getNamespacesInScope().

The getNamespacesInherited() method also returns a subset of getNamespacesInScope(). This subset contains only those Namespaces that are in scope on this Element and also in scope on the Element's parent. The Namespaces appear in the same order as in getNamespacesInScope().

The lists returned by getNamespacesInherited() and getNamespacesIntroduced() are mutually exclusive, and their union is always the full getNamespacesInScope() list.
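Given the in-scope lists of a piece of content and of its parent, the split can be sketched as two filters. This is hypothetical (ScopeSplitSketch and its methods are illustrative names, not JDOM API), and it treats a Namespace as in scope on the parent only when the same prefix is bound to the same URI there, so re-binding a prefix counts as introducing it.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Hypothetical sketch of the introduced/inherited split; not JDOM's implementation.
final class ScopeSplitSketch {

    // Minimal stand-in for org.jdom2.Namespace: a prefix bound to a URI.
    static final class Ns {
        final String prefix;
        final String uri;
        Ns(String prefix, String uri) { this.prefix = prefix; this.uri = uri; }
        @Override public boolean equals(Object o) {
            return o instanceof Ns && ((Ns) o).prefix.equals(prefix) && ((Ns) o).uri.equals(uri);
        }
        @Override public int hashCode() { return Objects.hash(prefix, uri); }
        @Override public String toString() { return prefix + "=" + uri; }
    }

    // In scope here, but the same prefix-to-URI binding is not in scope on the parent.
    static List<Ns> introduced(List<Ns> scope, List<Ns> parentScope) {
        return filter(scope, parentScope, false);
    }

    // In scope here and also in scope on the parent.
    static List<Ns> inherited(List<Ns> scope, List<Ns> parentScope) {
        return filter(scope, parentScope, true);
    }

    // Walks scope in order, so both results keep the full in-scope ordering.
    private static List<Ns> filter(List<Ns> scope, List<Ns> parentScope, boolean keepShared) {
        Set<Ns> parent = new HashSet<>(parentScope);
        List<Ns> out = new ArrayList<>();
        for (Ns ns : scope) {
            if (parent.contains(ns) == keepShared) {
                out.add(ns);
            }
        }
        return out;
    }
}
```

Because both filters walk the child's list in order and test the same condition with opposite outcomes, the results keep the getNamespacesInScope() order, share no element, and together contain every in-scope Namespace.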

Special notes about this functionality are:

  • Not all JDOM content has an Element parent, either because it is document-level Content or because it is detached. Such content is assumed to have a 'virtual' ancestry for Namespace purposes, and this virtual ancestor's in-scope set of Namespaces consists of the XML namespace xmlns:xml="http://www.w3.org/XML/1998/namespace".
  • For all non-Element and non-Attribute content (Comment, Text, CDATA, ProcessingInstruction, EntityRef, DocType), the in-scope Namespaces are the same as those of the content's parent Element (or the 'virtual' set if the content has no parent Element). The getNamespacesIntroduced() list is always empty. The order for these content types is the same as the order in the parent Element, even though these content types have no Namespace concept of their own.
  • Attributes have always been a special case for Namespaces. Non-prefixed Attributes are always in the NO_NAMESPACE namespace, even if the Attribute's parent Element has re-bound the 'default' (no-prefix) namespace to some other URI. It is therefore possible for an Attribute's Namespace set to differ from that of its parent Element, because the Attribute can re-bind the "" prefix to the "" URI. As with Elements, the in-scope Namespaces are ordered with the Attribute's own Namespace first, followed by the other in-scope Namespaces in prefix order. The getNamespacesIntroduced() list is always empty unless the Attribute re-binds the "" prefix to the "" URI.
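The Attribute rules can be modelled the same way. In this hypothetical sketch (AttributeScopeSketch and its methods are illustrative, not JDOM API), the attribute is assumed to be attached to an Element whose in-scope list already includes the attribute's Namespace, as in item 2 of the five-place list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;
import java.util.TreeMap;

// Hypothetical model of the Attribute rules; assumes the attribute is attached to an element.
final class AttributeScopeSketch {

    // Minimal stand-in for org.jdom2.Namespace: a prefix bound to a URI.
    static final class Ns {
        final String prefix;
        final String uri;
        Ns(String prefix, String uri) { this.prefix = prefix; this.uri = uri; }
        @Override public boolean equals(Object o) {
            return o instanceof Ns && ((Ns) o).prefix.equals(prefix) && ((Ns) o).uri.equals(uri);
        }
        @Override public int hashCode() { return Objects.hash(prefix, uri); }
        @Override public String toString() { return prefix + "=" + uri; }
    }

    static final Ns NO_NS = new Ns("", "");

    // The attribute's namespace replaces whatever its prefix was bound to on the element;
    // it comes first, and the remaining bindings follow in prefix order.
    static List<Ns> inScope(Ns attributeNs, List<Ns> elementScope) {
        TreeMap<String, Ns> byPrefix = new TreeMap<>();
        for (Ns ns : elementScope) {
            byPrefix.put(ns.prefix, ns);
        }
        byPrefix.remove(attributeNs.prefix); // e.g. drops xmlns="urn:d" for a non-prefixed attribute
        List<Ns> out = new ArrayList<>();
        out.add(attributeNs);
        out.addAll(byPrefix.values());
        return out;
    }

    // Since the element's scope already contains its attributes' namespaces, this is non-empty
    // only when a non-prefixed attribute re-binds "" to "" under a default-namespace element.
    static List<Ns> introduced(Ns attributeNs, List<Ns> elementScope) {
        return elementScope.contains(attributeNs)
                ? Collections.<Ns>emptyList()
                : Collections.singletonList(attributeNs);
    }
}
```

With an element scope of xmlns="urn:d", xmlns:x="urn:x", and xml, a non-prefixed attribute gets the scope "", x, xml (with "" bound to the empty URI) and introduces only NO_NAMESPACE, while an x: attribute sees the element's bindings with x first and introduces nothing.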