RFE: Modify XMLOutputter to allow smart subclasses (Paul Libbrecht) #5

Closed
hunterhacker opened this Issue · 3 comments

2 participants

@hunterhacker
Owner

http://markmail.org/message/4nfda3qfi36lc5w5

Hello,

please find at:
http://www.activemath.org/~paul/tmp/DTDaware
a contribution to JDOM in the form of a patched XMLOutputter to allow subclasses to stop output of some attributes and namespace declarations as well as a DTDAwareXMLOutputter subclass which uses Mark Wutka's DTDparser (I used version 1.23) to decide not to output attributes or namespace decls if they are implicit in the DTD. This feature has been a key to maintenance of a clean authorable XML.

I haven't put licenses yet... want me to?
Feel free to apply any license there.
thanks in advance

paul

Here's the XMLOutputter.java diff

--- XMLOutputter.java   2010-10-28 14:05:31.000000000 -0700
+++ /tmp/XMLOutputter.java  2011-07-31 17:50:46.000000000 -0700
@@ -115,7 +115,7 @@
 public class XMLOutputter implements Cloneable {

     private static final String CVS_ID =
-      "@(#) $RCSfile: XMLOutputter.java,v $ $Revision: 1.117 $ $Date: 2009/07/23 05:54:23 $ $Name:  $";
+      "@(#) $RCSfile: XMLOutputter.java,v $ $Revision: 1.117 $ $Date: 2009/07/23 05:54:23 $ $Name: jdom_1_1_1 $";

     // For normal output
     private Format userFormat = Format.getRawFormat();
@@ -1099,9 +1099,10 @@
      * declarations.
      *
      * @param ns <code>Namespace</code> to print definition of
+     * @param elt <code>Element</code> in which this namespace is output
      * @param out <code>Writer</code> to use.
      */
-    private void printNamespace(Writer out, Namespace ns,
+    private void printNamespace(Writer out, Namespace ns, Element elt,
                                 NamespaceStack namespaces)
                      throws IOException {
         String prefix = ns.getPrefix();
@@ -1111,6 +1112,9 @@
         if (uri.equals(namespaces.getURI(prefix))) {
             return;
         }
+        if(!shouldOutputNamespace(ns,elt,namespaces)) {
+            return;
+        }

         out.write(" xmlns");
         if (!prefix.equals("")) {
@@ -1123,6 +1127,19 @@
         namespaces.push(ns);
     }

+    protected boolean shouldOutputNamespace(Namespace ns, Element element, NamespaceStack namespaces) {
+        // Add namespace decl only if it's not the XML namespace and it's
+        // not the NO_NAMESPACE with the prefix "" not yet mapped
+        // (we do output xmlns="" if the "" prefix was already used and we
+        // need to reclaim it for the NO_NAMESPACE)
+        if (ns == Namespace.XML_NAMESPACE) {
+            return false;
+        } else if ( ((ns == Namespace.NO_NAMESPACE) &&
+               (namespaces.getURI("") == null))) {
+            return false;
+        } else
+            return true;
+    }
     /**
      * This will handle printing of a <code>{@link Attribute}</code> list.
      *
@@ -1141,36 +1158,33 @@
         for (int i = 0; i < attributes.size(); i++) {
             Attribute attribute = (Attribute) attributes.get(i);
             Namespace ns = attribute.getNamespace();
-            if ((ns != Namespace.NO_NAMESPACE) &&
-                (ns != Namespace.XML_NAMESPACE)) {
-                    printNamespace(out, ns, namespaces);
+            if (shouldOutputNamespace(ns,parent,namespaces)
+                    && ns != Namespace.NO_NAMESPACE && ns != Namespace.XML_NAMESPACE) {
+                    printNamespace(out, ns, parent, namespaces);
             }

-            out.write(" ");
-            printQualifiedName(out, attribute);
-            out.write("=");
+            if(shouldOutputAttribute(attribute,parent,namespaces)) {
+                out.write(" ");
+                printQualifiedName(out, attribute);
+                out.write("=");

-            out.write("\"");
-            out.write(escapeAttributeEntities(attribute.getValue()));
-            out.write("\"");
+                out.write("\"");
+                out.write(escapeAttributeEntities(attribute.getValue()));
+                out.write("\"");
+            }
         }
     }

+    protected boolean shouldOutputAttribute(Attribute attribute, Element parent, NamespaceStack namespaces) {
+        return true;
+    }
+
     private void printElementNamespace(Writer out, Element element,
                                        NamespaceStack namespaces)
                              throws IOException {
-        // Add namespace decl only if it's not the XML namespace and it's
-        // not the NO_NAMESPACE with the prefix "" not yet mapped
-        // (we do output xmlns="" if the "" prefix was already used and we
-        // need to reclaim it for the NO_NAMESPACE)
         Namespace ns = element.getNamespace();
-        if (ns == Namespace.XML_NAMESPACE) {
-            return;
-        }
-        if ( !((ns == Namespace.NO_NAMESPACE) &&
-               (namespaces.getURI("") == null))) {
-            printNamespace(out, ns, namespaces);
-        }
+        if(shouldOutputNamespace(ns,element,namespaces))
+            printNamespace(out, ns, element, namespaces);
     }

     private void printAdditionalNamespaces(Writer out, Element element,
@@ -1180,7 +1194,7 @@
         if (list != null) {
             for (int i = 0; i < list.size(); i++) {
                 Namespace additional = (Namespace)list.get(i);
-                printNamespace(out, additional, namespaces);
+                printNamespace(out, additional, element, namespaces);
             }
         }
     }

Here's DTDAwareXMLOutputter.java

package org.jdom.output;

import com.wutka.dtd.DTD;
import com.wutka.dtd.DTDElement;
import com.wutka.dtd.DTDAttribute;
import org.jdom.Element;
import org.jdom.Namespace;
import org.jdom.Attribute;

/** A subclass of {@link XMLOutputter} that avoids printing attributes and namespace declarations
 * whose value is already the default specified by the DTD. This is a key ingredient in providing
 * a much more readable output, but it may break re-parsing if the document is not output with the
 * appropriate {@link org.jdom.DocType}.
 *
 * @author Paul Libbrecht <paul@activemath.org>
 */
public class DTDAwareXMLOutputter extends XMLOutputter {

    public DTDAwareXMLOutputter() {
        super();
    }

    public DTDAwareXMLOutputter(DTD dtd) {
        super();
        this.setDtd(dtd);
    }

    public DTDAwareXMLOutputter(Format format) {
        super(format);
    }

    public DTDAwareXMLOutputter(XMLOutputter that) {
        super(that);
    }

    protected DTD dtd;

    public DTD getDtd() {
        return dtd;
    }

    public void setDtd(DTD dtd) {
        this.dtd = dtd;
    }



    protected boolean shouldOutputNamespace(Namespace ns, Element element, NamespaceStack namespaces) {
        // Respect the parent's veto first; without a DTD there is nothing more to check
        if(!super.shouldOutputNamespace(ns,element,namespaces)) return false;
        if(dtd == null) return true;
        DTDElement eltDecl = (DTDElement) dtd.elements.get(element.getName());
        if(eltDecl!=null) {
            // A namespace declaration appears in the DTD as an xmlns or xmlns:prefix attribute
            String nsAttName;
            String prefix = ns.getPrefix();
            if(prefix!=null && prefix.length()>0) {
                nsAttName = "xmlns:".concat(prefix);
            } else {
                nsAttName = "xmlns";
            }
            DTDAttribute nsDecl = eltDecl.getAttribute(nsAttName);
            // Skip the declaration if the DTD already supplies the same URI as default
            if(nsDecl != null && ns.getURI().equals(nsDecl.getDefaultValue())) {
                return false;
            }
        }
        return true;
    }

    protected boolean shouldOutputAttribute(Attribute attribute, Element parent, NamespaceStack namespaces) {
        if(false == super.shouldOutputAttribute(attribute, parent, namespaces)) return false;
        // PL: check if attribute is in default value, then don't output it
        DTDElement eltDecl = null;
        if(dtd!=null) {
            eltDecl = (DTDElement) dtd.elements.get(parent.getName());
            if(eltDecl!=null) {
                DTDAttribute attDecl = eltDecl
                        .getAttribute(attribute.getQualifiedName());
                if(attDecl!=null) {
                    String defaultValue = attDecl.getDefaultValue();
                    if(defaultValue!=null && defaultValue.equals(attribute.getValue()))
                        return false;
                }
            }
        }
        return true;
    }
}
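
To make the extension point in the patch easier to see in isolation, here is a minimal sketch of the same hook pattern without any JDOM or DTDParser dependency. The class names (`SketchOutputter`, `DefaultSkippingOutputter`, `HookDemo`) and the `"element/attribute"` key format are assumptions made up for this illustration and are not part of JDOM; attribute values are also written without escaping, which a real outputter must not do.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Stand-in for an outputter (not a JDOM class): just enough to show the hook. */
class SketchOutputter {

    /** Writes a start tag, asking the hook about each attribute. Values are NOT escaped here. */
    final String startTag(String element, Map<String, String> attributes) {
        StringBuilder out = new StringBuilder("<").append(element);
        for (Map.Entry<String, String> a : attributes.entrySet()) {
            if (shouldOutputAttribute(element, a.getKey(), a.getValue())) {
                out.append(' ').append(a.getKey())
                   .append("=\"").append(a.getValue()).append('"');
            }
        }
        return out.append('>').toString();
    }

    /** Hook for subclasses: the base class prints every attribute. */
    protected boolean shouldOutputAttribute(String element, String name, String value) {
        return true;
    }
}

/** Drops attributes whose value equals a known default (hypothetical defaults table). */
class DefaultSkippingOutputter extends SketchOutputter {

    private final Map<String, String> defaults; // key: "element/attribute"

    DefaultSkippingOutputter(Map<String, String> defaults) {
        this.defaults = defaults;
    }

    @Override
    protected boolean shouldOutputAttribute(String element, String name, String value) {
        if (!super.shouldOutputAttribute(element, name, value)) {
            return false; // respect the parent's veto first
        }
        return !value.equals(defaults.get(element + "/" + name));
    }
}

class HookDemo {
    public static void main(String[] args) {
        Map<String, String> defaults = new LinkedHashMap<>();
        defaults.put("note/lang", "en");

        Map<String, String> atts = new LinkedHashMap<>();
        atts.put("id", "n1");
        atts.put("lang", "en");

        System.out.println(new SketchOutputter().startTag("note", atts));
        System.out.println(new DefaultSkippingOutputter(defaults).startTag("note", atts));
    }
}
```

The point of the design is that the printing loop stays in the base class and only the yes/no decision is overridable, which is what the `shouldOutputAttribute` and `shouldOutputNamespace` methods in the diff above provide.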
@hunterhacker
Owner

Found an old email from Paul explaining the use case some more:

I gave some cycles into making XMLOutputter DTD-aware.
The reason for doing so is that we write XML-files with a dtd-reference with a large set of hidden information encoded in the DTD such as namespace for almost all elements. Most of these are attribute default values.
Only using such an outputter I can claim that our authors' files are not changed too dramatically. Otherwise, each line is made twice as big and completely unreadable.

Clearly, using a finer-grained parsing (that would report whether an attribute is present or only "implied") would bring it all... but such fine-grained lexical analysis isn't available as far as I know.

So I just adapted XMLOutputter to prevent the output of attributes or namespaces if equivalent to the DTD-specified values... seems to be working fine.

I'd love providing this to the project. It is relying on Mark Wutka's DTD parser, now with an Apache-style-license, which is the only usable DTD-parser I found.
Where could I drop such a submission ?

Also I would have wished to subclass XMLOutputter but this turned out to be impossible... there are too many private methods that needed to be either re-used or overridden... not sure if that's solvable. Currently, I just added "setDtd" and modified XMLOutputter's methods directly.
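
As a rough illustration of the "namespaces are mostly attribute default values" point in the email above: in a DTD, a namespace declaration is just an attribute named `xmlns` or `xmlns:prefix`, so checking whether it is implied comes down to a name lookup and a string comparison. The sketch below assumes the DTD defaults for one element have already been collected into a plain map; `NamespaceDefaults` and its method names are hypothetical and not part of JDOM or DTDParser.

```java
import java.util.HashMap;
import java.util.Map;

final class NamespaceDefaults {

    /** Name of the attribute that declares the prefix ("" or null means the default namespace). */
    static String declarationName(String prefix) {
        return (prefix == null || prefix.isEmpty()) ? "xmlns" : "xmlns:" + prefix;
    }

    /**
     * True when the DTD defaults for one element (attribute name to default value)
     * already supply this exact namespace declaration.
     */
    static boolean impliedByDtd(Map<String, String> elementDefaults, String prefix, String uri) {
        return uri.equals(elementDefaults.get(declarationName(prefix)));
    }

    public static void main(String[] args) {
        // Hypothetical defaults, as if read from <!ATTLIST math xmlns CDATA "...">
        Map<String, String> mathDefaults = new HashMap<>();
        mathDefaults.put("xmlns", "http://www.w3.org/1998/Math/MathML");

        System.out.println(impliedByDtd(mathDefaults, "", "http://www.w3.org/1998/Math/MathML"));
        System.out.println(impliedByDtd(mathDefaults, "m", "http://www.w3.org/1998/Math/MathML"));
    }
}
```

A declaration that is implied this way can be left out of the output only if the document is later parsed with the same DTD, which matches the caveat in the DTDAwareXMLOutputter javadoc.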

@rolfl
Collaborator

The following is some example code using Wutka's DTDParser to inspect a DTD document, then to populate an XMLOutputProcessor with Attributes (with default values) to ignore. The XMLOutputProcessor will then ignore attributes with the specified values. You can see the example XMLOutputProcessor in org.jdom2.contrib.dtdaware.AttAwareXMLOutputProcessor

Because we do not want to include the DTDParser in JDOM, the example code is simply pasted here, which avoids compile issues.

import java.io.File;
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;

import com.wutka.dtd.DTD;
import com.wutka.dtd.DTDAttribute;
import com.wutka.dtd.DTDElement;
import com.wutka.dtd.DTDParser;

import org.jdom2.DocType;
import org.jdom2.Document;
import org.jdom2.JDOMException;
import org.jdom2.input.SAXBuilder;
import org.jdom2.input.sax.XMLReaders;
import org.jdom2.output.Format;
import org.jdom2.output.XMLOutputter;
import org.jdom2.contrib.dtdaware.AttAwareXMLOutputProcessor;
/**
 * Sample code for 'hiding' defaulted attribute values.
 *
 */
public class DTDAwareManager {

    /**
     * @param args Files to process.
     * @throws IOException If there's IO problems
     * @throws JDOMException For other problems
     * @throws URISyntaxException for URI problems
     */
    public static void main(String[] args) throws JDOMException, IOException, URISyntaxException {
        SAXBuilder sb = new SAXBuilder(XMLReaders.DTDVALIDATING);
        for (String fname : args) {
            File file = new File(fname);
            if (file.canRead() && file.isFile()) {
                Document doc = sb.build(file);
                DocType dt = doc.getDocType();
                AttAwareXMLOutputProcessor processor = new AttAwareXMLOutputProcessor();

                // a document without a DOCTYPE has no DocType object at all
                if (dt != null && dt.getSystemID() != null) {
                    URI uri = new URI(dt.getSystemID());
                    if (!uri.isAbsolute() && doc.getBaseURI() != null) {
                        uri = new URI(doc.getBaseURI()).resolve(uri);
                    }
                    DTDParser dtdp = new DTDParser(uri.toURL());
                    DTD dtd = dtdp.parse();
                    for (final Object en : dtd.elements.keySet()) {
                        final DTDElement de = (DTDElement)dtd.elements.get(en);
                        for (final Object an : de.attributes.keySet()) {
                            DTDAttribute att = (DTDAttribute)de.attributes.get(an);
                            if (att.defaultValue != null) {
                                processor.ignore(de.getName(), att.getName(), 
                                        att.getDefaultValue());
                            }
                        }
                    }
                }
                // this one uses the new processor, which ignores content
                XMLOutputter xoutf = new XMLOutputter(Format.getPrettyFormat(), processor);
                xoutf.output(doc, System.out);

                // this one uses the standard processor, which does not ignore.
                XMLOutputter xout = new XMLOutputter(Format.getPrettyFormat());
                xout.output(doc, System.out);
            }

        }

    }

}
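
For readers who do not have the contrib class at hand, the bookkeeping behind a call like `processor.ignore(element, attribute, value)` can be pictured as a set of triples. This is only a sketch of one possible data structure, not the actual implementation of `AttAwareXMLOutputProcessor`; the `IgnoreRules` class and its `isIgnored` method are names invented for the example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical registry of (element, attribute, value) triples to leave out of the output. */
final class IgnoreRules {

    // Lists compare element by element, so a three-item list works as a composite key.
    private final Set<List<String>> rules = new HashSet<>();

    /** Registers a default: this attribute on this element is skipped when it has this value. */
    void ignore(String element, String attribute, String value) {
        rules.add(Arrays.asList(element, attribute, value));
    }

    /** True only when element, attribute and value all match a registered rule. */
    boolean isIgnored(String element, String attribute, String value) {
        return rules.contains(Arrays.asList(element, attribute, value));
    }

    public static void main(String[] args) {
        IgnoreRules rules = new IgnoreRules();
        rules.ignore("note", "lang", "en");

        System.out.println(rules.isIgnored("note", "lang", "en")); // same value as the default
        System.out.println(rules.isIgnored("note", "lang", "fr")); // value differs, so it is kept
    }
}
```

Matching on the value as well as the name matters: an attribute that the author set to something other than the DTD default still has to be written out.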
@rolfl rolfl closed this issue from a commit
@rolfl rolfl Fixes #5 - Easly extendable XMLOutputter
This is an example extension of the Outputter that shows how to remove
Attributes that have specific details.
053890e
@rolfl rolfl closed this in 053890e
@rolfl
Collaborator

I have revisited this issue while moving code from contrib to core (issue #66). This particular functionality (tracking 'Specified' Attributes vs. those attributes which are 'defaulted' in the DTD) is now natively supported in the JDOM Core functionality.

Now, in order to print only the specified attributes, you can simply:

Format format = Format.getPrettyFormat();
format.setSpecifiedAttributesOnly(true);
XMLOutputter xout = new XMLOutputter();
xout.setFormat(format);
System.out.println(xout.outputString(mydocument));
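
For background on the 'specified' versus 'defaulted' distinction mentioned above, the SAX2 extension interface `org.xml.sax.ext.Attributes2` exposes it through `isSpecified`. The sketch below uses only the JDK's bundled parser and no JDOM; the class name `SpecifiedAttributeProbe` is made up for the example, and whether JDOM obtains the flag in exactly this way is an assumption, not something stated in this thread.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.ext.Attributes2;
import org.xml.sax.helpers.DefaultHandler;

final class SpecifiedAttributeProbe {

    /** Lists every attribute seen while parsing, tagged as "specified" or "defaulted". */
    static List<String> describe(String xml) {
        final List<String> found = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                for (int i = 0; i < atts.getLength(); i++) {
                    // If the parser does not provide Attributes2 we cannot tell,
                    // so the attribute is treated as specified.
                    boolean specified = !(atts instanceof Attributes2)
                            || ((Attributes2) atts).isSpecified(i);
                    found.add(atts.getQName(i) + "=" + atts.getValue(i)
                            + (specified ? " specified" : " defaulted"));
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample document", e);
        }
        return found;
    }

    public static void main(String[] args) {
        // Internal DTD subset: "lang" has a default, "id" is written in the document.
        String xml = "<!DOCTYPE note [\n"
                + "<!ELEMENT note (#PCDATA)>\n"
                + "<!ATTLIST note lang CDATA \"en\" id CDATA #IMPLIED>\n"
                + "]>\n"
                + "<note id=\"n1\">hello</note>";
        for (String line : describe(xml)) {
            System.out.println(line);
        }
    }
}
```

With the parser that ships in the JDK, `id` should be reported as specified and `lang` as defaulted; an outputter that keeps this flag per attribute can then skip the defaulted ones without needing a separate DTD parser.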