Schema validation can miss namespaces of default attributes (Thomas Scheffler) #1

Closed
hunterhacker opened this Issue · 7 comments

3 participants

@hunterhacker

On Wed, Jul 20, 2011 at 8:23 AM, Thomas Scheffler
thomas.scheffler@uni-jena.de wrote:
Hi,

If I parse a valid MODS document with XML Schema validation, JDOM changes
attributes because it handles the schema's default values incorrectly (it
ignores the namespace).

Here is a short code sample that demonstrates this:

SAXBuilder builder = new SAXBuilder(true);
builder.setFeature("http://xml.org/sax/features/namespaces", true);
builder.setFeature("http://xml.org/sax/features/namespace-prefixes", true);
builder.setFeature("http://apache.org/xml/features/validation/schema", true);

Document document = builder.build(new URL("http://academiccommons.columbia.edu/download/fedora_content/show_pretty/ac:111060/CONTENT/ac111060_description.xml"));
XMLOutputter xout = new XMLOutputter(Format.getPrettyFormat());
xout.output(document, System.out);

Here is a result fragment:


Edwards
Stephen A.

author

Columbia University. Computer Science

If you look at the original document, you can see that the @type of name is
"personal". The "simple" comes from the xlink XML Schema that is included
by the MODS schema. Therefore the result fragment should look like this:


Edwards
Stephen A.

author

Columbia University. Computer Science

If I use DOM from Java, this is done correctly (though the output is a bit
ugly, as it does not reuse the namespace prefix already defined).

Could someone just fix this, please?

@hunterhacker
Owner

Thomas sent in a patch:
http://markmail.org/message/j2e2xu6yyklmw5ea

Before accepting it we should run a test to see what SAX is actually reporting given his document to see whether it's a problem with JDOM or with SAX. The patch makes up a namespace prefix (one that isn't in the original document) and adds it to the known namespaces, and that's sketchy.

@hunterhacker
Owner

Thomas says:

It is a bug in the SAXHandler class, where attributes in a different namespace are distinguished only by their QName and not by their namespace URI. I attached a patch that fixes this bug.
It would be great if this could be integrated and released soon in a version 1.1.2.

@hunterhacker
Owner

Brad investigated and confirmed SAX is reporting two attributes with the same qname:

I did a simple test and printed out what was being given to SAXHandler
by the parser. Sure enough, on this document when validation is turned
on, the SAX parser is reporting two attributes with the same qname
(which happens to be the local name since they are both unprefixed), but
in different namespaces.

JDOM should take the events as presented and build an object tree to
represent them. It shouldn't "generate" anything. However, in this
case I don't know who to point the finger at, but it's not JDOM.

Elliotte, are you still lurking around?


namespaceURI = http://www.loc.gov/mods/v3
localName = name
qName = name
attribute local name = type
attribute qname = type
attribute uri =
attribute type = CDATA
attribute value = personal
attribute local name = type
attribute qname = type
attribute uri = http://www.w3.org/1999/xlink
attribute type = CDATA
attribute value = simple


namespaceURI = http://www.loc.gov/mods/v3
localName = namePart
qName = namePart
attribute local name = type
attribute qname = type
attribute uri =
attribute type = CDATA
attribute value = family
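
A dump like the one above can be produced with a small SAX ContentHandler. The following is only a sketch of how such a test might look, not the code Brad actually ran: the class name AttributeDump and the inline sample document are assumptions made for illustration, and no schema validation is enabled here, so it will not reproduce the duplicate "type" attribute by itself.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: records what a namespace-aware SAX parser reports
// to startElement(), in the same "key = value" layout as the dump above.
public class AttributeDump {

    public static List<String> dump(String xml) throws Exception {
        final List<String> lines = new ArrayList<>();
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                lines.add("namespaceURI = " + uri);
                lines.add("localName = " + localName);
                lines.add("qName = " + qName);
                for (int i = 0; i < atts.getLength(); i++) {
                    lines.add("attribute local name = " + atts.getLocalName(i));
                    lines.add("attribute qname = " + atts.getQName(i));
                    lines.add("attribute uri = " + atts.getURI(i));
                    lines.add("attribute type = " + atts.getType(i));
                    lines.add("attribute value = " + atts.getValue(i));
                }
            }
        };
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return lines;
    }

    public static void main(String[] args) throws Exception {
        // Sample input (an assumption): one MODS-like element with a plain attribute.
        String xml = "<name xmlns=\"http://www.loc.gov/mods/v3\" type=\"personal\"/>";
        for (String line : dump(xml)) {
            System.out.println(line);
        }
    }
}
```

To see the behaviour discussed in this issue, the same handler would have to be attached to a parser with XML Schema validation switched on and pointed at Thomas's document.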

@rolfl
Collaborator

After looking into the issue, I see the following logic issues:

The XMLSchema declares an attribute for an element to be either 'default' or 'fixed'. Additionally, the attribute is defined as form="qualified" (or all attributes are default-qualified with attributeFormDefault="qualified")

As far as I can tell, you can reproduce the problem without actually having 'import' declarations on your schema.

In these conditions, the validating parser will add the attribute to the parse 'Handler'.

In the 'worst' case, your document declares a 'default' namespace ... xmlns="http://my.namespace"

Your schema declares an attribute:
''''

In this condition, the parser needs to add the 'att' attribute to the startElement, but it literally does not have a prefix to work with. Nothing has declared a prefix for the namespace "http://my.namespace"

In this condition, the only way to add the attribute to the element is to 'generate' a namespace.

With an 'imported' XMLSchema, the specification requires that the document declare the prefix/namespace for the imported namespace. I guess, technically, it would be possible to construct a situation where the imported XMLSchema is in the default namespace....

Regardless, in the situation where there is no prefixed namespace declaration for the attribute's namespace, the code has to generate a prefix.

Another condition where there could be a problem is where the actual namespace for the attribute was declared higher up in the document structure, but then the prefix was re-declared for a different namespace lower down...

I proposed the following patch http://markmail.org/message/gpvgp5afzltqes5e

If I get the syntax-highlighting right though ....

            } else if (atts.getURI(i) != null && atts.getURI(i).length() > 0) {
                // the localname and qName are the same, but there is a
                // Namespace URI. We need to figure out the namespace prefix.
                // this is an unusual condition. Currently the only known trigger
                // is when there is a fixed/defaulted attribute from a validating
                // XMLSchema, and the attribute is in a different namespace
                // than the rest of the document, this happens whenever there
                // is an attribute definition that has form="qualified".
                //  <xs:attribute name="attname" form="qualified" ... />
                // or the schema sets attributeFormDefault="qualified"
                String attURI = atts.getURI(i);
                Namespace attNS = null;
                Element p = element;
                // We need to ensure that a particular prefix has not been
                // overridden at a lower level than what we are expecting.
                // track all prefixes to ensure they are not changed lower
                // down.
                HashSet overrides = new HashSet();
                uploop: do {
                    // Search up the Element tree looking for a prefixed namespace
                    // matching our attURI
                    if (p.getNamespace().getURI().equals(attURI)
                            && !overrides.contains(p.getNamespacePrefix())
                            && !"".equals(element.getNamespace().getPrefix())) {
                        // we need a prefix. It's impossible to have a namespaced
                        // attribute if there is no prefix for that attribute.
                        attNS = p.getNamespace();
                        break uploop;
                    }
                    overrides.add(p.getNamespacePrefix());
                    for (Iterator it = p.getAdditionalNamespaces().iterator();
                            it.hasNext(); ) {
                        Namespace ns = (Namespace)it.next();
                        if (!overrides.contains(ns.getPrefix())
                                 && attURI.equals(ns.getURI())) {
                            attNS = ns;
                            break uploop;
                        }
                        overrides.add(ns.getPrefix());
                    }
                    if (p == element) {
                        p = currentElement;
                    } else {
                        p = p.getParentElement();
                    }
                } while (p != null);
                if (attNS == null) {
                    // we cannot find a 'prevailing' namespace that has a prefix
                    // that is for this namespace.
                    // This basically means that there's an XMLSchema, for the
                    // DEFAULT namespace, and there's a defaulted/fixed
                    // attribute definition in the XMLSchema that's targeted
                    // for this namespace,... but, the user has either not
                    // declared a prefixed version of the namespace, or has
                    // re-declared the same prefix at a lower level with a
                    // different namespace.
                    // All of these things are possible.
                    // Create some sort of default prefix.
                    int cnt = 0;
                    String base = "attns";
                    String pfx = base + cnt;
                    while (overrides.contains(pfx)) {
                        cnt++;
                        pfx = base + cnt;
                    }
                    attNS = Namespace.getNamespace(pfx, attURI);
                }
                attribute = factory.attribute(attLocalName, atts.getValue(i),
                        attType, attNS);
            } else {
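
The lookup in the patch above can be illustrated without JDOM. The following is a simplified sketch, not the patch itself: the class PrefixLookup, the method prefixFor, and the representation of scopes as a list of prefix-to-URI maps (innermost element first) are all assumptions made for illustration. It keeps the same three ideas: skip the default (empty) prefix, ignore a prefix that was re-declared lower down, and fall back to a generated "attnsN" prefix.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, JDOM-free illustration of the prefix search in the patch.
public class PrefixLookup {

    /**
     * @param uri    namespace URI of the attribute
     * @param scopes namespace declarations per element, innermost element
     *               first; each map goes from prefix ("" = default) to URI
     * @return a usable, non-empty prefix for the attribute
     */
    public static String prefixFor(String uri, List<Map<String, String>> scopes) {
        // Prefixes already seen closer to the attribute; an outer
        // declaration of the same prefix is shadowed and must not be used.
        Set<String> overridden = new HashSet<>();
        for (Map<String, String> scope : scopes) {
            for (Map.Entry<String, String> decl : scope.entrySet()) {
                String prefix = decl.getKey();
                // A namespaced attribute always needs a non-empty prefix.
                if (!prefix.isEmpty()
                        && !overridden.contains(prefix)
                        && uri.equals(decl.getValue())) {
                    return prefix;
                }
                overridden.add(prefix);
            }
        }
        // Nothing usable is in scope: make up a prefix that does not clash.
        int cnt = 0;
        while (overridden.contains("attns" + cnt)) {
            cnt++;
        }
        return "attns" + cnt;
    }
}
```

With a default-namespace element nested in an element that declares xlink, the sketch returns "xlink"; if the inner element re-declares xlink for another URI, or only a default namespace is declared, it falls back to "attns0".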
@elharo

rolfl's comment sounds correct. Something is off here, and it may
not be JDOM. Jason says:

"I did a simple test and printed out what was being given to SAXHandler
by the parser. Sure enough, on this document when validation is turned
on, the SAX parser is reporting two attributes with the same qname
(which happens to be the local name since they are both unprefixed), but
in different namespaces."

That should NOT be happening. There is no such thing as an
unprefixed attribute that is in a namespace. That simply does not
exist in XML. If the parser is reporting such an attribute, the parser
is wrong.
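
The rule can be checked with any namespace-aware parser. The following is a small sketch using the JDK's DOM API; the class name UnprefixedAttributeDemo and the sample element are assumptions made for illustration. An unprefixed attribute does not inherit the element's default namespace, while a prefixed one is in the namespace bound to its prefix.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo: compares the namespace of an element, an unprefixed
// attribute, and a prefixed attribute that share the local name "type".
public class UnprefixedAttributeDemo {

    public static final String XLINK = "http://www.w3.org/1999/xlink";

    /** Returns { element namespace, namespace of type, namespace of xlink:type }. */
    public static String[] namespaces(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Element root = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        Attr plain = root.getAttributeNode("type");             // unprefixed
        Attr prefixed = root.getAttributeNodeNS(XLINK, "type"); // xlink:type
        return new String[] {
            root.getNamespaceURI(),
            plain.getNamespaceURI(),    // null: the attribute is in no namespace
            prefixed.getNamespaceURI()
        };
    }

    public static void main(String[] args) throws Exception {
        String xml = "<name xmlns=\"http://www.loc.gov/mods/v3\""
                + " xmlns:xlink=\"" + XLINK + "\""
                + " type=\"personal\" xlink:type=\"simple\"/>";
        for (String ns : namespaces(xml)) {
            System.out.println(ns);
        }
    }
}
```

In a well-formed document the two "type" attributes can only coexist because one of them carries a prefix, which is why a parser event with a namespace URI but no prefix has nothing to map back to.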

@rolfl
Collaborator

I think everyone is in violent agreement here... it sucks, and it's 'wrong'.

XMLSchema should make it clear what should happen in this case, but XSD 1.0 is 'clear' that it should do nothing, and XSD 1.1 has partial coverage of the issue... (or, if you interpret it in just the right way, it does cover it, and say the prefix should be generated by the parser).

Xerces (the parser) currently takes the XSD1.0 view, and claims that it's 'right' to not be prefixing the attributes.

JDOM is the victim (other than the end user like Thomas) because it is getting bad information, and then doing worse things with it.

Michael filed a 'ticket' against XMLSchema (1.1) to clarify/resolve it. You can see the discussion at:
http://www.w3.org/Bugs/Public/show_bug.cgi?id=13750

I filed a ticket against xerces at:
https://issues.apache.org/jira/browse/XERCESJ-1524

Then there is the JDOM issue at:
#1

ahh, you found it.

Rolf

@rolfl rolfl closed this issue from a commit
@rolfl rolfl fixes #1 - Default attributes in namespaces from XMLSchemas
Add tests for this code.
re-arrange the tests in AllTests to group them better.
Add a specific class (currently incomplete) to test SAXHandler directly.
Get repo tidied up so I can get some 1.1.2 stuff done.
7ff5828
@rolfl rolfl closed this in 7ff5828
@rolfl rolfl referenced this issue from a commit
@rolfl rolfl Implement DOM-Side fix for issue #1
The fix is not quite the same as the SAX side because the Element
hierarchy is neater in the DOM build process.
Include tests for the complex schema setups.
54af3d1
@rolfl
Collaborator

Because DOMBuilder was a 'victim' of the EOL marker mess up, and was later fixed, it is hard to see the actual diff.

As a result, you have to compare the EOL Fixed version with the pre-fix version.

This URL should do that for you:
58790e7...8f8e664#diff-14
