XML library: Element in no namespace assigned to parent element namespace #1784

Closed
akloeber opened this issue Sep 5, 2014 · 8 comments

@akloeber

akloeber commented Sep 5, 2014

When using the default XML namespace stripping (i.e. keep_clark_notation not set) namespaces can become corrupted as demonstrated in the following example:

${xml}=    XML.Parse Xml    <ns0:foo xmlns:ns0="http://example.com/ns0"><bar>buzz</bar></ns0:foo>
Log element    ${xml}

The output becomes:

 <foo xmlns="http://example.com/ns0"><bar>buzz</bar></foo>

In the input, the <bar> element has no namespace: it carries no namespace prefix and no default namespace is declared. In the output, <bar> inherits the default namespace http://example.com/ns0 from <foo>, so the output is no longer logically equivalent to the input and is therefore invalid against the corresponding XML schema.
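The difference can be made visible outside Robot Framework. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`, which reports tags in Clark notation; the helper name `child_tag` is only illustrative and is not part of the XML library.

```python
import xml.etree.ElementTree as ET

# Input from the report and the output produced by the namespace stripping.
original = '<ns0:foo xmlns:ns0="http://example.com/ns0"><bar>buzz</bar></ns0:foo>'
stripped = '<foo xmlns="http://example.com/ns0"><bar>buzz</bar></foo>'


def child_tag(xml_text):
    """Return the tag of the first child element in Clark notation."""
    root = ET.fromstring(xml_text)
    return root[0].tag


original_bar = child_tag(original)  # 'bar' (no namespace)
stripped_bar = child_tag(stripped)  # '{http://example.com/ns0}bar'

print(original_bar)
print(stripped_bar)
```

The two tags differ, which confirms that the stripped output places <bar> in a namespace it did not belong to in the input.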

The only way to work around this issue is to enable Clark notation, which makes evaluating XPaths really hard, as already explained in the XML library documentation. Moreover, there is no way to register namespace prefixes in order to use them in XPaths.
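To illustrate why Clark notation is awkward in paths, here is a small sketch in plain `xml.etree.ElementTree` (not an XML library keyword). The document structure is hypothetical, and the prefix mapping shown is an ElementTree feature only; it is not something the XML library offers.

```python
import xml.etree.ElementTree as ET

NS0 = "http://example.com/ns0"

# Hypothetical document, slightly deeper than the one in the report.
doc = ET.fromstring(
    '<ns0:root xmlns:ns0="http://example.com/ns0">'
    '<ns0:foo><bar>buzz</bar></ns0:foo>'
    '</ns0:root>'
)

# Clark notation: the full namespace URI is repeated in every qualified step.
clark_hit = doc.find("{%s}foo/bar" % NS0)

# ElementTree alternative: map a locally chosen prefix to the URI.
prefix_hit = doc.find("n:foo/bar", namespaces={"n": NS0})

print(clark_hit.text, prefix_hit.text)
```

Both lookups reach the same element; the second form is closer to what registering prefixes for XPaths would allow.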

Environment:
Robotframework 2.8.5
Python 2.7.5
Mac OS X 10.9.4

@pekkaklarck
Member

Good point. I simply didn't take this situation into account when designing how to strip namespaces. According to this Oracle doc it ought to be possible to undeclare the default namespace with xmlns="". In other words, the correct output of the original example should be:

<foo xmlns="http://example.com/ns0"><bar xmlns="">buzz</bar></foo>

Does that sound good to you @akloeber?

Registering namespace prefixes sounds like a valid enhancement request but requires a separate issue, preferably backed by a pull request.
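The effect of xmlns="" in the proposed output can be checked with a short sketch. It uses Python's standard `xml.etree.ElementTree` as an independent parser and only inspects the resulting tags.

```python
import xml.etree.ElementTree as ET

# Proposed output: <bar> undeclares the default namespace set on <foo>.
proposed = '<foo xmlns="http://example.com/ns0"><bar xmlns="">buzz</bar></foo>'

root = ET.fromstring(proposed)
foo_tag = root.tag     # stays in http://example.com/ns0
bar_tag = root[0].tag  # back in no namespace

print(foo_tag)
print(bar_tag)
```

With the undeclaration in place, <bar> is parsed as an element in no namespace, matching the original input.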

@pekkaklarck pekkaklarck added bug and removed invalid labels Sep 5, 2014
@pekkaklarck pekkaklarck added this to the 2.8.6 milestone Sep 5, 2014
@pekkaklarck pekkaklarck self-assigned this Sep 5, 2014
@pekkaklarck
Member

Assuming adding xmlns="" is a good solution, then this is the actual fix:

diff --git a/src/robot/libraries/XML.py b/src/robot/libraries/XML.py
index abc0126..e208d5d 100644
--- a/src/robot/libraries/XML.py
+++ b/src/robot/libraries/XML.py
@@ -1324,6 +1324,9 @@ class NameSpaceStripper(object):
             if ns != current_ns:
                 elem.attrib['xmlns'] = ns
                 current_ns = ns
+        elif current_ns:
+            elem.attrib['xmlns'] = ''
+            current_ns = None
         for child in elem:
             self.strip(child, current_ns)
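For readers who want to try the change outside the library, below is a standalone sketch of the stripping logic with the fix applied. It is a reconstruction from the diff context, not the actual `NameSpaceStripper` class: the function name `strip_namespaces` and the outer `if elem.tag.startswith('{')` check are assumptions, since the diff does not show that line.

```python
import xml.etree.ElementTree as ET


def strip_namespaces(elem, current_ns=None):
    """Replace Clark notation tags with plain tags plus xmlns attributes."""
    if elem.tag.startswith('{'):  # assumed condition, not visible in the diff
        ns, elem.tag = elem.tag[1:].split('}', 1)
        if ns != current_ns:
            elem.attrib['xmlns'] = ns
            current_ns = ns
    elif current_ns:
        # The fix: an element in no namespace under a default namespace
        # must undeclare that default explicitly.
        elem.attrib['xmlns'] = ''
        current_ns = None
    for child in elem:
        strip_namespaces(child, current_ns)


original = '<ns0:foo xmlns:ns0="http://example.com/ns0"><bar>buzz</bar></ns0:foo>'
root = ET.fromstring(original)
strip_namespaces(root)
serialized = ET.tostring(root, encoding='unicode')

# Re-parse the stripped output to compare namespaces with the input.
reparsed = ET.fromstring(serialized)
print(serialized)
print(reparsed.tag, reparsed[0].tag)
```

Re-parsing the stripped output gives <foo> in http://example.com/ns0 and <bar> in no namespace, the same as in the input.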

@akloeber
Author

akloeber commented Sep 6, 2014

Unfortunately this is only a simplified example. In our setup the XML we'd like to check is server generated and hence we do not have control over its structure.

@pekkaklarck
Member

Is there a reason xmlns="" wouldn't work in your case?

@pekkaklarck
Member

Assuming I understand XML namespaces correctly, using xmlns="" undeclares the default namespace set earlier and thus ought to fix the reported problem regardless of the XML structure. After the fix, the resulting output ought to be semantically identical to the original.

@akloeber
Author

akloeber commented Sep 7, 2014

You are right, if default namespaces are explicitly reset, the output should be equivalent. Sorry, I didn't get that after your first suggestion!


@pekkaklarck pekkaklarck changed the title from "XML-Namespace corruption if keep_clark_notation not set" to "XML library: Element in no namespace assigned to parent element namespace" Sep 8, 2014
@pekkaklarck
Member

As I noted in the commit message of revision 6515328, this bug affected both the standard etree and lxml modes. In the lxml mode namespace prefixes are preserved correctly, so you may want to use that if prefixes are important.

@akloeber
Author

akloeber commented Sep 8, 2014

Good to know, I'll give it a try. Thanks for your support!!
