encoding/xml: fix name spaces #13400

Open
rsc opened this Issue Nov 25, 2015 · 19 comments

@rsc
Contributor

rsc commented Nov 25, 2015

There are many pending issues related to the handling of xml name spaces. We need to rethink this. One more try and then we're going to give up.

  • #7535 encoding/xml: Encoder duplicates namespace tags
  • #11496 encoding/xml: Serializing XML with namespace prefix
  • #9775 encoding/xml: Unmarshal does not properly handle NCName in XML namespaces
  • #12624 encoding/xml: brittle support for matching a namespace by identifier or url
  • #8167 encoding/xml: disallow attributes named xmlns:*
  • #11735 encoding/xml: empty namespace conventions are badly documented
  • #8068 encoding/xml: empty namespace prefix definitions should be illegal
  • #8535 encoding/xml: failure to handle conflicting tags in different namespaces
  • #11431 encoding/xml: loss of xmlns= in encoding since Go 1.4
  • #7113 encoding/xml: missing nested namespace not handled
  • #11724 encoding/xml: namespaced and non-namespaced attributes conflict
  • #7233 encoding/xml: omit parent tags if value is empty #2
  • #12406 encoding/xml: support QName values / expose namespace bindings
  • #9519 encoding/xml: support for XML namespace prefixes
@aeden

aeden commented Dec 7, 2015

then we're going to give up

Please don't give up 😄

No doubt that proper namespace handling is a pain, but getting it right is very valuable for those of us who are stuck with legacy APIs that only speak XML.

@justinlindh-wf

justinlindh-wf commented Dec 14, 2015

Seconding the comment above. Please don't give up on this. There is still value in handling namespaces properly in golang XML.

@rsc

Contributor

rsc commented Dec 14, 2015

@simplicitylab

simplicitylab commented Dec 18, 2015

+1 I was researching Go as a possible replacement for Python in processing ODT files and I was quickly hitting snags when dealing with multiple namespaces.

@keltia

keltia commented Dec 18, 2015

I feel your pain, I'm also stuck with SOAP-based web services and the namespace issue is a thorn here. Going to generate XML w/ templates for now. Please do not give up :)

@rsc

Contributor

rsc commented Dec 18, 2015

@keltia

keltia commented Dec 18, 2015

Point taken. I do not have code right now (because I decided to generate my SOAP requests through text/template for the moment), but the fact is, xml.Marshal generates tags without any namespace prefix. I tried to use https://github.com/gabstv/go-soap (which uses xml.Marshal) and even though there are "xmlns:" declarations for many tags, none of them are in the resulting XML file. Many of the open tickets above are in the same situation.

xml.Unmarshal() is fine.
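
For readers who want to reproduce the marshaling behavior described above, here is a minimal sketch. The `Envelope` type and the `urn:example:soap` namespace are made up for illustration and are not taken from go-soap. It only shows that a namespace given in a struct tag is written as a default `xmlns` declaration on the element; the struct tag syntax offers no way to ask for a prefixed name such as `soap:Envelope`.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// Envelope is a hypothetical type; the namespace URI is a placeholder.
type Envelope struct {
	XMLName xml.Name `xml:"urn:example:soap Envelope"`
	Body    string   `xml:"Body"`
}

// marshalEnvelope returns the XML that encoding/xml produces for Envelope.
func marshalEnvelope() (string, error) {
	out, err := xml.Marshal(Envelope{Body: "hello"})
	return string(out), err
}

func main() {
	out, err := marshalEnvelope()
	if err != nil {
		panic(err)
	}
	// The root element carries xmlns="urn:example:soap" instead of a
	// prefixed name plus an xmlns:prefix declaration.
	fmt.Println(out)
}
```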

@SamWhited

Member

SamWhited commented Jan 1, 2016

I normally pretend the namespace is part of the local name, which works well enough with XMPP because it's such a limited profile of XML, but won't work with more complex documents. A somewhat specific example I'd like to see work properly without a lot of hacks is something like the following valid XMPP stream:

<?xml version='1.0'?>
<stream:stream
    from='im.example.com'
    to='juliet@im.example.com'
    version='1.0'
    xml:lang='en'
    xmlns='jabber:client'
    xmlns:stream='http://etherx.jabber.org/streams'>
<stream:error>
  <see-other-host xmlns="urn:ietf:params:xml:ns:xmpp-streams">im.example.net</see-other-host>
</stream:error>
</stream:stream>

When decoded and re-encoded, this currently results in something like:

<stream
    xmlns="stream"
    from="juliet@im.example.com"
    to="im.example.com"
    version="1.0"
    xmlns:_xml="xml"
    _xml:lang="en"
    xmlns="jabber:client"
    xmlns:_xmlns="xmlns"
    _xmlns:stream="http://etherx.jabber.org/streams">
…

Example here: https://play.golang.org/p/XyMFO301HA

Note that the example reads a raw token because this is XMPP and it has to be streamed, which the current XML implementation isn't very good at; that's a separate issue though.
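
As a rough illustration of why raw tokens matter here, the sketch below reads the same opening tag twice, once with `Decoder.RawToken` and once with `Decoder.Token`. The shortened `streamHeader` constant and the `firstStart` helper are assumptions made for this example, not code from the linked playground. `RawToken` leaves the prefix in `Name.Space`, while `Token` replaces it with the namespace URI. Re-encoding a raw token therefore treats the prefix as if it were a URI, which would be consistent with the `xmlns="stream"` seen in the output above.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// streamHeader is a shortened, hypothetical version of the stream opening tag.
const streamHeader = `<stream:stream xmlns='jabber:client' xmlns:stream='http://etherx.jabber.org/streams' version='1.0'>`

// firstStart returns the first StartElement of streamHeader, read either
// with RawToken (prefixes untranslated) or Token (prefixes resolved to URIs).
func firstStart(raw bool) (xml.StartElement, error) {
	d := xml.NewDecoder(strings.NewReader(streamHeader))
	for {
		var tok xml.Token
		var err error
		if raw {
			tok, err = d.RawToken()
		} else {
			tok, err = d.Token()
		}
		if err != nil {
			return xml.StartElement{}, err
		}
		if se, ok := tok.(xml.StartElement); ok {
			return se, nil
		}
	}
}

func main() {
	raw, err := firstStart(true)
	if err != nil {
		panic(err)
	}
	resolved, err := firstStart(false)
	if err != nil {
		panic(err)
	}
	fmt.Printf("RawToken: Space=%q Local=%q\n", raw.Name.Space, raw.Name.Local)
	fmt.Printf("Token:    Space=%q Local=%q\n", resolved.Name.Space, resolved.Name.Local)
}
```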

@pdw-mb

pdw-mb commented Feb 17, 2016

The issue in the above comment is effectively issue #7535

@ianlancetaylor

Contributor

ianlancetaylor commented Feb 19, 2016

See #14407.

gopherbot pushed a commit to golang/net that referenced this issue Apr 7, 2016

webdav: remove runtime check for Go 1.4 or earlier.
During the Go 1.5 development cycle, this package used to require the
standard library's encoding/xml package from Go 1.5 or later, but
https://go-review.googlesource.com/#/c/12772/ (which was submitted in
July 2015) made an internal fork of encoding/xml, as some
namespace related changes introduced in the Go 1.5 cycle were rolled
back in response to golang/go#11841

Thus, this "go1.4" runtime check is no longer necessary. In the long
term, this package should use the standard library's version, and the
internal fork deleted, once golang/go#13400 is
resolved. We could re-introduce a similar check at that time, although
it could be done at compile time (via a "go1.7" build tag) instead of at
runtime.

Change-Id: I18258443aa3d9b519e23106aedb189f25c35495d
Reviewed-on: https://go-review.googlesource.com/21634
Reviewed-by: Andrew Gerrand <adg@golang.org>

gopherbot pushed a commit to golang/net that referenced this issue Apr 7, 2016

webdav: have the exported API use the standard library's xml.Name type.
In particular, the Property and DeadPropsHolder types need to refer to
the standard xml package, not the internal fork, to be usable by other
packages.

Inside the package, the XML marshaling and unmarshaling is still done by
the etc/internal/xml package, and will remain that way until
golang/go#13400 is resolved.

Fixes golang/go#15128.

Change-Id: Ie7e7927d8b30d97d10b1a4a654d774fdf3e7a1e3
Reviewed-on: https://go-review.googlesource.com/21635
Reviewed-by: Andrew Gerrand <adg@golang.org>

@rsc rsc modified the milestones: Go1.8, Go1.7 May 18, 2016

@quentinmit quentinmit added the NeedsFix label Oct 7, 2016

@rsc rsc modified the milestones: Go1.9Early, Go1.8 Oct 26, 2016

@klauern

klauern commented Feb 17, 2017

Has there been any work done on this front? I have a big need to work with a bunch of clunky SOAP APIs, and these namespaces are just killing me everywhere. Are there any libraries that work around this that I could make use of in the interim?

@rsc

Contributor

rsc commented Feb 17, 2017

No, there has not.

@RayfenWindspear

RayfenWindspear commented Oct 13, 2017

Seems there hasn't been any progress on this. I know it's a dirty workaround, but for marshaling I ended up just generating a string constant that I replace with the proper namespaces. It's probably just a tad safer than what someone in a duplicate issue did to work around this using text/template.

https://play.golang.org/p/MlJEz9PSSO
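
One way such a string-replacement workaround might look is sketched below. This is not the code from the playground link; the `Request` type, the `withPrefixes` helper, and the choice of the `soapenv` prefix and namespace URI are all assumptions made for illustration. The struct is marshaled without namespaces, and the plain tag names are then rewritten into prefixed ones.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// Request is a hypothetical payload marshaled without any namespaces.
type Request struct {
	XMLName xml.Name `xml:"Envelope"`
	Body    string   `xml:"Body"`
}

// soapNS is an assumed namespace URI used only for this illustration.
const soapNS = "http://schemas.xmlsoap.org/soap/envelope/"

// withPrefixes marshals v and rewrites the plain tag names into prefixed
// ones by string replacement.
func withPrefixes(v interface{}) (string, error) {
	out, err := xml.Marshal(v)
	if err != nil {
		return "", err
	}
	r := strings.NewReplacer(
		"<Envelope>", `<soapenv:Envelope xmlns:soapenv="`+soapNS+`">`,
		"</Envelope>", "</soapenv:Envelope>",
		"<Body>", "<soapenv:Body>",
		"</Body>", "</soapenv:Body>",
	)
	return r.Replace(string(out)), nil
}

func main() {
	out, err := withPrefixes(Request{Body: "hello"})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

Because xml.Marshal escapes `<` and `>` in character data, the replacement cannot accidentally match text inside an element, but it remains a stopgap that has to be kept in sync with the struct tags by hand.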

@rsc rsc modified the milestones: Go1.10, Go1.11 Nov 22, 2017

@alexrsagen

alexrsagen commented Jan 26, 2018

I work a lot with integration against RFC 5730 EPP servers. This depends heavily on the encoding/xml package, so I wrote an example showing this use case and how the outlined issues affect it 😄

https://play.golang.org/p/H1s6P95N6tx

I'll be glad to help out with more examples of this use case, just mention me if needed.

@tajtiattila tajtiattila referenced this issue Mar 27, 2018

Open

png + xmp? #3

@iWdGo

Contributor

iWdGo commented Apr 18, 2018

A short summary of the information submitted for the individual issues. Generally speaking, the namespace standard (https://www.w3.org/TR/xml-names) was not correctly enforced. Only the fix for #7113 is not yet available. No work on extending functionality has been done.

#7535 Fix submitted
#11496 Commented (invalid syntax)
#9775 Commented (invalid syntax)
#12624 Commented (invalid syntax)
#8167 Improvement request to compact output
#11735 Related to documentation. XMLName usage must be understood by reading Marshal/Unmarshal documentation jointly.
#8068 Fix submitted
#8535 Fix submitted
#11431 No other fix required. Tests have been submitted. (duplicates #8535)
#7113 No fix available yet.
#11724 Fix submitted
#7233 Commented
#12406 Adding func - no work so far
#9519 Improvement request to compact output. (duplicates #8167)

@iWdGo

Contributor

iWdGo commented Apr 23, 2018

#7113 now also has a proposed fix. All key issues related to enforcing the namespace standard are solved.
This fix leaves the handling of tag nesting active, which could serve as a basis for other related improvements.

@iWdGo

Contributor

iWdGo commented Apr 27, 2018

All of the fixes submitted above have minor incompatibilities with one another when merged, related to the handling of prefixes, so a merged version has been submitted as a single change. The commit message contains the details previously submitted. Namespace rules are completely enforced. Declaring a namespace is done using xmlns="value" (no prefix) or xmlns:prefix="value" (with a prefix).

The implementation refers not to the prefixes themselves but to their values. To improve XML compatibility, it used a workaround for this behavior: it added attribute names (using _xmlns) to create chains, which resulted in almost-valid XML. This mechanism is now inactive for namespaces, as it is not documented. Valid XML is now always returned, but it sometimes contains oddities, as before.

There are several ways to create namespaces and attributes, and various possible conflicts. Furthermore, Marshal and Unmarshal share code, so exceptions arise from both types of operations. Idempotency (repeated marshal/unmarshal cycles without changes) is the objective for regular XML and has been mentioned several times in the issues. It is implicit in the documentation. It is not achieved because, for instance, prefixes are not restored in tags (cf. TestIssue7535).

The XMLName of a struct must be unique. It overrides the struct name and optionally carries a non-prefixed namespace. When unmarshaling, the value (usually the namespace URI) is stored in the XMLName of the struct. Without an XMLName field, it is lost.

When marshaling, the namespace can only be emitted as an attribute, as no prefix is available. A Decoder has a Strict field which can be set to false, in which case invalid XML is tolerated. Unmarshal is always strict as it uses Token. Unmarshal can create a prefix only with an attribute.

A StartElement token stores the value of the namespace in its Name.Space field when there is one, which means that the prefix used in the source is never available unless it appears as an attribute.
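
The point about XMLName can be checked with a small sketch. The `Item` type, the `decodeItem` and `encodeItem` helpers, and the `urn:example:items` namespace are hypothetical names chosen for this example. After Unmarshal, XMLName.Space holds the namespace URI and the `p` prefix is gone; marshaling the value again yields a default xmlns declaration rather than the original prefix.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// Item is a hypothetical type whose untagged XMLName field records
// whatever element name Unmarshal encounters.
type Item struct {
	XMLName xml.Name
	Value   string `xml:",chardata"`
}

// decodeItem unmarshals doc into an Item.
func decodeItem(doc string) (Item, error) {
	var it Item
	err := xml.Unmarshal([]byte(doc), &it)
	return it, err
}

// encodeItem marshals it back to XML.
func encodeItem(it Item) (string, error) {
	out, err := xml.Marshal(it)
	return string(out), err
}

func main() {
	it, err := decodeItem(`<p:item xmlns:p="urn:example:items">42</p:item>`)
	if err != nil {
		panic(err)
	}
	// Space is the namespace URI; the prefix "p" is not recorded anywhere.
	fmt.Printf("Space=%q Local=%q Value=%q\n", it.XMLName.Space, it.XMLName.Local, it.Value)

	out, err := encodeItem(it)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```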

@bradfitz bradfitz modified the milestones: Go1.11, Go1.12 Jun 13, 2018

@alexrsagen

alexrsagen commented Jul 18, 2018

For everyone watching these XML issues in Go and hoping for a fix ASAP, I finally took the plunge and wrote an API-compatible binding to libxml2, which serves great as a temporary workaround for these namespace issues.

https://github.com/alexrsagen/go-libxml - I hope this helps someone!

@gunnsth

gunnsth commented Oct 25, 2018

I would like to be able to load XML and write back out similarly, such that the XML is readable. Namespaces are used with a prefix to simplify the XML for readability purposes.

Input XML:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<oval_definitions
	xmlns="http://oval.mitre.org/XMLSchema/oval-definitions-5"
	xmlns:ios="http://oval.mitre.org/XMLSchema/oval-definitions-5#ios"
	xmlns:oval="http://oval.mitre.org/XMLSchema/oval-common-5"
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://oval.mitre.org/XMLSchema/oval-definitions-5#ios http://oval.mitre.org/language/version5.11/ovaldefinition/complete/ios-definitions-schema.xsd http://oval.mitre.org/XMLSchema/oval-definitions-5 http://oval.mitre.org/language/version5.11/ovaldefinition/complete/oval-definitions-schema.xsd http://oval.mitre.org/XMLSchema/oval-common-5 http://oval.mitre.org/language/version5.11/ovaldefinition/complete/oval-common-schema.xsd">
	<generator>
		<oval:product_name>Creator</oval:product_name>
		<oval:schema_version>5.11.2</oval:schema_version>
		<oval:timestamp>2018-07-11T12:11:47-04:00</oval:timestamp>
	</generator>	
	<tests>
		<ios:global_test check="all" check_existence="at_least_one_exists" id="oval:tst:1" version="1">
			<ios:object object_ref="oval:obj:1"/>
			<ios:state state_ref="oval:ste:1"/>
		</ios:global_test>
	</tests>
</oval_definitions>

When I write this back out with xml.MarshalIndent, the output is

<oval_definitions>
    <generator xmlns="http://oval.mitre.org/XMLSchema/oval-definitions-5">
        <product_name>Creator</product_name>
        <schema_version>5.11.2</schema_version>
        <timestamp>2018-07-11T12:11:47-04:00</timestamp>
    </generator>
    <tests>
        <global_test xmlns="http://oval.mitre.org/XMLSchema/oval-definitions-5#ios" check="all" check_existence="at_least_one_exists" id="oval:tst:1" comment="" version="1">
            <object object_ref="oval:obj:1"></object>
            <state state_ref="oval:ste:1"></state>
        </global_test>
    </tests>
</oval_definitions>

This makes the XML less readable: xmlns is defined on a per-tag basis rather than through a prefix/alias declared once on the top-level tag. With huge XML files, that can be ugly to work with.

Example here: https://play.golang.org/p/oDmT9nDhLmu
