Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

encoding/xml: cannot marshal self-closing tag #21399

Open
halfcrazy opened this issue Aug 11, 2017 · 30 comments

Comments

Projects
None yet
@halfcrazy
Copy link

commented Aug 11, 2017

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

go version go1.8.3 darwin/amd64

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/xxx/.go:/Users/xxx/workspace/goproject"
GORACE=""
GOROOT="/usr/local/Cellar/go/1.8.3/libexec"
GOTOOLDIR="/usr/local/Cellar/go/1.8.3/libexec/pkg/tool/darwin_amd64"
GCCGO="gccgo"
CC="clang"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/53/r61bh2fj1hg9n0w03fs__7kc0000gn/T/go-build440341166=/tmp/go-build -gno-record-gcc-switches -fno-common"
CXX="clang++"
CGO_ENABLED="1"
PKG_CONFIG="pkg-config"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"

What did you do?

I want to marshal a xml struct which has a self-closing tag field. After looking up in golang doc, there isn't an official support. In google group they use string.Replace, it's ugly. It seems we could write marshal func for the custom struct, by i'm willing to see an official support like

	type Person struct {
		XMLName   xml.Name `xml:"person"`
		Id        int      `xml:"id,attr"`
		FirstName string   `xml:"name>first"`
		LastName  string   `xml:"name>last"`
		Age       int      `xml:"age"`
		Height    float32  `xml:"height,omitempty"`
		Married   bool
		Address
		Comment string `xml:",comment"`
		Balabala string `xml:"balabala,allowempty"`  // i want something like this
	}

In xml standard, a self-closing tag is permitted.
https://www.w3.org/TR/REC-xml/#dt-empty

And empty-element tag should be used, and should only be used, for elements which are declared EMPTY.
https://www.w3.org/TR/REC-xml/#d0e2480

@Manishearth

This comment has been minimized.

Copy link
Contributor

commented Aug 29, 2017

I like this idea; though i suspect the community will have to agree on it (and also the naming of the tag)

However, I whipped up a quick implementation at https://go-review.googlesource.com/59830 .

I'll polish it up and add tests if this proposal actually goes through.

@gopherbot

This comment has been minimized.

Copy link

commented Aug 29, 2017

Change https://golang.org/cl/59830 mentions this issue: encoding/xml: add 'allowempty' tag for self-closing XML tags

@ianlancetaylor ianlancetaylor added this to the Go1.10 milestone Aug 29, 2017

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Aug 29, 2017

CC @rsc XML suggestion.

@SamWhited

This comment has been minimized.

Copy link
Member

commented Aug 29, 2017

Change https://golang.org/cl/59830 mentions this issue: encoding/xml: add 'allowempty' tag for self-closing XML tags

The proposed patch only works for marshaling structs. What if we're using the EncodeToken method and write a StartElement immediately followed by its corresponding EndElement? If this ends up being desired and it were an option on the Encoder instead it would work in both places (although it would be global for all tags, not local to specific struct fields, but that seems more desriable to me).

@Manishearth

This comment has been minimized.

Copy link
Contributor

commented Aug 29, 2017

We still need the ability to output non-selfclosing pairs of tags. The idea is to give the user the option to make certain fields self closing, not to do that for all fields.

@Manishearth

This comment has been minimized.

Copy link
Contributor

commented Aug 29, 2017

Oh, I see, you're suggesting this be global. Hm. I don't really like the idea of making this a global change, but we could also make it an option on the marshaler.

@SamWhited

This comment has been minimized.

Copy link
Member

commented Oct 2, 2017

I don't really like the idea of making this a global change

Is there a benefit to only doing this for some elements and not for others?

@Manishearth

This comment has been minimized.

Copy link
Contributor

commented Oct 2, 2017

For the same reason this bug exists in the first place; not all XML parsers handle this correctly and I wanted maximum flexibility.

I'm fine with doing it the other way though.

@SamWhited

This comment has been minimized.

Copy link
Member

commented Oct 2, 2017

not all XML parsers handle this correctly and I wanted maximum flexibility

If a parser did not handle this you couldn't mix the two anyways, you'd just have to turn it on or off entirely to make sure that parser would work.

In case there's confusion, I'm not suggesting that we always do self-closing tags or non-self-closing tags, but that an individual *Encoder should be configurable to do one or the other.

@Manishearth

This comment has been minimized.

Copy link
Contributor

commented Oct 2, 2017

fair. I'll change the patches. Should/can I tag you for review?

@SamWhited

This comment has been minimized.

Copy link
Member

commented Oct 2, 2017

fair. I'll change the patches. Should/can I tag you for review?

I don't have +2 permissions, but I'm happy to run trybots and provide feedback (also, setting this on the Encoder is my suggestion, but I don't actually have any authority here, so grain of salt until rsc or someone replies :) )

@dnaeon

This comment has been minimized.

Copy link

commented Nov 18, 2017

Can we expect this to be implemented in Go 1.10?

Currently we have a Go library, that needs to communicate with a remote API endpoint, which can only understand empty XML elements with self-closing tags.

With the current implementation of XML package we are not able to use Go in order to communicate with that API endpoint. I realize that this is an issue of the remote side by not supporting both ways of specifying empty elements, but having the option to define how empty elements are defined in Go structs would be very useful and would allow us to overcome such edge cases.

Thanks,
Marin

@rsc rsc modified the milestones: Go1.10, Go1.11 Nov 22, 2017

@rsc

This comment has been minimized.

Copy link
Contributor

commented Nov 22, 2017

Sorry, but we're not going to get to this for Go 1.10.

@iWdGo

This comment has been minimized.

Copy link
Contributor

commented Mar 28, 2018

The current behaviour of documented tag "omitempty" allows the following workaround.
AllowEmpty string `xml:"allowempty>doesnotmatter,omitempty"'
will be marshalled as
<allowempty></allowempty>

@cfstras

This comment has been minimized.

Copy link

commented Apr 24, 2018

@iWdGo how does this solve the original question?

The desire is to produce either
<tag /> or <tag>notempty</tag>. Not <tag></tag>.

@iWdGo

This comment has been minimized.

Copy link
Contributor

commented Apr 24, 2018

The answer was focused on the suggested additional tag. A medium fix is required to use self-closing tags. If implemented, it would not be optional without breaking Go 1 promise. #9519 and #8167 have the same constraint.

@robzon

This comment has been minimized.

Copy link

commented Jan 21, 2019

Any chances of this being addressed? Stumbled upon this issue while working with the XMPP protocol. Some clients will only accept <required/>, but not <required></required>.

@SamWhited

This comment has been minimized.

Copy link
Member

commented Jan 21, 2019

@robzon see the milestone; this issue is marked for Go 1.13.

EDIT: I should have also said: "please file an issue against those clients when you find them; they are broken".

@robzon

This comment has been minimized.

Copy link

commented Jan 21, 2019

It's been modified several times, so I'm not sure how reliable this is. Don't want to come across as complaining, I'm just curious on what can be realistically expected as I'm sure it's not a high priority thing.

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Jan 21, 2019

Go is an open source project. Want to send a patch?

We hope to do a general overhaul of the JSON and XML patches and feature requests in the 1.13 timeframe, but we hoped to do it for 1.12 also and didn't make it.

@SamWhited

This comment has been minimized.

Copy link
Member

commented Jan 22, 2019

Go is an open source project. Want to send a patch?

We hope to do a general overhaul of the JSON and XML patches and feature requests in the 1.13 timeframe, but we hoped to do it for 1.12 also and didn't make it.

I believe there has been a patch sitting around for this since around the 1.9 timeframe: https://golang.org/cl/59830

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Jan 22, 2019

Thanks, I'll mark this as a release blocker for 1.13 so that we decide on that CL.

@Manishearth

This comment has been minimized.

Copy link
Contributor

commented Jan 22, 2019

@SamWhited worth noting, we already have an omitempty flag, and it's nice to have some uniformity here. But if not I can implement it the other way with a setter function on the marshaller.

@SamWhited

This comment has been minimized.

Copy link
Member

commented Jan 22, 2019

@Manishearth I don't have +2 powers, so I'm certainly not the decision maker, but my reasoning is that it's not really uniformity because one of them you might want on individual fields (because sometimes that absence of a field is significant), while the other is a behavior that I don't see you ever wanting to set for just one thing (using self-closing vs. non-self closing tags is never significant). Naturally, I may be wrong in this assumption though.

@Manishearth

This comment has been minimized.

Copy link
Contributor

commented Jan 22, 2019

@SamWhited hmm, sounds good. Though I feel like the exact same logic applies to omitempty. We could potentially allow for both, make both flags something you can set on the marshaller or on fields.

regarding an earlier point you made:

What if we're using the EncodeToken method and write a StartElement immediately followed by its corresponding EndElement?

Doesn't this mean we need to unedit some stuff as well? I'm looking through the code and it's tricky to support this use case at all regardless of approach. Best way would be to buffer up writeStart entries but that's kinda heavyweight imo

@SamWhited

This comment has been minimized.

Copy link
Member

commented Jan 22, 2019

make both flags something you can set on the marshaller or on fields

Personally I wouldn't mess with the one that already exists; at least not in this CL. That's a separate issue.

Doesn't this mean we need to unedit some stuff as well? I'm looking through the code and it's tricky to support this use case at all regardless of approach. Best way would be to buffer up writeStart entries but that's kinda heavyweight imo

It's been a while since I've thought about this, but at first blush I don't think we can "unedit" anything after the buffer is flushed since the Encoder is writing to an io.Writer. It would effectively be a "hint" that only works if the start element and end element are flushed at the exact same time, I think. If we write the first token, flush the encoder, then write the second there's nothing we could do about that.

Best way would be to buffer up writeStart entries but that's kinda heavyweight imo

All Encoders are already buffered, so this doesn't add too much. It probably requires a lot of refactoring of the code though to somehow have the previous token still exist when we go to write the end tag (or the next token when we write the start tag). I'm also not sure if editing the buffer can be done in a way that's not error prone; is it a reasonable assumption to just always replace the last character with /> if the tokens match?

@Manishearth

This comment has been minimized.

Copy link
Contributor

commented Jan 22, 2019

If we write the first token, flush the encoder, then write the second there's nothing we could do about that.

That sounds super fragile, especially for handling the original use case where the consumer requires self-closing tags (as opposed to just wanting cleaner output, where a fragile heuristic sounds okay).

@SamWhited

This comment has been minimized.

Copy link
Member

commented Jan 22, 2019

I'm not sure that there's any other way; even if we buffer tokens presumably you could still call Flush after writing the start token. If you really require self-closing tags, it's up to you to synchronize your flushes accordingly seems like an okay thing to require for a rare use case.

@Manishearth

This comment has been minimized.

Copy link
Contributor

commented Jan 22, 2019

I'm not sure that there's any other way

There is, we simply support this for marshalling structs -- the primary use case -- and not for using EncodeToken. (which is what I've been suggesting)

What we can also do is allow Token to have a SelfClosingElement variant so that logic for choosing when to emit empty tags lives in the caller

@Manishearth

This comment has been minimized.

Copy link
Contributor

commented Jan 22, 2019

To recap: we have the following choices we can make:

We can make allowempty be a tag on the struct or a field on the encoder, or both. I'll probably go with a field (and maybe uniformly make it both for both allowempty and omitempty in a different CL).

If it's a field on the encoder, we can either only add support for marshalling structs, or add support for handling custom emitted XML too. The latter is much more tricky to get right. I have some ideas but I'd rather us decide for sure on doing that before going ahead an implementing it. A potential middle ground is to make the field only affect marshalling structs, and for EncodeToken add a new SelfClosing token variant so the caller can deal with it.

@andybons andybons modified the milestones: Go1.13, Go1.14 May 16, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.