encoding/xml: can't round-trip a slice like in JSON #20735

Closed
audathuynh opened this Issue Jun 20, 2017 · 10 comments

audathuynh commented Jun 20, 2017

What version of Go are you using (go version)?

go1.8.3

What operating system and processor architecture are you using (go env)?

darwin/amd64

What did you do?

Tried to unmarshal a slice from the string obtained by marshalling that same slice.

https://play.golang.org/p/Yn57p2--fy

What did you expect to see?

  1. The slice needs to be restored in the variable b2.
  2. The method "Unmarshal" in the package encoding/xml should work the same as the method "Unmarshal" in the package encoding/json.

What did you see instead?

nil

Member

mvdan commented Jun 20, 2017

A smaller repro, with just a couple of ints: https://play.golang.org/p/B8P6xVJkGk

Funnily enough, if you unmarshal into []int instead of []interface{}, JSON still gets [1 2] and XML gets [1] instead of [<nil>].

Edit: same stuff on go version devel +c52aca1c76 Fri Jun 16 05:45:48 2017 +0000 linux/amd64.
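
For readers without access to the playground links, here is a minimal sketch of the comparison described above. It is not the original playground code; the helper names `roundTripJSON` and `roundTripXML` are made up for illustration. It marshals `[]int{1, 2}` with each package and unmarshals the bytes back into a fresh `[]int`. In this thread the XML result was reported as `[1]`.

```go
package main

import (
	"encoding/json"
	"encoding/xml"
	"fmt"
)

// roundTripJSON marshals in to JSON and unmarshals the bytes into a new []int.
// (Hypothetical helper, not part of any library.)
func roundTripJSON(in []int) (string, []int, error) {
	data, err := json.Marshal(in)
	if err != nil {
		return "", nil, err
	}
	var out []int
	err = json.Unmarshal(data, &out)
	return string(data), out, err
}

// roundTripXML does the same with encoding/xml.
// (Hypothetical helper, not part of any library.)
func roundTripXML(in []int) (string, []int, error) {
	data, err := xml.Marshal(in)
	if err != nil {
		return "", nil, err
	}
	var out []int
	err = xml.Unmarshal(data, &out)
	return string(data), out, err
}

func main() {
	in := []int{1, 2}

	js, jout, err := roundTripJSON(in)
	fmt.Println("json:", js, jout, err)

	// xml.Marshal emits one element per slice item with no enclosing element;
	// xml.Unmarshal then reads back only what it finds in the first element.
	xs, xout, err := roundTripXML(in)
	fmt.Println("xml: ", xs, xout, err)
}
```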

mvdan changed the title from "Problem when unmarshalling a slice using encoding/xml" to "encoding/xml: can't round-trip a slice like in JSON" on Jun 20, 2017

Member

SamWhited commented Jun 21, 2017

Funnily enough, if you unmarshal into []int instead of []interface{}, JSON still gets [1 2] and XML gets [1] instead of [<nil>].

I can't reproduce this; in the example you posted it still appears to be [<nil>].

The slice needs to be restored in the variable b2.

While it may be unfortunate that Marshal and Unmarshal are not inverse functions, this is working as expected per the rules from the package docs and it's too late to change it.

The method "Unmarshal" in the package encoding/xml should work the same as the method "Unmarshal" in the package encoding/json.

I don't think this statement actually is very meaningful; XML and JSON are two very different serialization formats that are useful for different situations. They are not interchangeable: XML does not have a "slice" or "array" type per se, while JSON does, so they cannot do the same thing here.

/cc @rsc since he seems to know the XML package the best and I'm not really comfortable closing this as a won't fix without go team input. Temporarily marked with a milestone and "needs decision", although I suspect the decision is that this is working as expected.

SamWhited added this to the Unplanned milestone on Jun 21, 2017

Author

audathuynh commented Jun 21, 2017

Sorry, I don't know what package docs you are talking about.
They could be very important.

However, I have some questions right now.

Who wrote the package docs? Can we change them if they are wrong?

Regarding the package encoding/xml, if this is not a bug in the library, I think we will need to rethink why we are building the library.

Why do we need to marshal something and then we cannot unmarshal it?

Thank you very much.

Member

SamWhited commented Jun 21, 2017

Sorry, I don't know what package docs you are talking about.

The docs on the Marshal method: https://godoc.org/encoding/xml#Marshal

Specifically:

The name for the XML elements is taken from, in order of preference:

  • the tag on the XMLName field, if the data is a struct
  • the name of the marshaled type
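
To illustrate the two naming rules quoted above, here is a small sketch (the types `Point` and `Tagged` and the helper `marshalString` are hypothetical). A struct without an XMLName field is named after its type, a struct with a tagged XMLName field takes the tag's name, and a slice of ints has no name of its own: each item becomes an element named after the item type.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// Point has no XMLName field, so its element is named after the type.
type Point struct {
	X int
	Y int
}

// Tagged sets its element name through the tag on the XMLName field.
type Tagged struct {
	XMLName xml.Name `xml:"pt"`
	X       int
	Y       int
}

// marshalString is a hypothetical convenience wrapper around xml.Marshal.
func marshalString(v interface{}) string {
	data, err := xml.Marshal(v)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(data)
}

func main() {
	fmt.Println(marshalString(Point{X: 1, Y: 2}))  // <Point><X>1</X><Y>2</Y></Point>
	fmt.Println(marshalString(Tagged{X: 1, Y: 2})) // <pt><X>1</X><Y>2</Y></pt>
	fmt.Println(marshalString([]int{1, 2}))        // <int>1</int><int>2</int>
}
```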

Can we change them if they are wrong?

Why do we need to marshal something and then we cannot unmarshal it?

I agree with you that it's not ideal; I just don't think it can be changed at this point due to the Go 1 compatibility promise. Maybe someone on the Go team whose opinion actually matters here will correct me though :)

Member

mvdan commented Jun 21, 2017

I can't reproduce this; in the example you posted it still appears to be [<nil>].

https://play.golang.org/p/N5O1MPAMtr

Author

audathuynh commented Jun 21, 2017

XML and JSON are two very different serialization formats that are useful for different situations. They are not interchangeable: XML does not have a "slice" or "array" type per se, while JSON does,

Yes. I completely understand that.

so they cannot do the same thing here.

Do you really think that XML does not have a "slice" or "array" type and therefore we cannot implement the method or fix the bug?

I don't think this statement actually is very meaningful;

I think we need to separate the logic of the problem and the implementation of the problem.
I talk about the logic of the problem while you just focus on the implementation.

Contributor

rsc commented Jun 21, 2017

JSON encodes data structures. As such, it is fairly straightforward to turn JSON back into data structures when unmarshaling into something with no type information, like interface{}. XML does not encode data structures - it encodes a document stream, which sometimes is written in a way that can be viewed as a data structure. But if you unmarshal XML into an interface{}, with no type information, there's no real way to figure out the types that should be used. I'm sorry about this, but it is what it is.

Note that even JSON is not perfect here: the int turns into a float64, and the time.Time turns into a string. This is for the same reason as XML: there's missing type information in the encoded stream that cannot be reconstructed. It's just that XML carries even less type information than JSON.
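
The JSON type loss mentioned above can be seen with a short sketch (the helpers `sampleValues` and `decodedTypes` are hypothetical names). It marshals an int and a time.Time as a JSON array, unmarshals into []interface{}, and reports the dynamic type of each decoded element.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// sampleValues returns an int and a time.Time, the two cases named above.
func sampleValues() []interface{} {
	return []interface{}{1, time.Date(2017, time.June, 21, 0, 0, 0, 0, time.UTC)}
}

// decodedTypes marshals values as a JSON array, unmarshals the array into
// []interface{}, and returns the Go type of each decoded element.
func decodedTypes(values []interface{}) ([]string, error) {
	data, err := json.Marshal(values)
	if err != nil {
		return nil, err
	}
	var out []interface{}
	if err := json.Unmarshal(data, &out); err != nil {
		return nil, err
	}
	types := make([]string, len(out))
	for i, v := range out {
		types[i] = fmt.Sprintf("%T", v)
	}
	return types, nil
}

func main() {
	types, err := decodedTypes(sampleValues())
	fmt.Println(types, err) // [float64 string] <nil>
}
```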

Note also the comment in the BUGS section at the bottom of the xml docs:

Mapping between XML elements and data structures is inherently flawed: an XML element is an order-dependent collection of anonymous values, while a data structure is an order-independent collection of named values. See package json for a textual representation more suitable to data structures.

This is working as best it can, both for XML and JSON (and it can work better in JSON than in XML). The typical way to address this is to unmarshal into an actual data structure instead of []interface{}, and then the missing type information is provided by the data structure type itself.
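
As a sketch of that suggestion, assuming you control both ends and can pick the element names (the type `IntList` and the element name `ints` are made up for illustration): wrapping the slice in a struct gives Unmarshal a single root element and the missing type information, and the round trip then preserves both values.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// IntList is a hypothetical wrapper: one root element <ints> that holds
// repeated <int> children, which map onto the Values slice.
type IntList struct {
	XMLName xml.Name `xml:"ints"`
	Values  []int    `xml:"int"`
}

// roundTrip marshals in to XML and unmarshals the bytes into a new IntList.
func roundTrip(in IntList) (string, IntList, error) {
	var out IntList
	data, err := xml.Marshal(in)
	if err != nil {
		return "", out, err
	}
	err = xml.Unmarshal(data, &out)
	return string(data), out, err
}

func main() {
	doc, out, err := roundTrip(IntList{Values: []int{1, 2}})
	fmt.Println(doc)             // <ints><int>1</int><int>2</int></ints>
	fmt.Println(out.Values, err) // [1 2] <nil>
}
```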

rsc closed this on Jun 21, 2017

Author

audathuynh commented Jun 21, 2017

Honestly, I am not satisfied with the explanation from @rsc.

I would be happier if someone just said that it is a bug we cannot fix, or frankly said that the current design of the library is bad and cannot be fixed because of the Go 1 compatibility promise.
All of us want Go to get better and better, don't we?

XML does not encode data structures - it encodes a document stream, which sometimes is written in a way that can be viewed as a data structure.

XML is just a language to describe what we want.
Why don't we use the semantics of tags in an xml document to describe the data structure that we want?

Could you please explain the reason why we have the result [1] in the variable b2 in the example given by @mvdan ?

https://play.golang.org/p/N5O1MPAMtr

You may say we cannot unmarshal the xml string correctly because we do not have the data type.

Why don't we marshal the slice into the XML string below, so that the data type of the data structure is preserved?
We build the library and control what we marshal, so we can make sure it can be unmarshalled, can't we?

<int-slice>
  <int>1</int>
  <int>2</int>
</int-slice>

Thank you very much.

Contributor

rsc commented Jun 22, 2017

These packages are about processing arbitrary JSON and arbitrary XML, not just a specific hypothetical XML dialect that includes type information in its element tag names. Handling arbitrary XML makes the package useful for documents and XMPP streams and any number of other XML-based protocols. If you have the luxury of controlling the encoding to the point of inventing hypothetical containers like <int-slice>, then you also have the luxury of avoiding XML, and I would strongly encourage you to do that. The encoding/json and encoding/gob packages are both far better for data structures than encoding/xml, because both JSON and Gob are designed for data structures, while XML is not.
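
For comparison, a minimal sketch of the same slice going through encoding/gob (the helper name `roundTripGob` is hypothetical). Gob streams carry type information, so the slice is restored as-is.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// roundTripGob encodes in with encoding/gob into a buffer and decodes the
// buffer into a new []int.
func roundTripGob(in []int) ([]int, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(in); err != nil {
		return nil, err
	}
	var out []int
	err := gob.NewDecoder(&buf).Decode(&out)
	return out, err
}

func main() {
	out, err := roundTripGob([]int{1, 2})
	fmt.Println(out, err) // [1 2] <nil>
}
```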

If you'd like to read more about XML's general inapplicability to data structures, I highly recommend Siméon and Wadler, “The Essence of XML,” POPL 2003.

As for the int slice, I do see the asymmetry between Marshal and Unmarshal: given a slice, Marshal will emit a sequence of XML elements, but Unmarshal always processes only a single XML element.
I filed #20754 for that. Thanks.
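
Until that asymmetry is addressed, one possible workaround is to read the sequence one element at a time with an xml.Decoder, calling Decode until it returns io.EOF. This is only a sketch under that assumption, not something proposed in the thread; the helper name `decodeInts` is hypothetical.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// decodeInts reads a sequence of top-level elements such as
// <int>1</int><int>2</int>, decoding each one into an int.
func decodeInts(doc string) ([]int, error) {
	dec := xml.NewDecoder(strings.NewReader(doc))
	var out []int
	for {
		var n int
		err := dec.Decode(&n) // consumes exactly one element per call
		if err == io.EOF {
			return out, nil // no more elements in the stream
		}
		if err != nil {
			return out, err
		}
		out = append(out, n)
	}
}

func main() {
	out, err := decodeInts("<int>1</int><int>2</int>")
	fmt.Println(out, err)
}
```

Note that this loop does not check element names, so it would also accept elements with other names whose text parses as an integer.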

Author

audathuynh commented Jun 22, 2017

Thank you very much for the link.
Indeed, I am using encoding/gob to marshal and unmarshal my data in my app :-)

As for the int slice, I do see the asymmetry between Marshal and Unmarshal: given a slice, Marshal will emit a sequence of XML elements, but Unmarshal always processes only a single XML element.

That is because you accept the tag "int" in an XML document but you don't accept the tag "int-slice" or "slice" when you describe a slice value.

Please tell me if I am wrong. Below is what I think.

XML is a markup language. In an XML document, tags are actually only labels. The semantics of tags are defined by humans, and we use them to describe what we want.

Currently, we use the label "int" to describe the integer data type in the example.
Actually, we can use any label, e.g. "int", "integer", "abc", or whatever we want.
The important thing is the semantics of the labels.

My guess is that you use and accept the label "int" in the XML string because it is the name people use for the data type of an integer value in the Go language.
If that is correct, my next question is: why do you accept the label "int" as a tag name for the data type "int", but not the label "slice" for the data type "slice" in Go?

To me, all documents, including package docs and even the paper about XML, are written by humans. We read, we learn, we judge, and we use. If they are incorrect in some way, we fix them.
That is also the reason we build Go, isn't it?

Anyway, I am very happy with your answer.

Thank you very much.

golang locked and limited conversation to collaborators on Jun 22, 2018
