New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

encoding/xml: unmarshal only processes first XML element #20754

Open
rsc opened this Issue Jun 22, 2017 · 3 comments

Comments

Projects
None yet
3 participants
@rsc
Contributor

rsc commented Jun 22, 2017

If you Marshal []int{1,2} you get <int>1</int><int>2</int>, but then if you Unmarshal it back into a new slice, you get just []int{1}. Unmarshal simply stops after the first top-most XML element, because it is implemented as NewDecoder(bytes.NewReader(data)).Decode(v). When v is not a slice, this makes sense. But if v is a slice there's an argument that maybe Unmarshal should repeat the Decode calls until it reaches the end of the data. That's maybe easier said than done, and also maybe not a compatible change, but we should at least consider it. The Decoder itself is right to process only a single element, since it is processing an arbitrary input stream that might block if one reads too far. But Unmarshal, holding a []byte, has perfect knowledge of the remainder of the stream and might be able to do better. Or perhaps Unmarshal should return an error.

package main

import (
	"encoding/xml"
	"fmt"
	"log"
)

func main() {
	x1 := []int{1, 2}
	out, _ := xml.Marshal(x1)
	fmt.Println("XML:", string(out)) // <int>1</int><int>2</int>

	x2 := []int{3}
	if err := xml.Unmarshal(out, &x2); err != nil {
		log.Fatal(err)
	}
	fmt.Println(x2) // [3 1] not [3 1 2]
}

In contrast, the equivalent input given to json.Unmarshal produces an error:

package main

import (
	"encoding/json"
	"fmt"
	"log"
)

func main() {
	out := []byte("[1] [2]")
	x2 := []int{3}
	if err := json.Unmarshal(out, &x2); err != nil {
		log.Fatal(err) // invalid character '[' after top-level value
	}
	fmt.Println(x2)
}

This is more justified in the case of JSON, since Marshaling []int{1,2} does not produce [1] [2].

@SamWhited

This comment has been minimized.

Member

SamWhited commented Jun 22, 2017

I think there may be two issues here; the other is that the call to Marshal is producing invalid XML. All XML MUST have a root element; if we're marshaling using an Encoder and a root element has already been written, writing <int>1</int><int>2</int> for a call to Encode with a slice is probably correct, but marshal is not for writing streams, it is for marshaling an entire XML document given some data (I think? maybe that understanding is incorrect), so I would expect it to write a root element.

EDIT: To clarify, the "correct" way to write an array or slice type, assuming we were marshaling it as a complete document and not part of a stream, in XML would be something like this (names don't matter, I just stuck with "int" and arbitrarily picked "slice" as the root):

<slice>
  <int>1</int>
  <int>2</int>
</slice>
@rsc

This comment has been minimized.

Contributor

rsc commented Jun 23, 2017

Sorry @SamWhited, but I disagree. Marshal clearly can be used to generate fragments of XML documents as well, for example if you pass it a slice of things to marshal. The docs explicitly contemplate this:

Marshal handles an array or slice by marshaling each of the elements.

We can't change that now, and I think it would be a mistake to do so anyway, since it would take control of the wrapping away from the users.

@SamWhited

This comment has been minimized.

Member

SamWhited commented Jun 23, 2017

but I disagree

Marshal handles an array or slice by marshaling each of the elements.
We can't change that now

Fair enough; I did not take that statement in the docs to mean that it marshals each element and does not ensure validity or a root node, but as you said, there's probably no way to change that anyways even if we agreed.

On an interesting, but unrelated note, in the paper you recommend ("The Essence of XML; Siméon, Wadler") it says that "lists" in XML are actually space-separated (<ints>1 2 3</ints>). I've never seen this, but I also avoid XML schema as much as possible and haven't read much of the spec. Just thought it was interesting; maybe I've been doing lists wrong.

@rsc rsc modified the milestones: Go1.10, Go1.11 Nov 22, 2017

@bradfitz bradfitz modified the milestones: Go1.11, Go1.12 Jun 13, 2018

@rsc rsc added the early-in-cycle label Nov 14, 2018

@rsc rsc modified the milestones: Go1.12, Go1.13 Nov 14, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment