Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

encoding/json: confusing errors when unmarshaling custom types #28189

Open
adnsv opened this issue Oct 13, 2018 · 17 comments
Open

encoding/json: confusing errors when unmarshaling custom types #28189

adnsv opened this issue Oct 13, 2018 · 17 comments
Milestone

Comments

@adnsv
Copy link

@adnsv adnsv commented Oct 13, 2018

In go1.11

There is a minor issue with custom type unmarshaling using "encoding/json". As far as I understand the documentation (and also looking through the internal code), the requirements are:

  • when processing map keys, the custom type needs to support UnmarshalText
  • when processing values, the custom type needs to support UnmarshalJSON

When both of these interfaces are supported, everything is fine. Otherwise, the produced error messages are a bit cryptic and in some cases really confusing. I believe, adding a test for Implements(jsonUnmarshallerType) inside encoding/json/decode.go: func (d *decodeState) object(v reflect.Value) will make things more consistent.

Here is the code (try commenting out UnmarshalText and/or/xor UnmarshalJSON methods):

package main

import (
	"encoding/json"
	"fmt"
)

type Enum int

const (
	Enum1 = Enum(iota + 1)
	Enum2
)

func (enum Enum) String() string {
	switch enum {
	case Enum1: return "Enum1"
	case Enum2: return "Enum2"
	default: return "<INVALID ENUM>"
	}
}

func (enum *Enum) unmarshal(b []byte) error {
	var s string
	err := json.Unmarshal(b, &s)
	if err != nil { return err }
	switch s {
	case "ONE": *enum = Enum1
	case "TWO": *enum = Enum2
	default: return fmt.Errorf("Invalid Enum value '%s'", s)
	}
	return nil
}

func (enum *Enum) UnmarshalText(b []byte) error {
	return enum.unmarshal(b)
}

func (enum *Enum) UnmarshalJSON(b []byte) error {
	return enum.unmarshal(b)
}

func main() {
	data := []byte(`{"ONE":"ONE", "TWO":"TWO"}`)

	var ss map[string]string
	err := json.Unmarshal(data, &ss)
	if err != nil { fmt.Println("ss failure:", err) } else { fmt.Println("ss success:", ss) }

	var se map[string]Enum
	err = json.Unmarshal(data, &se)
	if err != nil { fmt.Println("se failure:", err) } else { fmt.Println("se success:", se) }

	var es map[Enum]string
	err = json.Unmarshal(data, &es)
	if err != nil { fmt.Println("es failure:", err) } else { fmt.Println("es success:", es) }

	var ee map[Enum]Enum
	err = json.Unmarshal(data, &ee)
	if err != nil { fmt.Println("ee failure:", err) } else { fmt.Println("ee success:", ee) }

	// Output when both UnmarshalText and UnmarshalJSON are defined:
	// ss success: map[ONE:ONE TWO:TWO]
	// se success: map[ONE:Enum1 TWO:Enum2]
	// es success: map[Enum1:ONE Enum2:TWO]
	// ee success: map[Enum1:Enum1 Enum2:Enum2]

	// Output when UnmarshalJSON is commented out:
	// ss success: map[ONE:ONE TWO:TWO]
	// se failure: invalid character 'T' looking for beginning of value
	// es failure: invalid character 'O' looking for beginning of value
	// ee failure: invalid character 'T' looking for beginning of value

	// Output when UnmarshalText is commented out:
	// ss success: map[ONE:ONE TWO:TWO]
	// se success: map[ONE:Enum1 TWO:Enum2]
	// es failure: json: cannot unmarshal number ONE into Go value of type main.Enum
	// ee failure: json: cannot unmarshal number ONE into Go value of type main.Enum

	// In more complex cases, having UnmarshalText undefined also produced this
	// error message: JSON decoder out of sync - data changing underfoot?
}
@FiloSottile FiloSottile changed the title Asymmetrical json requirements for unmarshaling custom types encoding/json: confusing errors when unmarshaling custom types Oct 15, 2018
@FiloSottile FiloSottile added this to the Go1.12 milestone Oct 15, 2018
@FiloSottile
Copy link
Contributor

@FiloSottile FiloSottile commented Oct 15, 2018

Sounds like the error messages for all those cases should be improved. Also, if you could provide a reproducer for "JSON decoder out of sync - data changing underfoot?", I believe that should never happen.

/cc @rsc @dsnet @bradfitz @mvdan per owners.

@adnsv
Copy link
Author

@adnsv adnsv commented Oct 15, 2018

Here is the example that produces the ...underfoot message:

package main

import (
	"encoding/json"
	"fmt"
)

type Lang int

const (
	LangC = Lang(iota)
	LangCPP
)

type Project struct {
	OptionMap map[Lang]*Option `json:"options"`
	Typedefs  []string         `json:"typedefs"`
}

type Option struct {
	Active bool `json:"active"`
}

func main() {
	var pr *Project
	err := json.Unmarshal([]byte(data), &pr)
	if err != nil {
		fmt.Println(err)
		// Output: JSON decoder out of sync - data changing underfoot?
		// Uncommenting UnmarshalText below gets rid of this problem
	} else {
		fmt.Println("ok")
	}
}

func (lang *Lang) unmarshal(b []byte) error {
	var s string
	err := json.Unmarshal(b, &s)
	if err != nil {
		return err
	}
	switch s {
	case "C":
		*lang = LangC
	case "C++":
		*lang = LangCPP
	default:
		return fmt.Errorf("Unsupported Lang '%s'", s)
	}
	return nil
}

func (lang *Lang) UnmarshalJSON(b []byte) error {
	return lang.unmarshal(b)
}

/*
func (lang *Lang) UnmarshalText(b []byte) error {
	return lang.unmarshal(b)
}
/**/

const data = `
{
    "options": {
        "C": { "active": true },
        "C++": { "active": false }
    },
    "typedefs": [ "hello" ]
}`

@mvdan
Copy link
Member

@mvdan mvdan commented Oct 16, 2018

Thanks for raising this issue. The "underfoot" error is definitely wrong, so we can start with that one.

@mvdan
Copy link
Member

@mvdan mvdan commented Oct 16, 2018

Manually minimized reproducer below - will continue digging.

package main

import "encoding/json"

func main() {
        json.Unmarshal([]byte(data), new(T))
}

type T struct {
        F map[int]int
}

const data = `{
        "F": {
                "a": 2,
                "b": 3
        }
}`

@gopherbot
Copy link

@gopherbot gopherbot commented Oct 16, 2018

Change https://golang.org/cl/142518 mentions this issue: encoding/json: fix "data changed underfoot?" panic

gopherbot pushed a commit that referenced this issue Oct 16, 2018
Given a program as follows:

	data := []byte(`{"F": {
		"a": 2,
		"3": 4
	}}`)
	json.Unmarshal(data, &map[string]map[int]int{})

The JSON package should error, as "a" is not a valid integer. However,
we'd encounter a panic:

	panic: JSON decoder out of sync - data changing underfoot?

The reason was that decodeState.object would return a nil error on
encountering the invalid map key string, while saving the key type error
for later. This broke if we were inside another object, as we would
abruptly end parsing the nested object, leaving the decoder in an
unexpected state.

To fix this, simply avoid storing the map element and continue decoding
the object, to leave the decoder state exactly as if we hadn't seen an
invalid key type.

This affected both signed and unsigned integer keys, so fix both and add
two test cases.

Updates #28189.

Change-Id: I8a6204cc3ff9fb04ed769df7a20a824c8b94faff
Reviewed-on: https://go-review.googlesource.com/c/142518
Reviewed-by: Ian Lance Taylor <iant@golang.org>
@mvdan
Copy link
Member

@mvdan mvdan commented Oct 18, 2018

The underfoot panic is now fixed - the example above prints:

json: cannot unmarshal number C into Go struct field Option.active of type main.Lang

I'll try to look at the other cases before the cycle is over.

@mvdan mvdan self-assigned this Oct 18, 2018
@mvdan mvdan removed the help wanted label Nov 6, 2018
@mvdan
Copy link
Member

@mvdan mvdan commented Nov 16, 2018

Output when UnmarshalJSON is commented out

I think this block is just the user's bug. If you add fmt.Println(string(b)) to the beginning of both UnmarshalText and UnmarshalJSON, you'll find that they get strings like ONE and "ONE", respectively.

When you comment out UnmarshalJSON, the JSON package falls back to using UnmarshalText. Which then essentially does:

var s string
err := json.Unmarshal([]byte("ONE"), &s)

And this, as you'd expect, errors with invalid character 'O' looking for beginning of value. The bug here is that you should generally not use json.Unmarshal within a TextUnmarshaler implementation, because it will receive the string's contents, not the string object's input bytes.

@mvdan
Copy link
Member

@mvdan mvdan commented Nov 16, 2018

To clarify my comment above: I don't think we can improve those errors, because the nested call to json.Unmarshal has no context about where it's being used. The user code is using it incorrectly, so it's not surprising that the returned error is confusing.

As for using Enum as a map key when it only implements UnmarshalJSON, it makes sense because of this piece of documentation:

The map's key type must either be a string, an integer, or implement encoding.TextUnmarshaler.

However, the error could be clearer. Instead of cannot unmarshal number ONE into Go value of type main.Enum, we could have an error like main.Enum is not a valid map key type. Will look into a CL for that, and close this issue.

@mvdan
Copy link
Member

@mvdan mvdan commented Nov 16, 2018

Smaller repro for the last case:

type T int

func (*T) UnmarshalJSON(b []byte) error {
        return nil
}

func main() {
        data := []byte(`{"foo":"bar"}`)
        var m map[T]string
        fmt.Println(json.Unmarshal(data, &m))
}

Not having carefully read the encoding/json docs, it's understandable if a user sees the error json: cannot unmarshal number foo into Go value of type main.T and thinks "that doesn't make any sense, I implement UnmarshalJSON".

@mvdan mvdan removed this from the Go1.12 milestone Nov 28, 2018
@mvdan mvdan added this to the Go1.13 milestone Nov 28, 2018
@mvdan
Copy link
Member

@mvdan mvdan commented Nov 28, 2018

It's not clear that the last remaining case needs a fix, so I'm moving this to 1.13 and marking as NeedsDecision.

We could modify UnmarshalTypeError to have a hint that a json.Unmarshaler type was found but was ignored because it was a map key, but that seems like a bit of a hacky solution.

Perhaps @rsc has some thoughts.

@mvdan mvdan removed their assignment Dec 2, 2018
@andybons andybons removed this from the Go1.13 milestone Jul 8, 2019
@andybons andybons added this to the Go1.14 milestone Jul 8, 2019
@rsc rsc removed this from the Go1.14 milestone Oct 9, 2019
@rsc rsc added this to the Backlog milestone Oct 9, 2019
@snargleplax
Copy link

@snargleplax snargleplax commented Dec 30, 2020

@mvdan said:

As for using Enum as a map key when it only implements UnmarshalJSON, it makes sense because of this piece of documentation:

The map's key type must either be a string, an integer, or implement encoding.TextUnmarshaler.

However, that doesn't quite match what I'm seeing in the current docs, which have (emphasis added):

The map's key type must either be any string type, an integer, implement json.Unmarshaler, or implement encoding.TextUnmarshaler.

Note the addition of "implement json.Unmarshaler" into the list. Note as well that this forms a mismatched pair with a similar note for json.Marshal, which does match what @mvdan had quoted:

The map's key type must either be a string, an integer type, or implement encoding.TextMarshaler.

I am working with a type that implements json.Unmarshaler but not encoding.TextUnmarshaler (equivalent to this case), and based on the docs I'm expecting json.Unmarshal to work. However, I'm getting the json.UnmarshalTypeError discussed above (json: cannot unmarshal number quux into Go value of type main.foo). Since the docs appear as they do, this seems unexpected and incorrect, regardless of whether the error is clear.

Do you concur? Should I open a separate issue for that?

@snargleplax
Copy link

@snargleplax snargleplax commented Dec 31, 2020

Also worth noting, when I add an UnmarshalText method to the type to circumvent the error, it's UnmarshalJSON that actually gets called.

@mvdan
Copy link
Member

@mvdan mvdan commented Dec 31, 2020

Thanks for bringing this up again, @snargleplax. The docs were amended in https://go-review.googlesource.com/c/go/+/188821 to better reflect the existing behavior, without changing any of the code. I reviewed and merged that CL without thinking of this issue.

The CL also mentions the likely bug in the logic:

When both UnmarshalText and UnmarshalJSON are defined, the latter is called. But when only UnmarshalJSON is defined, it's not called.

So I think this definitely needs a fix; either clarifying in the docs, or a code fix to call UnmarshalJSON even when UnmarshalText is not defined.

cc @eliben @oncilla from #33298

@mvdan
Copy link
Member

@mvdan mvdan commented Dec 31, 2020

As for which fix to do; I'm already discarding never calling UnmarshalJSON on keys, because that would likely break existing programs, and because the docs have been saying the opposite for a year.

I lean towards fixing the code to reflect the current docs. That is, to call UnmarshalJSON on map keys even UnmarshalText is not defined. I don't think it would break a significant amount of existing code, and it would make the logic more consistent and in line with the docs.

@snargleplax
Copy link

@snargleplax snargleplax commented Dec 31, 2020

I am also in favor of fixing the code to match the docs. My absolute druthers would be to also change json.Marshal to behave consistently with 'json.Unmarshal', having it handle map key types that implement json.Marshaler as well as (and in preference to) those implementing encoding.TextMarshaler. That would provide a cleaner round-trip story. But perhaps that would be changing too much in terms of existing behavior? Or is it viable to consider? There doesn't seem to be a good reason for these to disagree, apart from history.

@mvdan
Copy link
Member

@mvdan mvdan commented Dec 31, 2020

I am all for more consistency in the JSON package, but historically I've repeatedly failed at doing that - too much existing code would break. I think we should start fixing this one, and then perhaps try making encoding more consistent with decoding too.

@snargleplax
Copy link

@snargleplax snargleplax commented Mar 5, 2021

Thinking about this more (while helping another team member who tripped over it), I realized the likely reason for the inconsistency between marshal and unmarshal behavior. Namely, not all valid JSON values are valid map keys. So, you can't in general assume it's safe to take something that implements MarshalJSON and use that to produce a map key when marshaling to JSON. Going the other direction, though, you don't have the same problem. Arguably you could restrict it anyway, but you could also argue there's no reason to prevent it, and that's the path that was taken.

So it seems like there's a good reason for the core design decision there, but it's not the most obvious or ergonomic. I don't have a better idea at the moment besides editing the docs to explicitly call out the asymmetry (and perhaps hint at this rationale, though defending design decisions is not really the purpose of standard library GoDoc, so it's a fine line to walk). At least mentioning it, though, seems like it would give folks more of a chance to realize the potential issue sooner. It's natural enough to walk into this context expecting symmetry as a general principle.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
7 participants