Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

encoding/json: unable to set options when unmarshaling #17654

Closed
vcabbage opened this issue Oct 28, 2016 · 3 comments

Comments

Projects
None yet
5 participants
@vcabbage
Copy link
Member

commented Oct 28, 2016

What version of Go are you using (go version)?

go version go1.7.3 darwin/amd64

What did you do?

Attempted to make Go pass JSON parsing tests which include large numbers and trailing invalid JSON.

Background

json.Unmarshal and json.Decoder have different behavior beyond one parsing from []byte and the other from an io.Reader.

https://play.golang.org/p/64EDWtq5M6

invalidJSON := []byte(`{"valid": "object"}gibberish`)

var out1 interface{}
err := json.Unmarshal(invalidJSON, &out1)
fmt.Printf("Unmarshal:\n  Out: %+v \n  Error: %v\n", out1, err)

fmt.Println()

var out2 interface{}
err = json.NewDecoder(bytes.NewReader(invalidJSON)).Decode(&out2)
fmt.Printf("Decode:\n  Out: %+v \n  Error: %v\n", out2, err)
Unmarshal:
  Out: <nil> 
  Error: invalid character 'g' after top-level value

Decode:
  Out: map[valid:object] 
  Error: <nil>

Unmarshal fails, Decode succeeds. This is expected behavior as Decoder is documented to process a stream of JSON and decodes the first valid object, stopping before reading any of the invalid string.

Problem

The problem comes in when one wants the behavior of Unmarshal (interpreting the entire input as a single JSON element and returning an error if it is invalid) and to set decoding options, such as UseNumber(). The only way to set UseNumber() is via Decoder.

This is highlighted in the results of a recent comparison of JSON parsers (article/repo). Go did well, except for a few test cases. The test cases relevant to this issue involve large numbers (y_number_huge_exp.json, y_number_real_pos_overflow.json, y_number_real_neg_overflow.json)

On twitter, @rsc noted that Go would pass the large number tests if the UseNumber() option were set. Unfortunately this change also causes Go to fail a number of other tests, such as n_array_comma_after_close.json. This test expects a failure due to the comma after the array, but of course Decoder never reads the comma if Decode() is only called once.

Workarounds

It is possible to get the desired behavior by checking the Decoder for more data (https://play.golang.org/p/TKLyTWmo0M), but this is unnecessary extra work considering that the UseNumber() is setting an option on decodeState, which is also used inside of Unmarshal.

Potential Fixes

New Function

The simplest solution would be to create a UnmarshalUseNumber function, similar to what was suggested in #7067. This is not ideal for the reason stated in #7067 (comment):

We can't provide a function for every combination of options. I think using Decoder here is fine.

To be fair, this isn't completely without precedent, as MarshalIndent already exists.

New Decoder Type

Create something like NewUnmarshaler, which would return a struct that had UseNumber() and Unmarshal([]byte, interface{}) error methods, allowing options to be set, but with the behavior of json.Unmarshal. However, NewUnmarshaler is a poor name because one might reasonably expect it to return an Unmarshaler, which is already a defined interface.

In my opinion, this is the best general approach, but I have been unable to come up with good name to suggest for the function and returned struct type.

@groob

This comment has been minimized.

Copy link
Contributor

commented Oct 29, 2016

What about a variation on UnmarshalUseNumber.

We can't provide a function for every combination of options. I think using Decoder here is fine.

A function with the signature
func UnmarshalWithOptions([]byte, interface{}, opts ...UnmarshalOption) error Then you could pass UseNumber() and allow the API to be extensible in the future.

@quentinmit quentinmit added this to the Go1.9 milestone Nov 1, 2016

@quentinmit

This comment has been minimized.

Copy link
Contributor

commented Nov 1, 2016

You don't need the complicated use of Buffered and ReadAll. You can just try calling Decode again, and if you get something other than io.EOF, there's a problem with the input.

See https://play.golang.org/p/WHRgvgrsG4

I'm inclined to say this workaround means it's not worth adding more API surface.

/cc @rsc

@bradfitz

This comment has been minimized.

Copy link
Member

commented Nov 1, 2016

I agree with @quentinmit. That's not just a workaround, that is the API.

@bradfitz bradfitz closed this Nov 1, 2016

@golang golang locked and limited conversation to collaborators Nov 1, 2017

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
You can’t perform that action at this time.