Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

encoding/json: the Decoder.Decode API lends itself to misuse #36225

Open
dsnet opened this issue Dec 19, 2019 · 21 comments
Open

encoding/json: the Decoder.Decode API lends itself to misuse #36225

dsnet opened this issue Dec 19, 2019 · 21 comments
Labels

Comments

@dsnet
Copy link
Member

@dsnet dsnet commented Dec 19, 2019

I'm observing the existence of several production servers that are buggy because the json.Decoder.Decode API lends itself to misuse.

Consider the following:

r := strings.NewReader("{} bad data")

var m map[string]interface{}
d := json.NewDecoder(r)
if err := d.Decode(&m); err != nil {
	panic(err) // not triggered
}

json.NewDecoder is often used because the user has an io.Reader on hand or wants to configure some of the options on json.Decoder. However, the common case is that the user only wants to decode a single JSON value. As it stands the API does not make the common case easy since Decode is designed with the assumption that the user will continue to decode more JSON values, which is rarely the case.

The code above executes just fine without reporting an error and silently allows the decoder to silently accept bad input without reporting any problems.

@dsnet dsnet added the Go2 label Dec 19, 2019
@OneOfOne
Copy link
Contributor

@OneOfOne OneOfOne commented Dec 19, 2019

I don't think that's a bug, or misuse, I actually depend on it in a project where there's a json header then there's other data.

This change would break a lot of working code.

@bcmills
Copy link
Member

@bcmills bcmills commented Dec 20, 2019

The only example for the Decoder type calls Decode in a loop. Perhaps there should be a lint or vet check that either Decode is read until a non-nil error or its Buffered method is called?

The call to the Buffered method seems like a reliable indication of the “header before other data” case.

@bcmills
Copy link
Member

@bcmills bcmills commented Dec 20, 2019

A vet check in particular, if it is feasible, seems like it could address this problem without an API change.

@dsnet
Copy link
Member Author

@dsnet dsnet commented Dec 20, 2019

@OneOfOne, I'm not proposing that we change the json.Decoder.Decode behavior. In and of itself, the method's current behavior is entirely reasonable. However, there's fairly compelling evidence that there is a misuse problem, though. A sampling of json.Decoder.Decode usages at Google shows that a large majority are using it in a way that actually expects no subsequent JSON values.

From my observation, the source of misuse is due to:

  1. many users having an io.Reader on hand (especially from a http.Request.Body), and the lack of API in the json package to unmarshal a single JSON value from a io.Reader. As a result, they turn to json.Decoder, but don't realize that it's intended for reading multiple JSON values back-to-back.
  2. some users wanting to configure the optional arguments (e.g., DisallowUnknownFields or UseNumber) and unfortunately those options are only hung off json.Decoder.

The first issue could be alleviated by adding:

func UnmarshalReader(r io.Reader, v interface{}) error

where UnmarshalReader consumes the entire input and guarantees that it unmarshals exactly one JSON value.

The second issue could be alleviated by having an options pattern that doesn't require the creation of a streaming json.Decoder. In the v2 protobuf API, we use an options struct with methods hanging off of it.

@bcmills, a vet check is interesting, but I'm not sure calling Decoder.Buffer is the right approach. Rather, I think you need to get the next token and check that it returns io.EOF:

d := json.NewDecoder(r)
if err := d.Decode(&v); err != nil {
	panic(err) // not triggered
}
if _, err := d.Token(); err != io.EOF {
	panic("some unused trailing data")
}
@dsnet dsnet changed the title encoding/json: the Decoder .Decode API lends itself to misuse encoding/json: the Decoder.Decode API lends itself to misuse Dec 20, 2019
@bcmills
Copy link
Member

@bcmills bcmills commented Dec 20, 2019

I'm not sure calling Decoder.Buffer is the right approach. Rather, I think you need to get the next token and check that it returns io.EOF

That would work too. So that gives three patterns that should suppress the warning:

  1. Call Decode until it does not return nil.
  2. Call Token until it does not return nil.
  3. Call Buffered.

(1) and (2) suppress the warning if the user has read enough of the input to detect either EOF or an earlier syntax error. (3) suppresses the warning if a non-JSON blob follows the JSON value.

@josharian
Copy link
Contributor

@josharian josharian commented Dec 20, 2019

The first issue could be alleviated by adding UnmarshalReader

Or a Decoder option to mark it as being single-object only.

That way fixing existing instances of this problem (of which there are many, including all over my own code) involves just adding a line, instead of refactoring.

@cespare
Copy link
Contributor

@cespare cespare commented Dec 20, 2019

Or a Decoder option to mark it as being single-object only.

That way fixing existing instances of this problem (of which there are many, including all over my own code) involves just adding a line, instead of refactoring.

I like how concise using a decoder often is:

if err := json.NewDecoder(r).Decode(&t); err != nil {

It would be nice if the fix didn't require breaking up the line. I like how @dsnet's UnmarshalReader suggestion offers something similarly concise:

if err := json.UnmarshalReader(r, &t); err != nil {

We could also use an alternative Decode method:

if err := json.NewDecoder(r).DecodeOne(&t); err != nil {
@mvdan
Copy link
Member

@mvdan mvdan commented Dec 20, 2019

  1. As a result, they turn to json.Decoder, but don't realize that it's intended for reading multiple JSON values back-to-back.

To give further evidence that this is a common issue, I'm one of the encoding/json owners and I've now just realised I've been making this mistake for years.

I'm personally in favour of UnmarshalReader. If anyone wants to be in control of the decoder or any options, they can copy the contents of UnmarshalReader and adapt it as needed.

I think a vet check would also be useful, because many existing users might not realise that this common json decoding pattern is buggy. Coupled with UnmarshalReader, the users would just need to slightly modify one line. A vet check that required the user to refactor the code and add a few lines of non-trivial code seems unfortunate to me, so I generally agree with @cespare.

@jimmyfrasche
Copy link
Member

@jimmyfrasche jimmyfrasche commented Dec 20, 2019

Maybe a method like Done() error that reads the rest of the input and errors if there's something other than whitespace? That could also be used for reading multiple documents from the stream and quitting once you get to one with a certain value.

@networkimprov
Copy link

@networkimprov networkimprov commented Dec 20, 2019

Shouldn't this be a concrete proposal (with possible variations)? The issue description doesn't seem actionable.

@mvdan
Copy link
Member

@mvdan mvdan commented Dec 21, 2019

@networkimprov usually, a proposal brings forward a specific solution to be implemented. It seems like the main point of this issue was to bring up the problem, so the proposal could come later if needed.

@networkimprov
Copy link

@networkimprov networkimprov commented Dec 21, 2019

Time and again, I've seen issues closed and further discussion directed to golang-nuts/dev because the item isn't actionable.

@dmitshur dmitshur added this to the Backlog milestone Dec 21, 2019
@dmitshur dmitshur removed this from the Backlog milestone Dec 21, 2019
@leafrhythms
Copy link

@leafrhythms leafrhythms commented Dec 21, 2019

I agree with a new method. It re-use options for json.Decoder, and is very easy to adopt. But instead of DecodeOne, I would suggest it DecodeWhole.

@TennyZhuang
Copy link
Contributor

@TennyZhuang TennyZhuang commented Dec 22, 2019

I suggest the name DecodeFull, it is more consistent with ReadFull.

@beono
Copy link

@beono beono commented Dec 27, 2019

I agree that DecodeFull makes it more clear, that it decodes the the entire stream.
Whereas DecodeOne sounds like it does the same thing as Decode – decodes only one json value per call.

@wmark
Copy link

@wmark wmark commented Dec 27, 2019

Remove this entirely, don't let close-to–duplicate functionality lying around. Especially one that invites misuse.

@zikaeroh
Copy link
Contributor

@zikaeroh zikaeroh commented Dec 27, 2019

I don't want to bikeshed names too hard, but my suggestion would be DecodeSingle. I find this a bit better than "full" where my first thought is that it's trying to decode the "full reader" to completion.

(Also, this is a function name in C#'s LINQ, where it signifies that an IEnumerable contains a single thing, and throws if not, which feels similar.)

@dsnet
Copy link
Member Author

@dsnet dsnet commented Dec 31, 2019

Removing Go2 label and adding NeedsDecision instead. When I first filed this issue, it was more so to point out an API wart that we should avoid repeating if there were ever a hypothetical v2 of encoding/json. However, it seems that this discussion has gone in the direction where it might make sense to do something about this today.

Let's avoid bikeshedding on names; it brings unnecessary noise to the discussion. Bikeshedding can occur in future CLs that implement this assuming we should do anything about this.

To summarize the discussion so far, there seems to be 3 (non-mutually exclusive) approaches:

  1. Add a top-level function that takes in an io.Reader as an input and consume it entirely. This approach is provides a minor convenience benefit over approach 2.
  2. Add a method to Decoder that consume the entire io.Reader. This approach is most flexible as it permits options to be specified and provides an obvious fix to existing misuses.
  3. Add a vet check as suggested by @bcmills in #36225 (comment)

An issue with approaches 1 or 2 is that it assumes that io.EOF will always be hit by the io.Reader. For the case of an http.Request.Body, I suspect that this may possibly be an issue if the client keeps the connection open, in which case, an io.EOF will never be observed and the server will deadlock trying to read all input from the client, but the client won't end the close the request since it's waiting for a response from the server.

Personally, I like approach 2 and 3.

@dsnet dsnet added NeedsDecision and removed Go2 labels Dec 31, 2019
@networkimprov
Copy link

@networkimprov networkimprov commented Dec 31, 2019

I think the io.EOF issue is a showstopper, unless a new function/method takes a size argument, which could be zero or -1 for read-to-eof.

Maybe add the list of solutions to the issue text, and retitle it "proposal: ..." ? I think that places it on the proposal review committee's agenda...

@josharian
Copy link
Contributor

@josharian josharian commented Jan 1, 2020

An issue with approaches 1 or 2 is that it assumes that io.EOF will always be hit by the io.Reader.

If the code doing the decoding knows the length of the data, then they could use an io.LimitedReader to work around this.

I suspect, though, that the more common case is that the code doesn't know, and doesn't care: It just wants to decode until the first json object is done. If it so happens that you can detect that there's extra (non-whitespace?) data, because it came along with a read you issued anyway, then sure, return an error. But I think most users would be perfectly happy to be a bit sloppy here and have only best-effort "extra data" error detection, as evidenced by the multitude of us happily, obliviously using json.Decoder for years and years.

So I guess I would advocate "fixing" this problem by defining it away.

FWIW, I am also a fan of approaches 2 and 3.

@bcmills
Copy link
Member

@bcmills bcmills commented Feb 19, 2020

@dsnet, I notice that this issue is currently without a milestone. Given #36225 (comment), do you want to turn it into a proposal?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
You can’t perform that action at this time.