Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

encoding/json: parser ignores the case of member names #14750

Open
cyberphone opened this issue Mar 10, 2016 · 38 comments
Open

encoding/json: parser ignores the case of member names #14750

cyberphone opened this issue Mar 10, 2016 · 38 comments

Comments

@cyberphone
Copy link

@cyberphone cyberphone commented Mar 10, 2016

  1. What version of Go are you using? 5.3
  2. What operating system and processor architecture are you using? amd64,windows
  3. What did you do?
    Read this: https://mailarchive.ietf.org/arch/msg/json/Ju-bwuRv-bq9IuOGzwqlV3aU9XE
  4. What did you expect to see?
    ...
  5. What did you see instead?
    ...
@bradfitz bradfitz changed the title JSON parser ignores the case of member names encoding/json: parser ignores the case of member names Mar 10, 2016
@bradfitz
Copy link
Contributor

@bradfitz bradfitz commented Mar 10, 2016

Playgroud link: http://play.golang.org/p/9j0ome9HqK

@rsc, @adg, thoughts? This surprised me. I thought we only did the case insensitive thing when there was no struct tag.

But from the ietf link above:

is looks quite diabolical for security as it is trivial to create valid JSON values that will be interpreted differently by different implementations.

Related to "differently by different implementations", we permit JSON keys twice: http://play.golang.org/p/lPgEj1T6Zk (two "alg"). That probably differs between implementations in either its output or whether it's even successful.

@bradfitz bradfitz added the Security label Mar 10, 2016
@rsc
Copy link
Contributor

@rsc rsc commented Mar 10, 2016

This has been the behavior at least as far back as Go 1.2 (I can't run
earlier on my Mac).
The docs also seem to state quite clearly that this is what happens:

To unmarshal JSON into a struct, Unmarshal matches incoming object keys to
the keys used by Marshal (either the struct field name or its tag),
preferring an exact match but also accepting a case-insensitive match.
Unmarshal will only set exported fields of the struct.

I understand there are security implications if JSON is used in security
contexts, and I was a little surprised too, but the docs are very clear. I
doubt that encoding/json's behavior should be dictated by security
concerns. FWIW, I don't believe we should go out of our way to reject
repeated fields either, especially not now. I might go so far as to argue
that using JSON in security standards is a mistake anyway, but I won't do
that here.

In any event it's too late to change the defaults. If we want to support
the security use case, maybe we could have a UseStrictNames method on
Decoder (like UseNumber).

@ianlancetaylor ianlancetaylor added this to the Go1.7 milestone Mar 10, 2016
@bradfitz
Copy link
Contributor

@bradfitz bradfitz commented Mar 10, 2016

UseStrictNames SGTM.

@cyberphone
Copy link
Author

@cyberphone cyberphone commented Mar 10, 2016

I would consider addressing a bunch of related issue as an option:
#14749
#14135

@rsc
Copy link
Contributor

@rsc rsc commented Mar 10, 2016

@manger
Copy link

@manger manger commented Mar 13, 2016

UseStrictNames does look like a decent way to add case-sensitive decoding support in a backward compatible way.

However, while that would make case-sensitive decodes possible, it wouldn't help make this safer mode common or the default.

How about also defining a "strict" field tag option (c.f. "omitempty")? That should allow types to be safely used via newDecoder, or Unmarshall, or when the decoding is within a library. You can define the safety in the type, without finding every place it might be decoded.

@cespare
Copy link
Contributor

@cespare cespare commented Apr 13, 2016

Shall I send a UseStrictNames CL? Should that apply strict names across the board, whether or not the user supplied a struct tag?

type Foo struct {
        Bar string
        Baz string `json:"baz"`
}

So that would only match exactly "Bar" and "baz".

@rsc
Copy link
Contributor

@rsc rsc commented Apr 15, 2016

@cespare If it's simple, then yes sure. But after a thought on #15314 I wonder if maybe the UseStrictNames method is the wrong approach and instead it should be a tag attribute on the field. Then it can be enforced by the author of the struct instead of the author of the unmarshal call. Specifically, something like json:"Foo,exactname".

@rsc rsc modified the milestones: Go1.8, Go1.7 May 18, 2016
@jessfraz
Copy link
Contributor

@jessfraz jessfraz commented Jun 22, 2016

Would it get super repetitive if the person had to set it on every tag name?

@rsc rsc modified the milestones: Go1.9Early, Go1.8 Oct 26, 2016
@bradfitz bradfitz modified the milestones: Go1.9Maybe, Go1.9Early May 3, 2017
@bradfitz bradfitz modified the milestones: Go1.9Maybe, Go1.10 Jul 20, 2017
@rsc rsc modified the milestones: Go1.10, Go1.11 Nov 22, 2017
@ianlancetaylor ianlancetaylor modified the milestones: Go1.11, Unplanned Jun 23, 2018
@dsnet
Copy link
Member

@dsnet dsnet commented Sep 18, 2018

In that case should there maybe be an UnmarshalJSONWithOptions or something that passes the options of the decoder to the method and is preferred over the normal UnmarshalJSON?

UnmarshalJSONWithOptions is a pretty costly interface for users to start implementing. While it provides flexibility in pushing down top-level options, it also has downsides:

  • What do you do when a new top-level option is added that a current implementer does not know about?
    • We had to deal with this problem with protobufs (our solution here). The approach taken for protobufs is heavy-weight because we don't anticipate that many people to actually implement their own protobuf messages. However, it seems common for users to implement their own JSON unmarshaler.
  • That new API is tied to the encoding/json package, when that wasn't the case with the simpler UnmarshalJSON method.

how often do people really marshal/unmarshal structs from packages that they have no control over?

Unfortunately, often enough. The situation I came up with above was not a hypothetical, but modeled off real problems I run into.

@mvdan
Copy link
Member

@mvdan mvdan commented Sep 19, 2018

The logic does direct comparison first, then falls back on unicode equal fold logic

That seems like just case insensitive matching to me. I can't come up with a scenario where a case sensitive match takes precedence over a case insensitive one, from the user's perspective. Since both match, it's always the last that wins - independently of which is the case sensitive match.

Perhaps the code was meant as a quick path to avoid equalFold. Otherwise, I can't figure out its purpose.

@liggitt
Copy link
Contributor

@liggitt liggitt commented Sep 19, 2018

Since both match, it's always the last that wins - independently of which is the case sensitive match.

This is demonstrably how the parser works. I agree it functions in a case-insensitive manner.

@erikdubbelboer
Copy link
Contributor

@erikdubbelboer erikdubbelboer commented Sep 20, 2018

@mvdan Yes it's a fast path to try and avoid equalFold which can be expensive in case of unicode strings.

@dsnet I see your point, but making it a struct tag option would mean the package that defined the type has to specify if it should be case sensitive or not. In your example where Foo wasn't intended to be used with encoding/json initially it probably wouldn't have these struct tags and would always be case insensitive.

On the one hand it feels weird if the person who maintains the package of Foo gets to decide if my JSON has to be case sensitive or not. On the other hand that person can already enforce any rule they want by implementing UnmarshalJSON on Foo.

@mvdan
Copy link
Member

@mvdan mvdan commented Sep 20, 2018

Yes it's a fast path to try and avoid equalFold which can be expensive in case of unicode strings.

Then my point is that this piece of godoc should be removed:

preferring an exact match but also accepting a case-insensitive match.

The decoder always prefers the latest match, not the case-sensitive one.

@erikdubbelboer
Copy link
Contributor

@erikdubbelboer erikdubbelboer commented Sep 20, 2018

@mvdan the code is:

for i := range fields {
	if bytes.Equal(ff.nameBytes, key) {
		f = ff
		break
	}
	if f == nil && ff.equalFold(ff.nameBytes, key) {
		f = ff
	}
}

which means that when it's an exact match (bytes.Equal) it doesn't execute the equalFold as we already break out of the loop before that.

Besides this is not the point of this issue and not the point of the discussion here.

@ysmolsky
Copy link
Member

@ysmolsky ysmolsky commented Oct 13, 2018

I think the documentation is clearly misleading for the current situation. The flow of recent tickets shows it. I was and still am confused by this:

Unmarshal matches incoming object keys to the keys used by Marshal ..., preferring an exact match but also accepting a case-insensitive match

I mean, technically and logically it is correct, but still confusing. Should we at least to mention that it will match the last matching value from the json whenever there is more than one possible choice?

@ToadKing
Copy link

@ToadKing ToadKing commented Oct 15, 2018

I don't think it's 100% correct, since even if there's an exact match with the field tag it won't always choose it (like in #28190).

@justinrixx
Copy link

@justinrixx justinrixx commented Feb 4, 2020

@mvdan pointed me here from #36932 (comment)

in my opinion, this is not a doc issue, as it can lead to nondeterministic behavior (json key ordering is not guaranteed, and any golang app consuming json is at the mercy of whoever generated the json, deterministic or not).

@fredgan
Copy link

@fredgan fredgan commented Feb 18, 2020

Today I also encounted this problem. It surprised me. Apparently what the doc describes is not the same as what it performs. If we write field value in JSON string, we need the exact match, not case insensitive match.

Does anybody have some solutions for this? Now I just write all properties match all cases. But I still can not promise whether the result is right in all situation.

@justinrixx
Copy link

@justinrixx justinrixx commented Feb 18, 2020

@fredgan the workaround is to write a custom unmarshaller using a map, and then convert the keys in the map explicitly to the type you care about.

here's an example: https://play.golang.org/p/nAz4Hs56mtB

in the above example, i used a map[string]string, but you may need to use a map[string]interface{} if you have heterogeneous types and use type assertions before setting them on your target struct.

@gopherbot
Copy link

@gopherbot gopherbot commented Feb 26, 2020

Change https://golang.org/cl/221117 mentions this issue: encoding/json: clarify how we decode into structs

@mvdan
Copy link
Member

@mvdan mvdan commented Feb 26, 2020

I've come to realise that pretty much all of my previous comments in this thread were wrong :) The current documentation is technically correct, as @rsc points out in #14750 (comment), but a bit misleading, as @ysmolsky points out in #14750 (comment).

I do think that many parts of the json package could be designed better, and I think the edge cases concerning missing, repeated, or case-insensitive-matching keys are some of them.

Having said that, this issue is largely not about making a breaking change to the JSON decoder, or adding an option to change its behavior. This is a documentation issue; far too many users (myself included!) were misreading the docs. I hope that my CL above fixes that problem.

I think that a proposal to modify the decoder's API or behavior should be filed separately, once this issue is fixed with docs. This thread has become too long to discuss any one particular solution, and proposals have had their own process for a few years now.

@justinrixx
Copy link

@justinrixx justinrixx commented Feb 26, 2020

@mvdan i don't understand how the docs are correct today. they say an exact match is preferred, but it is obviously not at all. that's what throws people off. there is not preferring happening at all, it's a race case of what is encountered first.

i would be happy with a change to the docs to reflect that it's simply a case-insensitive match, with a brief mention of the proper approach if a case-sensitive match is required (writing a custom unmarshaller using a map with string keys). my concerns about it being a bug were a result of the fact that i didn't think there was a workaround to get a case-sensitive match. knowing there is a workaround, i can accept this behavior with the documentation change.

@mvdan
Copy link
Member

@mvdan mvdan commented Feb 26, 2020

@justinrixx please read the comments on the CL above.

@mvdan
Copy link
Member

@mvdan mvdan commented Mar 3, 2020

To summarize my exchange with @dsnet on Gerrit: Yes, the current behavior is technically correct from the point of view of the decoder, as "incoming object" can easily be understood to be the key-value pairs being parsed in the order in which they appear.

However, what Joe argues is that this is an implementation detail - the JSON spec doesn't specify object keys to be ordered in any way. We could change the underlying behavior to be closer to what most users would expect, not to stick to what's most efficient or most logical from the decoder's point of view. For example, by skipping case-insensitive matches of a field if a case-sensitive match for it already happened.

The devil is in the details, of course. Would this change slow down the decoder too much? Would it break too much existing code? That is yet to be determined, and I'd like to find some time to experiment with it in the near future.

@gopherbot
Copy link

@gopherbot gopherbot commented Mar 19, 2020

Change https://golang.org/cl/224079 mentions this issue: encoding/json: skip inexact matches after exact matches

@mvdan
Copy link
Member

@mvdan mvdan commented Mar 19, 2020

The CL above is the first iteration of the experiment I mentioned two weeks ago. Decoding structs is ~1% slower, but we get the benefit we want. Please test the CL and let me know if your code breaks, or if it properly fixes this issue for you.

1% performance loss is unfortunate, but I can't figure out a way around it. Alternative patches without the slow-down are welcome.

@smyrman
Copy link

@smyrman smyrman commented Jun 3, 2020

I think the CL is interesting, but it would, from my perspective, be even more useful to explicitly define an option. It's useful to be able to set that a case-insensitive match be treated as an unknown field.

As a comment on #14750 (comment):

I would vote for allowing case-sensitive matching as a decoder option enabled through a method call. (UseStrictNames, DisableEqualFold, or some better name) over struct-tags.

I want to advocate for this solution not because it's a better solution, but because it's in line with the current design for DisallowUnknownFields and UseNumbers. Having one annoying -- but consistent -- API has a value; it means the same work-around can be used for all cases where a custom json.Unmarshaler implementation is needed.

The DisallowUnknownFields and UseNumbers are in my opinion within the same class of problems, and that puts some restrains on how a strictly case-sensitve parser should be implemented.

Some notes:

  • Projects can easily define their own jsonDecoder(data []byte) json.Decoder helper-functions to reduce the boiler plate in json.Unmarshaler implementations.
  • Making a new json.Unmarshaler interface that accepts passing along options (or better still, accept a pre-initialized (sub) json.Docoder with inherited options) is certainly possible, but it's a separate issue. It is not given that it's always preferred.
@smyrman
Copy link

@smyrman smyrman commented Jun 3, 2020

PS! The Encoder/Decoder interfaces defined in #12001 (old issue) could serve as a way of passing along decoder option. Perhaps that problem can be discussed there, although the motivation here is different.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
You can’t perform that action at this time.