encoding/json: parser ignores the case of member names #14750

Open
cyberphone opened this Issue Mar 10, 2016 · 26 comments

@cyberphone

cyberphone commented Mar 10, 2016

  1. What version of Go are you using? 5.3
  2. What operating system and processor architecture are you using? amd64,windows
  3. What did you do?
    Read this: https://mailarchive.ietf.org/arch/msg/json/Ju-bwuRv-bq9IuOGzwqlV3aU9XE
  4. What did you expect to see?
    ...
  5. What did you see instead?
    ...

@bradfitz bradfitz changed the title from JSON parser ignores the case of member names to encoding/json: parser ignores the case of member names Mar 10, 2016

@bradfitz

Member

bradfitz commented Mar 10, 2016

Playground link: http://play.golang.org/p/9j0ome9HqK

@rsc, @adg, thoughts? This surprised me. I thought we only did the case insensitive thing when there was no struct tag.

But from the ietf link above:

it looks quite diabolical for security as it is trivial to create valid JSON values that will be interpreted differently by different implementations.

Related to "differently by different implementations", we permit JSON keys twice: http://play.golang.org/p/lPgEj1T6Zk (two "alg"). That probably differs between implementations in either its output or whether it's even successful.
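For readers who cannot open the playground links, here is a minimal sketch of the two behaviors described above. It does not reproduce the linked snippets; the `header` type and the sample values are assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// header is a hypothetical JOSE-style header with an explicit lower-case tag.
type header struct {
	Alg string `json:"alg"`
}

// decodeAlg unmarshals data into a header and returns the resulting Alg value.
func decodeAlg(data string) (string, error) {
	var h header
	err := json.Unmarshal([]byte(data), &h)
	return h.Alg, err
}

func main() {
	inputs := []string{
		`{"alg":"RS256","ALG":"none"}`, // differently cased key, despite the tag
		`{"alg":"RS256","alg":"none"}`, // the same key twice
	}
	for _, in := range inputs {
		alg, err := decodeAlg(in)
		// Both inputs decode without error and end up with Alg == "none".
		fmt.Printf("%s -> %q (err: %v)\n", in, alg, err)
	}
}
```

In both cases the value that appears later in the input overwrites the earlier one, and no error is reported.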

@bradfitz bradfitz added the Security label Mar 10, 2016

@rsc

Contributor

rsc commented Mar 10, 2016

This has been the behavior at least as far back as Go 1.2 (I can't run
earlier on my Mac).
The docs also seem to state quite clearly that this is what happens:

To unmarshal JSON into a struct, Unmarshal matches incoming object keys to
the keys used by Marshal (either the struct field name or its tag),
preferring an exact match but also accepting a case-insensitive match.
Unmarshal will only set exported fields of the struct.

I understand there are security implications if JSON is used in security
contexts, and I was a little surprised too, but the docs are very clear. I
doubt that encoding/json's behavior should be dictated by security
concerns. FWIW, I don't believe we should go out of our way to reject
repeated fields either, especially not now. I might go so far as to argue
that using JSON in security standards is a mistake anyway, but I won't do
that here.

In any event it's too late to change the defaults. If we want to support
the security use case, maybe we could have a UseStrictNames method on
Decoder (like UseNumber).
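For comparison, this is how the existing `UseNumber` option is switched on per `Decoder`. `UseStrictNames` is only a proposal in this thread and does not exist in encoding/json, so the sketch sticks to `UseNumber`; the `numberType` helper is an illustrative name.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// numberType decodes {"n": ...} into a generic map and reports the dynamic
// Go type chosen for "n", with or without Decoder.UseNumber.
func numberType(data string, useNumber bool) (string, error) {
	dec := json.NewDecoder(strings.NewReader(data))
	if useNumber {
		dec.UseNumber() // opt-in behavior, set by the caller of Decode
	}
	var m map[string]interface{}
	if err := dec.Decode(&m); err != nil {
		return "", err
	}
	return fmt.Sprintf("%T", m["n"]), nil
}

func main() {
	for _, useNumber := range []bool{false, true} {
		typ, err := numberType(`{"n": 42}`, useNumber)
		fmt.Printf("UseNumber=%v -> %s (err: %v)\n", useNumber, typ, err)
	}
}
```

A strict-names option shaped like this would leave the default untouched and put the choice in the hands of whoever constructs the `Decoder`.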

@ianlancetaylor ianlancetaylor added this to the Go1.7 milestone Mar 10, 2016

@bradfitz

Member

bradfitz commented Mar 10, 2016

UseStrictNames SGTM.

@cyberphone

cyberphone commented Mar 10, 2016

I would consider addressing a bunch of related issues as an option:
#14749
#14135

@rsc

Contributor

rsc commented Mar 10, 2016

@manger

manger commented Mar 13, 2016

UseStrictNames does look like a decent way to add case-sensitive decoding support in a backward compatible way.

However, while that would make case-sensitive decodes possible, it wouldn't help make this safer mode common or the default.

How about also defining a "strict" field tag option (cf. "omitempty")? That should allow types to be safely used via NewDecoder, or Unmarshal, or when the decoding is within a library. You can define the safety in the type, without finding every place it might be decoded.

@cespare

Contributor

cespare commented Apr 13, 2016

Shall I send a UseStrictNames CL? Should that apply strict names across the board, whether or not the user supplied a struct tag?

type Foo struct {
        Bar string
        Baz string `json:"baz"`
}

So that would only match exactly "Bar" and "baz".
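To make the contrast concrete, here is a small sketch of what today's decoder does with that struct. Strict matching itself is still only a proposal, so only the current behavior is shown; the `decodeFoo` helper and the sample inputs are illustrative.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Foo has one untagged field and one field with an explicit tag.
type Foo struct {
	Bar string
	Baz string `json:"baz"`
}

// decodeFoo unmarshals data into a fresh Foo.
func decodeFoo(data string) (Foo, error) {
	var f Foo
	err := json.Unmarshal([]byte(data), &f)
	return f, err
}

func main() {
	inputs := []string{
		`{"Bar":"x","baz":"y"}`, // exact names
		`{"bar":"x","BAZ":"y"}`, // wrong case for both fields
	}
	for _, in := range inputs {
		f, err := decodeFoo(in)
		// Today both inputs populate Bar and Baz.
		fmt.Printf("%s -> %+v (err: %v)\n", in, f, err)
	}
}
```

Under the strict behavior being proposed, the second input would presumably leave both fields empty.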

@rsc

Contributor

rsc commented Apr 15, 2016

@cespare If it's simple, then yes sure. But after a thought on #15314 I wonder if maybe the UseStrictNames method is the wrong approach and instead it should be a tag attribute on the field. Then it can be enforced by the author of the struct instead of the author of the unmarshal call. Specifically, something like json:"Foo,exactname".

@rsc rsc modified the milestones: Go1.8, Go1.7 May 18, 2016

@jessfraz

Contributor

jessfraz commented Jun 22, 2016

Would it get super repetitive if the person had to set it on every tag name?

@rsc rsc modified the milestones: Go1.9Early, Go1.8 Oct 26, 2016

@bradfitz bradfitz modified the milestones: Go1.9Maybe, Go1.9Early May 3, 2017

@bradfitz bradfitz modified the milestones: Go1.9Maybe, Go1.10 Jul 20, 2017

@rsc rsc modified the milestones: Go1.10, Go1.11 Nov 22, 2017

@ianlancetaylor ianlancetaylor modified the milestones: Go1.11, Unplanned Jun 23, 2018

@npenzin

npenzin commented Jun 29, 2018

I want to implement that thing. It was such a disgrace when I started working with financial and ticker data, which has lots of single-char keys in JSON (to be as short as possible), and I have to define all of the keys/fields in order to protect the fields of the struct from being overwritten. Meh.
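A sketch of that situation, assuming a hypothetical ticker message where "p" is a price and "P" is a percentage (the key meanings and type names are made up for illustration):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// narrowTick declares only the lower-case key it cares about.
type narrowTick struct {
	Price string `json:"p"`
}

// fullTick also declares the upper-case key, so each key has an exact match.
type fullTick struct {
	Price   string `json:"p"`
	Percent string `json:"P"`
}

// decodeNarrow and decodeFull unmarshal the same payload into each type.
func decodeNarrow(data string) (narrowTick, error) {
	var t narrowTick
	err := json.Unmarshal([]byte(data), &t)
	return t, err
}

func decodeFull(data string) (fullTick, error) {
	var t fullTick
	err := json.Unmarshal([]byte(data), &t)
	return t, err
}

func main() {
	const msg = `{"p":"101.5","P":"0.42"}`
	n, _ := decodeNarrow(msg)
	f, _ := decodeFull(msg)
	fmt.Printf("narrow: %+v\n", n) // Price is overwritten by the "P" value
	fmt.Printf("full:   %+v\n", f) // each key lands in its own field
}
```

Declaring a field for every differently cased key is the only way to keep `Price` intact with the plain struct decoder, which is the chore being complained about.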

p.s. I'm new to Go, so any assistance or mentoring would be very supportive and motivating :)

@ianlancetaylor @rsc @bradfitz anyone, guys? :)

@mvdan

Member

mvdan commented Jul 30, 2018

I'm a bit confused by this issue. From the godoc that has already been quoted:

preferring an exact match but also accepting a case-insensitive match.

What part of the decoder's current implementation prefers exact matches? As far as I can tell, it accepts exact and case-insensitive matches all the same, as if the godoc was:

accepting either an exact or a case-insensitive match.

Going back to the very first playground link; if we're decoding both "alg" and "ALG" into Alg string `json:"alg"`, I'd presume that "ALG" would be ignored, as "alg" was previously decoded in the same object and has preference.

@npenzin

npenzin commented Jul 30, 2018

This is still an issue - https://play.golang.org/p/uyYYa186mez :(

@bradfitz bradfitz modified the milestones: Unplanned, Go1.12 Jul 31, 2018

@bradfitz

Member

bradfitz commented Jul 31, 2018

@npenzin, that's why this bug is still open.

@erikdubbelboer

Contributor

erikdubbelboer commented Sep 1, 2018

I took a stab at implementing this: https://go-review.googlesource.com/c/go/+/132735

@mvdan

Member

mvdan commented Sep 17, 2018

/cc @dsnet to help make a decision

I very much agree that we must do something. I'd personally prefer a decoder option instead of a struct field tag. As far as I've seen, users tend to either care about this for an entire operation, or not at all. Users could always use maps or custom unmarshalers if they want their own case sensitivity rules.
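As a rough sketch of the "maps or custom unmarshalers" route: map keys are compared exactly, so a custom `UnmarshalJSON` can decode into a map first and pick out only the exact spelling. The `strictHeader` type is a made-up example, not something proposed in this thread.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// strictHeader is a hypothetical type that only accepts the exact key "alg".
type strictHeader struct {
	Alg string
}

// UnmarshalJSON decodes into a map first; map lookups are case-sensitive,
// so "ALG" or "Alg" in the input are simply ignored.
func (h *strictHeader) UnmarshalJSON(b []byte) error {
	var raw map[string]json.RawMessage
	if err := json.Unmarshal(b, &raw); err != nil {
		return err
	}
	v, ok := raw["alg"]
	if !ok {
		return nil // exact key absent: leave the field untouched
	}
	return json.Unmarshal(v, &h.Alg)
}

// decodeStrict unmarshals data into a strictHeader and returns Alg.
func decodeStrict(data string) (string, error) {
	var h strictHeader
	err := json.Unmarshal([]byte(data), &h)
	return h.Alg, err
}

func main() {
	for _, in := range []string{`{"alg":"RS256","ALG":"none"}`, `{"ALG":"none"}`} {
		alg, err := decodeStrict(in)
		fmt.Printf("%s -> %q (err: %v)\n", in, alg, err)
	}
}
```

This handles case sensitivity only. A key repeated with the exact same spelling still resolves to its last occurrence, because that is how the map is filled.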

@dsnet

Member

dsnet commented Sep 18, 2018

What part of the decoder's current implementation prefers exact matches? As far as I can tell, it accepts exact and case-insensitive matches all the same

The logic does direct comparison first, then falls back on unicode equal fold logic:

if bytes.Equal(ff.nameBytes, key) {
	f = ff
	break
}
if f == nil && ff.equalFold(ff.nameBytes, key) {
	f = ff
}
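A standalone simulation of that lookup may help show what "preferring an exact match" covers. This is not the encoding/json source: `lookup` is an illustrative stand-in that works on plain strings and uses `strings.EqualFold` in place of the package's own fold functions.

```go
package main

import (
	"fmt"
	"strings"
)

// lookup mimics the field selection for ONE incoming key: an exact name
// match wins immediately, otherwise the first case-insensitive match is kept.
// It returns the index of the chosen field name, or -1 if nothing matches.
func lookup(fieldNames []string, key string) int {
	found := -1
	for i, name := range fieldNames {
		if name == key {
			return i // exact match: stop searching
		}
		if found < 0 && strings.EqualFold(name, key) {
			found = i // remember the first folded match, keep looking
		}
	}
	return found
}

func main() {
	fields := []string{"alg", "ALG"}
	for _, key := range []string{"alg", "ALG", "Alg", "kid"} {
		fmt.Printf("key %q -> field index %d\n", key, lookup(fields, key))
	}
}
```

The preference is between struct fields competing for a single key. It says nothing about two input keys competing for the same field, which are each looked up independently.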

In regards to json API changes, Russ is the ultimate decision maker.

That said, it seems like there are two clear APIs for this:

  • Be a struct field tag option.
    • Has the advantage that the equal fold logic is only relevant on struct fields, and so a field tag is the exact granularity control for this.
    • Has the disadvantage of being very verbose when disabling this functionality for all fields.
    • Places the author of the struct in control of the strictness of casing.
  • Be a json.Decoder option.
    • Has the advantage of being a single place to control this for the entire unmarshal.
    • Has the disadvantage of possibly over-specifying behavior on more than desired.
    • Places the unmarshal caller in control of strictness of casing.

Another thing to consider is that sometimes a type needs to implement UnmarshalJSON themselves, where they call json.Unmarshal again for some sub-value. In such situations it is a bit odd that a "SetStrictCasing" option on the top-level unmarshal has no effect on a sub-tree of the JSON input. However, my observation of that oddity is already true for Decoder.DisallowUnknownFields and Decoder.UseNumber. There is a general issue where top-level unmarshal options are not propagated to recursive calls to UnmarshalJSON and more top-level options will only exacerbate the issue.
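The existing oddity with `Decoder.DisallowUnknownFields` can be reproduced with a short sketch. The type names below are made up for illustration; the only difference between the two outer types is whether the nested value implements `UnmarshalJSON`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// plain is decoded by encoding/json's built-in struct logic.
type plain struct {
	A int `json:"a"`
}

// custom has the same shape but re-enters json.Unmarshal from UnmarshalJSON.
type custom struct {
	A int
}

func (c *custom) UnmarshalJSON(b []byte) error {
	var p plain
	// This nested call knows nothing about the outer Decoder's options.
	if err := json.Unmarshal(b, &p); err != nil {
		return err
	}
	c.A = p.A
	return nil
}

type outerPlain struct {
	V plain `json:"v"`
}

type outerCustom struct {
	V custom `json:"v"`
}

// strictDecode decodes data into v with DisallowUnknownFields enabled.
func strictDecode(data string, v interface{}) error {
	dec := json.NewDecoder(strings.NewReader(data))
	dec.DisallowUnknownFields()
	return dec.Decode(v)
}

func main() {
	const in = `{"v":{"a":1,"extra":2}}`
	var p outerPlain
	var c outerCustom
	fmt.Println("plain: ", strictDecode(in, &p)) // reports the unknown field
	fmt.Println("custom:", strictDecode(in, &c)) // nil: option not propagated
}
```

A strict-casing option on the `Decoder` would presumably stop at the same boundary.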

It doesn't seem like there's been much discussion to inform which approach to take.

@dsnet

Member

dsnet commented Sep 18, 2018

Thinking about it more, I want to expand more on the problem of top-level options not propagating through recursive UnmarshalJSON calls as we're hitting similar issues with the increase of various protobuf implementations.

Imagine the sequence of events over the lifetime of a package that I'm the author of:

  • At the initial release of my package, I have a type Foo that is a struct with a set of fields.
  • People use type Foo in their programs with the encoding/json package. It's not a use-case I intend, but fine, Hyrum's law.
  • At a later point, I decide to add better support for json because there are other fields I want to add that encoding/json cannot handle (e.g., unmarshaling into an unexported field). Thus, I want to add an UnmarshalJSON method.

Am I allowed to add the UnmarshalJSON method?

  • According to typical Go1 compatibility rules, I should have the freedom to add any methods I want.
  • Given that UnmarshalJSON is a magic method, maybe I should at least replicate as much of the behavior of json.Unmarshal being called on my type. However, the presence of top-level options that I don't know about makes it impossible for me to implement UnmarshalJSON in any way that is backwards compatible.

Thus, I'm actually concerned about more top-level options being added to the unmarshaler as that feature does not cooperate well with the fact that the json package also respects the json.Unmarshaler interface. It subverts control away from the author of types.

@erikdubbelboer

Contributor

erikdubbelboer commented Sep 18, 2018

In that case should there maybe be an UnmarshalJSONWithOptions or something that passes the options of the decoder to the method and is preferred over the normal UnmarshalJSON?

Or should UseNumber be a struct tag option as well? But DisallowUnknownFields is a bit harder to implement as struct tag.

I know Hyrum's law, but how often do people really marshal/unmarshal structs from packages that they have no control over? I know I never do, since most of the time these structs also have internal fields that are important for the state.

@dsnet

Member

dsnet commented Sep 18, 2018

In that case should there maybe be an UnmarshalJSONWithOptions or something that passes the options of the decoder to the method and is preferred over the normal UnmarshalJSON?

UnmarshalJSONWithOptions is a pretty costly interface for users to start implementing. While it provides flexibility in pushing down top-level options, it also has downsides:

  • What do you do when a new top-level option is added that a current implementer does not know about?
    • We had to deal with this problem with protobufs (our solution here). The approach taken for protobufs is heavy-weight because we don't anticipate that many people will actually implement their own protobuf messages. However, it seems common for users to implement their own JSON unmarshaler.
  • That new API is tied to the encoding/json package, when that wasn't the case with the simpler UnmarshalJSON method.

how often do people really marshal/unmarshal structs from packages that they have no control over?

Unfortunately, often enough. The situation I came up with above was not a hypothetical, but modeled off real problems I run into.

@mvdan

Member

mvdan commented Sep 19, 2018

The logic does direct comparison first, then falls back on unicode equal fold logic

That seems like just case insensitive matching to me. I can't come up with a scenario where a case sensitive match takes precedence over a case insensitive one, from the user's perspective. Since both match, it's always the last that wins - independently of which is the case sensitive match.

Perhaps the code was meant as a quick path to avoid equalFold. Otherwise, I can't figure out its purpose.
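A quick way to see the "last one wins" effect from the user's side is to swap the order of the two keys. The `header` type and values here are illustrative.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// header has a single field with an exact lower-case tag.
type header struct {
	Alg string `json:"alg"`
}

// decodeAlg unmarshals data into a header and returns the resulting Alg value.
func decodeAlg(data string) (string, error) {
	var h header
	err := json.Unmarshal([]byte(data), &h)
	return h.Alg, err
}

func main() {
	inputs := []string{
		`{"alg":"exact","ALG":"folded"}`, // exact key first
		`{"ALG":"folded","alg":"exact"}`, // exact key last
	}
	for _, in := range inputs {
		alg, err := decodeAlg(in)
		// The result follows input order, not which key matched exactly.
		fmt.Printf("%s -> %q (err: %v)\n", in, alg, err)
	}
}
```

The first input yields "folded" and the second yields "exact", so with a single matching field the outcome depends only on which key comes last.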

@liggitt

liggitt commented Sep 19, 2018

Since both match, it's always the last that wins - independently of which is the case sensitive match.

This is demonstrably how the parser works. I agree it functions in a case-insensitive manner.

@erikdubbelboer

Contributor

erikdubbelboer commented Sep 20, 2018

@mvdan Yes it's a fast path to try and avoid equalFold which can be expensive in case of unicode strings.

@dsnet I see your point, but making it a struct tag option would mean the package that defined the type has to specify if it should be case sensitive or not. In your example where Foo wasn't intended to be used with encoding/json initially it probably wouldn't have these struct tags and would always be case insensitive.

On the one hand it feels weird if the person who maintains the package of Foo gets to decide if my JSON has to be case sensitive or not. On the other hand that person can already enforce any rule they want by implementing UnmarshalJSON on Foo.

@mvdan

Member

mvdan commented Sep 20, 2018

Yes it's a fast path to try and avoid equalFold which can be expensive in case of unicode strings.

Then my point is that this piece of godoc should be removed:

preferring an exact match but also accepting a case-insensitive match.

The decoder always prefers the latest match, not the case-sensitive one.

@erikdubbelboer

Contributor

erikdubbelboer commented Sep 20, 2018

@mvdan the code is:

for i := range fields {
	ff := &fields[i]
	if bytes.Equal(ff.nameBytes, key) {
		f = ff
		break
	}
	if f == nil && ff.equalFold(ff.nameBytes, key) {
		f = ff
	}
}

which means that when it's an exact match (bytes.Equal) it doesn't execute the equalFold as we already break out of the loop before that.

Besides this is not the point of this issue and not the point of the discussion here.

@ysmolsky

Member

ysmolsky commented Oct 13, 2018

I think the documentation is clearly misleading for the current situation. The flow of recent tickets shows it. I was and still am confused by this:

Unmarshal matches incoming object keys to the keys used by Marshal ..., preferring an exact match but also accepting a case-insensitive match

I mean, technically and logically it is correct, but still confusing. Should we at least mention that it will match the last matching value from the JSON whenever there is more than one possible choice?

@ToadKing

ToadKing commented Oct 15, 2018

I don't think it's 100% correct, since even if there's an exact match with the field tag it won't always choose it (like in #28190).

@rsc rsc modified the milestones: Go1.12, Go1.13 Nov 14, 2018
