Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

encoding/json: add Decoder.DisallowDuplicateFields #48298

Open
dsnet opened this issue Sep 9, 2021 · 11 comments
Open

encoding/json: add Decoder.DisallowDuplicateFields #48298

dsnet opened this issue Sep 9, 2021 · 11 comments

Comments

@dsnet
Copy link
Member

dsnet commented Sep 9, 2021

The presence of duplicate fields in JSON input is almost always a bug from the sender and the behavior across various implementations is highly inconsistent. It's too late to switch the current behavior to always reject duplicate fields in the current package, but we can provide an option to enforce stricter checks. As such, I propose adding a Decoder.DisallowDuplicateFields option.


Background

Per RFC 8259, section 4, the handling of duplicate names is left as undefined behavior. Rejecting such inputs is within the realm of valid behavior. Tim Bray, the author of RFC 8259, actually recommends going beyond RFC 8259 and that implementations should instead target compliance with RFC 7493. RFC 7493 is a fully compatible subset of RFC 8259, which makes strict decisions about behavior that RFC 8259 leaves undefined (including the rejection of duplicate names).

The lack of duplicate name rejection has correctness implications where roundtrip unmarshal/marshal does not result in semantically equivalent JSON, and surprising behavior for users when they accidentally send JSON objects with duplicate names. In such a case, the current behavior is actually somewhat inconsistent and difficult to explain.

The lack of duplicate name rejection may have security implications since it becomes difficult for a security tool to validate the semantic meaning of a JSON object since meaning is inherently undefined in the presence of duplicate names.


Implementation

A naive implementation can remember all seen names in a Go map. A more clever implementation can take advantage of the fact that we are almost always unmarshaling into a Go map or Go struct. In the case of a Go map, we can use the Go map itself as a means to detect duplicate names. In the case of a Go struct, we can convert a JSON name into an index (i.e., the field index in the Go struct), and then use a an efficient bitmap to detect whether we saw the name before.

In the common case, there would be no performance slow downs to enabling checks for duplicate names.


Aside: I'm not fond of the name Fields since JSON terminology calls this either a "name" or "member" (per RFC 8259, section 4). However, it is consistent with the existing DisallowUnknownFields option.

\cc @bradfitz @crawshaw @mvdan

@gopherbot gopherbot added this to the Proposal milestone Sep 9, 2021
@ianlancetaylor ianlancetaylor added this to Incoming in Proposals (old) Sep 9, 2021
@rsc
Copy link
Contributor

rsc commented Oct 13, 2021

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@rsc rsc moved this from Incoming to Active in Proposals (old) Oct 13, 2021
@rsc rsc moved this from Active to Likely Accept in Proposals (old) Oct 20, 2021
@rsc
Copy link
Contributor

rsc commented Oct 20, 2021

Based on the discussion above, this proposal seems like a likely accept.
— rsc for the proposal review group

@rsc rsc moved this from Likely Accept to Accepted in Proposals (old) Oct 27, 2021
@rsc
Copy link
Contributor

rsc commented Oct 27, 2021

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— rsc for the proposal review group

@rsc rsc changed the title proposal: encoding/json: add Decoder.DisallowDuplicateFields encoding/json: add Decoder.DisallowDuplicateFields Oct 27, 2021
@rsc rsc modified the milestones: Proposal, Backlog Oct 27, 2021
@go101
Copy link

go101 commented Oct 28, 2021

Is this for #14750?

@dsnet
Copy link
Member Author

dsnet commented Oct 28, 2021

Is this for #14750?

No, but they are related. Enabling DisallowDuplicateFields in a sense rejects case-insensitive names (e.g., foo and FoO) that map to the same Go struct field.

For #14750, we probably want an option for StrictNameMatching or something.

@carlmjohnson
Copy link
Contributor

Since duplicate names are UB, even Hyrum's Rule aside, can we treat duplicate names as blank unless equal? ISTM, if you're relying on getting the first name or the last name, you're in for trouble, so we should just return blank (but not an error unless DisallowDuplicateFields is set, due to the Hyrum's Law issue).

@dsnet
Copy link
Member Author

dsnet commented Dec 1, 2021

Since duplicate names are UB ... we should just return blank (but not an error unless DisallowDuplicateFields is set, due to the Hyrum's Law issue).

Precisely due to Hyrum's Law, I don't see how we can change the default behavior without breaking existing usages.

aydinmercan added a commit to aydinmercan/dumb-jose that referenced this issue Dec 4, 2021
Currently encoding/json accepts duplicate fields in the json.
Since golang/go#48298 got accepted,
we should use the decoder interface with Decoder.DisallowDuplicateFields
turned on when available. Its exact behavior will determine whether
json.RawMessage's will be re-unmarshaled or will follow the byte reader
path.
@zamicol
Copy link
Contributor

zamicol commented Aug 18, 2022

Looks like it's been a few months since there was any update. Is there anything I can do to help move a fix along?

@mvdan
Copy link
Member

mvdan commented Aug 30, 2022

I would ask for patience; I know encoding/json is moving very slowly, but we are busy and there will be more to share soon. The blocker in this thread is certainly not a patch :)

@disconnect3d
Copy link

disconnect3d commented Oct 14, 2022

Would be happy to see this added finally. If you need an example of duplicate JSON fields causing troubles, e.g. to priority this more, here is one:

[Update 2017-11-18] A RCE vulnerability was found in CouchDB because two JSON parsers handle duplicate key differently. The same JSON object, when parsed in JavaScript, contains "roles": []', but when parsed in Erlang it contains "roles": ["_admin"].

From http://seriot.ch/projects/parsing_json.html

@tzvatot

This comment was marked as duplicate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Accepted
Development

No branches or pull requests

9 participants