proposal: encoding/json: add Decoder.DisallowDuplicateFields #48298

Open
dsnet opened this issue Sep 9, 2021 · 1 comment

@dsnet dsnet commented Sep 9, 2021

The presence of duplicate fields in JSON input is almost always a bug from the sender and the behavior across various implementations is highly inconsistent. It's too late to switch the current behavior to always reject duplicate fields in the current package, but we can provide an option to enforce stricter checks. As such, I propose adding a Decoder.DisallowDuplicateFields option.


Background

Per RFC 8259, section 4, the handling of duplicate names is left as undefined behavior. Rejecting such inputs is within the realm of valid behavior. Tim Bray, the editor of RFC 8259, recommends going beyond RFC 8259 and having implementations target compliance with RFC 7493 instead. RFC 7493 is a fully compatible subset of RFC 8259 that makes strict decisions about behavior RFC 8259 leaves undefined (including the rejection of duplicate names).

The lack of duplicate name rejection has correctness implications where roundtrip unmarshal/marshal does not result in semantically equivalent JSON, and surprising behavior for users when they accidentally send JSON objects with duplicate names. In such a case, the current behavior is actually somewhat inconsistent and difficult to explain.
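To make the inconsistency concrete, here is a minimal sketch of how the current package appears to treat duplicate names. The `target` type and the helper names are hypothetical, chosen only for this illustration. As far as I can tell, a duplicated scalar field keeps the last value, a duplicated map-typed struct field merges the entries of both occurrences, and decoding the same input into an untyped map keeps only the last occurrence.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// target is a hypothetical struct used only for this illustration.
type target struct {
	N int            `json:"n"`
	M map[string]int `json:"m"`
}

// decodeStruct unmarshals input into a target with the current encoding/json.
func decodeStruct(input string) (target, error) {
	var t target
	err := json.Unmarshal([]byte(input), &t)
	return t, err
}

// decodeMap unmarshals the same input into an untyped map.
func decodeMap(input string) (map[string]interface{}, error) {
	var m map[string]interface{}
	err := json.Unmarshal([]byte(input), &m)
	return m, err
}

func main() {
	// Both "n" and "m" appear twice in the same JSON object.
	const input = `{"n":1,"n":2,"m":{"x":1},"m":{"y":2}}`

	s, err := decodeStruct(input)
	fmt.Println(s.N, s.M, err) // no error is reported for the duplicates

	m, err := decodeMap(input)
	fmt.Println(m, err) // the nested "m" object is not merged here
}
```

Marshaling either result back produces JSON that differs from the input, which is the roundtrip problem described above.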

The lack of duplicate name rejection may also have security implications: a security tool cannot easily validate the semantic meaning of a JSON object when that meaning is inherently undefined in the presence of duplicate names.


Implementation

A naive implementation can remember all seen names in a Go map. A more clever implementation can take advantage of the fact that we are almost always unmarshaling into a Go map or Go struct. In the case of a Go map, we can use the Go map itself as a means to detect duplicate names. In the case of a Go struct, we can convert a JSON name into an index (i.e., the field index in the Go struct), and then use an efficient bitmap to detect whether we saw the name before.
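The naive approach can be sketched outside the package today as a pre-check built on the existing `Decoder.Token` API. This is only an illustration under my own assumptions, not the proposed implementation: `checkDuplicateNames` and `walk` are hypothetical names, and the sketch keeps one Go map of seen names per JSON object.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// checkDuplicateNames (hypothetical helper) reports an error if any JSON
// object in data contains the same name more than once.
func checkDuplicateNames(data []byte) error {
	return walk(json.NewDecoder(bytes.NewReader(data)))
}

// walk consumes exactly one JSON value from dec, recursing into
// objects and arrays.
func walk(dec *json.Decoder) error {
	tok, err := dec.Token()
	if err != nil {
		return err
	}
	delim, ok := tok.(json.Delim)
	if !ok {
		return nil // scalar value: nothing to check
	}
	switch delim {
	case '{':
		seen := make(map[string]struct{}) // names seen in this object
		for dec.More() {
			keyTok, err := dec.Token()
			if err != nil {
				return err
			}
			name, ok := keyTok.(string)
			if !ok {
				return fmt.Errorf("unexpected token %v for object name", keyTok)
			}
			if _, dup := seen[name]; dup {
				return fmt.Errorf("duplicate name %q", name)
			}
			seen[name] = struct{}{}
			if err := walk(dec); err != nil { // the member's value
				return err
			}
		}
	case '[':
		for dec.More() {
			if err := walk(dec); err != nil {
				return err
			}
		}
	}
	_, err = dec.Token() // consume the closing '}' or ']'
	return err
}

func main() {
	fmt.Println(checkDuplicateNames([]byte(`{"a":1,"b":{"a":2}}`)))
	fmt.Println(checkDuplicateNames([]byte(`{"a":1,"a":2}`)))
}
```

A pre-check like this parses the input twice and allocates a map per object, which is the cost that a check built into the decoder could avoid.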

In the common case, enabling checks for duplicate names would cause no performance slowdown.
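The bitmap idea for Go structs could look roughly like the following. This is a sketch under stated assumptions: `fieldSet`, `findDuplicate`, and the name-to-index table are hypothetical, and the table is assumed to map each JSON name to a distinct index in the range [0, len(index)). It is not how the decoder is actually structured.

```go
package main

import "fmt"

// fieldSet is a bitmap recording which struct field indexes were seen.
type fieldSet struct {
	words []uint64
}

// newFieldSet allocates a bitmap large enough for numFields indexes.
func newFieldSet(numFields int) *fieldSet {
	return &fieldSet{words: make([]uint64, (numFields+63)/64)}
}

// insert marks index i as seen and reports whether it was newly added.
func (s *fieldSet) insert(i int) bool {
	word, bit := i/64, uint64(1)<<(uint(i)%64)
	if s.words[word]&bit != 0 {
		return false // already seen: duplicate
	}
	s.words[word] |= bit
	return true
}

// findDuplicate walks the names in the order they would be encountered
// in a JSON object and returns the first name that repeats.
// Names missing from index (unknown fields) are skipped here; a real
// implementation would need a fallback for them.
func findDuplicate(names []string, index map[string]int) (string, bool) {
	seen := newFieldSet(len(index))
	for _, name := range names {
		i, ok := index[name]
		if !ok {
			continue
		}
		if !seen.insert(i) {
			return name, true
		}
	}
	return "", false
}

func main() {
	// Hypothetical name-to-field-index table for a two-field struct.
	index := map[string]int{"name": 0, "age": 1}
	fmt.Println(findDuplicate([]string{"name", "age", "name"}, index))
}
```

Since the decoder already has to resolve each JSON name to a struct field, the extra work per member is one bit test and one bit set, which is consistent with the expectation of no slowdown in the common case.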


Aside: I'm not fond of the name Fields since JSON terminology calls this either a "name" or "member" (per RFC 8259, section 4). However, it is consistent with the existing DisallowUnknownFields option.

cc @bradfitz @crawshaw @mvdan

@gopherbot gopherbot added this to the Proposal milestone Sep 9, 2021
@ianlancetaylor ianlancetaylor added this to Incoming in Proposals Sep 9, 2021

@rsc rsc commented Oct 13, 2021

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@rsc rsc moved this from Incoming to Active in Proposals Oct 13, 2021