
de: ensure unique keys when deserializing to maps and sets #916

Closed
lucab wants to merge 2 commits

Conversation

lucab commented May 6, 2017

BREAKING: deserializing inputs with duplicate keys to maps and sets now fails.

Sets and maps do not allow duplicate entries; on insertion, a duplicate is
silently skipped or replaces the existing value.
This updates serde to check for duplicate entries when deserializing,
erroring when such a case is encountered.

Closes #913
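
For reference, a minimal sketch of the behavior this PR changes, assuming serde_json as the format crate (the example is not part of the PR itself):

use std::collections::HashSet;

fn main() {
    // Today the duplicate "a" is silently discarded when deserializing into a set;
    // with this change the same input would instead produce a deserialization error.
    let set: HashSet<String> = serde_json::from_str(r#"["a", "a", "b"]"#).unwrap();
    assert_eq!(set.len(), 2);
    println!("{:?}", set); // e.g. {"a", "b"}
}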

lucab added 2 commits May 6, 2017 19:06
BREAKING: deserializing inputs with duplicate keys to maps and sets now fails.

Sets and maps do not allow duplicate entries; on insertion, a duplicate is
silently skipped or replaces the existing value.
This updates serde to check for duplicate entries when deserializing,
erroring when such a case is encountered.

Signed-off-by: Luca Bruno <lucab@debian.org>
Signed-off-by: Luca Bruno <lucab@debian.org>
clarfonthey (Contributor) commented

Perhaps this could be made non-breaking by adding an opt-in attribute?

dtolnay (Member) commented May 6, 2017

The current behavior matches both JSON.parse in JavaScript and encoding/json in Go. Do you know of any JSON implementations that behave the way your PR does? I worry that this would break expectations.

JSON.parse('{"a":0,"a":1}')  // evaluates to {a: 1}: the last duplicate key wins
package main

import (
	"encoding/json"
	"fmt"
)

func main() {
	var m map[string]int
	err := json.Unmarshal([]byte(`{"a":0,"a":1}`), &m)
	fmt.Println(m, err) // prints "map[a:1] <nil>": the last value wins, no error
}

lucab (Author) commented May 6, 2017

@clarcharr As I said in #913, I would be fine with some way to opt into this behavior (even if I personally believe it should be opt-out instead). But I think this should be per-deserializer rather than per-field, as library authors and consumers may have different use cases at hand.

@dtolnay Your comment is spot-on. I'm actually here because this behavior in Go is causing higher-level pain: a seemingly random logic bug was traced back to a producer emitting a manifest with duplicate fields, with occasional re-ordering. Even though those implementations are important and widespread, I wouldn't take them as my ideal references.

Some related data points:

  • Haskell's aeson is having a similar discussion right now: Duplicate keys haskell/aeson#531
  • given the same JSON object with duplicate keys as input, serde currently deserializes it into a map/set (discarding the earlier entries) but refuses to deserialize it into a struct with named fields (erroring on the duplicate field). This PR unifies the behavior toward the early-error side; see the sketch below.
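
To illustrate the second point, a minimal sketch of that asymmetry, assuming serde_json and a derived Deserialize impl (the Named struct is made up for illustration):

use std::collections::HashMap;
use serde::Deserialize;

#[derive(Deserialize, Debug)]
struct Named {
    a: u32,
}

fn main() {
    let input = r#"{"a":0,"a":1}"#;

    // Into a map: the earlier value for "a" is silently replaced.
    let map: HashMap<String, u32> = serde_json::from_str(input).unwrap();
    println!("{:?}", map); // {"a": 1}

    // Into a struct with named fields: the derived impl rejects the duplicate field.
    let named: Result<Named, _> = serde_json::from_str(input);
    println!("{:?}", named); // Err(... duplicate field `a` ...)
}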

dtolnay (Member) commented May 6, 2017

Thanks for the link but it looks like they are pretty far from a consensus that treating this as an error is the right behavior.

For your use case, would you be able to use your own map type for which you define Deserialize, or a deserialize_with attribute on the field to specify this logic?
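
For illustration, a rough sketch of the deserialize_with route, assuming serde's MapAccess API; the no_duplicates helper name and the error message are made up here:

use std::collections::HashMap;
use std::fmt;
use std::hash::Hash;
use std::marker::PhantomData;

use serde::de::{Deserialize, Deserializer, Error, MapAccess, Visitor};

// Hypothetical helper for #[serde(deserialize_with = "no_duplicates")] on a HashMap field.
fn no_duplicates<'de, D, K, V>(deserializer: D) -> Result<HashMap<K, V>, D::Error>
where
    D: Deserializer<'de>,
    K: Deserialize<'de> + Eq + Hash,
    V: Deserialize<'de>,
{
    struct NoDupVisitor<K, V>(PhantomData<(K, V)>);

    impl<'de, K, V> Visitor<'de> for NoDupVisitor<K, V>
    where
        K: Deserialize<'de> + Eq + Hash,
        V: Deserialize<'de>,
    {
        type Value = HashMap<K, V>;

        fn expecting(&self, f: &mut fmt::Formatter) -> fmt::Result {
            f.write_str("a map with unique keys")
        }

        fn visit_map<A>(self, mut access: A) -> Result<Self::Value, A::Error>
        where
            A: MapAccess<'de>,
        {
            let mut map = HashMap::new();
            while let Some((key, value)) = access.next_entry::<K, V>()? {
                // insert() returns the previous value when the key was already present,
                // which is exactly the case this helper turns into an error.
                if map.insert(key, value).is_some() {
                    return Err(A::Error::custom("duplicate key in map"));
                }
            }
            Ok(map)
        }
    }

    deserializer.deserialize_map(NoDupVisitor(PhantomData))
}

A field could then opt in with #[serde(deserialize_with = "no_duplicates")], leaving the default map behavior unchanged everywhere else.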

lucab (Author) commented May 7, 2017

Sorry, I didn't mean to imply that aeson is moving in this direction, just that the Haskell discussion is a better comparison than the Go/JS standard-library legacy.

And yes, I can encode this in my own deserializers. I am, however, concerned about serde's default behavior here, as I was surprised when I first encountered #913. I'm inclined to close this PR since I understand it needs a bit more thought, but I'd still like to see something better than "redefine all map/set deserializers" directly in serde.

dtolnay (Member) commented May 7, 2017

I added this to the collection of deserialize_with helpers I intend to provide in #553. Thanks!

lucab (Author) commented May 8, 2017

@dtolnay thank you for the feedback. With that context, I now understand your previous question better, and I think it is a good middle-ground solution.

petreeftime commented

As far as I can tell this issue is still not resolved, or the solution is in a crate I can't find. Is there a nicer way to error on duplicate keys other than re-implementing Deserialize for the specific container that might contain duplicates?
