Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

json module should issue warning about duplicate keys #89217

Open
Zeturic mannequin opened this issue Aug 31, 2021 · 6 comments
Open

json module should issue warning about duplicate keys #89217

Zeturic mannequin opened this issue Aug 31, 2021 · 6 comments
Assignees
Labels
3.11 only security fixes stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@Zeturic
Copy link
Mannequin

Zeturic mannequin commented Aug 31, 2021

BPO 45054
Nosy @rhettinger, @etrepum, @corona10, @Zeturic, @akulakov

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields:

assignee = 'https://github.com/etrepum'
closed_at = None
created_at = <Date 2021-08-31.00:30:37.913>
labels = ['type-feature', 'library', '3.11']
title = 'json module should issue warning about duplicate keys'
updated_at = <Date 2021-11-14.01:49:22.181>
user = 'https://github.com/Zeturic'

bugs.python.org fields:

activity = <Date 2021-11-14.01:49:22.181>
actor = 'rhettinger'
assignee = 'bob.ippolito'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)']
creation = <Date 2021-08-31.00:30:37.913>
creator = 'Zeturic'
dependencies = []
files = []
hgrepos = []
issue_num = 45054
keywords = []
message_count = 5.0
messages = ['400678', '400696', '406181', '406182', '406302']
nosy_count = 5.0
nosy_names = ['rhettinger', 'bob.ippolito', 'corona10', 'Zeturic', 'andrei.avk']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue45054'
versions = ['Python 3.11']

@Zeturic
Copy link
Mannequin Author

Zeturic mannequin commented Aug 31, 2021

The json module will allow the following without complaint:

import json
d1 = {1: "fromstring", "1": "fromnumber"}
string = json.dumps(d1)
print(string)
d2 = json.loads(string)
print(d2)

And it prints:

{"1": "fromstring", "1": "fromnumber"}
{'1': 'fromnumber'}

This would be extremely confusing to anyone who doesn't already know that JSON keys have to be strings. Not only does d1 != d2 (which the documentation does mention as a possibility after a round trip through JSON), but len(d1) != len(d2) and d1['1'] != d2['1'], even though '1' is in both.

I suggest that if json.dump or json.dumps notices that it is producing a JSON document with duplicate keys, it should issue a warning. Similarly, if json.load or json.loads notices that it is reading a JSON document with duplicate keys, it should also issue a warning.

@Zeturic Zeturic mannequin added 3.11 only security fixes stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels Aug 31, 2021
@Zeturic
Copy link
Mannequin Author

Zeturic mannequin commented Aug 31, 2021

Sorry to the people I'm pinging, but I just noticed the initial dictionary in my example code is wrong. I figured I should fix it before anybody tested it and got confused about it not matching up with my description of the results.

It should've been:

import json
d1 = {"1": "fromstring", 1: "fromnumber"}
string = json.dumps(d1)
print(string)
d2 = json.loads(string)
print(d2)

@akulakov
Copy link
Contributor

In general this sounds reasonable; - but a couple of thoughts / comments:

  • If you have a dict with mixed numbers in str format and in number format (i.e. ints as numbers and ints as strings in your case), you are creating problems in many potential places. The core of the problem is logically inconsistent keys rather than the step of conversion to JSON. So the most useful place for warning would be when adding a new key, but that wouldn't be practical.

  • Even if something is to be done at conversion to JSON, it's not clear if it should be a warning (would that be enough when the conversion is a logical bug?), or it should be some kind of strict=True mode that raises a ValueError?

@akulakov
Copy link
Contributor

Another good option would be to use typed dict like mydict : dict[int,str] = {}; and use typed values when populating the dict; this way a type checker will warn you of inconsistent key types.

@rhettinger
Copy link
Contributor

-0 on doing this. The suggested warning/error adds overhead that everyone would pay for but would almost never be of benefit. I haven't seen this particular problem arise in practice. The likely reasons it doesn't come up are 1) that generated data doesn't normally produce mixed type keys, 2) because mixed type keys don't round-trip, and 3) even using numeric keys only (not mixed) is uncommon because it results in poor outcomes that fail round-trip invariants.

Andrei Kulakov is right in saying that such data suggests deeper problems with the design and that static typing would be beneficial.

One last thought: Even with regular dicts, we don't normally warn about encountering duplicate keys:

    >>> dict([(1, 'run'), (1, 'zoo'), (3, 'tree')])
    {1: 'zoo', 3: 'tree'}

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
@ptth222
Copy link

ptth222 commented Apr 12, 2024

I second this. Especially on load. I have ran into this issue more than once when dealing with JSON from less experienced sources. Yes it's a bad choice in keys, but if someone is doing that it is hard to detect since you aren't even warned about it.

I also don't think it's fair to compare internal behavior of Python data types to the behavior when pulling in external structures into an internal structure. It's also maybe bad practice that dict doesn't warn about the example @rhettinger gave.

It seems reasonable to me to expect some kind of warning when imperfect mapping occurs between data structures, especially when data is lost. I find this particularly annoying here with JSON because I use the jsonschema package to validate JSON quite a bit and this is something you essentially have to validate on read in.

For now, I use the solution in this SO post, but it feels bad having to opt in rather than out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.11 only security fixes stdlib Python modules in the Lib dir type-feature A feature request or enhancement
Projects
Status: No status
Development

No branches or pull requests

3 participants