json module should issue warning about duplicate keys #89217

Zeturic · 2021-08-31T00:30:38Z

BPO	45054
Nosy	@rhettinger, @etrepum, @corona10, @Zeturic, @akulakov

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = 'https://github.com/etrepum'
closed_at = None
created_at = <Date 2021-08-31.00:30:37.913>
labels = ['type-feature', 'library', '3.11']
title = 'json module should issue warning about duplicate keys'
updated_at = <Date 2021-11-14.01:49:22.181>
user = 'https://github.com/Zeturic'

bugs.python.org fields:

activity = <Date 2021-11-14.01:49:22.181>
actor = 'rhettinger'
assignee = 'bob.ippolito'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)']
creation = <Date 2021-08-31.00:30:37.913>
creator = 'Zeturic'
dependencies = []
files = []
hgrepos = []
issue_num = 45054
keywords = []
message_count = 5.0
messages = ['400678', '400696', '406181', '406182', '406302']
nosy_count = 5.0
nosy_names = ['rhettinger', 'bob.ippolito', 'corona10', 'Zeturic', 'andrei.avk']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue45054'
versions = ['Python 3.11']

Zeturic · 2021-08-31T00:30:38Z

The json module will allow the following without complaint:

import json
d1 = {1: "fromstring", "1": "fromnumber"}
string = json.dumps(d1)
print(string)
d2 = json.loads(string)
print(d2)

And it prints:

{"1": "fromstring", "1": "fromnumber"}
{'1': 'fromnumber'}

This would be extremely confusing to anyone who doesn't already know that JSON keys have to be strings. Not only does d1 != d2 (which the documentation does mention as a possibility after a round trip through JSON), but len(d1) != len(d2) and d1['1'] != d2['1'], even though '1' is in both.

I suggest that if json.dump or json.dumps notices that it is producing a JSON document with duplicate keys, it should issue a warning. Similarly, if json.load or json.loads notices that it is reading a JSON document with duplicate keys, it should also issue a warning.

Zeturic · 2021-08-31T07:08:50Z

Sorry to the people I'm pinging, but I just noticed the initial dictionary in my example code is wrong. I figured I should fix it before anybody tested it and got confused about it not matching up with my description of the results.

It should've been:

import json
d1 = {"1": "fromstring", 1: "fromnumber"}
string = json.dumps(d1)
print(string)
d2 = json.loads(string)
print(d2)

akulakov · 2021-11-12T00:14:55Z

In general this sounds reasonable; - but a couple of thoughts / comments:

If you have a dict with mixed numbers in str format and in number format (i.e. ints as numbers and ints as strings in your case), you are creating problems in many potential places. The core of the problem is logically inconsistent keys rather than the step of conversion to JSON. So the most useful place for warning would be when adding a new key, but that wouldn't be practical.
Even if something is to be done at conversion to JSON, it's not clear if it should be a warning (would that be enough when the conversion is a logical bug?), or it should be some kind of strict=True mode that raises a ValueError?

akulakov · 2021-11-12T01:06:24Z

Another good option would be to use typed dict like mydict : dict[int,str] = {}; and use typed values when populating the dict; this way a type checker will warn you of inconsistent key types.

rhettinger · 2021-11-14T01:10:43Z

-0 on doing this. The suggested warning/error adds overhead that everyone would pay for but would almost never be of benefit. I haven't seen this particular problem arise in practice. The likely reasons it doesn't come up are 1) that generated data doesn't normally produce mixed type keys, 2) because mixed type keys don't round-trip, and 3) even using numeric keys only (not mixed) is uncommon because it results in poor outcomes that fail round-trip invariants.

Andrei Kulakov is right in saying that such data suggests deeper problems with the design and that static typing would be beneficial.

One last thought: Even with regular dicts, we don't normally warn about encountering duplicate keys:

    >>> dict([(1, 'run'), (1, 'zoo'), (3, 'tree')])
    {1: 'zoo', 3: 'tree'}

ptth222 · 2024-04-12T19:30:26Z

I second this. Especially on load. I have ran into this issue more than once when dealing with JSON from less experienced sources. Yes it's a bad choice in keys, but if someone is doing that it is hard to detect since you aren't even warned about it.

I also don't think it's fair to compare internal behavior of Python data types to the behavior when pulling in external structures into an internal structure. It's also maybe bad practice that dict doesn't warn about the example @rhettinger gave.

It seems reasonable to me to expect some kind of warning when imperfect mapping occurs between data structures, especially when data is lost. I find this particularly annoying here with JSON because I use the jsonschema package to validate JSON quite a bit and this is something you essentially have to validate on read in.

For now, I use the solution in this SO post, but it feels bad having to opt in rather than out.

Zeturic mannequin added 3.11 only security fixes stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels Aug 31, 2021

rhettinger assigned etrepum Nov 14, 2021

ezio-melotti transferred this issue from another repository Apr 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

json module should issue warning about duplicate keys #89217

json module should issue warning about duplicate keys #89217

Zeturic mannequin commented Aug 31, 2021

Zeturic mannequin commented Aug 31, 2021

Zeturic mannequin commented Aug 31, 2021

akulakov commented Nov 12, 2021

akulakov commented Nov 12, 2021

rhettinger commented Nov 14, 2021

ptth222 commented Apr 12, 2024

json module should issue warning about duplicate keys #89217

json module should issue warning about duplicate keys #89217

Comments

Zeturic mannequin commented Aug 31, 2021

Zeturic mannequin commented Aug 31, 2021

Zeturic mannequin commented Aug 31, 2021

akulakov commented Nov 12, 2021

akulakov commented Nov 12, 2021

rhettinger commented Nov 14, 2021

ptth222 commented Apr 12, 2024