Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

json dump silently converts int keys to string #79153

Closed
My-TienNguyen mannequin opened this issue Oct 13, 2018 · 9 comments
Closed

json dump silently converts int keys to string #79153

My-TienNguyen mannequin opened this issue Oct 13, 2018 · 9 comments
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@My-TienNguyen
Copy link
Mannequin

My-TienNguyen mannequin commented Oct 13, 2018

BPO 34972
Nosy @facundobatista, @ericvsmith, @zooba, @tirkarthi

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields:

assignee = None
closed_at = <Date 2018-10-14.11:33:13.954>
created_at = <Date 2018-10-13.10:16:46.343>
labels = ['invalid', 'type-bug', 'library']
title = 'json dump silently converts int keys to string'
updated_at = <Date 2020-04-06.06:14:52.103>
user = 'https://bugs.python.org/My-TienNguyen'

bugs.python.org fields:

activity = <Date 2020-04-06.06:14:52.103>
actor = 'stub'
assignee = 'none'
closed = True
closed_date = <Date 2018-10-14.11:33:13.954>
closer = 'My-Tien Nguyen'
components = ['Library (Lib)']
creation = <Date 2018-10-13.10:16:46.343>
creator = 'My-Tien Nguyen'
dependencies = []
files = []
hgrepos = []
issue_num = 34972
keywords = []
message_count = 9.0
messages = ['327649', '327650', '327654', '327684', '327688', '327703', '361302', '365838', '365840']
nosy_count = 7.0
nosy_names = ['facundobatista', 'eric.smith', 'stub', 'steve.dower', 'xtreak', 'My-Tien Nguyen', 'Stub2']
pr_nums = []
priority = 'normal'
resolution = 'not a bug'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue34972'
versions = ['Python 3.6']

@My-TienNguyen
Copy link
Mannequin Author

My-TienNguyen mannequin commented Oct 13, 2018

When int keys are silently converted to string on json serialization, the user needs to remember to convert it back to int on loading.
I think that a warning should be shown at least.

In my case I serialize a dict to json with int keys, later load it back into a dict (resulting in a dict with string keys) and test for existence of an int key in the dict which will then return False incorrectly.

I am aware that json does not support int keys, but this can be easily forgotten.

@My-TienNguyen My-TienNguyen mannequin added stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Oct 13, 2018
@tirkarthi
Copy link
Member

Thanks for the report. There was a related issue few days back bpo-32816. I think this is a documented behavior at https://docs.python.org/3.8/library/json.html#json.dumps . Having a warning in place might break code and I don't know if there is a safe way to introduce this as a code level warning given that this is a documented behavior in Python 2 and 3. I think this is the case with other languages too like JavaScript itself converting int to string without warning adhering to JSON standard. Correct me if I am wrong or other languages have a warning related to this

Note: Keys in key/value pairs of JSON are always of the type str. When a dictionary is converted into JSON, all the keys of the dictionary are coerced to strings. As a result of this, if a dictionary is converted into JSON and then back into a dictionary, the dictionary may not equal the original one. That is, loads(dumps(x)) != x if x has non-string keys

You can try doing json.loads(data, parse_int=int) but it will try converting the values.

>>> json.loads(json.dumps({1:'1'}), parse_int=int)
{'1': '1'}
>>> json.loads(json.dumps({1:1}), parse_int=int) 
{'1': 1}

Thanks

@My-TienNguyen
Copy link
Mannequin Author

My-TienNguyen mannequin commented Oct 13, 2018

I don’t think, “other languages do that too” is a good argument here. This would apply if behaving differently would break user expectation. But here we would do nothing more than explicitly inform the user of a relevant operation. If they already expected that behaviour, they can disregard the warning.

I don’t see how parse_intwould help me here, I would need a parse_str=int, but then it would try to parse every string, and I don’t see the use case for that.

I would suggest a warning similar to this:

--- json/encoder.py
+++ json/encoder.py
@@ -1,6 +1,7 @@
 """Implementation of JSONEncoder
 """
 import re
+import warnings
 
 try:
     from _json import encode_basestring_ascii as c_encode_basestring_ascii
@@ -353,7 +354,9 @@
             items = sorted(dct.items(), key=lambda kv: kv[0])
         else:
             items = dct.items()
+        non_str_key = False
         for key, value in items:
+            non_str_key = non_str_key or not isinstance(key, str)
             if isinstance(key, str):
                 pass
             # JavaScript is weakly typed for these, so it makes sense to
@@ -403,6 +406,8 @@
                 else:
                     chunks = _iterencode(value, _current_indent_level)
                 yield from chunks
+        if non_str_key:
+            warnings.warn("Encountered non-string key(s), converted to string.", RuntimeWarning)
         if newline_indent is not None:
             _current_indent_level -= 1
             yield '\n' + _indent * _current_indent_level

@ericvsmith
Copy link
Member

I can't think of another place where we issue a warning for anything similar. I'm opposed to any changes here: it's clearly documented behavior.

It's like being surprised .ini files convert to strings: it's just how that format works.

@zooba
Copy link
Member

zooba commented Oct 14, 2018

Agreed with Eric. json.dump needs to produce valid JSON, which requires keys to be strings.

Try using pickle if you need to preserve full Python semantics.

@zooba zooba closed this as completed Oct 14, 2018
@zooba zooba added the invalid label Oct 14, 2018
@My-TienNguyen
Copy link
Mannequin Author

My-TienNguyen mannequin commented Oct 14, 2018

Sure, I can do that, but wanted to propose this regardless. I guess this is a disagreement on a language design level.
As a proponent of strong typing I wouldn’t have allowed non-string keys in the first place, and if they are allowed I would warn about conversion. This is also more aligned with the “explicit is better than implicit” principle.

@My-TienNguyen My-TienNguyen mannequin reopened this Oct 14, 2018
@My-TienNguyen My-TienNguyen mannequin removed the invalid label Oct 14, 2018
@My-TienNguyen My-TienNguyen mannequin closed this as completed Oct 14, 2018
@facundobatista
Copy link
Member

I understand (and agree with) the merits of automatically converting the int to str when dumping to a string.

However, this result really surprised me:

>>> json.dumps({1:2, "1":3})
'{"1": 2, "1": 3}'

Is it a valid JSON?

@Stub2
Copy link
Mannequin

Stub2 mannequin commented Apr 6, 2020

Similarly, keys can be lost entirely:

>>> json.dumps({1:2, 1.0:3})
'{"1": 3}'

@stub
Copy link
Mannequin

stub mannequin commented Apr 6, 2020

(sorry, my example is normal Python behavior. {1:1, 1.0:2} == {1:2} , {1.0:1} == {1:1} )

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

4 participants