gh-117391: Add `convert_keys` to JSON encoder to enable encoding of dict keys not supported in JSON #117392

ccazabon · 2024-03-30T17:58:22Z

This adds convert_keys as an optional parameter to the JSON encoder methods, which causes unsupported dict key types to be passed to the encoder's .default() method. They can then be encoded if .default() returns one of the supported JSON types.

Issue: Allow JSONEncoder to handle passing unsupported dict keys through .default() before throwing TypeError #117391

📚 Documentation preview 📚: https://cpython-previews--117392.org.readthedocs.build/

… dictionary keys through the `.default()` method for conversion to a supported type.

…ry keys that are not of a JSON-supported simple type through the encoder's .default() method.

…vert_keys` as documented.

…ys` parameter.

ghost · 2024-03-30T17:58:24Z

All commit authors signed the Contributor License Agreement.

bedevere-app · 2024-03-30T17:58:28Z

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

bedevere-app · 2024-03-30T17:58:53Z

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

Eclips4 · 2024-03-30T18:01:20Z

Misc/NEWS.d/next/Library/2024-03-30-11-42-00-13.gh-issue-117391.u7KmA9.rst

+Add `convert_keys` parameter to JSON encoding, to allow passing dict keys that
+are not supported types in JSON through the encoder's `.default()` method.


Suggested change

Add `convert_keys` parameter to JSON encoding, to allow passing dict keys that

are not supported types in JSON through the encoder's `.default()` method.

Add ``convert_keys`` parameter to JSON encoding, to allow passing dict keys that

are not supported types in JSON through the encoder's ``.default()`` method.

…on into json-encoder-convert-keys

ccazabon · 2024-03-30T19:48:06Z

Sorry for the noise; it took me a few tries to get the docs formatting correct.

Example of use: bytes, pathlib.Path, uuid.UUID, and ipaddress objects are "enough like" str that they are used as keys. Dumping such a dict to JSON doesn't work without this feature, which can be used like:

class CustomEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, bytes):
            return obj.hex(":")
        elif isinstance(obj, (pathlib.Path, uuid.UUID, ipaddress._IPAddressBase)):
            return str(obj)

        # Let the base class default method raise the TypeError
        return super().default(obj)

d = {
    b"foobar": "a bytes key",
    pathlib.Path("/tmp"): "a Path key",
    uuid.uuid4(): "a UUID key",
    ipaddress.ip_address("127.0.0.1"): "an IP address key",
}

>>> print(json.dumps(d, cls=CustomEncoder, indent=4, convert_keys=True))
{
    "66:6f:6f:62:61:72": "a bytes key",
    "/tmp": "a Path key",
    "724cd681-d8ea-4946-98bd-aecca19c0311": "a UUID key",
    "127.0.0.1": "an IP address key"
}

I've contributed before, many moons ago, but this is the first time I've added something to a C speedup module, so I may not have the refcounting correct. I'd appreciate corrections on this.

gaogaotiantian

Put whether we should have this feature aside, I don't believe your python implementation and C implementation are equivalent. In your Python encoder, the keys could be converted by the default function to bool or None, but in _make_iterencode they should all be converted to str. You did that in the C implementation.

There's some twisted logic here - basically whether you allow default to convert to "serializable" objects like int or bool. As the function is currently used for values, it's fine, but if you want to use it for both keys and values, that would raise some issue. Do you want to double-convert from those to strings? Then you'll have another corner case - there are serializable objects that can't be converted to "basic types", for example, tuples.

If we truly want to do this, I think a better approach is to simply pass a function for convert_keys which converts everything to a string. The implementation would be much cleaner as well with the clearer restriction.

ccazabon · 2024-03-30T23:39:48Z

Put whether we should have this feature aside, I don't believe your python implementation and C implementation are equivalent. In your Python encoder, the keys could be converted by the default function to bool or None, but in _make_iterencode they should all be converted to str. You did that in the C implementation.

Thank you, you're right that the code in _iterencode_dict() should be producing strings for everything. Bbut I'm not completely clear on your comments about the default method. That method is supposed to produce anything that can be natively JSON serialized. So I think default is fine, but the new code calling it for keys needs to stringify the result, if one of the other supported types was returned.

That's easy enough to do - but would involve either repeating some code, or factoring it out into a separate method, and I'm not sure which would be preferred here. Thoughts?

There's some twisted logic here - basically whether you allow default to convert to "serializable" objects like int or bool. As the function is currently used for values, it's fine, but if you want to use it for both keys and values, that would raise some issue. Do you want to double-convert from those to strings?

Yes. The way default() is documented is clear and straightforward, and Python users are used to it. If they want to customize handling of a new type, they know how to do it via that method, and returning anything natively JSON-able is normal.

Then you'll have another corner case - there are serializable objects that can't be converted to "basic types", for example, tuples.

I don't really see this as a corner case. Presumably the author who is customizing serialization of a dict knows that if they want to convert keys from other types to int, bool, None, float, or str, they know how to handle this. The only way it would seem to be a problem is if they wanted to handle a specific type differently when it occurs as a value (serializing to a tuple/list, say) than when it occurs as a key (serialize to str). That seems like a pretty rare case, in which case the author can handle it differently (say, by not setting convert_keys and doing key conversion themselves, like they already have to do today).

If we truly want to do this, I think a better approach is to simply pass a function for convert_keys which converts everything to a string. The implementation would be much cleaner as well with the clearer restriction.

Separate conversion functions are certainly possible, but it looks to me like a rare case for which the additional complexity is perhaps more than necessary.

…ssing the key through `.default()` to a str.

gaogaotiantian · 2024-03-31T01:37:28Z

That's easy enough to do - but would involve either repeating some code, or factoring it out into a separate method, and I'm not sure which would be preferred here. Thoughts?

The reason I thought about alternatives was because this, the duplicated code is quite ugly.

Yes. The way default() is documented is clear and straightforward, and Python users are used to it.

Your implementation actually deviates the documentation a bit:

If specified, default should be a function that gets called for objects that can’t otherwise be serialized. It should return a JSON encodable version of the object or raise a TypeError. If not specified, TypeError is raised.

With convert_keys=True, the default function is used even when the object can be serialized, it's just that the object can't be serialized to a basic types that can be converted to keys.

For example,

def default(val):
    if isinstance(val, tuple):
        return str(val)
s = json.dumps(v, convert_keys=True, default=default)
print(s)

will be {"(1, 2)": [1, 2]}. The default function applied to the key, but not the value. I don't think this is what current Python user is used to - the default function never applies to tuple before.

I believe a separate convert function will make the code for CPython cleaner, not more complex - because you don't have to do the double conversion anymore.

For the user code, the only scenario that it's worse than the current implementation is when the user needs to convert a certain type to a basic type other than str and that type exists in both keys and values - then you lose the capability to "auto-convert". However, you also do not have the capability to "disable" the "auto-convert".

Maybe when using with cls set, it's better because there's an implicit default function assignment from the encoder class.

Personally I do not use customized json encoder that much so I believe we should hear from the json experts.

ccazabon · 2024-03-31T03:53:48Z

Thank you, Tian - I appreciate your thoughts and the time you spent on this.

ccazabon added 5 commits March 26, 2024 03:41

Add convert_keys param to JSONEncoder to support passing non-simple…

1ccc65f

… dictionary keys through the `.default()` method for conversion to a supported type.

Implement JSONEncoder convert_keys option to allow passing dictiona…

9cb2095

…ry keys that are not of a JSON-supported simple type through the encoder's .default() method.

Clean up and reorganize. Ensure skipkeys takes preference over `con…

c7dd457

…vert_keys` as documented.

Update module-level dump functions to support JSONEncoder `convert_ke…

dd29b85

…ys` parameter.

Add NEWS item.

3fff0e3

bedevere-app bot added the awaiting review label Mar 30, 2024

bedevere-app bot mentioned this pull request Mar 30, 2024

Allow JSONEncoder to handle passing unsupported dict keys through .default() before throwing TypeError #117391

Open

Merge branch 'main' into json-encoder-convert-keys

9f601fa

Eclips4 reviewed Mar 30, 2024

View reviewed changes

ccazabon added 4 commits March 30, 2024 12:01

Fix NEWS filename.

efbc753

Merge branch 'json-encoder-convert-keys' of github.com:ccazabon/cpyth…

28620a2

…on into json-encoder-convert-keys

Fix incorrect markup in docs.

626d140

Use double-ticks around parameter name in NEWS file.

b00ab88

gaogaotiantian reviewed Mar 30, 2024

View reviewed changes

Fix Python version of convert_key not always turning the result of pa…

f09af2e

…ssing the key through `.default()` to a str.

gaogaotiantian mentioned this pull request Apr 7, 2024

json's default callable/method ignores keys. #117592

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

gh-117391: Add `convert_keys` to JSON encoder to enable encoding of dict keys not supported in JSON #117392

gh-117391: Add `convert_keys` to JSON encoder to enable encoding of dict keys not supported in JSON #117392

Uh oh!

ccazabon commented Mar 30, 2024 •

edited by github-actions bot

Loading

Uh oh!

ghost commented Mar 30, 2024 •

edited by ghost

Loading

Uh oh!

bedevere-app bot commented Mar 30, 2024

Uh oh!

bedevere-app bot commented Mar 30, 2024

Uh oh!

Eclips4 Mar 30, 2024

Uh oh!

ccazabon commented Mar 30, 2024 •

edited

Loading

Uh oh!

gaogaotiantian left a comment

Uh oh!

ccazabon commented Mar 30, 2024

Uh oh!

gaogaotiantian commented Mar 31, 2024

Uh oh!

ccazabon commented Mar 31, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

		Add `convert_keys` parameter to JSON encoding, to allow passing dict keys that
		are not supported types in JSON through the encoder's `.default()` method.

Uh oh!

gh-117391: Add convert_keys to JSON encoder to enable encoding of dict keys not supported in JSON #117392

Are you sure you want to change the base?

gh-117391: Add convert_keys to JSON encoder to enable encoding of dict keys not supported in JSON #117392

Uh oh!

Conversation

ccazabon commented Mar 30, 2024 • edited by github-actions bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ghost commented Mar 30, 2024 • edited by ghost Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bedevere-app bot commented Mar 30, 2024

Uh oh!

bedevere-app bot commented Mar 30, 2024

Uh oh!

Eclips4 Mar 30, 2024

Choose a reason for hiding this comment

Uh oh!

ccazabon commented Mar 30, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

gaogaotiantian left a comment

Choose a reason for hiding this comment

Uh oh!

ccazabon commented Mar 30, 2024

Uh oh!

gaogaotiantian commented Mar 31, 2024

Uh oh!

ccazabon commented Mar 31, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

gh-117391: Add `convert_keys` to JSON encoder to enable encoding of dict keys not supported in JSON #117392

gh-117391: Add `convert_keys` to JSON encoder to enable encoding of dict keys not supported in JSON #117392

ccazabon commented Mar 30, 2024 •

edited by github-actions bot

Loading

ghost commented Mar 30, 2024 •

edited by ghost

Loading

ccazabon commented Mar 30, 2024 •

edited

Loading