Skip to content

Conversation

@ccazabon
Copy link

@ccazabon ccazabon commented Mar 30, 2024

This adds convert_keys as an optional parameter to the JSON encoder methods, which causes unsupported dict key types to be passed to the encoder's .default() method. They can then be encoded if .default() returns one of the supported JSON types.


📚 Documentation preview 📚: https://cpython-previews--117392.org.readthedocs.build/

… dictionary keys through the `.default()` method for conversion to a supported type.
…ry keys that are not of a JSON-supported simple type through the encoder's .default() method.
@ghost
Copy link

ghost commented Mar 30, 2024

All commit authors signed the Contributor License Agreement.
CLA signed

@bedevere-app
Copy link

bedevere-app bot commented Mar 30, 2024

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

@bedevere-app
Copy link

bedevere-app bot commented Mar 30, 2024

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

Comment on lines 1 to 2
Add `convert_keys` parameter to JSON encoding, to allow passing dict keys that
are not supported types in JSON through the encoder's `.default()` method.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Add `convert_keys` parameter to JSON encoding, to allow passing dict keys that
are not supported types in JSON through the encoder's `.default()` method.
Add ``convert_keys`` parameter to JSON encoding, to allow passing dict keys that
are not supported types in JSON through the encoder's ``.default()`` method.

@ccazabon
Copy link
Author

ccazabon commented Mar 30, 2024

Sorry for the noise; it took me a few tries to get the docs formatting correct.

Example of use: bytes, pathlib.Path, uuid.UUID, and ipaddress objects are "enough like" str that they are used as keys. Dumping such a dict to JSON doesn't work without this feature, which can be used like:

class CustomEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, bytes):
            return obj.hex(":")
        elif isinstance(obj, (pathlib.Path, uuid.UUID, ipaddress._IPAddressBase)):
            return str(obj)

        # Let the base class default method raise the TypeError
        return super().default(obj)

d = {
    b"foobar": "a bytes key",
    pathlib.Path("/tmp"): "a Path key",
    uuid.uuid4(): "a UUID key",
    ipaddress.ip_address("127.0.0.1"): "an IP address key",
}

>>> print(json.dumps(d, cls=CustomEncoder, indent=4, convert_keys=True))
{
    "66:6f:6f:62:61:72": "a bytes key",
    "/tmp": "a Path key",
    "724cd681-d8ea-4946-98bd-aecca19c0311": "a UUID key",
    "127.0.0.1": "an IP address key"
}

I've contributed before, many moons ago, but this is the first time I've added something to a C speedup module, so I may not have the refcounting correct. I'd appreciate corrections on this.

Copy link
Member

@gaogaotiantian gaogaotiantian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Put whether we should have this feature aside, I don't believe your python implementation and C implementation are equivalent. In your Python encoder, the keys could be converted by the default function to bool or None, but in _make_iterencode they should all be converted to str. You did that in the C implementation.

There's some twisted logic here - basically whether you allow default to convert to "serializable" objects like int or bool. As the function is currently used for values, it's fine, but if you want to use it for both keys and values, that would raise some issue. Do you want to double-convert from those to strings? Then you'll have another corner case - there are serializable objects that can't be converted to "basic types", for example, tuples.

If we truly want to do this, I think a better approach is to simply pass a function for convert_keys which converts everything to a string. The implementation would be much cleaner as well with the clearer restriction.

@ccazabon
Copy link
Author

Put whether we should have this feature aside, I don't believe your python implementation and C implementation are equivalent. In your Python encoder, the keys could be converted by the default function to bool or None, but in _make_iterencode they should all be converted to str. You did that in the C implementation.

Thank you, you're right that the code in _iterencode_dict() should be producing strings for everything. Bbut I'm not completely clear on your comments about the default method. That method is supposed to produce anything that can be natively JSON serialized. So I think default is fine, but the new code calling it for keys needs to stringify the result, if one of the other supported types was returned.

That's easy enough to do - but would involve either repeating some code, or factoring it out into a separate method, and I'm not sure which would be preferred here. Thoughts?

There's some twisted logic here - basically whether you allow default to convert to "serializable" objects like int or bool. As the function is currently used for values, it's fine, but if you want to use it for both keys and values, that would raise some issue. Do you want to double-convert from those to strings?

Yes. The way default() is documented is clear and straightforward, and Python users are used to it. If they want to customize handling of a new type, they know how to do it via that method, and returning anything natively JSON-able is normal.

Then you'll have another corner case - there are serializable objects that can't be converted to "basic types", for example, tuples.

I don't really see this as a corner case. Presumably the author who is customizing serialization of a dict knows that if they want to convert keys from other types to int, bool, None, float, or str, they know how to handle this. The only way it would seem to be a problem is if they wanted to handle a specific type differently when it occurs as a value (serializing to a tuple/list, say) than when it occurs as a key (serialize to str). That seems like a pretty rare case, in which case the author can handle it differently (say, by not setting convert_keys and doing key conversion themselves, like they already have to do today).

If we truly want to do this, I think a better approach is to simply pass a function for convert_keys which converts everything to a string. The implementation would be much cleaner as well with the clearer restriction.

Separate conversion functions are certainly possible, but it looks to me like a rare case for which the additional complexity is perhaps more than necessary.

…ssing the key through `.default()` to a str.
@gaogaotiantian
Copy link
Member

That's easy enough to do - but would involve either repeating some code, or factoring it out into a separate method, and I'm not sure which would be preferred here. Thoughts?

The reason I thought about alternatives was because this, the duplicated code is quite ugly.

Yes. The way default() is documented is clear and straightforward, and Python users are used to it.

Your implementation actually deviates the documentation a bit:

If specified, default should be a function that gets called for objects that can’t otherwise be serialized. It should return a JSON encodable version of the object or raise a TypeError. If not specified, TypeError is raised.

With convert_keys=True, the default function is used even when the object can be serialized, it's just that the object can't be serialized to a basic types that can be converted to keys.

For example,

def default(val):
    if isinstance(val, tuple):
        return str(val)
s = json.dumps(v, convert_keys=True, default=default)
print(s)

will be {"(1, 2)": [1, 2]}. The default function applied to the key, but not the value. I don't think this is what current Python user is used to - the default function never applies to tuple before.

I believe a separate convert function will make the code for CPython cleaner, not more complex - because you don't have to do the double conversion anymore.

For the user code, the only scenario that it's worse than the current implementation is when the user needs to convert a certain type to a basic type other than str and that type exists in both keys and values - then you lose the capability to "auto-convert". However, you also do not have the capability to "disable" the "auto-convert".

Maybe when using with cls set, it's better because there's an implicit default function assignment from the encoder class.

Personally I do not use customized json encoder that much so I believe we should hear from the json experts.

@ccazabon
Copy link
Author

Thank you, Tian - I appreciate your thoughts and the time you spent on this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants