RFC: Use singledispatch for registering additional custom serializer functions #203
Conversation
I’ll fix the builds for Python < 3.4 tonight.
Subclassing JSONEncoder/JSONDecoder has been effectively deprecated for years; I wouldn't recommend going back to that approach. The reason this hasn't been done in the past is that having a global registry for these sorts of things tends to end in disaster, as different libraries have different and often conflicting expectations of what should happen for a given type. I would recommend that each framework have its own registry, separate from the simplejson or json library.
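For concreteness, a framework-local registry along these lines can be sketched with `functools.singledispatch`. All names here (`to_serializable`, `framework_dumps`) are hypothetical and not part of simplejson or this PR:

```python
import datetime
import json
from functools import singledispatch

# Hypothetical framework-local registry: nothing is installed globally in
# json or simplejson, so other libraries' expectations are unaffected.
@singledispatch
def to_serializable(obj):
    raise TypeError('Object of type %s is not JSON serializable'
                    % obj.__class__.__name__)

@to_serializable.register(datetime.datetime)
def _(obj):
    return obj.isoformat()

def framework_dumps(obj, **kwargs):
    # The framework's own dumps() threads its registry through default=
    return json.dumps(obj, default=to_serializable, **kwargs)

print(framework_dumps({'when': datetime.datetime(2018, 1, 28, 19, 24, 51)}))
# {"when": "2018-01-28T19:24:51"}
```

Because the registry is passed explicitly via `default=`, two such frameworks can coexist in one process without stepping on each other.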
@etrepum I’m glad to hear that it’s been deprecated, but the docstrings still direct users to subclass (in the original `json` module, too). I agree that a global registry is the wrong way to do things; that’s not what this PR implements. This PR makes it easier for each framework to create its own serialization registry with a minimum of boilerplate.
Can you explain to me how this is not a global registry?
This PR does not force any serialization choices on any users of simplejson. Users are still free to define whatever serializations they wish, or to not define anything and use all of their previous serializers, or any of their frameworks' serializers. I'm not prescribing a certain serialization format here, I'm giving users of simplejson the ability to set their own serialization formats for themselves.
Yep, I agree. That is precisely what this PR enables. This PR has the added benefit of allowing the user to very elegantly write an extension for serializing an object their framework doesn't know how to handle. Django's JSON serialization code could be rewritten to be:

```python
@json.JSONEncoder.default.register(datetime.datetime)
def _(encoder, o):
    # See "Date Time String Format" in the ECMA-262 specification.
    r = o.isoformat()
    if o.microsecond:
        r = r[:23] + r[26:]
    if r.endswith('+00:00'):
        r = r[:-6] + 'Z'
    return r

@json.JSONEncoder.default.register(datetime.date)
def _(encoder, o):
    return o.isoformat()

@json.JSONEncoder.default.register(datetime.time)
def _(encoder, o):
    if is_aware(o):
        raise ValueError("JSON can't represent timezone-aware times.")
    r = o.isoformat()
    if o.microsecond:
        r = r[:12]
    return r

@json.JSONEncoder.default.register(datetime.timedelta)
def _(encoder, o):
    return duration_iso_string(o)

@json.JSONEncoder.default.register(decimal.Decimal)
@json.JSONEncoder.default.register(uuid.UUID)
@json.JSONEncoder.default.register(Promise)
def _(encoder, o):
    return str(o)
```

Note that they don't have to subclass `JSONEncoder` at all. Then if a user needed to serialize a custom field type, they would simply have to:

```python
@json.JSONEncoder.default.register(MyCustomType)
def _(encoder, o):
    return do_something_smart(o)
```

And ALL code in their entire project - even code that wasn't written to explicitly know how to serialize `MyCustomType` - would be able to serialize it. I hope that makes sense. I'm happy to respond to any more questions.
The global registry issue is such that if you have two parts of the same application that have different demands, then you're at an impasse. The code you're proposing will make that the status quo. There are many types that have more than one "obvious" encoding, and libraries can and do disagree. In some cases, you may need some part of your program to behave a bit differently to accommodate a third-party protocol.

I like the idea of having a nicer way to "compose" default functions, and to have it open for further extension, but putting a singleton registry in the simplejson or json library is going to cause problems for people. Consider also that many people are going to have both the json and simplejson modules installed, and most libraries are only going to import and register handlers for one of them.

I think the way forward for this proposal is to implement a separate package that provides a factory for, and maybe even a default singleton instance of, such a registry. This way, it can be passed down to either json or simplejson and would be expected to work correctly in either case. You can even provide wrapper methods for loads/dumps that dispatch to whichever library is preferred, to save people the trouble of explicitly passing in the default argument.
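Such a separate package might look roughly like this. This is a sketch under assumed names (`make_registry`, `dumps`, `registry_a`, `registry_b` are all hypothetical), not a concrete proposal:

```python
import json
from functools import singledispatch

def make_registry():
    """Factory: every caller gets an independent registry, so two parts of
    one application can hold conflicting encodings for the same type."""
    @singledispatch
    def registry(obj):
        raise TypeError('Object of type %s is not JSON serializable'
                        % obj.__class__.__name__)
    return registry

# Wrapper that dispatches to whichever library is preferred, passing the
# registry down as the ordinary default= argument.
try:
    import simplejson as preferred_json
except ImportError:
    preferred_json = json

def dumps(obj, registry, **kwargs):
    return preferred_json.dumps(obj, default=registry, **kwargs)

registry_a = make_registry()
registry_b = make_registry()

@registry_a.register(complex)
def _(obj):
    return [obj.real, obj.imag]

print(dumps(1 + 2j, registry_a))  # prints: [1.0, 2.0]
# registry_b has no handler for complex, so dumps(1 + 2j, registry_b)
# raises TypeError: the two registries stay fully independent.
```

Because the registry travels through the standard `default=` argument, this works identically with `json` and `simplejson` and requires no global state in either library.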
No, one or both parts could still specify the `default=` argument.
Incorrect - this PR preserves backwards compatibility both with subclassing and with custom `default` functions.
Sure, that's why I'm not prescribing any way of doing so. And end users will still be able to control this, either by re-registering their own "correct" encoding function, or by subclassing `JSONEncoder`.
Yes. And in all existing code, this is accomplished by subclassing and/or use of the `default=` argument.
I will accept this criticism; however, I think it's a little weird to be using both libraries. Also, consider the fact that you can overwrite the `default` method on both encoders:

```python
@singledispatchmethod
def default(self, obj):
    raise TypeError('Object of type %s is not JSON serializable' % obj.__class__.__name__)

json.JSONEncoder.default = default
simplejson.JSONEncoder.default = default

@default.register(datetime)
def _(encoder, dt):
    return dt.isoformat()

@default.register(uuid.UUID)
def _(encoder, uuid_):
    return str(uuid_)
```

Serializing with both modules works:

```python
json.dumps({'datetime': datetime.now(), 'uuid': uuid.uuid4()})
# '{"datetime": "2018-01-28T19:24:51.343936", "uuid": "51571cbe-9462-41ab-be6a-5002aad1d6e3"}'

simplejson.dumps({'datetime': datetime.now(), 'uuid': uuid.uuid4()})
# '{"datetime": "2018-01-28T19:25:35.301389", "uuid": "323f2138-00d3-4ce8-81d0-2f4b19706319"}'
```

Passing in a `default=` argument still works as expected:

```python
def explode(obj):
    raise NotImplementedError()

json.dumps({'datetime': datetime.now(), 'uuid': uuid.uuid4()}, default=explode)
# Traceback (most recent call last):
#   File "<stdin>", line 1, in <module>
#   File ".../python3.6/json/__init__.py", line 238, in dumps
#     **kw).encode(obj)
#   File ".../python3.6/json/encoder.py", line 199, in encode
#     chunks = self.iterencode(o, _one_shot=True)
#   File ".../python3.6/json/encoder.py", line 257, in iterencode
#     return _iterencode(o, 0)
#   File "<stdin>", line 2, in explode
# NotImplementedError

simplejson.dumps({'datetime': datetime.now(), 'uuid': uuid.uuid4()}, default=explode)
# Traceback (most recent call last):
#   File "<stdin>", line 1, in <module>
#   File ".../simplejson/__init__.py", line 399, in dumps
#     **kw).encode(obj)
#   File ".../simplejson/encoder.py", line 387, in encode
#     chunks = list(chunks)
#   File ".../simplejson/encoder.py", line 772, in _iterencode
#     for chunk in _iterencode_dict(o, _current_indent_level):
#   File ".../simplejson/encoder.py", line 729, in _iterencode_dict
#     for chunk in chunks:
#   File ".../simplejson/encoder.py", line 792, in _iterencode
#     o = _default(o)
#   File "<stdin>", line 2, in explode
# NotImplementedError
```

And subclassing still works as expected:

```python
class HalfDoneJSONEncoder(json.JSONEncoder):
    def default(self, o):
        if isinstance(o, uuid.UUID):
            return str(o)
        else:
            raise TypeError('Object of type %s is not JSON serializable' % o.__class__.__name__)

HalfDoneJSONEncoder().encode({'datetime': datetime.now(), 'uuid': uuid.uuid4()})
# Traceback (most recent call last):
#   File "<stdin>", line 1, in <module>
#   File ".../python3.6/json/encoder.py", line 199, in encode
#     chunks = self.iterencode(o, _one_shot=True)
#   File ".../python3.6/json/encoder.py", line 257, in iterencode
#     return _iterencode(o, 0)
#   File "<stdin>", line 6, in default
# TypeError: Object of type datetime is not JSON serializable

class HalfDoneSimpleJSONEncoder(simplejson.JSONEncoder):
    def default(self, o):
        if isinstance(o, datetime):
            return o.isoformat()
        else:
            raise TypeError('Object of type %s is not JSON serializable' % o.__class__.__name__)

HalfDoneSimpleJSONEncoder().encode({'datetime': datetime.now(), 'uuid': uuid.uuid4()})
# Traceback (most recent call last):
#   File "<stdin>", line 1, in <module>
#   File ".../simplejson/encoder.py", line 387, in encode
#     chunks = list(chunks)
#   File ".../simplejson/encoder.py", line 772, in _iterencode
#     for chunk in _iterencode_dict(o, _current_indent_level):
#   File ".../simplejson/encoder.py", line 729, in _iterencode_dict
#     for chunk in chunks:
#   File ".../simplejson/encoder.py", line 792, in _iterencode
#     o = _default(o)
#   File "<stdin>", line 6, in default
# TypeError: Object of type UUID is not JSON serializable
```

So it is absolutely possible to support both already. I think this would be all that's necessary:

```python
def singledispatchmethod(...):
    ...

@singledispatchmethod
def default(encoder, obj):
    raise TypeError('Object of type %s is not JSON serializable' % obj.__class__.__name__)

import json

try:
    import simplejson
except ImportError:
    pass
else:
    simplejson.JSONEncoder.default = default

json.JSONEncoder.default = default
```
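The elided `singledispatchmethod(...)` could be filled in minimally along these lines. This is an illustrative sketch, not the implementation included in the PR: dispatch happens on the first argument after `self`, and `.register` is exposed on the wrapper. (Python 3.8 later added a `functools.singledispatchmethod` to the stdlib.)

```python
from functools import singledispatch, update_wrapper

def singledispatchmethod(func):
    # Sketch: like functools.singledispatch, but dispatch on the first
    # argument *after* the encoder/self argument.
    dispatcher = singledispatch(func)

    def wrapper(self, arg, *args, **kwargs):
        impl = dispatcher.dispatch(arg.__class__)
        return impl(self, arg, *args, **kwargs)

    update_wrapper(wrapper, func)
    wrapper.register = dispatcher.register  # expose .register on the method
    return wrapper

class Encoder:
    @singledispatchmethod
    def default(self, obj):
        raise TypeError('Object of type %s is not JSON serializable'
                        % obj.__class__.__name__)

# Registration works through the class attribute, no subclassing needed.
@Encoder.default.register(int)
def _(self, obj):
    return 'int:%d' % obj

print(Encoder().default(7))  # prints: int:7
```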
I think you misunderstand the global registry issue:

1. Include library A, which registers a global default for SomeType (and likely does not expose whatever code it has to register it to a different registry).
2. Include library B, which registers a different global default for SomeType.
3. Your application needs to pick between library A's and library B's custom serialization for SomeType depending on which third-party service you're talking to.

You're stuck.
This style of code is what's typically referred to as monkeypatching and is not a very common Python idiom, since it's not composable.
This is possible.
At least for this PR, this code does not work that way. Here, library B would not register an additional hook; it would overwrite the hook that was registered by library A. There's only ever one dispatch registered for a given type.
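This overwrite behavior matches plain `functools.singledispatch`, which the PR's `singledispatchmethod` is built on. A small sketch of the last-registration-wins semantics:

```python
from functools import singledispatch

@singledispatch
def default(obj):
    raise TypeError('Object of type %s is not JSON serializable'
                    % obj.__class__.__name__)

# "Library A" registers a handler for int...
@default.register(int)
def _(obj):
    return 'A:%d' % obj

# ...then "library B" registers for the same type and silently replaces it.
@default.register(int)
def _(obj):
    return 'B:%d' % obj

print(default(5))  # prints: B:5
```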
Nope, you aren't stuck. This would only be an issue if you cannot choose the order in which libraries are imported. For this PR, whichever library was loaded last would overwrite the previous registrations for the same types. Also, this is easily fixed: just register your own function after library A and library B are both imported. You can guarantee this by:

```python
import libraryA  # Ensure libraryA is imported (subsequent imports are cached in CPython)
import libraryB  # Ensure libraryB is imported
import json

chosen_library = libraryA  # or libraryB

@json.JSONEncoder.default.register(SomeType)
def _(encoder, sometype_obj):
    return chosen_library.encode_json(sometype_obj)
```

Furthermore, I think this is largely a red herring. This feature is most useful for frameworks, and it's very unlikely users will be using more than one framework at a time. But for the small number of users who may be using both (maybe they're transitioning a Django project to a Flask or Falcon project and for some reason need to encode things differently), they can still force the framework to use a specific encoder, either by passing in their own `default=` function or by subclassing the encoder.
That's true, and that's a valid criticism, but all I was illustrating is that users can work around the fact that both `json` and `simplejson` may be in use.
Relying on import order does not solve any problem, nor should it ever be considered an acceptable workaround. Imports are only evaluated once (except for explicit reloads), so it's not possible for one program to ever choose both behaviors depending on its needs. This is a fictional example, but consider if you have a Stripe library that wants timestamps serialized one way and a HelloSign library that wants them another: you can't easily implement both webhook endpoints correctly.

Given what I've seen with other languages, I think you're underestimating the problem here. It won't be used only by frameworks; it'll potentially be used by anything that needs to deal in JSON.

I'm going to go ahead and close this. I think the best path forward is to distribute this functionality separately from simplejson. We can consider something more closely integrated after a year or two of proven demand, once best practices are well established.
This is an interesting discussion, and it goes to the heart of the usability of singledispatch for decentralized serialization. I would suggest writing to the python-ideas mailing list to start a discussion about using singledispatch for the stdlib json module.
Using `json.dumps()` gets slightly complicated when you try to serialize more complex objects, like `datetime.datetime` or `uuid.UUID` objects, as simplejson and the built-in json module do not know how to serialize these objects. There are two workarounds for this:

1. Subclass `simplejson.encoder.JSONEncoder` and override its `default` method, then explicitly call `JSONEncoder.encode()` everywhere (this is exactly what Django does).
2. Define a function that accepts a single argument and serializes it appropriately, then call `json.dumps()` with the `default` argument everywhere.

Both of these workarounds require users to modify/override the custom serializer function rather than extending it, and then to use the modified code everywhere instead of directly using the built-in json module or simplejson. This generally means frameworks like Django force all users to use their own serializers with their own serializer functions. It's also rather confusing for new Django users, who ask questions on Stack Overflow like "How do I encode a UUID to make it JSON serializable?".
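For illustration, the second workaround might look like this (the `my_default` name is hypothetical):

```python
import datetime
import json
import uuid

def my_default(obj):
    # One ever-growing if/elif chain; exactly the shape of problem that
    # singledispatch is designed to break apart.
    if isinstance(obj, datetime.datetime):
        return obj.isoformat()
    if isinstance(obj, uuid.UUID):
        return str(obj)
    raise TypeError('Object of type %s is not JSON serializable'
                    % obj.__class__.__name__)

# The default= argument must be threaded through every call site.
print(json.dumps({'id': uuid.UUID(int=1)}, default=my_default))
# {"id": "00000000-0000-0000-0000-000000000001"}
```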
Furthermore, for a non-trivial project, the custom `default` function can become unwieldy and begins to look exactly like the use case for `singledispatch`.

This PR simply uses a modified `singledispatch` decorator (called `singledispatchmethod`) to extend the `default` function for specific object types. This makes the `simplejson.encoder.JSONEncoder.default` method better adhere to the Open/Closed Principle: open to extension but closed to modification (or override). This decorator should also be completely backwards compatible: subclasses of `JSONEncoder` can still override it, and calls to `simplejson.dumps()` with a `default=` argument will still use the given custom encoder function.

This change allows frameworks to define and register those functions anywhere that is automatically imported (like, for instance, `django/__init__.py`). That's it: now every user of Django can simply call `simplejson.dumps()` directly and it will be able to properly serialize datetimes and UUIDs.

Now, I realize that Django is a huge framework, and it has DRF, which implements its own framework for registering serializers. But I think this PR has the potential to greatly simplify framework code.
The `singledispatch` decorator does not work on instance methods, but there is a PR to fix this. I have included the two implementations (one naïve, one that is more complex but handles corner cases well) in this PR FOR REFERENCE ONLY. I will be removing them and replacing them with the proper imports from functools once PR python/cpython#4987 or something like it is merged.

This PR should also fix or obviate the need to fix the following issues and PRs:
I'm happy to respond to any criticisms or answer any questions. I would eventually like to push this API into Python's built-in `json` module, but it's written in C and I wanted to get feedback on this Python implementation before I do that.

Thanks!