RFC: Use singledispatch for registering additional custom serializer functions #203

Closed

Conversation

@blag commented Jan 22, 2018

Using json.dumps() gets slightly complicated when you try to serialize more complex objects, like datetime.datetime or uuid.UUID instances, because neither simplejson nor the built-in json module knows how to serialize them:

import simplejson as json
import uuid
from datetime import datetime

json.dumps({'now': datetime.now()})

# Traceback (most recent call last):
#   File "<stdin>", line 1, in <module>
#   File ".../simplejson/__init__.py", line 382, in dumps
#     return _default_encoder.encode(obj)
#   File ".../simplejson/encoder.py", line 385, in encode
#     chunks = list(chunks)
#   File ".../simplejson/encoder.py", line 770, in _iterencode
#     for chunk in _iterencode_dict(o, _current_indent_level):
#   File ".../simplejson/encoder.py", line 727, in _iterencode_dict
#     for chunk in chunks:
#   File ".../simplejson/encoder.py", line 790, in _iterencode
#     o = _default(o)
#   File ".../simplejson/encoder.py", line 88, in _method
#     return method.__get__(instance, owner)(*args, **kwargs)
#   File ".../simplejson/encoder.py", line 360, in default
#     o.__class__.__name__)
# TypeError: Object of type datetime is not JSON serializable

json.dumps({'uuid': uuid.uuid4()})
# Traceback (most recent call last):
#   File "<stdin>", line 1, in <module>
#   File ".../simplejson/__init__.py", line 382, in dumps
#     return _default_encoder.encode(obj)
#   File ".../simplejson/encoder.py", line 385, in encode
#     chunks = list(chunks)
#   File ".../simplejson/encoder.py", line 770, in _iterencode
#     for chunk in _iterencode_dict(o, _current_indent_level):
#   File ".../simplejson/encoder.py", line 727, in _iterencode_dict
#     for chunk in chunks:
#   File ".../simplejson/encoder.py", line 790, in _iterencode
#     o = _default(o)
#   File ".../simplejson/encoder.py", line 88, in _method
#     return method.__get__(instance, owner)(*args, **kwargs)
#   File ".../simplejson/encoder.py", line 360, in default
#     o.__class__.__name__)
# TypeError: Object of type UUID is not JSON serializable

There are two workarounds for this:

  1. Subclass simplejson.encoder.JSONEncoder and override its default method, then explicitly call JSONEncoder.encode() everywhere (this is exactly what Django does; a minimal sketch follows this list).

  2. Define a function that accepts a single argument and serializes it appropriately, then call json.dumps() with the default argument everywhere:

    def my_encoder(obj):
        if isinstance(obj, datetime):
            return obj.isoformat()
        elif isinstance(obj, uuid.UUID):
            return str(obj)
        else:
            raise TypeError("Cannot serialize object of type '%s'" % (type(obj)))
    
    json.dumps({'now': datetime.now(), 'uuid': uuid.uuid4()}, default=my_encoder)
    
    # '{"now": "2018-01-21T12:34:42.993445", "uuid": "5f9a8bbe-d3b2-4a32-8bc0-83ac87cb195c"}'

Both of these workarounds require users to modify/override the custom serializer function rather than extend it, and then to use the modified code everywhere instead of calling the built-in json module or simplejson directly. In practice this means frameworks like Django force all of their users to go through the framework's own serializers and serializer functions. It's also confusing for new Django users, who ask questions on Stack Overflow like "How do I encode a UUID to make it JSON serializable?".

Furthermore, for a non-trivial project, the custom default function can become unwieldy and begins to look exactly like a use case for singledispatch.

This PR uses a modified singledispatch decorator (called singledispatchmethod) to extend the default function for specific object types. This makes the simplejson.encoder.JSONEncoder.default method better adhere to the Open/Closed Principle - open to extension but closed to modification (or override). The change should also be completely backwards compatible: subclasses of JSONEncoder can still override default, and calls to simplejson.dumps() with a default= argument will still use the given custom encoder function.

This change allows frameworks to define and register serializer functions anywhere that is automatically imported (for instance, django/__init__.py):

import simplejson as json
import uuid
from datetime import datetime

@json.JSONEncoder.default.register(datetime)
def jsonify_datetime(encoder, dt):
    return dt.isoformat()

@json.JSONEncoder.default.register(uuid.UUID)
def jsonify_uuid(encoder, uuid_):
    return str(uuid_)

That's it - now every user of Django can simply call simplejson.dumps() directly and it will be able to properly serialize datetimes and UUIDs:

import simplejson as json
import uuid
from datetime import datetime


json.dumps({'now': datetime.now(), 'uuid': uuid.uuid4()})
# '{"now": "2018-01-22T03:00:49.847345", "uuid": "629a7f1e-3eb6-4bf3-a6cf-6717910e4a92"}'

Now, I realize that Django is a huge framework, and it has DRF (Django REST Framework), which implements its own framework for registering serializers. But I think this PR has the potential to greatly simplify framework code.

The singledispatch decorator does not work on instance methods, but there is a PR to fix this. I have included the two implementations (one naïve, one more complex that handles corner cases well) in this PR FOR REFERENCE ONLY. I will remove them and replace them with the proper imports from functools once PR python/cpython#4987, or something like it, is merged.
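
For the curious, the naïve variant is roughly the following sketch (not the exact code included in this PR): wrap functools.singledispatch so that dispatch happens on the type of the first argument after self:

from functools import singledispatch, update_wrapper

def singledispatchmethod(func):
    # Dispatch on the type of the first argument *after* self,
    # so the decorator can be used on instance methods.
    dispatcher = singledispatch(func)

    def wrapper(*args, **kwargs):
        return dispatcher.dispatch(args[1].__class__)(*args, **kwargs)

    wrapper.register = dispatcher.register
    update_wrapper(wrapper, func)
    return wrapper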

This PR should also fix or obviate the need to fix the following issues and PRs:

I'm happy to respond to any criticisms or answer any questions. I would eventually like to push this API into Python's built-in json module, but it's written in C and I wanted to get feedback on this Python implementation before I do that.

Thanks!

@blag (Author) commented Jan 22, 2018

I’ll fix the builds for Python < 3.4 tonight.

@etrepum (Member) commented Jan 22, 2018

Subclassing JSONEncoder/JSONDecoder has been effectively deprecated for years, I wouldn't recommend going back to that approach.

The reason this hasn't been done in the past is that having a global registry for these sorts of things tends to end in disaster, as different libraries have different and often conflicting expectations of what should happen for a given type. I would recommend that each framework have its own registry, separate from the simplejson or json library.

@blag (Author) commented Jan 22, 2018

@etrepum I'm glad to hear that it's been deprecated, but the docstrings direct users to subclass (in the original encoder.py, lines 127-130 and 250-264; sorry, I'm on mobile or else I would link those lines). Those should be changed to whatever is considered best practice nowadays. I assume that's passing in the default argument to json.dumps(), is that correct?

I agree that a global registry is the wrong way to do things - that’s not what this PR implements. This PR makes it easier for each framework to create their own serialization registry with a minimum of boilerplate.

@etrepum (Member) commented Jan 23, 2018

Can you explain to me how this is not a global registry?

@blag (Author) commented Jan 23, 2018

Can you explain to me how this is not a global registry?

This PR does not force any serialization choices on any users of simplejson. Users are still free to define whatever serializations they wish, or to not define anything and use all of their previous serializers, or any of their frameworks' serializers. I'm not prescribing a certain serialization format here, I'm giving users of simplejson the ability to set their own serialization formats for themselves.

I would recommend that each framework have its own registry, separate from the simplejson or json library.

Yep, I agree. That is precisely what this PR enables, with the added benefit of letting a user very elegantly write an extension for serializing an object their framework doesn't know how to handle.

Django's JSON serialization code could be rewritten as:

import datetime
import decimal
import uuid

import simplejson as json
# Helpers from Django's own utilities:
from django.utils.duration import duration_iso_string
from django.utils.functional import Promise
from django.utils.timezone import is_aware

@json.JSONEncoder.default.register(datetime.datetime)
def _(encoder, o):
    # See "Date Time String Format" in the ECMA-262 specification.
    r = o.isoformat()
    if o.microsecond:
        r = r[:23] + r[26:]
    if r.endswith('+00:00'):
        r = r[:-6] + 'Z'
    return r

@json.JSONEncoder.default.register(datetime.date)
def _(encoder, o):
    return o.isoformat()

@json.JSONEncoder.default.register(datetime.time)
def _(encoder, o):
    if is_aware(o):
        raise ValueError("JSON can't represent timezone-aware times.")
    r = o.isoformat()
    if o.microsecond:
        r = r[:12]
    return r

@json.JSONEncoder.default.register(datetime.timedelta)
def _(encoder, o):
    return duration_iso_string(o)

@json.JSONEncoder.default.register(decimal.Decimal)
@json.JSONEncoder.default.register(uuid.UUID)
@json.JSONEncoder.default.register(Promise)
def _(encoder, o):
    return str(o)

Note that they don't have to subclass JSONEncoder anywhere, and they no longer need a custom serialization API that all of their users are forced to use. Django users could simply call json.dumps() with their objects.

Then if a user needed to serialize a custom field type, they would simply have to:

@json.JSONEncoder.default.register(MyCustomType)
def _(encoder, o):
    return do_something_smart(o)

And ALL code in their entire project - even code that wasn't written to explicitly know how to serialize MyCustomType objects - would be able to serialize them without ANY changes.

I hope that makes sense. I'm happy to respond to any more questions.

@etrepum (Member) commented Jan 23, 2018

The global registry issue is such that if you have two parts of the same application that have different demands then you're at an impasse. The code you're proposing will make that the status quo. There are many types that have more than one "obvious" encoding and libraries can and do disagree. In some cases, you may need some part of your program to behave a bit differently to accommodate a third party protocol.

I like the idea of having a nicer way to "compose" default functions, and to have it open for further extension, but putting a singleton registry in the simplejson or json library is going to cause problems for people.

Consider also that many people are going to have both the json and simplejson modules installed and most libraries are only going to import and register handlers for one of them.

I think the way forward for this proposal is to implement a separate package that provides a factory for, and maybe even a default singleton instance of, such a registry. This way, it can be passed down to either json or simplejson and would be expected to work correctly in either case. You can even provide wrapper methods for loads/dumps that dispatch to whichever library is preferred, to save people the trouble of explicitly passing in the default argument.
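
A rough sketch of what such a separate package could look like (make_registry and the dumps wrapper below are hypothetical names, not an existing API):

from functools import singledispatch

import json

def make_registry():
    # Factory for an extensible default= function that works with
    # either json or simplejson.
    @singledispatch
    def default(obj):
        raise TypeError('Object of type %s is not JSON serializable'
                        % obj.__class__.__name__)
    return default

registry = make_registry()  # optional default singleton instance

def dumps(obj, **kwargs):
    # Wrapper that passes the registry as default= so callers don't have to.
    kwargs.setdefault('default', registry)
    return json.dumps(obj, **kwargs)

# Frameworks/libraries extend it with registry.register(SomeType)(their_function)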

@blag force-pushed the singledispatch-default branch 2 times, most recently from cdc96bb to 66b155e on January 29, 2018 02:52
@blag (Author) commented Jan 29, 2018

The global registry issue is such that if you have two parts of the same application that have different demands then you're at an impasse.

No, one or both parts could still specify the default function to use, which would overwrite the built-in singledispatched one when the encoder is instantiated. And keep in mind that all existing code is either doing this already or is subclassing JSONEncoder (which would also overwrite the singledispatched method).

The code you're proposing will make that the status quo.

Incorrect - this PR preserves backwards compatibility with both subclassing and custom default= arguments.

There are many types that have more than one "obvious" encoding and libraries can and do disagree.

Sure, that's why I'm not prescribing any way of doing so. And end users will still be able to control this, either by re-registering their own "correct" encoding function, by subclassing JSONEncoder, or by passing in a default function. This PR simply makes it easier for frameworks to help their users.

In some cases, you may need some part of your program to behave a bit differently to accommodate a third party protocol.

Yes. And in all existing code, this is accomplished by subclassing and/or use of default, which this PR still supports in a backwards compatible way. For all future code, that functionality remains fully supported, useful, and usable.

Consider also that many people are going to have both the json and simplejson modules installed and most libraries are only going to import and register handlers for one of them.

I will accept this criticism; however, I think it's a little weird to be using both libraries. Also, consider that you can overwrite the default methods in both json and simplejson:

import json
import uuid
from datetime import datetime

import simplejson

# singledispatchmethod is the method-friendly dispatch decorator from this PR
@singledispatchmethod
def default(self, obj):
    raise TypeError('Object of type %s is not JSON serializable' % obj.__class__.__name__)

json.JSONEncoder.default = default
simplejson.JSONEncoder.default = default

@default.register(datetime)
def _(encoder, dt):
    return dt.isoformat()

@default.register(uuid.UUID)
def _(encoder, uuid_):
    return str(uuid_)

Serializing with both json and simplejson produces the same result:

json.dumps({'datetime': datetime.now(), 'uuid': uuid.uuid4()})
# '{"datetime": "2018-01-28T19:24:51.343936", "uuid": "51571cbe-9462-41ab-be6a-5002aad1d6e3"}'
simplejson.dumps({'datetime': datetime.now(), 'uuid': uuid.uuid4()})
# '{"datetime": "2018-01-28T19:25:35.301389", "uuid": "323f2138-00d3-4ce8-81d0-2f4b19706319"}'

Passing in a default argument still works as expected:

def explode(obj):
    raise NotImplementedError()

json.dumps({'datetime': datetime.now(), 'uuid': uuid.uuid4()}, default=explode)
# Traceback (most recent call last):
#   File "<stdin>", line 1, in <module>
#   File ".../python3.6/json/__init__.py", line 238, in dumps
#     **kw).encode(obj)
#   File ".../python3.6/json/encoder.py", line 199, in encode
#     chunks = self.iterencode(o, _one_shot=True)
#   File ".../python3.6/json/encoder.py", line 257, in iterencode
#     return _iterencode(o, 0)
#   File "<stdin>", line 2, in explode
# NotImplementedError

simplejson.dumps({'datetime': datetime.now(), 'uuid': uuid.uuid4()}, default=explode)
# Traceback (most recent call last):
#   File "<stdin>", line 1, in <module>
#   File ".../simplejson/__init__.py", line 399, in dumps
#     **kw).encode(obj)
#   File ".../simplejson/encoder.py", line 387, in encode
#     chunks = list(chunks)
#   File ".../simplejson/encoder.py", line 772, in _iterencode
#     for chunk in _iterencode_dict(o, _current_indent_level):
#   File ".../simplejson/encoder.py", line 729, in _iterencode_dict
#     for chunk in chunks:
#   File ".../simplejson/encoder.py", line 792, in _iterencode
#     o = _default(o)
#   File "<stdin>", line 2, in explode
# NotImplementedError

And subclassing still works as expected:

class HalfDoneJSONEncoder(json.JSONEncoder):
    def default(self, o):
        if isinstance(o, uuid.UUID):
            return str(o)
        else:
            raise TypeError('Object of type %s is not JSON serializable' % o.__class__.__name__)

HalfDoneJSONEncoder().encode({'datetime': datetime.now(), 'uuid': uuid.uuid4()})
# Traceback (most recent call last):
#   File "<stdin>", line 1, in <module>
#   File ".../python3.6/json/encoder.py", line 199, in encode
#     chunks = self.iterencode(o, _one_shot=True)
#   File ".../python3.6/json/encoder.py", line 257, in iterencode
#     return _iterencode(o, 0)
#   File "<stdin>", line 6, in default
# TypeError: Object of type datetime is not JSON serializable

class HalfDoneSimpleJSONEncoder(simplejson.JSONEncoder):
    def default(self, o):
        if isinstance(o, datetime):
            return o.isoformat()
        else:
            raise TypeError('Object of type %s is not JSON serializable' % o.__class__.__name__)

HalfDoneSimpleJSONEncoder().encode({'datetime': datetime.now(), 'uuid': uuid.uuid4()})
# Traceback (most recent call last):
#   File "<stdin>", line 1, in <module>
#   File ".../simplejson/encoder.py", line 387, in encode
#     chunks = list(chunks)
#   File ".../simplejson/encoder.py", line 772, in _iterencode
#     for chunk in _iterencode_dict(o, _current_indent_level):
#   File ".../simplejson/encoder.py", line 729, in _iterencode_dict
#     for chunk in chunks:
#   File ".../simplejson/encoder.py", line 792, in _iterencode
#     o = _default(o)
#   File "<stdin>", line 6, in default
# TypeError: Object of type UUID is not JSON serializable

So it is absolutely possible to support both already.

I think the way forward for this proposal is to implement a separate package that provides a factory for, and maybe even a default singleton instance of, such a registry. This way, it can be passed down to either json or simplejson and would be expected to work correctly in either case. You can even provide wrapper methods for loads/dumps that dispatch to whichever library is preferred, to save people the trouble of explicitly passing in the default argument.

I think this would be all that's necessary:

def singledispatchmethod(...):
    ...

@singledispatchmethod
def default(encoder, obj):
    raise TypeError('Object of type %s is not JSON serializable' % obj.__class__.__name__)

import json
try:
    import simplejson
except ImportError:
    pass
else:
    simplejson.JSONEncoder.default = default

json.JSONEncoder.default = default

@etrepum (Member) commented Jan 29, 2018

I think you misunderstand the global registry issue.

Include library A which registers a global default for SomeType (and likely does not expose whatever code it has to register it to a different registry)
Include library B which registers a global default for SomeType (and likely does not expose whatever code it has to register it to a different registry)

Your application needs to pick between library A and library B's custom serialization for SomeType depending on which third party service you're talking to. You're stuck.

@etrepum (Member) commented Jan 29, 2018

This style of code is what's typically referred to as monkeypatching and is not a very common Python idiom, since it's not composable.

json.JSONEncoder.default = default
simplejson.JSONEncoder.default = default

@blag (Author) commented Feb 4, 2018

I think you misunderstand the global registry issue.

This is possible.

Include library A which registers a global default for SomeType (and likely does not expose whatever code it has to register it to a different registry)
Include library B which registers a global default for SomeType (and likely does not expose whatever code it has to register it to a different registry)

At least for this PR, the code does not work that way. Library B would not register an additional hook; it would overwrite the hook that library A registered. There is only ever one handler registered for SomeType objects - subsequent registrations overwrite previous ones.

Your application needs to pick between library A and library B's custom serialization for SomeType depending on which third party service you're talking to. You're stuck.

Nope, you aren't stuck. This would only be an issue if you cannot choose the order in which libraries are imported. For this PR, whichever library was loaded last would overwrite the previous registrations for the same types.

Also, this is easily fixed - just register your own function after library A and library B are both imported. You can guarantee the ordering by:

import libraryA  # Ensure libraryA is imported (subsequent imports are cached in CPython)
import libraryB  # Ensure libraryB is imported
import json

chosen_library = libraryA  # or libraryB

@json.JSONEncoder.default.register(SomeType)
def _(encoder, sometype_obj):
    return chosen_library.encode_json(sometype_obj)

Furthermore, I think this is largely a red herring. This feature is most useful for frameworks, and it's very unlikely users will be using more than one framework at a time.

But, for the small amount of users who may be using both (maybe they're transitioning a Django project to a Flask or Falcon project and for some reason need to encode things differently), they can still force the framework to use a specific encoder by either passing in their own default argument or by subclassing simplejson.JSONEncoder and using that.

This style of code is what's typically referred to as monkeypatching and is not a very common Python idiom, since it's not composable.

That's true, and that's a valid criticism, but all I was illustrating is that users can work around the fact that json and simplejson need a common registry when both are used in a project. I'm not sure how common that is; you would have a better idea of that than I do.

@etrepum (Member) commented Feb 5, 2018

Relying on import order does not solve any problem, nor should it ever be considered an acceptable workaround. Imports are only evaluated once (except for explicit reloads), so it's not possible for one program to choose both behaviors depending on its needs. This is a fictional example, but consider: if you have a Stripe library that wants timestamps serialized one way and a HelloSign library that wants them another, you can't easily implement both webhook endpoints correctly.

Given what I've seen with other languages, I think you're underestimating the problem here. It won't only be used by frameworks; it'll potentially be used by anything that needs to deal in JSON.

I'm going to go ahead and close this; I think the best path forward is to distribute this functionality separately from simplejson. We can consider something more closely integrated after a year or two of proven demand, once best practices are well established.

@etrepum closed this on Feb 5, 2018
@merwok commented Feb 5, 2018

This is an interesting discussion, and goes to the heart of the usability of singledispatch for decentralized serialization. I would suggest writing to the python-ideas mailing list to have a discussion about using singledispatch for the stdlib json module.
