Added 'default' decoder to ujson.dumps #37

Closed
wants to merge 3 commits into
from

Conversation

Projects
None yet
3 participants

aelse commented Mar 9, 2012

Hi,

This series of patches adds the default option to ujson.dumps.

You're probably already aware that in other libraries the default option allows you to provide a function to have objects decoded by prior to serialization. eg.

>>> import json
>>> class Message(object):
...     msg = ''
...     def __init__(self, msg):
...         self.msg = msg
... 
>>> m = Message('test')
>>> json.dumps(m)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "/usr/lib/python2.7/json/encoder.py", line 201, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python2.7/json/encoder.py", line 264, in iterencode
    return _iterencode(o, 0)
  File "/usr/lib/python2.7/json/encoder.py", line 178, in default
    raise TypeError(repr(o) + " is not JSON serializable")
TypeError: <__main__.Message object at 0xf5b3d0> is not JSON serializable
>>> json.dumps(m, default=lambda x: x.__dict__)
'{"msg": "test"}'

ujson already handles this particular object natively. Programmers who wish to decode their own objects (for whatever reason) do not get the convenience of default with ujson currently. This is more problematic for other libraries which support multiple json implementations and rely on this feature being available. Hence this request originated from work on the ZeroMQ python bindings. zeromq/pyzmq#188

I have included the patch, benchmark and unit tests in separate commits for convenience.

aelse added some commits Mar 9, 2012

Added 'default' object decoder arg to ujson.dumps
Allows providing an alternative function to decode an object
prior to JSON encoding. This is present in other JSON modules
(eg. simplejson, json).
Member

jskorpan commented Mar 9, 2012

Thanks for your contribution and your interest in ultrajson.
I'm deeply aware of that this kind of functionality should be available in ultrajson, I simply haven't found the time to implement it.

Would you be so kind as to add more tests:
What happens if default() throws exception?
What are the contracts of what value and types default() must return and what happens if it doesn't?
Sadly the design of ultrajson is prone to memory leaks if you change types during serialization

I'm also interest in the performance implications of this patch? Did you run the benchmarks?

aelse commented Mar 9, 2012

I can add further tests in a few days when i have time. To answer your questions,

  • The expected behaviour of default is that it should return the decoded object or raise TypeError
  • If default() throws an exception it is passed back up the chain

From help(json.dumps)

``default(obj)`` is a function that should return a serializable version
of obj or raise TypeError. The default simply raises TypeError.

I now believe that i've done the wrong thing in this implementation by always calling the default function. It is only meant to be called if the object isn't otherwise serializable.

From the simplejson code:

    If specified, default is a function that gets called for objects
    that can't otherwise be serialized.  It should return a JSON encodable
    version of the object or raise a ``TypeError``.

The patch needs to be modified to be called only if JSON_EncodeObject returns an exception (perhaps specifically a TypeError), in which case the object must be decoded and JSON_EncodeObject called again with the newly decoded object.

Regarding memory leaks. Correct me if i'm wrong, but the buffers are allocated within JSON_EncodeObject and encode in lib/ultrajsonenc.c. These functions never see anything but the already decoded object output by the default function, so in that sense we are not pulling the rug out from under it.

Obviously this patch needs some more work. If you have any other thoughts please let me know. I'll work on this during the week when I have a chance.

Member

jskorpan commented Mar 9, 2012

Memory leaks would more likely be reference leaks as we may forget to
release references to objects held in the TypeContext if we go and unwind
the stack chain.

//JT

Den 9 mars 2012 15:00 skrev Alexander Else <
reply@reply.github.com

:

I can add further tests in a few days when i have time. To answer your
questions,

  • The expected behaviour of default is that it should return the decoded
    object or raise TypeError
  • If default() throws an exception it is passed back up the chain

From help(json.dumps)

default(obj) is a function that should return a serializable version
of obj or raise TypeError. The default simply raises TypeError.

I now believe that i've done this wrong thing in this implementation by
always calling the default function. It is only meant to be called if the
object isn't otherwise serializable.

From the simplejson code:

   If specified, default is a function that gets called for objects
   that can't otherwise be serialized.  It should return a JSON

encodable
version of the object or raise a TypeError.

The patch needs to be modified to be called only if JSON_EncodeObject
returns an exception (perhaps specifically a TypeError), in which case the
object must be decoded and JSON_EncodeObject called again with the newly
decoded object.

Regarding memory leaks. Correct me if i'm wrong, but the buffers are
allocated within JSON_EncodeObject and encode in
lib/ultrajsonenc.c. These functions never see anything but the already
decoded object output by the default function, so in that sense we are not
pulling the rug out from under it.

Obviously this patch needs some more work. If you have any other thoughts
please let me know. I'll work on this during the week when I have a chance.


Reply to this email directly or view it on GitHub:
#37 (comment)

Jonas Trnstrm
Product Manager
e-mail: jonas.tarnstrom@esn.me
skype: full name "Jonas Trnstrm"
phone: +46 (0)734 231 552

ESN Social Software AB
www.esn.me

@jskorpan jskorpan closed this Apr 20, 2012

craigds commented May 24, 2012

What was the rationale for closing this? It looks like a really useful addition to ultrajson, without this it's difficult to use as a drop-in replacement for other json libs. Could the issues with memory leaks be resolved?

Member

jskorpan commented May 24, 2012

Probably,
The best solution how ever would be that the encoder step properly
supported this better then it does today. This would require some
refactoring which I'm currently not able to devote time to, sorry.

I'm in complete agreement that the feature would indeed be valuable.

//JT

2012/5/24 Craig de Stigter <
reply@reply.github.com

What was the rationale for closing this? It looks like a really useful
addition to ultrajson, without this it's difficult to use as a drop-in
replacement for other json libs. Could the issues with memory leaks be
resolved?


Reply to this email directly or view it on GitHub:
#37 (comment)

Jonas Tärnström
Product Manager
• e-mail: jonas.tarnstrom@esn.me
• skype: full name "Jonas Tärnström"
• phone: +46 (0)734 231 552

ESN Social Software AB
www.esn.me

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment