json needs object_pairs_hook #49631

Closed
rhettinger opened this issue Feb 27, 2009 · 15 comments
Labels: stdlib (Python modules in the Lib dir), type-feature (A feature request or enhancement)


@rhettinger
Contributor

BPO 5381
Nosy @rhettinger, @etrepum, @mitsuhiko
Files
  • json_hook.diff: proof-of-concept patch: object_pair_hook()
  • json_hook.diff: pairs hook patch with tests and docs
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/etrepum'
    closed_at = <Date 2009-03-19.19:19:28.327>
    created_at = <Date 2009-02-27.08:37:54.593>
    labels = ['type-feature', 'library']
    title = 'json needs object_pairs_hook'
    updated_at = <Date 2009-03-29.22:37:55.282>
    user = 'https://github.com/rhettinger'

    bugs.python.org fields:

    activity = <Date 2009-03-29.22:37:55.282>
    actor = 'bob.ippolito'
    assignee = 'bob.ippolito'
    closed = True
    closed_date = <Date 2009-03-19.19:19:28.327>
    closer = 'rhettinger'
    components = ['Library (Lib)']
    creation = <Date 2009-02-27.08:37:54.593>
    creator = 'rhettinger'
    dependencies = []
    files = ['13201', '13362']
    hgrepos = []
    issue_num = 5381
    keywords = ['patch']
    message_count = 15.0
    messages = ['82825', '82860', '82864', '82865', '82870', '82872', '82885', '83164', '83165', '83166', '83170', '83733', '83819', '83820', '84441']
    nosy_count = 4.0
    nosy_names = ['rhettinger', 'bob.ippolito', 'aronacher', 'cheeaun']
    pr_nums = []
    priority = 'high'
    resolution = 'accepted'
    stage = None
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue5381'
    versions = ['Python 3.1', 'Python 2.7']

    @rhettinger
    Contributor Author

    If PEP-372 goes through, Python is going to gain an ordered dict soon.

    The json module's encoder works well with it:

    >>> items = [('one', 1), ('two', 2), ('three',3), ('four',4), ('five',5)]
    >>> json.dumps(OrderedDict(items))
    '{"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}'

    But the decoder doesn't fare so well. The existing object_hook for the
    decoder passes in a dictionary instead of a list of pairs. So, all the
    ordering information is lost:

    >>> jtext = '{"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}'
    >>> json.loads(jtext, object_hook=OrderedDict)
    OrderedDict({u'four': 4, u'three': 3, u'five': 5, u'two': 2, u'one': 1})

    A solution is to provide an alternate hook that emits a sequence of
    pairs. If present, that hook should run instead of object_hook. A
    rough proof-of-concept patch is attached.
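A minimal sketch of how the proposed hook behaves, assuming a Python version in which `object_pairs_hook` is available in the `json` module (the issue lists 2.7 and 3.1 as targets). The names `record_dict` and `seen` are illustrative only and do not come from the patch:

```python
import json
from collections import OrderedDict

jtext = '{"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}'

# The hook is called with a list of (key, value) pairs in document order.
pairs = json.loads(jtext, object_pairs_hook=list)

# Passing OrderedDict as the hook rebuilds each object with its key order intact.
ordered = json.loads(jtext, object_pairs_hook=OrderedDict)

# When both hooks are supplied, object_pairs_hook runs instead of object_hook.
seen = []

def record_dict(d):
    # Hypothetical helper: records every dict handed to object_hook.
    seen.append(d)
    return d

both = json.loads(jtext, object_hook=record_dict, object_pairs_hook=OrderedDict)

print(pairs)
print(list(ordered))
print(seen)
```

Because the hook receives pairs rather than a finished dictionary, the caller decides which container to build and no ordering information is discarded along the way.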

    FWIW, sample ordered dict code is at:
    http://code.activestate.com/recipes/576669/

    @rhettinger added the stdlib and type-feature labels Feb 27, 2009
    @etrepum
    Mannequin

    etrepum mannequin commented Feb 27, 2009

    Why? According to the RFC (emphasis mine):

    An object is an *unordered* collection of zero or more name/value
    pairs, where a name is a string and a value is a string, number,
    boolean, null, object, or array.

    @etrepum etrepum mannequin added the invalid label Feb 27, 2009
    @rhettinger
    Contributor Author

    Same reason as for config files and yaml files. Sometimes those files
    represent human-edited input, and if a machine re-edits, filters, or
    copies them, it is nice to keep the original order (though it may make
    no semantic difference to the computer).

    For example, jsonrpc method invocations are done with objects having
    three properties (method, params, id). The machine doesn't care about
    the order of the properties but a human reader prefers the order listed:

    --> {"method": "postMessage", "params": ["Hello all!"], "id": 99}
    <-- {"result": 1, "error": null, "id": 99}

    If you're testing a program that filters json data (like a typical xml
    task), it is nice to write out data in the same order it was received
    (failing to do that is a common complaint about misdesigned xml filters):

    --> [{"title": "awk", "author": "aho", "isbn": "123456789X"},
         {"title": "taocp", "author": "knuth", "isbn": "987654321X"}]
    <-- [{"title": "awk", "author": "aho"},
         {"title": "taocp", "author": "knuth"}]

    Semantically, those entries can be scrambled; however, someone reading
    the filtered result desires that the input and output visually
    correspond as much as possible. An object_pairs_hook makes this possible.

    @rhettinger
    Contributor Author

    FWIW, here's the intended code for the filter in the last post:

        import json
        from collections import OrderedDict

        # infile and outfile are open file objects
        books = json.load(infile, object_pairs_hook=OrderedDict)
        for book in books:
            del book['isbn']
        json.dump(books, outfile)
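A self-contained sketch of that filter, assuming a Python version where `object_pairs_hook` exists. The in-memory `io.StringIO` objects stand in for real files and are an assumption made for illustration; the book records are the ones from the earlier comment:

```python
import io
import json
from collections import OrderedDict

# Stand-ins for real input and output files (illustrative assumption).
infile = io.StringIO(
    '[{"title": "awk", "author": "aho", "isbn": "123456789X"},'
    ' {"title": "taocp", "author": "knuth", "isbn": "987654321X"}]'
)
outfile = io.StringIO()

# Decode each JSON object into an OrderedDict so key order is kept.
books = json.load(infile, object_pairs_hook=OrderedDict)

# Drop the field being filtered out; the remaining keys keep their order.
for book in books:
    del book['isbn']

json.dump(books, outfile)
result = outfile.getvalue()
print(result)
```

The output lists `title` before `author` for every book, so the filtered result lines up visually with the input.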

    @etrepum
    Mannequin

    etrepum mannequin commented Feb 27, 2009

    Fair enough, but the patch isn't usable because the decoder was rewritten
    in a later version of simplejson. There's another issue with a patch to
    backport those changes into Python, http://bugs.python.org/issue4136, or
    you could just use the simplejson source here: http://code.google.com/p/simplejson/

    @etrepum etrepum mannequin removed the invalid label Feb 27, 2009
    @rhettinger
    Contributor Author

    Thanks. I'll write up a patch against
    http://code.google.com/p/simplejson/ and assign it back to you for review.

    @rhettinger rhettinger assigned rhettinger and unassigned etrepum Feb 27, 2009
    @mitsuhiko
    Member

    Motivation:

    Yes, JSON says it's unordered. However, hashes in Ruby have been ordered
    since 1.9, and they have been ordered from the very beginning in
    JavaScript and PHP.

    @rhettinger
    Contributor Author

    After enhancing namedtuple and ConfigParser, I found a simpler approach
    that doesn't involve extending the API. The simple way is to use
    ordered dictionaries directly.

    With a small tweak to OD's repr, it is fully substitutable for a dict
    without changing any client code or doctests (the OD loses its own
    order-preserving eval/repr round trip, but json already provides that now).

    See attached patch.

    @etrepum
    Mannequin

    etrepum mannequin commented Mar 4, 2009

    Unfortunately this is a patch for the old json lib... the new one has a C
    API and an entirely different method of parsing documents (for performance
    reasons).

    @rhettinger
    Contributor Author

    When do you expect the new C version to go in? I'm looking forward to it.

    @etrepum
    Mannequin

    etrepum mannequin commented Mar 5, 2009

    Whenever someone applies the patch for http://bugs.python.org/issue4136 --
    I don't know when that will happen.

    @rhettinger changed the title from "json need object_pairs_hook" to "json needs object_pairs_hook" Mar 5, 2009
    @rhettinger
    Contributor Author

    Bob, would you please take a look at the attached patch?

    @rhettinger rhettinger assigned etrepum and unassigned rhettinger Mar 18, 2009
    @etrepum
    Mannequin

    etrepum mannequin commented Mar 19, 2009

    This patch looks good to me. My only comment is that the patch mixes tabs
    and spaces in the C code in a file that had no tabs previously.

    @rhettinger
    Contributor Author

    Thanks for looking at this.
    Fixed the tab/space issue.
    Committed in r70471

    @etrepum
    Mannequin

    etrepum mannequin commented Mar 29, 2009

    I fixed two problems with this that didn't show up in the test suite:
    the feature didn't work in load(), and there was a problem with the pure
    Python code path because the Python scanner needed a small change.
    Unfortunately, I'm not sure how best to test the pure Python code path
    with Python's test suite, but I ran across it when backporting to
    simplejson.

    r70702
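One possible way to exercise both code paths mentioned above, sketched for a current CPython. It assumes the undocumented helpers `json.decoder.py_scanstring` and `json.scanner.py_make_scanner`, which are internal names and may change. This is not the approach taken in r70702:

```python
import io
import json
import json.decoder
import json.scanner
from collections import OrderedDict

text = '{"one": 1, "two": 2, "three": 3}'

# load() reads from a file-like object and forwards the hook to the decoder.
from_file = json.load(io.StringIO(text), object_pairs_hook=OrderedDict)

# Force the pure Python path on a single decoder instance.
# parse_string must be replaced first, because the scanner captures it
# from the decoder at the moment the scanner is built.
decoder = json.JSONDecoder(object_pairs_hook=OrderedDict)
decoder.parse_string = json.decoder.py_scanstring
decoder.scan_once = json.scanner.py_make_scanner(decoder)
from_pure_python = decoder.decode(text)

print(list(from_file))
print(list(from_pure_python))
```

Swapping the scanner on one decoder instance leaves the module-level functions untouched, so the accelerated and pure Python results can be compared side by side in the same process.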

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022