Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: v4/nbjson.py: json.loads(object_pairs_hook=collections.OrderedDict) #30

Closed
wants to merge 2 commits into from

Conversation

westurner
Copy link
Contributor

  • cross-link w/ nbformat changes, ipython/jupyter version requirements & changelog
  • PLS ("please test")
    • TODO: enqueue a Travis-CI build of notebook

@westurner
Copy link
Contributor Author

See also: ipython/ipython_genutils#4 "ENH,PRF: ipstruct.py: class Struct(collections.OrderedDict or dict)"

@@ -26,6 +27,7 @@ class JSONReader(NotebookReader):

def reads(self, s, **kwargs):
"""Read a JSON string into a Notebook object"""
kwargs['object_pairs_hook'] = collections.OrderedDict() # >= py2.7
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BUG: should be collections.OrderedDict

@minrk
Copy link
Member

minrk commented Dec 13, 2015

Can you describe the benefit of using OrderedDict in the Python API as opposed to a regular dict?

@westurner
Copy link
Contributor Author

Can you describe the benefit of using OrderedDict in the Python API as opposed to a regular dict?

  • An OrderedDict is an Ordered Map.
  • An OrderedMap is order-preseving (and therein identity-preserving).
  • "Read nbformat v4 notebooks JSON in order; to preserving nb cell ordering"
  • By specifying object_pairs_hook=collections.OrderedDict, the .ipynb cells should be read into memory in the same order in which they are stored.
    • [edit] By 'read into memory', here, I mean, read-out-able in [~deduplicated] FIFO insertion order
      • By specifying object_pairs_hook=collections.OrderedDict, the .ipynb JSON data as read into nbformat.notebooknode.NotebookNode should then be iterated over in source order; so that .ipynb on-disk output is more sensibly diff-able.

@westurner
Copy link
Contributor Author

Can you describe the benefit of using OrderedDict in the Python API as opposed to a regular dict?

There may be (downstream?) test errors for string comparisons, as collections.OrderedDict.__repr__ is not 1:1 compatible with dict.__repr__ (or pprint.pformat, or json.dumps(indent=) with a json.JSONEncoder subclass.)

@westurner
Copy link
Contributor Author

json.load(*, **, object_pairs_hook=collections.OrderedDict)

json.JSONDecoder(*, **, object_pairs_hook=collections.OrderedDict)

@minrk
Copy link
Member

minrk commented Dec 13, 2015

I see, it's so that a dict-diff will see keys in the file-order. However, the serialized order of keys shouldn't be relevant to the diff, since it isn't relevant to the format. A notebook-diff shouldn't be sensitive to the key ordering on-disk. If an nbdiff algorithm produces different results for OrderedDict and dict, it's not getting the correct result.

@westurner
Copy link
Contributor Author

I see, it's so that a dict-diff will see keys in the file-order.

It also reduces noise from PYTHONHASHSEED in dict.__repr__ output.

However, the serialized order of keys shouldn't be relevant to the diff, since it isn't relevant to the format.

There is a presentation order and a [numerical/collated] key order for each node.

This change should not change the [numerical/collated] key order.

A notebook-diff shouldn't be sensitive to the key ordering on-disk. If an nbdiff algorithm produces different results for OrderedDict and dict, it's not getting the correct result.

It sounds like what you are suggesting is that the interface/contract of e.g. INotebookNode.iteritems does not / has not specified order preservation (or thus, identity)?

@westurner
Copy link
Contributor Author

For convenience only, v4 -> v5 could then later call a transform function to sort by key so that the presentation order and key order are the same.

@takluyver
Copy link
Member

takluyver commented Dec 13, 2015 via email

@westurner
Copy link
Contributor Author

@takluyver Don't we write notebooks with the flag to ensure keys are sorted? If we
don't already, I think that's a simple solution to ensuring the on disk
format is stable.

If so, I'm not sure where that is [ipython_genutils.ipstruct.Struct is not accepting new code: https://github.com/ipython/ipython_genutils/pull/4 ]. It would seem that preserving on-disk ordering with an OrderedDict linked list would be more optimal.

  • v4: presentation order != key order
  • v5: presentation order == key order

@takluyver
Copy link
Member

takluyver commented Dec 13, 2015 via email

@westurner
Copy link
Contributor Author

Yes, we do. The flag is sort_keys, and it's in nbformat.v4.nbjson. I don't
see any need to preserve key order in memory, since the order has no
significance.

So, with json.load(object_pairs_hook=collections.OrderedDict), json.dump(sort_keys=True) would then be sorting an already sorted list, wouldn't that be better?

@takluyver
Copy link
Member

It sounds superfluous. Dictionaries are unordered collections. There's no need for these dictionaries to have a particular order. Tools that want to check them should:

a) Do so in a way that doesn't use key order - which is easy, because a straightforward a == b equality check in Python will work.
b) If they really need to compare serialised data, ensure that the serialised form is appropriately normalised, e.g. by using the sort_keys parameter when dumping json, or by using a pretty printer that sorts on keys.

If this is about doctests: it's a general problem with any doctests that display dictionaries. Doctests are quite brittle and annoying. I don't see a need for us to work around that specifically for notebooks by modifying nbformat.

@minrk
Copy link
Member

minrk commented Dec 14, 2015

Sorted order with OrderedDict only works for read+write with no modifications. If the object is ever modified (e.g. new keys added) in-memory, then the sorting would be violated, so we will always want to write with sort_keys=True, whether we adopt OrderedDict in memory or not. Plus, in the most common case of the notebook web-application, roundtrip includes sending the notebook to Javascript and back, so we aren't guaranteed (nor should we be) that the ordering coming from the Javascript client is the same as the one we saved in the filesystem.

@takluyver
Copy link
Member

I'm closing this for now as we don't see a need for it. We can reopen if we change our minds, but as @minrk and I seem to be saying the same thing, that seems unlikely. Sorry @westurner!

@takluyver takluyver closed this Dec 14, 2015
@westurner
Copy link
Contributor Author

To clarify, you don't see a need to not modify the document every time it
is read?

You don't see a need to maintain insertion order (so that sort_keys is
sorting the original key order)?

Do you need a performance benchmark justification here?

The more optimal approach is to build an ordered nested set, and maintain
that property all of the way through the application.
On Dec 14, 2015 4:42 AM, "Thomas Kluyver" notifications@github.com wrote:

I'm closing this for now as we don't see a need for it. We can reopen if
we change our minds, but as @minrk https://github.com/minrk and I seem
to be saying the same thing, that seems unlikely. Sorry @westurner
https://github.com/westurner!


Reply to this email directly or view it on GitHub
#30 (comment).

@westurner
Copy link
Contributor Author

Currently, the application is effectively unsorting the notebook JSON on
every read.
On Dec 14, 2015 11:20 AM, "Wes Turner" wes.turner@gmail.com wrote:

To clarify, you don't see a need to not modify the document every time it
is read?

You don't see a need to maintain insertion order (so that sort_keys is
sorting the original key order)?

Do you need a performance benchmark justification here?

The more optimal approach is to build an ordered nested set, and maintain
that property all of the way through the application.
On Dec 14, 2015 4:42 AM, "Thomas Kluyver" notifications@github.com
wrote:

I'm closing this for now as we don't see a need for it. We can reopen if
we change our minds, but as @minrk https://github.com/minrk and I seem
to be saying the same thing, that seems unlikely. Sorry @westurner
https://github.com/westurner!


Reply to this email directly or view it on GitHub
#30 (comment).

@takluyver
Copy link
Member

I'm not quite sure what you mean. We read JSON objects to dictionaries, which are conceptually unordered. We have no need to preserve order in memory, so why would we?

There's no performance gain to be had from using ordered dictionaries - I've just tried it:

In [26]: %timeit json.dumps(ordered, sort_keys=True)
10000 loops, best of 3: 124 µs per loop

In [27]: %timeit json.dumps(regular, sort_keys=True)
10000 loops, best of 3: 64.8 µs per loop

I suspect that it's mostly because OrderedDict is implemented largely in Python, so there's more overhead in manipulating it. But even if it wasn't, these dictionaries typically have 5-10 keys each, so any performance benefit of storing their order to avoid the need to sort is going to be inconsequentially tiny.

@westurner
Copy link
Contributor Author

]Where[ can I add JSONLD metadata without losing the key sequence {e.g.
@id, @type, name, [...]}?
On Dec 14, 2015 11:54 AM, "Thomas Kluyver" notifications@github.com wrote:

I'm not quite sure what you mean. We read JSON objects to dictionaries,
which are conceptually unordered. We have no need to preserve order in
memory, so why would we?

There's no performance gain to be had from using ordered dictionaries -
I've just tried it:

In [26]: %timeit json.dumps(ordered, sort_keys=True)
10000 loops, best of 3: 124 µs per loop

In [27]: %timeit json.dumps(regular, sort_keys=True)
10000 loops, best of 3: 64.8 µs per loop

I suspect that it's mostly because OrderedDict is implemented largely in
Python, so there's more overhead in manipulating it. But even if it wasn't,
these dictionaries typically have 5-10 keys each, so any performance
benefit of storing their order to avoid the need to sort is going to be
inconsequentially tiny.


Reply to this email directly or view it on GitHub
#30 (comment).

@takluyver
Copy link
Member

You can put arbitrary fields in notebook.metadata and notebook.cells[N].metadata (preferably with some namespacing to avoid collisions, in either case). You can't preserve the order of keys, because that's not meaningful in JSON (json.org: "An object is an unordered set of name/value pairs."). Put the keys in there and don't worry about the order!

@westurner
Copy link
Contributor Author

westurner commented Dec 15, 2015

On Dec 14, 2015 4:44 PM, "Thomas Kluyver" notifications@github.com wrote:

You can put arbitrary fields in notebook.metadata and
notebook.cells[N].metadata (preferably with some namespacing to avoid
collisions, in either case). You can't preserve the order of keys, because
that's not meaningful in JSON (json.org: "An object is an unordered set of
name/value pairs."). Put the keys in there and don't worry about the order!

The order of attributes on disk is meaningful. There is no reason to sort
user-provided metadata every time. Why would you add noise (by failing to
specify object_pairs_hook on read and inappropriately sorting with
sort_keys at every write) where it isn't needed

I just read about the ES6 map, which is also an ordered set.


Reply to this email directly or view it on GitHub.

@minrk
Copy link
Member

minrk commented Dec 15, 2015

you don't see a need to not modify the document every time it is read?

Using OrderedDict doesn't provide this either, because notebooks are typically round-tripped to the browser, in which case the OrderedDict doesn't survive the lifetime of the notebook from read to write.

The order of attributes on disk is meaningful. There is no reason to sort user-provided metadata every time.

Not to the content of the notebook, but they are to things like version control. That's why we sort, to ensure that whenever nbformat writes an equivalent notebook, it does it in the same way. Relying on OrderedDict for ordering instead of sort-on-write actually makes this impossible, because it makes the ordering sensitive to how a notebook is constructed, rather than only the content itself.

Take, for example:

nb1.metadata.key1 = 'value'
nb1.metadata.key2 = 'value'

nb2.metadata.key2 = 'value'
nb2.metadata.key1 = 'value'

In this case, nb1 and nb2 have identical content. If we write them with sort_keys=True, they will be identical on-disk. If, instead, we rely on OrderedDict for key ordering, they will not be equivalent on-disk, so even with OrderedDict, we would still want to use sort_keys=True on write.

@westurner
Copy link
Contributor Author

westurner commented Dec 15, 2015

On Dec 15, 2015 3:25 AM, "Min RK" notifications@github.com wrote:

you don't see a need to not modify the document every time it is read?

Using OrderedDict doesn't provide this either, because notebooks are
typically round-tripped to the browser, in which case the OrderedDict
doesn't survive the lifetime of the notebook from read to write.

So this isn't a partial solution because it already losses the sort at
least twice?

The order of attributes on disk is meaningful. There is no reason to
sort user-provided metadata every time.

Not to the content of the notebook, but they are to things like version
control. That's why we sort, to ensure that whenever nbformat writes an
equivalent notebook, it does it in the same way. Relying on OrderedDict for
ordering instead of sort-on-write actually makes this impossible, because
it makes the ordering sensitive to how a notebook is constructed, rather
than only the content itself.

Take, for example:

nb1.metadata.key1 = 'value'
nb1.metadata.key2 = 'value'

nb2.metadata.key2 = 'value'
nb2.metadata.key1 = 'value'

In this case, nb1 and nb2 have identical content. If we write them with
sort_keys=True, they will be identical on-disk. If, instead, we rely on
OrderedDict for key ordering, they will not be equivalent on-disk, so even
with OrderedDict, we would still want to use sort_keys=True on write.

In ["metadata"], with user supplied metadata, how do I preserve
source-order key ordering?


Reply to this email directly or view it on GitHub.

@westurner
Copy link
Contributor Author

westurner commented Dec 15, 2015

On Dec 15, 2015 3:37 AM, "Wes Turner" wes.turner@gmail.com wrote:

On Dec 15, 2015 3:25 AM, "Min RK" notifications@github.com wrote:

you don't see a need to not modify the document every time it is read?

Using OrderedDict doesn't provide this either, because notebooks are
typically round-tripped to the browser, in which case the OrderedDict
doesn't survive the lifetime of the notebook from read to write.

There must be a shim for the new ES6 Map.

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Map

A Map object iterates its elements in insertion order — a for...of loop
returns an array of [key, value] for each iteration.

And then post-sort, again.

So this isn't a partial solution because it already losses the sort at
least twice?

The order of attributes on disk is meaningful. There is no reason to
sort user-provided metadata every time.

Not to the content of the notebook, but they are to things like version
control. That's why we sort, to ensure that whenever nbformat writes an
equivalent notebook, it does it in the same way. Relying on OrderedDict for
ordering instead of sort-on-write actually makes this impossible, because
it makes the ordering sensitive to how a notebook is constructed, rather
than only the content itself.

Take, for example:

nb1.metadata.key1 = 'value'
nb1.metadata.key2 = 'value'

nb2.metadata.key2 = 'value'
nb2.metadata.key1 = 'value'

In this case, nb1 and nb2 have identical content. If we write them with
sort_keys=True, they will be identical on-disk. If, instead, we rely on
OrderedDict for key ordering, they will not be equivalent on-disk, so even
with OrderedDict, we would still want to use sort_keys=True on write.

In ["metadata"], with user supplied metadata, how do I preserve
source-order key ordering?


Reply to this email directly or view it on GitHub.

@westurner
Copy link
Contributor Author

In ["metadata"], with user supplied metadata, how do I preserve
source-order key ordering?

For example, what does json.dumps(sort_keys=True) do with this document:
http://pypi.python.org/pypi/notebook/json :

{
    "info": {
        "maintainer": "",
        "docs_url": null,
        "requires_python": "",
        "maintainer_email": "",
        "cheesecake_code_kwalitee_id": null,
        "keywords": "Interactive,Interpreter,Shell,Web",
        "package_url": "http://pypi.python.org/pypi/notebook",
        "author": "Jupyter Development Team",
        "author_email": "jupyter@googlegroups.com",
        "download_url": "",
        "platform": "Linux,Mac OS X,Windows",
        "version": "4.1.0b1",
        "cheesecake_documentation_id": null,
        "_pypi_hidden": true,
        "description": "The Jupyter Notebook is a web application that
allows you to create and\nshare documents that contain live code,
equations, visualizations, and\nexplanatory text. The Notebook has support
for multiple programming\nlanguages, sharing, and interactive
widgets.\n\nRead `the documentation <
https://jupyter-notebook.readthedocs.org>`_\nfor more information.",
        "release_url": "http://pypi.python.org/pypi/notebook/4.1.0b1",
        "downloads": {
            "last_month": 77566,
            "last_week": 13631,
            "last_day": 2147
        },
        "_pypi_ordering": 7,
        "requires_dist": [
            "mock; python_version == \"2.7\" and extra == 'test'",
            "requests; extra == 'test'",
            "nose; extra == 'test'",
            "Sphinx (>=1.1); extra == 'doc'",
            "terminado (>=0.3.3); sys_platform != \"win32\"",
            "traitlets",
            "tornado (>=4)",
            "nbformat",
            "nbconvert",
            "jupyter-core",
            "jupyter-client",
            "jinja2",
            "ipython-genutils",
            "ipykernel"
        ],
        "classifiers": [
            "Intended Audience :: Developers",
            "Intended Audience :: Science/Research",
            "Intended Audience :: System Administrators",
            "License :: OSI Approved :: BSD License",
            "Programming Language :: Python",
            "Programming Language :: Python :: 2.7",
            "Programming Language :: Python :: 3"
        ],
        "name": "notebook",
        "bugtrack_url": "",
        "license": "BSD",
        "summary": "A web-based notebook environment for interactive
computing",
        "home_page": "http://jupyter.org",
        "cheesecake_installability_id": null
    },
    "releases": {
        "4.0.1": [
            {
                "has_sig": false,
                "upload_time": "2015-07-30T21:51:27",
                "comment_text": "",
                "python_version": "2.7",
                "url": "
https://pypi.python.org/packages/2.7/n/notebook/notebook-4.0.1-py2.py3-none-any.whl",

                "md5_digest": "bf361e35af890ae871da2236585cb699",
                "downloads": 8422,
                "filename": "notebook-4.0.1-py2.py3-none-any.whl",
                "packagetype": "bdist_wheel",
                "size": 5503563
            },
            {
                "has_sig": false,
                "upload_time": "2015-07-30T21:50:09",
                "comment_text": "",
                "python_version": "source",
                "url": "
https://pypi.python.org/packages/source/n/notebook/notebook-4.0.1.tar.gz",
                "md5_digest": "a3c188dae35dbb0e6b78d53a9028a40e",
                "downloads": 950,
                "filename": "notebook-4.0.1.tar.gz",
                "packagetype": "sdist",
                "size": 6676955
            },
            {
                "has_sig": false,
                "upload_time": "2015-07-30T21:50:01",
                "comment_text": "",
                "python_version": "source",
                "url": "
https://pypi.python.org/packages/source/n/notebook/notebook-4.0.1.zip",
                "md5_digest": "4508a52bcfa4d5bfa1c69d2d3d9431d2",
                "downloads": 704,
                "filename": "notebook-4.0.1.zip",
                "packagetype": "sdist",
                "size": 7309674
            }
        ],
        "4.0.0": [
            {
                "has_sig": false,
                "upload_time": "2015-07-30T21:45:08",
                "comment_text": "",
                "python_version": "source",
                "url": "
https://pypi.python.org/packages/source/n/notebook/notebook-4.0.0.tar.gz",
                "md5_digest": "dd20afbb3dac490ffc82ed2a929b9fe6",
                "downloads": 517,
                "filename": "notebook-4.0.0.tar.gz",
                "packagetype": "sdist",
                "size": 6676908
            },
            {
                "has_sig": false,
                "upload_time": "2015-07-30T21:46:50",
                "comment_text": "",
                "python_version": "source",
                "url": "
https://pypi.python.org/packages/source/n/notebook/notebook-4.0.0.zip",
                "md5_digest": "cc2ec89ef65d82792adcab63f0300a8d",
                "downloads": 497,
                "filename": "notebook-4.0.0.zip",
                "packagetype": "sdist",
                "size": 7309672
            }
        ],
        "4.0.2": [
            {
                "has_sig": false,
                "upload_time": "2015-08-14T22:34:03",
                "comment_text": "",
                "python_version": "3.4",
                "url": "
https://pypi.python.org/packages/3.4/n/notebook/notebook-4.0.2-py2.py3-none-any.whl",

                "md5_digest": "481a3f65c0dd6149e735ca5a6be1bbe9",
                "downloads": 3217,
                "filename": "notebook-4.0.2-py2.py3-none-any.whl",
                "packagetype": "bdist_wheel",
                "size": 5508821
            },
            {
                "has_sig": false,
                "upload_time": "2015-08-14T22:22:35",
                "comment_text": "",
                "python_version": "source",
                "url": "
https://pypi.python.org/packages/source/n/notebook/notebook-4.0.2.tar.gz",
                "md5_digest": "92e0caeced4ccb8873e51539c9b7796a",
                "downloads": 612,
                "filename": "notebook-4.0.2.tar.gz",
                "packagetype": "sdist",
                "size": 6681345
            },
            {
                "has_sig": false,
                "upload_time": "2015-08-14T22:22:51",
                "comment_text": "",
                "python_version": "source",
                "url": "
https://pypi.python.org/packages/source/n/notebook/notebook-4.0.2.zip",
                "md5_digest": "8c09ff0830cf96f06d1190f8b1bf4b51",
                "downloads": 1979,
                "filename": "notebook-4.0.2.zip",
                "packagetype": "sdist",
                "size": 7315190
            }
        ],
        "4.0.5": [
            {
                "has_sig": false,
                "upload_time": "2015-09-22T07:59:45",
                "comment_text": "",
                "python_version": "3.5",
                "url": "
https://pypi.python.org/packages/3.5/n/notebook/notebook-4.0.5-py2.py3-none-any.whl",

                "md5_digest": "4394b6785b37db01878cd8d2b2214dfb",
                "downloads": 39214,
                "filename": "notebook-4.0.5-py2.py3-none-any.whl",
                "packagetype": "bdist_wheel",
                "size": 5513308
            },
            {
                "has_sig": false,
                "upload_time": "2015-09-23T10:20:17",
                "comment_text": "",
                "python_version": "source",
                "url": "
https://pypi.python.org/packages/source/n/notebook/notebook-4.0.5.ZIP",
                "md5_digest": "14f0b5578b0d36faea7ed61d4b3d2d47",
                "downloads": 448,
                "filename": "notebook-4.0.5.ZIP",
                "packagetype": "sdist",
                "size": 7320539
            },
            {
                "has_sig": false,
                "upload_time": "2015-09-23T10:18:55",
                "comment_text": "",
                "python_version": "source",
                "url": "
https://pypi.python.org/packages/source/n/notebook/notebook-4.0.5.tgz",
                "md5_digest": "efc9563674c099b1868f024a8fc75948",
                "downloads": 2066,
                "filename": "notebook-4.0.5.tgz",
                "packagetype": "sdist",
                "size": 6710606
            }
        ],
        "4.0.4": [
            {
                "has_sig": false,
                "upload_time": "2015-08-21T04:14:59",
                "comment_text": "",
                "python_version": "3.4",
                "url": "
https://pypi.python.org/packages/3.4/n/notebook/notebook-4.0.4-py2.py3-none-any.whl",

                "md5_digest": "7c6e37ae43c72a4935abe65ced661b09",
                "downloads": 46583,
                "filename": "notebook-4.0.4-py2.py3-none-any.whl",
                "packagetype": "bdist_wheel",
                "size": 5511709
            },
            {
                "has_sig": false,
                "upload_time": "2015-08-21T04:13:45",
                "comment_text": "",
                "python_version": "source",
                "url": "
https://pypi.python.org/packages/source/n/notebook/notebook-4.0.4.tar.gz",
                "md5_digest": "91d6d780832f1bec7545c66e367d34f3",
                "downloads": 2243,
                "filename": "notebook-4.0.4.tar.gz",
                "packagetype": "sdist",
                "size": 6682601
            },
            {
                "has_sig": false,
                "upload_time": "2015-08-21T04:13:58",
                "comment_text": "",
                "python_version": "source",
                "url": "
https://pypi.python.org/packages/source/n/notebook/notebook-4.0.4.zip",
                "md5_digest": "e98ab81a972289b777569f9b59b60f02",
                "downloads": 874,
                "filename": "notebook-4.0.4.zip",
                "packagetype": "sdist",
                "size": 7318049
            }
        ],
        "4.0.6": [
            {
                "has_sig": false,
                "upload_time": "2015-10-09T12:14:03",
                "comment_text": "",
                "python_version": "2.7",
                "url": "
https://pypi.python.org/packages/2.7/n/notebook/notebook-4.0.6-py2.py3-none-any.whl",

                "md5_digest": "2d9fab4a9b4e0cbf692fb42b917c71c8",
                "downloads": 168438,
                "filename": "notebook-4.0.6-py2.py3-none-any.whl",
                "packagetype": "bdist_wheel",
                "size": 5551276
            },
            {
                "has_sig": false,
                "upload_time": "2015-10-09T12:12:49",
                "comment_text": "",
                "python_version": "source",
                "url": "
https://pypi.python.org/packages/source/n/notebook/notebook-4.0.6.tar.gz",
                "md5_digest": "81477df381599455ef7529c5d6a18439",
                "downloads": 4976,
                "filename": "notebook-4.0.6.tar.gz",
                "packagetype": "sdist",
                "size": 6705277
            },
            {
                "has_sig": false,
                "upload_time": "2015-10-09T12:12:58",
                "comment_text": "",
                "python_version": "source",
                "url": "
https://pypi.python.org/packages/source/n/notebook/notebook-4.0.6.zip",
                "md5_digest": "cadc5189d77f6081124b96e154e4d687",
                "downloads": 1617,
                "filename": "notebook-4.0.6.zip",
                "packagetype": "sdist",
                "size": 7358080
            }
        ],
        "4.1.0b1": [
            {
                "has_sig": false,
                "upload_time": "2015-12-11T18:13:38",
                "comment_text": "",
                "python_version": "py2.py3",
                "url": "
https://pypi.python.org/packages/py2.py3/n/notebook/notebook-4.1.0b1-py2.py3-none-any.whl",

                "md5_digest": "65a3713f2ad929e7f5fa144506a8c417",
                "downloads": 44,
                "filename": "notebook-4.1.0b1-py2.py3-none-any.whl",
                "packagetype": "bdist_wheel",
                "size": 5657135
            }
        ],
        "0.0.0": []
    },
    "urls": [
        {
            "has_sig": false,
            "upload_time": "2015-12-11T18:13:38",
            "comment_text": "",
            "python_version": "py2.py3",
            "url": "
https://pypi.python.org/packages/py2.py3/n/notebook/notebook-4.1.0b1-py2.py3-none-any.whl",

            "md5_digest": "65a3713f2ad929e7f5fa144506a8c417",
            "downloads": 44,
            "filename": "notebook-4.1.0b1-py2.py3-none-any.whl",
            "packagetype": "bdist_wheel",
            "size": 5657135
        }
    ]
}

Really, sorting all of this shouldn't be necessary.
Because not maintaining key order breaks the identity function, if I was to
try and compare this after storage in ["metadata"], a line-based diff would
fail,
Because json.loads without ordered_pairs_hook=collections.OrderedDict is a
destructive operation with unnecessarily noisy side effects
Which almost certainly are not a required property of any downstream
application.

@westurner
Copy link
Contributor Author

Re: OrderedDict and OrderedDefaultDict functional equivalency and
performance (PRF):

See Also:

In ["metadata"], with user supplied metadata, how do I preserve
source-order key ordering?

For example, what does json.dumps(sort_keys=True) do with this document:
http://pypi.python.org/pypi/notebook/json :

{
    "info": {
        "maintainer": "",
        "docs_url": null,
        "requires_python": "",
        "maintainer_email": "",
        "cheesecake_code_kwalitee_id": null,
        "keywords": "Interactive,Interpreter,Shell,Web",
        "package_url": "http://pypi.python.org/pypi/notebook",
        "author": "Jupyter Development Team",
        "author_email": "jupyter@googlegroups.com",
        "download_url": "",
        "platform": "Linux,Mac OS X,Windows",
        "version": "4.1.0b1",
        "cheesecake_documentation_id": null,
        "_pypi_hidden": true,
        "description": "The Jupyter Notebook is a web application that
allows you to create and\nshare documents that contain live code,
equations, visualizations, and\nexplanatory text. The Notebook has support
for multiple programming\nlanguages, sharing, and interactive
widgets.\n\nRead `the documentation <
https://jupyter-notebook.readthedocs.org>`_\nfor more information.",
        "release_url": "http://pypi.python.org/pypi/notebook/4.1.0b1",
        "downloads": {
            "last_month": 77566,
            "last_week": 13631,
            "last_day": 2147
        },
        "_pypi_ordering": 7,
        "requires_dist": [
            "mock; python_version == \"2.7\" and extra == 'test'",
            "requests; extra == 'test'",
            "nose; extra == 'test'",
            "Sphinx (>=1.1); extra == 'doc'",
            "terminado (>=0.3.3); sys_platform != \"win32\"",
            "traitlets",
            "tornado (>=4)",
            "nbformat",
            "nbconvert",
            "jupyter-core",
            "jupyter-client",
            "jinja2",
            "ipython-genutils",
            "ipykernel"
        ],
        "classifiers": [
            "Intended Audience :: Developers",
            "Intended Audience :: Science/Research",
            "Intended Audience :: System Administrators",
            "License :: OSI Approved :: BSD License",
            "Programming Language :: Python",
            "Programming Language :: Python :: 2.7",
            "Programming Language :: Python :: 3"
        ],
        "name": "notebook",
        "bugtrack_url": "",
        "license": "BSD",
        "summary": "A web-based notebook environment for interactive
computing",
        "home_page": "http://jupyter.org",
        "cheesecake_installability_id": null
    },
    "releases": {
        "4.0.1": [
            {
                "has_sig": false,
                "upload_time": "2015-07-30T21:51:27",
                "comment_text": "",
                "python_version": "2.7",
                "url": "
https://pypi.python.org/packages/2.7/n/notebook/notebook-4.0.1-py2.py3-none-any.whl",

                "md5_digest": "bf361e35af890ae871da2236585cb699",
                "downloads": 8422,
                "filename": "notebook-4.0.1-py2.py3-none-any.whl",
                "packagetype": "bdist_wheel",
                "size": 5503563
            },
            {
                "has_sig": false,
                "upload_time": "2015-07-30T21:50:09",
                "comment_text": "",
                "python_version": "source",
                "url": "
https://pypi.python.org/packages/source/n/notebook/notebook-4.0.1.tar.gz",

                "md5_digest": "a3c188dae35dbb0e6b78d53a9028a40e",
                "downloads": 950,
                "filename": "notebook-4.0.1.tar.gz",
                "packagetype": "sdist",
                "size": 6676955
            },
            {
                "has_sig": false,
                "upload_time": "2015-07-30T21:50:01",
                "comment_text": "",
                "python_version": "source",
                "url": "
https://pypi.python.org/packages/source/n/notebook/notebook-4.0.1.zip",
                "md5_digest": "4508a52bcfa4d5bfa1c69d2d3d9431d2",
                "downloads": 704,
                "filename": "notebook-4.0.1.zip",
                "packagetype": "sdist",
                "size": 7309674
            }
        ],
        "4.0.0": [
            {
                "has_sig": false,
                "upload_time": "2015-07-30T21:45:08",
                "comment_text": "",
                "python_version": "source",
                "url": "
https://pypi.python.org/packages/source/n/notebook/notebook-4.0.0.tar.gz",

                "md5_digest": "dd20afbb3dac490ffc82ed2a929b9fe6",
                "downloads": 517,
                "filename": "notebook-4.0.0.tar.gz",
                "packagetype": "sdist",
                "size": 6676908
            },
            {
                "has_sig": false,
                "upload_time": "2015-07-30T21:46:50",
                "comment_text": "",
                "python_version": "source",
                "url": "
https://pypi.python.org/packages/source/n/notebook/notebook-4.0.0.zip",
                "md5_digest": "cc2ec89ef65d82792adcab63f0300a8d",
                "downloads": 497,
                "filename": "notebook-4.0.0.zip",
                "packagetype": "sdist",
                "size": 7309672
            }
        ],
        "4.0.2": [
            {
                "has_sig": false,
                "upload_time": "2015-08-14T22:34:03",
                "comment_text": "",
                "python_version": "3.4",
                "url": "
https://pypi.python.org/packages/3.4/n/notebook/notebook-4.0.2-py2.py3-none-any.whl",

                "md5_digest": "481a3f65c0dd6149e735ca5a6be1bbe9",
                "downloads": 3217,
                "filename": "notebook-4.0.2-py2.py3-none-any.whl",
                "packagetype": "bdist_wheel",
                "size": 5508821
            },
            {
                "has_sig": false,
                "upload_time": "2015-08-14T22:22:35",
                "comment_text": "",
                "python_version": "source",
                "url": "
https://pypi.python.org/packages/source/n/notebook/notebook-4.0.2.tar.gz",

                "md5_digest": "92e0caeced4ccb8873e51539c9b7796a",
                "downloads": 612,
                "filename": "notebook-4.0.2.tar.gz",
                "packagetype": "sdist",
                "size": 6681345
            },
            {
                "has_sig": false,
                "upload_time": "2015-08-14T22:22:51",
                "comment_text": "",
                "python_version": "source",
                "url": "
https://pypi.python.org/packages/source/n/notebook/notebook-4.0.2.zip",
                "md5_digest": "8c09ff0830cf96f06d1190f8b1bf4b51",
                "downloads": 1979,
                "filename": "notebook-4.0.2.zip",
                "packagetype": "sdist",
                "size": 7315190
            }
        ],
        "4.0.5": [
            {
                "has_sig": false,
                "upload_time": "2015-09-22T07:59:45",
                "comment_text": "",
                "python_version": "3.5",
                "url": "
https://pypi.python.org/packages/3.5/n/notebook/notebook-4.0.5-py2.py3-none-any.whl",

                "md5_digest": "4394b6785b37db01878cd8d2b2214dfb",
                "downloads": 39214,
                "filename": "notebook-4.0.5-py2.py3-none-any.whl",
                "packagetype": "bdist_wheel",
                "size": 5513308
            },
            {
                "has_sig": false,
                "upload_time": "2015-09-23T10:20:17",
                "comment_text": "",
                "python_version": "source",
                "url": "
https://pypi.python.org/packages/source/n/notebook/notebook-4.0.5.ZIP",
                "md5_digest": "14f0b5578b0d36faea7ed61d4b3d2d47",
                "downloads": 448,
                "filename": "notebook-4.0.5.ZIP",
                "packagetype": "sdist",
                "size": 7320539
            },
            {
                "has_sig": false,
                "upload_time": "2015-09-23T10:18:55",
                "comment_text": "",
                "python_version": "source",
                "url": "
https://pypi.python.org/packages/source/n/notebook/notebook-4.0.5.tgz",
                "md5_digest": "efc9563674c099b1868f024a8fc75948",
                "downloads": 2066,
                "filename": "notebook-4.0.5.tgz",
                "packagetype": "sdist",
                "size": 6710606
            }
        ],
        "4.0.4": [
            {
                "has_sig": false,
                "upload_time": "2015-08-21T04:14:59",
                "comment_text": "",
                "python_version": "3.4",
                "url": "
https://pypi.python.org/packages/3.4/n/notebook/notebook-4.0.4-py2.py3-none-any.whl",

                "md5_digest": "7c6e37ae43c72a4935abe65ced661b09",
                "downloads": 46583,
                "filename": "notebook-4.0.4-py2.py3-none-any.whl",
                "packagetype": "bdist_wheel",
                "size": 5511709
            },
            {
                "has_sig": false,
                "upload_time": "2015-08-21T04:13:45",
                "comment_text": "",
                "python_version": "source",
                "url": "
https://pypi.python.org/packages/source/n/notebook/notebook-4.0.4.tar.gz",

                "md5_digest": "91d6d780832f1bec7545c66e367d34f3",
                "downloads": 2243,
                "filename": "notebook-4.0.4.tar.gz",
                "packagetype": "sdist",
                "size": 6682601
            },
            {
                "has_sig": false,
                "upload_time": "2015-08-21T04:13:58",
                "comment_text": "",
                "python_version": "source",
                "url": "
https://pypi.python.org/packages/source/n/notebook/notebook-4.0.4.zip",
                "md5_digest": "e98ab81a972289b777569f9b59b60f02",
                "downloads": 874,
                "filename": "notebook-4.0.4.zip",
                "packagetype": "sdist",
                "size": 7318049
            }
        ],
        "4.0.6": [
            {
                "has_sig": false,
                "upload_time": "2015-10-09T12:14:03",
                "comment_text": "",
                "python_version": "2.7",
                "url": "
https://pypi.python.org/packages/2.7/n/notebook/notebook-4.0.6-py2.py3-none-any.whl",

                "md5_digest": "2d9fab4a9b4e0cbf692fb42b917c71c8",
                "downloads": 168438,
                "filename": "notebook-4.0.6-py2.py3-none-any.whl",
                "packagetype": "bdist_wheel",
                "size": 5551276
            },
            {
                "has_sig": false,
                "upload_time": "2015-10-09T12:12:49",
                "comment_text": "",
                "python_version": "source",
                "url": "
https://pypi.python.org/packages/source/n/notebook/notebook-4.0.6.tar.gz",

                "md5_digest": "81477df381599455ef7529c5d6a18439",
                "downloads": 4976,
                "filename": "notebook-4.0.6.tar.gz",
                "packagetype": "sdist",
                "size": 6705277
            },
            {
                "has_sig": false,
                "upload_time": "2015-10-09T12:12:58",
                "comment_text": "",
                "python_version": "source",
                "url": "
https://pypi.python.org/packages/source/n/notebook/notebook-4.0.6.zip",
                "md5_digest": "cadc5189d77f6081124b96e154e4d687",
                "downloads": 1617,
                "filename": "notebook-4.0.6.zip",
                "packagetype": "sdist",
                "size": 7358080
            }
        ],
        "4.1.0b1": [
            {
                "has_sig": false,
                "upload_time": "2015-12-11T18:13:38",
                "comment_text": "",
                "python_version": "py2.py3",
                "url": "
https://pypi.python.org/packages/py2.py3/n/notebook/notebook-4.1.0b1-py2.py3-none-any.whl",

                "md5_digest": "65a3713f2ad929e7f5fa144506a8c417",
                "downloads": 44,
                "filename": "notebook-4.1.0b1-py2.py3-none-any.whl",
                "packagetype": "bdist_wheel",
                "size": 5657135
            }
        ],
        "0.0.0": []
    },
    "urls": [
        {
            "has_sig": false,
            "upload_time": "2015-12-11T18:13:38",
            "comment_text": "",
            "python_version": "py2.py3",
            "url": "
https://pypi.python.org/packages/py2.py3/n/notebook/notebook-4.1.0b1-py2.py3-none-any.whl",

            "md5_digest": "65a3713f2ad929e7f5fa144506a8c417",
            "downloads": 44,
            "filename": "notebook-4.1.0b1-py2.py3-none-any.whl",
            "packagetype": "bdist_wheel",
            "size": 5657135
        }
    ]
}

Really, sorting all of this shouldn't be necessary.
Because not maintaining key order breaks the identity function, if I was
to try and compare this after storage in ["metadata"], a line-based diff
would fail,
Because json.loads without ordered_pairs_hook=collections.OrderedDict is a
destructive operation with unnecessarily noisy side effects
Which almost certainly are not a required property of any downstream
application.

@minrk
Copy link
Member

minrk commented Dec 15, 2015

Which almost certainly are not a required property of any downstream application.

Nor can key-ordering in an unordered-by-definition collection ever be meaningful to any downstream application.

When a notebook goes to the browser, it is loaded into actual Javascript objects, not kept in primitive dict/map structures. The serialization form of that and key-ordering are lost, because they are actual objects, to which the ordering is utterly meaningless. Assuming we even could preserve the key order on the client side, we would have to keep the serialized form around, and check key-order on every save, to ensure that the new JSON is reconstructed in the serialized order, rather than the natural, internal order, which should safely be allowed to change.

@westurner
Copy link
Contributor Author

westurner commented Dec 15, 2015

On Dec 15, 2015 4:31 AM, "Min RK" notifications@github.com wrote:

Which almost certainly are not a required property of any downstream
application.

Nor can key-ordering in an unordered-by-definition collection ever be
meaningful to any downstream application.

Source ordering of keys in an in-memory collection is definitely
meaningful (e.g. Python class definitions: dict.__iter__ is why you must
include numbers in unittest test_ function names)

Compare this presentation/source-ordered list of JSONLD Schema.org keys:

@id
@type
name
url
description
author
educationalAlignment

With this sorted (collated; see ICU) list of JSONLD schema.org keys:

@id
@type
author
description
educationalAlignment
name
url

When a notebook goes to the browser, it is loaded into actual Javascript
objects, not kept in primitive dict/map structures. The serialization form
of that and key-ordering are lost, because they are actual objects, to
which the ordering is utterly meaningless.

this.data would then be a Map (ES6 Map),
and then data and code would be distinct; simplifying serialization back to
non-executable JSON.

(Is the JS JSON parser properly order-preserving, as well?)

It makes a difference when things are already sorted, and when comparing
two things line-by-line (diff), rather than testing for isomorphism,
which is harder.

Assuming we even could preserve the key order on the client side, we
would have to keep the serialized form around,

The (ordered) JSON is the serialized form.

Maps, OrderedDicts, and JSON (w/ ordered_pairs_hook) all have defined
order-preserving key iteration.

and check key-order on every save, to ensure that the new JSON is
reconstructed in the serialized order, rather than the natural, internal
order, which should safely be allowed to change.

  • Why do you need to check key order on every save?
  • There's a list of keys, and an indexed storage data structure (where we
    can be comfortable with openly-addressed, non-input-order-preserving sparse
    hash maps like python.dict and js.object).


Reply to this email directly or view it on GitHub.

@takluyver
Copy link
Member

I think we have now discussed this in far more length and detail than should be necessary. Once more: we do not care about the in memory order of keys in notebook data structures. In both the Python and the JS, key/value mappings are stored in standard unordered collections. We do not consider this a problem. We do care about having a stable order for on disk, and we consider that problem adequately solved by sorting keys on save.

So far, the arguments you've advanced, as I understand them, are:

  1. Performance benefits by maintaining keys in a sorted order: this is probably negligible, and in any case it's swamped by the overhead of using OrderedDict in place of dict.
  2. Metadata authors might care about the order of keys: in short, tough. JSON objects are unordered collections, so preserving order is not something you get to do. If the order is really important, you can use a JSON array, which is an ordered collection.

If you have new reasons to advance, please do so, but it feels like this discussion is going round in circles, and I'm sure you have other things you could be doing. ;-)

@westurner
Copy link
Contributor Author

westurner commented Dec 15, 2015

On Dec 15, 2015 5:25 AM, "Thomas Kluyver" notifications@github.com wrote:

I think we have now discussed this in far more length and detail than
should be necessary. Once more: we do not care about the in memory order of
keys in notebook data structures. In both the Python and the JS, key/value
mappings are stored in standard unordered collections. We do not consider
this a problem.

An unfortunate conclusion.

We do care about having a stable order for on disk, and we consider that
problem adequately solved by sorting keys on save.

I disagree.

@westurner
Copy link
Contributor Author

  • inductive logic is predicated upon ordered premises; which do not need to
    be unsorted and resorted because speed.
  • JSON > python json.load() is lossy without object_pairs_hook.
  • JSON > JS object() is lossy.
    • ES6 Maps are not lossy.

Lossy:
Does not preserve input ordering. Loses the linear source-order. Breaks
string equality due to loss of key ordering.

Noisy:
JSON > dict adds noise (due to PYTHONHASHSEED)
On Dec 15, 2015 5:27 AM, "Wes Turner" wes.turner@gmail.com wrote:

On Dec 15, 2015 5:25 AM, "Thomas Kluyver" notifications@github.com
wrote:

I think we have now discussed this in far more length and detail than
should be necessary. Once more: we do not care about the in memory order of
keys in notebook data structures. In both the Python and the JS, key/value
mappings are stored in standard unordered collections. We do not consider
this a problem.

An unfortunate conclusion.

We do care about having a stable order for on disk, and we consider that
problem adequately solved by sorting keys on save.

I disagree.

@westurner
Copy link
Contributor Author

... [edit] I fixed email -> github quoting
in the previous messages for readability.

@westurner
Copy link
Contributor Author

westurner commented May 28, 2016

nbdiff

A tool for diffing and merging IPython Notebook files

@westurner
Copy link
Contributor Author

westurner commented May 28, 2016

nbdime

Tools for diffing and merging of Jupyter notebooks.

@minrk minrk added this to the no action milestone May 30, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants