-
Notifications
You must be signed in to change notification settings - Fork 152
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: v4/nbjson.py: json.loads(object_pairs_hook=collections.OrderedDict) #30
Conversation
westurner
commented
Dec 13, 2015
- cross-link w/ nbformat changes, ipython/jupyter version requirements & changelog
- PLS ("please test")
- TODO: enqueue a Travis-CI build of notebook
See also: ipython/ipython_genutils#4 "ENH,PRF: ipstruct.py: class Struct(collections.OrderedDict or dict)" |
@@ -26,6 +27,7 @@ class JSONReader(NotebookReader): | |||
|
|||
def reads(self, s, **kwargs): | |||
"""Read a JSON string into a Notebook object""" | |||
kwargs['object_pairs_hook'] = collections.OrderedDict() # >= py2.7 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BUG: should be collections.OrderedDict
Can you describe the benefit of using OrderedDict in the Python API as opposed to a regular dict? |
|
There may be (downstream?) test errors for string comparisons, as |
|
I see, it's so that a dict-diff will see keys in the file-order. However, the serialized order of keys shouldn't be relevant to the diff, since it isn't relevant to the format. A notebook-diff shouldn't be sensitive to the key ordering on-disk. If an nbdiff algorithm produces different results for OrderedDict and dict, it's not getting the correct result. |
It also reduces noise from
There is a presentation order and a [numerical/collated] key order for each node. This change should not change the [numerical/collated] key order.
It sounds like what you are suggesting is that the interface/contract of e.g. |
For convenience only, v4 -> v5 could then later call a transform function to sort by key so that the presentation order and key order are the same. |
Don't we write notebooks with the flag to ensure keys are sorted? If we
don't already, I think that's a simple solution to ensuring the on disk
format is stable.
|
If so, I'm not sure where that is [ipython_genutils.ipstruct.Struct is not accepting new code: https://github.com/ipython/ipython_genutils/pull/4 ]. It would seem that preserving on-disk ordering with an OrderedDict linked list would be more optimal.
|
Yes, we do. The flag is sort_keys, and it's in nbformat.v4.nbjson. I don't
see any need to preserve key order in memory, since the order has no
significance.
|
So, with |
It sounds superfluous. Dictionaries are unordered collections. There's no need for these dictionaries to have a particular order. Tools that want to check them should: a) Do so in a way that doesn't use key order - which is easy, because a straightforward If this is about doctests: it's a general problem with any doctests that display dictionaries. Doctests are quite brittle and annoying. I don't see a need for us to work around that specifically for notebooks by modifying nbformat. |
Sorted order with OrderedDict only works for read+write with no modifications. If the object is ever modified (e.g. new keys added) in-memory, then the sorting would be violated, so we will always want to write with |
I'm closing this for now as we don't see a need for it. We can reopen if we change our minds, but as @minrk and I seem to be saying the same thing, that seems unlikely. Sorry @westurner! |
To clarify, you don't see a need to not modify the document every time it You don't see a need to maintain insertion order (so that sort_keys is Do you need a performance benchmark justification here? The more optimal approach is to build an ordered nested set, and maintain
|
Currently, the application is effectively unsorting the notebook JSON on
|
I'm not quite sure what you mean. We read JSON objects to dictionaries, which are conceptually unordered. We have no need to preserve order in memory, so why would we? There's no performance gain to be had from using ordered dictionaries - I've just tried it:
I suspect that it's mostly because OrderedDict is implemented largely in Python, so there's more overhead in manipulating it. But even if it wasn't, these dictionaries typically have 5-10 keys each, so any performance benefit of storing their order to avoid the need to sort is going to be inconsequentially tiny. |
]Where[ can I add JSONLD metadata without losing the key sequence {e.g.
|
You can put arbitrary fields in |
On Dec 14, 2015 4:44 PM, "Thomas Kluyver" notifications@github.com wrote:
The order of attributes on disk is meaningful. There is no reason to sort I just read about the ES6 map, which is also an ordered set.
|
Using OrderedDict doesn't provide this either, because notebooks are typically round-tripped to the browser, in which case the OrderedDict doesn't survive the lifetime of the notebook from read to write.
Not to the content of the notebook, but they are to things like version control. That's why we sort, to ensure that whenever nbformat writes an equivalent notebook, it does it in the same way. Relying on OrderedDict for ordering instead of sort-on-write actually makes this impossible, because it makes the ordering sensitive to how a notebook is constructed, rather than only the content itself. Take, for example: nb1.metadata.key1 = 'value'
nb1.metadata.key2 = 'value'
nb2.metadata.key2 = 'value'
nb2.metadata.key1 = 'value' In this case, nb1 and nb2 have identical content. If we write them with |
On Dec 15, 2015 3:25 AM, "Min RK" notifications@github.com wrote:
So this isn't a partial solution because it already losses the sort at
In ["metadata"], with user supplied metadata, how do I preserve
|
There must be a shim for the new ES6 Map. https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Map
And then post-sort, again.
|
For example, what does json.dumps(sort_keys=True) do with this document: {
"info": {
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"cheesecake_code_kwalitee_id": null,
"keywords": "Interactive,Interpreter,Shell,Web",
"package_url": "http://pypi.python.org/pypi/notebook",
"author": "Jupyter Development Team",
"author_email": "jupyter@googlegroups.com",
"download_url": "",
"platform": "Linux,Mac OS X,Windows",
"version": "4.1.0b1",
"cheesecake_documentation_id": null,
"_pypi_hidden": true,
"description": "The Jupyter Notebook is a web application that
allows you to create and\nshare documents that contain live code,
equations, visualizations, and\nexplanatory text. The Notebook has support
for multiple programming\nlanguages, sharing, and interactive
widgets.\n\nRead `the documentation <
https://jupyter-notebook.readthedocs.org>`_\nfor more information.",
"release_url": "http://pypi.python.org/pypi/notebook/4.1.0b1",
"downloads": {
"last_month": 77566,
"last_week": 13631,
"last_day": 2147
},
"_pypi_ordering": 7,
"requires_dist": [
"mock; python_version == \"2.7\" and extra == 'test'",
"requests; extra == 'test'",
"nose; extra == 'test'",
"Sphinx (>=1.1); extra == 'doc'",
"terminado (>=0.3.3); sys_platform != \"win32\"",
"traitlets",
"tornado (>=4)",
"nbformat",
"nbconvert",
"jupyter-core",
"jupyter-client",
"jinja2",
"ipython-genutils",
"ipykernel"
],
"classifiers": [
"Intended Audience :: Developers",
"Intended Audience :: Science/Research",
"Intended Audience :: System Administrators",
"License :: OSI Approved :: BSD License",
"Programming Language :: Python",
"Programming Language :: Python :: 2.7",
"Programming Language :: Python :: 3"
],
"name": "notebook",
"bugtrack_url": "",
"license": "BSD",
"summary": "A web-based notebook environment for interactive
computing",
"home_page": "http://jupyter.org",
"cheesecake_installability_id": null
},
"releases": {
"4.0.1": [
{
"has_sig": false,
"upload_time": "2015-07-30T21:51:27",
"comment_text": "",
"python_version": "2.7",
"url": "
https://pypi.python.org/packages/2.7/n/notebook/notebook-4.0.1-py2.py3-none-any.whl",
"md5_digest": "bf361e35af890ae871da2236585cb699",
"downloads": 8422,
"filename": "notebook-4.0.1-py2.py3-none-any.whl",
"packagetype": "bdist_wheel",
"size": 5503563
},
{
"has_sig": false,
"upload_time": "2015-07-30T21:50:09",
"comment_text": "",
"python_version": "source",
"url": "
https://pypi.python.org/packages/source/n/notebook/notebook-4.0.1.tar.gz",
"md5_digest": "a3c188dae35dbb0e6b78d53a9028a40e",
"downloads": 950,
"filename": "notebook-4.0.1.tar.gz",
"packagetype": "sdist",
"size": 6676955
},
{
"has_sig": false,
"upload_time": "2015-07-30T21:50:01",
"comment_text": "",
"python_version": "source",
"url": "
https://pypi.python.org/packages/source/n/notebook/notebook-4.0.1.zip",
"md5_digest": "4508a52bcfa4d5bfa1c69d2d3d9431d2",
"downloads": 704,
"filename": "notebook-4.0.1.zip",
"packagetype": "sdist",
"size": 7309674
}
],
"4.0.0": [
{
"has_sig": false,
"upload_time": "2015-07-30T21:45:08",
"comment_text": "",
"python_version": "source",
"url": "
https://pypi.python.org/packages/source/n/notebook/notebook-4.0.0.tar.gz",
"md5_digest": "dd20afbb3dac490ffc82ed2a929b9fe6",
"downloads": 517,
"filename": "notebook-4.0.0.tar.gz",
"packagetype": "sdist",
"size": 6676908
},
{
"has_sig": false,
"upload_time": "2015-07-30T21:46:50",
"comment_text": "",
"python_version": "source",
"url": "
https://pypi.python.org/packages/source/n/notebook/notebook-4.0.0.zip",
"md5_digest": "cc2ec89ef65d82792adcab63f0300a8d",
"downloads": 497,
"filename": "notebook-4.0.0.zip",
"packagetype": "sdist",
"size": 7309672
}
],
"4.0.2": [
{
"has_sig": false,
"upload_time": "2015-08-14T22:34:03",
"comment_text": "",
"python_version": "3.4",
"url": "
https://pypi.python.org/packages/3.4/n/notebook/notebook-4.0.2-py2.py3-none-any.whl",
"md5_digest": "481a3f65c0dd6149e735ca5a6be1bbe9",
"downloads": 3217,
"filename": "notebook-4.0.2-py2.py3-none-any.whl",
"packagetype": "bdist_wheel",
"size": 5508821
},
{
"has_sig": false,
"upload_time": "2015-08-14T22:22:35",
"comment_text": "",
"python_version": "source",
"url": "
https://pypi.python.org/packages/source/n/notebook/notebook-4.0.2.tar.gz",
"md5_digest": "92e0caeced4ccb8873e51539c9b7796a",
"downloads": 612,
"filename": "notebook-4.0.2.tar.gz",
"packagetype": "sdist",
"size": 6681345
},
{
"has_sig": false,
"upload_time": "2015-08-14T22:22:51",
"comment_text": "",
"python_version": "source",
"url": "
https://pypi.python.org/packages/source/n/notebook/notebook-4.0.2.zip",
"md5_digest": "8c09ff0830cf96f06d1190f8b1bf4b51",
"downloads": 1979,
"filename": "notebook-4.0.2.zip",
"packagetype": "sdist",
"size": 7315190
}
],
"4.0.5": [
{
"has_sig": false,
"upload_time": "2015-09-22T07:59:45",
"comment_text": "",
"python_version": "3.5",
"url": "
https://pypi.python.org/packages/3.5/n/notebook/notebook-4.0.5-py2.py3-none-any.whl",
"md5_digest": "4394b6785b37db01878cd8d2b2214dfb",
"downloads": 39214,
"filename": "notebook-4.0.5-py2.py3-none-any.whl",
"packagetype": "bdist_wheel",
"size": 5513308
},
{
"has_sig": false,
"upload_time": "2015-09-23T10:20:17",
"comment_text": "",
"python_version": "source",
"url": "
https://pypi.python.org/packages/source/n/notebook/notebook-4.0.5.ZIP",
"md5_digest": "14f0b5578b0d36faea7ed61d4b3d2d47",
"downloads": 448,
"filename": "notebook-4.0.5.ZIP",
"packagetype": "sdist",
"size": 7320539
},
{
"has_sig": false,
"upload_time": "2015-09-23T10:18:55",
"comment_text": "",
"python_version": "source",
"url": "
https://pypi.python.org/packages/source/n/notebook/notebook-4.0.5.tgz",
"md5_digest": "efc9563674c099b1868f024a8fc75948",
"downloads": 2066,
"filename": "notebook-4.0.5.tgz",
"packagetype": "sdist",
"size": 6710606
}
],
"4.0.4": [
{
"has_sig": false,
"upload_time": "2015-08-21T04:14:59",
"comment_text": "",
"python_version": "3.4",
"url": "
https://pypi.python.org/packages/3.4/n/notebook/notebook-4.0.4-py2.py3-none-any.whl",
"md5_digest": "7c6e37ae43c72a4935abe65ced661b09",
"downloads": 46583,
"filename": "notebook-4.0.4-py2.py3-none-any.whl",
"packagetype": "bdist_wheel",
"size": 5511709
},
{
"has_sig": false,
"upload_time": "2015-08-21T04:13:45",
"comment_text": "",
"python_version": "source",
"url": "
https://pypi.python.org/packages/source/n/notebook/notebook-4.0.4.tar.gz",
"md5_digest": "91d6d780832f1bec7545c66e367d34f3",
"downloads": 2243,
"filename": "notebook-4.0.4.tar.gz",
"packagetype": "sdist",
"size": 6682601
},
{
"has_sig": false,
"upload_time": "2015-08-21T04:13:58",
"comment_text": "",
"python_version": "source",
"url": "
https://pypi.python.org/packages/source/n/notebook/notebook-4.0.4.zip",
"md5_digest": "e98ab81a972289b777569f9b59b60f02",
"downloads": 874,
"filename": "notebook-4.0.4.zip",
"packagetype": "sdist",
"size": 7318049
}
],
"4.0.6": [
{
"has_sig": false,
"upload_time": "2015-10-09T12:14:03",
"comment_text": "",
"python_version": "2.7",
"url": "
https://pypi.python.org/packages/2.7/n/notebook/notebook-4.0.6-py2.py3-none-any.whl",
"md5_digest": "2d9fab4a9b4e0cbf692fb42b917c71c8",
"downloads": 168438,
"filename": "notebook-4.0.6-py2.py3-none-any.whl",
"packagetype": "bdist_wheel",
"size": 5551276
},
{
"has_sig": false,
"upload_time": "2015-10-09T12:12:49",
"comment_text": "",
"python_version": "source",
"url": "
https://pypi.python.org/packages/source/n/notebook/notebook-4.0.6.tar.gz",
"md5_digest": "81477df381599455ef7529c5d6a18439",
"downloads": 4976,
"filename": "notebook-4.0.6.tar.gz",
"packagetype": "sdist",
"size": 6705277
},
{
"has_sig": false,
"upload_time": "2015-10-09T12:12:58",
"comment_text": "",
"python_version": "source",
"url": "
https://pypi.python.org/packages/source/n/notebook/notebook-4.0.6.zip",
"md5_digest": "cadc5189d77f6081124b96e154e4d687",
"downloads": 1617,
"filename": "notebook-4.0.6.zip",
"packagetype": "sdist",
"size": 7358080
}
],
"4.1.0b1": [
{
"has_sig": false,
"upload_time": "2015-12-11T18:13:38",
"comment_text": "",
"python_version": "py2.py3",
"url": "
https://pypi.python.org/packages/py2.py3/n/notebook/notebook-4.1.0b1-py2.py3-none-any.whl",
"md5_digest": "65a3713f2ad929e7f5fa144506a8c417",
"downloads": 44,
"filename": "notebook-4.1.0b1-py2.py3-none-any.whl",
"packagetype": "bdist_wheel",
"size": 5657135
}
],
"0.0.0": []
},
"urls": [
{
"has_sig": false,
"upload_time": "2015-12-11T18:13:38",
"comment_text": "",
"python_version": "py2.py3",
"url": "
https://pypi.python.org/packages/py2.py3/n/notebook/notebook-4.1.0b1-py2.py3-none-any.whl",
"md5_digest": "65a3713f2ad929e7f5fa144506a8c417",
"downloads": 44,
"filename": "notebook-4.1.0b1-py2.py3-none-any.whl",
"packagetype": "bdist_wheel",
"size": 5657135
}
]
} Really, sorting all of this shouldn't be necessary. |
Re: OrderedDict and OrderedDefaultDict functional equivalency and See Also:
|
Nor can key-ordering in an unordered-by-definition collection ever be meaningful to any downstream application. When a notebook goes to the browser, it is loaded into actual Javascript objects, not kept in primitive dict/map structures. The serialization form of that and key-ordering are lost, because they are actual objects, to which the ordering is utterly meaningless. Assuming we even could preserve the key order on the client side, we would have to keep the serialized form around, and check key-order on every save, to ensure that the new JSON is reconstructed in the serialized order, rather than the natural, internal order, which should safely be allowed to change. |
On Dec 15, 2015 4:31 AM, "Min RK" notifications@github.com wrote:
Source ordering of keys in an in-memory collection is definitely Compare this presentation/source-ordered list of JSONLD Schema.org keys:
With this sorted (collated; see ICU) list of JSONLD schema.org keys:
this.data would then be a Map (ES6 Map), (Is the JS JSON parser properly order-preserving, as well?) It makes a difference when things are already sorted, and when comparing
The (ordered) JSON is the serialized form. Maps, OrderedDicts, and JSON (w/ ordered_pairs_hook) all have defined
|
I think we have now discussed this in far more length and detail than should be necessary. Once more: we do not care about the in memory order of keys in notebook data structures. In both the Python and the JS, key/value mappings are stored in standard unordered collections. We do not consider this a problem. We do care about having a stable order for on disk, and we consider that problem adequately solved by sorting keys on save. So far, the arguments you've advanced, as I understand them, are:
If you have new reasons to advance, please do so, but it feels like this discussion is going round in circles, and I'm sure you have other things you could be doing. ;-) |
On Dec 15, 2015 5:25 AM, "Thomas Kluyver" notifications@github.com wrote:
An unfortunate conclusion.
I disagree. |
Lossy: Noisy:
|
... [edit] I fixed email -> github quoting |
nbdiff
|
nbdime
|