nbformat v4 #6045
Conversation
found via validation in ipython#6045
General validation question: when a notebook is converted from one version to another, it makes sense to me to validate the input notebook prior to conversion, and to validate the result after conversion. The question is: if either of these validations fails, should we warn or raise? The argument for raising, at least on the second validation, is that we shouldn't be creating invalid notebooks. The argument against raising is that most 'invalid' notebooks aren't actually problematic - they may have some extra keys defined, or some empty fields that are safely handled by default values. Another argument against raising is that, since there aren't any v4 notebooks yet, once we make the switch all notebooks will go through an upgrade step, so raising on invalid input during upgrade means any notebook that fails validation will not open.
I think that would be a disaster. If we are able to upgrade a notebook, invalid or not, we should do so and warn if it's invalid. I think it's important to allow the user to attempt to open the invalid notebook and salvage what converted. The only time I think raising would be appropriate is if no output was produced at all.
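A minimal sketch of the warn-don't-raise policy described above, with a toy validator standing in for the real schema check (the key checks here are illustrative, not the actual nbformat validation rules):

```python
import warnings

def validate(nb):
    # Toy validator (assumption): the real one validates against a JSON schema.
    errors = []
    if not isinstance(nb.get("cells"), list):
        errors.append("missing or non-list 'cells'")
    if not isinstance(nb.get("metadata"), dict):
        errors.append("missing or non-dict 'metadata'")
    return errors

def upgrade(nb):
    # Warn (don't raise) on invalid input, so salvageable notebooks still open.
    for err in validate(nb):
        warnings.warn("invalid notebook before upgrade: %s" % err)
    upgraded = dict(nb)               # stand-in for the real conversion step
    upgraded.setdefault("metadata", {})
    for err in validate(upgraded):
        warnings.warn("invalid notebook after upgrade: %s" % err)
    if not upgraded:
        # the one case where raising seems right: no output was produced
        raise ValueError("upgrade produced no output")
    return upgraded
```

With this shape, a notebook missing `metadata` triggers a warning but still upgrades and opens.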
We should figure out the order of operations on this one. Various tasks:
And figure out exactly what order to do these things, and in how many PRs.
@minrk do you need feedback from us on the process for this? IIRC we discussed the order of operations at a dev meeting, but didn't put the conclusion here or there.
It's been a while since I looked at this one. I'll need to better enumerate the current state. One question: following the msg_spec/output consistency fixes, I updated the stream output in the file format to match the msg spec, so the stream output changes from having…
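For reference, a msg-spec-aligned stream output can be sketched as follows (this matches the shape v4 eventually shipped with: a `name` field identifying the stream, plus the `text` content):

```python
def new_stream_output(text, name="stdout"):
    # "name" identifies the stream (stdout/stderr), matching the message spec.
    return {"output_type": "stream", "name": name, "text": text}
```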
Do we want to add any UI for 'save a copy as v3' or otherwise upgrade/downgrade? We have Python APIs for it, but no CLI or GUI access to it. |
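A CLI for "save a copy as v3" could be as small as the following sketch (hypothetical — no such command existed; `downgrade_v4_to_v3` here is a toy stand-in for the real conversion API, handling only the worksheet wrapping):

```python
import argparse
import json

def downgrade_v4_to_v3(nb):
    # Illustrative only: wrap flat v4 cells back into a single v3 worksheet.
    nb = dict(nb)
    nb["worksheets"] = [{"cells": nb.pop("cells", [])}]
    nb["nbformat"] = 3
    return nb

def main(argv=None):
    parser = argparse.ArgumentParser(
        description="save a copy of a v4 notebook as v3")
    parser.add_argument("infile")
    parser.add_argument("outfile")
    args = parser.parse_args(argv)
    with open(args.infile) as f:
        nb = json.load(f)
    with open(args.outfile, "w") as f:
        json.dump(downgrade_v4_to_v3(nb), f, indent=1, sort_keys=True)
```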
There are some questions about UI for up/downgrade, stream outputs, etc. that I have enumerated as tasks in the description, but at least I have all of the tests passing with nbformat.current = v4.
I've rebased this one on #5938, since there were conflicts to resolve. I'll clean it back up once that one is merged.
One of the changes this introduces that may be worth noting: because the 'worksheets' have been flattened into the root namespace, the…
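The flattening mentioned above can be sketched like this (an illustration, not the actual converter): v3 stores cells under `notebook["worksheets"][i]["cells"]`, while v4 moves them to a top-level `notebook["cells"]` list:

```python
def flatten_worksheets(nb_v3):
    # v3: notebook["worksheets"][i]["cells"]  ->  v4: notebook["cells"]
    nb = dict(nb_v3)
    cells = []
    for ws in nb.pop("worksheets", []):
        cells.extend(ws.get("cells", []))
    nb["cells"] = cells
    nb["nbformat"] = 4
    return nb
```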
```python
'new_code_cell', 'new_markdown_cell', 'new_notebook', 'new_output',
'new_heading_cell', 'nbformat', 'nbformat_minor', 'output_from_msg',
'to_notebook_json', 'convert', 'validate', 'nbformat_schema',
'reads_json', 'writes_json', 'reads', 'writes', 'read', 'write',
```
Since this is explicit public API, I think we should leave `new_text_cell` as a deprecated alias for one release.
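The deprecated-alias pattern being suggested might look like this (a sketch; `new_markdown_cell` here is a toy stand-in for the real constructor):

```python
import warnings

def new_markdown_cell(source=""):
    # Stand-in for the real constructor (illustrative).
    return {"cell_type": "markdown", "source": source}

def new_text_cell(source=""):
    # Deprecated alias kept for one release; points users at the new name.
    warnings.warn("new_text_cell is deprecated, use new_markdown_cell",
                  DeprecationWarning, stacklevel=2)
    return new_markdown_cell(source)
```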
Given that we're changing the file format, @fperez had the suggestion of sticking with the singular "worksheet" key name in place of "cells", in order to preserve "metadata" being at the top of the sorted JSON file (so we can count on being able to peek at just the head of the file down the line to find out kernel information for a given notebook, for example).
'count on' seems a bit strong to me, since sorting keys is not actually a part of the file format specification, so we cannot assume that metadata comes first, but we can at least ensure that it is for notebook files that we do write. I'm not thrilled by the idea of picking names based on their lexicographical sorting, but I don't have a big problem with it… We could call it…
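The sorting concern can be demonstrated directly (a minimal illustration; the `kernel_info` key name is just for the example):

```python
import json

nb = {"metadata": {"kernel_info": {"name": "python"}}, "cells": []}
serialized = json.dumps(nb, sort_keys=True)
# With sorted keys, "cells" serializes before "metadata", so the
# metadata is NOT at the head of the file -- hence the "worksheet"
# naming suggestion.
```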
Meh... I would prefer a solution based on extended metadata when possible, like xattr; it exists on ext4, and probably NTFS, but I don't know of a package that can use it. And it is not the end of the world if we can't distinguish kernels by peeking at the file header. So -0.5 for 'worksheet', -0.4 for 'ze cell' :-)
I don't particularly like 'worksheet', but I'm even less thrilled about solutions that rely on filesystem support for extended metadata. Our notebooks may well be stored on non-FS backends, so we shouldn't make assumptions about what properties a FS has.
That's why I say "when possible": if it is non-FS, then it is trivial to store the kernel along with the file content.
We shouldn't rely on any.
Moreover, as metadata can be arbitrarily big, you are not even sure the kernel spec will be in the first X lines of the file. I don't think we should design the file format around such things.
The trouble with external metadata isn't just storage, it's also… I think it is worth trying to keep metadata at the top when we write files.
And do you really need to peek at the kernel? If we do something like that, it should be in the spec. Otherwise we define a "kernel hint" class that can be overwritten. I'm just not convinced that baking that into the file format before needing it is wise. You know, like... multiple worksheets :-) I know that you are attached to ipynb, but why not file extensions also?
File extensions for different kernels would be a horrible mess, and I… The metadata is there, and it's conceivable that tools might want to load…
I understand that. But keep in mind that this would also allow for different icons in a GUI. (Note: I, personally, am -0.6 on changing the extension, just raising the point.) Now, if it is a question of just sorting metadata at the top, just use an OrderedDict at write time, which can be arbitrarily sorted and dumped in the correct order. Would that suit @fperez?

```python
In [18]: d = OrderedDict({'cell':{},'metadata':{}}); d
Out[18]: OrderedDict([('cell', {}), ('metadata', {})])

In [19]: json.dumps(d)
Out[19]: '{"cell": {}, "metadata": {}}'

In [20]: d.move_to_end('metadata', last=False)

In [21]: json.dumps(d)
Out[21]: '{"metadata": {}, "cell": {}}'
```
I think we should not try to monkey with key order in JSON. We are not the…

Brian E. Granger
We already mess with the indentation and ensure order is preserved for reproducible saves. I don't see why trying to save with a specific order is problematic. But of course we should not rely on it.
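For illustration, the reproducible-save idea boils down to serializing with fixed indentation and sorted keys, so two logically identical notebooks produce byte-identical files (a sketch, not the project's actual writer):

```python
import json

def reproducible_dumps(nb):
    # Same indentation and key order on every save -> byte-identical output
    # for logically identical notebooks, regardless of dict insertion order.
    return json.dumps(nb, indent=1, sort_keys=True)
```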
Getting back to work on this. A few questions:
Related to the nb.cells / nb.worksheet, we could change the top-level structure of the notebook to be a list instead of a dict, so that we can actually guarantee that a header comes before the body. That would be a more drastic change (1.x would not even be able to identify v4 notebooks as v4), but it would solve the header sniffing problem without resorting to key sorting.
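A minimal sketch of that hypothetical top-level-list layout (never adopted; the key names are illustrative): a two-element list makes the header physically first in the file, independent of any key sorting.

```python
import json

# Hypothetical layout: [header, body] instead of one top-level dict.
header = {"nbformat": 4, "metadata": {"kernel_info": {"name": "python"}}}
body = {"cells": []}
serialized = json.dumps([header, body])
# The header is guaranteed to appear at the head of the file, so kernel
# sniffing could read just the first element.
```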
not confusing to_notebook_json
- various typos
- discuss multi-line strings in nbformat doc
- testing cleanup
- py3compat simplification
- don't use setdefault when composing notebook nodes
- mime-type fix in svg2pdf
to avoid method:module conflict, renamed convert->converter
should be common to all nbformats (I didn't change the old nbformats to use it, just in case)
use top-level nbformat.read/write, v4 directly for compose
My French is weak.
found via validation in ipython#6045
FWIW: adding to Wes' comment on the mailing list, I would strongly encourage adding one simple field to the nbformat structure:

```json
{
    "@context": ...
}
```

which by default could simply point to:

```json
{
    "@context": "http://ipython.org/nbformat/v4/context"
}
```

(here is an example from the FOAF world:

```json
{
    "@context": "http://xmlns.com/foaf/context",
    "name": "Manu Sporny",
    "homepage": "http://manu.sporny.org/"
}
```

) you could then say that you may not interpret anything in…
In this situation, one additional keyword gains you a whole lot more, without much maintenance on your side.

ps: you guys are doing a terrific job and I can no longer practically keep up with all the things going on. However, when it comes to standards, especially those that play well with others, I have drunk some Kool-Aid here in Cambridge. So this is merely a suggestion to support a format that links with the wider Linked Data Web and a lot of standardization efforts (http://5stardata.info/, http://schema.org). Also it might be worthwhile to encourage an RFC-type approach for things like this after you have fleshed out an initial draft. I do continue to monitor the dev list :) and Wes did bring this up in a Sep 13 email, but I didn't see anything since then till Matthias' announcement.

pps: btw, if you were seriously interested in computational provenance and the notebook, the format will be key to this. I would be happy to join one of the dev calls and present some thoughts on it. We have done a fair bit of work in other contexts. It does seem that your effort to characterize the notebook as a computational standard has to take provenance into account - otherwise it picks up on a very narrow slice of "reproducibility".
I would say that's what we are doing with our IPEP approach, which includes the nbformat 4 IPEP, which began prior to the release of IPython 1.0. I'm not particularly persuaded by JSON-LD for notebooks, but if such data can be added to notebook-level metadata, there shouldn't be an issue with adoption by those who are interested. If it needs to be top-level, a proposal can be made to include it in nbformat v4.1. I think provenance is outside the scope of the notebook format itself, but I think different provenance approaches could be pursued via metadata in the notebook, without affecting the nbformat itself.
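As a sketch of the metadata route suggested here, a JSON-LD context could ride along in notebook-level metadata without any schema change (the `"jsonld"` metadata key is hypothetical, not part of any official schema):

```python
# Hypothetical: attach a JSON-LD context via notebook-level metadata.
nb = {
    "nbformat": 4,
    "nbformat_minor": 0,
    "metadata": {
        # "jsonld" is an illustrative key; interested tools could agree
        # on it without the core format changing at all.
        "jsonld": {"@context": "http://ipython.org/nbformat/v4/context"},
    },
    "cells": [],
}
```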
Yeahhh! Now we wait for the bug reports to flow.
@minrk - thanks - I stopped watching ipython on github because of the intense amount of activity (which is a great thing) and just focused on the dev list for monitoring, and didn't see anything there. If there is a better place for us to discuss these issues, I'm happy to move it there. re: json-ld - the key questions here are:
re: provenance - is the notebook defined to capture a slice of reproducibility, i.e. some static snapshot that can be rerun? If so, I agree that it's outside the scope. If not, then provenance as metadata is not the optimal route to provenance.
I'm sadly mostly offline for a few weeks, and I'm typing this just about to…

Full provenance is a gnarly problem, and I'd like us to "play nice" in that…

The project is getting large enough that there's really lots of room for…

Satra, I know you have a lot of knowledge on this topic. It would be great…

Cheers,
f

On Mon, Nov 3, 2014 at 4:59 PM, Satrajit Ghosh notifications@github.com…

Fernando Perez (@fperez_org; http://fperez.org)
@fperez - happy to - let's plan on something after Thanksgiving. It looks like we should announce on the mailing list depending on what we plan to cover. I'm quite positive there are others who might want to be part of the discussion. Bon voyage!
Great, thanks! (typed from Brasilia, as I watch outside the kind of…
Still quite a bit to do, but the implemented parts of v4 are working pretty well. I did find and fix some things while testing the validation. Both v3 and v4 have a Draft 4 jsonschema, and use the standard jsonschema format for references, so jsonpointer is no longer used.
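For illustration, standard Draft 4 references inside a schema look like the following (the definition names here are made up for the example, not the actual v4 schema's):

```json
{
    "type": "object",
    "properties": {
        "cells": {
            "type": "array",
            "items": {"$ref": "#/definitions/cell"}
        }
    },
    "definitions": {
        "cell": {"type": "object"}
    }
}
```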
closes #5074
TODO in separate PRs:
name, text