
Create an ipynb v3 JSON schema and a validator #2396

Merged
merged 11 commits into from Jan 14, 2013

2 participants

@Carreau
IPython member
Carreau commented Sep 8, 2012

JSON Schema is a draft specification meant to describe the structure of a JSON format.
It could be used to check which files are valid ipynb files.

This is a first draft of an ipynb schema and validator, mostly auto-generated.
There are still things that could be added: for example, the cell 'level' is currently 'optional', but it could be made 'required only if the cell is of type heading'; descriptions could also be added.
IIUC, one could even create one's own subtypes, like codecell/markdowncell, and reference them for more fine-grained validation.
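As a rough illustration of the subtype idea, here is a sketch in Python. It hand-rolls a checker for a small subset of draft-03 keywords (`type`, `enum`, `properties` with per-property `"required": true`, and `items`) instead of using the jsonschema.py embedded in this PR; the `validate` helper and the `MARKDOWN_CELL`/`HEADING_CELL` schemas are hypothetical names chosen for the example. Splitting cells into subtypes is what lets `level` be required for heading cells only:

```python
# Hypothetical sketch: a checker for a *subset* of draft-03 keywords
# (type, enum, properties with per-property "required": true, items).
# It is not the validator from this PR and not a full JSON Schema engine.

TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "boolean": lambda v: isinstance(v, bool),
    "object": lambda v: isinstance(v, dict),
    "array": lambda v: isinstance(v, list),
    "null": lambda v: v is None,
    "any": lambda v: True,
}


def validate(instance, schema, path="$"):
    """Return a list of error strings; an empty list means `instance` is valid."""
    errors = []
    expected = schema.get("type")
    if expected is not None:
        # draft-03 allows a single type name or a list of allowed types
        types = expected if isinstance(expected, list) else [expected]
        if not any(TYPE_CHECKS[t](instance) for t in types):
            # wrong type: deeper checks would only add noise, so stop here
            return [f"{path}: expected {types}, got {instance!r}"]
    if "enum" in schema and instance not in schema["enum"]:
        errors.append(f"{path}: {instance!r} not in {schema['enum']}")
    if isinstance(instance, dict):
        for key, sub in schema.get("properties", {}).items():
            if key in instance:
                errors += validate(instance[key], sub, f"{path}.{key}")
            elif sub.get("required", False):  # draft-03 style "required"
                errors.append(f"{path}: missing required property {key!r}")
    if isinstance(instance, list) and "items" in schema:
        for i, item in enumerate(instance):
            errors += validate(item, schema["items"], f"{path}[{i}]")
    return errors


# Illustrative cell subtypes: "level" is required only in the heading subtype.
LINES = {"type": "array", "items": {"type": "string"}}
MARKDOWN_CELL = {
    "type": "object",
    "properties": {
        "cell_type": {"type": "string", "enum": ["markdown"], "required": True},
        "source": dict(LINES, required=True),
        "metadata": {"type": "object"},
    },
}
HEADING_CELL = {
    "type": "object",
    "properties": {
        "cell_type": {"type": "string", "enum": ["heading"], "required": True},
        "level": {"type": "integer", "required": True},
        "source": dict(LINES, required=True),
        "metadata": {"type": "object"},
    },
}
```

For instance, `validate({"cell_type": "heading", "source": ["Title"]}, HEADING_CELL)` reports only the missing `level`, while a markdown cell checked against `MARKDOWN_CELL` never needs one.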

cf http://json-schema.org/
http://tools.ietf.org/html/draft-zyp-json-schema-03

The schema was autogenerated from
http://www.jsonschema.net/index.html#
and then tweaked a little.

@Carreau
IPython member
Carreau commented Sep 9, 2012

I added a quick implementation of JSON references, so that you can reference other parts of the JSON file.
This makes the schema easier to read by allowing things like "a_key": {"$ref": "/code_cell"}, where {"$ref": "/code_cell"} is replaced at load time with the corresponding part of the JSON file.
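A minimal sketch of that load-time substitution, assuming references are plain JSON pointers into the same file (such as `"/code_cell"`, with an optional leading `#`) and that an object consisting only of a `$ref` key is replaced by the value it points to. The function names are hypothetical and this is not the PR's implementation; self-referencing schemas are rejected rather than supported:

```python
import json


def resolve_pointer(document, pointer):
    """Follow a JSON pointer such as "/code_cell" (a leading "#" is tolerated)."""
    if pointer.startswith("#"):
        pointer = pointer[1:]
    if pointer == "":
        return document  # the empty pointer designates the whole document
    if not pointer.startswith("/"):
        raise ValueError(f"not a JSON pointer: {pointer!r}")
    node = document
    for token in pointer[1:].split("/"):
        token = token.replace("~1", "/").replace("~0", "~")  # RFC 6901 escapes
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


def resolve_refs(node, root, _chain=()):
    """Return a copy of `node` in which each {"$ref": pointer} object is
    replaced by the part of `root` that the pointer designates."""
    if isinstance(node, dict):
        if set(node) == {"$ref"}:
            pointer = node["$ref"]
            if pointer in _chain:  # a reference cycle would never terminate
                raise ValueError(f"circular $ref: {pointer}")
            target = resolve_pointer(root, pointer)
            return resolve_refs(target, root, _chain + (pointer,))
        return {key: resolve_refs(value, root, _chain) for key, value in node.items()}
    if isinstance(node, list):
        return [resolve_refs(item, root, _chain) for item in node]
    return node


def loads_with_refs(text):
    """Parse a schema from JSON text and resolve its references at load time."""
    root = json.loads(text)
    return resolve_refs(root, root)
```

With `loads_with_refs('{"code_cell": {"type": "object"}, "cells": {"items": {"$ref": "/code_cell"}}}')`, the `items` entry comes back as `{"type": "object"}`. Building a new structure instead of mutating the parsed one means each reference site gets its own copy of the target.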

The validation seems to work:

./validator.py  ~/ipython/docs/examples/notebooks/*.ipynb
[Pass] /Users/bussonniermatthias/ipython/docs/examples/notebooks/00_notebook_tour.ipynb
[Pass] /Users/bussonniermatthias/ipython/docs/examples/notebooks/01_notebook_introduction.ipynb
[Pass] /Users/bussonniermatthias/ipython/docs/examples/notebooks/Animations_and_Progress.ipynb
[Pass] /Users/bussonniermatthias/ipython/docs/examples/notebooks/Capturing Output.ipynb
[Pass] /Users/bussonniermatthias/ipython/docs/examples/notebooks/Script Magics.ipynb
[Pass] /Users/bussonniermatthias/ipython/docs/examples/notebooks/cython_extension.ipynb
[Pass] /Users/bussonniermatthias/ipython/docs/examples/notebooks/display_protocol.ipynb
[Pass] /Users/bussonniermatthias/ipython/docs/examples/notebooks/formatting.ipynb
[Pass] /Users/bussonniermatthias/ipython/docs/examples/notebooks/octavemagic_extension.ipynb
[Pass] /Users/bussonniermatthias/ipython/docs/examples/notebooks/publish_data.ipynb
[Pass] /Users/bussonniermatthias/ipython/docs/examples/notebooks/rawraw.ipynb
[Pass] /Users/bussonniermatthias/ipython/docs/examples/notebooks/rmagic_extension.ipynb
[Pass] /Users/bussonniermatthias/ipython/docs/examples/notebooks/sympy.ipynb
[Pass] /Users/bussonniermatthias/ipython/docs/examples/notebooks/sympy_quantum_computing.ipynb
[    ] /Users/bussonniermatthias/ipython/docs/examples/notebooks/test.ipynb (56)
[    ] /Users/bussonniermatthias/ipython/docs/examples/notebooks/test_diff.ipynb (20)
[Pass] /Users/bussonniermatthias/ipython/docs/examples/notebooks/trapezoid_rule.ipynb
[    ] /Users/bussonniermatthias/ipython/docs/examples/notebooks/truc.ipynb (6) 

It still needs to be refined, though.

Here is a test run where I require code_cell to have a prompt_number property (even if it is set to null):

./validator.py  ~/ipython/docs/examples/notebooks/*.ipynb
[Pass] /Users/bussonniermatthias/ipython/docs/examples/notebooks/00_notebook_tour.ipynb
[    ] /Users/bussonniermatthias/ipython/docs/examples/notebooks/01_notebook_introduction.ipynb (1)
[    ] /Users/bussonniermatthias/ipython/docs/examples/notebooks/Animations_and_Progress.ipynb (8)
[    ] /Users/bussonniermatthias/ipython/docs/examples/notebooks/Capturing Output.ipynb (13)
[Pass] /Users/bussonniermatthias/ipython/docs/examples/notebooks/Script Magics.ipynb
[Pass] /Users/bussonniermatthias/ipython/docs/examples/notebooks/cython_extension.ipynb
[    ] /Users/bussonniermatthias/ipython/docs/examples/notebooks/display_protocol.ipynb (15)
[Pass] /Users/bussonniermatthias/ipython/docs/examples/notebooks/formatting.ipynb
[Pass] /Users/bussonniermatthias/ipython/docs/examples/notebooks/octavemagic_extension.ipynb
[Pass] /Users/bussonniermatthias/ipython/docs/examples/notebooks/publish_data.ipynb
[Pass] /Users/bussonniermatthias/ipython/docs/examples/notebooks/rawraw.ipynb
[    ] /Users/bussonniermatthias/ipython/docs/examples/notebooks/rmagic_extension.ipynb (1)
[Pass] /Users/bussonniermatthias/ipython/docs/examples/notebooks/sympy.ipynb
[Pass] /Users/bussonniermatthias/ipython/docs/examples/notebooks/sympy_quantum_computing.ipynb
[    ] /Users/bussonniermatthias/ipython/docs/examples/notebooks/test.ipynb (56)
[    ] /Users/bussonniermatthias/ipython/docs/examples/notebooks/test_diff.ipynb (20)
[    ] /Users/bussonniermatthias/ipython/docs/examples/notebooks/trapezoid_rule.ipynb (1)
[    ] /Users/bussonniermatthias/ipython/docs/examples/notebooks/truc.ipynb (6)

And indeed, some code cells have an empty prompt number.
Also, this way I discovered that prompt_number is of type number or string, as some prompt numbers are '&nbsp;'. I don't think that should happen; prompt_number should be a number or null.

Thoughts?
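In draft-03 terms, the rule argued for here (a code cell must have prompt_number, and it must be a number or null) could be written as `"prompt_number": {"type": ["number", "null"], "required": true}`. A hand-written equivalent is sketched below, assuming the v3 layout in which cells live under `worksheets[i]["cells"]`; the function name is hypothetical:

```python
def bad_prompt_numbers(notebook):
    """Return (worksheet_index, cell_index, reason) for each code cell whose
    prompt_number is missing or is neither a number nor null (None)."""
    problems = []
    for w, worksheet in enumerate(notebook.get("worksheets", [])):
        for c, cell in enumerate(worksheet.get("cells", [])):
            if cell.get("cell_type") != "code":
                continue  # only code cells carry a prompt number
            if "prompt_number" not in cell:
                problems.append((w, c, "missing prompt_number"))
                continue
            value = cell["prompt_number"]
            # bool is a subclass of int in Python, so exclude it explicitly
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            if not (is_number or value is None):
                problems.append((w, c, f"prompt_number must be a number or null, got {value!r}"))
    return problems
```

A cell saved with a prompt of `' '` is then reported with its position, while an unexecuted cell saved with `None` passes.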

@ellisonbg
IPython member

When I wrote the Python code that creates the various notebook parts, I tried as much as possible not to assume that any given element would always be present. If we move to a schema (I like the idea), we might want to revisit that code to make sure that the notebooks it creates have all the required elements.

@Carreau
IPython member
Carreau commented Sep 9, 2012

Well, we can mark elements as not required; that is not a problem.

I would just like to have a clear description of what the ipynb format is, even if for now it only gives warnings for non-conforming instances.

I wrote this to validate some tests I made that generate notebooks, and it can be a solid base for discussing the notebook format or evolving it. Especially if we do cell-by-cell syncing, it would be great to be sure the format is always consistent.

We could also validate the messaging API with a schema. I still have to check whether there is a way to validate values, not only keys and structure.

@ellisonbg
IPython member
@Carreau
IPython member
Carreau commented Sep 10, 2012

That seems fine with me.
I guess this would then be notebook format 3.1, maybe.
I'll look into that.

@Carreau
IPython member
Carreau commented Sep 11, 2012

So, here we go.
I made most of the changes I think are reasonable.

The biggest changes are:

- prompt_number is required for code cells; it can't be a string, but it can be null.
- The other changes are more or less explained by the commit names.

Result of ./validator.py -s v3.withref.json ~/ipython/**/*.ipynb (-v is a little too verbose):

[    ] /Users/matthiasbussonnier/ipython/docs/examples/lib/BackgroundJobs.ipynb (22)
[    ] /Users/matthiasbussonnier/ipython/docs/examples/notebooks/00_notebook_tour.ipynb (1)
[    ] /Users/matthiasbussonnier/ipython/docs/examples/notebooks/01_notebook_introduction.ipynb (1)
[    ] /Users/matthiasbussonnier/ipython/docs/examples/notebooks/Animations_and_Progress.ipynb (8)
[    ] /Users/matthiasbussonnier/ipython/docs/examples/notebooks/Capturing Output.ipynb (13)
[Pass] /Users/matthiasbussonnier/ipython/docs/examples/notebooks/Script Magics.ipynb
[Pass] /Users/matthiasbussonnier/ipython/docs/examples/notebooks/cython_extension.ipynb
[    ] /Users/matthiasbussonnier/ipython/docs/examples/notebooks/display_protocol.ipynb (15)
[Pass] /Users/matthiasbussonnier/ipython/docs/examples/notebooks/formatting.ipynb
[Pass] /Users/matthiasbussonnier/ipython/docs/examples/notebooks/octavemagic_extension.ipynb
[Pass] /Users/matthiasbussonnier/ipython/docs/examples/notebooks/publish_data.ipynb
[    ] /Users/matthiasbussonnier/ipython/docs/examples/notebooks/rmagic_extension.ipynb (1)
[Pass] /Users/matthiasbussonnier/ipython/docs/examples/notebooks/sympy.ipynb
[Pass] /Users/matthiasbussonnier/ipython/docs/examples/notebooks/sympy_quantum_computing.ipynb
[    ] /Users/matthiasbussonnier/ipython/docs/examples/notebooks/trapezoid_rule.ipynb (1)
[    ] /Users/matthiasbussonnier/ipython/docs/examples/parallel/Parallel Magics.ipynb (24)
[    ] /Users/matthiasbussonnier/ipython/docs/examples/parallel/helloworld.ipynb (1)
[Pass] /Users/matthiasbussonnier/ipython/docs/examples/parallel/options/Parallel MC Options.ipynb
[Pass] /Users/matthiasbussonnier/ipython/docs/examples/parallel/parallel_mpi.ipynb
[    ] /Users/matthiasbussonnier/ipython/docs/examples/parallel/rmt/rmt.ipynb (29)
[    ] /Users/matthiasbussonnier/ipython/docs/examples/parallel/task1.ipynb (1)
[    ] /Users/matthiasbussonnier/ipython/docs/examples/parallel/taskmap.ipynb (1)
[    ] /Users/matthiasbussonnier/ipython/docs/examples/tests/heartbeat/gilsleep.ipynb (6)
[Pass] /Users/matthiasbussonnier/ipython/docs/examples/tests/pylab-switch/Pylab Switching.ipynb
[    ] /Users/matthiasbussonnier/ipython/docs/examples/widgets/directview/directview.ipynb (3)

One thing that could be added is requiring a name for worksheets.
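The runs above call `./validator.py` with `-s` to choose the schema file and mention a very verbose `-v` flag. A hypothetical driver with that shape is sketched below. It is not the script from this PR: the `--schema`/`--verbose` long names are invented, the `validate(notebook, schema)` callable is assumed to return a list of errors, and the number in parentheses is assumed to be an error count:

```python
import argparse
import json
import sys


def format_result(path, errors):
    """One summary line per file: '[Pass] path' or '[    ] path (N)'."""
    if not errors:
        return f"[Pass] {path}"
    return f"[    ] {path} ({len(errors)})"


def main(argv, validate, out=None):
    """Hypothetical CLI; `validate(notebook, schema)` must return a list of errors."""
    if out is None:
        out = sys.stdout
    parser = argparse.ArgumentParser(description="Validate .ipynb files against a JSON schema.")
    parser.add_argument("-s", "--schema", required=True, help="path to the JSON schema file")
    parser.add_argument("-v", "--verbose", action="store_true", help="also print each error")
    parser.add_argument("notebooks", nargs="+", help=".ipynb files to check")
    args = parser.parse_args(argv)

    with open(args.schema) as f:
        schema = json.load(f)
    failed = 0
    for path in args.notebooks:
        with open(path) as f:
            errors = list(validate(json.load(f), schema))
        print(format_result(path, errors), file=out)
        if args.verbose:
            for error in errors:
                print(f"    {error}", file=out)
        failed += bool(errors)
    return 1 if failed else 0  # non-zero exit status if any file failed
```

For example, `sys.exit(main(sys.argv[1:], validate=my_validate))`, with `my_validate` standing in for whatever schema check is used, would print one line per notebook and make the exit status usable in scripts.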

@Carreau
IPython member
Carreau commented Sep 28, 2012

I put the dependencies in external, with their licenses.

Thoughts?

@ellisonbg ellisonbg and 1 other commented on an outdated diff Oct 2, 2012
IPython/nbformat/v3/jsonpointer.py
@@ -0,0 +1,222 @@
+# -*- coding: utf-8 -*-
@ellisonbg
IPython member
ellisonbg added a note Oct 2, 2012

Why is this not in externals?

@Carreau
IPython member
Carreau added a note Oct 2, 2012

Because mv != git mv ?

@Carreau
IPython member
Carreau added a note Oct 2, 2012

Will fix it tomorrow.

@ellisonbg
IPython member
ellisonbg added a note Oct 2, 2012
@Carreau
IPython member
Carreau added a note Oct 2, 2012

Just to say that I moved it to external and ran git add on it, so it's duplicated and I should remove this copy.
I should have used git mv, which would also have told git to stop tracking the file at its old location.

@ellisonbg
IPython member

@Carreau what else do you think needs discussion on this. Did you have a chance to look through the nbformat and decide which attributes can be null?

@Carreau
IPython member
Carreau commented Oct 2, 2012

I didn't see anything obvious, and I don't want to spend too much time on this, as draft 4 should come out "soon".
Hopefully the Python implementation will be up to date a few weeks after that.

I also have to study whether we can validate further (for example, checking that a string instance ends with \n), but this will not be in the draft itself; it depends more on what jsonschema.py allows.
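Checks of that kind can live outside the schema as plain functions. A sketch, assuming (as in the v3 on-disk format) that multiline fields such as `input` and `source` are stored as lists of lines in which every line except the last keeps its trailing `\n`; the function names are hypothetical:

```python
def lines_missing_newline(lines):
    r"""Return indices of entries, other than the last, that lack a trailing "\n"."""
    return [i for i, line in enumerate(lines[:-1]) if not line.endswith("\n")]


def check_multiline_fields(notebook, fields=("input", "source")):
    """Return (worksheet_index, cell_index, field, line_index) for each problem."""
    problems = []
    for w, worksheet in enumerate(notebook.get("worksheets", [])):
        for c, cell in enumerate(worksheet.get("cells", [])):
            for field in fields:
                value = cell.get(field)
                if isinstance(value, list):  # only list-of-lines fields are checked
                    for i in lines_missing_newline(value):
                        problems.append((w, c, field, i))
    return problems
```

Running such functions after the schema check keeps the schema itself within what the draft can express.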

@Carreau
IPython member
Carreau commented Oct 3, 2012

removed jsonpointer.py from nbformat/v3 and rebased.

@ellisonbg
IPython member

Did the updated draft come out? What do you think we should do with this PR? It would be great to get it merged, but I am not quite sure what else needs to be done/answered.

@Carreau
IPython member
Carreau commented Nov 2, 2012

Well, I saw that draft 4 was due "soon", but I didn't check what "soon" meant between drafts 2 and 3... So I'll look at why it is failing on Travis, and then we can merge.

@Carreau
IPython member
Carreau commented Nov 16, 2012

Hmm... the Travis build passes, but GitHub says 'merge with caution'... why?

@ellisonbg
IPython member

@Carreau I would like to merge this soon, but I see one issue. The command line script will be buried deep in the IPython package, so no one will know it is there and few will ever run it. Do you think it would make sense to create a top level subcommand/application so users could do:

ipython nbvalidate mynotebook.ipynb
@ellisonbg
IPython member

Also, the top level nbvalidate function should have a nice docstring as that is what most people would call.
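A sketch of what such a documented entry point could look like. The name `nbvalidate` comes from the suggestion above, but the signature, the accepted inputs, and the few top-level checks are assumptions made for this example; a real version would delegate to the schema:

```python
import json


def nbvalidate(nb):
    """Check that `nb` looks like a version 3 IPython notebook.

    Parameters
    ----------
    nb : dict or str
        Either an already-parsed notebook or a path to an .ipynb file.

    Returns
    -------
    list of str
        Human-readable problems; an empty list means nothing was found.

    Notes
    -----
    Only a few top-level checks are done here; a full implementation
    would validate against the JSON schema.
    """
    if isinstance(nb, str):
        with open(nb) as f:
            nb = json.load(f)
    if not isinstance(nb, dict):
        return ["notebook must be a JSON object"]
    errors = [f"missing top-level key {key!r}"
              for key in ("metadata", "nbformat", "worksheets") if key not in nb]
    if "nbformat" in nb and nb["nbformat"] != 3:
        errors.append(f"expected nbformat 3, got {nb['nbformat']!r}")
    if not isinstance(nb.get("worksheets", []), list):
        errors.append("'worksheets' must be a list")
    return errors
```

Returning a list rather than raising lets a command-line wrapper print every problem, while library callers can simply test for an empty result.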

@Carreau
IPython member
@ellisonbg
IPython member
@Carreau
IPython member
Carreau commented Jan 10, 2013

What makes them invalid? I would prefer if the notebooks we currently
use/generate are defined to be the reference standard. Or is it minor
things?

Almost nothing: some were not converted to v3, others fail just because their prompts are '' or ' '.
It would be better to have a 'real' value, because right now '*' and ' ' couple the format and the interface; some frontends might not want to display '*' for busy and ' ' for not run. This also means that the value of 'prompt_number' is either a string or a number... which I don't like.

and then gives a list of the things that were found to be invalid
This is a problem with the current version of the library: it can't tell you the specific issue.

It makes sense, as it can't really know what you meant. Suppose you have a cell that has only 'input' and 'level' attributes:
what error do you throw? "heading cell can't have input" or "code cell can't have a level attribute"? (It's a little more complicated than that in reality.) So it just throws "invalid object".
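To make the ambiguity concrete, here is a hypothetical sketch with two cell alternatives written as plain Python checks (not draft-03 keywords and not the library's behavior). A cell with only `input` and `level` is rejected by both alternatives, for different reasons, so a validator that must emit a single message has no principled way to choose:

```python
def code_cell_errors(cell):
    """Reasons `cell` is not a valid code cell (illustrative rules only)."""
    errors = []
    if "input" not in cell:
        errors.append("code cell needs 'input'")
    if "level" in cell:
        errors.append("code cell can't have a 'level' attribute")
    return errors


def heading_cell_errors(cell):
    """Reasons `cell` is not a valid heading cell (illustrative rules only)."""
    errors = []
    if "source" not in cell:
        errors.append("heading cell needs 'source'")
    if "input" in cell:
        errors.append("heading cell can't have 'input'")
    if "level" not in cell:
        errors.append("heading cell needs 'level'")
    return errors


ALTERNATIVES = {"code": code_cell_errors, "heading": heading_cell_errors}


def check_cell(cell):
    """Return None if `cell` matches at least one alternative; otherwise a
    dict mapping each alternative's name to the reasons it was rejected."""
    reasons = {name: check(cell) for name, check in ALTERNATIVES.items()}
    if any(not errs for errs in reasons.values()):
        return None
    return reasons
```

One possible workaround, shown by `check_cell`, is to report the reasons from every alternative instead of one generic message; whether that beats "invalid object" depends on how many alternatives the schema has.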

Do you want to do any more work on this PR, or
should we merge it and continue work on a later PR.

Personally, I would merge as is. All-or-nothing validation is already quite good. We can build custom tests as we feel the need.

Let's still wait until Monday, so I have some time to look at it again and see whether I can upgrade (if needed) the embedded version of the library I use. If I don't, I can still upgrade it later.

@ellisonbg
IPython member
@ellisonbg
IPython member

@Carreau did you get a chance to look at this?

@Carreau
IPython member
Carreau commented Jan 14, 2013

Not quite, with all the PRs around.
If you wish, you can merge and open an issue assigned to me to upgrade this code.
I'm doing my best, but moving this one forward before the end of the week will be tough.

@ellisonbg
IPython member
@ellisonbg ellisonbg referenced this pull request Jan 14, 2013
Closed

Improve notebook json validation #2786

5 of 6 tasks complete
@ellisonbg
IPython member

Merging this. I have created an issue #2786 to track further work. Thanks for the code!

@ellisonbg ellisonbg merged commit ac9b125 into ipython:master Jan 14, 2013

1 check passed

default: The Travis build passed