create a ipynbv3 json schema and a validator #2396

Merged
merged 11 commits into from Jan 14, 2013

Conversation

Carreau
Member

@Carreau Carreau commented Sep 8, 2012

JSON Schema is a draft specification meant to describe the structure of a JSON format.
It could be used to check which files are valid ipynb files.

This is a first draft of an ipynb schema and validator, mostly auto-generated.
There are still things that could be added: for example, the cell 'level' key is currently 'optional' but could be made 'required only if the cell is of type heading', and descriptions could be added.
IIUC, one could even define subtypes such as codecell/markdowncell and reference them to get more fine-grained validation.
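
The conditional rule mentioned above ('level' required only for heading cells) can be sketched in plain Python. This is only an illustration, not the schema or validator from this PR: `check_cell` and `REQUIRED_BY_TYPE` are hypothetical names, and the per-type key lists are assumptions made for the example.

```python
# Hypothetical per-cell-type required keys; the exact lists are assumptions.
REQUIRED_BY_TYPE = {
    "heading": ["source", "level"],
    "markdown": ["source"],
    "code": ["input", "outputs"],
}


def check_cell(cell):
    """Return a list of problems found in one cell dict (empty if none)."""
    cell_type = cell.get("cell_type")
    if cell_type not in REQUIRED_BY_TYPE:
        return ["unknown cell_type: %r" % (cell_type,)]
    problems = []
    for key in REQUIRED_BY_TYPE[cell_type]:
        if key not in cell:
            problems.append("%s cell is missing %r" % (cell_type, key))
    # 'level' only makes sense on heading cells.
    if cell_type != "heading" and "level" in cell:
        problems.append("%s cell must not have 'level'" % cell_type)
    return problems
```

Dispatching on `cell_type` first is what lets 'level' be mandatory for one kind of cell and forbidden for the others.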

cf http://json-schema.org/
http://tools.ietf.org/html/draft-zyp-json-schema-03

The schema is auto-generated from
http://www.jsonschema.net/index.html#
then tweaked a little.

@Carreau
Member Author

Carreau commented Sep 9, 2012

I added a quick implementation of JSON references, so that one part of the JSON file can reference another.
This makes the schema easier to read by allowing things like "a_key" : {"$ref": "/code_cell"}, where {"$ref": "/code_cell"} is replaced at load time with the corresponding part of the JSON file.
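
As a rough sketch of that load-time substitution (not the code in this PR, which relies on a separate jsonpointer module), the following standard-library-only example expands every {"$ref": ...} object. The names `lookup` and `resolve_refs` are hypothetical. It only follows dictionary keys, ignores JSON Pointer escaping, and would recurse forever on cyclic references.

```python
import copy
import json


def lookup(document, pointer):
    """Follow a '/'-separated pointer such as '/code_cell' into `document`."""
    target = document
    for part in pointer.strip("/").split("/"):
        if part:
            target = target[part]
    return target


def resolve_refs(node, root):
    """Return a copy of `node` with every {"$ref": ...} object expanded."""
    if isinstance(node, dict):
        if set(node) == {"$ref"}:
            # Resolve the target too, in case it contains references itself.
            return resolve_refs(lookup(root, node["$ref"]), root)
        return {key: resolve_refs(value, root) for key, value in node.items()}
    if isinstance(node, list):
        return [resolve_refs(item, root) for item in node]
    return copy.deepcopy(node)


raw = json.loads("""
{
  "code_cell": {"type": "object"},
  "a_key": {"$ref": "/code_cell"}
}
""")
schema = resolve_refs(raw, raw)
```

After this runs, `schema["a_key"]` holds a copy of the "code_cell" definition instead of the reference.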

The validation seems to work:

./validator.py  ~/ipython/docs/examples/notebooks/*.ipynb
[Pass] /Users/bussonniermatthias/ipython/docs/examples/notebooks/00_notebook_tour.ipynb
[Pass] /Users/bussonniermatthias/ipython/docs/examples/notebooks/01_notebook_introduction.ipynb
[Pass] /Users/bussonniermatthias/ipython/docs/examples/notebooks/Animations_and_Progress.ipynb
[Pass] /Users/bussonniermatthias/ipython/docs/examples/notebooks/Capturing Output.ipynb
[Pass] /Users/bussonniermatthias/ipython/docs/examples/notebooks/Script Magics.ipynb
[Pass] /Users/bussonniermatthias/ipython/docs/examples/notebooks/cython_extension.ipynb
[Pass] /Users/bussonniermatthias/ipython/docs/examples/notebooks/display_protocol.ipynb
[Pass] /Users/bussonniermatthias/ipython/docs/examples/notebooks/formatting.ipynb
[Pass] /Users/bussonniermatthias/ipython/docs/examples/notebooks/octavemagic_extension.ipynb
[Pass] /Users/bussonniermatthias/ipython/docs/examples/notebooks/publish_data.ipynb
[Pass] /Users/bussonniermatthias/ipython/docs/examples/notebooks/rawraw.ipynb
[Pass] /Users/bussonniermatthias/ipython/docs/examples/notebooks/rmagic_extension.ipynb
[Pass] /Users/bussonniermatthias/ipython/docs/examples/notebooks/sympy.ipynb
[Pass] /Users/bussonniermatthias/ipython/docs/examples/notebooks/sympy_quantum_computing.ipynb
[    ] /Users/bussonniermatthias/ipython/docs/examples/notebooks/test.ipynb (56)
[    ] /Users/bussonniermatthias/ipython/docs/examples/notebooks/test_diff.ipynb (20)
[Pass] /Users/bussonniermatthias/ipython/docs/examples/notebooks/trapezoid_rule.ipynb
[    ] /Users/bussonniermatthias/ipython/docs/examples/notebooks/truc.ipynb (6) 

and should be refined.

Here is a test example, where I require code_cell to have a prompt_number property (even if set to null):

./validator.py  ~/ipython/docs/examples/notebooks/*.ipynb
[Pass] /Users/bussonniermatthias/ipython/docs/examples/notebooks/00_notebook_tour.ipynb
[    ] /Users/bussonniermatthias/ipython/docs/examples/notebooks/01_notebook_introduction.ipynb (1)
[    ] /Users/bussonniermatthias/ipython/docs/examples/notebooks/Animations_and_Progress.ipynb (8)
[    ] /Users/bussonniermatthias/ipython/docs/examples/notebooks/Capturing Output.ipynb (13)
[Pass] /Users/bussonniermatthias/ipython/docs/examples/notebooks/Script Magics.ipynb
[Pass] /Users/bussonniermatthias/ipython/docs/examples/notebooks/cython_extension.ipynb
[    ] /Users/bussonniermatthias/ipython/docs/examples/notebooks/display_protocol.ipynb (15)
[Pass] /Users/bussonniermatthias/ipython/docs/examples/notebooks/formatting.ipynb
[Pass] /Users/bussonniermatthias/ipython/docs/examples/notebooks/octavemagic_extension.ipynb
[Pass] /Users/bussonniermatthias/ipython/docs/examples/notebooks/publish_data.ipynb
[Pass] /Users/bussonniermatthias/ipython/docs/examples/notebooks/rawraw.ipynb
[    ] /Users/bussonniermatthias/ipython/docs/examples/notebooks/rmagic_extension.ipynb (1)
[Pass] /Users/bussonniermatthias/ipython/docs/examples/notebooks/sympy.ipynb
[Pass] /Users/bussonniermatthias/ipython/docs/examples/notebooks/sympy_quantum_computing.ipynb
[    ] /Users/bussonniermatthias/ipython/docs/examples/notebooks/test.ipynb (56)
[    ] /Users/bussonniermatthias/ipython/docs/examples/notebooks/test_diff.ipynb (20)
[    ] /Users/bussonniermatthias/ipython/docs/examples/notebooks/trapezoid_rule.ipynb (1)
[    ] /Users/bussonniermatthias/ipython/docs/examples/notebooks/truc.ipynb (6)

And indeed, some code cells have an empty prompt number.
Also, this is how I discovered that prompt_number is of type number or string, as some prompt numbers are... &nbsp;. I don't think that should happen; prompt_number should be a number or null.
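
The proposed rule (prompt_number must be present, and must be a number or null) can be written as a small check. This is a hedged sketch rather than the validator's actual logic; `prompt_number_ok` is a hypothetical helper, and excluding booleans is an assumption of the example.

```python
def prompt_number_ok(cell):
    """True if `cell` has a prompt_number that is a number or None."""
    if "prompt_number" not in cell:
        return False
    value = cell["prompt_number"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool):
        return False
    return value is None or isinstance(value, (int, float))
```

With this rule, a cell whose prompt_number is the string "&nbsp;" is rejected, while one set to null (None once loaded in Python) is accepted.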

Thoughts?

@ellisonbg
Member

When I wrote the python code to create the various notebook parts, I tried to not assume that various elements would always be present as much as possible. If we move to a schema (I like the idea) we might want to revisit that code to make sure that notebooks created using that Python code have all the right elements.

@Carreau
Member Author

Carreau commented Sep 9, 2012

Well, we can mark elements as not required; this is not a problem.

I would just like to have a nice description of what the ipynb format is, even if for now it only gives warnings for non-conforming instances.

I wrote this to be able to validate some tests I made to generate notebooks, and it can be a solid base for discussing the notebook format or making it evolve. Especially if we do cell-by-cell syncing, it would be great to be sure the format is always consistent.

We could also validate the messaging API (http://ipython.org/ipython-doc/dev/development/messaging.html) with a schema. I still have to look at whether there is a way to validate values and not only keys and structure.

@ellisonbg
Member

Again, I really like the idea of having a validator. For example, the
server should validate newly uploaded notebooks. How about the following:
why don't you go through the nbformat and take a first guess at which
attributes should be required and which should be optional? Then we can
look through that list, discuss and finalize it. But I love this idea
and appreciate your looking into this.

Brian E. Granger
Cal Poly State University, San Luis Obispo
bgranger@calpoly.edu and ellisonbg@gmail.com

@Carreau
Member Author

Carreau commented Sep 10, 2012

That seems fine to me.
I guess this would then be notebook 3.1, maybe.
I'll look into that.

@Carreau
Member Author

Carreau commented Sep 11, 2012

So, here we go.
I made most of the changes I think are reasonable.

The biggest changes are:

prompt_number is required for code cells; it can't be a string, but it can be null.
The commit names are more or less explicit.

Result of ./validator.py -s v3.withref.json ~/ipython/**/*.ipynb (-v is a little too verbose):

[    ] /Users/matthiasbussonnier/ipython/docs/examples/lib/BackgroundJobs.ipynb (22)
[    ] /Users/matthiasbussonnier/ipython/docs/examples/notebooks/00_notebook_tour.ipynb (1)
[    ] /Users/matthiasbussonnier/ipython/docs/examples/notebooks/01_notebook_introduction.ipynb (1)
[    ] /Users/matthiasbussonnier/ipython/docs/examples/notebooks/Animations_and_Progress.ipynb (8)
[    ] /Users/matthiasbussonnier/ipython/docs/examples/notebooks/Capturing Output.ipynb (13)
[Pass] /Users/matthiasbussonnier/ipython/docs/examples/notebooks/Script Magics.ipynb
[Pass] /Users/matthiasbussonnier/ipython/docs/examples/notebooks/cython_extension.ipynb
[    ] /Users/matthiasbussonnier/ipython/docs/examples/notebooks/display_protocol.ipynb (15)
[Pass] /Users/matthiasbussonnier/ipython/docs/examples/notebooks/formatting.ipynb
[Pass] /Users/matthiasbussonnier/ipython/docs/examples/notebooks/octavemagic_extension.ipynb
[Pass] /Users/matthiasbussonnier/ipython/docs/examples/notebooks/publish_data.ipynb
[    ] /Users/matthiasbussonnier/ipython/docs/examples/notebooks/rmagic_extension.ipynb (1)
[Pass] /Users/matthiasbussonnier/ipython/docs/examples/notebooks/sympy.ipynb
[Pass] /Users/matthiasbussonnier/ipython/docs/examples/notebooks/sympy_quantum_computing.ipynb
[    ] /Users/matthiasbussonnier/ipython/docs/examples/notebooks/trapezoid_rule.ipynb (1)
[    ] /Users/matthiasbussonnier/ipython/docs/examples/parallel/Parallel Magics.ipynb (24)
[    ] /Users/matthiasbussonnier/ipython/docs/examples/parallel/helloworld.ipynb (1)
[Pass] /Users/matthiasbussonnier/ipython/docs/examples/parallel/options/Parallel MC Options.ipynb
[Pass] /Users/matthiasbussonnier/ipython/docs/examples/parallel/parallel_mpi.ipynb
[    ] /Users/matthiasbussonnier/ipython/docs/examples/parallel/rmt/rmt.ipynb (29)
[    ] /Users/matthiasbussonnier/ipython/docs/examples/parallel/task1.ipynb (1)
[    ] /Users/matthiasbussonnier/ipython/docs/examples/parallel/taskmap.ipynb (1)
[    ] /Users/matthiasbussonnier/ipython/docs/examples/tests/heartbeat/gilsleep.ipynb (6)
[Pass] /Users/matthiasbussonnier/ipython/docs/examples/tests/pylab-switch/Pylab Switching.ipynb
[    ] /Users/matthiasbussonnier/ipython/docs/examples/widgets/directview/directview.ipynb (3)

One thing that could be added is to require name for worksheets.

@Carreau
Member Author

Carreau commented Sep 28, 2012

Put the dependencies in external, with licences.

Thoughts?

@@ -0,0 +1,222 @@
# -*- coding: utf-8 -*-
Member

Why is this not in externals?

Member Author

Because mv != git mv ?

Member Author

will fix it tomorrow.

Member

Not following you, why is mv relevant?

Member Author

Just to say that I moved it to external and ran git add on it (so it's duplicated and I should remove this one).
I should have used git mv, which would have told git to stop tracking this particular file.

@ellisonbg
Member

@Carreau what else do you think needs discussion on this? Did you have a chance to look through the nbformat and decide which attributes can be null?

@Carreau
Member Author

Carreau commented Oct 2, 2012

I didn't see anything obvious, and I don't want to spend too much time on this, as draft 4 should come out "soon".
Hopefully the Python implementation will be up to date a few weeks after that.

I also have to study whether we can validate further (for example, check that a string instance ends with \n), but this would not be in the draft itself; it depends more on what jsonschema.py allows.
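
One way to picture such an extra, value-level check is a plain Python helper run alongside schema validation. This is only a sketch: `lines_missing_newline` is a hypothetical name, and skipping the final line is an assumption of the example, not a rule stated in this thread.

```python
def lines_missing_newline(lines):
    """Return the indexes of lines that do not end with a newline.

    The last line is skipped, since a final line without a trailing
    newline is treated as acceptable in this sketch (an assumption).
    """
    return [
        index
        for index, line in enumerate(lines[:-1])
        if not line.endswith("\n")
    ]
```

An empty result means every line except possibly the last one ends with "\n".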

@Carreau
Member Author

Carreau commented Oct 3, 2012

removed jsonpointer.py from nbformat/v3 and rebased.

cf http://json-schema.org/
http://tools.ietf.org/html/draft-zyp-json-schema-03

autogenerated from
http://www.jsonschema.net/index.html#
then tweaked.

This version makes all our notebooks in docs/examples/notebooks valid

@ellisonbg
Member

Did the updated draft come out? What do you think we should do with this PR? It would be great to get it merged, but I am not quite sure what else needs to be done/answered.

@Carreau
Member Author

Carreau commented Nov 2, 2012

Well, I saw that draft 4 was due "soon", but I didn't look at what "soon" meant between drafts 2 and 3... So I'll look at why it is failing on Travis, and after that we can merge.

@Carreau
Member Author

Carreau commented Nov 16, 2012

Hum... the Travis build passes, but GitHub says 'merge with caution'... why?

@ellisonbg
Member

@Carreau I would like to merge this soon, but I see one issue. The command line script will be buried deep in the IPython package, so no one will know it is there and few will ever run it. Do you think it would make sense to create a top level subcommand/application so users could do:

ipython nbvalidate mynotebook.ipynb

@ellisonbg
Member

Also, the top level nbvalidate function should have a nice docstring as that is what most people would call.
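
As a hedged sketch of what a documented top-level function might look like (not the code merged in this PR), the example below takes the notebook's JSON text and returns a boolean. The signature, the `check` parameter and the placeholder `_minimal_check` rule are all assumptions made for illustration; a real version would delegate to the schema validator.

```python
import json


def _minimal_check(notebook):
    """Placeholder rule: a JSON object with an integer 'nbformat' key."""
    return isinstance(notebook, dict) and isinstance(notebook.get("nbformat"), int)


def nbvalidate(text, check=_minimal_check):
    """Tell whether `text` holds a valid notebook.

    Parameters
    ----------
    text : str
        Contents of an .ipynb file (JSON).
    check : callable, optional
        Function taking the parsed notebook and returning True or False.
        The default is a trivial placeholder, not a real schema check.

    Returns
    -------
    bool
        True if `text` parses as JSON and passes `check`, else False.
    """
    try:
        notebook = json.loads(text)
    except ValueError:
        # Not JSON at all, so it cannot be a notebook.
        return False
    return bool(check(notebook))
```

Keeping the rule behind a `check` argument would let a command-line entry point and a test suite share the same function.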

@Carreau
Member Author

Carreau commented Jan 10, 2013

Short, from my phone.

The problems are that:
The library is not user friendly and throws an unreadable traceback instead of the reason why
a notebook is not valid.
Most of our notebooks are not valid.
The notebook generates invalid ones.

I would suggest a 3 step approach.

  • use it as an internal tool and test manually until all notebooks are fixed.
  • auto test with nose.
  • make the tool public/top level.

How does that sound?

@ellisonbg
Member

Matthias,

On Thu, Jan 10, 2013 at 12:20 AM, Bussonnier Matthias wrote:

> Short, from my phone.
>
> The problems are that:
> The library is not user friendly and throws an unreadable traceback instead of the reason why
> a notebook is not valid.
> Most of our notebooks are not valid.
> The notebook generates invalid ones.

What makes them invalid? I would prefer if the notebooks we currently
use/generate are defined to be the reference standard. Or is it minor
things?

> I would suggest a 3 step approach.
>
>   • use as internal tool test manually until all notebooks are fixed.
>   • auto test with nose.
>   • make the tool public/top level.
>
> How does that sound?

Sounds good. I think the first step would be to write a top-level
nbvalidate function that returns True/False (valid or not) and then gives a
list of the things that were found to be invalid. Then that can be used to
build the other tools. Do you want to do any more work on this PR, or
should we merge it and continue work in a later PR?

Cheers,

Brian

@Carreau
Member Author

Carreau commented Jan 10, 2013

> What makes them invalid? I would prefer if the notebooks we currently
> use/generate are defined to be the reference standard. Or is it minor
> things?

Almost nothing: some were not converted to v3,
others just because prompts are '*' or ' '.
It would be better to have a 'real' value, because right now '*' and ' ' couple the format to the interface: some frontends might not want to display '*' for busy and ' ' for not run. This also means that the value of 'prompt_number' is either a string or a number... which I don't like.

> and then gives a list of the things that were found to be invalid

This is a problem with the current version of the library: it can't tell you the specific issue.

That makes sense, as it can't really know what you meant. Assume you have a cell that has only 'input' and 'level' attributes:
which error do you throw? "header cell can't have input" or "codecell can't have level attribute"? (It's a little more complicated than that in reality.) So it just throws "invalid object".
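
The ambiguity can be made concrete with a small sketch. It is not how the library works internally; `CANDIDATES` and `explain` are hypothetical, and the allowed key sets are simplified assumptions.

```python
# Hypothetical, simplified cell shapes: the keys each kind of cell may carry.
CANDIDATES = {
    "heading cell": {"cell_type", "source", "level"},
    "code cell": {"cell_type", "input", "outputs", "prompt_number"},
}


def explain(cell):
    """Map each candidate shape to the keys of `cell` it does not allow."""
    return {
        name: sorted(set(cell) - allowed)
        for name, allowed in CANDIDATES.items()
    }


ambiguous = {"input": "1 + 1", "level": 2}
report = explain(ambiguous)
# Every candidate rejects the cell, each for a different key, so there is
# no single "right" error message to show.
```

Here `report` blames 'input' under the heading reading and 'level' under the code reading, which is why a validator that does not know the intended cell type can only say the object is invalid.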

> Do you want to do any more work on this PR, or
> should we merge it and continue work on a later PR.

I would personally merge as is. All or nothing is already quite good. We can build custom tests as we feel the need.

Let's still wait until Monday, so I have some time to look at it again and see if I can upgrade (if needed) the embedded version of the lib I use. If I don't, I can still upgrade it later.

@ellisonbg
Member

Matthias,

On Thu, Jan 10, 2013 at 11:22 AM, Bussonnier Matthias wrote:

> That makes sense, as it can't really know what you meant. Assume you have a
> cell that has only 'input' and 'level' attributes:
> which error do you throw? "header cell can't have input" or "codecell
> can't have level attribute"? (It's a little more complicated than that in
> reality.) So it just throws "invalid object".

OK I understand, probably the best we can do is to just return True or
False then.

> I would personally merge as is. All or nothing is already quite good. We
> can build custom tests as we feel the need.
>
> Let's still wait until Monday, so I have some time to look at it again and
> see if I can upgrade (if needed) the embedded version of the lib I use. If I
> don't, I can still upgrade it later.

OK I am fine with merging as is, but I will let you have a look and do the
merge yourself. Great work!

@ellisonbg
Member

@Carreau did you get a chance to look at this?

@Carreau
Member Author

Carreau commented Jan 14, 2013

Not quite, with all the PRs around.
If you wish, you can merge and open an issue assigned to me to upgrade this code.
I'll do my best, but moving this one forward before the end of the week will be tough.

@ellisonbg
Member

Ok I will merge and create an issue.

@ellisonbg ellisonbg mentioned this pull request Jan 14, 2013
6 tasks
@ellisonbg
Member

Merging this. I have created an issue #2786 to track further work. Thanks for the code!

ellisonbg added a commit that referenced this pull request Jan 14, 2013
create a ipynbv3 json schema and a validator
@ellisonbg ellisonbg merged commit ac9b125 into ipython:master Jan 14, 2013
mattvonrocketstein pushed a commit to mattvonrocketstein/ipython that referenced this pull request Nov 3, 2014
create a ipynbv3 json schema and a validator