Import, export and update Jupyter notebooks with Jupytext? #1404

Closed
mwouts opened this issue Sep 5, 2018 · 5 comments


@mwouts

mwouts commented Sep 5, 2018

I've recently been working on jupytext, a converter between Jupyter notebooks and Python scripts. It converts in both directions and has well-tested support for round-trip conversion. It can also update the inputs of an existing Jupyter notebook and preserve the outputs of cells whose inputs are unchanged.

I currently use this from both Jupyter notebook (our contents manager there creates both the .ipynb and the .py file, and reads inputs preferentially from .py) and from a Python IDE. This is very convenient for

  • collaborating on (or versioning) notebooks
  • designing or refactoring notebooks as text with the comfort of a good IDE

I think the same technology would easily allow Hydrogen to offer support for opening/exporting to Jupyter notebooks. Would you be interested in working together in that direction?

The first challenge would be to agree on the cell markers! Do you have documentation on how cells are defined in Hydrogen (I've seen #1296)? If necessary, I'm ready to evolve the current markers of jupytext, cf. #57.

@mwouts
Author

mwouts commented Sep 5, 2018

See also a preliminary chat on this with @rgbkrk at jupyter/notebook#3694

@kylebarron
Contributor

kylebarron commented Sep 5, 2018

This looks interesting, and better support for working with Jupyter Notebooks in Hydrogen is necessary.

A couple comments:

  1. To my knowledge, Hydrogen tries very hard to be completely language agnostic. How does Jupytext convert from Python scripts to Notebook? Does it use the Python interpreter or the Jupyter Python kernel? If it's the latter, it might be possible to have it use any Jupyter kernel, and thus let it be language agnostic.
  2. I could be wrong, but I don't think it's possible to embed Python code in Atom. I believe it only uses JavaScript.
  3. For reference, most of the current code to export an arbitrary file as a Notebook is here.
  4. The cell markers used are described here:
    const regexString = `${escapedCommentStartString}(%%| %%| <codecell>| In\[[0-9 ]*\]:?)`;
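For readers following along in Python, here is a minimal sketch of how a pattern like the one quoted above could be used to split a script into cells. It assumes `#` as the comment start string and assumes that a marker only counts when it begins a line; the function names `cell_marker_pattern` and `split_cells` are hypothetical and are not part of Hydrogen or Jupytext.

```python
import re
from typing import List


def cell_marker_pattern(comment_start: str) -> "re.Pattern[str]":
    """Python translation of the regex string quoted above (illustrative only)."""
    escaped = re.escape(comment_start)
    return re.compile(escaped + r"(%%| %%| <codecell>| In\[[0-9 ]*\]:?)")


def split_cells(source: str, comment_start: str = "#") -> List[str]:
    """Start a new cell at every line that begins with a cell marker."""
    pattern = cell_marker_pattern(comment_start)
    cells: List[str] = []
    current: List[str] = []
    for line in source.splitlines():
        # Assumption: markers are only recognized at the start of a line.
        if pattern.match(line.lstrip()) and current:
            cells.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        cells.append("\n".join(current))
    return cells


script = (
    "import math\n# %%\nx = 1\n#%% second\ny = 2\n"
    "# In[3]:\nz = 3\n# <codecell>\nw = 4"
)
cells = split_cells(script)
```

With this input, each of the four marker styles opens a new cell, and the unmarked first line forms a cell of its own.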

@mwouts mwouts closed this as completed Sep 5, 2018
@mwouts mwouts reopened this Sep 5, 2018
@mwouts
Author

mwouts commented Sep 6, 2018

Thanks @kylebarron, these are enlightening comments. Jupytext is a plain python package, and offers a command line converter, but also a contents manager for Jupyter.

The most convenient use case for Jupytext is to configure Jupyter notebook (or JupyterLab) to use Jupytext's contents manager, which lets Jupyter notebook open Python scripts as notebooks (and save notebooks as scripts). The scripts can be edited in any IDE, including Atom+Hydrogen. Refreshing the notebook in Jupyter (Ctrl+R) will then display the latest edits and, if you use paired notebooks, preserve outputs for unchanged cells.
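To illustrate what "preserve outputs for unchanged cells" can mean, here is a minimal sketch that matches cells by their exact source text. This is an assumed, simplified strategy and not necessarily how Jupytext does it; the cell dictionaries and the `update_inputs` helper are hypothetical.

```python
from typing import Dict, List


def update_inputs(old_cells: List[Dict], new_sources: List[str]) -> List[Dict]:
    """Rebuild cells from new inputs, reusing outputs where the input is unchanged."""
    # Group stored outputs by source text; a list per source handles duplicates.
    remaining: Dict[str, List[list]] = {}
    for cell in old_cells:
        remaining.setdefault(cell["source"], []).append(cell.get("outputs", []))

    updated = []
    for source in new_sources:
        candidates = remaining.get(source)
        # Unchanged input: reuse its outputs once. Changed or new input: no outputs.
        outputs = candidates.pop(0) if candidates else []
        updated.append({"source": source, "outputs": outputs})
    return updated


old = [
    {"source": "x = 1", "outputs": []},
    {"source": "print(x)", "outputs": ["1"]},
    {"source": "print(x + 1)", "outputs": ["2"]},
]
new = ["x = 1", "print(x)", "print(x + 2)"]
result = update_inputs(old, new)
```

Here the second cell keeps its output because its input is identical, while the edited third cell starts with no outputs.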

To answer your question on Python interpreter or kernel, I would say that Jupytext runs in the same python process as the Jupyter application, not in a kernel (but, as a python package, it could also operate in a python kernel).

I have no experience with JavaScript, and I'm afraid it would be painful to port Jupytext's script parser from Python to JavaScript. Clearly, Jupytext's identification of code/markdown cells in arbitrary Python scripts is complex and fragile compared to the fully explicit JSON notebooks. It is also a bit more complex than the identification of Hydrogen's cell markers. In Jupytext, cells are often implicit (commented paragraphs are markdown, adjacent blocks of Python code are a code cell). Only code cells made of multiple blocks have start and end markers.

The purpose of this is to

  • allow any python file to become a notebook (without adding any marker)
  • conversely, have Python scripts generated from notebooks that also look natural
  • preserve cells on round trip conversion.
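The implicit-cell idea described above (commented paragraphs become markdown, blocks of code become code cells) can be sketched roughly as follows. This is a deliberately simplified illustration, not Jupytext's parser: it assumes cells are separated by blank lines and ignores the explicit start and end markers needed for code cells that span several blocks. The name `implicit_cells` is hypothetical.

```python
from typing import List, Tuple


def implicit_cells(script: str) -> List[Tuple[str, str]]:
    """Split a script on blank lines and classify each block as markdown or code."""
    cells: List[Tuple[str, str]] = []
    block: List[str] = []
    # The trailing empty string acts as a sentinel that flushes the last block.
    for line in script.splitlines() + [""]:
        if line.strip():
            block.append(line)
            continue
        if not block:
            continue
        if all(item.startswith("#") for item in block):
            # A fully commented paragraph is treated as a markdown cell.
            text = "\n".join(
                item[2:] if item.startswith("# ") else item[1:] for item in block
            )
            cells.append(("markdown", text))
        else:
            # Anything else (including code with inline comments) is a code cell.
            cells.append(("code", "\n".join(block)))
        block = []
    return cells


script = (
    "# A title\n# and some text\n\n"
    "x = 1\ny = x + 1\n\n"
    "# comment inside code\nprint(y)\n"
)
cells = implicit_cells(script)
```

Even this toy version shows why the approach is fragile: a commented-out block of code is indistinguishable from a markdown paragraph.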

I understand that Hydrogen requires a representation of notebooks as scripts that is easier to parse into cells. I would be interested in working on a prototype of a two-way notebook to script converter that builds scripts that Hydrogen can parse easily (the scripts currently generated by Jupytext are recognized as a single cell, if I am correct). For that, would you have recommendations on how to represent markdown cells (and distinguish them from commented code cells), as well as cell metadata (would a JSON dictionary after the cell marker, as suggested here, be acceptable?)
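To make the metadata question concrete, here is one possible shape for "a JSON dictionary after the cell marker", as a minimal sketch. The header layout (`# %%`, then an optional title, then an optional JSON object), the metadata keys, and the `parse_cell_header` helper are all assumptions for illustration, not an agreed specification.

```python
import json
from typing import Dict, Optional, Tuple


def parse_cell_header(line: str, comment: str = "#") -> Optional[Tuple[str, Dict]]:
    """Return (title, metadata) for a '# %%' header line, or None otherwise."""
    marker = comment + " %%"
    if not line.startswith(marker):
        return None
    rest = line[len(marker):].strip()
    title, metadata = rest, {}
    brace = rest.find("{")
    if brace != -1:
        try:
            parsed = json.loads(rest[brace:])
        except json.JSONDecodeError:
            parsed = None
        # Only accept a well-formed JSON object; otherwise keep the text as title.
        if isinstance(parsed, dict):
            title, metadata = rest[:brace].strip(), parsed
    return title, metadata


header = parse_cell_header('# %% Load data {"tags": ["setup"], "hide_input": true}')
```

A design point worth discussing: falling back to "everything is the title" on malformed JSON keeps the script loadable, at the cost of silently dropping the intended metadata.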

@mwouts
Author

mwouts commented Sep 22, 2018

Hello @kylebarron, I just published a new RC for Jupytext that implements a notebook converter for the double percent format, cf. mwouts/jupytext#59.

Would you have a minute to review the format specifications? In order to support notebook and cell metadata I had to extend the specifications, cf. the README.

If you also want to test it (which would be very helpful), the rc is available with

pip install jupytext==0.7.0rc0

Thanks!

@mwouts
Author

mwouts commented Sep 23, 2018

The double percent format is available in Jupytext v0.7.0.

I will now close this issue, as Jupytext cannot be used directly from within Hydrogen/Atom (as it is coded in Python).

Still, I hope that a future implementation of Hydrogen's converter can be compatible with that of Jupytext (and I'm ready to evolve the current implementation!). Please feel free to challenge the cell header, and how best to represent markdown cells (commented code, or multiline strings?). See also Spyder's input at spyder-ide/spyder#7933.

For reference, the regular expression I currently use to parse the cell title, type and metadata is here.
