Support for text only notebooks (python scripts, R markdown) in Jupyter #3694

Open
mwouts opened this issue Jun 17, 2018 · 19 comments

Comments

@mwouts

mwouts commented Jun 17, 2018

(R) markdown and Jupyter notebooks are two formats for notebooks that target similar functionality. Unfortunately, in practice they don't overlap much, as

  • (R) markdown notebooks are edited within RStudio (and now offer support for python)
  • Jupyter notebooks are edited within Jupyter

Imagine we could take the best from both worlds...

  • Markdown notebooks under version control are easier to merge
  • Markdown notebooks are lighter, since they don't store the output. That could solve #939 (Poor performance when saving a notebook over SSH)
  • Markdown notebooks offer a very customizable rendering of code and outputs, and an impressive support for HTML slides
  • Jupyter notebooks include outputs, and could be a reference for R markdown notebooks with saved outputs
  • Jupyter notebooks reach a larger audience

Would the Jupyter team consider the possibility of adding a new format for notebooks? Could that format be compatible with the already existing Rmd standard?

(R) Markdown to Jupyter converters are found at:

@mwouts
Author

mwouts commented Jun 19, 2018

Most converters use Jupyter cell metadata to map the cell-specific rendering options available in R markdown. Yet no standard metadata seems to cover this usage. Let's see if we can normalize this at #3700

@rgbkrk
Member

rgbkrk commented Jun 19, 2018

Generally I'm interested in exploring new formats and supporting markdown formats more readily. Most of what holds things back is a desire to maintain the current formats people are working with.

@grst

grst commented Jun 20, 2018

Would be nice to have a text-based format officially supported.
I don't see how this conflicts with maintaining compatibility with the current format.

Do you have suggestions for a way forward?

@rgbkrk
Member

rgbkrk commented Jun 20, 2018

I don't see how this conflicts with maintaining compatibility with the current format.

Apologies, "maintain" was a bit loaded. Maintenance includes keeping old things working and moving current things forward. The limitation is in time / resources, not in ability.

Do you have suggestions for a way forward?

Here are some approaches to start with

  • Find all the current proposals and proposers. Collate the approaches in an issue (like this one!)
  • Propose your own and implement it in at least two frontends (classic notebook, nbconvert, jupyterlab, nteract, etc.)

@mwouts
Author

mwouts commented Jul 19, 2018

Hello @rgbkrk, we've made some progress on our own implementations, so maybe it's a good time to chat a bit more about alternative, text-only formats for notebooks.

I have experimented in two directions:

  • Jupyter notebooks as R markdown, at nbrmd. That solution has the advantage of being based on a well known standard, and possibly enhances Jupyter notebooks with the advanced conversion tools for R markdown (see the recent R Markdown: The Definitive Guide announcement)
  • Jupyter notebooks as plain python scripts at nbsrc.

As suggested by @grst, the text version is paired with the standard .ipynb file. When the text version of the notebook is edited outside of Jupyter and the notebook is reloaded, inputs are taken from the text version and outputs from the .ipynb file.
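To make the pairing concrete, here is a minimal sketch of how inputs and outputs could be recombined on reload. It is not the nbrmd implementation: `merge_inputs_and_outputs` is a hypothetical helper, cells are plain dicts shaped like nbformat v4 code cells, and matching cells by exact source text is an assumption made for illustration.

```python
from collections import defaultdict, deque


def merge_inputs_and_outputs(text_cells, ipynb_cells):
    """Combine inputs from the text file with outputs from the .ipynb file.

    Both arguments are lists of dicts shaped like nbformat v4 cells, e.g.
    {"cell_type": "code", "source": "...", "outputs": [...]}.
    Hypothetical helper: matching on exact source text is an assumption.
    """
    # Index saved outputs by source text; one queue per source keeps
    # duplicated cells in their original order.
    saved = defaultdict(deque)
    for cell in ipynb_cells:
        if cell.get("cell_type") == "code":
            saved[cell["source"]].append(cell.get("outputs", []))

    merged = []
    for cell in text_cells:
        new_cell = dict(cell)
        if cell.get("cell_type") == "code":
            queue = saved.get(cell["source"])
            # Unchanged input: reuse its saved outputs.
            # Edited or new input: start with no outputs.
            new_cell["outputs"] = queue.popleft() if queue else []
        merged.append(new_cell)
    return merged
```

With this approach, a cell whose source was edited in the text file simply loses its outputs until it is run again, while untouched cells keep theirs.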

We have implemented support for

  • jupyter notebook
  • jupyter nbconvert

Support for jupyter lab is in the pipeline, and depends on jupyterlab/jupyterlab#3896.

I think the python script notebook is promising, as it sounds familiar to most Jupyter users, and allows editing notebooks in python IDEs. The first difficulty, however, is to make sure that we start with a good file format for this. May I have your feedback on the proposed specifications? (I note that there used to be an official solution for this with nbpy in nbformat v3, but for some reason it was not ported to v4)

@rgbkrk
Member

rgbkrk commented Jul 20, 2018

Thank you! I'm loving what I see so far. @mpacer -- check this out ^__^

@mwouts changed the title from "Support for markdown notebooks in Jupyter" to "Support for text only notebooks (python scripts, R markdown) in Jupyter" on Aug 21, 2018
@mwouts
Author

mwouts commented Aug 21, 2018

Hi Kyle, I just published a new release of the nbrmd package (companion Python script, or R markdown notebook, for Jupyter notebooks). I think it's becoming good enough to start a larger beta-testing phase. Feel free to forward this to people who may be interested. Thanks

@dsblank
Member

dsblank commented Aug 21, 2018

@mwouts Very cool! I don't think I have ever seen a code coverage that high (95%) for such a project.

Have you thought about a related variation:

  1. A command that, while editing the notebook, turns all cells into one big code cell containing your pure Python/R version of the notebook.
  2. User edits the big code cell (moves sections around, etc)
  3. Parse the pure code cell back into its individual notebook cells.

That would make the notebook editor able to do anything that a regular editor can.

Also, do you actually need the ipynb file? If it didn't exist (or wasn't visible) it would be less confusing to the user as to what the goal of your project is.

Thanks for sharing!

@mwouts
Author

mwouts commented Aug 21, 2018

Thanks @dsblank , I appreciate your comments! Actually this is my first open source contribution, so I dedicated quite a lot of time, indeed, to coverage...

Thanks for suggesting the variation, that's interesting. Do I understand correctly that you suggest a new cell magic that would work similarly to %load and %%writefile, but that would allow editing the code for the full notebook rather than a single cell?

I agree that editing the python code directly is very comfortable. My current process for this is

  • open notebook in Jupyter, with metadata (or global config) nbrmd_formats="ipynb,py"
  • save notebook (both py and ipynb files are created/updated)
  • open py file in an IDE, refactor it there
  • refresh Jupyter notebook in Jupyter (Ctrl+R, after making sure URL has no # after file name)

What I obtain at that point is

  • a notebook with up-to-date inputs taken from the py file
  • outputs that match unchanged inputs are taken from the ipynb file
  • Jupyter kernel is unchanged

I think the result is similar in functionality to what you're expecting above - could you confirm? Differences to your suggestion are

  • source editing is done with a third party tool
  • a file is created on disk
  • and the user needs to refresh the notebook.

Regarding your second question, a user who does not want the ipynb file should change the default configuration (equal to "ipynb") to

c.ContentsManager.default_nbrmd_formats = ""

But then, they won't be able to store the notebook's outputs. Also, the above manipulation would have the effect of removing all outputs from the notebook.

I agree that having files work in pairs is not that common, yet I think the purpose of each is clear. For instance, when nbrmd_formats="ipynb,py",

  • either the py or the ipynb file opens the same notebook in Jupyter, and Jupyter updates both.
  • the py file is the source of the notebook, while the ipynb file contains both source and outputs
  • sharing just one of these is enough to reconstruct the notebook (provided that running the notebook yields the current ipynb outputs!)

@dsblank
Member

dsblank commented Aug 21, 2018

@mwouts Yes, something similar. I imagine a keyboard command, or a button that you would press, that would turn the entire notebook into one cell. After editing it as usual, then pressing the button again would parse it back into many cells of the appropriate type.

I may have made a mistake, but when I edited the .ipynb (using mybinder), saved, and closed, the .py file wasn't updated. That's why I was confused.

@mwouts
Author

mwouts commented Aug 21, 2018

I see - in other words, the command or button would switch the notebook view to the text editor view, and then refresh the notebook view again. It sounds feasible - switching from raw view to notebook view should not be too difficult, but that belongs to a different part of Jupyter (i.e. notebook or lab application) than the one I've been exploring till now (the contents manager).

May I ask which file you have been editing? The default 'nbrmd_formats' for the binder demo is 'ipynb', so saving the notebook will not create py files until a "nbrmd_formats" metadata entry that includes "py" in its values is added to the file.
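As an illustration of the metadata mentioned here, the sketch below sets that entry on a notebook that has already been loaded as a dict (for instance with `json.load`). `enable_text_pairing` is a hypothetical helper, and placing the key in the notebook-level `metadata` with the value "ipynb,py" follows the description in this thread rather than any checked specification.

```python
def enable_text_pairing(notebook, formats="ipynb,py"):
    """Return a copy of a notebook dict with the pairing metadata set.

    `notebook` is the parsed JSON content of an .ipynb file.
    Hypothetical helper: the key name and its location in the
    notebook-level metadata follow the description in this thread.
    """
    updated = dict(notebook)
    metadata = dict(updated.get("metadata", {}))
    metadata["nbrmd_formats"] = formats
    updated["metadata"] = metadata
    return updated
```

Saving the updated dict back to the .ipynb file and reopening it in Jupyter should then, per the explanation above, cause both files to be written on save.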

A sample py/ipynb file with that metadata activated is 'Sample notebook with python representation' in the demo folder - a notebook that actually describes an experiment similar to the one we are discussing (switching manually from notebook to editor view on py, and refreshing the notebook).

@dsblank
Member

dsblank commented Aug 23, 2018

It doesn't have to switch to the text editor view, just concat everything (via your algorithm to convert to text) as one cell in the notebook. Then do the reverse (parse back into cells via your code) when done editing.

I'll have to try the mybinder again.

@mwouts
Author

mwouts commented Aug 23, 2018

Sure! Please retry the mybinder, especially the Sample notebook with python representation, which I do expect to work (tested yesterday).

Would you like to open an issue at https://github.com/mwouts/nbrmd/ ? Converting notebook to text is as easy as text = nbrmd.writes(nb, ext='.py'), and reading back is nb = nbrmd.reads(text, ext='.py'). You could also try that in the mybinder with:

import nbrmd
nb = nbrmd.readf('Sample notebook with python representation.ipynb') # Load notebook from file
text = nbrmd.writes(nb, ext='.py')                                   # Write to text
nb = nbrmd.reads(text, ext='.py')                                    # Read notebook from text

However, as mentioned above I have no idea how to map a Jupyter button or shortcut to these functions, so we will probably have to work together on this ;-)

@mwouts
Author

mwouts commented Sep 1, 2018

@rgbkrk, I like the Hydrogen package for Atom, and also the fact that it works with classical python scripts. Is it right that, currently, it cannot open Jupyter notebooks? I have the feeling that we could change that easily with the jupytext package (same project as before, we just found a better name!).

Command line conversions are

jupytext notebook.ipynb .py             # create or overwrite notebook.py
jupytext notebook.py .ipynb --update    # update notebook.ipynb (preserve outputs for unchanged cells)

Running these commands when the file is opened/closed seems fairly accessible, right? Obviously you may prefer the jupytext package to become stable first, but I think we're not so far now (feedback is very welcome!).

Possibly too much anticipation for now, but I also note that we could even collect the outputs of executed code in Hydrogen to update the original Jupyter notebook - jupytext already does the matching of python cells versus the notebook, and if I'm correct, the outputs you are getting from the kernel look very much like the ones in the ipynb file...

@rgbkrk
Member

rgbkrk commented Sep 4, 2018

Finally got a chance to read through this issue today. 😄

That's right, Hydrogen does not open Jupyter notebooks. It also doesn't use the Jupyter notebook server, instead opting to connect directly to kernels. The outputs are all in memory (and yes, in the same format as the notebook, as they come straight from the kernels).

@BenRussert ^^ see above

@mwouts
Author

mwouts commented Sep 4, 2018

Great! So it should be doable to update Jupyter notebooks with results of execution within Hydrogen. But we will see that later on, right?

For now we could discuss identification of cells in python scripts (and possibly in a specific Hydrogen issue if you prefer!). If I get it correctly, in Hydrogen the start of cell pattern is # %% - while in jupytext it is # + {} (with metadata allowed inside the curly brackets). How do you identify end of cell in Hydrogen (in jupytext: end of python paragraph when no explicit start of cell is provided, otherwise next explicit start of cell, or # -) ?
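The jupytext rules described in this comment can be sketched in a few lines. This is only an illustration, not the jupytext parser: `split_jupytext_cells` is a hypothetical helper, a "Python paragraph" is assumed to end at the first blank line (the real rule may be subtler), and any metadata inside the curly brackets is ignored.

```python
def split_jupytext_cells(script):
    """Split a script into cells following the rules described above.

    Hypothetical helper: an explicit cell starts at a "# +" line and ends
    at "# -" or at the next "# +"; without an explicit start, a cell ends
    at the first blank line (a simplification of "Python paragraph").
    """
    cells, current, explicit = [], [], False

    def flush():
        nonlocal current, explicit
        text = "\n".join(current).strip()
        if text:
            cells.append(text)
        current, explicit = [], False

    for line in script.splitlines():
        stripped = line.strip()
        if stripped.startswith("# +"):
            flush()          # close whatever came before
            explicit = True  # blank lines now stay inside the cell
        elif stripped == "# -":
            flush()          # explicit end of cell
        elif stripped == "" and not explicit:
            flush()          # end of an implicit paragraph cell
        else:
            current.append(line)
    flush()
    return cells
```

Under these assumptions, blank lines only survive inside a cell when it was opened explicitly with `# +`, which is the main visible difference from a marker-only convention such as `# %%`.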

@rgbkrk
Member

rgbkrk commented Sep 4, 2018

I think it's the next occurrence of the # %% marker (or the end of the document if there is no other marker). I personally haven't been using it.

@BenRussert

So it should be doable to update Jupyter notebooks with results of execution within Hydrogen. But we will see that later on, right?

Yes, this could even be done via a Hydrogen plugin package using kernel middleware. This will make development much cleaner and easier. I will be very happy to help you understand the API and middleware, and may even get this repo started if I find the time. For more info, check out the example plugins listed on our readme. I'll add a couple more links to that page later; you will want an example that uses kernel middleware, which most of these do.

If I get it correctly, in Hydrogen the start of cell pattern is # %% - while in jupytext it is # + {} (with metadata allowed inside the curly brackets). How do you identify end of cell in Hydrogen (in jupytext: end of python paragraph when no explicit start of cell is provided, otherwise next explicit start of cell, or # -) ?

The cell patterns below are all currently supported. Of course, we could discuss and change that in a future version if people agree that is the way to go. Here is an example file containing a total of 5 cells (using all the supported comment markers). You can try this out in Hydrogen.

print("The first cell is implied, so you dont have to put the separator comment there")

# empty lines or normal comments are just ignored

print("This is still part of the first cell")
# %%
print("This is the second cell. Empty lines do not make a difference, only cell separator comments")

# <codecell>
print('This is another (currently) supported cell syntax')

# In[]:
print('Another syntax, mimicing a notebook execution count')

# In[42]:
print('You can put numbers here if you want')

# anything below the last cell comment will be part of the last cell

You can get more info about the current way cells work in hydrogen here
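For readers who want to experiment with these markers outside of Atom, here is a rough Python sketch of the splitting behaviour described above. It is not Hydrogen's implementation (Hydrogen is written in JavaScript): `split_hydrogen_cells` and the regular expression are assumptions that only cover the four marker styles shown in the example file.

```python
import re

# Marker styles from the example above: "# %%", "# <codecell>",
# "# In[]:" and "# In[42]:". The exact pattern is an assumption.
CELL_MARKER = re.compile(r"^#\s*(%%|<codecell>|In\[\d*\]:)")


def split_hydrogen_cells(script):
    """Split a script into cells; the first cell is implied.

    Hypothetical helper: a marker line opens a new cell and is dropped,
    while blank lines and ordinary comments stay in the current cell.
    """
    cells = [[]]
    for line in script.splitlines():
        if CELL_MARKER.match(line):
            cells.append([])
        else:
            cells[-1].append(line)
    return ["\n".join(lines).strip() for lines in cells]
```

Applied to a file laid out like the example above, this yields five cells, with everything after the last marker belonging to the last cell. A script that starts with a marker would produce an empty first cell, which a caller may want to discard.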

Thanks for the mention @rgbkrk , this is good stuff!

@mwouts
Author

mwouts commented Sep 15, 2018

Thanks @BenRussert. I very much like Hydrogen, and also the concept of working with simple scripts!

I will not be able to contribute in JavaScript, as I do not know the language. Yet I will soon be working on a Python plugin for Jupyter that will allow opening and editing Hydrogen Python scripts in Jupyter as notebooks (and saving notebooks as Hydrogen scripts), which I think may still be useful to Hydrogen users.

I will be happy to discuss an extension of Hydrogen's syntax at mwouts/jupytext#59. The most important question, in my opinion, is how to represent cell type and cell metadata.

A little detail: I intend to support only the # %% syntax there, in order to offer a stable round trip conversion (I guess the other syntaxes are found in the scripts generated by either nbformat.v3.nbpy or jupyter nbconvert --to script, i.e. scripts that were extracted from a Jupyter notebook, and could be generated directly as Hydrogen compatible script from the notebook itself using Jupytext).
