New feature: Read input notebook from github #556

onevirus · 2020-11-30T14:46:35Z

In my org, we store every notebook in GitHub, and I tweaked papermill to read notebooks from GitHub directly.
We do this because we use papermill heavily in production and need to version our notebooks.

This is how we use it:

import papermill as pm

pm.execute_notebook(
    # Input: GitHub URL of the notebook to run
    'https://github.com/nteract/papermill/blob/main/papermill/tests/notebooks/read_check.ipynb',
    # Output: where the executed notebook is written
    'path/to/output.ipynb',
    parameters=dict(alpha=0.6, ratio=0.1),
)

It takes just the URL of the notebook, like Binder and nbviewer do.
This has some advantages:
Some teams use only dev / master branches (read from the dev branch in the dev environment, and from the master branch in the production environment).
Other teams use tags to version notebooks.
We don't need separate storage for input notebooks (we put output notebooks in GCS).
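
The branch-per-environment and tag-based setups above could be sketched roughly as follows. This is only an illustration: `REF_BY_ENV`, `notebook_url`, the `acme/notebooks` repository, and the `v1.2.0` tag are hypothetical names, not part of papermill.

```python
# Hypothetical mapping from deployment environment to git ref (branch name).
REF_BY_ENV = {"dev": "dev", "prd": "master"}


def notebook_url(owner, repo, path, env=None, tag=None):
    """Build a github.com 'blob' URL for a notebook at a given branch or tag."""
    # A tag pins an exact version; otherwise use the branch for the environment.
    ref = tag if tag is not None else REF_BY_ENV[env]
    return f"https://github.com/{owner}/{repo}/blob/{ref}/{path}"


# Teams that track branches: dev reads from 'dev', prd reads from 'master'.
dev_url = notebook_url("acme", "notebooks", "reports/daily.ipynb", env="dev")
prd_url = notebook_url("acme", "notebooks", "reports/daily.ipynb", env="prd")

# Teams that version with tags: pin the notebook to a release tag.
tagged_url = notebook_url("acme", "notebooks", "reports/daily.ipynb", tag="v1.2.0")
```

The resulting URL would then be passed as the input path to `pm.execute_notebook`, in the same way as the example URL above.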

What do you think?

ronytesler · 2020-12-07T16:09:23Z

I'd like to have it.
We run a notebook remotely on a Google Cloud notebook instance. Is there a way to watch the progress of the notebook as it runs, instead of waiting for it to finish? (How do I know when it has finished, or whether it ran at all?)

MSeal · 2020-12-07T20:53:28Z

This is a good pattern to use for reading from git as a read-only source. If someone wanted to invest a little time in making a new IO handler for reading from git, the GitPython library would be useful. I'd be happy to review / merge such an improvement.

MSeal · 2020-12-07T20:55:33Z

> We run a notebook remotely on a Google Cloud notebook instance. Is there a way to watch the progress of the notebook as it runs, instead of waiting for it to finish? (How do I know when it has finished, or whether it ran at all?)

If you're using the CLI, the terminal outputs progress (there are a few options to control this). Additionally, papermill saves the output notebook after each cell and periodically within a cell, so refreshing the destination location in a notebook browser will show progress as well, albeit not necessarily in real time.
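
Because the output notebook is saved as the run proceeds, progress could also be estimated by reading that file and counting executed cells. The following is a minimal sketch, assuming that executed code cells carry a non-null `execution_count` (as in the standard notebook JSON format) and that pending cells do not; `execution_progress` and `progress_from_file` are hypothetical helpers, not papermill functions.

```python
import json


def execution_progress(notebook):
    """Return (executed, total) code-cell counts for a notebook given as a dict."""
    code_cells = [c for c in notebook.get("cells", []) if c.get("cell_type") == "code"]
    executed = sum(1 for c in code_cells if c.get("execution_count") is not None)
    return executed, len(code_cells)


def progress_from_file(path):
    """Load an .ipynb file (plain JSON) and report its execution progress."""
    with open(path, encoding="utf-8") as f:
        return execution_progress(json.load(f))


# A tiny in-memory notebook: one markdown cell, one executed and one pending code cell.
sample = {
    "cells": [
        {"cell_type": "markdown", "source": "# Title"},
        {"cell_type": "code", "execution_count": 1, "source": "x = 1"},
        {"cell_type": "code", "execution_count": None, "source": "y = 2"},
    ]
}
print(execution_progress(sample))  # (1, 2)
```

A poller built on this would need to tolerate reading the file while it is being rewritten (for example by retrying on a JSON decode error), and for remote destinations the file would have to be downloaded first.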

ronytesler · 2020-12-07T20:58:28Z

@MSeal I use the instance's startup script, which uses papermill to execute the notebook. I run 'gcloud reset' on the machine so that it restarts and the startup script runs. Is there a different way I can run the notebook remotely and also see its progress, as you described?

onevirus · 2020-12-09T01:53:01Z

@MSeal
I checked GitPython. IMHO, GitPython doesn't look suitable for this case. What we need is to download a single file from GitHub, and git doesn't have this functionality (git checks out a whole repo, not a single file). So I think we need to use the GitHub API directly, or a GitHub package like PyGithub. nbviewer uses the GitHub REST API. For GitLab, I think we would need a separate IO handler.
Anyway, if you don't mind using the GitHub API, I'll tackle it.
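
To make the single-file idea concrete, here is a rough sketch using only the standard library. It is not necessarily what a final implementation would look like. It assumes a public repository, a ref (branch or tag) without `/` in its name, and that a papermill IO handler is an object exposing `read`, `listdir`, `write`, and `pretty_path`; `GitHubReadOnlyHandler` and the helper functions are hypothetical names.

```python
from urllib.parse import urlparse
from urllib.request import urlopen


def parse_github_blob_url(url):
    """Split a github.com 'blob' URL into (owner, repo, ref, path).

    Assumes the ref contains no '/', since the URL alone cannot tell
    where a branch name ends and the file path begins.
    """
    parts = urlparse(url).path.strip("/").split("/")
    if len(parts) < 5 or parts[2] != "blob":
        raise ValueError(f"not a GitHub blob URL: {url}")
    owner, repo, _, ref = parts[:4]
    return owner, repo, ref, "/".join(parts[4:])


def raw_url(url):
    """URL of the raw file content for a blob URL (public repositories)."""
    owner, repo, ref, path = parse_github_blob_url(url)
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{ref}/{path}"


def contents_api_url(url):
    """GitHub REST 'contents' endpoint for the same file and ref."""
    owner, repo, ref, path = parse_github_blob_url(url)
    return f"https://api.github.com/repos/{owner}/{repo}/contents/{path}?ref={ref}"


class GitHubReadOnlyHandler:
    """Hypothetical read-only handler: downloads one file, never writes."""

    def read(self, path):
        # Fetch only the requested notebook instead of cloning the repo.
        with urlopen(raw_url(path)) as response:
            return response.read().decode("utf-8")

    def listdir(self, path):
        raise NotImplementedError("listing is not supported in this sketch")

    def write(self, buf, path):
        raise NotImplementedError("GitHub is treated as a read-only source")

    def pretty_path(self, path):
        return path
```

Such a handler could presumably be registered for the `https://github.com/` prefix through papermill's `papermill_io.register`. Private repositories would need an authenticated API request rather than the raw URL used here.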

MSeal · 2020-12-09T04:15:58Z

@onevirus That sounds reasonable for what you're targeting. I can imagine a more general git solution as well, since there are a lot of git repos that aren't on GitHub/GitLab. That being said, GitHub is the most popular in open source, so I think optimizing for that end is worth the effort.

MSeal · 2020-12-09T04:20:47Z

@ronytesler This is a somewhat different topic from the issue that was opened here, but usually you have the startup script logging to a logging sink that captures stdout/stderr and makes it available to view. Papermill in and of itself doesn't manage this, as it's a bit out of scope for the project. Managed execution of VMs or containers isn't the easiest to navigate, but most of the solutions involve monitoring those standard outputs and triggering said executors on demand in some execution context. In this story arc, papermill's responsibility is to output log text, save the notebook, and manage the kernel locally within that context.

MSeal added the enhancement and new-contributor-friendly labels Dec 7, 2020

onevirus mentioned this issue Jul 22, 2021

Read notebook from github #622

Merged

MSeal closed this as completed in #622 Jul 27, 2021
