Add a small CLI? #4

Closed
choldgraf opened this issue Jan 28, 2020 · 38 comments

Comments

@choldgraf
Contributor

choldgraf commented Jan 28, 2020

How do folks feel about a lightweight CLI that lets people execute notebooks from the command-line? e.g., something that would replace

jupyter nbconvert --to notebook --execute mynotebook.ipynb

Something like

jupyter execute mynotebook.ipynb

or

nbclient execute mynotebook.ipynb

?

Alternatively, we could recommend that people use papermill if they wish to execute from the command line? (https://papermill.readthedocs.io/en/latest/usage-execute.html)

@mgeier
Contributor

mgeier commented Jan 28, 2020

What about:

python3 -m nbclient mynotebook.ipynb --inplace

@choldgraf
Contributor Author

that could work as long as nbclient only did one thing, which was executing notebooks 👍

@mgeier
Contributor

mgeier commented Jan 28, 2020

That's what I was wondering all along: what else does nbclient do?

@mgeier
Contributor

mgeier commented Jan 30, 2020

I would be genuinely interested in what else nbclient is supposed to do.

If it does more than one thing, the natural way to call it on the command line would be this:

python3 -m nbclient.execute mynotebook.ipynb --inplace

This is just how, e.g., Python's built-in HTTP server is started:

python3 -m http.server

@MSeal
Contributor

MSeal commented Feb 4, 2020

For now the intention is to have a very narrowly scoped notebook executor that has few dependencies, like jupyter_client does for interacting with kernels. This library doesn't handle output IO and does one thing well. It'll likely get an async execution pattern, but beyond that I don't see major responsibilities being added to the core of the library for now.

Coupling execution with transformation changes made improving or fixing execution patterns very slow to update in nbconvert. Thus nbclient is meant to just execute notebooks in memory without execution opinions or notebook manipulations. nbconvert will keep its ExecutePreprocessor, which will wrap this library so we don't drop support for nbconvert's piping pattern. And similarly papermill will keep its opinionated execution pattern with parameterization and input / output isolation but without needing the rest of nbconvert's dependencies. Both can lean on the in-memory execution model here and implement just the tool-specific concerns. Given that, I'm not sure if we need a CLI for this library with the two downstream opinionated patterns. But it wouldn't be hard to add one if it feels useful to folks.

@choldgraf
Contributor Author

Makes sense - I don't think a CLI is anything urgent...maybe something to stew on and see if others jump in and make the same request over time

@meeseeksmachine

This issue has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/how-to-run-a-notebook-using-command-line/3475/6

@cdeil

cdeil commented Jun 24, 2020

I would like to execute a bunch of notebooks just to time their execution, without having to write output files and clean them up after. Don't care really where this functionality is offered, any of the suggestions mentioned above seems fine. This could be another simple option to implement?

jupyter nbconvert --to none
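The timing use case does not strictly need a new CLI option, since nbclient executes notebooks in memory. Below is a minimal sketch, assuming nbformat and nbclient are installed and each notebook's kernel is available locally. The helper names `timed`, `execute_in_memory`, and `time_notebooks` are hypothetical and not part of either library.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn and return (elapsed_seconds, result)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return time.perf_counter() - start, result


def execute_in_memory(path):
    """Execute a notebook without writing anything back to disk."""
    # Imported lazily so the timing helper works without these packages.
    import nbformat
    from nbclient import NotebookClient

    nb = nbformat.read(path, as_version=4)
    NotebookClient(nb).execute()  # outputs stay in the in-memory notebook
    return nb


def time_notebooks(paths):
    """Return {path: seconds}; no output files are created or cleaned up."""
    return {path: timed(execute_in_memory, path)[0] for path in paths}
```

Something like `time_notebooks(["a.ipynb", "b.ipynb"])` would then give one wall-clock figure per notebook, kernel start-up included.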

@meeseeksmachine

This issue has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/how-to-run-a-notebook-using-command-line/3475/10

@choldgraf
Contributor Author

@MSeal recommended using papermill and this pattern for execution w/o outputs:

papermill {test_file_name.ipynb} /dev/null {optional args...} 

Perhaps we should document use-cases like these in the FAQ of nbclient. I think many people will see this repository "expecting" it to handle things like command-line execution, so we could point them towards papermill or other relevant repositories in those cases

@davidbrochart
Member

I guess a small CLI wouldn't hurt? We could start with:

jupyter nbclient --execute my_notebook.ipynb

@MSeal
Contributor

MSeal commented Jun 24, 2020

It kinda hurts, because we already have nbconvert and papermill CLIs for executing. If we did add one I'd want to mark the one in nbconvert as deprecated and warn users that CLI option is going away. I think this would be better overall for simplicity of tools, but I find removing things from nbconvert can be a hard sell for users. Thoughts if we went that direction?

@choldgraf
Contributor Author

choldgraf commented Jun 24, 2020

I'd imagine a super simple execution CLI like

nbclient mynotebook.ipynb [-o output_ntbk.ipynb]

and that's it. For anything fancier, we would recommend that folks use papermill, and for execution in the context of conversion we either recommend nbconvert, or we find a way to chain execution with nbconvert so that nbconvert isn't responsible for that at all.
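To make the proposal concrete, here is a rough sketch of what such a two-argument command could look like. It uses argparse purely as a stand-in and is not the interface nbclient ships. The names `build_parser` and `main` are hypothetical, and so is the choice to write nothing when `-o` is omitted.

```python
import argparse


def build_parser():
    """Parser for a hypothetical `nbclient NOTEBOOK [-o OUTPUT]` command."""
    parser = argparse.ArgumentParser(prog="nbclient")
    parser.add_argument("notebook", help="path of the notebook to execute")
    parser.add_argument(
        "-o",
        "--output",
        default=None,
        help="where to write the executed notebook (assumed default: nowhere)",
    )
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)

    # Imported lazily so the parser can be built without these packages.
    import nbformat
    from nbclient import NotebookClient

    nb = nbformat.read(args.notebook, as_version=4)
    NotebookClient(nb).execute()  # runs every cell in memory
    if args.output is not None:
        nbformat.write(nb, args.output)
    return 0
```

Keeping the surface to one positional argument and one option mirrors the idea that anything fancier belongs in papermill or nbconvert.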

Whatever package does it, I think there should be a way to run <verb> <notebook> and have it execute the notebook. The mental model I assume many people have is python myfile.py...we should have something that is just as simple, and doesn't require people to know developer-specific terminology like /dev/null or lots of extra parameterization.

That said, in the meantime, I think we can still improve this by just adding documentation to answer the question "how do I quickly execute a notebook from the command line".

@MSeal
Contributor

MSeal commented Jun 24, 2020

The thing I worry about is the 100 issues that arise with "can option A be added to nbclient CLI?" It's hard to stick to KISS when you add another non-DRY path in the code.

Let's do

That said, in the meantime, I think we can still improve this by just adding documentation to answer the question "how do I quickly execute a notebook from the command line".

for sure.

@choldgraf
Contributor Author

@MSeal yeah, I agree with that. I think if we add a CLI to nbclient, it should be explicitly restricted to the behavior that we expose in execute. I'd see this as "just" a command-line interface for the execute function. People will ask for extra parameters etc, and in those cases we suggest they use papermill and request them there.

Maybe it makes sense to use papermill for this, but I'm a bit worried about, e.g., a total newcomer who would be scared off by all of papermill's extra functionality (in the same way that they are scared off by nbconvert's extra functionality). It's why I think there's value in having something extremely simple as an option.

@davidbrochart
Member

Just pointing to nbterm, which can execute notebooks from the command line.

@choldgraf
Contributor Author

@davidbrochart that looks really cool! Though I don't see any documentation. Where should folks go to learn how to use nbterm to execute notebooks?

@davidbrochart
Member

You're right, no documentation yet, it's very new! I will work on that soon, in the meantime there is the README 😄

@palewire
Contributor

palewire commented Jul 13, 2021

I think the super simple proposals here sound great.

My take is that the term "nbclient" is a little opaque. I think something that hooks into the jupyter namespace and uses a more direct verb like execute reads clearer and will be easier for noobs to grok.

Also, I think the starting expectation that this command would output anything is a hangover effect from a conversion tool, nbconvert, being the prior way to achieve this goal. In my view, the execution use case and the conversion use case are separate. They should have separate CLIs with different options.

Following on that, one questionable jupyterism I think a more narrowly tailored execution command might be able to solve is the practice of hiding any error tracebacks in the output file. In my view, a CLI user expects and deserves the traceback to surface into STDOUT and STDERR in the term.

So my first thought is that something like

jupyter execute [path]

is the best starting point, with no output option required to run it
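One way to get the behavior described above (errors surfacing in the terminal rather than hiding in an output file) is sketched below. It relies on nbclient raising `CellExecutionError` when a cell fails and errors are not explicitly allowed. The names `run_surfacing_errors` and `execute_notebook` are hypothetical, and this is not a description of how any shipped command works.

```python
import sys
import traceback


def run_surfacing_errors(execute, stream=None):
    """Run execute(); on failure print the traceback to stream and return 1."""
    stream = sys.stderr if stream is None else stream
    try:
        execute()
    except Exception:
        # Broad catch on purpose: this is the top level of a command-line tool.
        traceback.print_exc(file=stream)
        return 1
    return 0


def execute_notebook(path):
    """Execute a notebook in memory; a failing cell raises an exception."""
    # Imported lazily so the error-reporting helper has no dependencies.
    import nbformat
    from nbclient import NotebookClient

    nb = nbformat.read(path, as_version=4)
    NotebookClient(nb).execute()  # raises CellExecutionError if a cell fails
```

A script could end with `sys.exit(run_surfacing_errors(lambda: execute_notebook("mynotebook.ipynb")))`, so that cron jobs and CI runs see a non-zero exit status when the notebook crashes.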

@drscotthawley

drscotthawley commented Sep 14, 2021

Found this issue from this forum post. This is not directly related to the CLI but to the question of "how do I run this notebook from the command line".

The solutions in the forum thread included

$ jupyter nbconvert --to notebook --execute mynotebook.ipynb

as a solution, but that doesn't print out any of the print statements that are in my notebook. I tried Papermill but it just generates a bunch of JSON without any of my print statements....so again if this is "executing" it's not obvious to this user.

So I'm just leaving this solution that serves my simple needs for now in case others come across this thread:

I defined a bash alias/function like so:

nbrun() { jupyter nbconvert --to script "$1"; grep -v get_ipython "${1%.*}.py" > run_this.py; python3 run_this.py; }

and then I just run

$ nbrun mynotebook.ipynb

The grep -v get_ipython is to strip out calls to !pip... Certainly one could do a better job and actually call pip as part of the bash script.. but for now this is all I need. Presumably a proper CLI would have more careful things involved, so like I say, this is just for other lost web-searchers, to tide us over. :-) Thanks for your hard work!
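For readers who prefer to do the filtering step in Python, the `grep -v get_ipython` part of the workaround can be sketched as follows. The function name `strip_ipython_calls` is hypothetical. Like the shell version, it drops whole lines, so a removed line that was the only statement inside an `if` or `for` body would leave the script syntactically invalid.

```python
def strip_ipython_calls(source):
    """Drop every line that mentions get_ipython, like `grep -v get_ipython`."""
    kept = [line for line in source.splitlines() if "get_ipython" not in line]
    return "\n".join(kept) + "\n"
```

Because the resulting script runs as plain Python rather than through a kernel, `print` output goes straight to the terminal, which is the behavior the workaround is after.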

@palewire
Contributor

Here's a start at it for discussion. What do you think @choldgraf and others?

#165

@drscotthawley

Note also that Jeremy Howard's new effort, nbprocess, may be of interest; however, it is still very much under construction. https://nbprocess.fast.ai/

@palewire
Contributor

palewire commented Sep 27, 2021

Thanks for the tip @drscotthawley. I don't know how the rest of the crowd here would poll, but I'm still a strong believer that nbclient and the notebook world are in desperate need of a simple CLI for running notebooks. This seems like the right place to provide it, IMHO.

@choldgraf
Contributor Author

choldgraf commented Oct 2, 2021

Some quick thoughts now that I've looked at discussion in #165 as well:

I think @MSeal is saying that this package is meant to be a "back-end" package that is consumed by other tools (like nbconvert, papermill, etc). The design, dependencies, etc of this package are optimized for developer consumers rather than end-user consumers. So the challenge is that if we make this package more user-facing, then it will create a tension in design, maintenance, scope, etc. For example, as described here the title nbclient is not a very end-user-friendly name...most users don't know what "client" means, but that's OK because this package is meant for developers to consume.

I'm also thinking of comments like #165 (comment) - it feels like we are making a suboptimal choice there about CLI design just because nbclient uses traitlets.

In addition, if this library adds user-facing dependencies like rich or click, are tools that consume this library (like nbconvert) happy with that dependency? I think we'd need to be very disciplined to not scope-creep on features like this (whether in this repo or another), because making nbclient more user-facing will also attract different kinds of requests for improvements.

So, I wonder if a solution would be to create a lightweight CLI package that is primarily designed for end-users. It'd be called something like nbexecute and it could lean a bit more heavily into the user-facing dependencies and design (e.g., depend on things like click, rich, etc). Its job would be to use nbclient under the hood, and provide user-friendly APIs, CLIs, and documentation to make it really easy to use.

The thing that I worry about is that I suspect a natural evolution of such a CLI will become more and more like papermill over time. Right now we just want a simple CLI, but what happens when somebody opens a feature request to add a variable to the notebook?

I'm curious what @palewire considers to be the "overkill" aspects of papermill. Is it that the CLI feels too complex? Or that it also feels designed for conversion rather than execution? Or the fact that it's not a core Jupyter tool? Or that the dependencies are too beefy? Or that the documentation doesn't make this "simple" use-case obvious enough?

To be clear, I think that something like jupyter execute mynotebook.ipynb would be a super useful CLI to expose to users, my questions here are more around "where is the right place for the tool that does this job?"

@palewire
Contributor

palewire commented Oct 2, 2021

Thanks for the thoughtful consideration, @choldgraf. Here's where I come from on papermill.

Distance from the core jupyter package is tough on newbies and novices. It's got a different brand name. It's not integrated with the core jupyter command. It requires users asking "How do I cron this thing?" to stumble around the web in confusion.

Papermill is presented and configured as a tool for super users. Below is a screenshot of the first use cases presented in the papermill README.

Screenshot from 2021-10-02 08-42-12

It then goes on to immediately describe how to output results to Amazon S3. Those cases are leagues beyond the use cases I have in mind, which are:

  • I have a web scraper I want to schedule to run once a day in my crontab
  • I have a GitHub Action or other CI workflow that needs to pull some data, process it with a notebook and commit every so often
  • I have a notebook to gather, transform and output data into a structure I can work with in a data visualization or publishing tool. I'd like to run it ad hoc from my shell or via a bash script or Makefile

In these cases, which I would wager are common, all the user needs to do is run a single notebook every so often. No parameters. No pipeline control maneuvers. No fancy outputting techniques. Just run a notebook. And if it crashes I get the errors spit back in my face right away. I think the goal of a tool like this should be to surface those simple options first.

One thing I like about nbclient and find admirable is that it takes a similar approach to the "back-end" of Python modules. So to me it seems like the natural place to bring it to the "front-end" of the command line as well.

This is beyond what we've discussed thus far, I think, but my view is that such a tool should also be packaged with the core packages installed by beginner users like jupyterlab so that with the single, initial install a user can run jupyter execute. If those master packages can run notebooks in your browser, I think the user's expectation is that they can run them in the shell too. I know it was mine. And in the little world where I work I've encountered at least a half dozen users who've shared the same expectation. Ultimately, I think that's the best outcome for the software. Where the CLI lives is less important to me.

@davidbrochart
Member

So, I wonder if a solution would be to create a lightweight CLI package that is primarily designed for end-users. It'd be called something like nbexecute and it could lean a bit more heavily into the user-facing dependencies and design (e.g., depend on things like click, rich, etc). Its job would be to use nbclient under the hood, and provide user-friendly APIs, CLIs, and documentation to make it really easy to use.

Actually I was planning on doing almost that, by extracting out the execution logic of nbterm into its own library, but without relying on jupyter_client (as it's already the case in nbterm).

@choldgraf
Contributor Author

Just a quick thought that all of @palewire's comments above make sense to me; I think those are reasonable concerns with the current state of tooling. There is not a "dead simple" way to execute Jupyter Notebooks from the CLI right now. I quite like the vision of being able to pip install jupyter and immediately have access to verbs like jupyter execute mynotebook.ipynb. I'll think on it more but wanna leave space for the ideas of others as well!

@choldgraf
Contributor Author

It sounds like there aren't really strong opinions that we shouldn't do this, so what do folks think about:

  • Getting #165 ("Command-line interface: jupyter execute") ready to merge
  • In the documentation, mention that the CLI functionality here is meant to be relatively simple, and for more advanced use-cases, recommend tools like Papermill

@palewire
Contributor

Sounds good to me. Happy to take on whatever else you'd like to see.

davidbrochart added a commit that referenced this issue Nov 2, 2021
* Added experimental CLI for #4

* Lint fix

* Lint fix

* black format

* again

* Switched to a traitlets style CLI draft

* Trim

* Trim

* Added more input options and a notebook that can be used to test error handling

* black

* lint

* More

* Trim

* docstring

* Fixed notebook variable name

* Added better error message when no notebooks provided

* Removed extra kwargs. Fixed something I broke

* Trim

* go

* Update cli.py

* Add newline at end of file

* Fix linter

* Typo

* Tweaked top of docs

* Docs

* Fix linting

Co-authored-by: David Brochart <david.brochart@gmail.com>
@palewire
Contributor

palewire commented Nov 9, 2021

I think this ticket can be closed thanks to #165 being merged. One final thing I'm trying to push through on it: Adding nbclient docs to the main jupyter package's docs.

@davidbrochart
Member

Thanks for your work @palewire !

@fperez
Member

fperez commented Nov 10, 2021

BTW, sorry I missed this earlier! First, awesome job @palewire and team, thx Chris for opening up the initial issue!

Before this gains a lot of traction, I would suggest naming it, or at least aliasing it, to run instead of execute. The reason is that IPython has had run since basically forever (it was pretty much the raison d'être for IPython, absent notebooks), and unbeknownst to many, %run even recognizes notebooks:

image

I think it will be valuable for users to simply remember that 'jupyter run' runs notebooks, and in IPython the same command works the same.

Sorry for not having pitched earlier during the discussion, but I think it's worth considering.

@fperez
Member

fperez commented Nov 10, 2021

The above reminds me I need to add that to the %run docs, that functionality is buried in the code but not obvious from the docs.

@choldgraf
Contributor Author

I didn't know about that pattern in ipython - but now that i know about it I agree that we should follow precedent and name it run since it is basically the same functionality.

this makes me wonder what the ipython run verb does under the hood - does it have a notebook execution implementation inside of ipython?

@fperez
Member

fperez commented Nov 10, 2021

Ask run??, dear @choldgraf and ye shall receive

image

We've had safe_execfile_ipy in there forever, and it knows how to run notebooks. Again, ?? is your friend:

image

Which now makes me realize, in addition to documenting ipynb support in %run, we should add the --allow-errors flag there too for consistency.

@palewire
Contributor

palewire commented Nov 10, 2021

I can make a patch renaming the subcommands to run, tho I'm tempted to leave an execute command in there with a Python deprecation warning since we've just blasted it a bit. But if you want to just pull the bandaid we can do that too.
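The rename-with-deprecation idea can be illustrated with a small sketch. It uses argparse as a stand-in, whereas the CLI under discussion was drafted in the traitlets style, so this shows the pattern rather than the actual patch. The names `build_parser` and `parse` are hypothetical.

```python
import argparse
import warnings


def build_parser():
    parser = argparse.ArgumentParser(prog="jupyter")
    subcommands = parser.add_subparsers(dest="command", required=True)
    # `run` is the preferred name; `execute` is kept as an alias.
    run = subcommands.add_parser("run", aliases=["execute"])
    run.add_argument("notebook", help="path of the notebook to run")
    return parser


def parse(argv):
    """Parse argv and warn when the old subcommand name was used."""
    args = build_parser().parse_args(argv)
    # argparse stores the subcommand exactly as the user typed it.
    if args.command == "execute":
        warnings.warn(
            "`execute` is deprecated; use `run` instead",
            DeprecationWarning,
            stacklevel=2,
        )
    return args
```

With this shape, both `parse(["run", "mynotebook.ipynb"])` and `parse(["execute", "mynotebook.ipynb"])` resolve to the same handler, and only the latter emits a warning.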

@fperez
Member

fperez commented Nov 10, 2021

@palewire had it been pushed out to pypi yet? If it was only the twitter blast, then I think it's fine to rename now - anyone running prod from git sources knows the game they're playing 🔥 :)

@palewire
Contributor

It was out on PyPI, but only for a few days. My patch PR is in now for your review.

#173

9 participants