Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Not getting a CellExecutionError raised with %run commands #48

Closed
palewire opened this issue May 7, 2020 · 12 comments
Closed

Not getting a CellExecutionError raised with %run commands #48

palewire opened this issue May 7, 2020 · 12 comments

Comments

@palewire
Copy link
Contributor

palewire commented May 7, 2020

I follow the practice of creating a master notebook that runs all the notebooks in my pipeline via the %run magic command. When I execute that notebook with nbclient, it fails to raise errors when they happen in the sub-notebooks. This is, for obvious reasons, not good. Is there a way to address this?

@MSeal
Copy link
Contributor

MSeal commented May 7, 2020

What's the specific pattern you used with %run? I usually use %%bash when I an running a process magic as that magic was updated to fail on non-zero status codes.

In this case since you're in a python process already, Why not use https://github.com/jupyter/nbclient/blob/master/nbclient/client.py#L839 directly in your master notebook?

Also if you're not aware, papermill is a little more opinionated for pipeline work (has parameterization and input / output notebook isolation / CLI) and uses nbclient under the hood.

@palewire
Copy link
Contributor Author

palewire commented May 7, 2020

I am doing it like:

%run ./folder/notebook.ipynb

I'd rather not introduce nbclient into a notebook, where the magic command works just fine. In my opinion, nbclient should "just work" in these circumstances and run all notebooks from the terminal as I would expect them to in the browser.

My temporary solution has been to recreate the master notebook's manifest of sub notebooks in my nbclient shell script so that it reproduces what's being done with %run. This requires me to keep the manifest list in two different places, but at least I see the errors now.

@choldgraf
Copy link
Contributor

choldgraf commented May 7, 2020

Just to add a bit more context, I tried re-creating and:

If I have three noteboks:

  • the first raises an error
  • the second runs the first either with %run or with nbclient
  • the third runs the second with nbclient

So the chain is always triggered by running nbclient from the third notebook.

In the second notebook:

  • If it runs execution with nbclient.execute then an error is raised (a CellExecutionError) and the command fails
  • If it runs with %run then the error is still raised (in this case, whatever error the first notebook raised, not CellExecutionError) and is placed in the offending cell's output, but the main execution process does not fail

@palewire
Copy link
Contributor Author

palewire commented May 7, 2020

It's that last bullet point where my use case falls that I think you should address.

Here's my workflow where it comes up.

We have a closed-source notebook pipeline behind datadesk/california-coronavirus-data that downloads several dozen data sources from Google Sheets, GitHub, open data portals, etc., and then processes it to go on our live page on latimes.com. Each download routine and each process routine is a separate notebook. In total, it's more than 20 notebook files.

There are three master notebooks (download, process and optimize) that round all the notebooks into straightforward routines. They include documentation explaining what each notebook does.

Those three master notebooks are run manually five times a day (at least) after we finish a run manually collecting data from 61 state agencies across the state. (I just finished the 9 a.m. run myself before writing this message.)

To simplify things, the entrant is asked to run a single make command that triggers Python scripts using nbclient that execute the master notebooks in the proper order. At the end, the files are proofed on a staging version of the website and sent live.

Without the %run errors being thrown, a download or processing step can fail and there is no notification to the user at all.

@choldgraf
Copy link
Contributor

Thanks for the clarification @palewire - two quick thoughts - one on-topic and one off-topic:

Those three master notebooks are run manually five times a day (at least) after we finish a run manually collecting data from 61 state agencies across the state. (I just finished the 9 a.m. run myself before writing this message.)

I think that's a very reasonable workflow (it sounds quite cool!). However the feeling I get is this is precisely what papermill was designed for. My intuition is that the "magic" commands are all tailored for an interactive session, and so in general they may break in unexpected ways when use programmatically. Is accomplishing the same thing with papermill or with nbclient a non-starter for you? I think in general it is not good practice to include magic commands in production (but that's just my opinion).

that I think you should address

Just a friendly open sourcey note here - that "you" can also be you ;-) this is a 100% volunteer project and contributions are most-welcome. I try to use passive tense in these kinds of statements: "this is an issue I think should be addressed" rather than "this is an issue I think you should address". Don't forget we're all doing this in our spare time. (apologies if this comes across as super pedantic and condescending, I promise I don't mean it that way)

@palewire
Copy link
Contributor Author

palewire commented May 7, 2020

I definitely appreciate your effort here and didn't mean to offend. I am trying to articulate my problem and opinion about the issue as clearly as I can, not attack anyone.

So is it the case that the %run command can fail without raising the standard error code to pass along to nbclient? If that is the case, that seems like the root bug here. I suppose there must be some reason for that but it clearly contradicts the Zen of Python, which states, "Errors should never pass silently."

@choldgraf
Copy link
Contributor

choldgraf commented May 7, 2020

I definitely appreciate your effort here and didn't mean to offend. I am trying to articulate my problem and opinion about the issue as clearly as I can, not attack anyone.

No offense taken - and I appreciate you taking the time to explain further, maybe I am just being pedantic after all 😬

So is it the case that the %run command can fail without raising the standard error code to pass along to nbclient

Ahh - your point made me try out something new which makes me think that you are right:

If I run %run mybrokencode.ipynb in the notebook it shows an error. However, in Jupyter a "proper error" will stop all of the cells from being executed if I hit "run all". If I hit "run all" in the same notebook, then while the %run command does show an error, all of the other cells still get executed.

e.g. hitting restart and run-all with %run gives me this:

image

while raising a "proper" error gives this:

image

So this makes me think that IPython is just displaying an error, rather than actually raising an error.

@MSeal
Copy link
Contributor

MSeal commented May 7, 2020

@palewire I agree that %run should fail when an exception / non-zero exit code occurs. I pushed for make a similar change for %%bash a while back but didn't realize %run has the same issue. I can help make the issue and point to the code in IPython that would need to be changed.

That being said, I definitely second @choldgraf 's point that papermill was made for running these -- it lets you parameterize a single notebook (say with the state your calculating against as an input) so you only need one notebook you can run multiple times with different state.

@choldgraf
Copy link
Contributor

also cc @Carreau who I believe may have more cycles for ipython these days, and may be interested in this issue?

@palewire
Copy link
Contributor Author

palewire commented May 9, 2020

I started a modest PR on their end. ipython/ipython#12301

@palewire
Copy link
Contributor Author

palewire commented Jun 1, 2020

The patch we devised here has been merged into the IPython trunk.

@palewire palewire closed this as completed Jun 1, 2020
@choldgraf
Copy link
Contributor

yahooo! @palewire thanks so much for upstreaming this one ❤️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants