Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Combined run_cell and execute_cell #12

Merged
merged 1 commit into from
Feb 11, 2020
Merged

Conversation

MSeal
Copy link
Contributor

@MSeal MSeal commented Feb 4, 2020

Addresses #11
Changed the default for store_history to False (was intended to be changed with nbconvert 6.0). In doing so I found an issue with ipython kernel execution, where execution counts were not tracked when history is disabled. I've made nbclient manage execution_count itself by default now with the ability to control the behavior in execute_cell.
Also removed extra debug prints left in tests

@MSeal MSeal requested a review from choldgraf February 4, 2020 05:51
@MSeal
Copy link
Contributor Author

MSeal commented Feb 4, 2020

I would like to break execute_cell into smaller parts, but given it's complexity from nbconvert and all the tracking it does I wait to tackle that later on.

@MSeal
Copy link
Contributor Author

MSeal commented Feb 5, 2020

@choldgraf if you wouldn't mind taking a look at this one, I'd like to merge it before the initial release and before more PRs get made that'd be in conflict.

@choldgraf
Copy link
Contributor

Hey, will try to get to it soon. Sorry, I just finished a flight to Australia 🇦🇺

@MSeal
Copy link
Contributor Author

MSeal commented Feb 5, 2020

No worries, I really appreciate all the reviews you've been doing. If you can't get to it for a bit that's fine too.

Copy link
Contributor

@choldgraf choldgraf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all looks pretty reasonable to me - what was the differing purposes of run_cell and execute_cell in the first place? I had one question about execute_cells args, but other than that this looks like a nice simplification to me

if execution_count:
cell['execution_count'] = execution_count
self._check_raise_for_error(cell, exec_reply)
self.nb['cells'][cell_index] = cell
Copy link
Contributor

@choldgraf choldgraf Feb 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since we've already got access to the notebook in self.nb, couldn't this method be simplified by simply providing a cell_index rather than cell and cell_index? Then the first thing that could be done is running

cell = self.nb['cells'][cell_index] 

Copy link
Contributor

@mgeier mgeier Feb 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since we've already got access to the notebook in self.nb

I think that's actually a bad thing.
The notebook should not be stored in the executor!

This mistake was made in nbconvert, where the notebook was passed to the preprocess() method and subsequently stored in the executor:

https://github.com/jupyter/nbconvert/blob/8c9bffc8deb65ced99b20684ddd148fd885ad9e8/nbconvert/preprocessors/execute.py#L404

A later preprocess() call could overwrite the locally stored notebook.
Therefore, the preprocess() call was not re-entrant.

I've already mentioned this problem in nbconvert: jupyter/nbconvert#886

During the move to nbclient, the situation seems to have gotten worse!

Now the notebook seems to be passed to the constructor of the executor, which means one executor instance can only ever execute a single notebook.
Is that really what you want?

To me, it would make much more sense if one executor instance could execute multiple notebooks in a row. Ideally also concurrently.

[UPDATE: I've created a separate issue for this: #18]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now the notebook seems to be passed to the constructor of the executor, which means one executor instance can only ever execute a single notebook.
Is that really what you want?

Yes. There's a lot of state it holds for this particular execution and all the configuration for the execution means that having a owning object makes it a lot easier to manage this state and object manipulation. Otherwise we would have to pass many many arguments to every function, making the function signatures unwieldy.

A later preprocess() call could overwrite the locally stored notebook.
Therefore, the preprocess() call was not re-entrant.
...

During the move to nbclient, the situation seems to have gotten worse!

The issue in nbconvert was that it was half-way between abstractions. There were some functions that were reusable and some that were not without a new object. This class should now be re-entrant on execution methods at the top. .execute called twice will execute the whole notebook twice without having prior run state cause issues (I should add a test for this). I don't think it's "worse", it just more an object oriented approach than a functional one. in this case with lots of state and configuration object oriented has a lot of benefits over functional.

The rename that merged for changing to NotebookClient should help with the abstraction of concerns. Specifically it's denoting it acts on behalf of a notebook, meaning it should hold the state of the said notebook.

To me, it would make much more sense if one executor instance could execute multiple notebooks in a row. Ideally also concurrently.

I didn't design for this pattern. We could have designed for that but I choose to keep it closer to how the code was before to reduce the number of changes -- also because I'm not sure that's strictly a better model. You do have the execute method for repeating calls to the client object to execute everything.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't design for this pattern. We could have designed for that but I choose to keep it closer to how the code was before to reduce the number of changes -- also because I'm not sure that's strictly a better model.

This is pragmatic. I'd like to see the code evolve as well to be more functional, but at the same time we're still in this annoying mess of very class oriented dynamic traitlets for configuration in Jupyterland.

@MSeal
Copy link
Contributor Author

MSeal commented Feb 10, 2020

@choldgraf

This all looks pretty reasonable to me - what was the differing purposes of run_cell and execute_cell in the first place? I had one question about execute_cells args, but other than that this looks like a nice simplification to me

Mostly an artifact of organic growth of the code in nbconvert I think. There were competing thoughts of how the interface should operate that weren't reconciled. One of the two was doing some state prep before it did an almost stateless execution call. Not necessary at this point imo.

@rgbkrk
Copy link
Member

rgbkrk commented Feb 10, 2020

I think after the conflict is cleaned up you should go ahead with this simplification before taking it even further for a refactor. 👍

@choldgraf
Copy link
Contributor

I agree w/ @rgbkrk - we can tidy-up and extend the package over time, but I don't think any of that should block on an initial release, and this is a clear improvement to me!

@MSeal MSeal merged commit 8d5fc8c into jupyter:master Feb 11, 2020
@choldgraf
Copy link
Contributor

choldgraf commented Feb 11, 2020

Woo 🎉

@MSeal MSeal deleted the execute_cell branch February 11, 2020 08:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants