Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Execution dependency extension #1213

Merged
merged 3 commits into from Feb 3, 2018

Conversation

benelot
Copy link
Contributor

@benelot benelot commented Jan 29, 2018

I am implementing the extension for custom execution dependencies as discussed in #1193. It seems to work and properly resolves the dependency tree (it even handles circular deps in a simple way). However, in the end it collects all cells in the order found by the topological sort and tries to run them, but then the java script crashes/and the script starts to run anew.

DISCLAIMER: I have limited experience with JavaScript, so there might be quirky stuff done in code which I comes from my experience as a Java developer. But the problem does not seem to be in that part, but seems to be somewhere else.

Any idea what is wrong? Second question I have is if there is a neat way I can tell the user that a circular dependency is preventing the dependencies to run. For now it just prints to the console.

To test it, simply create two python cells and add a #A tag to the first and a =>A tag to the second. The run the second and look into the developer console. There you can see that it restarts the script after it prints "Execute cells.."

@benelot benelot changed the title [WIP] Execution dependency extension. [WIP] Execution dependency extension Jan 29, 2018
Copy link
Member

@jcb91 jcb91 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting to see proper graph resolution, although not the easiest to read 😆 I just pushed all dependencies not already in my execution list into it, after pushing in their dependencies. Simpler than general algorithms because we know what will end up last! No circular detection there though. Anyway, there's a bunch of comments in line

var orig_execute = codecell.CodeCell.prototype.execute; // get original cell execute function
CodeCell.prototype.execute = function (stop_on_error) {
var root_tags = this.metadata.tags || [];
if(root_tags != [] && root_tags.some(tag => /=>.*/.test(tag))) { // if the root cell contains any dependencies, resolve dependency tree...
Copy link
Member

@jcb91 jcb91 Jan 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

javascript doesn't treat empty arrays as falsey, like python, and doesn't compare their contents, so [] != [] always evaluates to true. I suspect you wanted something like root_tags.length < 1?

Copy link
Contributor Author

@benelot benelot Jan 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was not sure if this.metadata.tags would be [] or something like null/undefined. That is where my javascript knowledge ends. But I see that since I either have [...] or [], so it is defined and I can ask for its length.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly. this.metadata.tags may often be undefined, e.g. if no tags have been set. In fact, in a trickier world, it could be anything that'll fit into the notebook JSON, but in practice we can assume that it will be either an array of strings, or undefined (if it's not, something more significant has gone wrong with the notebook somewhere, and this extension breaking is likely to be the least of anyone's worries)

var tags = cell.metadata.tags || [];
var identities = tags.filter(tag => /#.*/.test(tag)).map(tag => tag.substring(1)); // ...get all identities and drop the #
if(cell === root_cell && !tags.some(tag => /#.*/.test(tag))) {
identities.push("DD27AE1D138027D0D7AB824FD0DDDC61367D5CCA4AAB42CE50840762B053764D"); // ...generate an id for the root cell for internal usage
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rather than hard-coding, you could use root_cell.cell_id

identified_cells.forEach(function (cell) { // ...get all identified cells (the ones that have at least one #tag)
var tags = cell.metadata.tags || [];
var identities = tags.filter(tag => /#.*/.test(tag)).map(tag => tag.substring(1)); // ...get all identities and drop the #
if(cell === root_cell && !tags.some(tag => /#.*/.test(tag))) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rather than testing !tags.some(... you can use identities.length < 1

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might also be worth renaming identities to something like required_ids for better readability

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A single cell can have multiple identities, that is why it is called identities. I do not see why they would be required_ids. Below I just add an id, because the root cell does not need an id since it could be the one that is independent (the user is just working on that one and would not understand why it needs an id to run.).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

apologies, I misunderstood, for some reason I thought this was a list of dependencies' ids. No rename necessary 👍


var deps = tags.filter(tag => /=>.*/.test(tag)).map(tag => tag.substring(2)); // ...get all dependencies and drop the =>
identities.forEach(function (id) {
cell_map[id] = cell;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this line implicitly means only a single cell can have each #some_id tag. Is this what you want? It could make sense to have a few cells with the same tag, so do something like

cell_map[id] = cell_map[id] || [];
cell_map[id].push(cell);

Alternatively, if you don't intend to support this, at least let the user know that adding multiple cells with the same identifier tag won't work correctly in the readme.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, as far as I thought about this, I wanted to work with unique ids, so one id for one cell. In the one id for multiple cells case, I would not be sure what is the right order to execute them and even if they are all independent from each other, they might have different dependencies, so I would have to account for each of them separately. But you gave me a nice idea. Is the cell_id you mentioned above unique within the notebook's context? Then I could use that for uniqueness and allow multiple cells to have the same hashtag id. Maybe I will get to this in a later version.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the cell_id you mentioned above unique within the notebook's context? Then I could use that for uniqueness and allow multiple cells to have the same hashtag id.

Yes, it should be unique for the current session. See notebook/static/notebook/js/cell.js#L101 for where it gets assigned, and notebook/static/base/js/utils.js#L206-L220 for the implementation

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would not be sure what is the right order to execute them

I think it would be reasonable to assume they ought to be executed in the order in which they appear in the notebook?

they might have different dependencies, so I would have to account for each of them separately

sort of, or they can be treated as essentially an aggregate (dependencies of this tag are simply the collected dependencies of all its constituent cells).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds both reasonable. I will look into this as soon as the rest works.

}

if(processed_nodes >= Object.keys(dep_graph).length) {
console.error('There is a circular dependency in your execute dependencies!');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could do an alert here, or alternatively use the notebook dialog for something prettier. See

if (!Jupyter.notebook.trusted) {
dialog.modal({
title : 'Initialization cells in untrusted notebook',
body : 'This notebook is not trusted, so initialization cells will not be automatically run on kernel load. You can still run them manually, though.',
buttons: {'OK': {'class' : 'btn-primary'}},
notebook: Jupyter.notebook,
keyboard_manager: Jupyter.keyboard_manager,
});
return;
}
for an example usage.

In fact, it probably makes sense for this extension to refuse to execute dependencies in untrusted notebooks in the same way that init_cell does...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds great, I will add it.

console.log("Map processing order to cells...", processing_order)
var dependency_cells = processing_order.map(id =>cell_map[id]); // ...get dependent cells by their id
console.log("Execute cells..", dependency_cells)
dependency_cells.forEach(function (cell) { cell.execute(stop_on_error); }); // ...execute all dependent cells in order
Copy link
Member

@jcb91 jcb91 Jan 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this may be the main problem you're seeing. Since you've called cell.execute, this will call your patched version, so look for dependencies again, etc. Instead, you should almost certainly be doing

dependency_cells.forEach(cell => orig_execute.call(cell, stop_on_error));

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, that is how this works! Thanks for pointing it out to me!

}
orig_execute.call(this, stop_on_error); // execute original cell execute function
};
console.log('[exec_deps] loaded');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should use the updated name of execution_dependencies


Writing extensive notebooks can become very complicated since many cells act as stepping stones to produce intermediate results for later cells. Thus, it becomes tedious to
keep track of the cells that have to be run in order to run a certain cell. This extension simplifies handling the execution dependencies by introducing tag annotations to
identify each cell and indicate a dependency on others. This improves on the current state which requires remembering all dependencies by heart or annotating the cells in the comments.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be worth noting that dependencies are definitely executed, rather than say only being executed once per kernel session. This may be important for cells which take a long time to execute...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, I should add that. Thanks for reviewing my code, I am very grateful for that! I really like jupyter notebooks and I am very interested in contributing more extensions in the future.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problem, happy to help 😄 And of course, we'd be happy to have anything you think might be useful to include here 😉

@benelot
Copy link
Contributor Author

benelot commented Jan 30, 2018

One more thing, how do I know about compatibility? I currently have all compatibilities in, but all I know is that it is compatible with 5.x.

@benelot benelot changed the title [WIP] Execution dependency extension Execution dependency extension Jan 30, 2018
@benelot benelot changed the title Execution dependency extension [WIP] Execution dependency extension Jan 30, 2018
@benelot
Copy link
Contributor Author

benelot commented Jan 30, 2018

It does not work yet on my larger example. This needs some more time.

@jcb91
Copy link
Member

jcb91 commented Jan 30, 2018

For compatibility, I just check whether the functions I'm calling exist in the notebook of the given version (4.0.0 for 4.x, etc). You can do this quite conveniently on github by looking at the notebook repo's tags, e.g. https://github.com/jupyter/notebook/tree/4.0.0/notebook/static. In general though, this nbextension isn't using too many fancy notebook methods that have changed recently, like multiple selections, so I'd guess it's compatible with 4.x & 5.x. Probably the most likely thing to break 3.x might be the dialog but even that may be fine. However, I wouldn't worry much about 3.x because I haven't heard of anyone using it for quite some time now...

It does not work yet on my larger example. This needs some more time.

Sure, will hold off merge till you're ready

@benelot benelot force-pushed the execution_dependencies branch 8 times, most recently from 25e02f0 to 8ab463d Compare January 31, 2018 10:34
@benelot benelot changed the title [WIP] Execution dependency extension Execution dependency extension Jan 31, 2018
@benelot
Copy link
Contributor Author

benelot commented Jan 31, 2018

Fixed some problems with processing if the dependencies are only a subgraph within many completely unrelated cells.

I will keep on my roadmap:

  • One #tag for multiple cells
  • Execution cells as dependencies only once per kernel session (option switchable to force reexecution of cells).

@benelot benelot changed the title Execution dependency extension [WIP ]Execution dependency extension Jan 31, 2018
@benelot benelot changed the title [WIP ]Execution dependency extension [WIP] Execution dependency extension Jan 31, 2018
@benelot benelot force-pushed the execution_dependencies branch 9 times, most recently from 2a18c0f to 59b6a9d Compare January 31, 2018 12:18
@benelot benelot changed the title [WIP] Execution dependency extension Execution dependency extension Jan 31, 2018
@benelot
Copy link
Contributor Author

benelot commented Jan 31, 2018

Looks like it works now! I used it for the rest of the day and it did not break. Ready to be pulled!

@jcb91
Copy link
Member

jcb91 commented Jan 31, 2018

Great, thanks. Could you add a note to the changelog under the github master section, just noting the new extension? Apologies, I should have mentioned this before 🤦‍♂️

@benelot
Copy link
Contributor Author

benelot commented Feb 1, 2018

Of course I can.

@benelot
Copy link
Contributor Author

benelot commented Feb 1, 2018

There you go. Btw I am working on the improvement to make the dependent cell only if it has not been run before or if it has been modified since its last run. Are there any flags that tell me (if/the last time) a cell has been run and (if/the last time) it has been modified? Maybe I am looking in the wrong places because I can not find it.

I will open a new pull request as soon as I am ready with the new changes.

Edit: Also, the Changelog's pull request links should be updated, they are all invalid except for mine. They are missing the repository name.
EditEdit: It seems that the build failed because my link is invalid even though it is the only one that works. Or is it only me?

P.S. I made a contribution page for you because I liked the way things were going with my first contribution. Maybe have a review on it to see if that is how you like your contributions: Create A new Notebook Extension

@jcb91
Copy link
Member

jcb91 commented Feb 1, 2018

Thanks :)

Are there any flags that tell me (if/the last time) a cell has been run and (if/the last time) it has been modified?

No, there aren't in the main notebook. There's the ExecuteTime nbextension, but it doesn't record modifications, doesn't keep track of which kernel was running, and only takes into account the current browser frontend. The lack of such a feature is, I think at least in part, by design. The kernel is intended to be frontend-agnositc. That is to say, it should not care (or expect to be able to tell) whether it is being sent commands by a notebook, a terminal, a qt console, your own custom client, whatever. As a result, the kernel has no notion of cells (although I guess one could consider each ipython In element a cell, but that's language-dependent, and getting off-topic). In addition, since the frontend doesn't know what's happened on this kernel in the past, it will probably only be able to tell what has/hasn't been executed since the page loaded (which may be a long time after kernel creation, since the page can be refreshed). I suppose one could implement local storage or metadata structures to get around this, but that might be more effort than is really worth it.

However, in many (most?) cases, people are using kernels which are only connected to a single notebook frontend, and in such cases, we could get away with a relatively simple model where the nbextension could record cells' ids as they are executed, removes them from the executed list if they get modified or the kernel restarts, and therefore check whether they need re-executing, say if they use -> to specify a dependency rather than =>. This would err on the side of re-executing if in doubt, since it would 'forget' cells had been executed whenever the page was refreshed. Make sense? Maybe it's not worth implementing imperfectly in the frontend, and better to just get people to do the checking in the cells' kernel code.

Also, the Changelog's pull request links should be updated, they are all invalid except for mine. They are missing the repository name.

🤦‍♂️ d'oh I just fixed this for the links for the 3.x releases, then clearly managed to make the same mistake again :( thanks for pointing this out 👍, I'll add a PR to fix these.

It seems that the build failed because my link is invalid even though it is the only one that works. Or is it only me?

Hmm. The lint build seems to fail because isort has changed its mind about ordering imports again. That can be safely ignored. The docs build seems to fail on a bunch of links which look like they should be fine, so I've restarted it to check. I don't think it's anything to do with your edit, which looks fine to me.

P.S. I made a contribution page for you because I liked the way things were going with my first contribution. Maybe have a review on it to see if that is how you like your contributions: Create A new Notebook Extension

I guess you mean this to be a CONTRIBUTING.md file? Maybe replace "I" with your handle @benelot for clarity? Otherwise, sure, sounds like a good plan to me. I have a feeling we maybe once said we adhere to the ipython contributing guidelines, but I forget where I read that, or what they might be. At any rate, we can always update the file later as necessary...

var processing_queue = [root_cell_id];
var processed_nodes = 0;
var in_degree = {}; // ...collect in-degrees of nodes
while(processing_queue.length > 0 && processed_nodes < Object.keys(dep_graph).length) {// ...stay processing deps while the queue contains nodes and the processed nodes are below total node quantity
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this loop never increments processed_nodes, so gets stuck for circular dependencies...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added it to the code. Ready for merging.

@jcb91 jcb91 merged commit a698bbf into ipython-contrib:master Feb 3, 2018
@benelot
Copy link
Contributor Author

benelot commented Feb 3, 2018

Yeah!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants