Execution dependency extension #1213

benelot · 2018-01-29T14:44:07Z

I am implementing the extension for custom execution dependencies as discussed in #1193. It seems to work and properly resolves the dependency tree (it even handles circular deps in a simple way). However, in the end it collects all cells in the order found by the topological sort and tries to run them, but then the java script crashes/and the script starts to run anew.

DISCLAIMER: I have limited experience with JavaScript, so there might be quirky stuff done in code which I comes from my experience as a Java developer. But the problem does not seem to be in that part, but seems to be somewhere else.

Any idea what is wrong? Second question I have is if there is a neat way I can tell the user that a circular dependency is preventing the dependencies to run. For now it just prints to the console.

To test it, simply create two python cells and add a #A tag to the first and a =>A tag to the second. The run the second and look into the developer console. There you can see that it restarts the script after it prints "Execute cells.."

jcb91

Interesting to see proper graph resolution, although not the easiest to read 😆 I just pushed all dependencies not already in my execution list into it, after pushing in their dependencies. Simpler than general algorithms because we know what will end up last! No circular detection there though. Anyway, there's a bunch of comments in line

jcb91 · 2018-01-29T17:46:31Z

src/jupyter_contrib_nbextensions/nbextensions/execution_dependencies/execution_dependencies.js

+            var orig_execute = codecell.CodeCell.prototype.execute;                                                 // get original cell execute function
+            CodeCell.prototype.execute = function (stop_on_error) {
+                var root_tags = this.metadata.tags || [];
+                if(root_tags != [] && root_tags.some(tag => /=>.*/.test(tag))) {                                    // if the root cell contains any dependencies, resolve dependency tree...


javascript doesn't treat empty arrays as falsey, like python, and doesn't compare their contents, so [] != [] always evaluates to true. I suspect you wanted something like root_tags.length < 1?

I was not sure if this.metadata.tags would be [] or something like null/undefined. That is where my javascript knowledge ends. But I see that since I either have [...] or [], so it is defined and I can ask for its length.

Exactly. this.metadata.tags may often be undefined, e.g. if no tags have been set. In fact, in a trickier world, it could be anything that'll fit into the notebook JSON, but in practice we can assume that it will be either an array of strings, or undefined (if it's not, something more significant has gone wrong with the notebook somewhere, and this extension breaking is likely to be the least of anyone's worries)

jcb91 · 2018-01-29T17:58:00Z

src/jupyter_contrib_nbextensions/nbextensions/execution_dependencies/execution_dependencies.js

+                        var tags = cell.metadata.tags || [];
+                        var identities = tags.filter(tag => /#.*/.test(tag)).map(tag => tag.substring(1));          // ...get all identities and drop the #
+                        if(cell === root_cell && !tags.some(tag => /#.*/.test(tag))) {
+                            identities.push("DD27AE1D138027D0D7AB824FD0DDDC61367D5CCA4AAB42CE50840762B053764D");    // ...generate an id for the root cell for internal usage


rather than hard-coding, you could use root_cell.cell_id

jcb91 · 2018-01-29T17:59:01Z

src/jupyter_contrib_nbextensions/nbextensions/execution_dependencies/execution_dependencies.js

+                    identified_cells.forEach(function (cell) {                                                      // ...get all identified cells (the ones that have at least one #tag)
+                        var tags = cell.metadata.tags || [];
+                        var identities = tags.filter(tag => /#.*/.test(tag)).map(tag => tag.substring(1));          // ...get all identities and drop the #
+                        if(cell === root_cell && !tags.some(tag => /#.*/.test(tag))) {


rather than testing !tags.some(... you can use identities.length < 1

might also be worth renaming identities to something like required_ids for better readability

A single cell can have multiple identities, that is why it is called identities. I do not see why they would be required_ids. Below I just add an id, because the root cell does not need an id since it could be the one that is independent (the user is just working on that one and would not understand why it needs an id to run.).

apologies, I misunderstood, for some reason I thought this was a list of dependencies' ids. No rename necessary 👍

jcb91 · 2018-01-29T18:04:07Z

src/jupyter_contrib_nbextensions/nbextensions/execution_dependencies/execution_dependencies.js

+
+                        var deps = tags.filter(tag => /=>.*/.test(tag)).map(tag => tag.substring(2));               // ...get all dependencies and drop the =>
+                        identities.forEach(function (id) {
+                            cell_map[id] = cell;


this line implicitly means only a single cell can have each #some_id tag. Is this what you want? It could make sense to have a few cells with the same tag, so do something like

cell_map[id] = cell_map[id] || []; cell_map[id].push(cell);

Alternatively, if you don't intend to support this, at least let the user know that adding multiple cells with the same identifier tag won't work correctly in the readme.

Yeah, as far as I thought about this, I wanted to work with unique ids, so one id for one cell. In the one id for multiple cells case, I would not be sure what is the right order to execute them and even if they are all independent from each other, they might have different dependencies, so I would have to account for each of them separately. But you gave me a nice idea. Is the cell_id you mentioned above unique within the notebook's context? Then I could use that for uniqueness and allow multiple cells to have the same hashtag id. Maybe I will get to this in a later version.

Is the cell_id you mentioned above unique within the notebook's context? Then I could use that for uniqueness and allow multiple cells to have the same hashtag id.

Yes, it should be unique for the current session. See notebook/static/notebook/js/cell.js#L101 for where it gets assigned, and notebook/static/base/js/utils.js#L206-L220 for the implementation

I would not be sure what is the right order to execute them

I think it would be reasonable to assume they ought to be executed in the order in which they appear in the notebook?

they might have different dependencies, so I would have to account for each of them separately

sort of, or they can be treated as essentially an aggregate (dependencies of this tag are simply the collected dependencies of all its constituent cells).

Sounds both reasonable. I will look into this as soon as the rest works.

jcb91 · 2018-01-29T18:08:48Z

src/jupyter_contrib_nbextensions/nbextensions/execution_dependencies/execution_dependencies.js

+                    }
+
+                    if(processed_nodes >= Object.keys(dep_graph).length) {
+                        console.error('There is a circular dependency in your execute dependencies!');


you could do an alert here, or alternatively use the notebook dialog for something prettier. See

jupyter_contrib_nbextensions/src/jupyter_contrib_nbextensions/nbextensions/init_cell/main.js

Lines 106 to 115 in a4544ec

if (!Jupyter.notebook.trusted) {

dialog.modal({

title : 'Initialization cells in untrusted notebook',

body : 'This notebook is not trusted, so initialization cells will not be automatically run on kernel load. You can still run them manually, though.',

buttons: {'OK': {'class' : 'btn-primary'}},

notebook: Jupyter.notebook,

keyboard_manager: Jupyter.keyboard_manager,

});

return;

}

for an example usage.

In fact, it probably makes sense for this extension to refuse to execute dependencies in untrusted notebooks in the same way that init_cell does...

Sounds great, I will add it.

jcb91 · 2018-01-29T18:13:38Z

src/jupyter_contrib_nbextensions/nbextensions/execution_dependencies/execution_dependencies.js

+                        console.log("Map processing order to cells...", processing_order)
+                        var dependency_cells = processing_order.map(id =>cell_map[id]);                             // ...get dependent cells by their id
+                        console.log("Execute cells..", dependency_cells)
+                        dependency_cells.forEach(function (cell) { cell.execute(stop_on_error); });                 // ...execute all dependent cells in order


I think this may be the main problem you're seeing. Since you've called cell.execute, this will call your patched version, so look for dependencies again, etc. Instead, you should almost certainly be doing

dependency_cells.forEach(cell => orig_execute.call(cell, stop_on_error));

Oh, that is how this works! Thanks for pointing it out to me!

jcb91 · 2018-01-29T18:14:16Z

src/jupyter_contrib_nbextensions/nbextensions/execution_dependencies/execution_dependencies.js

+                }
+                orig_execute.call(this, stop_on_error);                                                             // execute original cell execute function
+            };
+            console.log('[exec_deps] loaded');


this should use the updated name of execution_dependencies

jcb91 · 2018-01-29T18:16:34Z

src/jupyter_contrib_nbextensions/nbextensions/execution_dependencies/README.md

+
+Writing extensive notebooks can become very complicated since many cells act as stepping stones to produce intermediate results for later cells. Thus, it becomes tedious to
+keep track of the cells that have to be run in order to run a certain cell. This extension simplifies handling the execution dependencies by introducing tag annotations to
+identify each cell and indicate a dependency on others. This improves on the current state which requires remembering all dependencies by heart or annotating the cells in the comments.


might be worth noting that dependencies are definitely executed, rather than say only being executed once per kernel session. This may be important for cells which take a long time to execute...

True, I should add that. Thanks for reviewing my code, I am very grateful for that! I really like jupyter notebooks and I am very interested in contributing more extensions in the future.

No problem, happy to help 😄 And of course, we'd be happy to have anything you think might be useful to include here 😉

benelot · 2018-01-30T07:22:54Z

One more thing, how do I know about compatibility? I currently have all compatibilities in, but all I know is that it is compatible with 5.x.

benelot · 2018-01-30T10:16:10Z

It does not work yet on my larger example. This needs some more time.

jcb91 · 2018-01-30T11:23:23Z

For compatibility, I just check whether the functions I'm calling exist in the notebook of the given version (4.0.0 for 4.x, etc). You can do this quite conveniently on github by looking at the notebook repo's tags, e.g. https://github.com/jupyter/notebook/tree/4.0.0/notebook/static. In general though, this nbextension isn't using too many fancy notebook methods that have changed recently, like multiple selections, so I'd guess it's compatible with 4.x & 5.x. Probably the most likely thing to break 3.x might be the dialog but even that may be fine. However, I wouldn't worry much about 3.x because I haven't heard of anyone using it for quite some time now...

It does not work yet on my larger example. This needs some more time.

Sure, will hold off merge till you're ready

benelot · 2018-01-31T10:47:14Z

Fixed some problems with processing if the dependencies are only a subgraph within many completely unrelated cells.

I will keep on my roadmap:

One #tag for multiple cells
Execution cells as dependencies only once per kernel session (option switchable to force reexecution of cells).

benelot · 2018-01-31T17:40:37Z

Looks like it works now! I used it for the rest of the day and it did not break. Ready to be pulled!

jcb91 · 2018-01-31T21:51:15Z

Great, thanks. Could you add a note to the changelog under the github master section, just noting the new extension? Apologies, I should have mentioned this before 🤦‍♂️

benelot · 2018-02-01T08:00:29Z

Of course I can.

benelot · 2018-02-01T08:44:32Z

There you go. Btw I am working on the improvement to make the dependent cell only if it has not been run before or if it has been modified since its last run. Are there any flags that tell me (if/the last time) a cell has been run and (if/the last time) it has been modified? Maybe I am looking in the wrong places because I can not find it.

I will open a new pull request as soon as I am ready with the new changes.

Edit: Also, the Changelog's pull request links should be updated, they are all invalid except for mine. They are missing the repository name.
EditEdit: It seems that the build failed because my link is invalid even though it is the only one that works. Or is it only me?

P.S. I made a contribution page for you because I liked the way things were going with my first contribution. Maybe have a review on it to see if that is how you like your contributions: Create A new Notebook Extension

jcb91 · 2018-02-01T19:17:46Z

Thanks :)

Are there any flags that tell me (if/the last time) a cell has been run and (if/the last time) it has been modified?

No, there aren't in the main notebook. There's the ExecuteTime nbextension, but it doesn't record modifications, doesn't keep track of which kernel was running, and only takes into account the current browser frontend. The lack of such a feature is, I think at least in part, by design. The kernel is intended to be frontend-agnositc. That is to say, it should not care (or expect to be able to tell) whether it is being sent commands by a notebook, a terminal, a qt console, your own custom client, whatever. As a result, the kernel has no notion of cells (although I guess one could consider each ipython In element a cell, but that's language-dependent, and getting off-topic). In addition, since the frontend doesn't know what's happened on this kernel in the past, it will probably only be able to tell what has/hasn't been executed since the page loaded (which may be a long time after kernel creation, since the page can be refreshed). I suppose one could implement local storage or metadata structures to get around this, but that might be more effort than is really worth it.

However, in many (most?) cases, people are using kernels which are only connected to a single notebook frontend, and in such cases, we could get away with a relatively simple model where the nbextension could record cells' ids as they are executed, removes them from the executed list if they get modified or the kernel restarts, and therefore check whether they need re-executing, say if they use -> to specify a dependency rather than =>. This would err on the side of re-executing if in doubt, since it would 'forget' cells had been executed whenever the page was refreshed. Make sense? Maybe it's not worth implementing imperfectly in the frontend, and better to just get people to do the checking in the cells' kernel code.

Also, the Changelog's pull request links should be updated, they are all invalid except for mine. They are missing the repository name.

🤦‍♂️ d'oh I just fixed this for the links for the 3.x releases, then clearly managed to make the same mistake again :( thanks for pointing this out 👍, I'll add a PR to fix these.

It seems that the build failed because my link is invalid even though it is the only one that works. Or is it only me?

Hmm. The lint build seems to fail because isort has changed its mind about ordering imports again. That can be safely ignored. The docs build seems to fail on a bunch of links which look like they should be fine, so I've restarted it to check. I don't think it's anything to do with your edit, which looks fine to me.

P.S. I made a contribution page for you because I liked the way things were going with my first contribution. Maybe have a review on it to see if that is how you like your contributions: Create A new Notebook Extension

I guess you mean this to be a CONTRIBUTING.md file? Maybe replace "I" with your handle @benelot for clarity? Otherwise, sure, sounds like a good plan to me. I have a feeling we maybe once said we adhere to the ipython contributing guidelines, but I forget where I read that, or what they might be. At any rate, we can always update the file later as necessary...

jcb91 · 2018-02-01T19:18:41Z

src/jupyter_contrib_nbextensions/nbextensions/execution_dependencies/execution_dependencies.js

+                        var processing_queue = [root_cell_id];
+                        var processed_nodes = 0;
+                        var in_degree = {};                                                                    // ...collect in-degrees of nodes
+                        while(processing_queue.length > 0 && processed_nodes < Object.keys(dep_graph).length) {// ...stay processing deps while the queue contains nodes and the processed nodes are below total node quantity


this loop never increments processed_nodes, so gets stuck for circular dependencies...

Added it to the code. Ready for merging.

benelot · 2018-02-03T18:24:29Z

Yeah!

benelot changed the title ~~[WIP] Execution dependency extension.~~ [WIP] Execution dependency extension Jan 29, 2018

jcb91 reviewed Jan 29, 2018

View reviewed changes

benelot force-pushed the execution_dependencies branch from 627b3cc to cec798a Compare January 30, 2018 09:29

benelot changed the title ~~[WIP] Execution dependency extension~~ Execution dependency extension Jan 30, 2018

benelot force-pushed the execution_dependencies branch from cec798a to e99b02a Compare January 30, 2018 09:32

benelot changed the title ~~Execution dependency extension~~ [WIP] Execution dependency extension Jan 30, 2018

benelot force-pushed the execution_dependencies branch 8 times, most recently from 25e02f0 to 8ab463d Compare January 31, 2018 10:34

benelot changed the title ~~[WIP] Execution dependency extension~~ Execution dependency extension Jan 31, 2018

benelot changed the title ~~Execution dependency extension~~ [WIP ]Execution dependency extension Jan 31, 2018

benelot changed the title ~~[WIP ]Execution dependency extension~~ [WIP] Execution dependency extension Jan 31, 2018

benelot force-pushed the execution_dependencies branch 9 times, most recently from 2a18c0f to 59b6a9d Compare January 31, 2018 12:18

Implement execution dependency extension.

509c10a

benelot force-pushed the execution_dependencies branch from 59b6a9d to 509c10a Compare January 31, 2018 13:06

benelot changed the title ~~[WIP] Execution dependency extension~~ Execution dependency extension Jan 31, 2018

Update changelog to announce new extension.

6a94634

jcb91 reviewed Feb 1, 2018

View reviewed changes

jcb91 mentioned this pull request Feb 1, 2018

[docs] fix broken links in changelog #1217

Merged

Fix lack of processed nodes counter to handle cyclic dependencies.

0adfb29

jcb91 merged commit a698bbf into ipython-contrib:master Feb 3, 2018

juhasch mentioned this pull request Feb 4, 2018

FIXES #1224 - Added contribution section to README.md #1225

Merged

casperdcl mentioned this pull request Feb 8, 2019

[execution_dependencies] conditional execution [feature request] #1395

Open

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Execution dependency extension #1213

Execution dependency extension #1213

benelot commented Jan 29, 2018 •

edited

jcb91 left a comment

jcb91 Jan 29, 2018 •

edited

benelot Jan 29, 2018 •

edited by jcb91

jcb91 Jan 29, 2018

jcb91 Jan 29, 2018

jcb91 Jan 29, 2018

jcb91 Jan 29, 2018

benelot Jan 29, 2018

jcb91 Jan 29, 2018

jcb91 Jan 29, 2018

benelot Jan 29, 2018

jcb91 Jan 29, 2018

jcb91 Jan 29, 2018

benelot Jan 30, 2018

jcb91 Jan 29, 2018

benelot Jan 29, 2018

jcb91 Jan 29, 2018 •

edited

benelot Jan 29, 2018

jcb91 Jan 29, 2018

jcb91 Jan 29, 2018

benelot Jan 29, 2018

jcb91 Jan 29, 2018

benelot commented Jan 30, 2018

benelot commented Jan 30, 2018

jcb91 commented Jan 30, 2018

benelot commented Jan 31, 2018 •

edited

benelot commented Jan 31, 2018

jcb91 commented Jan 31, 2018

benelot commented Feb 1, 2018

benelot commented Feb 1, 2018 •

edited

jcb91 commented Feb 1, 2018

jcb91 Feb 1, 2018

benelot Feb 3, 2018

benelot commented Feb 3, 2018

	if (!Jupyter.notebook.trusted) {
	dialog.modal({
	title : 'Initialization cells in untrusted notebook',
	body : 'This notebook is not trusted, so initialization cells will not be automatically run on kernel load. You can still run them manually, though.',
	buttons: {'OK': {'class' : 'btn-primary'}},
	notebook: Jupyter.notebook,
	keyboard_manager: Jupyter.keyboard_manager,
	});
	return;
	}

Execution dependency extension #1213

Execution dependency extension #1213

Conversation

benelot commented Jan 29, 2018 • edited

jcb91 left a comment

Choose a reason for hiding this comment

jcb91 Jan 29, 2018 • edited

Choose a reason for hiding this comment

benelot Jan 29, 2018 • edited by jcb91

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jcb91 Jan 29, 2018 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

benelot commented Jan 30, 2018

benelot commented Jan 30, 2018

jcb91 commented Jan 30, 2018

benelot commented Jan 31, 2018 • edited

benelot commented Jan 31, 2018

jcb91 commented Jan 31, 2018

benelot commented Feb 1, 2018

benelot commented Feb 1, 2018 • edited

jcb91 commented Feb 1, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

benelot commented Feb 3, 2018

benelot commented Jan 29, 2018 •

edited

jcb91 Jan 29, 2018 •

edited

benelot Jan 29, 2018 •

edited by jcb91

jcb91 Jan 29, 2018 •

edited

benelot commented Jan 31, 2018 •

edited

benelot commented Feb 1, 2018 •

edited