Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Notebook cell execution #169

Open
davidbrochart opened this issue Jun 15, 2023 · 19 comments · May be fixed by #197
Open

Notebook cell execution #169

davidbrochart opened this issue Jun 15, 2023 · 19 comments · May be fixed by #197
Labels
enhancement New feature or request

Comments

@davidbrochart
Copy link
Collaborator

davidbrochart commented Jun 15, 2023

Problem

Today at the Jupyter Server meeting we talked about notebook restoration, where RTC could be a solution (see jupyter-server/jupyter_server#900). In that solution, the frontend just displays live changes to the notebook shared model. But currently, a notebook cell execution is either None or an integer greater than 0, so there is no way to show that a cell is executing.

Proposed Solution

We could encode the "cell executing" state in the execution count, for instance as "*". I think it could even be part of nbformat, because it brings some information about the state of the notebook at the time it was saved.

@fcollonval
Copy link
Member

Taking the following scenario with the above proposal:

  • User A open a notebook
  • User A runs all cells among which one takes a long time
  • While the execution is processing the long running cell, user A saves the document
  • User A needs to shutdown the application (included server and kernel) for some external reasons before the long running cell finishes.
  • Two hours later user A reopens the notebook
  • The long running cell display a * that does not mean I'm running

So for me a state information should not be part of the document.

There are a couple of other scenarii I can think of that will make the * storage confusing; e.g. sharing a notebook file with a cell in that state with coworkers, open a read-only notebook with such state,... .

@echarles
Copy link
Member

So for me a state information should not be part of the document.

After reading twice @fcollonval scenario, I tend to agree that the state information should not be part of the document.

This drives I think to a more general discussion about "what is a document?, what is a notebook?, what is a kernel?, what is a server? what is a state?" in RTC land.

@davidbrochart
Copy link
Collaborator Author

  • The long running cell display a * that does not mean I'm running

No, it means I was not finished when the notebook was saved, and it's aligned with the cell outputs. For instance, imagine a cell that prints intermediary results and a final result. The saved notebook won't show the final result, which is reflected in the execution state as not finished.

So for me a state information should not be part of the document.

It is already, implicitly through the outputs. IMO the execution state just makes it more explicit.

@echarles
Copy link
Member

No, it means I was not finished when the notebook was saved,

This is a developer perspective. How can user eyes go from * to not finished when the notebook was saved?

It is already, implicitly through the outputs. IMO the execution state just makes it more explicit.

You are taking for granted that the current implementation (the outputs are part of the CRDT) is the right approach, which I am not convinced.

@davidbrochart
Copy link
Collaborator Author

How can user eyes go from * to not finished when the notebook was saved?

Let me reformulate to just not finished executing, which is the current situation.

You are taking for granted that the current implementation (the outputs are part of the CRDT) is the right approach, which I am not convinced.

No, I am just taking for granted that outputs are part of the notebook format, CRDT or not.

@fcollonval
Copy link
Member

Let me reformulate to just not finished executing, which is the current situation.

No the current situation is more specific than that; we displayed a * are:

  • cell is currently executed in the kernel
  • cell is queued for execution

So there are always an assumption it is a transient information linked to some live interaction with the kernel.


You are raising a good point about the outputs that could be reflecting only a partial execution. So I lean towards proposing adding an information in the document about a cell being partly executed that should be unrelated to the * used in the UI. That could be stored in the execution_count or else where.

@davidbrochart
Copy link
Collaborator Author

  • cell is queued for execution

Although execution being queued is really kernel specific. For instance, akernel has the ability to run cells concurrently, so they might never be queued. If cells are queued, it's only because the kernel is executing blocking code.
So for me the * really means not finished executing, but we don't know if the cell started executing. Let's say it is scheduled for execution. And I think this * is the exact information that we want to store in the document.

@echarles
Copy link
Member

No, I am just taking for granted that outputs are part of the notebook format, CRDT or not.

I stil think that nbformat and the CRDT shared models are different things. Similar but different, so governed by different rules and schemas.

... If cells are queued, it's only because the kernel is executing blocking code. So for me the * really means not finished executing, but we don't know if the cell started executing. Let's say it is scheduled for execution. And I think this * is the exact information that we want to store in the document.

You clearly define two different states, but foresee only one indicator for the user. This could be better, like eg..

  • . = queued for execution
  • * = being executed.

However, the static representation of the notebook in a ipynb file with nbformat should not be discussed at the same level as the runtime status of that same notebook. Hence my point above that the rules and schemas should be different.

@echarles
Copy link
Member

I would even be tempted to say that the output of a cell not being fully executed should not be part of that saved version (the ipynb file) of the notebook, as showing incomplete and potentially giving wrong conclusions to the reader. I don't expect everyone to agree with that, but I expect this to be a discussion point.

@davidbrochart
Copy link
Collaborator Author

You clearly define two different states, but foresee only one indicator for the user.

My point is that there are not two different states, only one which is scheduled for execution. The * visual indicator corresponds to this state.

@davidbrochart davidbrochart linked a pull request Nov 15, 2023 that will close this issue
@davidbrochart
Copy link
Collaborator Author

davidbrochart commented Nov 15, 2023

I opened #197 which adds an execution_state field to a cell, that encodes the execution state as an "idle" or "busy" string.

@krassowski
Copy link
Contributor

As a counter proposal, #227 adds pending_requests. We already discussed this at length with @davidbrochart but I am still on fence here. I wanted to highlight it here in case if @echarles or @fcollonval have thoughts on this.

@echarles
Copy link
Member

echarles commented May 9, 2024

Thx a lot @krassowski for joining the bits. I sometimes tend to step a it too much back, but I can see separated tracks like Server model / POST api / RTC / Async / Pending that are discussed and implemented, while not having a well defined complete picture of the interactions. So yes, I have toughts, and the thought is that we miss a copter view of all this...
`

@krassowski
Copy link
Contributor

To recap, we have three proposals:

Separately, there is a question of recording execution timing data on the server side (for use cases of jupyterlab-execute-time and https://github.com/mwakaba2/jupyterlab-notifications extensions).

@echarles
Copy link
Member

echarles commented May 9, 2024

Thx, useful for the pending_request story. There are other aspects that are not tackled like:

  • the user input which does not work for now in case of server model
  • ipywidgets support in the server model (and rtc), while being backwards compatible with all the existing huge ipywidgets ecosystem.

These are just 2 examples, I am pretty sure there are other ones that will be discovered as side effects if we don't carefully identify them.

@krassowski
Copy link
Contributor

You clearly define two different states, but foresee only one indicator for the user.

My point is that there are not two different states, only one which is scheduled for execution. The * visual indicator corresponds to this state.

I think there should be two different states even in an async kernel, because there is a use case for executing async cells with dependencies and cancelling a cell which was scheduled to run while its dependencies have not finished.

In fact the execution queue visualisation and manipulation is one of the most frequently requested features in JupyterLab issue tracker: jupyterlab/jupyterlab#7825. I believe we should design the API with this in mind.

@krassowski
Copy link
Contributor

In the future maybe should be a fourth state to indicates that the kernel died when the cells were executing (or waiting in execution queue), say [E].

@krassowski
Copy link
Contributor

Also, pending user input is a special case of running.

@krassowski
Copy link
Contributor

krassowski commented Jun 4, 2024

Coming here from a difference place today: I am trying to implement an indicator that a cell is running in the minimap for notebook; the minimap has access to cell models but not widgets.

The problem is the [*] prompt is not even a part of the ICodeCellModel in JupyterLab. Instead the CodeCell widget has a setPrompt() method which accept a string. This is different from execution_count which is a number.

To move forward with implementation I want to deprecate setPrompt in favour of semantic information about the cell state in the cell model.

Thinking out loud, both execution_state and pending_requests can encode information about (a) whether the cell is currently running (b) whether the cell is scheduled for execution - as long as execution_state is a string and not a boolean (which is already the case in #197).

Further, I would think that pending_requests might be a private implementation detail of the CodeCellModel with execution_state being the only thing that is exposed.

I think the logic for prompt would be, in pseudo code:

if execution_state == 'idle':
  if execution_number:
    prompt = '[' + execution_number + ']'
  else:
    prompt = '[ ]'
elif execution_state == 'queued':
  prompt = '[.]'
elif execution_state == 'running':
  prompt = '[*]'

This is also loosely related to datalayer/jupyter-server-nbmodel#13

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants