[Notebook] Clear kernel queue #2524

Closed
Carreau opened this issue Oct 26, 2012 · 28 comments · Fixed by #4195

@Carreau
Member

Carreau commented Oct 26, 2012

In kernel.js, the execution callback queue keeps growing in size even when the callbacks are no longer needed.

@tkf
Contributor

tkf commented Nov 23, 2012

At what point can we clear the queue? After execute_reply? It is unclear to me because IPython's execute_request can have multiple replies, which is different from normal RPC where you can clear the queue once you get the reply.

@ellisonbg
Member

Can you clarify this? Is it a problem with just the JavaScript side of things, or with the actual kernel's queue? Please give more details.

@Carreau
Member Author

Carreau commented Nov 28, 2012

Only JS IIRC.
The callbacks are registered, but never removed from the array once called.

@tkf
Contributor

tkf commented Nov 28, 2012

I looked at the source. In the terminal console, it seems there is no callback dict, as the request is done in a blocking manner. In the Qt console, there is a callback dict for execute replies etc., but not for iopub (e.g., inline pylab). I guess this is because in the Qt console the result always appears linearly (after the already-executed input prompt). In contrast, in the notebook you need to remember which cell the reply goes to.

Looking at ZMQTerminalInteractiveShell.run_cell, it seems that you can remove the callback after execute reply.
https://github.com/ipython/ipython/blob/master/IPython/frontend/terminal/console/interactiveshell.py#L138

But I am wondering if it is safe to assume that all iopub replies have reached the client by the time the execute reply arrives. Since the two kinds of messages go through different channels on different ports, isn't it possible for the execute_reply message to arrive before the iopub messages? Or does ZMQ solve this problem for us?

@Carreau
Member Author

Carreau commented Nov 28, 2012

I was also wondering what would happen if a JS plugin registered a callback itself.
Potentially, if we slightly change the kernel logic to send 'idle' when the kernel-side queue is empty, we could consider clearing the callbacks then.
But there might be another way, like attaching callbacks to the object they come from and clearing them on a new execution request...

@ellisonbg
Member

The traffic on iopub can arrive in any order, before or after the execute reply is received. We can't make any assumptions about ordering. Is the issue you are worrying about that the iopub callbacks never get cleared?

@tkf
Contributor

tkf commented Nov 29, 2012

Is the issue you are worrying about that the iopub callbacks never get cleared?

If "you" is me, yes :)

How about something like "end of IO signal" from kernel to tell clients that it is safe to clear callback queues. This way, you don't need to assume sequential execution and it will be thread-friendly.

@ellisonbg
Member

The problem is that the kernel doesn't know when code is done running for a cell. Users can always start a thread and that thread could keep pushing output at later times. But there is a problem with this right now - that output will be associated with the wrong cell. So maybe it is not too bad to do what you suggest? Thoughts?

@tkf
Contributor

tkf commented Dec 1, 2012

Yes, it is hard for threads created by the user. Probably we could monkey-patch the threading module, or provide a cell magic command to temporarily apply the monkey patch? The monkey patch I have in mind wraps the target function so that it sends the "end of IO signal" if it is the last thread started for the corresponding execution request. At least, we could provide an IPython-aware threading API so that it can be used in, for example, magic functions such as %%script --bg. But I guess it will need significant changes in the IPython kernel. For example, we need to record the threads started in an execute request and check which thread we are in when sys.stdout (OutStream) is used.

But implementing "end of IO signal" does not require changes for making IPython thread-aware.

Another way I can think of is to move execute_reply to the iopub channel, so that the client can be sure all of the related IO replies have reached it by the time it gets execute_reply. It would be like using execute_reply as the "end of IO signal". But "broadcasting" the execution reply on iopub probably does not make sense.

@ellisonbg
Member

It is a bit more complicated than this and I don't think it will work to muck with threads like this. Here is how the logic works:

  • When a command is sent to the kernel, we grab the id of the request.
  • Any subsequent data being written to the iopub channel gets that request id. This enables the frontend to know which cell the output is associated with.
  • That request id stays active on iopub until the next command arrives.

What this means for threading is that if a thread lives beyond a single command, whenever it writes to iopub, it will use whatever request id is active at the time. This means that the threading output will appear in multiple cells - all of those that are run while the thread lives. I don't know of any way for the thread's output to be redirected to the current original cell.
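To make that tagging logic concrete, here is a rough sketch of the kernel side. This is not the actual kernel code; OutStream, set_parent, and the session/socket objects are simplified stand-ins modeled loosely on IPython's ZMQ kernel:

# Simplified sketch (not the actual kernel code): each output stream keeps the
# header of the currently active request, and every write is tagged with it.
class OutStream(object):

    def __init__(self, session, pub_socket, name):
        self.session = session
        self.pub_socket = pub_socket
        self.name = name            # u'stdout' or u'stderr'
        self.parent_header = {}     # header of the active execute_request

    def set_parent(self, parent):
        # Called when a new execute_request arrives; from now on all output
        # is attributed to that request (and hence to its cell).
        self.parent_header = parent.get(u'header', {})

    def write(self, data):
        content = {u'name': self.name, u'data': data}
        # A thread that outlives its request still ends up here, so its output
        # is tagged with whichever request happens to be active at the time.
        self.session.send(self.pub_socket, u'stream', content=content,
                          parent=self.parent_header)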

Here is one option:

  • Introduce a new "output_finished" message on iopub.
  • This message would be sent at the same time that execute_reply is sent.
  • The frontend could use this to know when output is done.

BUT, threading or other long-lived code (!mycode &) would still write output to iopub; the frontend simply wouldn't have a callback handler for it. But maybe this is not any worse than the current situation, where that output is written to multiple cells. So here is the big question underlying all of this:

How do we want to handle output of long lived things (threads and !mycode &)? @fperez @minrk ?

@tkf
Contributor

tkf commented Dec 4, 2012

The iopub behavior you described is more or less the same as what I thought it would be (actually I thought the request id would be cleared after execute_reply). And the output_finished message you suggested is the same as what I thought it should be for the first implementation (i.e., no thread support).

For the thread support, I am imagining doing something like the following in OutStream.flush():

def flush(self):
    # Sketch of OutStream.flush(): look up the header of the request that
    # started the *current* thread, instead of a single global parent header.
    data = self._buffer.getvalue()
    content = {u'name': self.name, u'data': data}
    # THREAD_HEADER_MAP is a hypothetical (thread-safe) dict mapping
    # threading.Thread -> parent header of the originating execute_request.
    parent_header = THREAD_HEADER_MAP[threading.current_thread()]
    self.session.send(
        self.pub_socket, u'stream', content=content,
        parent=parent_header, ident=self.topic)

Here THREAD_HEADER_MAP is a hypothetical dict which maps a thread object to the message header object. Then, you can use something like the following as the thread target wrapper to register the thread in this dict.

def target_wrapper(parent_header, real_target):
    # Remember which request this thread belongs to.
    THREAD_HEADER_MAP[threading.current_thread()] = parent_header
    try:
        real_target()
    finally:
        THREAD_HEADER_MAP.pop(threading.current_thread())
        # If no other live thread still belongs to this request,
        # tell the clients that its output is finished.
        for (thread, header) in THREAD_HEADER_MAP.items():
            if header == parent_header:
                break
        else:
            # get_session() and get_pub_socket() are hypothetical accessors
            # for the kernel's session and iopub socket.
            get_session().send(get_pub_socket(), u'output_finished',
                               parent=parent_header)

(please imagine that THREAD_HEADER_MAP is a thread-safe dict)
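For illustration only, a hypothetical helper built on the sketch above might start such a worker thread like this (target_wrapper and THREAD_HEADER_MAP are the hypothetical pieces defined above):

import threading
from functools import partial

def start_background(parent_header, real_target):
    # target_wrapper registers the thread in THREAD_HEADER_MAP on entry and
    # sends u'output_finished' if it is the last thread of this request.
    worker = threading.Thread(
        target=partial(target_wrapper, parent_header, real_target))
    worker.start()
    return worker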

@minrk
Member

minrk commented Dec 4, 2012

Introduce a new "output_finished" message on iopub.

This makes sense. The status=idle message already serves exactly this purpose (it is the last message of an execute_request, and on iopub channel), and IPython.parallel is using it to indicate that output is complete. Adjusting the message spec to make this clearer and/or more explicit would probably be beneficial. Basically an EOF-type message.

Right now, we have 'status=busy / idle' messages, which imply that the kernel has started / finished working on a task. It makes a bit more sense to me to have 'started / finished task' messages that imply that the kernel is busy / idle.
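As a rough sketch of that bracketing (not the actual kernel code; run_code and the socket names are stand-ins), both status messages carry the request's header as parent, so the final idle already works as an EOF marker for that request:

def handle_execute_request(session, shell_socket, iopub_socket, msg):
    parent = msg[u'header']
    # 'busy' marks the start of work on this request...
    session.send(iopub_socket, u'status',
                 content={u'execution_state': u'busy'}, parent=parent)

    # stream/display_data messages go out on iopub while the code runs
    reply_content = run_code(msg[u'content'][u'code'])

    session.send(shell_socket, u'execute_reply',
                 content=reply_content, parent=parent)
    # ...and 'idle' is the last iopub message for this request, so a frontend
    # that sees it can drop the callbacks registered for parent['msg_id'].
    session.send(iopub_socket, u'status',
                 content={u'execution_state': u'idle'}, parent=parent)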

@ellisonbg
Member

@minrk makes a great point. We already send a message at exactly the right point. Let's think about that. In order to clear the queue, the frontend simply needs to know which request just finished. What if we extend the status busy|idle messages to include the id of the request that just started or finished? We could rename these message types to something like execute_start/execute_stop? Maybe task_start/stop? But I am a little hesitant to go changing the message spec in a way that will break frontends, so maybe keep the same names and just add the request id to the message content?


@ellisonbg
Member

@tkf I really don't want to hack threads like this. We will face the same issues when users background processes using popen or !mycode &, so we should have a general approach to the problem. Also, using threads in the notebook is rare, but backgrounding processes is more common. We just need to figure out where to put the output that comes back and is not associated with a particular cell.

@tkf
Contributor

tkf commented Dec 4, 2012

What do you mean by general? If that means "it can be applied to many cases", I think my example is general enough. You can write a magic to execute a command in the background using the thread target function I mentioned. But of course I agree we should not do this by default.

Also, I agree that a magic to execute a shell command does not need to use sys.stdout, so there would be no need for THREAD_HEADER_MAP. But that's a specialized case which is possible only within the IPython module, right? I'd like to know how to support general code emitting iopub output (sys.stdout, plots, etc.) in a thread without something like THREAD_HEADER_MAP.

@ellisonbg
Member

By general, I mean that there are many ways of running external processes: popen, pexpect, os.system in a thread, etc. It is too complex of a problem to try to handle all of these cases with this threading trick.

I also don't want to build anything into the IPython core that requires mucking with threads and thread-safe dicts, etc.


@ellisonbg
Member

I also don't want to create the expectation that output from threads will behave in a sensible manner. We are better off to simply say "that is not supported".


@tkf
Contributor

tkf commented Dec 7, 2012

If you want thread support only for calling command-line programs, why do you need to support popen, pexpect and os.system? If there is no thread API, what the user can do is run a command-line program through a magic command or the subprocess API, right? Then there is no need for any general way to support threads. And I don't understand why my threading trick cannot handle all these ways of running subprocesses, if the trick actually works as intended. It's not like I am 100% sure about my approach. I am just puzzled as to why you want to restrict threading support to subprocess-related stuff while talking about generality. Supporting any subprocess-related programming in a thread is as difficult as supporting everything you can do in a thread. It's a stupid example, but you can execute any Python code after running some external command. If you are interested only in running external commands, I guess it's better to just provide a magic command to do that.

Having said that, it would be really nice if IPython provided a thread API. For example, since ctypes releases the GIL, you could call a C function in a thread and update its output as a graph once in a while. Most of the time (i.e., while the computation is running in C), you could keep executing code in the same notebook.

@ellisonbg
Member

But if we add a new message to the iopub channel that clears the iopub callback for a cell once the execute_reply has been sent, it doesn't matter what we do with subsequent threaded output. Even if we are clever and able to tag the output with the right message id, there won't be a callback to handle the output. We simply should figure out a way to handle output without a message id and not worry about trying to tag threaded output correctly at this point.


@ellisonbg
Member

But all of this is much less important than simply clearing the JS callbacks, so let's focus on that.


@Carreau
Member Author

Carreau commented Dec 10, 2012

I propose attaching the cell id to callbacks (as callbacks are functions, one can attach other attributes to them) and removing them when the cell is re-executed or deleted.
This will at least put an upper limit on the number of stored callbacks.

Note that this will only be an issue for really long-running notebooks.

@ellisonbg
Member

This is a pretty simple way of going about it, but I still think we want to clear the queue more aggressively for long notebooks. I am also not convinced we need cell ids yet. But I am seeing more reason to introduce them.


@minrk
Member

minrk commented Dec 10, 2012

the request id is already in the parent_header field. That's what parent_header has always been for - identifying the request that caused the reply / side-effect. What's missing?
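For reference, a rough sketch of the message shape being discussed; field names follow the message spec, but the values here are made up:

# Rough shape of an iopub message as seen by a frontend (values invented):
msg = {
    u'header': {u'msg_id': u'abc123', u'msg_type': u'stream'},
    u'parent_header': {u'msg_id': u'request-42',          # the execute_request
                       u'msg_type': u'execute_request'},  # that caused this output
    u'content': {u'name': u'stdout', u'data': u'hello\n'},
}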

@tkf
Contributor

tkf commented Dec 11, 2012

@minrk, what @Carreau suggested was a cell id. Currently it is not in the message, right? And I guess he is suggesting adding it to the JS callbacks, not to the message.

I think once output_finished is implemented there will be nothing left over in the callback queue. There is no need for extra care to clear the queue, no?

@tkf
Contributor

tkf commented Dec 11, 2012

@ellisonbg

what we do with subsequent threaded output. Even if we are clever and able
to tag the output with the right message id, there won't be a callback to
handle the output. We simply should figure out a way to handle output

That's why I send output_finished from the last remaining thread. That way, the callback is not cleared until all the created threads are finished. But I agree this discussion should move to another issue. I just want to make the solution for the queue more extensible.

@minrk
Member

minrk commented Dec 11, 2012

What @Carreau suggested was cell id. Currently it is not in the message, right? And I guess he is suggesting for adding JS callback queues, not message.

Gotcha, I misunderstood - that was about the callback itself, not any addition to message content.

So, I guess here's the question getting back to the issue at hand: In what way does the current status=idle message not address this issue, other than being poorly named? When status=idle is received, the iopub callback should be unregistered. I expect that the situation is a bit different for aborted tasks, so that will need to be looked at. Further, the callbacks can't be keyed by cell_id, because that won't work for widget-type requests, which are not associated with any cell.
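To illustrate the bookkeeping this implies, here is a sketch in Python (for brevity; the notebook's actual bookkeeping lives in kernel.js and is not reproduced here): callbacks are keyed by the request's msg_id rather than by cell, and the entry is dropped when the matching status=idle arrives.

class CallbackRegistry(object):
    # Sketch only: per-request callback bookkeeping keyed by msg_id.

    def __init__(self):
        self._callbacks = {}   # msg_id -> {msg_type: handler}

    def register(self, msg_id, handlers):
        self._callbacks[msg_id] = handlers

    def dispatch_iopub(self, msg):
        msg_id = msg[u'parent_header'].get(u'msg_id')
        handlers = self._callbacks.get(msg_id)
        if handlers is None:
            return  # e.g. late output from a long-lived thread after cleanup
        handler = handlers.get(msg[u'header'][u'msg_type'])
        if handler is not None:
            handler(msg)
        # status=idle is the last iopub message of the request: drop the entry.
        # Keying by msg_id (not cell_id) keeps widget-style requests working.
        if (msg[u'header'][u'msg_type'] == u'status'
                and msg[u'content'].get(u'execution_state') == u'idle'):
            self._callbacks.pop(msg_id, None)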

@tkf
Contributor

tkf commented Dec 11, 2012

I thought that if you sent 100 execution requests at once, there would be only one status=idle after the last execution reply. But I just found out that there will be 100 such replies, each with the msg_id in its parent_header. So I think status=idle is functionally the same as the output_finished I was talking about.

@ellisonbg
Member

Yes, I don't think we need another message to be sent.

