Routing ThreadPoolExecutor results to originating parent_header/cell #9969
Comments
Ever so slightly related to ipython/ipykernel#109
Background threads are nice, but background callables are what I really care about :)
Thanks @rgbkrk!
I think to do this we would need some hook which was run when a thread was created, inside the newly created thread. We could use that to take a copy of the parent message ID at the time the thread was created, so that output was routed back to there. However, I don't know of any such hook. With the proposal for a kernel nanny to capture stdout/stderr at a lower level, I think such low-level output would always have this issue (showing up under the last cell executed), because that output won't be sent with a parent ID. That's probably not terribly important, though.
@takluyver There's also the issue of what to do with things created by Tornado timed calls, then.
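Python itself offers a hook of this kind for individual threads: a `threading.Thread` subclass's `__init__` runs in the creating thread, while its `run()` runs in the new one. A minimal sketch, assuming a hypothetical thread-local `_parent_store` with `get_parent`/`set_parent` helpers in place of the kernel's real parent-header storage:

```python
import threading

# Hypothetical stand-in for the kernel's parent-header storage (not IPython's).
_parent_store = threading.local()


def get_parent():
    """Return the parent header recorded for the current thread, if any."""
    return getattr(_parent_store, "header", None)


def set_parent(header):
    """Record the parent header for the current thread only."""
    _parent_store.header = header


class ParentTrackingThread(threading.Thread):
    """Thread that inherits its creator's parent header."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # __init__ runs in the *creating* thread, so this snapshots its parent.
        self._captured_parent = get_parent()

    def run(self):
        # run() executes in the *new* thread: restore the snapshot there so
        # output produced by the target can be tagged with the original parent.
        set_parent(self._captured_parent)
        super().run()


results = []
set_parent({"msg_id": "cell-1"})
t = ParentTrackingThread(target=lambda: results.append(get_parent()))
set_parent({"msg_id": "cell-2"})  # a later cell runs before the thread starts
t.start()
t.join()
print(results)  # [{'msg_id': 'cell-1'}]
```

One caveat: `ThreadPoolExecutor` reuses its worker threads, so capturing at thread creation would attribute every later task on a worker to whichever cell caused that worker to be spawned.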
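On Python 3.7+, `contextvars` offer one possible answer for timed calls: asyncio's `loop.call_later()` snapshots the current context when the callback is scheduled, and since Tornado 5 the IOLoop on Python 3 wraps the asyncio loop, so its timed calls should behave the same way. A sketch using asyncio directly, with a hypothetical `current_parent` context variable standing in for the kernel's parent header:

```python
import asyncio
import contextvars

# Hypothetical context variable holding the parent header of the running cell.
current_parent = contextvars.ContextVar("current_parent", default=None)


def report(fut):
    # Runs later, inside the context copied when call_later() was invoked.
    fut.set_result(current_parent.get())


async def main():
    loop = asyncio.get_running_loop()
    fut = loop.create_future()
    current_parent.set({"msg_id": "cell-1"})
    loop.call_later(0.01, report, fut)        # context snapshot taken here
    current_parent.set({"msg_id": "cell-2"})  # a later cell takes over
    seen_by_callback = await fut
    return seen_by_callback, current_parent.get()


seen, now = asyncio.run(main())
print(seen, now)  # {'msg_id': 'cell-1'} {'msg_id': 'cell-2'}
```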
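I just realized I failed to include my code for this. Here's a reproduction:

```python
import time
from concurrent import futures

def print_wait(data, timeout):
    time.sleep(timeout)
    print(data)

tp = futures.ThreadPoolExecutor(max_workers=10)

# For example: submit a task, then run another cell before it finishes.
tp.submit(print_wait, "hello", 5)
```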
I hear you on background callables; we should err on the side of message passing across boundaries.
It's a bit complicated to track the 'right' cell in general, e.g. is it:
The first is pretty easy to implement, but rarely desirable, because it would route all output to the cell that instantiated the ThreadPool. I don't know how to do 2. without baking awareness of the parent state into a ThreadPool subclass, and at that point I'm not sure how useful it is. If we assume that we control the ThreadPoolExecutor (i.e. it's our subclass), we should be able to do it with:
The last tricky bit is the
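A minimal sketch of the subclass approach that captures the parent in the cell that calls `submit()`, rather than the one that created the pool. It swaps IPython's parent machinery for a hypothetical `contextvars.ContextVar` named `current_parent`; each worker runs the task inside a copy of the submitter's context (the `/` syntax needs Python 3.8+):

```python
import contextvars
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the kernel's notion of "the cell running now".
current_parent = contextvars.ContextVar("current_parent", default=None)


class ParentAwareExecutor(ThreadPoolExecutor):
    """ThreadPoolExecutor that runs each task under its submitter's parent."""

    def submit(self, fn, /, *args, **kwargs):
        # copy_context() runs in the submitting thread, so the snapshot holds
        # the parent of the cell that called submit().
        ctx = contextvars.copy_context()
        return super().submit(ctx.run, fn, *args, **kwargs)


gate = threading.Event()


def whose_task():
    gate.wait()  # keep both tasks pending until both cells have "run"
    return current_parent.get()


with ParentAwareExecutor(max_workers=2) as pool:
    current_parent.set("cell-1")
    first = pool.submit(whose_task)
    current_parent.set("cell-2")
    second = pool.submit(whose_task)
    gate.set()

print(first.result(), second.result())  # cell-1 cell-2
```

`Executor.map()` goes through `submit()`, so it is covered too. By contrast, the `initializer`/`initargs` arguments that `ThreadPoolExecutor` gained in Python 3.7 run once per new worker with arguments fixed at construction, which matches the first option: routing everything to the cell that created the pool.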
As a selfish frontend developer, I'd prefer to close off all my subscriptions/listeners on idle. Yet, as a user of asynchronous background processes, I'd like a way to route output accordingly. This isn't even just for the linear notebook; it affects dashboards, general output areas, etc.
Before we figure out how to make

At the time some code is executed, it is being executed in a namespace, and some values are injected there. Dealing with

```python
# cell A
def from_this_cell(arg, *, show=output_to_this_cell):
    show(arg)
```

Then later:

```python
# cell B
from_this_cell("output below cell A")
```

Is there already a building block that can be used like this for threaded/callbacked notebook code?
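This pattern works because Python evaluates default arguments once, when the `def` statement executes, i.e. while cell A's value of `output_to_this_cell` is in the namespace. A self-contained sketch, assuming a hypothetical kernel that injects a per-cell writer before executing each cell (`make_cell_writer`, `run_cell`, and `cell_outputs` are illustrative names):

```python
# Hypothetical per-cell output buffers; a real kernel would instead send
# messages tagged with the cell's parent header.
cell_outputs = {}


def make_cell_writer(cell_id):
    """Return a callable that always writes to `cell_id`'s output area."""
    def write(value):
        cell_outputs.setdefault(cell_id, []).append(str(value))
    return write


def run_cell(cell_id, source, namespace):
    # Inject a writer bound to *this* cell, then execute the cell's code.
    namespace["output_to_this_cell"] = make_cell_writer(cell_id)
    exec(source, namespace)


ns = {}
run_cell("A", '''
def from_this_cell(arg, *, show=output_to_this_cell):
    show(arg)
''', ns)
run_cell("B", 'from_this_cell("output below cell A")', ns)
print(cell_outputs)  # {'A': ['output below cell A']}
```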
While it doesn't exist yet, there's a bit of a proposal to provide an update mechanism like this:

```python
from IPython.display import display, HTML, Updatable

d = Updatable(HTML('<b>whoa</b>'))
display(d)
# ...
d.replace(HTML('<i>awesome</i>'))
```
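The core of such an `Updatable` is a stable ID attached to each output, so a later update lands where the original output was shown regardless of which cell is currently executing. A stdlib sketch of that idea; `OutputArea`, this `Updatable`, and the message shapes are hypothetical simplifications, not the Jupyter protocol. (Later IPython releases added `display(obj, display_id=True)`, which returns a `DisplayHandle` whose `update()` method works along these lines.)

```python
import uuid


class OutputArea:
    """Hypothetical frontend model: rendered outputs keyed by display ID."""

    def __init__(self):
        self.shown = {}  # display_id -> currently rendered data

    def handle(self, msg):
        if msg["type"] == "display":
            self.shown[msg["display_id"]] = msg["data"]
        elif msg["type"] == "update" and msg["display_id"] in self.shown:
            # Updates replace existing outputs in place; they never add new ones.
            self.shown[msg["display_id"]] = msg["data"]


class Updatable:
    """Hypothetical handle that remembers its display ID for later updates."""

    def __init__(self, data, frontend):
        self.data = data
        self.frontend = frontend
        self.display_id = uuid.uuid4().hex

    def display(self):
        self.frontend.handle(
            {"type": "display", "display_id": self.display_id, "data": self.data})

    def replace(self, data):
        self.data = data
        self.frontend.handle(
            {"type": "update", "display_id": self.display_id, "data": data})


frontend = OutputArea()
d = Updatable("<b>whoa</b>", frontend)
d.display()
# ... later, possibly from a background thread or another cell:
d.replace("<i>awesome</i>")
print(frontend.shown[d.display_id])  # <i>awesome</i>
```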
@glyph there's an Output widget that uses the comms system to send output to a different location, but these are 'live widget' things, not plain outputs. All of the routing information is really in the

The main thing I think we need to do is to simplify (dramatically) the parent-setting machinery, so that it's easier to swap it out. Right now,

The second thing we need to do is to change how output handlers are cleared in the frontend. Right now, we clear callbacks once status-idle has been received, and only handle async output if it appears to come from the most recently executed cell (this is what IPython's stateful storage of the parent will do with background threads). We would need to move the callback-clearing to the cell level, rather than the message level, to better indicate that there are no event listeners for a given output area. This notebook demonstrates all the hacks necessary to get it to work right now:
And even that won't work for concurrent outputs, as the state is still process-wide, so setting the parent in one thread sets it for all threads as long as that thread is in its context. So we need to:
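Moving callback-clearing from the message level to the cell level could look roughly like this: outputs are looked up by the `msg_id` in each message's `parent_header`, handlers stay registered after `status: idle`, and they are retired only when the cell is re-executed. A Python model of a frontend router; `OutputRouter`, `iopub`, and the cell/request IDs are hypothetical, while `header.msg_type`, `parent_header.msg_id`, and the `stream` content fields follow the Jupyter message format:

```python
class OutputRouter:
    """Hypothetical frontend-side router for IOPub messages."""

    def __init__(self):
        self._areas = {}         # parent msg_id -> list of output text
        self._cell_request = {}  # cell_id -> msg_id of its latest execute_request

    def execute(self, cell_id, msg_id):
        """Record a new execute_request for `cell_id`."""
        old = self._cell_request.get(cell_id)
        if old is not None:
            # Re-running a cell is what retires its old handlers, not idle.
            self._areas.pop(old, None)
        self._cell_request[cell_id] = msg_id
        self._areas[msg_id] = []

    def outputs(self, cell_id):
        return self._areas.get(self._cell_request.get(cell_id), [])

    def on_iopub(self, msg):
        msg_type = msg["header"]["msg_type"]
        if msg_type == "status":
            return  # status: idle no longer unregisters anything
        area = self._areas.get(msg["parent_header"].get("msg_id"))
        if area is not None and msg_type == "stream":
            area.append(msg["content"]["text"])


def iopub(msg_type, parent_id, content):
    """Build a minimal IOPub-shaped message for the demo."""
    return {
        "header": {"msg_type": msg_type},
        "parent_header": {"msg_id": parent_id},
        "content": content,
    }


router = OutputRouter()
router.execute("cell-A", "req-1")
router.on_iopub(iopub("status", "req-1", {"execution_state": "idle"}))
router.execute("cell-B", "req-2")
# Output from cell A's background thread arrives after A went idle and B ran.
router.on_iopub(iopub("stream", "req-1", {"name": "stdout", "text": "late result\n"}))
print(router.outputs("cell-A"), router.outputs("cell-B"))  # ['late result\n'] []
```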
The frontend side is done (in one frontend) here: jupyter/notebook#1826
@rgbkrk yup, the disable-status bit is telling frontends that "this cell will never be done" to ensure that they keep their handlers active for outputs from that cell. With jupyter/notebook#1826 this extra bit is unnecessary, and the cell can send its idle message as normal without clearing the output handlers. So if we want to minimally allow retargeting to cells, the main kernel feature is a nicer API for recording and reassigning the parent. The harder part (in IPython, at least) is that the parent must be thread-local to avoid problems with concurrent output, but must also fall back to the main-thread storage if no thread-specific parent has been set.
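A minimal sketch of that storage: a per-thread parent that, when unset, falls back to the process-wide one, plus a context manager for temporarily reassigning it. `ParentStore` and its methods are hypothetical, not IPython's API:

```python
import threading
from contextlib import contextmanager


class ParentStore:
    """Hypothetical parent-header storage: thread-local with a global fallback."""

    def __init__(self):
        self._global = None              # set by the main (shell) thread
        self._local = threading.local()  # per-thread overrides

    def get(self):
        # Prefer a parent set explicitly in this thread; otherwise fall back
        # to the process-wide parent (the most recently executed cell).
        return getattr(self._local, "parent", self._global)

    def set_global(self, parent):
        self._global = parent

    @contextmanager
    def parent(self, header):
        """Temporarily route this thread's output to `header`."""
        missing = object()
        previous = getattr(self._local, "parent", missing)
        self._local.parent = header
        try:
            yield
        finally:
            if previous is missing:
                del self._local.parent
            else:
                self._local.parent = previous


store = ParentStore()
store.set_global("cell-3")  # the most recently executed cell
barrier = threading.Barrier(2)
seen = {}


def worker(name, header):
    with store.parent(header):
        barrier.wait()  # both threads are now inside their own contexts
        seen[name] = store.get()


threads = [threading.Thread(target=worker, args=(n, h))
           for n, h in [("t1", "cell-1"), ("t2", "cell-2")]]
for t in threads:
    t.start()
for t in threads:
    t.join()

fallback = []
t3 = threading.Thread(target=lambda: fallback.append(store.get()))
t3.start()
t3.join()
print(seen, fallback)  # {'t1': 'cell-1', 't2': 'cell-2'} ['cell-3']
```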
Code run in a background thread is not necessarily routed to the output area of its originating `parent_header`. In the case of the Jupyter notebook, this means that stdout or display_data produced in the background for an earlier cell ends up under the last cell to be run. Is there any way for a user to scope outputs in background threads to their originating cell?
My example above is trivial, but I'll also give context on asynchronous background tasks that a user is likely to run:
I know @glyph cares about this too. 😉
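The misrouting described in this issue can be reproduced without a kernel. This stdlib-only simulation assumes a single process-wide "current parent" (as with IPython's stateful parent storage) and hypothetical `emit`/`run_cell` helpers; a background task that finishes after another cell has started is attributed to that later cell:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical process-wide parent, mimicking a kernel that stores one
# "current parent header" shared by every thread.
_current_parent = {"msg_id": None}
outputs = {}  # msg_id -> list of captured output lines


def emit(text):
    """Route text to whichever cell is the current parent *right now*."""
    outputs.setdefault(_current_parent["msg_id"], []).append(text)


def run_cell(msg_id, code):
    _current_parent["msg_id"] = msg_id
    code()


release = threading.Event()
pool = ThreadPoolExecutor(max_workers=1)


def background_task():
    release.wait()  # simulate slow work that outlives its cell
    emit("result from cell-1's task")


run_cell("cell-1", lambda: pool.submit(background_task))
run_cell("cell-2", lambda: emit("cell-2 output"))
release.set()
pool.shutdown(wait=True)
print(outputs)  # {'cell-2': ['cell-2 output', "result from cell-1's task"]}
```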