Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Additional to the actual signal, send a message on the control port #294

Merged
merged 4 commits into from Nov 13, 2017

Conversation

@filmor
Copy link
Contributor

filmor commented Sep 22, 2017

In Erlang I have only on POSIX and only in the newest version some control over the normal runtime signals, in particular SIGINT. However, I can easily pick up and process the respective message on the control socket to interrupt execution.

Can this functionality be added? If yes, I'd continue and document the functionality and add tests. Maybe it would also be good to have the option of not sending an actual signal (via os.kill) at all.

@SpencerPark

This comment has been minimized.

Copy link
Contributor

SpencerPark commented Oct 7, 2017

I was just looking for something similar to this, specifically for interrupting the kernel. Sending something over the control channel makes complete sense for this. It is a high priority message like the shutdown request. I don't see the benefit of sending the signals over sending a control message.

Since we are at a higher level when working at the messaging level I think that the message doesn't need to be signal related anymore but rather simply an interrupt_request or similar. What do you think?

@rgbkrk

This comment has been minimized.

Copy link
Member

rgbkrk commented Oct 8, 2017

Cc @ivanov

@takluyver

This comment has been minimized.

Copy link
Member

takluyver commented Oct 9, 2017

The rationale for signals is that the kernel may block message processing while it is executing user code, so a new message - even on the control channel - will not necessarily interrupt it. This is the case in the reference Python kernel, in the R kernel, and probably in all kernels built on the Python 'wrapper kernel' machinery, e.g. the bash kernel.

We have also thought of adding messages for signals in a different context, to allow interrupting kernels running remotely. In that proposal, the frontend would send a signal message to a 'kernel nanny', a process running alongside the kernel, which would respond by sending a real Unix signal to the kernel process.

IIRC it also came up there that some kernels would rather get an interrupt message than a signal, so it sounds like this might be worth doing.

I'm not entirely convinced that sending both the signal and the message is the right thing to do. I don't yet have a concrete reason why not, but it feels a bit messy. Maybe kernels should have a way to declare whether they expect signals as real signals or as messages?

@SpencerPark

This comment has been minimized.

Copy link
Contributor

SpencerPark commented Oct 9, 2017

Fair enough, I think an opt-in for the interrupt message is a good compromise. This would be backwards compatible as well which is a big plus. A kernel.json flag seems like a good place for a kernel to make that request. I don't think it matters if the front-end acknowledges the request as currently kernels that don't handle the interrupt signal simply don't support interrupts. If a kernel requests an interrupt message as the method of communication and never receives one then the behavior is simply the same as it is currently.

My understanding of the control channel was that it should try to always be free to handle the high priority requests so when implementing the protocol I built with that in mind. It is on a separate thread and is therefore available to handle the request while the main shell is busy.

@takluyver

This comment has been minimized.

Copy link
Member

takluyver commented Oct 10, 2017

The way we implement the control channel is that it can pre-empt queued messages on the shell channel. So if you send 10 cells for execution and then immediately ask the kernel to shut down, it will finish executing the first cell and shutdown before starting the other 9. But only signals pre-empt something that's already running.

@filmor

This comment has been minimized.

Copy link
Contributor Author

filmor commented Oct 10, 2017

Is this behaviour of the control socket that a requirement, though? In Erlang I can easily interrupt and shut down execution while a cell is evaluated.

I think I agree with scoping this down to an interrupt_request instead of trying to handle arbitrary signals. However, there should still be a way to deactivate any kind of signal-use (apart from TERM). Erlang handles the situation kind-of gracefully, I think, but this is a property of the runtime that I can't necessarily influence from the kernel implementation.

Is there any scenario in which sending the interrupt_request as a message on the control socket unconditionally could break a kernel?

@takluyver

This comment has been minimized.

Copy link
Member

takluyver commented Oct 10, 2017

Is this behaviour of the control socket that a requirement, though?

Not particularly, or at least it's not really specified, as far as I know. I'm just explaining the background of why signals are needed. :-)

Is there any scenario in which sending the interrupt_request as a message on the control socket unconditionally could break a kernel?

I think it would be fine for a well-written kernel, but it could cause problems if a kernel author doesn't realise that they're getting both a signal and a message, and the kernel gets interrupted twice. I think it's common for kernels to print something like warning: unknown message type on unhandled messages, so if people start seeing unhandled interrupt_request messages, they're going to go and 'fix' that, even if interrupting already works with SIGINT.

Allowing the kernel to pick either messages or signals will hopefully avoid this kind of confusion.

@filmor

This comment has been minimized.

Copy link
Contributor Author

filmor commented Oct 18, 2017

Okay, I'll update the PR accordingly.

@filmor filmor force-pushed the filmor:interrupt branch from 5a43bf0 to 05b2642 Oct 19, 2017
@filmor

This comment has been minimized.

Copy link
Contributor Author

filmor commented Oct 19, 2017

Updated, please have another look.

if sys.platform == 'win32':
from .win_interrupt import send_interrupt
send_interrupt(self.kernel.win32_interrupt_event)
self._send_signal_message(signal.SIGINT)

This comment has been minimized.

Copy link
@SpencerPark

SpencerPark Oct 19, 2017

Contributor

I don't think the _send_signal_message applies anymore. Might be left over from the initial commit?

This comment has been minimized.

Copy link
@filmor

filmor Oct 19, 2017

Author Contributor

You're right, fixed :)

@filmor filmor force-pushed the filmor:interrupt branch from 05b2642 to 16119c9 Oct 19, 2017
@takluyver

This comment has been minimized.

Copy link
Member

takluyver commented Oct 20, 2017

Thanks. Since this involves a change to the message protocol, I'm going to start a discussion on the mailing list.

@lresende

This comment has been minimized.

Copy link
Member

lresende commented Oct 23, 2017

@kevin-bates I want to make sure you see this, as this might be similar to what you did for remote kernel interrupt in "Enterprise Gateway"

@kevin-bates

This comment has been minimized.

Copy link
Member

kevin-bates commented Oct 23, 2017

Thanks for the heads up @lresende. Yes, I had seen this a few days ago.

The kernel launchers we use in Enterprise Gateway appear to follow the kernel nanny model described above, although our launchers actually house the target kernel.

Btw, another advantage of a message-based interrupt is that the interrupt can span userid differences (while signals do not). This is more of an issue in a gateway environment since services like JupyterHub will launch the entire notebook server as the target user (retaining userid matches between kernel managers and kernel instances).

@ccordoba12

This comment has been minimized.

Copy link
Contributor

ccordoba12 commented Oct 27, 2017

This is a really great addition!

I just have one question: Is it possible to add a restart message too? If that's the case, then we (Spyder) wouldn't need to wait for kernel nanny either.

@filmor

This comment has been minimized.

Copy link
Contributor Author

filmor commented Oct 30, 2017

@ccordoba12 What would you expect the kernel to do when it gets this message?

@takluyver There doesn't seem to be any activity on the ML regarding this PR, maybe it helps if you just cc all relevant people here ...

@minrk

This comment has been minimized.

Copy link
Member

minrk commented Oct 30, 2017

I think this is a fine addition. We just need to add this to the docs and bump the minor-revision of the spec.

@filmor filmor force-pushed the filmor:interrupt branch from 16119c9 to 42abb76 Oct 30, 2017
@ccordoba12

This comment has been minimized.

Copy link
Contributor

ccordoba12 commented Oct 30, 2017

@filmor, kernels can be restarted after a shutdown:

https://github.com/jupyter/jupyter_client/blob/master/jupyter_client/manager.py#L293

This would be useful for Spyder because people could interact with external kernels the same way as they would do with kernels started by Spyder itself.

@minrk, what do you think?

@SpencerPark

This comment has been minimized.

Copy link
Contributor

SpencerPark commented Oct 30, 2017

@ccordoba12 The shutdown_request looks like it already implements what you are asking about. The contents of this message has a restart flag to check if the shutdown precedes a restart.

What additional information are you looking for on top of that?

@ccordoba12

This comment has been minimized.

Copy link
Contributor

ccordoba12 commented Oct 30, 2017

Sorry, it seems you're right. I'll take a look at it more carefully.

@takluyver

This comment has been minimized.

Copy link
Member

takluyver commented Oct 31, 2017

Thanks @filmor . I think the remaining thing to do is to bump the protocol version to 5.3. The places I'm aware of this are in the messaging doc near the top, and in _version.py.

@filmor

This comment has been minimized.

Copy link
Contributor Author

filmor commented Nov 7, 2017

I have no idea why this test fails ...

filmor added 3 commits Sep 21, 2017
- interrupt_mode="signal" is the default and current behaviour
- With interrupt_mode="message", instead of a signal, a
  `interrupt_request` message on the control port will be sent
@filmor filmor force-pushed the filmor:interrupt branch from d748a48 to 21b9569 Nov 13, 2017
@filmor

This comment has been minimized.

Copy link
Contributor Author

filmor commented Nov 13, 2017

And I can not reproduce it locally. Any hints?

@takluyver

This comment has been minimized.

Copy link
Member

takluyver commented Nov 13, 2017

It might be related to a change in zeromq, perhaps. @minrk this is the failure:

    def test_tracking(self):
        """test tracking messages"""
        a,b = self.create_bound_pair(zmq.PAIR, zmq.PAIR)
        s = self.session
        s.copy_threshold = 1
        stream = ZMQStream(a)
        msg = s.send(a, 'hello', track=False)
        self.assertTrue(msg['tracker'] is ss.DONE)
        msg = s.send(a, 'hello', track=True)
        self.assertTrue(isinstance(msg['tracker'], zmq.MessageTracker))
        M = zmq.Message(b'hi there', track=True)
        msg = s.send(a, 'hello', buffers=[M], track=True)
        t = msg['tracker']
        self.assertTrue(isinstance(t, zmq.MessageTracker))
>       self.assertRaises(zmq.NotDone, t.wait, .1)
E       AssertionError: NotDone not raised by wait
@takluyver

This comment has been minimized.

Copy link
Member

takluyver commented Nov 13, 2017

I think Min's changes in #304, which was just merged, probably fixed this. Closing and reopening to test.

@takluyver takluyver closed this Nov 13, 2017
@takluyver takluyver reopened this Nov 13, 2017
@filmor filmor changed the title [RFC] Additional to the actual signal, send a message on the control port Additional to the actual signal, send a message on the control port Nov 13, 2017
@filmor

This comment has been minimized.

Copy link
Contributor Author

filmor commented Nov 13, 2017

Yep, this looks better. Is there anything else to do?

- **interrupt_mode** (optional): May be either ``signal`` or ``message`` and
specifies how a client is supposed to interrupt cell execution on this kernel,
either by sending an interrupt ``signal`` via the operating system's
signalling facilities (e.g. `SIGTERM` on POSIX systems), or by sending an

This comment has been minimized.

Copy link
@takluyver

takluyver Nov 13, 2017

Member

SIGTERM -> SIGINT

@takluyver

This comment has been minimized.

Copy link
Member

takluyver commented Nov 13, 2017

Thanks!

@takluyver takluyver merged commit 0d7d00f into jupyter:master Nov 13, 2017
1 check passed
1 check passed
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@takluyver takluyver added this to the 5.2 milestone Dec 15, 2017
@takluyver

This comment has been minimized.

Copy link
Member

takluyver commented Dec 15, 2017

@meeseeksdev backport

@meeseeksdev

This comment has been minimized.

Copy link

meeseeksdev bot commented Dec 15, 2017

There seem to be a conflict, please backport manually

takluyver added a commit that referenced this pull request Dec 15, 2017
…the control port

In Erlang I have only on POSIX and only in the newest version some control over the normal runtime signals, in particular SIGINT. However, I can easily pick up and process the respective message on the control socket to interrupt execution.

Can this functionality be added? If yes, I'd continue and document the functionality and add tests. Maybe it would also be good to have the option of not sending an actual signal (via `os.kill`) at all.

Signed-off-by: Thomas Kluyver <thomas@kluyver.me.uk>
@takluyver

This comment has been minimized.

Copy link
Member

takluyver commented Dec 15, 2017

Backported manually.

@filmor filmor deleted the filmor:interrupt branch Apr 6, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
8 participants
You can’t perform that action at this time.