fix: avoid iopub losing messages by polling two iopub AND shell socket #1183

Closed


maartenbreddels
Collaborator

A POC/port to nbconvert of voila-dashboards/voila#536
Solves: nteract/papermill#426
Alternative to #994

I think eventually this should go into nbclient, but opening this just to show how it can be done.

#994 polls the shell channel for 1 second before polling/reading from the iopub channel. During that second, the iopub socket can hit the 0mq high water mark (1000 messages by default).
This causes some messages to be dropped, which leads to an IOPub timeout. In this PR we instead poll both sockets and receive messages as soon as they arrive, so none are lost.
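
To illustrate the idea of polling both channels in one call, here is a minimal sketch. It is not the code from this PR: it uses plain standard-library sockets and `select.select` as stand-ins for the zmq iopub and shell sockets and `zmq.select`, and the helper name `drain_both` is hypothetical.

```python
import select
import socket
from time import monotonic


def drain_both(iopub, shell, timeout=1.0):
    """Wait on both sockets at once and read whichever becomes ready.

    Hypothetical helper: `iopub` and `shell` are plain sockets standing in
    for the kernel channels. Returns (iopub_chunks, shell_reply) as soon as
    the shell socket yields data; raises TimeoutError at the deadline.
    """
    deadline = monotonic() + timeout
    iopub_chunks = []
    while True:
        remaining = deadline - monotonic()
        if remaining <= 0:
            raise TimeoutError("no shell reply before the deadline")
        # A single select call covers both sockets, so iopub data is read
        # as soon as it arrives instead of waiting behind a shell-only poll.
        rlist, _, xlist = select.select(
            [iopub, shell], [], [iopub, shell], remaining
        )
        if xlist:
            raise RuntimeError("unexpected error condition on a socket")
        # Read iopub first so pending output is collected before returning.
        if iopub in rlist:
            iopub_chunks.append(iopub.recv(4096))
        if shell in rlist:
            return iopub_chunks, shell.recv(4096)
```

With two `socket.socketpair()` pairs, writing output to the iopub peer and a reply to the shell peer makes `drain_both` return the collected output together with the reply. The real channels carry framed zmq messages rather than raw byte chunks, so this only shows the polling structure.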

Fixing the unit tests might be difficult, since I think it requires monkeypatching zmq.select.

            if monotonic() > deadline:
                self._handle_timeout(exec_timeout, cell)
            if xlist:
                raise RuntimeError("Oops, unexpected rror")
Member

@jasongrout jasongrout Feb 5, 2020

Suggested change
-                raise RuntimeError("Oops, unexpected rror")
+                raise RuntimeError("Oops, unexpected error from zmq")

Collaborator Author

Yeah, this is partly why this is a draft: we don't have a test that could trigger this. My guess is that closing the socket from the kernel side might do it, or segfaulting the kernel.

@MSeal
Contributor

MSeal commented Feb 9, 2020

Thanks for putting this together. I'll want to dedicate some time to comparing and testing it, as this section of the code is hard to fully reason about on inspection alone. The good news is that this code is still the same in nbclient, so it should translate over cleanly. I can help with the tests if we get to approval and there are still gaps.

@echuber2

Hi, was this implemented in another PR? I still see IOPub timeouts sometimes.

@MSeal
Contributor

MSeal commented Mar 10, 2021

@echuber2 the work here is no longer applicable, as it was rewritten and moved into nbclient with the 6.0 release. There is always an internal buffer somewhere in the zmq communication. Timeouts usually occur when thousands or tens of thousands of messages arrive in a very short time (about a second). I'd check whether your notebook is trying to print far too much information at once, since the output formatting is likely a mess as well when there are that many messages. If you're still hitting timeouts with a smaller number of messages, I'd post to nbclient with more details on what you're executing.

@MSeal MSeal closed this Mar 10, 2021