Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add type annotations, refactor sync/async #623

Merged
merged 9 commits into from
Mar 26, 2021

Conversation

davidbrochart
Copy link
Member

@davidbrochart davidbrochart commented Mar 8, 2021

Fixes #553
Fixes #621

@davidbrochart davidbrochart marked this pull request as draft March 8, 2021 16:20
context = Instance(zmq.Context)
def _context_default(self):
context: Instance = Instance(zmq.Context)
def _context_default(self) -> zmq.Context:
Copy link
Member Author

@davidbrochart davidbrochart Mar 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was surprised not to see the default decorator here:

    @default('context')

Am I missing something?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The @default decorator was added to traitlets more recently than this code was written. Traitlets used to exclusively use 'magic method names' instead of decorators. The decorator approach is not required for the default-generator, but fine to add if you are making a pass on all these bits.

def _client_factory_default(self):
client_class: DottedObjectName = DottedObjectName('jupyter_client.blocking.BlockingKernelClient')
client_factory: Type = Type(klass='jupyter_client.KernelClient')
def _client_factory_default(self) -> Type:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No @defaults('client_factory') here?


def _kernel_spec_manager_default(self):
def _kernel_spec_manager_default(self) -> kernelspec.KernelSpecManager:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No @defaults('kernel_spec_manager') here?

@davidbrochart
Copy link
Member Author

I don't know how to add type hints to traits, right now this doesn't have any effect (see ipython/traitlets#647).

@davidbrochart
Copy link
Member Author

davidbrochart commented Mar 9, 2021

The fact that we are trying to support sync/async in a single code base is actually not good for type hints. For instance, a KernelManager has a _kill_kernel() method that should return None. But then, an AsyncKernelManager, which derives from KernelManager, redefines _kill_kernel() as an async method. If we try to type annotate the return value of these methods, we get the following mypy error:

Return type "Coroutine[Any, Any, None]" of "_kill_kernel" incompatible with return type "None" in supertype "KernelManager"

I posted a question on Stack Overflow about that, it seems that this is a legit error. I'm not sure what to do about that.

@davidbrochart
Copy link
Member Author

In #533 we suggested to have a single, async-native code base for jupyter-client, as we did in nbclient. The non-async methods would be wrappers around the async methods, and run them until complete. This would probably solve the issue I mentioned above, and it will also remove some duplicated code.
Any thoughts, should we do that?

@dhirschfeld
Copy link
Contributor

I've thought the unasync project would be a good way to support both sync and async apis. In the end we went async-only but maybe it would be useful here where async-only probably isn't an option.

@davidbrochart
Copy link
Member Author

Thanks a lot @dhirschfeld, I didn't know this project.
It seems to be based on code generation, which is actually quite nice since it allows to have a single (async) code base, without the problems of translating it to blocking code at run-time, since it's done ahead of time.
I'll look at it in greater details, thanks again.

@dhirschfeld
Copy link
Contributor

The folks on the trio gitter channel are a very helpful bunch if you have questions about adapting it to your requirements.
https://gitter.im/python-trio/general

@davidbrochart
Copy link
Member Author

Thanks, I was worried it would be only for trio-specific async code, but it seems to apply to general async code, since it basically just removes async, await, and changes __aenter__ to __enter__ and stuff like that, so that's great.

@davidbrochart
Copy link
Member Author

This last commit refactors KernelManager.
Previously KernelManager was the blocking KM, and AsyncKernelManager derived from it, redefining some methods as async. Apart from duplicating a lot of code, this prevented to type-annotate these methods.
Now, KernelManager is a base class from which BlockingKernelManager and AsyncKernelManager derive from. KernelManager only has the async version of the methods (single source of truth), and BlockingKernelManager wraps and run them until complete in an event loop. This event loop runs in a separate thread to avoid issues with an already running event loop.

@kevin-bates
Copy link
Member

Isn't this going to break direct subclasses of KernelManager (besides AsyncKernelManager)?

Since we want to eventually have one implementation (async), what is the downside of a split-and-deprecate approach while using inheritance purely as the means to support configuration in a backwards-compatible manner? Yes, it means some duplication of code, but it doesn't stifle innovation, is much simpler to support and is temporary.

@davidbrochart
Copy link
Member Author

Isn't this going to break direct subclasses of KernelManager (besides AsyncKernelManager)?

Yes, previous subclasses of KernelManager will have to subclass BlockingKernelManager instead.

Since we want to eventually have one implementation (async), what is the downside of a split-and-deprecate approach while using inheritance purely as the means to support configuration in a backwards-compatible manner? Yes, it means some duplication of code, but it doesn't stifle innovation, is much simpler to support and is temporary.

I'm not sure to follow what you mean, could you explain further?

@kevin-bates
Copy link
Member

Yes, previous subclasses of KernelManager will have to subclass BlockingKernelManager instead.

OK. I guess we're talking about a major-release boundary here (e.g., 7.0) where any subclasses not updated or who haven't capped jupyter_client < 7.0 will have their consuming applications break.

Since we want to eventually have one implementation (async), what is the downside of a split-and-deprecate approach while using inheritance purely as the means to support configuration in a backwards-compatible manner? Yes, it means some duplication of code, but it doesn't stifle innovation, is much simpler to support, and is temporary.

I'm not sure to follow what you mean, could you explain further?

By split-and-deprecate I mean we do not have AsyncKernelManager invoke super().method-name on any methods - thereby 'splitting' the functional offering of each of the two classes. We also deprecate all KernelManager methods (not the class) with a timeframe for their removal.

Because the implementations are split and because we want to continue to innovate, new functionality - like Kernel Provisioning - can move forward on the async class while leaving the synchronous methods alone and not requiring proliferation of both behaviors - one of which we plan to abandon. Once we determine the appropriate deprecation cycle, we remove the KernelManager methods. But, until then, those applications that have not adopted the more modern technology will continue to work as they do today - no changes required.

We also wouldn't be introducing a new paradigm that runs every method of KernelManager on a separate thread. (There don't appear to be tests using the IOLoopKernelManager and IOLoopKernelRestarter subclasses - will they be affected - other than updating the former's superclass? )

I think we could use a one-time (or semi-regularly) maintainer's sync-up of sorts - open to all - where we discuss these issues and a roadmap, in general. Would that be something worthwhile to others?

@davidbrochart
Copy link
Member Author

Regarding breaking the API by renaming the current KernelManager to BlockingKernelManager, I think I can avoid that. I'm going to push a commit.
I'm still not sure we want to eventually get rid of the sync behavior. Async has advantages indeed, but it forces users to enter a whole new paradigm. To me, if we can offer both APIs without much effort, then we should.

Copy link
Member

@kevin-bates kevin-bates left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the same issues will apply for KernelClient as well. Generally speaking, any methods called from within another method will not have their overridden methods called on subclasses. The aliasing trick breaks subclasses.

@@ -338,10 +380,15 @@ def start_kernel(self, **kw):

# launch the kernel subprocess
self.log.debug("Starting kernel: %s", kernel_cmd)
self.kernel = self._launch_kernel(kernel_cmd, **kw)
self.kernel = await self._async__launch_kernel(kernel_cmd, **kw)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This breaks subclasses that override _launch_kernel() as they won't have their override called.

@@ -438,16 +485,16 @@ def shutdown_kernel(self, now=False, restart=False):
# Stop monitoring for restarting while we shutdown.
self.stop_restarter()

self.interrupt_kernel()
await self._async_interrupt_kernel()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This breaks subclasses that override interrupt_kernel() as they won't have their override called.


if now:
self._kill_kernel()
await self._async__kill_kernel()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This breaks subclasses that override _kill_kernel() as they won't have their override called.

else:
self.request_shutdown(restart=restart)
# Don't send any additional kernel kill messages immediately, to give
# the kernel a chance to properly execute shutdown actions. Wait for at
# most 1s, checking every 0.1s.
self.finish_shutdown()
await self._async_finish_shutdown()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This breaks subclasses that override finish_shutdown() as they won't have their override called.

@@ -500,21 +554,23 @@ def restart_kernel(self, now=False, newports=False, **kw):
"No previous call to 'start_kernel'.")
else:
# Stop currently running kernel.
self.shutdown_kernel(now=now, restart=True)
await self._async_shutdown_kernel(now=now, restart=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This breaks subclasses that override shutdown_kernel() as they won't have their override called during restarts.

@davidbrochart
Copy link
Member Author

I think the same issues will apply for KernelClient as well. Generally speaking, any methods called from within another method will not have their overridden methods called on subclasses. The aliasing trick breaks subclasses.

There should be a test for that.

@davidbrochart
Copy link
Member Author

davidbrochart commented Mar 12, 2021

I think we should be more explicit as to what is async and what is sync. If a subclass of KernelManager is async, then it should override the methods prefixed with _async_, e.g. _async__launch_kernel. If it is sync, then it should override e.g. _launch_kernel.
Maybe even the API of a kernel manager should be different, depending on whether it is async or not. I'm afraid a lot of people will forget to await the async API, if it has the same name as the sync API.

@kevin-bates
Copy link
Member

I'm still not sure we want to eventually get rid of the sync behavior. Async has advantages indeed, but it forces users to enter a whole new paradigm. To me, if we can offer both APIs without much effort, then we should.

Deprecation cycles can take years (kernel_cmd was deprecated 7 years ago). I'm just saying, to reasonably move forward, we should consider splitting the class implementations. As you're finding out, once subclasses exist in nature, it is extremely difficult to do anything. I'd rather we innovate on the async classes, allow folks to adopt the new innovations (or remain where they are if they desire), and move forward with async while the sync classes gather dust (but work).

@kevin-bates
Copy link
Member

kevin-bates commented Mar 12, 2021

I think we should be more explicit as to what is async and what is sync. If a subclass of KernelManager is async, then it should override the methods prefixed with async, e.g. _async__launch_kernel. If it is sync, then it should override e.g. _launch_kernel.
Maybe even the API of a kernel manager should be different, depending on whether it is async or not. I'm afraid a lot of people will forget to await the async API, if it has the same name as the sync API.

I agree - in a world where we don't have to continue supporting clients, this is all true. The reality is we have a contract to uphold and introducing this level of change breaks that contract - and, yes, it makes things extremely difficult to change.

If we're saying that we're free to completely rework our public contract whenever (and subclass authors be damned), then that's a violation of trust.

@davidbrochart
Copy link
Member Author

Regarding the sync API deprecation, I meant that I'm not sure it should happen at all. Having an async-only API is a pretty big constraint for users, as once they use it async/await propagates to their entire code base. I think a sync API is very useful and should stay forever.
I wouldn't say we are breaking a contract with clients, it's just the life of software: APIs change and sometimes there are backward-incompatible changes. BTW, overriding a private method in a class doesn't look like a good practice, and this may be a good time to improve this API, and document it.

@kevin-bates
Copy link
Member

Regarding the sync API deprecation, I meant that I'm not sure it should happen at all.

Yes, I know.

Having an async-only API is a pretty big constraint for users, as once they use it async/await propagates to their entire code base.

Agreed.

I think a sync API is very useful and should stay forever.

I think it would be good to discuss this. I think we're going to find it hard to adequately support both, but that depends on the kinds of changes we'll be making. I guess with this approach, new functionality can be added within the dual-class structures - so long as it is async. Is that a correct understanding? If so, that is great.

I wouldn't say we are breaking a contract with clients, it's just the life of software: APIs change and sometimes there are backward-incompatible changes.

Because we have documented, promoted, and allowed folks to subclass KernelManager and KernelClient - we are breaking a contract but that's what major release boundaries are for. I had proposed in the Kernel Provisioning proposal that it be delivered at a major boundary due to a hypothetical contract breakage (that has been extremely difficult to avoid - because it's a contract). I'm just not sure the addition of type hints warrants a major release.

BTW, overriding a private method in a class doesn't look like a good practice, and this may be a good time to improve this API, and document it.

Actually, subclass overrides of private-callable methods is exactly how you allow subclasses to influence their behavior on the class. It's important to distinguish between publically-callable methods - which is the contract for client applications and privately-callable methods - which amends the contract promised to clients extending your functionality.

From what I've read in Python, a class that is intended to be subclassed would use double-underscore prefixes on method names to indicate those methods are truly private (and triggers the name-mangling). Single-underscored methods are more like protected methods in Java and can be overridden. I added a number of these kinds of methods in #612 to allow third-parties to hook into things like parameter validation, setting up env, etc. It's actually a best practice because you're saying these methods are not callable from outside the package. The base framework calls these methods to drive the customization.

We even document that subclasses override _launch_kernel() to "launch kernel subprocesses differently".

I'm sorry for being so persistent about this and I do appreciate the discussion (and it's helping me see things). Thanks for spending time on this (and patience with me 😄 ).

Would you mind adding some description to the PR's description of the approach used (i.e., all sync methods are essentially placed onto a separate thread/event loop) so others don't need to glean this from the changes? You should also mention the API breakage so we can discuss what our release cycle would be if this were to be merged. If we're going to be introducing a major release - there are others we'll want to do like finally remove the 7-year-deprecated kernel_cmd, the old cleanup() method, etc. I'd also like to see Kernel Provisioning and Kernel Handshaking make it into the next major release update as well.

@davidbrochart
Copy link
Member Author

Thanks Kevin for the detailed explanation, and relevant background (as usual, and this is very appreciated as I lack those).
This PR is very much a work in progress, so don't worry if you see me breaking everything. I probably need to do so for my own understanding, but will probably have to put some things back in place for backward compatibility. Thanks anyway for the guidance in this process.
I will reply more in details later.

@davidbrochart
Copy link
Member Author

Thanks Kevin for taking the time to test and investigate. I could reproduce the bug in papermill's tests.
The issue comes from the fact that papermill, which uses nbclient under the hood to execute a notebook, uses an AsyncKernelClient. Some requests, such as kernel_info(), can wait for a reply when they are passed reply=True, or not (reply=False). Because the reply argument is passed at run-time, kernel_info() must be able to return either a message id (reply=False) or the reply message (reply=True), and thus must be an async function. Because the kernel client is async, the reply message has to be awaited (if the kernel client was blocking, we wait for the reply synchronously for you).
So this line in papermill has to be replaced with:

info_msg = self.wait_for_reply(await self.kc.kernel_info())

This is not without consequences, since now the whole function call stack leading to this line has to be async. If this is unacceptable, then kernel_info() can be called synchronously:

info_msg = self.wait_for_reply(run_sync(self.kc.kernel_info()))

@davidbrochart
Copy link
Member Author

davidbrochart commented Mar 24, 2021

I managed to fix the issue by having requests return synchronously when reply=False and asynchronously when reply=True, with an AsyncKernelClient. For a BlockingKernelClient, they always return synchronously.
Papermill's test suite passes with this change, could you give it a try @kevin-bates ?

@kevin-bates
Copy link
Member

Thanks David. I don't see a relevant issue or pull request in Papermill for this - do you know if something is in the works?

I guess this raises some concerns regarding breaking existing clients. However, I would imagine the only clients it would break would be those using AsyncKernelClient that do not call kc.kernel_info() synchronously - which are probably few. I'm assuming Voila is not experiencing this.

I thought I tried the synchronous call approach yesterday (using run_sync from nbclient) and it didn't work, but if that could be used by a client of AKC, then I suppose we could go forward here. (I just tried again using jupyter_client's run_sync and see the same (hang) issue. Are you able to get that to work?)

The alternative of forcing clients to plumb their call stack to be async may be too much to ask. (Not saying that Papermill wouldn't do that, just talking about clients in general.)

I see you've just posted an update while I was writing this response. I'll give this a shot. Thanks!

@kevin-bates
Copy link
Member

Works like a charm @davidbrochart - very nice! I also tried the more complicated scenario I referenced in which AKM and AKC subclasses come into play - ditto - like a charm!

@davidbrochart
Copy link
Member Author

Awesome, thanks a lot for your help Kevin.
Are we ready to squash and merge?

@kevin-bates
Copy link
Member

Are we ready to squash and merge?

I am - although I would like to see at least another maintainer review this due to its significant changes.

Thank you for all the work (and patience) on this - this is really outstanding (IMHO). You've essentially flipped the script by making async the basis on which we can move forward!

@davidbrochart
Copy link
Member Author

I will merge today, if no objection.

Copy link
Member

@blink1073 blink1073 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!

@davidbrochart davidbrochart merged commit 8c81bd4 into jupyter:master Mar 26, 2021
@davidbrochart davidbrochart deleted the type branch March 26, 2021 07:36
@blink1073
Copy link
Member

🎉

@JohanMabille
Copy link
Member

🚀

@ccordoba12
Copy link
Contributor

Hey guys, this seems like a huge refactoring to be included in a patch release (6.1.13). Furthermore, it included breaking changes (see jupyter/qtconsole#476 and spyder-ide/spyder#15161), so a heads up to upstream projects would be nice for next time.

@davidbrochart
Copy link
Member Author

Hi Carlos, although there has been big internal changes, I was not expecting any change for the user. But it looks like I was wrong, sorry for the inconvenience.
I'm waiting for someone with PyPI admin rights to yank the release (it's already marked as broken on conda-forge), and I will release 6.2.

def __init__(self, context=None, session=None, address=None, loop=None):
def __init__(
self,
context: zmq.asyncio.Context,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the removal of the None default is what is breaking qtconsole, although this does imply they must not be using the HB channel since it unconditionally references through self.context in its _create_socket() method.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the removal of the None default is what is breaking qtconsole

Thanks for your help @kevin-bates. I found that too but now we have another problem (see below).

although this does imply they must not be using the HB channel since it unconditionally references through self.context in its _create_socket() method.

It has something to do with the way the channel inherits from HBChannel and QObject and a re-implementation of super to make that work (I really don't understand that logic, @minrk implemented it).

Copy link
Contributor

@ccordoba12 ccordoba12 Apr 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the error I'm seeing now while starting a kernel:

ERROR:tornado.general:Uncaught exception in ZMQStream callback
Traceback (most recent call last):
  File "/home/carlos/.virtualenvs/test-jupyter-client/lib/python3.8/site-packages/zmq/eventloop/zmqstream.py", line 434, in _run_callback
    callback(*args, **kwargs)
  File "/home/carlos/.virtualenvs/test-jupyter-client/lib/python3.8/site-packages/jupyter_client/threaded.py", line 101, in _handle_recv
    ident, smsg = self.session.feed_identities(msg)
  File "/home/carlos/.virtualenvs/test-jupyter-client/lib/python3.8/site-packages/jupyter_client/session.py", line 954, in feed_identities
    idx = msg_list.index(DELIM)
AttributeError: '_asyncio.Future' object has no attribute 'index'

Any ideas?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also saw this @ccordoba12, it has to do with the threaded channels. Don't worry, I will clear all that out before the next release.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! Please give me a ping in case you want me to test your changes.

@ccordoba12
Copy link
Contributor

I'm waiting for someone with PyPI admin rights to yank the release (it's already marked as broken on conda-forge), and I will release 6.2.

Thanks for your help @davidbrochart!

@MSeal
Copy link
Contributor

MSeal commented Apr 8, 2021

6.1.13 is yanked on PyPI now as well

@vidartf
Copy link
Contributor

vidartf commented Apr 9, 2021

Just as an FYI, this change also breaks nbconvert 5.x branch, as it has no async wrapping around its get_msg call etc:

EDIT: Ah I see this was addressed in #637

@davidbrochart
Copy link
Member Author

You mean that it was reported in #637? And #638 should fix it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

AsyncKernelClient.is_alive() returns non-awaited coroutine? Type Annotations
10 participants