Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Asyncio Python 3.8 #85

Open
MSeal opened this issue Jun 24, 2020 · 13 comments · May be fixed by #139
Open

Asyncio Python 3.8 #85

MSeal opened this issue Jun 24, 2020 · 13 comments · May be fixed by #139

Comments

@MSeal
Copy link
Contributor

MSeal commented Jun 24, 2020

See this related issue: nteract/papermill#515, but essentially the async implementation on windows 3.8 uses a different implementation that doesn't include all the same functions. This means pyzqm barfs on loop actions. We likely need to patch nbclient in a similar way? I know this should only really be set on entrypoint but from testing setting it multiple times works just fine and the reality is there is no other event loop policy right now that we'd be worried about overwriting.

@jhamrick
Copy link
Member

jhamrick commented May 6, 2021

Hi! I'm in the process of trying to update nbgrader to be compatible with nbconvert v6, and running into this issue. In particular, the workaround that everyone seems to be using doesn't fully work for nbgrader, for reasons I'm still not entirely clear on but I will do my best to explain:

nbgrader has 'autograde' and 'validate' functionality, which use nbconvert's execute preprocessor to execute the notebook for grading or validation. These obviously fail to work unless I use the workaround. I can add it into an entrypoint somewhere and that works fine.

What breaks down is that nbgrader also has a few notebook server extensions for running autograding and validation from the notebook interface itself. With the workaround that worked in the previous step, the notebook process hangs and will not serve anything up (if you go to the browser, it will eventually just time out). I was surprised at first because I would have thought the notebook would need this same patch, but after some sleuthing I discovered this is because Tornado 6.1 now supports the proactor event loop and so the notebook doesn't patch it in that case. So I guess there's something where I'm trying to set a different event loop than what Tornado wants to use, and this causes it to hang.

So as far as I can tell, there is currently an incompatibility with running anything using nbclient as a server extension for the notebook, since nbclient needs to use the WindowsSelectorEventLoopPolicy while Tornado needs to use the WindowsProactorEventLoopPolicy. Is my assessment here correct, or have I missed something? (To be honest, I don't really understand the underlying issue of what an event loop policy is anyway, so it's very possible I'm not understanding something).

If I've got it right and there is such an incompatibility, then I'm at a loss for how to move forward with this in nbgrader. I'm really hoping someone here can help shed some light on this!

Reference issue on nbgrader repo: jupyter/nbgrader#1421

@MSeal
Copy link
Contributor Author

MSeal commented May 7, 2021

Hello @jhamrick !

As I recall, yes the root problem is that WindowsProactorEventLoopPolicy (the default on windows python 3.8+) doesn't implement all of the async methods which causes pyzmq to fail when trying to communicate with the kernel. But Tornado had it's own async setup before Python fully established one, so there's some conflict of approach in a few low level places like this. In so far I'm not aware how notebook server got around these issues with pyzmq. Maybe there's a way now to fix this on the nbclient side that I'm not aware other than the monkey patch that's not working for the extension situation. I believe @minrk would be the best person to know the answer to that?

@jhamrick
Copy link
Member

jhamrick commented May 8, 2021

Thanks @MSeal for clarifying (and confirming my conclusions 😄 ).

@minrk if you have any thoughts about how we might be able to fix this on the nbclient side, or otherwise make it so that you can run code that depends on nbclient from the notebook server, that would be fantastic!

@minrk
Copy link
Member

minrk commented May 9, 2021

since nbclient needs to use the WindowsSelectorEventLoopPolicy while Tornado needs to use the WindowsProactorEventLoopPolicy

Everything should still work if WindowsSelectorEventLoopPolicy is used. Tornado 6.1 added workaround support for the default Proactor event loop by also running a selector event loop in another thread if proactor is detected, but I don't think it added a requirement to run with proactor. Indeed, tornado should still behave better when not using proactor due to the singular thread. The change in 6.1 should only mean that tornado works in both cases, not that it requires (or even prefers) proactor. The question remains: how to do that since the notebook server doesn't do it automatically anymore.

I think maybe it was a mistake to disable the selector patch in the notebook with tornado 6.1, since I think behavior will actually still be better with selector than proactor in all of our tornado-based applications. In the absence of that, extensions need some ability to register the asyncio policy early enough that it's called before the notebook app's (or any other extension's) first get_event_loop().

My guess is the hang mentioned here from attempting to reinstate the selector policy is not an incompatibility, but rather a 'too late' issue - the policy is switched after something has already been hooked up to the first event loop instance, meaning some things have been connected to an event loop instance that never runs.

I opened PRs to restore the preference for SelectorEventLoop:

I'm also going to explore piggy-backing on tornado's reader features in zmq.asyncio which is where nbclient (really jupyter_client) is trying to hook up pyzmq to asyncio directly and having an issue. I think I need to be careful with some threadsafety issues doing that, since some methods are short-cut and might need to be handed off to the io thread to avoid crashes.

@jhamrick
Copy link
Member

Thanks so much for looking into this @minrk ! That makes sense about it being a 'too late' issue, since I suppose I was setting it after the notebook server is initialised (when the extensions are loaded).

Do you think once those PRs are merged they could go into bugfix releases, so that we can unpin the dependency on nbconvert 6 in nbgrader? Or do you have any suggestions for workarounds I could use in the meantime?

@minrk
Copy link
Member

minrk commented May 11, 2021

I think we should do a patch release with those, yeah.

do you have any suggestions for workarounds I could use in the meantime?

Hmm. When I try to test, setting the event loop policy in an extension does work because NotebookApp and ServerApp don't request handles on the eventloop until start at the very end (this was intentional to allow folks to override our policy in extensions when we started applying the asyncio patch). So the workaround I would have suggested was to set the event loop policy in your load_jupyter_server_extension(). This only works if no other extensions request a handle on the event loop before yours, though. I see you added the policy in nbgraderapp, but the extensions seem to load from baseapp.NbGrader, so I'm not sure where you added the patch necessarily fires in the extension case. I would add the patch to NbGrader.initialize here so that it runs on load_jupyter_server_extension here, rather than relying on import-time which can have weird caching / ordering issues.

@jhamrick
Copy link
Member

Ah hmm interesting, I did try a version that had the patch in initialize, and that was still causing it to hang 🤔 ... but I guess if only one extension can try to set it, that might be the issue---there are two nbgrader extensions which would presumably try to patch the loop (formgrader and validate). But perhaps I can check to see if it's set already, and only patch it if it's not. I'll take another look at this in the next few days and see what I can figure out. Thanks for looking into this and for the suggestion!

@minrk
Copy link
Member

minrk commented May 13, 2021

The patch in notebookapp ought to work, I think. You could perhaps get an additional check when applying the patch to see if the event loop is already in use, but I'm not sure if there's an API for that (asyncio.get_event_event_loop() always returns an event loop, there isn't a public API I know of to ask if it has been created already).

This check might work at least to make it clear that the event loop cannot be used:

                if type(asyncio.get_event_loop_policy()) is WindowsProactorEventLoopPolicy:
                    # close the existing proactor loop so if anyone has a handle on it they get an error instead of hanging.
                    # there doesn't appear to be a way to ask if this has been accessed already, so this
                    # will usually create a new loop and close it immediately
                    proactor_loop = asyncio.get_event_loop()
                    proactor_loop.close()

                    # prefer the pre-3.8 default of Selector with add_reader support
                    asyncio.set_event_loop_policy(WindowsSelectorEventLoopPolicy())

@minrk
Copy link
Member

minrk commented May 13, 2021

Further detail on a source of hangs caused by setting the policy multiple times: setting the event loop policy always effectively clears the 'current' event loop (which is stored on the policy instance), so asyncio.get_event_loop() will return a different loop before/after, even if the event loop policy class is the same. A recipe for hangs:

  1. extension A calls asyncio.set_event_loop_policy(WindowsSelectorEventLoopPolicy())
  2. extension A calls self.loop = asyncio.get_event_loop() gets SelectorEventLoop 1
  3. extension B calls the same asyncio.set_event_loop_policy(WindowsSelectorEventLoopPolicy())
  4. extension B calls asyncio.get_event_loop(), gets SelectorEventLoop 2
  5. main app calls asyncio.get_event_loop(), gets SelectorEventLoop 2 and starts it running
  6. extension A does self.loop.do_something() # never runs!

Any part of extension A that re-uses its handle on SelectorEventLoop 1 will hang because that loop is never running. Adding a condition like the above to set the policy only if necessary should mean that you only set the policy once. Further, adding asyncio.get_event_loop().close() prior to any change of policy should make informative errors a bit more likely when anyone does have a handle on the replaced loop, instead of hanging (hanging is still not impossible, I think, depending on what might have been hooked up to that loop). It won't be any less broken, but hopefully it will help track down what's wrong. If all they've done is set up listeners and don't call any methods on the loop later, it will still hang without feedback, though.

@minrk
Copy link
Member

minrk commented May 13, 2021

If all they've done is set up listeners and don't call any methods on the loop later, it will still hang without feedback, though.

Aha, I did find a hang in both notebookapp and jupyter_server that are exactly that - handles are grabbed on the event loop prior to initializing extensions (in TerminalManager.__init__ and HTTPServer.listen). That effectively means extensions cannot set the policy without a refactor of the App initialize methods to avoid creating handles on a loop that will never start.

PRs to fix those:

@choldgraf
Copy link
Contributor

Is this one now resolved?

@csantanaes
Copy link

i have a problem !! help. jupyter book not work in python 3.8
image
help !!!

@jbeanland
Copy link

@choldgraf
This seems to be fixed upstream now from Tornado 6.1 and pyzmq 22.1.0, at least I can now successfully build with jupyter-book on Windows and cpython 3.9. It looks like it was all down to Tornado's fix of the asyncio ProactorEventLoop change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants