Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incompatible with jupyter #508

Closed
dmitriyshashkin opened this issue Nov 6, 2019 · 22 comments
Closed

Incompatible with jupyter #508

dmitriyshashkin opened this issue Nov 6, 2019 · 22 comments
Labels
bug Something isn't working concurrency Issues related to concurrency and usage of async libraries

Comments

@dmitriyshashkin
Copy link

import httpx
httpx.get('https://httpbin.org/get')

When run from Jupyter this snippet produces "Cannot run the event loop while another loop is running" error. Works fine in REPL. Apparently the use of asyncio conflicts with Jupters's own event loop.

httpx version: 0.7.6
os: Fedora 29
python: Python 3.6.9
ipykernel: 5.1.2
ipython: 7.8.0
jupyter: 5.4.1

@florimondmanca
Copy link
Member

Do you get the error when running these commands in an ipython shell (spawned directly from the command line)?

I'm on ipython==7.7, and cannot reproduce this issue on my side. Didn't try with Jupyter though.

Screenshot 2019-11-06 at 13 37 16

@florimondmanca
Copy link
Member

florimondmanca commented Nov 6, 2019

OK, I was able to reproduce. Details are below. The workaround right now is to take advantage of the fact that IPython 7+ supports running async/await code (source), and use an AsyncClient:

Screenshot 2019-11-06 at 13 50 00


Setup:

$ pip install jupyter
$ python -m jupyter --version
jupyter core     : 4.6.1
jupyter-notebook : 6.0.2
qtconsole        : 4.5.5
ipython          : 7.9.0
ipykernel        : 5.1.3
jupyter client   : 5.3.4
jupyter lab      : not installed
nbconvert        : 5.6.1
ipywidgets       : 7.5.1
nbformat         : 4.4.0
traitlets        : 4.3.3

$ python -m jupyter notebook

I created a notebook via the web UI with your sample script, and running it yields this error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
~/.pyenv/versions/3.8.0/lib/python3.8/site-packages/httpx/api.py in request(method, url, params, data, files, json, headers, cookies, auth, timeout, allow_redirects, cert, verify, stream, trust_env)
     80     with Client(http_versions=["HTTP/1.1"]) as client:
---> 81         return client.request(
     82             method=method,

~/.pyenv/versions/3.8.0/lib/python3.8/site-packages/httpx/client.py in request(self, method, url, data, files, json, params, headers, cookies, stream, auth, allow_redirects, cert, verify, timeout, trust_env)
    830         )
--> 831         return self.send(
    832             request,

~/.pyenv/versions/3.8.0/lib/python3.8/site-packages/httpx/client.py in send(self, request, stream, auth, allow_redirects, verify, cert, timeout, trust_env)
    869         }
--> 870         async_response = concurrency_backend.run(coroutine, *args, **kwargs)
    871 

~/.pyenv/versions/3.8.0/lib/python3.8/site-packages/httpx/concurrency/asyncio.py in run(self, coroutine, *args, **kwargs)
    259         try:
--> 260             return self.loop.run_until_complete(coroutine(*args, **kwargs))
    261         finally:

~/.pyenv/versions/3.8.0/lib/python3.8/asyncio/base_events.py in run_until_complete(self, future)
    594         try:
--> 595             self.run_forever()
    596         except:

~/.pyenv/versions/3.8.0/lib/python3.8/asyncio/base_events.py in run_forever(self)
    551         if events._get_running_loop() is not None:
--> 552             raise RuntimeError(
    553                 'Cannot run the event loop while another loop is running')

RuntimeError: Cannot run the event loop while another loop is running

During handling of the above exception, another exception occurred:

RuntimeError                              Traceback (most recent call last)
<ipython-input-2-62811d09cc75> in <module>
      1 import httpx
----> 2 httpx.get("https://httpbin.org")

~/.pyenv/versions/3.8.0/lib/python3.8/site-packages/httpx/api.py in get(url, params, headers, cookies, stream, auth, allow_redirects, cert, verify, timeout, trust_env)
    120     this function, as `GET` requests should not include a request body.
    121     """
--> 122     return request(
    123         "GET",
    124         url,

~/.pyenv/versions/3.8.0/lib/python3.8/site-packages/httpx/api.py in request(method, url, params, data, files, json, headers, cookies, auth, timeout, allow_redirects, cert, verify, stream, trust_env)
     79     """
     80     with Client(http_versions=["HTTP/1.1"]) as client:
---> 81         return client.request(
     82             method=method,
     83             url=url,

~/.pyenv/versions/3.8.0/lib/python3.8/site-packages/httpx/client.py in __exit__(self, exc_type, exc_value, traceback)
   1187         traceback: TracebackType = None,
   1188     ) -> None:
-> 1189         self.close()
   1190 
   1191 

~/.pyenv/versions/3.8.0/lib/python3.8/site-packages/httpx/client.py in close(self)
   1176         """
   1177         coroutine = self.dispatch.close
-> 1178         self.concurrency_backend.run(coroutine)
   1179 
   1180     def __enter__(self) -> "Client":

~/.pyenv/versions/3.8.0/lib/python3.8/site-packages/httpx/concurrency/asyncio.py in run(self, coroutine, *args, **kwargs)
    258             self._loop = asyncio.new_event_loop()
    259         try:
--> 260             return self.loop.run_until_complete(coroutine(*args, **kwargs))
    261         finally:
    262             self._loop = loop

~/.pyenv/versions/3.8.0/lib/python3.8/asyncio/base_events.py in run_until_complete(self, future)
    593         future.add_done_callback(_run_until_complete_cb)
    594         try:
--> 595             self.run_forever()
    596         except:
    597             if new_task and future.done() and not future.cancelled():

~/.pyenv/versions/3.8.0/lib/python3.8/asyncio/base_events.py in run_forever(self)
    550             raise RuntimeError('This event loop is already running')
    551         if events._get_running_loop() is not None:
--> 552             raise RuntimeError(
    553                 'Cannot run the event loop while another loop is running')
    554         self._set_coroutine_origin_tracking(self._debug)

RuntimeError: Cannot run the event loop while another loop is running

It does seem to be a Jupyter-specific issue, since I don't get it when running with IPython only.

Kind of a blocker for data science folks in particular.

We could use the existing asyncio event loop in httpx.Client by default, but running .run_until_complete() would raise an error (something like "this event loop is already running" I think). So, not sure exactly at how to solve this right now.

@euri10
Copy link
Member

euri10 commented Nov 6, 2019 via email

@florimondmanca
Copy link
Member

I think this is a more general issue — what happens when people try to use the sync API in an async context? Generally, I'd argue that shouldn't be done, but there are times (like in Jupyter) when users don't know (and shouldn't need to know!) that they actually are in an async context. Which makes me think we might want to accomodate that use case.

@euri10

you can use nest asyncio and plug to the running loop

Could you expand on this?

@euri10
Copy link
Member

euri10 commented Nov 6, 2019 via email

@florimondmanca
Copy link
Member

florimondmanca commented Nov 6, 2019

@euri10 I think nest_asyncio might work in some cases, but it's probably not the most sustainable solution for us and our users.

So, I looked a bit into this, and as funny as it seems, running a coroutine and waiting for it to finish without using await in an async context seems like an incredibly difficult task. There's asyncio.run_coroutine_threadsafe(), but it has to be called in a different thread than the one the loop is currently running in, otherwise it just won't run (i.e. the returned future times out).

Anyway, the core truth here is that asyncio won't let you have multiple running event loops in a given thread. For this reason, what we currently do — i.e. replace the running loop with a new one:

def run(
self, coroutine: typing.Callable, *args: typing.Any, **kwargs: typing.Any
) -> typing.Any:
loop = self.loop
if loop.is_running():
self._loop = asyncio.new_event_loop()
try:
return self.loop.run_until_complete(coroutine(*args, **kwargs))
finally:
self._loop = loop

just can't work, as we're trying to force asyncio to run multiple loops, which it wasn't designed for.

So… if we can't properly handle that case, how about we just fail in that case? And tell users to use AsyncClient, because that's what they should be doing if they're in an async context anyway.

Thoughts @tomchristie?

@florimondmanca
Copy link
Member

florimondmanca commented Nov 6, 2019

Honestly though, I think this is a bug in Jupyter itself (more precisely in ipykernel, the IPython kernel for Jupyter).

What makes me think this is that IPython doesn't have any issues with running our sync APIs. In fact, if you open an IPython shell and run:

import asyncio
loop = asyncio.get_event_loop()
loop.is_running()

Then you'll get False, while running the same in a Jupyter cell returns True. The approach taken to support async/await doesn't seem to be the same in both projects, unfortunately…

@euri10
Copy link
Member

euri10 commented Nov 6, 2019

yep, using nest_ayncio feels more like a hack

iirc, but this is old in my mind, there are 2 ways to determine the current running loop, asyncio.get_event_loop or asyncio.get_running_loop

the main difference is that asyncio.get_running_loop returns the currently running loop if there is one and raise RuntimeError if there is none while asyncio.get_event_loop does a lot of checks and long story short instantiates a loop only if you are on the main thread.

So in that Jupyter case you are not on the main thread and the RuntimeError is raised.

now reading the use of the asyncio.new_event_loop() abovl don't you have to do asyncio.set_event_loop(self._loop) just after or the event loop policy was already initialized, I'm not falimirar with all internals of course?

@euri10
Copy link
Member

euri10 commented Nov 6, 2019

@florimondmanca I switched to Ipython just to test and I got the same RuntimeError as in Jypyter

I think this is an issue of importance for the day to day life, at least mine because it's tightly coupled with how I code / debug, sorry in advance if the below is a little bit long, it's just to put some context and maybe you'll tell me my workflow sucks, which is possible >_)

I code my apps in containers having most of the time a container X and its child container test-X that is the same but has all the test / format / lint / typing packages added

In that test-X container I also have in particular a Jupyter / Ipython terminal installed, so that when I hit a breakpoint in my code, I have completion, I can on the fly inspect variables, see methods, etc, AND last and not least I can try await statements.

I ended up using the nest_asyncio hack being fed up that have to put loggers every time I wanted to used await database.execute(my_failing_query) because of that RuntimeError, see log below.

Indeed it's way faster to adapt your query on the fly, in a test, on a breakpoint, than to reload the test container, let the debugger go up to where the query should execute, change small parts of the query, log the result and see it failed, relaunch container, rinse, repeat.

With the nest_asyncio hack I can on this breakpoint change my query, hit the db, see the result, change it again etc etc, it's way faster this way.
I wish I could do it without having to think about using nest_asyncio which put another dependency in my test_X container only, but sill, if there was a less involved situation that'd be good.

Connected to pydev debugger (build 192.7142.42)
backend-tests_1_98b1a91e5016 | pydev debugger: warning: trying to add breakpoint to file that does not exist: /home/lotso/.local/share/virtualenvs/sfjapi-pbHOij-q/lib/python3.7/site-packages/sentry_sdk/api.py (will have no effect)
backend-tests_1_98b1a91e5016 | ============================= test session starts ==============================
backend-tests_1_98b1a91e5016 | platform linux -- Python 3.7.3, pytest-5.2.2, py-1.8.0, pluggy-0.13.0
backend-tests_1_98b1a91e5016 | rootdir: /app
backend-tests_1_98b1a91e5016 | plugins: lazy-fixture-0.6.1, asyncio-0.10.0
backend-tests_1_98b1a91e5016 | collected 87 items
backend-tests_1_98b1a91e5016 | 
backend-tests_1_98b1a91e5016 | tests/test_internal.py 2019-11-06 15:12:34 INFO  [timing_asgi.middleware:39] ASGI scope of type lifespan is not supported yet
import sys; print('Python %s on %s' % (sys.version, sys.platform))
backend-tests_1_98b1a91e5016 | Python 3.7.3 (default, Mar 27 2019, 23:51:31) 
backend-tests_1_98b1a91e5016 | Type 'copyright', 'credits' or 'license' for more information
backend-tests_1_98b1a91e5016 | IPython 7.5.0 -- An enhanced Interactive Python. Type '?' for help.
backend-tests_1_98b1a91e5016 | PyDev console: using IPython 7.5.0
backend-tests_1_98b1a91e5016 | 
backend-tests_1_98b1a91e5016 | Python 3.7.3 (default, Mar 27 2019, 23:51:31) 
backend-tests_1_98b1a91e5016 | [GCC 6.4.0] on linux
In[2]: await database.fetch_one("SELECT 1")
backend-tests_1_98b1a91e5016 | Traceback (most recent call last):
backend-tests_1_98b1a91e5016 |   File "/usr/local/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 2874, in _run_cell
backend-tests_1_98b1a91e5016 |     return runner(coro)
backend-tests_1_98b1a91e5016 |   File "/usr/local/lib/python3.7/site-packages/IPython/core/async_helpers.py", line 27, in __call__
backend-tests_1_98b1a91e5016 |     return asyncio.get_event_loop().run_until_complete(coro)
backend-tests_1_98b1a91e5016 |   File "/usr/local/lib/python3.7/asyncio/base_events.py", line 571, in run_until_complete
backend-tests_1_98b1a91e5016 |     self.run_forever()
backend-tests_1_98b1a91e5016 |   File "/usr/local/lib/python3.7/asyncio/base_events.py", line 526, in run_forever
backend-tests_1_98b1a91e5016 |     raise RuntimeError('This event loop is already running')
backend-tests_1_98b1a91e5016 | RuntimeError: This event loop is already running
In[3]: import nest_asyncio
In[4]: nest_asyncio.apply()
In[5]: await database.fetch_one("SELECT 1")
backend-tests_1_98b1a91e5016 | Out[4]: <databases.backends.postgres.Record at 0x7fc158e7beb8>

@florimondmanca
Copy link
Member

Thanks for explaining your use case in detail @euri10. As I understand, the issue is not particularly tied to HTTPX then, is it? I assume the breakpoint is inside an async function, and IPython's async/await support collides with your own async environment as it tries to run its own thing on the event loop…

@euri10
Copy link
Member

euri10 commented Nov 6, 2019

yep that's a perfect summary @florimondmanca, it's not tied to httpx, or databases, it's more like a general "how to use await in a debugging session when you hit the breakpoint in an async function with an already running loop" 🤤

and I have no idea if it's the same use case as OP (seems slightly different and sorry to hijack) and if more importantly the libraries can do something about it !

@florimondmanca
Copy link
Member

florimondmanca commented Nov 6, 2019

and I have no idea if it's the same use case as OP

It seems to me the core issue is the same:

  • @dmitriyshashkin is having HTTPX run .run_until_complete() inside an async context (setup by Jupyter)
  • You are having IPython run .run_until_complete() inside an async context (your Databases-based program).

The difficulty is that ideally HTTPX and IPython should be able to detect if they're running in an async context, and in that case use await instead of run_until_complete() to run the coroutine. But that means that the enclosing function has to be async to handle all cases — and that often can't be the case. For example, httpx.get() is a plain synchronous call.

This is why I might be inclined towards failing loudly if we detect we're inside a running event loop, and hint our users to use AsyncClient to avoid any issues.

@tomchristie
Copy link
Member

This is why I might be inclined towards failing loudly if we detect we're inside a running event loop, and hint our users to use AsyncClient to avoid any issues.

Right, although it turns out there's a category of cases where you are inside a running event loop, but you don't actually want to know or care about that, and do want to be able to make bog-standard sync calls. (It's not ideal that you have to use httpx.AsyncClient inside the ipython repl, but you have to use httpx.Client inside the standard python repl.)

If we can just use the "current or new" event loop, and not treat it as an error then that's probably the right thing to do. (I'd assumed it'd be better to error in that case, on the assumption that it'd always be a programmer error anyways, but that's not actually the case.)

@florimondmanca
Copy link
Member

I do agree with you Tom — being able to handle both cases would be ideal. But it doesn’t seem possible to me. The only thing you can do with a running loop in a synchronous context is schedule a new task. But AFAICT from a sync function, you can’t tell a running loop “hey, run this coroutine for me and I’ll wait for the result”, which is exactly what we’d want to do in the backend’s run() method. In other words, we can’t await in a sync function.

(I guess it’s the kind of technical limitation that only gets tangible when we try to actually change the code and see that it doesn’t work.)

@sethmlarson
Copy link
Contributor

Just to float the idea, what sort of pain would it cause (besides more complex implementation for us) to make the SyncClient not use any async internally? Dispatchers would be split, middlewares would be split. What else?

@tomchristie
Copy link
Member

Careful, or I might start advocating that sync HTTP requests in Python are already well supported via requests, and that httpx should only target the async case. 😇

@pquentin
Copy link
Contributor

pquentin commented Nov 7, 2019

For what it's worth, this is one advantage of the code generation approach we use in the urllib3 fork: our sync code is really sync and only uses the standard library for the actual I/O. 99% of the code is shared, only the calls to the specific I/O libraries are different.

@pquentin
Copy link
Contributor

pquentin commented Nov 8, 2019

@tomchristie said that he liked that we learned from each other, so I'd like to explain my point above in more detail to explain why it's actually relevant! It's not actually the code generation approach that's important here, but the fact that we're using the standard library for I/O in the sync case, and not asyncio.

You can keep the async/await annotations and just rely on the fact that all calls will be secretly sync, and wrap the sync API with run_secretly_sync_async_fn (stolen from python-trio/hip#1 (comment)):

def run_secretly_sync_async_fn(async_fn, *args):
    coro = async_fn(*args)
    try:
        coro.send(None)
    except StopIteration as exc:
        return exc.value
    else:
        raise RuntimeError("you lied, this async function is not secretly synchronous")

The reason we're not doing this in our urllib3 work is because 1/ we want to support Python 2.7 and 2/ wrapping a large public API with run_secretly_sync_async_fn is actually cumbersome, and adds an unrelated function to the call stack.

But it avoids both code generation and an hidden asyncio loop!

@tomchristie
Copy link
Member

Ohh, that’s really neat!

@florimondmanca
Copy link
Member

florimondmanca commented Nov 9, 2019

@pquentin Super interesting stuff, thanks!

If we think about this in terms of our concurrency backend API, we could:

  1. Create a SyncBackend implemented via sync I/O, eg socket et. al.
  2. Use the trivial event loop as the backend’s run() implementation, ensuring that there are no async breakpoints in the backend’s implementation.
  3. Have Client use the SyncBackend in all cases (as opposed to asyncio today). That way, no need to get caught into issues that eg Allow using Client in an async environment #513 attempts to resolve, because those issues would be gone (eg we’d be sure that the sync API performs absolutely no interaction with an async framework like asyncio, preventing potential event loop clashes).

Am I getting this right? 😄 Quite excited about this actually!

@pquentin
Copy link
Contributor

pquentin commented Nov 9, 2019

Yes, I think that would work! Here is our sync backend: https://github.com/python-trio/urllib3/blob/bleach-spike/src/urllib3/_backends/sync_backend.py. It is complete enough to get the urllib3 test suite to pass

@florimondmanca florimondmanca added bug Something isn't working concurrency Issues related to concurrency and usage of async libraries labels Nov 16, 2019
@tomchristie
Copy link
Member

tomchristie commented Nov 27, 2019

No longer relevant, given #544

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working concurrency Issues related to concurrency and usage of async libraries
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants