Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve asyncio integration #1041

Closed
wants to merge 15 commits into from
Closed

Improve asyncio integration #1041

wants to merge 15 commits into from

Conversation

arvidfm
Copy link

@arvidfm arvidfm commented May 2, 2014

These changes make it easier to use asyncio features from Tornado applications. The additions include:

  • The wrap_asyncio_future and wrap_tornado_future functions for wrapping an asyncio.Future in a tornado.concurrent.Future and the other way around
  • A @task decorator that makes it possible to use asyncio coroutines with for example tornado.web.RequestHandler.
  • An __iter__ method in tornado.concurrent.Future that makes it possible to use yield from with Tornado futures from asyncio coroutines. The future will be automatically wrapped in an asyncio.Future.
  • A get_asyncio_loop method in BaseAsyncIOLoop to explicitly expose the asyncio event loop used by the Tornado IOLoop. This is especially useful with AsyncIOLoop since it creates a new event loop object otherwise not accessible outside the IOLoop object.

The most intrusive change here is probably the addition of the __iter__ method in tornado.concurrent.Future. The reasoning here is that you wouldn't normally try to iterate over futures and that the yield from syntax is sufficiently associated with asyncio that it makes sense to reserve it for use with asyncio.

Caveat: Since asyncio futures can be cancelled, wrap_tornado_future tries to account for this by running set_exception with a CancelledError when the asyncio.Future that wraps the Tornado Future is cancelled. This will however probably not work as expected seeing as the standard Tornado Future doesn't support being cancelled, so I would recommend avoiding trying to cancel any futures returned by wrap_tornado_future.

Example:

import asyncio
import subprocess

import tornado.httpclient
import tornado.web
import tornado.platform.asyncio

class RequestHandler(tornado.web.RequestHandler):
    lock = asyncio.Lock()

    @tornado.platform.asyncio.task
    def get(self):
        self.write("Hello")
        yield from self.some_function()
        res = yield from tornado.httpclient.AsyncHTTPClient().fetch("http://www.tornadoweb.org")
        print("Got response:", res)
        self.write(" World!")

    @asyncio.coroutine
    def some_function(self):
        yield from RequestHandler.lock # only allow one user every two seconds
        yield from asyncio.sleep(2)
        proc = yield from asyncio.create_subprocess_exec("ls", "/", stdout=subprocess.PIPE)
        self.write((yield from proc.communicate())[0])
        RequestHandler.lock.release()


tornado.platform.asyncio.AsyncIOMainLoop().install()
application = tornado.web.Application([
    (r"/?", RequestHandler)
])
application.listen(8080)

loop = asyncio.get_event_loop()
loop.run_forever()

@arvidfm
Copy link
Author

arvidfm commented May 2, 2014

It seems that sphinx-build for Python 2 fails when trying to parse tornado.platform.asyncio.py; I was testing with sphinx-build for Python 3 myself. It seems to me that the best solution would be to switch to building the documentation using the Python 3 version of sphinx-build; alternatively, the last commit refactoring the documentation can be reverted and the newly added docstrings moved to asyncio.rst.

@facebook-github-bot
Copy link

Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please sign up at https://code.facebook.com/cla - and if you have received this in error or have any questions, please drop us a line at cla@fb.com. Thanks!

@bdarnell
Copy link
Member

bdarnell commented May 3, 2014

This change needs tests.

We have limited control over the sphinx environment on readthedocs.org (they do have a python 3 option in beta now, but it's project-wide so it might be tricky dealing with the older doc versions, and in any case such an intrusive change should be made on its own instead of incidental to some other change), so for now the docs for asyncio-related functionality should just go in the rst file instead of using autodoc.

_copy_future_state looks like it could be replaced with tornado.concurrent.chain_future.

I had been thinking that we'd support asyncio Futures directly in tornado.gen by adding asyncio.Future to tornado.concurrent.FUTURES. Do you see any reason that wouldn't work?

I'm unsure about tornado.platform.asyncio.task as a decorator. It feels like it's conflating decisions about the inner workings of a function and how it's used (i.e. it might be better to just use @asyncio.coroutine as the decorator even if it requires all callers to use a wrapping function). The name task is also odd, since it doesn't seem to be similar to either asyncio.Task or tornado.gen.Task. As a decorator it is most similar to the two @coroutine decorators, but I'm not sure how to name it to capture what it's doing.

Finally, while I'd be fine with adding a __iter__ to tornado.concurrent.Future that does something similar to what asyncio.Future.__iter__ does (essentially yield self), it feels wrong for that to also convert from a Tornado Future to an asyncio Future.

@arvidfm
Copy link
Author

arvidfm commented May 3, 2014

Yes, I'd completely missed the existence of tornado.concurrent.chain_future, thanks. It doesn't seem as though it will result in any problems with the cancel check either, since it returns if the Future is already done.

My initial approach was to have the tornado.concurrent module automatically use asyncio.Future if it was available, but the fact that asyncio.Future requires an asyncio event loop object, and crucially uses the event loop's call_soon to schedule callbacks, results in all sorts of questions about what to do when running on Python 3.4+ but without using asyncio. So I just scrapped the plan completely and just wrapped the futures both ways. But you're right, I don't see any reason why Tornado couldn't handle asyncio futures created by the user or asyncio itself even if it sticks to using the default tornado.concurrent.Future for internally-created futures. Any tornado.concurrent.Future objects still need to be wrapped for use with asyncio though, as asyncio explicitly checks that the yielded object is an asyncio.Future.

The @task name comes from a decorator that was previously present in tulip (asyncio), and it works identically. Note that if asyncio.coroutine is called with a function that's already a coroutine, the function object will be returned as is.

The point of the __iter__ function is much like the @task decorator to make it more seamless to use asyncio with Tornado. In my opinion it doesn't presently make sense to use yield from with Future objects unless you're using asyncio, and having the Future automatically wrap itself in an asyncio.Future means that there's no need to differentiate between Tornado and asyncio futures in asyncio coroutines.

@bdarnell
Copy link
Member

bdarnell commented May 9, 2014

A decorator named tulip.task used to exist, but it was deliberately removed, so that's not a great comparison. Your proposed task decorator is not similar to either of the two Task classes that currently exist.

For __iter__, it makes just as much sense for tornado Futures to implement __iter__ as something akin to yield self as it did for asyncio to do that. A tornado app that targets Python 3.3+ may want to use yield from with no decorators for performance reasons, and __iter__ on Futures is a convenient way to make a function that works with both yield and yield from. So we shouldn't just say that "yield from" switches into asyncio mode.

@arvidfm
Copy link
Author

arvidfm commented May 9, 2014

That's true, I mostly just wanted to explain the motivation behind it. I don't know about similarity, but there's certainly a connection to asyncio.Task in that the whole point of the decorator is to make it so that when an asyncio coroutine is called an asyncio.Task is returned instead of a generator object. I'd say that @task is similar to @gen.coroutine in that @gen.coroutine constructs a Runner, which is similar in function to asyncio.Task, and returns a Future much like @task (though in the case of @task the Future returned is the asyncio.Task object itself), but dissimilar to @asyncio.coroutine which doesn't really do anything except set an _is_coroutine attribute to True on the function object. One option could be to rename it @coroutine and rely on the namespace to resolve the ambiguity; another option might be to call it @asyncio_coroutine to remove the ambiguity completely. Other than that I don't really have any ideas for new names that might fit.

The thing about allowing the yield from paradigm for use with standard Tornado is that I see limited use for it as there would be no straightforward way to call undecorated generator functions outside of coroutines. Right now it's possible to call any function decorated with @gen.coroutine and have it return a Future, but Tornado lacks a function corresponding to asyncio.async for taking a generator object and constructing a Runner from it. Currently all the logic for constructing a Runner takes place inside _make_coroutine_wrapper which expects a generator function rather than a generator object, and it's similarly hard to create a Runner manually since it requires you to have already processed the first yielded value. The best you could do is basically something like gen.coroutine(generator_function)(arg1, arg2, ...).

Of course, this can be easily solved by just moving the logic on line 200-221 in gen.py to a separate function returning a Future, and if that's done then yes, you're right in that the yield from paradigm can be a useful way to avoid having to construct a new Runner for every call to other coroutines. Though a problem with implementing __iter__ in tornado.concurrent.Future for use with yield from is that a return statement in a generator function is a syntax error prior to Python 3.3, and given that concurrent.py needs to be parsable by earlier Python versions, simply running return self.result() is a no-go. A possible solution would be to manually raise a StopIteration, i.e. raise StopIteration(self.result()).

However, I still feel that it's important to enable use of yield from with futures even from asyncio coroutines, to avoid having to manually wrap every single Future object returned by Tornado methods. I can think of a few ways to go about doing this:

  • Assume that the user intends to use asyncio coroutines if they import tornado.platform.asyncio and modify the behaviour of Future.__iter__ from there -- super bad idea; importing modules shouldn't have that kind of side effect, and either way this is just naive guesswork that's certain to cause problems.
  • Add a method to tornado.concurrent.Future that the user can call to tell Future to wrap itself when running __iter__, perhaps by supplying the function that should be used to perform the wrapping -- this feels overly odd and specific for a Future method.
  • Make tornado.concurrent.Future inherit asyncio.Future whenever it's available to trick asyncio into thinking that it's an asyncio.Future, but override all its methods -- this is kind of a hack, and it's easy to miss some detail that might cause problems somewhere (for example, some code in asyncio might depend on the existence of a specific attribute). It's also not future-proof, since new methods may later be added to asyncio.Future.
  • Make tornado.concurrent.Future configurable, making it possible to explicitly replace the default Future implementation with either asyncio.Future or one that inherits from asyncio.Future -- I'm leaning towards this option.

A problem with the fourth option is that tornado.util.Configurable doesn't support duck typing, instead requiring classes to derive from the base class. In addition, Tornado relies on the existence of exc_info and set_exc_info for any internally-created Future objects. This could be solved using multiple inheritance and subclassing asyncio.Future, providing the missing methods:

# concurrent.py
class Future(tornado.util.Configurable):
  @classmethod
  def configurable_base(cls):
    return Future

  @classmethod
  def configurable_default(cls):
    return _Future

class _Future(Future):
  # what was previously tornado.concurrent.Future goes here
  ...

# asyncio.py
class AsyncIOFuture(tornado.concurrent.Future, asyncio.Future):
  def __init__(self, *, loop=None):
    asyncio.Future.__init__(self, loop=loop)

  def exc_info(self):
    try:
      exc = self.exception()
      return type(exc), exc, exc.__traceback__
    except (AttributeError, asyncio.InvalidStateError):
      return None

  def set_exc_info(self, exc_info):
    self.set_exception(exc_info[1]) # exception object stores traceback

# main.py
tornado.concurrent.Future.configure('tornado.platform.asyncio.AsyncIOFuture')

What do you think? Sorry this got overly long, I've been thinking about how best to approach this issue for a few days now.

@arvidfm
Copy link
Author

arvidfm commented May 9, 2014

To expand a bit on the connection between @task and asyncio.Task, this:

@tornado.platform.asyncio.task
def func():
  yield from asyncio.sleep(2)
  print("Finished running")

t = func()

Is equivalent to:

@asyncio.coroutine
def func():
  yield from asyncio.sleep(2)
  print("Finished running")

t = asyncio.Task(func())

So I do think there is a similarity; we're basically moving the asyncio.Task() call to a decorator. Perhaps it would be better to make it more explicit, so that you would have to write:

@tornado.platform.asyncio.task
@asyncio.coroutine
def func():
  yield from asyncio.sleep(2)
  print("Finished running")

t = func()

?

@bdarnell
Copy link
Member

Regarding @task and asyncio.Task: I understand the connection; my point is that the decorator form of @asyncio.task was removed because a decorator was considered to be the wrong design for this problem. Wrapping a coroutine in a task is a runtime decision of the caller, not an import-time decision of the coroutine itself.

Regarding the use of Tornado Futures from asyncio coroutines: I think this is a problem to be solved by introducing some sort of extensibility on the asyncio side. Even within the standard library, asyncio coroutines cannot use concurrent.futures.Futures without a wrapper. It's more scalable for each coroutine library to consume the others' yieldable objects than for each yieldable object to implement the union of the various interfaces that might be expected of it (also consider twisted's Deferred, which is not Future-shaped at all but would ideally be supported here).

@methane
Copy link
Contributor

methane commented Aug 23, 2014

@BeholdMyGlory I think this is better:

if not asyncio.tasks.iscoroutinefunction(func):
    func = asyncio.coroutine(func)

But I don't know about it should be removed or not.

@methane
Copy link
Contributor

methane commented Aug 23, 2014

How about removing Future.__iter__ and @task for now and merge this?
Using asyncio's library from Tornado is more important than allowing yield from tornado's future.

Conflicts:
	tornado/platform/asyncio.py
@arvidfm
Copy link
Author

arvidfm commented Aug 23, 2014

Sorry, I've kind of been neglecting this pull request, though I've been meaning to get back to it. I got stuck on a problem when trying to get Tornado's coroutines to accept asyncio Futures. In theory it should be easy enough, just add asyncio.Future to the FUTURES tuple in tornado.concurrent, but it seems that either the schedulers or the IO loops themselves have subtly different behaviour in Tornado and asyncio, leading to delicate timing issues that I couldn't manage to debug properly.

The following code exhibits the problematic behaviour in question:

import asyncio
import subprocess

import tornado.httpclient
import tornado.web
import tornado.platform.asyncio
import tornado.gen

class RequestHandler(tornado.web.RequestHandler):
    @tornado.gen.coroutine
    def get(self):
        proc = yield from asyncio.create_subprocess_exec("ls", "/", stdout=subprocess.PIPE)
        self.write((yield from proc.communicate())[0])


tornado.platform.asyncio.AsyncIOMainLoop().install()
application = tornado.web.Application([
    (r"/?", RequestHandler)
])
application.listen(8080)

loop = asyncio.get_event_loop()
loop.run_forever()

The expected behaviour is that when / is fetched on the server, the output of running ls on the root directory is outputted. However, about half of the time, the server returns error code 500 with the following traceback:

Traceback (most recent call last):
  File "/home/arvid/playground/tornado/tornado/web.py", line 1338, in _execute
    result = yield result
  File "/home/arvid/playground/tornado/tornado/gen.py", line 620, in run
    value = future.result()
  File "/home/arvid/playground/tornado/tornado/concurrent.py", line 184, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 3, in raise_exc_info
  File "/home/arvid/playground/tornado/tornado/gen.py", line 625, in run
    yielded = self.gen.send(value)
  File "ioloop_test.py", line 23, in get
    proc = yield from asyncio.create_subprocess_exec("ls", "/", stdout=subprocess.PIPE)
  File "/usr/lib/python3.4/asyncio/subprocess.py", line 195, in create_subprocess_exec
    return Process(transport, protocol, loop)
  File "/usr/lib/python3.4/asyncio/subprocess.py", line 91, in __init__
    self.pid = transport.get_pid()
  File "/usr/lib/python3.4/asyncio/base_subprocess.py", line 48, in get_pid
    return self._proc.pid
AttributeError: 'NoneType' object has no attribute 'pid'

This is running on a Linux 3.16 machine with Python 3.4.1 and with the latest changes from the Tornado master branch at the time of writing merged.

From what I can recall from when I was debugging the issue a while back, the problem was that the asyncio.base_subprocess.BaseSubprocessTransport._call_connection_lost method, which sets the variable holding the internal subprocess.Popen object to None, is for some reason sometimes scheduled and run before the Process object, which in its constructor runs transport.get_pid(), is created in create_subprocess_exec.

If create_subprocess_exec is handled by asyncio's own scheduler, i.e. asyncio.Task, the error never occurs, e.g.:

import asyncio
import functools
import subprocess

import tornado.httpclient
import tornado.web
import tornado.platform.asyncio
import tornado.gen

def task(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return asyncio.Task(func(*args, **kwargs))
    return wrapper

class RequestHandler(tornado.web.RequestHandler):
    @task
    @asyncio.coroutine
    def get(self):
        proc = yield from asyncio.create_subprocess_exec("ls", "/", stdout=subprocess.PIPE)
        self.write((yield from proc.communicate())[0])

tornado.platform.asyncio.AsyncIOMainLoop().install()
application = tornado.web.Application([
    (r"/?", RequestHandler)
])
application.listen(8080)

loop = asyncio.get_event_loop()
loop.run_forever()

I don't expect that this problem surfaces in very many places, but I still don't think that this code should be merged until it's fixed. I don't know if the problem is in Tornado or if you could pass the blame onto Python itself.

I've made some additional changes since I last commented here, and I'll try to push them either today or tomorrow for further discussion.

@arvidfm
Copy link
Author

arvidfm commented Aug 26, 2014

All right, I've pushed my changes (note that most of the commits are a few months old and GitHub has thus placed them a bit earlier in the comment thread). The changes I've made are roughly:

  • Remove @task decorator in favour of letting @gen.coroutine handle asyncio.Future object directly.
  • Remove wrap_asyncio_future which is now unneeded.
  • Don't wrap the Future object in tornado.concurrent.Future.__iter__, just have it yield itself. (The raise StopIteration is for Python 2 compatibility, as Python 2 doesn't support returning values from generators using return.)
  • Add tornado.gen.YieldPoint.__iter__ which works like tornado.concurrent.Future.__iter__.
  • Make tornado.concurrent.Future configurable. I know you proposed that it would be better if asyncio added some sort of extensibility functionality, Ben, but I wanted to at least show what an asyncio- and Tornado-compatible Future could look like.
  • Add option for running tests with platform.asyncio.AsyncIOFuture as the default Future type.

I'm not sure how feasible it would be for asyncio to add support for third-party Future implementations, or how it might look (Some kind of abstract Future base class? A register_future_type method that takes class objects?), nor what the best channel would be for proposing such an addition (The tulip issue tracker? A PEP?).

Note that the issue mentioned in my previous comment is still present.

The code can be tested by running python3 -m tornado.test.runtests --ioloop=tornado.platform.asyncio.AsyncIOMainLoop --future=tornado.platform.asyncio.AsyncIOFuture. A lot of error messages will be printed to the console, but these are harmless as they are caused by the manner in which Tornado's tests detect exceptions, making the Future objects think that the error was never retrieved, thus causing them to log the error when destroyed.

An important difference between Tornado's and asyncio's Future implementations—the fact that asyncio.Future runs callbacks using the event loop's call_soon method, which means that errors will not propagate if the loop is not running—causes 11 tests to fail, however:

  • tornado.test.gen_test.GenEngineTest.test_raise_after_stop: The event loop is stopped before an exception is even raised, so no callbacks will be run.
  • tornado.test.ioloop_test.TestIOLoop.test_exception_logging_future: The event loop's stop method is added to the list of callbacks before the exception is raised and will thus be called before the Future object's callbacks are run.
  • All 9 tests in tornado.test.tcpclient_test.ConnectorTest: After setting the result of a Future object in resolve_connect, control is never handed back to the event loop, meaning that _Connector.on_connect_done never gets a chance to run, so the result is not propagated to _Connection.future, which is the future returned by ConnectorTest.start_connect, in time.

So, now that I've gone over that, I guess the question is where to go from here. I wouldn't mind reverting the changes made to the Future class(es) for the time being, basically keeping the change to the FUTURES tuple and maybe the wrap_tornado_future method (though I do hope to see more seamless integration between Tornado and asyncio in the future, including the possibility of directly yielding futures returned by Tornado in asyncio coroutines), but the problem outlined in my previous comment needs to be fixed before the changes are merged either way.

Also worth noting regarding the yield from idiom with Tornado's own types is that it may lead to unexpected results in some cases, particularly when yielding lists. Normally when yielding a list it is automatically wrapped in a tornado.gen.Multi, but if you use yield from with a list, the contents will be yielded sequentially instead which is pretty much the opposite of what you want.

@bdarnell
Copy link
Member

I don't think we should make Future subclass Configurable - it's too performance-critical to use the Configurable machinery. I think the right model here is a registry of adapters instead of a common base type (in python3.4 this could use functools.singledispatch; for older pythons you could either use the backported package or maybe something simpler). This would let you adapt future-like classes that do not resemble the Future interface, like Twisted's Deferreds.

@arvidfm
Copy link
Author

arvidfm commented Aug 27, 2014

I'm not sure I quite follow, can you give an example of what something like that might look like?

@methane
Copy link
Contributor

methane commented Aug 27, 2014

Tornado is network programming framework and application framework.
asyncio is also network programming framework, but not an application framework.

So I want to use libraries based on asyncio (or Trollius) from Tornado applications.
But I don't concern about using libraries based on Tornado from asyncio applications.

As my understanding, there are no reason for inheriting from asyncio.Future to support yielding it. are there?

@arvidfm
Copy link
Author

arvidfm commented Aug 27, 2014

If you only care about using asyncio in Tornado applications and not the other way around then yes, you won't have to concern yourself with the AsyncIOFuture which inherits from asyncio.Future. Personally though I see a use case for using Tornado in asyncio applications, for example to make use of Tornado's WebSocket client.

@methane
Copy link
Contributor

methane commented Aug 27, 2014

I feel using asyncio.Future in Tornado (including tornado.websocket) on Python 3.3+ is simpler.

try:
    from asyncio import Future
except ImportError:
    from tornado.concurrent import Future

@bdarnell How do you think?

@arvidfm
Copy link
Author

arvidfm commented Aug 27, 2014

asyncio.Future cannot be used with Tornado out of the box, since it doesn't support exc_info, and doesn't wrap the callbacks using stack_context.wrap which Tornado (or at Tornado's test suite) expects.

@arvidfm
Copy link
Author

arvidfm commented Aug 27, 2014

Additionally, since asyncio.Future depends on the asyncio event loop, Tornado can't default to it as it would break any applications that do not install tornado.platform.asyncio.AsyncIOMainLoop.

@bdarnell
Copy link
Member

The singledispatch idea is that instead of the current handle_yield code that converts lists, dicts, and YieldPoints into Futures, you'd have an extensible converter list:

@functools.singledispatch
def make_future(obj):
    raise BadYieldError()

@make_future.register(list)
@make_future.register(dict)
def _(obj):
    return Multi(obj)

# in platform/asyncio.py
@make_future.register(asyncio.Future)
def _(obj):
    f = tornado.concurrent.Future()
    tornado.concurrent.chain_future(obj, f)
    return f

# in platform/twisted.py
@make_future.register(twisted.Deferred)
def _(obj):
    f = tornado.concurrent.Future()
    # register callbacks on obj to transfer result to f
    return f

Even though asyncio.Future and tornado Futures are very similar, they may not be close enough that we can treat them as interchangeable. The safest way to handle this is with an explicit adapter, similar to what we'd need in the twisted case.

@bdarnell
Copy link
Member

To wrap this up for Tornado 4.1: I've just merged some changes that implement the singledispatch pattern so you can yield asyncio Futures (and twisted Deferreds) in a Tornado coroutine (05c3073...841b2d4). I've also documented the AsyncIOLoop.asyncio_loop attribute for public consumption (in lieu of adding a get_event_loop method) and added bidirectional conversion functions between the two kinds of Futures.

I have not added an __iter__ method that converts Tornado Futures to asyncio Futures because we may want to embrace the yield from pattern for performance even in a Tornado-only world (tracked in #1305). I think that consumption of Tornado Futures in asyncio coroutines is best addressed on the asyncio side (probably via a similar singledispatch mechanism), and in the meantime an explicit conversion is probably the best option.

@bdarnell bdarnell closed this Jan 19, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants