Asynchronous Driver For Python <Tornado and Tulip> #2622

v3ss0n · 2014-06-28T22:35:07Z

Changefeeds in 1.13 are a very interesting feature. Real-time multimedia HTML5 projects are getting very popular these days, and I am building one right now too: a multimedia web chat. It needs a lot of server-side notifications.
However, RethinkDB still needs an async driver for Tornado, like MongoDB's Motor driver. That is the only deal-breaker keeping me from switching from Mongo.
I love ReQL and I really believe that RethinkDB is NoSQL done right.

Any pointers on modifying the current driver to work asynchronously would be very welcome; I would like to contribute. (I am still quite new to async and Tornado.)

AtnNn · 2014-06-28T22:49:04Z

@v3ss0n Thanks for opening an issue. With the current driver you can use threads to perform asynchronous queries. Something like this should work:

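import rethinkdb as r
from threading import Thread
from queue import Queue  # Python 3; on Python 2 use "from Queue import Queue"

stream = Queue()

def get_changes():
    conn = r.connect()
    # Blocks this background thread (not the caller) while waiting for changes
    for change in r.table(...).changes()...run(conn):
        stream.put(change)

Thread(target=get_changes).start()
print(stream.get())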

A Python driver with an async API would be better.
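The thread approach can also be tied into an event loop so that request handlers never block on the feed: one background thread per feed (not per request) drains the blocking iterator and hands each change to the loop with `call_soon_threadsafe`. This is a minimal sketch using Python 3's asyncio in place of Tornado; `blocking_feed` is a hypothetical stand-in for a changefeed cursor, and `None` is assumed to be safe to use as an end-of-feed marker.

```python
import asyncio
import threading
import time


def blocking_feed(n):
    """Hypothetical stand-in for a blocking changefeed iterator."""
    for i in range(n):
        time.sleep(0.01)  # simulate waiting on the network
        yield {"new_val": i}


def start_feed_thread(loop, queue, feed):
    """Consume a blocking iterator on one background thread and hand
    each item to the event loop in a thread-safe way."""
    def worker():
        for change in feed:
            loop.call_soon_threadsafe(queue.put_nowait, change)
        loop.call_soon_threadsafe(queue.put_nowait, None)  # end-of-feed marker

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


async def consume(n):
    loop = asyncio.get_running_loop()
    queue = asyncio.Queue()
    start_feed_thread(loop, queue, blocking_feed(n))
    received = []
    while True:
        change = await queue.get()  # suspends without blocking the loop
        if change is None:
            break
        received.append(change)
    return received


if __name__ == "__main__":
    print(asyncio.run(consume(3)))
```

This still costs one thread per open feed, so it is a stopgap rather than a substitute for a natively asynchronous driver.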

v3ss0n · 2014-06-29T00:12:17Z

Thanks a lot for the code example, AtnNn.
That would work, but wouldn't it waste everything Tornado tries to avoid?
It won't block Tornado's IOLoop, but it creates a new thread for every request that needs RethinkDB features.
That would undo the way Tornado addresses the C10K problem, which it solves well.

With Motor, the asynchronous MongoDB driver, the equivalent can be done without using threads:

import datetime
from tornado import gen
from tornado.ioloop import IOLoop

# `db` is assumed to be a Motor database object created elsewhere.
@gen.coroutine
def tail_example():
    loop = IOLoop.current()
    results = []
    collection = db.my_capped_collection
    cursor = collection.find(tailable=True, await_data=True)
    while True:
        if not cursor.alive:
            # While the collection is empty, a tailable cursor dies immediately
            yield gen.Task(loop.add_timeout, datetime.timedelta(seconds=1))  # give control back to Tornado's IOLoop
            cursor = collection.find(tailable=True, await_data=True)  # this would block with the plain PyMongo driver

        if (yield cursor.fetch_next):
            results.append(cursor.next_object())
            print(results)

Motor uses greenlets to address this problem without modifying a single line of MongoDB's PyMongo driver, which is a very interesting approach.
https://github.com/mongodb/motor/blob/master/motor/__init__.py (just over 2600 lines of code).

His approach is detailed in this video and these slides, and it should apply to any driver with blocking network I/O without much modification. That's what I am talking about :)

http://emptysqua.re/blog/video-slides-and-code-about-async-python-and-mongodb/
https://speakerdeck.com/mongodb/asynchronous-mongodb-with-python-and-tornado-a-jesse-jiryu-davis-python-evangelist

It would be a very good feature for RethinkDB to have this built in. I will also look into RethinkDB's Python driver when I get time.

coffeemug · 2014-07-02T18:33:32Z

@v3ss0n -- thanks for chiming in. I'm moving this to backlog for now because there are much more pressing issues to take care of first, but I agree this would be valuable. As a data point, at least 5 people (maybe more) have asked me for this.

I can't promise when we'll get to it, but I can say with almost complete certainty that there will be support for this eventually.

tonich-sh · 2014-09-23T07:23:10Z

I wrote a twisted connector for python driver. It may help... https://github.com/tonich-sh/rethinkdb-twisted.git

v3ss0n · 2014-09-23T18:54:12Z

Thanks, looking into it!

v3ss0n · 2014-10-26T04:45:08Z

I am now seriously thinking about writing an async Tornado driver for RethinkDB, since I like changefeeds much more than MongoDB's tailable cursors.
With changefeeds we can listen for updates (like vote up/down), which is quite nice. That isn't possible with MongoDB unless you hack into the oplog.
@tonich-sh, I saw that you rewrote the RethinkDB client in Twisted, but was that really necessary?

tonich-sh · 2014-10-26T05:58:43Z

The current client is too entangled with socket code, while Twisted's protocols are an abstraction. So I took the easiest route: I kept it simple and left only the asynchronous logic.

csytan · 2014-10-26T16:41:18Z

I tried two approaches about a month ago: threading and asyncio (Python 3). Unfortunately, I don't have the time to make anything more than a hack, but I think the asyncio approach is promising, seeing as it's in the standard library.

Maybe this will be helpful to someone:
https://github.com/csytan/rethinkdb/commits/next

v3ss0n · 2014-10-27T00:44:36Z

@tonich-sh, so I think the gevent approach (the way Motor did it with MongoDB) will be easy for the current client.

@csytan thanks a lot, Chris, I am looking into it.
EDIT: Very interesting approach; it can be switched to the Tornado IOLoop.

kzahel · 2015-01-27T03:08:04Z

Hi, I'm just evaluating RethinkDB at this point for use in a new project. Having support for an asynchronous Tornado driver would be very compelling for me. Out-of-the-box support for a Python API that mirrors the Node/JavaScript API (using generators/coroutines) would be really great.

danielmewes · 2015-01-27T19:13:31Z

We're going to consider different options (including Tornado and Python 3 asyncio) for making changefeeds work better in Python as part of #3298. This will happen very soon.
I'm not sure yet to what degree we are going to convert the rest of the driver (apart from changefeeds) to an asynchronous interface.

techdragon · 2015-02-15T10:10:44Z

Ideally the outcome would be to refactor the Python code to be event-loop agnostic, with the same "protocol" code reused by each event-loop driver so that their feature compatibility is ensured. I feel I have to point this out because, unfortunately, few libraries are written this way. I don't want to see a future where I'll be installing with pip install rethinkdb-twisted in one codebase and pip install rethinkdb-asyncio in another.

The Autobahn project has a great structure for this kind of thing, https://github.com/tavendo/AutobahnPython/

The Autobahn code is structured so that if people need to use the protocol with some other, unsupported event loop, they only have to rewrite the much smaller driver code for their event loop.
In RethinkDB's case there would be an additional "synchronous" driver.
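The core/driver split described here is often called a "sans-I/O" design: the protocol logic only turns bytes into messages, and each driver only moves bytes. A minimal sketch under stated assumptions: the 12-byte header (8-byte token plus 4-byte length, little-endian) with a JSON body is a hypothetical framing chosen for illustration, and `FrameParser`, `read_frames_sync` and `read_frames_async` are made-up names, not the RethinkDB driver's API.

```python
import asyncio
import json
import socket
import struct

# Hypothetical framing for illustration: 8-byte token and 4-byte body length
# (both little-endian), followed by a UTF-8 JSON body.
HEADER = struct.Struct("<QL")


class FrameParser:
    """I/O-free protocol core: feed it bytes, get back (token, message) pairs."""

    def __init__(self):
        self._buf = bytearray()

    def feed(self, data):
        self._buf.extend(data)
        frames = []
        while len(self._buf) >= HEADER.size:
            token, length = HEADER.unpack_from(self._buf)
            end = HEADER.size + length
            if len(self._buf) < end:
                break  # body incomplete; wait for more bytes
            body = bytes(self._buf[HEADER.size:end])
            del self._buf[:end]
            frames.append((token, json.loads(body.decode("utf-8"))))
        return frames


def encode_frame(token, message):
    body = json.dumps(message).encode("utf-8")
    return HEADER.pack(token, len(body)) + body


def read_frames_sync(sock, parser):
    """Blocking driver: only the recv() call is I/O-specific."""
    while True:
        data = sock.recv(4096)
        if not data:
            raise ConnectionError("connection closed")
        frames = parser.feed(data)
        if frames:
            return frames


async def read_frames_async(reader, parser):
    """asyncio driver: same parser, different I/O call."""
    while True:
        data = await reader.read(4096)
        if not data:
            raise ConnectionError("connection closed")
        frames = parser.feed(data)
        if frames:
            return frames


def demo_sync():
    """Round-trip one frame over a local socket pair with the blocking driver."""
    a, b = socket.socketpair()
    with a, b:
        a.sendall(encode_frame(7, {"ok": True}))
        return read_frames_sync(b, FrameParser())


async def demo_async():
    """Feed one frame into an asyncio StreamReader and parse it."""
    reader = asyncio.StreamReader()
    reader.feed_data(encode_frame(8, [1, 2]))
    return await read_frames_async(reader, FrameParser())
```

Adding a Twisted or Tornado driver would mean writing another small loop around that framework's read call; `FrameParser` stays untouched.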

csytan · 2015-02-15T16:15:18Z

I think the cleanest way to get the driver to run on both Python 2 & 3 would be to use Tornado. Tornado coroutines are very similar to Asyncio's and are also compatible with Asyncio's event loop.
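The compatibility works because both Tornado's `gen.coroutine` and early asyncio coroutines are generators that yield futures and are resumed with each future's result. The toy trampoline below (a hypothetical `run_generator_coroutine`, not the actual code of either library, which handles much more) shows that mechanism using `concurrent.futures.Future`:

```python
from concurrent.futures import Future


def run_generator_coroutine(gen_fn, *args):
    """Drive a generator that yields Futures; return a Future for its result.

    Toy illustration only: real schedulers also handle lists of futures,
    cancellation, event-loop integration, and much more.
    """
    result = Future()
    gen = gen_fn(*args)

    def step(value=None, exc=None):
        try:
            # Resume the generator with the previous result (or error).
            yielded = gen.throw(exc) if exc is not None else gen.send(value)
        except StopIteration as stop:
            # `return x` inside the generator arrives here as stop.value.
            result.set_result(stop.value)
            return
        except Exception as err:
            result.set_exception(err)
            return

        def on_done(fut):
            err = fut.exception()
            if err is not None:
                step(exc=err)
            else:
                step(value=fut.result())

        # Suspend until the yielded future resolves, then resume.
        yielded.add_done_callback(on_done)

    step()
    return result


def double_when_ready(fut):
    """Example coroutine: wait for `fut`, then return twice its value."""
    value = yield fut
    return value * 2
```

On Python 2 a generator cannot `return` a value, which is why Tornado coroutines there used `raise gen.Return(value)` instead.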

danielmewes · 2015-02-16T19:25:20Z

@techdragon We'd like to avoid duplication as well. Will see what we can do.
@csytan That's interesting, thanks for pointing it out.

Moving this to 2.1. We'll try to get Tornado integration done soon.

deontologician · 2015-02-16T19:31:20Z

Just a note from my personal experience: Tornado is the Node equivalent right now in the Python world. Twisted has been around longer, but isn't as widely used in the web-app community. In terms of which async loop to support first, Tornado would be a solid choice.

danielmewes · 2015-02-16T19:45:18Z

@deontologician Ok that's good to know. I have no personal preference with respect to whether to support Twisted or Tornado first. My understanding from what @larkost said is that Twisted tends to be slightly easier to install due to more widely available packages.

If Tornado is easier to support than Twisted, it sounds like we should support Tornado first. Otherwise we should just pick one and follow up with the other soon (maybe we can even do both for 2.0).
@gchpaco any opinions?

deontologician · 2015-02-16T19:48:55Z

I can't comment on which one is easier to implement, just on who is most likely to make use of it.

gchpaco · 2015-02-16T19:50:30Z

It has again been a while since I did bleeding edge Python, but if @csytan is right then it would be much less work to support Tornado-and-asyncio-etc than it would be to support only Twisted.

@techdragon That's interesting! I'm going to have to look at it in more detail. Certainly a possibility.

danielmewes · 2015-02-16T19:57:32Z

Ok let's do Tornado first then.

techdragon · 2015-02-17T04:49:28Z

@gchpaco & @danielmewes - If you refactor the logic to separate the drivers from the core logic they rely on, then, at least in theory, each driver should be simpler to build. Tornado is the current favourite among a lot of developers, and supporting it would be great, but when you refactor the logic you need to keep in mind that the existing synchronous driver still has to be maintained.

Twisted has a LOT of documentation regarding its use.

I would try to refactor the current code to be broken apart the way Autobahn's is, supporting Twisted and regular synchronous Python first, then adding Tornado and asyncio in whichever order works best.
I don't think the drivers will be the hard part of getting this done. I would expect that after the logic refactoring, the "Synchronous", "Twisted", "Asyncio", and "Tornado" drivers will all be fairly straightforward.

v3ss0n · 2015-02-17T15:14:18Z

Thanks a lot for moving this forward. I haven't checked in for a while because I've been busy with an async chat implementation on Tornado and Mongo.

@danielmewes, what @csytan suggested is true: Tornado's coroutines are compatible with Python 3's asyncio. They give you clean code using yield without needing to worry about callbacks, and that's the reason web devs choose Tornado over Twisted.
Tornado has a Twisted bridge. I think it's bidirectional, so if you support Tornado, you will support Twisted too.

One stone, a flock of birds. Nice? :)

coffeemug · 2015-03-14T05:29:12Z

@gchpaco -- a couple of questions:

Could you post some minimal sample code here of how a user would use the driver with the async API?
Would the old synchronous code still work without modifications?
Since we have time before the release, could you start working on adding other Python async backends (starting with most common) when the code review goes through?

coffeemug · 2015-03-14T05:33:21Z

Specifically, with sample code it would be great to mirror @mlucy's mini-tutorial here #3678 (comment) in Python, so we understand how to use the API in different circumstances, how error handling works, etc.

gchpaco · 2015-03-17T04:22:24Z

Right now I'm trying to get it through review, but I'll try to do that. As a temporary stopgap, see test/rql_test/connections/tornado_connection.py which is an analog of connection.py.

v3ss0n · 2015-03-17T19:09:55Z

https://github.com/rethinkdb/rethinkdb/blob/68551064ddfb61a77735875adcc070cf77798eb4/test/rql_test/connections/tornado_connection.py

This one, right?

gchpaco · 2015-03-18T00:38:44Z

Yup. Note that @Tryneus is working on refactoring some of it, so there may be some large changes when that goes through.

Tryneus · 2015-03-19T08:01:22Z

The refactor @gchpaco mentioned is up in review 2732, which is based off his branch and eclipses the previous review. It is available in the branch grey_issue_2622.

Tryneus · 2015-03-21T06:01:02Z

Ok, the python driver refactor has been approved and merged to next in commits 73d83e8 (@gchpaco's changes), 154efa8 (driver refactor), and 43cd9d7 (test updates following the refactor), and v2.0.x in commits a126ed8, 7e51398, and a3d41d0. Will be in release 2.0.

v3ss0n · 2015-03-22T14:12:49Z

Thank you so much. I like the way the RethinkDB team handles projects, with the whole official workflow integrated with GitHub; that's so cool.
I learn a lot from you and will put it into practice at my company.
I am pulling now and going to test. If everything is OK, I will make a RethinkDB + Tornado example chat demo.

danielmewes · 2015-03-24T18:16:49Z

Out of curiosity @gchpaco:
How does this work exactly (example from rethinkdb/docs#683)?

    # Print every row in the table.
    for future in (yield r.table('test').order_by(index='id').run(connection)):
        item = yield future
        print(item)

More specifically: how do we know in advance whether the cursor will have another result in the for loop? Does yield future always wait for the next batch to be loaded unless the cursor is a changefeed?

Tryneus · 2015-03-24T18:33:24Z

@danielmewes, I just tested this and it isn't the expected behavior (this is my fault due to some of the cursor changes in the refactor). yield future will raise a StopIteration exception once it is past the end of the cursor. Unfortunately, if the user doesn't catch this, it interacts poorly with Tornado, which assumes the StopIteration means that the coroutine has finished.

Opened #3974 for this issue.
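The hazard can be reproduced with plain generators, since Tornado's yield-based coroutines are generators underneath. A minimal sketch with no Tornado involved: on Python 3.6 and earlier, a `StopIteration` escaping a generator body simply ended it as if it had returned normally (the silent "coroutine finished" effect); since Python 3.7 (PEP 479) it is converted into `RuntimeError`. `CursorEmpty` is a hypothetical stand-in for a dedicated driver exception along the lines of the `RqlCursorEmpty` proposal.

```python
class CursorEmpty(Exception):
    """Hypothetical driver-specific 'no more rows' error."""


def leaky(rows):
    """Generator-based 'coroutine' body that calls next() without a guard."""
    it = iter(rows)
    while True:
        yield next(it)  # past the end, StopIteration escapes the generator body


def guarded(rows):
    """Same loop, but the end of the cursor surfaces as a dedicated exception."""
    it = iter(rows)
    while True:
        try:
            row = next(it)
        except StopIteration:
            raise CursorEmpty() from None
        yield row


def drain(gen):
    """Run a generator to completion and report how it ended."""
    seen = []
    try:
        for item in gen:
            seen.append(item)
    except RuntimeError as err:
        return ("RuntimeError", seen, str(err))
    except CursorEmpty:
        return ("CursorEmpty", seen, None)
    return ("finished", seen, None)
```

On Python 3.7+, `drain(leaky([1, 2]))` reports a `RuntimeError`, while `drain(guarded([1, 2]))` reports `CursorEmpty`, which the caller can catch explicitly without being confused with a normal return.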

Tryneus · 2015-03-24T18:34:20Z

So based on my suggestion in #3974, that loop would throw RqlCursorEmpty after the end of the cursor.

danielmewes · 2015-03-24T18:42:27Z

Having to handle the RqlCursorEmpty exception for every loop over a cursor sounds a little annoying.

Should we add a fetch_next like in Motor to guarantee that the next call to next doesn't throw?

cursor = yield r.table('test').order_by(index='id').run(connection)
while (yield cursor.fetch_next):
    item = yield cursor.next()
    print(item)
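The `fetch_next` contract can be sketched with asyncio (a Tornado coroutine would use `yield` where this uses `await`): `fetch_next()` loads batches until a row is buffered or the results are exhausted, so a following `next()` never runs past the end. `AsyncCursor` and the `load_batch` callable are hypothetical; an empty batch is assumed to mean there are no more results, and `next()` is synchronous here because the row is already buffered, whereas the proposal above yields on it.

```python
import asyncio
from collections import deque


class AsyncCursor:
    """Toy cursor with a Motor-style fetch_next()/next() contract.

    `load_batch` is a hypothetical coroutine function returning the next
    list of rows; an empty list is taken to mean the results are exhausted.
    """

    def __init__(self, load_batch):
        self._load_batch = load_batch
        self._buffer = deque()
        self._exhausted = False

    async def fetch_next(self):
        """Return True iff a row is buffered, loading batches as needed."""
        while not self._buffer and not self._exhausted:
            batch = await self._load_batch()
            if batch:
                self._buffer.extend(batch)
            else:
                self._exhausted = True
        return bool(self._buffer)

    def next(self):
        """Pop a buffered row; only valid after fetch_next() returned True."""
        if not self._buffer:
            raise LookupError("call fetch_next() first")
        return self._buffer.popleft()


def canned_batches(batches):
    """Hypothetical loader serving pre-made batches, one per await."""
    pending = deque(batches)

    async def load():
        await asyncio.sleep(0)  # yield to the event loop like real I/O would
        return pending.popleft() if pending else []

    return load


async def read_all(cursor):
    rows = []
    while await cursor.fetch_next():
        rows.append(cursor.next())
    return rows
```

For example, `asyncio.run(read_all(AsyncCursor(canned_batches([[1, 2], [3]]))))` collects all three rows across two batches without the loop ever handling an end-of-cursor exception.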

Tryneus · 2015-03-24T18:56:21Z

@danielmewes, I think that would be useful, I'll do it alongside #3974.

danielmewes · 2015-03-24T19:00:32Z

👍

ajdavis · 2015-03-24T19:25:05Z

Note that in Motor you can create the cursor without a yield, since it hasn't begun I/O yet; it has only stored the query in a MotorCursor object. Only on the first yield cursor.fetch_next do we send the query to the server and get the first batch:

http://motor.readthedocs.org/en/stable/api/motor_cursor.html#motor.MotorCursor.fetch_next

Also, I chose the method name next_object instead of next to avoid interacting with Python's standard iterator protocol in any unexpected way, since a MotorCursor cannot act like a standard iterator:

https://docs.python.org/2/library/stdtypes.html#iterator-types

danielmewes · 2015-03-24T23:16:28Z

Thanks @ajdavis .
For RethinkDB we need to yield on creating the cursor because we don't know whether a given query actually returns one before getting the first response from the server (some queries return a single value, which we handle differently).

Good point about the incompatibility with the iterator interface.
We could certainly use a special next function for Tornado cursors (not sure what we would call it). I can't tell whether it's important enough to warrant the separate function though.
I wonder how exactly a script fails if it tries to use a Tornado cursor as an iterator. I'll try that later.

Tryneus · 2015-03-25T00:27:35Z

@danielmewes, it would be fine to iterate over a TornadoCursor, as long as you didn't do it eagerly (you would end up with an infinite loop that would probably be OOM-killed), and as far as I can tell, we can't safely do a comprehension like arr = [(yield x) for x in cursor if (yield cursor.fetch_next())]. Taking this into account, we should probably remove iteration from the TornadoCursor implementation.
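Python 3.5 (PEP 492, which postdates this discussion) later addressed exactly this with asynchronous iterators: `__aiter__`/`__anext__` and `StopAsyncIteration` keep async cursors out of the regular iterator protocol, in the same spirit as Motor's separate `next_object`. A minimal sketch; `AsyncIterCursor` is hypothetical, and the async comprehension requires Python 3.6+. Because the class defines no `__iter__`, a plain `for row in cursor` raises `TypeError` immediately instead of looping forever.

```python
import asyncio


class AsyncIterCursor:
    """Toy cursor exposing only asynchronous iteration (PEP 492)."""

    def __init__(self, rows):
        self._rows = list(rows)  # stands in for rows fetched over the network

    def __aiter__(self):
        return self

    async def __anext__(self):
        await asyncio.sleep(0)  # a real cursor would await the next batch here
        if not self._rows:
            raise StopAsyncIteration
        return self._rows.pop(0)


async def collect(cursor):
    # Async comprehension: the shape that can't be written safely with yield.
    return [row async for row in cursor]
```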

v3ss0n · 2015-04-02T07:00:24Z

No problem now, this code works well.

con = yield conn
curs = yield evt.run(con)
messages = []
while (yield curs.fetch_next()):
    item = yield curs.next()
    messages.append(item)

But i have a few other questions.

Right now insert performance is not good with async: it takes 200-400 ms locally for small documents of < 2 KB each (chat messages).

Take a look at this code; I may be doing something wrong:

import logging
import tornado.escape
import tornado.ioloop
import tornado.web
import os.path
import uuid
import rethinkdb as r
from tornado.concurrent import Future
from tornado import gen
from tornado.options import define, options, parse_command_line
r.set_loop_type("tornado")
define("port", default=8080, help="run on the given port", type=int)
define("debug", default=False, help="run in debug mode")
conn = r.connect("localhost")
evt = r.db("rechat").table("events")
# Making this a non-singleton is left as an exercise for the reader.


class MainHandler(tornado.web.RequestHandler):

    @gen.coroutine
    def get(self):
        con = yield conn
        curs = yield evt.run(con)
        messages = []
        while (yield curs.fetch_next()):
            item = yield curs.next()
            messages.append(item)

        self.render("index.html", messages=messages)


class MessageNewHandler(tornado.web.RequestHandler):

    @gen.coroutine
    def post(self):
        con = yield conn
        message = {
            "body": self.get_argument("body")
        }
        messages = (yield evt.insert(message).run(con))
        message['id'] = messages['generated_keys'][0]
        # to_basestring is necessary for Python 3's json encoder,
        # which doesn't accept byte strings.
        message["html"] = tornado.escape.to_basestring(
            self.render_string("message.html", message=message))
        if self.get_argument("next", None):
            self.redirect(self.get_argument("next"))
        else:
            self.write(message)
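To see where the 200-400 ms goes, it can help to time each awaited step (getting the connection, the insert, rendering) separately. A small, hypothetical helper using asyncio; with the Tornado driver the same idea applies around each `yield`, and the `asyncio.sleep` calls below merely stand in for real work.

```python
import asyncio
import time


async def timed(label, awaitable, log):
    """Await `awaitable`, appending (label, elapsed_ms) to `log`."""
    start = time.perf_counter()
    result = await awaitable
    log.append((label, (time.perf_counter() - start) * 1000.0))
    return result


async def demo():
    log = []
    # Hypothetical steps; in the handler above these would be the connection
    # future, the insert query, and the template rendering.
    await timed("step A", asyncio.sleep(0.05), log)
    await timed("step B", asyncio.sleep(0.01), log)
    return log
```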

danielmewes · 2015-04-02T17:56:11Z

@v3ss0n I can't see anything obviously wrong about your insert code.

Which revision of the RethinkDB repository are you on ($ git status)? I wonder if you might be affected by #3998. It's closed in the latest revision which you should get through git pull.

v3ss0n · 2015-04-02T18:29:50Z

i am at 01fa526

v3ss0n · 2015-04-02T18:31:25Z

Which branch should I pull, next or v2.0.x?

danielmewes · 2015-04-02T18:38:46Z

@v3ss0n Both branches should be ok. The commit you mentioned actually includes the fix for that issue. So something else must be going on. I opened #4007 for further investigation into this.

shivekkhurana · 2017-06-25T07:43:05Z

After reviewing a lot of amazing work by David Beazley (https://github.com/dabeaz), and with the improvements in async Python (mainly the introduction of the async/await keywords and the inclusion of asyncio in the stdlib), we can do this very easily.

I have created a gist here with a working example:
https://gist.github.com/shivekkhurana/1de00e1e54c36d250a7f19905fe133b9

It gives a simple structure for listening to multiple changefeeds asynchronously (tested on Python 3.6).
Hope it helps.
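The gist's code isn't reproduced here, but the general async/await pattern for several feeds looks like this: each feed is an async iterator consumed by its own task, and `asyncio.gather` runs them concurrently on one event loop. `fake_feed` is a hypothetical stand-in for a real changefeed (async generators need Python 3.6+).

```python
import asyncio


async def fake_feed(name, values, delay):
    """Hypothetical stand-in for an async changefeed yielding changes over time."""
    for value in values:
        await asyncio.sleep(delay)
        yield {"feed": name, "new_val": value}


async def listen(feed, sink):
    # One consumer per feed; awaiting here never blocks the other feeds.
    async for change in feed:
        sink.append(change)


async def listen_to_all(feeds):
    sink = []
    await asyncio.gather(*(listen(feed, sink) for feed in feeds))
    return sink
```

Because everything runs on a single thread, the consumers can share `sink` without locking.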

coffeemug added this to the backlog milestone Jul 2, 2014

coffeemug added the tp:feature label Jul 2, 2014

AtnNn mentioned this issue Nov 7, 2014

Reading from multiple changefeeds in Python #3298

Closed

larkost mentioned this issue Jan 8, 2015

Add non-blocking option to cursors for changefeeds #3529

Closed

danielmewes modified the milestones: 2.1, backlog Feb 16, 2015

danielmewes mentioned this issue Feb 16, 2015

Twisted port of the Python driver's connection class #3785

Closed

danielmewes modified the milestones: 2.0, 2.1 Feb 16, 2015

Tryneus mentioned this issue Mar 19, 2015

Python driver next(wait=1) should throw something based on RqlError #3937

Closed

danielmewes mentioned this issue Mar 19, 2015

Document Python Tornado driver rethinkdb/docs#683

Closed

gchpaco mentioned this issue Mar 20, 2015

Test hang on Python connection tests on OS X #3954

Closed

Tryneus closed this as completed Mar 21, 2015

ajdavis mentioned this issue Apr 2, 2015

(sorry wrong repo) TypeError: 'TornadoCursor' object is not iterable tornadoweb/tornado#1403

Closed

danielmewes mentioned this issue Apr 2, 2015

200-400ms insert latency with Tornado driver #4007

Closed

mglukhovsky unassigned gchpaco Apr 2, 2015
