Python exceptions raised in a wait_callback do not always propagate #410
I haven't been able to put together a minimal example of this, so unfortunately I have only been able to get the problem to manifest within a project I'm working on. I will keep trying to produce an example, but in the meantime the best I can offer is some context.
The project is using SQLAlchemy on top of psycopg2, using psycogreen to provide this gevent wait_callback:
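The callback snippet itself was lost in extraction. For context, a wait callback of this kind follows psycopg2's async polling protocol; the loop below is an illustrative sketch of that shape using a stub connection (`FakeConnection` and the inlined `POLL_*` constants are stand-ins, not psycopg2's real objects):

```python
# Illustrative sketch of psycopg2's wait-callback polling protocol.
# POLL_OK / POLL_READ / POLL_WRITE mirror the psycopg2.extensions
# constants; FakeConnection is a hypothetical stand-in, not a real
# psycopg2 connection.
POLL_OK, POLL_READ, POLL_WRITE = 0, 1, 2

class FakeConnection:
    """Pretends to need one read wait before the result is ready."""
    def __init__(self):
        self._states = [POLL_READ, POLL_OK]

    def poll(self):
        return self._states.pop(0)

def wait_callback(conn, timeout=None):
    # psycogreen's gevent callback has this overall shape, except that
    # for POLL_READ / POLL_WRITE it blocks in gevent's wait_read /
    # wait_write; killing the waiting greenlet raises right there, and
    # that exception is expected to propagate out of the callback.
    while True:
        state = conn.poll()
        if state == POLL_OK:
            break
        elif state in (POLL_READ, POLL_WRITE):
            pass  # a real callback would block on the connection's socket here
        else:
            raise RuntimeError("bad poll state: %r" % state)

wait_callback(FakeConnection())  # drains one simulated read wait, then returns
```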
The problem I'm seeing is that an exception that is raised within this callback is sometimes not being propagated to the Python code that is making calls to psycopg2 via SQLAlchemy. Instead, I get an
To be specific, I am killing greenlets, which raises a
I've toyed around in the C layer of psycopg2, and I can see that
This seems to me like a bug along some specific code path, but I haven't been able to trace exactly where it goes astray and the exception gets lost.
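The contract at issue can be shown without psycopg2 at all: whatever exception the wait callback raises should surface unchanged at the call site that triggered the wait. This toy driver (all names hypothetical) demonstrates the propagation the C layer is supposed to preserve:

```python
class GreenletKilled(BaseException):
    """Hypothetical stand-in for the exception raised in a killed greenlet."""

def killing_wait_callback(conn, timeout=None):
    # Simulates the greenlet being killed while blocked in the callback.
    raise GreenletKilled()

def execute(conn, query, wait_callback):
    # Toy version of the driver's execute path: the real C layer invokes
    # the registered wait callback until the result is ready, and any
    # exception the callback raises must reach this function's caller.
    wait_callback(conn)
    return "result"

try:
    execute(None, "SELECT 1", killing_wait_callback)
    outcome = "swallowed"
except GreenletKilled:
    outcome = "propagated"

assert outcome == "propagated"
```

The bug being reported is that, along some code path, the equivalent of `outcome` comes out as "swallowed" instead.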
Can you please run your tests with the current
If you manage to put together some testing rig to reproduce the issue let us know.
@dvarrazzo Right, I was working on that, and I came up with a rather silly script that does reproduce the issue, though it depends a bit on random chance.
Install the following packages:
Run this code:
```python
from gevent import monkey; monkey.patch_all()
from psycogreen.gevent import patch_psycopg; patch_psycopg()
from gevent import spawn
from gevent.pool import Pool
import psycopg2
from sqlalchemy import create_engine
import time

engine = create_engine('postgresql://localhost')

def sleepy():
    try:
        conn = engine.raw_connection()
        cur = conn.cursor()
        cur.execute('SELECT pg_sleep(0.1)')
        cur.close()
        conn.close()
    except:
        import traceback; traceback.print_exc()

def killy(greenlets):
    for greenlet in greenlets:
        time.sleep(0.001)
        greenlet.kill()

def main():
    pool = Pool()
    for _ in range(100):
        greenlets = [pool.apply_async(sleepy) for _ in range(100)]
        spawn(killy, greenlets)
        pool.join()

if __name__ == '__main__':
    main()
```
Out of 10000 queries this results in around 80
@underyx I've tried running your script, then I've tried running it with 10000 instead of 100 in the outer loop. Twice. So it ran >2M queries, and I didn't see a single unknown :(
Maybe a source of difference is the libpq version? I've tried mine with 9.6.1:
For completeness, I ran my test on Python 2.7 on 64-bit Ubuntu 16.04. The tests ran on 2.7.1 rather than on master, to check whether this is a case of #539, but even on 2.7.1 I don't seem able to reproduce it.
That's fascinating. The three environments where I've seen this error are:
I've published this repo with a Dockerfile and a docker-compose.yml you can use to launch a database and run the script in a way that triggers the failure. I got 500
PS: I'm sure you didn't make this mistake, but just in case: if
PPS: I timed my script execution and it came out to 20 seconds for 100 loops. That'd be 33 minutes for 10000 loops, which is a lot more than what it took for you. Since the timing is important here (you need to raise
A Docker setup is exactly what I would have asked you for: thank you very much for providing it. I appreciate the effort and will try to look into this again.
Ok, that got me thinking about the differences between network and unix-socket connections... I tested more carefully, and my run of the script was simply failing to connect because of a missing password :\
I'll look a bit more closely and see if I can understand what causes them.
Yesterday we got 66 of these errors over the span of around an hour. We just ran for 40 minutes with psycopg2 at 4b4d279 and we got
So, I can confirm that this issue is fixed! Thanks again