Pool fails to reconnect when using Unix-domain socket (OperationalError: server closed the connection unexpectedly) #145
Comments
Oh yeah again: you can't see it in my little repro script, but this bug puts Pool in a bad state. E.g., in a long-running webapp, no database queries work after the disconnect. Everything fails immediately with that OperationalError, and momoko never recovers. The only workaround appears to be restarting the app. |
Can you please rerun your example with debug enabled - |
|
Another check please - can you change momoko's code to log the value of It should be |
So, to clarify, this is a psycopg2 bug? What exactly is the bug? Reporting errors differently over TCP socket and Unix-domain socket? |
Yes, exactly. The exception says that "server closed the connection...", but the connection object states that it's not closed. And this happens only sometimes and only over Unix-domain sockets. I remember back in the early days of Momoko, the I'm closing the issue since there is nothing really I can do on the subject other than to advise you to use TCP. Feel free to disagree/reopen. |
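The mismatch described in this comment (the exception reports that the server closed the connection, while the connection object still claims to be open) can be checked with a small probe. This is a minimal sketch, not code from momoko or psycopg2: `probe_connection` and `is_inconsistent` are hypothetical helper names, the error class is passed in so that the helper works with any DB-API driver, and the socket directory in the usage comment is an assumption about an Ubuntu-style install.

```python
def probe_connection(conn, operational_error):
    """Run a trivial query and report what the connection says about itself.

    Returns (error_raised, closed_flag). `conn` is any DB-API style
    connection exposing a `closed` attribute, as psycopg2 connections do.
    """
    error_raised = False
    try:
        cur = conn.cursor()
        cur.execute("SELECT 1")
        cur.fetchone()
    except operational_error:
        error_raised = True
    # psycopg2 documents `closed` as 0 for an open connection, nonzero otherwise.
    return error_raised, conn.closed


def is_inconsistent(error_raised, closed_flag):
    """True when a query failed but the connection still reports itself open."""
    return error_raised and closed_flag == 0


# Usage against a real server (needs psycopg2 and a running PostgreSQL):
#   import psycopg2
#   conn = psycopg2.connect("dbname=momoko host=localhost")            # TCP
#   conn = psycopg2.connect("dbname=momoko host=/var/run/postgresql")  # Unix socket dir (assumed path)
#   ... stop the server, then:
#   print(is_inconsistent(*probe_connection(conn, psycopg2.OperationalError)))
```

If `is_inconsistent` returns True after the server has been stopped, the driver raised an error for the query without marking the connection as closed, which is the state the thread is trying to pin down.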
I can't reproduce a problem with psycopg2. However, its connection object's Anyways, here is my psycopg2 test script:
I'm using the exact same setup as described before: specifically, psycopg2 is the ubuntu package of 2.6.1. Output using TCP socket:
Output using Unix-domain socket:
|
Here is the
The scenario you are reproducing with pure Can you update your latter test to kill the server the same way you did before and check again? If I'm right, your code should raise even before entering the try/except clause. |
Oops! My bad. I tried it... same result. Here is my revised psycopg2 test script:
Output using TCP socket:
Output using Unix-domain socket:
|
There must be some difference between the way the momoko example works and stand-alone psycopg2. Because in the momoko output there is a "Method failed synchronously" message, meaning that So far the difference I see is that the synthetic example reuses the same cursor object through a context manager, while Momoko creates a new cursor object for every Can you please change your sample to obtain a new cursor object before each |
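The two cursor-handling patterns contrasted in this comment can be sketched side by side. This is an illustration only, not the test script from the thread or momoko's actual code: both function names are hypothetical, and `conn` is assumed to be a DB-API connection whose cursors support the context-manager protocol, as psycopg2 cursors do.

```python
def run_with_shared_cursor(conn, statements):
    """Execute every statement through one cursor, as in the stand-alone test."""
    with conn.cursor() as cur:
        for sql in statements:
            cur.execute(sql)


def run_with_fresh_cursors(conn, statements):
    """Obtain a new cursor before each execute, as the comment describes for Momoko."""
    for sql in statements:
        cur = conn.cursor()
        try:
            cur.execute(sql)
        finally:
            # Close the cursor even when execute() raises.
            cur.close()
```

Running the same statements through both functions, with the server stopped in between, is one way to test whether cursor reuse affects how the disconnect is reported.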
Sure. I just tried it, and it made no difference: the psycopg2 connection object's I'm going to take a different approach: start with my momoko example, copy the necessary bits of momoko into my example script, and then rip out as much code as I can while still seeing the incorrect |
BTW, I can't reopen this bug. Would appreciate if you could reopen until I can clearly point to a problem with psycopg2. |
For the record, here is a repro script that uses momoko but focuses on the
Run it once on a Unix-domain socket:
Run it again, and get a different result:
|
Yeah, so there is definitely some fuzziness in psycopg2. I'm sure you'll track it down. Sorry for not providing more assistance - pretty swamped with other things recently. |
OK I have a very clear reproduction script in psycopg2, and I found a fix. No PR for psycopg2 yet, but the bug is definitely there: psycopg/psycopg2#443. This bug is definitely not momoko's fault. @haizaar, thanks for your help in debugging this! |
Sure! Kudos on the thorough research. |
Extra confirmation: I just retried my two repro scripts that appeared to show a problem with momoko (lostconn.py and badclosed.py). I can still reproduce the problem using latest released psycopg2, but with my patch (psycopg/psycopg2#443), momoko behaves perfectly. |
If I use momoko.Pool to connect over TCP, it handles database disconnects and restarts perfectly. But if I connect over a Unix-domain socket, it does not. The first request after a disconnect crashes with "OperationalError: server closed the connection unexpectedly".
Here's an example script:
I'm testing with:
I've set things up with a momoko database that I can log in to without authentication, either via TCP or Unix-domain socket:
First, let's run the test script in "happy" mode, demonstrating that momoko successfully reconnects when using a TCP socket:
Good!
Now try the same thing via Unix-domain socket:
I've seen the same behaviour with v2.2.1, v2.2.2, v2.2.3, and current git master.
Of possible interest: the failure isn't absolutely reliable. I did a couple of trials of 100 runs each, and I get the above failure ~70% of the time, but success ~30% of the time. Tried it with both pool size=1 and size=5: similar result.
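Because the failure is intermittent, a repeated-trial harness makes the rate easier to compare across configurations. The following is a rough sketch under stated assumptions: `tally_trials` is a hypothetical helper, and `trial` stands for any callable that performs one connect, disconnect, and query cycle and raises on failure. It is not the script used for the numbers above.

```python
def tally_trials(trial, runs=100):
    """Call `trial()` `runs` times and count how many calls raise."""
    failures = 0
    for _ in range(runs):
        try:
            trial()
        except Exception:
            # Any exception counts as one failed run.
            failures += 1
    return {"runs": runs, "failures": failures, "failure_rate": failures / runs}
```

Calling it once per pool size, or once per socket type, gives comparable failure counts for each setup.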