Fails to detect broken connections in some cases #263
Comments
I can reproduce the issue, e.g.:
However the libpq doesn't consider the connection broken either:
When we find an error, e.g. in |
A C test confirms that the libpq informative functions report the connection in good state after the fd is closed. Let's see what we can do about that...
On 22/09/14 06:32, Daniele Varrazzo wrote:
Hah, this bug keeps getting deeper :-)
Thanks for looking into this!
Martín Ferrari (Tincho)
After Tom Lane's message I ran a quick round of tests with kill -9 of the backend. In that case we seem to be mostly doing the right thing. Only once did I trigger a case where the connection was dead but reported open, and I can't reproduce it anymore; also, on poll we sometimes raise an exception with an empty message. I'll look into both.
However, I don't think psycopg should behave much differently from what it does now: when we receive an error we ask the libpq whether the connection is still ok, and if it says no we close it (usually with closed = 2). We cannot be more aggressive than that, or we risk closing connections that are still ok.
Can I please ask you to repeat your tests? Please check the
Also note that the connection is reported broken only after an attempt to read: in psycopg this happens both on
Thank you.
Hi Daniele, On 23/09/14 11:19, Daniele Varrazzo wrote:
I expected that, I had also made other tests where psycopg did the right
I have added extra logging to my code to see what is the status of these I have created a minimal test case, which breaks the connection in two I have read Tom Lane's message, and I understand his point about this
In the test case I tried execute, poll right after failed execute, and
As you can see, the behaviour is pretty different. If I close the In the second case, although I don't know the semantics of the values, The code for the tests is this:
Thanks!!
Martín Ferrari (Tincho)
Thank you for the extensive testing: I'll try to reproduce and fix the case of
for the other ones I think you'll have to talk with PG developers: they could be handling some SSL error value inappropriately.
Well, the fact that the error you got was "SSL SYSCALL error: Bad file descriptor" would tell you that the connection is not OK. But I assume psycopg2 doesn't want to get into the business of heuristics based on exceptions, which is fine! We already do it, and based on this thread it seems we really can't rely on connection.closed very much for our purposes, so I'll keep adding more strings as users report them.
On 24/09/14 14:32, mike bayer wrote:
It seems that you would need to do more than check closed, at least
Martín Ferrari (Tincho)
get_transaction_status() always goes to 4 after a network operation that we have initiated: if closed is not a good indicator we might be able to fix it. We have to identify the code path where this happens. If you can give me a hand, e.g. by running the test that produces the inconsistent state you found with debug enabled, we may get a better picture. Otherwise I'll eventually take a look at it, but it may take me a few days.
About parsing the error messages: you are right, we never take any decision based on them. They are free to change and are subject to L10N too. But the libpq has the exact errno of the problem, so it may be failing to handle some of them properly, and improving that would be beneficial.
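The two indicators mentioned in this thread, closed and get_transaction_status(), can be read together after a failed operation. The following is only a sketch, assuming a psycopg2-style connection object; the helper name looks_broken is hypothetical, and the constant is defined locally to mirror psycopg2.extensions.TRANSACTION_STATUS_UNKNOWN (value 4). Both values are only refreshed by a network operation, so this cannot spot a dead socket on an idle connection.

```python
# Mirrors psycopg2.extensions.TRANSACTION_STATUS_UNKNOWN (defined locally
# so the sketch does not need the driver to be importable).
TRANSACTION_STATUS_UNKNOWN = 4


def looks_broken(conn):
    """Return True if a psycopg2-style connection reports itself unusable.

    Hypothetical helper: it only reads state the driver already holds, so
    it is meaningful only right after a failed execute() or poll().
    """
    if conn.closed:  # 0 means open; non-zero means closed or broken
        return True
    return conn.get_transaction_status() == TRANSACTION_STATUS_UNKNOWN
```

An application could call this in its exception handler to decide whether to discard the connection instead of matching on error message strings.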
I would be happy to help, but I am not sure I understand what you need me to do. The error we get in the production app is not reproducible yet (I still don't know the cause), so I can only add some debugging when I detect it. Is there anything else I can do there?
What is the resolution for this issue?
That psycopg won't do anything active, such as a poll, to check whether the connection is broken. If your application finds a connection in a broken or inconsistent state, it is your responsibility to dispose of it and open a new one. If you use the connection pool, it should do that for you (but on putconn, not on getconn), and if it doesn't, it is our bug to fix.
I have recently improved some quirks in the internal handling, so some of the issues here could have been solved, but more along the lines of returning a better error message, not automatic reconnection or anything too fancy.
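The "dispose of it and open a new one" advice could look roughly like the sketch below. It is not part of psycopg: run_with_reconnect, connect and work are hypothetical names, and the connection is only assumed to expose a closed attribute as psycopg2 connections do. As the earlier comments show, closed is not set in every failure mode, so a real application may need additional checks.

```python
def run_with_reconnect(connect, conn, work, errors=(Exception,)):
    """Run work(conn); if it fails and conn is closed, retry on a new one.

    connect: zero-argument callable returning a new connection (assumption).
    errors:  exception classes treated as driver errors; with psycopg2 this
             would typically be (psycopg2.OperationalError,
             psycopg2.InterfaceError).
    Returns (result, conn), where conn may be a fresh connection.
    """
    try:
        return work(conn), conn
    except errors:
        if not conn.closed:
            raise  # still reported usable: leave the decision to the caller
        # The driver marked the connection closed: discard it, retry once.
        conn = connect()
        return work(conn), conn
```

Retrying automatically is only safe when work is idempotent or ran inside a transaction that was never committed; otherwise it is better to reconnect and report the failure to the caller.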
Fixed with #443
We decided to add pre_ping=True. Without it we were running into bug: psycopg/psycopg2#263.
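The idea behind a pre-ping is to run a trivial query before handing a pooled connection to the application, so that a dead connection is found at checkout rather than in the middle of real work. The sketch below only illustrates that idea and is not the implementation of any library's option; checkout and connect are hypothetical names, and the connection is assumed to follow the DB-API (cursor(), rollback(), close()).

```python
def checkout(conn, connect, errors=(Exception,)):
    """Return a usable connection, pinging conn before handing it out.

    connect: zero-argument callable returning a new connection (assumption).
    errors:  exception classes that indicate a failed ping.
    """
    try:
        cur = conn.cursor()
        try:
            cur.execute("SELECT 1")  # forces a round trip to the server
        finally:
            cur.close()
        # Outside autocommit mode the ping opens a transaction; end it.
        conn.rollback()
        return conn
    except errors:
        try:
            conn.close()  # best effort: the socket may already be gone
        except errors:
            pass
        return connect()
```

The cost is one extra round trip per checkout, which is the trade-off for not relying on the connection's own idea of its state.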
Hi,
I have been trying to chase down a bug I am experiencing: sometimes the connection to the database goes away, and neither psycopg2 nor sqlalchemy detects the condition, so the connection is not removed from the connection pool.
It seems to be the same issue as #196, which is marked as solved, but I can reproduce this behaviour with psycopg 2.5.3, and with problems other than the server restarting. In my case it seems that connections are timing out (the error is "SSL SYSCALL error: EOF detected"), but manually closing the file descriptor triggers the same issue:
It is worth noting that this does not seem to be an SSL-related issue, as I can reproduce the same problem with a UNIX domain socket: