Not able to retry connection after reconnect_failed callback #438
I think I see what you mean. The pool only reconnects if the connection can be re-established and when a new connection is requested, because that's the moment in which we call `_maybe_grow_pool()`:

```
In [1]: import logging
In [2]: logging.basicConfig(level=logging.INFO)
In [3]: logging.getLogger("psycopg.pool").setLevel(logging.INFO)
In [4]: from psycopg_pool import ConnectionPool
In [11]: p = ConnectionPool("port=54314", reconnect_timeout=5, open=True)
INFO:psycopg.pool:adding new connection to the pool
INFO:psycopg.pool:adding new connection to the pool
INFO:psycopg.pool:adding new connection to the pool
INFO:psycopg.pool:adding new connection to the pool
# Stop the server
In [12]: with p.connection() as conn:
...: conn.execute("select 1")
INFO:psycopg.pool:connection requested from 'pool-3'
INFO:psycopg.pool:connection given by 'pool-3'
INFO:psycopg.pool:returning connection to 'pool-3'
WARNING:psycopg.pool:discarding closed connection: <psycopg.Connection [BAD] at 0x7fcd55416a00>
WARNING:psycopg.pool:error connecting in 'pool-3': connection is bad: No such file or directory
Is the server running locally and accepting connections on that socket?
...
OperationalError: consuming input failed: terminating connection due to administrator command
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
# Check will pretty much empty the pool
In [13]: p.check()
WARNING:psycopg.pool:discarding broken connection: <psycopg.Connection [BAD] at 0x7fcd67f8ef10>
WARNING:psycopg.pool:error connecting in 'pool-3': connection is bad: No such file or directory
Is the server running locally and accepting connections on that socket?
...
WARNING:psycopg.pool:reconnection attempt in pool 'pool-3' failed after 5 sec
# Restart the server
In [14]: with p.connection() as conn:
...: print(conn.execute("select 1").fetchone()[0])
...:
INFO:psycopg.pool:connection requested from 'pool-3'
INFO:psycopg.pool:growing pool 'pool-3' to 1
WARNING:psycopg.pool:error connecting in 'pool-3': connection is bad: No such file or directory
Is the server running locally and accepting connections on that socket?
# This will block for 30s before failing.
# Restarting the server in the meantime will not save this connection attempt.
...
PoolTimeout: couldn't get a connection after 30.0 sec
# Following connection attempt will succeed immediately, after reconnection.
# Furthermore, the grow attempt triggered by `connection()` will replenish the pool.
In [15]: with p.connection() as conn:
...: print(conn.execute("select 1").fetchone()[0])
...:
INFO:psycopg.pool:connection requested from 'pool-3'
INFO:psycopg.pool:growing pool 'pool-3' to 1
INFO:psycopg.pool:adding new connection to the pool
INFO:psycopg.pool:growing pool 'pool-3' to 2
INFO:psycopg.pool:connection given by 'pool-3'
1
INFO:psycopg.pool:adding new connection to the pool
INFO:psycopg.pool:growing pool 'pool-3' to 3
INFO:psycopg.pool:returning connection to 'pool-3'
INFO:psycopg.pool:adding new connection to the pool
In [16]: INFO:psycopg.pool:growing pool 'pool-3' to 4
INFO:psycopg.pool:adding new connection to the pool
```

We could definitely call `_maybe_grow_pool()` in `check()` too.

The latter connection failure, by the way, only happens because in this test I have set a reconnect timeout (5s) shorter than the getconn timeout (30s). In a different experiment, with a reconnect timeout of 1 minute:

```
In [6]: p = ConnectionPool("port=54314", reconnect_timeout=60, open=True)
INFO:psycopg.pool:adding new connection to the pool
INFO:psycopg.pool:adding new connection to the pool
INFO:psycopg.pool:adding new connection to the pool
INFO:psycopg.pool:adding new connection to the pool
# Stop the server
In [8]: p.check()
WARNING:psycopg.pool:discarding broken connection: <psycopg.Connection [BAD] at 0x7fd16c4a7310>
WARNING:psycopg.pool:error connecting in 'pool-1': connection is bad: No such file or directory
Is the server running locally and accepting connections on that socket?
...
WARNING:psycopg.pool:reconnection attempt in pool 'pool-1' failed after 60 sec
In [9]: with p.connection() as conn:
...: print(conn.execute("select 1").fetchone()[0])
...:
INFO:psycopg.pool:connection requested from 'pool-1'
INFO:psycopg.pool:growing pool 'pool-1' to 1
WARNING:psycopg.pool:error connecting in 'pool-1': connection is bad: No such file or directory
Is the server running locally and accepting connections on that socket?
WARNING:psycopg.pool:error connecting in 'pool-1': connection is bad: No such file or directory
Is the server running locally and accepting connections on that socket?
# This will wait for 30 seconds. In the meantime, restart the server:
INFO:psycopg.pool:adding new connection to the pool
INFO:psycopg.pool:growing pool 'pool-1' to 2
INFO:psycopg.pool:connection given by 'pool-1'
1
INFO:psycopg.pool:returning connection to 'pool-1'
INFO:psycopg.pool:adding new connection to the pool
INFO:psycopg.pool:growing pool 'pool-1' to 3
INFO:psycopg.pool:adding new connection to the pool
INFO:psycopg.pool:growing pool 'pool-1' to 4
INFO:psycopg.pool:adding new connection to the pool
# The getconn worked, and the pool is replenished.
```

I believe this is a sane behaviour, which is provided by the current default (5 minutes for the reconnect timeout). Do you think the pool should do anything different, apart from triggering a maybe-grow on check?
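For illustration, the maybe-grow-on-check behavior can be approximated today with a small subclass; this is only a sketch that leans on the private `_maybe_grow_pool()`, not a supported API:

```python
from psycopg_pool import ConnectionPool

class GrowingCheckPool(ConnectionPool):
    """Sketch: after check() discards broken connections, ask the pool to
    schedule replacements instead of leaving it empty."""

    def check(self) -> None:
        super().check()
        self._maybe_grow_pool()  # private: schedules AddConnection jobs as needed
```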
I think that's like 90% of the problem fixed! This feels like the more correct behavior of `check()`.

The only other part of this is more related to avoiding a loop when there's no hope of recovery:

```python
import logging

import psycopg_pool

logger = logging.getLogger(__name__)

# get_new_creds() is application code: it fetches fresh credentials
# (e.g. a new RDS IAM auth token) and returns a conninfo string.

def make_pool():  # assumed enclosing factory: the nonlocal/return below imply one
    connection_config = get_new_creds()
    should_retry = True

    def _update_connection_config(self: psycopg_pool.ConnectionPool):
        # we've failed to reconnect
        # and we might have been here before for this pool instance
        nonlocal should_retry
        if not should_retry:
            return
        # but if we haven't, try getting new creds
        connection_config = get_new_creds()
        # update conninfo with new creds
        self.conninfo = connection_config
        # don't retry recursively while growing the pool
        # because if the new creds are bad, we'll end up back here
        should_retry = False
        # forcefully try to reconnect after updating creds
        # we won't need this after your change
        # pylint: disable=W0212
        self._maybe_grow_pool()
        # check each connection and throw out bad ones
        self.check()
        # if we have valid connections now, it's okay to retry in the future
        pool_size = self.get_stats()["pool_size"]
        logger.debug("pool size is %s after trying new creds", pool_size)
        ###################################
        # this has always been true
        # I think the AddConnection jobs are race-y
        ###################################
        if pool_size > 0:
            should_retry = True
            logger.debug("will retry in the future")
            return

    return psycopg_pool.ConnectionPool(
        connection_config,
        reconnect_failed=_update_connection_config,
        min_size=1,
        max_size=1,
        reconnect_timeout=1,
    )
```

If I'm swapping in different `conninfo`, I've been trying to use `_maybe_grow_pool()` to force the pool to reconnect.
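For completeness, a usage sketch of the factory above (assuming the `make_pool()` wrapper and an application-side `get_new_creds()` as noted in the snippet):

```python
pool = make_pool()

# If the server rejects the stale credentials, reconnect_failed fires,
# conninfo is refreshed, and a later attempt can succeed.
with pool.connection() as conn:
    print(conn.execute("select 1").fetchone()[0])
```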
Uhm... this use case is not supported, and not tested, but I see it as somewhat valid... I'll take a look at whether it's possible to support it.
As you are testing in this area, maybe you would like to propose an MR to add a maybe-grow in check? If not, I will work on these points, but it will have to wait a few days/weeks.
The way I've been testing this, btw, is to start a while loop that queries and then sleeps. While it's sleeping after a successful query, in a psql session I run `alter role tester with password 'tester2'; select pg_terminate_backend(pid) from pg_stat_activity where datname='database_name' and application_name<>'psql';` to toggle the password to something else and drop the existing pool connections. This forces the `reconnect_failed` callback to fire, and `get_new_creds` returns the new password.
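Roughly, that test loop looks like this (a sketch; `make_pool()` is the hypothetical factory from the snippet above):

```python
import time

pool = make_pool()

while True:
    try:
        with pool.connection() as conn:
            print(conn.execute("select now()").fetchone()[0])
    except Exception as exc:
        # expected while the old password is still cached in conninfo
        print("query failed:", exc)
    # during this sleep, run the ALTER ROLE / pg_terminate_backend statements in psql
    time.sleep(10)
```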
I'll look into getting a PR out for the `_maybe_grow` check. I may have time over the weekend, but I'm just starting to familiarize myself with this codebase, so it could be longer. Thanks for all of your help and all of your great work in general on this project!!!
This test would be difficult to implement in a portable way in the test suite. What I would test is to use a different `conninfo`.
Also note that a valid way to switch the connection target could be to use a connection env var. For instance, if you want to switch host, you can leave the host unspecified in the connection string and change the `PGHOST` env var.
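A sketch of that env-var approach (the host names are made up; libpq falls back to `PGHOST` for a host left out of the conninfo string, so later connection attempts pick up the new value):

```python
import os

from psycopg_pool import ConnectionPool

os.environ["PGHOST"] = "primary.example.com"

def switch_host(pool: ConnectionPool) -> None:
    # reconnect_failed callback: point future connection attempts at another host
    os.environ["PGHOST"] = "standby.example.com"

# no host= in the conninfo, so each new connection reads PGHOST
pool = ConnectionPool("dbname=app user=app", reconnect_failed=switch_host, open=True)
```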
Makes sense. I think the code is fine as is. I just meant "race-y" in the sense that `AddConnection` running in a scheduled task means this value is likely unpredictable from the `reconnect_failed` callback.
There are circumstances where connections fail that can be resolved by updating `conninfo`, e.g. timed out temporary credentials (AWS RDS IAM Auth, etc.) which can be refetched.

(psycopg_pool/pool.py, lines 522 to 540 at a816886)

The successive call to `AddConnection` takes place in the else block, and doesn't fire regardless of any corrective actions present in the `reconnect_failed` callback.

Looking at all of the methods that invoke `AddConnection` for a public candidate on `self` that can be used within the `reconnect_failed` callback:

- `self.check()` - won't work because an empty pool is not addressed here
- `self.resize()` - could work if the min size changes, but would mean altering the desired pool params

I've found that calling `_maybe_grow_pool` in the `reconnect_failed` callback is currently the best way to retry `AddConnection` after modifying `self.conninfo`, but this is likely not public for a reason.

One suggestion would be to allow the callback to return a value indicating whether or not `AddConnection` should be retried, and possibly supply a maximum number of times the callback can be invoked between successful connections to avoid an endless retry loop.

Another option would be to add a public method for modifying `conninfo` on-the-fly, that also verifies the new configuration allows for successful connections.

If there's a more appropriate way to deal with this situation, please let me know.
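To make the first suggestion concrete, a purely hypothetical shape it could take (neither the boolean return value nor the `max_reconnect_failures` argument exist in psycopg_pool today; they are invented here only to illustrate the idea):

```python
from psycopg_pool import ConnectionPool

def refresh_creds(pool: ConnectionPool) -> bool:
    pool.conninfo = get_new_creds()  # get_new_creds() is application code
    return True  # hypothetical: True would tell the pool to retry AddConnection

# hypothetical cap on callback invocations between successful connections
pool = ConnectionPool(
    "dbname=app",
    reconnect_failed=refresh_creds,
    max_reconnect_failures=3,
)
```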