Fix closing of pools #784
Conversation
@dvarrazzo how would you feel about merging this, given tests pass? Does the change make sense to you at all?
```python
# putconn will just close the returned connection.
self._stop_workers(waiting, connections, timeout)
# Now that the flag _closed is set, getconn will fail immediately,
# putconn will just close the returned connection.
```
This comment is now misplaced. "Now" refers to the code outside the self._lock block, not to _stop_workers(). (My bad, the lack of a blank line made it deceptive.)
Maybe it'd be best if you propose a diff / push the change?
Ok, let me try and figure out a different codepath.
```python
# putconn will just close the returned connection.
self._stop_workers(waiting, connections, timeout)

self._pool.clear()
```
Like you, I have no idea how you can get into the situation in which there are still connections in the pool. Maybe we call _stop_workers() in a moment in which a worker is busy creating a connection, and it will receive the StopWorker task only after it has added a new connection. Therefore I see how calling _pool.clear() afterwards can make a difference.
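The race described above can be sketched with a minimal, hypothetical model (names like TinyPool and the "grow"/"stop" tasks are illustrative only, not the real psycopg_pool internals): a worker that is busy creating a connection only sees the stop request after it has already appended the new connection, so the connection lands in the pool after close() returns.

```python
import queue
import threading
import time

class TinyPool:
    """Toy pool with one background worker consuming a task queue."""

    def __init__(self):
        self._pool = []              # connections ready to hand out
        self._tasks = queue.Queue()  # work items for the worker
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        while True:
            task = self._tasks.get()
            if task == "stop":       # equivalent of a StopWorker task
                return
            time.sleep(0.05)         # simulate slow connection creation
            self._pool.append(object())  # the "connection" lands late

    def close(self):
        # Queue the stop request and wait for the worker to finish.
        self._tasks.put("stop")
        self._worker.join()

pool = TinyPool()
pool._tasks.put("grow")  # worker starts creating a connection
pool.close()             # the stop task queues up behind "grow"
print(len(pool._pool))   # 1: a connection was added despite close()
```

This is why clearing the pool before the workers have stopped is not enough: the late append happens after the clear.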
One thing that is not done right here is the _pool.clear(), which doesn't close any connection possibly left there, which may result in a warning.
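A small, hypothetical illustration of that warning (the Conn class below is a stand-in; real database connection objects typically emit a similar warning from their finalizer when dropped while still open):

```python
import gc
import warnings

class Conn:
    """Stand-in connection that warns if discarded while open."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

    def __del__(self):
        if not self.closed:
            warnings.warn("connection deleted while open", ResourceWarning)

pool = [Conn()]
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    pool.clear()   # drops the only reference without closing the conn
    gc.collect()   # make sure the finalizer has run
print(len(caught))  # 1: the unclosed connection raised a ResourceWarning
```

Closing each connection before clearing the container avoids the warning.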
Looking at _stop_workers(), I find the connections parameter extremely weird. It doesn't seem its responsibility to do that. Can you please try to refactor this code by removing the connections parameter from _stop_workers() and then, as you did here, clear the pool, but closing the connections, after the _stop_workers() call? Something like:
```python
with self._lock:
    # [snip]
    self._waiting.clear()
self._stop_workers(waiting, timeout)
for conn in self._pool:
    conn.close()
self._pool.clear()
# Now that the flag _closed is set, getconn will fail immediately,
# putconn will just close the returned connection.
```
I see that _stop_workers() is also called by __del__(), but with no connection list, and also that it takes on itself the responsibility of closing the waiting clients... I don't know if this is really the best organisation of this code: feel free to propose a cleanup refactoring if you see any obvious one.
The case has been reported in #784. While not easy to reproduce, it seems that it might be caused by the pool being closed while a worker is still trying to create a connection, which will be put in the _pool state after supposedly no other operation should have been performed.

Stop the workers, and then empty the pool only after they have stopped running. Also refactor the cleanup of the pool and waiting queue, moving them to close(). There is no reason why a method called "stop workers" should empty them, and no other code path uses such a feature.

Close #784.
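The ordering the commit message describes can be sketched as follows. This is a hedged, simplified model, not the actual psycopg_pool source: FakeConn is a stand-in, the worker list is reduced to plain threads, and _stop_workers() here does nothing but join them.

```python
import threading

class FakeConn:
    """Illustrative connection object."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

class Pool:
    def __init__(self):
        self._lock = threading.Lock()
        self._closed = False
        self._waiting = []
        self._pool = []
        self._workers = []

    def _stop_workers(self, waiting, timeout=None):
        # After the refactor this only stops the workers; it no longer
        # clears the pool or closes connections itself.
        for w in self._workers:
            w.join(timeout)

    def close(self, timeout=None):
        with self._lock:
            if self._closed:
                return
            self._closed = True
            waiting = list(self._waiting)
            self._waiting.clear()
        # Stop the workers first, so no late connection can be added...
        self._stop_workers(waiting, timeout)
        # ...then close and drop whatever is left in the pool.
        for conn in self._pool:
            conn.close()
        self._pool.clear()

pool = Pool()
conn = FakeConn()
pool._pool.append(conn)
pool.close()
print(conn.closed, len(pool._pool))  # True 0
```

The key point is the sequencing: the pool is emptied only after _stop_workers() has returned, so a worker that was mid-creation cannot leave an unclosed connection behind.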
I have a reproducible test where there are still connections left in pool._pool after close() is run. This seems to fix it for my case, but I have no idea why, and can't really share a reproducible example. My guess is it has something to do with a race condition and/or weirdness with pytest and event loops. I'm opening this first to see what happens in CI.