Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix closing of pools #784

Closed

Conversation

adriangb
Copy link

@adriangb adriangb commented Apr 10, 2024

I have a reproducible test where there are still connections left in pool._pool after close() is run. This seems to fix it for my case but I have no idea why and can't really share a reproducible example. My guess is it has something to do with a race condition and/or weirdness with pytest and event loops.

I'm opening this first to see what happens in CI.

@adriangb adriangb marked this pull request as ready for review April 10, 2024 20:33
@adriangb adriangb marked this pull request as draft April 10, 2024 20:33
@adriangb adriangb marked this pull request as ready for review April 10, 2024 20:44
@adriangb
Copy link
Author

@dvarrazzo how would you feel about merging this given test pass? Does the change make sense to you at all?

# putconn will just close the returned connection.
self._stop_workers(waiting, connections, timeout)
# Now that the flag _closed is set, getconn will fail immediately,
# putconn will just close the returned connection.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is now misplaced. "Now" refers to the code out of the self._lock block, not to _stop_workers(). (my bad, the lack of a blank line made it deceptive).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it'd be best you propose a diff / push the change?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, let me try and figure out a different codepath.

# putconn will just close the returned connection.
self._stop_workers(waiting, connections, timeout)

self._pool.clear()
Copy link
Member

@dvarrazzo dvarrazzo Apr 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you, I have no idea how you can get it the situation in which there are still connections in the pool. Maybe we call _stop_workers() in a moment in which a worker is busy creating a connection, and it will receive the StopWorker task only after it has added a new connection. Therefore I see how calling _pool.clear() afterwards can make a difference.

One thing that is note done right here is the _pool.clear() which doesn't close a connection eventually left there, which may result in a warning.

Looking at the _stop_worker(), I find the connections parameter extremely weird. It doesn't seem its responsibility to do that. Can you please try to refactor this code by removing the connections parameter from _stop_workers and then, as you did here, clear the pool, but closing the connection, after _stop_worker() call? Something like:

        with self._lock:
            # [snip]
            self._waiting.clear()

            self._stop_workers(waiting, timeout)

            for conn in self._pool:
                conn.close()
            self._pool.clear()

        # Now that the flag _closed is set, getconn will fail immediately,
        # putconn will just close the returned connection.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that _stop_workers() is also called by __del__(), but with no connection list, and also that it takes on itself the responsibility of closing the waiting clients... I don't know if this is really the best organisation of this code: feel free to propose cleanup refactoring if you see any obvious one.

dvarrazzo added a commit that referenced this pull request Apr 10, 2024
The case has been reported in #784. While not easy to reproduce, it
seems that it might be caused by the pool being closed while a worker is
still trying to create a connection, which will be put in the _pool
state after supposedly no other operation should have been performed.

Stop the workers and then empty the pool only after they have stopped to
run.

Also refactor the cleanup of the pool and waiting queue, moving them
to close(). There is no reason why a method called "stop workers" should
empty them, and there is no other code path that use such feature.

Close #784.
dvarrazzo added a commit that referenced this pull request Apr 10, 2024
The case has been reported in #784. While not easy to reproduce, it
seems that it might be caused by the pool being closed while a worker is
still trying to create a connection, which will be put in the _pool
state after supposedly no other operation should have been performed.

Stop the workers and then empty the pool only after they have stopped to
run.

Also refactor the cleanup of the pool and waiting queue, moving them
to close(). There is no reason why a method called "stop workers" should
empty them, and there is no other code path that use such feature.

Close #784.
dvarrazzo added a commit that referenced this pull request Apr 10, 2024
The case has been reported in #784. While not easy to reproduce, it
seems that it might be caused by the pool being closed while a worker is
still trying to create a connection, which will be put in the _pool
state after supposedly no other operation should have been performed.

Stop the workers and then empty the pool only after they have stopped to
run.

Also refactor the cleanup of the pool and waiting queue, moving them
to close(). There is no reason why a method called "stop workers" should
empty them, and there is no other code path that use such feature.

Close #784.
dvarrazzo added a commit that referenced this pull request Apr 12, 2024
The case has been reported in #784. While not easy to reproduce, it
seems that it might be caused by the pool being closed while a worker is
still trying to create a connection, which will be put in the _pool
state after supposedly no other operation should have been performed.

Stop the workers and then empty the pool only after they have stopped to
run.

Also refactor the cleanup of the pool and waiting queue, moving them
to close(). There is no reason why a method called "stop workers" should
empty them, and there is no other code path that use such feature.

Close #784.
@dvarrazzo dvarrazzo closed this in 910383f Apr 12, 2024
@adriangb adriangb deleted the fix-close-pool-questionmark branch April 12, 2024 02:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants