session: fix pool renewal race causing double statement execution#838

Open
nyh wants to merge 1 commit into scylladb:master from nyh:fix-317

Conversation

@nyh nyh commented May 3, 2026

When two or more nodes are bootstrapped concurrently the Python driver
can execute the same CQL statement twice, causing spurious "already
exists" errors in the caller. This has been observed as flaky test
failures across the ScyllaDB test suite for the past two years, and
worked around by using idempotent DDL forms (IF NOT EXISTS / IF EXISTS)
in dozens of tests.

Root cause

The race unfolds as follows:

  1. Two on_add notifications arrive at roughly the same time, one for
    each new node. Each one calls session.add_or_renew_pool(), which
    submits run_add_or_renew_pool() to the thread pool and returns.
    Both submissions are in-flight concurrently.

  2. The first add_or_renew_pool() finishes and calls _finalize_add(),
    which notifies load-balancing policies and then calls
    session.update_created_pools() for every live session.

  3. update_created_pools() iterates all known hosts. For the second
    host, whose run_add_or_renew_pool() has not yet completed, it sees
    self._pools.get(host) == None (or a shut-down pool) and therefore
    submits another run_add_or_renew_pool() for that host.

  4. Now two tasks are connecting to the same host. The first one
    finishes and installs pool-A in self._pools, then runs a statement
    (e.g. CREATE ROLE) that is in-flight on pool-A.

  5. The second task finishes, reads the stale previous = self._pools.get(host)
    value (captured before the lock was taken — another bug), installs
    pool-B and then shuts down pool-A. The in-flight CREATE ROLE request
    is orphaned; the driver retries it on pool-B. The server executes it
    a second time and returns "Role ... already exists".
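To make the replacement in steps 4-5 concrete, here is a small self-contained toy model of the pattern described above (hypothetical names; the driver's real run_add_or_renew_pool has more detail):

```python
import threading

class ToySession:
    """Toy model of the pre-fix behavior; not the driver's actual code."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pools = {}

    def run_add_or_renew_pool(self, host, make_pool):
        previous = self._pools.get(host)    # stale read, taken outside the lock
        new_pool = make_pool()              # slow: opens connections to the host
        with self._lock:
            self._pools[host] = new_pool    # may silently replace a live pool
        if previous:
            previous.shutdown()             # orphans requests in flight on it
```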

Fix

Three coordinated changes to cassandra/cluster.py:

  • Session.__init__: add self._pending_pool_futures = {}, a dict mapping
    host -> Future for any in-flight pool creation, guarded by _lock.

  • add_or_renew_pool: before submitting run_add_or_renew_pool(), check
    _pending_pool_futures under _lock. If an in-flight future already
    exists for the host, return it immediately — this is the primary fix
    that prevents the duplicate submission from update_created_pools.
    Additionally, move the previous = self._pools.get(host) read inside
    the lock so the live-pool check is atomic with the installation of the
    new pool: if a concurrent creation has already installed a live pool
    by the time we finish connecting, discard our new pool instead of
    replacing the live one (defense-in-depth).
    Cleanup of _pending_pool_futures is handled by a done_callback
    registered on the future immediately after it is stored, both
    operations performed under _lock. The callback only removes the entry
    if it still points at the same future it was registered on, so a
    concurrent remove_pool followed by a new add_or_renew_pool is not
    affected. This guarantees cleanup under all exit paths including
    unhandled exceptions inside run_add_or_renew_pool, and avoids the
    race where a fast-completing task pops the key before the outer code
    has stored the future.

  • remove_pool: clear _pending_pool_futures[host] under _lock so that
    if a host is removed and immediately re-added, add_or_renew_pool
    submits a fresh creation rather than reusing a stale done future.
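A minimal self-contained sketch of this scheme, with hypothetical names (the real change lives in Session.add_or_renew_pool in cassandra/cluster.py and differs in detail):

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor

class PendingPoolRegistry:
    """Toy model of the per-host coalescing described above."""

    def __init__(self):
        # Reentrant lock: a done-callback can fire on the registering thread
        # (when the future is already finished) while the lock is still held.
        self._lock = threading.RLock()
        self._pending_pool_futures = {}          # host -> in-flight Future
        self._executor = ThreadPoolExecutor(max_workers=4)

    def add_or_renew_pool(self, host, create_pool) -> Future:
        with self._lock:
            pending = self._pending_pool_futures.get(host)
            if pending is not None and not pending.done():
                # Primary fix: a creation for this host is already in flight,
                # so reuse it instead of submitting a duplicate task.
                return pending

            future = self._executor.submit(create_pool, host)
            self._pending_pool_futures[host] = future

            def _clear_pending(done_future, host=host):
                with self._lock:
                    # Identity guard: only drop the entry if it still points at
                    # the future this callback was registered on; a newer entry
                    # installed after remove_pool() is left alone.
                    if self._pending_pool_futures.get(host) is done_future:
                        del self._pending_pool_futures[host]

            future.add_done_callback(_clear_pending)
            return future

    def remove_pool(self, host):
        with self._lock:
            # Forget any pending creation so a host that is removed and then
            # re-added gets a fresh creation, not a stale done future.
            self._pending_pool_futures.pop(host, None)
```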

Tests

Five new unit tests are added in PoolRenewalRaceTest
(tests/unit/test_cluster.py). They exercise the new code paths without
requiring a real cluster connection by constructing a minimal Session
via object.__new__ and mocking the executor and profile manager:

  • test_add_or_renew_pool_reuses_inflight_future: places a pending
    Future in _pending_pool_futures and verifies that add_or_renew_pool
    returns it without submitting a new task to the executor.

  • test_add_or_renew_pool_discards_duplicate_when_live_pool_exists:
    exercises the real production code path by patching HostConnection
    to a lightweight stub and using a synchronous executor shim that
    runs the submitted callable inline. Pre-installs a live pool for
    the host, then calls add_or_renew_pool() and asserts that the live
    pool is not replaced and the newly connected stub pool is shut down.

  • test_remove_pool_clears_pending_future: verifies that remove_pool
    clears _pending_pool_futures so the next add_or_renew_pool call
    submits a fresh task.

  • test_done_callback_clears_pending_future: verifies that the
    done_callback fires and removes the entry from _pending_pool_futures
    once the future completes.

  • test_done_callback_does_not_clear_newer_future: verifies the identity
    guard — an old future's callback does not evict a newer future that
    was installed in its place after a remove_pool + add_or_renew_pool.
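As an illustration, the first test's shape could look roughly like this; the mocked attribute names below are inferred from the description and may not match the actual test in tests/unit/test_cluster.py:

```python
import unittest
from concurrent.futures import Future
from threading import RLock
from unittest.mock import Mock

from cassandra.cluster import Session
from cassandra.policies import HostDistance

class PoolRenewalRaceTestSketch(unittest.TestCase):
    def test_add_or_renew_pool_reuses_inflight_future(self):
        # Minimal Session built without a real cluster connection.
        session = Session.__new__(Session)
        session._lock = RLock()
        session._pools = {}
        session._pending_pool_futures = {}
        session.is_shutdown = False
        session.cluster = Mock()                       # mocked executor lives here
        session._profile_manager = Mock()              # assumed attribute name
        session._profile_manager.distance.return_value = HostDistance.LOCAL

        host = Mock(name="host")
        inflight = Future()                            # still pending
        session._pending_pool_futures[host] = inflight

        result = session.add_or_renew_pool(host, is_host_addition=False)

        self.assertIs(result, inflight)                # in-flight future is reused
        session.cluster.executor.submit.assert_not_called()
```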

Fixes: #317


nyh commented May 3, 2026

This is another fix for #317. This one was written entirely by AI (vscode + Claude Sonnet 4.6). Unlike #835, which uses "optimistic locking" (it can create two pools but then closes the unneeded one), this implementation tries to avoid creating a new pool while one is already in the process of being created, so the implementation is a bit longer and it's not clear whether the performance advantage is worth it.


Copilot AI left a comment


Pull request overview

Fixes a race during concurrent host additions where Session.update_created_pools() could schedule duplicate pool creations for the same host, leading to an in-flight request being orphaned and retried on a new pool (double execution / “already exists” errors).

Changes:

  • Track in-flight per-host pool-creation tasks via Session._pending_pool_futures and reuse them to avoid duplicate scheduling.
  • Make pool replacement/discard decisions atomic by reading previous = self._pools.get(host) inside the session lock and discarding newly-created pools when a live pool already exists.
  • Add unit tests covering reuse of in-flight futures and pending-future invalidation on remove_pool.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

  • cassandra/cluster.py: Adds _pending_pool_futures and uses it to dedupe concurrent pool creation / renewal work; adjusts locking to prevent replacing a live pool.
  • tests/unit/test_cluster.py: Adds regression tests for the pool-renewal race and for _pending_pool_futures behavior.


@nyh nyh requested a review from Lorak-mmk May 3, 2026 19:14
@nyh nyh requested a review from Copilot May 3, 2026 19:25

Copilot AI left a comment


Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.




Copilot AI left a comment


Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.




nyh commented May 4, 2026

@dkropachev one of the test runs, "libev 3.11", hung. I don't know what to make of it, or how to restart it.

Anyway, @avikivity @mykaul, this (issue #317) is one of the issues that hit MANY MANY Scylla "cluster tests" ever since the Scylla "cluster test" framework got a function that spawns several servers concurrently. This is perhaps the driver issue that caused the most flaky tests ever. Let's get it fixed for good (there's this patch, and also #835, as candidate fixes).

@avikivity (Member)

> @dkropachev one of the test runs, "libev 3.11", hung. I don't know what to make of it, or how to restart it.
>
> Anyway, @avikivity @mykaul, this (issue #317) is one of the issues that hit MANY MANY Scylla "cluster tests" ever since the Scylla "cluster test" framework got a function that spawns several servers concurrently. This is perhaps the driver issue that caused the most flaky tests ever. Let's get it fixed for good (there's this patch, and also #835, as candidate fixes).

Thanks for fixing it!

@dkropachev (Collaborator) left a comment

I think this still has a few race holes around coalescing by host only:

  1. cassandra/cluster.py:3321

_pending_pool_futures is keyed only by host, so a later add_or_renew_pool(host, is_host_addition=True) can reuse a future created by update_created_pools() with is_host_addition=False (cluster.py:3376). The submitted closure keeps the first flag and passes it to signal_connection_failure() at cluster.py:3254 / 3262. On reconnect, _HostReconnectionHandler dispatches based on that flag (pool.py:360-363), so a failed new-host add can reconnect through on_up() instead of on_add().

  2. cassandra/cluster.py:3321

The reused future also keeps the first computed distance from cluster.py:3245, used to construct HostConnection at cluster.py:3251. If distance changes while creation is pending, the later call cannot replace the old creation. A later update_created_pools() can update pool.host_distance (cluster.py:3377-3382), but it cannot fix construction done with the old distance, e.g. a REMOTE pool created with connect_to_remote_hosts=False has no connections.

  3. cassandra/cluster.py:3342

remove_pool() clears _pending_pool_futures, but the old running task can still publish its pool at cluster.py:3301. If a fresh creation starts after removal, the stale task can finish first, install its pool, and make the fresh task discard its pool via the live-pool check at cluster.py:3288. Need a generation/identity guard before publishing, not just cleanup of the pending map.

When two or more nodes are bootstrapped concurrently the Python driver
can execute the same CQL statement twice, causing spurious "already
exists" errors in the caller.  This has been observed as flaky test
failures across the ScyllaDB test suite for the past two years, and
worked around by using idempotent DDL forms (IF NOT EXISTS / IF EXISTS)
in dozens of tests.

Root cause
----------
The race unfolds as follows:

1. Two on_add notifications arrive at roughly the same time, one for
   each new node.  Each one calls session.add_or_renew_pool(), which
   submits run_add_or_renew_pool() to the thread pool and returns.
   Both submissions are in-flight concurrently.

2. The first add_or_renew_pool() finishes and calls _finalize_add(),
   which notifies load-balancing policies and then calls
   session.update_created_pools() for every live session.

3. update_created_pools() iterates all known hosts.  For the second
   host, whose run_add_or_renew_pool() has not yet completed, it sees
   self._pools.get(host) == None (or a shut-down pool) and therefore
   submits *another* run_add_or_renew_pool() for that host.

4. Now two tasks are connecting to the same host.  The first one
   finishes and installs pool-A in self._pools, then runs a statement
   (e.g. CREATE ROLE) that is in-flight on pool-A.

5. The second task finishes, reads the stale `previous = self._pools.get(host)`
   value (captured *before* the lock was taken — another bug), installs
   pool-B and then shuts down pool-A.  The in-flight CREATE ROLE request
   is orphaned; the driver retries it on pool-B.  The server executes it
   a second time and returns "Role ... already exists".

Fix
---
Three coordinated changes to cassandra/cluster.py:

* Session.__init__: add self._pending_pool_futures = {}, a dict mapping
  host -> entry (with future, creation_id, distance,
  is_host_addition_cell) for any in-flight pool creation, guarded by
  _lock.

* add_or_renew_pool: before submitting run_add_or_renew_pool(), check
  _pending_pool_futures under _lock.  If an in-flight future already
  exists for the host with the same distance, return it immediately —
  this is the primary fix that prevents the duplicate submission from
  update_created_pools.  If is_host_addition=True on the new call but
  the existing entry has False, upgrade it in-place via a shared
  is_host_addition_cell so the closure passes the correct flag to
  signal_connection_failure() and _HostReconnectionHandler dispatches
  through on_add() instead of on_up() on reconnect.  If the distance
  changed, submit a fresh task (the old HostConnection was constructed
  with stale distance, e.g. no connections for REMOTE with
  connect_to_remote_hosts=False).
  Each submission gets a unique creation_id token.  The closure checks
  _pending_pool_futures[host].creation_id before installing its pool:
  if remove_pool() ran and a fresher creation was submitted while this
  task was connecting, the stale task discards its pool rather than
  overwriting the fresher one.
  Additionally, move the `previous = self._pools.get(host)` read inside
  the lock so the live-pool check is atomic with the installation of the
  new pool: if a concurrent creation has already installed a live pool
  by the time we finish connecting, discard our new pool instead of
  replacing the live one (defense-in-depth).
  Cleanup of _pending_pool_futures is handled by a done_callback
  registered on the future.  The callback acquires _lock and only clears
  the entry if it still holds the same creation_id it was registered
  on, so a concurrent remove_pool followed by a new add_or_renew_pool
  is not affected.  The entry is stored before calling submit() so that
  the closure always finds a valid creation_id in the dict, even when
  the executor runs the task synchronously.

* remove_pool: clear _pending_pool_futures[host] under _lock so that
  if a host is removed and immediately re-added, add_or_renew_pool
  submits a fresh creation rather than reusing a stale done future.
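Putting the three mechanisms together, a condensed self-contained sketch (hypothetical names; the real code inside Session.add_or_renew_pool differs in detail):

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class EntryBasedRegistry:
    """Toy model of the entry-based coalescing described above."""

    def __init__(self):
        self._lock = threading.RLock()
        self._pools = {}
        self._pending = {}   # host -> {future, creation_id, distance, is_host_addition_cell}
        self._executor = ThreadPoolExecutor(max_workers=4)

    def add_or_renew_pool(self, host, is_host_addition, distance, connect):
        with self._lock:
            entry = self._pending.get(host)
            if entry and not entry['future'].done() and entry['distance'] == distance:
                if is_host_addition:
                    # Upgrade in place; the in-flight closure reads the cell at
                    # failure time, so reconnection goes through on_add().
                    entry['is_host_addition_cell'][0] = True
                return entry['future']        # coalesce with the in-flight creation

            creation_id = object()            # unique token for this submission
            cell = [is_host_addition]

            def run():
                # (On a connection failure the real code would report it with
                # is_host_addition=cell[0], read at call time.)
                new_pool = connect(host, distance)   # slow part, outside the lock
                with self._lock:
                    current = self._pending.get(host)
                    stale = current is None or current['creation_id'] is not creation_id
                    live = self._pools.get(host)
                    if stale or (live is not None and not live.is_shutdown):
                        new_pool.shutdown()          # lost the race: discard ours
                        return False
                    self._pools[host] = new_pool
                    return True

            # Store the entry before submit() so even a synchronous executor
            # finds a valid creation_id when the closure runs.
            entry = {'future': None, 'creation_id': creation_id,
                     'distance': distance, 'is_host_addition_cell': cell}
            self._pending[host] = entry
            future = self._executor.submit(run)
            entry['future'] = future

            def clear(_done, host=host, creation_id=creation_id):
                with self._lock:
                    cur = self._pending.get(host)
                    if cur is not None and cur['creation_id'] is creation_id:
                        del self._pending[host]

            future.add_done_callback(clear)
            return future

    def remove_pool(self, host):
        with self._lock:
            self._pending.pop(host, None)    # a re-added host gets a fresh creation
            return self._pools.pop(host, None)
```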

Tests
-----
Five new unit tests are added in PoolRenewalRaceTest
(tests/unit/test_cluster.py).  They exercise the new code paths without
requiring a real cluster connection by constructing a minimal Session
via object.__new__ and mocking the executor and profile manager.
The tests use the new dict-based entry format for _pending_pool_futures:

* test_add_or_renew_pool_reuses_inflight_future: places a pending
  entry in _pending_pool_futures and verifies that add_or_renew_pool
  returns the existing future without submitting a new task.

* test_add_or_renew_pool_discards_duplicate_when_live_pool_exists:
  exercises the real production code path by patching HostConnection
  to a lightweight stub and using a synchronous executor shim that
  runs the submitted callable inline.  Pre-installs a live pool for
  the host, then calls add_or_renew_pool() and asserts that the live
  pool is not replaced and the newly connected stub pool is shut down.

* test_remove_pool_clears_pending_future: verifies that remove_pool
  clears _pending_pool_futures so the next add_or_renew_pool call
  submits a fresh task.

* test_done_callback_clears_pending_future: verifies that the
  done_callback fires and removes the entry from _pending_pool_futures
  once the future completes.

* test_done_callback_does_not_clear_newer_future: verifies the
  creation_id guard — an old future's callback does not evict a newer
  entry installed in its place after a remove_pool + add_or_renew_pool.

Fixes: scylladb#317

nyh commented May 4, 2026

(My AI) fixed all three issues identified by @dkropachev ('s AI).

Thanks for the detailed review! You identified three real races. Here's what the new version does to fix each one:

  1. is_host_addition flag race

Instead of a plain bool, the closure now captures is_host_addition_cell = [is_host_addition] — a mutable one-element list. If a second call arrives with is_host_addition=True while the first future is still in-flight, the coalescing path upgrades the cell in-place (entry['is_host_addition_cell'][0] = True) before returning the existing future. The closure then reads is_host_addition_cell[0] at call time, so it always sees the stricter on_add() reconnect path when needed.

  2. Distance coalescing race

The entry dict now stores the distance computed at submission time. When a second add_or_renew_pool() call arrives and finds an in-flight entry, it only coalesces if entry['distance'] == distance. If the distance has changed, the code falls through and submits a fresh task with the new distance. The creation_id guard (see below) ensures the fresh task's pool wins.

  3. Stale task publishing stale pool

Each submission generates a unique creation_id = object(). Before calling submit(), the entry (including creation_id) is stored in _pending_pool_futures under _lock. The closure, still under _lock, checks entry['creation_id'] is creation_id before installing the pool — if it doesn't match (because remove_pool() cleared the entry, or a fresh submission replaced it), the closure discards its pool. remove_pool() pops the entry under _lock, so any task that started before the removal and checks afterward will see the mismatch and discard. The done-callback that clears the entry also verifies the creation_id before removing, so a concurrent fresh submission isn't cleared.

The entry is stored before submit() returns so the guard works even when the executor runs the task synchronously (e.g. in tests with a direct-call executor).
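For point 1, a tiny runnable illustration of the mutable-cell upgrade (hypothetical names; the real closure also creates the pool and reports connection failures):

```python
from concurrent.futures import ThreadPoolExecutor
import time

def demo():
    is_host_addition_cell = [False]          # shared, mutable flag

    def run_add_or_renew_pool():
        time.sleep(0.1)                      # stand-in for a slow connection attempt
        # Read at call time: if a later caller upgraded the cell while this task
        # was in flight, the stricter on_add() reconnect path would be chosen.
        return "on_add" if is_host_addition_cell[0] else "on_up"

    with ThreadPoolExecutor(max_workers=1) as executor:
        future = executor.submit(run_add_or_renew_pool)
        is_host_addition_cell[0] = True      # a second caller upgrades in place
        print(future.result())               # prints "on_add"

if __name__ == "__main__":
    demo()
```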


Copilot AI left a comment


Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.



Comment thread cassandra/cluster.py

    def run_add_or_renew_pool():
        try:
            new_pool = HostConnection(host, distance, self)
@dkropachev (Collaborator)

@nyh, I have found 3 more correctness issues, and it will go on like this forever; I have already tried to solve it this way.
Unfortunately the driver code is too interconnected to allow solving problems like this in this way.

I am working on another solution that will work better, at #836 - it basically does event queuing and fencing, and it will 100% work better.


nyh commented May 4, 2026

> I am working on another solution that will work better, at #836

Funny how we didn't touch this issue #317 for a couple of years (since test/cluster became more aggressive in creating nodes in parallel), and now in two days we have three different pull requests to fix it ;-)

Successfully merging this pull request may close these issues.

Connection pool renewal after concurrent node bootstraps causes double statement execution (#317)
