
Add LWTRetryPolicy: retry CAS timeouts on same host with backoff#783

Draft
mykaul wants to merge 1 commit into scylladb:master from mykaul:feature/lwt-retry-policy

Conversation


@mykaul mykaul commented Apr 1, 2026

Summary

LWT queries use Paxos consensus where the first replica (Paxos coordinator/leader) drives the consensus rounds. When a CAS write times out, retrying on a different host causes Paxos contention — the new coordinator must compete with the original, potentially causing cascading timeouts across the cluster.

Currently, no built-in retry policy retries CAS write timeouts at all — they are all RETHROWN immediately:

  • RetryPolicy.on_write_timeout: CAS → RETHROW
  • ExponentialBackoffRetryPolicy.on_write_timeout: CAS → RETHROW
  • DowngradingConsistencyRetryPolicy.on_write_timeout: CAS → RETHROW

This PR adds LWTRetryPolicy, a new retry policy that extends ExponentialBackoffRetryPolicy with LWT-aware behavior:

| Scenario | Decision | Rationale |
| --- | --- | --- |
| CAS write timeout | RETRY same host + backoff | Stay on Paxos coordinator to avoid contention |
| Serial read timeout | RETRY same host + backoff | CAS read at serial CL, same coordinator logic |
| Serial unavailable | RETRY next host + backoff | Paxos quorum lost on this node, try another |
| Non-CAS operations | Delegate to parent | Standard ExponentialBackoffRetryPolicy behavior |
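The decision logic in the table above can be sketched as a small standalone function. This is an illustration only, not the PR's code: the constant names mirror the driver's RetryPolicy decisions, the function name and tuple shape are assumptions, and the real LWTRetryPolicy lives in cassandra/policies.py.

```python
import time

# Decision constants mirroring cassandra.policies.RetryPolicy
# (RETRY, RETRY_NEXT_HOST, RETHROW). Everything below is an
# illustrative sketch, not the PR's actual implementation.
RETRY, RETRY_NEXT_HOST, RETHROW = 0, 1, 2
WRITE_TYPE_CAS = "CAS"

def lwt_on_write_timeout(write_type, consistency, retry_num,
                         max_num_retries=3, base_delay=0.0):
    """Handle a write timeout the LWT-aware way: stay on the same
    host (the Paxos coordinator) and back off exponentially."""
    if write_type != WRITE_TYPE_CAS:
        return (RETHROW, None)   # non-CAS: the parent policy decides
    if retry_num >= max_num_retries:
        return (RETHROW, None)   # retries exhausted
    time.sleep(base_delay * (2 ** retry_num))  # exponential backoff
    # RETRY (not RETRY_NEXT_HOST): keep the original Paxos coordinator
    # and preserve the original consistency level.
    return (RETRY, consistency)
```

A serial-read-timeout handler would follow the same shape, while a serial-unavailable handler would return RETRY_NEXT_HOST instead, since the Paxos quorum was lost on the current node.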

This is modeled after gocql's LWTRetryPolicy interface, which retries LWT queries on the same host to avoid Paxos contention. The key comment from gocql (line 188):

"Retrying on a different host is fine for normal (non-LWT) queries, but in case of LWTs it will cause Paxos contention and possibly even timeouts if other clients send statements touching the same partition to the same time."

Usage

from cassandra.cluster import Cluster
from cassandra.policies import LWTRetryPolicy

# Use as the default retry policy
cluster = Cluster(default_retry_policy=LWTRetryPolicy(max_num_retries=3))

# Or assign to a specific statement
statement.retry_policy = LWTRetryPolicy(max_num_retries=5)

Changes

  • cassandra/policies.py: Added LWTRetryPolicy class (extends ExponentialBackoffRetryPolicy)
  • tests/unit/test_policies.py: Added LWTRetryPolicyTest with 21 tests

Tests

21 new tests covering:

  • CAS write timeout retries on same host with backoff
  • Backoff delay increases with retry attempts
  • Max retries exceeded → RETHROW
  • Consistency level preserved across retries
  • Non-CAS writes delegate to parent (SIMPLE→RETHROW, BATCH_LOG→RETRY, COUNTER→RETHROW)
  • Serial read timeout retries on same host (SERIAL and LOCAL_SERIAL)
  • Serial unavailable retries on next host
  • Non-serial operations delegate to parent policy
  • Request errors inherit parent behavior
  • Constructor defaults and customization
  • All methods return proper 3-tuples

All 103 tests in tests/unit/test_policies.py pass.
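For intuition on the "backoff delay increases with retry attempts" point, a doubling schedule is the usual ExponentialBackoffRetryPolicy shape; the base delay and any cap used by the PR are assumptions here, not taken from its code.

```python
# Illustrative exponential backoff schedule: delay = base * 2**retry_num.
# The base value and retry count are placeholders; see the PR's
# ExponentialBackoffRetryPolicy parameters for the real constants.
def backoff_delays(base, retries):
    return [base * (2 ** n) for n in range(retries)]

print(backoff_delays(0.1, 4))  # each retry waits twice as long as the last
```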

Related


mykaul commented Apr 7, 2026

CC @calebxyz
It needs more review (for me first of all), but looks important to push for at some point.


calebxyz commented Apr 7, 2026

> CC @calebxyz It needs more review (for me first of all), but looks important to push for at some point.

If this behavior is something that we have on go drivers it should be good, do we know the performance for LWT on go vs java for example? Or vs python.
Cc @temichus


mykaul commented Apr 7, 2026

> CC @calebxyz It needs more review (for me first of all), but looks important to push for at some point.

> If this behavior is something that we have on go drivers it should be good, do we know the performance for LWT on go vs java for example? Or vs python. Cc @temichus

@calebxyz - it's pointless to compare the different drivers' performance - they differ greatly. What is important is the correct and optimized behavior - and there we still have gaps. I think we are very far from testing the correct behavior: we need many more system-level tests on one hand, and on the other hand I'm against testing it in a full setup - which is why I've created scylladb/scylla-ccm#731 (that is probably not ready yet, but that's a different issue).


calebxyz commented Apr 7, 2026

> CC @calebxyz It needs more review (for me first of all), but looks important to push for at some point.

> If this behavior is something that we have on go drivers it should be good, do we know the performance for LWT on go vs java for example? Or vs python. Cc @temichus

> @calebxyz - it's pointless to compare the different drivers' performance - they differ greatly. What is important is the correct and optimized behavior - and there we still have gaps.

This is sad, the amount of unpredictability is horrible


mykaul commented Apr 7, 2026

> CC @calebxyz It needs more review (for me first of all), but looks important to push for at some point.

> If this behavior is something that we have on go drivers it should be good, do we know the performance for LWT on go vs java for example? Or vs python. Cc @temichus

> @calebxyz - it's pointless to compare the different drivers' performance - they differ greatly. What is important is the correct and optimized behavior - and there we still have gaps.

> This is sad, the amount of unpredictability is horrible

That's one of the major reasons to move some drivers to be Rust-based: Rust, CPP-over-Rust, NodeJS-over-Rust, Python-over-Rust (and we'll stay with Java and Go, I reckon).
Same situation with our Alternator clients!

LWT queries use Paxos consensus where the coordinator is the Paxos leader.
Retrying on a different host causes Paxos contention — the new coordinator
must compete with the original one, potentially causing cascading timeouts.

LWTRetryPolicy (extends ExponentialBackoffRetryPolicy) handles this by:
- CAS write timeouts: retry on SAME host with exponential backoff
- Serial consistency read timeouts: retry on SAME host with backoff
- Serial consistency unavailable: retry on NEXT host (paxos quorum lost)
- Non-CAS operations: delegate to base ExponentialBackoffRetryPolicy

Modeled after gocql's LWTRetryPolicy interface.
mykaul force-pushed the feature/lwt-retry-policy branch from f1a865b to d2a8538 on April 7, 2026 at 16:08