Client routes: partial CLIENT_ROUTES_CHANGE updates can break connection-id stickiness for a host #813

@dkropachev

cassandra/client_routes.py appears to mishandle partial CLIENT_ROUTES_CHANGE updates.

Problem

The route store keeps only one selected route per host_id. To preserve stickiness, _select_preferred_routes() prefers the currently selected connection_id whenever that connection_id is present among the newly fetched candidates.

However, when a partial CLIENT_ROUTES_CHANGE event is handled, the fetched candidates may contain only a subset of the connection IDs for an affected host.

Failure mode

Consider this sequence:

  1. Host X is currently using sticky route (connection_id=A, host_id=X).
  2. A CLIENT_ROUTES_CHANGE event arrives for host X, but only for connection IDs B / C.
  3. _query_routes_for_change_event() fetches only routes matching the event-derived filters.
  4. _select_preferred_routes() cannot keep A, because A is absent from the fetched candidates.
  5. merge() drops the old route for affected host X and replaces it with one of the newly fetched routes.
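
The sequence above can be reproduced with a small toy model. This is only a sketch of the behavior as described in this issue, not the driver's code: `Route`, `select_preferred`, and the simplified `merge` below are hypothetical stand-ins, and the lowest-connection_id fallback is an assumption made so the result is deterministic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    host_id: str
    connection_id: str


def select_preferred(current, candidates):
    """Pick one route per host; keep the current route only if it was fetched."""
    by_host = {}
    for route in candidates:
        by_host.setdefault(route.host_id, []).append(route)

    selected = {}
    for host_id, routes in by_host.items():
        sticky = current.get(host_id)
        if sticky is not None and sticky in routes:
            selected[host_id] = sticky
        else:
            # Assumed fallback: lowest connection_id among fetched candidates.
            selected[host_id] = min(routes, key=lambda r: r.connection_id)
    return selected


def merge(current, affected_hosts, selected):
    """Drop old routes for affected hosts, then install the new selection."""
    merged = {h: r for h, r in current.items() if h not in affected_hosts}
    merged.update(selected)
    return merged


# Step 1: host X is sticky on connection A.
store = {"X": Route("X", "A")}
# Steps 2-3: the event covers only B and C, so only those are fetched.
fetched = [Route("X", "B"), Route("X", "C")]
# Steps 4-5: A is absent from the candidates, so the selection moves off A.
store_after = merge(store, {"X"}, select_preferred(store, fetched))
print(store_after["X"].connection_id)  # B
```

Nothing in the input said that (A, X) was removed, yet the store no longer contains it.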

This means the driver can switch away from A even though it has not learned that (A, X) was removed. It has only learned that some other routes for host X changed.

Relevant code

  • Explicit stickiness selection: _select_preferred_routes()
  • Partial event merge: merge()
  • Event handling path: handle_client_routes_change()
  • Event query: _query_routes_for_change_event()

Expected behavior

The driver should not switch away from the currently used connection_id for a host unless it has enough information to conclude that the sticky route is no longer valid.

In other words, absence of (A, X) from a partial event query result should not be treated as proof that (A, X) was deleted.

Possible direction

One likely fix is to store all known routes for each host, not only the currently selected one, and track stickiness separately. That would allow the driver to:

  1. preserve stickiness when unrelated connection IDs for the same host change,
  2. delete routes only when the event/query gives enough information to do so safely,
  3. choose a replacement deterministically if the sticky route is actually removed.
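
A minimal sketch of that direction, under stated assumptions: `RouteStore` and `apply_partial` are hypothetical names, and the sketch assumes the caller knows which connection IDs the event query covered (`queried_ids`), so that only IDs that were queried but not returned are treated as deleted. Whether the real event and query carry that information would need to be checked against the driver.

```python
from dataclasses import dataclass, field


@dataclass
class RouteStore:
    # host_id -> set of all known connection_ids for that host
    routes: dict = field(default_factory=dict)
    # host_id -> connection_id currently in use (tracked separately)
    sticky: dict = field(default_factory=dict)

    def apply_partial(self, host_id, queried_ids, found_ids):
        """Apply a partial update covering only `queried_ids` for one host."""
        queried, found = set(queried_ids), set(found_ids)
        known = self.routes.setdefault(host_id, set())

        # Only IDs that were queried and came back missing are known deleted.
        known -= queried - found
        known |= found

        current = self.sticky.get(host_id)
        if current in known:
            return  # sticky route is still valid; keep it
        if known:
            # Deterministic replacement: lowest remaining connection_id.
            self.sticky[host_id] = min(known)
        else:
            # No routes left for this host.
            self.sticky.pop(host_id, None)
            self.routes.pop(host_id, None)


store = RouteStore(routes={"X": {"A"}}, sticky={"X": "A"})

# Unrelated IDs B and C change: A was not queried, so stickiness is preserved.
store.apply_partial("X", queried_ids={"B", "C"}, found_ids={"B", "C"})
sticky_after_unrelated = store.sticky["X"]

# A was queried and not found: now it is safe to replace it.
store.apply_partial("X", queried_ids={"A"}, found_ids=set())
sticky_after_removal = store.sticky["X"]
```

Keeping the sticky choice in its own mapping means a partial update can add or remove unrelated routes without touching the route in use.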
