Token aware routing #824

Merged: 22 commits, Oct 5, 2022

@conorbros conorbros commented Sep 23, 2022

Token Aware Routing

Changes

NodePool

  • Added this struct to contain the logic for holding our node list and selecting which nodes to use for queries
  • We will need to add more fields for metadata (keyspace info) and I'll put those in there

TokenMap

  • Map of tokens->host_id
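
A token-to-host map like this could be queried roughly as sketched below. This is a minimal illustration, not Shotover's actual TokenMap: the names `TokenRing`, `HostId`, and `owner_of` are hypothetical, the host id is a plain `u128` standing in for a UUID, and the sketch assumes the usual Cassandra ring convention where a token is owned by the first node whose token is greater than or equal to it, wrapping around at the end of the ring.

```rust
use std::collections::BTreeMap;

/// Stand-in for a node's host_id (a UUID in Cassandra). Hypothetical type.
type HostId = u128;

/// Hypothetical sketch of a token -> host_id map, not Shotover's actual TokenMap.
struct TokenRing {
    ring: BTreeMap<i64, HostId>,
}

impl TokenRing {
    fn new(entries: impl IntoIterator<Item = (i64, HostId)>) -> Self {
        Self {
            ring: entries.into_iter().collect(),
        }
    }

    /// Returns the host owning `token`: the first entry whose token is
    /// greater than or equal to `token`, wrapping around to the lowest
    /// token when no such entry exists. Returns None for an empty ring.
    fn owner_of(&self, token: i64) -> Option<HostId> {
        self.ring
            .range(token..)
            .next()
            .or_else(|| self.ring.iter().next())
            .map(|(_, host)| *host)
    }
}
```

A sorted map keeps the lookup logarithmic in the number of tokens, which matters when every node contributes many virtual node tokens.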

routing_key

  • Functions for calculating the routing key and generating the Murmur3 token
  • Tests for calculating the routing key and asserting that NodePool and TokenMap return the correct nodes
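
As a rough illustration of the routing key step, the sketch below follows the layout Cassandra drivers commonly use: a single partition key column is used as-is, while a composite key encodes each component as a 2-byte big-endian length, the value bytes, and a trailing zero byte. The function name `routing_key` and its signature are assumptions for this example and may not match the code in this PR. Hashing the result with Cassandra's Murmur3 variant to obtain the token is deliberately left out.

```rust
/// Builds a routing key from already-serialized partition key values.
/// Hypothetical helper, not necessarily the signature used in this PR.
///
/// Returns None when there are no partition key values, or when a
/// component is too long for its 2-byte length prefix.
fn routing_key(components: &[&[u8]]) -> Option<Vec<u8>> {
    match components {
        [] => None,
        // A single partition key column is used unmodified.
        [single] => Some(single.to_vec()),
        // Composite keys: <u16 length><bytes><0x00> per component.
        many => {
            let mut key = Vec::new();
            for component in many.iter() {
                if component.len() > u16::MAX as usize {
                    return None;
                }
                let len = component.len() as u16;
                key.extend_from_slice(&len.to_be_bytes());
                key.extend_from_slice(component);
                key.push(0);
            }
            Some(key)
        }
    }
}
```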

Token aware routing workflow

The client sends a PREPARE statement to Shotover, which forwards it to all nodes.

Drivers typically send a PREPARE statement to one node. If the driver then sends an EXECUTE statement to a node that doesn't have the query prepared, it re-prepares the query on that node before sending the EXECUTE statement again. The problem with this on Shotover is that we don't know which node returned the unprepared error, so we don't know which node the re-prepared statement should go to. Preparing on all nodes avoids this. If a node is added later and returns an unprepared error, the driver will re-prepare the statement and Shotover will forward that to all nodes.

Cache the metadata from the PREPARED result of a PREPARE statement

We need to store this information to calculate the routing key for subsequent EXECUTE statements. The id field of the prepared metadata is used as the cache key.
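
A cache of this kind might look like the sketch below. The types `PreparedMetadata` and `PreparedCache` and their fields are hypothetical, chosen only to show the idea of keying by the prepared id bytes. The real metadata in this PR likely carries more than the keyspace and partition key indexes assumed here.

```rust
use std::collections::HashMap;

/// Hypothetical subset of the PREPARED result metadata needed for routing.
#[derive(Debug, Clone, PartialEq)]
struct PreparedMetadata {
    keyspace: String,
    /// Positions of the bound values that form the partition key.
    pk_indexes: Vec<usize>,
}

/// Hypothetical cache keyed by the prepared id bytes.
#[derive(Default)]
struct PreparedCache {
    entries: HashMap<Vec<u8>, PreparedMetadata>,
}

impl PreparedCache {
    /// Store the metadata when a PREPARED result passes through.
    fn insert(&mut self, id: Vec<u8>, metadata: PreparedMetadata) {
        self.entries.insert(id, metadata);
    }

    /// Look up the metadata when an EXECUTE arrives.
    /// None means nothing is cached for this id.
    fn get(&self, id: &[u8]) -> Option<&PreparedMetadata> {
        self.entries.get(id)
    }
}
```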

Perform token aware routing on EXECUTE statements

Using the prepared metadata, calculate the routing key and then the Murmur3 token. If no replica exists for the token, fall back to random routing.
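
The selection step with its fallback can be sketched as follows. `Node`, `select_node`, and the `random` parameter are hypothetical names for this example. The caller is assumed to supply `random` from whatever random source it already uses, which keeps the sketch free of dependencies.

```rust
/// Hypothetical node record; host_id is a u128 standing in for a UUID.
#[derive(Debug, Clone, PartialEq)]
struct Node {
    host_id: u128,
    address: &'static str,
}

/// Picks the replica for the token when one is known and present in the
/// node list, otherwise falls back to the node at `random % nodes.len()`.
/// Returns None only when the node list is empty.
fn select_node<'a>(nodes: &'a [Node], replica: Option<u128>, random: usize) -> Option<&'a Node> {
    if nodes.is_empty() {
        return None;
    }
    replica
        .and_then(|id| nodes.iter().find(|node| node.host_id == id))
        .or_else(|| nodes.get(random % nodes.len()))
}
```

Treating a replica that is missing from the node list the same as an unknown replica means a stale token map degrades to random routing instead of failing the request.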

Followup

  • Address the TODOs
  • Some easy performance wins can be made, mostly around avoiding calling the frame method on every message and instead checking the opcode bytes directly first.
  • We should be performing round robin load balancing instead of random.
  • Create new connections concurrently for prepared statements
  • Share prepared metadata across connections
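
The opcode check mentioned in the follow-up list could look something like this. It assumes the raw bytes start at a Cassandra native protocol v3+ frame header, where the opcode is the fifth byte (after version, flags, and a 2-byte stream id). The function names are hypothetical and this is only a sketch of the idea, not the planned implementation.

```rust
/// Opcode values from the Cassandra native protocol.
const OPCODE_PREPARE: u8 = 0x09;
const OPCODE_EXECUTE: u8 = 0x0A;

/// Byte offset of the opcode in a v3+ frame header:
/// version (1 byte), flags (1 byte), stream id (2 bytes), opcode (1 byte).
const OPCODE_OFFSET: usize = 4;

/// Reads the opcode without parsing the rest of the frame.
/// Returns None when the buffer is too short to contain one.
fn peek_opcode(raw_frame: &[u8]) -> Option<u8> {
    raw_frame.get(OPCODE_OFFSET).copied()
}

/// Only PREPARE and EXECUTE requests need a full parse for routing here.
fn needs_full_parse(raw_frame: &[u8]) -> bool {
    matches!(
        peek_opcode(raw_frame),
        Some(OPCODE_PREPARE) | Some(OPCODE_EXECUTE)
    )
}
```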

@benbromhead benbromhead left a comment

Looking good so far mate, def need to think a bit more about how we make sure queries are prepped though

@rukai rukai left a comment

I haven't thoroughly reviewed the copy/pasted code because the plan is to upstream most of it back to cassandra-protocol.
But the rest all looks good to me!

@shotover shotover deleted a comment from github-actions bot Oct 5, 2022

@benbromhead benbromhead left a comment

Nice one mate. This is an important bit of functionality!

We have a bit to clean up and work on, but just wanted to say great work!

@rukai rukai enabled auto-merge (squash) October 5, 2022 02:33
@rukai rukai merged commit 9386f81 into shotover:main Oct 5, 2022
@conorbros conorbros deleted the token-aware-routing branch October 5, 2022 02:44
3 participants