Avoid routing to cluster replicas that are loading data #1923

Closed
mp911de opened this issue Nov 29, 2021 · 5 comments
Labels: for: team-attention (An issue we need to discuss as a team to make progress), type: enhancement (A general enhancement)

Collaborator

mp911de commented Nov 29, 2021

Nodes that have joined the cluster and are still loading data should not receive any traffic until they are fully replicated. As per @madolson, if a node's master_repl_offset is 0, we should skip that node during command routing.

mp911de added the type: enhancement label Nov 29, 2021
mp911de added the for: team-attention label Jan 7, 2022
Collaborator Author

mp911de commented Jan 7, 2022

Removing replicas based on the repl-offset will also remove replicas that cannot contact the master/upstream node, even though the node is otherwise functional. That is quite a significant change compared to leniently accepting all replicas.


jhmartin commented Jan 7, 2022

Is there a middle ground? Accept nodes that have /ever/ replicated successfully, and ignore nodes that have /never/ replicated?


madolson commented Jan 9, 2022

Starting in 6.2, repl-offset is preserved until the point the data is flushed, at which point the node can no longer usefully serve traffic (except for pubsub, but I would posit we don't care about that here).

Collaborator Author

mp911de commented Jan 10, 2022

For the time being, we have a switch within the routing mechanism that avoids command routing to replicas without data. If it turns out to be too invasive, we can move the check into the individual ReadFrom strategies so that applications can control whether they want to talk to replicas without data.
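
The routing-level switch described above could look roughly like the following. This is only a sketch under stated assumptions: `ReplicaRoutingFilter`, its nested `Node` type and the `skipReplicasWithoutData` flag are hypothetical names made up for illustration and are not the library's actual types or API. It only shows the idea of dropping replicas whose replication offset is 0 from the read candidates while leaving upstream nodes untouched.

```java
import java.util.ArrayList;
import java.util.List;

public final class ReplicaRoutingFilter {

    /** Hypothetical node description used for illustration only. */
    public static final class Node {
        final String id;
        final boolean replica;
        final long replOffset;

        public Node(String id, boolean replica, long replOffset) {
            this.id = id;
            this.replica = replica;
            this.replOffset = replOffset;
        }

        public String getId() {
            return id;
        }
    }

    // Hypothetical switch: when false, all replicas are accepted leniently.
    private final boolean skipReplicasWithoutData;

    public ReplicaRoutingFilter(boolean skipReplicasWithoutData) {
        this.skipReplicasWithoutData = skipReplicasWithoutData;
    }

    /** Returns the nodes that remain eligible for read routing. */
    public List<Node> readCandidates(List<Node> nodes) {
        List<Node> result = new ArrayList<>();
        for (Node node : nodes) {
            // A replica with offset 0 has not successfully replicated yet.
            boolean withoutData = node.replica && node.replOffset == 0;
            if (skipReplicasWithoutData && withoutData) {
                continue;
            }
            result.add(node);
        }
        return result;
    }
}
```

Moving the same check into a per-strategy hook instead of a global flag would let each application decide for itself, which is the alternative mentioned in the comment above.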

mp911de added this to the 6.1.6 milestone Jan 10, 2022
Collaborator Author

mp911de commented Jan 10, 2022

Refined the description: we're using INFO REPLICATION and the master_repl_offset field to determine the repl-offset.
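
As an illustration of reading that field, here is a minimal sketch that extracts `master_repl_offset` from the text reply of `INFO REPLICATION`. The class and method names (`ReplicationInfoParser`, `parseMasterReplOffset`) are hypothetical and are not taken from the actual commit. The sketch assumes the usual `key:value` line format of INFO output.

```java
import java.util.OptionalLong;

public final class ReplicationInfoParser {

    private static final String FIELD = "master_repl_offset:";

    private ReplicationInfoParser() {
    }

    /**
     * Extracts master_repl_offset from the text returned by INFO REPLICATION.
     * Returns an empty OptionalLong if the field is missing or not numeric.
     */
    public static OptionalLong parseMasterReplOffset(String info) {
        // INFO output consists of "key:value" lines separated by CRLF (or LF).
        for (String line : info.split("\\r?\\n")) {
            String trimmed = line.trim();
            if (trimmed.startsWith(FIELD)) {
                String value = trimmed.substring(FIELD.length()).trim();
                try {
                    return OptionalLong.of(Long.parseLong(value));
                } catch (NumberFormatException e) {
                    return OptionalLong.empty();
                }
            }
        }
        return OptionalLong.empty();
    }
}
```

A caller could treat an offset of 0 on a replica as "has not replicated yet"; how to handle a missing field is left open here.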

mp911de added a commit that referenced this issue Jan 10, 2022
We no longer route traffic to read replicas that have not yet replicated successfully (master_repl_offset = 0), to avoid LOADING error replies. To determine the replication offset, we query each node for its master_repl_offset upon obtaining the topology.
mp911de closed this as completed Jan 10, 2022