feat: Add resilience in case the cluster falls below 1 node #53

Closed
narvikd wants to merge 3 commits from feat/recover_from_just_one_node

narvikd (Owner) commented Feb 27, 2023

Even though going below 1 leader and 1 node isn't supported, it can happen.
In that case the node/cluster stays stuck, without observables (to prevent serious side effects), until both nodes are back online.

That state may never come, which is why the node should try to join an existing leader if there is already one.
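
A minimal sketch of the "join an existing leader if there is one" idea, in Go. This is not the code from this PR: the `Peer` type, the `findLeader` helper and the addresses are hypothetical names made up for illustration, and the actual join request is left out.

```go
package main

import (
	"errors"
	"fmt"
)

// Peer is a hypothetical view of another node in the cluster.
// It is not a type from this PR or from hashicorp/raft.
type Peer struct {
	Addr     string
	Online   bool
	IsLeader bool
}

// ErrNoLeader is returned when no reachable peer reports itself as leader.
var ErrNoLeader = errors.New("no reachable leader")

// findLeader returns the address of the first online peer that claims leadership.
func findLeader(peers []Peer) (string, error) {
	for _, p := range peers {
		if p.Online && p.IsLeader {
			return p.Addr, nil
		}
	}
	return "", ErrNoLeader
}

func main() {
	// Hypothetical peer list: one node is down, one is up and leading.
	peers := []Peer{
		{Addr: "node2:3001", Online: false},
		{Addr: "node3:3001", Online: true, IsLeader: true},
	}

	addr, err := findLeader(peers)
	if err != nil {
		// No leader reachable: stay stuck rather than guess.
		fmt.Println("staying put:", err)
		return
	}
	fmt.Println("would ask to join leader at", addr)
}
```

The stuck node only acts when some reachable peer already claims leadership; otherwise it keeps waiting, as before.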

Signed-off-by: narvikd <84069271+narvikd@users.noreply.github.com>
narvikd (Owner, Author) commented Feb 27, 2023

Merging this PR can cause the following:

When a node is stuck in the Candidate state, it keeps triggering elections, and each election increments the term.
This can lead to a situation where an offline node holds the most current state of the system and the sick node doesn't.
When the offline node comes back online, it is forced to join the sick node, losing information in the process.
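
The scenario above can be illustrated with a toy model in Go. It is a deliberately simplified sketch, not real Raft and not code from this PR: the `node` type, `runFailedElections` and `forceJoin` are hypothetical, and it assumes the returning node adopts the sick node's state wholesale, as described above.

```go
package main

import "fmt"

// node is a toy model of a cluster member: a term counter and a log of entries.
type node struct {
	name string
	term uint64
	log  []string
}

// runFailedElections models a candidate that keeps timing out:
// every attempt bumps its term and changes nothing else.
func (n *node) runFailedElections(attempts int) {
	for i := 0; i < attempts; i++ {
		n.term++
	}
}

// forceJoin models the risky path: the returning node adopts the target's
// term and log. It returns the entries the returning node had beyond the
// target's log (assuming the target's log is a prefix of it), which are dropped.
func forceJoin(returning, target *node) []string {
	var lost []string
	if len(returning.log) > len(target.log) {
		lost = append(lost, returning.log[len(target.log):]...)
	}
	returning.term = target.term
	returning.log = append([]string(nil), target.log...)
	return lost
}

func main() {
	sick := &node{name: "sick", term: 3, log: []string{"set a=1"}}
	offline := &node{name: "offline", term: 3, log: []string{"set a=1", "set b=2"}}

	// The sick node sits alone in Candidate state and inflates its term.
	sick.runFailedElections(5)

	// The offline node returns with newer data but is made to follow the sick node.
	lost := forceJoin(offline, sick)
	fmt.Println("sick term:", sick.term, "entries lost:", lost)
}
```

The node with the higher term is not necessarily the node with the newest data, which is why the forced join is lossy in this model.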

narvikd (Owner, Author) commented Feb 27, 2023

If a leader is found, the only correct way to prevent rewrites is a reinstall:
hashicorp/raft#530
hashicorp/raft#477
Not relevant but interesting: hashicorp/raft#525

Signed-off-by: narvikd <84069271+narvikd@users.noreply.github.com>
narvikd (Owner, Author) commented Feb 27, 2023

It seems to work, but it is very dangerous:
If nodes are missing because the whole cluster crashed, the situation should be reviewed manually to prevent data loss.

narvikd added the wontfix ("This will not be worked on") label Feb 27, 2023
narvikd (Owner, Author) commented Feb 27, 2023

Will not fix.

narvikd closed this Feb 27, 2023
narvikd deleted the feat/recover_from_just_one_node branch February 28, 2023 09:52