Simple anti-entropy mechanism #24
Conversation
In order for FastPaxos to also have its own set of Settings, MembershipService needs to be passed a common Settings class that implements both FastPaxos.ISettings and MembershipService.ISettings.
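A minimal sketch of what such a shared settings class could look like, assuming the nested ISettings interfaces named above; the individual getters are illustrative placeholders and not the actual interface methods:

```java
import java.time.Duration;

// Hypothetical sketch: one settings object satisfying both interfaces.
// FastPaxos.ISettings and MembershipService.ISettings are the names from the
// discussion; the getters below are assumed for illustration only.
public final class SharedSettings implements FastPaxos.ISettings, MembershipService.ISettings {
    private final Duration consensusFallbackTimeout;  // assumed FastPaxos setting
    private final Duration failureDetectorInterval;   // assumed MembershipService setting

    public SharedSettings(final Duration consensusFallbackTimeout,
                          final Duration failureDetectorInterval) {
        this.consensusFallbackTimeout = consensusFallbackTimeout;
        this.failureDetectorInterval = failureDetectorInterval;
    }

    @Override
    public Duration getConsensusFallbackTimeout() {   // assumed FastPaxos.ISettings getter
        return consensusFallbackTimeout;
    }

    @Override
    public Duration getFailureDetectorInterval() {    // assumed MembershipService.ISettings getter
        return failureDetectorInterval;
    }
}
```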
It can happen that a node misses some of the consensus messages whilst still being able to send out its own vote (unidirectional network partition, message overload, ...). In this case, the rest of the group sees the node as part of the group and the monitoring mechanism keeps working as expected, but the stale node keeps running an old configuration. In order to enforce consistency in this case, the following new anti-entropy mechanism is introduced:
- each node maintains a set of the configurations it has been part of
- probe messages now contain the configuration ID of the observer
- when a node receives a probe message with a configuration ID it does not know, it starts a background task to check again after a configurable timeout (1 minute by default)
- if the configuration ID is still unknown once the timeout has elapsed, the node leaves (using the LEAVE protocol)
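A rough sketch of the receiving side of this mechanism, purely as an illustration; names such as StaleConfigurationDetector and leaveGroup() are made up here and do not reflect the actual implementation:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class StaleConfigurationDetector {
    // Configuration IDs this node has been part of.
    private final Set<Long> knownConfigurationIds = ConcurrentHashMap.newKeySet();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private static final long RECHECK_TIMEOUT_SECONDS = 60; // 1 minute by default

    void onConfigurationChange(final long newConfigurationId) {
        knownConfigurationIds.add(newConfigurationId);
    }

    // Called with the configuration ID carried by an incoming probe message.
    void onProbe(final long observerConfigurationId) {
        if (knownConfigurationIds.contains(observerConfigurationId)) {
            return; // up to date with respect to this observer
        }
        // Unknown configuration: re-check after the timeout before acting.
        scheduler.schedule(() -> {
            if (!knownConfigurationIds.contains(observerConfigurationId)) {
                leaveGroup();
            }
        }, RECHECK_TIMEOUT_SECONDS, TimeUnit.SECONDS);
    }

    private void leaveGroup() {
        // Placeholder: would trigger the LEAVE protocol in the real service.
    }
}
```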
Thanks for the patch @manuelbernhardt. If a node is receiving probe messages but with a different configuration ID, it means it is already part of the membership. It might be better for the node to simply learn how to get to the current configuration ID rather than leave. What we need to maintain at nodes is a log of membership changes and the corresponding configuration IDs (there was a time when I did maintain such a log in MembershipService for this precise use case, but I removed it because I wasn't using it).

A node X that is part of a membership but has not heard enough votes from the previous consensus round can safely learn the new value of the membership by asking any one of the nodes Y in the new round: if Y actually moved to the new round, then a Fast Paxos quorum of nodes approved that move, so the value is guaranteed to be safe. The nodes X is receiving probe messages from are good candidates to ask. In doing so, we avoid unnecessary removals. As far as the state machine is concerned, the protocol I just described behaves like simply receiving the missing messages.

Another concern is to make sure this works even when using custom failure detectors. Thoughts?
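One possible shape for such a log, as a hedged sketch only; ConfigurationChange and changesSince() are invented names used to illustrate how node Y could answer a catch-up request from node X:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class MembershipChangeLog {
    // One entry per consensus round, oldest first (illustrative structure).
    record ConfigurationChange(long configurationId, List<String> joined, List<String> left) {}

    private final Deque<ConfigurationChange> log = new ArrayDeque<>();

    synchronized void append(final ConfigurationChange change) {
        log.addLast(change);
    }

    // Node Y answers node X's catch-up request: every change that happened
    // after the configuration ID that X last knew about.
    synchronized List<ConfigurationChange> changesSince(final long knownConfigurationId) {
        final List<ConfigurationChange> missing = new ArrayList<>();
        boolean found = false;
        for (final ConfigurationChange change : log) {
            if (found) {
                missing.add(change);
            } else if (change.configurationId() == knownConfigurationId) {
                found = true;
            }
        }
        return missing; // empty if X is already current (or behind the start of the log)
    }
}
```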
@lalithsuresh yes, I was thinking about a recovery synchronization (like the one in Lifeguard). The particular case that led me to implement this now is that I'm facing rogue nodes that exhibit what seem to be short-lived partitions towards some of the other nodes in the network. A few of those nodes are "flaky": sometimes they work normally, sometimes they stop receiving messages from some of the other nodes (but they are still able to send). Even if they could recover, I don't necessarily want these nodes around, as I don't trust them. So I'm thinking perhaps this feature could be configured to behave either one way or the other.

That said, I should've marked this branch as "work in progress": I definitely think that the synchronization should be part of it. Do you have any thoughts on how to go about asking one of the observers for the new configuration? It looks like it would need a new message type.
@lalithsuresh I implemented a synchronization mechanism, in addition to the leave behavior. Actually, I'm thinking of getting rid of the latter, since offering both feels like too many possibilities. The way it works is to maintain a history of change proposals and their details (joiner metadata and UUID). It seems to be working quite nicely. It is implemented in such a way that a node can survive missing multiple rounds; I'm thinking of the case of, e.g., two quick rounds with a slow / lenient failure detector.
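To illustrate the idea (not the actual code), a lagging node could replay the recorded proposals oldest-first; Proposal, applyProposal() and the field names are assumptions based on the description above:

```java
import java.util.List;
import java.util.Map;
import java.util.UUID;

final class CatchUpApplier {
    // A recorded change proposal: its round UUID plus metadata for joining nodes.
    record Proposal(UUID roundId, List<String> endpoints, Map<String, String> joinerMetadata) {}

    private long currentConfigurationId;

    CatchUpApplier(final long currentConfigurationId) {
        this.currentConfigurationId = currentConfigurationId;
    }

    long currentConfigurationId() {
        return currentConfigurationId;
    }

    // Apply missed proposals in order, so that a node which skipped several
    // quick rounds still converges on the latest configuration.
    void catchUp(final List<Proposal> missedProposals, final long targetConfigurationId) {
        for (final Proposal proposal : missedProposals) {
            applyProposal(proposal);
        }
        currentConfigurationId = targetConfigurationId;
    }

    private void applyProposal(final Proposal proposal) {
        // Placeholder: would update the membership view and joiner metadata
        // exactly as if the consensus messages for that round had arrived.
    }
}
```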