
a NonVoter node should never be able to transition to a Candidate state #483

Merged
merged 4 commits into main from nonvoter_candidate on Jan 5, 2022

Conversation

dhiaayachi (Contributor)

This behaviour was observed in Consul for read-replica nodes (which are Raft non-voter nodes). When the cluster is not stable (during a leader transition) it is possible for a non-voter node to step up as leader, which is not supposed to happen.

@mkeeler and I were able to reproduce the behaviour of a non-voter node transitioning to the Candidate state, which is an indication that this node could possibly win an election. The test performs the following steps:

  1. Create a cluster with 3 voter nodes and 1 non-voter node, and wait for stability.
  2. Remove one follower node to trigger a configuration update.
  3. After the leader sends an appendEntries log to the non-voter to update the latest configuration, and before it sends the appendEntries that commits the new configuration, partition the leader.
  4. Check whether the non-voter transitions to the Candidate state.

Unfortunately step 3 is very time sensitive, and we were not able to come up with a test that consistently reproduces the issue, but we were able to confirm that after the fix in this PR the issue could no longer be reproduced.
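The idea behind the fix can be sketched as a suffrage check on the follower's election-timeout path. The types below (`Suffrage`, `node`, `onHeartbeatTimeout`) are simplified stand-ins for illustration, not the actual hashicorp/raft code:

```go
package main

import "fmt"

// Suffrage mirrors the concept of a server's voting rights in Raft.
type Suffrage int

const (
	Voter Suffrage = iota
	Nonvoter
)

// node is a hypothetical, minimal model of a raft server's state.
type node struct {
	suffrage Suffrage
	state    string // "Follower" or "Candidate"
}

// onHeartbeatTimeout models the follower loop's election-timeout path:
// only a voter is allowed to step up to Candidate.
func (n *node) onHeartbeatTimeout() {
	if n.suffrage != Voter {
		// A non-voter must never campaign; remain a follower.
		return
	}
	n.state = "Candidate"
}

func main() {
	nv := &node{suffrage: Nonvoter, state: "Follower"}
	nv.onHeartbeatTimeout()
	fmt.Println(nv.state) // Follower

	v := &node{suffrage: Voter, state: "Follower"}
	v.onHeartbeatTimeout()
	fmt.Println(v.state) // Candidate
}
```

The key point is that the guard lives inside the follower loop itself, so it holds even during the configuration-change window described in step 3 above.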

@banks (Member) left a comment:

I think this looks good. It's a shame it's so hard to validate deterministically with a unit test.

This is probably another case for our "one day" refactor of this library to make it easier to mock and drive in a controlled way in unit tests. I don't think we should block this fix on that though as it seems pretty clear that it's never right for a non-voter to enter Candidate state.

@dhiaayachi (Contributor, Author):

> I think this looks good. It's a shame it's so hard to validate deterministically with a unit test.
>
> This is probably another case for our "one day" refactor of this library to make it easier to mock and drive in a controlled way in unit tests. I don't think we should block this fix on that though as it seems pretty clear that it's never right for a non-voter to enter Candidate state.

@banks, reading your comment about testing, I had the idea to test this at a more unit level, to at least avoid a future regression and to have the transition out of the follower state covered by a test. Let me know what you think.

go env1.raft.runFollower()

// Wait long enough for HeartbeatTimeout to elapse.
time.Sleep(tt.fields.conf.HeartbeatTimeout * 3)
banks (Member):

I like having a test for this. I'm not sure how reliable this will be over time, given that it's concurrent and timing dependent. Perhaps it's worth re-running this in CI a few times and seeing if it fails? We could increase the multiplier here if it does, I guess.

dhiaayachi (Contributor, Author):

I ran it locally a few times; I will try it on CI. The wait is deterministic because the follower loop runs after HeartbeatTimeout, and there is no concurrency other than the test and the runFollower loop (because we avoid starting the other raft goroutines), so the state is in a controlled environment.

banks (Member):

Sounds low risk then. In the past we've found that even "safe" waits like this can eventually cause flakes in highly CPU-starved CI environments, where t.Parallel means many more tests are running than there are real CPU cores available, so scheduling delays can last hundreds of milliseconds. If we also avoid running in parallel, that probably helps too.

For now it seems OK; I just wanted to note that, based on previous tests that have ended up flaky, even 3 * heartbeat might not be enough in the long run!

dhiaayachi (Contributor, Author):

Yes, in Consul we have a lot of those. I will retry the CI a couple of times before merging to see how it holds up.

@dhiaayachi dhiaayachi merged commit 656e6c0 into main Jan 5, 2022
@dhiaayachi dhiaayachi deleted the nonvoter_candidate branch January 5, 2022 20:15
rboyer added a commit that referenced this pull request Mar 1, 2022
rboyer added a commit that referenced this pull request Mar 1, 2022
lgfa29 added a commit to hashicorp/nomad that referenced this pull request Aug 30, 2022
PR #12130 refactored the test to use the `wantPeers` helper, but this
function only returns the number of voting peers, which in this test
should be equal to 2.

I think the tests were passing back then because of a bug in Raft
(hashicorp/raft#483) where a non-voting server
was able to transition to candidate state.

One piece of possible evidence is that a successful test run would contain
the following log line:

```
raft@v1.3.5/raft.go:1058: nomad.raft: updating configuration: command=AddVoter server-id=127.0.0.1:9101 server-addr=127.0.0.1:9101 servers="[{Suffrage:Voter ID:127.0.0.1:9107 Address:127.0.0.1:9107} {Suffrage:Voter ID:127.0.0.1:9105 Address:127.0.0.1:9105} {Suffrage:Voter ID:127.0.0.1:9103 Address:127.0.0.1:9103} {Suffrage:Voter ID:127.0.0.1:9101 Address:127.0.0.1:9101}]"
```

This commit reverts the test logic to check for peer count, regardless
of voting status.
lgfa29 added a second commit to hashicorp/nomad that referenced this pull request Aug 30, 2022, with the same commit message.