Fix raft node getting stuck in candidate state #2418

Merged
merged 5 commits into from
Apr 24, 2015

@jwilder jwilder commented Apr 24, 2015

This PR fixes an issue where a raft peer gets stuck in candidate state and never increments its index. This happens sporadically after an election. The root issue appears to be that a node in candidate state should return to follower state once it starts receiving heartbeats from a new leader. The node was not returning to follower state, which left it inconsistent with the rest of the cluster.
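
The intended behaviour can be sketched as a small state function. This is an illustration only, not the code from this PR: `State`, `Heartbeat`, `node`, and `onHeartbeat` are hypothetical names, and the step-down rule (a candidate that sees a heartbeat at its own term or higher becomes a follower) is taken from the general Raft description rather than from this codebase.

```go
package main

import "fmt"

// State is a hypothetical enum for the three Raft roles.
type State int

const (
	Follower State = iota
	Candidate
	Leader
)

func (s State) String() string {
	return [...]string{"follower", "candidate", "leader"}[s]
}

// Heartbeat is a hypothetical, stripped-down leader heartbeat.
type Heartbeat struct {
	Term     uint64
	LeaderID uint64
}

// node holds only the fields needed for this illustration.
type node struct {
	state    State
	term     uint64
	leaderID uint64
}

// onHeartbeat applies a heartbeat and returns the resulting state.
// Heartbeats from an older term are ignored. Otherwise the sender is
// accepted as leader and the node becomes (or stays) a follower.
func (n *node) onHeartbeat(hb Heartbeat) State {
	if hb.Term < n.term {
		return n.state // stale leader, nothing changes
	}
	if n.state == Leader && hb.Term == n.term {
		// Two leaders in one term should not occur; left unchanged here.
		return n.state
	}
	n.term = hb.Term
	n.leaderID = hb.LeaderID
	n.state = Follower
	return n.state
}

func main() {
	n := &node{state: Candidate, term: 3}
	fmt.Println(n.onHeartbeat(Heartbeat{Term: 2, LeaderID: 1})) // candidate
	fmt.Println(n.onHeartbeat(Heartbeat{Term: 3, LeaderID: 2})) // follower
}
```

The second call is the case described above: the candidate sees a heartbeat from a leader at its own term and steps down instead of continuing its election.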

In addition to this fix, when running multiple nodes locally w/ raft tracing, it's very difficult
to determine which node is logging. This adds the node's state (leader, follower, candidate) and
id to all the log messages so we can trace each node's state more easily.

Also fixes test output alignment when there is a failure in an integration test.

When running multiple nodes locally w/ raft tracing, it's very difficult
to determine which node is logging. This adds the node's state (leader, follower, candidate) and
id to all the log messages so we can trace each node's state more easily.
@jwilder jwilder changed the title from "Raft logging id/state" to "Fix raft node getting stuck in candidate state" Apr 24, 2015
jwilder added a commit that referenced this pull request Apr 24, 2015
	}
}

func (l *Log) printf(msg string, v ...interface{}) {
	l.Logger.Printf(fmt.Sprintf("%s[%d]: ", l.state, l.id)+msg+"\n", v...)
}

A worthy change. I was doing something similar when debugging myself.

During an election, a node can sometimes get stuck in candidate state,
causing it to never read from the new leader.  This would prevent
it from incrementing its index and staying consistent w/ the
leader.
otoolep commented Apr 24, 2015

+1 on green build.

jwilder added a commit that referenced this pull request Apr 24, 2015
Fix raft node getting stuck in candidate state
@jwilder jwilder merged commit 2d2c806 into master Apr 24, 2015
@jwilder jwilder deleted the jw-int-tests branch April 24, 2015 22:18
otoolep commented Apr 24, 2015

@benbjohnson -- please double-check this change.

otoolep added a commit that referenced this pull request Apr 24, 2015
This may have been fixed by PR #2418.
l.leaderID = hb.leaderID
l.unlock()
return Follower

Shouldn't this demotion occur from the message on the l.terms channel received below?

@benbjohnson -- yes, that looks correct. There should be no need to return follower here, since the case statement below should be triggered by the signal sent by mustSetTermIfHigher.
