Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add external state support #18

Merged
merged 4 commits into from
Jul 12, 2017
Merged

Add external state support #18

merged 4 commits into from
Jul 12, 2017

Conversation

tylertreat
Copy link
Contributor

Add support for external state (log) to influence leader voting. This,
in effect, implements lastLogIndex and lastLogTerm sent on RequestVote
RPCs from the Raft paper. This works by exposing two callbacks: one that
calls into the user on RequestVote to get the candidate's state and one
that calls into the user upon receiving a RequestVote to determine if a
vote should be granted based on comparing the logs.

From Raft:

  1. Reply false if term < currentTerm (§5.1)
  2. If votedFor is null or candidateId, and candidate’s log is at
    least as up-to-date as receiver’s log, grant vote (§5.2, §5.4)

@nats-io/core

Add support for external state (log) to influence leader voting. This,
in effect, implements lastLogIndex and lastLogTerm sent on RequestVote
RPCs from the Raft paper. This works by exposing two callbacks: one that
calls into the user on RequestVote to get the candidate's state and one
that calls into the user upon receiving a RequestVote to determine if a
vote should be granted based on comparing the logs.

From Raft:
1. Reply false if term < currentTerm (§5.1)
2. If votedFor is null or candidateId, and candidate’s log is at
least as up-to-date as receiver’s log, grant vote (§5.2, §5.4)
@coveralls
Copy link

Coverage Status

Coverage decreased (-0.1%) to 90.765% when pulling 433585a on log_state into 4629e3c on master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.1%) to 90.765% when pulling 23bf473 on log_state into 4629e3c on master.

Copy link
Member

@kozlovic kozlovic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Marking as requesting changes, but could be approve if we don't find a better name for the new handler/APIs.

chan_handler.go Outdated

package graft

// ChanHandler is a convenience handler when a user wants to simply use
// channels for the async handling of errors and state changes.
type ChanHandler struct {
LogPositionHandler
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apply to everything else in the PR: Is Log a good choice here? It may imply log replication, which as you know is not done in graft. Would something about vote or state be more appropriate?

Copy link
Contributor Author

@tylertreat tylertreat Jul 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, the name might be confusing. I decided against "state" because it was also confusing. There are already a lot of references to "state" in graft with respect to node state changes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wish I could think of something better, but can't. There isn't full RAFT spec log replication, but the value passed is actually the position in the log in the context of leadership election. If not a name change, I think a comment to that effect might help clarify.

handler_test.go Outdated
@@ -43,7 +58,8 @@ func TestStateChangeHandler(t *testing.T) {
// Use ChanHandler
scCh := make(chan StateChange)
errCh := make(chan error)
chHand := NewChanHandler(scCh, errCh)
lpHandler := &logPositionHandler{}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you don't use lpHandler elsewhere, maybe you could pass &logPositionHandler{} directly in the NewChanHandler() call. True for 2 other occurrences in this test file.

node.go Outdated
@@ -78,8 +80,24 @@ type ClusterInfo struct {
Size int
}

// LogPositionHandler is used to interrogate the state of the log.
type LogPositionHandler interface {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sold on the name. Again, Log may not be the best choice here, but not sure what to call it yet.

n.rpc.SendVoteResponse(vreq.Candidate, deny)
return false
}

// Save state flag
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that the test above is vreq.Term < n.term and saveState was previously set to true if vreq.Term > n.term, which means that if they were equal, we would not save the state. You are changing that. Is it intentional?

Copy link
Contributor Author

@tylertreat tylertreat Jul 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we send a vote response, the node's state changes. Before, this only happened if the term was higher, but now it also happens if term is equal but "log" is higher. My test was failing because the node state wasn't being persisted in this case. I guess we can still avoid saving if term is equal and the "log" is equal?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure. The point was to make sure that the change was intentional, which I understand it is.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If term is equal and log is equal, we can avoid saving, but IMO just keep it simple and safe. How often are we expecting this to happen?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ColinSullivan1 I agree, I don't think the optimization is worth the added complexity.

chan_handler.go Outdated
@@ -20,10 +22,11 @@ type StateChange struct {
To State
}

func NewChanHandler(scCh chan<- StateChange, errCh chan<- error) *ChanHandler {
func NewChanHandler(logHandler LogPositionHandler, scCh chan<- StateChange, errCh chan<- error) *ChanHandler {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you are here, can you comment this API?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about it, what would you think about a new creation API, or a way to set the LogPositionHandler? A default handler could provide today's behavior, and we'd keep backward compatibility with existing users.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If possible and not too awkward, that would be best indeed. If not using new API, old behavior. Internally, it means that we would not invoke the callback.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or the default handler callback can treat it as equal log positions (grant).

Copy link

@petemiron petemiron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comment on name/logIndex size. Still need to run through tests.

handler_test.go Outdated
@@ -12,6 +13,20 @@ import (
"github.com/nats-io/graft/pb"
)

type logPositionHandler struct {
logIndex uint32

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we just use currentSequence instead of logIndex? Also, should this be at least a uint64?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this is a test handler.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.3%) to 90.625% when pulling 6692c9d on log_state into 4629e3c on master.

Copy link

@petemiron petemiron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Tested and reviewed code.

return true
}
// Write our state.
if err := n.writeState(); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know a lot has gone into this. I wonder if writing the state should be the responsibility of the state machine interface implementor. wdyt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is writing state needed for leadership election, so seems like it should be graft's responsibility?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do see your point, but was thinking that users may want a way to persist state themselves.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could provide a hook that calls out to user code (via the StateMachineHandler) to signal them to persist. I think that could be added later though.

Copy link
Member

@ColinSullivan1 ColinSullivan1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, tests passed for me. I left a comment regarding writing state - IMO that could be addressed here or in another PR.

@tylertreat tylertreat merged commit ab0519d into master Jul 12, 2017
@tylertreat tylertreat deleted the log_state branch July 12, 2017 18:28
@derekcollison
Copy link
Member

We should be looking at IO performance at some point two, for each shared state update we should be doing one write IMO, not multiple.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants