-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add external state support #18
Conversation
Add support for external state (log) to influence leader voting. This, in effect, implements lastLogIndex and lastLogTerm sent on RequestVote RPCs from the Raft paper. This works by exposing two callbacks: one that calls into the user on RequestVote to get the candidate's state and one that calls into the user upon receiving a RequestVote to determine if a vote should be granted based on comparing the logs. From Raft: 1. Reply false if term < currentTerm (§5.1) 2. If votedFor is null or candidateId, and candidate’s log is at least as up-to-date as receiver’s log, grant vote (§5.2, §5.4)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Marking as requesting changes, but could be approve if we don't find a better name for the new handler/APIs.
chan_handler.go
Outdated
|
||
package graft | ||
|
||
// ChanHandler is a convenience handler when a user wants to simply use | ||
// channels for the async handling of errors and state changes. | ||
type ChanHandler struct { | ||
LogPositionHandler |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apply to everything else in the PR: Is Log
a good choice here? It may imply log replication, which as you know is not done in graft. Would something about vote or state be more appropriate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, the name might be confusing. I decided against "state" because it was also confusing. There are already a lot of references to "state" in graft with respect to node state changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wish I could think of something better, but can't. There isn't full RAFT spec log replication, but the value passed is actually the position in the log in the context of leadership election. If not a name change, I think a comment to that effect might help clarify.
handler_test.go
Outdated
@@ -43,7 +58,8 @@ func TestStateChangeHandler(t *testing.T) { | |||
// Use ChanHandler | |||
scCh := make(chan StateChange) | |||
errCh := make(chan error) | |||
chHand := NewChanHandler(scCh, errCh) | |||
lpHandler := &logPositionHandler{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you don't use lpHandler
elsewhere, maybe you could pass &logPositionHandler{}
directly in the NewChanHandler()
call. True for 2 other occurrences in this test file.
node.go
Outdated
@@ -78,8 +80,24 @@ type ClusterInfo struct { | |||
Size int | |||
} | |||
|
|||
// LogPositionHandler is used to interrogate the state of the log. | |||
type LogPositionHandler interface { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sold on the name. Again, Log may not be the best choice here, but not sure what to call it yet.
n.rpc.SendVoteResponse(vreq.Candidate, deny) | ||
return false | ||
} | ||
|
||
// Save state flag |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that the test above is vreq.Term < n.term
and saveState
was previously set to true
if vreq.Term > n.term
, which means that if they were equal, we would not save the state. You are changing that. Is it intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we send a vote response, the node's state changes. Before, this only happened if the term was higher, but now it also happens if term is equal but "log" is higher. My test was failing because the node state wasn't being persisted in this case. I guess we can still avoid saving if term is equal and the "log" is equal?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure. The point was to make sure that the change was intentional, which I understand it is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If term is equal and log is equal, we can avoid saving, but IMO just keep it simple and safe. How often are we expecting this to happen?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ColinSullivan1 I agree, I don't think the optimization is worth the added complexity.
chan_handler.go
Outdated
@@ -20,10 +22,11 @@ type StateChange struct { | |||
To State | |||
} | |||
|
|||
func NewChanHandler(scCh chan<- StateChange, errCh chan<- error) *ChanHandler { | |||
func NewChanHandler(logHandler LogPositionHandler, scCh chan<- StateChange, errCh chan<- error) *ChanHandler { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you are here, can you comment this API?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking about it, what would you think about a new creation API, or a way to set the LogPositionHandler? A default handler could provide today's behavior, and we'd keep backward compatibility with existing users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If possible and not too awkward, that would be best indeed. If not using new API, old behavior. Internally, it means that we would not invoke the callback.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or the default handler callback can treat it as equal log positions (grant).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comment on name/logIndex size. Still need to run through tests.
handler_test.go
Outdated
@@ -12,6 +13,20 @@ import ( | |||
"github.com/nats-io/graft/pb" | |||
) | |||
|
|||
type logPositionHandler struct { | |||
logIndex uint32 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we just use currentSequence
instead of logIndex
? Also, should this be at least a uint64
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this is a test handler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Tested and reviewed code.
return true | ||
} | ||
// Write our state. | ||
if err := n.writeState(); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know a lot has gone into this. I wonder if writing the state should be the responsibility of the state machine interface implementor. wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is writing state needed for leadership election, so seems like it should be graft's responsibility?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do see your point, but was thinking that users may want a way to persist state themselves.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could provide a hook that calls out to user code (via the StateMachineHandler
) to signal them to persist. I think that could be added later though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, tests passed for me. I left a comment regarding writing state - IMO that could be addressed here or in another PR.
We should be looking at IO performance at some point two, for each shared state update we should be doing one write IMO, not multiple. |
Add support for external state (log) to influence leader voting. This,
in effect, implements lastLogIndex and lastLogTerm sent on RequestVote
RPCs from the Raft paper. This works by exposing two callbacks: one that
calls into the user on RequestVote to get the candidate's state and one
that calls into the user upon receiving a RequestVote to determine if a
vote should be granted based on comparing the logs.
From Raft:
least as up-to-date as receiver’s log, grant vote (§5.2, §5.4)
@nats-io/core