Permalink
Browse files

Cap nextIndex to not exceed one past the follower's last log index

This optimizes setting nextIndex on leaders by capping it to just past
the follower's last log index. It helps with followers that are new or
have fallen far behind, speeding up their progress.

I implemented this because I got tired of waiting for the nextIndex to
decrement by one during testing, not because we ran into any particular
issue.
  • Loading branch information...
ongardie committed Jun 15, 2015
1 parent be018cf commit fcbacbb3cc46c280dbff421df4e7223210b9a5c7
Showing with 41 additions and 3 deletions.
  1. +2 −2 Protocol/Raft.proto
  2. +3 −0 RELEASES.md
  3. +9 −0 Server/RaftConsensus.cc
  4. +1 −1 Server/RaftConsensus.h
  5. +26 −0 Server/RaftConsensusTest.cc
@@ -231,8 +231,8 @@ message AppendEntries {
required bool success = 2;
/**
* The recipient's last log index (after it's applied this RPC's
* changes to the log), which might be useful later to speed up finding
* where the leader and a behind follower have diverged.
* changes to the log). This is used to speed up finding the correct
* value for nextIndex with a follower that is far behind the leader.
*/
optional uint64 last_log_index = 3;

@@ -18,6 +18,9 @@ Version 1.0.1-alpha.0 (In Development)
Improvements:
- Adds gcc 5.1 (which required no changes; issue #141) and clang 3.4, 3.5, 3.6,
and 3.7 (issue #9) as supported compilers.
- Optimizes setting nextIndex on leaders by capping it to just past the
follower's last log index. This helps with followers that are new or have
fallen far behind.

Bug fixes (high severity):
- Fixes packaging up very large AppendEntries requests (issue #160). Before, it
@@ -2333,6 +2333,15 @@ RaftConsensus::appendEntries(std::unique_lock<Mutex>& lockGuard,
} else {
if (peer.nextIndex > 1)
--peer.nextIndex;
// A server that hasn't been around for a while might have a much
// shorter log than ours. The AppendEntries reply contains the
// index of its last log entry, and there's no reason for us to
// set nextIndex to be more than 1 past that (that would leave a
// gap, so it will always be rejected).
if (response.has_last_log_index() &&
peer.nextIndex > response.last_log_index() + 1) {
peer.nextIndex = response.last_log_index() + 1;
}
}
}
if (response.has_server_capabilities()) {
@@ -381,7 +381,7 @@ class Peer : public Server {

/**
* The index of the next entry to send to the follower. Only used when
* leader.
* leader. Minimum value of 1.
*/
uint64_t nextIndex;

@@ -2364,6 +2364,32 @@ TEST_F(ServerRaftConsensusPATest, appendEntries_ok)
// TODO(ongaro): test catchup code
}

TEST_F(ServerRaftConsensusPATest, appendEntries_mismatch)
{
// if the follower's log is too short, need to decrement nextIndex
std::unique_lock<Mutex> lockGuard(consensus->mutex);

// decrementing by one
peer->nextIndex = 5;
request.set_prev_log_index(4);
request.set_prev_log_term(6);
request.clear_entries();
response.set_success(false);
response.set_last_log_index(300);
peerService->reply(Protocol::Raft::OpCode::APPEND_ENTRIES,
request, response);
consensus->appendEntries(lockGuard, *peer);
EXPECT_EQ(4U, peer->nextIndex);

// capping to last log index + 1
peer->nextIndex = 5;
response.set_last_log_index(0);
peerService->reply(Protocol::Raft::OpCode::APPEND_ENTRIES,
request, response);
consensus->appendEntries(lockGuard, *peer);
EXPECT_EQ(1U, peer->nextIndex);
}

TEST_F(ServerRaftConsensusPATest, appendEntries_serverCapabilities)
{
auto& cap = *response.mutable_server_capabilities();

0 comments on commit fcbacbb

Please sign in to comment.