
Limit Raft transaction MutableView depth to MAX_MUTABLE_VIEW_DEPTH.

archiecobbs committed Sep 8, 2018
1 parent fb3e3b3 commit a1eeb711279db6c86fe063241cd82a82403d0f20
@@ -35,19 +35,18 @@
// Sanity check
assert raft != null;
assert Thread.holdsLock(raft);
-assert maxIndex >= -1;
+assert maxIndex >= 0;
// Grab a snapshot of the key/value store
this.snapshot = raft.kv.snapshot();
-// Create a view of just the state machine keys and values and successively layer unapplied log entries
-// If we require a committed view, then stop when we get to the first uncomitted log entry
+// Create a view of just the state machine keys and values and successively layer unapplied log entries up to maxIndex
KVStore kview = PrefixKVStore.create(snapshot, raft.getStateMachinePrefix());
this.config = new HashMap<>(raft.log.getLastAppliedConfig());
long viewIndex = raft.log.getLastAppliedIndex();
long viewTerm = raft.log.getLastAppliedTerm();
for (LogEntry logEntry : raft.log.getUnapplied()) {
-if (maxIndex != -1 && logEntry.getIndex() > maxIndex)
+if (logEntry.getIndex() > maxIndex)
break;
final Writes writes = logEntry.getWrites();
if (!writes.isEmpty())
@@ -365,6 +365,7 @@
static final int FOLLOWER_LINGER_HEARTBEATS = 3; // how long to keep updating removed followers
static final float MAX_CLOCK_DRIFT = 0.01f; // max clock drift per heartbeat as a percentage ratio
static final int MAX_APPLIED_ENTRIES = 256; // how many already-applied log entries to keep around
+static final int MAX_MUTABLE_VIEW_DEPTH = 20; // max depth for a stack of MutableView's
// File prefixes and suffixes
static final String TX_FILE_PREFIX = "tx-";
@@ -1475,8 +1476,10 @@ public synchronized RaftKVTransaction createTransaction(Consistency consistency,
// Base transaction on the most recent log entry (if !committed). This is itself a form of optimistic locking: we assume
// that the most recent log entry has a high probability of being committed (in the Raft sense), which is of course
-// required in order to commit any transaction based on it.
-final MostRecentView view = new MostRecentView(this, consistency.isBasedOnCommittedLogEntry() ? this.commitIndex : -1);
+// required in order to commit any transaction based on it. But limit to at most MAX_MUTABLE_VIEW_DEPTH log entries.
+final long maxIndex = consistency.isBasedOnCommittedLogEntry() ?
+this.commitIndex : Math.min(this.log.getLastIndex(), this.log.getLastAppliedIndex() + MAX_MUTABLE_VIEW_DEPTH);
+final MostRecentView view = new MostRecentView(this, maxIndex);
final long baseTerm = view.getTerm();
final long baseIndex = view.getIndex();
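
To illustrate the capping rule in this hunk, here is a minimal standalone sketch. The class name ViewDepthSketch and the method chooseMaxIndex are hypothetical and do not appear in the commit; only the Math.min expression and the constant value 20 are taken from the diff.

```java
// Hypothetical standalone sketch; not part of the commit.
final class ViewDepthSketch {

    // Value taken from the diff: max depth for a stack of MutableViews
    static final int MAX_MUTABLE_VIEW_DEPTH = 20;

    // Choose the highest log index that a new transaction's view may be based on.
    // The parameters stand in for consistency.isBasedOnCommittedLogEntry(), commitIndex,
    // log.getLastIndex() and log.getLastAppliedIndex() in the real code.
    static long chooseMaxIndex(boolean basedOnCommitted, long commitIndex,
      long lastIndex, long lastAppliedIndex) {
        return basedOnCommitted ?
          commitIndex : Math.min(lastIndex, lastAppliedIndex + MAX_MUTABLE_VIEW_DEPTH);
    }

    public static void main(String[] args) {
        // 50 unapplied entries (101..150): the view is capped at 100 + 20
        System.out.println(chooseMaxIndex(false, 105, 150, 100));   // prints 120
        // Only 5 unapplied entries: the cap does not kick in
        System.out.println(chooseMaxIndex(false, 105, 105, 100));   // prints 105
        // Committed view: the commit index is used as-is
        System.out.println(chooseMaxIndex(true, 103, 150, 100));    // prints 103
    }
}
```

With this rule the view never stacks more than MAX_MUTABLE_VIEW_DEPTH unapplied entries on top of the snapshot, at the cost of basing the transaction on a slightly older log entry when the unapplied log is long.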
@@ -49,6 +49,9 @@
When a transaction is created, a MutableView is setup using the log entry corresponding to the transaction's base
term+index as the underlying read-only data. The transaction's consistency determines whether this log entry is
the last log entry (LINEARIZABLE, EVENTUAL, UNCOMMITTED) or the last committed log entry (EVENTUAL_COMMITTED).
+In the former case, if the current unapplied log has more than MAX_MUTABLE_VIEW_DEPTH entries, we will choose the
+log entry that is MAX_MUTABLE_VIEW_DEPTH from the last applied log entry, to limit the performance degradation
+caused by multiple nested MutableView's.
For non-LINEARIZABLE transactions, the commit term+index can always be determined immediately: for EVENTUAL and
EVENTUAL_COMMITTED, it is just the base log entry; for UNCOMMITTED, both values are zero. Therefore, UNCOMMITTED
@@ -59,6 +62,7 @@ and EVENTUAL_COMMITTED transactions can always commit() immediately, because the
the transaction's view up-to-date. This is called "rebasing" the transaction. The rebase operation can fail due
to read/write conflicts, whereby a newly added log entry mutates a key that has already been read by the transaction.
If such a conflict is detected, the transaction must retry because it has seen what is now out-of-date information.
+On leaders, if a conflict occurs when rebasing a high priority transaction, the other transaction fails instead.
For LINEARIZABLE transactions, the leader must be consulted (via CommitRequest) to determine the commit term+index.
If the transaction is read-only, the commit term+index is taken from the leader's last log entry at the time the
@@ -105,12 +109,13 @@ When commit() is invoked, the thread blocks until the commit term+index log entr
For leaders, the situation is more complicated. Applying committed log entries too aggressively can cause these issues:
-o If some follower has not received a log entry, but the leader has applied that log entry to its state machine, then
-the only way the follower can be synchronized is via InstallSnapshot (i.e., full state machine dump), which is costly.
+o If some follower has not received a log entry, but the leader has applied that log entry to its state machine and
+discarded it, then the only way the follower can be synchronized is via InstallSnapshot (i.e., full state machine dump),
+which is costly.
o In order to detect conflicts in a mutating LINEARIZABLE transaction received in a follower's CommitRequest, a leader
must have access to all log entries after the transaction's base term+index. If any of these have already been applied
-to the state machine, the leader has no choice but to return a retry error.
+to the state machine and discarded, the leader has no choice but to return a retry error.
Actually, these issues also apply to followers, in the sense that they could become leaders at any time.
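
The rebase conflict check described in the documentation hunks above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the project's implementation: RebaseSketch and rebase are hypothetical names, keys are modeled as plain strings in a Set rather than as key ranges, and a return value of -1 stands in for "the transaction must retry".

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified sketch; not the project's implementation.
final class RebaseSketch {

    // Try to advance a transaction's base index over newly added log entries.
    // Returns the new base index, or -1 if some newer entry wrote a key that
    // the transaction has already read (a read/write conflict).
    static long rebase(long baseIndex, Set<String> keysRead, List<Set<String>> newerEntryWrites) {
        long index = baseIndex;
        for (Set<String> writes : newerEntryWrites) {
            if (!Collections.disjoint(keysRead, writes))
                return -1;              // transaction has seen out-of-date information
            index++;                    // no conflict: rebase onto this entry
        }
        return index;
    }

    public static void main(String[] args) {
        Set<String> keysRead = Set.of("a", "b");
        // Two newer entries touching unrelated keys: rebase succeeds
        System.out.println(rebase(10, keysRead, List.of(Set.of("x"), Set.of("y"))));  // prints 12
        // The second newer entry writes "b", which was already read: conflict
        System.out.println(rebase(10, keysRead, List.of(Set.of("x"), Set.of("b"))));  // prints -1
    }
}
```

The sketch covers only the basic conflict rule; the leader-side behavior for high priority transactions mentioned in the diff is not modeled.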
