SERVER-46721 Secondary readers should read at the no-overlap time instead of lastApplied

The no-overlap time, ReadSource::kNoOverlap, is the minimum of replication's lastApplied timestamp
and WiredTiger's all_durable time. This time is independent of replication state and ensures
queries do not see oplog holes after state transitions from secondary to primary.
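
A minimal sketch of the rule described above, assuming timestamps can be modeled as plain integers. The function name `noOverlapTimestamp` and the sample values are hypothetical and do not reflect the server's actual implementation:

```javascript
// Hypothetical sketch: timestamps are modeled as plain integers.
// lastApplied: timestamp of the newest oplog entry replication has applied.
// allDurable:  newest timestamp with no oplog holes at or before it.
function noOverlapTimestamp(lastApplied, allDurable) {
    // Reading at the smaller of the two never reads past either bound.
    return Math.min(lastApplied, allDurable);
}

// Illustrative values only: lastApplied is ahead of all_durable.
const aheadOfDurable = noOverlapTimestamp(12, 10);  // 10

// Illustrative values only: all_durable is ahead of lastApplied.
const behindDurable = noOverlapTimestamp(10, 12);  // 10
```

Because the result depends only on the two timestamps, the same computation applies regardless of the node's replication state.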

(cherry picked from commit 25c694f)
(cherry picked from commit 5ddf9db)

SERVER-44529 Query yield recovery after a stepdown should switch to read at the no-overlap time

After yielding and reacquiring locks in a query, the preconditions that were used to select our
ReadSource initially need to be checked again. Queries hold an AutoGetCollectionForRead RAII
lock for their lifetime, which may select a ReadSource based on state (e.g. replication
state). After a query yields its locks, this state may have changed, invalidating our current
choice of ReadSource.
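
The re-check described above might look roughly like the following sketch. Everything here is hypothetical: `restoreAfterYield`, the `query` object, and `toyPolicy` are illustrative names, and the mapping from replication state to read source is an assumption made for the example, not the server's actual policy:

```javascript
// Hypothetical sketch; these names do not mirror the server's API.
// Re-evaluates the read source choice once locks are reacquired after a yield.
function restoreAfterYield(query, currentState, chooseReadSource) {
    const wanted = chooseReadSource(currentState);
    if (wanted !== query.readSource) {
        // The snapshot opened under the old choice must not be reused.
        query.snapshotOpen = false;
        query.readSource = wanted;
        return true;
    }
    return false;
}

// Toy policy, assumed for illustration only.
const toyPolicy = (state) => (state === "SECONDARY" ? "kLastApplied" : "kNoOverlap");

// The query started while the node was a secondary, then yielded across a step-up.
const query = {readSource: toyPolicy("SECONDARY"), snapshotOpen: true};
const changed = restoreAfterYield(query, "PRIMARY", toyPolicy);
```

The point of the sketch is only that the choice is recomputed from current state after every yield rather than cached for the lifetime of the query.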

(cherry picked from commit b3a5b52)
(cherry picked from commit ce57ddc)

SERVER-48475 Reimplement lastApplied for secondary reads

This partially reverts work to use the kNoOverlap ReadSource on secondaries since the all_durable
calculation is unnecessary and expensive.

(cherry picked from commit ff92d44)
(cherry picked from commit 4e3f3b9)
louiswilliams authored and Evergreen Agent committed Aug 6, 2020
1 parent 708fff2 commit caef661
Showing 31 changed files with 618 additions and 272 deletions.
86 changes: 86 additions & 0 deletions jstests/replsets/dont_read_oplog_hole_on_step_up.js
@@ -0,0 +1,86 @@
/*
 * Tests that we don't read an oplog hole when we step up while waiting for a tailable oplog query.
 * This test creates a configuration where one secondary, 'secondary', is syncing from a different
 * secondary, 'newPrimary', which is soon to become primary. As the new node becomes primary, the
 * other secondary oplog tailer should not observe any oplog holes.
 *
 * @tags: [
 *   multiversion_incompatible,
 * ]
 */
(function() {
'use strict';

load("jstests/replsets/rslib.js");
load("jstests/libs/fail_point_util.js");

var rst = new ReplSetTest({
    name: TestData.testName,
    // The long election timeout results in a 30-second getMore, plenty of time to hit bugs.
    settings: {chainingAllowed: true, electionTimeoutMillis: 60 * 1000},
    nodes: [
        {},
        {},
        {rsConfig: {priority: 0}},
    ],
});
const nodes = rst.startSet();
const oldPrimary = nodes[0];
const newPrimary = nodes[1];
const secondary = nodes[2];

// Make sure this secondary syncs only from the node bound to be the new primary.
assert.commandWorked(secondary.adminCommand({
    configureFailPoint: "forceSyncSourceCandidate",
    mode: "alwaysOn",
    data: {hostAndPort: newPrimary.host}
}));
rst.initiate();

// Make sure when the original primary syncs, it's only from the secondary; this avoids spurious log
// messages.
assert.commandWorked(oldPrimary.adminCommand({
    configureFailPoint: "forceSyncSourceCandidate",
    mode: "alwaysOn",
    data: {hostAndPort: secondary.host}
}));

assert.commandWorked(oldPrimary.getDB(TestData.testName).test.insert({x: 1}));
rst.awaitReplication();

// Force the secondary tailing the newPrimary to yield its getMore.
const planExecFP = configureFailPoint(newPrimary, "planExecutorHangWhileYieldedInWaitForInserts");

jsTestLog("Stepping up new primary");
assert.commandWorked(newPrimary.adminCommand({replSetStepUp: 1}));
assert.eq(newPrimary, rst.getPrimary());

const createCollFP = configureFailPoint(newPrimary, "hangBeforeLoggingCreateCollection");
const createShell = startParallelShell(() => {
    // Implicitly creates the collection.
    assert.commandWorked(db.getSiblingDB(TestData.testName).newcoll.insert({y: 2}));
}, newPrimary.port);

jsTestLog("Waiting for oplog tailer to yield");
planExecFP.wait();

jsTestLog("Waiting for collection creation to hang");
createCollFP.wait();

jsTestLog("Creating hole and resuming oplog tail");
assert.commandWorked(newPrimary.getDB(TestData.testName).test.insert({x: 2}));
planExecFP.off();

// Give enough time for the oplog tailer to resume and observe the oplog hole. The expectation is
// that the secondary oplog tailer should not see any holes. If it does, and misses the collection
// creation oplog entry, then it will crash because it will attempt to apply the insert operation on
// a non-existent namespace. While this specific scenario produces a crash, in general this type of
// bug can introduce data corruption.
sleep(3000);

createCollFP.off();
createShell();

rst.awaitReplication();
rst.stopSet();
}());
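
The hole this test guards against can be pictured with a small standalone model. This is a simplified sketch, not server code: the oplog is a plain array, each entry carries a hypothetical `committed` flag, and `allDurable` and `visibleAt` are illustrative helpers invented for the example:

```javascript
// Hypothetical model of an oplog, sorted by ascending timestamp.
// Returns the largest timestamp with no uncommitted entry at or before it (0 if none).
function allDurable(oplog) {
    let durable = 0;
    for (const entry of oplog) {
        if (!entry.committed) break;
        durable = entry.ts;
    }
    return durable;
}

// Operations a reader at readTs can see: committed entries with ts <= readTs.
function visibleAt(oplog, readTs) {
    return oplog.filter((e) => e.committed && e.ts <= readTs).map((e) => e.op);
}

const oplog = [
    {ts: 1, op: "insert {x: 1}", committed: true},
    {ts: 2, op: "create newcoll", committed: false},  // hung before logging: the hole
    {ts: 3, op: "insert {x: 2}", committed: true},
];

// Reading at the latest timestamp skips over the hole at ts 2.
const atLatest = visibleAt(oplog, 3);

// Reading at min(latest, all_durable) stops before the hole.
const atNoOverlap = visibleAt(oplog, Math.min(3, allDurable(oplog)));
```

In this model, a tailer reading at the latest timestamp would receive the entry at ts 3 without ever seeing the entry at ts 2, which is the kind of gap the test comment above warns about.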
2 changes: 2 additions & 0 deletions src/mongo/db/SConscript
@@ -694,6 +694,7 @@ env.Library(
LIBDEPS_PRIVATE=[
"catalog/database_holder",
"$BUILD_DIR/mongo/idl/server_parameter",
+ 'storage/snapshot_helper',
],
)

@@ -1294,6 +1295,7 @@ env.Library(
's/sharding_api_d',
'stats/serveronly_stats',
'storage/oplog_hack',
+ 'storage/snapshot_helper',
'storage/storage_options',
'storage/remove_saver',
'update/update_driver',
4 changes: 2 additions & 2 deletions src/mongo/db/catalog_raii_test.cpp
@@ -261,8 +261,8 @@ TEST_F(ReadSourceScopeTest, RestoreReadSource) {
ReadSourceScope scope(opCtx());
ASSERT_EQ(opCtx()->recoveryUnit()->getTimestampReadSource(), ReadSource::kUnset);

- opCtx()->recoveryUnit()->setTimestampReadSource(ReadSource::kLastApplied);
- ASSERT_EQ(opCtx()->recoveryUnit()->getTimestampReadSource(), ReadSource::kLastApplied);
+ opCtx()->recoveryUnit()->setTimestampReadSource(ReadSource::kNoOverlap);
+ ASSERT_EQ(opCtx()->recoveryUnit()->getTimestampReadSource(), ReadSource::kNoOverlap);
ASSERT_EQ(opCtx()->recoveryUnit()->getPointInTimeReadTimestamp(), boost::none);
}
ASSERT_EQ(opCtx()->recoveryUnit()->getTimestampReadSource(), ReadSource::kProvided);
2 changes: 2 additions & 0 deletions src/mongo/db/commands/getmore_cmd.cpp
@@ -138,6 +138,7 @@ void applyCursorReadConcern(OperationContext* opCtx, repl::ReadConcernArgs rcArg
switch (rcArgs.getMajorityReadMechanism()) {
case repl::ReadConcernArgs::MajorityReadMechanism::kMajoritySnapshot: {
// Make sure we read from the majority snapshot.
+ opCtx->recoveryUnit()->abandonSnapshot();
opCtx->recoveryUnit()->setTimestampReadSource(
RecoveryUnit::ReadSource::kMajorityCommitted);
uassertStatusOK(opCtx->recoveryUnit()->obtainMajorityCommittedSnapshot());
@@ -146,6 +147,7 @@ void applyCursorReadConcern(OperationContext* opCtx, repl::ReadConcernArgs rcArg
case repl::ReadConcernArgs::MajorityReadMechanism::kSpeculative: {
// Mark the operation as speculative and select the correct read source.
repl::SpeculativeMajorityReadInfo::get(opCtx).setIsSpeculativeRead();
+ opCtx->recoveryUnit()->abandonSnapshot();
opCtx->recoveryUnit()->setTimestampReadSource(RecoveryUnit::ReadSource::kNoOverlap);
break;
}