Fix Race in testGetSnapshotsRequest (#61694) (#61701)
The fact that the data node is already blocked on writing
data files does not guarantee that the cluster state that made
the data node start snapshotting has already been applied on the master.
This could lead to races where the get snapshots action still
runs against a state that does not contain the snapshot, tripping the
assertion on the number of returned snapshots.
It is much safer to wait for the non-blocking create snapshot request
to return, which guarantees that the cluster state (CS) has been applied on the master.

Closes #61541
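The ordering argument in the message above can be illustrated with a small, self-contained Java sketch. It uses no Elasticsearch APIs: `Master`, `applyGate`, `dataNodeBlocked`, `createSnapshot` and `run` are hypothetical stand-ins. It assumes, as the message describes, that the data node can receive the new cluster state and hit its repository block before the master has applied that same state, and it forces that interleaving with a latch so the outcome is deterministic rather than timing-dependent.

```java
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical model of the race; none of these types are Elasticsearch APIs. */
public class SnapshotRaceSketch {

    /** Stand-in for the master node and the cluster state it has applied locally. */
    static final class Master {
        // Names of in-progress snapshots in the master's *applied* cluster state.
        final Set<String> appliedSnapshots = ConcurrentHashMap.newKeySet();
        // Holds back the master's local apply so the bad interleaving happens every time.
        final CountDownLatch applyGate = new CountDownLatch(1);
        // Counted down once the data node has received the new state and hit the block.
        final CountDownLatch dataNodeBlocked = new CountDownLatch(1);

        /** Non-blocking create: the future completes only after the master applied the state. */
        CompletableFuture<Void> createSnapshot(String name, ExecutorService executor) {
            return CompletableFuture.runAsync(() -> {
                dataNodeBlocked.countDown(); // data node starts writing and blocks first
                await(applyGate);            // master applies the same state later
                appliedSnapshots.add(name);
            }, executor);
        }
    }

    /** Waits on a latch with a 10 second bound, converting checked exceptions. */
    static void await(CountDownLatch latch) {
        try {
            if (!latch.await(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("timed out waiting for latch");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    /** Returns {visible after waiting for the block only, visible after the create future}. */
    static boolean[] run() {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Master master = new Master();
            CompletableFuture<Void> created = master.createSnapshot("snap-on-empty-repo", executor);

            // Old test logic: waiting for the data-node block says nothing about the master.
            await(master.dataNodeBlocked);
            boolean afterBlockOnly = master.appliedSnapshots.contains("snap-on-empty-repo");

            // Let the master finish applying (in the real test this happens on its own schedule).
            master.applyGate.countDown();

            // Fixed test logic: wait for the create call itself to return.
            created.join();
            boolean afterFuture = master.appliedSnapshots.contains("snap-on-empty-repo");
            return new boolean[] {afterBlockOnly, afterFuture};
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        boolean[] result = run();
        System.out.println("visible after block only: " + result[0]);
        System.out.println("visible after create future: " + result[1]);
    }
}
```

Running `main` prints `visible after block only: false` and then `visible after create future: true`. The latch only widens a window that, in the real test, opens and closes nondeterministically; completing the create future after the master's apply is what turns "probably visible" into "guaranteed visible".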
original-brownbear committed Aug 31, 2020
1 parent 7d2eca9 commit 29d3a8d
Showing 1 changed file with 2 additions and 4 deletions.
@@ -2707,11 +2707,10 @@ public void testGetSnapshotsRequest() throws Exception {
         // take initial snapshot with a block, making sure we only get 1 in-progress snapshot returned
         // block a node so the create snapshot operation can remain in progress
         final String initialBlockedNode = blockNodeWithIndex(repositoryName, indexName);
-        ActionFuture<CreateSnapshotResponse> responseListener =
-            client.admin().cluster().prepareCreateSnapshot(repositoryName, "snap-on-empty-repo")
+        client.admin().cluster().prepareCreateSnapshot(repositoryName, "snap-on-empty-repo")
                 .setWaitForCompletion(false)
                 .setIndices(indexName)
-                .execute();
+                .get();
         waitForBlock(initialBlockedNode, repositoryName, TimeValue.timeValueSeconds(60)); // wait for block to kick in
         getSnapshotsResponse = client.admin().cluster()
             .prepareGetSnapshots("test-repo")
@@ -2720,7 +2719,6 @@ public void testGetSnapshotsRequest() throws Exception {
         assertEquals(1, getSnapshotsResponse.getSnapshots().size());
         assertEquals("snap-on-empty-repo", getSnapshotsResponse.getSnapshots().get(0).snapshotId().getName());
         unblockNode(repositoryName, initialBlockedNode); // unblock node
-        responseListener.actionGet(TimeValue.timeValueMillis(10000L)); // timeout after 10 seconds
         client.admin().cluster().prepareDeleteSnapshot(repositoryName, "snap-on-empty-repo").get();
 
         final int numSnapshots = randomIntBetween(1, 3) + 1;