
Fixes maintaining the shards a snapshot is waiting on #24289

Merged 2 commits into elastic:master on Apr 24, 2017

Conversation

@abeyad commented Apr 24, 2017

There was a bug in the calculation of the shards that a snapshot must
wait on, because they are relocating or initializing, before the
snapshot can safely proceed to snapshot the shard data. An incorrect
key was used to look up entries in the map of waiting shards as it was
built, so each index could record at most one shard in the waiting
state. This is problematic when an index has more than one relocating
or initializing shard: the snapshot could start prematurely because it
thinks it is waiting on only one such shard, when in fact there could
be more. While not a common case and likely rare in practice, it is
still problematic.

This commit fixes the issue by ensuring the correct key is used to look
up the waiting indices map as it is being built, so the list of waiting
shards for each index (those shards that are relocating or
initializing) is aggregated for a given index instead of overwritten.
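
To illustrate the aggregation in spirit, here is a minimal, self-contained sketch; it is not the actual SnapshotsInProgress code, and the ShardId and State types are simplified stand-ins. The point is that the list must always be looked up under the shard's index name, so entries accumulate instead of being overwritten:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WaitingIndicesSketch {

    // Simplified stand-ins for the real ShardId and shard state types.
    record ShardId(String indexName, int id) {}
    enum State { INIT, STARTED, WAITING, SUCCESS }

    // Builds the map of index name -> shards the snapshot must wait on.
    static Map<String, List<ShardId>> findWaitingIndices(Map<ShardId, State> shards) {
        Map<String, List<ShardId>> waitingIndices = new HashMap<>();
        for (Map.Entry<ShardId, State> entry : shards.entrySet()) {
            if (entry.getValue() == State.WAITING) {
                // The fix in spirit: look the list up under the shard's index
                // name, so a second waiting shard of the same index is
                // appended to the existing list rather than replacing it.
                waitingIndices
                    .computeIfAbsent(entry.getKey().indexName(), k -> new ArrayList<>())
                    .add(entry.getKey());
            }
        }
        return waitingIndices;
    }

    public static void main(String[] args) {
        Map<ShardId, State> shards = new HashMap<>();
        shards.put(new ShardId("idx", 0), State.WAITING);
        shards.put(new ShardId("idx", 1), State.WAITING);
        shards.put(new ShardId("idx", 2), State.STARTED);
        // Both waiting shards of "idx" end up under the same key; with the
        // buggy (inconsistent) key, only one of them would survive.
        System.out.println(findWaitingIndices(shards));
    }
}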

@jasontedor left a comment

I left a suggestion to your discretion, fire at will.

State state;
do {
    state = randomFrom(State.values());
} while (state == State.WAITING);

Rather than a spin I think you can say:

diff --git a/core/src/test/java/org/elasticsearch/cluster/SnapshotsInProgressTests.java b/core/src/test/java/org/elasticsearch/cluster/SnapshotsInProgressTests.java
index 75ac8993fd..4d1a1a6e58 100644
--- a/core/src/test/java/org/elasticsearch/cluster/SnapshotsInProgressTests.java
+++ b/core/src/test/java/org/elasticsearch/cluster/SnapshotsInProgressTests.java
@@ -31,6 +31,7 @@ import org.elasticsearch.test.ESTestCase;
 
 import java.util.Arrays;
 import java.util.List;
+import java.util.stream.Collectors;
 
 /**
  * Unit tests for the {@link SnapshotsInProgress} class and its inner classes.
@@ -72,10 +73,6 @@ public class SnapshotsInProgressTests extends ESTestCase {
     }
 
     private State randomNonWaitingState() {
-        State state;
-        do {
-            state = randomFrom(State.values());
-        } while (state == State.WAITING);
-        return state;
+        return randomFrom(Arrays.stream(State.values()).filter(s -> s != State.WAITING).collect(Collectors.toSet()));
     }
 }
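
(Both forms exclude WAITING; the stream-based version filters the candidate states up front and draws exactly once, rather than re-drawing until a non-WAITING value comes up, which reads as a clearer statement of intent.)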

@abeyad (author) replied:

++, i like this better

@abeyad abeyad merged commit c5b6f52 into elastic:master Apr 24, 2017
@abeyad abeyad added the v5.5.0 label Apr 24, 2017
abeyad pushed a commit that referenced this pull request Apr 24, 2017
abeyad pushed a commit that referenced this pull request Apr 24, 2017
@abeyad commented Apr 24, 2017

5.x commit: 55ab609
5.4 commit: eb81649

@abeyad commented Apr 24, 2017

thanks for the review @jasontedor
