
Fixes maintaining the shards a snapshot is waiting on #24289

Merged 2 commits into elastic:master on Apr 24, 2017

Conversation

@abeyad commented Apr 24, 2017

There was a bug in the calculation of the shards that a snapshot must
wait on, because they are relocating or initializing, before the
snapshot can safely proceed to snapshot the shard data. An incorrect
key was used to look up entries in the map of waiting shards as it was
built, so each index could record at most one shard in the waiting
state. This is problematic when an index has more than one relocating
or initializing shard: the snapshot could start prematurely because it
thinks it is waiting on only one such shard, when in fact there could
be more. While not a common case and likely rare in practice, it is
still problematic.

This commit fixes the issue by ensuring the correct key is used to look
up the waiting indices map as it is being built, so the list of waiting
shards for each index (those shards that are relocating or
initializing) is aggregated for a given index instead of overwritten.
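
To illustrate the aggregation in spirit, here is a minimal, self-contained sketch; it is not the actual SnapshotsInProgress code, and the ShardId and State types are simplified stand-ins. The point is that the list must always be looked up under the shard's index name, so entries accumulate instead of being overwritten:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WaitingIndicesSketch {

    // Simplified stand-ins for the real ShardId and shard state types.
    record ShardId(String indexName, int id) {}
    enum State { INIT, STARTED, WAITING, SUCCESS }

    // Builds the map of index name -> shards the snapshot must wait on.
    static Map<String, List<ShardId>> findWaitingIndices(Map<ShardId, State> shards) {
        Map<String, List<ShardId>> waitingIndices = new HashMap<>();
        for (Map.Entry<ShardId, State> entry : shards.entrySet()) {
            if (entry.getValue() == State.WAITING) {
                // The fix in spirit: look the list up under the shard's index
                // name, so a second waiting shard of the same index is
                // appended to the existing list rather than replacing it.
                waitingIndices
                    .computeIfAbsent(entry.getKey().indexName(), k -> new ArrayList<>())
                    .add(entry.getKey());
            }
        }
        return waitingIndices;
    }

    public static void main(String[] args) {
        Map<ShardId, State> shards = new HashMap<>();
        shards.put(new ShardId("idx", 0), State.WAITING);
        shards.put(new ShardId("idx", 1), State.WAITING);
        shards.put(new ShardId("idx", 2), State.STARTED);
        // Both waiting shards of "idx" end up under the same key; with the
        // buggy (inconsistent) key, only one of them would survive.
        System.out.println(findWaitingIndices(shards));
    }
}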

@jasontedor left a comment

I left a suggestion to your discretion, fire at will.

State state;
do {
    state = randomFrom(State.values());
} while (state == State.WAITING);

Rather than a spin I think you can say:

diff --git a/core/src/test/java/org/elasticsearch/cluster/SnapshotsInProgressTests.java b/core/src/test/java/org/elasticsearch/cluster/SnapshotsInProgressTests.java
index 75ac8993fd..4d1a1a6e58 100644
--- a/core/src/test/java/org/elasticsearch/cluster/SnapshotsInProgressTests.java
+++ b/core/src/test/java/org/elasticsearch/cluster/SnapshotsInProgressTests.java
@@ -31,6 +31,7 @@ import org.elasticsearch.test.ESTestCase;
 
 import java.util.Arrays;
 import java.util.List;
+import java.util.stream.Collectors;
 
 /**
  * Unit tests for the {@link SnapshotsInProgress} class and its inner classes.
@@ -72,10 +73,6 @@ public class SnapshotsInProgressTests extends ESTestCase {
     }
 
     private State randomNonWaitingState() {
-        State state;
-        do {
-            state = randomFrom(State.values());
-        } while (state == State.WAITING);
-        return state;
+        return randomFrom(Arrays.stream(State.values()).filter(s -> s != State.WAITING).collect(Collectors.toSet()));
     }
 }
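
(Both forms exclude WAITING; the stream-based version filters the candidate states up front and draws exactly once, rather than re-drawing until a non-WAITING value comes up, which reads as a clearer statement of intent.)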

@abeyad (author) replied:

++, i like this better

@abeyad abeyad merged commit c5b6f52 into elastic:master Apr 24, 2017
@abeyad abeyad added the v5.5.0 label Apr 24, 2017
abeyad pushed a commit that referenced this pull request Apr 24, 2017
abeyad pushed a commit that referenced this pull request Apr 24, 2017
@abeyad commented Apr 24, 2017

5.x commit: 55ab609
5.4 commit: eb81649

@abeyad commented Apr 24, 2017

thanks for the review @jasontedor
