-
Notifications
You must be signed in to change notification settings - Fork 24.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixes maintaining the shards a snapshot is waiting on #24289
Conversation
There was a bug in the calculation of the shards that a snapshot must wait on, due to their relocating or initializing, before the snapshot can proceed safely to snapshot the shard data. In this bug, an incorrect key was used to look up the index of the waiting shards, resulting in the fact that each index would have at most one shard in the waiting state causing the snapshot to pause. This could be problematic if there are more than one shard in the relocating or initializing state, which would result in a snapshot prematurely starting because it thinks its only waiting on one relocating or initializing shard (when in fact there could be more than one). While not a common case and likely rare in practice, it is still problematic. This commit fixes the issue by ensuring the correct key is used to look up the waiting indices map as it is being built up, so the list of waiting shards for each index (those shards that are relocating or initializing) are aggregated for a given index instead of overwritten.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a suggestion to your discretion, fire at will.
State state; | ||
do { | ||
state = randomFrom(State.values()); | ||
} while (state == State.WAITING); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than a spin I think you can say:
diff --git a/core/src/test/java/org/elasticsearch/cluster/SnapshotsInProgressTests.java b/core/src/test/java/org/elasticsearch/cluster/SnapshotsInProgressTests.java
index 75ac8993fd..4d1a1a6e58 100644
--- a/core/src/test/java/org/elasticsearch/cluster/SnapshotsInProgressTests.java
+++ b/core/src/test/java/org/elasticsearch/cluster/SnapshotsInProgressTests.java
@@ -31,6 +31,7 @@ import org.elasticsearch.test.ESTestCase;
import java.util.Arrays;
import java.util.List;
+import java.util.stream.Collectors;
/**
* Unit tests for the {@link SnapshotsInProgress} class and its inner classes.
@@ -72,10 +73,6 @@ public class SnapshotsInProgressTests extends ESTestCase {
}
private State randomNonWaitingState() {
- State state;
- do {
- state = randomFrom(State.values());
- } while (state == State.WAITING);
- return state;
+ return randomFrom(Arrays.stream(State.values()).filter(s -> s != State.WAITING).collect(Collectors.toSet()));
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++, i like this better
There was a bug in the calculation of the shards that a snapshot must wait on, due to their relocating or initializing, before the snapshot can proceed safely to snapshot the shard data. In this bug, an incorrect key was used to look up the index of the waiting shards, resulting in the fact that each index would have at most one shard in the waiting state causing the snapshot to pause. This could be problematic if there are more than one shard in the relocating or initializing state, which would result in a snapshot prematurely starting because it thinks its only waiting on one relocating or initializing shard (when in fact there could be more than one). While not a common case and likely rare in practice, it is still problematic. This commit fixes the issue by ensuring the correct key is used to look up the waiting indices map as it is being built up, so the list of waiting shards for each index (those shards that are relocating or initializing) are aggregated for a given index instead of overwritten.
There was a bug in the calculation of the shards that a snapshot must wait on, due to their relocating or initializing, before the snapshot can proceed safely to snapshot the shard data. In this bug, an incorrect key was used to look up the index of the waiting shards, resulting in the fact that each index would have at most one shard in the waiting state causing the snapshot to pause. This could be problematic if there are more than one shard in the relocating or initializing state, which would result in a snapshot prematurely starting because it thinks its only waiting on one relocating or initializing shard (when in fact there could be more than one). While not a common case and likely rare in practice, it is still problematic. This commit fixes the issue by ensuring the correct key is used to look up the waiting indices map as it is being built up, so the list of waiting shards for each index (those shards that are relocating or initializing) are aggregated for a given index instead of overwritten.
thanks for the review @jasontedor |
There was a bug in the calculation of the shards that a snapshot must
wait on, due to their relocating or initializing, before the snapshot
can proceed safely to snapshot the shard data. In this bug, an
incorrect key was used to look up the index of the waiting shards,
resulting in the fact that each index would have at most one shard in
the waiting state causing the snapshot to pause. This could be
problematic if there are more than one shard in the relocating or
initializing state, which would result in a snapshot prematurely
starting because it thinks its only waiting on one relocating or
initializing shard (when in fact there could be more than one). While
not a common case and likely rare in practice, it is still problematic.
This commit fixes the issue by ensuring the correct key is used to look
up the waiting indices map as it is being built up, so the list of
waiting shards for each index (those shards that are relocating or
initializing) are aggregated for a given index instead of overwritten.