
Add ability to snapshot replicating primary shards #6139

Merged

Conversation

@imotov
Member

commented May 12, 2014

This change adds a new cluster state that waits for the replication of a shard to finish before starting the snapshotting process. Because this change adds a new snapshot state, it will not be possible to do a rolling upgrade from the 1.1 branch to 1.2 while a snapshot is being executed in the cluster.

Closes #5531

@imotov imotov added bug labels May 12, 2014
@s1monw

Contributor

commented May 13, 2014

I want to do a deeper review, but how do we handle backwards compatibility here? If a node from a previous version reads the "WAITING" state, which it doesn't know about, does it just fail?

@imotov

Member Author

commented May 13, 2014

Yes, a node with the old version will fail when trying to process the new cluster state if there is a running snapshot that was started while a primary shard participating in the snapshot was relocating. That's why, as I mentioned in the comment, we cannot have snapshots running during a 1.1->1.2 rolling upgrade. We will need to add this to the release notes. Alternatively, I could also add a check for the presence of older-version nodes and, if they are present, fall back to the old behavior. That's not going to be a bulletproof solution though, since an older-version node can suddenly show up later and will not be able to join the cluster. What do you think?
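A rough sketch of what such a fallback check might look like; the DiscoveryNodes iteration and the Version.before / V_1_2_0 names are assumptions here, not necessarily the code that ended up in this PR:

import org.elasticsearch.Version;
import org.elasticsearch.cluster.ClusterState;
import org.elasticsearch.cluster.node.DiscoveryNode;

// Illustrative only: if any node is older than 1.2.0, fall back to the old
// behavior instead of publishing the new WAITING shard state.
private boolean clusterHasPre120Nodes(ClusterState clusterState) {
    for (DiscoveryNode node : clusterState.nodes()) {
        if (node.version().before(Version.V_1_2_0)) {
            return true;
        }
    }
    return false;
}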

@s1monw

Contributor

commented May 13, 2014

+1 for best effort, but it might make sense to have this as a separate commit so it's easier to review? I will do a review of this in the meantime.

@imotov

Member Author

commented May 14, 2014

Added version check as a separate commit.

@s1monw
s1monw reviewed May 14, 2014
src/main/java/org/elasticsearch/cluster/metadata/SnapshotMetaData.java Outdated
} else {
this.shards = shards;
HashMap<String, ImmutableList.Builder<ShardId>> waitingIndices = null;

@s1monw

s1monw May 14, 2014

Contributor

Can we name this differently? It's super hard to differentiate between the member and the local variable. I also would love to not use null as the marker but make it final and check if it is empty; the extra object creation here is not worth the complexity, IMO. We might even think of putting this into a separate method for readability, i.e. Map<X,Y> findWaitingIndices()?
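A minimal sketch of the suggested refactoring, assuming the WAITING state and the ShardSnapshotStatus/ShardId accessors used elsewhere in this PR; the helper name and exact collection types are illustrative:

// Build the waiting-indices map in one place and always return a map
// (possibly empty) instead of signalling "nothing waiting" with null.
private ImmutableMap<String, ImmutableList<ShardId>> findWaitingIndices(ImmutableMap<ShardId, ShardSnapshotStatus> shards) {
    Map<String, ImmutableList.Builder<ShardId>> waiting = new HashMap<>();
    for (Map.Entry<ShardId, ShardSnapshotStatus> entry : shards.entrySet()) {
        if (entry.getValue().state() == State.WAITING) {
            ImmutableList.Builder<ShardId> builder = waiting.get(entry.getKey().getIndex());
            if (builder == null) {
                builder = ImmutableList.builder();
                waiting.put(entry.getKey().getIndex(), builder);
            }
            builder.add(entry.getKey());
        }
    }
    ImmutableMap.Builder<String, ImmutableList<ShardId>> result = ImmutableMap.builder();
    for (Map.Entry<String, ImmutableList.Builder<ShardId>> entry : waiting.entrySet()) {
        result.put(entry.getKey(), entry.getValue().build());
    }
    return result.build();
}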

@s1monw
s1monw reviewed May 14, 2014
src/test/java/org/elasticsearch/snapshots/SharedClusterSnapshotRestoreTests.java Outdated

logger.info("--> wait for relocations to start");
Thread.sleep(100);

@s1monw

s1monw May 14, 2014

Contributor

Can we have a call that does that? I really want to avoid Thread.sleep in our tests... awaitBusy might be OK as well, but we could also do a cluster health call with wait for relocation?
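Two possible shapes for this, as a sketch; setWaitForRelocatingShards and the awaitBusy helper are assumed to be the 1.x admin/test APIs:

// Wait until relocation has finished, via cluster health:
client().admin().cluster().prepareHealth()
        .setWaitForRelocatingShards(0)
        .setTimeout(TimeValue.timeValueSeconds(30))
        .get();

// Or poll until relocation has started, without a fixed sleep:
awaitBusy(new Predicate<Object>() {
    @Override
    public boolean apply(Object input) {
        return client().admin().cluster().prepareHealth().get().getRelocatingShards() > 0;
    }
});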

@s1monw
s1monw reviewed May 14, 2014
src/main/java/org/elasticsearch/snapshots/SnapshotsService.java Outdated
if (snapshotStatus != null) {
snapshotStatus.abort();
if (entry.state() == State.STARTED ){
HashMap<ShardId, IndexShardSnapshotStatus> startedShards = null;

@s1monw

s1monw May 14, 2014

Contributor

This pattern adds a boatload of complexity all over our code. Can we avoid using null as an invariant?
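A minimal sketch of the alternative, assuming IndexShardSnapshotStatus exposes a stage() accessor as in the 1.x codebase; snapshotShards is a placeholder for the per-snapshot shard map:

// Start from an empty map instead of a null marker, so callers never branch on null.
Map<ShardId, IndexShardSnapshotStatus> startedShards = new HashMap<>();
for (Map.Entry<ShardId, IndexShardSnapshotStatus> entry : snapshotShards.entrySet()) {
    if (entry.getValue().stage() == IndexShardSnapshotStatus.Stage.STARTED) {
        startedShards.put(entry.getKey(), entry.getValue());
    }
}
if (!startedShards.isEmpty()) {
    // ... abort only the shards that actually started
}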

@s1monw
s1monw reviewed May 14, 2014
src/main/java/org/elasticsearch/snapshots/SnapshotsService.java Outdated
RoutingTable routingTable = currentState.routingTable();
MetaData.Builder mdBuilder = MetaData.builder(currentState.metaData());
SnapshotMetaData snapshots = metaData.custom(SnapshotMetaData.TYPE);
if (snapshots == null) {

@s1monw

s1monw May 14, 2014

Contributor

can we invert this to have only one return statement?
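Sketched, the inversion could look roughly like this (surrounding metadata-building code omitted):

ClusterState newState;
if (snapshots == null) {
    // nothing to update
    newState = currentState;
} else {
    // ... update the snapshot entries in mdBuilder here ...
    newState = ClusterState.builder(currentState).metaData(mdBuilder).build();
}
return newState;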

@s1monw
s1monw reviewed May 14, 2014
src/main/java/org/elasticsearch/snapshots/SnapshotsService.java Outdated

@Override
public void onFailure(String source, Throwable t) {
logger.warn("failed to update snapshot state after shards started");

@s1monw

s1monw May 14, 2014

Contributor

should we log the source as well as the exception?
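For example, with the 1.x ESLogger signature that takes the throwable before the message parameters (a sketch, not necessarily the exact line that was committed):

logger.warn("[{}] failed to update snapshot state after shards started", t, source);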

@s1monw
s1monw reviewed May 14, 2014
src/main/java/org/elasticsearch/snapshots/SnapshotsService.java Outdated
logger.trace("starting shard that we were waiting for [{}] on node [{}]", shardEntry.getKey(), shardStatus.nodeId());
shards.put(shardEntry.getKey(), new ShardSnapshotStatus(shardRouting.primaryShard().currentNodeId()));
continue;
} else if (shardRouting.primaryShard().initializing() || shardRouting.primaryShard().relocating()) {

@s1monw

s1monw May 14, 2014

Contributor

Dumb question: what if it is unassigned? Do we fail earlier?

@imotov

imotov May 16, 2014

Author Member

If the shard goes unassigned or its routing table disappears completely, we fail 6 lines below, right after the comment (// Shard that we were waiting for went into unassigned state or disappeared - giving up).

@s1monw
s1monw reviewed May 14, 2014
src/main/java/org/elasticsearch/snapshots/SnapshotsService.java Outdated
@@ -638,52 +743,65 @@ private void processIndexShardSnapshots(SnapshotMetaData snapshotMetaData) {

// For now we will be mostly dealing with a single snapshot at a time but might have multiple simultaneously running
// snapshots in the future
HashMap<SnapshotId, SnapshotShards> newSnapshots = null;
HashMap<SnapshotId, HashMap<ShardId, IndexShardSnapshotStatus>> newSnapshots = null;

@s1monw

s1monw May 14, 2014

Contributor

should this be Map vs. HashMap?
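I.e., declaring against the interface while keeping the concrete HashMap at the construction site, roughly:

Map<SnapshotId, Map<ShardId, IndexShardSnapshotStatus>> newSnapshots = null;
// ... later, only when actually needed:
newSnapshots = new HashMap<>();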

@s1monw

Contributor

commented May 14, 2014

So basically, if we snapshot an index that has a primary that is initializing or relocating, we just wait for the shards to start up. I think we should document this behavior and also test that we can abort such a waiting snapshot, just in case it never gets to a state where we are able to snapshot. I wonder if we should also add tests where we have snapshots in multiple states, i.e. chaos snapshotting while indexing and relocating, and just make sure we are not seeing any failures or broken snapshots?

@imotov

Member Author

commented May 16, 2014

@s1monw I've implemented the suggested changes. Thanks!

@kevinkluge kevinkluge added the review label May 18, 2014
@s1monw
s1monw reviewed May 19, 2014
src/test/java/org/elasticsearch/snapshots/DedicatedClusterSnapshotRestoreTests.java Outdated
long start = System.currentTimeMillis();
// Produce chaos for 30 sec or until snapshot is done whatever comes first
int randomIndices = 0;
while(System.currentTimeMillis() - start < 30000 && !snapshotIsDone("test-repo", "test-snap")) {

@s1monw

s1monw May 19, 2014

Contributor

Could we make this test somehow operations-based rather than time-based? The problem with time-based tests is that you have a very, very hard time getting anywhere near reproducibility. I know this is multithreaded anyway, so it won't really reproduce, but we can nicely randomize the number of ops per thread, which makes it a bit nicer :)
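A sketch of an operations-based variant; the index name, thread count, and indexing calls are illustrative, and randomIntBetween is assumed from the randomized-testing base class:

final int opsPerThread = randomIntBetween(100, 500);
Thread[] indexers = new Thread[3];
for (int i = 0; i < indexers.length; i++) {
    indexers[i] = new Thread(new Runnable() {
        @Override
        public void run() {
            // each thread performs a fixed, randomized number of operations
            for (int op = 0; op < opsPerThread; op++) {
                client().prepareIndex("test-idx", "doc").setSource("field", "value-" + op).get();
            }
        }
    });
    indexers[i].start();
}
for (Thread indexer : indexers) {
    indexer.join();
}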

final int cur = i;
asyncIndexThreads[i] = new Thread(new Runnable() {
@Override
public void run() {

@s1monw

s1monw May 19, 2014

Contributor

can you put a latch in here to make sure we are not half way done once the last thread starts?
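One way to do that, as a sketch: gate every indexer thread on a shared CountDownLatch that is released only after all threads have been started.

final CountDownLatch startLatch = new CountDownLatch(1);
asyncIndexThreads[i] = new Thread(new Runnable() {
    @Override
    public void run() {
        try {
            startLatch.await();   // block until all threads are ready to go
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return;
        }
        // ... indexing work goes here ...
    }
});
// after all asyncIndexThreads[i].start() calls:
startLatch.countDown();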

@s1monw
s1monw reviewed May 19, 2014
src/test/java/org/elasticsearch/snapshots/DedicatedClusterSnapshotRestoreTests.java Outdated
// Produce chaos for 30 sec or until snapshot is done whatever comes first
int randomIndices = 0;
while(System.currentTimeMillis() - start < 30000 && !snapshotIsDone("test-repo", "test-snap")) {
Thread.sleep(100);

@s1monw

s1monw May 19, 2014

Contributor

Instead of sleeping, can you fire off an operation and wait until it comes back, like a list-snapshots call or so?
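For instance (a sketch using the get-snapshots API on the cluster admin client):

// Let the round trip of a real request pace the loop instead of Thread.sleep.
GetSnapshotsResponse response = client().admin().cluster().prepareGetSnapshots("test-repo").get();
// snapshotIsDone(...) could then inspect response.getSnapshots() directly.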

@s1monw
s1monw reviewed May 19, 2014
src/test/java/org/elasticsearch/snapshots/SharedClusterSnapshotRestoreTests.java Outdated
}

private boolean waitForIndex(String index, TimeValue timeout) throws InterruptedException {
long start = System.currentTimeMillis();
while (System.currentTimeMillis() - start < timeout.millis()) {

@s1monw

s1monw May 19, 2014

Contributor

You should use the awaitBusy method here, which doesn't have a fixed sleep but increases the sleep interval in quadratic steps...
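A possible shape for waitForIndex using that helper, assuming the 1.x awaitBusy(Predicate, long, TimeUnit) signature and the bundled Guava Predicate:

private boolean waitForIndex(final String index, TimeValue timeout) throws InterruptedException {
    return awaitBusy(new Predicate<Object>() {
        @Override
        public boolean apply(Object input) {
            return client().admin().indices().prepareExists(index).get().isExists();
        }
    }, timeout.millis(), TimeUnit.MILLISECONDS);
}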

@@ -597,6 +602,105 @@ public void onFailure(String source, Throwable t) {
}
}

private void processStartedShards(ClusterChangedEvent event) {
if (waitingShardsStartedOrUnassigned(event)) {

@s1monw

s1monw May 19, 2014

Contributor

I wonder if it would make sense to add a method that allows us to filter the snapshots here and only return the ones relevant to this method. This method is crazy :)

@s1monw

s1monw May 19, 2014

Contributor

Like a method that returns all started shards, i.e. public Map<ShardId, ShardSnapshotStatus> getStarted(SnapshotMetaData meta); this would make this method so much simpler.
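A minimal sketch of such a helper, assuming the SnapshotMetaData.Entry accessors used elsewhere in this PR; grouping per snapshot and the exact return type are illustrative:

// Collect only the shards that have reached STARTED so the caller can ignore the rest.
private Map<ShardId, ShardSnapshotStatus> startedShards(SnapshotMetaData.Entry entry) {
    Map<ShardId, ShardSnapshotStatus> started = new HashMap<>();
    for (Map.Entry<ShardId, ShardSnapshotStatus> shard : entry.shards().entrySet()) {
        if (shard.getValue().state() == State.STARTED) {
            started.put(shard.getKey(), shard.getValue());
        }
    }
    return started;
}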

@s1monw s1monw removed the bug label May 19, 2014
@s1monw s1monw removed v1.2.0 labels May 19, 2014
@s1monw

Contributor

commented May 19, 2014

FYI - I removed the labels in favor of the issue

@s1monw

Contributor

commented May 20, 2014

LGTM

This change adds a new cluster state that waits for the replication of a shard to finish before starting the snapshotting process. Because this change adds a new snapshot state, pre-1.2.0 nodes will not be able to join a 1.2.0 cluster that is currently running a snapshot/restore operation.

Closes #5531