Keep snapshot restore state and routing table in sync #20836

Merged
merged 3 commits into elastic:master on Oct 12, 2016

@ywelsch
Contributor

ywelsch commented Oct 10, 2016

The snapshot restore state tracks information in the cluster state about shards being restored from a snapshot. For example, it records whether a shard has been successfully restored or whether restoring it was not possible due to a corruption of the snapshot. Recording these events is usually based on changes to the shard routing table, i.e., when a shard is started after a successful restore or failed after an unsuccessful one. Until now, there were two separate communication channels to transmit recovery failure/success: one to update the routing table and one to update the restore state. This led to issues where a shard was failed but the restore state was not updated because of connection issues between the data and master nodes. In some rare situations, the restore state could then no longer be properly cleaned up by the master, making it impossible to start new restore operations. The following change updates the routing table and the restore state in the same cluster state update so that both always stay in sync. It also eliminates the extra communication channel for restore operations and uses the standard cluster state listener mechanism to notify restore listeners upon successful completion of a snapshot restore.

Closes #19774
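
To make the "single cluster state update" idea concrete, here is a minimal, self-contained sketch. All class and method names below are invented stand-ins for illustration, not the real Elasticsearch types (RoutingTable, RestoreInProgress, ClusterState); the point is only that the routing change and the matching restore-state change are produced by one atomic state transition, so observers can never see one without the other.

import java.util.HashMap;
import java.util.Map;

// Toy model for illustration only; not Elasticsearch code.
final class ToyClusterState {
    final Map<String, String> shardRouting;  // shardId -> routing status
    final Map<String, String> restoreState;  // shardId -> restore status

    ToyClusterState(Map<String, String> shardRouting, Map<String, String> restoreState) {
        this.shardRouting = Map.copyOf(shardRouting);
        this.restoreState = Map.copyOf(restoreState);
    }

    // A single update produces both the new routing entry and the matching
    // restore entry, so the two views cannot drift apart.
    ToyClusterState applyShardStarted(String shardId) {
        Map<String, String> routing = new HashMap<>(shardRouting);
        Map<String, String> restore = new HashMap<>(restoreState);
        routing.put(shardId, "STARTED");
        if (restore.containsKey(shardId)) {
            restore.put(shardId, "SUCCESS"); // restore finished for this shard
        }
        return new ToyClusterState(routing, restore);
    }

    ToyClusterState applyShardFailed(String shardId, boolean snapshotCorrupted) {
        Map<String, String> routing = new HashMap<>(shardRouting);
        Map<String, String> restore = new HashMap<>(restoreState);
        routing.put(shardId, "UNASSIGNED");
        if (restore.containsKey(shardId) && snapshotCorrupted) {
            restore.put(shardId, "FAILURE"); // a corrupted snapshot cannot be retried
        }
        return new ToyClusterState(routing, restore);
    }
}

public class SingleUpdateSketch {
    public static void main(String[] args) {
        ToyClusterState state = new ToyClusterState(
            Map.of("shard-0", "INITIALIZING"),
            Map.of("shard-0", "INIT"));
        ToyClusterState next = state.applyShardStarted("shard-0");
        // Prints {shard-0=STARTED} {shard-0=SUCCESS}: both views updated together.
        System.out.println(next.shardRouting + " " + next.restoreState);
    }
}

Under the previous design, the routing update and the restore-state update traveled over separate channels, so a lost message could leave the equivalent of shardRouting saying "failed" while restoreState still said the restore was in progress.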

@imotov

imotov approved these changes Oct 11, 2016

Member

imotov left a comment

Left a couple of minor comments. Otherwise, LGTM.

RoutingTable oldRoutingTable = oldState.routingTable();
RoutingNodes newRoutingNodes = allocation.routingNodes();
final RoutingTable newRoutingTable = new RoutingTable.Builder().updateNodes(oldRoutingTable.version(), newRoutingNodes).build();
MetaData newMetaData = allocation.updateMetaDataWithRoutingChanges(newRoutingTable);
assert newRoutingTable.validate(newMetaData); // validates the routing table is coherent with the cluster state metadata
final ClusterState newState = ClusterState.builder(oldState).routingTable(newRoutingTable).metaData(newMetaData).build();
final RestoreInProgress restoreInProgress = allocation.custom(RestoreInProgress.TYPE);
RestoreInProgress updatedRestoreInProgress = allocation.updateRestoreInfoWithRoutingChanges(restoreInProgress);

@imotov

imotov Oct 11, 2016

Member

That doesn't seem to cause any issues, but I think moving this into the if statement below might help to clarify the logic. This method can be called when no restore takes place, and we kind of plunge head-on into updating restore info without even checking whether a restore is actually in progress. We check it inside updateRestoreInfoWithRoutingChanges, but I think it might make the logic clearer if we checked it here.

@ywelsch

ywelsch Oct 11, 2016

Author Contributor

agree
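
For readers following along, here is a rough sketch of the shape this suggestion points at, reusing the names from the hunk above. The exact "if statement below" is not visible in this review excerpt, so the null check used as the guard here is an assumption for illustration, not the code that was actually committed:

// Sketch only: fetch the restore state, but only compute the updated restore
// info when a restore is actually in progress, so the intent reads clearly.
final RestoreInProgress restoreInProgress = allocation.custom(RestoreInProgress.TYPE);
if (restoreInProgress != null) { // assumed guard; the real condition may differ
    RestoreInProgress updatedRestoreInProgress =
        allocation.updateRestoreInfoWithRoutingChanges(restoreInProgress);
    // ... merge updatedRestoreInProgress into the new cluster state here
}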

RecoverySource recoverySource = failedShard.recoverySource();
if (recoverySource.getType() == RecoverySource.Type.SNAPSHOT) {
Snapshot snapshot = ((SnapshotRecoverySource) recoverySource).snapshot();
if (Lucene.isCorruptionException(unassignedInfo.getFailure().getCause())) {

@imotov

imotov Oct 11, 2016

Member

Could you add a comment explaining why we only fail in case of Lucene corruption?

@ywelsch

ywelsch Oct 11, 2016

Author Contributor

sure
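
Not part of the diff itself, but roughly the kind of inline explanation being asked for, shown against the hunk above. The wording of the comment is a hedged guess at the rationale (corruption is permanent, other failures are retryable), not the comment that was actually added in the PR:

RecoverySource recoverySource = failedShard.recoverySource();
if (recoverySource.getType() == RecoverySource.Type.SNAPSHOT) {
    Snapshot snapshot = ((SnapshotRecoverySource) recoverySource).snapshot();
    // Mark the restore entry as failed only if the snapshot data itself is
    // corrupted: retrying the allocation cannot succeed in that case. Other
    // failures (e.g. a node leaving the cluster) are treated as transient and
    // the shard allocation is simply retried without failing the restore.
    if (Lucene.isCorruptionException(unassignedInfo.getFailure().getCause())) {
        // ...
    }
}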

ywelsch added some commits Oct 11, 2016

ywelsch merged commit 0750470 into elastic:master Oct 12, 2016

2 checks passed

CLA: Commit author is a member of Elasticsearch
elasticsearch-ci: Build finished.
@ywelsch

Contributor Author

ywelsch commented Oct 12, 2016

thanks for the review @imotov

ywelsch added a commit to ywelsch/elasticsearch that referenced this pull request Oct 31, 2016

Keep snapshot restore state and routing table in sync (elastic#20836)
@ILMostro


ILMostro commented Aug 31, 2017

Is there any hope of this being backported to version 1.7?

@jasontedor

Member

jasontedor commented Aug 31, 2017

No, sorry, 1.7 is end-of-life.
