Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Write shard state metadata as soon as shard is created / initializing #16625

Merged

Conversation

ywelsch
Copy link
Contributor

@ywelsch ywelsch commented Feb 12, 2016

As we now rely on active allocation ids persisted in the cluster state to select
the primary shard copy, we can write shard state metadata on the allocated node
as soon as the node knows about receiving this shard. This also ensures that
in case of primary relocation, when the relocation target is marked as started
by the master node, the shard state metadata with the correct allocation id has
already been written on the relocation target. Before this change, shard state
metadata was only written once the node knows it is marked as started. In case
of failures between master marking the node as started and the node
receiving and processing this event, the relation between the shard copy on disk
and the cluster state could get lost. This means that manual allocation of
the shard using the reroute command allocate_stale_primary was necessary.

Relates to #14739

@@ -342,8 +342,6 @@ public void cleanFiles(int totalTranslogOps, Store.MetadataSnapshot sourceMetaDa
// first, we go and move files that were created with the recovery id suffix to
// the actual names, its ok if we have a corrupted index here, since we have replicas
// to recover from in case of a full cluster shutdown just when this code executes...
indexShard().deleteShardState(); // we have to delete it first since even if we fail to rename the shard
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bleskes I'm not sure what effect removing this has. The issue that made me remove this is that the shard state metadata was written when shard is created, then it was removed again if shard was recovery target, and not updated anymore since the shard state metadata did not change from point of view of IndexShard.persistMetadata(). With writing shard state metadata directly, we now know that the shard state metadata is up-to-date before we do recovery (hence no need to delete shard state?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was added as a protection against a failure during the rename, leaving the shard in a corrupted state (#10053) . We later added better checks in another place ( #11269 ) . I think it's OK to remove.

@ywelsch
Copy link
Contributor Author

ywelsch commented Feb 22, 2016

@bleskes ping

assert nodeShardState.allocationId() == null : "Allocation id and legacy version cannot be both present";
logger.trace("[{}] on node [{}] has version [{}] of shard", shard, nodeShardState.getNode(), version);
} else {
// shard was already selected in a 3.x cluster as best candidate for recovery but did not make it to STARTED state
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if I understand this correctly, this part is relevant where we assigned a primary after a cluster upgrade and the shard initialized (and wrote a new state file) but we never got around to activating it before crushing again. if that's correct, can you add this to the comment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correct

@bleskes
Copy link
Contributor

bleskes commented Feb 29, 2016

change looks good to me. Left some suggestions and questions re testing..

@ywelsch ywelsch force-pushed the fix/fail-on-persist-shard-state-metadata branch from fe713f0 to ef3f69e Compare February 29, 2016 11:33
@ywelsch
Copy link
Contributor Author

ywelsch commented Feb 29, 2016

Pushed another commit addressing review comments. Also found a copy-paste bug in a test.

@bleskes
Copy link
Contributor

bleskes commented Feb 29, 2016

LGTM. Thanks @ywelsch

As we rely on active allocation ids persisted in the cluster state to select
the primary shard copy, we can write shard state metadata on the allocated node
as soon as the node knows about receiving this shard. This also ensures that
in case of primary relocation, when the relocation target is marked as started
by the master node, the shard state metadata with the correct allocation id has
already been written on the relocation target. Before this change, shard state
metadata was only written once the node knows it is marked as started. In case
of failures between master marking the node as started and the node
receiving and processing this event, the relation between the shard copy on disk
and the cluster state could get lost. This means that manual allocation of
the shard using the reroute command allocate_stale_primary was necessary.

Closes elastic#16625
@ywelsch ywelsch force-pushed the fix/fail-on-persist-shard-state-metadata branch from ef3f69e to d76161d Compare February 29, 2016 12:49
ywelsch pushed a commit that referenced this pull request Feb 29, 2016
…e-metadata

Write shard state metadata as soon as shard is created / initializing
@ywelsch ywelsch merged commit 7fc9f03 into elastic:master Feb 29, 2016
@lcawl lcawl added :Distributed/Distributed A catch all label for anything in the Distributed Area. If you aren't sure, use this one. and removed :Allocation labels Feb 13, 2018
@clintongormley clintongormley added :Distributed/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) and removed :Distributed/Distributed A catch all label for anything in the Distributed Area. If you aren't sure, use this one. labels Feb 14, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) >enhancement v5.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants