Fixes issue with dangling index being deleted instead of re-imported #19666
@ywelsch FYI
/**
 * Determines whether or not the current cluster state represents an entirely
 * different cluster from the previous cluster state, which will happen when a
 * master node is elected that has never been part of the cluster before.
doesn't this also happen when the node first joins the cluster? if so, we should document it.
yes that's correct, i'll add it to the documentation
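A minimal sketch of the check being discussed, assuming it boils down to comparing the cluster UUID of the previous and current cluster state. The classes SketchClusterState and SketchClusterChangedEvent are hypothetical stand-ins written for this illustration, not the real Elasticsearch classes, and the "_na_" placeholder UUID for a node that has not yet joined is an assumption. The sketch shows why the check is also true when a node first joins a cluster.

```java
import java.util.Objects;

// Hypothetical, simplified stand-in for a cluster state: only the cluster UUID is modeled.
final class SketchClusterState {
    // Assumed placeholder UUID for a node that has not yet received a state from a master.
    static final String UNKNOWN_UUID = "_na_";

    private final String clusterUUID;

    SketchClusterState(String clusterUUID) {
        this.clusterUUID = Objects.requireNonNull(clusterUUID, "clusterUUID");
    }

    String clusterUUID() {
        return clusterUUID;
    }
}

// Hypothetical, simplified stand-in for the event handed to cluster state listeners.
final class SketchClusterChangedEvent {
    private final SketchClusterState previousState;
    private final SketchClusterState state;

    SketchClusterChangedEvent(SketchClusterState previousState, SketchClusterState state) {
        this.previousState = Objects.requireNonNull(previousState, "previousState");
        this.state = Objects.requireNonNull(state, "state");
    }

    // True whenever the cluster UUID differs between the previous and the current state.
    // This covers a newly elected master with a fresh UUID, and also a node joining
    // a cluster for the first time (placeholder UUID replaced by a real one).
    boolean isNewCluster() {
        return previousState.clusterUUID().equals(state.clusterUUID()) == false;
    }

    public static void main(String[] args) {
        SketchClusterState fresh = new SketchClusterState(SketchClusterState.UNKNOWN_UUID);
        SketchClusterState a = new SketchClusterState("uuid-a");
        SketchClusterState b = new SketchClusterState("uuid-b");
        System.out.println("same cluster:  " + new SketchClusterChangedEvent(a, a).isNewCluster());
        System.out.println("new master:    " + new SketchClusterChangedEvent(a, b).isNewCluster());
        System.out.println("first join:    " + new SketchClusterChangedEvent(fresh, a).isNewCluster());
    }
}
```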
        logger.warn("[{}] isn't part of metadata but is part of in memory structures. removing", index);
        indicesService.deleteIndex(index, "isn't part of metadata (explicit check)");
    }
}
@ywelsch @bleskes I did some more investigation and I believe this block of code is unnecessary, so I removed it. Right after the deleteIndices call, we call removeUnallocatedIndices, which does the same thing as here: it removes the in-memory structures of indices no longer assigned to the node in the cluster state, and on a brand new cluster state, this will be the case. The tests I wrote pass for this.
I wonder if we can make this behavior more prominent in the code, e.g. by changing the line
if (state.blocks().disableStatePersistence()) {
in IndicesClusterStateService.clusterChanged to
if (state.blocks().disableStatePersistence() || event.isNewCluster()) {
by removing this code altogether we also lose the assertion which informs us about new unexpected cases where an in-memory index is not deleted by ClusterChangedEvent.indicesDeleted().
The state.blocks().disableStatePersistence() check in IndicesClusterStateService.clusterChanged removes the indices and then returns early, not processing anything else. I am not sure that is what we want for a new cluster: the node has missed the earlier cluster state updates and is now processing an update that carries the new cluster UUID. It seems to me that we still want to process that cluster state update through the various methods (create indices, delete indices, etc).
I do agree that we lose the assertion for the specific case of a new cluster. I'm not sure that this alone is worth keeping the code for. We could keep the loop and the assertion, and let the removal happen in removeUnallocatedIndices? Or add a comment to removeUnallocatedIndices as well as an assertion such as:
private void removeUnallocatedIndices(final ClusterChangedEvent event) {
    // same as before
    for (AllocatedIndex<? extends Shard> indexService : indicesService) {
        Index index = indexService.index();
        if (indicesWithShards.contains(index) == false) {
            // if the index does not exist in the cluster state, then it should have been deleted earlier,
            // unless it's a brand new cluster (new cluster UUID)
            assert event.state().metaData().index(index) != null || event.isNewCluster();
            logger.debug("{} removing index, no shards allocated", index);
            indicesService.removeIndex(index, "removing index (no shards allocated)");
        }
    }
}
correct, we should not return early when the cluster UUID changes. Adding the assertion assert event.state().metaData().index(index) != null || event.isNewCluster(); to removeUnallocatedIndices sounds great as a solution.
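The agreed approach can be sketched in isolation, assuming index names are plain strings and the node's in-memory structures, the shard assignments, and the cluster metadata are plain sets. The class RemoveUnallocatedSketch and its parameters are hypothetical and only mirror the logic discussed above. The sketch throws an IllegalStateException where the real code would use a Java assert, so the check fires even when assertions are disabled.

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

// Hypothetical, self-contained model of the removal loop; not the Elasticsearch implementation.
final class RemoveUnallocatedSketch {

    // Removes every in-memory index that has no shards assigned to this node and
    // returns the names that were removed. The inMemory set is modified in place.
    static Set<String> removeUnallocatedIndices(Set<String> inMemory,
                                                Set<String> indicesWithShards,
                                                Set<String> metaDataIndices,
                                                boolean isNewCluster) {
        Set<String> removed = new HashSet<>();
        for (Iterator<String> it = inMemory.iterator(); it.hasNext(); ) {
            String index = it.next();
            if (indicesWithShards.contains(index) == false) {
                // An index missing from the metadata should already have been deleted,
                // unless the state comes from a brand new cluster (new cluster UUID).
                if (metaDataIndices.contains(index) == false && isNewCluster == false) {
                    throw new IllegalStateException("index [" + index + "] is not in the cluster state, "
                        + "so it should have been deleted or the cluster must be new");
                }
                // Only the in-memory structure goes away; nothing on disk is touched here.
                it.remove();
                removed.add(index);
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        Set<String> inMemory = new HashSet<>(Set.of("logs", "metrics"));
        Set<String> removed = removeUnallocatedIndices(inMemory, Set.of(), Set.of(), true);
        System.out.println("removed on new cluster: " + removed.size() + ", left in memory: " + inMemory.size());
    }
}
```

Because the loop runs as part of normal cluster state processing rather than behind an early return, index creation and deletion for the new state would still be applied in the same update.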
directory test to verify the data itself is not deleted from the node.
@@ -322,6 +310,9 @@ private void removeUnallocatedIndices(final ClusterState state) {
        for (AllocatedIndex<? extends Shard> indexService : indicesService) {
            Index index = indexService.index();
            if (indicesWithShards.contains(index) == false) {
                assert state.metaData().index(index) != null || event.isNewCluster() :
                    "if the index does not exist in the cluster state, it should either " +
                    "have been deleted or the cluster must be new";
can you explain in a comment here why it's ok to remove the index instead of deleting it when event.isNewCluster() (i.e. dangling indices)?
Can you also add index to the assertion error message.
        // pick a data node to simulate the cluster state change on
        DiscoveryNode node = null;
        for (DiscoveryNode discoveryNode : clusterStateServiceMap.keySet()) {
            if (discoveryNode.isDataNode()) {
maybe you should pick a node that actually has a shard of the index assigned to it. That makes it possible to check that the node actually had an in-memory structure for the index before it was removed.
DiscoveryNode nodeWithShard = state.nodes().get(randomFrom(state.routingTable().index(name).shardsWithState(INITIALIZING)).currentNodeId());
LGTM. Thanks @abeyad!
Thanks @ywelsch!
Fixes an issue where a node that receives a cluster state
update with a brand new cluster UUID but without an
initial persistence block could cause indices to be wiped out,
preventing them from being reimported as dangling indices.
This commit only removes the in-memory data structures, so
the indices are subsequently reimported as dangling indices.
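
To illustrate the difference the commit message relies on, here is a rough sketch of a node that tracks in-memory index structures separately from on-disk index data. The class DanglingSketchNode and all of its methods are hypothetical and exist only for this illustration. It assumes that a dangling index is one whose data is on disk but which is absent from the cluster metadata.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of a data node; not the Elasticsearch implementation.
final class DanglingSketchNode {
    private final Set<String> inMemory = new HashSet<>();
    private final Set<String> onDisk = new HashSet<>();

    // Creating an index sets up both the in-memory structure and the on-disk data.
    void createIndex(String index) {
        inMemory.add(index);
        onDisk.add(index);
    }

    // "Remove" drops only the in-memory structure; the data stays on disk.
    void removeIndex(String index) {
        inMemory.remove(index);
    }

    // "Delete" drops the in-memory structure and wipes the data from disk.
    void deleteIndex(String index) {
        inMemory.remove(index);
        onDisk.remove(index);
    }

    boolean hasInMemory(String index) {
        return inMemory.contains(index);
    }

    // Indices with data on disk that the given cluster metadata does not know about.
    Set<String> findDanglingIndices(Set<String> metaDataIndices) {
        Set<String> dangling = new TreeSet<>(onDisk);
        dangling.removeAll(metaDataIndices);
        return dangling;
    }

    public static void main(String[] args) {
        DanglingSketchNode node = new DanglingSketchNode();
        node.createIndex("logs");
        // A state from a brand new cluster arrives with empty metadata.
        node.removeIndex("logs");
        System.out.println("dangling after remove: " + node.findDanglingIndices(Set.of()));
    }
}
```

With deleteIndex in place of removeIndex, the dangling set in this model would be empty and there would be nothing left to reimport, which is the failure mode the commit message describes.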