
Fixes issue with dangling index being deleted instead of re-imported #19666

Merged — 4 commits into elastic:master on Aug 4, 2016

Conversation

@abeyad (Author) commented Jul 28, 2016

Fixes an issue where a node that receives a cluster state
update with a brand new cluster UUID, but without an
initial persistence block, could have its indices wiped out,
preventing them from being reimported as dangling indices.
This commit ensures that only the in-memory data structures
are removed; the data remains on disk, so the indices are
subsequently reimported as dangling indices.

@abeyad (Author) commented Jul 28, 2016

@ywelsch FYI

/**
* Determines whether or not the current cluster state represents an entirely
* different cluster from the previous cluster state, which will happen when a
* master node is elected that has never been part of the cluster before.
Contributor:

Doesn't this also happen when the node first joins the cluster? If so, we should document it.

Author:

Yes, that's correct. I'll add it to the documentation.
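As background for this doc discussion, the new-cluster check boils down to comparing the cluster UUID of the previous and current cluster states. Here is a minimal, self-contained sketch of that idea — the class name `ClusterChangedEventSketch` and the `UNSET_UUID` sentinel are hypothetical, for illustration only, not Elasticsearch's actual implementation:

```java
// Hypothetical sketch of a "new cluster" check based on cluster UUIDs.
final class ClusterChangedEventSketch {
    // Sentinel for a node that has not yet received any cluster state.
    static final String UNSET_UUID = "_na_";

    // The UUIDs differ both when a brand new cluster UUID appears (a master
    // was elected that has never been part of the cluster before) and when
    // this node first joins (its previous UUID is still the unset sentinel),
    // which is exactly the second case the review asks to document.
    static boolean isNewCluster(String previousClusterUUID, String currentClusterUUID) {
        return previousClusterUUID.equals(currentClusterUUID) == false;
    }

    public static void main(String[] args) {
        System.out.println(isNewCluster("uuid-a", "uuid-b"));    // prints "true"
        System.out.println(isNewCluster("uuid-a", "uuid-a"));    // prints "false"
        System.out.println(isNewCluster(UNSET_UUID, "uuid-b"));  // prints "true": node just joined
    }
}
```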

@abeyad force-pushed the fix/deleted-indices-on-new-cluster branch from 2fafd8b to de2b36d on July 29, 2016 03:18
@abeyad force-pushed the fix/deleted-indices-on-new-cluster branch from de2b36d to 4d87e39 on July 29, 2016 03:23
logger.warn("[{}] isn't part of metadata but is part of in memory structures. removing", index);
indicesService.deleteIndex(index, "isn't part of metadata (explicit check)");
}
}
Author:

@ywelsch @bleskes I did some more investigation and I believe this block of code is unnecessary, so I removed it. Right after deleteIndices, we call removeUnallocatedIndices, which does the same thing as this block: it removes the in-memory structures of any index no longer assigned to the node in the cluster state, and on a brand new cluster state that covers all of them. The tests I wrote pass with this change.

Contributor:

I wonder if we can make this behavior more prominent in the code, e.g. by changing the line
if (state.blocks().disableStatePersistence()) { in IndicesClusterStateService.clusterChanged to
if (state.blocks().disableStatePersistence() || event.isNewCluster()) {.

Contributor:

By removing this code altogether we also lose the assertion, which informs us about new, unexpected cases where an in-memory index is not deleted by ClusterChangedEvent.indicesDeleted().

Author:

The state.blocks().disableStatePersistence() branch in IndicesClusterStateService.clusterChanged removes the indices and then returns early, without processing anything else. I'm not sure that is what we want in the new-cluster case: having missed the earlier cluster state updates, we are now processing a cluster state update with the new cluster UUID, and it seems to me that we still want to run it through the various methods (create indices, delete indices, etc.).

I do agree that we lose the assertion specific to the new-cluster case. I'm not sure that alone is worth keeping the code for. We could keep the loop and the assertion, and let the removal happen in removeUnallocatedIndices. Or we could add a comment to removeUnallocatedIndices as well as an assertion such as:

private void removeUnallocatedIndices(final ClusterChangedEvent event) {
        // same as before

        for (AllocatedIndex<? extends Shard> indexService : indicesService) {
            Index index = indexService.index();
            if (indicesWithShards.contains(index) == false) {
                // if the index does not exist in the cluster state, it should have been
                // deleted earlier, unless this is a brand new cluster (new cluster UUID)
                assert event.state().metaData().index(index) != null || event.isNewCluster();
                logger.debug("{} removing index, no shards allocated", index);
                indicesService.removeIndex(index, "removing index (no shards allocated)");
            }
        }
    }

Contributor:

Correct, we should not return early when the cluster UUID changes. Adding the assertion assert event.state().metaData().index(index) != null || event.isNewCluster(); to removeUnallocatedIndices sounds great as a solution.

directory test to verify the data itself is not deleted
from the node.
@abeyad (Author) commented Jul 29, 2016

@ywelsch I pushed 6d45c82

@@ -322,6 +310,9 @@ private void removeUnallocatedIndices(final ClusterState state) {
for (AllocatedIndex<? extends Shard> indexService : indicesService) {
Index index = indexService.index();
if (indicesWithShards.contains(index) == false) {
assert state.metaData().index(index) != null || event.isNewCluster() :
"if the index does not exist in the cluster state, it should either " +
"have been deleted or the cluster must be new";
Contributor:

Can you explain in a comment here why it's ok to remove the index instead of deleting it when event.isNewCluster() (i.e. dangling indices)?
Can you also add index to the assertion error message?
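To illustrate the distinction this comment asks to document: "removing" an index drops only the node's in-memory structures and leaves the data on disk, so it can later be re-imported as a dangling index, whereas "deleting" erases the on-disk data as well. A hypothetical standalone sketch — IndexLifecycleSketch and its fields are illustrative names, not Elasticsearch's API:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the remove-vs-delete distinction discussed in this PR.
final class IndexLifecycleSketch {
    final Map<String, Object> inMemoryIndices = new HashMap<>(); // index name -> in-memory service
    final Set<String> onDiskIndices = new HashSet<>();           // index names with data on disk

    void createIndex(String index) {
        inMemoryIndices.put(index, new Object());
        onDiskIndices.add(index);
    }

    // The new-cluster path: drop only in-memory structures; the data stays on
    // disk so the dangling-indices import can pick the index up again later.
    void removeIndex(String index) {
        inMemoryIndices.remove(index);
    }

    // The path for an index actually deleted in the cluster state: the on-disk
    // data is erased as well, so nothing is left to re-import.
    void deleteIndex(String index) {
        inMemoryIndices.remove(index);
        onDiskIndices.remove(index);
    }
}
```

In the PR's terms, a cluster state update carrying a new cluster UUID should trigger the removeIndex path, and the data left on disk is what the dangling-indices import later re-imports.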

@abeyad (Author) commented Jul 30, 2016

@ywelsch I pushed 23c1ded that addresses your feedback

// pick a data node to simulate the cluster state change on
DiscoveryNode node = null;
for (DiscoveryNode discoveryNode : clusterStateServiceMap.keySet()) {
if (discoveryNode.isDataNode()) {
Contributor:

Maybe you should pick a node that actually has a shard of the index assigned to it. That allows checking that the node actually had in-memory structures for the index before they were removed, e.g.:

DiscoveryNode nodeWithShard = state.nodes().get(randomFrom(state.routingTable().index(name).shardsWithState(INITIALIZING)).currentNodeId());

@abeyad force-pushed the fix/deleted-indices-on-new-cluster branch from 3f7f1a9 to 1588d87 on August 3, 2016 20:14
@abeyad (Author) commented Aug 3, 2016

@ywelsch I pushed 1588d87
I think (hope) you will like the organization of the test better here.

@ywelsch (Contributor) commented Aug 4, 2016

LGTM. Thanks @abeyad!

@abeyad (Author) commented Aug 4, 2016

Thanks @ywelsch!

@abeyad merged commit 8bbc312 into elastic:master on Aug 4, 2016
@abeyad deleted the fix/deleted-indices-on-new-cluster branch on August 4, 2016 12:47
@clintongormley added the :Distributed/Distributed label (a catch-all for anything in the Distributed area) and removed the :Cluster label on Feb 13, 2018
Labels
>bug, :Distributed/Distributed, v5.0.0-beta1