
Fixes issue with dangling index being deleted instead of re-imported #19666

Merged — 4 commits into elastic:master on Aug 4, 2016

Conversation

@abeyad (Author) commented Jul 28, 2016

Fixes an issue where a node that receives a cluster state
update with a brand new cluster UUID, but without an
initial persistence block, could have its indices wiped out,
preventing them from being reimported as dangling indices.
This commit ensures that only the in-memory data structures
are removed; the data remains on disk, so the indices are
subsequently reimported as dangling indices.

@abeyad (Author) commented Jul 28, 2016

@ywelsch FYI

/**
* Determines whether or not the current cluster state represents an entirely
* different cluster from the previous cluster state, which will happen when a
* master node is elected that has never been part of the cluster before.
Contributor:

Doesn't this also happen when the node first joins the cluster? If so, we should document it.

Author:

Yes, that's correct. I'll add it to the documentation.
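As background for this doc discussion, the new-cluster check boils down to comparing the cluster UUID of the previous and current cluster states. Here is a minimal, self-contained sketch of that idea — the class name `ClusterChangedEventSketch` and the `UNSET_UUID` sentinel are hypothetical, for illustration only, not Elasticsearch's actual implementation:

```java
// Hypothetical sketch of a "new cluster" check based on cluster UUIDs.
final class ClusterChangedEventSketch {
    // Sentinel for a node that has not yet received any cluster state.
    static final String UNSET_UUID = "_na_";

    // The UUIDs differ both when a brand new cluster UUID appears (a master
    // was elected that has never been part of the cluster before) and when
    // this node first joins (its previous UUID is still the unset sentinel),
    // which is exactly the second case the review asks to document.
    static boolean isNewCluster(String previousClusterUUID, String currentClusterUUID) {
        return previousClusterUUID.equals(currentClusterUUID) == false;
    }

    public static void main(String[] args) {
        System.out.println(isNewCluster("uuid-a", "uuid-b"));    // prints "true"
        System.out.println(isNewCluster("uuid-a", "uuid-a"));    // prints "false"
        System.out.println(isNewCluster(UNSET_UUID, "uuid-b"));  // prints "true": node just joined
    }
}
```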

@abeyad force-pushed the fix/deleted-indices-on-new-cluster branch from 2fafd8b to de2b36d on July 29, 2016 03:18
@abeyad force-pushed the fix/deleted-indices-on-new-cluster branch from de2b36d to 4d87e39 on July 29, 2016 03:23
logger.warn("[{}] isn't part of metadata but is part of in memory structures. removing", index);
indicesService.deleteIndex(index, "isn't part of metadata (explicit check)");
}
}
Author:

@ywelsch @bleskes I did some more investigation and I believe this block of code is unnecessary, so I removed it. Right after deleteIndices, we call removeUnallocatedIndices, which does the same thing as this block: it removes the in-memory structures of any index no longer assigned to the node in the cluster state, and on a brand new cluster state that covers all of them. The tests I wrote pass with this change.

Contributor:

I wonder if we can make this behavior more prominent in the code, e.g. by changing the line
if (state.blocks().disableStatePersistence()) { in IndicesClusterStateService.clusterChanged to
if (state.blocks().disableStatePersistence() || event.isNewCluster()) {.

Contributor:

By removing this code altogether we also lose the assertion, which informs us about new, unexpected cases where an in-memory index is not deleted by ClusterChangedEvent.indicesDeleted().

Author:

The state.blocks().disableStatePersistence() branch in IndicesClusterStateService.clusterChanged removes the indices and then returns early, without processing anything else. I'm not sure that is what we want in the new-cluster case: having missed the earlier cluster state updates, we are now processing a cluster state update with the new cluster UUID, and it seems to me that we still want to run it through the various methods (create indices, delete indices, etc.).

I do agree that we lose the assertion specific to the new-cluster case. I'm not sure that alone is worth keeping the code for. We could keep the loop and the assertion, and let the removal happen in removeUnallocatedIndices. Or we could add a comment to removeUnallocatedIndices as well as an assertion such as:

private void removeUnallocatedIndices(final ClusterChangedEvent event) {
        // same as before

        for (AllocatedIndex<? extends Shard> indexService : indicesService) {
            Index index = indexService.index();
            if (indicesWithShards.contains(index) == false) {
                // if the index does not exist in the cluster state, it should have been
                // deleted earlier, unless this is a brand new cluster (new cluster UUID)
                assert event.state().metaData().index(index) != null || event.isNewCluster();
                logger.debug("{} removing index, no shards allocated", index);
                indicesService.removeIndex(index, "removing index (no shards allocated)");
            }
        }
    }

Contributor:

Correct, we should not return early when the cluster UUID changes. Adding the assertion assert event.state().metaData().index(index) != null || event.isNewCluster(); to removeUnallocatedIndices sounds great as a solution.

directory test to verify the data itself is not deleted
from the node.
@abeyad (Author) commented Jul 29, 2016

@ywelsch I pushed 6d45c82

@@ -322,6 +310,9 @@ private void removeUnallocatedIndices(final ClusterState state) {
for (AllocatedIndex<? extends Shard> indexService : indicesService) {
Index index = indexService.index();
if (indicesWithShards.contains(index) == false) {
assert state.metaData().index(index) != null || event.isNewCluster() :
"if the index does not exist in the cluster state, it should either " +
"have been deleted or the cluster must be new";
Contributor:

Can you explain in a comment here why it's ok to remove the index instead of deleting it when event.isNewCluster() (i.e. dangling indices)?
Can you also add index to the assertion error message?
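To illustrate the distinction this comment asks to document: "removing" an index drops only the node's in-memory structures and leaves the data on disk, so it can later be re-imported as a dangling index, whereas "deleting" erases the on-disk data as well. A hypothetical standalone sketch — IndexLifecycleSketch and its fields are illustrative names, not Elasticsearch's API:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the remove-vs-delete distinction discussed in this PR.
final class IndexLifecycleSketch {
    final Map<String, Object> inMemoryIndices = new HashMap<>(); // index name -> in-memory service
    final Set<String> onDiskIndices = new HashSet<>();           // index names with data on disk

    void createIndex(String index) {
        inMemoryIndices.put(index, new Object());
        onDiskIndices.add(index);
    }

    // The new-cluster path: drop only in-memory structures; the data stays on
    // disk so the dangling-indices import can pick the index up again later.
    void removeIndex(String index) {
        inMemoryIndices.remove(index);
    }

    // The path for an index actually deleted in the cluster state: the on-disk
    // data is erased as well, so nothing is left to re-import.
    void deleteIndex(String index) {
        inMemoryIndices.remove(index);
        onDiskIndices.remove(index);
    }
}
```

In the PR's terms, a cluster state update carrying a new cluster UUID should trigger the removeIndex path, and the data left on disk is what the dangling-indices import later re-imports.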

@abeyad (Author) commented Jul 30, 2016

@ywelsch I pushed 23c1ded that addresses your feedback

// pick a data node to simulate the cluster state change on
DiscoveryNode node = null;
for (DiscoveryNode discoveryNode : clusterStateServiceMap.keySet()) {
if (discoveryNode.isDataNode()) {
Contributor:

Maybe you should pick a node that actually has a shard of the index assigned to it. That allows checking that the node actually had in-memory structures for the index before they were removed, e.g.:

DiscoveryNode nodeWithShard = state.nodes().get(randomFrom(state.routingTable().index(name).shardsWithState(INITIALIZING)).currentNodeId());

@abeyad force-pushed the fix/deleted-indices-on-new-cluster branch from 3f7f1a9 to 1588d87 on August 3, 2016 20:14
@abeyad (Author) commented Aug 3, 2016

@ywelsch I pushed 1588d87
I think (hope) you will like the organization of the test better here.

@ywelsch (Contributor) commented Aug 4, 2016

LGTM. Thanks @abeyad!

@abeyad (Author) commented Aug 4, 2016

Thanks @ywelsch!

@abeyad merged commit 8bbc312 into elastic:master on Aug 4, 2016
@abeyad deleted the fix/deleted-indices-on-new-cluster branch on August 4, 2016 12:47
@clintongormley added the :Distributed/Distributed label (a catch-all for anything in the Distributed area) and removed the :Cluster label on Feb 13, 2018
Labels
>bug, :Distributed/Distributed, v5.0.0-beta1