Write state also on data nodes if not master eligible #9952

brwe · 2015-03-02T18:18:55Z

When a node was a data-only node (not master eligible), its index state was not written.
If such a node connected to a master that did not have the index in its
cluster state, for example because the master had been restarted and its
data folder was lost, the indices were not imported as dangling but were
deleted instead.
This commit makes sure that the index state is also written on data nodes
if they have at least one shard of that index allocated.
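A minimal sketch of the selection rule described above, assuming a simplified cluster state modeled as a map from index name to the ids of the nodes that hold at least one shard of that index. The class `IndexStateWriteSelector` and its method are hypothetical illustrations, not the actual `GatewayMetaState` code.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper; the cluster state is simplified to
// index name -> ids of nodes holding at least one shard of that index.
final class IndexStateWriteSelector {

    /**
     * Returns the indices whose metadata this node should persist.
     * Master-eligible nodes persist every index; data-only nodes persist an
     * index only if at least one of its shards is allocated to them.
     */
    static Set<String> indicesToWrite(String localNodeId,
                                      boolean masterEligible,
                                      Map<String, List<String>> shardNodesByIndex) {
        if (masterEligible) {
            return new HashSet<>(shardNodesByIndex.keySet());
        }
        Set<String> result = new HashSet<>();
        for (Map.Entry<String, List<String>> entry : shardNodesByIndex.entrySet()) {
            if (entry.getValue().contains(localNodeId)) {
                result.add(entry.getKey());
            }
        }
        return result;
    }
}
```

Closed indices have no allocated shards in this model, so the rule alone would skip them; the later commits in this PR ("also, deal with closed indices also on data nodes", "check on disk if there is a shard written already for a closed index") handle that case, which the sketch does not cover.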

I am a little lost with this. I found that the index can still be deleted
from a data node even if the state was written, when the node receives a new cluster state from a
master that does not have the index, for example because that master was restarted without its data folder. This happens
if the data node does not receive the initial cluster state from the new master but only a later one, so state
persistence is not disabled.
I currently avoid this with the following change: https://github.com/elasticsearch/elasticsearch/pull/9952/files#diff-f0f71bedb3d7e6f1cec54e8dddf5c3d3R109
but I am worried about side effects this might have. Any feedback appreciated.

closes #8823

kimchy · 2015-03-03T14:53:11Z

src/main/java/org/elasticsearch/gateway/GatewayMetaState.java

+                        // remove the index state for this index if it is only a data node
+                        // only delete if the last shard was removed
+                        if (shardsAllocatedOnThisNodeInLastClusterState) {
+                            removeIndexState(indexMetaData);


I am not sure we need this removal logic; we already have it in IndicesClusterStateService#applyCleanedIndices, where an index is removed if no shards are around any more. It's nice to have it in a single place, no?

Ohh, I think I see what happens: in IndicesClusterStateService we just remove the index and do not delete it from the file system, which is good. Only in IndicesStore do we actually delete the shard content, once the shard is allocated on all the other nodes. I think we need to add logic there so that if there are no more shards around, we also delete the index itself (which will delete its metadata).
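A minimal sketch of the suggested cleanup, assuming an on-disk layout with one numeric directory per shard under the index directory and the index metadata in a `_state` sibling directory. `ShardDirCleaner` is a hypothetical helper, not the IndicesStore code that was eventually written for this in #9985.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper; assumes <indexDir>/<numeric shard id>/ holds shard data
// and <indexDir>/_state/ holds the index metadata.
final class ShardDirCleaner {

    /** Deletes a directory tree, children before parents. */
    static void deleteRecursively(Path root) throws IOException {
        if (!Files.exists(root)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(root)) {
            // Reverse lexicographic order puts every child before its parent.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    /** True if the index directory still contains at least one shard directory. */
    static boolean hasShardDirs(Path indexDir) throws IOException {
        try (Stream<Path> children = Files.list(indexDir)) {
            return children.anyMatch(p -> Files.isDirectory(p)
                    && p.getFileName().toString().matches("\\d+"));
        }
    }

    /**
     * Deletes one shard's data; if no other shard of the index is left on this
     * node, also deletes the index directory, including its _state metadata.
     */
    static void deleteShard(Path indexDir, int shardId) throws IOException {
        deleteRecursively(indexDir.resolve(Integer.toString(shardId)));
        if (Files.isDirectory(indexDir) && !hasShardDirs(indexDir)) {
            deleteRecursively(indexDir);
        }
    }
}
```

Keeping the "last shard gone, drop the index" decision next to the shard deletion means the metadata disappears at the same moment as the data, rather than in a separate code path.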

s1monw · 2015-03-04T11:02:38Z

I left some comments...

brwe · 2015-03-04T19:26:10Z

Made a PR for the deletion of index folders here: #9985. It should be easy to remove all the additional deletion code from this PR.

brwe · 2015-03-05T18:09:12Z

#9985 is merged; I rebased on latest master and changed the code accordingly. I also wanted to remove the change in ClusterStateEvent because I was unable to reproduce the failures I had seen before without it. But now I found that without the change the tests pass on my Linux machine but fail roughly once every 10 iterations on my Mac, so something is still fishy. I'll try to come up with a detailed failure analysis tomorrow.

brwe · 2015-03-06T10:12:20Z

I think I know what is going on now: the fresh master with the empty cluster state occasionally does not send the first cluster state, due to a race condition in the lifecycles of DiscoveryService and its member Discovery. DiscoveryService.doStart() starts the Discovery, but the lifecycle of DiscoveryService itself is moved to started only after that. So when the first cluster state reaches DiscoveryService.publish, the lifecycle may or may not have started yet.
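The ordering problem can be shown deterministically with a toy model. Everything here is hypothetical: `InnerDiscovery` stands in for the Discovery member, `OuterService` for DiscoveryService, and the inner component publishes synchronously while starting, whereas in the real race the publish happens on another thread and only sometimes wins. The `startLifecycleFirst` variant is just one illustrative reordering, not the fix that was adopted (that was tracked separately in #10017).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical toy model of an outer service whose inner component may
// publish before the outer lifecycle is marked as started.
final class LifecycleRaceDemo {

    static final class InnerDiscovery {
        private final Consumer<String> publisher;

        InnerDiscovery(Consumer<String> publisher) {
            this.publisher = publisher;
        }

        void start() {
            // A freshly elected master may publish its first cluster state right away.
            publisher.accept("initial-cluster-state");
        }
    }

    static final class OuterService {
        final List<String> published = new ArrayList<>();
        private volatile boolean started = false;
        private final InnerDiscovery discovery = new InnerDiscovery(this::publish);

        void publish(String state) {
            if (!started) {
                return; // silently dropped: the service does not consider itself started yet
            }
            published.add(state);
        }

        /** Order described above: start the member first, then flip the lifecycle flag. */
        void startMemberFirst() {
            discovery.start();
            started = true;
        }

        /** Illustrative alternative: mark the service started before starting the member. */
        void startLifecycleFirst() {
            started = true;
            discovery.start();
        }
    }
}
```

With `startMemberFirst` the initial state is lost; with `startLifecycleFirst` it is recorded.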

I added a commit d69f2cf where I removed the ClusterStateEvent change and added an artificial delay to the DiscoveryService.doStart() so that the tests fail reliably just so you can check if you want.

I would suggest we remove the ClusterStateEvent workaround and open another issue for this because this behavior is not a result of this pull request.

s1monw · 2015-03-06T12:55:03Z

I agree with your idea of opening a new issue for the ClusterStateEvent problem.

brwe · 2015-03-10T23:48:22Z

Chatted with @s1monw and now rewrote it so that the selection of what to write is not done in GatewayMetaState anymore. I tried to do it similarly to #10016. It is still a little raw, but it would be great if you could let me know if this is the right direction.

s1monw · 2015-03-12T20:01:58Z

src/main/java/org/elasticsearch/gateway/GatewayMetaState.java

@@ -90,6 +94,26 @@ public MetaData loadMetaState() throws Exception {
        return metaStateService.loadFullState();
    }

+    public static class IndexMetaWriteInfo {
+        IndexMetaData newMetaData;


can they all be package private and final please? just like a struct

also you might wanna put this to the end of the file
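A sketch of what the requested struct-style holder could look like: fields package-private and final, assigned once in the constructor, with the class placed at the end of its enclosing file. `IndexMetaDataStub` is a hypothetical stand-in for the real `IndexMetaData` class, `GatewayMetaStateSketch` stands in for the enclosing class, and the `previousMetaData` field is an assumption; only `newMetaData` appears in the hunk above.

```java
// Hypothetical stand-in for the real IndexMetaData class.
final class IndexMetaDataStub {
    final String index;

    IndexMetaDataStub(String index) {
        this.index = index;
    }
}

final class GatewayMetaStateSketch {

    // ... the rest of the enclosing class would come first ...

    /** Struct-like holder: package-private final fields, assigned once in the constructor. */
    public static class IndexMetaWriteInfo {
        final IndexMetaDataStub newMetaData;
        final IndexMetaDataStub previousMetaData; // hypothetical field; null if nothing was written before

        IndexMetaWriteInfo(IndexMetaDataStub newMetaData, IndexMetaDataStub previousMetaData) {
            this.newMetaData = newMetaData;
            this.previousMetaData = previousMetaData;
        }
    }
}
```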

also, deal with closed indices also on data nodes

When a node was a data node only then the index state was not written. In case this node connected to a master that did not have the index in the cluster state, for example because a master was restarted and the data folder was lost, then the indices were not imported as dangling but instead deleted. This commit makes sure that index state for data nodes is also written if they have at least one shard of this index allocated. closes #8823 closes #9952

brwe · 2015-04-29T15:24:04Z

need to investigate #10017 before we can push

brwe · 2015-04-30T08:34:58Z

The reason the tests failed on CI is the same one I described at the beginning (#9952 (comment)): a data node receives a new cluster state from a master that does not have the index in its state, but the data node missed the earlier state with the no-master block, so state persistence was not disabled. The fact that the index is not in the cluster state is then interpreted as a delete command. This can happen here for the reasons described in #10017, but there might be other reasons as well. I now think we should not delete indices at all if the cluster state that would cause a deletion comes from a new master.
I added a new commit for this but need someone to confirm that this is actually the right solution.
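A minimal sketch of the proposed rule ("don't delete indices if master is new"), assuming each cluster state is reduced to the id of the master that published it and the set of index names it contains. `IndexDeletionGuard` is hypothetical and not the code of commit d2abcfa.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper; a cluster state is reduced to (master id, index names).
final class IndexDeletionGuard {

    /**
     * Returns the indices to delete locally when a new cluster state arrives.
     * An index missing from the new state counts as deleted only if the state
     * comes from the same master as the previous one; if the master changed
     * (or there was none), nothing is deleted, so the local indices can later
     * be imported as dangling instead.
     */
    static Set<String> indicesToDelete(String previousMasterId, Set<String> previousIndices,
                                       String newMasterId, Set<String> newIndices) {
        Set<String> deleted = new HashSet<>();
        if (previousMasterId == null || !previousMasterId.equals(newMasterId)) {
            return deleted;
        }
        for (String index : previousIndices) {
            if (!newIndices.contains(index)) {
                deleted.add(index);
            }
        }
        return deleted;
    }
}
```

The trade-off is that a genuine delete that coincides with a master change is not applied locally; compare the later issue #11665 in the timeline, "Concurrent deletion of indices and master failure can cause indices to be reimported".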


s1monw · 2015-05-04T12:37:02Z

> I added a new commit for this but need someone to confirm that this is actually the right solution.

+1 to the solution

brwe · 2015-05-04T15:36:25Z

Chatted with @kimchy and we decided to push as is, add a //norelease comment and open an issue, because the short-term fix for the problem (#9952 (comment)) is not very elegant.
Added another commit to address the latest comments.

s1monw · 2015-05-05T09:40:06Z

LGTM

kimchy reviewed Mar 3, 2015

s1monw self-assigned this Mar 4, 2015

s1monw added the review label Mar 4, 2015

brwe force-pushed the data-node-state-write-pr branch 2 times, most recently from fcbeaed to 011f424 March 5, 2015 16:18

brwe mentioned this pull request Mar 6, 2015

Race condition in lifecycles of DiscoveryService and Discovery #10017

Closed

clintongormley mentioned this pull request Mar 9, 2015

All index info lost when the original master machines are not participating in master election #9975

Closed

brwe force-pushed the data-node-state-write-pr branch 2 times, most recently from 547de44 to 72f9f9f March 10, 2015 23:04

s1monw reviewed Mar 12, 2015

brwe added 7 commits April 29, 2015 11:41

maintain list of indices that we wrote

92d1f40

also, deal with closed indices also on data nodes

add comments

ab90dbe

exception if cluster state inconsistent

5770828

check on disk if there is a shard written already for a closed index

f88e821

Set -> ImmutableSet

7c44299

simplify iteration

5cb39b8

rename

7569a51

brwe force-pushed the data-node-state-write-pr branch from 6e63466 to 7569a51 April 29, 2015 11:06

brwe closed this in 4088dd3 Apr 29, 2015

kevinkluge removed the review label Apr 29, 2015

brwe reopened this Apr 29, 2015

don't delete indices if master is new

d2abcfa

brwe added 2 commits May 4, 2015 13:57

make method private

9f6f0e1

only gather closed indices list when previousMetaData == null not when global state changed

06d2b59

add comment on ClusterChangedEvent and also //norelease

8e8f8d1

brwe closed this in 3cda9b2 May 5, 2015

brwe mentioned this pull request May 5, 2015

Investigate cluster state signaling of index deletes #10978

Closed

clintongormley added >bug resiliency :Cluster labels Jun 8, 2015

brwe mentioned this pull request Jun 15, 2015

Concurrent deletion of indices and master failure can cause indices to be reimported #11665

Closed

clintongormley mentioned this pull request Aug 5, 2015

Add allocate_all_primaries to cluster reroute #4285

Closed

clintongormley added :Distributed/Distributed and removed :Cluster labels Feb 13, 2018
