Write index metadata on data nodes where shards allocated #8823

Closed
bobpoekert opened this Issue Dec 8, 2014 · 23 comments

bobpoekert commented Dec 8, 2014

If the master node thinks that an index does not exist, and another node thinks that it does, the conflict is currently resolved by having the node that has the index delete it. This can easily result in sudden unexpected data loss. The correct behavior would be for the conflict to be resolved by both nodes accepting the state of the node that thinks that the index exists.

vjanelle commented Dec 8, 2014

Are you running dedicated masters with no query/data load?

bleskes (Member) commented Dec 8, 2014

@bobpoekert can you elaborate more on what happened before the data was lost - did you update the mapping (the title suggests so)? If you can share your cluster layout (dedicated master nodes or not), that would be great. Also, please grab a copy of the logs of the nodes and save them. They might give more insight.

bleskes self-assigned this Dec 8, 2014

bobpoekert commented Dec 8, 2014

@vjanelle yes

@bleskes The sequence of events was the following:

  1. Remove node (which is a master candidate) from cluster
  2. Delete all index files from said node
  3. Add node back into cluster
  4. Node is elected master
  5. All indexes in the cluster are now gone

vjanelle commented Dec 8, 2014

Did you have data turned off on the master candidate as well in the configuration?

bobpoekert commented Dec 8, 2014

@vjanelle No. The master candidate is also a data node.

bleskes (Member) commented Dec 8, 2014

@bobpoekert thx. Let me make sure I understand what you are saying:

  1. When you started you had a master node running, but it was also allowed to hold data (i.e., it didn't have node.data set to false in its elasticsearch.yml file). Call this node A.
  2. You brought down another node which was both a master candidate and a data node (i.e., neither node.master nor node.data was set to false). Call this node B.
  3. While B was down, you deleted its data folder (right? Or was it some subfolders of it?).
  4. You brought B back up and it joined the cluster.

From this point on I'm not clear. Why was node B elected as master? What happened to the existing master node A?

bobpoekert commented Dec 8, 2014

@bleskes

  1. Cluster has a single master node (A) and two data nodes (B and C)
  2. Remove A from the cluster
  3. Now the cluster is offline (has no master)
  4. Delete A's data folder
  5. Bring A back up
  6. Now B and C have no data

bleskes (Member) commented Dec 8, 2014

@bobpoekert I'm confused. You said before that you had nodes that are master candidates and are also data nodes. Can you confirm that node A has node.data: false in its settings and that nodes B & C have node.master: false?

bobpoekert commented Dec 8, 2014

@bleskes
All the nodes have data: true.
A has master: true
B and C have master: false
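
For reference, the layout described above corresponds to settings along these lines in each node's elasticsearch.yml (a sketch reconstructed from the comments, not the reporter's actual configuration files):

```yaml
# Node A -- the only master-eligible node, and it also holds data
node.master: true
node.data: true

# Nodes B and C -- hold data but can never be elected master
node.master: false
node.data: true
```

With this layout the cluster has exactly one master-eligible node, which, as explained in the next comment, means the cluster metadata exists in only one place on disk.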

bleskes (Member) commented Dec 8, 2014

OK, clear. The cluster metadata (which we call the cluster state) is stored and maintained on the master-eligible nodes. That metadata contains which indices exist in the cluster. We only write the metadata on master-eligible nodes, and rely on multiple masters for redundancy of this data set (as opposed to specific shard data).

Since you only have one master-eligible node, there is no redundancy in your cluster metadata storage. After you deleted it, there was nowhere to get it back from, so the cluster became empty.

We do have a feature called dangling indices, which scans data folders for indices that are found on disk but are not part of the cluster state, and automatically imports them into the cluster. As it stands today, this feature needs to find some part of the index metadata to work, but that metadata is also only stored on the master-eligible nodes, and in your case there was no copy left.

Thinking about it, we can be more resilient in situations where users run only a single master-eligible node (though we highly recommend running more than one) and store the index metadata wherever a shard copy is stored, i.e., also on data nodes. Then we can improve the dangling indices logic to identify those cases as well.

Let's keep this issue open; we will work on a PR to improve things based on the above.
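
For context, the usual way to get that redundancy on the 1.x line is to run several master-eligible nodes and require a quorum for elections. A minimal sketch, assuming the common three-master layout (the node count and quorum value are illustrative, not taken from this thread):

```yaml
# elasticsearch.yml on each of three master-eligible nodes (sketch only)
node.master: true
node.data: false                        # optional: dedicate these nodes to cluster-state duties
discovery.zen.minimum_master_nodes: 2   # require a quorum of the 3 master-eligible nodes
```

With three master-eligible nodes there are three on-disk copies of the cluster state, so losing a single data folder no longer wipes out the index metadata.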

lusis commented Dec 9, 2014

This definitely feels like a documentation issue as well. I asked on Twitter, and the reason for running a single master was essentially to avoid other bugs. I don't think it's an unfair assumption that a user would expect a quorum of preexisting data nodes to be enough to promote a new master or rebuild it without data loss. I would, from a purely semantic perspective, expect that data nodes would have ALL the data needed for the cluster and that the master's state would live with the rest of the "data".

Also, would it not make sense for non-master-eligible nodes to at least provide a backup of the master node's cluster metadata, for this case and as a safety precaution?

For the record, I have no direct impact from this issue. Just another production ES user who tracks this stuff.

lusis commented Dec 9, 2014

Sorry, I missed the part where you mention possibly storing the backup on data nodes.

grantr commented Dec 9, 2014

Repro of this bug: https://gist.github.com/grantr/a53a9b6b91005ad9807f

This is more than a documentation issue. Even when running in a degraded configuration, shards should never be deleted if their metadata can't be found.

bleskes (Member) commented Dec 9, 2014

> Also, would it not make sense for non-master-eligible nodes to at least provide a backup of the master node's cluster metadata, for this case and as a safety precaution?

This is indeed the plan.

s1monw added a commit to s1monw/elasticsearch that referenced this issue Jan 30, 2015

Fail node startup if shards index format is too old / new
Today, if a shard contains a segment that is from Lucene 3.x and
therefore throws an `IndexFormatTooOldException`, the node goes into
a wild allocation loop if the index is directly recovered from the gateway.
If the problematic shard is allocated later for other reasons, the shard
will fail allocation, and downgrading the cluster might be impossible since
new segments in other indices have already been written.

This commit adds sanity checks to the GatewayMetaState that try to read
the SegmentsInfo for every shard on the node and fail if a shard is corrupted,
the index is too new, etc.

With the new data_path per index feature, nodes might not have enough information
unless they are master eligible, since we used to not persist the index and global
state on nodes that are not master eligible. This commit changes this behavior and
writes the state on all nodes that hold data. This is an enhancement in itself, since
data nodes that are not master eligible are not self-contained today.

This change also fixes the issue seen in #8823, since metadata is now written on all
data nodes.

Closes #8823

clintongormley changed the title from "Mapping conflicts result in indexes being deleted" to "Write index metadata on data nodes where shards allocated" Feb 9, 2015

brwe assigned brwe and unassigned bleskes Feb 26, 2015

brwe added a commit to brwe/elasticsearch that referenced this issue Mar 2, 2015

Write state also on data nodes if not master eligible
When a node was a data-only node, the index state was not written.
If such a node then connected to a master that did not have the index
in its cluster state, for example because the master was restarted and
its data folder was lost, the indices were not imported as dangling
but instead deleted.
This commit makes sure that the index state is also written on data nodes
if they have at least one shard of this index allocated.

closes #8823

brwe added a commit to brwe/elasticsearch that referenced this issue Mar 5, 2015

Write state also on data nodes if not master eligible

closes #8823

mkliu commented Mar 7, 2015

+1111
This bug really burnt me! We had multiple master nodes, but at one point all the master nodes were down, so I promoted a data node to master. And all the data was gone! It freaked the hell out of me. And because people were still ingesting data, the data got overwritten. By the time I realized what was happening it was already too late; we lost lots of data...

polgl commented Mar 9, 2015

Can the deletion be delayed for a longer period by setting a high value for gateway.local.dangling_timeout (a couple of days, maybe)?

s1monw (Contributor) commented Mar 9, 2015

@polgl we removed the deletion and always import now, since #10016 was pushed to 2.0.

polgl commented Mar 9, 2015

I'm running 1.4 and cannot update right now, so I would prefer to change the settings and have more time to react.

brwe added a commit to brwe/elasticsearch that referenced this issue Mar 10, 2015

Write state also on data nodes if not master eligible

closes #8823

s1monw (Contributor) commented Mar 12, 2015

@polgl you can modify the settings yourself and just always import?
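
For 1.x, a sketch of what that could look like in elasticsearch.yml, assuming the legacy local-gateway settings of that era (gateway.local.auto_import_dangled is a guess at the setting being referred to here; check the 1.x documentation before relying on either name or value):

```yaml
# elasticsearch.yml (ES 1.x local gateway) -- sketch only, not verified against a 1.x node
gateway.local.auto_import_dangled: "yes"   # import dangling indices instead of scheduling them for deletion
gateway.local.dangling_timeout: 48h        # and/or keep dangling data around much longer before any cleanup
```

Note that, as explained earlier in the thread, dangling-index handling only helps when the index metadata is still present somewhere on disk; it would not have prevented the original report, where no node held that metadata.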

polgl commented Mar 12, 2015

Hi, I did some tests and it seems our cluster does not show this behavior.
We have a single dedicated master node; if this node does not have an index in the metadata / cluster state, nothing happens. The data nodes don't delete any files (the dangling indices part of the code is not executed). That sounds OK, I can live with that.

Thanks for your help.

s1monw added v1.6.0 and removed v1.5.0 labels Mar 17, 2015

brwe added a commit to brwe/elasticsearch that referenced this issue Apr 29, 2015

Write state also on data nodes if not master eligible

closes #8823

brwe added a commit that referenced this issue Apr 29, 2015

Write state also on data nodes if not master eligible

closes #8823
closes #9952

brwe closed this in 4088dd3 Apr 29, 2015

brwe reopened this Apr 29, 2015

brwe added v2.0.0-beta1 and removed v1.6.0 labels May 5, 2015

brwe closed this in 3cda9b2 May 5, 2015

saurabh24292 commented Jun 27, 2016

ES version 2.2

Previous state: a cluster with 6 nodes (3 combined master/data nodes, 3 client nodes).
The index has data for two months (1st April 2016 to 31st May 2016). Every day, the previous day's data is added and data older than two months is deleted.
I restarted the cluster. All of a sudden, 90% of the data for the date range 5th May 2016 to 31st May 2016 is gone (the average record count for these days drops from 100000 per day to 10000 per day) and, surprisingly, deleted data for the date range 5th March 2016 to 31st March 2016 reappears.

What's the problem?

saurabh24292 commented Jun 27, 2016

Correction: the version is 2.1.1.

bleskes (Member) commented Jun 27, 2016

@saurabh24292 I'm not sure what your problem is, but maybe you can ask on discuss.elastic.co? If we figure out it's related to this issue or is caused by some other problem, we can re-open this one or (more likely) open a new one.
