Write index metadata on data nodes where shards allocated #8823

Closed
bobpoekert opened this Issue Dec 8, 2014 · 23 comments

@bobpoekert

If the master node thinks that an index does not exist, and another node thinks that it does, the conflict is currently resolved by having the node that has the index delete it. This can easily result in sudden unexpected data loss. The correct behavior would be for the conflict to be resolved by both nodes accepting the state of the node that thinks that the index exists.

@vjanelle
vjanelle commented Dec 8, 2014

Are you running dedicated masters with no query/data load?

@bleskes
Member
bleskes commented Dec 8, 2014

@bobpoekert can you elaborate more on what happened before the data was lost - did you update the mapping (the title suggests so)? If you can share your cluster layout (dedicated master nodes or not) that would be great. Also, please grab a copy of the logs of the nodes and save them. They might give more insight.

@bleskes bleskes self-assigned this Dec 8, 2014
@bobpoekert

@vjanelle yes

@bleskes The sequence of events was the following:

  1. Remove node (which is a master candidate) from cluster
  2. Delete all index files from said node
  3. Add node back into cluster
  4. Node is elected master
  5. All indexes in the cluster are now gone
@vjanelle
vjanelle commented Dec 8, 2014

Did you have data turned off on the master candidate as well in the configuration?

@bobpoekert

@vjanelle No. The master candidate is also a data node.

@bleskes
Member
bleskes commented Dec 8, 2014

@bobpoekert thx. let me make sure I understand what you are saying:

  1. when you started you had a master node running, but it was also allowed to have data (i.e., it didn't have node.data set to false in its elasticsearch.yml file). Call this node A.
  2. you brought down another node which was both a master candidate and a data node (i.e., neither node.master nor node.data was set to false). Call this node B.
  3. While B was down, you deleted its data folder (right? or was it some subfolders of it?).
  4. You brought B back up and it joined the cluster.

From this point on I'm not clear. Why was node B elected as master? What happened to the existing master, node A?

@bobpoekert

@bleskes

  1. Cluster has a single master node (A) and two data nodes (B and C)
  2. Remove A from the cluster
  3. Now the cluster is offline (has no master)
  4. Delete A's data folder
  5. Bring A back up
  6. Now B and C have no data
@bleskes
Member
bleskes commented Dec 8, 2014

@bobpoekert I'm confused. You said before that you had nodes that are master candidates and also data nodes. Can you confirm that node A has node.data: false in its settings and that nodes B & C have node.master: false?

@bobpoekert

@bleskes
All the nodes have data: true.
A has master: true
B and C have master: false
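In elasticsearch.yml terms, that layout corresponds to something like the following sketch (1.x-style `node.master` / `node.data` settings; the file contents are an illustration, not taken from the reporter's actual config):

```yaml
# node A (elasticsearch.yml) - the only master candidate, also holds data
node.master: true
node.data: true

# nodes B and C - data only, never electable as master
node.master: false
node.data: true
```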

@bleskes
Member
bleskes commented Dec 8, 2014

OK. Clear. The cluster meta data (which we call cluster state) is stored and maintained on the master nodes. That meta data contains which indices are out there in the cluster. We only write the metadata on master eligible nodes, and rely on multiple masters for redundancy of this data set (compared to specific shard data).

Since you only have one master node, that means there is no redundancy in your cluster meta data storage. After you deleted it there is nowhere to get it back from so the cluster becomes empty.

We do have a feature called dangling indices, which is in charge of scanning data folders for indices that are found on disk but are not part of the cluster state, and automatically importing them into the cluster. As it stands today, this feature needs to find some part of the index metadata to work, but that is also stored only on the master-eligible nodes, of which in your case there were none.

Thinking about it, we can be more resilient in situations where users are running only a single master node (though we highly recommend running more than one), and store the index metadata wherever a shard copy is stored, so also on data nodes. So we can improve the dangling indices case to identify those as well.

Let's keep this issue open and we will work on a PR to improve things based on the above.
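The failure mode described above can be sketched in a few lines. This is a toy simulation (not Elasticsearch code, all names are illustrative): index metadata is written only on master-eligible nodes, so when the sole master candidate loses its data folder, neither the rebuilt cluster state nor the dangling-indices scan can recover the index, even though the data nodes still hold the shards.

```python
class Node:
    def __init__(self, name, master_eligible, data):
        self.name = name
        self.master_eligible = master_eligible
        self.data = data
        self.index_metadata = set()  # index names whose metadata is persisted here
        self.shards = set()          # index names with shard data on disk

def create_index(cluster, index):
    # Metadata goes only to master-eligible nodes; shard data to data nodes.
    for node in cluster:
        if node.master_eligible:
            node.index_metadata.add(index)
        if node.data:
            node.shards.add(index)

def wipe_data_folder(node):
    node.index_metadata.clear()
    node.shards.clear()

def cluster_state_after_restart(cluster):
    # The new master rebuilds the cluster state from whatever metadata it can
    # find on master-eligible nodes; the dangling-indices import also needs
    # that metadata, so shards without it are orphaned.
    known = set()
    for node in cluster:
        if node.master_eligible:
            known |= node.index_metadata
    return known

a = Node("A", master_eligible=True, data=True)
b = Node("B", master_eligible=False, data=True)
c = Node("C", master_eligible=False, data=True)
cluster = [a, b, c]

create_index(cluster, "logs-2014.12")
wipe_data_folder(a)  # A's data folder deleted while it was down
state = cluster_state_after_restart(cluster)

print(state)     # empty: the index is gone from the cluster state
print(b.shards)  # B still holds shard data the cluster no longer knows about
```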

@lusis
lusis commented Dec 9, 2014

This definitely feels like a documentation issue as well. I asked on Twitter, and the reason for running a single master was essentially to avoid other bugs. I don't think it's an unfair assumption that a user would expect a quorum of preexisting data nodes to be enough to promote a new master or rebuild it without data loss. From a purely semantic perspective, I would expect data nodes to have ALL the data needed for the cluster, and the master's state to live with the rest of the "data".

Also would it not make sense for maybe non-master eligible nodes to at least provide a backup of the master node cluster metadata for this case and as a safety precaution?

Ftr, I have no direct impact from this issue. Just another production ES user who tracks this stuff.

@lusis
lusis commented Dec 9, 2014

Sorry, I missed the part where you mentioned possibly storing the backup on data nodes.

@grantr
grantr commented Dec 9, 2014

Repro of this bug: https://gist.github.com/grantr/a53a9b6b91005ad9807f

This is more than a documentation issue. Even when running in a degraded configuration, shards should never be deleted if their metadata can't be found.

@bleskes
Member
bleskes commented Dec 9, 2014

Also would it not make sense for maybe non-master eligible nodes to at least provide a backup of the master node cluster metadata for this case and as a safety precaution?

This is indeed the plan

@s1monw s1monw added a commit to s1monw/elasticsearch that referenced this issue Jan 30, 2015
@s1monw s1monw Fail node startup if shards index format is too old / new
Today if a shard contains a segment that is from Lucene 3.x and
therefore throws an `IndexFormatTooOldException`, the node goes into
a wild allocation loop if the index is directly recovered from the gateway.
If the problematic shard is allocated later due to other reasons, the shard
will fail allocation, and downgrading the cluster might be impossible since
new segments in other indices have already been written.

This commit adds sanity checks to the GatewayMetaState that try to read
the SegmentsInfo for every shard on the node and fail if a shard is corrupted
or the index is too new, etc.

With the new data_path per index feature, nodes might not have enough information
unless they are master eligible, since we used to not persist the index and global
state on nodes that are not master eligible. This commit changes this behavior and
writes the state on all nodes that hold data. This is an enhancement in itself, since
data nodes that are not master eligible are not self-contained today.

This change also fixes the issue seen in #8823 since metadata is written on all
data nodes now.

Closes #8823
dd0234a
@clintongormley clintongormley changed the title from Mapping conflicts result in indexes being deleted to Write index metadata on data nodes where shards allocated Feb 9, 2015
@brwe brwe assigned brwe and unassigned bleskes Feb 26, 2015
@brwe brwe added a commit to brwe/elasticsearch that referenced this issue Mar 2, 2015
@brwe brwe Write state also on data nodes if not master eligible
When a node was a data node only then the index state was not written.
In case this node connected to a master that did not have the index
in the cluster state, for example because a master was restarted and
the data folder was lost, then the indices were not imported as dangling
but instead deleted.
This commit makes sure that index state for data nodes is also written
if they have at least one shard of this index allocated.

closes #8823
38ad4d5
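The commit message above boils down to a single predicate change. A hedged sketch (not the actual Elasticsearch code; the function name is illustrative): persist index state on a node if it is master eligible OR holds at least one shard of that index.

```python
# Sketch of the fix described in the commit above: before it, only master
# eligibility was checked, so pure data nodes never wrote index state and
# could not prove a dangling index existed after the master lost its state.

def should_write_index_state(master_eligible: bool, local_shards: set, index: str) -> bool:
    return master_eligible or index in local_shards

# A data-only node holding a shard of "logs" now persists the index state:
print(should_write_index_state(False, {"logs"}, "logs"))  # True
# A node with no shard of the index and no master role still does not:
print(should_write_index_state(False, set(), "logs"))     # False
```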
@brwe brwe added a commit to brwe/elasticsearch that referenced this issue Mar 5, 2015
@brwe brwe Write state also on data nodes if not master eligible
(same commit message as above; closes #8823)
b08d4b7
@mkliu
mkliu commented Mar 7, 2015

+1111
This bug really burnt me! We had multiple master nodes, but at one point all of them were down, so I promoted a data node to master. And all the data was gone! It freaked the hell out of me. Because people were still ingesting data, the data was overwritten; by the time I realized what was happening it was already too late, and we lost lots of data...

@polgl
polgl commented Mar 9, 2015

Can the deletion be delayed for a longer period by setting a high value for gateway.local.dangling_timeout (couple of days maybe?)?

@s1monw
Contributor
s1monw commented Mar 9, 2015

@polgl we removed the deletion and always import now, since #10016 was pushed to 2.0

@polgl
polgl commented Mar 9, 2015

I'm running 1.4 and cannot update right now, so I would prefer to change the settings and have more time to react.
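For a 1.4.x cluster, a hedged sketch of what that settings change might look like (the setting name comes from the question above; the exact value format and whether multi-day values are honored here are assumptions to verify against the 1.4 docs):

```yaml
# elasticsearch.yml - keep dangling index data around longer before deletion
gateway.local.dangling_timeout: 7d
```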

@brwe brwe added a commit to brwe/elasticsearch that referenced this issue Mar 10, 2015
@brwe brwe Write state also on data nodes if not master eligible
(same commit message as above; closes #8823)
b583799
@s1monw
Contributor
s1monw commented Mar 12, 2015

@polgl you can modify the settings yourself so that dangling indices are always imported, no?

@polgl
polgl commented Mar 12, 2015

Hi, I did some tests and it seems like our cluster does not show this behavior.
We have a single dedicated master node; if this node does not have an index in the metadata / cluster state, nothing happens. The data nodes don't delete any files (the dangling indices part of the code is not executed). That sounds OK, I can live with that.

Thanks for your help

@s1monw s1monw added v1.6.0 and removed v1.5.0 labels Mar 17, 2015
@brwe brwe added a commit to brwe/elasticsearch that referenced this issue Apr 29, 2015
@brwe brwe Write state also on data nodes if not master eligible
(same commit message as above; closes #8823)
647eb22
@brwe brwe added a commit that referenced this issue Apr 29, 2015
@brwe brwe Write state also on data nodes if not master eligible
(same commit message as above; closes #8823, closes #9952)
c3a1729
@brwe brwe added a commit that closed this issue Apr 29, 2015
@brwe brwe Write state also on data nodes if not master eligible
(same commit message as above; closes #8823, closes #9952)
4088dd3
@brwe brwe closed this in 4088dd3 Apr 29, 2015
@brwe brwe reopened this Apr 29, 2015
@brwe brwe added v2.0.0 and removed v1.6.0 labels May 5, 2015
@brwe brwe added a commit that closed this issue May 5, 2015
@brwe brwe Write state also on data nodes if not master eligible
(same commit message as above; closes #8823, closes #9952)
3cda9b2
@brwe brwe closed this in 3cda9b2 May 5, 2015
@saurabh24292

ES Version 2.2

Previous state - A cluster with 6 nodes (3 master-cum-data nodes, 3 client nodes).
The index has data for two months (1st April 2016 to 31st May 2016). Every day, the previous day's data is added and two-month-old data is deleted.
I restarted the cluster. All of a sudden, 90% of the data for the date range 5th May 2016 to 31st May 2016 is gone (the average record count for these days drops from 100000 per day to 10000 per day) and, surprisingly, deleted data for the date range 5th March 2016 to 31st March 2016 reappears.

What's the problem?

@saurabh24292

correction - version is 2.1.1

@bleskes
Member
bleskes commented Jun 27, 2016

@saurabh24292 I'm not sure what your problem is, but maybe you can ask on discuss.elastic.co? If we figure out it's related to this issue or caused by some other problem, we can re-open this or (more likely) open a new one.
