Write index metadata on data nodes where shards allocated #8823

Closed
bobpoekert opened this Issue Dec 8, 2014 · 23 comments

bobpoekert commented Dec 8, 2014

If the master node thinks that an index does not exist, and another node thinks that it does, the conflict is currently resolved by having the node that has the index delete it. This can easily result in sudden unexpected data loss. The correct behavior would be for the conflict to be resolved by both nodes accepting the state of the node that thinks that the index exists.

vjanelle commented Dec 8, 2014

Are you running dedicated masters with no query/data load?

bleskes (Member) commented Dec 8, 2014

@bobpoekert can you elaborate more on what happened before the data was lost - did you update the mapping (the title suggests so)? If you can share your cluster layout (dedicated master nodes or not), that would be great. Also, please grab a copy of the logs of the nodes and save them. They might give more insight.

bleskes self-assigned this Dec 8, 2014

bobpoekert commented Dec 8, 2014

@vjanelle yes

@bleskes The sequence of events was the following:

  1. Remove node (which is a master candidate) from cluster
  2. Delete all index files from said node
  3. Add node back into cluster
  4. Node is elected master
  5. All indexes in the cluster are now gone

vjanelle commented Dec 8, 2014

Did you have data turned off on the master candidate as well in the configuration?

bobpoekert commented Dec 8, 2014

@vjanelle No. The master candidate is also a data node.

bleskes (Member) commented Dec 8, 2014

@bobpoekert thx. Let me make sure I understand what you are saying:

  1. When you started you had a master node running, but it was also allowed to hold data (i.e., it didn't have node.data set to false in its elasticsearch.yml file). Call this node A.
  2. You brought down another node which was both a master candidate and a data node (i.e., neither node.master nor node.data was set to false). Call this node B.
  3. While B was down, you deleted its data folder (right? Or was it some subfolders of it?).
  4. You brought B back up and it joined the cluster.

From this point on I'm not clear. Why was node B elected as master? What happened to the existing master node A?

bobpoekert commented Dec 8, 2014

@bleskes

  1. Cluster has a single master node (A) and two data nodes (B and C)
  2. Remove A from the cluster
  3. Now the cluster is offline (has no master)
  4. Delete A's data folder
  5. Bring A back up
  6. Now B and C have no data

bleskes (Member) commented Dec 8, 2014

@bobpoekert I'm confused. You said before that you had nodes that are master candidates and are also data nodes. Can you confirm that node A has node.data: false in its settings and that nodes B & C have node.master: false?

bobpoekert commented Dec 8, 2014

@bleskes
All the nodes have data: true.
A has master: true
B and C have master: false
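
For reference, the layout described above corresponds to settings along these lines in each node's elasticsearch.yml (a sketch reconstructed from the comments, not the reporter's actual configuration files):

```yaml
# Node A -- the only master-eligible node, and it also holds data
node.master: true
node.data: true

# Nodes B and C -- hold data but can never be elected master
node.master: false
node.data: true
```

With this layout the cluster has exactly one master-eligible node, which, as explained in the next comment, means the cluster metadata exists in only one place on disk.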

bleskes (Member) commented Dec 8, 2014

OK, clear. The cluster metadata (which we call the cluster state) is stored and maintained on the master-eligible nodes. That metadata contains which indices exist in the cluster. We only write the metadata on master-eligible nodes, and rely on multiple masters for redundancy of this data set (as opposed to specific shard data).

Since you only have one master-eligible node, there is no redundancy in your cluster metadata storage. After you deleted it, there was nowhere to get it back from, so the cluster became empty.

We do have a feature called dangling indices, which scans data folders for indices that are found on disk but are not part of the cluster state, and automatically imports them into the cluster. As it stands today, this feature needs to find some part of the index metadata to work, but that metadata is also only stored on the master-eligible nodes, and in your case there was no copy left.

Thinking about it, we can be more resilient in situations where users run only a single master-eligible node (though we highly recommend running more than one) and store the index metadata wherever a shard copy is stored, i.e., also on data nodes. Then we can improve the dangling indices logic to identify those cases as well.

Let's keep this issue open; we will work on a PR to improve things based on the above.
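
For context, the usual way to get that redundancy on the 1.x line is to run several master-eligible nodes and require a quorum for elections. A minimal sketch, assuming the common three-master layout (the node count and quorum value are illustrative, not taken from this thread):

```yaml
# elasticsearch.yml on each of three master-eligible nodes (sketch only)
node.master: true
node.data: false                        # optional: dedicate these nodes to cluster-state duties
discovery.zen.minimum_master_nodes: 2   # require a quorum of the 3 master-eligible nodes
```

With three master-eligible nodes there are three on-disk copies of the cluster state, so losing a single data folder no longer wipes out the index metadata.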

lusis commented Dec 9, 2014

This definitely feels like a documentation issue as well. I asked on Twitter, and the reason for running a single master was essentially to avoid other bugs. I don't think it's an unfair assumption that a user would expect a quorum of preexisting data nodes to be enough to promote a new master or rebuild it without data loss. I would, from a purely semantic perspective, expect that data nodes would have ALL the data needed for the cluster and that the master's state would live with the rest of the "data".

Also, would it not make sense for non-master-eligible nodes to at least provide a backup of the master node's cluster metadata, for this case and as a safety precaution?

For the record, I have no direct impact from this issue. Just another production ES user who tracks this stuff.

lusis commented Dec 9, 2014

Sorry, I missed the part where you mention possibly storing the backup on data nodes.

grantr commented Dec 9, 2014

Repro of this bug: https://gist.github.com/grantr/a53a9b6b91005ad9807f

This is more than a documentation issue. Even when running in a degraded configuration, shards should never be deleted if their metadata can't be found.

bleskes (Member) commented Dec 9, 2014

> Also, would it not make sense for non-master-eligible nodes to at least provide a backup of the master node's cluster metadata, for this case and as a safety precaution?

This is indeed the plan.

s1monw added a commit to s1monw/elasticsearch that referenced this issue Jan 30, 2015

Fail node startup if shards index format is too old / new
Today, if a shard contains a segment that is from Lucene 3.x and
therefore throws an `IndexFormatTooOldException`, the node goes into
a wild allocation loop if the index is directly recovered from the gateway.
If the problematic shard is allocated later for other reasons, the shard
will fail allocation, and downgrading the cluster might be impossible since
new segments in other indices have already been written.

This commit adds sanity checks to the GatewayMetaState that try to read
the SegmentsInfo for every shard on the node and fail if a shard is corrupted,
the index is too new, etc.

With the new data_path per index feature, nodes might not have enough information
unless they are master eligible, since we used to not persist the index and global
state on nodes that are not master eligible. This commit changes this behavior and
writes the state on all nodes that hold data. This is an enhancement in itself, since
data nodes that are not master eligible are not self-contained today.

This change also fixes the issue seen in #8823, since metadata is now written on all
data nodes.

Closes #8823

clintongormley changed the title from "Mapping conflicts result in indexes being deleted" to "Write index metadata on data nodes where shards allocated" Feb 9, 2015

brwe assigned brwe and unassigned bleskes Feb 26, 2015

brwe added a commit to brwe/elasticsearch that referenced this issue Mar 2, 2015

Write state also on data nodes if not master eligible
When a node was a data-only node, the index state was not written.
If such a node then connected to a master that did not have the index
in its cluster state, for example because the master was restarted and
its data folder was lost, the indices were not imported as dangling
but instead deleted.
This commit makes sure that the index state is also written on data nodes
if they have at least one shard of this index allocated.

closes #8823

brwe added a commit to brwe/elasticsearch that referenced this issue Mar 5, 2015

Write state also on data nodes if not master eligible

closes #8823

mkliu commented Mar 7, 2015

+1111
This bug really burnt me! We had multiple master nodes, but at one point all the master nodes were down, so I promoted a data node to master. And all the data was gone! It freaked the hell out of me. And because people were still ingesting data, the data got overwritten. By the time I realized what was happening it was already too late; we lost lots of data...

polgl commented Mar 9, 2015

Can the deletion be delayed for a longer period by setting a high value for gateway.local.dangling_timeout (a couple of days, maybe)?

s1monw (Contributor) commented Mar 9, 2015

@polgl we removed the deletion and always import now, since #10016 was pushed to 2.0.

polgl commented Mar 9, 2015

I'm running 1.4 and cannot update right now, so I would prefer to change the settings and have more time to react.

brwe added a commit to brwe/elasticsearch that referenced this issue Mar 10, 2015

Write state also on data nodes if not master eligible

closes #8823

s1monw (Contributor) commented Mar 12, 2015

@polgl you can modify the settings yourself and just always import?
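
For 1.x, a sketch of what that could look like in elasticsearch.yml, assuming the legacy local-gateway settings of that era (gateway.local.auto_import_dangled is a guess at the setting being referred to here; check the 1.x documentation before relying on either name or value):

```yaml
# elasticsearch.yml (ES 1.x local gateway) -- sketch only, not verified against a 1.x node
gateway.local.auto_import_dangled: "yes"   # import dangling indices instead of scheduling them for deletion
gateway.local.dangling_timeout: 48h        # and/or keep dangling data around much longer before any cleanup
```

Note that, as explained earlier in the thread, dangling-index handling only helps when the index metadata is still present somewhere on disk; it would not have prevented the original report, where no node held that metadata.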

polgl commented Mar 12, 2015

Hi, I did some tests and it seems our cluster does not show this behavior.
We have a single dedicated master node; if this node does not have an index in the metadata / cluster state, nothing happens. The data nodes don't delete any files (the dangling indices part of the code is not executed). That sounds OK, I can live with that.

Thanks for your help.

s1monw added v1.6.0 and removed v1.5.0 labels Mar 17, 2015

brwe added a commit to brwe/elasticsearch that referenced this issue Apr 29, 2015

Write state also on data nodes if not master eligible

closes #8823

brwe added a commit that referenced this issue Apr 29, 2015

Write state also on data nodes if not master eligible

closes #8823
closes #9952

brwe closed this in 4088dd3 Apr 29, 2015

brwe reopened this Apr 29, 2015

brwe added v2.0.0-beta1 and removed v1.6.0 labels May 5, 2015

brwe closed this in 3cda9b2 May 5, 2015

saurabh24292 commented Jun 27, 2016

ES version 2.2

Previous state: a cluster with 6 nodes (3 combined master/data nodes, 3 client nodes).
The index has data for two months (1st April 2016 to 31st May 2016). Every day, the previous day's data is added and data older than two months is deleted.
I restarted the cluster. All of a sudden, 90% of the data for the date range 5th May 2016 to 31st May 2016 is gone (the average record count for these days drops from 100000 per day to 10000 per day) and, surprisingly, deleted data for the date range 5th March 2016 to 31st March 2016 reappears.

What's the problem?

saurabh24292 commented Jun 27, 2016

Correction: the version is 2.1.1.

bleskes (Member) commented Jun 27, 2016

@saurabh24292 I'm not sure what your problem is, but maybe you can ask on discuss.elastic.co? If we figure out it's related to this issue or is caused by some other problem, we can re-open this one or (more likely) open a new one.
