
Add ability to disable importing dangling indices on shared filesystems #16358

Closed
dakrone opened this issue Feb 1, 2016 · 2 comments

@dakrone
Member

dakrone commented Feb 1, 2016

We already have issues open about the behavior of importing dangling indices when a node is out of the cluster while an index is deleted. However, in the case of shadow replicas on a shared filesystem, if the index is deleted while a node is not part of the cluster, the returning node will try to re-import the index as dangling without any actual data, leading to exceptions like:

[2016-01-14 23:17:23,303][WARN ][indices.cluster ] [data1] [[my-index][2]] marking and sending shard failed due to [failed recovery]
[my-index][[my-index][2]] IndexShardRecoveryException[failed to fetch index version after copying it over]; nested: IndexShardRecoveryException[shard allocated for local recovery (post api), should exist, but doesn't, current files: []]; nested: IndexNotFoundException[no segments* file found in store(default(mmapfs(/tmp/foo/my-index/2/index),niofs(/tmp/foo/my-index/2/index))): files: []];

It would be nice to have an option to disable this dangling import behavior only for indices on a shared filesystem, since this is likely to cause much more confusion than dangling imports on non-shared filesystems.

Related #13298

@dakrone dakrone self-assigned this Feb 1, 2016
@bleskes bleskes removed the discuss label Feb 12, 2016
dakrone added a commit to dakrone/elasticsearch that referenced this issue Feb 21, 2016
The `gateway.local.dangling.import_shared_fs` setting defaults to
`false` (meaning dangling indices on a shared filesystem are not imported)

Relates to elastic#16358
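For illustration only, here is a minimal sketch (not the actual Elasticsearch code) of how a setting like `gateway.local.dangling.import_shared_fs` could gate the dangling-index import; the `shouldImportDangling` helper and the plain `Map`-based settings lookup are assumptions made for this sketch:

```java
// Hypothetical sketch -- not the Elasticsearch implementation.
import java.util.Map;

public class DanglingImportSketch {

    /** Decide whether a dangling index found on disk should be imported. */
    static boolean shouldImportDangling(Map<String, String> nodeSettings,
                                        boolean indexIsOnSharedFilesystem) {
        if (!indexIsOnSharedFilesystem) {
            return true; // keep the existing behavior for local storage
        }
        // On a shared filesystem the data may already be gone, so default to false.
        return Boolean.parseBoolean(
            nodeSettings.getOrDefault("gateway.local.dangling.import_shared_fs", "false"));
    }

    public static void main(String[] args) {
        System.out.println(shouldImportDangling(Map.of(), true));  // false: skip the import
        System.out.println(shouldImportDangling(Map.of(), false)); // true: normal dangling import
    }
}
```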
@bleskes
Contributor

bleskes commented Feb 21, 2016

Discussing the root cause (an index deleted while a node is off the cluster), I think we should approach this differently, in a way that may also help the normal (non-shared FS) use case. Dangling indices are imported as a safety guard against the case where a cluster (or its master nodes) goes down and, because of misconfiguration or bad operational practices (killing all masters and starting fresh ones with different data folders), the new master publishes an empty cluster state; in that case we don't want to delete the data. These days we have a cluster UUID as part of the cluster metadata. That UUID is created once and never changes for the lifetime of the cluster. That means we can detect a deletion that happened while a node was away if we:

  1. Store the cluster UUID in the index metadata, so we know which cluster the index belongs to.
  2. When a node receives a cluster state from the master that doesn't contain an index it has on disk, the node checks whether the index used to belong to the same cluster (by UUID). If it did, we know the index was deleted.
  3. If the UUID in the index metadata differs from the cluster's UUID, we import the index as dangling.

How does this sound?
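For illustration, a minimal sketch of the UUID-based decision described in the comment above; `LocalIndexMetadata`, `ClusterStateView`, and `decide` are hypothetical names, not Elasticsearch types:

```java
// Hypothetical sketch of the cluster-UUID check proposed above.
import java.util.Set;

public class DanglingDecisionSketch {

    record LocalIndexMetadata(String indexName, String originatingClusterUuid) {}
    record ClusterStateView(String clusterUuid, Set<String> indexNames) {}

    enum Decision { DELETE_LOCAL_COPY, IMPORT_AS_DANGLING, KEEP }

    static Decision decide(LocalIndexMetadata onDisk, ClusterStateView state) {
        if (state.indexNames().contains(onDisk.indexName())) {
            return Decision.KEEP; // the index still exists in the cluster state
        }
        // Step 1 stored the cluster UUID in the index metadata at creation time.
        if (onDisk.originatingClusterUuid().equals(state.clusterUuid())) {
            // Step 2: same cluster but the index is gone -> it was deleted while
            // this node was away, so the local copy can be removed.
            return Decision.DELETE_LOCAL_COPY;
        }
        // Step 3: different cluster UUID -> likely a wiped/rebuilt master, so
        // treat the on-disk index as dangling and import it.
        return Decision.IMPORT_AS_DANGLING;
    }
}
```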

@dakrone
Member Author

dakrone commented Feb 22, 2016

> How does this sound?

I think this sounds great!

@abeyad abeyad assigned abeyad and unassigned dakrone Mar 1, 2016
abeyad pushed a commit to abeyad/elasticsearch that referenced this issue Mar 29, 2016
Previously, we would determine index deletes in the cluster state by
comparing the index metadatas between the current cluster state and the
previous cluster state and decipher which ones were missing (the missing
ones are deleted indices).  This led to confusing logic where it was
difficult to decipher if a missing index in the cluster state was truly
due to a delete and must be deleted on all nodes, or if it was due to
the cluster state being wiped out on master and thus, we want the
indices on the various nodes in the cluster to be imported as dangling
indices.

This commit introduces the notion of index tombstones in the cluster
state, where we are explicit about which indices have been deleted.
Index deletion on each node is now based on the presence of these
tombstones in the cluster state.  There is also functionality to purge
the tombstones after a 14 day expiry, so they don't stick around in the
cluster state forever.

Closes elastic#16358
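A rough sketch of the tombstone idea described in the commit message above, assuming hypothetical types (`Tombstone`, `shouldDeleteLocally`) rather than the real cluster-state classes:

```java
// Hypothetical sketch of index tombstones carried in the cluster state.
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.stream.Collectors;

public class IndexTombstoneSketch {

    record Tombstone(String indexUuid, Instant deleteTime) {}

    static final Duration EXPIRY = Duration.ofDays(14); // per the commit message above

    /** A node deletes its local copy only if an explicit tombstone exists for it. */
    static boolean shouldDeleteLocally(String localIndexUuid, List<Tombstone> tombstones) {
        return tombstones.stream().anyMatch(t -> t.indexUuid().equals(localIndexUuid));
    }

    /** Drop tombstones older than the expiry so the cluster state stays small. */
    static List<Tombstone> purgeExpired(List<Tombstone> tombstones, Instant now) {
        return tombstones.stream()
                .filter(t -> Duration.between(t.deleteTime(), now).compareTo(EXPIRY) < 0)
                .collect(Collectors.toList());
    }
}
```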
abeyad pushed a commit to abeyad/elasticsearch that referenced this issue Apr 14, 2016
Previously, we would determine index deletes in the cluster state by
comparing the index metadatas between the current cluster state and the
previous cluster state and decipher which ones were missing (the missing
ones are deleted indices).  This led to a situation where a node that
went offline and rejoined the cluster could potentially cause dangling
indices to be imported which should have been deleted, because when a node
rejoins, its previous cluster state does not contain reliable state.

This commit introduces the notion of index tombstones in the cluster
state, where we are explicit about which indices have been deleted.
In the case where the previous cluster state is not useful for index
metadata comparisons, a node now determines which indices are to be
deleted based on these tombstones in the cluster state.  There is also
functionality to purge the tombstones after exceeding a certain amount.

Closes elastic#17265
Closes elastic#16358
Closes elastic#17435
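This later commit message describes purging tombstones once their number exceeds a limit rather than by age; a hedged sketch of that count-based purge, with `MAX_TOMBSTONES` as an assumed value rather than the limit Elasticsearch actually uses:

```java
// Hypothetical sketch of a count-bounded tombstone list.
import java.util.ArrayDeque;
import java.util.Deque;

public class TombstonePurgeSketch {

    static final int MAX_TOMBSTONES = 500; // assumed cap, for illustration only

    private final Deque<String> tombstoneIndexUuids = new ArrayDeque<>();

    /** Add a tombstone, evicting the oldest entries once the cap is exceeded. */
    void addTombstone(String indexUuid) {
        tombstoneIndexUuids.addLast(indexUuid);
        while (tombstoneIndexUuids.size() > MAX_TOMBSTONES) {
            tombstoneIndexUuids.removeFirst();
        }
    }
}
```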
@abeyad abeyad closed this as completed in d39eb2d Apr 25, 2016