
Operator applying changes breaks index when bulk inserting to non-replicated index #6555

Closed
EgbertW opened this issue Mar 21, 2023 · 2 comments

Comments


EgbertW commented Mar 21, 2023

Bug Report

What did you do?
I applied changes to the CRD of an Elasticsearch resource that cause data nodes to be restarted, while the cluster is busy performing bulk inserts into an index with number_of_replicas set to 0 (to optimize indexing speed).
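
For reference, the index in question is created roughly along these lines (the index name is just a placeholder; disabling replicas is the relevant part):

```
PUT /bulk-ingest-index
{
  "settings": {
    "number_of_replicas": 0
  }
}
```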

What did you expect to see?
I expect the operator to restart the nodes gracefully or, if that is not possible, to prevent them from restarting.

What did you see instead? Under which circumstances?
As soon as the pod restart operation reaches a node containing a primary shard without replicas, the index turns red and does not recover, even after the pod comes back up. The affected shard is stuck in the INITIALIZING state.

This works fine when scaling down pods, as the operator migrates data away first, but for pod restarts this migration does not happen.
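
The stuck shard can be observed with, for example (index name is a placeholder):

```
GET _cat/shards/bulk-ingest-index?v&h=index,shard,prirep,state,node

GET _cat/recovery/bulk-ingest-index?v&active_only=true
```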

Environment

  • ECK version:
    2.1.0

  • Elasticsearch version:
    7.17.8

  • Kubernetes information:
    Running on GKE v1.23.14-gke.1800

  • Logs:

The restarted pod logs the following as its last line:

```
Unable to acquire permit to use snapshot files during recovery, this recovery will recover index files from the source node. Ensure snapshot files can be used during recovery by setting [indices.recovery.max_concurrent_snapshot_file_downloads] to be no greater than [25]
```

I came across issue #6478, which may address this by calling the node shutdown API prior to shutting down a pod, but since the use case and problem there are different, I am reporting this as a separate issue. If this issue is solved by the proposed PR #6544, feel free to close it. In the examples given in that issue, the cluster starts to recover files from a snapshot; in this case that is impossible, as it is a freshly built index on the only node set.

@botelastic botelastic bot added the triage label Mar 21, 2023

barkbay commented Apr 11, 2023

#6478 addresses cases where Pods are deleted outside of the control of the operator (for example during evictions). It's not relevant when a Pod is restarted by the operator itself, following an update to the Elasticsearch resource for example. Note that there have been some improvements in recent versions of ECK regarding the way the Elasticsearch shutdown API is called.
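
For context, the node shutdown API the operator uses during orchestrated restarts looks roughly like this (node id and reason are placeholders):

```
PUT _nodes/<node-id>/shutdown
{
  "type": "restart",
  "reason": "ECK rolling restart after spec change"
}
```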

Shards are not migrated when a Pod is deleted for an update/upgrade. If there is no replica then the shards on this Pod will no longer be available until it has restarted. The fact that the shards are not available even once the Pod has restarted seems to be an Elasticsearch issue, not related to ECK.

Ensure snapshot files can be used during recovery by setting [indices.recovery.max_concurrent_snapshot_file_downloads] to be no greater than [25]

I would open a new topic in https://discuss.elastic.co/c/elastic-stack/elasticsearch/6 to explain your use case and understand how indices.recovery.max_concurrent_snapshot_file_downloads can affect the recovery process.

I'm closing because I don't think we can improve things on the operator, especially if there is no replica. Feel free to reopen with more details about how you think the operator should handle this case.

@barkbay barkbay closed this as not planned Apr 11, 2023

EgbertW commented Apr 11, 2023

Thank you for your response, @barkbay.

I can think of various ways the operator could help here:

  • Do not delete any node that holds a primary shard of an index without replicas
  • Migrate shards that have no replicas away from a node before deleting it, and wait for that migration to finish
  • Wait until there are no indices without replicas before initiating the rolling upgrade

In the setup I'm currently migrating to an ECK-managed environment, the server maintenance script implements the second option: migrating data away from the node before taking it down. This is a slow process, especially when heavy indexing is going on, but it prevents the index from breaking. Ideally, if the operator applied this logic, it would first upgrade any nodes that do not hold primary shards without replicas, then move such shards from the remaining nodes to the already upgraded ones, and finally upgrade those nodes as well.
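
That maintenance script essentially relies on standard allocation filtering, roughly like this (the node name is a placeholder), and then polls _cat/shards until the node holds no shards before stopping it:

```
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.exclude._name": "es-data-2"
  }
}
```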

Alternatively, if it were possible to add a pre-shutdown script that runs user-defined custom checks before shutdown and can (extensively) postpone the shutdown of a node, that could be used to work around this problem as well. In the setup I'm working with, any such index is guaranteed to be either deleted or replicated to other nodes within a couple of hours. The pre-stop-hook-script.sh file seems to offer such functionality, but this file in the config map is managed by the operator and currently only seems to support a fixed waiting period via PRE_STOP_ADDITIONAL_WAIT_SECONDS. In the docs I do not see any way to inject custom code into this script.
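
As far as I can tell, the only knob available today is that fixed wait, set through the pod template, along these lines (cluster name, nodeSet and wait duration are just illustrative):

```yaml
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: my-cluster
spec:
  version: 7.17.8
  nodeSets:
  - name: data
    count: 3
    podTemplate:
      spec:
        containers:
        - name: elasticsearch
          env:
          # extends the pre-stop hook's waiting period; it does not run any custom checks
          - name: PRE_STOP_ADDITIONAL_WAIT_SECONDS
            value: "3600"
```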

I could overwrite the ConfigMap, but that will probably be reverted by the operator at some point. Or won't it?

@botelastic botelastic bot removed the triage label Jun 2, 2023