Operator applying changes breaks index when bulk inserting to non-replicated index #6555
Comments
#6478 addresses cases where Shards are not migrated when a
I would open a new topic in https://discuss.elastic.co/c/elastic-stack/elasticsearch/6 to explain your use case and understand how it could be handled. I'm closing because I don't think we can improve things on the operator side, especially if there is no replica. Feel free to reopen with more details about how you think the operator should handle this case.
Thank you for your response @barkbay. I can think of various ways the operator could help here:
In the setup I'm working on migrating to an ECK-managed environment, the server maintenance script does the second option: migrating data away from the node before taking it down. This is a slow process, especially when massive indexing is going on, but it prevents the index from breaking. Ideally, if the operator applied this logic, it would first upgrade any nodes that do not hold primary shards without replicas, then move the shards from nodes that do hold such shards over to the already upgraded nodes, and finally upgrade those nodes as well.

Alternatively, if it were possible to define some pre-shutdown script that runs user-defined custom checks before shutdown and (extensively) postpones the shutdown of a node, that could be used to work around this problem as well. In the setup I'm working with, any such index is guaranteed to be either deleted or replicated to other nodes within a couple of hours.

I could overwrite the configmap, but that's probably going to be overwritten by the operator at some point. Or isn't it?
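A pre-shutdown check like the one proposed above could be sketched as a small script against the `_cat` APIs. This is only an illustration, not part of ECK; the node name and cluster URL are assumptions to adapt to your environment:

```shell
#!/usr/bin/env sh
# Hypothetical pre-shutdown check: refuse to stop this node while it still
# holds primary shards of indices that have no replicas.
ES_URL="http://localhost:9200"   # placeholder cluster endpoint
NODE="my-nodeset-0"              # placeholder name of the node to be stopped

# Indices with number_of_replicas == 0 (the "rep" column of _cat/indices).
unreplicated=$(curl -s "$ES_URL/_cat/indices?h=index,rep" | awk '$2 == 0 {print $1}')

for idx in $unreplicated; do
  # Is any primary ("p") shard of this index allocated on the node?
  if curl -s "$ES_URL/_cat/shards/$idx?h=prirep,node" \
      | awk -v n="$NODE" '$1 == "p" && $2 == n {found=1} END {exit !found}'; then
    echo "Refusing shutdown: $idx has an unreplicated primary on $NODE" >&2
    exit 1
  fi
done
echo "Safe to shut down $NODE"
```

Exiting non-zero here could be used to postpone the pod shutdown until the index is deleted or replicated elsewhere.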
Bug Report
What did you do?
I applied changes to the CRD of an Elasticsearch resource that trigger data nodes to be restarted, while the cluster is busy doing a load of bulk inserts into an index with `number_of_replicas` set to 0 (in order to optimize indexing speed).

What did you expect to see?
I expect the operator to gracefully restart the nodes, or, if that is impossible, to prevent the nodes from restarting.
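For context on the `number_of_replicas: 0` setup above: replication is typically disabled only for the duration of the bulk load and restored afterwards, roughly like this (index name and URL are placeholders):

```shell
# Disable replication before the bulk load ("my-index" is a placeholder).
curl -s -X PUT "http://localhost:9200/my-index/_settings" \
  -H 'Content-Type: application/json' \
  -d '{"index": {"number_of_replicas": 0}}'

# ... run the bulk inserts ...

# Restore replication once indexing is done.
curl -s -X PUT "http://localhost:9200/my-index/_settings" \
  -H 'Content-Type: application/json' \
  -d '{"index": {"number_of_replicas": 1}}'
```

The window between the two calls is exactly when a node restart can leave an unrecoverable primary.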
What did you see instead? Under which circumstances?
As soon as the pod restart operation reaches a node containing a primary shard without replicas, the index turns red and does not recover, even after the pod comes back up. The affected shard is stuck in the INITIALIZING state.
This works fine when scaling down pods, as the operator migrates data away first, but for pod restarts this does not happen.
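The migration that happens on scale-down is driven by cluster-level shard allocation filtering; the same could be done manually before a restart. A sketch, with the node name and URL as placeholders:

```shell
# Exclude the node from shard allocation so Elasticsearch moves its
# shards elsewhere ("my-nodeset-0" is a placeholder node name).
curl -s -X PUT "http://localhost:9200/_cluster/settings" \
  -H 'Content-Type: application/json' \
  -d '{"persistent": {"cluster.routing.allocation.exclude._name": "my-nodeset-0"}}'

# Wait until this prints 0, i.e. no shards remain on the node;
# only then is it safe to restart it.
curl -s "http://localhost:9200/_cat/shards?h=node" | grep -c "my-nodeset-0"

# Clear the filter again after the restart.
curl -s -X PUT "http://localhost:9200/_cluster/settings" \
  -H 'Content-Type: application/json' \
  -d '{"persistent": {"cluster.routing.allocation.exclude._name": null}}'
```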
Environment
ECK version:
2.1.0
Elasticsearch version:
7.17.8
Kubernetes information:
Running on GKE v1.23.14-gke.1800
The restarted pod logs the following as last line:
I encountered issue #6478, which may address this problem by calling _node/shutdown prior to shutting down a pod, but since the use case and problem there are different, I am reporting this as a separate issue. If this issue is solved by the proposed PR #6544, feel free to close it. In the examples given in that issue, the cluster starts to recover files from a snapshot; in this case that is impossible, as it's a freshly built index on the only node set.
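For reference, the node shutdown API discussed in those issues can be exercised directly; a hedged sketch, with the node ID, URL, and reason string as placeholders (the API is available in recent 7.x releases):

```shell
# Register a "restart" shutdown for a node so the cluster can prepare
# for it ("NODE_ID" is a placeholder for the Elasticsearch node ID).
curl -s -X PUT "http://localhost:9200/_nodes/NODE_ID/shutdown" \
  -H 'Content-Type: application/json' \
  -d '{"type": "restart", "reason": "ECK rolling upgrade"}'

# Poll the shutdown status; restarting the pod is safe once the
# reported status is COMPLETE.
curl -s "http://localhost:9200/_nodes/shutdown"
```

Whether this prevents the red index here depends on what the cluster can actually do for an unreplicated primary, which is the open question of this issue.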