Support smoother k8s nodes rotation when using local volumes #2806

Open
sebgl opened this issue Apr 2, 2020 · 5 comments
Labels
>feature Adds or discusses adding a feature to the product

Comments

sebgl (Contributor) commented Apr 2, 2020

When using local volumes, it can be quite complicated to handle Kubernetes node upgrades.
One common way to upgrade a k8s node is to take it out of the cluster and replace it with a fresh one, in which case the local volume is lost and the corresponding Elasticsearch Pod stays Pending forever.

When that happens, the only way out is to manually remove both the Pod and its PVC, so a new Pod gets created with a new volume (the manual steps are roughly sketched at the end of this comment).

In an ideal world, to simplify this, we would like to:

  • migrate data away from the ES node that will be removed (the k8s node is probably being drained at k8s level already)
  • once that node is removed, and the corresponding Pod becomes Pending, ECK would delete both Pod and PVC so they are recreated elsewhere
  • this is a mode of operation the user would probably have to indicate somewhere (in the Elasticsearch spec?). Doing it automatically feels complicated (how long should we wait? will the node come back?) and dangerous.

Related discuss issue: https://discuss.elastic.co/t/does-eck-support-local-persistent-disks-and-is-it-a-good-idea/223515/3
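
For reference, the manual recovery today looks roughly like the sketch below. All names are assumptions to adapt: the cluster is called cluster, the affected Pod cluster-es-data-0, and the PVC follows ECK's elasticsearch-data-<pod-name> convention; it also assumes the elastic user's password is exported as $PASSWORD and the cluster-es-http service is port-forwarded to localhost:9200.

# 1. Before the k8s node is rotated, route shards away from the ES node that lives on it:
curl -sk -u "elastic:$PASSWORD" -H 'Content-Type: application/json' \
  -X PUT "https://localhost:9200/_cluster/settings" \
  -d '{"persistent":{"cluster.routing.allocation.exclude._name":"cluster-es-data-0"}}'

# 2. Once the k8s node is gone and the Pod is stuck Pending, delete the PVC and the Pod
#    so the StatefulSet recreates both on another node with a fresh volume:
kubectl delete pvc elasticsearch-data-cluster-es-data-0
kubectl delete pod cluster-es-data-0

# 3. Remove the allocation exclusion once the cluster is green again.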

sebgl added the >feature label on Apr 2, 2020

weeco commented Apr 4, 2020

I am not sure whether this is helpful at all because it's at such a high level, but I think the ECK operator could watch the ES data nodes and their corresponding Kubernetes nodes.

The moment, say, data-node-0 is no longer scheduled to k8s-node-abc but to another node (for whatever reason), you can assume that this Elasticsearch node has lost its data. If that is the case, the ECK operator can delete/recreate the PVC so that the Pod is no longer Pending.
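
A rough sketch of that kind of check from a shell (the PVC name is a placeholder, and the jsonpath assumes the usual local-volume setup where the PV pins itself to one node via required nodeAffinity on kubernetes.io/hostname):

PVC=elasticsearch-data-cluster-es-data-0
PV=$(kubectl get pvc "$PVC" -o jsonpath='{.spec.volumeName}')
NODE=$(kubectl get pv "$PV" -o jsonpath='{.spec.nodeAffinity.required.nodeSelectorTerms[0].matchExpressions[0].values[0]}')
# If the node backing the local PV no longer exists, the Pod can never be scheduled again:
kubectl get node "$NODE" >/dev/null 2>&1 || echo "node $NODE is gone; PVC $PVC and its Pod need to be recreated"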

Does that make sense or am I missing something?

FingerLiu commented

This simple script seems to work, but it is not proven in prod:

# keep new Pods off the node that is about to be rotated
kubectl cordon k8s-node-abc

# delete the PVC bound to the local volume on that node (PVC name is a placeholder)
kubectl delete pvc -es-xxx --force --grace-period=0

# evict the remaining Pods and delete their local data
kubectl drain k8s-node-abc --delete-local-data --ignore-daemonsets

# make the node schedulable again once it comes back
kubectl uncordon k8s-node-abc

thbkrkr (Contributor) commented Mar 18, 2021

Relates to #2448.


Jacse commented Mar 11, 2022

We've run into this exact issue twice now. When we try to upgrade the k8s version in our node pool, we lose all our data and the cluster goes into a completely broken state.

I don't know how it works with other providers, but I can speak for GKE. We have a cluster with 3 nodes and an index with 2 shards and 1 replica per shard.

What I believe happens is the following:

  1. GKE initiates a node pool version upgrade
  2. A node is drained and its pod is deleted along with local data
  3. A new node spins up with a new pod
  4. The new pod starts receiving data from replica shards stored on the other nodes
  5. GKE respects the Pod disruption budget of max 1 unavailable Pod, but only for up to 1 hour; after that it continues the upgrade with the next node ("Note: During automatic or manual node upgrades, PDBs are respected for a maximum of 1 hour. If Pods running on a node cannot be scheduled onto new nodes within 1 hour, the upgrade is initiated, regardless.", from the GKE documentation)
  6. The next node is drained (now 2 of 3 nodes are unhealthy) and everything is a mess

The logs from GKE show that almost exactly one hour passes between each node teardown.
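
If it helps to see what GKE is waiting on, the PDB that ECK manages for the cluster can be inspected directly (the label below is ECK's cluster-name label; the cluster name is a placeholder). By default ECK only allows a disruption while the cluster health is green, so ALLOWED DISRUPTIONS should drop to 0 as soon as the cluster starts recovering:

kubectl get pdb -l elasticsearch.k8s.elastic.co/cluster-name=<cluster-name>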


othmane399 commented Sep 5, 2022

Hello,

Here's our approach to upgrading k8s on local-storage node groups:

  • create an upgraded k8s node group with a dedicated label (let's say group: beta)
  • change the name and the nodeSelector label of the nodeSet to upgrade, and patch the updateStrategy as follows:
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch

metadata:
  name: cluster

spec:
  version: 8.3.3

  # Add before removing, ensuring no data is ever lost
  updateStrategy:
    changeBudget:
      maxSurge: 1
      maxUnavailable: 0

  nodeSets:
  
  # Change name to beta (required)
  - name: alpha
    count: 2

    podTemplate:
      spec:
        # Pin the Pods to one node group
        # Change to beta
        nodeSelector:
          group: alpha
  • delete the old node group once all shards have been migrated (a quick check is sketched below), and revert the updateStrategy
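
A quick way to confirm the old nodeSet is empty before deleting the node group (assumed names: the elastic password exported as $PASSWORD, cluster-es-http port-forwarded to localhost:9200, and the old nodeSet's Pods named cluster-es-alpha-*):

# No output means no shards are left on the old alpha nodes and the group can be removed:
curl -sk -u "elastic:$PASSWORD" "https://localhost:9200/_cat/shards?h=index,shard,prirep,node" | grep cluster-es-alpha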
