
GKE: Can't reduce cluster if kubernetes node is gone #215

Closed
dkropachev opened this issue Oct 23, 2020 · 5 comments · Fixed by #258 or #297
Labels
kind/bug Categorizes issue or PR as related to a bug. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release.

Comments

@dkropachev
Contributor

Describe the bug
Can't reduce cluster if kubernetes node is gone

To Reproduce
Steps to reproduce the behavior:

  1. Deploy scylla-operator on GKE
  2. Deploy scylla cluster with at least 2 nodes:
    kubectl apply -n scylla -f ./examples/eks/cluster.yaml
  3. Get nodeName of the last pod:
    kubectl get pods -n scylla -o yaml | grep nodeName
  4. Kill kubernetes node:
    kubectl delete node ${nodeName}
  5. Reduce scylla cluster by 1 member:
    sed -r 's/([ \t]+)members: 3/\1members: 2/g' ./examples/eks/cluster.yaml
    kubectl apply -n scylla -f ./examples/eks/cluster.yaml
  6. Wait till node is removed:
    kubectl --namespace=scylla wait --timeout=5m --all --for=condition=Ready pod
    error: timed out waiting for the condition on pods/sct-cluster-us-east1-c-us-east1-2
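
For convenience, the steps above can be wrapped in a small script. This is only a sketch and assumes the same namespace (scylla), manifest path (./examples/eks/cluster.yaml), and pod naming as above; adjust before use.

#!/usr/bin/env bash
set -euo pipefail

# Step 3: node of the last Scylla pod (same grep as above, taking the last match).
nodeName=$(kubectl get pods -n scylla -o yaml | grep nodeName | tail -n1 | awk '{print $2}')

# Step 4: kill the Kubernetes node.
kubectl delete node "${nodeName}"

# Step 5: reduce the Scylla cluster by 1 member and re-apply (edits the manifest in place).
sed -r -i 's/([ \t]+)members: 3/\1members: 2/g' ./examples/eks/cluster.yaml
kubectl apply -n scylla -f ./examples/eks/cluster.yaml

# Step 6: wait until the cluster settles.
kubectl --namespace=scylla wait --timeout=5m --all --for=condition=Ready pod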

Expected behavior
Node is removed
The next attempt to increase the number of members succeeds

Logs
operator.zip

Environment:

  • Platform: GKE
  • Kubernetes version: 1.15.12-gke.20
  • Scylla version: 4.1.2
  • Scylla-operator version: 0.2.4
@dkropachev dkropachev added the kind/bug Categorizes issue or PR as related to a bug. label Oct 23, 2020
@zimnx zimnx added this to the 1.0 milestone Oct 27, 2020
@slivne

slivne commented Oct 27, 2020

So this is related to the handling of a lost host and how Kubernetes / scylla-operator handle that.

@slivne slivne added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Oct 27, 2020
zimnx added commits that referenced this issue Nov 20, 2020
When k8s node is gone, PVC might still have node affinity pointing
to lost node. In this situation, PVC is deleted by the Operator
and node replacement logic is triggered to restore cluster RF.

Fixes #215
Fixes #114
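
The node affinity mentioned in the commit message lives on the PV bound to the member's data PVC. A quick way to inspect it (a sketch only; the PVC name follows the data-<pod name> convention used by the cluster in this report, and this assumes the local-volume setup where the PV carries a kubernetes.io/hostname affinity):

$ pvc=data-sct-cluster-us-east1-c-us-east1-2   # data-<pod name>, adjust to your cluster
$ pv=$(kubectl -n scylla get pvc "${pvc}" -o jsonpath='{.spec.volumeName}')
$ kubectl get pv "${pv}" -o jsonpath='{.spec.nodeAffinity}{"\n"}'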
@zimnx
Collaborator

zimnx commented Nov 20, 2020

Tested the following scenario on GKE (with #258):

  1. Deploy 3 node cluster
  2. kubectl delete node <last_pod_node>
  3. Decrease number of Scylla Cluster members to 2

The Operator found that the node had a PV with node affinity set to the missing node, so it started replacing the node. Once the new node reached UN state, decommission happened and the ScyllaCluster was successfully scaled down to 2.

$ kubectl -n scylla get pods -w
scylla-cluster-europe-west2-a-europe-west2-2   2/2     Terminating       0          4m12s
scylla-cluster-europe-west2-a-europe-west2-2   2/2     Terminating       0          4m12s
scylla-cluster-europe-west2-a-europe-west2-2   0/2     Pending           0          0s
scylla-cluster-europe-west2-a-europe-west2-2   0/2     Pending           0          1s
scylla-cluster-europe-west2-a-europe-west2-2   0/2     Init:0/2          0          1s
scylla-cluster-europe-west2-a-europe-west2-2   0/2     Init:1/2          0          5s
scylla-cluster-europe-west2-a-europe-west2-2   0/2     PodInitializing   0          6s
scylla-cluster-europe-west2-a-europe-west2-2   1/2     Running           0          51s
scylla-cluster-europe-west2-a-europe-west2-2   2/2     Running           0          3m57s
scylla-cluster-europe-west2-a-europe-west2-2   1/2     Running           0          4m27s
scylla-cluster-europe-west2-a-europe-west2-2   1/2     Terminating       0          5m1s
scylla-cluster-europe-west2-a-europe-west2-2   0/2     Terminating       0          5m4s
scylla-cluster-europe-west2-a-europe-west2-2   0/2     Terminating       0          5m11s
scylla-cluster-europe-west2-a-europe-west2-2   0/2     Terminating       0          5m11s

$ kubectl -n scylla get pods -o wide                                              
NAME                                           READY   STATUS    RESTARTS   AGE   IP            NODE                                          NOMINATED NODE   READINESS GATES
scylla-cluster-europe-west2-a-europe-west2-0   2/2     Running   0          13m   10.240.0.58   gke-maciej-215-3-default-pool-3e692d62-73g9   <none>           <none>
scylla-cluster-europe-west2-a-europe-west2-1   2/2     Running   0          11m   10.240.0.71   gke-maciej-215-3-default-pool-3e692d62-crqt   <none>           <none>

So the scale-down won't happen immediately; the node must be replaced first.
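
To watch that sequence from the Scylla side, nodetool on one of the surviving members shows the replacement joining and the old member leaving (a sketch; the pod name is taken from the output above, and the scylla container name of the 2/2 pod is assumed):

$ kubectl -n scylla exec -it scylla-cluster-europe-west2-a-europe-west2-0 -c scylla -- nodetool status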

zimnx added commits that referenced this issue Nov 23, 2020
When k8s node is gone, PVC might still have node affinity pointing
to lost node. In this situation, PVC is deleted by the Operator
and node replacement logic is triggered to restore cluster RF.

Fixes #114
Fixes #215
@dkropachev dkropachev reopened this Dec 7, 2020
@dkropachev
Contributor Author

> Tested the following scenario on GKE (with #258): […] So the scale-down won't happen immediately; the node must be replaced first.

It is not happening on GKE; tested multiple times.

scylla-operator stops doing anything further and keeps logging TLS handshake errors:

{"L":"INFO","T":"2020-12-04T19:31:48.383Z","N":"cluster-controller.replace","M":"Replace member Pod found","cluster":"scylla/sct-cluster","resourceVersion":"17507","member":"sct-cluster-us-east1-b-us-east1-2","replace_address":"10.3.249.58","ready":false,"_trace_id":"0pcVUQ4vRLuPa2RAdOlPGQ"}
{"L":"INFO","T":"2020-12-04T19:31:48.383Z","N":"cluster-controller","M":"Reconciliation successful","cluster":"scylla/sct-cluster","resourceVersion":"17507","_trace_id":"UWKHDuVeSG6hQFsMTS4qJg"}
2020/12/04 19:31:48 http: TLS handshake error from 10.0.3.1:54476: EOF
2020/12/04 19:31:58 http: TLS handshake error from 10.0.3.1:54516: EOF
2020/12/04 19:32:08 http: TLS handshake error from 10.0.3.1:54564: EOF
2020/12/04 19:32:18 http: TLS handshake error from 10.0.3.1:54598: EOF
2020/12/04 19:32:28 http: TLS handshake error from 10.0.3.1:54648: EOF
2020/12/04 19:32:38 http: TLS handshake error from 10.0.3.1:54678: EOF
2020/12/04 19:32:48 http: TLS handshake error from 10.0.3.1:54716: EOF
2020/12/04 19:32:58 http: TLS handshake error from 10.0.3.1:54744: EOF
2020/12/04 19:33:08 http: TLS handshake error from 10.0.3.1:54782: EOF
2020/12/04 19:33:18 http: TLS handshake error from 10.0.3.1:54812: EOF

Test-id: 5c40dbc7-ef2b-4792-96cf-b17739c0fbab

db-cluster: https://cloudius-jenkins-test.s3.amazonaws.com/5c40dbc7-ef2b-4792-96cf-b17739c0fbab/20201204_202713/db-cluster-5c40dbc7.zip
kubernetes: https://cloudius-jenkins-test.s3.amazonaws.com/5c40dbc7-ef2b-4792-96cf-b17739c0fbab/20201204_202713/kubernetes-5c40dbc7.zip

Test-id: 65b0a4a9-232e-49ed-9198-820a3e4bc7d3

db-cluster: https://cloudius-jenkins-test.s3.amazonaws.com/65b0a4a9-232e-49ed-9198-820a3e4bc7d3/20201206_074304/db-cluster-65b0a4a9.zip
kubernetes: https://cloudius-jenkins-test.s3.amazonaws.com/65b0a4a9-232e-49ed-9198-820a3e4bc7d3/20201206_074304/kubernetes-65b0a4a9.zip

@dkropachev
Contributor Author

Events in scylla namespace:

0s          Warning   FailedScheduling       pod/sct-cluster-us-east1-b-us-east1-2                          persistentvolumeclaim "data-sct-cluster-us-east1-b-us-east1-2" not found
0s          Warning   FailedScheduling       pod/sct-cluster-us-east1-b-us-east1-2                          persistentvolumeclaim "data-sct-cluster-us-east1-b-us-east1-2" not found
0s          Warning   FailedScheduling       pod/sct-cluster-us-east1-b-us-east1-2                          persistentvolumeclaim "data-sct-cluster-us-east1-b-us-east1-2" not found
0s          Warning   FailedScheduling       pod/sct-cluster-us-east1-b-us-east1-2                          persistentvolumeclaim "data-sct-cluster-us-east1-b-us-east1-2" not found
0s          Warning   FailedScheduling       pod/sct-cluster-us-east1-b-us-east1-2                          persistentvolumeclaim "data-sct-cluster-us-east1-b-us-east1-2" not found
0s          Warning   FailedScheduling       pod/sct-cluster-us-east1-b-us-east1-2                          persistentvolumeclaim "data-sct-cluster-us-east1-b-us-east1-2" not found
0s          Warning   FailedScheduling       pod/sct-cluster-us-east1-b-us-east1-2                          persistentvolumeclaim "data-sct-cluster-us-east1-b-us-east1-2" not found
0s          Warning   FailedScheduling       pod/sct-cluster-us-east1-b-us-east1-2                          persistentvolumeclaim "data-sct-cluster-us-east1-b-us-east1-2" not found
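
To confirm that the claim is actually gone rather than just unbound, the PVC list and the pod's events can be checked directly; a quick sketch using the names from the events above:

$ kubectl -n scylla get pvc
$ kubectl -n scylla describe pod sct-cluster-us-east1-b-us-east1-2
$ kubectl -n scylla get events --field-selector involvedObject.name=sct-cluster-us-east1-b-us-east1-2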

zimnx added commits that referenced this issue Dec 15, 2020
Finalizers in PVC caused a race between statefulset controller
and pvc provisioner. Pod spawned on next available node was missing
PVC and manual intervention was needed.
Removing finalizers from PVC prior to PVC and Pod deletion seems to
help.

Fixes #215
@zimnx
Collaborator

zimnx commented Dec 18, 2020

I managed to reproduce it locally two times over 20 runs on minikube and GKE.
I set up a watch on k8s etcd, and it turns out that sometimes the PVC was deleted after the Pod; when that happened, the new Pod wasn't able to spawn due to the above issue.
With the PVC deleted forcefully, the issue no longer happened after 15 runs. I'll leave it running for the next couple of runs to be extra sure.
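
A lighter-weight way to observe that ordering, without watching etcd, is to watch the Pod and PVC in parallel; a sketch (run the two watches in separate terminals):

$ kubectl -n scylla get pvc -w
$ kubectl -n scylla get pods -w    # in a second terminal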

zimnx added a commit that referenced this issue Dec 18, 2020
Finalizers in PVC caused a race between statefulset controller
and pvc provisioner. Pod spawned on next available node was missing
PVC and manual intervention was needed.
Removing finalizers from PVC prior to PVC and Pod deletion seems to
help.

Fixes #215
zimnx added a commit that referenced this issue Dec 29, 2020
First initiate PVC deletion and then clear finalizers
to unblock PVC deletion.

Fixes #215
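
For anyone stuck on an operator build without this fix, the manual equivalent of the sequence described in the commit would look roughly like this (a sketch only; the PVC name is the one from the events above, and kubernetes.io/pvc-protection is the finalizer that normally blocks deletion while a Pod still references the claim):

$ kubectl -n scylla delete pvc data-sct-cluster-us-east1-b-us-east1-2 --wait=false
$ kubectl -n scylla patch pvc data-sct-cluster-us-east1-b-us-east1-2 --type=merge -p '{"metadata":{"finalizers":null}}'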