
kops rolling-update doesn't de-register instances from ELB network load balancer gracefully #11256

Closed
amorey opened this issue Apr 18, 2021 · 27 comments


@amorey

amorey commented Apr 18, 2021

1. What kops version are you running? The command kops version will display
this information.

1.20.0

2. What Kubernetes version are you running? kubectl version will print the
version if a cluster is running or provide the Kubernetes version specified as
a kops flag.

1.20.5

3. What cloud provider are you using?

AWS

4. What commands did you run? What is the simplest way to reproduce this issue?

kops rolling-update cluster <cluster-name> --instance-group <node-group-name> --yes --force

5. What happened after the commands executed?

Before running kops rolling-update, I ran a script in a loop to make HTTP requests to an NLB endpoint sitting in front of an echo server. At the point when kops rolling-update reported that it was stopping an instance, the HTTP requests started hanging and then recovered after ~90 seconds.

6. What did you expect to happen?

I expected the HTTP requests to continue to be handled successfully.

7. Please provide your cluster manifest. Execute
kops get --name my.example.com -o yaml to display your cluster manifest.
You may want to remove your cluster name and other sensitive information.

8. Please run the commands with most verbose logging by adding the -v 10 flag.
Paste the logs into this report, or in a gist and provide the gist link here.

9. Anything else we need to know?

I believe this issue is due to kops rolling-update detaching instances from their Auto Scaling groups without first de-registering them from their NLB target groups. My target group has a de-registration delay of 90 seconds, which might explain the ~90-second recovery time.
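
For illustration, this is roughly the de-registration step I believe is missing (the target group ARN and instance ID below are placeholders):

# Placeholder ARN and instance ID -- substitute real values.
aws elbv2 deregister-targets \
  --target-group-arn arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/example/abc123 \
  --targets Id=i-0123456789abcdef0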

@rifelpet
Member

I think the 90 seconds of dropped requests are due to the normal lifecycle of the pod and aren't specific to kops rolling-update. If you kubectl delete pod the echoserver pod, do you experience the same dropped requests?

If so, you can avoid this by ensuring the echoserver Deployment has replicas > 1, a PodDisruptionBudget matching the deployment's pods with minAvailable: 1, and a preStop lifecycle hook on the echoserver containers that sleeps for 90 seconds. The hook ensures that echoserver won't receive the shutdown signal until after the deregistration delay expires and the target is entirely removed from the target group. This KubeCon talk goes into more detail about the lifecycle hook if you're curious.

There are node labels we can add that will encourage load balancer controllers to remove the node from load balancers, but I'm not sure it is necessary if the workloads running on the node to be replaced are configured properly for zero downtime.

@amorey
Author

amorey commented Apr 20, 2021

Thanks for your quick response!

The echoserver's deployment already had 2 replicas and, as per your suggestion, I added a PodDisruptionBudget with minAvailable: 1 and a preStop sleep of 100 seconds. In addition, I added a terminationGracePeriodSeconds of 110 seconds to ensure the preStop hook would have time to complete.

Here's the script I used to run curl in a loop (it also prints a request timestamp and the response code):

while true;
do
  printf "$(date) "
  curl -I -s -o /dev/null -w "%{http_code}" <URL>
  echo ""
  sleep .1
done

And for the follow-up test I decided to try pinging a CLB endpoint as well as the original NLB endpoint both directly and through a CloudFlare proxy. Here is my echoserver yaml:

kind: Deployment
apiVersion: apps/v1
metadata:
  name: echoserver
  labels:
    app: echoserver
spec:
  replicas: 2
  revisionHistoryLimit: 5
  selector:
    matchLabels:
      app: echoserver
  template:
    metadata:
      labels:
        app: echoserver
    spec:
      terminationGracePeriodSeconds: 110
      containers:
        - image: k8s.gcr.io/echoserver:1.10
          name: echoserver
          ports:
            - containerPort: 8080
          resources:
            requests:
              memory: 4Mi
            limits:
              memory: 20Mi
          imagePullPolicy: Always
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "sleep 100"]
---
kind: Service
apiVersion: v1
metadata:
  name: echoserver-nlb
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
    service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
spec:
  type: LoadBalancer
  ports:
    - name: http
      port: 80
      targetPort: 8080
  selector:
    app: echoserver
---
kind: Service
apiVersion: v1
metadata:
  name: echoserver-clb
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "elb"
    service.beta.kubernetes.io/aws-load-balancer-backend-protocol: "tcp"
    service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
    service.beta.kubernetes.io/aws-load-balancer-connection-draining-enabled: "true"
    service.beta.kubernetes.io/aws-load-balancer-connection-draining-timeout: "90"
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-healthy-threshold: "2"                       
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-unhealthy-threshold: "3"                     
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-timeout: "5"                                 
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-interval: "10"
spec:
  type: LoadBalancer
  ports:
    - name: http
      port: 80
      targetPort: 8080
  selector:
    app: echoserver
---
kind: PodDisruptionBudget
apiVersion: policy/v1beta1
metadata:
  name: echoserver
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: echoserver

And here are the configuration settings for the load balancers:

  • CLB
    • Cross-zone load balancing: Enabled
    • Idle timeout: 60 seconds
    • Connection Draining: Enabled, 90 seconds
  • NLB
    • Cross-zone load balancing: Enabled
    • Connection termination on deregistration: Enabled
    • Deregistration delay: 90 seconds
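
These NLB settings correspond to target group attributes that can be checked with the AWS CLI; a quick sketch, with a placeholder ARN:

# Placeholder target group ARN.
aws elbv2 describe-target-group-attributes \
  --target-group-arn arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/example/abc123
# For this NLB target group it should report deregistration_delay.timeout_seconds = 90
# and deregistration_delay.connection_termination.enabled = true.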

Finally, here were the results of running the kops rolling-update cluster command:

Results:

  • CLB-direct: HTTP 200 OK response codes without any slow response times
  • CLB-proxy: HTTP 520 (Web Server Returned an Unknown Error) response after a 60 sec. request time each time kops stopped an instance
  • NLB-direct: HTTP 200 OK response code after a 77 sec. request time each time kops stopped an instance
  • NLB-proxy: HTTP 524 (A Timeout Occurred) response code after a 100 sec. request time each time kops stopped an instance

For comparison, I set up ASG lifecycle hooks to 1) drain the Kubernetes node and 2) de-register the instance from its NLB target groups when an ASG termination request is received. With that in place, I can take down instances using the AWS CLI command autoscaling terminate-instance-in-auto-scaling-group without any downtime or slow request times (on the NLB).
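
Roughly, that lifecycle-hook setup looks like the following sketch (the ASG name, hook name, instance ID, and timeout are placeholders; the actual drain and de-registration work is done by an out-of-band handler):

# Pause instance termination so an external handler can drain the node and
# de-register it from its target groups before the instance goes away.
aws autoscaling put-lifecycle-hook \
  --auto-scaling-group-name nodes-example \
  --lifecycle-hook-name drain-and-deregister \
  --lifecycle-transition autoscaling:EC2_INSTANCE_TERMINATING \
  --heartbeat-timeout 300 \
  --default-result CONTINUE

# Terminate through the ASG; the hook holds the instance in Terminating:Wait
# until complete-lifecycle-action is called or the heartbeat timeout expires.
aws autoscaling terminate-instance-in-auto-scaling-group \
  --instance-id i-0123456789abcdef0 \
  --no-should-decrement-desired-capacity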

Please let me know if I've misconfigured something for rolling-update zero downtime.

@olemarkus
Member

I thought it could be related to kubernetes/kubernetes#100779, but I see you are the author of that one as well.

@amorey
Author

amorey commented Apr 20, 2021

Yeah, I've been testing various failure modes recently and that's another one I found. That issue concerns automatic ELB de-registration when the master node instances restart, so I don't think it's related, though there may be some subtle connection in the Kubernetes ELB controller code that I'm missing. I should add that I originally came across that issue while doing a kops rolling-update, but that's only a coincidence caused by the restart of the master node instance during a full rolling-update.

@johngmyers
Member

Perhaps not enough time between cordoning and termination for nodes that don't have any workload pods on them?

@johngmyers
Member

/area rolling-update

@johngmyers
Member

Perhaps we need to give the preStop and terminationGracePeriodSeconds treatment to kube-proxy.

@rifelpet
Member

The challenge with integrating that in kops is choosing an appropriate duration. It should be longer than the service's longest expected response time plus any delay in updating all relevant load balancers to stop sending new requests to the node.

It would probably be easiest to just expose it in the API. It'd likely be most appropriate in KubeProxyConfig. It would be nice if the setting was per-InstanceGroup rather than per-Cluster, but KubeProxyConfig isn't in InstanceGroupSpec.

@johngmyers
Member

First we should confirm it is the kube-proxy that is the issue.

I believe cordoning is what starts removing the node from the LB. If so, we would need a minimum time between that and termination, regardless of the time it takes to drain. We don't normally include kube-proxy in the drain and Cilium can implement its functionality without a pod, so I don't think terminationGracePeriodSeconds would help us after all.

By default we put all nodes, regardless of IG, into the LB pools. Someone would have to go out of their way to restrict the LB pools. We could put it into the RollingUpdate.

@amorey
Author

amorey commented Apr 20, 2021

In Kubernetes v1.18.17, kubectl cordon starts removing nodes from LBs, but this is not the case in v1.20.5 (there, kubectl delete node does). The behavior for CLBs and NLBs is the same in both versions.

I think these are the steps that need to happen in order to decommission an instance gracefully:

  1. De-register instance from the LBs
  2. Drain the node
  3. Close remaining network connections (either by stopping kube-proxy or letting LB close connections)
  4. Terminate instance

Stopping kube-proxy sounds like a good way to give rolling-update control over the timing of step 3, but the LB de-registration step is still missing.
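
A rough sketch of that sequence using the AWS CLI and kubectl (node name, target group ARN, and instance ID are placeholders):

# 1. De-register the instance from its LB target group(s) and wait out the delay.
aws elbv2 deregister-targets --target-group-arn "$TG_ARN" --targets "Id=$INSTANCE_ID"
aws elbv2 wait target-deregistered --target-group-arn "$TG_ARN" --targets "Id=$INSTANCE_ID"

# 2. Drain the node (DaemonSet pods, including kube-proxy, stay behind).
kubectl drain "$NODE_NAME" --ignore-daemonsets

# 3. Give remaining connections time to close (or stop kube-proxy on the node).
sleep 30

# 4. Terminate the instance via its Auto Scaling group.
aws autoscaling terminate-instance-in-auto-scaling-group \
  --instance-id "$INSTANCE_ID" --no-should-decrement-desired-capacity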

@johngmyers
Member

johngmyers commented Apr 20, 2021

@amorey https://kubernetes.io/docs/concepts/architecture/nodes/ states:

Caution: kubectl cordon marks a node as 'unschedulable', which has the side effect of the service controller removing the node from any LoadBalancer node target lists it was previously eligible for, effectively removing incoming load balancer traffic from the cordoned node(s).

Is the Kubernetes documentation out of date?

@amorey
Author

amorey commented Apr 20, 2021

Either that or maybe a regression. We might be able to figure out when/why the behavior changed by looking at the history here: https://github.com/kubernetes/kubernetes/commits/master/staging/src/k8s.io/legacy-cloud-providers/aws/aws_loadbalancer.go

@johngmyers
Member

Found in Kubernetes 1.19 release notes:

Service load balancers no longer exclude nodes marked unschedulable from the candidate nodes. The service load balancer exclusion label should be used instead.

Users upgrading from 1.18 who have cordoned nodes should set the node.kubernetes.io/exclude-from-external-load-balancers label on the impacted nodes before upgrading if they wish those nodes to remain excluded from service load balancers. (#90823, @smarterclayton) [SIG Apps, Cloud Provider and Network]
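
For reference, applying that label by hand looks something like this (the node name is a placeholder; the service controller keys off the label's presence):

kubectl label node ip-10-0-1-23.ec2.internal \
  node.kubernetes.io/exclude-from-external-load-balancers=true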

@johngmyers
Member

Filed kubernetes/website#27639

@johngmyers
Member

Partial fix in #11273

@amorey
Author

amorey commented Apr 21, 2021

Partial fix in #11273

🚀 Wow! That's the fastest turnaround time I've seen on any open source project. Thanks!

What are you thinking with regard to closing the active connections after the drain? For my use case, it's fine to let the LB close the connections, but I need control over the rolling-update timeout.

@amorey
Author

amorey commented Apr 21, 2021

Here's a kubernetes blog post from today:
https://kubernetes.io/blog/2021/04/21/graceful-node-shutdown-beta/

Apparently there's a graceful node shutdown feature that's in beta and enabled by default starting with 1.21. Rolling-update should be able to offload some of its work to it, but I'm not sure whether the feature handles LB de-registration.

@johngmyers
Member

I believe the intent of the 5s post-drain delay is to give the node time to send FIN packets for all connections to/from the terminated pods.

You bring up the point that we need something to give the node time to send FIN packets for connections to/from kube-proxy. There's also the issue that we might need additional time for the load balancers to notice the node has been removed from the pool (especially if there are no pods to drain) and we might want to give additional time for outstanding requests through kube-proxy to complete.

As for Graceful Node Shutdown:

I don't see how to get Graceful Node Shutdown to respect PodDisruptionBudgets. Even if it could, I'd still like to have kops rolling update run the drain of non-DaemonSet pods.

We should make sure that Graceful Node Shutdown is appropriately hooked up in systemd, but that's another ticket.

We might need to tweak the kube-proxy pod to take advantage of Graceful Node Shutdown.

I would like to move the deletion of the node from the Kubernetes API to the node, but that appears to be explicitly disallowed by the Node Authorizer. I don't know the reason they did that.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Jul 22, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Aug 21, 2021
@amorey
Author

amorey commented Aug 23, 2021

/remove-lifecycle rotten

@k8s-ci-robot removed the lifecycle/rotten label Aug 23, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Nov 21, 2021
@johngmyers
Member

/remove-lifecycle stale

@k8s-ci-robot removed the lifecycle/stale label Nov 21, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Feb 19, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Mar 21, 2022
@olemarkus
Member

/remove-lifecycle rotten

I think this one has been solved now with #12902 and #12966

/close

@k8s-ci-robot
Contributor

@olemarkus: Closing this issue.

In response to this:

/remove-lifecycle rotten

I think this one has been solved now with #12902 and #12966

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot removed the lifecycle/rotten label Mar 23, 2022