kops rolling-update doesn't de-register instances from ELB network load balancer gracefully #11256
I think the 90 seconds of dropped requests are due to the normal lifecycle of the pod and aren't specific to kops rolling-update. If so, you can avoid this by ensuring the echoserver's deployment has replicas > 1 and a PodDisruptionBudget matching the deployment's pods.

There are node labels we can add that will encourage load balancer controllers to remove the node from load balancers, but I'm not sure it is necessary if the workloads running on the node to be replaced are configured properly for zero downtime.
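One such label, assuming a Kubernetes version where the service controller honors it (it was introduced around 1.19), is `node.kubernetes.io/exclude-from-external-load-balancers`. A minimal sketch, with a placeholder node name:

```sh
# Sketch: ask load balancer controllers to drop this node from LB pools.
# Assumes the service controller in your Kubernetes version honors the label;
# the node name is a placeholder.
kubectl label node ip-10-0-1-23.ec2.internal \
  node.kubernetes.io/exclude-from-external-load-balancers=true
```

Once the label is applied, the load balancer controllers should start de-registering the node's targets, after which the node can be drained and terminated.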
Thanks for your quick response! The echoserver's deployment already had 2 replicas and, as per your suggestion, I added a PodDisruptionBudget with minAvailable: 1.

Here's the script I used to curl in a loop (it also prints out a request timestamp and the response code):

```sh
while true;
do
  printf "$(date) "
  curl -I -s -o /dev/null -w "%{http_code}" <URL>
  echo ""
  sleep .1
done
```

For the follow-up test I decided to try pinging a CLB endpoint as well as the original NLB endpoint, both directly and through a CloudFlare proxy. Here is my echoserver yaml:

```yaml
kind: Deployment
apiVersion: apps/v1
metadata:
  name: echoserver
  labels:
    app: echoserver
spec:
  replicas: 2
  revisionHistoryLimit: 5
  selector:
    matchLabels:
      app: echoserver
  template:
    metadata:
      labels:
        app: echoserver
    spec:
      terminationGracePeriodSeconds: 110
      containers:
        - image: k8s.gcr.io/echoserver:1.10
          name: echoserver
          ports:
            - containerPort: 8080
          resources:
            requests:
              memory: 4Mi
            limits:
              memory: 20Mi
          imagePullPolicy: Always
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "sleep 100"]
---
kind: Service
apiVersion: v1
metadata:
  name: echoserver-nlb
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
    service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
spec:
  type: LoadBalancer
  ports:
    - name: http
      port: 80
      targetPort: 8080
  selector:
    app: echoserver
---
kind: Service
apiVersion: v1
metadata:
  name: echoserver-clb
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "elb"
    service.beta.kubernetes.io/aws-load-balancer-backend-protocol: "tcp"
    service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
    service.beta.kubernetes.io/aws-load-balancer-connection-draining-enabled: "true"
    service.beta.kubernetes.io/aws-load-balancer-connection-draining-timeout: "90"
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-healthy-threshold: "2"
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-unhealthy-threshold: "3"
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-timeout: "5"
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-interval: "10"
spec:
  type: LoadBalancer
  ports:
    - name: http
      port: 80
      targetPort: 8080
  selector:
    app: echoserver
---
kind: PodDisruptionBudget
apiVersion: policy/v1beta1
metadata:
  name: echoserver
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: echoserver
```

And here are the configuration settings for the load balancers:
Finally, here were the results of running the test: […]
For comparison, I set up ASG lifecycle hooks to 1) drain Kubernetes node instances and 2) de-register instances from their NLB target groups when an ASG termination request is received, and with those hooks in place I'm able to take down instances gracefully using the AWS CLI. Please let me know if I've misconfigured something for kops rolling-update.
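For anyone reproducing that comparison, here is a rough sketch of the lifecycle-hook flow; the hook name, ASG name, target group ARN, node name, and instance ID are placeholders, and in practice the drain/de-register steps would be triggered by the hook (for example from a Lambda) rather than run by hand:

```sh
# Rough sketch with placeholder names/ARNs: pause ASG termination with a
# lifecycle hook, de-register the instance from its NLB target group and
# drain the node, then let the termination proceed.
aws autoscaling put-lifecycle-hook \
  --auto-scaling-group-name nodes-us-east-1a \
  --lifecycle-hook-name drain-and-deregister \
  --lifecycle-transition autoscaling:EC2_INSTANCE_TERMINATING \
  --heartbeat-timeout 300

# On a termination event (normally handled by a Lambda or daemon):
aws elbv2 deregister-targets \
  --target-group-arn arn:aws:elasticloadbalancing:...:targetgroup/echoserver/... \
  --targets Id=i-0123456789abcdef0
kubectl drain ip-10-0-1-23.ec2.internal --ignore-daemonsets

# Wait out the target group's deregistration delay (90s here), then continue.
sleep 90
aws autoscaling complete-lifecycle-action \
  --auto-scaling-group-name nodes-us-east-1a \
  --lifecycle-hook-name drain-and-deregister \
  --instance-id i-0123456789abcdef0 \
  --lifecycle-action-result CONTINUE
```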
I thought it could be related to kubernetes/kubernetes#100779, but I see you are the author of that one as well.
Yeah, I've been testing out various failure modes recently and that's another one I found. That issue has to do with automatic ELB de-registration when the master node instances restart, so I don't think it's related, but there might be some subtle connection in the kubernetes ELB controller code that I'm missing. I should add that I originally came across that issue while doing a rolling update.
Perhaps not enough time between cordoning and termination for nodes that don't have any workload pods on them?
/area rolling-update
Perhaps we need to give the kube-proxy some extra time before the node is terminated?
The challenge with integrating that in kops is choosing an appropriate duration. It should be longer than the longest expected response time of the service plus any delay in updating all relevant load balancers to no longer send new requests to the node. It would probably be easiest to just expose it in the API. It'd likely be most appropriate in KubeProxyConfig. It would be nice if the setting were per-InstanceGroup rather than per-Cluster, but KubeProxyConfig isn't in InstanceGroupSpec.
First we should confirm it is the kube-proxy that is the issue. I believe cordoning is what starts removing the node from the LB. If so, we would need a minimum time between that and termination, regardless of the time it takes to drain. We don't normally include kube-proxy in the drain, and Cilium can implement its functionality without a pod, so I don't think […].

By default we put all nodes, regardless of IG, into the LB pools. Someone would have to go out of their way to restrict the LB pools. We could put it into the […].
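One way to check whether cordoning alone starts removing the node from the LB is to cordon a node and watch the corresponding NLB target's health state; a sketch with placeholder names (and note that this behavior changed across Kubernetes versions, as discussed further down the thread):

```sh
# Cordon the node, then watch the matching NLB target; if cordoning triggers
# removal, the target should transition to "draining".
NODE=ip-10-0-1-23.ec2.internal                                        # placeholder
TG_ARN=arn:aws:elasticloadbalancing:...:targetgroup/echoserver/...    # placeholder

kubectl cordon "$NODE"
watch -n 5 aws elbv2 describe-target-health --target-group-arn "$TG_ARN"
```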
In kubernetes v1.18.17, I think these are the steps that need to happen in order to decommission an instance gracefully: […]
Stopping kube-proxy sounds like a good way to give rolling-update control over the timing of step 3 but the LB de-registration step is still missing. |
@amorey https://kubernetes.io/docs/concepts/architecture/nodes/ states: […]
Is the Kubernetes documentation out of date?
Either that or maybe a regression. We might be able to figure out when/why the behavior changed by looking at the history here: https://github.com/kubernetes/kubernetes/commits/master/staging/src/k8s.io/legacy-cloud-providers/aws/aws_loadbalancer.go
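For example, something along these lines (run from a checkout of kubernetes/kubernetes) surfaces the same history locally:

```sh
# List commits touching the AWS load balancer code in the legacy cloud provider.
git log --oneline -- staging/src/k8s.io/legacy-cloud-providers/aws/aws_loadbalancer.go
```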
Found in Kubernetes 1.19 release notes: […]
Filed kubernetes/website#27639
Partial fix in #11273
🚀 Wow! That's the fastest turnaround time I've seen on any open source project. Thanks!

What are you thinking about with regards to closing the active connections after the drain? For my use case, it's fine to let the LB close the connections, but I need control over the rolling-update timeout.
Here's a kubernetes blog post from today: […]

Apparently there's a graceful node shutdown feature that's in beta and enabled by default starting with 1.21. Rolling-update should be able to offload some work there, but I'm not sure if the feature will handle LB de-registration or not.
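For context, that feature is configured on the kubelet. A sketch of the relevant upstream KubeletConfiguration fields is below; the config file path is an assumption, and since kops manages the kubelet configuration itself, this is illustrative only:

```sh
# Sketch: the KubeletConfiguration fields behind Graceful Node Shutdown
# (beta in 1.21). File path is an assumption; kops normally renders this
# config for you, so treat this as illustrative rather than a procedure.
cat <<'EOF' >> /var/lib/kubelet/config.yaml
shutdownGracePeriod: 120s
shutdownGracePeriodCriticalPods: 20s
EOF

# The kubelet takes a systemd inhibitor lock so it gets time to evict pods
# on shutdown; this checks that such a lock is present.
systemd-inhibit --list | grep -i kubelet
```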
I believe the intent of the 5s post-drain delay is to give the node time to send FIN packets for all connections to/from the terminated pods. You bring up the point that we need something to give the node time to send FIN packets for connections to/from kube-proxy. There's also the issue that we might need additional time for the load balancers to notice the node has been removed from the pool (especially if there are no pods to drain), and we might want to give additional time for outstanding requests through kube-proxy to complete.

As for Graceful Node Shutdown: I don't see how to get Graceful Node Shutdown to respect […]. We should make sure that Graceful Node Shutdown is appropriately hooked up in systemd, but that's another ticket. We might need to tweak the kube-proxy pod to take advantage of Graceful Node Shutdown.

I would like to move the deletion of the node from the Kubernetes API to the node, but that appears to be explicitly disallowed by the Node Authorizer. I don't know the reason they did that.
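If the immediate need is simply control over that window, kops exposes the post-drain wait as a flag on rolling updates; the exact flag is worth confirming against `kops rolling-update cluster --help` for your version:

```sh
# Lengthen the pause between draining a node and terminating it, so load
# balancers have more time to notice the node is gone (flag availability
# depends on your kops version).
kops rolling-update cluster <cluster-name> \
  --instance-group <node-group-name> \
  --post-drain-delay=90s \
  --yes
```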
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules: […]

You can: […]

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten
/remove-lifecycle rotten
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules: […]

You can: […]

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale
/remove-lifecycle stale
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules: […]

You can: […]

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules: […]

You can: […]

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten
@olemarkus: Closing this issue. In response to this: […]
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
1. What kops version are you running? The command `kops version` will display this information.

1.20.0

2. What Kubernetes version are you running? `kubectl version` will print the version if a cluster is running or provide the Kubernetes version specified as a kops flag.

1.20.5

3. What cloud provider are you using?

AWS

4. What commands did you run? What is the simplest way to reproduce this issue?

```sh
kops rolling-update cluster <cluster-name> --instance-group <node-group-name> --yes --force
```

5. What happened after the commands executed?

Before running `kops rolling-update`, I ran a script in a loop to make HTTP requests to an NLB endpoint sitting in front of an echo server. At the point when `kops rolling-update` reported that it was stopping an instance, the HTTP requests started hanging and then recovered after ~90 seconds.

6. What did you expect to happen?

I expected the HTTP requests to continue to be handled successfully.

7. Please provide your cluster manifest. Execute `kops get --name my.example.com -o yaml` to display your cluster manifest. You may want to remove your cluster name and other sensitive information.

8. Please run the commands with most verbose logging by adding the `-v 10` flag. Paste the logs into this report, or in a gist and provide the gist link here.

9. Anything else do we need to know?

I believe this issue is due to the fact that `kops rolling-update` detaches instances from their Auto Scaling groups without de-registering them from their NLB target groups first. My target group has a de-registration delay of 90 seconds, which might explain the 90 second recovery time.