AWS NLB with externalTrafficPolicy: Local: Health check defaults take 90s to remove node #73362
Comments
@kubernetes/sig-aws-bugs
@kellycampbell: Reiterating the mentions to trigger a notification. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
I'm also curious if anyone has good recommendations for how to schedule pods with
/assign @M00nF1sh
One solution I figured out to avoid dropping new connections during deployment updates was to set up two deployments that can be updated in an A/B fashion. This works because local traffic still goes through kube-proxy, which can load-balance across two pods on the same node. For a small cluster I use a DaemonSet; for a larger cluster, the scheduling policy would need to ensure that the A and B pods end up on the same physical nodes. The Service's selector then selects all pods from both A and B, while each deployment or DaemonSet selects only its own pods. To roll out an update, you create set B, let those pods start up fully, then stop set A.

Example of the DaemonSet (replace `-a` with `-b` for the B set):

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: ambassador-dev-a
spec:
  selector:
    matchLabels:
      ab-ambassador: ambassador-dev-a
  template:
    metadata:
      labels:
        k8s-app: ambassador-dev
        ab-ambassador: ambassador-dev-a
```

And the Service selector looks like this:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: ambassador-dev
spec:
  selector:
    k8s-app: ambassador-dev
```
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with `/close`. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/remove-lifecycle stale
/sig cloud-provider
I am trying to solve the same problem using a solution along those lines. One can use pod affinity rules to ensure the new pods are scheduled on the same nodes as the old pods, and one can set the deployment strategy to surge so that a rolling update creates new pods before deleting old ones. The remaining piece is to ensure the old pods are selected for deletion in the desired order (i.e., the scale-down of the old pods at each step should delete a pod that is on the same node as a new pod). To that end I made #80004, which changes the ReplicaSet controller to prefer pods that are colocated with pods from a related ReplicaSet when choosing pods for deletion during scale-down. I'd appreciate any feedback on that PR.
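A rough sketch of the surge-plus-affinity setup described above, with hypothetical names and labels (the deletion-ordering piece still needs the linked PR, and `preferredDuringScheduling` is only a hint to the scheduler):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ambassador-dev   # hypothetical name
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 100%        # create all new pods before deleting any old pod
      maxUnavailable: 0
  selector:
    matchLabels:
      k8s-app: ambassador-dev
  template:
    metadata:
      labels:
        k8s-app: ambassador-dev
    spec:
      affinity:
        podAffinity:
          # Prefer scheduling new pods onto nodes that already run a pod of
          # this app, so each node keeps a local endpoint during the rollout.
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              topologyKey: kubernetes.io/hostname
              labelSelector:
                matchLabels:
                  k8s-app: ambassador-dev
      containers:
      - name: ambassador
        image: example/ambassador:dev   # hypothetical image
```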
Your solution sounds similar to the A/B strategy above, except automated instead of manual. #73362 (comment)

I'm wondering if it would be good to formalize the requirements a bit more, e.g. adding some specific metadata or an option to Deployments to provide this kind of A/B strategy when replacing existing pods. Or maybe it could be done with affinity, or PDBs and the scheduler, such that if a node already has a healthy pod, it must keep at least one healthy pod on that node unless a full drain is started.

Another thought: what if kube-proxy's health check for that service started returning failure before the last pod on a node is completely drained, so that leftover traffic could still be handled for some amount of time before the pod is killed?

Edit: one issue I can think of with your solution (actually, it would be a problem for any A/B on the same node) is that it makes re-balancing a cluster harder, e.g. if you want a rolling update to schedule onto lightly loaded nodes first.
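For context on the kube-proxy health check mentioned above: with `externalTrafficPolicy: Local`, a `LoadBalancer` Service gets a `healthCheckNodePort`, and kube-proxy answers HTTP `GET /healthz` on that port with 200 only while the node has at least one ready local endpoint; the NLB probes this port rather than the service port. A sketch (names hypothetical, matching the earlier example):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: ambassador-dev   # hypothetical name
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local
  # Allocated automatically if omitted; pinned here for illustration.
  # kube-proxy returns 200 on /healthz on this port only while a ready
  # local endpoint exists, so the load balancer health check should
  # target it.
  healthCheckNodePort: 30200
  selector:
    k8s-app: ambassador-dev
  ports:
  - port: 80
    targetPort: 8080
```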
…Local This is a fix for issue kubernetes#73362
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with `/close`. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/remove-lifecycle stale
…Local This is a fix for issue kubernetes#73362
What happened:
A Service with `externalTrafficPolicy: Local` and an AWS NLB load balancer is very slow to recognize that a node fails health checks. I created a DaemonSet and a Service with `externalTrafficPolicy: Local`.
When one of the pods is restarted, the NLB still routes traffic to it (which is dropped, causing connect timeouts) for a minimum of 3 × 30s = 90s with the default health check settings.
What you expected to happen:
Preferably, the load balancer could recognize the failed node or pod sooner.
The minimum settings for NLB are documented here: https://docs.aws.amazon.com/elasticloadbalancing/latest/network/target-group-health-checks.html
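With the defaults (30s interval × 3 failed checks) the NLB needs about 90s to mark a target unhealthy; the minimums documented at the link above (10s interval × 2 failed checks, at the time of this issue) would cut that to about 20s. The in-tree AWS cloud provider exposes health-check annotations along these lines; whether the NLB code path actually honors them is essentially what this issue tracks, so treat this as a sketch rather than a confirmed fix:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: ambassador-dev   # hypothetical name
  annotations:
    # Legacy in-tree AWS cloud-provider annotations; values are the
    # documented NLB minimums at the time of this issue.
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-interval: "10"
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-healthy-threshold: "2"
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-unhealthy-threshold: "2"
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local
  selector:
    k8s-app: ambassador-dev
  ports:
  - port: 80
    targetPort: 8080
```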
Environment:
- Kubernetes version (use `kubectl version`): 1.11.7
- Kernel (e.g. `uname -a`): 4.9.0-7-amd64