
AWS NLB with externalTrafficPolicy: Local: Health check defaults take 90s to remove node #73362

Closed
kellycampbell opened this issue Jan 27, 2019 · 12 comments · Fixed by #73363
Labels: area/provider/aws · kind/bug · sig/cloud-provider

Comments

@kellycampbell

What happened:

A Service with externalTrafficPolicy: Local behind an AWS NLB load balancer is very slow to recognize that a node is failing health checks.

I created a DaemonSet and a Service with

metadata:
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local

When one of the pods is restarted, the NLB keeps sending traffic to that node (where it is dropped, causing connection timeouts) for a minimum of 3 × 30s = 90s with the default health check settings.

What you expected to happen:

Ideally, the load balancer would recognize the failed node or pod sooner.

The minimum settings for NLB are documented here: https://docs.aws.amazon.com/elasticloadbalancing/latest/network/target-group-health-checks.html
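
For reference, a minimal sketch of a Service with tighter health check settings. The annotation names below are the health check annotations that the linked fix (#73363) wires up for NLB, the values follow the NLB minimums in the AWS documentation, and the Service name is hypothetical:

---
apiVersion: v1
kind: Service
metadata:
  name: example-nlb-service
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
    # Assumed to take effect once the health check annotations are honored for NLB:
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-interval: "10"
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-healthy-threshold: "2"
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-unhealthy-threshold: "2"
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local

With values like these, removal of a failed node would take roughly 2 × 10s = 20s instead of the default 3 × 30s = 90s.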

Environment:

  • Kubernetes version (use kubectl version): 1.11.7
  • Cloud provider or hardware configuration: AWS
  • OS (e.g. from /etc/os-release): Debian GNU/Linux 9 (stretch)
  • Kernel (e.g. uname -a): 4.9.0-7-amd64
  • Install tools: kops 1.11.0
  • Others:
@kellycampbell added the kind/bug label Jan 27, 2019
@k8s-ci-robot added the needs-sig label Jan 27, 2019
@kellycampbell
Author

@kubernetes/sig-aws-bugs

@k8s-ci-robot added the sig/aws label and removed the needs-sig label Jan 27, 2019
@k8s-ci-robot
Contributor

@kellycampbell: Reiterating the mentions to trigger a notification:
@kubernetes/sig-aws-bugs

In response to this:

@kubernetes/sig-aws-bugs

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@kellycampbell
Author

I'm also curious if anyone has good recommendations for how to schedule pods with externalTrafficPolicy: Local such that traffic isn't dropped. I'm thinking there could be a way using a Deployment with pod anti-affinity and rolling update params such that a new pod is brought up on each node before shutting down the old pod. If that's too complex, then I suppose a custom controller could be made for this purpose.

@d-nishi

d-nishi commented Feb 5, 2019

/assign @M00nF1sh

@kellycampbell
Author

kellycampbell commented Feb 5, 2019

One solution I figured out to avoid dropping new connections during deployment updates was to set up two deployments that can be updated in an A/B fashion. This works because local traffic still goes through kube-proxy, so it can load-balance across two pods on the same node.

For a small cluster, I use a DaemonSet, but for a larger cluster, scheduling policy would need to be set such that the deployed pods end up on the same physical nodes.

Then the Service's selector selects all pods from A and B, and the Deployments or DaemonSets each have a selector for only their own pods. When you need to roll out an update, you create set B, let those pods start up fully, and then stop set A.

Example of the DaemonSets (replace '-a' with '-b' for the B set):

---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: ambassador-dev-a
spec:
  selector:
    matchLabels:
      ab-ambassador: ambassador-dev-a
  template:
    metadata:
      labels:
        k8s-app: ambassador-dev
        ab-ambassador: ambassador-dev-a

and the Service selector is like this:

---
apiVersion: v1
kind: Service
metadata:
  name: ambassador-dev
spec:
  selector:
    k8s-app: ambassador-dev

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label May 6, 2019
@kellycampbell
Author

/remove-lifecycle stale

@k8s-ci-robot removed the lifecycle/stale label May 28, 2019
@kellycampbell changed the title from "AWS NLB with externalTrafficPolicy: Local: Health check defaults take min 90s to remove node" to "AWS NLB with externalTrafficPolicy: Local: Health check defaults take 90s to remove node" May 28, 2019
@k8s-ci-robot added the area/provider/aws and needs-sig labels and removed the sig/aws label Aug 6, 2019
@nikhita
Member

nikhita commented Aug 6, 2019

/sig cloud-provider

@k8s-ci-robot added the sig/cloud-provider label and removed the needs-sig label Aug 6, 2019
@Miciah
Contributor

Miciah commented Aug 6, 2019

@kellycampbell,

I'm also curious if anyone has good recommendations for how to schedule pods with externalTrafficPolicy: Local such that traffic isn't dropped. I'm thinking there could be a way using a Deployment with pod anti-affinity and rolling update params such that a new pod is brought up on each node before shutting down the old pod.

I am trying to solve the same problem using a solution along those lines. One can use pod affinity rules to ensure the new pods are scheduled on the same nodes as the old pods, and one can set the deployment strategy to surge to ensure a rolling update creates new pods before deleting old pods. The remaining piece then is to ensure the old pods are selected for deletion in the desired order (i.e., we want the scale-down of the old pods at each step to delete a pod that is on the same node as a new pod), to which end I made #80004, which changes the ReplicaSet controller to prefer pods that are colocated with pods from a related ReplicaSet when choosing pods for deletion during scale-down. I'd appreciate any feedback on that PR.
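
For illustration, a rough sketch of that approach (names and image are hypothetical); it covers only the surge strategy and the affinity preference, not the scale-down ordering that #80004 adds:

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ambassador-dev
spec:
  replicas: 3
  selector:
    matchLabels:
      k8s-app: ambassador-dev
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: "100%"      # create all replacement pods before removing old ones
      maxUnavailable: 0
  template:
    metadata:
      labels:
        k8s-app: ambassador-dev
    spec:
      affinity:
        podAffinity:
          # Prefer nodes that already run a pod of this app, so each node
          # keeps a local endpoint for the duration of the rollout.
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  k8s-app: ambassador-dev
              topologyKey: kubernetes.io/hostname
      containers:
      - name: ambassador
        image: example/ambassador:latest   # hypothetical image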

@kellycampbell
Author

kellycampbell commented Aug 6, 2019

@Miciah

Your solution sounds similar to the A/B strategy above (#73362 (comment)), except automated instead of manual.

I'm wondering if it would be good to formalize the requirements a bit more, e.g. adding some specific metadata or option to Deployments to provide this kind of A/B strategy when replacing existing pods. Or maybe it can be done with affinity, PDBs, and the scheduler, such that if a node already has a healthy pod, it must keep at least one healthy pod on that node unless a full drain is started.

Another thought I had: what if kube-proxy's health check for the service started returning failure before the last pod on a node is fully drained, so that leftover traffic could still be handled for some time before the pod is killed?

Edit: one issue I can think of with your solution (actually, it would be a problem for any A/B on the same node) is that it makes re-balancing a cluster harder, e.g. if you want a rolling update to schedule onto lightly loaded nodes first.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Nov 4, 2019
@frittentheke

/remove-lifecycle stale
