
externalTrafficPolicy: Local with Type: LoadBalancer AWS NLB health checks failing #80579

Closed
denmaddog opened this issue Jul 25, 2019 · 10 comments
Labels
kind/bug · sig/cloud-provider · sig/network · triage/unresolved


@denmaddog

denmaddog commented Jul 25, 2019

What happened:
Hello,
I have a service with valid endpoints, configured as:

> Type: LoadBalancer
> externalTrafficPolicy: Local
> HealthCheck NodePort:     32181

This creates an AWS ELB with health checks pointing at HTTP:32181/healthz.
From the node on which the pod resides, when I run:

> curl -I localhost:32181/healthz
> HTTP/1.1 503 Service Unavailable
kubectl describe svc ws-zeljko-milic-xport-frontend -n zeljko-milic
Name:                     ws-zeljko-milic-xport-frontend
Namespace:                zeljko-milic
Labels:                   app=ws-zeljko-milic-xport
                          tier=xport-frontend
Annotations:              service.beta.kubernetes.io/aws-load-balancer-backend-protocol: tcp
                          service.beta.kubernetes.io/aws-load-balancer-proxy-protocol: *
Selector:                 app=ws-zeljko-milic-xport,tier=xport-frontend
Type:                     LoadBalancer
IP:                       100.66.91.101
LoadBalancer Ingress:     5d0afacdce57c-683284729.eu-west-1.elb.amazonaws.com
Port:                     https  443/TCP
TargetPort:               443/TCP
NodePort:                 https  31053/TCP
Endpoints:                100.102.0.0:443,100.104.0.0:443,100.122.0.2:443
Port:                     http  80/TCP
TargetPort:               80/TCP
NodePort:                 http  32049/TCP
Endpoints:                100.102.0.0:80,100.104.0.0:80,100.122.0.2:80
Session Affinity:         None
External Traffic Policy:  Local
HealthCheck NodePort:     32181
Events:                   <none>
In the kube-proxy log:
1 healthcheck.go:151] Opening healthcheck "zeljko-milic/ws-zeljko-milic-xport-frontend" on port 32181

Also with curl on the node:

curl  localhost:32181/healthz
{
        "service": {
                "namespace": "zeljko-milic",
                "name": "ws-zeljko-milic-xport-frontend"
        },
        "localEndpoints": 0
}
I guess the 503 is due to "localEndpoints": 0, but I have no idea why it reports 0 when the service itself has endpoints.
Any idea? Is it even possible to get this working on AWS?
It works with External Traffic Policy: Cluster, but I need Local to preserve the client source IP if possible, and for some reason the AWS NLB health checks are failing.
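
One way to sanity-check this (a sketch using the namespace, selector, and HealthCheck NodePort from the describe output above): the healthz endpoint only returns 200 on nodes that actually host a ready pod of the service, so first find those nodes, then curl from one of them:

# list the pods behind the service and the nodes they run on
# (selector taken from the kubectl describe output above)
kubectl get pods -n zeljko-milic -l app=ws-zeljko-milic-xport,tier=xport-frontend -o wide

# on one of those nodes, the healthcheck NodePort should answer 200
curl -s -o /dev/null -w "%{http_code}\n" localhost:32181/healthz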

What you expected to happen:
I expect health checks to work, target instances in AWS NLB to be healthy.

How to reproduce it (as minimally and precisely as possible):

apiVersion: v1
kind: Service
metadata:
  name: nginx
  namespace: default
  labels:
    app: nginx
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
spec:
  externalTrafficPolicy: Local
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: nginx
  type: LoadBalancer
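
The manifest above assumes nginx pods labelled app=nginx already exist; a minimal (hypothetical) way to create them and watch the repro:

# create a backend whose default label (app=nginx) matches the Service selector
kubectl create deployment nginx --image=nginx

# wait for the NLB hostname to appear, then check target health in the AWS console
kubectl get svc nginx -n default -w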

Anything else we need to know?:

Environment:
Client Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.9", GitCommit:"e09f5c40b55c91f681a46ee17f9bc447eeacee57", GitTreeState:"clean", BuildDate:"2019-05-27T16:08:57Z", GoVersion:"go1.10.8", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.10", GitCommit:"e3c134023df5dea457638b614ee17ef234dc34a6", GitTreeState:"clean", BuildDate:"2019-07-08T03:40:54Z", GoVersion:"go1.10.8", Compiler:"gc", Platform:"linux/amd64"}

Linux ip-10-11-1-16 4.9.0-9-amd64 #1 SMP Debian 4.9.168-1 (2019-04-12) x86_64 GNU/Linux

  • Install tools:
  • Network plugin and version (if this is a network-related bug): weave (default with KOPS install)
  • Others: KOPS 1.12.2

/sig cloud-provider
/sig network

@denmaddog added the kind/bug label on Jul 25, 2019
@k8s-ci-robot added the sig/cloud-provider and sig/network labels and removed the needs-sig label on Jul 25, 2019
@athenabot

/triage unresolved

Comment /remove-triage unresolved when the issue is assessed and confirmed.

🤖 I am a bot run by vllry. 👩‍🔬

@k8s-ci-robot added the triage/unresolved label on Jul 25, 2019
@vllry
Contributor

vllry commented Jul 25, 2019

/sig aws

@andrewsykim
Member

/assign @nckturner @mcrute @jaypipes

@k8s-ci-robot
Contributor

@andrewsykim: GitHub didn't allow me to assign the following users: jaypipes.

Note that only kubernetes members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

/assign @nckturner @mcrute @jaypipes

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@denmaddog
Author

Hello,
In iptables I see this:

-A KUBE-XLB-2FPSARVCBUCVHR5K -m comment --comment "zeljko-milic/ws-zeljko-milic-xport-frontend:http has no local endpoints" -j KUBE-MARK-DROP
-A KUBE-XLB-4HYDKK5QZMVM56HW -m comment --comment "zeljko-milic/ws-zeljko-milic-xport-frontend:https has no local endpoints" -j KUBE-MARK-DROP
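
These rules mean kube-proxy believes no endpoint of the service is local to this node. kube-proxy decides "local" by comparing each endpoint's recorded nodeName with its own configured hostname, so a quick mismatch check (a sketch using the service from this issue) is to compare the two:

# the node names the API server recorded on the endpoints
kubectl get endpoints ws-zeljko-milic-xport-frontend -n zeljko-milic -o yaml | grep nodeName

# the hostname kube-proxy is actually using (look for --hostname-override)
ps aux | grep [k]ube-proxy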

@denmaddog
Author

My node name is:

kubectl get nodes|grep ip-10-11-1-71
ip-10-11-1-71.eu-west-1.compute.internal

But my node hostname is:

curl http://169.254.169.254/latest/meta-data/local-hostname
ip-10-11-1-71.cbts.all

And kube-proxy is running with flag:
--hostname-override=ip-10-11-1-71.cbts.all
When I change it to
--hostname-override=ip-10-11-1-71.eu-west-1.compute.internal
and restart the kube-proxy pod, everything works as expected.

I guess the code somewhere expects the hostname and the node name to be the same for the health check to report local endpoints. Can this be fixed?

Is there a way to configure kube-proxy via KOPS so that the hostname and the node name match?

Best,

Zeljko

@andrewsykim
Member

andrewsykim commented Jul 27, 2019

Hi @denmaddog! This is actually a known limitation where the AWS cloud provider does not allow for --hostname-override, see #54482 for more details. Maybe this needs to be documented better @jaypipes @mcrute @nckturner @justinsb ?

@andrewsykim
Member

/close

Closing this for now since it's a known issue, feel free to re-open @denmaddog if needed

@k8s-ci-robot
Contributor

@andrewsykim: Closing this issue.

In response to this:

/close

Closing this for now since it's a known issue, feel free to re-open @denmaddog if needed

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@thakkarnirav

thakkarnirav commented May 23, 2022

We followed https://aws.amazon.com/premiumsupport/knowledge-center/eks-troubleshoot-unhealthy-targets-nlb/ and it resolved the issue. The relevant case from that article:

A Network Load Balancer with externalTrafficPolicy set to Local (see kubernetes/kubernetes#61486), with a custom Amazon VPC DNS on the DHCP options set. To resolve this issue, patch kube-proxy with the hostname override flag.
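
For clusters where kube-proxy runs as a DaemonSet (as on EKS), that patch amounts to pointing --hostname-override at the node name via the downward API. A sketch of the relevant container-spec fragment (the flag and fieldRef are standard kube-proxy/Kubernetes features; the exact surrounding manifest will vary by cluster):

# fragment of the kube-proxy DaemonSet container spec
# (edit with: kubectl edit daemonset kube-proxy -n kube-system)
command:
- kube-proxy
- --hostname-override=$(NODE_NAME)  # align kube-proxy's hostname with the API node name
env:
- name: NODE_NAME
  valueFrom:
    fieldRef:
      fieldPath: spec.nodeName      # node name as registered with the API server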
