
AWS: service.alpha.kubernetes.io/external-traffic: OnlyLocal not working with ELB (a.k.a. "preserve source IP") #35187

Closed
hjacobs opened this issue Oct 20, 2016 · 8 comments
Labels
area/nodecontroller, sig/network

Comments

@hjacobs

hjacobs commented Oct 20, 2016

Is this a BUG REPORT or FEATURE REQUEST?: BUG REPORT

Kubernetes version: 1.4.0

Client Version: version.Info{Major:"1", Minor:"4", GitVersion:"v1.4.0", GitCommit:"a16c0a7f71a6f93c7e0f222d961f4675cd97a46b", GitTreeState:"clean", BuildDate:"2016-09-26T18:16:57Z", GoVersion:"go1.6.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"4", GitVersion:"v1.4.0+coreos.2", GitCommit:"672d0ab602ada99c100e7f18ecbbdcea181ef008", GitTreeState:"clean", BuildDate:"2016-09-30T05:49:34Z", GoVersion:"go1.6.3", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Cloud provider or hardware configuration: AWS
  • OS (e.g. from /etc/os-release): CoreOS
  • Kernel (e.g. uname -a): 4.7.0-coreos

What happened:

I am trying out the alpha "preserve source IP" feature (see #19754 and http://kubernetes.io/docs/user-guide/load-balancer/), which does not seem to work on AWS: the "local only" rule is not applied, and traffic is still forwarded to all pods (i.e. to different nodes, even across AZs).

What you expected to happen:

When OnlyLocal is specified, the iptables rules on each node should route ELB traffic only to pods local to that node.

How to reproduce it (as minimally and precisely as possible):

  • enable alpha features on all components with --feature-gates=AllAlpha=true
  • create a cluster on AWS with at least two worker nodes (EC2 instances)
  • deploy a ReplicaSet with at least two pods spread across the worker nodes
  • create a Service of type LoadBalancer with the annotation service.alpha.kubernetes.io/external-traffic: OnlyLocal (a sample manifest is shown after this list)
  • observe the local iptables rules on the worker nodes (sudo iptables-save): a KUBE-XLB-* chain is created, but no rule jumps to it (with -j)
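
A minimal manifest for the Service in step 4 (a sketch: the name http-echo, its selector, and the ports are illustrative; the annotation is the alpha one from the docs linked above):

apiVersion: v1
kind: Service
metadata:
  name: http-echo
  annotations:
    service.alpha.kubernetes.io/external-traffic: OnlyLocal
spec:
  type: LoadBalancer
  selector:
    app: http-echo
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 8080

To look for the chain mentioned in the last step, on a worker node:

sudo iptables-save | grep KUBE-XLB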

Anything else we need to know:

This issue seems to be related to aws.go filling LoadBalancerStatus.Ingress.Hostname and proxier.go expecting LoadBalancerStatus.Ingress.IP. As the AWS ELB only provides a hostname (IPs change regularly), there is no easy fix.
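
For illustration, the difference shows up in the Service status: on AWS the ingress entry carries only a hostname, while proxier.go looks for an ip field (the values below are made up):

status:
  loadBalancer:
    ingress:
    - hostname: a1b2c3d4example-1234567890.eu-west-1.elb.amazonaws.com   # what aws.go sets
    # on a provider that reports an IP this would instead be, e.g.:
    # - ip: 203.0.113.10                                                 # what proxier.go expects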

@hjacobs
Author

hjacobs commented Oct 21, 2016

Routing to local pods in the same AZ (when running a multi-AZ cluster) is also important on AWS for cost efficiency (EC2 cross-AZ traffic costs 0.02 USD/GB): see the AWS Data Transfer Costs overview figure.

@bprashanth
Contributor

This isn't supposed to work in 1.4; we only put in the code that handles this for NodePort services in 1.5 (i.e. head right now). The example I gave in #35758 should work, but note that it's just a simple NodePort service that keeps your traffic local. Does the ELB actually preserve the source IP, or do we need proxy protocol?
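
For reference, a plain NodePort service of the kind described here would look roughly like this (a sketch, not the exact example from #35758; the name and selector are placeholders, and in 1.5 the annotation is the beta one):

apiVersion: v1
kind: Service
metadata:
  name: my-app
  annotations:
    service.beta.kubernetes.io/external-traffic: OnlyLocal
spec:
  type: NodePort
  selector:
    app: my-app
  ports:
  - port: 80
    protocol: TCP
    targetPort: 8080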

@blueyed

blueyed commented May 17, 2017

This seems to work now (Kubernetes 1.5.7).
(However, isLocal is determined by comparing addr.NodeName with the hostname, which might not be correct, as per kubernetes/kops#2584 (comment).)
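
A quick way to check whether those two names actually match on a given cluster (relevant to the kops issue above):

kubectl get nodes -o name   # node names as registered in the API
# compare with the hostname reported on each node:
hostname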

@thockin added the sig/network label May 19, 2017
@thockin
Member

thockin commented May 19, 2017

This just can't work on ELB

@thockin closed this as completed May 19, 2017
@blueyed

blueyed commented May 19, 2017

@thockin
Can you elaborate please?
I seem to have gotten this to work (after fixing/working around kubernetes/kops#2584 (comment)), but maybe not as expected?

@varsharaja

@thockin
I am using Kubernetes 1.5.5 and an AWS ELB. After enabling the beta feature external-traffic = OnlyLocal, the ELB reports that the instances are out of service (we are actually running a health check to validate that our service is running). I am also running the service as a DaemonSet so that all nodes have the service we require.
Is there something I am missing or have to fix?
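
One thing worth checking on a 1.5 cluster is whether kube-proxy reports any local endpoints on each node. With OnlyLocal the service gets a dedicated health check node port (the service.beta.kubernetes.io/healthcheck-nodeport annotation, visible in the 1.6 output further down); querying it per node shows the localEndpoints count (service name and node address below are placeholders):

kubectl get svc my-svc -o yaml | grep healthcheck-nodeport
curl http://<node-address>:<healthcheck-nodeport>/healthz   # or without the path, as in the 1.8.5 example below

If every node running a DaemonSet pod still reports localEndpoints: 0, the NodeName/hostname mismatch mentioned above (kubernetes/kops#2584) is a likely cause.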

@sergeohl

sergeohl commented Jan 4, 2018

I am using Kubernetes 1.8.5 and I have the same issue.

I use this Service:

apiVersion: v1
kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-backend-protocol: tcp
    service.beta.kubernetes.io/aws-load-balancer-internal: 0.0.0.0/0
  labels:
    app: http-test
  name: http-test
spec:
  externalTrafficPolicy: Local
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: http
  selector:
    app: http-test
  type: LoadBalancer

The endpoints are OK and the health check port has been changed on Kubernetes and on the ELB, but the health check reports that there are no local endpoints:

curl http://ip-10-6-23-73.ec2.internal:32583
{
	"service": {
		"namespace": "default",
		"name": "http-test"
	},
	"localEndpoints": 0
}

Even though the endpoints are OK:

kubectl get ep
NAME              ENDPOINTS                                      AGE
http-test         100.108.94.168:80,100.119.47.34:80             17h

Is there a difference between endpoints and localEndpoints?
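
For what it's worth: kubectl get ep lists endpoints for the whole cluster, while localEndpoints in the health check response counts only endpoints backed by pods running on the node being queried. A quick way to compare (using the app label from the manifest above):

kubectl get pods -l app=http-test -o wide   # the NODE column shows where each pod runs
kubectl get ep http-test                    # endpoints across the whole cluster

If no pod in the first listing runs on ip-10-6-23-73, then localEndpoints: 0 on that node is expected, and the ELB taking that node out of service is the intended behaviour of externalTrafficPolicy: Local.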

@toredash

I can confirm that I see the same behaviour as @sergeohl; the difference is that I'm running:
openshift v3.7.0+7ed6862
kubernetes v1.7.6+a08f5eeb62

The previous beta annotation service.beta.kubernetes.io/external-traffic (https://kubernetes.io/docs/tasks/access-application-cluster/create-external-load-balancer/#feature-availability) worked fine on previous versions of OpenShift:
openshift v3.6.1+008f2d5
kubernetes v1.6.1+5115d708d7

The difference between 1.7 and 1.6 is that 1.7 creates a health check on the ELB pointing to another nodePort with HTTP:<port>/healthz. That location is not created by Kubernetes (but it should be).

In 1.6, k8s would drop traffic on nodes not running a pod for that nodePort, so the ELB would not send traffic to nodes that didn't have a running pod backing that service.

On 1.6 the ELBs health check is pointing to the nodePort of the pod using TCP.

Comparing output from same k8s service from 1.6 and 1.7, and also ELB output:

1.6

$ oc get svc socketcluster -o yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-ssl-cert: arn:aws:acm:eu-central-1:X:certificate/X
    service.beta.kubernetes.io/external-traffic: OnlyLocal
    service.beta.kubernetes.io/healthcheck-nodeport: "30173"
  creationTimestamp: 2018-01-04T12:09:24Z
  labels:
    app: X
  name: X
  namespace: X
  resourceVersion: "22956818"
  selfLink: /api/v1/namespaces/X/services/X
  uid: X
spec:
  clusterIP: 172.30.103.134
  ports:
  - name: X
    nodePort: 32337
    port: 443
    protocol: TCP
    targetPort: 8000
  selector:
    app: X
  sessionAffinity: None
  type: LoadBalancer
status:
  loadBalancer:
    ingress:
    - hostname: X.eu-central-1.elb.amazonaws.com

aws> elb describe-load-balancers --load-balancer-names X --region eu-central-1 | jq '.LoadBalancerDescriptions[].HealthCheck'
{
  "HealthyThreshold": 2,
  "Interval": 10,
  "Target": "TCP:32337",
  "Timeout": 5,
  "UnhealthyThreshold": 6
}
aws> elb describe-instance-health --load-balancer-name X --region eu-central-1 | jq '.InstanceStates[].State' | sort | uniq -c
   21 "InService"
   3 "OutOfService"

The 3 OutOfService instances are expected, since they are the 3 infra nodes in OpenShift.

1.7

$ oc get svc Y -o yaml 
apiVersion: v1
kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-ssl-cert: arn:aws:acm:us-west-2:Y:certificate/Y
  creationTimestamp: 2018-01-30T10:38:29Z
  labels:
    app: Y
  name: Y
  namespace: Y
  resourceVersion: "6426893"
  selfLink: /api/v1/namespaces/Y/services/Y
  uid: b60f0a53-05a9-11e8-aeed-06a7993e720a
spec:
  clusterIP: 172.30.85.206
  externalTrafficPolicy: Local
  healthCheckNodePort: 30700
  ports:
  - name: Y
    nodePort: 31387
    port: 443
    protocol: TCP
    targetPort: 8000
  selector:
    app: Y
  sessionAffinity: None
  type: LoadBalancer
status:
  loadBalancer:
    ingress:
    - hostname: Y.us-west-2.elb.amazonaws.com

aws> elb describe-load-balancers --load-balancer-names Y --region us-west-2 | jq '.LoadBalancerDescriptions[].HealthCheck'
{
  "HealthyThreshold": 2,
  "Interval": 10,
  "Target": "HTTP:30700/healthz",
  "Timeout": 5,
  "UnhealthyThreshold": 6
}
aws> elb describe-instance-health --load-balancer-name Y --region us-west-2 | jq '.InstanceStates[].State' | sort | uniq -c
  24 "OutOfService"

If I change the health check on the ELB created by Kubernetes 1.7 as follows, it works as expected, like it did in 1.6:

aws> elb configure-health-check --load-balancer-name Y --health-check "Target=TCP:31387,Timeout=5,Interval=10,UnhealthyThreshold=6,HealthyThreshold=2" --region us-west-2
{
    "HealthCheck": {
        "HealthyThreshold": 2, 
        "Interval": 10, 
        "Target": "TCP:31387", 
        "Timeout": 5, 
        "UnhealthyThreshold": 6
    }
}
aws> elb describe-instance-health --load-balancer-name Y --region us-west-2 | jq '.InstanceStates[].State' | sort | uniq -c
  21 "InService"
   3 "OutOfService"

@thockin could you please consider reopening the issue based on this information?

@sergeohl did you find a permanent workaround?
