
Kubernetes Service not distributing traffic equally; seeing imbalance in traffic #125013

Closed
uttam-phygitalz opened this issue May 21, 2024 · 18 comments
Assignees
Labels
lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. needs-kind Indicates a PR lacks a `kind/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider. sig/network Categorizes an issue or PR as relevant to SIG Network. triage/needs-information Indicates an issue needs more information in order to work on it.

Comments

@uttam-phygitalz

What happened?

We are seeing that traffic is not balanced among the ingress controller replicas when the replica count gets higher.
We have set the HPA maximum to 40 replicas. When the load test runs, the HPA is triggered and spawns new replicas, but the load is not evenly distributed even though resources are available. Please find the screenshot below.

[screenshot: traffic distribution across ingress controller replicas]

It is deployed behind an AWS NLB. There are no long-lived connections present; all requests are new connections.

Description of the ingress Service:

```
Labels:                   app=ingress-nginx-external-nlb
                          app.kubernetes.io/managed-by=Helm
Annotations:              helm.sh/resource-policy: keep
                          service.beta.kubernetes.io/aws-load-balancer-additional-resource-tags:
                          service.beta.kubernetes.io/aws-load-balancer-backend-protocol: tcp
                          service.beta.kubernetes.io/aws-load-balancer-connection-draining-enabled: true
                          service.beta.kubernetes.io/aws-load-balancer-connection-draining-timeout: 60
                          service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout: 300
                          service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: true
                          service.beta.kubernetes.io/aws-load-balancer-extra-security-groups: sg-0116assa519f2f2aa1fe8c
                          service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
                          service.beta.kubernetes.io/aws-load-balancer-type: nlb
Selector:                 app=ingress-nginx-external
Type:                     LoadBalancer
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       172.20.189.13
IPs:                      172.20.189.13
LoadBalancer Ingress:     a47c0fada1425caa057592-76e4445441da70fa.elb.us-west-2.amazonaws.com
Port:                     https  443/TCP
TargetPort:               443/TCP
NodePort:                 https  31411/TCP
Endpoints:                100.64.165.237:443,100.65.173.35:443,100.64.244.118:443
Session Affinity:         None
External Traffic Policy:  Local
HealthCheck NodePort:     31286
Events:
  Type    Reason               Age                    From                Message
  ----    ------               ----                   ----                -------
  Normal  UpdatedLoadBalancer  16m (x163 over 2d17h)  service-controller  Updated load balancer with new hosts
```

What did you expect to happen?

The traffic should be distributed among all replicas evenly, or close to it, not in a totally imbalanced way.

How can we reproduce it (as minimally and precisely as possible)?

Deploy the ingress controllers.
Set the HPA for the ingress controller, e.g. min 3 and max 40 replicas (see the sketch after these steps).
Perform the load test.
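
For reference, a minimal HPA sketch matching the setup described above. The Deployment name, metrics API version, and CPU target here are assumptions for illustration, not values taken from the report:

```yaml
# Sketch only: HPA scaling the ingress controller between 3 and 40 replicas.
# Deployment name and CPU target are assumptions.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ingress-nginx-external
  namespace: qa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ingress-nginx-external
  minReplicas: 3
  maxReplicas: 40
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```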

Anything else we need to know?

No response

Kubernetes version

$ kubectl version

Client Version: v1.29.1
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.28.8-eks-adc7111

Cloud provider

AWS

OS version


Rocky Linux / Alpine

Install tools

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

@uttam-phygitalz uttam-phygitalz added the kind/bug Categorizes issue or PR as related to a bug. label May 21, 2024
@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label May 21, 2024
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label May 21, 2024
@T-Lakshmi

/sig network
/sig cloud-provider

@k8s-ci-robot k8s-ci-robot added sig/network Categorizes an issue or PR as relevant to SIG Network. sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels May 21, 2024
@adrianmoisey
Member

Are your pods equally spread across your nodes?

We noticed a similar problem, and our issue was that some nodes had more ingress-nginx pods than others, so each node would distribute the traffic it received amongst the pods hosted on itself.
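
(For anyone checking the same thing: a minimal pod-template fragment that forces an even spread of ingress-nginx pods across nodes, assuming the pod label from the Service selector above; the values are illustrative only, not from this deployment.)

```yaml
# Sketch only: spreads pods one-per-node as evenly as possible.
spec:
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: kubernetes.io/hostname
      whenUnsatisfiable: DoNotSchedule
      labelSelector:
        matchLabels:
          app: ingress-nginx-external
```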

@uttam-phygitalz
Author

Hi @adrianmoisey, yeah, I can see it's spread over different nodes. Each node has one replica pod running.

@adrianmoisey
Member

> Hi @adrianmoisey, yeah, I can see it's spread over different nodes. Each node has one replica pod running.

And just to confirm, when you are scaled up (to 40 pods), you have an equal spread of pods to nodes?

@uttam-phygitalz
Author

> Hi @adrianmoisey, yeah, I can see it's spread over different nodes. Each node has one replica pod running.
>
> And just to confirm, when you are scaled up (to 40 pods), you have an equal spread of pods to nodes?

Yes, correct. It's equally spread across nodes; each node has one replica running.

@aojea
Member

aojea commented May 21, 2024

You need to test from inside the cluster and from outside, to investigate whether it is a load balancer problem or a Kubernetes problem.
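
A minimal sketch of such an in-cluster test: a throwaway curl pod pointed at the Service's cluster DNS name, bypassing the NLB entirely. The Service name and namespace come from the manifest pasted below; the image tag and request count are assumptions. Compare the per-replica request counts from this run against a run through the NLB to see which hop introduces the skew.

```yaml
# Sketch only: in-cluster client hitting the Service directly.
apiVersion: v1
kind: Pod
metadata:
  name: svc-loadtest
  namespace: qa
spec:
  restartPolicy: Never
  containers:
    - name: curl
      image: curlimages/curl:8.8.0   # image/tag is an assumption
      command: ["/bin/sh", "-c"]
      args:
        - |
          # 1000 short-lived HTTPS requests against the ClusterIP service
          for i in $(seq 1 1000); do
            curl -sk -o /dev/null https://ingress-nginx-external-nlb.qa.svc.cluster.local/
          done
```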

@adrianmoisey
Member

What does the Service look like? Can you paste a YAML representation of it here?

@uttam-phygitalz
Author

> What does the Service look like? Can you paste a YAML representation of it here?

The Service looks okay to me:

```yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    helm.sh/resource-policy: keep
    service.beta.kubernetes.io/aws-load-balancer-backend-protocol: tcp
    service.beta.kubernetes.io/aws-load-balancer-connection-draining-enabled: "true"
    service.beta.kubernetes.io/aws-load-balancer-connection-draining-timeout: "60"
    service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout: "300"
    service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
    service.beta.kubernetes.io/aws-load-balancer-extra-security-groups: sg-0165192f2aa1fe8cc
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
  creationTimestamp: "2024-02-18T19:10:42Z"
  finalizers:
    - service.kubernetes.io/load-balancer-cleanup
  labels:
    app: ingress-nginx-external-nlb
    app.kubernetes.io/managed-by: Helm
    bu: cloud
  name: ingress-nginx-external-nlb
  namespace: qa
  resourceVersion: "4809557489"
  uid: 985e1e4e-cb41-4afa-896a-012e40d826dc
spec:
  allocateLoadBalancerNodePorts: true
  clusterIP: 172.20.196.247
  clusterIPs:
    - 172.20.196.247
  externalTrafficPolicy: Local
  healthCheckNodePort: 32161
  internalTrafficPolicy: Cluster
  ipFamilies:
    - IPv4
  ipFamilyPolicy: SingleStack
  ports:
    - name: https
      nodePort: 31172
      port: 443
      protocol: TCP
      targetPort: 443
  selector:
    app: ingress-nginx-external
  sessionAffinity: None
  type: LoadBalancer
status:
  loadBalancer:
    ingress:
      - hostname: a985e1eecbbbt414afa896a012e40d826d-7b200cd7fdc4afef.elb.us-west-2.amazonaws.com
```

@adrianmoisey
Member

> externalTrafficPolicy: Local
> internalTrafficPolicy: Cluster

Given that internalTrafficPolicy is set to Cluster, I'd assume that Kubernetes would distribute the traffic evenly.

Since externalTrafficPolicy is set to Local, it may be the NLB that is causing this behaviour.

I agree with @aojea's suggestion of doing a test inside the cluster. That way it will help rule out either the cluster or the load balancer.
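
For comparison, a sketch of the same Service with externalTrafficPolicy switched to Cluster, which lets kube-proxy spread connections across all ready endpoints cluster-wide at the cost of preserving the client source IP. Fields are trimmed to the relevant ones; this is illustrative, not the reporter's manifest:

```yaml
# Sketch only: A/B variant of the Service for comparing distribution behaviour.
apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx-external-nlb
  namespace: qa
spec:
  type: LoadBalancer
  externalTrafficPolicy: Cluster   # changed from Local for the comparison
  selector:
    app: ingress-nginx-external
  ports:
    - name: https
      port: 443
      targetPort: 443
      protocol: TCP
```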

@aroradaman
Member

/remove-kind bug
(until we are sure etp: local and NLB is not the culprit)

@k8s-ci-robot k8s-ci-robot added needs-kind Indicates a PR lacks a `kind/foo` label and requires one. and removed kind/bug Categorizes issue or PR as related to a bug. labels May 22, 2024
@elmiko
Contributor

elmiko commented May 22, 2024

we are discussing this in the sig cloud provider meeting today. we aren't quite sure this is specific to the cloud controller manager rather than a configuration issue with the load balancer in aws. would like to see more data related to the questions asked earlier.

cc @kmala

@shaneutt
Member

/assign @shaneutt

@shaneutt
Member

We discussed this one in the SIG Network meeting today, and it seems we have several open questions. I've assigned myself just to help shepherd it forward, but @uttam-phygitalz there are some open questions above, including whether this is something that might be happening outside the cluster. Please let us know.

/triage needs-information

@k8s-ci-robot k8s-ci-robot added the triage/needs-information Indicates an issue needs more information in order to work on it. label May 23, 2024
@shaneutt
Member

Seems like this is getting stale.

/lifecycle stale

Let us know your thoughts on some of the above questions @uttam-phygitalz, or if you need any help or support in this?

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 25, 2024
@elmiko
Contributor

elmiko commented Jul 17, 2024

we talked about this again at sig cloud-provider today, deferring acceptance on triage while we wait for more information.

@aojea
Member

aojea commented Jul 17, 2024

/close

The last comment from the reporter was in May; this can always be reopened if there is more information.

@k8s-ci-robot
Contributor

@aojea: Closing this issue.

In response to this:

> /close
>
> The last comment from the reporter was in May; this can always be reopened if there is more information.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
