
gRPC: disproportionate load balancing #4054

Closed
natemurthy opened this issue Apr 30, 2019 · 3 comments

Comments

@natemurthy

NGINX Ingress controller version: nginx-ingress-controller:0.22.0
Kubernetes version: v1.12.3
Environment:

  • Cloud provider or hardware configuration: on-prem
  • OS (e.g. from /etc/os-release): Ubuntu 16.04
  • Kernel (e.g. uname -a): Linux 4.15.0

What happened:
I have a cluster running 5 pods (replicas) of the same gRPC server deployment and multiple clients (about 80) running outside the cluster. The clients connect to the backend pods through an nginx-ingress configured with the GRPC annotation. Occasionally I will observe that one or more pods receive a disproportionate number of connections:

[Screenshot: grpc-unbalanced — connection counts per pod over time]

Note that between 12:30 and 14:30 one pod was handling nearly 80% of all incoming connections. Sometimes this lasts for just an hour (I have a configuration snippet with grpc_read_timeout 3600s; set); sometimes it lasts for several hours.
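The timeout snippet mentioned above would typically be applied via the `configuration-snippet` annotation, which injects directives into the generated location block; `grpc_read_timeout` is a stock nginx directive from `ngx_http_grpc_module`. The exact placement in the reporter's config is assumed:

```yaml
# Sketch only: how a grpc_read_timeout of one hour could be set on the
# Ingress (the reporter's actual snippet location is not shown in the issue).
metadata:
  annotations:
    nginx.ingress.kubernetes.io/configuration-snippet: |
      grpc_read_timeout 3600s;
```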

What you expected to happen:
I would expect connections to be roughly uniformly balanced across each pod, for example:

[Screenshot: grpc-somewhat-balanced — roughly even connection counts per pod]

How to reproduce it (as minimally and precisely as possible):
It is unclear how to reproduce this beyond the following: run gRPC servers with both unary and streaming handlers across multiple pods, expose them via a ClusterIP Service through nginx-ingress (using the default round-robin load balancer) with a DNS endpoint, and observe the behavior over several hours.
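The topology above can be illustrated with a minimal ClusterIP Service fronting the five replicas. Names and ports are placeholders, not from the original report:

```yaml
# Illustrative ClusterIP Service for the gRPC server deployment
# (selector label and port are assumptions).
apiVersion: v1
kind: Service
metadata:
  name: grpc-server
spec:
  type: ClusterIP
  selector:
    app: grpc-server
  ports:
  - port: 50051
    targetPort: 50051
```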

@aledbf
Member

aledbf commented May 1, 2019

@natemurthy please update to 0.24.1 and disable reuse-port in the configuration configmap
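For reference, a sketch of the suggested change. `reuse-port` is a documented ingress-nginx ConfigMap option (it controls the `reuse_port` flag on the nginx listeners); the ConfigMap name and namespace below are the defaults from the standard deploy manifests of that era and may differ per install:

```yaml
# nginx-ingress configuration ConfigMap
# (name/namespace are assumptions based on the default install)
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-configuration
  namespace: ingress-nginx
data:
  # Disable SO_REUSEPORT on the listen sockets, per the recommendation above
  reuse-port: "false"
```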

@natemurthy
Author

natemurthy commented May 1, 2019

Can you point to a specific issue resolved in 0.24.1 that fixes this? I will give this a try, but it will take some time to verify because our ingress controller is a shared resource across many organizations' pods and namespaces.

@natemurthy
Author

natemurthy commented May 7, 2019

@aledbf I have confirmed that your recommendation works as desired. You can see the change applied at around 10:02 in the chart below. Closing this out. Thank you for the support!

[Screenshot: success — connection counts per pod after the change]
