gRPC: disproportionate load balancing #4054
Comments
@natemurthy please update to 0.24.1 and disable reuse-port in the configuration ConfigMap.
Can you point to a specific issue resolved in 0.24.1 that fixes this? I will give this a try, but it will take some time to verify because our ingress controller is a shared resource across many organizations' pods and namespaces.
@aledbf I have confirmed that your recommendation works as desired. You can see the changes applied at around 10:02. Closing this out. Thank you for the support!
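For anyone landing here later, the ConfigMap change recommended above might look roughly like the fragment below. This is a sketch, not the reporter's actual manifest: the ConfigMap name `nginx-configuration` and the namespace `ingress-nginx` are assumptions and must match whatever the controller's `--configmap` flag points to in your cluster. The controller image also needs to be bumped to 0.24.1 separately.

```yaml
# Sketch only: the name and namespace are assumptions, not taken from this issue.
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-configuration
  namespace: ingress-nginx
data:
  # ConfigMap values must be strings, hence the quotes.
  reuse-port: "false"
```

The controller watches its ConfigMap and regenerates the NGINX configuration when it changes, so on a shared controller the change affects every Ingress it serves.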
NGINX Ingress controller version:
nginx-ingress-controller:0.22.0
Kubernetes version:
v1.12.3
Environment:
Kernel (uname -a): Linux 4.15.0
What happened:
I have a cluster running 5 pods (replicas) of the same gRPC server deployment and multiple clients (about 80) running outside the cluster. The clients connect to the backend pods through an nginx-ingress configured with the GRPC annotation. Occasionally I will observe that one or more pods receive a disproportionate number of connections:
The reader will notice that between 12:30 and 14:30 one pod was handling nearly 80% of all the incoming connections! Sometimes this may last for just an hour (I have a configuration snippet with grpc_read_timeout 3600s; set), sometimes this may last for several hours.
What you expected to happen:
I would expect connections to be roughly uniformly balanced across each pod, for example:
How to reproduce it (as minimally and precisely as possible):
It is unclear how to reproduce this other than just running gRPC servers with both unary and streaming handlers across multiple pods reachable via a ClusterIP service type exposed through nginx-ingress (using the default round-robin load balancer) with a DNS endpoint, and observing this behavior over several hours.
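For reference, an Ingress along the lines described in this report might look roughly like the sketch below. It is not the reporter's manifest: the host `grpc.example.com`, the Service name `grpc-server`, port `50051`, and the TLS secret name are all hypothetical, and it assumes the "GRPC annotation" is `nginx.ingress.kubernetes.io/backend-protocol`. Only the `grpc_read_timeout 3600s;` snippet comes from the report itself.

```yaml
# Sketch only: host, service name, port, and secret name are hypothetical.
apiVersion: extensions/v1beta1   # Ingress API group served by Kubernetes v1.12
kind: Ingress
metadata:
  name: grpc-server
  annotations:
    kubernetes.io/ingress.class: "nginx"
    # Assumed to be the "GRPC annotation" mentioned in the report.
    nginx.ingress.kubernetes.io/backend-protocol: "GRPC"
    # The read timeout quoted in the report.
    nginx.ingress.kubernetes.io/configuration-snippet: |
      grpc_read_timeout 3600s;
spec:
  tls:
  - hosts:
    - grpc.example.com
    secretName: grpc-example-tls
  rules:
  - host: grpc.example.com
    http:
      paths:
      - backend:
          serviceName: grpc-server   # ClusterIP Service in front of the 5 replicas
          servicePort: 50051
```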