Endpoint watchers cannot assume Ready pods are "Ready" in the presence of process restarts #13364
@kubernetes/goog-cluster @kubernetes/rh-cluster-infra
Are you sure? Kubelet should send a not-ready status, then restart, then later a ready status. The endpoint controller should remove the pod from the list, then later re-add it. The only thing not guaranteed here would be the latency between death and removal. If the stop and restart are so fast that they both happen in a single pass of the endpoint controller, then the pod would stay in the list. If that's a problem (why would it be?), then we could change the endpoint controller to remove, then re-add on the next loop -- the endpoint controller should know that a restart happened.
So the use case is JBoss, which comes up and starts listening on port 8080 within ~5-10 seconds. However, WAR loading takes longer than that - another 10-60s depending on how big your app is. The ready check guards that latter portion. So the sequence of events is:

1. JBoss is running and the ready check is passing
2. The process dies
3. The kubelet observes the death, marks the pod not ready, and restarts the process
4. JBoss starts listening on port 8080 again (within ~5-10s, well before the ready check passes)
5. The not-ready status propagates through the endpoints controller and the pod is purged from the endpoints list

Essentially there is always a race between 4 and 5: depending on the various propagation delays, the load balancer can observe port 8080 open before it observes the pod being purged from the endpoints list. We can minimize the window, but it's not truly zero.
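To make that shape concrete, here is a minimal Go sketch of a server that behaves like step 4: the TCP port opens as soon as the process starts, but the readiness path only returns 200 after a simulated loading delay. The `/ready` path and the 30-second delay are illustrative assumptions, not anything JBoss-specific.

```go
package main

import (
	"net/http"
	"sync/atomic"
	"time"
)

func main() {
	var ready atomic.Bool

	// Stand-in for WAR deployment: the port below is already open while this runs.
	go func() {
		time.Sleep(30 * time.Second) // simulates 10-60s of app loading
		ready.Store(true)
	}()

	mux := http.NewServeMux()
	mux.HandleFunc("/ready", func(w http.ResponseWriter, r *http.Request) {
		if ready.Load() {
			w.WriteHeader(http.StatusOK)
			return
		}
		w.WriteHeader(http.StatusServiceUnavailable)
	})
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("app response\n"))
	})

	// The listen socket is open within moments of process start, long
	// before ready flips to true -- this is the window the race lives in.
	http.ListenAndServe(":8080", mux)
}
```

A readiness probe pointed at `/ready` guards the loading window, but only for watchers that actually observe the not-ready status in time.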
I'll fight hard to NOT put kubelet downtime in the critical path for pods. There's a natural propagation delay here, as you point out. Could we ...
Hrm - not-ready has to propagate two layers asynchronously so there's ...
Is this issue still relevant?

I think this is an app problem - reporting ready when it is not.
If a pod process dies and is restarted by Kube (which resets the ready flag), there's no guarantee that a load balancer watching endpoints sees the pod go back to not-ready (kubelet updating status -> endpoints controller -> load balancer watcher) before the pod starts listening on its port again. For any process that opens its TCP port before it is actually ready, this means that when the process dies and is restarted, load balancers will continue to hit the endpoint before its readiness check passes.
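For reference, the load-balancer end of that chain looks roughly like the following client-go sketch (the namespace, service name, and in-cluster config are assumptions). The key point is that by the time a watch event arrives here, the kubelet status update and the endpoints controller write have both already happened, so the watcher is always at least two async hops behind the pod.

```go
package main

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// Watch the endpoints object backing the service the LB fronts.
	w, err := client.CoreV1().Endpoints("default").Watch(context.Background(),
		metav1.ListOptions{FieldSelector: "metadata.name=my-service"})
	if err != nil {
		panic(err)
	}

	for ev := range w.ResultChan() {
		ep, ok := ev.Object.(*corev1.Endpoints)
		if !ok {
			continue
		}
		// By the time an address moves from Addresses to NotReadyAddresses
		// here, the kubelet has already updated pod status and the endpoints
		// controller has already written the object -- two async hops.
		for _, ss := range ep.Subsets {
			fmt.Printf("ready=%d notReady=%d\n",
				len(ss.Addresses), len(ss.NotReadyAddresses))
		}
	}
}
```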
One solution is to have your load balancer perform the readiness check itself - but that only works a) if your load balancer supports it, and b) if you chose an http or tcp check (load balancers can't run the exec check). Most load balancers do support readiness checks, although not all support the same set of options.
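Where the load balancer does run its own readiness check, the mechanism is just an active probe loop per backend, something like this Go sketch (the pod IP, port, and `/ready` path are assumptions). An exec check has no equivalent URL for the probe to hit, which is the limitation above.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// probeBackend marks a backend in or out of rotation based on its
// HTTP readiness path. The URL and interval are illustrative.
func probeBackend(url string, inRotation chan<- bool) {
	client := &http.Client{Timeout: 2 * time.Second}
	for {
		resp, err := client.Get(url)
		healthy := err == nil && resp.StatusCode == http.StatusOK
		if resp != nil {
			resp.Body.Close()
		}
		inRotation <- healthy
		time.Sleep(5 * time.Second)
	}
}

func main() {
	ch := make(chan bool)
	// The pod IP and path are assumptions; an exec check offers no
	// URL for an external prober like this one.
	go probeBackend("http://10.1.2.3:8080/ready", ch)
	for healthy := range ch {
		fmt.Println("in rotation:", healthy)
	}
}
```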
One possible fix for the exec check is to expose on the kubelet an endpoint that can act as a readiness proxy for the process -

https://kubelet:10250/v1/pods/<pod name or maybe pod ip>/readinesscheck

which returns based on the kubelet's internal ready bool. This would put slightly higher load on the kubelet, and it would mean that a kubelet restart results in load balancers thinking the pod is down.
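As a rough sketch of what that proposed endpoint might look like (the route shape, the readinessByPod map, and the plain-HTTP wiring are all hypothetical stand-ins for the kubelet's real status machinery):

```go
package main

import (
	"net/http"
	"strings"
	"sync"
)

// readinessByPod is a hypothetical stand-in for the kubelet's internal
// per-pod ready bool, kept up to date by the probe workers.
var (
	mu             sync.RWMutex
	readinessByPod = map[string]bool{}
)

// handler serves GET /v1/pods/<pod name or ip>/readinesscheck, returning
// 200 when the internal ready bool is set and 503 otherwise.
func handler(w http.ResponseWriter, r *http.Request) {
	parts := strings.Split(strings.Trim(r.URL.Path, "/"), "/")
	// expected: ["v1", "pods", "<pod name or ip>", "readinesscheck"]
	if len(parts) != 4 || parts[0] != "v1" || parts[1] != "pods" || parts[3] != "readinesscheck" {
		http.NotFound(w, r)
		return
	}
	mu.RLock()
	ready := readinessByPod[parts[2]]
	mu.RUnlock()
	if ready {
		w.WriteHeader(http.StatusOK)
		return
	}
	w.WriteHeader(http.StatusServiceUnavailable)
}

func main() {
	http.HandleFunc("/", handler)
	// 10250 is the kubelet's port; TLS is omitted in this sketch.
	http.ListenAndServe(":10250", nil)
}
```

This makes the trade-off concrete: any load balancer probing through such a proxy would see every pod on the node as down for as long as the kubelet itself is restarting.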