Add additional health status to Endpoints #22120
Comments
@howardjohn fyi
cc @snowp
Thanks for your suggestions. I have some questions. 🤔 For an endpoint, when do you treat it as unready? If there are intermittent readiness failures, how does Istio distinguish unready from unhealthy? If an endpoint is killed by the liveness probe and then restarted, how does Istio distinguish unready from unhealthy?
It is based on the readiness probe that the user has defined in K8s.
It does not distinguish between unready and unhealthy. All unready endpoints are sent to Envoy as unhealthy (because there is no separate status, which is what this feature request is for). Based on passive health checking (outlier detection), Envoy marks them as Unhealthy.
Same as above.
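As a rough illustration of the answer above, the translation from Kubernetes readiness to the health status sent to Envoy can be sketched as follows. This is a minimal sketch, not Istio's actual code: the function name `to_envoy_endpoints`, the `(address, ready)` input shape, and the dictionary output are all assumptions made for the example.

```python
def to_envoy_endpoints(k8s_endpoints):
    """Translate (address, ready) pairs into Envoy-style endpoint records.

    Hypothetical helper for illustration only; the input and output
    shapes are assumptions, not an Istio or Envoy API.
    """
    return [
        {
            "address": address,
            # Unready endpoints are reported as UNHEALTHY because no
            # separate "unready" status exists today.
            "health_status": "HEALTHY" if ready else "UNHEALTHY",
        }
        for address, ready in k8s_endpoints
    ]
```

Once translated this way, an endpoint that failed its readiness probe looks the same to Envoy as an endpoint that was ejected for being unhealthy.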
This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.
not stale
Still is needed
@wbpcode Do you think it is reasonable to add this status?
This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.
@mattklein123 can you please mark this "help wanted"?
When a control plane has to send "Unready" k8s endpoints to Envoy and Envoy goes into panic mode, Envoy sends traffic to the "Unready" endpoints. This breaks the Kubernetes contract that "Unready" endpoints should never receive traffic.
In Istio, we send "Unready" endpoints with "Unhealthy" status, so the panic threshold algorithm cannot distinguish between "Unready" and "Unhealthy".
Please refer to istio/istio#18367 for why we send "Unready" endpoints. Sending them is also needed for warmup duration (slow start mode), so that intermittent readiness failures (after the service is ready for the first time) do not cause spurious warmups.
The feature request is to add an additional status for "UNREADY" endpoints, which the load balancer can exclude when considering hosts in panic threshold scenarios.
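To make the request concrete, here is a minimal sketch of how host selection might behave with the proposed status. It is a simplified model, not Envoy's implementation: the `Health.UNREADY` value is the hypothetical status this issue asks for, and `Host`, `eligible_hosts`, and the 50% default for `panic_threshold_pct` are assumptions for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Health(Enum):
    HEALTHY = "HEALTHY"
    UNHEALTHY = "UNHEALTHY"
    UNREADY = "UNREADY"  # hypothetical status proposed in this issue


@dataclass(frozen=True)
class Host:
    address: str
    health: Health


def eligible_hosts(hosts, panic_threshold_pct=50.0):
    """Return the hosts a load balancer may pick from (simplified model)."""
    if not hosts:
        return []
    healthy = [h for h in hosts if h.health is Health.HEALTHY]
    healthy_pct = 100.0 * len(healthy) / len(hosts)
    if healthy_pct >= panic_threshold_pct:
        # Normal mode: only healthy hosts receive traffic.
        return healthy
    # Panic mode: health status is ignored, but UNREADY hosts stay
    # excluded so the Kubernetes readiness contract is preserved.
    return [h for h in hosts if h.health is not Health.UNREADY]


hosts = [
    Host("10.0.0.1", Health.HEALTHY),
    Host("10.0.0.2", Health.UNHEALTHY),
    Host("10.0.0.3", Health.UNREADY),
    Host("10.0.0.4", Health.UNREADY),
]
# Only 1 of 4 hosts is healthy, so this model enters panic mode.
panic_choice = [h.address for h in eligible_hosts(hosts)]
```

In this model, `panic_choice` contains the healthy and the unhealthy host but neither unready one. If the unready hosts were labeled `UNHEALTHY` instead, as happens today, all four hosts would be eligible in panic mode.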