Add additional health status to Endpoints #22120

Open
ramaraochavali opened this issue Jul 12, 2022 · 10 comments

Comments

@ramaraochavali
Contributor

When a control plane has to send "Unready" k8s endpoints to Envoy and Envoy goes into panic mode, Envoy sends traffic to the "Unready" endpoints. This breaks the Kubernetes contract that "Unready" endpoints should never receive traffic.

In Istio, we send "Unready" endpoints with "Unhealthy" status, so the panic threshold algorithm cannot distinguish "Unready" from "Unhealthy".

Please refer to istio/istio#18367 for why we send "Unready" endpoints. This is also needed for warmup duration (slow start mode), so that intermittent readiness failures (after the service is ready for the first time) do not cause spurious warmups.

The feature request is to add an additional status for "UNREADY" endpoints that the load balancer can exclude when considering hosts in panic threshold scenarios.
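
To make the request concrete, here is a minimal sketch of the requested behavior. It is not Envoy's implementation: the `Status` enum, the `select_hosts` function, and the choice to compute the healthy fraction over all hosts are assumptions made for illustration, and `UNREADY` is the hypothetical status proposed in this issue. The 0.5 default mirrors Envoy's default panic threshold of 50%.

```python
from enum import Enum


class Status(Enum):
    HEALTHY = "healthy"
    UNHEALTHY = "unhealthy"
    UNREADY = "unready"  # hypothetical status proposed in this issue


def select_hosts(hosts, panic_threshold=0.5):
    """Return the host names eligible for load balancing.

    hosts maps a host name to its Status. This is an illustrative
    model only, not Envoy's actual host selection code.
    """
    if not hosts:
        return []
    healthy = [name for name, status in hosts.items() if status is Status.HEALTHY]
    # Normal mode: enough hosts are healthy, so use only those.
    if len(healthy) / len(hosts) >= panic_threshold:
        return healthy
    # Panic mode: ignore health status, but still exclude UNREADY hosts
    # so the Kubernetes readiness contract is preserved.
    return [name for name, status in hosts.items() if status is not Status.UNREADY]


hosts = {
    "a": Status.HEALTHY,
    "b": Status.UNHEALTHY,
    "c": Status.UNHEALTHY,
    "d": Status.UNREADY,
}
print(select_hosts(hosts))  # panic mode: "d" is never selected
```

Without a separate status, "d" would be indistinguishable from "b" and "c" and would receive traffic in panic mode.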

ramaraochavali added the enhancement and triage labels Jul 12, 2022
@ramaraochavali
Contributor Author

@howardjohn fyi

wbpcode added the area/load balancing label and removed the triage label Jul 13, 2022
@wbpcode
Member

wbpcode commented Jul 13, 2022

cc @snowp

@wbpcode
Member

wbpcode commented Jul 13, 2022

Thanks for your suggestions. I have some questions. 🤔 For an endpoint, when will you treat it as unready? If there are some intermittent readiness failures, how does Istio distinguish unready from unhealthy? If an endpoint is killed by the liveness probe and then restarted, how does Istio distinguish unready from unhealthy?

@ramaraochavali
Contributor Author

For an endpoint, when will you treat it as unready?

It is based on the readiness probe that the user has defined in K8s.

If there are some intermittent readiness failures, how does Istio distinguish unready from unhealthy?

It does not distinguish between unready and unhealthy. All unready endpoints are sent to Envoy as unhealthy, because there is no separate status (which is what this feature request is for). Separately, Envoy marks endpoints as Unhealthy based on passive health checking (outlier detection).

If an endpoint is killed by the liveness probe and then restarted, how does Istio distinguish unready from unhealthy?

Same as above
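
As a rough illustration of the answers above (a sketch, not Istio's code): the function name `to_envoy_status` and the `separate_unready_status` flag are hypothetical, and mapping ready endpoints to `HEALTHY` is an assumption. It only shows how the readiness signal collapses into `UNHEALTHY` today and how a separate status would preserve it.

```python
def to_envoy_status(ready: bool, separate_unready_status: bool = False) -> str:
    """Map a Kubernetes readiness result to the status sent to Envoy.

    Hypothetical helper for illustration only.
    """
    if ready:
        return "HEALTHY"  # assumption: ready endpoints are sent as healthy
    # Today there is no separate status, so unready collapses into UNHEALTHY.
    # With the proposed status, the readiness signal would be preserved.
    return "UNREADY" if separate_unready_status else "UNHEALTHY"


print(to_envoy_status(ready=False))
print(to_envoy_status(ready=False, separate_unready_status=True))
```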

@github-actions

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

github-actions bot added the stale label Aug 12, 2022
@ramaraochavali
Contributor Author

not stale

@ramaraochavali
Contributor Author

This is still needed.

github-actions bot removed the stale label Aug 13, 2022
@ramaraochavali
Contributor Author

@wbpcode Do you think it is reasonable to add this status?

@github-actions

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

github-actions bot added the stale label Sep 21, 2022
@ramaraochavali
Contributor Author

@mattklein123 can you please mark this "help wanted"?

github-actions bot removed the stale label Sep 21, 2022
mattklein123 added the help wanted label and removed the enhancement label Sep 21, 2022