[Feature] Another possible way to detect readiness failures in fault tolerance after release 0.5.0 #990
Closed
1 of 2 tasks
Labels
enhancement
New feature or request
Search before asking
Description
In the current implementation of fault tolerance, the KubeRay operator watches events to detect readiness probe failures. When a failure is detected, the pod is marked as unhealthy and deleted, and a new one is started in a later reconciliation.
As @kevin85421 suggests in #601 (comment), watching for readiness probe failure events may have a downside:
A potential solution to the problem is to examine the ContainerStatus.Ready property to determine if a readiness probe has failed (refer to https://pkg.go.dev/k8s.io/api/core/v1#ContainerStatus for more information).
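A minimal sketch of that check, in Go, might look like the following. To keep it self-contained, `ContainerStatus` and `Pod` here are hypothetical stand-ins that mirror only the `Name` and `Ready` fields; real operator code would presumably read `pod.Status.ContainerStatuses` from `k8s.io/api/core/v1`. The helper name `unreadyContainers` and the container names are also assumptions made for illustration, not part of KubeRay.

```go
package main

import "fmt"

// ContainerStatus is a hypothetical stand-in that mirrors only the fields
// used here; the real type is corev1.ContainerStatus in k8s.io/api/core/v1.
type ContainerStatus struct {
	Name  string
	Ready bool
}

// Pod is a hypothetical stand-in; ContainerStatuses corresponds to
// pod.Status.ContainerStatuses on a real corev1.Pod.
type Pod struct {
	Name              string
	ContainerStatuses []ContainerStatus
}

// unreadyContainers returns the names of all containers in the pod whose
// Ready flag is false.
func unreadyContainers(pod Pod) []string {
	var names []string
	for _, cs := range pod.ContainerStatuses {
		if !cs.Ready {
			names = append(names, cs.Name)
		}
	}
	return names
}

func main() {
	// Example pod with one ready and one unready container (made-up names).
	pod := Pod{
		Name: "raycluster-worker-0",
		ContainerStatuses: []ContainerStatus{
			{Name: "ray-worker", Ready: false},
			{Name: "sidecar", Ready: true},
		},
	}
	fmt.Println(unreadyContainers(pod))
}
```

One caveat for any real implementation: `Ready` is also false before the first readiness probe succeeds, so a check like this would likely need to distinguish a container that is still starting from one whose probe has failed.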
The KubeRay operator can then watch ContainerStatus.Ready (I am not sure whether watching the status of an object is a good approach in this case). Alternatively, the KubeRay operator can handle readiness probe failures during reconciliations that are triggered by other object changes.
Use case
No response
Related issues
No response
Are you willing to submit a PR?