[Feature] Another possible way to detect readiness failures in fault tolerance after release 0.5.0 #990

Closed
1 of 2 tasks
Yicheng-Lu-llll opened this issue Mar 26, 2023 · 0 comments · Fixed by #1341
Labels
enhancement New feature or request

Yicheng-Lu-llll commented Mar 26, 2023

Search before asking

  • I searched the issues and found no similar feature request.

Description

In the current implementation of fault tolerance, the KubeRay operator watches events to detect readiness probe failures. When a failure is detected, the Pod is marked as unhealthy and deleted, and a new Pod is started in a later reconciliation.

As @kevin85421 suggests in #601 (comment), watching readiness probe failure events may have a downside:

we finally need to stop watching events because operator operations should be idempotent and stateless. However, events are time-sensitive and deleting Pods based on events are not idempotent.

A potential solution to the problem is to examine the ContainerStatus.Ready property to determine if a readiness probe has failed (refer to https://pkg.go.dev/k8s.io/api/core/v1#ContainerStatus for more information).

// ContainerStatus contains details for the current status of this container.
type ContainerStatus struct {
	// ...
	// Specifies whether the container has passed its readiness probe.
	Ready bool `json:"ready" protobuf:"varint,4,opt,name=ready"`
	// ...
}

The KubeRay operator could then watch ContainerStatus.Ready (I am not sure whether watching an object's status is a good approach in this case). Alternatively, the KubeRay operator could handle readiness probe failures during reconciliations that are triggered by other object changes.
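
A minimal sketch of the status-based check, to make the idea concrete. It is not the operator's actual code: the ContainerStatus type here is a simplified stand-in for corev1.ContainerStatus, the Running field is a hypothetical substitute for checking State.Running != nil, and the function name unreadyContainers is made up for illustration. Note that Ready is also false before the first readiness probe succeeds, so a real implementation would probably need to tell startup apart from a genuine failure; this sketch does not attempt that.

```go
package main

// ContainerStatus is a simplified stand-in for corev1.ContainerStatus.
// Only the fields needed by this sketch are kept (assumption, not the real type).
type ContainerStatus struct {
	Name    string
	Ready   bool // mirrors ContainerStatus.Ready
	Running bool // hypothetical stand-in for State.Running != nil
}

// unreadyContainers returns the names of containers that are running
// but not ready. Because it only reads the current status, calling it
// repeatedly on the same input always gives the same answer, unlike
// reacting to time-sensitive events.
func unreadyContainers(statuses []ContainerStatus) []string {
	var names []string
	for _, s := range statuses {
		if s.Running && !s.Ready {
			names = append(names, s.Name)
		}
	}
	return names
}
```

During a reconciliation, the operator could call a check like this on each Pod's container statuses and delete the Pod if the result is non-empty. Since the decision depends only on the observed state, repeating the reconciliation would reach the same decision.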

Use case

No response

Related issues

No response

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!