[Bug] Readiness probe failed: timeout on minikube #2158
Comments
Which Ray images are you using? You should use images that include |
@kevin85421 yes, I'm using |
@kevin85421 do you have any idea what may be happening? This blocks me. |
I tried the following on my Mac M1, and my RayCluster is healthy; no pods have been killed.
kind create cluster
helm install kuberay-operator kuberay/kuberay-operator --version 1.1.1
helm install raycluster kuberay/ray-cluster --version 1.1.1 --set image.tag=2.22.0-py310-aarch64
Btw, are you in the Ray Slack channel? It will be helpful to join the Slack workspace. Other KubeRay users can also share their experiences. You can join |
@kevin85421 what container runtime do you use? Colima or Docker Desktop? |
I use Docker. |
Ok @kevin85421, I think I found the culprit, some weird behaviour with Example cases:
Also noticed not setting So I see two possible things here (which may be interconnected):
Disabling What do you think? |
Search before asking
KubeRay Component
ray-operator
What happened + What you expected to happen
About 40s after the RayCluster is launched, the operator kills all worker pods due to a failed readiness probe. Nothing is restarted; only the head node stays (it passes the probe). Events:
Readiness probe failed: command "bash -c wget -T 2 -q -O- http://localhost:52365/api/local_raylet_healthz | grep success" timed out
The same event is repeated for each worker pod, and each one is then killed. The head node stays healthy; the workers are not restarted.
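To help narrow down whether the check itself is slow or the endpoint is unreachable, the probe command from the event above can be wrapped in a small helper and run by hand. This is only a sketch: the function name `probe_raylet` and its arguments are hypothetical, and the defaults simply mirror the URL and the 2-second `wget` timeout shown in the event.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of KubeRay): runs the same check as the
# readiness probe from the event, with the URL and timeout made configurable.
probe_raylet() {
  local url="${1:-http://localhost:52365/api/local_raylet_healthz}"
  local timeout_s="${2:-2}"
  # Same pipeline as the probe: fetch the endpoint and look for "success".
  # The function's exit status is grep's: 0 if "success" was found, non-zero otherwise.
  wget -T "$timeout_s" -q -O- "$url" | grep -q success
}

# Example usage inside a worker container (assumes bash and wget are present there):
#   time probe_raylet                      # same settings as the probe
#   time probe_raylet "" 10                # same URL, longer wget timeout
#
# Or from outside the pod, with <worker-pod> as a placeholder for a real pod name:
#   kubectl exec <worker-pod> -- bash -c 'time wget -T 2 -q -O- http://localhost:52365/api/local_raylet_healthz | grep success'
```

Comparing the elapsed time against the probe's timeout should show whether the command merely runs slowly in this environment or never gets a response at all.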
Reproduction script
Anything else
I'm running minikube inside Colima on an M2 Mac. I tried different arm64 versions of the KubeRay operator (1.1.0 and 1.1.1) and hit the same problem.
Are you willing to submit a PR?