[CI][Buildkite] Test RayCluster sample YAMLs (nightly operator) is stuck indefinitely #1459

Closed
1 of 2 tasks
kevin85421 opened this issue Sep 28, 2023 · 2 comments · Fixed by #1465
Labels
1.0 bug Something isn't working

Comments

@kevin85421
Member

Search before asking

  • I searched the issues and found no similar issues.

KubeRay Component

ci

What happened + What you expected to happen

https://buildkite.com/ray-project/ray-ecosystem-ci-kuberay-ci/builds/1742#018ad8ca-1dc6-44a5-b1be-6dd83d490bbe

Reproduction script

The test is stuck indefinitely in every PR / commit.

Anything else

No response

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!
@kevin85421 kevin85421 added the bug Something isn't working label Sep 28, 2023
@kevin85421 kevin85421 added the 1.0 label Sep 28, 2023
@architkulkarni
Contributor

Do you happen to know roughly when this started to fail? If not, no worries; I should be able to look through past commits.

@architkulkarni
Contributor

The offending PR is likely #1412.

The previous 9 commits all passed this test, but it failed in this commit and in the next 15 commits (except for the 6th of those, which I assume is due to randomness).
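The reasoning above can be sketched as a small script: walk the CI history in chronological order, report the first failing commit, and check how consistently the test fails from that point on. The commit labels and the `first_failure` helper are hypothetical, and the history is built only from the counts in this comment, assuming the one passing run is the 6th of the 15 later commits.

```python
from typing import List, Optional, Tuple


def first_failure(history: List[Tuple[str, bool]]) -> Optional[Tuple[str, float]]:
    """Return (commit, failure rate from that commit onward) for the first failure.

    `history` is a chronological list of (commit label, test passed?) pairs.
    Returns None if every commit passed.
    """
    for i, (commit, passed) in enumerate(history):
        if not passed:
            tail = history[i:]
            failures = sum(1 for _, ok in tail if not ok)
            return commit, failures / len(tail)
    return None


# Hypothetical history mirroring the counts described in the comment.
history = [(f"before-{n}", True) for n in range(1, 10)]      # 9 passing commits
history.append(("suspect", False))                           # the suspected commit
history += [(f"after-{n}", n == 6) for n in range(1, 16)]    # 15 later commits, only the 6th passes

result = first_failure(history)
print(result)
```

A high failure rate after the first failure, with a clean run of passes before it, is what points at a single offending change rather than general flakiness.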

architkulkarni added a commit that referenced this issue Oct 2, 2023
My understanding is that the RayCluster sample YAML test framework only adds RayCluster CRs, but doesn't add the other resources in the sample YAML file (for example the external redis deployment). In the case of the external redis sample YAML, the sample YAML test started failing after #1412 and my tentative hypothesis is that the cleanup job added by the PR hangs if there's no external redis.

For now, we should merge this PR to unbreak CI. Later, we can decide whether to properly support an end-to-end external redis test.

Related issue number
Closes #1459

Signed-off-by: Archit Kulkarni <architkulkarni@users.noreply.github.com>
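To illustrate the hypothesis in the commit message, here is a minimal sketch of what "only adds RayCluster CRs" could mean for a multi-document sample YAML. This is not KubeRay's actual test framework code: the manifest, the resource names, and the `split_by_kind` helper are all hypothetical, and the parsing is deliberately simplified to the standard library.

```python
import re
from typing import Dict, List

# Hypothetical sample manifest: one RayCluster CR plus the Redis resources it depends on.
SAMPLE = """\
apiVersion: ray.io/v1alpha1
kind: RayCluster
metadata:
  name: raycluster-external-redis
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis
---
apiVersion: v1
kind: Service
metadata:
  name: redis
"""


def split_by_kind(manifest: str) -> Dict[str, List[str]]:
    """Group the documents of a multi-document YAML by whether they are RayCluster CRs."""
    groups: Dict[str, List[str]] = {"applied": [], "skipped": []}
    # Documents in a multi-document YAML file are separated by lines containing only '---'.
    for doc in re.split(r"^---[ \t]*$", manifest, flags=re.MULTILINE):
        # Only a top-level 'kind:' (at the start of a line) identifies the resource type.
        match = re.search(r"^kind:\s*(\S+)", doc, flags=re.MULTILINE)
        if match is None:
            continue
        bucket = "applied" if match.group(1) == "RayCluster" else "skipped"
        groups[bucket].append(match.group(1))
    return groups


groups = split_by_kind(SAMPLE)
print(groups)
```

Under this assumption, the Redis Deployment and Service are never created, so anything in the cluster lifecycle that needs to reach Redis would have nothing to connect to.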
kevin85421 pushed a commit to kevin85421/kuberay that referenced this issue Oct 17, 2023
2 participants