
[Bug] Fix flaky test: should be able to update all Pods to Running #893

Merged on Feb 3, 2023 (1 commit)


kevin85421 (Member)

Why are these changes needed?

The test "should be able to update all Pods to Running" is flaky. See the following CI runs:

https://github.com/ray-project/kuberay/actions/runs/4058693982/jobs/6985922032
https://github.com/ray-project/kuberay/actions/runs/4071855521/jobs/7014050487
https://github.com/ray-project/kuberay/actions/runs/4049283327/jobs/6965472578
https://github.com/ray-project/kuberay/actions/runs/4037511626/jobs/6940851665

For Kubernetes, we use a lister to read cluster information from a local cache rather than from the Kubernetes API server directly. However, the local cache can lag behind the API server, so an assertion that reads it right after a change may see stale data. Hence, we replace Expect with Eventually, which retries the check until the local cache is up-to-date or a timeout expires.

Related issue number

#882

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

I tested this PR with the following bash script. The tests passed 10 times in a row.

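#!/bin/bash
# Run the tests 10 times and stop at the first failure.
set -o pipefail  # fail the pipeline if `make test` fails, not only if `tee` does
for i in {1..10}; do
  echo "iteration ${i}"
  make test | tee "log${i}" || { echo "iteration ${i} failed"; exit 1; }
done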

kevin85421 added the bug (Something isn't working) label on Feb 2, 2023
kevin85421 (Member, Author):

@davidxia would you mind reviewing this PR? Thank you!

davidxia (Contributor) left a comment:

ah didn't know. Thanks!

kevin85421 (Member, Author) commented on Feb 4, 2023:

This test is still flaky. (See link for more details.) Only "should be able to update all Pods to Running" fails, while the next test, "cluster's .status.state should be updated to 'ready' shortly after all Pods are Running", passes. So all Pods do become Running before that next test starts; they just take longer than the current timeout allows. We may need to increase the timeout of the test.

@Yicheng-Lu-llll will update the test and run it more times to check its stability.

lowang-bh pushed a commit to lowang-bh/kuberay that referenced this pull request Sep 24, 2023
…ay-project#893)
