Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug][GCS FT] Clean up the Redis key before the head Pod is deleted #1989

Merged
merged 1 commit into from
Mar 14, 2024

Conversation

kevin85421
Copy link
Member

@kevin85421 kevin85421 commented Mar 13, 2024

Why are these changes needed?

#1785 checks numDeletedHeads to determine whether to create the Redis cleanup Kubernetes Job or not. However, numDeletedHeads does not always indicate the number of head Pods that have been deleted but rather those that are in the process of being deleted. Hence, since GCS is still running, the Redis key will not be fully deleted.

Related issue number

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(
  • Create a RayCluster with ray-cluster.external-redis-uri.yaml.
  • Delete the RayCluster: kubectl delete rayclusters.ray.io raycluster-external-redis-uri
  • Check the Redis after all Ray Pods are deleted.
    kubectl exec -it $YOUR_REDIS_POD -- redis-cli -a "5241590000000000"
    KEYS * # expected behavior: no key (without this PR), 1 key (with this PR)

@kevin85421 kevin85421 changed the title WIP [Bug][GCS FT] Clean up the Redis key before the head Pod is deleted Mar 13, 2024
@kevin85421 kevin85421 marked this pull request as ready for review March 13, 2024 20:33
@kevin85421 kevin85421 requested a review from jjyao March 13, 2024 20:40
@kevin85421
Copy link
Member Author

cc @rueian

Copy link
Contributor

@jjyao jjyao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have tests?

@kevin85421
Copy link
Member Author

Do we have tests?

We have a check here. However, it is in the tearDown hook, which will not cause the test to fail. In the screenshot below, the test passes although the check fails. We will have a follow up about it.

Screen Shot 2024-03-13 at 5 31 40 PM

@kevin85421
Copy link
Member Author

Open an issue to track the progress #1997.

@kevin85421 kevin85421 merged commit d632ac1 into ray-project:master Mar 14, 2024
23 checks passed
kevin85421 added a commit that referenced this pull request Mar 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants