[GCS FT] Improve GCS FT cleanup UX #1592

Merged on Nov 1, 2023 (2 commits)

@kevin85421 kevin85421 commented Nov 1, 2023

Why are these changes needed?

  • Mitigate the penalty if the Redis cleanup Job fails.

    • Minimize the computing resources for the K8s Job.
    • Set backoffLimit to 0 so that a failed Job does not leave 6 dead Pods that overload the ResourceQuota's count/pods.
  • With the default configuration (redis_db_connect_retries = 600 and redis_db_connect_wait_milliseconds = 100), the Job takes 13 minutes to fail. This PR configures the environment variables so that the Job takes 3 minutes to fail. See "[GCS FT] Update Redis connection configs" (ray#40860) for more details.

  • Add logging to improve observability for troubleshooting.

  • Note:

    • If we delete the Redis deployment, the Redis cleanup Job cannot connect to Redis, so it retries according to the environment variables mentioned above.
    • If we delete the key in Redis, the Redis cleanup Job can connect to Redis but fails to delete the key because it does not exist. Hence, the Job fails immediately.
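
For readers who want to picture the resulting Job, here is a minimal sketch of a manifest with the three mitigations listed above: no retries, small resource requests, and shorter Redis connection retry settings. It is not the code from this PR. The function name, container name, resource sizes, and retry values are hypothetical, and the `RAY_`-prefixed environment variable names assume Ray's convention for overriding system configs through the environment.

```python
import json

# Hypothetical values for illustration only; the PR description does not
# state the exact retry settings or resource sizes it uses.
REDIS_CONNECT_RETRIES = "120"
REDIS_CONNECT_WAIT_MS = "100"


def build_redis_cleanup_job(name: str, image: str, command: list) -> dict:
    """Return a batch/v1 Job manifest (as a dict) for a one-shot Redis cleanup."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            # No retries: a failed Job leaves a single dead Pod behind.
            "backoffLimit": 0,
            "template": {
                "spec": {
                    # Do not restart the container in place either.
                    "restartPolicy": "Never",
                    "containers": [
                        {
                            "name": "redis-cleanup",  # hypothetical name
                            "image": image,
                            "command": command,
                            # Assumed env var names (RAY_ + system config name).
                            "env": [
                                {
                                    "name": "RAY_redis_db_connect_retries",
                                    "value": REDIS_CONNECT_RETRIES,
                                },
                                {
                                    "name": "RAY_redis_db_connect_wait_milliseconds",
                                    "value": REDIS_CONNECT_WAIT_MS,
                                },
                            ],
                            # Keep the footprint small; sizes are placeholders.
                            "resources": {
                                "requests": {"cpu": "200m", "memory": "256Mi"},
                                "limits": {"cpu": "200m", "memory": "256Mi"},
                            },
                        }
                    ],
                }
            },
        },
    }


if __name__ == "__main__":
    # Placeholder image and command; substitute the real cleanup entrypoint.
    job = build_redis_cleanup_job("redis-cleanup", "example/ray:tag", ["true"])
    print(json.dumps(job, indent=2))
```

With `backoffLimit: 0` and `restartPolicy: Never`, the Job is attempted once, so the time to failure is governed by the connection retry settings rather than by Pod-level retries.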

Related issue number

#1557

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

@kevin85421 kevin85421 added the 1.0 label Nov 1, 2023
@kevin85421 kevin85421 merged commit 6fa2d3a into ray-project:master Nov 1, 2023
23 checks passed
kevin85421 added a commit to kevin85421/kuberay that referenced this pull request Nov 2, 2023

@Vman909 Vman909 left a comment


Need more suggestions like to answer this telephone with address
