Add the support to replace evicted head pod #381

Merged
merged 1 commit into from
Jul 15, 2022

Conversation

Jeffwan
Collaborator

@Jeffwan Jeffwan commented Jul 15, 2022

Why are these changes needed?

Currently, the operator doesn't handle failed pods. When a pod is evicted, the operator just raises an error and keeps requeuing the item. From a user-journey perspective, we would rather replace the head pod with a healthy one. This change catches the evicted pod and deletes it in the current loop; the next reconcile loop, triggered by the pod deletion, will find the head pod missing and create a new one.
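
The delete-then-recreate flow described above can be sketched roughly as follows. This is only an illustrative Python model, not the operator's actual Go code: `Pod`, `FakeCluster`, `is_evicted`, and `reconcile_head` are hypothetical names, and the cluster is an in-memory stand-in for the Kubernetes API. The one detail taken from Kubernetes itself is that an evicted pod reports phase `Failed` with reason `Evicted`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Pod:
    name: str
    phase: str = "Running"  # Kubernetes pod phase, e.g. "Running" or "Failed"
    reason: str = ""        # set to "Evicted" when the kubelet evicts the pod


@dataclass
class FakeCluster:
    """Hypothetical in-memory stand-in for the Kubernetes API."""
    head: Optional[Pod] = None
    created: int = 0

    def create_head(self) -> None:
        self.created += 1
        self.head = Pod(name=f"head-{self.created}")

    def delete_head(self) -> None:
        self.head = None


def is_evicted(pod: Pod) -> bool:
    # An evicted pod is a failed pod whose status reason is "Evicted".
    return pod.phase == "Failed" and pod.reason == "Evicted"


def reconcile_head(cluster: FakeCluster) -> str:
    """One reconcile pass over the head pod; returns the action taken."""
    head = cluster.head
    if head is None:
        # Head is missing (for example, deleted in the previous pass).
        cluster.create_head()
        return "created"
    if is_evicted(head):
        # Delete now; the following pass sees no head and recreates it.
        cluster.delete_head()
        return "deleted"
    return "healthy"
```

In this sketch the caller invokes `reconcile_head` again after a deletion; in the operator, that second pass is triggered by the pod deletion event rather than by an explicit call.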

Related issue number

Closes #372

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

@DmitriGekhtman
Collaborator

Updated the issue description to close the linked issue.

Thank you, this looks good!

@Jeffwan Jeffwan merged commit 9de2fb0 into ray-project:master Jul 15, 2022
@Jeffwan Jeffwan deleted the jiaxin/remove_evicted_pod branch July 15, 2022 20:51
lowang-bh pushed a commit to lowang-bh/kuberay that referenced this pull request Sep 24, 2023
Development

Successfully merging this pull request may close these issues.

[Bug] Evicted head pod is not replaced
2 participants