[Hotfix][Bug] suspend is not a stateless operation #1741

Merged: 3 commits, Dec 13, 2023

@kevin85421 (Member) commented Dec 12, 2023

Why are these changes needed?

  • Reproduce the error
    • Step 1: Set suspend to true.
    • Step 2: The status changes to suspended after all Pods are deleted.
    • Step 3: Set suspend back to false; this triggers the creation of Pods.
    • Step 4: Set suspend to true again before all Pods are running. Because the cluster never became ready, the status is still suspended.
    • Step 5: At this point, the KubeRay operator skips the reconciliation because suspend is true and the status is suspended, so the Pods created in Step 3 are not deleted.

The function reconcilePods should only create Pods when suspend is false and the status is not suspended. However, RayCluster currently does not have a well-defined state machine. To avoid adding complexity to the codebase, I decided to stop skipping the reconciliation until we have one.
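
The scenario can be replayed with a small, self-contained model. This is only a sketch, not the KubeRay code: the cluster struct, the reconcile and replay functions, and the skipGuard flag are all hypothetical names, and Pod deletion is collapsed into a single pass for brevity. With skipGuard enabled, the early return from Step 5 leaves a Pod behind; with it disabled, reconciliation always runs and the Pod is cleaned up.

```go
package main

import "fmt"

// cluster is a hypothetical, simplified stand-in for a RayCluster.
type cluster struct {
	suspend bool   // mirrors spec.suspend
	state   string // mirrors the status: "", "ready", or "suspended"
	pods    int    // number of Pods that currently exist
	want    int    // number of Pods the cluster should run
}

// reconcile models one pass of the operator. When skipGuard is true it
// reproduces the early return described in Step 5.
func reconcile(c *cluster, skipGuard bool) {
	if skipGuard && c.suspend && c.state == "suspended" {
		return // nothing is inspected, so leftover Pods survive
	}
	if c.suspend {
		c.pods = 0 // delete every Pod (one pass here, for brevity)
		c.state = "suspended"
		return
	}
	if c.pods < c.want {
		c.pods++ // create one Pod per pass; the cluster is not ready yet
		return
	}
	c.state = "ready"
}

// replay walks through Steps 1 to 5 and returns the final cluster.
func replay(skipGuard bool) cluster {
	c := cluster{want: 3}
	c.suspend = true         // Step 1
	reconcile(&c, skipGuard) // Step 2: state becomes "suspended"
	c.suspend = false        // Step 3
	reconcile(&c, skipGuard) // one Pod is created; state is still "suspended"
	c.suspend = true         // Step 4: suspend again before the cluster is ready
	reconcile(&c, skipGuard) // Step 5
	return c
}

func main() {
	for _, skip := range []bool{true, false} {
		c := replay(skip)
		fmt.Printf("skipGuard=%t state=%s pods=%d\n", skip, c.state, c.pods)
	}
}
```

Running it prints skipGuard=true state=suspended pods=1 followed by skipGuard=false state=suspended pods=0. The guard depends on the status left over from an earlier transition, which is why suspend is not stateless in this model.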

Related issue number

#1711

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

@kevin85421 kevin85421 changed the title [WIP][Hotfix] [Hotfix][Bug] suspend is not a stateless operation Dec 12, 2023
@kevin85421 (Member, Author)

cc @andrewsykim This is the follow-up to #1711 (comment).

@kevin85421 kevin85421 marked this pull request as ready for review December 12, 2023 23:16
kevin85421 and others added 2 commits December 12, 2023 15:52
Co-authored-by: Archit Kulkarni <architkulkarni@users.noreply.github.com>
Signed-off-by: Kai-Hsun Chen <kaihsun@apache.org>
@kevin85421 kevin85421 merged commit 1fe5ae7 into ray-project:master Dec 13, 2023
25 checks passed
@andrewsykim (Contributor)

@kevin85421 thanks for fixing! LGTM
