coordinator,maintainer: Fixed bootstrap might fail to succeed in frequent maintainer scheduling #4114
📝 Walkthrough
Adds bootstrap-completion gating to maintainer move logic and a safety guard for premature maintainer removal; updates tests to cover the new bootstrap requirement and adds mocks in coordinator tests to mark changefeeds as `BootstrapDone`.
Sequence Diagram(s)
sequenceDiagram
participant Coordinator as Coordinator
participant Operator as MoveMaintainerOperator
participant Origin as OriginNode
participant Dest as DestNode
rect rgba(200,200,255,0.5)
Coordinator->>Operator: schedule move (origin -> dest)
Operator->>Origin: observe status
Origin-->>Operator: Stopped
Operator->>Dest: observe status
Dest-->>Operator: Working, BootstrapDone=false
note right of Operator: do not finish yet
Dest-->>Operator: Working, BootstrapDone=true
Operator->>Coordinator: mark finished / apply schedule
end
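The flow in the diagram can be sketched roughly as follows. This is a minimal model under stated assumptions, not the actual TiCDC code: `State`, `Status`, `MoveOp`, and `Check` are hypothetical names invented for illustration, and only the `BootstrapDone` flag comes from the PR itself.

```go
package main

import "fmt"

// State is a hypothetical stand-in for the state a maintainer reports.
type State int

const (
	Working State = iota
	Stopped
)

// Status is a hypothetical, simplified maintainer status report.
// Only the BootstrapDone field name is taken from the PR.
type Status struct {
	State         State
	BootstrapDone bool
}

// MoveOp is a hypothetical, simplified move operator (origin -> dest).
type MoveOp struct {
	origin, dest  string
	originStopped bool
	finished      bool
}

// Check consumes one status report from the node named by from.
func (m *MoveOp) Check(from string, s Status) {
	if m.finished {
		return
	}
	switch from {
	case m.origin:
		if s.State == Stopped {
			m.originStopped = true
		}
	case m.dest:
		// Working alone is not enough: the destination must also
		// report that bootstrap has completed.
		if m.originStopped && s.State == Working && s.BootstrapDone {
			m.finished = true
		}
	}
}

func main() {
	op := &MoveOp{origin: "origin", dest: "dest"}
	op.Check("origin", Status{State: Stopped})
	op.Check("dest", Status{State: Working, BootstrapDone: false})
	fmt.Println(op.finished) // false: destination has not bootstrapped yet
	op.Check("dest", Status{State: Working, BootstrapDone: true})
	fmt.Println(op.finished) // true
}
```

In this sketch the operator stays unfinished while the destination reports `Working` with `BootstrapDone=false`, which mirrors the "do not finish yet" note in the diagram.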
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Summary of Changes
This pull request significantly improves the stability and correctness of changefeed lifecycle management, particularly during node movements and restarts.
Code Review
This pull request introduces important correctness fixes for changefeed move and restart operations. The core issue addressed is a race condition where a changefeed maintainer could get stuck in a bootstrapping state if a move/remove operation was initiated concurrently.
The changes include:
- The `MoveMaintainerOperator` now correctly waits for the destination maintainer to be fully bootstrapped by checking a new `BootstrapDone` flag in its status. This ensures a move operation is only considered complete when the changefeed is fully operational on the new node.
- The `Maintainer` is now more robust against premature `remove` requests. It will ignore non-cascade remove requests that arrive before it has finished its own bootstrap process. This prevents the maintainer from entering a `removing` state that would block the bootstrap from ever completing, thus avoiding a potential deadlock.
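The remove guard described in the second point could look roughly like the sketch below. It is an illustration under assumptions, not the code from this PR: the `maintainer` struct, its field names, and `onRemove` are hypothetical, and the log-once flag only mirrors the idea behind the `ignoredNonCascadeRemoveBeforeBootstrap` atomic boolean named in the PR description.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// maintainer is a hypothetical, heavily simplified model of the component.
type maintainer struct {
	bootstrapped  atomic.Bool // set once bootstrap has completed
	removing      atomic.Bool // set once a remove request is accepted
	ignoredLogged atomic.Bool // ensures the "ignored" message is logged once
}

// onRemove handles a remove request and reports whether it was accepted.
func (m *maintainer) onRemove(cascade bool) bool {
	if !cascade && !m.bootstrapped.Load() {
		// Accepting the request now would move the maintainer into a
		// removing state before bootstrap can finish, so drop it instead.
		if m.ignoredLogged.CompareAndSwap(false, true) {
			fmt.Println("ignoring non-cascade remove before bootstrap")
		}
		return false
	}
	m.removing.Store(true)
	return true
}

func main() {
	m := &maintainer{}
	fmt.Println(m.onRemove(false)) // ignored, message logged
	fmt.Println(m.onRemove(false)) // ignored again, no second message
	m.bootstrapped.Store(true)
	fmt.Println(m.onRemove(false)) // accepted after bootstrap
}
```

Using `CompareAndSwap` on the flag means repeated early requests are all rejected but produce only one log line. In this sketch cascade removes are always accepted, which is an assumption drawn from the description only mentioning non-cascade requests.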
The changes are well-implemented and accompanied by thorough unit tests that cover the new logic and edge cases. The overall code quality is high. This is a solid improvement to the system's robustness.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@maintainer/maintainer_remove_before_bootstrap_test.go`:
- Line 1: Add the standard copyright header at the very top of this file before
the "package maintainer" declaration; ensure it matches the project's canonical
header text and formatting (including year and owner) used across the repo so
the CI header check passes, and do not alter the existing "package maintainer"
line or any other code in maintainer_remove_before_bootstrap_test.go.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: asddongmen, wk989898
…uent maintainer scheduling (pingcap#4114) close pingcap#4115
What problem does this PR solve?
Issue Number: close #4115
What is changed and how it works?
This pull request significantly improves the stability and correctness of changefeed lifecycle management, particularly during node movements and restarts. By introducing a `BootstrapDone` flag and refining the handling of remove requests, it ensures that changefeeds are fully initialized before operations are considered complete and prevents erroneous premature removals, thereby enhancing the overall reliability of the system.
Highlights
- `MoveMaintainerOperator` now explicitly waits for a `BootstrapDone` flag from the destination node's `MaintainerStatus` before considering a changefeed move operation complete. This ensures the changefeed is fully initialized on the new node.
- The `Maintainer` component has been updated to ignore non-cascade remove requests if they arrive before the maintainer has completed its bootstrap process. This prevents premature termination and potential issues where bootstrap responses might be dropped, leading to a stuck state.
- An `ignoredNonCascadeRemoveBeforeBootstrap` atomic boolean has been introduced in the `Maintainer` to prevent log spam when multiple non-cascade remove requests are received before bootstrap.
Check List
Tests
Questions
Will it cause performance regression or break compatibility?
Do you need to update user documentation, design documentation or monitoring documentation?
Release note
Summary by CodeRabbit
Bug Fixes
Tests