Skip to content

api,cli: fix running changefeed resume handling#4894

Open
wlwilliamx wants to merge 1 commit intopingcap:masterfrom
wlwilliamx:fix/resume-normal-changefeed
Open

api,cli: fix running changefeed resume handling#4894
wlwilliamx wants to merge 1 commit intopingcap:masterfrom
wlwilliamx:fix/resume-normal-changefeed

Conversation

@wlwilliamx
Copy link
Copy Markdown
Collaborator

@wlwilliamx wlwilliamx commented Apr 23, 2026

What problem does this PR solve?

Issue Number: close #4893

What is changed and how it works?

This PR rejects changefeed resume requests when the changefeed is already in a running state.

The change adds a shared FeedState.IsResumable predicate and uses it in:

  • CLI parameter validation, so cdc cli changefeed resume fails before printing a misleading success message or sending the resume request.
  • HTTP resume handling, before downstream validation and resume GC safepoint/barrier setup.
  • Coordinator resume handling, replacing the previous silent no-op with ErrChangefeedUpdateRefused.

This prevents running changefeeds from reporting resume success and avoids creating unnecessary resuming GC guards for invalid resume requests.

Check List

Tests

  • Unit test
  • Manual test
go test ./api/v2 ./coordinator ./pkg/config ./cmd/cdc/cli -count=1

Questions

Will it cause performance regression or break compatibility?

No performance regression. Invalid resume requests for running changefeeds now return an explicit error instead of success/no-op.

Do you need to update user documentation, design documentation or monitoring documentation?

No.

Release note

Fix `cdc cli changefeed resume` to reject running changefeeds instead of reporting success, and avoid creating temporary resume GC guards for invalid resume requests.

Summary by CodeRabbit

  • Bug Fixes

    • Changefeed resume operations now validate that the changefeed is in a resumable state (stopped, failed, or finished) before proceeding. Requests to resume changefeeds in other states are rejected early with a clear error message.
  • Tests

    • Added comprehensive tests validating changefeed resume state validation across CLI, API, and coordinator layers.

Signed-off-by: wlwilliamx <wlwilliamx@gmail.com>
@ti-chi-bot ti-chi-bot Bot added the release-note Denotes a PR that will be considered when it comes time to generate release notes. label Apr 23, 2026
@ti-chi-bot
Copy link
Copy Markdown

ti-chi-bot Bot commented Apr 23, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign flowbehappy for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 23, 2026

📝 Walkthrough

Walkthrough

Early state validation is added to CLI and API resume handlers to verify changefeeds are in a resumable state (stopped/failed/finished) before proceeding with resume operations. A new IsResumable() method encapsulates this check, with handlers returning ErrChangefeedUpdateRefused for non-resumable states instead of silently ignoring or reporting false success.

Changes

Cohort / File(s) Summary
State Validation Logic
pkg/config/changefeed.go, pkg/config/changefeed_test.go
Introduces FeedState.IsResumable() method that defines stopped/failed/finished as resumable states, with comprehensive unit tests covering all state cases.
API v2 Handler
api/v2/changefeed.go, api/v2/changefeed_test.go
Adds early state validation via validateResumeChangefeedState function to reject non-resumable changefeeds before GC prerequisites, with tests verifying rejection of normal/warning/pending states and acceptance of stopped/failed/finished states.
CLI Handler
cmd/cdc/cli/cli_changefeed_resume.go, cmd/cdc/cli/cli_changefeed_resume_test.go
Implements state validation guard in validateParams to refuse resume for non-resumable changefeeds; updates existing test mocks and adds new test for rejecting normal state.
Coordinator Handler
coordinator/controller.go, coordinator/controller_test.go
Replaces silent ignoring of non-resumable states with ErrChangefeedUpdateRefused error; test updated to expect refusal instead of success for normal-state changefeeds.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 A changefeed that's running won't run again,
We check its state before the flow begins,
No false success, no GC barriers lost,
Early validation prevents the cost! 🌻

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 40.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'api,cli: fix running changefeed resume handling' clearly describes the main changes: fixes to API and CLI components related to changefeed resume handling for running changefeeds.
Description check ✅ Passed The PR description follows the template with all required sections: issue reference (close #4893), detailed explanation of changes, test coverage statement (unit and manual tests with go test command), answers to compatibility questions, and a release note.
Linked Issues check ✅ Passed All coding requirements from issue #4893 are met: CLI validation rejects normal changefeeds, HTTP handler validates state early, coordinator returns explicit error instead of silent no-op, and temporary GC guards are avoided.
Out of Scope Changes check ✅ Passed All changes directly address issue #4893: introducing IsResumable() predicate and validating state in CLI, API, and coordinator. No unrelated modifications detected.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.11.4)

Command failed


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@ti-chi-bot ti-chi-bot Bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Apr 23, 2026
Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a centralized IsResumable check for changefeeds within the pkg/config package and integrates it into the API, CLI, and Coordinator layers to ensure that only changefeeds in Stopped, Failed, or Finished states can be resumed. The changes include updated error handling in the Coordinator and comprehensive test coverage across all affected components. Feedback from the review suggests further centralizing the validation logic and error messages to reduce duplication across the different layers and improve maintainability.

Comment thread pkg/config/changefeed.go
Comment on lines +114 to +121
func (s FeedState) IsResumable() bool {
switch s {
case StateFailed, StateStopped, StateFinished:
return true
default:
return false
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The error message associated with this state check is duplicated in three different places (api/v2/changefeed.go, coordinator/controller.go, and cmd/cdc/cli/cli_changefeed_resume.go).

To improve maintainability and ensure consistency if the list of resumable states ever changes, consider defining a shared constant or a helper method on FeedState that returns the allowed states as a string, or even a method that returns the formatted error message.

Comment thread api/v2/changefeed.go
Comment on lines +1066 to +1073
func validateResumeChangefeedState(state config.FeedState) error {
if state.IsResumable() {
return nil
}
return errors.ErrChangefeedUpdateRefused.GenWithStackByArgs(
fmt.Sprintf("can only resume changefeed when it is stopped, failed, or finished, but current state is %s", state),
)
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This helper function is only used once in ResumeChangefeed. While it's not a bug, for consistency with other handlers in this file (like UpdateChangefeed at line 930), you might consider inlining this check or refactoring it into a shared location in pkg/config since the same logic and error message are repeated in the CLI and Coordinator layers.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (5)
coordinator/controller.go (1)

799-806: LGTM — refusal is now explicit.

Returning ErrChangefeedUpdateRefused instead of the previous silent no-op correctly surfaces the rejection to HTTP/CLI callers, and aligns with the new CLI/API early guards. The early return before backend.ResumeChangefeed and moveChangefeedToSchedulingQueue also avoids the temporary resuming GC guard problem described in the issue.

Nit: zap.Any("state", state) can be zap.String("state", string(state)) since FeedState is a string alias — cheaper and a more precise field type.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@coordinator/controller.go` around lines 799 - 806, Replace the zap.Any usage
for the changefeed state in the warning log with a string field for efficiency
and precision: where the code checks state.IsResumable() and returns
errors.ErrChangefeedUpdateRefused.GenWithStackByArgs, update the log.Warn call
to use zap.String("state", string(state)) instead of zap.Any("state", state) so
the FeedState (string alias) is logged as a concrete string field.
api/v2/changefeed.go (1)

757-767: Consider hoisting the state check above the keyspace state check.

The resumable-state guard is placed after keyspaceMeta.State != KeyspaceState_ENABLED, but it only needs cfInfo.State which is already available from line 743. Moving the validateResumeChangefeedState call to immediately after the co.GetChangefeed block (before the keyspaceMeta check, or at least before any further network work) would make the early-exit intent clearer and consistent with the CLI guard placement. Functionally equivalent today, but a cleaner ordering.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/v2/changefeed.go` around lines 757 - 767, Move the resumable-state guard
so it runs immediately after obtaining cfInfo (i.e., right after the
co.GetChangefeed result) by calling validateResumeChangefeedState(cfInfo.State)
before checking keyspaceMeta.State; specifically, hoist the
validateResumeChangefeedState call up so it executes prior to the
keyspaceMeta/state check and returns the same error via c.Error when invalid,
keeping the existing behavior of c.IndentedJSON and c.Abort for the keyspaceMeta
check intact.
pkg/config/changefeed_test.go (1)

41-44: Minor: include the state in the assertion message for easier failure triage.

require.Equal on each loop iteration will report a mismatch but not which state failed. Adding tt.state as a msgAndArgs makes the failure actionable at a glance.

Suggested tweak
-	for _, tt := range tests {
-		require.Equal(t, tt.resumable, tt.state.IsResumable())
-	}
+	for _, tt := range tests {
+		require.Equal(t, tt.resumable, tt.state.IsResumable(), "state=%s", tt.state)
+	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/config/changefeed_test.go` around lines 41 - 44, Update the loop
assertion so failures include the state under test: when iterating over tests,
pass tt.state (or a description like tt.state.String() if available) as the
msgAndArgs to require.Equal so the assertion reads require.Equal(t,
tt.resumable, tt.state.IsResumable(), tt.state) — this ensures any mismatch
prints which tt.state caused the failure; reference the loop variables tests and
tt and the IsResumable() method to locate where to add the msg.
pkg/config/changefeed.go (1)

113-121: Consider exposing the user-facing error message alongside IsResumable.

The predicate centralizes the rule well, but the human-readable message "can only resume changefeed when it is stopped, failed, or finished, but current state is %s" is now duplicated verbatim at three call sites (api/v2/changefeed.go:1071, cmd/cdc/cli/cli_changefeed_resume.go:167, coordinator/controller.go:801). If the set of resumable states ever changes, the three messages will drift. Optional: expose a helper on FeedState (e.g., ResumableErrorMessage(s FeedState) string) or a package-level format string so the message tracks the predicate.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/config/changefeed.go` around lines 113 - 121, The IsResumable predicate
is fine but the user-facing error string is duplicated at call sites; add a
single source of truth (either a method on FeedState like
ResumableErrorMessage() or a package-level format string constant such as
ResumableErrorFmt) and update callers that currently check IsResumable() to use
that helper when constructing the error ("can only resume changefeed when it is
stopped, failed, or finished, but current state is %s"), ensuring FeedState and
IsResumable remain the canonical state logic while the new
ResumableErrorMessage/ResumableErrorFmt produces the formatted human message so
changes to resumable states automatically keep the message in sync.
api/v2/changefeed_test.go (1)

80-163: Consider extracting the fake server/coordinator to a shared test helper.

resumeNormalServer and resumeNormalCoordinator each implement ~10 interface methods just to exercise one handler. If additional v2 handler tests need similar fakes (likely, given the PR adds early-guard patterns), extracting these into a changefeed_test_helpers.go (or using a mock generated via gomock/mockery) would reduce boilerplate and keep the interface-satisfaction code out of the test body. Not blocking.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/v2/changefeed_test.go` around lines 80 - 163, Extract the test fakes
resumeNormalServer and resumeNormalCoordinator into a shared test helper (e.g.,
changefeed_test_helpers.go) so other v2 handler tests can reuse them; move the
type definitions and all their methods (Run, Close, SelfInfo, Liveness,
GetCoordinator, IsCoordinator, GetCoordinatorInfo, GetPdClient, GetEtcdClient,
GetMaintainerManager on resumeNormalServer and Stop, Run, ListChangefeeds,
GetChangefeed, CreateChangefeed, RemoveChangefeed, PauseChangefeed,
ResumeChangefeed, UpdateChangefeed, RequestResolvedTsFromLogCoordinator,
Initialized on resumeNormalCoordinator) into that file (or replace with
generated mocks via gomock/mockery) and update tests to import/use
resumeNormalServer and resumeNormalCoordinator from the helper to remove
boilerplate in changefeed_test.go.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@api/v2/changefeed_test.go`:
- Around line 80-163: Extract the test fakes resumeNormalServer and
resumeNormalCoordinator into a shared test helper (e.g.,
changefeed_test_helpers.go) so other v2 handler tests can reuse them; move the
type definitions and all their methods (Run, Close, SelfInfo, Liveness,
GetCoordinator, IsCoordinator, GetCoordinatorInfo, GetPdClient, GetEtcdClient,
GetMaintainerManager on resumeNormalServer and Stop, Run, ListChangefeeds,
GetChangefeed, CreateChangefeed, RemoveChangefeed, PauseChangefeed,
ResumeChangefeed, UpdateChangefeed, RequestResolvedTsFromLogCoordinator,
Initialized on resumeNormalCoordinator) into that file (or replace with
generated mocks via gomock/mockery) and update tests to import/use
resumeNormalServer and resumeNormalCoordinator from the helper to remove
boilerplate in changefeed_test.go.

In `@api/v2/changefeed.go`:
- Around line 757-767: Move the resumable-state guard so it runs immediately
after obtaining cfInfo (i.e., right after the co.GetChangefeed result) by
calling validateResumeChangefeedState(cfInfo.State) before checking
keyspaceMeta.State; specifically, hoist the validateResumeChangefeedState call
up so it executes prior to the keyspaceMeta/state check and returns the same
error via c.Error when invalid, keeping the existing behavior of c.IndentedJSON
and c.Abort for the keyspaceMeta check intact.

In `@coordinator/controller.go`:
- Around line 799-806: Replace the zap.Any usage for the changefeed state in the
warning log with a string field for efficiency and precision: where the code
checks state.IsResumable() and returns
errors.ErrChangefeedUpdateRefused.GenWithStackByArgs, update the log.Warn call
to use zap.String("state", string(state)) instead of zap.Any("state", state) so
the FeedState (string alias) is logged as a concrete string field.

In `@pkg/config/changefeed_test.go`:
- Around line 41-44: Update the loop assertion so failures include the state
under test: when iterating over tests, pass tt.state (or a description like
tt.state.String() if available) as the msgAndArgs to require.Equal so the
assertion reads require.Equal(t, tt.resumable, tt.state.IsResumable(), tt.state)
— this ensures any mismatch prints which tt.state caused the failure; reference
the loop variables tests and tt and the IsResumable() method to locate where to
add the msg.

In `@pkg/config/changefeed.go`:
- Around line 113-121: The IsResumable predicate is fine but the user-facing
error string is duplicated at call sites; add a single source of truth (either a
method on FeedState like ResumableErrorMessage() or a package-level format
string constant such as ResumableErrorFmt) and update callers that currently
check IsResumable() to use that helper when constructing the error ("can only
resume changefeed when it is stopped, failed, or finished, but current state is
%s"), ensuring FeedState and IsResumable remain the canonical state logic while
the new ResumableErrorMessage/ResumableErrorFmt produces the formatted human
message so changes to resumable states automatically keep the message in sync.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b3654cb8-198a-4f2c-922c-67b15d7c4d9a

📥 Commits

Reviewing files that changed from the base of the PR and between 94cd0a7 and 8b83f10.

📒 Files selected for processing (8)
  • api/v2/changefeed.go
  • api/v2/changefeed_test.go
  • cmd/cdc/cli/cli_changefeed_resume.go
  • cmd/cdc/cli/cli_changefeed_resume_test.go
  • coordinator/controller.go
  • coordinator/controller_test.go
  • pkg/config/changefeed.go
  • pkg/config/changefeed_test.go

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Running changefeed resume reports success and creates unnecessary GC guard

1 participant