cmd/cdc/server: return code 0 on graceful shutdown #4564

Merged
ti-chi-bot[bot] merged 6 commits into pingcap:master from pingyu:graceful-shutdown-exit-code
Mar 24, 2026

Contributor

@pingyu pingyu commented Mar 22, 2026

What problem does this PR solve?

Issue Number: close #4563

What is changed and how it works?

cmd/cdc/server currently logs a graceful shutdown as normal, but it still
returns context.Canceled from svr.Run. That makes the CLI exit with code 1
even when shutdown was triggered intentionally by SIGTERM.

This PR:

  • adds isNormalServerShutdown to classify nil, context.Canceled, and
    wrapped context.Canceled as normal shutdown;
  • uses the helper for both shutdown logging and the final return path in
    options.run;
  • returns nil for normal shutdown so cdc server exits with code 0;
  • adds unit coverage for the helper.

Check List

Tests

  • Unit test
    • go test ./cmd/cdc/server/...

Questions

Will it cause performance regression or break compatibility?

No. The change only affects error classification on the shutdown path and does
not change steady-state behavior or compatibility.

Do you need to update user documentation, design documentation or monitoring documentation?

No.

Release note

Fix a bug where TiCDC could exit with code 1 during graceful SIGTERM shutdown.

Summary by CodeRabbit

Release Notes

  • Bug Fixes

    • Improved graceful shutdown handling to distinguish between normal and abnormal server exits
    • Refined timeout behavior to prevent error logging on normal shutdown scenarios
  • Refactor

    • Simplified server close interface to better decouple shutdown operations

Signed-off-by: Ping Yu <yuping@pingcap.com>

ti-chi-bot bot commented Mar 22, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@ti-chi-bot ti-chi-bot bot added first-time-contributor Indicates that the PR was contributed by an external member and is a first-time contributor. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. do-not-merge/needs-triage-completed release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Mar 22, 2026
Contributor

coderabbitai bot commented Mar 22, 2026

Walkthrough

The changes implement proper exit status handling for graceful server shutdown by introducing a function to classify shutdowns as "normal" (nil error or context canceled), decoupling the server close operation from the potentially canceled run context, and suppressing error returns for normal exits to ensure exit code 0 on graceful termination.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Server shutdown exit classification: cmd/cdc/server/server.go | Introduced isNormalServerShutdown(err, ctx) function to determine if a server termination should be treated as normal. Updated (*options).run to use this function for exit classification, changed svr.Close(ctx) to svr.Close() to decouple shutdown from the run context, and added logic to return nil on normal exits and suppress error propagation. |
| Shutdown logic tests: cmd/cdc/server/server_test.go | Added TestIsNormalServerShutdown test function covering multiple scenarios: nil errors, unwrapped context.Canceled, canceled contexts, wrapped cancellation errors via cerror.Trace, and unrelated wrapped errors. |
| Server interface: pkg/server/server.go | Updated Server interface method signature from Close(ctx context.Context) to Close(), removing the context parameter from the public interface contract. |
| Server implementation: server/server.go | Updated (*server).Close method signature to match interface change from Close(ctx context.Context) to Close(). Modified internal implementation to use a new closeCtx derived from context.Background() with GracefulShutdownTimeout instead of accepting caller-provided context, affecting module shutdown calls and etcd metadata deletion context propagation. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Poem

🐰 When signals knock with graceful cheer,
The rabbit closes without fear,
No spurious errors cloud the day,
Exit code zero shows the way!
Hop, hop, hooray!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly summarizes the main change: ensuring graceful shutdown returns exit code 0 instead of 1. |
| Description check | ✅ Passed | The description includes issue number, clear explanation of the problem, what changed, test information, and release notes as required by the template. |
| Linked Issues check | ✅ Passed | The PR fully addresses issue #4563 by classifying graceful shutdowns as normal, returning nil for normal shutdown, and ensuring exit code 0 when SIGTERM triggers shutdown. |
| Out of Scope Changes check | ✅ Passed | All changes are scoped to fixing the graceful shutdown exit code issue. The Server.Close signature change is a necessary refactoring supporting the main objective. |

pingcap-cla-assistant bot commented Mar 22, 2026

CLA assistant check
All committers have signed the CLA.

@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves an issue where the cdc server command would incorrectly exit with a non-zero status code (1) during a graceful shutdown, such as when a SIGTERM signal is received. The changes introduce a new utility function to precisely differentiate between genuine errors and expected shutdown signals like context.Canceled, thereby ensuring the server terminates with a success code (0) in these normal operational scenarios.

Highlights

  • Error Classification for Shutdown: Introduced an isNormalServerShutdown helper function to accurately classify nil, context.Canceled, and wrapped context.Canceled errors as normal server shutdown conditions.
  • Consistent Shutdown Logging: Modified the server's run method to utilize the new isNormalServerShutdown helper, ensuring that graceful shutdowns are consistently logged as normal exits rather than errors.
  • Correct Exit Code on Graceful Shutdown: Ensured that the cdc server command returns nil (resulting in an exit code 0) when a normal shutdown occurs, preventing the CLI from exiting with an erroneous code 1.
  • Unit Test Coverage: Added comprehensive unit tests for the isNormalServerShutdown helper function to validate its behavior across various error scenarios.

@ti-chi-bot ti-chi-bot bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Mar 22, 2026

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request correctly addresses the issue of the server exiting with a non-zero status code on graceful shutdown. The introduction of the isNormalServerShutdown helper function centralizes the logic for identifying a normal shutdown, making the code cleaner and easier to understand. The changes are well-tested with new unit tests covering various shutdown scenarios. I have one suggestion to make the implementation more idiomatic.

pingyu and others added 2 commits March 22, 2026 12:50
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Ping Yu <yuping@pingcap.com>
@pingyu pingyu marked this pull request as ready for review March 22, 2026 09:36
@ti-chi-bot ti-chi-bot bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 22, 2026
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cmd/cdc/server/server.go`:
- Around line 160-167: The shutdown goroutine currently calls
svr.Close(context.Background()), which gives downstream Close calls no deadline;
change it to create a timeout context derived from GracefulShutdownTimeout
(e.g., ctx, cancel := context.WithTimeout(context.Background(),
GracefulShutdownTimeout)) and pass that ctx into svr.Close, ensuring you call
cancel() (defer or after Close) so resources are released; keep the existing
goroutine and select on ch/ticker as-is but ensure the timeout context is used
when invoking svr.Close to propagate the shutdown budget to all submodule Close
methods.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 637fb1d3-a012-4c6e-b3df-b51e2933861e

📥 Commits

Reviewing files that changed from the base of the PR and between e8de3a6 and 3d8950f.

📒 Files selected for processing (2)
  • cmd/cdc/server/server.go
  • cmd/cdc/server/server_test.go

pingyu added 2 commits March 22, 2026 17:54
Signed-off-by: Ping Yu <yuping@pingcap.com>
Signed-off-by: Ping Yu <yuping@pingcap.com>
  defer ticker.Stop()
  go func() {
-     svr.Close(ctx)
+     svr.Close(context.Background())
Collaborator

I think the ctx parameter should be removed

@wk989898
Collaborator

/test all

Signed-off-by: Ping Yu <yuping@pingcap.com>
@ti-chi-bot ti-chi-bot bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed do-not-merge/needs-triage-completed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Mar 23, 2026
Contributor

@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (1)
server/server.go (1)

472-473: Nested timeout is bounded by parent context.

timeoutCtx inherits from closeCtx, so the effective timeout for etcd cleanup is min(cleanMetaDuration, remaining time in closeCtx). If module closures consume most of the 30-second budget, etcd cleanup may get less than its intended 10 seconds.

This is acceptable behavior for bounding total shutdown time, but worth noting that cleanMetaDuration is a soft limit rather than a guarantee.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/server.go` around lines 472 - 473, The current timeoutCtx is created
from closeCtx (timeoutCtx, timeoutCancel := context.WithTimeout(closeCtx,
cleanMetaDuration)), which makes its deadline the min of cleanMetaDuration and
remaining closeCtx time; to guarantee etcd cleanup gets the full
cleanMetaDuration regardless of how long prior module shutdown took, create the
timeout from a fresh background context (use
context.WithTimeout(context.Background(), cleanMetaDuration)) and use that for
the etcd cleanup call(s) while still calling timeoutCancel; update references to
timeoutCtx/timeoutCancel accordingly in the etcd cleanup path so it uses the
independent deadline.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 26ef4904-87a3-4d44-8bb7-dd63b5a5c5a8

📥 Commits

Reviewing files that changed from the base of the PR and between c651442 and 86efaf9.

📒 Files selected for processing (3)
  • cmd/cdc/server/server.go
  • pkg/server/server.go
  • server/server.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • cmd/cdc/server/server.go

@ti-chi-bot ti-chi-bot bot added needs-1-more-lgtm Indicates a PR needs 1 more LGTM. approved labels Mar 23, 2026
@ti-chi-bot ti-chi-bot bot added the lgtm label Mar 24, 2026

ti-chi-bot bot commented Mar 24, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: flowbehappy, wk989898

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [flowbehappy,wk989898]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot removed the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Mar 24, 2026

ti-chi-bot bot commented Mar 24, 2026

[LGTM Timeline notifier]

Timeline:

  • 2026-03-23 07:18:05 UTC: ☑️ agreed by wk989898.
  • 2026-03-24 01:52:53 UTC: ☑️ agreed by flowbehappy.

@ti-chi-bot ti-chi-bot bot merged commit f08ab83 into pingcap:master Mar 24, 2026
26 checks passed
pingyu added a commit to pingyu/ticdc that referenced this pull request Mar 24, 2026
@pingyu pingyu mentioned this pull request Apr 7, 2026

Labels

approved first-time-contributor Indicates that the PR was contributed by an external member and is a first-time contributor. lgtm release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Development

Successfully merging this pull request may close these issues.

cmd/cdc/server: exits with code 1 on graceful SIGTERM shutdown

3 participants