kafka(ticdc): close sarama clients on init failures (#12573) #12592

Open
ti-chi-bot wants to merge 1 commit into pingcap:release-7.5 from ti-chi-bot:cherry-pick-12573-to-release-7.5

Conversation

@ti-chi-bot
Member

This is an automated cherry-pick of #12573

What problem does this PR solve?

Issue Number: close #12572

Kafka admin / producer initialization paths in pkg/sink/kafka can return while still leaving the raw Sarama client alive. Repeated retry / rebuild loops may accumulate background metadata updaters, broker connections, and related resources.

The normal wrapper close paths also do not always release the owned client:

  • saramaAdminClient.Close only closes the admin handle
  • saramaSyncProducer.Close only closes the producer
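The wrapper-close gap described above can be illustrated with a minimal sketch. It does not use Sarama: `closer`, `fakeResource`, and `ownedWrapper` are hypothetical names invented for this example, and the sketch does not model Sarama's own close semantics. It only shows a wrapper that owns both a handle and the client behind it and releases both in `Close`.

```go
package main

import "fmt"

// closer is a hypothetical stand-in for the Close method shared by the
// client and the admin/producer handles built on top of it.
type closer interface {
	Close() error
}

// fakeResource counts Close calls so the behaviour can be observed.
type fakeResource struct {
	closeCalls int
}

func (f *fakeResource) Close() error {
	f.closeCalls++
	return nil
}

// ownedWrapper is a hypothetical wrapper that owns both the handle
// (admin or producer) and the client the handle was created from.
type ownedWrapper struct {
	handle closer
	client closer
}

// Close always attempts both closes, handle first, and reports the
// first error it saw.
func (w *ownedWrapper) Close() error {
	handleErr := w.handle.Close()
	clientErr := w.client.Close()
	if handleErr != nil {
		return fmt.Errorf("close handle: %w", handleErr)
	}
	if clientErr != nil {
		return fmt.Errorf("close client: %w", clientErr)
	}
	return nil
}

func main() {
	handle, client := &fakeResource{}, &fakeResource{}
	w := &ownedWrapper{handle: handle, client: client}
	if err := w.Close(); err != nil {
		fmt.Println("close failed:", err)
		return
	}
	fmt.Println("handle closes:", handle.closeCalls, "client closes:", client.closeCalls)
}
```

Attempting both closes even when the first one fails is a deliberate choice in this sketch: returning early on a handle error would leave the client open, which is the leak being discussed.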

What is changed and how it works?

  • close the raw sarama.Client when admin creation from client fails
  • close the raw sarama.Client when sync producer creation from client fails
  • close the raw sarama.Client when async producer creation from client fails
  • explicitly close the owned client in saramaAdminClient.Close
  • explicitly close the owned client in saramaSyncProducer.Close
  • add focused regression tests for admin/sync/async init-failure cleanup and for wrapper close behavior
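The init-failure cleanup in the list above follows a common pattern, sketched here under stated assumptions. The types `fakeClient` and `fakeProducer`, the `buildFunc` constructor, and `newProducer` are all hypothetical and stand in for the real Sarama client and the "from client" constructors; this is not the code from the PR.

```go
package main

import (
	"errors"
	"fmt"
)

// errInit is a hypothetical error used to simulate a failed constructor.
var errInit = errors.New("producer init failed")

// fakeClient stands in for the raw client and counts Close calls.
type fakeClient struct {
	closeCalls int
}

func (c *fakeClient) Close() error {
	c.closeCalls++
	return nil
}

// fakeProducer stands in for a producer built on top of a client.
type fakeProducer struct {
	client *fakeClient
}

// buildFunc is a hypothetical "producer from client" constructor.
type buildFunc func(*fakeClient) (*fakeProducer, error)

// newProducer builds a producer from an already-created client. If the
// build step fails, the caller never receives the client, so it is
// closed here before the error is returned.
func newProducer(client *fakeClient, build buildFunc) (*fakeProducer, error) {
	p, err := build(client)
	if err != nil {
		if closeErr := client.Close(); closeErr != nil {
			return nil, fmt.Errorf("%w (closing client also failed: %v)", err, closeErr)
		}
		return nil, err
	}
	return p, nil
}

func main() {
	failed := &fakeClient{}
	_, err := newProducer(failed, func(*fakeClient) (*fakeProducer, error) {
		return nil, errInit
	})
	fmt.Println("init error:", err, "client closes:", failed.closeCalls)

	ok := &fakeClient{}
	p, err := newProducer(ok, func(c *fakeClient) (*fakeProducer, error) {
		return &fakeProducer{client: c}, nil
	})
	fmt.Println("init error:", err, "producer built:", p != nil, "client closes:", ok.closeCalls)
}
```

On the success path the client stays open, because ownership passes to the returned producer and its wrapper is then responsible for closing it.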

Check List

Tests

  • Unit test
  • Manual test (add detailed scripts or steps below)

Unit test:

  • go test ./pkg/sink/kafka -count=1

Questions

Will it cause performance regression or break compatibility?

No. The change only tightens resource cleanup on Kafka init / close paths and does not change normal successful send semantics.

Do you need to update user documentation, design documentation or monitoring documentation?

No.

Release note

Release leaked Kafka Sarama clients on init-failure and wrapper-close paths.

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@ti-chi-bot ti-chi-bot added the do-not-merge/hold, lgtm, release-note, size/XL, and type/cherry-pick-for-release-7.5 labels Apr 8, 2026
@ti-chi-bot
Contributor

ti-chi-bot bot commented Apr 8, 2026

This cherry pick PR is for a release branch and has not yet been approved by triage owners.
Adding the do-not-merge/cherry-pick-not-approved label.

To merge this cherry pick:

  1. It must first be LGTMed and approved by the reviewers.
  2. For pull requests to TiDB-x branches, it must have no failed tests.
  3. AFTER it has lgtm and approved labels, please wait for the cherry-pick merging approval from triage owners.
Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ti-chi-bot
Member Author

@wlwilliamx This PR has conflicts, so I have put it on hold.
Please resolve them or ask someone else to resolve them, then comment /unhold to remove the hold label.

@ti-chi-bot
Contributor

ti-chi-bot bot commented Apr 8, 2026

@ti-chi-bot: If you want to know how to resolve the conflicts, please read the guide in the TiDB Dev Guide.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request ensures Sarama clients are properly closed during initialization failures and standard cleanup, while also adding constructor seams and unit tests for heartbeat and failure scenarios. However, the code contains multiple unresolved git merge conflict markers across several files that will cause compilation errors. Additionally, the use of global variables for constructor seams introduces race conditions in parallel tests, which should be addressed to ensure deterministic test results.

Comment on lines +55 to +62
<<<<<<< HEAD
// Close shuts down the admin client.
=======
// HeartbeatBroker sends a heartbeat to all brokers to keep the kafka connection alive.
HeartbeatBrokers()

// Close shuts down the admin client and releases any owned underlying client connections.
>>>>>>> 9fbde6ebeb (kafka(ticdc): close sarama clients on init failures (#12573))

critical

The file contains unresolved git merge conflict markers. This will cause compilation errors and must be resolved before merging.

	// HeartbeatBrokers sends a heartbeat to all brokers to keep the kafka connection alive.
	HeartbeatBrokers()

	// Close shuts down the admin client and releases any owned underlying client connections.
	Close()

Comment on lines +116 to +121
<<<<<<< HEAD

p, err := sarama.NewSyncProducerFromClient(client)
=======
p, err := newSaramaSyncProducerFromClientImpl(client)
>>>>>>> 9fbde6ebeb (kafka(ticdc): close sarama clients on init failures (#12573))

critical

Unresolved merge conflict markers. The resolution should use the mockable newSaramaSyncProducerFromClientImpl constructor to ensure the new unit tests can inject failures correctly.

	p, err := newSaramaSyncProducerFromClientImpl(client)

Comment on lines +18 to +22
<<<<<<< HEAD
=======
stdErrors "errors"
"sync"
>>>>>>> 9fbde6ebeb (kafka(ticdc): close sarama clients on init failures (#12573))

critical

Unresolved merge conflict markers in the import block.

	stdErrors "errors"
	"sync"

Comment on lines +109 to +110
<<<<<<< HEAD
=======

critical

Unresolved merge conflict markers. These must be removed to allow the new tests to be included in the build.

require.Equal(t, 1, client.closeCalls)
require.True(t, client.closed)
}
>>>>>>> 9fbde6ebeb (kafka(ticdc): close sarama clients on init failures (#12573))

critical

Unresolved merge conflict marker at the end of the file.

Comment on lines +32 to +35
newSaramaClientImpl = sarama.NewClient
newSaramaClusterAdminFromClientImpl = sarama.NewClusterAdminFromClient
newSaramaSyncProducerFromClientImpl = sarama.NewSyncProducerFromClient
newSaramaAsyncProducerFromClientImpl = sarama.NewAsyncProducerFromClient

high

Using global variables for constructor seams introduces a race condition in tests. Since multiple tests in this package (e.g., TestSyncProducer, TestAsyncProducer) are marked with t.Parallel(), they may run concurrently with tests that modify these global variables (like TestSaramaFactoryAdminClientClosesClientOnAdminInitFailure), leading to non-deterministic failures. Consider passing these creators as dependencies to the factory or using a thread-safe mocking approach.
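One way to follow the suggestion above is to carry the constructors as fields on the factory, so each test builds its own instance and no package-level variable is mutated. The sketch below is only an illustration: `factory`, `fakeClient`, `fakeAdmin`, and `errAdminInit` are hypothetical names, not the types used in pkg/sink/kafka.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errAdminInit is a hypothetical error used to simulate a failed admin constructor.
var errAdminInit = errors.New("admin init failed")

// fakeClient stands in for the raw client and counts Close calls.
type fakeClient struct {
	closeCalls int
}

func (c *fakeClient) Close() error {
	c.closeCalls++
	return nil
}

// fakeAdmin stands in for an admin handle built from a client.
type fakeAdmin struct {
	client *fakeClient
}

// factory carries its constructors as fields instead of reading
// package-level variables, so instances do not share mutable state.
type factory struct {
	newClient func() (*fakeClient, error)
	newAdmin  func(*fakeClient) (*fakeAdmin, error)
}

// AdminClient creates a client, then an admin from it, and closes the
// client if the admin constructor fails.
func (f *factory) AdminClient() (*fakeAdmin, error) {
	c, err := f.newClient()
	if err != nil {
		return nil, err
	}
	a, err := f.newAdmin(c)
	if err != nil {
		_ = c.Close() // release the client the caller will never see
		return nil, err
	}
	return a, nil
}

func main() {
	tracked := &fakeClient{}
	failing := &factory{
		newClient: func() (*fakeClient, error) { return tracked, nil },
		newAdmin:  func(*fakeClient) (*fakeAdmin, error) { return nil, errAdminInit },
	}
	healthy := &factory{
		newClient: func() (*fakeClient, error) { return &fakeClient{}, nil },
		newAdmin:  func(c *fakeClient) (*fakeAdmin, error) { return &fakeAdmin{client: c}, nil },
	}

	// Both factories run concurrently; neither touches the other's constructors.
	var wg sync.WaitGroup
	var failErr, okErr error
	wg.Add(2)
	go func() { defer wg.Done(); _, failErr = failing.AdminClient() }()
	go func() { defer wg.Done(); _, okErr = healthy.AdminClient() }()
	wg.Wait()

	fmt.Println("failing:", failErr, "healthy:", okErr, "tracked closes:", tracked.closeCalls)
}
```

A test that needs to inject a failure constructs its own `factory` value, so it can run alongside tests marked `t.Parallel()` without a data race on shared constructor variables.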

@ti-chi-bot
Contributor

ti-chi-bot bot commented Apr 8, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: wlwilliamx

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added the approved label Apr 8, 2026
@wlwilliamx
Contributor

/retest

1 similar comment
@wlwilliamx
Contributor

/retest

@ti-chi-bot
Contributor

ti-chi-bot bot commented Apr 8, 2026

@ti-chi-bot: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-dm-integration-test | 69dd742 | link | true | /test pull-dm-integration-test |
| pull-cdc-integration-pulsar-test | 69dd742 | link | true | /test pull-cdc-integration-pulsar-test |
| pull-cdc-integration-storage-test | 69dd742 | link | true | /test pull-cdc-integration-storage-test |
| pull-cdc-integration-mysql-test | 69dd742 | link | true | /test pull-cdc-integration-mysql-test |
| pull-cdc-integration-kafka-test | 69dd742 | link | true | /test pull-cdc-integration-kafka-test |
| pull-verify | 69dd742 | link | true | /test pull-verify |
| pull-dm-compatibility-test | 69dd742 | link | true | /test pull-dm-compatibility-test |

Full PR test history. Your PR dashboard.



Labels

  • approved
  • do-not-merge/cherry-pick-not-approved
  • do-not-merge/hold: Indicates that a PR should not merge because someone has issued a /hold command.
  • lgtm
  • release-note: Denotes a PR that will be considered when it comes time to generate release notes.
  • size/XL: Denotes a PR that changes 500-999 lines, ignoring generated files.
  • type/cherry-pick-for-release-7.5: This PR is cherry-picked to release-7.5 from a source PR.
