pulsar: support debezium protocol #5054

Open
HGHNice wants to merge 3 commits into pingcap:master from HGHNice:pulsar/support-debezium-protocol

HGHNice commented May 15, 2026

What problem does this PR solve?

Issue Number: close #5056

The Pulsar sink currently only supports canal-json. Users who consume
TiCDC events via Pulsar with Debezium-compatible consumers (e.g. Flink CDC)
have no way to use the standard Debezium message format.

What is changed and how it works?

  • Extend IsPulsarSupportedProtocols() in pkg/config/sink_protocol.go
    to include ProtocolDebezium.
  • Update the error message in downstreamadapter/sink/pulsar/helper.go
    to reflect the expanded protocol list.
  • The Debezium codec is already fully implemented and shared with the Kafka
    sink via the common codec builder, so no additional encoding logic is needed.
  • Add unit test TestIsPulsarSupportedProtocols.
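
A minimal sketch of the validation change described above, for illustration only. The `Protocol` type and its constants are simplified stand-ins defined locally (assumptions, not the real definitions in pkg/config), and the table-driven loop mirrors the style of test the PR describes rather than reproducing TestIsPulsarSupportedProtocols itself.

```go
package main

import "fmt"

// Protocol is a local stand-in for the sink protocol type in pkg/config.
// The real type and constant values may differ (assumption).
type Protocol int

const (
	ProtocolUnknown Protocol = iota
	ProtocolCanalJSON
	ProtocolDebezium
	ProtocolAvro
)

// IsPulsarSupportedProtocols reports whether the Pulsar sink accepts p.
// Per the PR description, the accepted set becomes canal-json and debezium.
func IsPulsarSupportedProtocols(p Protocol) bool {
	return p == ProtocolCanalJSON || p == ProtocolDebezium
}

func main() {
	// Table-driven check: each row pairs a protocol with the expected verdict.
	cases := []struct {
		name string
		p    Protocol
		want bool
	}{
		{"canal-json", ProtocolCanalJSON, true},
		{"debezium", ProtocolDebezium, true},
		{"avro", ProtocolAvro, false},
	}
	for _, c := range cases {
		got := IsPulsarSupportedProtocols(c.p)
		fmt.Printf("%s: supported=%v (want %v)\n", c.name, got, c.want)
	}
}
```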

Check List

Tests

  • Unit test

Questions

Will it cause performance regression or break compatibility?

No. Existing canal-json behavior is unchanged.

Do you need to update user documentation, design documentation or monitoring documentation?

The Pulsar sink docs should note that debezium is now a valid protocol value.

Release note

The Pulsar sink now supports the `debezium` protocol. Set `protocol=debezium`
in the Pulsar sink URI to produce Debezium-format change events.
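
As a rough illustration of the release note, the sketch below sets the `protocol` query parameter on a sink URI using Go's standard `net/url` package. The host, port, and topic in the example URI are placeholders, and `withProtocol` is a hypothetical helper, not part of TiCDC.

```go
package main

import (
	"fmt"
	"net/url"
)

// withProtocol returns sinkURI with its "protocol" query parameter set to
// protocol. Hypothetical helper for illustration; not a TiCDC API.
func withProtocol(sinkURI, protocol string) (string, error) {
	u, err := url.Parse(sinkURI)
	if err != nil {
		return "", err
	}
	q := u.Query()
	q.Set("protocol", protocol)
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	// Placeholder broker address and topic name (assumptions).
	uri, err := withProtocol("pulsar://127.0.0.1:6650/ticdc-test", "debezium")
	if err != nil {
		fmt.Println("invalid sink URI:", err)
		return
	}
	fmt.Println(uri)
}
```

The resulting string is what would be passed as the `--sink-uri` value when creating the changefeed.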


Summary by CodeRabbit

  • New Features
    • Pulsar sink now supports Debezium protocol format in addition to Canal-JSON.
  • Tests
    • Added unit tests for protocol parsing and Pulsar-supported protocol validation.
    • Updated Debezium integration test flow to run against both Kafka and Pulsar sink types.

Extend IsPulsarSupportedProtocols to include ProtocolDebezium so that
Pulsar changefeeds can use the debezium message format. The debezium
codec is already implemented and shared with the Kafka sink via the
common codec builder, so no additional encoding logic is needed.
@ti-chi-bot ti-chi-bot Bot added the do-not-merge/needs-linked-issue, do-not-merge/release-note-label-needed, contribution, and first-time-contributor labels May 15, 2026

ti-chi-bot Bot commented May 15, 2026

Hi @HGHNice. Thanks for your PR.

I'm waiting for a pingcap member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ti-chi-bot ti-chi-bot Bot added the needs-ok-to-test label May 15, 2026

ti-chi-bot Bot commented May 15, 2026

Welcome @HGHNice!

It looks like this is your first PR to pingcap/ticdc 🎉.

I'm the bot that helps you request reviewers, add labels, and more. See available commands.

We want to make sure your contribution gets all the attention it needs!



Thank you, and welcome to pingcap/ticdc. 😃

@ti-chi-bot ti-chi-bot Bot added the size/S label May 15, 2026
coderabbitai Bot (Contributor) commented May 15, 2026

Walkthrough

Pulsar sink now accepts debezium in addition to canal-json. Validation and tests updated, Pulsar helper error text and router initialization adjusted, and the debezium_basic integration script branches to run against kafka or pulsar sinks.

Changes

Pulsar Debezium Protocol Support

| Layer / File(s) | Summary |
| --- | --- |
| Protocol validation and unit tests: pkg/config/sink_protocol.go, pkg/config/sink_protocol_test.go | IsPulsarSupportedProtocols now returns true for ProtocolCanalJSON and ProtocolDebezium. Tests added/expanded for parsing debezium and simple, and a table-driven test verifies supported protocols. |
| Pulsar helper updates: downstreamadapter/sink/pulsar/helper.go | The unsupported-protocol error message now lists [canal-json, debezium]. Event router construction passes protocol == config.ProtocolAvro instead of a hard-coded flag and removes the obsolete inline comment. |
| Integration test script branching: tests/integration_tests/debezium_basic/run.sh | run.sh gates on SINK_TYPE (kafka or pulsar), conditionally starts Pulsar, builds the appropriate SINK_URI, creates the changefeed, and runs the matching consumer. |
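
The helper change summarized in the table can be sketched roughly as follows. This is an illustration under stated assumptions: `checkPulsarProtocol`, the local `Protocol` type, and the exact error wording are hypothetical, and the real code in downstreamadapter/sink/pulsar/helper.go passes the flag to its event router constructor rather than returning it.

```go
package main

import "fmt"

// Protocol is a local stand-in for the type in pkg/config (assumption).
type Protocol string

const (
	ProtocolCanalJSON Protocol = "canal-json"
	ProtocolDebezium  Protocol = "debezium"
	ProtocolAvro      Protocol = "avro"
)

// checkPulsarProtocol is a hypothetical helper. It rejects protocols the
// Pulsar sink does not accept and derives the isAvro flag from the protocol
// instead of hard-coding it.
func checkPulsarProtocol(p Protocol) (isAvro bool, err error) {
	if p != ProtocolCanalJSON && p != ProtocolDebezium {
		// The error wording is illustrative; only the listed set comes from the PR.
		return false, fmt.Errorf("unsupported protocol %q, pulsar sink supports [canal-json, debezium]", p)
	}
	return p == ProtocolAvro, nil
}

func main() {
	for _, p := range []Protocol{ProtocolCanalJSON, ProtocolDebezium, ProtocolAvro} {
		isAvro, err := checkPulsarProtocol(p)
		fmt.Printf("protocol=%s isAvro=%v err=%v\n", p, isAvro, err)
	}
}
```

With the currently accepted set the derived flag is always false; computing it from the protocol just avoids a hard-coded value going stale if the set changes.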

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

"I’m a rabbit in the code glade, nibbling bugs away,
Canal and Debezium now hop in the same play.
Tests updated, scripts adapt, messages sing true—
A tiny hop forward, from me to you. 🐇"

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 28.57% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The PR title 'pulsar: support debezium protocol' is clear and specific, directly describing the main feature addition in the changeset. |
| Description check | ✅ Passed | The PR description follows the template with all required sections: issue number (close #5056), clear problem statement, technical changes explained, tests added (unit test checked), compatibility assessment, and release note provided. |
| Linked Issues check | ✅ Passed | All coding requirements from issue #5056 are met: IsPulsarSupportedProtocols() extended to include ProtocolDebezium, error messages updated, and unit tests added. The Debezium codec reuse satisfies the no-new-codec requirement. |
| Out of Scope Changes check | ✅ Passed | All changes are scoped to supporting Debezium protocol in Pulsar: protocol config validation, helper function updates, unit tests, and integration test migration. No unrelated modifications detected. |

pingcap-cla-assistant Bot commented May 15, 2026

CLA assistant check
All committers have signed the CLA.

gemini-code-assist Bot left a comment

Code Review

This pull request adds support for the Debezium protocol to the Pulsar sink by updating the protocol validation logic and error messages. Feedback includes a recommendation to replace a hardcoded boolean with a dynamic protocol check in the event router, a request for additional test coverage in existing protocol parsing and string conversion tests, and a minor grammatical correction in a user-facing error message.

Comment thread downstreamadapter/sink/pulsar/helper.go
Comment thread pkg/config/sink_protocol_test.go
Comment thread downstreamadapter/sink/pulsar/helper.go
- Make isAvro parameter dynamic in NewEventRouter call
- Add debezium and simple cases to protocol parsing and string tests
@ti-chi-bot ti-chi-bot Bot added the size/M and release-note labels and removed the do-not-merge/needs-triage-completed, size/S, and do-not-merge/release-note-label-needed labels May 21, 2026
wk989898 (Collaborator) commented

/ok-to-test

@ti-chi-bot ti-chi-bot Bot added the ok-to-test label and removed the needs-ok-to-test label May 29, 2026
wk989898 (Collaborator) commented

Please add the Debezium integration tests for Pulsar.

HGHNice (Author) commented May 29, 2026

> Please add the Debezium integration tests for Pulsar.

Pulsar Debezium integration tests have been added. Thanks!

@ti-chi-bot ti-chi-bot Bot added the needs-1-more-lgtm label Jun 3, 2026
wk989898 (Collaborator) commented Jun 3, 2026

/test all

@ti-chi-bot ti-chi-bot Bot added the lgtm label Jun 3, 2026
@ti-chi-bot ti-chi-bot Bot removed the needs-1-more-lgtm label Jun 3, 2026

ti-chi-bot Bot commented Jun 3, 2026

[LGTM Timeline notifier]

Timeline:

  • 2026-06-03 07:46:26.536077424 +0000 UTC m=+341287.606394814: ☑️ agreed by wk989898.
  • 2026-06-03 07:51:27.648907855 +0000 UTC m=+341588.719225245: ☑️ agreed by 3AceShowHand.

ti-chi-bot Bot commented Jun 3, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: 3AceShowHand, flowbehappy, wk989898

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot Bot added the approved label Jun 3, 2026
@HGHNice HGHNice force-pushed the pulsar/support-debezium-protocol branch from 4495deb to aa6dee8 June 8, 2026 05:51
coderabbitai Bot (Contributor) left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
tests/integration_tests/debezium_basic/run.sh (1)

13-15: Prevent silent no-op on unexpected SINK_TYPE in debezium_basic CI runs

tests/integration_tests/run.sh invokes each integration test as bash "$script" "$sink_type", and run_light_it_in_ci.sh / run_heavy_it_in_ci.sh exit on any sink_type other than mysql|kafka|pulsar|storage. In run_heavy_it_in_ci.sh, debezium_basic is scheduled only in the kafka_groups and pulsar_groups, so tests/integration_tests/debezium_basic/run.sh is only called with kafka or pulsar in CI—meaning the return branch for other values won’t be hit. Add an explicit error/skip message (instead of a bare return) if you want protection for manual/developer invocations with an unexpected argument.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/integration_tests/debezium_basic/run.sh` around lines 13 - 15, The
script currently does a silent `return` when SINK_TYPE is not "kafka" or
"pulsar", which can hide unexpected invocations; replace the bare `return` in
the conditional that checks the SINK_TYPE variable with an explicit message and
exit to make behavior clear (e.g., echo "Skipping debezium_basic: unsupported
SINK_TYPE='$SINK_TYPE'" and exit 0 for a deliberate skip, or exit 1 if you
prefer treating it as an error). Update the branch guarding SINK_TYPE so it
prints that message (including the value of SINK_TYPE) and exits accordingly
instead of silently returning.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b729cbf1-1c49-4120-9a63-348d7c9f8095

📥 Commits

Reviewing files that changed from the base of the PR and between 4495deb and aa6dee8.

📒 Files selected for processing (1)
  • tests/integration_tests/debezium_basic/run.sh

fi

if [ "$SINK_TYPE" == "pulsar" ]; then
run_pulsar_cluster $WORK_DIR normal

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Quote command arguments expanded from variables.

Line [32], Line [36], Line [40], and Line [44] use unquoted variable expansions in command arguments. This can break on whitespace/glob characters and is easy to harden.

Suggested patch
-		run_pulsar_cluster $WORK_DIR normal
+		run_pulsar_cluster "$WORK_DIR" normal
@@
-	cdc_cli_changefeed create --sink-uri="$SINK_URI" --config=$CUR/conf/changefeed.toml
+	cdc_cli_changefeed create --sink-uri="$SINK_URI" --config="$CUR/conf/changefeed.toml"
@@
-		run_kafka_consumer $WORK_DIR $SINK_URI $CUR/conf/changefeed.toml
+		run_kafka_consumer "$WORK_DIR" "$SINK_URI" "$CUR/conf/changefeed.toml"
@@
-		run_pulsar_consumer --upstream-uri $SINK_URI --config $CUR/conf/changefeed.toml
+		run_pulsar_consumer --upstream-uri "$SINK_URI" --config "$CUR/conf/changefeed.toml"

Also applies to: 36-36, 40-40, 44-44

🧰 Tools
🪛 Shellcheck (0.11.0)

[info] 32-32: Double quote to prevent globbing and word splitting.

(SC2086)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/integration_tests/debezium_basic/run.sh` at line 32, Calls to
run_pulsar_cluster (and similar commands) use unquoted variable expansions like
run_pulsar_cluster $WORK_DIR normal which can break on whitespace or globbing;
update each call (referenced occurrences of run_pulsar_cluster and any other
command invocations using WORK_DIR or similar variables at the noted locations)
to quote variables (e.g. run_pulsar_cluster "$WORK_DIR" "normal" or quote each
expanded variable) so arguments are passed safely and globbing/word-splitting is
avoided.

Source: Linters/SAST tools


ti-chi-bot Bot commented Jun 8, 2026

@HGHNice: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-cdc-kafka-integration-light | aa6dee8 | link | unknown | /test pull-cdc-kafka-integration-light |
| pull-cdc-pulsar-integration-heavy | aa6dee8 | link | unknown | /test pull-cdc-pulsar-integration-heavy |
| pull-cdc-kafka-integration-heavy | aa6dee8 | link | unknown | /test pull-cdc-kafka-integration-heavy |

Labels

approved, contribution, first-time-contributor, lgtm, ok-to-test, release-note, size/M

Projects

None yet

Development

Successfully merging this pull request may close these issues.

pulsar: support debezium protocol

4 participants