feat(cdc): share a changelog stream for multiple cdc tables #12535

Merged (69 commits) on Oct 31, 2023

@StrikeW StrikeW commented Sep 26, 2023

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

Part of #11079, related: #11545

Interface

  1. Create a CDC source job
create source mysql_mydb with (
 connector = 'mysql-cdc',
 hostname = '127.0.0.1',
 port = '8306',
 username = 'root',
 password = '123456',
 database.name = 'mydb',
 server.id = 5888
);
  • The source job is singleton
  • The format is fixed to FORMAT PLAIN ENCODE JSON, so the clause can be omitted.
  2. Create CDC tables based on the above source
CREATE TABLE t1_rw (
    v1 int,
    v2 int,
    PRIMARY KEY(v1)
) FROM mysql_mydb TABLE 'mydb.t1';
CREATE TABLE t3_rw (
  v1 INTEGER,
  v2 timestamptz,
  PRIMARY KEY (v1)
) FROM mysql_mydb TABLE 'mydb.t3';
  • The parallelism of these table jobs is set to 1.
  • The table name after the second TABLE keyword must be qualified with the upstream MySQL database name (for example, 'mydb.t1').
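
To illustrate the qualification rule above, here is a minimal sketch of how a name such as 'mydb.t1' could be split into its database and table parts. The helper `split_qualified_name` is hypothetical and is not taken from this PR; the actual frontend validation may differ.

```rust
/// Splits a qualified upstream table name such as "mydb.t1" into
/// (database, table). Returns `None` when the name is not qualified
/// or when either part is empty.
///
/// Hypothetical helper for illustration only; not part of the PR.
pub fn split_qualified_name(name: &str) -> Option<(&str, &str)> {
    // Split at the first '.'; an unqualified name has no '.' at all.
    let (database, table) = name.split_once('.')?;
    if database.is_empty() || table.is_empty() {
        return None;
    }
    Some((database, table))
}
```

Under this sketch, 'mydb.t1' yields the pair ("mydb", "t1"), while a bare 't1' is rejected.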

Implementation

  • Add a new Source type of streaming job to support the CDC source job.
  • The output schema of the CDC source job is fixed to (payload jsonb, _rw_offset varchar, _rw_table_name varchar, _row_id).
  • The dispatcher of the CDC source job dispatches CDC event chunks to downstream table jobs based on the _rw_table_name column.
  • The CdcBackfillExecutor transforms the upstream chunk into the specific schema of a table job (e.g. (v1 int, v2 int)) before doing backfill.
  • Currently the backfill state is not persisted in this case, so backfill is triggered again upon recovery.
  • Here is a figure to demonstrate the workflow:
    [Figure: cdc-2023-09-18-1535]
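
The routing step described in the list above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the dispatcher implemented in this PR: `CdcEvent` and `dispatch_by_table` are hypothetical names, the payload is kept as a plain string instead of jsonb, `_row_id` is omitted, and rows are routed one by one rather than as stream chunks.

```rust
use std::collections::HashMap;

/// One row of the CDC source output, simplified for illustration.
/// Hypothetical type; `_row_id` is omitted.
#[derive(Debug, Clone, PartialEq)]
pub struct CdcEvent {
    pub payload: String,       // stands in for the `payload jsonb` column
    pub rw_offset: String,     // the `_rw_offset` column
    pub rw_table_name: String, // the `_rw_table_name` column
}

/// Routes events to downstream table jobs keyed by `_rw_table_name`.
/// Every registered table gets a (possibly empty) bucket, and the order
/// of events within each table is preserved.
pub fn dispatch_by_table(
    events: Vec<CdcEvent>,
    downstream_tables: &[&str],
) -> HashMap<String, Vec<CdcEvent>> {
    // Pre-create one bucket per registered downstream table job.
    let mut routed: HashMap<String, Vec<CdcEvent>> = downstream_tables
        .iter()
        .map(|table| (table.to_string(), Vec::new()))
        .collect();

    for event in events {
        // In this sketch, events for tables without a downstream job
        // are simply dropped.
        if let Some(bucket) = routed.get_mut(&event.rw_table_name) {
            bucket.push(event);
        }
    }
    routed
}
```

With downstream jobs registered for 'mydb.t1' and 'mydb.t3', events tagged with either name land in the matching bucket in arrival order, and events from any other upstream table are ignored.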

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.

  • Allows users to create multiple CDC tables that share a single source subscribed to an upstream database.
  • The new syntax is described in the PR description.
  • Users need to enable this feature by running set cdc_backfill='true'.

but lacks of internal tables in `show internal tables`
@StrikeW StrikeW force-pushed the siyuan/cdc-backfill-framework-multi-table branch from 21e059a to 15b6f67 Compare October 31, 2023 07:34

@StrikeW StrikeW left a comment

It should be fine now. 😋

version,
}))
}
}
}

async fn drop_source(

I have modified drop_relation() to support dropping the source streaming job if necessary.
https://github.com/risingwavelabs/risingwave/pull/12535/files#diff-1da683135f143ab95455c5fee12eca2dccaafe7d90b9e2cbc8c0ba330204659fR1263-R1265

Comment on lines 40 to 42
if node.table_desc.is_none() {
    return Ok(Box::new(DummyExecutor::new()));
}

Still in use:

assert_matches!(batch_plan_node.node_body, Some(NodeBody::BatchPlan(_)));

.map(|c| {
all_column_ids
let (up_fragment_id, edge) = match table_job_type.as_ref() {
Some(TableJobType::SharedCdcSource) => {

I will refine the dispatcher part later; I have filed an issue.

@fuyufjh fuyufjh left a comment

Overall LGTM. Please kindly check these suggestions before merging:

#12535 (comment)
#12535 (comment)

@BugenZhao BugenZhao left a comment

Generally LGTM. I have not dived into the executor implementation and mainly focused on the other parts.

@StrikeW StrikeW force-pushed the siyuan/cdc-backfill-framework-multi-table branch from 1ddfb3b to 45eaf03 Compare October 31, 2023 09:39
@StrikeW StrikeW added this pull request to the merge queue Oct 31, 2023
Merged via the queue into main with commit c97e08c Oct 31, 2023
26 of 29 checks passed
@StrikeW StrikeW deleted the siyuan/cdc-backfill-framework-multi-table branch October 31, 2023 11:19
@neverchanje neverchanje added the user-facing-changes Contains changes that are visible to users label Nov 7, 2023
Labels
type/feature user-facing-changes Contains changes that are visible to users

7 participants