
dm/loader/syncer: support extend column #3262

Merged: 19 commits, Nov 23, 2021
Conversation

@yufan022 yufan022 commented Nov 4, 2021

What problem does this PR solve?
Migrated from pingcap/dm#2178
Origin issue: https://github.com/pingcap/dm/issues/2111
Related PR: pingcap/tidb-tools#520

What is changed and how it works?
Support extracting the table/schema/source name into a specified column.
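
As context, a routing rule using this feature might look like the following sketch. The extract-table/extract-schema/extract-source rule names come from the related tidb-tools PR; the exact field spelling here is an assumption, not a copy of this PR's diff:

```yaml
routes:
  route-rule-1:
    schema-pattern: "test_*"
    table-pattern: "t_*"
    target-schema: "test"
    target-table: "t"
    extract-table:            # write the matched part of the table name
      table-regexp: "t_(.*)"
      target-column: "c_table"
    extract-schema:           # write the matched part of the schema name
      schema-regexp: "test_(.*)"
      target-column: "c_schema"
    extract-source:           # write the source instance name
      source-regexp: "(.*)"
      target-column: "c_source"
```

The target columns (c_table, c_schema, c_source) must already exist in the pre-created downstream table.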

Load:
Reuse the reassemble function to rewrite the SQL.
https://github.com/yufan022/ticdc/blob/da32d489044dcbd1400a35a8c4e162802872b43a/dm/loader/convert_data.go#L185

Sync:
Append the extracted values to the rows before handling the binlog event.
https://github.com/yufan022/ticdc/blob/da32d489044dcbd1400a35a8c4e162802872b43a/dm/syncer/syncer.go#L2032

Check List

Tests

  • Unit test
  • Integration test

Code changes

  • Has exported function/method change
  • Has exported variable/fields change
  • Has interface methods change
  • Has persistent data change

Side effects

  • Possible performance regression

Related changes

  • Need to cherry-pick to the release branch
  • Need to update the documentation

Release note

Support extracting source/schema/table name to specific column

ti-chi-bot commented Nov 4, 2021

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • Ehco1996
  • GMHDBJD

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. contribution Indicates that the PR was contributed by an external member. first-time-contributor Indicates that the PR was contributed by an external member and is a first-time contributor. labels Nov 4, 2021
CLAassistant commented Nov 4, 2021

CLA assistant check
All committers have signed the CLA.

@ti-chi-bot

Welcome @yufan022!

It looks like this is your first PR to pingcap/ticdc 🎉.

I'm the bot that helps you request reviewers, add labels, and more. See the available commands.

We want to make sure your contribution gets all the attention it needs!



Thank you, and welcome to pingcap/ticdc. 😃

@ti-chi-bot ti-chi-bot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Nov 4, 2021
yufan022 commented Nov 4, 2021

For the integration tests, should I create a new folder in dm/tests or add the test to an existing folder?

Ehco1996 commented Nov 4, 2021

For the integration tests, should I create a new folder in dm/tests or add the test to an existing folder?

I think this feature would be helpful in the downstream_more_column case, so could you please add a new sub test case there?

@Ehco1996 Ehco1996 self-requested a review November 4, 2021 05:18
@Ehco1996 Ehco1996 added the area/dm Issues or PRs related to DM. label Nov 4, 2021
yufan022 commented Nov 4, 2021

I added it to a new folder so that the logic is clearer.

@Ehco1996 Ehco1996 left a comment


Thanks for your work!

Some tips:

  • You can run make check, make fmt, and make tidy to format your code so that the lint CI passes.
  • Use go get -u github.com/pingcap/tidb-tools@master to update the tidb-tools version; this makes r.FetchExtendColumn actually work.
  • The newly added extend_column integration test can be added to dm/tests/others_integration_2.txt so that it runs in CI.

Review comments were left on:
  • dm/tests/extend_column/conf/dm-task.yaml (outdated)
  • dm/tests/extend_column/data/db1.prepare.sql (outdated)
  • dm/tests/extend_column/run.sh (outdated)
  • dm/syncer/syncer.go (outdated)
  • dm/loader/loader.go
  • dm/loader/convert_data.go

yufan022 commented Nov 8, 2021


[2021-11-08T04:07:04.141Z] go: downloading github.com/pingcap/tidb-tools v5.2.3-0.20211105044302-2dabb6641a6e+incompatible

[2021-11-08T04:07:19.035Z] # github.com/pingcap/tidb-tools/pkg/utils

[2021-11-08T04:07:19.035Z] open /nfs/cache/mod/github.com/pingcap/tidb-tools@v5.2.3-0.20211105044302-2dabb6641a6e+incompatible/pkg/utils/cpu_posix.go: no such file or directory

[2021-11-08T04:07:19.035Z] # github.com/pingcap/tidb-tools/pkg/etcd

[2021-11-08T04:07:19.035Z] open /nfs/cache/mod/github.com/pingcap/tidb-tools@v5.2.3-0.20211105044302-2dabb6641a6e+incompatible/pkg/etcd/etcd.go: no such file or directory

[2021-11-08T04:07:23.240Z] /usr/local/go/pkg/tool/linux_amd64/vet: chdir /nfs/cache/mod/github.com/pingcap/tidb-tools@v5.2.3-0.20211105044302-2dabb6641a6e+incompatible/pkg/table-filter: no such file or directory

[2021-11-08T04:07:23.240Z] /usr/local/go/pkg/tool/linux_amd64/vet: chdir /nfs/cache/mod/github.com/pingcap/tidb-tools@v5.2.3-0.20211105044302-2dabb6641a6e+incompatible/pkg/table-rule-selector: no such file or directory

[2021-11-08T04:07:23.240Z] /usr/local/go/pkg/tool/linux_amd64/vet: chdir /nfs/cache/mod/github.com/pingcap/tidb-tools@v5.2.3-0.20211105044302-2dabb6641a6e+incompatible/pkg/filter: no such file or directory

Ehco1996 commented Nov 8, 2021

/run-dm-integration-tests

Ehco1996 commented Nov 8, 2021


(quoting the build-failure log above)

@yufan022 this may be caused by the NFS cache in CI; a retry could help. The build succeeded here.

Ehco1996 commented Nov 8, 2021

/run-dm-integration-tests

@ti-chi-bot ti-chi-bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Nov 8, 2021
@Ehco1996 Ehco1996 left a comment


LGTM once the remaining comments are resolved.

Review comments were left on dm/loader/loader.go and dm/loader/convert_data_test.go (outdated).
@lance6716 lance6716 left a comment


The discussion at https://github.com/pingcap/ticdc/pull/3262/files#r744353579 is a bit long, so I'll list the unsettled problems here. Please check whether I missed something:

  1. what is the meaning of "originalData"?
  2. how can we eliminate the "column vs value mismatch" error?
  3. what is the proper way to append extCols to the WHERE clause?

My opinion is:

1 & 2: the problem, to me, is that there is currently one version of the upstream table structure / upstream binlog values in DM, but we now have two versions: with and without the extended columns. Should we let DM maintain both versions, or can we discard one of them? I'd prefer to discuss this in a separate issue and implement the polished version in later PRs. Would you like to lead this feature with our help? @yufan022
3: currently the WHERE clause tries to read the downstream PK/UK to distinguish a row. If there is a PK/UK, since it's unique we don't need to attach extCols. If there is no valid PK/UK (i.e., no PK/UK whose columns can all be found in the binlog), we should attach extCols and do full-column matching. So it's related to the questions above 😂

Anyway, I think we can keep this PR lightweight and merge it as long as it's self-consistent. If this feature has no effect when it's not enabled, it does no harm to other users.

if len(extendCol) > 0 {
	columns = append(columns, extendCol...)
}
if hasGeneragedCols || len(extendCol) > 0 {

(To make sure I didn't miss something) why do we switch to "INSERT with column names" for extended columns?

For generated columns, e.g. tbl (c, c2) where c2 is generated, we can't use INSERT INTO t VALUES (1), so we must write the column names.

The answer may be that the extended columns are not in the same order as we concatenate the INSERT DMLs, so we must specify the column names?

yufan022 (Contributor, author) replied:

Oh, it's not necessary.

The extend column was appended here in earlier commits; it has since been moved to the top, so I'll remove this condition.

Currently the extended columns must come at the end of all columns, so there is a requirement on the order.

A review comment on dm/syncer/syncer.go (outdated) was also resolved.
@Ehco1996

ping @yufan022

@yufan022

1 & 2: the problem, to me, is that there is currently one version of the upstream table structure / upstream binlog values in DM, but we now have two versions: with and without the extended columns. Should we let DM maintain both versions, or can we discard one of them? I'd prefer to discuss this in a separate issue and implement the polished version in later PRs. Would you like to lead this feature with our help? @yufan022

I would be happy to work on improving this feature.

3: currently the WHERE clause tries to read the downstream PK/UK to distinguish a row. If there is a PK/UK, since it's unique we don't need to attach extCols. If there is no valid PK/UK (i.e., no PK/UK whose columns can all be found in the binlog), we should attach extCols and do full-column matching. So it's related to the questions above 😂

Yep.

The current table structure comes from the pre-created downstream table, so it always includes the extended columns; my idea is that we just append the values so that they match table.column.

@yufan022

@Ehco1996 Hi, do we have to make sure that the value here is the originData?

@Ehco1996

@Ehco1996 Hi, do we have to make sure that the value is the originData?

You mean originOldValues or originValues? I don't think we need to ensure this; those fields are used to distinguish a row, and if the extend cols can do that job, we don't need to keep originData identical to the data from the binlog.

@yufan022

yufan022 commented Nov 15, 2021

@Ehco1996 Hi, do we have to make sure that the value is the originData?

You mean originOldValues or originValues? I don't think we need to ensure this; those fields are used to distinguish a row, and if the extend cols can do that job, we don't need to keep originData identical to the data from the binlog.

I think if we can append extValues to values/originOldValues/originValues before newDML, then we don't need to change the logic inside genSQL/whereColumnsAndValues.

@Ehco1996

Ehco1996 commented Nov 15, 2021

@Ehco1996 Hi, do we have to make sure that the value is the originData?

You mean originOldValues or originValues? I don't think we need to ensure this; those fields are used to distinguish a row, and if the extend cols can do that job, we don't need to keep originData identical to the data from the binlog.

I think if we can append extValues to values/originOldValues/originValues before newDML, then we don't need to change the logic inside genSQL/whereColumnsAndValues.

LGTM. PTAL @lance6716 @GMHDBJD

@Ehco1996

/run-dm-integration-tests

@Ehco1996

Ehco1996 commented Nov 19, 2021

The extend_column integration test failed 😂


[2021-11-19T00:27:35.433Z] dmctl test cmd: "start-task /home/jenkins/agent/workspace/dm_ghpr_integration_test/go/src/github.com/pingcap/ticdc/dm/tests/extend_column/conf/dm-task.yaml --remove-meta"

[2021-11-19T00:27:37.342Z] run tidb sql failed 1-th time, retry later

[2021-11-19T00:27:39.880Z] run tidb sql failed 1-th time, retry later

[2021-11-19T00:27:41.788Z] run tidb sql failed 2-th time, retry later

[2021-11-19T00:27:43.698Z] run tidb sql failed 3-th time, retry later

[2021-11-19T00:27:45.605Z] run tidb sql failed 4-th time, retry later

[2021-11-19T00:27:47.510Z] run tidb sql failed 5-th time, retry later

[2021-11-19T00:27:49.416Z] run tidb sql failed 6-th time, retry later

[2021-11-19T00:27:51.953Z] run tidb sql failed 7-th time, retry later

[2021-11-19T00:27:53.860Z] run tidb sql failed 8-th time, retry later

[2021-11-19T00:27:55.766Z] run tidb sql failed 9-th time, retry later

[2021-11-19T00:27:57.736Z] run tidb sql failed 10-th time, retry later

[2021-11-19T00:27:59.669Z] TEST FAILED: OUTPUT DOES NOT CONTAIN 'count(1): 6'

[2021-11-19T00:27:59.669Z] ____________________________________

[2021-11-19T00:27:59.669Z] [Fri Nov 19 08:27:57 CST 2021] Executing SQL: select count(1) from extend_column.y;

[2021-11-19T00:27:59.669Z] *************************** 1. row ***************************

[2021-11-19T00:27:59.669Z] count(1): 4

[2021-11-19T00:27:59.669Z] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

ptal @yufan022

@yufan022

yufan022 commented Nov 19, 2021

The extend_column integration test failed 😂
...
ptal @yufan022

It looks like the s.getTableInfo behavior became inconsistent after I merged the master branch.

@yufan022

yufan022 commented Nov 19, 2021

It's caused by PR #3295: when a load+increment task starts for the first time, it reads the table structure from the dump file, but the dump file does not contain the extension columns.

Before that PR, the table structure was always loaded from downstream.

A potential solution is to parse the dump file and skip it if the table has extended columns enabled.
@Ehco1996 @lance6716 PTAL

@yufan022

/run-dm-integration-tests

@Ehco1996 Ehco1996 left a comment


The potential solution is to parse the dump file and skip it if the table has extended columns enabled.

This LGTM, and the integration-test CI has passed now, so I think this PR is ready for merge 🎉

@ti-chi-bot ti-chi-bot added the status/LGT1 Indicates that a PR has LGTM 1. label Nov 19, 2021
@GMHDBJD GMHDBJD left a comment


I think

  1. create the table in the downstream (may be automated in the future)
  2. fetch the table structure from the upstream
  3. fill in the columns and data for the extra columns

is better than

  1. create the table in the downstream
  2. fetch the downstream table when the task starts
  3. fill in the data for the extended rows


if len(table.extendCol) > 0 {
	for _, v := range table.extendVal {
		row = append(row, "'"+v+"'")
@GMHDBJD GMHDBJD Nov 22, 2021
Comment on lines +2052 to 2057:

rows := originRows
if extRows != nil {
	rows = extRows
}

prunedColumns, prunedRows, err := pruneGeneratedColumnDML(tableInfo, rows)

This may cause an ErrSyncerUnitDMLPruneColumnMismatch error.

yufan022 (Contributor, author) replied:

Hi, since ti.Columns contains the extended columns, extRows and ti.Columns have the same length.


OK, at the time I thought the table structure was initialized from the dump file. But I still prefer that the tableInfo be the same as the upstream. cc @lance6716

@yufan022 yufan022 Nov 22, 2021

If the table contains extended columns, reading the dump file is skipped: 27424e8

I left some comments there about why the downstream table structure is used. PTAL

@Ehco1996

/run-dm-integration-tests

@yufan022

yufan022 commented Nov 22, 2021

fetch table from upstream

There are two ways to get the table structure from the upstream in DM (if my understanding is correct):

  1. If the task mode is load+increment, the upstream table structure is loaded from the dump file at startup.
  2. If the task mode is increment only, we need to use dmctl operate-schema set to set the upstream table structure when the upstream and downstream table structures differ.

In the second case, if we had thousands of upstream tables, we would need to call dmctl operate-schema set for each table, which would be far too many operations. So I used the downstream table structure directly. In fact, I didn't change the logic of getTableInfo; that logic was already there.

Please point out if my understanding is incorrect.

@GMHDBJD GMHDBJD left a comment


LGTM

@ti-chi-bot ti-chi-bot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Nov 22, 2021
@Ehco1996

@lance6716 do you have more suggestions? If not, I will merge this PR.

@lance6716

@lance6716 do you have more suggestions? If not, I will merge this PR.

lgtm

@Ehco1996

/merge

@ti-chi-bot

This pull request has been accepted and is ready to merge.

Commit hash: 8036fc4

@ti-chi-bot ti-chi-bot added the status/can-merge Indicates a PR has been approved by a committer. label Nov 23, 2021
@Ehco1996
Copy link
Contributor

/run-dm-integration-tests

Labels
area/dm Issues or PRs related to DM. contribution Indicates that the PR was contributed by an external member. first-time-contributor Indicates that the PR was contributed by an external member and is a first-time contributor. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2.

6 participants