dm/loader/syncer: support extend column #3262
Conversation
[REVIEW NOTIFICATION] This pull request has been approved by:
To complete the pull request process, please ask the reviewers in the list to review. The full list of commands accepted by this bot can be found here. Reviewers can indicate their review by submitting an approval review.
Welcome @yufan022!
For integration tests, should I create a new folder in
I think this feature would be helpful in
I added it to a new folder so that the logic would be clearer.
thanks for your work
some tips:
- you can use
make check
make fmt
make tidy
to format your code, so that the lint CI will pass
- use
go get -u github.com/pingcap/tidb-tools@master
to update the tidb-tools version; this can make FetchExtendColumn really work
- the newly added extend_column integration test could be added to dm/tests/others_integration_2.txt, and then it will run in CI
/run-dm-integration-tests
@yufan022 this may be caused by the NFS cache in CI; some retries could help. The build succeeded here.
/run-dm-integration-tests
LGTM when rest comments are resolved
the discussion at https://github.com/pingcap/ticdc/pull/3262/files#r744353579 is a bit long, I'll list the unsettled problems here, please help check if I missed something
1. what's the meaning of "originalData"
2. how to eliminate the "column vs value mismatch" error
3. what's the proper way to append extCols to the WHERE clause
my opinion is
1 & 2: the problem to me is that there's currently one version of the upstream table structure / upstream binlog values in DM, but we now have two versions, with and without extended columns. Should we also let DM maintain two versions, or can we discard one of the two? I prefer to discuss it in a separate issue and implement the perfect version in later PRs. Would you like to lead this feature with our help? @yufan022
3: currently the WHERE clause tries to read the downstream PK/UK to distinguish a row. If there exists a valid PK/UK, since it's unique we don't need to attach extCols. If there's no valid PK/UK, which means not all PK/UK columns can be found in the binlog, we should attach extCols to do full column matching. Then it's related to the above question 😂
anyway, I think we can keep this PR lightweight and merge it as long as it's self-consistent. If this feature has no effect when it's not enabled, it does no harm to other users.
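The PK/UK-vs-full-column matching logic discussed in point 3 can be sketched roughly as follows. This is an illustration only, not DM's real API; `buildWhere` and its parameters are hypothetical names:

```go
package main

import (
	"fmt"
	"strings"
)

// buildWhere sketches the discussed logic: if the downstream table has a
// valid PK/UK (all of its columns present in the binlog row), match on it;
// otherwise fall back to matching every column, extended columns included.
func buildWhere(pkCols, allCols []string) string {
	cols := pkCols
	if len(cols) == 0 {
		// no valid PK/UK: full column matching
		cols = allCols
	}
	preds := make([]string, 0, len(cols))
	for _, c := range cols {
		preds = append(preds, fmt.Sprintf("`%s` = ?", c))
	}
	return "WHERE " + strings.Join(preds, " AND ")
}

func main() {
	fmt.Println(buildWhere([]string{"id"}, []string{"id", "name", "c_table"}))
	fmt.Println(buildWhere(nil, []string{"id", "name", "c_table"}))
}
```

With a unique key present, extCols never enter the predicate; without one, every column including the extended ones is matched.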
dm/loader/convert_data.go
Outdated
if len(extendCol) > 0 {
    columns = append(columns, extendCol...)
}
if hasGeneragedCols || len(extendCol) > 0 {
(To make sure I didn't miss something) why do we switch to "INSERT with column names" for extended columns?
For generated columns, like tbl (c, c2)
where c2 is generated, we can't use INSERT INTO t VALUES (1), so we must write column names.
The answer may be that the extended columns are not in the same order as we concatenate INSERT DMLs, so we must specify the column names?
Oh, it's not necessary.
The extended columns were appended here in earlier commits; that has been moved to the top, so I'll remove this condition.
Currently the extended columns must be at the end of all columns, and there are requirements on the order.
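The generated-column case above is why positional VALUES cannot be used. A minimal sketch, with hypothetical names (`genInsert` is not DM's real function):

```go
package main

import (
	"fmt"
	"strings"
)

// genInsert sketches why generated columns force "INSERT with column
// names": a generated column cannot be assigned a value, so the remaining
// columns must be named explicitly rather than relying on position.
func genInsert(table string, cols []string, generated map[string]bool) string {
	kept := make([]string, 0, len(cols))
	for _, c := range cols {
		if !generated[c] {
			kept = append(kept, c)
		}
	}
	holders := strings.TrimSuffix(strings.Repeat("?,", len(kept)), ",")
	return fmt.Sprintf("INSERT INTO `%s` (`%s`) VALUES (%s)",
		table, strings.Join(kept, "`,`"), holders)
}

func main() {
	// c2 is generated, so only c is named and given a placeholder
	fmt.Println(genInsert("tbl", []string{"c", "c2"}, map[string]bool{"c2": true}))
}
```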
ping @yufan022
I would be happy to work on improving this feature.
Yep. The current table structure comes from the pre-created downstream table, so it always comes with extended columns. My idea is that we just add
you mean
I think if we can append extValues to values/originOldValues/originValues before newDML, then we don't need to change the logic inside genSQL/whereColumnsAndValues.
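That suggestion can be sketched as below; `appendExtVals` is an illustrative helper, not DM's real code:

```go
package main

import "fmt"

// appendExtVals appends the extended-column values to a copy of each
// binlog row before the DML objects are built, so downstream code such
// as genSQL / whereColumnsAndValues needs no change.
func appendExtVals(rows [][]interface{}, extVals []interface{}) [][]interface{} {
	out := make([][]interface{}, 0, len(rows))
	for _, r := range rows {
		nr := make([]interface{}, 0, len(r)+len(extVals))
		nr = append(nr, r...)
		nr = append(nr, extVals...)
		out = append(out, nr)
	}
	return out
}

func main() {
	rows := [][]interface{}{{1, "a"}, {2, "b"}}
	ext := []interface{}{"tbl_1", "source-1"}
	fmt.Println(appendExtVals(rows, ext))
}
```

Copying each row rather than appending in place avoids aliasing the original binlog slices.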
LGTM, PTAL @lance6716 @GMHDBJD
/run-dm-integration-tests
the extend_column integration test failed 😂
PTAL @yufan022
It looks like s.getTableInfo behavior is inconsistent after I merged the master branch.
It's caused by PR #3295. Before that PR, the table structure was always loaded from downstream. The potential solution is to parse the dump file and skip it if the table has extended columns enabled.
/run-dm-integration-tests
The potential solution is to parse the dump file and skip it if the table has extended columns enabled.
this LGTM, and the integration-test CI has passed; now I think this PR is ready for merge 🎉
I think
1. create table in downstream (may be automated in the future)
2. fetch table from upstream
3. fill the columns and data for extra columns
is better than
1. create table in downstream
2. fetch downstream table when the task starts
3. fill the data for ext rows
if len(table.extendCol) > 0 {
    for _, v := range table.extendVal {
        row = append(row, "'"+v+"'")
Maybe escaping the values would be better: https://github.com/pingcap/tidb/blob/4820f27f9dff7fdb54de310705254c54c9044fb9/dumpling/export/sql_type.go#L69
It's OK not to change it.
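A much-simplified version of the escaping the linked dumpling code performs might look like this (illustration only; real code should also handle other control characters):

```go
package main

import (
	"fmt"
	"strings"
)

// escapeSQLValue escapes backslashes and single quotes in a string
// value, then wraps it in single quotes, so a value such as it's
// cannot break out of the SQL string literal.
func escapeSQLValue(v string) string {
	r := strings.NewReplacer(`\`, `\\`, `'`, `\'`)
	return "'" + r.Replace(v) + "'"
}

func main() {
	fmt.Println(escapeSQLValue(`it's`))
}
```

Compared with the `"'"+v+"'"` concatenation in the snippet above, this guards against quotes inside the extended-column value itself.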
rows := originRows
if extRows != nil {
    rows = extRows
}
prunedColumns, prunedRows, err := pruneGeneratedColumnDML(tableInfo, rows)
This may cause an ErrSyncerUnitDMLPruneColumnMismatch error
Hi, since ti.Columns contains the extended columns, extRows and ti.Columns have the same length.
OK, at that time I thought the table structure was initialized from the dump file. But I still prefer that the tableinfo be the same as upstream. cc @lance6716
/run-dm-integration-tests
There are two ways to get the table structure from upstream in DM (if my understanding is correct):
In the case of the second, if we had thousands of upstream tables, we would need to call
Please point out if my understanding is incorrect.
LGTM
@lance6716 do you have more suggestions? If not, I will merge this PR.
lgtm
/merge
This pull request has been accepted and is ready to merge. Commit hash: 8036fc4
/run-dm-integration-tests
What problem does this PR solve?
Migrated from pingcap/dm#2178
Original issue: https://github.com/pingcap/dm/issues/2111
Related PR: pingcap/tidb-tools#520
What is changed and how it works?
Support extracting the table/schema/source name into a specified column.
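For context, the task-configuration shape this feature targets looks roughly like the following. Field names follow pingcap/tidb-tools#520; treat them as illustrative rather than authoritative:

```yaml
routes:
  route-rule-1:
    schema-pattern: "schema_*"
    table-pattern: "table_*"
    target-schema: "schema"
    target-table: "table"
    # extract the matched part of the table/schema/source name
    # into an extra downstream column
    extract-table:
      table-regexp: "table_(.*)"
      target-column: "c_table"
    extract-schema:
      schema-regexp: "schema_(.*)"
      target-column: "c_schema"
    extract-source:
      source-regexp: "(.*)"
      target-column: "c_source"
```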
Load:
Reuse the reassemble function to rewrite the SQL.
https://github.com/yufan022/ticdc/blob/da32d489044dcbd1400a35a8c4e162802872b43a/dm/loader/convert_data.go#L185
Sync:
Add the extracted values to the rows before handling the binlog event.
https://github.com/yufan022/ticdc/blob/da32d489044dcbd1400a35a8c4e162802872b43a/dm/syncer/syncer.go#L2032
Check List
Tests
Code changes
Side effects
Related changes
Release note