lightning: improve post-import conflict detection 'error' semantic #51277

Merged
merged 38 commits into from
Mar 13, 2024

Conversation

lyzx2001
Contributor

What problem does this PR solve?

Issue Number: ref #51036

Problem Summary:
Merge preprocess duplicate detection and post-import conflict detection.

What changed and how does it work?

Improve post-import conflict detection 'error' semantic.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

@ti-chi-bot ti-chi-bot bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note-none size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Feb 23, 2024
tiprow bot commented Feb 23, 2024

Hi @lyzx2001. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

codecov bot commented Feb 23, 2024

Codecov Report

Merging #51277 (f8021a2) into master (cd60e7f) will increase coverage by 1.0876%.
Report is 6 commits behind head on master.
The diff coverage is 52.5714%.

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #51277        +/-   ##
================================================
+ Coverage   72.4691%   73.5567%   +1.0876%     
================================================
  Files          1474       1475         +1     
  Lines        363628     437945     +74317     
================================================
+ Hits         263518     322138     +58620     
- Misses        80708      95538     +14830     
- Partials      19402      20269       +867     
Flag         Coverage Δ
integration  50.4922% <39.4285%> (?)
unit         70.4977% <25.1428%> (-1.8548%) ⬇️

Flags with carried forward coverage won't be shown.

Components  Coverage Δ
dumpling    53.9957% <ø> (-2.3014%) ⬇️
parser      ∅ <ø> (∅)
br          56.5143% <58.5585%> (+10.1013%) ⬆️

@lyzx2001 lyzx2001 changed the title [WIP] lightning: improve post-import conflict detection 'error' semantic lightning: improve post-import conflict detection 'error' semantic Feb 27, 2024
@ti-chi-bot ti-chi-bot bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 27, 2024
@ti-chi-bot ti-chi-bot bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Feb 27, 2024
@lyzx2001
Contributor Author

/test pull-lightning-integration-test

tiprow bot commented Feb 27, 2024

@lyzx2001: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/test pull-lightning-integration-test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@ti-chi-bot ti-chi-bot bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Feb 27, 2024
@lyzx2001
Contributor Author

/test pull-lightning-integration-test

@lyzx2001
Contributor Author

/test pull-lightning-integration-test

@lyzx2001
Contributor Author

/test pull-lightning-integration-test

@ti-chi-bot ti-chi-bot bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Feb 27, 2024
@lyzx2001
Contributor Author

/test pull-lightning-integration-test

@lyzx2001
Contributor Author

/test pull-lightning-integration-test

Contributor

@D3Hunter D3Hunter left a comment

25/30


valueStr, err := tables.GenIndexValueFromIndex(indexKey, indexValue, tbl.Meta(), idxInfo)
require.NoError(t, err)
require.Equal(t, []string{"23"}, valueStr)
Contributor

What's the output for binary or bit columns?

Contributor Author

The body of tables.GenIndexValueFromIndex is taken from the original genKeyExistsErr function in ddl, which hex-formats binary and bit values:

tidb/pkg/ddl/index.go

Lines 1601 to 1629 in 5745d3d

func genKeyExistsErr(key, value []byte, idxInfo *model.IndexInfo, tblInfo *model.TableInfo) error {
	idxColLen := len(idxInfo.Columns)
	indexName := fmt.Sprintf("%s.%s", tblInfo.Name.String(), idxInfo.Name.String())
	colInfos := tables.BuildRowcodecColInfoForIndexColumns(idxInfo, tblInfo)
	values, err := tablecodec.DecodeIndexKV(key, value, idxColLen, tablecodec.HandleNotNeeded, colInfos)
	if err != nil {
		logutil.BgLogger().Warn("decode index key value failed", zap.String("index", indexName),
			zap.String("key", hex.EncodeToString(key)), zap.String("value", hex.EncodeToString(value)), zap.Error(err))
		return kv.ErrKeyExists.FastGenByArgs(key, indexName)
	}
	valueStr := make([]string, 0, idxColLen)
	for i, val := range values[:idxColLen] {
		d, err := tablecodec.DecodeColumnValue(val, colInfos[i].Ft, time.Local)
		if err != nil {
			logutil.BgLogger().Warn("decode column value failed", zap.String("index", indexName),
				zap.String("key", hex.EncodeToString(key)), zap.String("value", hex.EncodeToString(value)), zap.Error(err))
			return kv.ErrKeyExists.FastGenByArgs(key, indexName)
		}
		str, err := d.ToString()
		if err != nil {
			str = string(val)
		}
		if types.IsBinaryStr(colInfos[i].Ft) || types.IsTypeBit(colInfos[i].Ft) {
			str = util.FmtNonASCIIPrintableCharToHex(str)
		}
		valueStr = append(valueStr, str)
	}
	return kv.ErrKeyExists.FastGenByArgs(strings.Join(valueStr, "-"), indexName)
}
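
To make the binary/bit question concrete, here is a minimal sketch of the formatting step only: each decoded column value is rendered as a string, non-printable bytes are hex-escaped for binary or bit columns, and the parts are joined with "-". The names escapeNonPrintable, column, and formatIndexValues are hypothetical. In particular, escapeNonPrintable is a simplified stand-in and not TiDB's util.FmtNonASCIIPrintableCharToHex, whose exact output format may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// escapeNonPrintable is a hypothetical stand-in for a hex-escaping helper.
// It keeps printable ASCII bytes and writes every other byte as \xNN.
func escapeNonPrintable(s string) string {
	var b strings.Builder
	for i := 0; i < len(s); i++ {
		c := s[i]
		if c >= 0x20 && c <= 0x7e {
			b.WriteByte(c)
		} else {
			fmt.Fprintf(&b, "\\x%02x", c)
		}
	}
	return b.String()
}

// column is an assumed, simplified view of one decoded index column.
type column struct {
	value    string
	isBinary bool // true for binary-string or bit columns
}

// formatIndexValues renders each column and joins the parts with "-",
// mirroring the strings.Join(valueStr, "-") call in genKeyExistsErr.
func formatIndexValues(cols []column) string {
	parts := make([]string, 0, len(cols))
	for _, c := range cols {
		s := c.value
		if c.isBinary {
			s = escapeNonPrintable(s)
		}
		parts = append(parts, s)
	}
	return strings.Join(parts, "-")
}

func main() {
	cols := []column{{value: "23"}, {value: "a\x00\xff", isBinary: true}}
	fmt.Println(formatIndexValues(cols)) // prints: 23-a\x00\xff
}
```

Under these assumptions, an integer column prints as-is, while a binary column keeps its printable bytes and shows the rest in escaped form.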

require.EqualError(t, newErr, "[Lightning:Restore:ErrFoundIndexConflictRecords]found index conflict records in table a, index name is 'a.key_b', unique key is '[7]', primary key is '3'")
}

func TestConvertToErrFoundConflictRecordsMultipleColumnsIndex(t *testing.T) {
Contributor

The only difference between TestConvertToErrFoundConflictRecords and TestConvertToErrFoundConflictRecordsMultipleColumnsIndex seems to be the number of columns in the table and the index. Could we unify them?

Contributor Author

The three unit tests have been unified by extracting a buildTableForTestConvertToErrFoundConflictRecords helper function.

@lyzx2001
Contributor Author

/test pull-lightning-integration-test

Contributor

@D3Hunter D3Hunter left a comment

rest lgtm

br/pkg/lightning/backend/local/duplicate.go (resolved)
br/pkg/lightning/backend/local/duplicate.go (resolved)
br/pkg/lightning/backend/local/duplicate.go (outdated, resolved)
ErrInvalidMetaStatus = errors.Normalize("invalid meta status: '%s'", errors.RFCCodeText("Lightning:Restore:ErrInvalidMetaStatus"))
ErrTableIsChecksuming = errors.Normalize("table '%s' is checksuming", errors.RFCCodeText("Lightning:Restore:ErrTableIsChecksuming"))
ErrResolveDuplicateRows = errors.Normalize("resolve duplicate rows error on table '%s'", errors.RFCCodeText("Lightning:Restore:ErrResolveDuplicateRows"))
ErrFoundDuplicateKeys = errors.Normalize("found duplicate key '%s', value '%s'", errors.RFCCodeText("Lightning:Restore:ErrFoundDuplicateKey"))
Contributor

Maybe add a TODO here: this error should be deleted and replaced with the two more concrete errors below.

return false, errors.Trace(err)
err = duplicateManager.CollectDuplicateRowsFromTiKV(ctx, local.importClientFactory, algorithm)
if err != nil {
return common.ErrFoundDataConflictRecords.Equal(err) || common.ErrFoundIndexConflictRecords.Equal(err), errors.Trace(err)
Contributor

hasDup should be useless if an error happened; just return false?

Also, it seems CollectDuplicateRowsFromTiKV might return ErrFoundDuplicateKeys.

Contributor

see below comment

Contributor Author

I don't think hasDup is useless. If an error happened, it may be some other failure, such as a decoding error, and does not necessarily mean we encountered a duplicate. We still need common.ErrFoundDataConflictRecords.Equal(err) || common.ErrFoundIndexConflictRecords.Equal(err) to determine whether we have duplicates.

CollectDuplicateRowsFromTiKV will not return ErrFoundDuplicateKeys. Here it can only return ErrFoundDataConflictRecords or ErrFoundIndexConflictRecords.

@@ -90,6 +92,33 @@ type litBackendCtx struct {
etcdClient *clientv3.Client
}

func (bc *litBackendCtx) handleErrorAfterCollectRemoteDuplicateRows(err error, indexID int64, tbl table.Table, hasDupe bool) error {
if err != nil && !common.ErrFoundIndexConflictRecords.Equal(err) {
Contributor

@D3Hunter D3Hunter Mar 13, 2024

Using hasDupe when there is an error is not a common style, and passing context information through the error is not a good way to do it. I'd prefer that we return two values, (*DupInfo, error): error indicates something we cannot handle any further, and DupInfo is non-nil when duplicates are found and holds their context information.

Contributor Author

I think the error is handled correctly here. We first check
if err != nil && !common.ErrFoundIndexConflictRecords.Equal(err).
In that case, the error means something that we cannot handle any further.

After that we use hasDupe; at that point either there is no error, or the error is ErrFoundIndexConflictRecords. In both cases we can use hasDupe directly for further operations.

Contributor

It's very uncommon to use other return values when an error is returned.

And this error actually works as a struct that passes context info when a duplicate is detected, so I'd prefer to write it the way described in my previous comment.

If there's no time to change it, I think we can leave a TODO and do it later.

Contributor Author

TODO added
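
For reference, the alternative return shape the reviewer describes might look like the sketch below. DupInfo, its fields, and collectDuplicates are hypothetical: the thread only names the type and the (*DupInfo, error) signature, so the rest is an assumption made for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// DupInfo is a hypothetical struct carrying context about a detected
// duplicate. The field set is an assumption; the thread does not define it.
type DupInfo struct {
	Table     string
	IndexName string
	Key       string
}

// collectDuplicates returns a non-nil *DupInfo when a duplicate is found,
// and a non-nil error only for failures the caller cannot handle.
func collectDuplicates(table, index string, keys []string) (*DupInfo, error) {
	if keys == nil {
		return nil, errors.New("no keys to scan")
	}
	seen := make(map[string]struct{}, len(keys))
	for _, k := range keys {
		if _, ok := seen[k]; ok {
			return &DupInfo{Table: table, IndexName: index, Key: k}, nil
		}
		seen[k] = struct{}{}
	}
	return nil, nil
}

func main() {
	dup, err := collectDuplicates("a", "a.key_b", []string{"7", "8", "7"})
	if err != nil {
		fmt.Println("unrecoverable:", err)
		return
	}
	if dup != nil {
		fmt.Printf("duplicate key %s in %s (%s)\n", dup.Key, dup.Table, dup.IndexName)
	}
}
```

With this shape the caller never reads a second return value alongside a non-nil error, which is the style concern raised in the thread.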

@lyzx2001
Contributor Author

/test pull-lightning-integration-test

@lyzx2001
Contributor Author

/test pull-lightning-integration-test

@lyzx2001 lyzx2001 requested a review from D3Hunter March 13, 2024 08:43
ti-chi-bot bot commented Mar 13, 2024

[LGTM Timeline notifier]

Timeline:

  • 2024-03-12 02:47:13.298313837 +0000 UTC m=+740060.320560226: ☑️ agreed by Leavrth.
  • 2024-03-13 09:07:50.835466181 +0000 UTC m=+849297.857712554: ☑️ agreed by zimulala.

@lyzx2001
Contributor Author

/test pull-lightning-integration-test

return false, errors.Trace(err)
}
return duplicateManager.HasDuplicate(), nil
}

// CollectRemoteDuplicateRows collect duplicate keys from remote TiKV storage. This keys may be duplicate with
// the data import by other lightning.
func (local *DupeController) CollectRemoteDuplicateRows(ctx context.Context, tbl table.Table, tableName string, opts *encode.SessionOptions) (hasDupe bool, err error) {
// TODO: revise the returned arguments to (hasDupe bool, dupInfo *DupInfo, err error) to distinguish the conflict error and the common error
Contributor

Suggested change
- // TODO: revise the returned arguments to (hasDupe bool, dupInfo *DupInfo, err error) to distinguish the conflict error and the common error
+ // TODO: revise the returned arguments to (dupInfo *DupInfo, err error) to distinguish the conflict error and the common error

hasDupe can be inferred from dupInfo != nil.

Contributor Author

Maybe not necessarily? CollectDuplicateRowsFromTiKV returns ErrFoundDataConflictRecords or ErrFoundIndexConflictRecords only when the conflict detection mode is set to error. For other modes, we still need duplicateManager.HasDuplicate() to determine whether we have duplicates, and Lightning needs this information:

hasRemoteDupe, e := dupeController.CollectRemoteDuplicateRows(ctx, tr.encTable, tr.tableName, opts)
if e != nil {
	tr.logger.Error("collect remote duplicate keys failed", log.ShortError(e))
	return false, errors.Trace(e)
}

ti-chi-bot bot commented Mar 13, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: D3Hunter, Leavrth, zimulala

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added the approved label Mar 13, 2024
@ti-chi-bot ti-chi-bot bot merged commit ccc453b into pingcap:master Mar 13, 2024
26 checks passed
Labels
approved lgtm release-note-none size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

6 participants