-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
lightning: improve post-import conflict detection 'error' semantic #51277
lightning: improve post-import conflict detection 'error' semantic #51277
Conversation
Hi @lyzx2001. Thanks for your PR. PRs from untrusted users cannot be marked as trusted with I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## master #51277 +/- ##
================================================
+ Coverage 72.4691% 73.5567% +1.0876%
================================================
Files 1474 1475 +1
Lines 363628 437945 +74317
================================================
+ Hits 263518 322138 +58620
- Misses 80708 95538 +14830
- Partials 19402 20269 +867
Flags with carried forward coverage won't be shown. Click here to find out more.
|
/test pull-lightning-integration-test |
@lyzx2001: Cannot trigger testing until a trusted user reviews the PR and leaves an In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/test pull-lightning-integration-test |
@lyzx2001: Cannot trigger testing until a trusted user reviews the PR and leaves an In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/test pull-lightning-integration-test |
@lyzx2001: Cannot trigger testing until a trusted user reviews the PR and leaves an In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/test pull-lightning-integration-test |
@lyzx2001: Cannot trigger testing until a trusted user reviews the PR and leaves an In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/test pull-lightning-integration-test |
@lyzx2001: Cannot trigger testing until a trusted user reviews the PR and leaves an In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/test pull-lightning-integration-test |
@lyzx2001: Cannot trigger testing until a trusted user reviews the PR and leaves an In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
25/30
|
||
valueStr, err := tables.GenIndexValueFromIndex(indexKey, indexValue, tbl.Meta(), idxInfo) | ||
require.NoError(t, err) | ||
require.Equal(t, []string{"23"}, valueStr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the output for binary or bits?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tables.GenIndexValueFromIndex
's content is retrieved from the original function genKeyExistsErr
in ddl
Lines 1601 to 1629 in 5745d3d
func genKeyExistsErr(key, value []byte, idxInfo *model.IndexInfo, tblInfo *model.TableInfo) error { | |
idxColLen := len(idxInfo.Columns) | |
indexName := fmt.Sprintf("%s.%s", tblInfo.Name.String(), idxInfo.Name.String()) | |
colInfos := tables.BuildRowcodecColInfoForIndexColumns(idxInfo, tblInfo) | |
values, err := tablecodec.DecodeIndexKV(key, value, idxColLen, tablecodec.HandleNotNeeded, colInfos) | |
if err != nil { | |
logutil.BgLogger().Warn("decode index key value failed", zap.String("index", indexName), | |
zap.String("key", hex.EncodeToString(key)), zap.String("value", hex.EncodeToString(value)), zap.Error(err)) | |
return kv.ErrKeyExists.FastGenByArgs(key, indexName) | |
} | |
valueStr := make([]string, 0, idxColLen) | |
for i, val := range values[:idxColLen] { | |
d, err := tablecodec.DecodeColumnValue(val, colInfos[i].Ft, time.Local) | |
if err != nil { | |
logutil.BgLogger().Warn("decode column value failed", zap.String("index", indexName), | |
zap.String("key", hex.EncodeToString(key)), zap.String("value", hex.EncodeToString(value)), zap.Error(err)) | |
return kv.ErrKeyExists.FastGenByArgs(key, indexName) | |
} | |
str, err := d.ToString() | |
if err != nil { | |
str = string(val) | |
} | |
if types.IsBinaryStr(colInfos[i].Ft) || types.IsTypeBit(colInfos[i].Ft) { | |
str = util.FmtNonASCIIPrintableCharToHex(str) | |
} | |
valueStr = append(valueStr, str) | |
} | |
return kv.ErrKeyExists.FastGenByArgs(strings.Join(valueStr, "-"), indexName) | |
} |
require.EqualError(t, newErr, "[Lightning:Restore:ErrFoundIndexConflictRecords]found index conflict records in table a, index name is 'a.key_b', unique key is '[7]', primary key is '3'") | ||
} | ||
|
||
func TestConvertToErrFoundConflictRecordsMultipleColumnsIndex(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TestConvertToErrFoundConflictRecords
and TestConvertToErrFoundConflictRecordsMultipleColumnsIndex
only difference seems to be the number of columns in the table and the index. Could we unify it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The three unit tests have been unified by extracting buildTableForTestConvertToErrFoundConflictRecords
function.
/test pull-lightning-integration-test |
@lyzx2001: Cannot trigger testing until a trusted user reviews the PR and leaves an In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rest lgtm
ErrInvalidMetaStatus = errors.Normalize("invalid meta status: '%s'", errors.RFCCodeText("Lightning:Restore:ErrInvalidMetaStatus")) | ||
ErrTableIsChecksuming = errors.Normalize("table '%s' is checksuming", errors.RFCCodeText("Lightning:Restore:ErrTableIsChecksuming")) | ||
ErrResolveDuplicateRows = errors.Normalize("resolve duplicate rows error on table '%s'", errors.RFCCodeText("Lightning:Restore:ErrResolveDuplicateRows")) | ||
ErrFoundDuplicateKeys = errors.Normalize("found duplicate key '%s', value '%s'", errors.RFCCodeText("Lightning:Restore:ErrFoundDuplicateKey")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe add a todo here, this error should be delete and replace with below 2 more concrete error
return false, errors.Trace(err) | ||
err = duplicateManager.CollectDuplicateRowsFromTiKV(ctx, local.importClientFactory, algorithm) | ||
if err != nil { | ||
return common.ErrFoundDataConflictRecords.Equal(err) || common.ErrFoundIndexConflictRecords.Equal(err), errors.Trace(err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hasDup
should be useless if error happened, just return false?
and seems CollectDuplicateRowsFromTiKV
might return ErrFoundDuplicateKeys
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see below comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don' think hasDup
is useless. If error happened, it may be other error like decoding failed or other failure, not necessarily means we encounter duplicate. We still need common.ErrFoundDataConflictRecords.Equal(err) || common.ErrFoundIndexConflictRecords.Equal(err)
to determine whether we have duplicates.
CollectDuplicateRowsFromTiKV
will not return ErrFoundDuplicateKeys
. Here it can only return ErrFoundDataConflictRecords
or ErrFoundIndexConflictRecords
.
@@ -90,6 +92,33 @@ type litBackendCtx struct { | |||
etcdClient *clientv3.Client | |||
} | |||
|
|||
func (bc *litBackendCtx) handleErrorAfterCollectRemoteDuplicateRows(err error, indexID int64, tbl table.Table, hasDupe bool) error { | |||
if err != nil && !common.ErrFoundIndexConflictRecords.Equal(err) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if there's error, using hasDupe
is not a common style, especially we are passing some context message through the error
, it's not a good way to do it. i prefer we return 2 values (*DupInfo, error)
, error
indicate something that we cannot handle any further, DupInfo
is not null when there's duplicates found and it hold it's context information
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think here the error is correctly handled. We first determine
if err != nil && !common.ErrFoundIndexConflictRecords.Equal(err)
.
In this case, the error means something that we cannot handle any further.
Then we use hasDupe
, in this case either we have no error, or the error is ErrFoundIndexConflictRecords
. In both cases, we can directly use hasDupe
for further operation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's very uncommon to used other return values when there's error returned
and this error is actually works as a struct to pass context info when duplicate is detected. so i prefer write in the way in previous comment
if there's no time to change, i think we can leave a todo to do it later
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO added
/test pull-lightning-integration-test |
@lyzx2001: Cannot trigger testing until a trusted user reviews the PR and leaves an In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/test pull-lightning-integration-test |
@lyzx2001: Cannot trigger testing until a trusted user reviews the PR and leaves an In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/test pull-lightning-integration-test |
@lyzx2001: Cannot trigger testing until a trusted user reviews the PR and leaves an In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
return false, errors.Trace(err) | ||
} | ||
return duplicateManager.HasDuplicate(), nil | ||
} | ||
|
||
// CollectRemoteDuplicateRows collect duplicate keys from remote TiKV storage. This keys may be duplicate with | ||
// the data import by other lightning. | ||
func (local *DupeController) CollectRemoteDuplicateRows(ctx context.Context, tbl table.Table, tableName string, opts *encode.SessionOptions) (hasDupe bool, err error) { | ||
// TODO: revise the returned arguments to (hasDupe bool, dupInfo *DupInfo, err error) to distinguish the conflict error and the common error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// TODO: revise the returned arguments to (hasDupe bool, dupInfo *DupInfo, err error) to distinguish the conflict error and the common error | |
// TODO: revise the returned arguments to (dupInfo *DupInfo, err error) to distinguish the conflict error and the common error |
hasDup can be infered from dupInfo != nil
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe not necessarily? Only when conflict detection mode is set to error
, CollectDuplicateRowsFromTiKV
would return ErrFoundDataConflictRecords
or ErrFoundIndexConflictRecords
. For other modes, we still need duplicateManager.HasDuplicate()
to determine whether we have duplicate. And this information is needed for lightning
tidb/br/pkg/lightning/importer/table_import.go
Lines 1055 to 1059 in 7548df7
hasRemoteDupe, e := dupeController.CollectRemoteDuplicateRows(ctx, tr.encTable, tr.tableName, opts) | |
if e != nil { | |
tr.logger.Error("collect remote duplicate keys failed", log.ShortError(e)) | |
return false, errors.Trace(e) | |
} |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: D3Hunter, Leavrth, zimulala The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
What problem does this PR solve?
Issue Number: ref #51036
Problem Summary:
Merge preprocess duplicate detection and post-import conflict detection.
What changed and how does it work?
Improve post-import conflict detection 'error' semantic.
Check List
Tests
Side effects
Documentation
Release note
Please refer to Release Notes Language Style Guide to write a quality release note.