ddl: batch check the constrains when we add a unique-index. #7132

Merged
winkyao merged 19 commits into pingcap:master from winkyao:speedup_creating_unique_key on Aug 6, 2018

@winkyao
Member

commented Jul 23, 2018

What have you changed? (mandatory)

Before this PR, we checked every row individually for a duplicate key, which made adding a unique index much slower than adding a non-unique index. In this PR, before we create a unique index, we batch check the keys and skip the keys that already exist.
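To illustrate why batching helps, here is a minimal sketch that counts storage round trips for a per-row check versus a batched check. The kvStore type and its get and batchGet methods are hypothetical stand-ins backed by an in-memory map; they are not TiDB's storage API.

```go
package main

import "fmt"

// kvStore is a hypothetical stand-in for the storage layer.
// It counts how many round trips the caller makes.
type kvStore struct {
	data       map[string]int64 // index key -> row handle
	roundTrips int
}

// get looks up a single key: one round trip per call.
func (s *kvStore) get(key string) (int64, bool) {
	s.roundTrips++
	h, ok := s.data[key]
	return h, ok
}

// batchGet looks up many keys in a single round trip and
// returns only the keys that already exist.
func (s *kvStore) batchGet(keys []string) map[string]int64 {
	s.roundTrips++
	found := make(map[string]int64, len(keys))
	for _, k := range keys {
		if h, ok := s.data[k]; ok {
			found[k] = h
		}
	}
	return found
}

func main() {
	keys := []string{"a", "b", "c", "d"}

	// Per-row check: one lookup for every key.
	perRow := &kvStore{data: map[string]int64{"b": 2}}
	for _, k := range keys {
		perRow.get(k)
	}

	// Batched check: one lookup for the whole batch.
	batched := &kvStore{data: map[string]int64{"b": 2}}
	existing := batched.batchGet(keys)

	fmt.Println("per-row round trips:", perRow.roundTrips)
	fmt.Println("batched round trips:", batched.roundTrips)
	fmt.Println("keys that already exist:", len(existing))
}
```

In this toy model the saving is simply one round trip per batch instead of one per row; the real gain depends on the batch size and the storage latency.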

I built a cluster with ansible on my virtual machine and used importer to produce a table with 1 million rows. Then I added a unique index on it, once with this PR and once with the master branch. I got the following results:

2018/07/23 21:47:25.738 adapter.go:363: [warning] [SLOW_QUERY] cost_time:7m25.495286579s succ:true con:17 user:root@127.0.0.1 txn_start_ts:401697197322665986 database:test sql:alter table t add unique index b(b)

2018/07/23 22:01:42.412 adapter.go:363: [warning] [SLOW_QUERY] cost_time:1m41.406208324s succ:true con:1 user:root@127.0.0.1 txn_start_ts:401697512092336130 database:test sql:alter table t add unique index b(b)

The master branch takes 7m25s and this PR takes 1m41s, so this PR cuts the elapsed time by about 77.3%.

This PR needs to be cherry-picked to 2.0.

What is the type of the changes? (mandatory)

  • Improvement (non-breaking change which is an improvement to an existing feature)

How has this PR been tested? (mandatory)

Existing tests.

Does this PR affect documentation (docs/docs-cn) update? (mandatory)

no

Does this PR affect tidb-ansible update? (mandatory)

no

Does this PR need to be added to the release notes? (mandatory)

YES:

release note:
Speed up adding a unique index by batch checking the constraints.

Refer to a related PR or issue link (optional)

Benchmark result if necessary (optional)

Add a few positive/negative examples (optional)

winkyao added 2 commits Jul 23, 2018
@@ -453,6 +453,8 @@ type indexRecord struct {
handle int64
key []byte // It's used to lock a record. Record it to reduce the encoding time.
vals []types.Datum // It's the index values.
// skip indicate the index key is already exists, we should not add it.

@shenli

shenli Jul 23, 2018

Member

indicates that .....

defaultVals []types.Datum // It's used to reduce the number of new slice.
idxRecords []*indexRecord // It's used to reduce the number of new slice.
rowMap map[int64]types.Datum // It's the index column values map. It is used to reduce the number of making map.
defaultVals []types.Datum // It's used to reduce the number of new slice.

@shenli

shenli Jul 23, 2018

Member

The comments on the following attributes are almost the same. We could use one comment for all of them, such as:
"The following attributes are used to reduce memory allocation."

defaultVals: make([]types.Datum, len(t.Cols())),
rowMap: make(map[int64]types.Datum, len(colFieldMap)),
}
w.reAllocIdxKeyBufs(w.batchCnt)

@shenli

shenli Jul 23, 2018

Member

Why do we need to reallocate it?

}
}
// Constrains is already checked.
w.sessCtx.GetSessionVars().StmtCtx.BatchCheck = true

@shenli

shenli Jul 23, 2018

Member

stmtCtx.BatchCheck = true

winkyao added 2 commits Jul 24, 2018
@winkyao

Member Author

commented Jul 24, 2018

@shenli PTAL

@winkyao

Member Author

commented Jul 24, 2018

defaultVals: make([]types.Datum, len(t.Cols())),
rowMap: make(map[int64]types.Datum, len(colFieldMap)),
}
w.initBatchCheckBufs(w.batchCnt)

@lamxTyler

lamxTyler Jul 24, 2018

Member

Why init here? If the index is not unique, it is a waste to init it.


// 1. unique-key is duplicate and the handle is equal, skip it.
// 2. unique-key is duplicate and the handle is not equal, return duplicate error.
// 3. non-unique-key is duplicate, skip it.

@lamxTyler

lamxTyler Jul 24, 2018

Member

Why can we skip it?

@winkyao

winkyao Jul 24, 2018

Author Member

Which one do you mean?

@winkyao

winkyao Jul 24, 2018

Author Member

Backfilling only needs to add the index entries that do not exist yet. If an index entry already exists, why would we need to add it again?

@lamxTyler

lamxTyler Jul 24, 2018

Member

Say there is a unique index on (a) and there are two rows, (null) and (null); then both rows need to be added.

@winkyao

winkyao Jul 24, 2018

Author Member

Actually, they will be added. A null value in a unique key is treated as a non-distinct key, so we append the handle to the key, and the two (null) rows end up with different keys.

@winkyao

winkyao Jul 24, 2018

Author Member

I will add a unit test case to eliminate your doubt.
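The three rules from the diff and the null handling discussed in this thread can be sketched roughly as follows. This is an illustrative model only: indexKey and checkRow are hypothetical helpers, the string key layout is made up, and existing entries are modeled as a plain map from key to handle rather than TiDB's actual key encoding.

```go
package main

import "fmt"

// indexKey builds a made-up string key for a single-column index.
// val == nil models SQL NULL. The handle is appended when the index is
// non-unique or the value is NULL, so such rows never share a key.
func indexKey(unique bool, val *string, handle int64) string {
	if val == nil {
		return fmt.Sprintf("null/%d", handle)
	}
	if !unique {
		return fmt.Sprintf("val/%s/%d", *val, handle)
	}
	return "val/" + *val
}

// checkRow decides what to do with one row given the entries that
// already exist (key -> handle):
//   - key not found: add it (skip=false, err=nil)
//   - key found with the same handle: skip it
//   - key found with a different handle: duplicate error
func checkRow(existing map[string]int64, unique bool, val *string, handle int64) (skip bool, err error) {
	key := indexKey(unique, val, handle)
	owner, found := existing[key]
	if !found {
		return false, nil
	}
	if owner == handle {
		return true, nil
	}
	return false, fmt.Errorf("duplicate entry for key %q", key)
}

func main() {
	x := "x"
	// One entry already exists: value "x" owned by handle 1.
	existing := map[string]int64{indexKey(true, &x, 1): 1}

	rows := []struct {
		val    *string
		handle int64
	}{
		{&x, 1},  // same key, same handle
		{&x, 2},  // same key, different handle
		{nil, 3}, // NULL value
		{nil, 4}, // another NULL value
	}
	for _, r := range rows {
		skip, err := checkRow(existing, true, r.val, r.handle)
		fmt.Printf("handle=%d skip=%v err=%v\n", r.handle, skip, err)
	}
}
```

Because the handle is part of the key for NULL values and for non-unique indexes, a found key in those cases always belongs to the same handle, so the row is skipped and never reported as a duplicate.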

winkyao added 2 commits Jul 24, 2018
@winkyao

Member Author

commented Jul 24, 2018

@lamxTyler PTAL

// The index is already exists, we skip it, no needs to backfill it.
// The following update, delete, insert on these rows, TiDB can handle it correctly.
if idxRecord.skip {
continue

@crazycs520

crazycs520 Jul 24, 2018

Contributor

Could skipping cause addedCount to be wrong? See this PR: https://github.com/pingcap/tidb/pull/6980/files

@winkyao

winkyao Jul 31, 2018

Author Member

No, a skipped row does not affect addedCount, which is expected, but scanCount should still increase.
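The counting behavior described here can be sketched as below. The indexRecord struct is a trimmed, hypothetical version that keeps only the handle and skip fields, and backfill is an illustrative function, not the worker code from this PR.

```go
package main

import "fmt"

// indexRecord is a trimmed, hypothetical version of the struct in the diff.
type indexRecord struct {
	handle int64
	skip   bool // the index entry already exists, so it is not added again
}

// backfill walks the scanned records. Every record counts as scanned,
// but only the records that are actually written count as added.
func backfill(records []indexRecord) (scanCount, addedCount int) {
	for _, r := range records {
		scanCount++
		if r.skip {
			continue
		}
		// The index entry for r.handle would be written here.
		addedCount++
	}
	return scanCount, addedCount
}

func main() {
	records := []indexRecord{
		{handle: 1},
		{handle: 2, skip: true},
		{handle: 3},
	}
	scanned, added := backfill(records)
	fmt.Println("scanned:", scanned, "added:", added)
}
```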

@@ -452,6 +452,8 @@ type indexRecord struct {
handle int64
key []byte // It's used to lock a record. Record it to reduce the encoding time.
vals []types.Datum // It's the index values.
// skip indicates that the index key is already exists, we should not add it.

@jackysp

jackysp Jul 31, 2018

Member

It's better to move the comment to the end of the next line.

@jackysp

Member

commented Jul 31, 2018

Please resolve the conflicts.

@winkyao

Member Author

commented Aug 1, 2018

@jackysp PTAL

@jackysp

Member

commented Aug 1, 2018

Please fix CI and resolve the conflicts again.

@winkyao

Member Author

commented Aug 1, 2018

@winkyao

Member Author

commented Aug 1, 2018

@lamxTyler PTAL

Member

left a comment

LGTM

@jackysp
jackysp approved these changes Aug 2, 2018
Member

left a comment

LGTM

winkyao added 2 commits Aug 2, 2018
@winkyao

Member Author

commented Aug 2, 2018

/run-all-tests

@winkyao

Member Author

commented Aug 2, 2018

/rebuild

@winkyao

Member Author

commented Aug 2, 2018

/run-unit-test

Member

left a comment

LGTM

@zimulala zimulala added the status/LGT3 label Aug 6, 2018
@winkyao winkyao merged commit 326baac into pingcap:master Aug 6, 2018
4 checks passed
ci/circleci Your tests passed on CircleCI!
continuous-integration/travis-ci/pr The Travis CI build passed
jenkins-ci-tidb/build Jenkins job succeeded.
license/cla Contributor License Agreement is signed.
@winkyao winkyao deleted the winkyao:speedup_creating_unique_key branch Aug 6, 2018
winkyao added a commit to winkyao/tidb that referenced this pull request Aug 31, 2018
winkyao added a commit that referenced this pull request Sep 6, 2018