executor: improve performance for insert ignore on duplicate key update statement #6760

jackysp · 2018-06-05T10:53:45Z

What have you changed?

Split the batch-check functions out into a separate file.
Refactor the batch-check key info structure into one record key plus a set of index keys.
Refactor the insert executor to speed up insert ignore on duplicate key update statements.
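
To make the "one record key plus index keys" layout concrete, here is a minimal sketch. The names toBeCheckedRow, keyValueWithDupInfo, handleKey, uniqueKeys, and dupErr appear in the reviewed snippets; the exact field layout, the newKV name (the rename suggested in review), and the keysToCheck helper are assumptions for illustration, not the merged code.

```go
package main

import "fmt"

// keyValue is one key-value pair that the insert would write.
type keyValue struct {
	key   []byte
	value []byte
}

// keyValueWithDupInfo pairs a new key-value with the error to report
// if the key already exists (field layout is an assumption).
type keyValueWithDupInfo struct {
	newKV  keyValue
	dupErr error
}

// toBeCheckedRow groups the keys of one row: at most one record
// (handle) key and any number of unique index keys.
type toBeCheckedRow struct {
	handleKey  *keyValueWithDupInfo // nil when there is no record key to check
	uniqueKeys []*keyValueWithDupInfo
}

// keysToCheck is a hypothetical helper that lists every key of the row
// that needs an existence check, record key first.
func (r toBeCheckedRow) keysToCheck() [][]byte {
	keys := make([][]byte, 0, 1+len(r.uniqueKeys))
	if r.handleKey != nil {
		keys = append(keys, r.handleKey.newKV.key)
	}
	for _, uk := range r.uniqueKeys {
		keys = append(keys, uk.newKV.key)
	}
	return keys
}

func main() {
	row := toBeCheckedRow{
		handleKey: &keyValueWithDupInfo{newKV: keyValue{key: []byte("r1")}},
		uniqueKeys: []*keyValueWithDupInfo{
			{newKV: keyValue{key: []byte("i1")}},
		},
	}
	for _, k := range row.keysToCheck() {
		fmt.Println(string(k))
	}
}
```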

What is the type of the changes?

Improvement

How has this PR been tested?

unit tests

Benchmark result

When inserting 27 rows in one statement:

| Case | master | this PR |
| --- | --- | --- |
| Best case | 19.3s | 0.9s |
| Worst case | 28.7s | 11.6s |

PTAL @coocood @XuHuaiyu

shenli · 2018-06-05T15:08:26Z

Good job!

shenli · 2018-06-05T15:08:38Z

/run-all-tests

coocood · 2018-06-06T09:35:22Z

executor/batch_checker.go

+	// Batch get values.
+	nKeys := 0
+	for _, r := range toBeCheckRows {
+		nKeys += len(r.uniqueKeys)


handleKey should be considered?
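
A minimal sketch of what counting the handle key could look like, so the batch-get key slice is allocated with the right capacity. The simplified toBeCheckedRow type (raw byte-slice keys) and the countKeys helper are assumptions for illustration only.

```go
package main

import "fmt"

// toBeCheckedRow is a simplified stand-in: raw keys only.
type toBeCheckedRow struct {
	handleKey  []byte   // nil when the row has no record key to check
	uniqueKeys [][]byte // unique index keys of the row
}

// countKeys counts every key that will be batch-fetched,
// including the handle key when the row has one.
func countKeys(rows []toBeCheckedRow) int {
	nKeys := 0
	for _, r := range rows {
		if r.handleKey != nil {
			nKeys++
		}
		nKeys += len(r.uniqueKeys)
	}
	return nKeys
}

func main() {
	rows := []toBeCheckedRow{
		{handleKey: []byte("r1"), uniqueKeys: [][]byte{[]byte("i1"), []byte("i2")}},
		{uniqueKeys: [][]byte{[]byte("i3")}},
	}
	// Pre-size the key slice so appending never reallocates.
	keys := make([][]byte, 0, countKeys(rows))
	for _, r := range rows {
		if r.handleKey != nil {
			keys = append(keys, r.handleKey)
		}
		keys = append(keys, r.uniqueKeys...)
	}
	fmt.Println(len(keys), cap(keys))
}
```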

coocood · 2018-06-06T09:44:03Z

executor/batch_checker.go

+	dupErr      error
+}
+
+type toBeCheckRow struct {


s/toBeCheckRow/toBeCheckedRow

Actually, I want a more meaningful name. Any suggestions?

I think toBeCheckedRow is ok.

coocood · 2018-06-11T08:59:56Z

executor/batch_checker.go

+}
+
+// batchGetInsertKeys uses batch-get to fetch all key-value pairs to be checked for ignore or duplicate key update.
+func (b *batchChecker) batchGetInsertKeys(ctx sessionctx.Context, t table.Table, newRows []types.DatumRow) ([]toBeCheckRow, map[string][]byte, error) {


Can we assign the results to b and only return an error?

coocood · 2018-06-11T09:22:15Z

executor/batch_checker.go

+}
+
+// batchGetOldValues gets the values of storage in batch.
+func (b *batchChecker) batchGetOldValues(ctx sessionctx.Context, t table.Table, handles []int64) (map[string][]byte, error) {


Can we just set the values into b.dupOldRowValues?
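
The pattern being suggested can be sketched as follows: the method stores what it fetched on the receiver and returns only an error. The batchGetFunc type, the errStorage value, and the string-keyed signature are hypothetical stand-ins; the real method takes a sessionctx.Context, a table.Table, and handles.

```go
package main

import (
	"errors"
	"fmt"
)

// batchGetFunc is a hypothetical stand-in for the storage batch-get call.
type batchGetFunc func(keys []string) (map[string][]byte, error)

// errStorage is a made-up error used only to demonstrate the failure path.
var errStorage = errors.New("storage unavailable")

type batchChecker struct {
	dupOldRowValues map[string][]byte
}

// batchGetOldValues stores the fetched values in b.dupOldRowValues
// instead of returning them; callers only need to check the error.
func (b *batchChecker) batchGetOldValues(get batchGetFunc, keys []string) error {
	values, err := get(keys)
	if err != nil {
		return err // previous state of b is left untouched
	}
	b.dupOldRowValues = values
	return nil
}

func main() {
	ok := func(keys []string) (map[string][]byte, error) {
		out := make(map[string][]byte, len(keys))
		for _, k := range keys {
			out[k] = []byte("old-" + k)
		}
		return out, nil
	}
	failing := func([]string) (map[string][]byte, error) { return nil, errStorage }

	b := &batchChecker{}
	if err := b.batchGetOldValues(ok, []string{"r1", "r2"}); err != nil {
		panic(err)
	}
	fmt.Println(len(b.dupOldRowValues))
	fmt.Println(b.batchGetOldValues(failing, []string{"r3"}))
}
```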

coocood · 2018-06-11T09:29:01Z

executor/batch_checker.go

+func (b *batchChecker) initDupOldRowValue(ctx sessionctx.Context, t table.Table, newRows []types.DatumRow) (err error) {
+	b.dupOldRowValues = make(map[string][]byte, len(newRows))
+	handles := make([]int64, 0, len(newRows))
+	for _, r := range b.toBeCheckRows {


How about looping over toBeCheckedRows twice?
In the first pass, we only set dupOldRowValues for the handle key.
In the second pass, we collect the handles from the unique keys, then batch get those handles.
Then we can extract two methods, initDupOldRowFromHandleKey and initDupOldRowFromUniqueKey.
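
The two-pass proposal above can be sketched roughly like this. The method names come from the comment; everything else is a simplifying assumption: keys are plain strings, dupKVs holds the result of an earlier batch get, the value stored under a unique index key is treated as the record key of the existing row, and batchGet is a hypothetical storage callback.

```go
package main

import "fmt"

type toBeCheckedRow struct {
	handleKey  string   // record key; empty if the row has none
	uniqueKeys []string // unique index keys
}

type batchChecker struct {
	toBeCheckedRows []toBeCheckedRow
	dupKVs          map[string][]byte // existing key-values found by the first batch get
	dupOldRowValues map[string][]byte // record key -> old row value
}

// First pass: a duplicate handle key already carries the old row value.
func (b *batchChecker) initDupOldRowFromHandleKey() {
	for _, r := range b.toBeCheckedRows {
		if r.handleKey == "" {
			continue
		}
		if val, ok := b.dupKVs[r.handleKey]; ok {
			b.dupOldRowValues[r.handleKey] = val
		}
	}
}

// Second pass: collect the record keys that duplicate unique keys point to,
// then fetch those old rows with one batch get.
func (b *batchChecker) initDupOldRowFromUniqueKey(batchGet func([]string) (map[string][]byte, error)) error {
	var recordKeys []string
	for _, r := range b.toBeCheckedRows {
		for _, uk := range r.uniqueKeys {
			if val, ok := b.dupKVs[uk]; ok {
				recordKeys = append(recordKeys, string(val))
			}
		}
	}
	if len(recordKeys) == 0 {
		return nil
	}
	values, err := batchGet(recordKeys)
	if err != nil {
		return err
	}
	for k, v := range values {
		b.dupOldRowValues[k] = v
	}
	return nil
}

func (b *batchChecker) initDupOldRowValue(batchGet func([]string) (map[string][]byte, error)) error {
	b.dupOldRowValues = make(map[string][]byte, len(b.toBeCheckedRows))
	b.initDupOldRowFromHandleKey()
	return b.initDupOldRowFromUniqueKey(batchGet)
}

func main() {
	storage := map[string][]byte{"r1": []byte("old-row-1"), "r2": []byte("old-row-2")}
	batchGet := func(keys []string) (map[string][]byte, error) {
		out := make(map[string][]byte, len(keys))
		for _, k := range keys {
			if v, ok := storage[k]; ok {
				out[k] = v
			}
		}
		return out, nil
	}
	b := &batchChecker{
		toBeCheckedRows: []toBeCheckedRow{
			{handleKey: "r1"},
			{handleKey: "r9", uniqueKeys: []string{"i_a"}},
		},
		dupKVs: map[string][]byte{"r1": []byte("old-row-1"), "i_a": []byte("r2")},
	}
	if err := b.initDupOldRowValue(batchGet); err != nil {
		panic(err)
	}
	fmt.Println(len(b.dupOldRowValues))
}
```

Splitting the work this way keeps the handle-key case free of extra storage reads, and confines the second batch get to rows that actually conflict on a unique index.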

jackysp · 2018-06-12T03:02:58Z

PTAL @coocood

coocood · 2018-06-12T03:17:07Z

LGTM

jackysp · 2018-06-12T03:26:44Z

@XuHuaiyu PTAL

zimulala · 2018-06-13T11:39:08Z

executor/batch_checker.go

+}
+
+type keyValueWithDupInfo struct {
+	newKeyValue keyValue


Could we rename newKeyValue to newKV? I think r.handleKey.newKeyValue.key is too long.

zimulala · 2018-06-13T11:39:27Z

executor/batch_checker.go

+type batchChecker struct {
+	// For duplicate key update
+	toBeCheckedRows []toBeCheckedRow
+	dupKeyValues    map[string][]byte


Ditto. s/dupKeyValues/dupKVs

zimulala · 2018-06-13T12:28:01Z

executor/insert.go

+					return errors.Trace(err)
+				}
+				delete(e.dupKeyValues, string(r.handleKey.newKeyValue.key))
+				newRows[i] = nil


I think we can remove this line, like line 142.

jackysp · 2018-06-14T04:54:04Z

/run-all-tests

jackysp · 2018-06-14T04:56:00Z

PTAL @zimulala @coocood . I've fixed a refactor bug and added its test case.

jackysp · 2018-06-18T07:03:33Z

PTAL @zimulala @coocood @XuHuaiyu

coocood · 2018-06-19T06:23:32Z

LGTM

zimulala

LGTM

jackysp added 3 commits June 4, 2018 19:38

refactor for reading

a69dde7

speed up insert into ignore on duplicate update

325dfb8

change one function name

74d08e1

jackysp changed the title ~~executor: improve performance of insert ignore on duplicate key update statement~~ executor: improve performance for insert ignore on duplicate key update statement Jun 5, 2018

jackysp added the all-tests-passed label Jun 6, 2018

coocood reviewed Jun 6, 2018

coocood reviewed Jun 11, 2018

jackysp added 2 commits June 11, 2018 21:25

address comments

db8c3cd

merge master

4df4f28

coocood added the status/LGT1 label Jun 12, 2018

zimulala reviewed Jun 13, 2018

fix a refactor bug and add its test case

b1e7c43

zimulala reviewed Jun 19, 2018

Merge branch 'master' into insert_ignore_ondup

c60143d

zimulala added the status/LGT2 label and removed the status/LGT1 label Jun 19, 2018

zimulala approved these changes Jun 19, 2018

jackysp merged commit 3c0bfc1 into pingcap:master Jun 19, 2018

jackysp deleted the insert_ignore_ondup branch July 3, 2018 05:44
