executor: improve performance for insert ignore on duplicate key update statement #6760

Merged
merged 7 commits into from Jun 19, 2018

jackysp
Member

@jackysp jackysp commented Jun 5, 2018

What have you changed?

  • Move the batch-check functions into their own file.
  • Refactor the batch-check key info structure so that each row carries one record key and its index keys.
  • Refactor the insert executor to speed up INSERT IGNORE ... ON DUPLICATE KEY UPDATE statements.

What is the type of the changes?

  • Improvement

How has this PR been tested?

unit tests

Benchmark result

When inserting 27 rows in one statement:

  • Best case:
    master: 19.3s, this PR: 0.9s
  • Worst case:
    master: 28.7s, this PR: 11.6s

PTAL @coocood @XuHuaiyu

@jackysp jackysp changed the title executor: improve performance of insert ignore on duplicate key update statement executor: improve performance for insert ignore on duplicate key update statement Jun 5, 2018
@shenli
Member

shenli commented Jun 5, 2018

Good job!

@shenli
Member

shenli commented Jun 5, 2018

/run-all-tests

// Batch get values.
nKeys := 0
for _, r := range toBeCheckRows {
	nKeys += len(r.uniqueKeys)
Member

Should handleKey be counted here as well?
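
To illustrate the question, here is a minimal sketch of a key count that includes the handle key. The `keyValue` and `toBeCheckedRow` types below are reduced, hypothetical stand-ins (the real structs in this PR carry more fields), and treating a nil `handleKey` as "no handle key to check" is an assumption made for the sketch.

```go
package main

// keyValue is a simplified stand-in for an encoded key/value pair.
type keyValue struct {
	key   []byte
	value []byte
}

// toBeCheckedRow is a hypothetical, reduced version of the struct under
// review: an optional handle (record) key plus zero or more unique index keys.
type toBeCheckedRow struct {
	handleKey  *keyValue // nil when the row has no handle key to check (assumption)
	uniqueKeys []keyValue
}

// countKeys returns how many keys a batch-get over rows would request,
// counting the handle key as well as the unique keys.
func countKeys(rows []toBeCheckedRow) int {
	n := 0
	for _, r := range rows {
		if r.handleKey != nil {
			n++
		}
		n += len(r.uniqueKeys)
	}
	return n
}

// collectKeys gathers every key into one slice whose capacity is sized
// up front from countKeys, so append never has to grow it.
func collectKeys(rows []toBeCheckedRow) [][]byte {
	keys := make([][]byte, 0, countKeys(rows))
	for _, r := range rows {
		if r.handleKey != nil {
			keys = append(keys, r.handleKey.key)
		}
		for _, uk := range r.uniqueKeys {
			keys = append(keys, uk.key)
		}
	}
	return keys
}
```

If the handle key is left out of the count, the slice is still correct but may have to grow during append, which is the cost the preallocation is meant to avoid.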

dupErr error
}

type toBeCheckRow struct {
Member

s/toBeCheckRow/toBeCheckedRow

Member Author

Actually, I want a more meaningful name. Any suggestions?

Member

I think toBeCheckedRow is ok.

}

// batchGetInsertKeys uses batch-get to fetch all key-value pairs to be checked for ignore or duplicate key update.
func (b *batchChecker) batchGetInsertKeys(ctx sessionctx.Context, t table.Table, newRows []types.DatumRow) ([]toBeCheckRow, map[string][]byte, error) {
Member

Can we assign the results to b and return only an error?
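
A minimal sketch of the suggested shape, where the method stores its results on the receiver and returns only an error. Everything here is hypothetical: `loadInsertKeys`, the `toBeCheckedKeys` field, and the in-memory `store` map are illustrative stand-ins, not the PR's actual signature or storage API.

```go
package main

import "errors"

// batchChecker is a hypothetical, reduced stand-in for the PR's type.
type batchChecker struct {
	toBeCheckedKeys []string
	dupKVs          map[string][]byte
}

// loadInsertKeys is a hypothetical helper. It looks up keys in store
// (an in-memory stand-in for a batch-get) and keeps the results on b,
// so callers only have to handle the returned error.
func (b *batchChecker) loadInsertKeys(store map[string][]byte, keys []string) error {
	if store == nil {
		return errors.New("nil store")
	}
	b.toBeCheckedKeys = keys
	b.dupKVs = make(map[string][]byte, len(keys))
	for _, k := range keys {
		if v, ok := store[k]; ok {
			b.dupKVs[k] = v
		}
	}
	return nil
}
```

The trade-off is that the method now mutates the receiver, so callers read `b.dupKVs` afterwards instead of threading three return values through each call site.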

}

// batchGetOldValues gets the values of storage in batch.
func (b *batchChecker) batchGetOldValues(ctx sessionctx.Context, t table.Table, handles []int64) (map[string][]byte, error) {
Member

Can we just set the values into b.dupOldRowValues?

func (b *batchChecker) initDupOldRowValue(ctx sessionctx.Context, t table.Table, newRows []types.DatumRow) (err error) {
	b.dupOldRowValues = make(map[string][]byte, len(newRows))
	handles := make([]int64, 0, len(newRows))
	for _, r := range b.toBeCheckRows {
Member

How about looping over toBeCheckedRows twice?
The first pass only sets dupOldRowValues for the handle keys.
The second pass collects the handles found through the unique keys, then batch-gets those handles.
Then we can extract two methods, initDupOldRowFromHandleKey and initDupOldRowFromUniqueKey.
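
The two-pass idea can be sketched roughly as follows. This is an illustration under simplifying assumptions, not the PR's code: `row`, `checker`, and `batchGet` are hypothetical stand-ins, keys and values are plain strings, storage is an in-memory map, and the value stored under a unique key is assumed to be the record key of the conflicting row (in the real code it would be a handle that still has to be encoded into a record key).

```go
package main

// row is a hypothetical, simplified stand-in for toBeCheckedRow.
type row struct {
	handleKey  string   // record key; empty if the row has no handle key
	uniqueKeys []string // unique index keys
}

// checker is a hypothetical stand-in for batchChecker.
type checker struct {
	rows            []row
	dupKVs          map[string]string // keys found in storage by the first batch-get
	dupOldRowValues map[string]string // record key -> old row value
}

// batchGet simulates a storage batch-get over an in-memory map.
func batchGet(store map[string]string, keys []string) map[string]string {
	out := make(map[string]string, len(keys))
	for _, k := range keys {
		if v, ok := store[k]; ok {
			out[k] = v
		}
	}
	return out
}

// initDupOldRowFromHandleKey is pass one: a duplicate handle key already
// has the old row as its value, so no extra read is needed.
func (c *checker) initDupOldRowFromHandleKey() {
	for _, r := range c.rows {
		if r.handleKey == "" {
			continue
		}
		if v, ok := c.dupKVs[r.handleKey]; ok {
			c.dupOldRowValues[r.handleKey] = v
		}
	}
}

// initDupOldRowFromUniqueKey is pass two: collect the record keys that
// duplicate unique keys point to, then fetch those rows in one batch-get.
func (c *checker) initDupOldRowFromUniqueKey(store map[string]string) {
	var recordKeys []string
	for _, r := range c.rows {
		for _, uk := range r.uniqueKeys {
			if rk, ok := c.dupKVs[uk]; ok {
				recordKeys = append(recordKeys, rk)
			}
		}
	}
	for k, v := range batchGet(store, recordKeys) {
		c.dupOldRowValues[k] = v
	}
}
```

Splitting the work this way keeps the handle-key pass free of storage reads and confines the second batch-get to rows that actually conflict on a unique key.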

@jackysp
Member Author

jackysp commented Jun 12, 2018

PTAL @coocood

@coocood
Member

coocood commented Jun 12, 2018

LGTM

@coocood coocood added the status/LGT1 Indicates that a PR has LGTM 1. label Jun 12, 2018
@jackysp
Member Author

jackysp commented Jun 12, 2018

@XuHuaiyu PTAL

}

type keyValueWithDupInfo struct {
newKeyValue keyValue
Contributor

Could we rename newKeyValue to newKV? I think r.handleKey.newKeyValue.key is too long.

type batchChecker struct {
	// For duplicate key update
	toBeCheckedRows []toBeCheckedRow
	dupKeyValues    map[string][]byte
Contributor

Ditto. s/dupKeyValues/dupKVs

return errors.Trace(err)
}
delete(e.dupKeyValues, string(r.handleKey.newKeyValue.key))
newRows[i] = nil
Contributor

I think we can remove this line, as was done at line 142.

@jackysp
Member Author

jackysp commented Jun 14, 2018

/run-all-tests

@jackysp
Member Author

jackysp commented Jun 14, 2018

PTAL @zimulala @coocood. I've fixed a bug introduced by the refactor and added a test case for it.

@jackysp
Member Author

jackysp commented Jun 18, 2018

PTAL @zimulala @coocood @XuHuaiyu

@coocood
Member

coocood commented Jun 19, 2018

LGTM

Contributor

@zimulala zimulala left a comment

LGTM

@zimulala zimulala added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Jun 19, 2018
@jackysp jackysp merged commit 3c0bfc1 into pingcap:master Jun 19, 2018
@jackysp jackysp deleted the insert_ignore_ondup branch July 3, 2018 05:44
Labels
status/LGT2 Indicates that a PR has LGTM 2.
4 participants