Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

executor: support index_look_up_merge_join to speed up index_look_up_join #9571

Closed

Conversation

lzmhhh123
Copy link
Member

What problem does this PR solve?

Support index_look_up_merge_join for index_look_up_join, reference issue #8470.

What is changed and how it works?

In the process of index_look_up_join execution, if the inner table is PhysicalIndexScan and the join keys of the inner table are the suffix of the index, then we can choose index_look_up_merge_join.

Some details:

  1. Because the merge tasks returned from inner workers are unsorted, the results of index_look_up_merge_join can't keep the order of the outer table by join keys.
  2. The time complexity of index_look_up_merge_join are same as index_look_up_join. But it eliminates the process of the hash map. So in theory, it will speed up.

Check List

Tests

  • Unit test
  • Integration test

Code changes

  • Has exported function/method change
  • Has exported variable/fields change

Side effects

  • Increased code complexity

Related changes

  • Need to cherry-pick to the release branch
  • Need to be included in the release note

@lzmhhh123 lzmhhh123 changed the title Executor: Support index_look_up_merge_join to speed up index_look_up_join executor: Support index_look_up_merge_join to speed up index_look_up_join Mar 6, 2019
@codecov-io
Copy link

codecov-io commented Mar 7, 2019

Codecov Report

Merging #9571 into master will decrease coverage by 0.4687%.
The diff coverage is 77.5641%.

@@               Coverage Diff                @@
##             master      #9571        +/-   ##
================================================
- Coverage   81.4123%   80.9435%   -0.4688%     
================================================
  Files           423        419         -4     
  Lines         90700      89723       -977     
================================================
- Hits          73841      72625      -1216     
- Misses        11568      11828       +260     
+ Partials       5291       5270        -21

@lzmhhh123
Copy link
Member Author

lzmhhh123 commented Mar 11, 2019

PTAL. @zz-jason @XuHuaiyu

@lzmhhh123
Copy link
Member Author

I test the index_look_up_merge_join with TPC-H data.

The sql is:

select /*+ tidb_inlj(orders, lineitem) */
	lineitem.L_ORDERKEY,
	lineitem.L_LINENUMBER,
	orders.O_ORDERKEY
	from
		orders
		left join lineitem
	on orders.O_ORDERKEY = lineitem.L_ORDERKEY
	order by orders.O_ORDERKEY;

The memory cost is about 20% lower than index_look_up_join by eliminating the hash map.
And execution time is similar to index_look_up_join, improve a little.

Here's Frame gragh:
image

executor/index_lookup_merge_join.go Outdated Show resolved Hide resolved
executor/index_lookup_merge_join.go Outdated Show resolved Hide resolved
executor/index_lookup_merge_join.go Outdated Show resolved Hide resolved
@@ -1529,33 +1529,108 @@ func (b *executorBuilder) buildIndexLookUpJoin(v *plannercore.PhysicalIndexJoin)
if defaultValues == nil {
defaultValues = make([]types.Datum, len(innerTypes))
}
outerKeyCols := make([]int, len(v.OuterJoinKeys))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would outerKeyColIdx be clearer?

// If the inner join keys are the suffix set of the index, then return the IndexLookUpMergeJoin
var isIndexScan bool
var is *plannercore.PhysicalIndexScan
if ir, ok := innerPlan.(*plannercore.PhysicalIndexReader); ok {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about innerPlan.(*plannercore.PhysicalTableReader) && IndexPlans[0].(*PhysicalTableScan) is true?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If code like this? How to get the IndexScan?

@lzmhhh123 lzmhhh123 force-pushed the dev/index_look_up_merge_join branch from f44d7d5 to b29e4ec Compare July 2, 2019 11:45
Signed-off-by: lzmhhh123 <lzmhhh123@gmail.com>
@lzmhhh123
Copy link
Member Author

ready to be reviewed. @XuHuaiyu @zz-jason @lamxTyler @eurekaka @qw4990

executor/index_lookup_merge_join.go Outdated Show resolved Hide resolved
executor/builder.go Outdated Show resolved Hide resolved
executor/builder.go Outdated Show resolved Hide resolved
sort.Slice(rows, func(i, j int) bool {
for id, joinKey := range omw.joinKeys {
cmp, _, err := omw.compareFuncs[id](omw.ctx, joinKey, joinKey, rows[i], rows[j])
terror.Log(err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will ignoring this error make the result wrong?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's as same as index_lookup_join. Look at

// We only compare rows with the same type, no error to return.

executor/index_lookup_merge_join.go Outdated Show resolved Hide resolved
@lzmhhh123 lzmhhh123 force-pushed the dev/index_look_up_merge_join branch from 2752248 to 4ad2c89 Compare July 15, 2019 06:44
"github.com/pingcap/tidb/util/memory"
"github.com/pingcap/tidb/util/ranger"
"go.uber.org/zap"
"golang.org/x/net/context"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can be removed.

if ir, ok := innerPlan.(*PhysicalIndexReader); ok {
is, isIndexScan = ir.IndexPlans[0].(*PhysicalIndexScan)
}
if ilr, ok := innerPlan.(*PhysicalIndexLookUpReader); ok {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we also consider PhysicalTableReader?

// or the outer join keys are NOT the prefix of the prop items,
// then we can't execute merge join.
if canKeepOuterOrder || needOuterSort {
is.KeepOrder = true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we also set IndexLookupReader or IndexReader to keep order?

}
// canKeepOuterOrder means whether the prop items are the prefix of the outer join keys.
canKeepOuterOrder := len(propItems) <= len(join.OuterJoinKeys)
for i := range propItems {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for i := 0; i < len(propItems) && canKeepOuterOrder; i++ {
}

// then we can't execute merge join.
if canKeepOuterOrder || needOuterSort {
is.KeepOrder = true
join.IsMergeJoin = true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about providing a new physical operator named IndexMergeJoin?

if ilr, ok := innerPlan.(*PhysicalIndexLookUpReader); ok {
is, isIndexScan = ilr.IndexPlans[0].(*PhysicalIndexScan)
}
if isIndexScan {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if !isIndexScan {
return
}

break
}
}
if isInnerKeysPrefix {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

if err != nil {
return err
}
if err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

duplicate

return 0, nil
}

func (imw *innerMergeWorker) constructDatumLookupKeys(task *lookUpMergeJoinTask) ([]*indexJoinLookUpContent, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we reuse the existing innerWorker.constructDatumLookupKey since the logic of these two functions are the same?

ranger.CutDatumByPrefixLen(&dLookUpKey.keys[i], imw.colLens[i], imw.rowTypes[imw.keyCols[i]])
}
}
// dLookUpKey is sorted and deduplicated at sortAndDedupLookUpContents.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

useless comment

outerRow := task.outerResult.GetRow(task.cursor)
if task.sameKeyIter == nil || task.sameKeyIter.Current() == task.sameKeyIter.End() {
cmpResult := -1
if len(task.outerMatch) > 0 && !task.outerMatch[task.cursor] {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will len(task.outerMatch) == 0?

return nil
}

task.innerIter = chunk.NewIterator4Chunk(task.innerResult.GetChunk(task.innerCursor))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use task.innerIter = chunk.NewIterator4List ?

}

task.innerIter = chunk.NewIterator4Chunk(task.innerResult.GetChunk(task.innerCursor))
task.innerIter.Begin()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

innerRow := task.innerIter.Begin()

if task.sameKeyIter == nil || task.sameKeyIter.Current() == task.sameKeyIter.End() {
cmpResult := -1
if len(task.outerMatch) > 0 && !task.outerMatch[task.cursor] {
task.cursor++
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use outerIter instead of cursor?

task.sameKeyRows = task.sameKeyRows[:0]
task.sameKeyIter = chunk.NewIterator4Slice(task.sameKeyRows)
task.sameKeyIter.Begin()
if task.innerCursor >= task.innerResult.NumChunks() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move this to the first line

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants