Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

expression, plan: extract the same part from DNF's leaves. #5831

Merged
merged 19 commits into from
Feb 26, 2018

Conversation

winoros
Copy link
Member

@winoros winoros commented Feb 8, 2018

e.g.
select * from t1, t2 where (t1.a=t2.a and ...) or (t1.a=t2.a and ...) or (t1.a=t2.a and ...)
We should extract the t1.a=t2.a out of the DNF, so it can be used as the join's join key condition.
And this let tidb push more predicates across the join.

I'll add enough tests in next commit.
Unit test is not easy to see what predicates join holds, add explain tests in integration test is a better choice i think.
Unit test added.

PTAL @hanfei1991 @lamxTyler @zz-jason

}
v := make([]types.Datum, 0, len(sf.GetArgs())+1)
var err error
sf.hashcode, err = codec.EncodeValue(sc, nil, types.NewStringDatum(sf.FuncName.L))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we'd better avoid to use Datum, it's not efficient.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any good suggestion? This is just some code i recovered since it has been deleted once.
I came up with the idea that:

Constant: ConstantFlag+encoded value
Column: ColumnFlag+encoded value
ScalarFunction: SFFlag+encoded function name + 0(or other flag byte) + encoded arg_1 + 0 + ... 

This way column and scalarFunction can call EncodeBytes([]byte) directly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for ScalaFunction: "SFFlag+encoded function name + encoded arg0 + encoded arg1 ... " is distinguishable ?

Copy link
Member Author

@winoros winoros Feb 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're right. I'll do this in later pr.

@shenli
Copy link
Member

shenli commented Feb 8, 2018

Good job!

terror.Log(errors.Trace(err))
}
c.hashcode, err = codec.EncodeValue(sc, nil, c.Value)
terror.Log(errors.Trace(err))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we check if the err is nil ?

Copy link
Contributor

@alivxxx alivxxx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that it is better to put the test code as close as possible to the code itself so that future modifications can find if there is something wrong ASAP. We can just write the unit test for extractFiltersFromDNFs.

if cnt < len(dnfItems) {
delete(hashcode2Expr, hashcode)
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is no such item, we can just finish here?

hashcode2Expr := make(map[string]expression.Expression)
firstRun := true
for _, dnfItem := range dnfItems {
// We need this because there may be the case like `select * from t, t1 where (t.a=t1.a and t.a=t1.a) or (something).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just make sure that no such thing will happen by eliminating them at an earlier stage?

Copy link
Member Author

@winoros winoros Feb 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, what you say is a better practice.

@winoros
Copy link
Member Author

winoros commented Feb 9, 2018

/run-sqllogic-test

@winoros
Copy link
Member Author

winoros commented Feb 9, 2018

For TPCH-10G 19.sql execution time from N/A to 20s:

mysql> select
    sum(l_extendedprice* (1 - l_discount)) as revenue
from
    lineitem,
    part
where
    (
        p_partkey = l_partkey
        and p_brand = 'Brand#52'
        and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
        and l_quantity >= 4 and l_quantity <= 4 + 10
        and p_size between 1 and 5
        and l_shipmode in ('AIR', 'AIR REG')
        and l_shipinstruct = 'DELIVER IN PERSON'
    )
    or
    (
        p_partkey = l_partkey
        and p_brand = 'Brand#11'
        and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
        and l_quantity >= 18 and l_quantity <= 18 + 10
        and p_size between 1 and 10
        and l_shipmode in ('AIR', 'AIR REG')
        and l_shipinstruct = 'DELIVER IN PERSON'
    )
    or
    (
        p_partkey = l_partkey
        and p_brand = 'Brand#51'
        and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
        and l_quantity >= 29 and l_quantity <= 29 + 10
        and p_size between 1 and 15
     ->     sum(l_extendedprice* (1 - l_discount)) as revenue
    -> from
    ->     lineitem,
    ->     part
    -> where
    ->     (
    ->         p_partkey = l_partkey
    ->         and p_brand = 'Brand#52'
    ->         and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
    ->         and l_quantity >= 4 and l_quantity <= 4 + 10
    ->         and p_size between 1 and 5
    ->         and l_shipmode in ('AIR', 'AIR REG')
    ->         and l_shipinstruct = 'DELIVER IN PERSON'
    ->     )
    ->     or
    ->     (
    ->         p_partkey = l_partkey
    ->         and p_brand = 'Brand#11'
    ->         and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
    ->         and l_quantity >= 18 and l_quantity <= 18 + 10
    ->         and p_size between 1 and 10
    ->         and l_shipmode in ('AIR', 'AIR REG')
    ->         and l_shipinstruct = 'DELIVER IN PERSON'
    ->     )
    ->     or
    ->     (
    ->         p_partkey = l_partkey
    ->         and p_brand = 'Brand#51'
    ->         and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
    ->         and l_quantity >= 29 and l_quantity <= 29 + 10
    ->         and p_size between 1 and 15
    ->         and l_shipmode in ('AIR', 'AIR REG')
    ->         and l_shipinstruct = 'DELIVER IN PERSON'
    ->     );
+---------------+
| revenue       |
+---------------+
| 39074546.6022 |
+---------------+
1 row in set (20.04 sec)

@winoros winoros changed the title expression, plan: extract the same filters from join's DNF conditions. expression, plan: extract the same part from DNF's leaves. Feb 9, 2018
v := make([]types.Datum, 0, len(sf.GetArgs())+1)
var err error
sf.hashcode, err = codec.EncodeValue(sc, nil, types.NewStringDatum(sf.FuncName.L))
terror.Log(errors.Trace(err))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check if err is not nil ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a check inside the terror.Log

v = append(v, types.NewBytesDatum(arg.HashCode(sc)))
}
sf.hashcode = sf.hashcode[:0]
sf.hashcode, err = codec.EncodeValue(sc, sf.hashcode, v...)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto.

// We need this because there may be the case like `select * from t, t1 where (t.a=t1.a and t.a=t1.a) or (something).
// We should make sure that the two `t.a=t1.a` contributes only once.
innerMap := make(map[string]struct{})
if sf, ok := dnfItem.(*ScalarFunction); ok && sf.FuncName.L == ast.LogicAnd {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to check if it is a real cnf. It's all right to flatten conditions directly and loop it.

}
newDNFItems := make([]Expression, 0, len(dnfItems))
for _, dnfItem := range dnfItems {
if sf, ok := dnfItem.(*ScalarFunction); ok && sf.FuncName.L == ast.LogicAnd {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

if len(newCNFItems) == 1 {
newDNFItems = append(newDNFItems, newCNFItems[0])
} else if len(newCNFItems) > 1 {
newDNFItems = append(newDNFItems, ComposeCNFCondition(ctx, newCNFItems...))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ComposeCNF can also handle single condition

case 1:
return extractedExpr, newDNFItems[0]
default:
return extractedExpr, ComposeDNFCondition(ctx, newDNFItems...)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ComposeDNF also process the nil condition and single condition

@@ -273,3 +274,104 @@ func Contains(exprs []Expression, e Expression) bool {
}
return false
}

// ExtractFiltersFromDNFs checks whether the cond is DNF and extract the same expression in its leaf items.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Explain the returning values.

codeMap := make(map[string]int)
hashcode2Expr := make(map[string]Expression)
firstRun := true
for _, dnfItem := range dnfItems {
Copy link
Member

@hanfei1991 hanfei1991 Feb 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use an index i and check if i == 0, so that we needn't firstRun.

hashcode2Expr := make(map[string]Expression)
firstRun := true
for _, dnfItem := range dnfItems {
// We need this because there may be the case like `select * from t, t1 where (t.a=t1.a and t.a=t1.a) or (something).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move this comments to line 313

exprStr string
result string
remained string
resultStr string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The above two fields are not used.

},
{
exprStr: "a = 1 or a = 1 or (a = 1 and b = 1)",
result: "[eq(test.t.a, 1) eq(test.t.b, 1)]",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not correct? It should be a = 1, not a = 1 and b = 1.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, the extracted part is a = 1, and the remained part is b = 1. So the result is a = 1 and b = 1

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider the assignement that a = 1, b = 2, it is true for a = 1 or a = 1 or (a = 1 and b = 1), but false for a = 1 and b =1.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now it's fixed.

},
{
exprStr: "(a = 1 and b = 1) or (a = 1 and b = 1 and c = 1) or (a = 1 and b = 1 and c > 2 and c < 3)",
result: "[eq(test.t.a, 1) eq(test.t.b, 1) or(eq(test.t.c, 1), and(gt(test.t.c, 2), lt(test.t.c, 3)))]",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not correct too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The extracted part is a = 1 and b = 1, the remained part is c = 1 or (c > 2 and c < 3).
So the result is a = 1 and b = 1 and (c = 1 or (c > 2 and c < 3)).

@shenli
Copy link
Member

shenli commented Feb 23, 2018

@winoros Please resolve the conflicts.

@@ -79,6 +80,9 @@ type Expression interface {

// ExplainInfo returns operator information to be explained.
ExplainInfo() string

// HashCode creates the hashcode for expression.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add more comments here. What is this function used for.


// extractFiltersFromDNF extracts the same condition that occurs in every DNF item and remove them from dnf leaves.
func extractFiltersFromDNF(ctx context.Context, dnfFunc *ScalarFunction) ([]Expression, Expression) {
dnfItems := FlattenDNFConditions(dnfFunc)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If len(dnfItems) == 1, could we skip the following logic?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or expression cannot have only one leaf.

@shenli
Copy link
Member

shenli commented Feb 23, 2018

/run-all-tests

1 similar comment
@iamxy
Copy link
Member

iamxy commented Feb 23, 2018

/run-all-tests

@shenli
Copy link
Member

shenli commented Feb 24, 2018

Please fix the CI.

Copy link
Contributor

@alivxxx alivxxx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@alivxxx alivxxx added the status/LGT1 Indicates that a PR has LGTM 1. label Feb 24, 2018
@shenli
Copy link
Member

shenli commented Feb 26, 2018

/run-all-tests

@@ -273,3 +274,84 @@ func Contains(exprs []Expression, e Expression) bool {
}
return false
}

// ExtractFiltersFromDNFs checks whether the cond is DNF and extract the same expression in its leaf items and return them as a slice.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment is not correct. This function not only extract&return common conditions.

@shenli
Copy link
Member

shenli commented Feb 26, 2018

LGTM

@winoros winoros added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Feb 26, 2018
@winoros winoros merged commit cd63332 into pingcap:master Feb 26, 2018
@winoros winoros deleted the join-filters branch February 26, 2018 08:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status/LGT2 Indicates that a PR has LGTM 2.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants