-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
expression, plan: extract the same part from DNF's leaves. #5831
Conversation
} | ||
v := make([]types.Datum, 0, len(sf.GetArgs())+1) | ||
var err error | ||
sf.hashcode, err = codec.EncodeValue(sc, nil, types.NewStringDatum(sf.FuncName.L)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we'd better avoid to use Datum
, it's not efficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any good suggestion? This is just some code i recovered since it has been deleted once.
I came up with the idea that:
Constant: ConstantFlag+encoded value
Column: ColumnFlag+encoded value
ScalarFunction: SFFlag+encoded function name + 0(or other flag byte) + encoded arg_1 + 0 + ...
This way column and scalarFunction can call EncodeBytes([]byte)
directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for ScalaFunction: "SFFlag+encoded function name + encoded arg0 + encoded arg1 ... " is distinguishable ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you're right. I'll do this in later pr.
Good job! |
expression/constant.go
Outdated
terror.Log(errors.Trace(err)) | ||
} | ||
c.hashcode, err = codec.EncodeValue(sc, nil, c.Value) | ||
terror.Log(errors.Trace(err)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we check if the err is nil ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that it is better to put the test code as close as possible to the code itself so that future modifications can find if there is something wrong ASAP. We can just write the unit test for extractFiltersFromDNFs
.
plan/predicate_push_down.go
Outdated
if cnt < len(dnfItems) { | ||
delete(hashcode2Expr, hashcode) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there is no such item, we can just finish here?
plan/predicate_push_down.go
Outdated
hashcode2Expr := make(map[string]expression.Expression) | ||
firstRun := true | ||
for _, dnfItem := range dnfItems { | ||
// We need this because there may be the case like `select * from t, t1 where (t.a=t1.a and t.a=t1.a) or (something). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just make sure that no such thing will happen by eliminating them at an earlier stage?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, what you say is a better practice.
/run-sqllogic-test |
For TPCH-10G 19.sql execution time from N/A to 20s: mysql> select
sum(l_extendedprice* (1 - l_discount)) as revenue
from
lineitem,
part
where
(
p_partkey = l_partkey
and p_brand = 'Brand#52'
and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
and l_quantity >= 4 and l_quantity <= 4 + 10
and p_size between 1 and 5
and l_shipmode in ('AIR', 'AIR REG')
and l_shipinstruct = 'DELIVER IN PERSON'
)
or
(
p_partkey = l_partkey
and p_brand = 'Brand#11'
and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
and l_quantity >= 18 and l_quantity <= 18 + 10
and p_size between 1 and 10
and l_shipmode in ('AIR', 'AIR REG')
and l_shipinstruct = 'DELIVER IN PERSON'
)
or
(
p_partkey = l_partkey
and p_brand = 'Brand#51'
and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
and l_quantity >= 29 and l_quantity <= 29 + 10
and p_size between 1 and 15
-> sum(l_extendedprice* (1 - l_discount)) as revenue
-> from
-> lineitem,
-> part
-> where
-> (
-> p_partkey = l_partkey
-> and p_brand = 'Brand#52'
-> and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
-> and l_quantity >= 4 and l_quantity <= 4 + 10
-> and p_size between 1 and 5
-> and l_shipmode in ('AIR', 'AIR REG')
-> and l_shipinstruct = 'DELIVER IN PERSON'
-> )
-> or
-> (
-> p_partkey = l_partkey
-> and p_brand = 'Brand#11'
-> and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
-> and l_quantity >= 18 and l_quantity <= 18 + 10
-> and p_size between 1 and 10
-> and l_shipmode in ('AIR', 'AIR REG')
-> and l_shipinstruct = 'DELIVER IN PERSON'
-> )
-> or
-> (
-> p_partkey = l_partkey
-> and p_brand = 'Brand#51'
-> and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
-> and l_quantity >= 29 and l_quantity <= 29 + 10
-> and p_size between 1 and 15
-> and l_shipmode in ('AIR', 'AIR REG')
-> and l_shipinstruct = 'DELIVER IN PERSON'
-> );
+---------------+
| revenue |
+---------------+
| 39074546.6022 |
+---------------+
1 row in set (20.04 sec) |
v := make([]types.Datum, 0, len(sf.GetArgs())+1) | ||
var err error | ||
sf.hashcode, err = codec.EncodeValue(sc, nil, types.NewStringDatum(sf.FuncName.L)) | ||
terror.Log(errors.Trace(err)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check if err is not nil ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a check inside the terror.Log
v = append(v, types.NewBytesDatum(arg.HashCode(sc))) | ||
} | ||
sf.hashcode = sf.hashcode[:0] | ||
sf.hashcode, err = codec.EncodeValue(sc, sf.hashcode, v...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto.
expression/util.go
Outdated
// We need this because there may be the case like `select * from t, t1 where (t.a=t1.a and t.a=t1.a) or (something). | ||
// We should make sure that the two `t.a=t1.a` contributes only once. | ||
innerMap := make(map[string]struct{}) | ||
if sf, ok := dnfItem.(*ScalarFunction); ok && sf.FuncName.L == ast.LogicAnd { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need to check if it is a real cnf. It's all right to flatten conditions directly and loop it.
expression/util.go
Outdated
} | ||
newDNFItems := make([]Expression, 0, len(dnfItems)) | ||
for _, dnfItem := range dnfItems { | ||
if sf, ok := dnfItem.(*ScalarFunction); ok && sf.FuncName.L == ast.LogicAnd { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
expression/util.go
Outdated
if len(newCNFItems) == 1 { | ||
newDNFItems = append(newDNFItems, newCNFItems[0]) | ||
} else if len(newCNFItems) > 1 { | ||
newDNFItems = append(newDNFItems, ComposeCNFCondition(ctx, newCNFItems...)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ComposeCNF can also handle single condition
expression/util.go
Outdated
case 1: | ||
return extractedExpr, newDNFItems[0] | ||
default: | ||
return extractedExpr, ComposeDNFCondition(ctx, newDNFItems...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ComposeDNF also process the nil condition and single condition
expression/util.go
Outdated
@@ -273,3 +274,104 @@ func Contains(exprs []Expression, e Expression) bool { | |||
} | |||
return false | |||
} | |||
|
|||
// ExtractFiltersFromDNFs checks whether the cond is DNF and extract the same expression in its leaf items. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Explain the returning values.
expression/util.go
Outdated
codeMap := make(map[string]int) | ||
hashcode2Expr := make(map[string]Expression) | ||
firstRun := true | ||
for _, dnfItem := range dnfItems { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use an index i
and check if i == 0
, so that we needn't firstRun.
expression/util.go
Outdated
hashcode2Expr := make(map[string]Expression) | ||
firstRun := true | ||
for _, dnfItem := range dnfItems { | ||
// We need this because there may be the case like `select * from t, t1 where (t.a=t1.a and t.a=t1.a) or (something). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move this comments to line 313
expression/integration_test.go
Outdated
exprStr string | ||
result string | ||
remained string | ||
resultStr string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The above two fields are not used.
expression/integration_test.go
Outdated
}, | ||
{ | ||
exprStr: "a = 1 or a = 1 or (a = 1 and b = 1)", | ||
result: "[eq(test.t.a, 1) eq(test.t.b, 1)]", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not correct? It should be a = 1
, not a = 1 and b = 1
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, the extracted part is a = 1
, and the remained part is b = 1
. So the result is a = 1 and b = 1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider the assignement that a = 1, b = 2
, it is true for a = 1 or a = 1 or (a = 1 and b = 1)
, but false for a = 1 and b =1
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now it's fixed.
expression/integration_test.go
Outdated
}, | ||
{ | ||
exprStr: "(a = 1 and b = 1) or (a = 1 and b = 1 and c = 1) or (a = 1 and b = 1 and c > 2 and c < 3)", | ||
result: "[eq(test.t.a, 1) eq(test.t.b, 1) or(eq(test.t.c, 1), and(gt(test.t.c, 2), lt(test.t.c, 3)))]", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not correct too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The extracted part is a = 1 and b = 1
, the remained part is c = 1 or (c > 2 and c < 3)
.
So the result is a = 1 and b = 1 and (c = 1 or (c > 2 and c < 3))
.
@winoros Please resolve the conflicts. |
expression/expression.go
Outdated
@@ -79,6 +80,9 @@ type Expression interface { | |||
|
|||
// ExplainInfo returns operator information to be explained. | |||
ExplainInfo() string | |||
|
|||
// HashCode creates the hashcode for expression. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add more comments here. What is this function used for.
expression/util.go
Outdated
|
||
// extractFiltersFromDNF extracts the same condition that occurs in every DNF item and remove them from dnf leaves. | ||
func extractFiltersFromDNF(ctx context.Context, dnfFunc *ScalarFunction) ([]Expression, Expression) { | ||
dnfItems := FlattenDNFConditions(dnfFunc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If len(dnfItems) == 1, could we skip the following logic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or expression cannot have only one leaf.
/run-all-tests |
1 similar comment
/run-all-tests |
Please fix the CI. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
/run-all-tests |
expression/util.go
Outdated
@@ -273,3 +274,84 @@ func Contains(exprs []Expression, e Expression) bool { | |||
} | |||
return false | |||
} | |||
|
|||
// ExtractFiltersFromDNFs checks whether the cond is DNF and extract the same expression in its leaf items and return them as a slice. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment is not correct. This function not only extract&return common conditions.
LGTM |
e.g.
select * from t1, t2 where (t1.a=t2.a and ...) or (t1.a=t2.a and ...) or (t1.a=t2.a and ...)
We should extract the
t1.a=t2.a
out of the DNF, so it can be used as the join's join key condition.And this let tidb push more predicates across the join.
I'll add enough tests in next commit.Unit test is not easy to see what predicates join holds, add explain tests in integration test is a better choice i think.Unit test added.
PTAL @hanfei1991 @lamxTyler @zz-jason