util,executor: use MutableString as key for DecimalSet #9913

Merged
merged 9 commits into from Apr 1, 2019

XuHuaiyu
Contributor

@XuHuaiyu XuHuaiyu commented Mar 27, 2019

What problem does this PR solve?

fix #9900 (comment)

What is changed and how it works?

Use MutableString as the key for DecimalSet.
The root cause is the same as in #9901.
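
For readers without the context of #9901, here is a minimal sketch of why a canonical string key helps. It assumes the root cause is that numerically equal decimals (for example 1 and 1.00) can have different in-memory representations, so a set keyed on the raw value treats them as distinct. The `dec` type and `hashKey` function below are hypothetical stand-ins for illustration only; they are not TiDB's MyDecimal or ToHashKey.

```go
package main

import (
	"fmt"
	"strings"
)

// dec is a hypothetical, simplified decimal (NOT TiDB's MyDecimal):
// a string of digits without the decimal point, plus the number of
// fractional digits. Sign and leading zeros are ignored for brevity,
// and frac is assumed to be <= len(digits).
type dec struct {
	digits string // e.g. "100" for 1.00
	frac   int    // e.g. 2 for 1.00
}

// hashKey is a hypothetical canonical key: trailing fractional zeros are
// dropped so that numerically equal values produce the same string.
func hashKey(d dec) string {
	split := len(d.digits) - d.frac
	intPart := d.digits[:split]
	fracPart := strings.TrimRight(d.digits[split:], "0")
	if intPart == "" {
		intPart = "0"
	}
	if fracPart == "" {
		return intPart
	}
	return intPart + "." + fracPart
}

// countDistinctByStruct keys the set on the raw representation.
func countDistinctByStruct(vals []dec) int {
	seen := make(map[dec]struct{})
	for _, v := range vals {
		seen[v] = struct{}{}
	}
	return len(seen)
}

// countDistinctByKey keys the set on the canonical string instead.
func countDistinctByKey(vals []dec) int {
	seen := make(map[string]struct{})
	for _, v := range vals {
		seen[hashKey(v)] = struct{}{}
	}
	return len(seen)
}

func main() {
	vals := []dec{{"1", 0}, {"100", 2}, {"25", 1}} // 1, 1.00, 2.5
	fmt.Println(countDistinctByStruct(vals))       // 1 and 1.00 counted separately
	fmt.Println(countDistinctByKey(vals))          // 1 and 1.00 collapse to one key
}
```

With the raw representation as the key, 1 and 1.00 land in different buckets; with the canonical string they share one, which is the behavior a DISTINCT aggregate needs.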

Check List

Tests

  • Integration test

Code changes

  • Has exported function/method change

Side effects

  • Possible performance regression
    I've tested it using the TPC-H 10G dataset.
tidb [10.0.1.5]> desc lineitem;
+-----------------+---------------+------+------+---------+-------+
| Field           | Type          | Null | Key  | Default | Extra |
+-----------------+---------------+------+------+---------+-------+
...
| L_QUANTITY      | decimal(15,2) | NO   |      | NULL    |       |
...
+-----------------+---------------+------+------+---------+-------+

tidb [10.0.1.5]> select count(L_QUANTITY) from lineitem;
+-------------------+
| count(L_QUANTITY) |
+-------------------+
|          59986052 |
+-------------------+

tidb [10.0.1.5]> select count(distinct L_QUANTITY) from lineitem;
+----------------------------+
| count(distinct L_QUANTITY) |
+----------------------------+
|                         50 |
+----------------------------+

tidb [10.0.1.5]> select sum(distinct L_QUANTITY) from lineitem;
| Before this commit (agg phase / total phase) | After this commit (agg phase / total phase) | Performance regression (agg phase / total phase) |
| --- | --- | --- |
| 7.21s / 19s | 12.86s / 24.7s | 78% / 30% |
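
The regression percentages follow from the timings as (after - before) / before. A small sketch that reproduces them, where `regressionPercent` is a hypothetical helper name used only for this illustration:

```go
package main

import "fmt"

// regressionPercent reports how much slower `after` is than `before`,
// as a percentage of `before`.
func regressionPercent(before, after float64) float64 {
	return (after - before) / before * 100
}

func main() {
	// Timings in seconds, taken from the measurements above.
	fmt.Printf("agg phase:   %.0f%%\n", regressionPercent(7.21, 12.86))
	fmt.Printf("total phase: %.0f%%\n", regressionPercent(19, 24.7))
}
```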

Related changes

  • Need to cherry-pick to the release branch

@XuHuaiyu XuHuaiyu added type/bug-fix This PR fixes a bug. sig/execution SIG execution labels Mar 27, 2019
@XuHuaiyu
Contributor Author

XuHuaiyu commented Mar 27, 2019

wait for #9901

@codecov

codecov bot commented Mar 27, 2019

Codecov Report

Merging #9913 into master will increase coverage by 0.0047%.
The diff coverage is 75%.

@@               Coverage Diff                @@
##             master      #9913        +/-   ##
================================================
+ Coverage   77.5399%   77.5447%   +0.0047%     
================================================
  Files           404        403         -1     
  Lines         81772      81678        -94     
================================================
- Hits          63406      63337        -69     
+ Misses        13666      13642        -24     
+ Partials       4700       4699         -1

@XuHuaiyu
Contributor Author

/run-all-tests tidb-test=pr/582

@@ -38,7 +39,7 @@ type partialResult4SumDistinctFloat64 struct {

type partialResult4SumDistinctDecimal struct {
partialResult4SumDecimal
valSet set.DecimalSet
Contributor

How about keeping DecimalSet and using MyDecimal.ToHashKey to make DecimalSet.Exist correct?

Contributor Author

If we keep DecimalSet here, we'll call ToHashKey twice whenever DecimalSet.Exist() returns false: once in DecimalSet.Exist() and again in DecimalSet.Insert(). @qw4990
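
To illustrate the point about the double call, here is a rough sketch of the alternative: compute the key once per row in the caller and pass the same string to both Exist and Insert on a string-keyed set. Everything here is a stand-in for illustration: `stringSet`, `toHashKey`, and `sumDistinct` are hypothetical names, and int64 values are used in place of decimals to keep the example short.

```go
package main

import "fmt"

// stringSet is a hypothetical string-keyed set, not TiDB's set package.
type stringSet map[string]struct{}

func (s stringSet) Exist(k string) bool {
	_, ok := s[k]
	return ok
}

func (s stringSet) Insert(k string) {
	s[k] = struct{}{}
}

// hashCalls counts how often the key computation runs.
var hashCalls int

// toHashKey is a stand-in for a comparatively expensive canonical-key
// computation on a decimal value.
func toHashKey(v int64) string {
	hashCalls++
	return fmt.Sprintf("%d", v)
}

// sumDistinct computes the key once per row and reuses it for both the
// existence check and the insert.
func sumDistinct(rows []int64) int64 {
	seen := stringSet{}
	var sum int64
	for _, v := range rows {
		key := toHashKey(v)
		if seen.Exist(key) {
			continue
		}
		seen.Insert(key)
		sum += v
	}
	return sum
}

func main() {
	// Five rows, three distinct values: the key is computed five times,
	// not eight as it would be if Exist and Insert each hashed internally.
	fmt.Println(sumDistinct([]int64{1, 2, 2, 3, 1}), hashCalls)
}
```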

Member

Maybe we can add a method like InsertIfNotExists that returns whether the value was newly inserted.

One question: is the behavior of the current DecimalSet right?
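
A minimal sketch of what such an InsertIfNotExists method could look like on a string-keyed set. The `stringSet` type is hypothetical and is not the set type in TiDB; this only shows the shape of the API being proposed, not what was merged.

```go
package main

import "fmt"

// stringSet is a hypothetical string-keyed set used for illustration.
type stringSet map[string]struct{}

// InsertIfNotExists adds k to the set and reports whether it was newly
// inserted. The caller needs only one call, and one key, per value.
func (s stringSet) InsertIfNotExists(k string) bool {
	if _, ok := s[k]; ok {
		return false
	}
	s[k] = struct{}{}
	return true
}

func main() {
	s := stringSet{}
	fmt.Println(s.InsertIfNotExists("1.5")) // first time: newly inserted
	fmt.Println(s.InsertIfNotExists("1.5")) // second time: already present
}
```

The trade-off against precomputing the key in the caller is mostly one of API surface: both approaches avoid hashing the same value twice.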

Contributor Author

One question: is the behavior of the current DecimalSet right?

Did you find anything wrong? @winoros

Member

No, since DecimalSet is only used in the places you changed in this PR.

I just wonder whether there will be cases in the future that need the original DecimalSet.

Contributor Author

DecimalSet is not used anywhere else now. It was introduced when we were refactoring the agg. @winoros

@XuHuaiyu XuHuaiyu requested review from qw4990 and winoros and removed request for qw4990 March 29, 2019 06:54
@XuHuaiyu
Contributor Author

/run-all-tests

Member

@zz-jason zz-jason left a comment

LGTM

@zz-jason zz-jason added status/LGT1 Indicates that a PR has LGTM 1. status/all tests passed labels Mar 29, 2019
@XuHuaiyu
Contributor Author

XuHuaiyu commented Apr 1, 2019

PTAL @winoros

Contributor

@qw4990 qw4990 left a comment

LGTM

@zz-jason zz-jason added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Apr 1, 2019
@zz-jason zz-jason merged commit 833ccf8 into pingcap:master Apr 1, 2019
@zz-jason
Member

zz-jason commented Apr 1, 2019

@XuHuaiyu please cherry pick this PR to release-2.1

Labels
sig/execution SIG execution status/LGT2 Indicates that a PR has LGTM 2. type/bug-fix This PR fixes a bug.
Development

Successfully merging this pull request may close these issues.

wrong result when select count/sum(distinct ) from int_col union all decimal_col
4 participants