Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Aggregation with sort and limit is not push down sort and limit parameter to TiKV #3938

Closed
blacktear23 opened this issue Jul 28, 2017 · 4 comments

Comments

@blacktear23
Copy link
Contributor

blacktear23 commented Jul 28, 2017

Please answer these questions before submitting your issue. Thanks!

  1. What did you do?

Execute a SQL with aggregation, sort and limit got a plan not efficient.

Table structure:

CREATE TABLE `indextest` (
  `name` varchar(255) NOT NULL,
  `success` int(11) NOT NULL,
  `date` int(11) NOT NULL,
  KEY `idx_date` (`date`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin

Execute SQL:

explain select sum(success), name from indextest group by name order by sum(success) desc limit 0, 10;

If indextest has more than 10000000 distinct name so the plan will execute like:

1. TiKV do aggregation and send all data to TiDB
2. TiDB do hash aggregation
3. TiDB do sort and limit

step 1 will cause TiKV send huge data to TiDB, but actually we just need first 10 records.

  1. What did you expect to see?

The TableScan plan should include sort and limit like SELECT statements.

  1. What did you see instead?

Execution Plan

*************************** 1. row ***************************
      ID: TableScan_8
    Json: {
    "db": "test",
    "table": "indextest",
    "desc": false,
    "keep order": false,
    "push down info": {
        "limit": 0,
        "aggregated push down": true,
        "gby items": [
            "test.indextest.name"
        ],
        "agg funcs": [
            "sum(test.indextest.success)",
            "firstrow(test.indextest.name)"
        ],
        "access conditions": null,
        "index filter conditions": null,
        "table filter conditions": null
    }
}
ParentID: HashAgg_9
*************************** 2. row ***************************
      ID: HashAgg_9
    Json: {
    "AggFuncs": [
        "sum([test.indextest.success])",
        "firstrow([test.indextest.name])"
    ],
    "GroupByItems": [
        "[test.indextest.name]"
    ],
    "child": "TableScan_8"
}
ParentID: Sort_4
*************************** 3. row ***************************
      ID: Sort_4
    Json: {
    "exprs": [
        {
            "Expr": "sel_agg_3",
            "Desc": true
        }
    ],
    "limit": 10,
    "child": "HashAgg_9"
}
ParentID:
  1. What version of TiDB are you using (tidb-server -V)?

Git Commit Hash: e94996a
UTC Build Time: 2017-07-28 10:41:27

@blacktear23 blacktear23 changed the title Aggregation with sort and limit is not push down to TiKV Aggregation with sort and limit is not push down sort and limit parameter to TiKV Jul 28, 2017
@hanfei1991
Copy link
Member

Hay, if TableScan plan should include sort and limit like SELECT statements., we may get wrong result.

@blacktear23
Copy link
Contributor Author

@hanfei1991 I got it. Is there has some method to optimize this problem? I got a bad performance when do aggregation like this.

@blacktear23
Copy link
Contributor Author

If SQL without order by just has a limit can we add limit info to push down?

@hanfei1991
Copy link
Member

We will support StreamAggregation to keep the pipeline running when the group by key is also an index key. It will be a great progress when agg meets a limit operator. @zimulala will do this work next month. please keep patient.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants