Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

partition pruning not effective on date scenarios #19941

Open
ghost opened this issue Sep 10, 2020 · 8 comments · Fixed by #20064
Open

partition pruning not effective on date scenarios #19941

ghost opened this issue Sep 10, 2020 · 8 comments · Fixed by #20064

Comments

@ghost
Copy link

ghost commented Sep 10, 2020

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (
 d date NOT NULL
) PARTITION BY RANGE (YEAR(d)) (
 PARTITION p2016 VALUES LESS THAN (2017),
 PARTITION p2017 VALUES LESS THAN (2018),
 PARTITION p2018 VALUES LESS THAN (2019),
 PARTITION p2019 VALUES LESS THAN (2020),
 PARTITION pmax VALUES LESS THAN MAXVALUE
);

INSERT INTO t1 VALUES 
 ('2016-01-01'),
 ('2016-06-01'),
 ('2016-09-01'),
 ('2017-01-01'),
 ('2017-06-01'),
 ('2017-09-01'),
 ('2018-01-01'),
 ('2018-06-01'),
 ('2018-09-01'),
 ('2019-01-01'),
 ('2019-06-01'),
 ('2019-09-01'),
 ('2020-01-01'),
 ('2020-06-01'),
 ('2020-09-01');

EXPLAIN SELECT * FROM t1 WHERE d BETWEEN '2018-01-01 00:00' AND '2018-12-31 23:59:59';
EXPLAIN SELECT COUNT(*) FROM t1 WHERE d >= '2018-01-01' AND d < '2019-01-01';

2. What did you expect to see? (Required)

In MySQL (see "partitions" column):

mysql [localhost:5731] {msandbox} (test) > EXPLAIN SELECT * FROM t1 WHERE d BETWEEN '2018-01-01 00:00' AND '2018-12-31 23:59:59';
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | t1    | p2018      | ALL  | NULL          | NULL | NULL    | NULL |    3 |    33.33 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

mysql [localhost:5731] {msandbox} (test) > EXPLAIN SELECT COUNT(*) FROM t1 WHERE d >= '2018-01-01' AND d < '2019-01-01';
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | t1    | p2018      | ALL  | NULL          | NULL | NULL    | NULL |    3 |    33.33 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

3. What did you see instead (Required)

In TiDB (see "access object"):

mysql> EXPLAIN SELECT * FROM t1 WHERE d BETWEEN '2018-01-01 00:00' AND '2018-12-31 23:59:59';
+-------------------------+----------+-----------+---------------+--------------------------------------------------------------------------------------+
| id                      | estRows  | task      | access object | operator info                                                                        |
+-------------------------+----------+-----------+---------------+--------------------------------------------------------------------------------------+
| TableReader_7           | 250.00   | root      | partition:all | data:Selection_6                                                                     |
| └─Selection_6           | 250.00   | cop[tikv] |               | ge(test.t1.d, 2018-01-01 00:00:00.000000), le(test.t1.d, 2018-12-31 23:59:59.000000) |
|   └─TableFullScan_5     | 10000.00 | cop[tikv] | table:t1      | keep order:false, stats:pseudo                                                       |
+-------------------------+----------+-----------+---------------+--------------------------------------------------------------------------------------+
3 rows in set (0.00 sec)

mysql> EXPLAIN SELECT COUNT(*) FROM t1 WHERE d >= '2018-01-01' AND d < '2019-01-01';
+------------------------------+----------+-----------+---------------+--------------------------------------------------------------------------------------+
| id                           | estRows  | task      | access object | operator info                                                                        |
+------------------------------+----------+-----------+---------------+--------------------------------------------------------------------------------------+
| StreamAgg_20                 | 1.00     | root      |               | funcs:count(Column#5)->Column#3                                                      |
| └─TableReader_21             | 1.00     | root      | partition:all | data:StreamAgg_9                                                                     |
|   └─StreamAgg_9              | 1.00     | cop[tikv] |               | funcs:count(1)->Column#5                                                             |
|     └─Selection_19           | 250.00   | cop[tikv] |               | ge(test.t1.d, 2018-01-01 00:00:00.000000), lt(test.t1.d, 2019-01-01 00:00:00.000000) |
|       └─TableFullScan_18     | 10000.00 | cop[tikv] | table:t1      | keep order:false, stats:pseudo                                                       |
+------------------------------+----------+-----------+---------------+--------------------------------------------------------------------------------------+
5 rows in set (0.00 sec)

4. What is your TiDB version? (Required)

mysql> SELECT tidb_version()\G
*************************** 1. row ***************************
tidb_version(): Release Version: v4.0.0-beta.2-1194-gc022fdcae
Edition: Community
Git Commit Hash: c022fdcaebd311f59f5c179b2a6a2f7e808a3340
Git Branch: master
UTC Build Time: 2020-09-09 12:13:26
GoVersion: go1.13
Race Enabled: false
TiKV Min Version: v3.0.0-60965b006877ca7234adaced7890d7b029ed1306
Check Table Before Drop: false
1 row in set (0.00 sec)
@ghost ghost added the type/bug This issue is a bug. label Sep 10, 2020
@ghost
Copy link
Author

ghost commented Sep 10, 2020

@imtbkcat @tiancaiamao PTAL

@imtbkcat imtbkcat self-assigned this Sep 11, 2020
@ghost ghost added this to Issue Backlog: Need Triage in SIG Planner Kanban via automation Sep 14, 2020
@ghost ghost mentioned this issue Sep 23, 2020
9 tasks
SIG Planner Kanban automation moved this from Issue Backlog: Need Triage to Coding Finished (This Week) Sep 25, 2020
@sre-bot
Copy link
Contributor

sre-bot commented Sep 27, 2020

Integrity check:
component severity RCA symptom trigger_condition affect_version fix_version fields are empty
@imtbkcat
Please comment /info to get template

@ghost
Copy link
Author

ghost commented Sep 27, 2020

This looks mostly fixed, with the exception of TiDB needing to access {p2018, p2019}. MySQL can prune this to just p2018 - is there an off-by-one error in pruning?

mysql> EXPLAIN SELECT COUNT(*) FROM t1 WHERE d >= '2018-01-01' AND d < '2019-01-01';
+------------------------------------+----------+-----------+---------------------------+--------------------------------------------------------------------------------------+
| id                                 | estRows  | task      | access object             | operator info                                                                        |
+------------------------------------+----------+-----------+---------------------------+--------------------------------------------------------------------------------------+
| HashAgg_15                         | 1.00     | root      |                           | funcs:count(Column#4)->Column#3                                                      |
| └─PartitionUnion_16                | 2.00     | root      |                           |                                                                                      |
|   ├─StreamAgg_31                   | 1.00     | root      |                           | funcs:count(Column#6)->Column#4                                                      |
|   │ └─TableReader_32               | 1.00     | root      |                           | data:StreamAgg_20                                                                    |
|   │   └─StreamAgg_20               | 1.00     | cop[tikv] |                           | funcs:count(1)->Column#6                                                             |
|   │     └─Selection_30             | 250.00   | cop[tikv] |                           | ge(test.t1.d, 2018-01-01 00:00:00.000000), lt(test.t1.d, 2019-01-01 00:00:00.000000) |
|   │       └─TableFullScan_29       | 10000.00 | cop[tikv] | table:t1, partition:p2018 | keep order:false, stats:pseudo                                                       |
|   └─StreamAgg_50                   | 1.00     | root      |                           | funcs:count(Column#8)->Column#4                                                      |
|     └─TableReader_51               | 1.00     | root      |                           | data:StreamAgg_39                                                                    |
|       └─StreamAgg_39               | 1.00     | cop[tikv] |                           | funcs:count(1)->Column#8                                                             |
|         └─Selection_49             | 250.00   | cop[tikv] |                           | ge(test.t1.d, 2018-01-01 00:00:00.000000), lt(test.t1.d, 2019-01-01 00:00:00.000000) |
|           └─TableFullScan_48       | 10000.00 | cop[tikv] | table:t1, partition:p2019 | keep order:false, stats:pseudo                                                       |
+------------------------------------+----------+-----------+---------------------------+--------------------------------------------------------------------------------------+
12 rows in set (0.00 sec)

mysql> EXPLAIN SELECT COUNT(*) FROM t1 WHERE d >= '2018-01-01' AND d < '2018-12-31';
+----------------------------+---------+-----------+---------------------------+--------------------------------------------------------------------------------------+
| id                         | estRows | task      | access object             | operator info                                                                        |
+----------------------------+---------+-----------+---------------------------+--------------------------------------------------------------------------------------+
| StreamAgg_12               | 1.00    | root      |                           | funcs:count(1)->Column#3                                                             |
| └─TableReader_19           | 0.07    | root      |                           | data:Selection_18                                                                    |
|   └─Selection_18           | 0.07    | cop[tikv] |                           | ge(test.t1.d, 2018-01-01 00:00:00.000000), lt(test.t1.d, 2018-12-31 00:00:00.000000) |
|     └─TableFullScan_17     | 3.00    | cop[tikv] | table:t1, partition:p2018 | keep order:false, stats:pseudo                                                       |
+----------------------------+---------+-----------+---------------------------+--------------------------------------------------------------------------------------+
4 rows in set (0.01 sec)

mysql> EXPLAIN SELECT COUNT(*) FROM t1 WHERE d >= '2018-01-01' AND d <= '2018-12-31';
+----------------------------+---------+-----------+---------------------------+--------------------------------------------------------------------------------------+
| id                         | estRows | task      | access object             | operator info                                                                        |
+----------------------------+---------+-----------+---------------------------+--------------------------------------------------------------------------------------+
| StreamAgg_12               | 1.00    | root      |                           | funcs:count(1)->Column#3                                                             |
| └─TableReader_19           | 0.07    | root      |                           | data:Selection_18                                                                    |
|   └─Selection_18           | 0.07    | cop[tikv] |                           | ge(test.t1.d, 2018-01-01 00:00:00.000000), le(test.t1.d, 2018-12-31 00:00:00.000000) |
|     └─TableFullScan_17     | 3.00    | cop[tikv] | table:t1, partition:p2018 | keep order:false, stats:pseudo                                                       |
+----------------------------+---------+-----------+---------------------------+--------------------------------------------------------------------------------------+
4 rows in set (0.00 sec)

@ghost ghost reopened this Sep 27, 2020
SIG Planner Kanban automation moved this from Coding Finished (This Week) to Issue Backlog: Need Triage Sep 27, 2020
@pingcap pingcap deleted a comment from ti-srebot Nov 4, 2020
@seiya-annie
Copy link

/info

@ti-srebot
Copy link
Contributor

Please edit this comment to complete the following information

Not a bug

  1. Remove the 'type/bug' label
  2. Add notes to indicate why it is not a bug

Duplicate bug

  1. Add the 'type/duplicate' label
  2. Add the link to the original bug

Bug

Note: Make Sure that 'component', and 'severity' labels are added
Example for how to fill out the template: #20100

1. Root Cause Analysis (RCA) (optional)

2. Symptom (optional)

3. All Trigger Conditions (optional)

4. Workaround (optional)

5. Affected versions

6. Fixed versions

@mjonss
Copy link
Contributor

mjonss commented Feb 14, 2022

Same with dynamic prune mode:

tidb> set @@session.tidb_partition_prune_mode = dynamic;
Query OK, 0 rows affected (0,00 sec)

tidb> EXPLAIN SELECT COUNT(*) FROM t1 WHERE d >= '2018-01-01' AND d < '2019-01-01';
+------------------------------+----------+-----------+-----------------------+--------------------------------------------------------------------------------------+
| id                           | estRows  | task      | access object         | operator info                                                                        |
+------------------------------+----------+-----------+-----------------------+--------------------------------------------------------------------------------------+
| StreamAgg_20                 | 1.00     | root      |                       | funcs:count(Column#5)->Column#3                                                      |
| └─TableReader_21             | 1.00     | root      | partition:p2018,p2019 | data:StreamAgg_9                                                                     |
|   └─StreamAgg_9              | 1.00     | cop[tikv] |                       | funcs:count(1)->Column#5                                                             |
|     └─Selection_19           | 250.00   | cop[tikv] |                       | ge(test.t1.d, 2018-01-01 00:00:00.000000), lt(test.t1.d, 2019-01-01 00:00:00.000000) |
|       └─TableFullScan_18     | 10000.00 | cop[tikv] | table:t1              | keep order:false, stats:pseudo                                                       |
+------------------------------+----------+-----------+-----------------------+--------------------------------------------------------------------------------------+
5 rows in set (0,00 sec)

@mjonss
Copy link
Contributor

mjonss commented Feb 14, 2022

The difference between MySQL and TiDB is that MySQL will check the 'transitions' for YEAR() function, so MySQL will prune correctly for < '2019-01-01' while TiDB will not. Notice that this is only for the "first transition value" and TiDB will prune correctly for < '2019-01-02 (which includes the p2019 partition).

@mjonss
Copy link
Contributor

mjonss commented Feb 14, 2022

/component tablepartition

@ti-chi-bot ti-chi-bot added the component/tablepartition This issue is related to Table Partition of TiDB. label Feb 14, 2022
@morgo morgo removed their assignment May 24, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component/tablepartition This issue is related to Table Partition of TiDB. type/enhancement type/performance
Projects
Partition Table
  
Backlogs in Sprint 3
SIG Planner Kanban
  
Issue Backlog: Need Triage
Development

Successfully merging a pull request may close this issue.

9 participants