Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Temporary incorrect query results occur after create table. #44314

Closed
wd0517 opened this issue May 31, 2023 · 8 comments · Fixed by #44409
Closed

Temporary incorrect query results occur after create table. #44314

wd0517 opened this issue May 31, 2023 · 8 comments · Fixed by #44409

Comments

@wd0517
Copy link
Contributor

wd0517 commented May 31, 2023

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

CREATE DATABASE `django_tests` /*!40100 DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci */;
USE `django_tests`;

CREATE TABLE `queries_identifier` (
   `id` int(11) NOT NULL AUTO_INCREMENT,
   `name` varchar(100) COLLATE utf8mb4_general_ci NOT NULL,
   PRIMARY KEY (`id`) /*T![clustered_index] CLUSTERED */
 ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci;


 CREATE TABLE `queries_program` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `identifier_id` int(11) NOT NULL,
  PRIMARY KEY (`id`) /*T![clustered_index] CLUSTERED */,
  UNIQUE KEY `identifier_id` (`identifier_id`),
  CONSTRAINT `queries_program_identifier_id_70ff12a6_fk_queries_identifier_id` FOREIGN KEY (`identifier_id`) REFERENCES `django_tests`.`queries_identifier` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci;

CREATE TABLE `queries_channel` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `identifier_id` int(11) NOT NULL,
  PRIMARY KEY (`id`) /*T![clustered_index] CLUSTERED */,
  UNIQUE KEY `identifier_id` (`identifier_id`),
  CONSTRAINT `queries_channel_identifier_id_06ac3513_fk_queries_identifier_id` FOREIGN KEY (`identifier_id`) REFERENCES `django_tests`.`queries_identifier` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci;

INSERT  INTO queries_identifier(`id`, `name`) values(13, 'i1'), (14, 'i2'), (15, 'i3');
INSERT  INTO queries_program(`id`, `identifier_id`) values(8, 13), (9, 14);
INSERT  INTO queries_channel(`id`, `identifier_id`) values(5, 13);

SELECT `queries_identifier`.`id`, `queries_identifier`.`name` 
FROM `queries_identifier` 
LEFT OUTER JOIN `queries_channel` ON (`queries_identifier`.`id` = `queries_channel`.`identifier_id`) 
INNER JOIN `queries_program` ON (`queries_identifier`.`id` = `queries_program`.`identifier_id`) 
WHERE ((`queries_channel`.`id` = 5 AND `queries_program`.`id` = 9) OR `queries_program`.`id` = 8) 
ORDER BY `queries_identifier`.`id` ASC;

2. What did you expect to see? (Required)

+----+------+
| id | name |
+----+------+
| 13 | i1   |
+----+------+

3. What did you see instead (Required)

During a period of time after creating the table.

+----+------+
| id | name |
+----+------+
| 13 | i1   |
| 14 | i2   |
+----+------+

4. What is your TiDB version? (Required)

I can reproduce this bug in both 7.1.0 and 6.6.0.

Release Version: v7.1.0
Edition: Community
Git Commit Hash: 635a4362235e8a3c0043542e629532e3c7bb2756
Git Branch: heads/refs/tags/v7.1.0
UTC Build Time: 2023-05-30 10:50:03
GoVersion: go1.20.3
Race Enabled: false
TiKV Min Version: 6.2.0-alpha
Check Table Before Drop: false
Store: tikv
Release Version: v6.6.0-serverless
Edition: Community
Git Commit Hash: 5dbf0508b23c87a625e17bd3a992be17e7635c1d
Git Branch: release-6.6-serverless
UTC Build Time: 2023-05-31 10:36:07
GoVersion: go1.19.4
Race Enabled: false
TiKV Min Version: 6.1.0
Check Table Before Drop: false
Store: tikv
@wd0517 wd0517 added the type/bug This issue is a bug. label May 31, 2023
@ti-chi-bot ti-chi-bot bot added may-affects-5.1 This bug maybe affects 5.1.x versions. may-affects-5.2 This bug maybe affects 5.2.x versions. may-affects-5.3 This bug maybe affects 5.3.x versions. may-affects-5.4 This bug maybe affects 5.4.x versions. may-affects-6.1 labels Jun 2, 2023
@jebter jebter added the sig/execution SIG execution label Jun 2, 2023
@XuHuaiyu
Copy link
Contributor

XuHuaiyu commented Jun 2, 2023

tidb:4100 [django_tests]>  SELECT /*+ hash_join(queries_identifier)*/ `queries_identifier`.`id`, `queries_identifier`.`name` FROM `queries_identifier` LEFT OUTER JOIN `queries_channe                                                                                                                                 = `queries_program`.`identifier_id`) WHERE ((`queries_channel`.`id` = 5 AND `queries_program`.`id` = 9) OR `queries_program`.`id` = 8) ORDER BY `queries_identifier`.`id` ASC;
+----+------+
| id | name |
+----+------+
| 13 | i1   |
+----+------+
1 row in set (0.00 sec)

tidb:4100 [django_tests]> explain  SELECT /*+ hash_join(queries_identifier)*/ `queries_identifier`.`id`, `queries_identifier`.`name` FROM `queries_identifier` LEFT OUTER JOIN `queries_channel` ON (`queries_identifier`.`id` = `queries_channel`.`identifier_id`) INNER JOIN `queries_program` ON (`queries_identifier`.`id` = `queries_program`.`identifier_id`) WHERE ((`queries_channel`.`id` = 5 AND `queries_program`.`id` = 9) OR `queries_program`.`id` = 8) ORDER BY `queries_identifier`.`id` ASC;
+-----------------------------------+---------+-----------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| id                                | estRows | task      | access object            | operator info                                                                                                                                                                                                                                      |
+-----------------------------------+---------+-----------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Sort_12                           | 2.50    | root      |                          | django_tests.queries_identifier.id                                                                                                                                                                                                                 |
| └─HashJoin_15                     | 2.50    | root      |                          | inner join, equal:[eq(django_tests.queries_identifier.id, django_tests.queries_program.identifier_id)], other cond:or(and(eq(django_tests.queries_channel.id, 5), eq(django_tests.queries_program.id, 9)), eq(django_tests.queries_program.id, 8)) |
|   ├─Batch_Point_Get_35(Build)     | 2.00    | root      | table:queries_program    | handle:[8 9], keep order:false, desc:false                                                                                                                                                                                                         |
|   └─HashJoin_27(Probe)            | 3.00    | root      |                          | left outer join, equal:[eq(django_tests.queries_identifier.id, django_tests.queries_channel.identifier_id)]                                                                                                                                        |
|     ├─TableReader_32(Build)       | 1.00    | root      |                          | data:TableFullScan_31                                                                                                                                                                                                                              |
|     │ └─TableFullScan_31          | 1.00    | cop[tikv] | table:queries_channel    | keep order:false, stats:pseudo                                                                                                                                                                                                                     |
|     └─TableReader_30(Probe)       | 3.00    | root      |                          | data:TableFullScan_29                                                                                                                                                                                                                              |
|       └─TableFullScan_29          | 3.00    | cop[tikv] | table:queries_identifier | keep order:false, stats:pseudo                                                                                                                                                                                                                     |
+-----------------------------------+---------+-----------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
8 rows in set (0.00 sec)

@XuHuaiyu
Copy link
Contributor

XuHuaiyu commented Jun 2, 2023

drop stats queries_identifier;
drop stats queries_program;
drop stats queries_channel;


tidb:4100 [django_tests]> SELECT `queries_identifier`.`id`, `queries_identifier`.`name` FROM `queries_identifier` LEFT OUTER JOIN `queries_channel` ON (`queries_identifier`.`id` = `queries_channel`.`identifier_id`) INNER JOIN `queries_program` ON (`queries_identifier`.`id` = `queries_program`.`identifier_id`) WHERE ((`queries_channel`.`id` = 5 AND `queries_program`.`id` = 9) OR `queries_program`.`id` = 8) ORDER BY `queries_identifier`.`id` ASC;
+----+------+
| id | name |
+----+------+
| 13 | i1   |
| 14 | i2   |
+----+------+
2 rows in set (0.00 sec)

tidb:4100 [django_tests]> explain analyze SELECT `queries_identifier`.`id`, `queries_identifier`.`name` FROM `queries_identifier` LEFT OUTER JOIN `queries_channel` ON (`queries_identifier`.`id` = `queries_channel`.`identifier_id`) INNER JOIN `queries_program` ON (`queries_identifier`.`id` = `queries_program`.`identifier_id`) WHERE ((`queries_channel`.`id` = 5 AND `queries_program`.`id` = 9) OR `queries_program`.`id` = 8) ORDER BY `queries_identifier`.`id` ASC;
+---------------------------------------+---------+---------+-----------+-----------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+---------+
| id                                    | estRows | actRows | task      | access object                                             | execution info                                                                                                                                                                                                      | operator info                                                                                                                                                                                                                                                                                                                                                                        | memory    | disk    |
+---------------------------------------+---------+---------+-----------+-----------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+---------+
| Sort_13                               | 3.12    | 2       | root      |                                                           | time:555.8µs, loops:2, RU:0.000000                                                                                                                                                                                  | django_tests.queries_identifier.id                                                                                                                                                                                                                                                                                                                                                   | 1.01 KB   | 0 Bytes |
| └─IndexJoin_20                        | 3.12    | 2       | root      |                                                           | time:545.8µs, loops:2, inner:{total:138.1µs, concurrency:5, task:1, construct:4.94µs, fetch:127.2µs, build:1.24µs}, probe:20µs                                                                                      | left outer join, inner:IndexReader_19, outer key:django_tests.queries_identifier.id, inner key:django_tests.queries_channel.identifier_id, equal cond:eq(django_tests.queries_identifier.id, django_tests.queries_channel.identifier_id), other cond:or(and(eq(django_tests.queries_channel.id, 5), eq(django_tests.queries_program.id, 9)), eq(django_tests.queries_program.id, 8)) | 18.4 KB   | N/A     |
|   ├─IndexHashJoin_35(Build)           | 2.50    | 2       | root      |                                                           | time:338.3µs, loops:3, inner:{total:210.9µs, concurrency:5, task:1, construct:3.89µs, fetch:201.6µs, build:3.13µs, join:3.4µs}                                                                                      | inner join, inner:TableReader_30, outer key:django_tests.queries_program.identifier_id, inner key:django_tests.queries_identifier.id, equal cond:eq(django_tests.queries_program.identifier_id, django_tests.queries_identifier.id)                                                                                                                                                  | 41.3 KB   | N/A     |
|   │ ├─Batch_Point_Get_40(Build)       | 2.00    | 2       | root      | table:queries_program                                     | time:72.3µs, loops:3, BatchGet:{num_rpc:1, total_time:16.9µs}                                                                                                                                                       | handle:[8 9], keep order:false, desc:false                                                                                                                                                                                                                                                                                                                                           | N/A       | N/A     |
|   │ └─TableReader_30(Probe)           | 2.00    | 2       | root      |                                                           | time:141.8µs, loops:2, cop_task: {num: 1, max: 98.9µs, proc_keys: 0, rpc_num: 1, rpc_time: 88.9µs, copr_cache_hit_ratio: 0.00, build_task_duration: 17µs, max_distsql_concurrency: 1, max_extra_concurrency: 1}     | data:TableRangeScan_29                                                                                                                                                                                                                                                                                                                                                               | N/A       | N/A     |
|   │   └─TableRangeScan_29             | 2.00    | 2       | cop[tikv] | table:queries_identifier                                  | tikv_task:{time:78µs, loops:0}                                                                                                                                                                                      | range: decided by [django_tests.queries_program.identifier_id], keep order:false, stats:pseudo                                                                                                                                                                                                                                                                                       | N/A       | N/A     |
|   └─IndexReader_19(Probe)             | 2.50    | 1       | root      |                                                           | time:88µs, loops:2, cop_task: {num: 1, max: 60.2µs, proc_keys: 0, rpc_num: 1, rpc_time: 50.9µs, copr_cache_hit_ratio: 0.00, build_task_duration: 1.97µs, max_distsql_concurrency: 1}                                | index:IndexRangeScan_18                                                                                                                                                                                                                                                                                                                                                              | 218 Bytes | N/A     |
|     └─IndexRangeScan_18               | 2.50    | 1       | cop[tikv] | table:queries_channel, index:identifier_id(identifier_id) | tikv_task:{time:45µs, loops:0}                                                                                                                                                                                      | range: decided by [eq(django_tests.queries_channel.identifier_id, django_tests.queries_identifier.id)], keep order:false, stats:pseudo                                                                                                                                                                                                                                               | N/A       | N/A     |
+---------------------------------------+---------+---------+-----------+-----------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+---------+
8 rows in set (0.00 sec)

@XuHuaiyu
Copy link
Contributor

XuHuaiyu commented Jun 2, 2023

It seems like the correct plan should be the following plan shows. The where filters should be put into a Selection but not the other conditions of the outer join.

+-------------------------------------------+---------+---------+-----------+-----------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+---------+
| id                                        | estRows | actRows | task      | access object                                             | execution info                                                                                                                                                                                                       | operator info                                                                                                                                                                                                                            | memory    | disk    |
+-------------------------------------------+---------+---------+-----------+-----------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+---------+
| Sort_10                                   | 2.00    | 1       | root      |                                                           | time:487.2µs, loops:2, RU:0.000000                                                                                                                                                                                   | django_tests.queries_identifier.id                                                                                                                                                                                                       | 24.5 KB   | 0 Bytes |
| └─Projection_12                           | 2.00    | 1       | root      |                                                           | time:476.9µs, loops:2, Concurrency:OFF                                                                                                                                                                               | django_tests.queries_identifier.id, django_tests.queries_identifier.name                                                                                                                                                                 | 41.0 KB   | N/A     |
|   └─Selection_13                          | 2.00    | 1       | root      |                                                           | time:474.6µs, loops:2                                                                                                                                                                                                | or(and(eq(django_tests.queries_channel.id, 5), eq(django_tests.queries_program.id, 9)), eq(django_tests.queries_program.id, 8))                                                                                                          | 17.0 KB   | N/A     |
|     └─IndexJoin_17                        | 2.50    | 2       | root      |                                                           | time:460.9µs, loops:3, inner:{total:98.2µs, concurrency:5, task:1, construct:1.98µs, fetch:91.9µs, build:712ns}, probe:1.57µs                                                                                        | left outer join, inner:IndexReader_16, outer key:django_tests.queries_identifier.id, inner key:django_tests.queries_channel.identifier_id, equal cond:eq(django_tests.queries_identifier.id, django_tests.queries_channel.identifier_id) | 18.4 KB   | N/A     |
|       ├─IndexHashJoin_30(Build)           | 2.50    | 2       | root      |                                                           | time:318.1µs, loops:3, inner:{total:213.7µs, concurrency:5, task:1, construct:3.67µs, fetch:205.7µs, build:8.31µs, join:3.03µs}                                                                                      | inner join, inner:TableReader_25, outer key:django_tests.queries_program.identifier_id, inner key:django_tests.queries_identifier.id, equal cond:eq(django_tests.queries_program.identifier_id, django_tests.queries_identifier.id)      | 27.2 KB   | N/A     |
|       │ ├─Batch_Point_Get_33(Build)       | 2.00    | 2       | root      | table:queries_program                                     | time:54µs, loops:3, BatchGet:{num_rpc:1, total_time:13.4µs}                                                                                                                                                          | handle:[8 9], keep order:false, desc:false                                                                                                                                                                                               | N/A       | N/A     |
|       │ └─TableReader_25(Probe)           | 2.00    | 2       | root      |                                                           | time:149µs, loops:2, cop_task: {num: 1, max: 103.1µs, proc_keys: 0, rpc_num: 1, rpc_time: 91.1µs, copr_cache_hit_ratio: 0.00, build_task_duration: 18.6µs, max_distsql_concurrency: 1, max_extra_concurrency: 1}     | data:TableRangeScan_24                                                                                                                                                                                                                   | N/A       | N/A     |
|       │   └─TableRangeScan_24             | 2.00    | 2       | cop[tikv] | table:queries_identifier                                  | tikv_task:{time:79.2µs, loops:0}                                                                                                                                                                                     | range: decided by [django_tests.queries_program.identifier_id], keep order:false, stats:pseudo                                                                                                                                           | N/A       | N/A     |
|       └─IndexReader_16(Probe)             | 2.50    | 1       | root      |                                                           | time:65.5µs, loops:2, cop_task: {num: 1, max: 45.1µs, proc_keys: 0, rpc_num: 1, rpc_time: 37.6µs, copr_cache_hit_ratio: 0.00, build_task_duration: 2.28µs, max_distsql_concurrency: 1}                               | index:IndexRangeScan_15                                                                                                                                                                                                                  | 218 Bytes | N/A     |
|         └─IndexRangeScan_15               | 2.50    | 1       | cop[tikv] | table:queries_channel, index:identifier_id(identifier_id) | tikv_task:{time:32.6µs, loops:0}                                                                                                                                                                                     | range: decided by [eq(django_tests.queries_channel.identifier_id, django_tests.queries_identifier.id)], keep order:false, stats:pseudo                                                                                                   | N/A       | N/A     |
+-------------------------------------------+---------+---------+-----------+-----------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+---------+
10 rows in set (0.00 sec)

@XuHuaiyu XuHuaiyu added sig/planner SIG: Planner and removed sig/execution SIG execution labels Jun 2, 2023
@elsa0520
Copy link
Contributor

elsa0520 commented Jun 2, 2023

The main reason is that 'outer join order'. I disable the 'tidb_enable_outer_join_reorder' and the where condition appears on the inner join other condition which is correct.
When the 'tidb_enable_outer_join_reorder' is ON, the join order changed to
``` queries_identifier inner join `queries_program` left join `queries_channel`
and the where condition has been pushed down on left join other condition which is incorrect.

The error plan is following:
The where clause should not be pushed down when the under join is left outer join.
image

@winoros Please take a look ~

@elsa0520
Copy link
Contributor

elsa0520 commented Jun 2, 2023

/assign @winoros

@elsa0520
Copy link
Contributor

elsa0520 commented Jun 2, 2023

/affect v6.5 ~ master

@elsa0520
Copy link
Contributor

elsa0520 commented Jun 2, 2023

v6.4 the 'tidb_enable_outer_join_reorder' is OFF in default so it is not affected

@AilinKid
Copy link
Contributor

AilinKid commented Jun 5, 2023

mock the below code in rule_join_reorder_greedy.go L60 to make sure reproduce is stable.

	if strings.HasPrefix(s.ctx.GetSessionVars().StmtCtx.OriginalSQL, "explain SELECT `queries_identifier`.`id`") {
		fmt.Println(1)
	}
	// mock code to let channel table has the most cost.
	for i, one := range s.curJoinGroup {
		if  ds, ok := one.p.(*DataSource); ok {
			if strings.HasSuffix(ds.tableInfo.Name.L, "channel") {
				s.curJoinGroup[i].cumCost += 100
				break
			}
		}
	}

Reason: outer join's other condition influence is quite different from inner join's other condition, the latter one functions as a simple filter, just integrated with inner join. The former functions as a simple filter but also decides whether the outer join is matched or not, consequently deciding whether appending an additional null (which means this row itself is not pruned in the final result)

So when enumerating join order when encountering inner join combined with outer join, those original other conditions bonded with an outer join, just let it be; those additional remained other conditions shouldn't be put down into it, adding additional selection above is a resolution.

@AilinKid AilinKid removed may-affects-5.1 This bug maybe affects 5.1.x versions. may-affects-5.2 This bug maybe affects 5.2.x versions. may-affects-5.3 This bug maybe affects 5.3.x versions. may-affects-5.4 This bug maybe affects 5.4.x versions. may-affects-6.1 labels Jun 5, 2023
ti-chi-bot bot pushed a commit that referenced this issue Jun 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants