Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unexpected result of subquery #42912

Closed
DerZc opened this issue Apr 11, 2023 · 5 comments · Fixed by #43247
Closed

Unexpected result of subquery #42912

DerZc opened this issue Apr 11, 2023 · 5 comments · Fixed by #43247
Assignees
Labels
fuzz/sqlancer may-affects-5.1 This bug maybe affects 5.1.x versions. may-affects-5.2 This bug maybe affects 5.2.x versions. may-affects-5.3 This bug maybe affects 5.3.x versions. may-affects-5.4 This bug maybe affects 5.4.x versions. may-affects-6.1 may-affects-6.5 severity/moderate sig/planner SIG: Planner type/bug This issue is a bug.

Comments

@DerZc
Copy link

DerZc commented Apr 11, 2023

Bug Report

1. Minimal reproduce step (Required)

Consider the following program:

CREATE TABLE t0(c0 TEXT(328) );

CREATE VIEW v0(c0) AS SELECT 'c' FROM t0;

INSERT INTO t0 VALUES (-12);

SELECT v0.c0 AS c0 FROM  v0 NATURAL RIGHT JOIN t0  WHERE (1 !=((v0.c0)REGEXP(-7)));  -- empty

SELECT COUNT(v0.c0) AS c0 FROM v0 WHERE EXISTS(SELECT v0.c0 AS c0 FROM v0 NATURAL RIGHT JOIN t0  WHERE (1 !=((v0.c0)REGEXP(-7))));  -- 1

The first query produce empty result, I put the first query in an EXISTS expression, which should be equals to FALSE, but the second query produce 1.

2. What did you expect to see? (Required)

The second query produce 0.
I run this query in MySQL, and it produce 0. https://www.db-fiddle.com/f/wNt1byF7hj6NXEDKDEsp4B/0

3. What did you see instead (Required)

The second query produce 1.

4. What is your TiDB version? (Required)

c233969

image

@DerZc DerZc added the type/bug This issue is a bug. label Apr 11, 2023
@aytrack aytrack added the sig/execution SIG execution label Apr 11, 2023
@ti-chi-bot ti-chi-bot added may-affects-5.1 This bug maybe affects 5.1.x versions. may-affects-5.2 This bug maybe affects 5.2.x versions. may-affects-5.3 This bug maybe affects 5.3.x versions. may-affects-5.4 This bug maybe affects 5.4.x versions. may-affects-6.1 may-affects-6.5 labels Apr 17, 2023
@zanmato1984
Copy link
Contributor

This is an alias resolution issue.

If an alias, e.g. v1 is assigned to the top-level v0 to disambiguate the alias resolution, the plan will be:

mysql> explain SELECT COUNT(v1.c0) AS c0 FROM v0 v1 WHERE EXISTS(SELECT v0.c0 AS c0 FROM v0 NATURAL RIGHT JOIN t0  WHERE (1 !=((v0.c0)REGEXP(-7))));
+--------------------+---------+------+---------------+----------------------------------+
| id                 | estRows | task | access object | operator info                    |
+--------------------+---------+------+---------------+----------------------------------+
| StreamAgg_28       | 1.00    | root |               | funcs:count(Column#3)->Column#10 |
| └─TableDual_30     | 0.00    | root |               | rows:0                           |
+--------------------+---------+------+---------------+----------------------------------+
2 rows in set (0.01 sec)

As shown, the (non-correlated) subquery is evaluated in place and emits 0 rows as expected.

However the plan of the query in question is:

mysql> explain SELECT COUNT(v0.c0) AS c0 FROM v0 WHERE EXISTS(SELECT v0.c0 AS c0 FROM v0 NATURAL RIGHT JOIN t0  WHERE (1 !=((v0.c0)REGEXP(-7))));
+------------------------------------+---------+-----------+---------------+----------------------------------------------------+
| id                                 | estRows | task      | access object | operator info                                      |
+------------------------------------+---------+-----------+---------------+----------------------------------------------------+
| StreamAgg_18                       | 1.00    | root      |               | funcs:count(Column#3)->Column#10                   |
| └─HashJoin_19                      | 0.80    | root      |               | CARTESIAN semi join                                |
|   ├─HashJoin_24(Build)             | 1.00    | root      |               | right outer join, equal:[eq(Column#6, test.t0.c0)] |
|   │ ├─Projection_25(Build)         | 1.00    | root      |               | c->Column#6                                        |
|   │ │ └─TableReader_27             | 1.00    | root      |               | data:TableFullScan_26                              |
|   │ │   └─TableFullScan_26         | 1.00    | cop[tikv] | table:t0      | keep order:false, stats:pseudo                     |
|   │ └─TableReader_29(Probe)        | 1.00    | root      |               | data:TableFullScan_28                              |
|   │   └─TableFullScan_28           | 1.00    | cop[tikv] | table:t0      | keep order:false, stats:pseudo                     |
|   └─Projection_20(Probe)           | 1.00    | root      |               | c->Column#3                                        |
|     └─TableReader_22               | 1.00    | root      |               | data:TableFullScan_21                              |
|       └─TableFullScan_21           | 1.00    | cop[tikv] | table:t0      | keep order:false, stats:pseudo                     |
+------------------------------------+---------+-----------+---------------+----------------------------------------------------+
11 rows in set (0.01 sec)

As shown, it is recognized as a correlated subquery (and the consequential de-correlated plan), with the correlated condition being 1 !=((v0.c0)REGEXP(-7)).

More specifically, when the v0 is resolved to be the outer table, the query in question is effectively equal to:

SELECT COUNT(v0.c0) AS c0 FROM v0 WHERE EXISTS(SELECT v0.c0 AS c0 FROM v0 NATURAL RIGHT JOIN t0) AND (1 !=((v0.c0)REGEXP(-7)));

where the subquery emits 1 row, thus EXISTS evaluates true, and 1 !=((v0.c0)REGEXP(-7)) evaluates true as well. So the final COUNT emits 1.

Not sure this is a parser or planner issue. Add both labels for the corresponding people to future categorize.

@zanmato1984 zanmato1984 added sig/planner SIG: Planner sig/sql-infra SIG: SQL Infra and removed sig/execution SIG execution labels Apr 17, 2023
@bb7133
Copy link
Member

bb7133 commented Apr 17, 2023

@zanmato1984 @chrysan I believe this is related to the de-correlation of subqueries.

@bb7133 bb7133 removed the sig/sql-infra SIG: SQL Infra label Apr 17, 2023
@bb7133
Copy link
Member

bb7133 commented Apr 17, 2023

However, I doubt if this could be treat as 'critical bug' since it looks like an edge-case.

@aytrack
Copy link
Contributor

aytrack commented Apr 18, 2023

Downgrade to moderate since it an edge-case.

@AilinKid
Copy link
Contributor

AilinKid commented Apr 20, 2023

a traditional name resolution priority problem as we talked before

image

for the cases, when building a where clause (1 !=((v0.c0)REGEXP(-7))), the resolution flow is:

  1. find the col in current p (built from natural join, only t0.c0 is remained, v0.c0 is folded)
  2. find the col in the outer layer scope, (for case 1, nothing here, find the col in step 3 (non-correlated); for case 2, find the col here (outer scope has exactly the same name v0.c0, Hops!) and return immediately (causing correlated))
  3. find the col in special cases, for example here, the current layer's nature-join col name folding

the sequence is not that right, we can simply move 3 forward 2 to make this right.

PS. Every outer scope layer should also think about nature-join folding, we didn't think about that as yet.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fuzz/sqlancer may-affects-5.1 This bug maybe affects 5.1.x versions. may-affects-5.2 This bug maybe affects 5.2.x versions. may-affects-5.3 This bug maybe affects 5.3.x versions. may-affects-5.4 This bug maybe affects 5.4.x versions. may-affects-6.1 may-affects-6.5 severity/moderate sig/planner SIG: Planner type/bug This issue is a bug.
Projects
None yet
8 participants