Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Join on index has incorrect range and will contain NULL value #7226

Closed
birdstorm opened this issue Aug 1, 2018 · 6 comments

Comments

@birdstorm
Copy link
Member

commented Aug 1, 2018

Please answer these questions before submitting your issue. Thanks!

  1. What did you do?
    If possible, provide a recipe for reproducing the error.

Make a simple join test which has NULL value with this file. IndexTest.txt

select a.id_dt, a.tp_decimal from full_data_type_table_idx a join full_data_type_table_idx b on a.tp_decimal = b.tp_decimal order by a.tp_decimal;
  1. What did you expect to see?
mysql> select a.id_dt, a.tp_decimal from full_data_type_table_idx a join full_data_type_table_idx b on a.tp_decimal = b.tp_decimal order by a.tp_decimal;
+-------+------------+
| id_dt | tp_decimal |
+-------+------------+
|    -1 |          0 |
|    -1 |          0 |
|     0 |          0 |
........
|    98 |         98 |
|    99 |         99 |
|   100 |        100 |
+-------+------------+
110 rows in set (0.02 sec)
  1. What did you see instead?
mysql> select a.id_dt, a.tp_decimal from full_data_type_table_idx a join full_data_type_table_idx b on a.tp_decimal = b.tp_decimal order by a.tp_decimal;
+-------+------------+
| id_dt | tp_decimal |
+-------+------------+
| -1000 |       NULL |
|    -1 |          0 |
|    -1 |          0 |
|     0 |          0 |
..........
|    98 |         98 |
|    99 |         99 |
|   100 |        100 |
+-------+------------+
111 rows in set (0.02 sec)
mysql> explain select a.id_dt, a.tp_decimal from full_data_type_table_idx a join full_data_type_table_idx b on a.tp_decimal = b.tp_decimal order by a.tp_decimal;
+--------------------------+--------+------+---------------------------------------------------------------+
| id                       | count  | task | operator info                                                 |
+--------------------------+--------+------+---------------------------------------------------------------+
| Projection_910           | 128.75 | root | a.id_dt, a.tp_decimal                                         |
| └─MergeJoin_911          | 128.75 | root | inner join, left key:a.tp_decimal, right key:b.tp_decimal     |
|   ├─IndexReader_44       | 103.00 | root | index:IndexScan_43                                            |
|   │ └─IndexScan_43       | 103.00 | cop  | table:a, index:tp_decimal, range:[NULL,+inf], keep order:true |
|   └─IndexReader_481      | 103.00 | root | index:IndexScan_480                                           |
|     └─IndexScan_480      | 103.00 | cop  | table:b, index:tp_decimal, range:[NULL,+inf], keep order:true |
+--------------------------+--------+------+---------------------------------------------------------------+
6 rows in set (0.00 sec)

You can see that the range is [NULL, +inf]

  1. What version of TiDB are you using (tidb-server -V or run select tidb_version(); on TiDB)?

Latest master build. It also AFFECTS release-2.0 branch

In addition, Hash join is correct. @winoros

mysql> select /*+ TIDB_HJ(a, b) */ a.id_dt, a.tp_decimal from full_data_type_table_idx a use index(idx_tp_decimal) join full_data_type_table_idx b use index(idx_tp_decimal) on a.tp_decimal = b.tp_decimal order by a.tp_decimal;
+-------+------------+
| id_dt | tp_decimal |
+-------+------------+
|    -1 |          0 |
|    -1 |          0 |
|     0 |          0 |
........
|    98 |         98 |
|    99 |         99 |
|   100 |        100 |
+-------+------------+
110 rows in set (0.02 sec)

mysql> explain select /*+ TIDB_HJ(a, b) */ a.id_dt, a.tp_decimal from full_data_type_table_idx a use index(idx_tp_decimal) join full_data_type_table_idx b use index(idx_tp_decimal) on a.tp_decimal = b.tp_decimal order by a.tp_decimal;
+----------------------------+--------+------+--------------------------------------------------------------------------+
| id                         | count  | task | operator info                                                            |
+----------------------------+--------+------+--------------------------------------------------------------------------+
| Sort_6                     | 128.75 | root | a.tp_decimal:asc                                                         |
| └─Projection_8             | 128.75 | root | a.id_dt, a.tp_decimal                                                    |
|   └─HashLeftJoin_16        | 128.75 | root | inner join, inner:IndexReader_21, equal:[eq(a.tp_decimal, b.tp_decimal)] |
|     ├─IndexReader_19       | 103.00 | root | index:IndexScan_18                                                       |
|     │ └─IndexScan_18       | 103.00 | cop  | table:a, index:tp_decimal, range:[NULL,+inf], keep order:false           |
|     └─IndexReader_21       | 103.00 | root | index:IndexScan_20                                                       |
|       └─IndexScan_20       | 103.00 | cop  | table:b, index:tp_decimal, range:[NULL,+inf], keep order:false           |
+----------------------------+--------+------+--------------------------------------------------------------------------+
7 rows in set (0.01 sec)
@zhexuany

This comment has been minimized.

Copy link
Member

commented Aug 1, 2018

@winoros It seems you are the right person to ping.

@zz-jason

This comment has been minimized.

Copy link
Member

commented Aug 2, 2018

@zhexuany The behavior of MergeJoin is not correct, I think you can take a look and try to fix it.

@zz-jason

This comment has been minimized.

Copy link
Member

commented Aug 2, 2018

I'll fix this as soon as possible. Hope it can be fixed in 2.0.6.

@zz-jason

This comment has been minimized.

Copy link
Member

commented Aug 2, 2018

Seems like both "Index Join" and "Merge Join" have the same problem:

TiDB(localhost:4000) > desc select /*+ TIDB_INLJ(t1) */ t1.a from t t1 join t t2 on t1.a=t2.a order by t1.a;
+--------------------------+-------+------+-----------------------------------------------------------------------------+
| id                       | count | task | operator info                                                               |
+--------------------------+-------+------+-----------------------------------------------------------------------------+
| Projection_15            | 3.75  | root | t1.a                                                                        |
| └─IndexJoin_19           | 3.75  | root | inner join, inner:IndexReader_18, outer key:t1.a, inner key:t2.a            |
|   ├─IndexReader_22       | 3.00  | root | index:IndexScan_21                                                          |
|   │ └─IndexScan_21       | 3.00  | cop  | table:t1, index:a, range:[NULL,+inf], keep order:true, stats:pseudo         |
|   └─IndexReader_18       | 0.00  | root | index:IndexScan_17                                                          |
|     └─IndexScan_17       | 0.00  | cop  | table:t2, index:a, range: decided by [t1.a], keep order:false, stats:pseudo |
+--------------------------+-------+------+-----------------------------------------------------------------------------+
6 rows in set (0.00 sec)

TiDB(localhost:4000) > select /*+ TIDB_INLJ(t1) */ t1.a from t t1 join t t2 on t1.a=t2.a order by t1.a;
+------+
| a    |
+------+
| NULL |
| 1.01 |
| 2.01 |
+------+
3 rows in set (0.00 sec)

TiDB(localhost:4000) > desc select /*+ TIDB_SMJ(t1) */ t1.a from t t1 join t t2 on t1.a=t2.a order by t1.a;
+--------------------------+-------+------+---------------------------------------------------------------------+
| id                       | count | task | operator info                                                       |
+--------------------------+-------+------+---------------------------------------------------------------------+
| Projection_16            | 3.75  | root | t1.a                                                                |
| └─MergeJoin_17           | 3.75  | root | inner join, left key:t1.a, right key:t2.a                           |
|   ├─IndexReader_12       | 3.00  | root | index:IndexScan_11                                                  |
|   │ └─IndexScan_11       | 3.00  | cop  | table:t1, index:a, range:[NULL,+inf], keep order:true, stats:pseudo |
|   └─IndexReader_15       | 3.00  | root | index:IndexScan_14                                                  |
|     └─IndexScan_14       | 3.00  | cop  | table:t2, index:a, range:[NULL,+inf], keep order:true, stats:pseudo |
+--------------------------+-------+------+---------------------------------------------------------------------+
6 rows in set (0.00 sec)

TiDB(localhost:4000) > select /*+ TIDB_SMJ(t1) */ t1.a from t t1 join t t2 on t1.a=t2.a order by t1.a;
+------+
| a    |
+------+
| NULL |
| 1.01 |
| 2.01 |
+------+
3 rows in set (0.00 sec)

TiDB(localhost:4000) > desc select /*+ TIDB_HJ(t1) */ t1.a from t t1 join t t2 on t1.a=t2.a order by t1.a;
+----------------------------+-------+------+-------------------------------------------------------------+
| id                         | count | task | operator info                                               |
+----------------------------+-------+------+-------------------------------------------------------------+
| Sort_6                     | 3.75  | root | t1.a:asc                                                    |
| └─Projection_8             | 3.75  | root | t1.a                                                        |
|   └─HashLeftJoin_16        | 3.75  | root | inner join, inner:TableReader_21, equal:[eq(t1.a, t2.a)]    |
|     ├─TableReader_19       | 3.00  | root | data:TableScan_18                                           |
|     │ └─TableScan_18       | 3.00  | cop  | table:t1, range:[-inf,+inf], keep order:false, stats:pseudo |
|     └─TableReader_21       | 3.00  | root | data:TableScan_20                                           |
|       └─TableScan_20       | 3.00  | cop  | table:t2, range:[-inf,+inf], keep order:false, stats:pseudo |
+----------------------------+-------+------+-------------------------------------------------------------+
7 rows in set (0.00 sec)

TiDB(localhost:4000) > select /*+ TIDB_HJ(t1) */ t1.a from t t1 join t t2 on t1.a=t2.a order by t1.a;
+------+
| a    |
+------+
| 1.01 |
| 2.01 |
+------+
2 rows in set (0.01 sec)
@zz-jason

This comment has been minimized.

Copy link
Member

commented Aug 2, 2018

For "Merge Join", the compare function is gained from "util/chunk/compare.go":

108 func cmpMyDecimal(l Row, lCol int, r Row, rCol int) int {
109     lNull, rNull := l.IsNull(lCol), r.IsNull(rCol)
110     if lNull || rNull {
111         return cmpNull(lNull, rNull)
112     }
113     lDec, rDec := l.GetMyDecimal(lCol), r.GetMyDecimal(rCol)
114     return lDec.Compare(rDec)
115 }

And the behavior of cmpNull is:

 58 func cmpNull(lNull, rNull bool) int {
 59     if lNull && rNull {
 60         return 0
 61     }
 62     if lNull {
 63         return -1
 64     }
 65     return 1
 66 }

In a word, these functions have this common behavior: NULL is equal to NULL. We should check whether the join key has any NULL value before comparison.

@birdstorm

This comment has been minimized.

Copy link
Member Author

commented Aug 6, 2018

fixed by #7255

@birdstorm birdstorm closed this Aug 6, 2018
@zz-jason zz-jason self-assigned this Aug 7, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.