Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

REGEXP fails with invalid pattern #48360

Open
dveeden opened this issue Nov 7, 2023 · 7 comments
Open

REGEXP fails with invalid pattern #48360

dveeden opened this issue Nov 7, 2023 · 7 comments
Labels
affects-7.1 affects-7.5 affects-8.1 compatibility-mysql8 This is a compatibility issue with MySQL 8.0(but NOT 5.7) may-affects-5.3 This bug maybe affects 5.3.x versions. may-affects-5.4 This bug maybe affects 5.4.x versions. may-affects-6.1 may-affects-6.5 severity/major sig/execution SIG execution type/bug This issue is a bug.

Comments

@dveeden
Copy link
Contributor

dveeden commented Nov 7, 2023

Bug Report

1. Minimal reproduce step (Required)

CREATE TABLE t1 (id INT PRIMARY KEY AUTO_INCREMENT, data VARCHAR(255));
SELECT data FROM t1 WHERE data REGEXP '[\[,]23[\],]' LIMIT 1;

2. What did you expect to see? (Required)

MySQL 8.2.0

sql> CREATE TABLE t1 (id INT PRIMARY KEY AUTO_INCREMENT, data VARCHAR(255));
Query OK, 0 rows affected (0.1066 sec)

sql> SELECT data FROM t1 WHERE data REGEXP '[\[,]23[\],]' LIMIT 1;
Empty set (0.0264 sec)

3. What did you see instead (Required)

sql> CREATE TABLE t1 (id INT PRIMARY KEY AUTO_INCREMENT, data VARCHAR(255));
Query OK, 0 rows affected (0.1961 sec)

sql> SELECT data FROM t1 WHERE data REGEXP '[\[,]23[\],]' LIMIT 1;
ERROR: 1105 (HY000): other error: Invalid regexp pattern: Syntax(
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
regex parse error:
    [[,]23[],]
    ^
error: unclosed character class
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
)

4. What is your TiDB version? (Required)

Release Version: v7.4.0
Edition: Community
Git Commit Hash: 38cb4f3312be9199a983c0ef282d2ea2e28a7824
Git Branch: heads/refs/tags/v7.4.0
UTC Build Time: 2023-10-10 14:18:50
GoVersion: go1.21.1
Race Enabled: false
Check Table Before Drop: false
Store: tikv
1 row in set (0.0011 sec)

Also reproduced with v7.1.1

@dveeden dveeden added type/bug This issue is a bug. sig/sql-infra SIG: SQL Infra compatibility-mysql8 This is a compatibility issue with MySQL 8.0(but NOT 5.7) affects-7.1 affects-7.4 labels Nov 7, 2023
@dveeden dveeden self-assigned this Nov 7, 2023
@dveeden
Copy link
Contributor Author

dveeden commented Nov 7, 2023

Looks like this doesn't happen with unistore, but only with TiKV.

@dveeden
Copy link
Contributor Author

dveeden commented Nov 7, 2023

This looks like a TiKV pushdown issue.

sql> SELECT data FROM t1 WHERE data REGEXP '[\[,]23[\],]' LIMIT 1;
ERROR: 1105 (HY000): other error: Invalid regexp pattern: Syntax(
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
regex parse error:
    [[,]23[],]
    ^
error: unclosed character class
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
)

sql> EXPLAIN SELECT data FROM t1 WHERE data REGEXP '[\[,]23[\],]' LIMIT 1;
+--------------------------+---------+-----------+---------------+------------------------------------+
| id                       | estRows | task      | access object | operator info                      |
+--------------------------+---------+-----------+---------------+------------------------------------+
| Limit_8                  | 1.00    | root      |               | offset:0, count:1                  |
| └─TableReader_13         | 1.00    | root      |               | data:Limit_12                      |
|   └─Limit_12             | 1.00    | cop[tikv] |               | offset:0, count:1                  |
|     └─Selection_11       | 1.00    | cop[tikv] |               | regexp(test.t1.data, "[[,]23[],]") |
|       └─TableFullScan_10 | 1.25    | cop[tikv] | table:t1      | keep order:false, stats:pseudo     |
+--------------------------+---------+-----------+---------------+------------------------------------+
5 rows in set (0.0014 sec)

sql> INSERT INTO mysql.expr_pushdown_blacklist VALUES ('regexp','tikv','Issue #48360');
Query OK, 1 row affected (0.0120 sec)

sql> ADMIN RELOAD expr_pushdown_blacklist;
Query OK, 0 rows affected (0.0009 sec)

sql> EXPLAIN SELECT data FROM t1 WHERE data REGEXP '[\[,]23[\],]' LIMIT 1;
+------------------------+---------+-----------+---------------+------------------------------------+
| id                     | estRows | task      | access object | operator info                      |
+------------------------+---------+-----------+---------------+------------------------------------+
| Limit_8                | 1.00    | root      |               | offset:0, count:1                  |
| └─Selection_12         | 1.00    | root      |               | regexp(test.t1.data, "[[,]23[],]") |
|   └─TableReader_11     | 1.25    | root      |               | data:TableFullScan_10              |
|     └─TableFullScan_10 | 1.25    | cop[tikv] | table:t1      | keep order:false, stats:pseudo     |
+------------------------+---------+-----------+---------------+------------------------------------+
4 rows in set, 1 warning (0.0008 sec)
Warning (code 1105): Scalar function 'regexp'(signature: RegexpLikeSig, return type: bigint(1)) is not supported to push down to tikv now.

sql> SELECT data FROM t1 WHERE data REGEXP '[\[,]23[\],]' LIMIT 1;
Empty set (0.0016 sec)

@dveeden
Copy link
Contributor Author

dveeden commented Nov 7, 2023

@dveeden
Copy link
Contributor Author

dveeden commented Nov 7, 2023

From TiKV logs:

[2023/11/07 10:24:56.237 +01:00] [WARN] [endpoint.rs:823] [error-response] [err="Invalid regexp pattern: Syntax(\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\nregex parse error:\n    [[,]23[],]\n    ^\nerror: unclosed character class\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n)"]

From TiDB logs:

[2023/11/07 10:24:56.238 +01:00] [WARN] [coprocessor.go:1352] ["other error"] [conn=214365184997720471] [txnStartTS=445474169476612097] [regionID=14] [bucketsVer=0] [latestBucketsVer=0] [rangeNums=1] [firstRangeStartKey="t\ufffd\u0000\u0000\u0000\u0000\u0000\u0000X_r\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000"] [lastRangeEndKey="t\ufffd\u0000\u0000\u0000\u0000\u0000\u0000X_r\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\u0000"] [storeAddr=127.0.0.1:20160] [error="other error: Invalid regexp pattern: Syntax(\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\nregex parse error:\n    [[,]23[],]\n    ^\nerror: unclosed character class\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n)"]
[2023/11/07 10:24:56.238 +01:00] [INFO] [conn.go:1184] ["command dispatched failed"] [conn=214365184997720471] [connInfo="id:214365184997720471, addr:127.0.0.1:36168 status:10, collation:utf8mb4_0900_ai_ci, user:root"] [command=Query] [status="inTxn:0, autocommit:1"] [sql="SELECT data FROM t1 WHERE data REGEXP '[\\[,]23[\\],]' LIMIT 1"] [txn_mode=PESSIMISTIC] [timestamp=445474169476612097] [err="other error: Invalid regexp pattern: Syntax(\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\nregex parse error:\n    [[,]23[],]\n    ^\nerror: unclosed character class\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n)\ngithub.com/pingcap/tidb/store/copr.(*copIteratorWorker).handleCopResponse\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/store/copr/coprocessor.go:1347\ngithub.com/pingcap/tidb/store/copr.(*copIteratorWorker).handleCopPagingResult\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/store/copr/coprocessor.go:1282\ngithub.com/pingcap/tidb/store/copr.(*copIteratorWorker).handleTaskOnce\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/store/copr/coprocessor.go:1221\ngithub.com/pingcap/tidb/store/copr.(*copIteratorWorker).handleTask\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/store/copr/coprocessor.go:1106\ngithub.com/pingcap/tidb/store/copr.(*copIteratorWorker).run\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/store/copr/coprocessor.go:799\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1598"]

@dveeden dveeden removed their assignment Nov 7, 2023
@lcwangchao lcwangchao added sig/execution SIG execution and removed sig/sql-infra SIG: SQL Infra labels Nov 8, 2023
@zanmato1984
Copy link
Contributor

zanmato1984 commented Nov 8, 2023

This seems caused by the same reason as tikv/tikv#15478 (comment)

Might be a duplication as tikv/tikv#15478

@gengliqi
Copy link
Contributor

gengliqi commented Nov 8, 2023

The escape character \ will be removed before regexp handles it.

mysql> select '[\[,]23[\],]';
+------------+
| [[,]23[],] |
+------------+
| [[,]23[],] |
+------------+
1 row in set (0.00 sec)

If we want to keep the \, \\ must be used.

mysql> select '[\\[,]23[\\],]';
+--------------+
| [\[,]23[\],] |
+--------------+
| [\[,]23[\],] |
+--------------+
1 row in set (0.00 sec)

If we use \\, the regexp library of rust in TiKV can work well.
However, it seems that the regexp library of go can identify the [ and ] in the middle of [] as a normal character so it doesn't report any error with no escape character.

@dveeden
Copy link
Contributor Author

dveeden commented Jan 3, 2024

Expanding the test a bit:

CREATE TABLE t1 (id INT PRIMARY KEY AUTO_INCREMENT, data VARCHAR(255));
SELECT data FROM t1 WHERE data REGEXP '[\[,]23[\],]' LIMIT 1;
SELECT data FROM t1 WHERE data REGEXP '[\\[,]23[\\],]' LIMIT 1;
INSERT INTO t1 VALUES (1,'foobar'),(2,'[23]');
SELECT data FROM t1 WHERE data REGEXP '[\[,]23[\],]' LIMIT 1;
SELECT data FROM t1 WHERE data REGEXP '[\\[,]23[\\],]' LIMIT 1;

TiDB v7.5.0 with default settings:

sql> CREATE TABLE t1 (id INT PRIMARY KEY AUTO_INCREMENT, data VARCHAR(255));
Query OK, 0 rows affected (0.1956 sec)

sql> SELECT data FROM t1 WHERE data REGEXP '[\[,]23[\],]' LIMIT 1;
ERROR: 1105 (HY000): other error: Invalid regexp pattern: Syntax(
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
regex parse error:
    [[,]23[],]
    ^
error: unclosed character class
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
)

sql> SELECT data FROM t1 WHERE data REGEXP '[\\[,]23[\\],]' LIMIT 1;
Empty set (0.0016 sec)

sql> INSERT INTO t1 VALUES (1,'foobar'),(2,'[23]');
Query OK, 2 rows affected (0.0131 sec)

Records: 2  Duplicates: 0  Warnings: 0

sql> SELECT data FROM t1 WHERE data REGEXP '[\[,]23[\],]' LIMIT 1;
ERROR: 1105 (HY000): other error: Invalid regexp pattern: Syntax(
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
regex parse error:
    [[,]23[],]
    ^
error: unclosed character class
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
)

sql> SELECT data FROM t1 WHERE data REGEXP '[\\[,]23[\\],]' LIMIT 1;
+------+
| data |
+------+
| [23] |
+------+
1 row in set (0.0014 sec)

TiDB v7.5.0, with regexp pushdown disabled:

sql> CREATE TABLE t1 (id INT PRIMARY KEY AUTO_INCREMENT, data VARCHAR(255));
Query OK, 0 rows affected (0.1620 sec)

sql> SELECT data FROM t1 WHERE data REGEXP '[\[,]23[\],]' LIMIT 1;
Empty set (0.0079 sec)

sql> SELECT data FROM t1 WHERE data REGEXP '[\\[,]23[\\],]' LIMIT 1;
Empty set (0.0035 sec)

sql> INSERT INTO t1 VALUES (1,'foobar'),(2,'[23]');
Query OK, 2 rows affected (0.0212 sec)

Records: 2  Duplicates: 0  Warnings: 0

sql> SELECT data FROM t1 WHERE data REGEXP '[\[,]23[\],]' LIMIT 1;
+------+
| data |
+------+
| [23] |
+------+
1 row in set (0.0031 sec)

sql> SELECT data FROM t1 WHERE data REGEXP '[\\[,]23[\\],]' LIMIT 1;
+------+
| data |
+------+
| [23] |
+------+
1 row in set (0.0038 sec)

MySQL 8.2.0

sql> CREATE TABLE t1 (id INT PRIMARY KEY AUTO_INCREMENT, data VARCHAR(255));
Query OK, 0 rows affected (0.0656 sec)

sql> SELECT data FROM t1 WHERE data REGEXP '[\[,]23[\],]' LIMIT 1;
Empty set (0.0011 sec)

sql> SELECT data FROM t1 WHERE data REGEXP '[\\[,]23[\\],]' LIMIT 1;
Empty set (0.0004 sec)

sql> INSERT INTO t1 VALUES (1,'foobar'),(2,'[23]');
Query OK, 2 rows affected (0.0196 sec)

Records: 2  Duplicates: 0  Warnings: 0

sql> SELECT data FROM t1 WHERE data REGEXP '[\[,]23[\],]' LIMIT 1;
ERROR: 3696 (HY000): The regular expression contains an unclosed bracket expression.

sql> SELECT data FROM t1 WHERE data REGEXP '[\\[,]23[\\],]' LIMIT 1;
+------+
| data |
+------+
| [23] |
+------+
1 row in set (0.0010 sec)

Conclusions:

  1. In MySQL the regexp doesn't seem to be executed/validated if the table is empty. With a non-empty table a double backslash is needed (\\)
  2. In TiDB with default settings the regexp seems to always be executed/validate even if the table is empty. The double backslash seems to work fine.
  3. In TiDB with pushdown of REGEXP to TiKV disabled it allows both single (\) and double (\\)backslash.

Looks like TiDB with pushdown disabled should be changed to more strictly validate the regexp and have the same behavior as TiDB with pushdown enabled.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
affects-7.1 affects-7.5 affects-8.1 compatibility-mysql8 This is a compatibility issue with MySQL 8.0(but NOT 5.7) may-affects-5.3 This bug maybe affects 5.3.x versions. may-affects-5.4 This bug maybe affects 5.4.x versions. may-affects-6.1 may-affects-6.5 severity/major sig/execution SIG execution type/bug This issue is a bug.
Projects
None yet
Development

No branches or pull requests

6 participants