Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQLSTATE[HY000]: General error: 3696 The regular expression contains an unclosed bracket expression. on line: 499 in /var/www/html/wire/core/WireDatabasePDO.php #1173

Closed
adrianbj opened this issue May 15, 2020 · 25 comments

Comments

@adrianbj
Copy link

adrianbj commented May 15, 2020

Short description of the issue

This error appears when using a selector like:

$pages->find("title~=aaa");

image

Optional: Suggestion for a possible fix

The generated SQL is:

SELECT pages.id,pages.parent_id,pages.templates_id
FROM `pages`
JOIN field_title AS field_title ON field_title.pages_id=pages.id AND ((((field_title.data REGEXP '([[[:blank:][:punct:]]|^)aaa([[:blank:][:punct:]]|$)') ) ))
WHERE (pages.status<1024)
AND (pages.templates_id!=52 AND pages.templates_id!=45 AND pages.templates_id!=47)
AND (pages.templates_id!=52 AND pages.templates_id!=45 AND pages.templates_id!=47)
GROUP BY pages.id

image

To fix, you can escape the first square bracket so the query looks like:

SELECT pages.id,pages.parent_id,pages.templates_id
FROM `pages`
JOIN field_title AS field_title ON field_title.pages_id=pages.id AND ((((field_title.data REGEXP '(\\[[[:blank:][:punct:]]|^)aaa([[:blank:][:punct:]]|$)') ) ))
WHERE (pages.status<1024)
AND (pages.templates_id!=52 AND pages.templates_id!=45 AND pages.templates_id!=47)
AND (pages.templates_id!=52 AND pages.templates_id!=45 AND pages.templates_id!=47)
GROUP BY pages.id

image

Error happens on two different servers:

Server A
MySQL Server: 8.0.19
MySQL Client: mysqlnd 7.4.4

Server B:
MySQL Server: 8.0.20-0ubuntu0.20.04.1
MySQL Client: mysqlnd 7.4.3

This error didn't happen last year (this is a seasonal site) and I just recently did a full server software update which I am certain updated the version of MySQL, so I expect that is the problem but it will start breaking lots of sites I would expect.

@teppokoivula
Copy link

teppokoivula commented May 16, 2020

Very loosely related, posting here to provide context (https://bugs.mysql.com/bug.php?id=95347):

between 5.8 and 8.0, we introduced a new regular expression library so that we could support new functions (https://mysqlserverteam.com/new-regular-expression-functions-in-mysql-8-0/). This library obviously has a different implementation, so it will behave differently in some corner cases.

MySQL 8.0 docs (https://dev.mysql.com/doc/refman/8.0/en/regexp.html) state that...

ICU also interprets square brackets as metacharacters, but only the opening square bracket need be escaped to be used as a literal character

Note: this might not be related at all. Engine change probably is, but not the latter part.

In fact looking at the regexp above, it does look like a real issue. I mean — there is indeed one unclosed bracket there. @ryancramerdesign probably knows better, but looks to me like it's not doing exactly what it should, and if so (assuming that there's one extraneous bracket) escaping would fix the error, but also result in wrong behaviour.

@ryancramerdesign
Copy link
Member

@teppokoivula @adrianbj Yes, it does look like there is an extra bracket there. I just tried removing it here, and am getting the same results as with it there, so it's a mystery why MySQL doesn't seem to care (except Adrian's version), or if it is there for some reason that I've forgotten, and needs to be escaped.

Adrian, you mentioned that escaping the first bracket fixed the issue there, can you confirm that it does match 2-3 character words in a ~= search? (which is the only place this particular code would run). And any difference if you just remove the bracket completely?

In my case (on MAMP) I get the same results whether using [[[, [[ or \\[[[ and doing a search for title~=new, it matches all pages with the word "new" in the title. I'm thinking that [[ is the right one that should be kept, but again not positive if the extra bracket was a mistake or serves some purpose I don't remember.

(Note as soon as you hit 4 characters it starts using a fulltext index rather than this regexp).

@adrianbj
Copy link
Author

adrianbj commented May 19, 2020

Thanks @ryancramerdesign and @teppokoivula

So I get the same results whether I escape the first [ or if I just remove it. However, the results are not really what is expected.

Let me step back a bit. This is the code I have been using for a long time without problems (until MySQL update).

// first try a fulltext match of keywords anywhere in title/body
$matches = $pages->find("title~=$q, parent=/members/");

// if the above doesn't find anything (perhaps due to MySQL minimum word length or stopwords), switch to non-indexed phrase match
if(!count($matches)) {
    $matches = $pages->find("title%=$q, parent=/members/");
}

This is an ajax search which triggers as soon as there are 3 characters entered in an input to match a member's name (title).

If I type rog then the generated query is:


SELECT pages.id,pages.parent_id,pages.templates_id FROM `pages` JOIN field_title AS field_title ON field_title.pages_id=pages.id AND ((((field_title.data REGEXP '([[[:blank:][:punct:]]\|^)rog([[:blank:][:punct:]]\|$)') ) )) LEFT JOIN field_first_name AS _sort_first_name ON _sort_first_name.pages_id=pages.id WHERE (pages.parent_id=7802) AND (pages.status<1024) GROUP BY pages.id ORDER BY _sort_first_name.data

This used to work fine and match a member whose name was "Roger Brown", but now I get that unescaped bracket error and if I remove (or escape) that extra bracket, I don't get any results. However, if I do the same selector search with roger (more than 3 characters), the generated query is:

SELECT pages.id,pages.parent_id,pages.templates_id,MATCH(field_title.data) AGAINST('roger') AS _score_field_title_data1
FROM `pages`
JOIN field_title AS field_title ON field_title.pages_id=pages.id AND (((MATCH(field_title.data) AGAINST('+roger' IN BOOLEAN MODE) ) ))
LEFT JOIN field_first_name AS _sort_first_name ON _sort_first_name.pages_id=pages.id
WHERE (pages.parent_id=7802)
AND (pages.status<1024)
GROUP BY pages.id
ORDER BY _sort_first_name.data,_score_field_title_data1 DESC

which returns the correct result.

Also, in case it's of interest, if I modify the regex version of the search to find roger, eg:

SELECT pages.id,pages.parent_id,pages.templates_id
FROM `pages`
JOIN field_title AS field_title ON field_title.pages_id=pages.id AND ((((field_title.data REGEXP '([[:blank:][:punct:]]|^)roger([[:blank:][:punct:]]|$)') ) ))
LEFT JOIN field_first_name AS _sort_first_name ON _sort_first_name.pages_id=pages.id
WHERE (pages.parent_id=7802)
AND (pages.status<1024)
GROUP BY pages.id
ORDER BY _sort_first_name.data

then it works and the member is found correctly.

So to sum up, ~ selectors with 3 characters no longer seem to work at all with the bracket error, but even if that is corrected, the returned results are not correct.

Does that help?

@ryancramerdesign
Copy link
Member

@adrianbj

This used to work fine and match a member whose name was "Roger Brown", but now I get that unescaped bracket error and if I remove (or escape) that extra bracket, I don't get any results.

The select query you had directly above the quoted text had zero brackets in it at all. So the error was not referring to that query.

What I understand is triggering the error you are seeing is if using a ~= operator with a value that is less than 4 characters, like "rog". This makes it use a word-matching RLIKE, since the length is too short for MySQL's index.

Asking for a full word match ~=, a search for "rog" could never match "roger". But %=rog is asking for a partial match using LIKE, which should match "roger" as well as "balrog".

So to sum up, ~ selectors with 3 characters no longer seem to work at all with the bracket error, but even if that is corrected, the returned results are not correct.

With your tests I'd stick to just ~= or %= (1 at a time) so that it's not possible to confuse the two in the results and error messages. We know that the ~=rog isn't working in your case, but since you said your test did a fallback to a %=rog that one should have matched. So you've got me curious about that one, maybe MySQL 8 also broke regular LIKE expressions as well?

Back to the ~=, on that MySQL page it says this (you also mentioned earlier):

ICU also interprets square brackets as metacharacters, but only the opening square bracket need be escaped to be used as a literal character

We aren't trying to match a literal bracket, we want it to interpret the bracket as a metacharacter to create a named character class, so unless I'm misreading it, no brackets should be escaped here. Here's an updated RLIKE expression to try if you get a chance:

$regex = "([[:blank:]]|[[:punct:]]|[[space]]|^)$v([[:blank:]]|[[:punct:]]|[[space]]|$)";

@adrianbj
Copy link
Author

The select query you had directly above the quoted text had zero brackets in it at all. So the error was not referring to that query.

Sorry, I pasted the wrong one - I have updated the post to show the correct one.

We know that the ~=rog isn't working in your case, but since you said your test did a fallback to a %=rog that one should have matched.

Well, it would, except we are getting that exception being thrown because this version of MySQL doesn't like the extra bracket.

Here are some example tests / results:

image

image

image

image

I also just tested:

$regex = "([[:blank:]]|[[:punct:]]|[[space]]|^)$v([[:blank:]]|[[:punct:]]|[[space]]|$)";

and that deals with the error and also returns the correct matching members, so it looks like that is what is needed. Will you update the PW core with that change?

@ryancramerdesign
Copy link
Member

ryancramerdesign commented May 19, 2020 via email

@adrianbj
Copy link
Author

I will update to that tomorrow. Thanks.

Thank you!

@adrianbj
Copy link
Author

@ryancramerdesign - just looking at the latest commits and of course, the original issue is fixed, but I did just notice that if you try:

$pages->find("name~=aaa");

you get this error:
Exception: SQLSTATE[HY000]: General error: 3685 Illegal argument to a regular expression. on line: 499 in /Users/ajones/Sites/ecoreportcard/wire/core/WireDatabasePDO.php

The query that is generated is:

SELECT pages.id,pages.parent_id,pages.templates_id 
FROM `pages` 
WHERE (pages.name RLIKE '[[:<:]]aaa[[:>:]]') AND (pages.status<1024) 
GROUP BY pages.id

Should the ~ condition in a selector support the name field?

@ryancramerdesign
Copy link
Member

ryancramerdesign commented May 22, 2020 via email

@adrianbj
Copy link
Author

@ryancramerdesign - unfortunately that doesn't seem to help. The generated query is now:

SELECT pages.id,pages.parent_id,pages.templates_id
FROM `pages`
WHERE (pages.name RLIKE :str)
AND (pages.status<:int)
GROUP BY pages.id

and I am still getting the same error.

@adrianbj
Copy link
Author

Also, while you're in DB mode, any chance you could please take a look at: #1142

@ryancramerdesign
Copy link
Member

ryancramerdesign commented May 22, 2020 via email

@adrianbj
Copy link
Author

This seems to explain the problem:
https://stackoverflow.com/a/60906360/1524576

@adrianbj
Copy link
Author

adrianbj commented May 22, 2020

A little OT, but it seems a shame to me that I can no longer see the exact query that is processed with the bound parameters replaced with the actual values.

On the last dev version of PW, via Tracy's Debug Mode panel > Selector Queries I see:

SELECT pages.id,pages.parent_id,pages.templates_id
FROM `pages`
WHERE (pages.name RLIKE '[[:<:]]aaa[[:>:]]')
AND (pages.status<1024)
GROUP BY pages.id

but with today's DB refactoring, now all I see is:

SELECT pages.id,pages.parent_id,pages.templates_id
FROM `pages`
WHERE (pages.name RLIKE :str)
AND (pages.status<:int)
GROUP BY pages.id

which makes it harder to debug selectors. Any chance there is a way I can get the final query instead for Tracy?

FYI - this is where I am collected selector queries for display in that panel:
https://github.com/adrianbj/TracyDebugger/blob/48b4ea7b4565f03d566620550b1ea976accd0a15/TracyDebugger.module.php#L486-L489

@ryancramerdesign
Copy link
Member

ryancramerdesign commented May 22, 2020 via email

@adrianbj
Copy link
Author

adrianbj commented May 22, 2020

I completely understand the desire to use binds and glad to hear that you're keen to find a way to have debug versions that will show the values.

The tools provided by Tracy were essential in making this update possible.

Great, glad to hear it was useful!

Is Tracy pulling these from Database::queryLog() or somewhere else?

Just hooking into PageFinder::getQuery. I don't think I can use queryLog - IIRC there isn't a way to match selectors to SQL queries via that, which is the key feature of the "Selector Queries" table.

As noted above, you can see the code here: https://github.com/adrianbj/TracyDebugger/blob/48b4ea7b4565f03d566620550b1ea976accd0a15/TracyDebugger.module.php#L486-L489

PS - if you have any ideas for making this table better, please let me know.

@ryancramerdesign
Copy link
Member

ryancramerdesign commented May 22, 2020 via email

@adrianbj
Copy link
Author

it appears the pre 8.x MySQL versions and 8.x are completely incompatible in this way

I am not sure about this - I think 5.x also supports \b - I think it's just that 8.x doesn't support [[:>:]]

This might be useful: https://www.codeproject.com/Articles/4421/Henry-Spencer-s-Regexp-Engine-Revisited and it references what is supported by the spencer regex library which is what was previously used by MySQL before 8 went to ICU regex.

@ryancramerdesign
Copy link
Member

 I am not sure about this - I think 5.x also supports \b - I think it's just that 8.x doesn't support [[:>:]]

@adrianbj Prior to 8.x does support a \b option, but it means something completely different. Rather than matching a word boundary, it matches a blank (i.e. like a space). Blanks don't appear in page names, whereas word boundaries do. Matching a blank means actually matching a character, whereas matching a boundary means matching a position. For instance, take the string a-b — a word boundary would match the position before the a, after the a, before the b and after the b. Whereas a blank wouldn't match anything in that string.

I thought there's always a possibility the online info wasn't right, and \b might properly match a word boundary in 5.x, but I tried, and no such luck. I think this is at least one case where we are going to need to know which MySQL version is present in order to deliver the right query. I'm curious, if you have a chance sometime, and try replacing this... (PageFinder.php line 2419):

$bindKey = $query->bindValueGetKey('[[:<:]]' . $word . '[[:>:]]');

...with one of these below, which one works?

$bindKey = $query->bindValueGetKey("\\b$word\\b"); // option A
$bindKey = $query->bindValueGetKey("\\\\b$word\\\\b"); // option B

I'm guessing option B, based on the info in the link you sent. I'm going to try and get setup with MySQL 8 here so I can test this stuff here as well.

@BernhardBaumrock
Copy link

BernhardBaumrock commented May 24, 2020

Just wanted to let you know that I just released RockFinder3 (https://github.com/baumrock/RockFinder3). As you can see in the readme I rely on the plain SQL on several places. The less critical part is dumping SQL for debugging, but the more critical part is when I use SQL for joining data as sub-queries or for extending the basic features of plain databaseSelect queries.

Take this example:

$owners = $RockFinder3
  ->find("template=person")
  ->addColumns(['title', 'age', 'weight'])
  ->setName('owner');
$cats = $RockFinder3
  ->find("template=cat")
  ->addColumns(['title', 'owner'])
  ->join($owners)
  ->getSQL();
db($RockFinder3->getObject("
  SELECT AVG(`owner:age`) FROM ($cats) AS tmp
"));

This worked until the latest update using bind values:

Exception: SQLSTATE[42000]: Syntax error or access violation: 1064 You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ':int) AND (pages.status<:intA) GROUP BY pages.id ) AS `join_owner_5ecacb69' 

This is how it used to work until the update: https://github.com/baumrock/RockFinder3#option-2-sql-string-modification

This also breaks the whole join() feature of my module:
img

Would be great to get some words on how/if/when I can expect an update that also works for my module. Thank you!

PS: I'm using getQuery() of the DatabaseQuerySelect object to get the SQL query: https://github.com/baumrock/RockFinder3/blob/12062e5dba51ed5a3d662c1712c4d8dea57b1d83/RockFinder3.module.php#L505 - so a separate method to return the query with all values would be great (essential).

@ryancramerdesign
Copy link
Member

@adrianbj Version 3.0.158 does convert bind queries to value queries, just for the ones returned by $database->queryLog(). I checked and Tracy appears to be picking those up, so that's good. It's the PageFinder selector queries that it isn't, but there's a way to get them now. There's a new DatabaseQuerySelect::getDebugQuery() method now, so in TD you should be able to call that method rather than getQuery, and get the query with all the bind values populated.

if(method_exists($query, 'getDebugQuery')) {
  // 3.0.158 and newer
  $sql = $query->getDebugQuery();
} else {
  // 3.0.157 and earlier (what you currently use)
  $sql = $query->getQuery(); 
}

@BernhardBaumrock I don't know what that could be, but if your module depends on queries not using bind values, then I don't think there would be a way around that. But I think it's more likely that that's not the case. The changes on this part continued this week, so give the latest version 3.0.158 a try and maybe it'll work with your module? Let me know if not.

@adrianbj
Copy link
Author

@ryancramerdesign - you can ignore my comment on the relevant commit for this. I see what to do now, but you have a bug in the str_replace call in that new getDebugQuery() method - it only has two arguments :)

@adrianbj
Copy link
Author

After I add $sql as the third argument, everything is working properly again and the selector queries table has the bound values - awesome!

adrianbj referenced this issue in processwire/processwire May 29, 2020
…eturns the query (string) with all bind variables populated into the SQL
@ryancramerdesign
Copy link
Member

@adrianbj Just fixed, thanks :)

@BernhardBaumrock
Copy link

@ryancramerdesign thx, the update did also fix the breaking change for RockFinder3 - everything working again :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants