Different preg_match result with -d pcre.jit=0
#11374
Huh. It's backwards: normally these sorts of bugs happen with jit=1 and turning it off allows the regex to work... That's a gigantic regex. Can you boil it down to something manageable that still reproduces? And either way, does it also happen with the |
Sure: turn on JIT 😉 Otherwise, maybe? You could experiment with the regex - add/remove optimizations, that sort of thing - to see if you can trigger some different internal code paths to bypass wherever the bug is. It'd be easier to do that if you could narrow down the source of the problem first, of course. And there's always the option of breaking it down into smaller regexes... I mean really, that's 100+ lines of (formatted) regex in there. May or may not help, but it might be a nice gain regardless. |
In my case there's |
The issue is indeed caused by JIT, not the other way around. Simpler reproducible case:
$regex = '
(?<types>
(?:
(?:\{ (?&types) \})
| (a)
)
(\*?)
)
';
var_dump(preg_match('{^' . $regex . '$}x', '{a}', $matches), $matches);
This is the correct output, with
This is the incorrect output, with
Note the missing capture for group Looking at the
With JIT:
So, the issue seems to originate from PCRE. I'll check if the issue was already fixed/reported upstream. Edit: Things are a bit more complicated. Removing the |
Here is an online repro: https://3v4l.org/W4UkB. The problem appears somewhere between PCRE versions >8.41 and <=10.32. |
Yeah so it does seem JIT is the odd one out after all, although the output doesn't make sense to me. |
|
#include <stdio.h>
#include <string.h>
#include <stdbool.h>
#define PCRE2_CODE_UNIT_WIDTH 8
#include <pcre2.h>
void test(bool use_jit) {
const char *pattern = "(?<all>(?:(?:a(?&all))|(b))(c?))";
const char *subject = "aabc";
int errnumber;
PCRE2_SIZE erroffset;
pcre2_code *re = pcre2_compile((PCRE2_SPTR)pattern, strlen(pattern), 0, &errnumber, &erroffset, NULL);
if (use_jit) {
pcre2_jit_compile(re, PCRE2_JIT_COMPLETE);
}
pcre2_match_data *match_data = pcre2_match_data_create_from_pattern(re, NULL);
int count;
if (!use_jit) {
count = pcre2_match(re, (PCRE2_SPTR)subject, strlen(subject), 0, 0, match_data, NULL);
} else {
count = pcre2_jit_match(re, (PCRE2_SPTR)subject, strlen(subject), 0, 0, match_data, NULL);
}
PCRE2_SIZE *offsets = pcre2_get_ovector_pointer(match_data);
printf("%s\n", subject);
printf("%d\n", count);
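for (int i = 0; i < count * 2; i += 2) { /* signed index: a negative count (no match or error) skips the loop */
printf("%zu - %zu\n", offsets[i], offsets[i + 1]); /* PCRE2_SIZE is size_t, so print with %zu */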
}
pcre2_match_data_free(match_data);
pcre2_code_free(re);
}
int main(void) {
test(false);
test(true);
return 0;
}
The problem is reproducible with PCRE alone, without PHP. |
@iluuu1994 As in https://3v4l.org/5YhJP, I would like to test parsing both without and with PCRE JIT in PHPUnit tests, but I cannot change the regexes (they are not defined in the tests). What are my options for convincing PHP not to cache the regexes? Can I clear the regex cache with some PHP/userland function? |
@mvorisek An implementation that proves there's no performance hit. That's very unlikely. A function to clear the cache might be reasonable. Alternatively we might incorporate |
In general, clearing the cache should not be needed if the cache is not mutated during matching, but when the php.ini is changed, no cache entry created under the old setting should be used. So either incorporate For current php versions, I coded the cache clearing by dummy matching 4096 regexes - https://3v4l.org/NrsHc -- the cache key is built in: https://github.com/php/php-src/blob/32968f8de0/ext/pcre/php_pcre.c#L615-L619 the |
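To make the caching concern concrete, here is a minimal standalone sketch of how a key of this shape can be built. It mirrors the concatenation visible in the patch later in this thread (ctype locale string followed by the regex), but it uses plain C strings instead of zend_string, and the function name sketch_cache_key is hypothetical. The point it illustrates is that the JIT setting is not part of the key, so toggling pcre.jit alone does not lead to a different cache entry.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of php-src.
 * Builds a key as "<ctype locale><regex>"; ctype_locale may be NULL, in
 * which case the regex alone is the key. Returns a malloc'd NUL-terminated
 * string that the caller must free, or NULL if allocation fails.
 * Note that no JIT setting goes into the key. */
char *sketch_cache_key(const char *ctype_locale, const char *regex)
{
    assert(regex != NULL);

    size_t locale_len = ctype_locale ? strlen(ctype_locale) : 0;
    size_t regex_len = strlen(regex);

    char *key = malloc(locale_len + regex_len + 1);
    if (key == NULL) {
        return NULL;
    }
    if (locale_len > 0) {
        memcpy(key, ctype_locale, locale_len);
    }
    /* regex_len + 1 also copies the terminating NUL */
    memcpy(key + locale_len, regex, regex_len + 1);
    return key;
}
```

Two calls with the same locale and regex yield the same key whatever the JIT setting is, which is why a userland workaround has to evict entries rather than rely on the setting change.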
@mvorisek As you have probably seen, I had to revert the last PR due to big performance implications. I also closed #11511 for the same reason. The performance drop is smaller but still more than I'd expect. The alternative approach would be to clear the cache, but that would require at least an e-mail to the mailing list. I don't think it's worth it, given this is an edge case and should be fixed in PCRE. If you'd like to start this discussion, please send a mail to the internals list. Here's a patch:
diff --git a/ext/pcre/php_pcre.c b/ext/pcre/php_pcre.c
index 6ad0b6eb76..b4a94a6ee4 100644
--- a/ext/pcre/php_pcre.c
+++ b/ext/pcre/php_pcre.c
@@ -2961,6 +2961,30 @@ PHP_FUNCTION(preg_last_error_msg)
}
/* }}} */
+PHP_FUNCTION(preg_cache_clear)
+{
+ ZEND_PARSE_PARAMETERS_NONE();
+
+ zend_hash_clean(&PCRE_G(pcre_cache));
+}
+
+PHP_FUNCTION(preg_cache_remove)
+{
+ zend_string *regex;
+
+ ZEND_PARSE_PARAMETERS_START(1, 1)
+ Z_PARAM_STR(regex)
+ ZEND_PARSE_PARAMETERS_END();
+
+ zend_hash_del(&PCRE_G(pcre_cache), regex);
+
+ if (BG(ctype_string)) {
+ zend_string *key = zend_string_concat2(ZSTR_VAL(BG(ctype_string)), ZSTR_LEN(BG(ctype_string)), ZSTR_VAL(regex), ZSTR_LEN(regex));
+ zend_hash_del(&PCRE_G(pcre_cache), key);
+ zend_string_release(key);
+ }
+}
+
/* {{{ module definition structures */
zend_module_entry pcre_module_entry = {
diff --git a/ext/pcre/php_pcre.stub.php b/ext/pcre/php_pcre.stub.php
index 1b06075885..de361bf1dc 100644
--- a/ext/pcre/php_pcre.stub.php
+++ b/ext/pcre/php_pcre.stub.php
@@ -140,3 +140,7 @@ function preg_grep(string $pattern, array $array, int $flags = 0): array|false {
function preg_last_error(): int {}
function preg_last_error_msg(): string {}
+
+function preg_cache_clear(): void {}
+
+function preg_cache_remove(string $pattern): void {}
diff --git a/ext/pcre/php_pcre_arginfo.h b/ext/pcre/php_pcre_arginfo.h
index a4132e28e5..b9cdbeb1f9 100644
--- a/ext/pcre/php_pcre_arginfo.h
+++ b/ext/pcre/php_pcre_arginfo.h
@@ -1,5 +1,5 @@
/* This is a generated file, edit the .stub.php file instead.
- * Stub hash: 7f27807e45df9c9b5011aa20263c9789896acfbc */
+ * Stub hash: cd384188597be4586ac0df414a2d831ed5b31edd */
ZEND_BEGIN_ARG_WITH_RETURN_TYPE_MASK_EX(arginfo_preg_match, 0, 2, MAY_BE_LONG|MAY_BE_FALSE)
ZEND_ARG_TYPE_INFO(0, pattern, IS_STRING, 0)
@@ -62,6 +62,13 @@ ZEND_END_ARG_INFO()
ZEND_BEGIN_ARG_WITH_RETURN_TYPE_INFO_EX(arginfo_preg_last_error_msg, 0, 0, IS_STRING, 0)
ZEND_END_ARG_INFO()
+ZEND_BEGIN_ARG_WITH_RETURN_TYPE_INFO_EX(arginfo_preg_cache_clear, 0, 0, IS_VOID, 0)
+ZEND_END_ARG_INFO()
+
+ZEND_BEGIN_ARG_WITH_RETURN_TYPE_INFO_EX(arginfo_preg_cache_remove, 0, 1, IS_VOID, 0)
+ ZEND_ARG_TYPE_INFO(0, pattern, IS_STRING, 0)
+ZEND_END_ARG_INFO()
+
ZEND_FUNCTION(preg_match);
ZEND_FUNCTION(preg_match_all);
@@ -74,6 +81,8 @@ ZEND_FUNCTION(preg_quote);
ZEND_FUNCTION(preg_grep);
ZEND_FUNCTION(preg_last_error);
ZEND_FUNCTION(preg_last_error_msg);
+ZEND_FUNCTION(preg_cache_clear);
+ZEND_FUNCTION(preg_cache_remove);
static const zend_function_entry ext_functions[] = {
@@ -88,6 +97,8 @@ static const zend_function_entry ext_functions[] = {
ZEND_FE(preg_grep, arginfo_preg_grep)
ZEND_FE(preg_last_error, arginfo_preg_last_error)
ZEND_FE(preg_last_error_msg, arginfo_preg_last_error_msg)
+ ZEND_FE(preg_cache_clear, arginfo_preg_cache_clear)
+ ZEND_FE(preg_cache_remove, arginfo_preg_cache_remove)
ZEND_FE_END
}; |
What about:
diff --git a/ext/pcre/php_pcre.c b/ext/pcre/php_pcre.c
index 6249a80..c381706 100644
--- a/ext/pcre/php_pcre.c
+++ b/ext/pcre/php_pcre.c
@@ -638,10 +638,15 @@ PHPAPI pcre_cache_entry* pcre_get_compiled_regex_cache_ex(zend_string *regex, in
back the compiled pattern, otherwise go on and compile it. */
zv = zend_hash_find(&PCRE_G(pcre_cache), key);
if (zv) {
- if (key != regex) {
- zend_string_release_ex(key, 0);
+ pcre_cache_entry *pce = (pcre_cache_entry*)Z_PTR_P(zv);
+ if (!(pce->preg_options & PREG_JIT) == !PCRE_G(jit)) {
+ if (key != regex) {
+ zend_string_release_ex(key, 0);
+ }
+ return (pcre_cache_entry*)Z_PTR_P(zv);
+ } else {
+ zend_hash_del(&PCRE_G(pcre_cache), key);
}
- return (pcre_cache_entry*)Z_PTR_P(zv);
}
p = ZSTR_VAL(regex);
It should have almost zero performance effect, and it keeps the cache usable for use cases that switch the PCRE JIT flag for one regex and then restore it. |
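The comparison in the patch above uses the `!a == !b` idiom to compare two truth values that are stored differently (a flag bit on one side, an integer setting on the other). A minimal standalone sketch of that idiom follows. SKETCH_JIT_FLAG and sketch_entry_matches_jit are hypothetical stand-ins for PREG_JIT and the check on PCRE_G(jit), and the bit value is chosen arbitrarily.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the PREG_JIT bit; the value is arbitrary. */
#define SKETCH_JIT_FLAG (1u << 3)

/* Returns true when a cached entry was compiled under the same JIT setting
 * as the one active now. Applying `!` to both sides collapses each value
 * to 0 or 1 first, so a flag bit such as 8 still compares equal to a
 * setting stored as 1. */
bool sketch_entry_matches_jit(uint32_t entry_options, int jit_setting)
{
    return !(entry_options & SKETCH_JIT_FLAG) == !jit_setting;
}
```

Without the double negation, `(entry_options & SKETCH_JIT_FLAG) == jit_setting` would compare 8 with 1 and report a mismatch even though both sides mean "JIT on".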
as always/currently ;-) (PCRE JIT flag is enabled by default) |
No. Currently, regexes are compiled, then JIT compilation is attempted, and they are executed with JIT if that succeeded. Otherwise they are interpreted. Later executions don't retry compilation. |
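The lifecycle described above can be modelled in a few lines. This is only a sketch of the described behaviour, not the php-src implementation: sketch_entry, sketch_compile and sketch_engine_for are hypothetical names, and the real cache entry carries much more state.

```c
#include <stdbool.h>

/* Hypothetical model of a cached pattern; not the real pcre_cache_entry. */
typedef struct {
    bool jit_ready; /* decided once, when the pattern is compiled */
} sketch_entry;

typedef enum { SKETCH_INTERPRETER, SKETCH_JIT } sketch_engine;

/* Compile step: JIT is attempted at most once, and only if it is enabled
 * at that moment. The outcome is stored on the entry. */
sketch_entry sketch_compile(bool jit_enabled, bool jit_compile_succeeds)
{
    sketch_entry entry;
    entry.jit_ready = jit_enabled && jit_compile_succeeds;
    return entry;
}

/* Match step: the engine follows the stored flag; nothing is retried. */
sketch_engine sketch_engine_for(const sketch_entry *entry)
{
    return entry->jit_ready ? SKETCH_JIT : SKETCH_INTERPRETER;
}
```

Under this model an entry compiled while JIT was enabled keeps using JIT after the setting changes, which is the behaviour the cache discussion in this thread is about.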
This is a backport of PCRE2Project/pcre2#300. Closes GH-12439.
* PHP-8.1: Fix GH-11374: Different preg_match result with -d pcre.jit=0
* PHP-8.2: Fix GH-11374: Different preg_match result with -d pcre.jit=0
* PHP-8.3: Fix GH-11374: Different preg_match result with -d pcre.jit=0
Description
Related to PHP-CS-Fixer/PHP-CS-Fixer#6997

How to reproduce:
- https://github.com/PHP-CS-Fixer/PHP-CS-Fixer.git (latest master or 3.17 tag)
- composer update
- php -d pcre.jit=0 vendor/phpunit/phpunit/phpunit --filter TypeExpressionTest

With -d pcre.jit=1 the tests are passing.

Expected result: the pcre.jit config should have no effect on the preg_match result.

The problematic regex is probably https://github.com/PHP-CS-Fixer/PHP-CS-Fixer/blob/v3.17.0/src/DocBlock/TypeExpression.php#L32 - is there any easy workaround until php-src/PCRE is fixed?
PHP Version
PHP 7.4 - 8.2
Operating System
Windows and Unix