Add PREG_LENGTH_CAPTURE flag #3971

nikic · 2019-03-21T11:32:11Z

This is the implementation for https://bugs.php.net/bug.php?id=77744. If PREG_LENGTH_CAPTURE is used, captured strings are replaced with their length instead. Generally this is only useful in conjunction with PREG_OFFSET_CAPTURE, in which case the offset + length together allow you to extract the captured string manually.

The motivation is to avoid copying large captured substrings if not necessary.
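
For illustration, here is a rough sketch of the "offset + length" extraction described above. PREG_LENGTH_CAPTURE is not part of any released PHP version, so the sketch emulates it on top of a normal PREG_OFFSET_CAPTURE result. The helper `to_length_offset()` is hypothetical, and the `[length, offset]` pair shape is an assumption about what the flag would return.

```php
<?php
// Hypothetical helper: turns the [string, offset] pairs produced by
// PREG_OFFSET_CAPTURE into [length, offset] pairs. This only emulates the
// assumed result shape of the proposed flag; it does not avoid the copies.
function to_length_offset(array $matches): array {
    return array_map(
        fn(array $pair): array => [strlen($pair[0]), $pair[1]],
        $matches
    );
}

$subject = 'key=value; other=thing';
preg_match('/(\w+)=(\w+)/', $subject, $m, PREG_OFFSET_CAPTURE);

$spans = to_length_offset($m);

// Extract capture group 2 manually from its byte offset and byte length.
[$len, $off] = $spans[2];
$value = substr($subject, $off, $len);
```

With the real flag the substring would only be created by the final `substr()` call, and only for the groups the caller actually needs.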

nikic · 2019-05-09T10:09:14Z

@cscott Based on your benchmarks, guess I can drop this one as not really worthwhile?

cmb69 · 2021-07-19T13:31:24Z

What is the status here? @cscott?

iluuu1994 · 2022-04-18T11:33:45Z

Closing as there was no response.

cscott · 2025-07-03T17:41:44Z

Sorry for the unresponsiveness. PHP tokenizers still suffer a performance penalty compared to JS. One reason is that we don't have the equivalent of mb_ord_at($string, $offset) to allow multi-byte character comparisons to be done numerically without creating a bunch of 1- to 4-byte substrings.
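
To make the mb_ord_at() idea concrete, here is a minimal userland sketch of the semantics. The function name `utf8_ord_at()` is hypothetical, it assumes the input is valid UTF-8 and the offset is on a character boundary, and it says nothing about how a native implementation would perform.

```php
<?php
// Hypothetical stand-in for the mb_ord_at() idea: decode the code point that
// starts at byte offset $i and return [codePoint, byteLength].
// Assumes valid UTF-8; no validation of continuation bytes is done.
function utf8_ord_at(string $s, int $i): array {
    $b0 = ord($s[$i]);
    if ($b0 < 0x80) {            // 1-byte sequence (ASCII)
        return [$b0, 1];
    }
    if ($b0 < 0xE0) {            // 2-byte sequence
        return [(($b0 & 0x1F) << 6) | (ord($s[$i + 1]) & 0x3F), 2];
    }
    if ($b0 < 0xF0) {            // 3-byte sequence
        return [
            (($b0 & 0x0F) << 12)
                | ((ord($s[$i + 1]) & 0x3F) << 6)
                | (ord($s[$i + 2]) & 0x3F),
            3,
        ];
    }
    return [                     // 4-byte sequence
        (($b0 & 0x07) << 18)
            | ((ord($s[$i + 1]) & 0x3F) << 12)
            | ((ord($s[$i + 2]) & 0x3F) << 6)
            | (ord($s[$i + 3]) & 0x3F),
        4,
    ];
}
```

A tokenizer could then compare the returned integer against code point ranges and advance its offset by the returned byte length.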

But the preg_match interface could be improved, too. preg_match() is optimized when the matches argument is omitted, but the only way to use the $offset parameter is to pass a match array. PREG_LENGTH_CAPTURE would still help by avoiding the more expensive part of creating a match array (copying the captured substrings). Something like preg_match_at() might also help by avoiding the creation of the array, but you still have to advance by "one character" when you have a match, and for UTF-8 strings that requires knowing the match length. So I think this option would still be very beneficial.
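
As a sketch of the pattern in question, a tokenizer loop with today's API looks roughly like this. The `tokenize()` function and its token pattern are illustrative only and are not taken from any real tokenizer. The point is that the loop only needs the byte length of each match, yet `preg_match()` still has to copy the matched text into `$m[0]` to provide it.

```php
<?php
// Illustrative tokenizer loop: records [byteOffset, byteLength] for each token.
function tokenize(string $src): array {
    $tokens = [];
    $offset = 0;
    $end = strlen($src);
    while ($offset < $end) {
        // \G anchors the match at $offset; /u treats the subject as UTF-8,
        // so "." consumes one whole (possibly multi-byte) character.
        if (!preg_match('/\G(?:\s+|\w+|.)/us', $src, $m, 0, $offset)) {
            break; // no match or PCRE error (e.g. invalid UTF-8)
        }
        // Only the length is needed, but $m[0] is a copy of the token text.
        $length = strlen($m[0]);
        $tokens[] = [$offset, $length];
        $offset += $length;
    }
    return $tokens;
}

$tokens = tokenize("ab \u{E9}");
```

Here the last token is the two-byte character "é", so the offset has to advance by the match length in bytes rather than by one.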

Add PREG_LENGTH_CAPTURE flag

70dea5d

nikic added the Feature label Mar 21, 2019

php-pulls force-pushed the PHP-7.4 branch from 4e0f9e4 to 6a5f851 Compare August 5, 2020 08:37

iluuu1994 closed this Apr 18, 2022
