Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Minor pcre optimizations #12923

Merged
merged 9 commits into from
Dec 11, 2023
Merged

Minor pcre optimizations #12923

merged 9 commits into from
Dec 11, 2023

Conversation

nielsdos
Copy link
Member

No description provided.

This changes the variables that are bools to actually be bools instead
of ints, which allows some additional optimization by the compiler (e.g.
removing some ternaries and move extensions).

It also gets rid of the use_flags argument because that's just the same
as flags == 0. This reduces the call frame.
This can't be true if the second condition is true because it would
require the string to occupy the entire address space.
Copy link
Member

@iluuu1994 iluuu1994 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Didn't find any obvious mistakes.

@@ -1013,8 +1013,8 @@ static inline void add_offset_pair(
static void populate_subpat_array(
zval *subpats, const char *subject, PCRE2_SIZE *offsets, zend_string **subpat_names,
uint32_t num_subpats, int count, const PCRE2_SPTR mark, zend_long flags) {
bool offset_capture = (flags & PREG_OFFSET_CAPTURE) != 0;
bool unmatched_as_null = (flags & PREG_UNMATCHED_AS_NULL) != 0;
zend_long offset_capture = flags & PREG_OFFSET_CAPTURE;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't expect this to change the emitted assembly, at least after optimizations. Does it?

Copy link
Member Author

@nielsdos nielsdos Dec 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sequence emitted by my compiler (GCC 13) for the old code was:
mov reg, flags
test reg, flag
setne reg2

and for the if:
test reg2, reg2
jz ...

while the change I made turns this into:
mov reg, flags
and reg, flag

and then the same sequence for the if.

So for some reasons the compiler produces one extra instruction in the old code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm ok. This optimization seems to work fine on godbolt, probably something else in the function that trips it up. I'm just asking because it's easy to imagine that somebody will switch it back, expecting it to work. So a comment might be useful.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly, I doubt that it really matters. What this optimization PR showed yet again is that reducing the number of memory accesses is the most worthwhile optimization. In my previous pcre optimization PR I reduced the number of memory accesses and it had a great improvement, and the biggest improvement from this PR is again caused by avoiding the marks HashTable unwrapping from the zval.

bool offset_capture = (flags & PREG_OFFSET_CAPTURE) != 0;
bool unmatched_as_null = (flags & PREG_UNMATCHED_AS_NULL) != 0;
zend_long offset_capture = flags & PREG_OFFSET_CAPTURE;
zend_long unmatched_as_null = flags & PREG_UNMATCHED_AS_NULL;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on another line similar unmatched_as_null is declared as uint32_t

@nielsdos nielsdos marked this pull request as ready for review December 11, 2023 18:24
@nielsdos nielsdos merged commit 642e111 into php:master Dec 11, 2023
8 of 9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants