Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement ZEND_ARRAY_KEY_EXISTS opcode to speed up array_key_exists() #3360

Closed
wants to merge 2 commits into from

Conversation

Majkl578
Copy link
Contributor

@Majkl578 Majkl578 commented Jul 2, 2018

This change makes array_key_exists() actually faster than isset() by ~25% (tested with GCC 8, -O3, march=native, mtune=native).

addresses https://bugs.php.net/bug.php?id=76148
addresses https://bugs.php.net/bug.php?id=71239

With a synthetic benchmark:

<?php

$foo = [123 => 456, 'abc' => 'def'];

$iterations = 500000000;

$a = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    isset($foo[123]);
}
printf('isset (%dx): %.2fs%s', $iterations, microtime(true) - $a, PHP_EOL);

$b = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    array_key_exists(123, $foo);
}
printf('array_key_exists (%dx): %.2fs%s', $iterations, microtime(true) - $b, PHP_EOL);

It yields results like:
7.2 (unpatched):

isset (500000000x): 7.65s
array_key_exists (500000000x): 9.03s

7.3 + patch:

isset (500000000x): 7.60s
array_key_exists (500000000x): 5.65s

Copy link
Member

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking this on! I've left a few implementation notes, in particular in regard to how performance may be improved.

Zend/zend_compile.c Show resolved Hide resolved
subject = GET_OP2_ZVAL_PTR(BP_VAR_R);
key = GET_OP1_ZVAL_PTR(BP_VAR_R);

if (UNEXPECTED(Z_ISREF_P(subject))) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A more efficient (and for VM code, more idiomatic) way to handle this is to check for ISREF prior to the error case and use a goto try_again pattern. See https://github.com/Majkl578/php-src/blob/2265ca4f783b9e5e3a575ebe8c3b1f21f3009015/Zend/zend_vm_def.h#L4419 for an example.


SAVE_OPLINE();

subject = GET_OP2_ZVAL_PTR(BP_VAR_R);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be preferable to use GET_OP2_ZVAL_PTR_UNDEF here and check the UNDEF case for IS_CV inside the branch that generates the type error. For an example of delayed UNDEF handling see https://github.com/Majkl578/php-src/blob/2265ca4f783b9e5e3a575ebe8c3b1f21f3009015/Zend/zend_vm_def.h#L59.

zend_error(E_WARNING, "array_key_exists(): The first argument should be either a string or an integer");
result = 0;
}
ZVAL_BOOL(EX_VAR(opline->result.var), result);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code between IS_ARRAY and IS_OBJECT is duplicated here (which is actually rather dubious and will handle some cases incorrectly, but that's an existing issue and best solved by deprecating array_key_exists on objects entirely). It would be better to only extract the hashtable using Z_ARRVAL_P and Z_OBJPROP_P in these branches and have only one copy of the common key handling code afterwards.

@@ -2394,6 +2394,7 @@ static int zend_update_type_info(const zend_op_array *op_array,
case ZEND_ISSET_ISEMPTY_STATIC_PROP:
case ZEND_ASSERT_CHECK:
case ZEND_IN_ARRAY:
case ZEND_ARRAY_KEY_EXISTS:
UPDATE_SSA_TYPE(MAY_BE_FALSE|MAY_BE_TRUE, ssa_ops[i].result_def);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

array_key_exists() can also return null for the zpp failure case.

Additionally, ARRAY_KEY_EXISTS needs to be handled in https://github.com/php/php-src/blob/master/ext/opcache/Optimizer/sccp.c#L1328. Looking at the ISSET_ISEMPTY_DIM_OBJ case should provide inspiration here.


FREE_OP2();
FREE_OP1();
ZEND_VM_NEXT_OPCODE_CHECK_EXCEPTION();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

array_key_exists() is a function that would benefit from smart branch handling. See https://github.com/Majkl578/php-src/blob/2265ca4f783b9e5e3a575ebe8c3b1f21f3009015/Zend/zend_vm_def.h#L8085 how this is done for ZEND_IN_ARRAY. You'll also have to add it to https://github.com/php/php-src/blob/master/Zend/zend_compile.c#L2181.

@Majkl578
Copy link
Contributor Author

Majkl578 commented Jul 3, 2018

@nikic Thanks for your review 🥇, I'll try to address it later this week.

@KalleZ
Copy link
Member

KalleZ commented Jul 3, 2018

Also you should just revert the change to NEWS, usually the merger of the PR will do this part =)

@cmb69
Copy link
Contributor

cmb69 commented Jul 3, 2018

Also you should just revert the change to NEWS, usually the merger of the PR will do this part =)

For this PR an (additional) entry in UPGRADING appears to be appropriate (UPGRADING usually serves as base for the migration guide in the PHP manual, but NEWS is likely to be overlooked, and this improvement seems to be quite noteworthy).

@laruence
Copy link
Member

laruence commented Jul 4, 2018

Although isset and array_key_exists behavior slightly different, but I think in most cases, array_key_exists can be repalced with isset..

thus, I doubt such optimization is worthy to do...

@nikic
Copy link
Member

nikic commented Jul 4, 2018

@laruence array_key_exists() is common in foundational libraries that need to support null values. These are often very hot code paths and currently the usual way to handle them is if (isset($array[$key]) || \array_key_exists($key, $array)), because directly calling array_key_exists() is too slow.

@nikic
Copy link
Member

nikic commented Jul 4, 2018

@cmb69 Currently we usually do not include performance changes in UPGRADING, unless they come with some observable behavior change. But I think it's a good idea to start doing this, maybe under a new "Performance Improvements" section. We do tend to have a fair bit of improvements in each release, but they aren't really called out anywhere.

@cmb69
Copy link
Contributor

cmb69 commented Jul 4, 2018

@nikic Even if we won't document performance changes, I still think this very change should be documented, since it can affect how userland code will be written. You already gave a typical example which could be improved wrt. this PR.

@hikari-no-yume
Copy link
Contributor

hikari-no-yume commented Jul 4, 2018

By the way, it would be nice to have a comment above the code for the normal function array_key_exists that mentions this special case. I added one for all the special cases there were some time ago (2015 maybe?), because I had modified a special-cased function's behaviour (in the actual function, not the compiler) and spent some time confused as to why the change wasn't taking effect.

@Majkl578 Majkl578 force-pushed the ZEND_ARRAY_KEY_EXISTS-opcode branch from 2265ca4 to 6a76ae4 Compare August 8, 2018 14:38
@Majkl578 Majkl578 changed the base branch from master to PHP-7.3 August 8, 2018 14:39
@Majkl578 Majkl578 force-pushed the ZEND_ARRAY_KEY_EXISTS-opcode branch from 6a76ae4 to 4ff60bb Compare August 8, 2018 14:40
@Majkl578
Copy link
Contributor Author

Majkl578 commented Aug 8, 2018

@nikic I adressed all your comments, except one: SCCP for this new opcode, since it's a bit beyond my current knowledge of PHP internals. May I ask you or anyone else to help me with this one? Thanks!

@Majkl578
Copy link
Contributor Author

Majkl578 commented Aug 8, 2018

gcc-8 -O0

isset (100000000x): 3.57s
array_key_exists (100000000x): 3.33s

gcc-8 -O3

isset (100000000x): 1.49s
array_key_exists (100000000x): 1.34s

@nikic Can you please another round of review? Thanks.

@Majkl578 Majkl578 force-pushed the ZEND_ARRAY_KEY_EXISTS-opcode branch from 4ff60bb to 16c5a7b Compare August 8, 2018 15:03
@nikic
Copy link
Member

nikic commented Aug 8, 2018

/cc @cmb69 for whether this can still land in 7.3. It introduces a new opcode, not sure what our policy is there. I guess as it may require further extension changes (xdebug support etc) probably too late?


ZEND_VM_C_LABEL(try_again):

if (UNEXPECTED(Z_ISREF_P(subject)) || UNEXPECTED(Z_ISREF_P(key))) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't quite what I had in mind regarding the reference handling. The idea is to do something like this:

ZEND_VM_C_LABEL(try_again_subject):
	if (EXPECTED(Z_TYPE_P(subject) == IS_ARRAY)) {
		ht = Z_ARRVAL_P(subject);
	} else if (UNEXPECTED(Z_TYPE_P(subject) == IS_OBJECT)) {
		ht = Z_OBJPROP_P(subject);
	} else if (Z_TYPE_P(Z_ISREF_P(subject)) {
        subject = Z_REFVAL_P(subject);
        ZEND_VM_C_GOTO(try_again_subject);
    } else {
		if (OP1_TYPE == IS_CV && UNEXPECTED(Z_TYPE_INFO_P(key) == IS_UNDEF)) {
			key = GET_OP1_UNDEF_CV(key, BP_VAR_R);
		}
		if (OP2_TYPE == IS_CV && UNEXPECTED(Z_TYPE_INFO_P(subject) == IS_UNDEF)) {
			subject = GET_OP2_UNDEF_CV(subject, BP_VAR_R);
		}
		zend_internal_type_error(EX_USES_STRICT_TYPES(), "array_key_exists() expects parameter 2 to be array, %s given", zend_get_type_by_const(Z_TYPE_P(subject)));
		FREE_OP2();
		FREE_OP1();
		ZEND_VM_SMART_BRANCH(result, 0);
		ZVAL_NULL(EX_VAR(opline->result.var));
		ZEND_VM_NEXT_OPCODE_CHECK_EXCEPTION();
    }

The point is that the check for the reference is only done after the most likely case (subject is an array) is already handled, so that rare cases (references) don't have a runtime impact on common ones (array).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hah, got it!

zend_internal_type_error(EX_USES_STRICT_TYPES(), "array_key_exists() expects parameter 2 to be array, %s given", zend_get_type_by_const(Z_TYPE_P(subject)));
FREE_OP2();
FREE_OP1();
ZEND_VM_SMART_BRANCH(result, 0);
Copy link
Member

@nikic nikic Aug 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe result is uninitialized here. It should be fine to just drop the smart branch line here, as this is a rare error case anyway.

@Majkl578 Majkl578 force-pushed the ZEND_ARRAY_KEY_EXISTS-opcode branch from 16c5a7b to bc6db1f Compare August 8, 2018 15:32
@cmb69
Copy link
Contributor

cmb69 commented Aug 8, 2018

Regarding PHP 7.3: I cannot assess how many extensions and SAPIs (phpdbg?) would be affected, and how much work it would be to update them, but given that Xdebug is not ready for PHP 7.3 yet, and the long UPGRADING.INTERNALS it might indeed be better to delay this to PHP 7.4. Otherwise this issue should be discussed on internals@.

@Majkl578
Copy link
Contributor Author

Majkl578 commented Aug 8, 2018

I only quickly checked code of XDebug and PHPDBG and found no usages of i.e. ZEND_COUNT or ZEND_IN_ARRAY (which is basically the same thing). Of course I don't know if there are any indirect implications for these tools, but it doesn't seem so, unlike i.e. new syntax.

@Majkl578
Copy link
Contributor Author

Majkl578 commented Aug 8, 2018

http://news.php.net/php.internals/103064

@Majkl578
Copy link
Contributor Author

Majkl578 commented Aug 9, 2018

(Waiting on author label can be removed.)

@Majkl578 Majkl578 changed the base branch from PHP-7.3 to master August 14, 2018 01:37
@Majkl578 Majkl578 force-pushed the ZEND_ARRAY_KEY_EXISTS-opcode branch from bc6db1f to c9a918b Compare August 14, 2018 01:38
@Majkl578
Copy link
Contributor Author

Alright, retargeted to master/7.4.

@Majkl578
Copy link
Contributor Author

Majkl578 commented Dec 21, 2018

@dstogov @nikic Also noticed that zend_vm_gen.php is not PHP 7.3-compatible and generates broken VM code:

PHP Warning:  preg_match_all(): The /e modifier is no longer supported, use preg_replace_callback instead in Zend/zend_vm_gen.php on line 996

Copy link
Member

@dstogov dstogov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fine for me and doesn't require RFC.
It may be merged after renaming GET_OP?_UNDEF_CV(...) into ZVAL_UNDEFINED_OP?().
I may take care about some final tuning after the merging.

result = zend_symtable_exists_ind(ht, ZSTR_EMPTY_ALLOC());
} else if (Z_TYPE_P(key) <= IS_NULL) {
if (OP1_TYPE == IS_CV && UNEXPECTED(Z_TYPE_P(key) == IS_UNDEF)) {
GET_OP1_UNDEF_CV(key, BP_VAR_R);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GET_OP1_UNDEF_CV() and GET_OP2_UNDEF_CV() have to be changed into ZVAL_UNDEFINED_OP1() and ZVAL_UNDEFINED_OP2().

} else if (EXPECTED(Z_TYPE_P(key) == IS_LONG)) {
result = zend_hash_index_exists(ht, Z_LVAL_P(key));
} else if (UNEXPECTED(Z_TYPE_P(key) == IS_NULL)) {
result = zend_symtable_exists_ind(ht, ZSTR_EMPTY_ALLOC());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may be zend_hash_exists_ind(). "" is not numeric.

@dstogov
Copy link
Member

dstogov commented Dec 21, 2018

@dstogov @nikic Also noticed that zend_vm_gen.php is not PHP 7.3-compatible and generates broken VM code:

PHP Warning:  preg_match_all(): The /e modifier is no longer supported, use preg_replace_callback instead in Zend/zend_vm_gen.php on line 996

Not a best place to report about the problem, but thanks anyway.
It's fixed now.

@dstogov
Copy link
Member

dstogov commented Dec 24, 2018

@Majkl578 do you like me to finalize and merge this?

@Majkl578
Copy link
Contributor Author

I'll handle the feedback over next few days. 👍

@Majkl578
Copy link
Contributor Author

@dstogov Adressed your comments and also rebased again. 👍

result = zend_hash_exists_ind(ht, ZSTR_EMPTY_ALLOC());
} else if (Z_TYPE_P(key) <= IS_NULL) {
if (OP1_TYPE == IS_CV && UNEXPECTED(Z_TYPE_P(key) == IS_UNDEF)) {
ZVAL_UNDEFINED_OP1();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Must be OP2_TYPE ans ZVAL_UNDEFINED_OP2.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems, you changed OP1->OP2 in wrong place. I'll take care about this and merge.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, my bad. Everything is fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah. My mistake. It's already merged into master.

@nikic nikic closed this Dec 26, 2018
@nikic
Copy link
Member

nikic commented Dec 26, 2018

I've added the perf improvement section for UPGRADING which I mentioned above: f1c0e67

@Majkl578 Majkl578 deleted the ZEND_ARRAY_KEY_EXISTS-opcode branch December 26, 2018 21:52
@Majkl578
Copy link
Contributor Author

Thanks @nikic @dstogov @cmb69!

TysonAndre added a commit to TysonAndre/php-src that referenced this pull request Nov 17, 2019
Make opcache replace the result with false if the array argument is
known to be empty.
This may be useful when a codebase has placeholders,
e.g. `if (!in_array($method, self::ALLOWED_METHODS)) { return; }`

In zend_inference.c: In php 8, array_key_exists will throw
a TypeError instead of returning null.

I didn't see any discussion of this optimization (for/against)
after a quick search on github, e.g. phpGH-3360

Potential future optimizations:

- convert `in_array($needle, ['only one element'], true)` to `===`?
  (or `==` for strict=false)
- When the number of elements is less than 4, switch to looping instead of hash
  lookup. (exact threshold for better performance to be determined)

  Also support looping for `in_array($value, [false, 'str', 2.5], true/false)`
germanow added a commit to germanow/arrays that referenced this pull request Feb 12, 2020
TysonAndre added a commit to TysonAndre/php-src that referenced this pull request Dec 20, 2020
Make opcache replace the result with false if the array argument is
known to be empty.
This may be useful when a codebase has placeholders,
e.g. `if (!in_array($method, self::ALLOWED_METHODS)) { return; }`

In zend_inference.c: In php 8, array_key_exists will throw
a TypeError instead of returning null.

I didn't see any discussion of this optimization (for/against)
after a quick search on github, e.g. phpGH-3360

Potential future optimizations:

- convert `in_array($needle, ['only one element'], true)` to `===`?
  (or `==` for strict=false)
- When the number of elements is less than 4, switch to looping instead of hash
  lookup. (exact threshold for better performance to be determined)

  Also support looping for `in_array($value, [false, 'str', 2.5], true/false)`
Muqsit added a commit to Muqsit/VanillaGenerator that referenced this pull request Feb 19, 2021
Besides being faster (since php/php-src#3360), array_key_exists is also safer as it narrows down key existence to arrays only and performs type-checking on both parameters.
TysonAndre added a commit to TysonAndre/php-src that referenced this pull request Dec 28, 2021
Make opcache replace the result with false if the array argument is
known to be empty.
This may be useful when a codebase has placeholders,
e.g. `if (!in_array($method, self::ALLOWED_METHODS)) { return; }`

In zend_inference.c: In php 8, array_key_exists will throw
a TypeError instead of returning null.

I didn't see any discussion of this optimization (for/against)
after a quick search on github, e.g. phpGH-3360

Potential future optimizations:

- convert `in_array($needle, ['only one element'], true)` to `===`?
  (or `==` for strict=false)
- When the number of elements is less than 4, switch to looping instead of hash
  lookup. (exact threshold for better performance to be determined)

  Also support looping for `in_array($value, [false, 'str', 2.5], true/false)`
TysonAndre added a commit to TysonAndre/php-src that referenced this pull request Dec 28, 2021
Make opcache replace the result with false if the array argument is
known to be empty.
This may be useful when a codebase has placeholders,
e.g. `if (!in_array($method, self::ALLOWED_METHODS)) { return; }`

In zend_inference.c: In php 8, array_key_exists will throw
a TypeError instead of returning null.

I didn't see any discussion of this optimization (for/against)
after a quick search on github, e.g. phpGH-3360

Potential future optimizations:

- convert `in_array($needle, ['only one element'], true)` to `===`?
  (or `==` for strict=false)
- When the number of elements is less than 4, switch to looping instead of hash
  lookup. (exact threshold for better performance to be determined)

  Also support looping for `in_array($value, [false, 'str', 2.5], true/false)`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants