New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement ZEND_ARRAY_KEY_EXISTS opcode to speed up array_key_exists() #3360

Closed
wants to merge 2 commits into
base: master
from

Conversation

8 participants
@Majkl578
Copy link
Contributor

Majkl578 commented Jul 2, 2018

This change makes array_key_exists() actually faster than isset() by ~25% (tested with GCC 8, -O3, march=native, mtune=native).

addresses https://bugs.php.net/bug.php?id=76148
addresses https://bugs.php.net/bug.php?id=71239

With a synthetic benchmark:

<?php

$foo = [123 => 456, 'abc' => 'def'];

$iterations = 500000000;

$a = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    isset($foo[123]);
}
printf('isset (%dx): %.2fs%s', $iterations, microtime(true) - $a, PHP_EOL);

$b = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    array_key_exists(123, $foo);
}
printf('array_key_exists (%dx): %.2fs%s', $iterations, microtime(true) - $b, PHP_EOL);

It yields results like:
7.2 (unpatched):

isset (500000000x): 7.65s
array_key_exists (500000000x): 9.03s

7.3 + patch:

isset (500000000x): 7.60s
array_key_exists (500000000x): 5.65s
@nikic
Copy link
Member

nikic left a comment

Thanks for taking this on! I've left a few implementation notes, in particular in regard to how performance may be improved.

Show resolved Hide resolved Zend/zend_compile.c
subject = GET_OP2_ZVAL_PTR(BP_VAR_R);
key = GET_OP1_ZVAL_PTR(BP_VAR_R);

if (UNEXPECTED(Z_ISREF_P(subject))) {

This comment has been minimized.

@nikic

nikic Jul 3, 2018

Member

A more efficient (and for VM code, more idiomatic) way to handle this is to check for ISREF prior to the error case and use a goto try_again pattern. See https://github.com/Majkl578/php-src/blob/2265ca4f783b9e5e3a575ebe8c3b1f21f3009015/Zend/zend_vm_def.h#L4419 for an example.


SAVE_OPLINE();

subject = GET_OP2_ZVAL_PTR(BP_VAR_R);

This comment has been minimized.

@nikic

nikic Jul 3, 2018

Member

It would be preferable to use GET_OP2_ZVAL_PTR_UNDEF here and check the UNDEF case for IS_CV inside the branch that generates the type error. For an example of delayed UNDEF handling see https://github.com/Majkl578/php-src/blob/2265ca4f783b9e5e3a575ebe8c3b1f21f3009015/Zend/zend_vm_def.h#L59.

zend_error(E_WARNING, "array_key_exists(): The first argument should be either a string or an integer");
result = 0;
}
ZVAL_BOOL(EX_VAR(opline->result.var), result);

This comment has been minimized.

@nikic

nikic Jul 3, 2018

Member

The code between IS_ARRAY and IS_OBJECT is duplicated here (which is actually rather dubious and will handle some cases incorrectly, but that's an existing issue and best solved by deprecating array_key_exists on objects entirely). It would be better to only extract the hashtable using Z_ARRVAL_P and Z_OBJPROP_P in these branches and have only one copy of the common key handling code afterwards.

@@ -2394,6 +2394,7 @@ static int zend_update_type_info(const zend_op_array *op_array,
case ZEND_ISSET_ISEMPTY_STATIC_PROP:
case ZEND_ASSERT_CHECK:
case ZEND_IN_ARRAY:
case ZEND_ARRAY_KEY_EXISTS:
UPDATE_SSA_TYPE(MAY_BE_FALSE|MAY_BE_TRUE, ssa_ops[i].result_def);

This comment has been minimized.

@nikic

nikic Jul 3, 2018

Member

array_key_exists() can also return null for the zpp failure case.

Additionally, ARRAY_KEY_EXISTS needs to be handled in https://github.com/php/php-src/blob/master/ext/opcache/Optimizer/sccp.c#L1328. Looking at the ISSET_ISEMPTY_DIM_OBJ case should provide inspiration here.


FREE_OP2();
FREE_OP1();
ZEND_VM_NEXT_OPCODE_CHECK_EXCEPTION();

This comment has been minimized.

@nikic

nikic Jul 3, 2018

Member

array_key_exists() is a function that would benefit from smart branch handling. See https://github.com/Majkl578/php-src/blob/2265ca4f783b9e5e3a575ebe8c3b1f21f3009015/Zend/zend_vm_def.h#L8085 how this is done for ZEND_IN_ARRAY. You'll also have to add it to https://github.com/php/php-src/blob/master/Zend/zend_compile.c#L2181.

@Majkl578

This comment has been minimized.

Copy link
Contributor

Majkl578 commented Jul 3, 2018

@nikic Thanks for your review 🥇, I'll try to address it later this week.

@KalleZ

This comment has been minimized.

Copy link
Contributor

KalleZ commented Jul 3, 2018

Also you should just revert the change to NEWS, usually the merger of the PR will do this part =)

@cmb69

This comment has been minimized.

Copy link
Contributor

cmb69 commented Jul 3, 2018

Also you should just revert the change to NEWS, usually the merger of the PR will do this part =)

For this PR an (additional) entry in UPGRADING appears to be appropriate (UPGRADING usually serves as base for the migration guide in the PHP manual, but NEWS is likely to be overlooked, and this improvement seems to be quite noteworthy).

@laruence

This comment has been minimized.

Copy link
Member

laruence commented Jul 4, 2018

Although isset and array_key_exists behavior slightly different, but I think in most cases, array_key_exists can be repalced with isset..

thus, I doubt such optimization is worthy to do...

@nikic

This comment has been minimized.

Copy link
Member

nikic commented Jul 4, 2018

@laruence array_key_exists() is common in foundational libraries that need to support null values. These are often very hot code paths and currently the usual way to handle them is if (isset($array[$key]) || \array_key_exists($key, $array)), because directly calling array_key_exists() is too slow.

@nikic

This comment has been minimized.

Copy link
Member

nikic commented Jul 4, 2018

@cmb69 Currently we usually do not include performance changes in UPGRADING, unless they come with some observable behavior change. But I think it's a good idea to start doing this, maybe under a new "Performance Improvements" section. We do tend to have a fair bit of improvements in each release, but they aren't really called out anywhere.

@cmb69

This comment has been minimized.

Copy link
Contributor

cmb69 commented Jul 4, 2018

@nikic Even if we won't document performance changes, I still think this very change should be documented, since it can affect how userland code will be written. You already gave a typical example which could be improved wrt. this PR.

@hikari-no-yume

This comment has been minimized.

Copy link
Contributor

hikari-no-yume commented Jul 4, 2018

By the way, it would be nice to have a comment above the code for the normal function array_key_exists that mentions this special case. I added one for all the special cases there were some time ago (2015 maybe?), because I had modified a special-cased function's behaviour (in the actual function, not the compiler) and spent some time confused as to why the change wasn't taking effect.

@Majkl578 Majkl578 force-pushed the Majkl578:ZEND_ARRAY_KEY_EXISTS-opcode branch from 2265ca4 to 6a76ae4 Aug 8, 2018

@Majkl578 Majkl578 changed the base branch from master to PHP-7.3 Aug 8, 2018

@Majkl578 Majkl578 force-pushed the Majkl578:ZEND_ARRAY_KEY_EXISTS-opcode branch from 6a76ae4 to 4ff60bb Aug 8, 2018

@Majkl578

This comment has been minimized.

Copy link
Contributor

Majkl578 commented Aug 8, 2018

@nikic I adressed all your comments, except one: SCCP for this new opcode, since it's a bit beyond my current knowledge of PHP internals. May I ask you or anyone else to help me with this one? Thanks!

@Majkl578

This comment has been minimized.

Copy link
Contributor

Majkl578 commented Aug 8, 2018

gcc-8 -O0

isset (100000000x): 3.57s
array_key_exists (100000000x): 3.33s

gcc-8 -O3

isset (100000000x): 1.49s
array_key_exists (100000000x): 1.34s

@nikic Can you please another round of review? Thanks.

@Majkl578 Majkl578 force-pushed the Majkl578:ZEND_ARRAY_KEY_EXISTS-opcode branch from 4ff60bb to 16c5a7b Aug 8, 2018

@nikic

This comment has been minimized.

Copy link
Member

nikic commented Aug 8, 2018

/cc @cmb69 for whether this can still land in 7.3. It introduces a new opcode, not sure what our policy is there. I guess as it may require further extension changes (xdebug support etc) probably too late?


ZEND_VM_C_LABEL(try_again):

if (UNEXPECTED(Z_ISREF_P(subject)) || UNEXPECTED(Z_ISREF_P(key))) {

This comment has been minimized.

@nikic

nikic Aug 8, 2018

Member

This isn't quite what I had in mind regarding the reference handling. The idea is to do something like this:

ZEND_VM_C_LABEL(try_again_subject):
	if (EXPECTED(Z_TYPE_P(subject) == IS_ARRAY)) {
		ht = Z_ARRVAL_P(subject);
	} else if (UNEXPECTED(Z_TYPE_P(subject) == IS_OBJECT)) {
		ht = Z_OBJPROP_P(subject);
	} else if (Z_TYPE_P(Z_ISREF_P(subject)) {
        subject = Z_REFVAL_P(subject);
        ZEND_VM_C_GOTO(try_again_subject);
    } else {
		if (OP1_TYPE == IS_CV && UNEXPECTED(Z_TYPE_INFO_P(key) == IS_UNDEF)) {
			key = GET_OP1_UNDEF_CV(key, BP_VAR_R);
		}
		if (OP2_TYPE == IS_CV && UNEXPECTED(Z_TYPE_INFO_P(subject) == IS_UNDEF)) {
			subject = GET_OP2_UNDEF_CV(subject, BP_VAR_R);
		}
		zend_internal_type_error(EX_USES_STRICT_TYPES(), "array_key_exists() expects parameter 2 to be array, %s given", zend_get_type_by_const(Z_TYPE_P(subject)));
		FREE_OP2();
		FREE_OP1();
		ZEND_VM_SMART_BRANCH(result, 0);
		ZVAL_NULL(EX_VAR(opline->result.var));
		ZEND_VM_NEXT_OPCODE_CHECK_EXCEPTION();
    }

The point is that the check for the reference is only done after the most likely case (subject is an array) is already handled, so that rare cases (references) don't have a runtime impact on common ones (array).

This comment has been minimized.

@Majkl578

Majkl578 Aug 8, 2018

Contributor

Hah, got it!

zend_internal_type_error(EX_USES_STRICT_TYPES(), "array_key_exists() expects parameter 2 to be array, %s given", zend_get_type_by_const(Z_TYPE_P(subject)));
FREE_OP2();
FREE_OP1();
ZEND_VM_SMART_BRANCH(result, 0);

This comment has been minimized.

@nikic

nikic Aug 8, 2018

Member

I believe result is uninitialized here. It should be fine to just drop the smart branch line here, as this is a rare error case anyway.

Show resolved Hide resolved Zend/zend_vm_def.h

@Majkl578 Majkl578 force-pushed the Majkl578:ZEND_ARRAY_KEY_EXISTS-opcode branch from 16c5a7b to bc6db1f Aug 8, 2018

@cmb69

This comment has been minimized.

Copy link
Contributor

cmb69 commented Aug 8, 2018

Regarding PHP 7.3: I cannot assess how many extensions and SAPIs (phpdbg?) would be affected, and how much work it would be to update them, but given that Xdebug is not ready for PHP 7.3 yet, and the long UPGRADING.INTERNALS it might indeed be better to delay this to PHP 7.4. Otherwise this issue should be discussed on internals@.

@Majkl578

This comment has been minimized.

Copy link
Contributor

Majkl578 commented Aug 8, 2018

I only quickly checked code of XDebug and PHPDBG and found no usages of i.e. ZEND_COUNT or ZEND_IN_ARRAY (which is basically the same thing). Of course I don't know if there are any indirect implications for these tools, but it doesn't seem so, unlike i.e. new syntax.

@Majkl578

This comment has been minimized.

Copy link
Contributor

Majkl578 commented Aug 8, 2018

@Majkl578

This comment has been minimized.

Copy link
Contributor

Majkl578 commented Aug 9, 2018

(Waiting on author label can be removed.)

@Majkl578 Majkl578 changed the base branch from PHP-7.3 to master Aug 14, 2018

@Majkl578 Majkl578 force-pushed the Majkl578:ZEND_ARRAY_KEY_EXISTS-opcode branch from bc6db1f to c9a918b Aug 14, 2018

@Majkl578

This comment has been minimized.

Copy link
Contributor

Majkl578 commented Aug 14, 2018

Alright, retargeted to master/7.4.

@Majkl578

This comment has been minimized.

Copy link
Contributor

Majkl578 commented Sep 3, 2018

Right now there is this discrepancy with normal function:
PR:

 $ sapi/cli/php -n -derror_reporting=-1 -r 'array_key_exists($a, $b);'

Notice: Undefined variable: b in Command line code on line 1

Warning: array_key_exists() expects parameter 2 to be array, null given in Command line code on line 1

PHP 7.2.9:

$ php -n -derror_reporting=-1 -r 'array_key_exists($a, $b);'

Notice: Undefined variable: a in Command line code on line 1

Notice: Undefined variable: b in Command line code on line 1

Warning: array_key_exists() expects parameter 2 to be array, null given in Command line code on line 1

@nikic Can we handle this while still keeping delayed undef checks? Maybe explicitly trigger GET_OP1_UNDEF_CV() in the subject's else branch?

@Majkl578

This comment has been minimized.

Copy link
Contributor

Majkl578 commented Sep 3, 2018

The following diff fixes the difference in notices (and keeps the same order as well):

diff --git a/Zend/zend_vm_def.h b/Zend/zend_vm_def.h
index a0ee5809aa..bcb36fc08b 100644
--- a/Zend/zend_vm_def.h
+++ b/Zend/zend_vm_def.h
@@ -6428,6 +6428,7 @@ ZEND_VM_C_LABEL(try_again_subject):
                subject = Z_REFVAL_P(subject);
                ZEND_VM_C_GOTO(try_again_subject);
        } else {
+               GET_OP1_UNDEF_CV(key, BP_VAR_R);
                if (OP2_TYPE == IS_CV && UNEXPECTED(Z_TYPE_INFO_P(subject) == IS_UNDEF)) {
                        subject = GET_OP2_UNDEF_CV(subject, BP_VAR_R);
                }

If this approach is ok (it looks a bit hackish), I'll apply it here.

@nikic

This comment has been minimized.

Copy link
Member

nikic commented Sep 3, 2018

@Majkl578 It should be

if (OP1_TYPE == IS_CV && UNEXPECTED(Z_TYPE_P(key) == IS_UNDEF)) {
    GET_OP1_UNDEF_CV(key, BP_VAR_R);
}

as in the other branch. Both variables can be potentially undef at that point, so they both need to be checked.

@Majkl578 Majkl578 force-pushed the Majkl578:ZEND_ARRAY_KEY_EXISTS-opcode branch 2 times, most recently from 2ccb530 to ae6a94d Sep 4, 2018

@cmb69 cmb69 referenced this pull request Sep 8, 2018

Closed

Implement exists() construct #1530

@Majkl578 Majkl578 force-pushed the Majkl578:ZEND_ARRAY_KEY_EXISTS-opcode branch from ae6a94d to 4f0a9e7 Oct 5, 2018

@Majkl578

This comment has been minimized.

Copy link
Contributor

Majkl578 commented Oct 5, 2018

rebased

@Majkl578

This comment has been minimized.

Copy link
Contributor

Majkl578 commented Nov 8, 2018

Hi, anyone who could tell me what else needs to be done here so the PR can get merged? Thanks. 🙏

@cmb69

This comment has been minimized.

Copy link
Contributor

cmb69 commented Nov 8, 2018

I'm somewhat concerned regarding @dstogov's comment.

@Majkl578

This comment has been minimized.

Copy link
Contributor

Majkl578 commented Dec 21, 2018

It's both funny and sad to see this PR blocked because it adds one opcode with almost unnoticeable performance impact while there is running vote on a major RFC which doesn't even provide benchmarks at all.

@cmb69

This comment has been minimized.

Copy link
Contributor

cmb69 commented Dec 21, 2018

If in doubt, utilize the RFC process. :)

@nikic

This comment has been minimized.

Copy link
Member

nikic commented Dec 21, 2018

I'm ok with landing this. To address the VM size concern, we might want to extract parts of this into functions, to reduce specialization.

UPGRADING Outdated
- Core:
. The array_key_exists() function is now optimized into a specialized
ZEND_ARRAY_KEY_EXISTS instruction.

This comment has been minimized.

@dstogov

dstogov Dec 21, 2018

Contributor

This doesn't change anything for regular users. I would move this into UPGRADING.INTERNALS or completely remove.

This comment has been minimized.

@Majkl578

Majkl578 Dec 21, 2018

Contributor

Alright. 👌

This comment has been minimized.

@cmb69

cmb69 Dec 21, 2018

Contributor

But see #3360 (comment).

This comment has been minimized.

@Majkl578

Majkl578 Dec 21, 2018

Contributor

I'd say let's leave the decision up to the 7.4 RM?

@Majkl578 Majkl578 force-pushed the Majkl578:ZEND_ARRAY_KEY_EXISTS-opcode branch from 4f0a9e7 to 9d43818 Dec 21, 2018

@Majkl578

This comment has been minimized.

Copy link
Contributor

Majkl578 commented Dec 21, 2018

@dstogov @nikic Also noticed that zend_vm_gen.php is not PHP 7.3-compatible and generates broken VM code:

PHP Warning:  preg_match_all(): The /e modifier is no longer supported, use preg_replace_callback instead in Zend/zend_vm_gen.php on line 996
@dstogov
Copy link
Contributor

dstogov left a comment

This looks fine for me and doesn't require RFC.
It may be merged after renaming GET_OP?_UNDEF_CV(...) into ZVAL_UNDEFINED_OP?().
I may take care about some final tuning after the merging.

result = zend_symtable_exists_ind(ht, ZSTR_EMPTY_ALLOC());
} else if (Z_TYPE_P(key) <= IS_NULL) {
if (OP1_TYPE == IS_CV && UNEXPECTED(Z_TYPE_P(key) == IS_UNDEF)) {
GET_OP1_UNDEF_CV(key, BP_VAR_R);

This comment has been minimized.

@dstogov

dstogov Dec 21, 2018

Contributor

GET_OP1_UNDEF_CV() and GET_OP2_UNDEF_CV() have to be changed into ZVAL_UNDEFINED_OP1() and ZVAL_UNDEFINED_OP2().

} else if (EXPECTED(Z_TYPE_P(key) == IS_LONG)) {
result = zend_hash_index_exists(ht, Z_LVAL_P(key));
} else if (UNEXPECTED(Z_TYPE_P(key) == IS_NULL)) {
result = zend_symtable_exists_ind(ht, ZSTR_EMPTY_ALLOC());

This comment has been minimized.

@dstogov

dstogov Dec 21, 2018

Contributor

This may be zend_hash_exists_ind(). "" is not numeric.

@dstogov

This comment has been minimized.

Copy link
Contributor

dstogov commented Dec 21, 2018

@dstogov @nikic Also noticed that zend_vm_gen.php is not PHP 7.3-compatible and generates broken VM code:

PHP Warning:  preg_match_all(): The /e modifier is no longer supported, use preg_replace_callback instead in Zend/zend_vm_gen.php on line 996

Not a best place to report about the problem, but thanks anyway.
It's fixed now.

@dstogov

This comment has been minimized.

Copy link
Contributor

dstogov commented Dec 24, 2018

@Majkl578 do you like me to finalize and merge this?

@Majkl578

This comment has been minimized.

Copy link
Contributor

Majkl578 commented Dec 24, 2018

I'll handle the feedback over next few days. 👍

Majkl578 and others added some commits Jul 2, 2018

@Majkl578 Majkl578 force-pushed the Majkl578:ZEND_ARRAY_KEY_EXISTS-opcode branch from 9d43818 to f473132 Dec 26, 2018

@Majkl578

This comment has been minimized.

Copy link
Contributor

Majkl578 commented Dec 26, 2018

@dstogov Adressed your comments and also rebased again. 👍

result = zend_hash_exists_ind(ht, ZSTR_EMPTY_ALLOC());
} else if (Z_TYPE_P(key) <= IS_NULL) {
if (OP1_TYPE == IS_CV && UNEXPECTED(Z_TYPE_P(key) == IS_UNDEF)) {
ZVAL_UNDEFINED_OP1();

This comment has been minimized.

@dstogov

dstogov Dec 26, 2018

Contributor

Must be OP2_TYPE ans ZVAL_UNDEFINED_OP2.

This comment has been minimized.

@dstogov

dstogov Dec 26, 2018

Contributor

It seems, you changed OP1->OP2 in wrong place. I'll take care about this and merge.

This comment has been minimized.

@dstogov

dstogov Dec 26, 2018

Contributor

Oh, my bad. Everything is fine.

This comment has been minimized.

@dstogov

dstogov Dec 26, 2018

Contributor

yeah. My mistake. It's already merged into master.

@nikic nikic closed this Dec 26, 2018

@nikic

This comment has been minimized.

Copy link
Member

nikic commented Dec 26, 2018

I've added the perf improvement section for UPGRADING which I mentioned above: f1c0e67

@Majkl578 Majkl578 deleted the Majkl578:ZEND_ARRAY_KEY_EXISTS-opcode branch Dec 26, 2018

@Majkl578

This comment has been minimized.

Copy link
Contributor

Majkl578 commented Dec 26, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment