Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

array_key_exists on $GLOBALS takes linear time in PHP 8.1.0+ #9429

Open
leejt opened this issue Aug 26, 2022 · 3 comments
Open

array_key_exists on $GLOBALS takes linear time in PHP 8.1.0+ #9429

leejt opened this issue Aug 26, 2022 · 3 comments

Comments

@leejt
Copy link

leejt commented Aug 26, 2022

Description

Compare the following in PHP 8.0.22 vs 8.1.9 (https://3v4l.org/91CFl)

<?php

for ($i = 0; $i < 100000; $i++) {
    $GLOBALS["a$i"] = rand(0, 1000000);
}

$start = microtime(true);
for ($runs = 1000; $runs > 0; $runs--) {
    array_key_exists('a1111111', $GLOBALS);
}
$end = microtime(true);
printf("[GLOBALS]array_key_exists() took %.5f seconds\n", $end - $start);

It takes 0.00002 seconds in 8.0.22, and 1.9 seconds in 8.1.9.

Further testing makes it clear that the speed of array_key_exists on $GLOBALS increases linearly with the size of $GLOBALS.

This regression doesn't affect array_key_exists on non-$GLOBALS arrays, and doesn't affect the speed of isset($GLOBALS[$foo]).

The commit that introduced the slowdown was 3c68f38, which restricted usage of $GLOBALS in various ways. I'm not sure if this particular side effect was known or intended.

array_key_exists is used somewhat often on $GLOBALS in MediaWiki, most prominently here.

The behavior can be mostly replaced by calling isset($GLOBALS[$foo]), which remains fast, although that has different semantics when the value is null. I'm not aware of a better way to get the full behavior of array_key_exists($foo, $GLOBALS) in constant-time in 8.1+.

PHP Version

PHP 8.1.9

Operating System

Ubuntu 20.04

@bwoebi
Copy link
Member

bwoebi commented Aug 26, 2022

What the change does is always rebuilding $GLOBALS when not doing a direct array manipulation on it.

Hence, as you are passing the whole of $GLOBALS to a function, the whole of $GLOBALS will be recomputed at runtime for each access.

To fix this, there are essentially a couple options: a) special case array_key_exists on $GLOBALS, b) revert that $GLOBALS change or c) just mitigate it by caching $GLOBALS until the next global symtable update adding a new value there?

@TysonAndre
Copy link
Contributor

TysonAndre commented Aug 27, 2022

I'm not sure if this particular side effect was known or intended.

It's intended - https://wiki.php.net/rfc/restrict_globals_usage was the RFC for that in php 8.1 - it seemed obvious it'd copy all of the globals every time the full $GLOBALS was read


3c68f38#diff-b3cd91650e41ac88d84b327235481f7846530bb7bc9d23b974b3a95326175578 - I think maybe this can be special cased by replacing this with an alternative opcode to ARRAY_KEY_EXISTS with a different name for globals, which will probably require a new opcode. This should behave exactly like $localVariable = $GLOBALS; array_key_exists($name, $localVariable);, including the edge case of ${1} = 123; now creating $GLOBALS[1] as of PHP 8.1+ (actually backed by the string '1' but converted to an integer in FETCH_GLOBALS, meaning both array_key_exists(1, $GLOBALS) and array_key_exists('1', $GLOBALS) now succeed in php 8.1+

(This opcode would need to convert integer keys to strings and emit any notices (e.g. float handling, deprecation of 0.5 coercion to array key?) to imitate the old behavior)
// existing opcodes
0000 CV0($key) = RECV 1
0001 T2 = FETCH_GLOBALS
0002 T1 = ARRAY_KEY_EXISTS CV0($key) T2
function test(string $key) {
    return array_key_exists($key, $GLOBALS);
}

Miscellaneous thoughts:

  • ARRAY_KEY_EXISTS opcode is only used when array_key_exists is unambiguously referring to the function in the global namespace, e.g. namespace Foo; array_key_exists(...); won't work, but \array_key_exists or use array_key_exists; will work

  • Adding helper functions such as is_global_variable_defined() or is_defined_variable($variable, globalScope: true) may help, e.g. in a PECL or through the RFC process if it passes, but that isn't really useful to existing applications

  • In the worst case, you can override the error handler with set_error_handler to check for the notice emitted by $GLOBALS[$undefinedVarName], but typical applications won't have hundreds of thousands of variables, so I can't imagine anyone doing that.

    Maybe a heuristic could be added to permanently switch to the slower handler the first time the application sees that $GLOBALS has more than some threshold (e.g. 1000 values) at any point

@hormus
Copy link

hormus commented Sep 11, 2022

It is not the size of the Superglobal, if you create a collection with an key or index rand, the element it is much faster but rather it looks like $key = 0; $GLOBALS['a' . $key]; or $GLOBALS[$key]; or for ($i = 0; $i < 500000; $i++) { ${$i} = rand(0, 499999); } the reason for the regression. As already mentioned, an assignment by variable value is faster. $copy = $GLOBALS;

<?php

$start = microtime(true);
if(empty($GLOBALS['' . 'rand'])) {
$GLOBALS['' . 'rand'] = array();
}
for ($i = 0; $i < 500000; $i++) {
    $rand['' . $i] = rand(0, 499999);
    //$GLOBALS['a' . $i] = $rand;
}
$arr = NULL;
unset($arr);
$arr = range(0, count($rand) - 1);
$GLOBALS['' . 'rand'] = array_combine($arr, $rand);
$arr = NULL;
unset($arr);

$mid = microtime(true);

for ($runs = 1000; $runs > 0; $runs--) {
    array_key_exists(1111111, $GLOBALS['' . 'rand']);
}
$end = microtime(true);
printf('array_key_exists(1111111, $GLOBALS[\'\' . \'rand\']) took %1$.5F seconds for write array %2$.5F seconds and total time %3$.5F' . "\n", $end - $mid, $mid - $start, $end - $start );
var_dump(memory_get_usage(true), ini_get('memory_limit'));


?>

note* the Variable Variables $i = 0; ${$i} = 'hi'; syntax is always available but in php 8.1.0 it doesn't work if there are a lot of them

<?php

$start = microtime(true);
for ($i = 0; $i < 100000; $i++) {
    $rand = rand(0, 99999);
    ${$i} = $rand; // As of PHP 5.4 $GLOBALS is now initialized just-in-time
}

$mid = microtime(true);
//$copy = $GLOBALS;
$return = NULL;
for ($runs = 1000; $runs > 0; $runs--) {
    $false = array_key_exists('1111111', $GLOBALS);
        if($false) {
            $return = $false;
        }
}
$copy = NULL;
unset($copy);
$end = microtime(true);
printf('array_key_exists(\'1111111\', $GLOBALS) took %1$.5F seconds for write array %2$.5F seconds and total time %3$.5F' . "\n", $end - $mid, $mid - $start, $end - $start );
var_dump(memory_get_usage(true), ini_get('memory_limit'), $return);


?>

From php >= 8.1.0 and array_key_exists for $key
Process exited with code 137.

php < 8.1.0

array_key_exists('1111111', $GLOBALS) took 0.00004 seconds for write array 0.01427 seconds and total time 0.01431
int(9437184)
string(3) "64M"
NULL

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants