[ext/standard] Specialize min()/max() for numeric arrays#22127
[ext/standard] Specialize min()/max() for numeric arrays#22127mehmetcansahin wants to merge 4 commits into
Conversation
|
You may need to add an entry to the UPGRADING file, Performance Improvements section. |
| return; | ||
| } | ||
|
|
||
| zval *result = zend_hash_minmax(array, php_data_compare, 0); |
There was a problem hiding this comment.
zval *result = zend_hash_minmax(array, php_data_compare, 0);
It seems this API is now only used for min/max functions (se https://sourcegraph.com/search?q=context:global+zend_hash_minmax+-f:zend_hash.c+-f:zend_hash.h+-f:standard/array.c&patternType=keyword&sm=0), so we could probably move it into array.c and stop using function pointers which might also increase performance.
There was a problem hiding this comment.
15babe3 The generic min/max scan now lives in ext/standard/array.c as php_array_data_minmax(), and zend_hash_minmax() was removed from the Zend HashTable API. I also removed the php_data_compare() function-pointer wrapper, so the fallback path now calls zend_compare() directly.
Updated UPGRADING.INTERNALS and refreshed the PR description with the new benchmark/testing details.
Girgias
left a comment
There was a problem hiding this comment.
I'd rather split the PR in two and have the move from zend_hash.c to array.c be done first so that this becomes the new baseline regarding performance.
| while (1) { | ||
| if (idx == array->nNumUsed) { | ||
| return NULL; | ||
| } | ||
| if (Z_TYPE(array->arPacked[idx]) != IS_UNDEF) { | ||
| break; | ||
| } | ||
| idx++; | ||
| } | ||
| res = array->arPacked + idx; | ||
| for (; idx < array->nNumUsed; idx++) { | ||
| zv = array->arPacked + idx; | ||
| if (UNEXPECTED(Z_TYPE_P(zv) == IS_UNDEF)) { | ||
| continue; | ||
| } | ||
|
|
||
| if (max) { | ||
| if (zend_compare(res, zv) < 0) { | ||
| res = zv; | ||
| } | ||
| } else { | ||
| if (zend_compare(res, zv) > 0) { | ||
| res = zv; | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Can we not use the usual ZEND_FOREACH API?
| while (1) { | ||
| if (idx == array->nNumUsed) { | ||
| return NULL; | ||
| } | ||
| if (Z_TYPE(array->arData[idx].val) != IS_UNDEF) { | ||
| break; | ||
| } | ||
| idx++; | ||
| } | ||
| res = &array->arData[idx].val; | ||
| for (; idx < array->nNumUsed; idx++) { | ||
| p = array->arData + idx; | ||
| if (UNEXPECTED(Z_TYPE(p->val) == IS_UNDEF)) { | ||
| continue; | ||
| } | ||
|
|
||
| if (max) { | ||
| if (zend_compare(res, &p->val) < 0) { | ||
| res = &p->val; | ||
| } | ||
| } else { | ||
| if (zend_compare(res, &p->val) > 0) { | ||
| res = &p->val; | ||
| } |
Summary
This specializes the array-form
min()/max()path for numeric arrays.The fast path compares homogeneous
IS_LONGandIS_DOUBLEvalues directly while scanning the array. For arrays that remain all-long or all-double, the result is returned directly withZVAL_LONG()orZVAL_DOUBLE(), avoiding generic comparison dispatch and zval copy/deref overhead on the hot path.If the array is empty, the first value is non-numeric, or a later value requires generic comparison semantics, execution falls back to the generic scan path in
array.cand compares values withzend_compare()directly. NaN-sensitive double cases are also handed to the generic comparison path to preserve existing ordering behavior.The generic min/max scan helper now lives in
ext/standard/array.c, so the internalzend_hash_minmax()helper and thephp_data_compare()function-pointer wrapper are no longer needed.An
UPGRADINGentry was added under Performance Improvements, andUPGRADING.INTERNALSnotes the removal of the internalzend_hash_minmax()API.Benchmark
Local CLI build:
Debug Build => no--disable-all --enable-clin=100000700min()+max()iterationsorigin/masterFallback / non-numeric cases from the original benchmark:
origin/masterAfter moving the generic fallback into
array.cand callingzend_compare()directly instead of using a function-pointer wrapper, I also compared the previous PR branch against the current branch with 10k-element arrays, 5k iterations, and 11 samples per case:min()first string, then longsmax()first string, then longsmin()strings onlymax()strings onlyAcross repeated samples, integer-only arrays stay roughly 3x faster and float-only arrays are roughly 4.8x-5x faster. Generic fallback cases stay within noise to a small improvement after removing the function-pointer wrapper.
Testing
git diff --checkmake -j$(sysctl -n hw.ncpu)TEST_PHP_EXECUTABLE=sapi/cli/php sapi/cli/php run-tests.php -q ext/standard/tests/array/min*.phpt ext/standard/tests/array/max*.phpt Zend/tests/ArrayAccess/bug78356.phptResult: