[ext/standard] Specialize min()/max() for numeric arrays #22127

Open
mehmetcansahin wants to merge 4 commits into php:master from mehmetcansahin:minmax-long-array-fast-path

Contributor

@mehmetcansahin mehmetcansahin commented May 22, 2026

Summary

This specializes the array-form min() / max() path for numeric arrays.

The fast path compares homogeneous IS_LONG and IS_DOUBLE values directly while scanning the array. For arrays that remain all-long or all-double, the result is returned directly with ZVAL_LONG() or ZVAL_DOUBLE(), avoiding generic comparison dispatch and zval copy/deref overhead on the hot path.

If the array is empty, the first value is non-numeric, or a later value requires generic comparison semantics, execution falls back to the generic scan path in array.c and compares values with zend_compare() directly. NaN-sensitive double cases are also handed to the generic comparison path to preserve existing ordering behavior.
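The fast-path/fallback split described above can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not the code from the PR: `value`, `val_tag`, `generic_compare()`, and `scan_minmax()` are hypothetical stand-ins for zvals, type tags, `zend_compare()`, and the array scan, and the three-way double comparison is an assumed simplification of the generic semantics.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a zval holding either a long or a double. */
typedef enum { TAG_LONG, TAG_DOUBLE } val_tag;

typedef struct {
    val_tag tag;
    union { long l; double d; } u;
} value;

/* Hypothetical stand-in for the generic comparison: returns <0, 0 or >0. */
int generic_compare(const value *a, const value *b)
{
    if (a->tag == TAG_LONG && b->tag == TAG_LONG) {
        return (a->u.l > b->u.l) - (a->u.l < b->u.l);
    }
    double x = a->tag == TAG_LONG ? (double)a->u.l : a->u.d;
    double y = b->tag == TAG_LONG ? (double)b->u.l : b->u.d;
    return x == y ? 0 : (x < y ? -1 : 1);
}

/* Returns the min (or max) element, or NULL for an empty array. */
const value *scan_minmax(const value *arr, size_t n, bool max)
{
    assert(arr != NULL || n == 0);
    if (n == 0) {
        return NULL;
    }
    const value *res = &arr[0];
    size_t i = 1;

    if (res->tag == TAG_LONG) {
        /* Fast path: compare raw longs while the run stays all-long. */
        for (; i < n && arr[i].tag == TAG_LONG; i++) {
            if (max ? arr[i].u.l > res->u.l : arr[i].u.l < res->u.l) {
                res = &arr[i];
            }
        }
    } else if (!isnan(res->u.d)) {
        /* Fast path: compare raw doubles; stop at a non-double or a NaN. */
        for (; i < n && arr[i].tag == TAG_DOUBLE && !isnan(arr[i].u.d); i++) {
            if (max ? arr[i].u.d > res->u.d : arr[i].u.d < res->u.d) {
                res = &arr[i];
            }
        }
    }

    /* Fallback: finish the remaining elements with the generic comparison. */
    for (; i < n; i++) {
        int c = generic_compare(res, &arr[i]);
        if (max ? c < 0 : c > 0) {
            res = &arr[i];
        }
    }
    return res;
}
```

In this model, a homogeneous array never reaches the fallback loop, while a mixed array pays the generic cost only from the first element that breaks the run.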

The generic min/max scan helper now lives in ext/standard/array.c, so the internal zend_hash_minmax() helper and the php_data_compare() function-pointer wrapper are no longer needed.

An UPGRADING entry was added under Performance Improvements, and UPGRADING.INTERNALS notes the removal of the internal zend_hash_minmax() API.

Benchmark

Local CLI build:

  • Debug Build => no
  • --disable-all --enable-cli
  • opcache/JIT disabled
  • separate baseline, long-only, and current CLI binaries
  • n=100000
  • each sample runs 700 min()+max() iterations
  • 5 samples per case
  • table reports median seconds

| Case | origin/master | long-only | current long+double | Result vs master |
| --- | --- | --- | --- | --- |
| packed long | 0.612s | 0.174s | 0.171s | 3.58x faster |
| sparse long | 0.492s | 0.163s | 0.162s | 3.04x faster |
| packed double | 0.796s | 0.747s | 0.164s | 4.86x faster |
| sparse double | 0.757s | 0.699s | 0.158s | 4.80x faster |
| mixed long, late fallback | 0.573s | 0.172s | 0.168s | 3.41x faster |
| mixed double, late fallback | 0.781s | 0.743s | 0.165s | 4.74x faster |
| double with late NaN | 0.818s | 0.743s | 0.163s | 5.01x faster |
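
For readers who want to reproduce the "Result vs master" column, the arithmetic is a median over the samples for each binary followed by a ratio. The sketch below assumes that reading of the methodology; `median()` and `speedup()` are hypothetical helper names, not part of the benchmark script used for this PR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* qsort comparator for doubles, ascending. */
int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a;
    double y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Median of n timing samples in seconds; sorts the buffer in place. */
double median(double *samples, size_t n)
{
    assert(n > 0);
    qsort(samples, n, sizeof *samples, cmp_double);
    if (n % 2 == 1) {
        return samples[n / 2];
    }
    return (samples[n / 2 - 1] + samples[n / 2]) / 2.0;
}

/* Speedup factor: how many times faster the current build is than baseline. */
double speedup(double baseline_s, double current_s)
{
    assert(current_s > 0.0);
    return baseline_s / current_s;
}
```

With the packed long medians from the table, `speedup(0.612, 0.171)` comes out at roughly 3.58.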

Fallback / non-numeric cases from the original benchmark:

| Case | origin/master | current long+double |
| --- | --- | --- |
| first string, then longs | 1.540s | 1.430s |
| strings only | 2.818s | 2.901s |

After moving the generic fallback into array.c and calling zend_compare() directly instead of using a function-pointer wrapper, I also compared the previous PR branch against the current branch with 10k-element arrays, 5k iterations, and 11 samples per case:

| Case | previous PR branch | current branch | Result |
| --- | --- | --- | --- |
| min() first string, then longs | 482.5ms | 426.9ms | ~11.5% faster |
| max() first string, then longs | 174.7ms | 159.9ms | ~8.5% faster |
| min() strings only | 887.6ms | 867.8ms | ~2.2% faster |
| max() strings only | 992.4ms | 953.9ms | ~3.9% faster |

Across repeated samples, integer-only arrays stay roughly 3x faster and float-only arrays are roughly 4.8x-5x faster. Generic fallback cases range from within noise to a small improvement after removing the function-pointer wrapper.

Testing

  • git diff --check
  • make -j$(sysctl -n hw.ncpu)
  • TEST_PHP_EXECUTABLE=sapi/cli/php sapi/cli/php run-tests.php -q ext/standard/tests/array/min*.phpt ext/standard/tests/array/max*.phpt Zend/tests/ArrayAccess/bug78356.phpt

Result:

  • 14 tests passed
  • 0 failed
  • 0 warned

@mehmetcansahin mehmetcansahin marked this pull request as ready for review May 22, 2026 14:03
@mehmetcansahin mehmetcansahin requested a review from bukka as a code owner May 22, 2026 14:03
@LamentXU123
Contributor

You may need to add an entry to the UPGRADING file, Performance Improvements section.
https://github.com/php/php-src/blob/master/UPGRADING#L382

@mehmetcansahin mehmetcansahin changed the title [ext/standard] Specialize min()/max() for long arrays [ext/standard] Specialize min()/max() for numeric arrays May 22, 2026
Comment thread ext/standard/array.c Outdated
return;
}

zval *result = zend_hash_minmax(array, php_data_compare, 0);
Member


It seems this API is now only used for the min/max functions (see https://sourcegraph.com/search?q=context:global+zend_hash_minmax+-f:zend_hash.c+-f:zend_hash.h+-f:standard/array.c&patternType=keyword&sm=0), so we could probably move it into array.c and stop using function pointers, which might also improve performance.

Contributor Author

@mehmetcansahin mehmetcansahin May 25, 2026

15babe3 The generic min/max scan now lives in ext/standard/array.c as php_array_data_minmax(), and zend_hash_minmax() was removed from the Zend HashTable API. I also removed the php_data_compare() function-pointer wrapper, so the fallback path now calls zend_compare() directly.

Updated UPGRADING.INTERNALS and refreshed the PR description with the new benchmark/testing details.
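
The difference between the two call styles discussed in this thread can be sketched in isolation. This is a simplified illustration over plain `long` arrays, not the php-src code: `compare_fn`, `compare_longs()`, `minmax_indirect()`, and `minmax_direct()` are hypothetical names standing in for the function-pointer wrapper and the direct-call scan.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*compare_fn)(const long *a, const long *b);

/* Three-way comparison: returns <0, 0 or >0. */
int compare_longs(const long *a, const long *b)
{
    return (*a > *b) - (*a < *b);
}

/* Before: the comparator arrives as a pointer, so every comparison is an
 * indirect call that the compiler usually cannot inline. */
const long *minmax_indirect(const long *arr, size_t n, compare_fn cmp, int max)
{
    assert(cmp != NULL);
    if (n == 0) {
        return NULL;
    }
    const long *res = &arr[0];
    for (size_t i = 1; i < n; i++) {
        int c = cmp(res, &arr[i]);
        if (max ? c < 0 : c > 0) {
            res = &arr[i];
        }
    }
    return res;
}

/* After: the scan names the comparator directly, so the call target is
 * known at compile time. */
const long *minmax_direct(const long *arr, size_t n, int max)
{
    if (n == 0) {
        return NULL;
    }
    const long *res = &arr[0];
    for (size_t i = 1; i < n; i++) {
        int c = compare_longs(res, &arr[i]);
        if (max ? c < 0 : c > 0) {
            res = &arr[i];
        }
    }
    return res;
}
```

Both functions return the same element for the same input; only the call mechanism differs, which is why any gain shows up as a modest change in the fallback benchmarks rather than a change in behavior.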

Member

@Girgias Girgias left a comment

I'd rather split the PR in two and have the move from zend_hash.c to array.c be done first so that this becomes the new baseline regarding performance.

Comment thread ext/standard/array.c
Comment on lines +1105 to +1130
while (1) {
	if (idx == array->nNumUsed) {
		return NULL;
	}
	if (Z_TYPE(array->arPacked[idx]) != IS_UNDEF) {
		break;
	}
	idx++;
}
res = array->arPacked + idx;
for (; idx < array->nNumUsed; idx++) {
	zv = array->arPacked + idx;
	if (UNEXPECTED(Z_TYPE_P(zv) == IS_UNDEF)) {
		continue;
	}

	if (max) {
		if (zend_compare(res, zv) < 0) {
			res = zv;
		}
	} else {
		if (zend_compare(res, zv) > 0) {
			res = zv;
		}
	}
}
Member

Can we not use the usual ZEND_FOREACH API?
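
As a rough illustration of what this suggestion would look like, the manual skip-undefined loop can be folded into a foreach-style macro pair. The sketch below is a standalone toy, not php-src code: `slot`, `table`, `TABLE_FOREACH_VAL`, and `TABLE_FOREACH_END` are hypothetical names that only imitate the general shape of the `ZEND_HASH_FOREACH_VAL()` / `ZEND_HASH_FOREACH_END()` pattern.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy table: used == false models an IS_UNDEF hole in the array. */
typedef struct { bool used; long v; } slot;
typedef struct { slot *data; size_t num_used; } table;

/* Hypothetical foreach pair: visits every used slot and skips holes. */
#define TABLE_FOREACH_VAL(t, _val) \
	for (size_t _i = 0; _i < (t)->num_used; _i++) { \
		if (!(t)->data[_i].used) continue; \
		_val = &(t)->data[_i].v;
#define TABLE_FOREACH_END() }

/* Returns the min (or max) used value, or NULL if no slot is in use. */
const long *table_minmax(const table *t, bool max)
{
	assert(t != NULL);
	const long *res = NULL;
	const long *val;

	TABLE_FOREACH_VAL(t, val) {
		if (res == NULL || (max ? *res < *val : *res > *val)) {
			res = val;
		}
	} TABLE_FOREACH_END();

	return res;
}
```

The trade-off in this shape is one extra `res == NULL` check per element in exchange for dropping the separate loop that finds the first defined slot.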

Comment thread ext/standard/array.c
Comment on lines +1135 to +1158
while (1) {
	if (idx == array->nNumUsed) {
		return NULL;
	}
	if (Z_TYPE(array->arData[idx].val) != IS_UNDEF) {
		break;
	}
	idx++;
}
res = &array->arData[idx].val;
for (; idx < array->nNumUsed; idx++) {
	p = array->arData + idx;
	if (UNEXPECTED(Z_TYPE(p->val) == IS_UNDEF)) {
		continue;
	}

	if (max) {
		if (zend_compare(res, &p->val) < 0) {
			res = &p->val;
		}
	} else {
		if (zend_compare(res, &p->val) > 0) {
			res = &p->val;
		}
Member

Ditto
