Fix SORT_REGULAR with new transitive comparison functions #20517
Girgias left a comment
Various comments and questions. This also needs a rebase, as I refactored the sorting code to remove a bunch of duplication.
    static int php_array_hash_compare_transitive(zval *zv1, zval *zv2) /* {{{ */
    {
        return php_array_compare_transitive(zv1, zv2);
    }
    /* }}} */
Ditto
Kept this one so we can pass a compare_func_t to zend_hash_compare().
php_array_compare_transitive() doesn’t match that signature, so we still need this tiny adapter.
My previous comment is no longer valid: this can be removed. However, I noticed a measurable regression in my benchmarks after removing it, so I decided to keep it in place. I should probably add a comment to the function explaining this.
How are you benchmarking? I don't really see why it would regress.
@Girgias thank you for taking the time to provide such a careful review! It looks like I picked up your sorting code refactor when I created this new branch. I'll push a fresh commit that addresses your code comments. Thanks again for the help!
Force-pushed from f6e9f05 to 374a660
Girgias left a comment
Please only do the fix for the transitivity.
Optimizations can be decided later, but currently it just pollutes the PR and makes it harder to review and merge.
- Add zend_compare_{long,double}_to_string_ex() plus
zendi_smart_strcmp_ex() so SORT_REGULAR can invoke transitive-aware
scalar comparisons without touching zend_compare()
- Introduce php_array_compare_transitive() (pared-down zend_compare())
and php_array_compare_transitive_objects() (mirrors
zend_std_compare_objects()) so arrays, objects, and enums recurse with
transitive ordering
- Route the public sort APIs and array_unique() through
php_array_sort_regular() so PHP_SORT_REGULAR always uses the new
comparator
- Add regression tests: phpGH-20262 (array_unique with enums/objects/nested
arrays) plus SORT_REGULAR consistency tests for sort()/ksort() on
numeric-string edge cases
Fixes: phpGH-20262
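To make the first bullet concrete, here is a minimal standalone sketch of the transitive branch of an integer-versus-string comparison. It is not the PR's code: `parse_long` and `compare_long_to_string_transitive` are hypothetical names, and the numeric check only recognises plain base-10 integers, which is a deliberate simplification of `is_numeric_string()`.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: returns 1 and stores the value when `s` is a
 * base-10 integer as parsed by strtol(). The real is_numeric_string()
 * also recognises floats; that is left out here on purpose. */
static int parse_long(const char *s, long *out)
{
    char *end;
    if (*s == '\0') {
        return 0;
    }
    errno = 0;
    long v = strtol(s, &end, 10);
    if (errno != 0 || *end != '\0') {
        return 0;
    }
    *out = v;
    return 1;
}

/* Returns -1, 0 or 1. Assumed ordering, taken from the snippet in this PR:
 * empty string < every number < every other non-numeric string. */
static int compare_long_to_string_transitive(long lval, const char *str)
{
    long str_lval;
    if (parse_long(str, &str_lval)) {
        return (lval > str_lval) - (lval < str_lval);
    }
    /* Non-numeric string: never fall back to a byte-wise string compare. */
    return str[0] == '\0' ? 1 : -1;
}
```

Under these assumptions every integer lands in one contiguous band between the empty string and the non-numeric strings, so the result no longer depends on how the integer happens to print.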
Force-pushed from 374a660 to 2ff1700
@Girgias yes, I clearly got a bit carried away haha. I decided to reimplement and force push a clean commit. Sorry for the mess I made of this PR. I have a bag full of optimizations we can save for a follow-up PR. One worth calling out would be to split
    static zend_always_inline int zend_compare_long_to_string_ex(zend_long lval, zend_string *str, bool transitive)
    {
        zend_long str_lval;
        double str_dval;
        uint8_t type = is_numeric_string(ZSTR_VAL(str), ZSTR_LEN(str), &str_lval, &str_dval, 0);

        if (type == IS_LONG) {
            return ZEND_THREEWAY_COMPARE(lval, str_lval);
        }

        if (type == IS_DOUBLE) {
            return ZEND_THREEWAY_COMPARE((double) lval, str_dval);
        }

        if (transitive) {
            if (ZSTR_LEN(str) == 0) {
                return 1;
            }
            return -1;
        }

        zend_string *lval_as_str = zend_long_to_str(lval);
        int cmp_result = zend_binary_strcmp(
            ZSTR_VAL(lval_as_str), ZSTR_LEN(lval_as_str), ZSTR_VAL(str), ZSTR_LEN(str));
        zend_string_release(lval_as_str);
        return ZEND_NORMALIZE_BOOL(cmp_result);
    }

    static zend_always_inline int zend_compare_double_to_string_ex(double dval, zend_string *str, bool transitive)
    {
        zend_long str_lval;
        double str_dval;
        uint8_t type = is_numeric_string(ZSTR_VAL(str), ZSTR_LEN(str), &str_lval, &str_dval, 0);

        ZEND_ASSERT(!zend_isnan(dval));

        if (type == IS_LONG) {
            str_dval = (double) str_lval;
            return ZEND_THREEWAY_COMPARE(dval, str_dval);
        }

        if (type == IS_DOUBLE) {
            return ZEND_THREEWAY_COMPARE(dval, str_dval);
        }

        if (transitive) {
            if (ZSTR_LEN(str) == 0) {
                return 1;
            }
            return -1;
        }

        zend_string *dval_as_str = zend_double_to_str(dval);
        int cmp_result = zend_binary_strcmp(
            ZSTR_VAL(dval_as_str), ZSTR_LEN(dval_as_str), ZSTR_VAL(str), ZSTR_LEN(str));
        zend_string_release(dval_as_str);
        return ZEND_NORMALIZE_BOOL(cmp_result);
    }

    static zend_always_inline int zendi_smart_strcmp_ex(zend_string *s1, zend_string *s2, bool transitive)
    {
        uint8_t ret1, ret2;
        int oflow1, oflow2;
        zend_long lval1 = 0, lval2 = 0;
        double dval1 = 0.0, dval2 = 0.0;

        if (UNEXPECTED(ZSTR_LEN(s1) == 0 || ZSTR_LEN(s2) == 0)) {
            if (transitive) {
                if (ZSTR_LEN(s1) == 0 && ZSTR_LEN(s2) == 0) {
                    return 0;
                }
                return ZSTR_LEN(s1) == 0 ? -1 : 1;
            }
        }

        ret1 = is_numeric_string_ex(ZSTR_VAL(s1), ZSTR_LEN(s1), &lval1, &dval1, false, &oflow1, NULL);
        ret2 = is_numeric_string_ex(ZSTR_VAL(s2), ZSTR_LEN(s2), &lval2, &dval2, false, &oflow2, NULL);

        if (ret1 && ret2) {
    #if ZEND_ULONG_MAX == 0xFFFFFFFF
            if (oflow1 != 0 && oflow1 == oflow2 && dval1 - dval2 == 0. &&
                ((oflow1 == 1 && dval1 > 9007199254740991. /*0x1FFFFFFFFFFFFF*/)
                || (oflow1 == -1 && dval1 < -9007199254740991.))) {
    #else
            if (oflow1 != 0 && oflow1 == oflow2 && dval1 - dval2 == 0.) {
    #endif
                /* both values are integers overflowed to the same side, and the
                 * double comparison may have resulted in crucial accuracy lost */
                goto string_cmp;
            }
            if ((ret1 == IS_DOUBLE) || (ret2 == IS_DOUBLE)) {
                if (ret1 != IS_DOUBLE) {
                    if (oflow2) {
                        /* 2nd operand is integer > LONG_MAX (oflow2==1) or < LONG_MIN (-1) */
                        return -1 * oflow2;
                    }
                    dval1 = (double) lval1;
                } else if (ret2 != IS_DOUBLE) {
                    if (oflow1) {
                        return oflow1;
                    }
                    dval2 = (double) lval2;
                } else if (dval1 == dval2 && !zend_finite(dval1)) {
                    /* Both values overflowed and have the same sign,
                     * so a numeric comparison would be inaccurate */
                    goto string_cmp;
                }
                dval1 = dval1 - dval2;
                return ZEND_NORMALIZE_BOOL(dval1);
            } else { /* they both have to be long's */
                return lval1 > lval2 ? 1 : (lval1 < lval2 ? -1 : 0);
            }
        } else if (ret1) {
            if (transitive) {
                return -1;
            }
            goto string_cmp;
        } else if (ret2) {
            if (transitive) {
                return 1;
            }
            goto string_cmp;
        }

        int strcmp_ret;
    string_cmp:
        if (transitive) {
            return zend_compare_non_numeric_strings(s1, s2);
        }

        strcmp_ret = zend_binary_strcmp(ZSTR_VAL(s1), ZSTR_LEN(s1), ZSTR_VAL(s2), ZSTR_LEN(s2));
        return ZEND_NORMALIZE_BOOL(strcmp_ret);
    }
Why are these in the header?
    uintptr_t lhs_ptr = (uintptr_t) Z_OBJ_P(lhs);
    uintptr_t rhs_ptr = (uintptr_t) Z_OBJ_P(rhs);
I'm not sure if this will result in stable results, @iluuu1994 or @arnaud-lb might have a better idea on how to implement an enum comparison.
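For readers following the stability question, this is roughly what ordering by address amounts to, as a minimal standalone sketch. `compare_by_address` is a hypothetical name and the function works on plain `void *` pointers rather than on `zend_object` pointers.

```c
#include <stdint.h>

/* Three-way comparison of two addresses. The answer stays the same for as
 * long as both objects are alive and stay at the same address, but the
 * order reflects where the allocator placed them, not anything about
 * their values. */
static int compare_by_address(const void *lhs, const void *rhs)
{
    uintptr_t lhs_ptr = (uintptr_t) lhs;
    uintptr_t rhs_ptr = (uintptr_t) rhs;
    return (lhs_ptr > rhs_ptr) - (lhs_ptr < rhs_ptr);
}
```

Such a comparator gives a consistent total order within one run, which is enough to group identical instances. The concern raised above is presumably that the order itself can change from one run to the next because it follows allocation addresses.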
    /* Mirrors zend_std_compare_objects(), but recurses via php_array_compare_transitive()
     * so nested properties obey SORT_REGULAR's transitive ordering. */
    static int php_array_compare_transitive_objects(zval *o1, zval *o2) /* {{{ */
If this is otherwise identical, I think it would make more sense to create a zend_std_compare_objects_ex() function that takes a function pointer for the property table comparison.
Hopefully the compiler will inline it properly in zend_std_compare_objects(), so the behaviour should be equivalent. It took me quite a while to understand what the point of this function is.
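The suggested shape could look something like the sketch below. Everything in it is hypothetical: `struct object`, `value_cmp_fn` and the two element comparators are stand-ins, not Zend types, and `cmp_alternative` simply reverses the order so that the two wrappers are easy to tell apart.

```c
#include <stddef.h>

/* Hypothetical stand-in for a property value comparator. */
typedef int (*value_cmp_fn)(int, int);

/* Hypothetical stand-in for an object with a flat property table. */
struct object {
    size_t prop_count;
    const int *props;
};

static int cmp_standard(int a, int b) { return (a > b) - (a < b); }
static int cmp_alternative(int a, int b) { return cmp_standard(b, a); }

/* Shared walk over both property tables; only the per-property
 * comparison is supplied by the caller. */
static inline int compare_objects_ex(const struct object *o1,
                                     const struct object *o2,
                                     value_cmp_fn cmp)
{
    if (o1 == o2) {
        return 0;
    }
    if (o1->prop_count != o2->prop_count) {
        return o1->prop_count < o2->prop_count ? -1 : 1;
    }
    for (size_t i = 0; i < o1->prop_count; i++) {
        int r = cmp(o1->props[i], o2->props[i]);
        if (r != 0) {
            return r;
        }
    }
    return 0;
}

/* Thin wrappers: a constant function pointer passed to an inline callee
 * gives the compiler the chance to specialise each call site. */
static int compare_objects(const struct object *o1, const struct object *o2)
{
    return compare_objects_ex(o1, o2, cmp_standard);
}

static int compare_objects_alternative(const struct object *o1, const struct object *o2)
{
    return compare_objects_ex(o1, o2, cmp_alternative);
}
```

With this layout the traversal logic exists once, and the existing entry point keeps its signature. Whether the compiler actually inlines the callback depends on the build and is worth checking rather than assuming.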
    if (UNEXPECTED(php_array_is_enum_zval(op1) || php_array_is_enum_zval(op2))) {
        return php_array_compare_enum_zvals(op1, op2);
    }
Why are you handling enums early here?
    ZVAL_DEREF(op1);
    ZVAL_DEREF(op2);
Surely if you deref you want to deref before the int/str comparisons?
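The ordering issue can be illustrated with a small tagged-value sketch. The `struct value` type, its `V_REF` tag and the rule that integers sort before strings are all assumptions made for the example; they only loosely imitate zvals.

```c
#include <string.h>

/* Hypothetical tagged value with a reference wrapper. */
enum vtype { V_LONG, V_STRING, V_REF };

struct value {
    enum vtype type;
    union {
        long lval;
        const char *str;
        const struct value *ref;
    } u;
};

/* Follow reference wrappers until a concrete value is reached. */
static const struct value *deref(const struct value *v)
{
    while (v->type == V_REF) {
        v = v->u.ref;
    }
    return v;
}

static int compare_values(const struct value *a, const struct value *b)
{
    /* Unwrap first: the type checks below would otherwise see V_REF
     * and skip the fast paths for wrapped integers and strings. */
    a = deref(a);
    b = deref(b);

    if (a->type == V_LONG && b->type == V_LONG) {
        return (a->u.lval > b->u.lval) - (a->u.lval < b->u.lval);
    }
    if (a->type == V_STRING && b->type == V_STRING) {
        int r = strcmp(a->u.str, b->u.str);
        return (r > 0) - (r < 0);
    }
    /* Mixed types: in this sketch integers sort before strings. */
    return a->type == V_LONG ? -1 : 1;
}
```

If the dereference came after the type checks, a reference to an integer would fall through to the generic path and could compare differently from the integer it wraps.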
    PHP_FUNCTION(arsort)
    {
    -    php_sort(INTERNAL_FUNCTION_PARAM_PASSTHRU, php_get_data_reverse_compare_func, false);
    +    HashTable *array;
    +    zend_long sort_type = PHP_SORT_REGULAR;
    +
    +    ZEND_PARSE_PARAMETERS_START(1, 2)
    +        Z_PARAM_ARRAY_HT_EX(array, 0, 1)
    +        Z_PARAM_OPTIONAL
    +        Z_PARAM_LONG(sort_type)
    +    ZEND_PARSE_PARAMETERS_END();
    +
    +    if ((sort_type & ~PHP_SORT_FLAG_CASE) == PHP_SORT_REGULAR) {
    +        php_array_sort_regular(array, false, true, false);
    +        RETURN_TRUE;
    +    }
    +
    +    php_array_apply_sort(array, sort_type, php_get_data_reverse_compare_func, false);
    +    RETURN_TRUE;
    }
Please revert these changes; the point of the refactor was to simplify the implementation by having a single one.
Please change the various php_get_*_compare_func() instead.
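The structure being asked for can be sketched as one sort routine plus a getter that picks the comparator from the sort flag. All names here (`sort_mode`, `get_compare_func`, `sort_ints`) are hypothetical, and the example sorts plain `int` arrays with `qsort()` rather than hash tables.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef int (*compare_fn)(const void *, const void *);

/* Hypothetical sort flags for the sketch. */
enum sort_mode { SORT_MODE_REGULAR, SORT_MODE_STRING };

/* Numeric three-way comparison. */
static int cmp_regular(const void *a, const void *b)
{
    int x = *(const int *) a, y = *(const int *) b;
    return (x > y) - (x < y);
}

/* Compares the decimal text of both integers byte by byte. */
static int cmp_string(const void *a, const void *b)
{
    char sa[16], sb[16];
    snprintf(sa, sizeof sa, "%d", *(const int *) a);
    snprintf(sb, sizeof sb, "%d", *(const int *) b);
    return strcmp(sa, sb);
}

/* The only place that knows which comparator belongs to which flag.
 * Swapping in a different comparator for one mode is a change here,
 * not in every caller. */
static compare_fn get_compare_func(enum sort_mode mode)
{
    switch (mode) {
        case SORT_MODE_STRING:
            return cmp_string;
        case SORT_MODE_REGULAR:
        default:
            return cmp_regular;
    }
}

/* Single sort implementation shared by every mode. */
static void sort_ints(int *values, size_t count, enum sort_mode mode)
{
    qsort(values, count, sizeof *values, get_compare_func(mode));
}
```

In this layout each public sort function stays a one-line call into the shared routine, and a new comparator for one mode only touches the getter.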
Summary
Fixes #20262 by making SORT_REGULAR fall back to a fully transitive comparator whenever loose comparison semantics would otherwise be non-transitive (numeric strings vs ints/floats, enums, nested arrays/objects). This keeps duplicates grouped so array_unique() and the sort family behave consistently.
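To see why a non-transitive comparator is a problem for sorting at all, the sketch below models string comparison in a simplified way and checks transitivity by brute force. `loose_cmp`, `transitive_cmp`, `rank` and `is_transitive_on` are hypothetical names, and only base-10 integer strings count as numeric here, which is narrower than PHP's real rules.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

typedef int (*str_cmp_fn)(const char *, const char *);

/* Simplified numeric check: base-10 integers only (an assumption). */
static int is_int_string(const char *s, long *out)
{
    char *end;
    if (*s == '\0') {
        return 0;
    }
    errno = 0;
    long v = strtol(s, &end, 10);
    if (errno != 0 || *end != '\0') {
        return 0;
    }
    *out = v;
    return 1;
}

static int sign(int v) { return (v > 0) - (v < 0); }

/* Model of loose comparison: numeric when both sides are numeric,
 * byte-wise otherwise. */
static int loose_cmp(const char *a, const char *b)
{
    long la, lb;
    if (is_int_string(a, &la) && is_int_string(b, &lb)) {
        return (la > lb) - (la < lb);
    }
    return sign(strcmp(a, b));
}

/* 0 = empty string, 1 = numeric string, 2 = any other string. */
static int rank(const char *s, long *out)
{
    if (*s == '\0') {
        return 0;
    }
    return is_int_string(s, out) ? 1 : 2;
}

/* Transitive variant: order by rank first, then within the rank. */
static int transitive_cmp(const char *a, const char *b)
{
    long la = 0, lb = 0;
    int ra = rank(a, &la), rb = rank(b, &lb);
    if (ra != rb) {
        return (ra > rb) - (ra < rb);
    }
    if (ra == 1) {
        return (la > lb) - (la < lb);
    }
    return sign(strcmp(a, b));
}

/* Returns 1 when a <= b and b <= c always imply a <= c on `v`. */
static int is_transitive_on(str_cmp_fn cmp, const char **v, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            for (size_t k = 0; k < n; k++) {
                if (cmp(v[i], v[j]) <= 0 && cmp(v[j], v[k]) <= 0
                        && cmp(v[i], v[k]) > 0) {
                    return 0;
                }
            }
        }
    }
    return 1;
}
```

In this model the strings "2", "10" and "1a" form a cycle under `loose_cmp`: "2" sorts before "10" numerically, "10" before "1a" byte-wise, and "1a" before "2" byte-wise. A sort that meets such a cycle has no single correct output, so equal elements are not guaranteed to end up adjacent, which is what a sort-based deduplication relies on.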
Highlights