Fix #63217: Constant numeric strings become integers when used as ArrayAccess offset #3351
// zval_ptr_dtor_nogc(val);
// ZVAL_LONG(val, index);
// }
// }
Minor note, as per our coding standards this should be a multi-line C style comment
Oh yeah, that's just a temporary removal, didn't mean to commit that. I'll remove the commented code entirely. Thanks. 😊
Done in 7aceef6
If we can technically fix the behaviour, even if it's not in an optimal way, shouldn't that be a solid place to start for 7.3.0? I doubt that the current patch will affect performance enough to be a deal-breaker. It's an optimisation that affects a very rare case: I have never come across PHP code where an array is accessed using a numeric string literal. My personal opinion is that this should be merged so that we can fix the bug at hand, and worry about the tiny performance uncertainty later if there's no clear solution for that right now. Correct behaviour should come before performance every time.
Zend/zend_vm_def.h
Outdated
if (ZEND_HANDLE_NUMERIC(str, hval)) {
ZEND_VM_C_GOTO(num_index);
}
if (ZEND_HANDLE_NUMERIC(str, hval)) {
ADD_ARRAY_ELEMENT doesn't need to be touched (you also left the handle_numeric_op in the compiler).
Will revert changes to ADD_ARRAY_ELEMENT in 135449d, but zend_handle_numeric_op is still called in zend_compile_array. Should it not be? Andrea only removed one of the calls in the original PR.
No, it's right. I was just noting the inconsistency between the compiler and the runtime. The way you reverted it is correct.
// FIXME: numeric string
tmp |= MAY_BE_ARRAY_KEY_LONG;
}
tmp |= (MAY_BE_ARRAY_KEY_LONG | MAY_BE_ARRAY_KEY_STRING);
These need to be changed to replace the dim_op_type != IS_CONST check with a ZEND_HANDLE_NUMERIC for CONST operands.
It would make sense to extract the whole key type checking into a separate function, as this code is repeated a couple of times.
You'll need to pass in the znode_op* for the dim next to dim_op_type.
So if dim_op_type == IS_CONST, we should do a ZEND_HANDLE_NUMERIC to add MAY_BE_ARRAY_KEY_LONG?
I'm really just following your lead here but I'll try to make sense of this. If the dimension operand is a constant (I'm guessing a literal string in our case), we should check if it's numeric, but why? What is tmp here, and who are we signalling that tmp MAY_BE_ARRAY_KEY_LONG?
Also, what's a znode_op?
If dim_op_type != IS_CONST: You always add MAY_BE_ARRAY_KEY_LONG.
If dim_op_type == IS_CONST: You do a ZEND_HANDLE_NUMERIC and only add MAY_BE_ARRAY_KEY_LONG if it's numeric.
To avoid the need to pass in the full kitchen sink of parameters to this function, you might want to pass opline->op2_type == IS_CONST ? CRT_CONSTANT_EX(op_array, opline, opline->op2, ssa->rt_constants) : NULL from the caller, rather than just opline->op2 (which is the znode_op I was referring to).
zval_ptr_dtor_nogc(val);
ZVAL_LONG(val, index);
}
}
This code can be left for ADD_ARRAY_ELEMENT, and the rest can simply be dropped from the switch. They'll fall through to the default case.
The case that we're concerned about are not accesses like
Of course, that makes sense. In most cases these would fail at the first character anyway. I'll do a proper micro-benchmark when the patch is more complete. I'll use the one that @hikari-no-yume wrote in #1649 (comment). Thanks for your help again on this one, @nikic.
e0f09e7 to c30f56c
@@ -2496,7 +2515,8 @@ static int zend_update_type_info(const zend_op_array *op_array,

 if (opline->extended_value == ZEND_ASSIGN_DIM) {
     if (opline->op1_type == IS_CV) {
-        orig = assign_dim_result_type(orig, OP2_INFO(), tmp, opline->op2_type);
+        orig = assign_dim_result_type(orig, OP2_INFO(), tmp, opline->op2_type,
+            CRT_CONSTANT_EX(op_array, opline, opline->op2, ssa->rt_constants));
These CRT_CONSTANT_EX accesses should be guarded by opline->op2_type == IS_CONST.
For this benchmark: https://gist.github.com/nikic/f80155ecc86fd4e48be50af9c35f64be I get the following results:
The good news is that it seems to have negligible impact on the common case of non-numeric constant keys. The bad news is that cases like
From my side this is good to go. @dstogov What do you think?
@nikic So that case is slow because it has to do the string-to-long conversion on every loop iteration at runtime? Could we not do that conversion once, if we know that it's an array (can we know)?
if (ZEND_HANDLE_NUMERIC(Z_STR_P(dim_op), hval)) {
tmp |= MAY_BE_ARRAY_KEY_LONG;
}
}
It would be better to write this as:
    if (dim_type & MAY_BE_STRING) {
        if (dim_op_type != IS_CONST) {
            tmp |= MAY_BE_ARRAY_KEY_STRING | MAY_BE_ARRAY_KEY_LONG;
        } else {
            zend_ulong hval;
            if (ZEND_HANDLE_NUMERIC(Z_STR_P(dim_op), hval)) {
                tmp |= MAY_BE_ARRAY_KEY_LONG;
            } else {
                tmp |= MAY_BE_ARRAY_KEY_STRING;
            }
        }
    }
Otherwise we're adding MAY_BE_ARRAY_KEY_STRING even if we know numeric conversion will happen.
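The branching suggested above can be tried outside the engine with a small stand-alone model. This is only a sketch under stated assumptions: the flag values, `infer_key_flags`, and `is_integer_key` are hypothetical stand-ins (the real code uses the MAY_BE_ARRAY_KEY_* masks and ZEND_HANDLE_NUMERIC), the dimension is assumed to be a string already, and the round-trip check merely approximates the engine's rule for integer-like string keys.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in flag values; the real MAY_BE_ARRAY_KEY_* masks live in the engine. */
enum {
    KEY_LONG   = 1 << 0,
    KEY_STRING = 1 << 1
};

/* Hypothetical approximation of the numeric-key rule: the string counts as an
 * integer key only if printing the parsed value reproduces it exactly, which
 * rejects "0123", "-0", " 1", "", "abc", and out-of-range values. */
static int is_integer_key(const char *s)
{
    char buf[32];
    long long v;

    assert(s != NULL);
    v = strtoll(s, NULL, 10);
    snprintf(buf, sizeof buf, "%lld", v);
    return strcmp(buf, s) == 0;
}

/* const_key is NULL when the dimension operand is not a compile-time constant. */
static int infer_key_flags(const char *const_key)
{
    if (const_key == NULL) {
        /* Value unknown until runtime: either key kind is possible. */
        return KEY_LONG | KEY_STRING;
    }
    /* Constant operand: exactly one key kind can result. */
    return is_integer_key(const_key) ? KEY_LONG : KEY_STRING;
}
```

In this model, `infer_key_flags("123")` yields only the long-key flag and `infer_key_flags("foo")` only the string-key flag, while a `NULL` (non-constant) operand yields both.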
Good call. 👍 Done in 45e6b24
@rtheunissen Yes, we could do that for the cases where we can determine that it is an array. This would take an extra check in https://github.com/php/php-src/blob/master/ext/opcache/Optimizer/dfa_pass.c#L866, which for all relevant opcodes checks if op1 has only MAY_BE_ARRAY type and op2 is a numeric string literal, and converts it in that case. I don't think this is strictly necessary to land this though, as I don't think it's a common case.
45e6b24 to 0fb9aa7
@nikic @rtheunissen I don't object.
0fb9aa7 to ff078f8
Merged as 30156d5. Thanks!
@nikic I have an idea for how this may be fixed without performance degradation. We may keep two constant literals (one of them would be converted to a number if necessary). This is similar to ZEND_GET_FUNC_BY_NAME, which receives the original and the lowercase name. What do you think?
Ideally, we would have the second literal only for numeric string indexes (this will probably require some new flag).
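The two-literal idea can be pictured with a small stand-alone model. This is a sketch of the concept only, not the engine's data layout: `offset_literal`, `offset_value`, `container_kind`, and `select_offset` are hypothetical names invented for illustration. The point it tries to show is that the integer form is computed once ahead of time, arrays use it directly, and ArrayAccess-style objects still receive the string exactly as written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical compiled offset carrying both forms of a constant key. */
typedef struct {
    const char *str;   /* the literal as written in the source */
    bool has_long;     /* true only for numeric string literals */
    long long lval;    /* pre-converted value, valid when has_long is true */
} offset_literal;

typedef enum { CONTAINER_ARRAY, CONTAINER_OBJECT } container_kind;

/* The offset that is actually handed to the container at runtime. */
typedef struct {
    bool is_long;
    long long lval;    /* valid when is_long is true */
    const char *str;   /* valid when is_long is false */
} offset_value;

static offset_value select_offset(const offset_literal *lit, container_kind kind)
{
    offset_value out;

    assert(lit != NULL);
    out.is_long = false;
    out.lval = 0;
    out.str = lit->str;

    /* Arrays take the pre-converted integer, so no per-access parsing is
     * needed. Objects keep the original string untouched. */
    if (kind == CONTAINER_ARRAY && lit->has_long) {
        out.is_long = true;
        out.lval = lit->lval;
        out.str = NULL;
    }
    return out;
}
```

For a literal such as `"123"`, the model selects the integer 123 for an array and the unchanged string for an object, which is the behaviour the bug report asks for.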
@dstogov I think this case is too rare to make an extra literal worthwhile.
@nikic $mysql_result["filed_name"] is quite common code. And yes, extra literal should be used only for numeric string constants.
@dstogov Yes, that is common code, but that case is not affected. Or rather, it will only perform one extra branch to check that 'f' > '9'. Based on my microbenchmark, this did not produce a measurable difference. The case that has become slower is
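The "one extra branch" for keys that start with a letter can be illustrated with a simplified stand-alone check. This is a sketch, not the engine's ZEND_HANDLE_NUMERIC implementation: `parse_integer_key` is a hypothetical name, and the rules encoded here (optional minus sign, no leading zeros, no "-0", value must fit the integer type) are an assumption about what counts as an integer-like key.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true and stores the value if s[0..len) is a canonical decimal integer. */
static bool parse_integer_key(const char *s, size_t len, long long *out)
{
    size_t i = 0;
    bool negative = false;
    unsigned long long acc = 0;
    unsigned long long limit;

    assert(s != NULL && out != NULL);

    /* Fast path: a key such as "field_name" is rejected here, because
     * 'f' > '9', after a single comparison on the first character. */
    if (len == 0 || s[0] > '9') {
        return false;
    }
    if (s[0] == '-') {
        if (len == 1) {
            return false;
        }
        negative = true;
        i = 1;
    } else if (s[0] < '0') {
        return false;
    }

    /* "0" is fine, but "0123", "-0", and "-05" are not canonical. */
    if (s[i] == '0' && (negative || len > 1)) {
        return false;
    }

    limit = negative ? (unsigned long long)LLONG_MAX + 1ULL
                     : (unsigned long long)LLONG_MAX;
    for (; i < len; i++) {
        unsigned long long d;
        if (s[i] < '0' || s[i] > '9') {
            return false;
        }
        d = (unsigned long long)(s[i] - '0');
        if (acc > (limit - d) / 10) {
            return false; /* appending this digit would exceed the limit */
        }
        acc = acc * 10 + d;
    }

    if (negative) {
        *out = (acc == limit) ? LLONG_MIN : -(long long)acc;
    } else {
        *out = (long long)acc;
    }
    return true;
}
```

Only strings that begin with a digit or a minus sign proceed past the first comparison, which is consistent with the observation above that typical string keys show no measurable slowdown.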
Potentially crackpot idea: what if
@nikic You don't see a performance difference because in micro-benchmarks all the code and data are already in the CPU cache and the extra branch is well predicted. Of course, the degradation from this patch is not significant, but if we can fix the problem in a better way, we should try.
@hikari-no-yume Changing the hash_value function may cause a much more significant performance difference.
@dstogov That's true. Adding an opline flag for this will still leave the branch though, unless we want to specialize it away. Do you have numbers for the impact of this change on something like WP?
$ ZEND_DONT_UNLOAD_MODULES=1 valgrind --tool=callgrind --separate-recs=1 --dump-instr=yes --cache-sim=no sapi/cgi/php-cgi -T100 /var/www/html/bench/wordpress/index.php > /dev/null
before: 1,987M
I'll try to do a better fix myself, later today or tomorrow. Maybe I'll be able to avoid an extra flag in opline->execute_data by using extra space in the first zval instead. And we'll have to check it only in the case of objects (arrays will use the old specialization logic).
This is a refresh of #2607, from which I'm quoting: