Skip to content

Conversation

rtheunissen
Copy link
Contributor

This is a refresh of #2607, from which I'm quoting:

See:

This PR is an extension and re-opening of #1649 in an attempt to fix this as part of 7.2. I've rebased on master and updated some tests that are failing. @nikic mentioned in #1649 that further changes are necessary but I'm not sure what they are or why they are necessary.

This patch does not affect the array or string API and behaviour at all - only ArrayAccess.

// zval_ptr_dtor_nogc(val);
// ZVAL_LONG(val, index);
// }
// }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor note, as per our coding standards this should be a multi-line C style comment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah, that's just a temporary removal, didn't mean to commit that. I'll remove the commented code entirely. Thanks. 😊

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 7aceef6

@rtheunissen
Copy link
Contributor Author

If we can technically fix the behaviour, even if it's not in an optimal way, shouldn't that be a solid place to start for 7.3.0? I doubt that the current patch will affect performance enough to be a deal-breaker. It's an optimisation that affects a very rare case, as I have never come across PHP code where an array is accessed using a numeric string literal like $array['123'].

Personal opinion is that this should be merged so that we can fix the bug at hand, and worry about the tiny performance uncertainty later if there's no clear solution for that right now.

Correct behaviour should come before performance every time.

if (ZEND_HANDLE_NUMERIC(str, hval)) {
ZEND_VM_C_GOTO(num_index);
}
if (ZEND_HANDLE_NUMERIC(str, hval)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ADD_ARRAY_ELEMENT doesn't need to be touched (you also left the handle_numeric_op in the compiler).

Copy link
Contributor Author

@rtheunissen rtheunissen Jun 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will revert changes to ADD_ARRAY_ELEMENT in 135449d, but zend_handle_numeric_op is still called in zend_compile_array. Should it not be? Andrea only removed one of the calls in the original PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it's right. I was just noting the inconsistency between the compiler and the runtime. The way you reserved it is correct.

// FIXME: numeric string
tmp |= MAY_BE_ARRAY_KEY_LONG;
}
tmp |= (MAY_BE_ARRAY_KEY_LONG | MAY_BE_ARRAY_KEY_STRING);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These need to be changed to replace the dim_op_type != IS_CONST check with a ZEND_HANDLE_NUMERIC for CONST operands.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would make sense to extract the whole key type checking into a separate function, as this code is repeated a couple of times.

You'll need to pass in the znode_op* for the dim next to dim_op_type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if dim_op_type == IS_CONST, we should do a ZEND_HANDLE_NUMERIC to add MAY_BE_ARRAY_KEY_LONG?

I'm really just following your lead here but I'll try to make sense of this. If the dimension operand is a constant (I'm guessing a literal string in our case), we should check if it's numeric.. but why? What is tmp here and who are we signalling that tmp MAY_BE_ARRAY_KEY_LONG?

Also, what's a znode_op?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If dim_op_type != IS_CONST: You always add MAY_BE_ARRAY_KEY_LONG.
If dim_op_type == IS_CONST: You do a ZEND_HANDLE_NUMERIC and only add MAY_BE_KEY_LONG if it's numeric.

To avoid the need to pass in the full kitchen sink of parameters to this function, you might want to pass opline->op2_type == IS_CONST ? CRT_CONSTANT_EX(op_array, opline, opline->op2, ssa->rt_constants) : NULL from the caller, rather than just opline->op2 (which is the znode_op I was referring to).

zval_ptr_dtor_nogc(val);
ZVAL_LONG(val, index);
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code can be left for ADD_ARRAY_ELEMENT, and the rest can simply be dropped from the switch. They'll fall through to the default case.

Copy link
Contributor Author

@rtheunissen rtheunissen Jun 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 6a4ae15 and ae6ba00

@nikic
Copy link
Member

nikic commented Jun 29, 2018

I doubt that the current patch will affect performance enough to be a deal-breaker. It's an optimisation that affects a very rare case, as I have never come across PHP code where an array is accessed using a numeric string literal like $array['123'].

The case that we're concerned about are not accesses like $array['123'], but rather $array['foo']. These will now have to go through the numeric string checking. However, in most cases that should just be a single branch, so this should not be too problematic (a micro benchmark is needed though, to verify assumptions).

@rtheunissen
Copy link
Contributor Author

rtheunissen commented Jun 29, 2018

The case that we're concerned about are not accesses like $array['123'], but rather $array['foo']. These will now have to go through the numeric string checking. However, in most cases that should just be a single branch, so this should not be too problematic (a micro benchmark is needed though, to verify assumptions).

Of course, that makes sense. In most cases these would fail at the first character anyway. I'll do a proper micro-benchmark when the patch is more complete. I'll use this one that @hikari-no-yume wrote in #1649 (comment)

Thanks for your help again on this one @nikic

@rtheunissen rtheunissen force-pushed the rt-fix-bug-63217 branch 3 times, most recently from e0f09e7 to c30f56c Compare June 30, 2018 18:06
@@ -2496,7 +2515,8 @@ static int zend_update_type_info(const zend_op_array *op_array,

if (opline->extended_value == ZEND_ASSIGN_DIM) {
if (opline->op1_type == IS_CV) {
orig = assign_dim_result_type(orig, OP2_INFO(), tmp, opline->op2_type);
orig = assign_dim_result_type(orig, OP2_INFO(), tmp, opline->op2_type,
CRT_CONSTANT_EX(op_array, opline, opline->op2, ssa->rt_constants));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These CRT_CONSTANT_EX accesses should be guarded by opline->op2_type == IS_CONST.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See 79a14f5 and b0074a6

@nikic
Copy link
Member

nikic commented Jun 30, 2018

For this benchmark: https://gist.github.com/nikic/f80155ecc86fd4e48be50af9c35f64be I get the following results:

Test Master This PR
$array["non_numeric"] 3.3779599666595s 3.3726971149445s
$array["!!!!!!!!!!!"] 3.3831338882446s 3.3914439678192s
$array["123456789"] 2.5053958892822s 4.4804201126099s
$array["123456789123456789123456789123456789"] 3.9566950798035s 4.3563148975372s

The good news is that it seems to have negligible impact on the common case of non-numeric constant keys. The bad news is that cases like $array["123"] become a good bit slower, here by a factor of 1.8. However, as this should be a rather unusual case, I think that's fine.

From my side this is good to go. @dstogov What do you think?

@rtheunissen
Copy link
Contributor Author

@nikic so that case is slow because it has to do the string-to-long conversion for every loop at runtime? Could we not do that conversion once, if we know that it's an array (can we know?)

if (ZEND_HANDLE_NUMERIC(Z_STR_P(dim_op), hval)) {
tmp |= MAY_BE_ARRAY_KEY_LONG;
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better to write this as:

		if (dim_type & MAY_BE_STRING) {
			if (dim_op_type != IS_CONST) {
				tmp |= MAY_BE_ARRAY_KEY_STRING | MAY_BE_ARRAY_KEY_LONG;
			} else {
				zend_ulong hval;
				if (ZEND_HANDLE_NUMERIC(Z_STR_P(dim_op), hval)) {
					tmp |= MAY_BE_ARRAY_KEY_LONG;
				} else {
					tmp |= MAY_BE_ARRAY_KEY_STRING;
				}
			}
		}

Otherwise we're adding MAY_BE_ARRAY_KEY_STRING even if we know numeric conversion will happen.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call. 👍 Done in 45e6b24

@nikic
Copy link
Member

nikic commented Jun 30, 2018

@rtheunissen Yes, we could do that for the cases where we can determine that it is an array. This would take an extra check in https://github.com/php/php-src/blob/master/ext/opcache/Optimizer/dfa_pass.c#L866, which for all relevant opcodes checks if op1 has only MAY_BE_ARRAY type and op2 is a numeric string literal, and converts it in that case.

I don't think this is strictly necessary to land this though, as I don't think it's a common case.

@dstogov
Copy link
Member

dstogov commented Jul 2, 2018

@nikic @rtheunissen I don't object.

@nikic nikic added the Bug label Jul 2, 2018
@nikic
Copy link
Member

nikic commented Jul 2, 2018

Merged as 30156d5. Thanks!

@nikic nikic closed this Jul 2, 2018
@dstogov
Copy link
Member

dstogov commented Jul 2, 2018

@nikic I got an idea, how this may be fixed without performance degradation. We may keep two constant literals (one of them should be converted to to number if necessary). This is similar to ZEND_GET_FUNC_BY_NAME that receives original and lowercase name. What do you think?

@dstogov
Copy link
Member

dstogov commented Jul 2, 2018

ideally, it would be great to have second literal only for numeric string indexes (this will probably require some new flag).

@nikic
Copy link
Member

nikic commented Jul 2, 2018

@dstogov I think this case is too rare to make an extra literal worthwhile.

@dstogov
Copy link
Member

dstogov commented Jul 2, 2018

@nikic $mysql_result["filed_name"] is quite common code. And yes, extra literal should be used only for numeric string constants.

@nikic
Copy link
Member

nikic commented Jul 2, 2018

@dstogov Yes, that is common code, but that case is not affected. Or rather, it will only perform one extra branch to check that 'f' > '9'. Based on my microbenchmark, this did not produce a measurable difference.

The case that has become slower is $array["123"], but this will usually be written as $array[123] instead, which is why I'm not concerned much.

@hikari-no-yume
Copy link
Contributor

Potentially crackpot idea: what if "123" had a hash value of 123?

@dstogov
Copy link
Member

dstogov commented Jul 2, 2018

@nikic you don't see performance difference, because on micro-benchmarks all the code and data are already in CPU cache and the extra branch is well predicted. Of course, the degradation from this patch is not significant, but if we can fix the problem in a better way, we should try.

@hikari-no-yume changing hash_value function may cause much more significant performance difference

@nikic
Copy link
Member

nikic commented Jul 2, 2018

@dstogov That's true. Adding an opline flag for this will still leave the branch though, unless we want to specialize it away. Do you have a numbers for the impact of this change on something like WP?

@dstogov
Copy link
Member

dstogov commented Jul 2, 2018

@nikic

$ ZEND_DONT_UNLOAD_MODULES=1 valgrind --tool=callgrind --separate-recs=1 --dump-instr=yes --cache-sim=no sapi/cgi/php-cgi -T100 /var/www/html/bench/wordpress/index.php > /dev/null

before: 1,987M
after: 1,992M

I'll try to do a better fix myself, later today or tomorrow, May be I'll able to avoid extra flag in opline->execute_data, by using extra space in first zval instead. And we'll have to check it only in case of objects (arrays will use the old specialization logic).

@rtheunissen rtheunissen deleted the rt-fix-bug-63217 branch July 2, 2018 17:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants