Skip to content

Add support for string keys in array unpacking #6584

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

nikic
Copy link
Member

@nikic nikic commented Jan 7, 2021

@nikic nikic added the RFC label Jan 7, 2021
Copy link
Contributor

@TysonAndre TysonAndre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM other than the question about checking for an exception before calling Traversable->valid()
EDIT: and converting integer strings (from Traversable->key()) to integers

Comment on lines 2584 to 2585
zend_hash_update(result_ht, Z_STR(key), val);
zval_ptr_dtor_str(&key);
Copy link
Contributor

@TysonAndre TysonAndre Jan 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Should this handle [...new ArrayObject(['key' => ThrowsOnDestruct]), ...new CustomTraversable()] to avoid an call to CustomTraversable::valid() in the unlikely case that an exception was thrown when the overwritten value's destructor was called?
    (i.e. before iter->funcs->valid(iter) == SUCCESS is called again)

    (e.g. in case CustomTraversable modifies state, performs db operations or makes network service calls)

  2. nit: the original works correctly, but I wonder if the below snippet would be a tiny bit more efficient, because the C optimizer can't infer that Z_STR(key) doesn't change after the call to zend_hash_update(), which may call destructors which modify zval, etc.

Same for the other calls to zval_ptr_dtor_str() in this PR - I mostly see zval_ptr_dtor_str in this file places that the compiler can infer don't have side effects, such as zend_fast_equal_strings

Suggested change
zend_hash_update(result_ht, Z_STR(key), val);
zval_ptr_dtor_str(&key);
zend_string *key_str = Z_STR(key);
zend_hash_update(result_ht, key_str, val);
zend_string_release_ex(key_str, 0);
if (UNEXPECTED(EG(exception) != NULL)) {
zval_ptr_dtor_nogc(val);
break;
}

Copy link
Contributor

@TysonAndre TysonAndre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a test of the expected behavior for keys that are stringified integers, and clarify how this edge case is handled in the RFC?
Personally, I'd prefer using zend_symtable_update instead of zend_hash_update for Traversables (i.e. convert strings to integers like regular array insertion does)
EDIT: I mean using zend_hash_next_index_insert for any string key that would normally get coerced to an integer

function my_generator() { yield '0' => 'test'; }
$x = [...my_generator()];
var_dump($x); // should this have an integer key 0 or a string key '0'?

To compare, iterator_to_array() currently converts strings to integers with preserve_keys=true

php > function my_generator() { yield '5' => 'test'; }
php > var_export(iterator_to_array(my_generator(), true));
array (
  5 => 'test',
)
php > foreach (my_generator() as $key => $_) { var_dump($key); }
string(1) "5"

EDIT: there's also the question of whether strings that get converted to integers should be renumbered the same way integers are

Comment on lines 2588 to 2589
zend_cannot_add_element();
zval_ptr_dtor_nogc(val);
Copy link
Contributor

@TysonAndre TysonAndre Jan 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NYC: why does the original code have no break; here if it's already throwing Cannot add element to the array as the next element is already occupied

Not really important - Previously this branch may have been unreachable in practice due to memory limits, but with this change it's potentially reachable (if strings converted to integers aren't renumbered - I prefer renumbering)

[...(function() { yield PHP_INT_MAX; })(), new CustomTraversable()]

EDIT: that's already possible with [PHP_INT_MAX => true, ...new CustomTraversable()] and unnecessarily calling Iterator->next() after throwing

@nikic
Copy link
Member Author

nikic commented Jan 10, 2021

Could you add a test of the expected behavior for keys that are stringified integers, and clarify how this edge case is handled in the RFC?
Personally, I'd prefer using zend_symtable_update instead of zend_hash_update for Traversables (i.e. convert strings to integers like regular array insertion does)

function my_generator() { yield '0' => 'test'; }
$x = [...my_generator()];
var_dump($x); // should this have an integer key 0 or a string key '0'?

Nice catch! This definitely can't have a string key '0', because that would violate the array key canonicalization. However, it's not immediately obvious whether this should result in a key 0 being set, or it being treated as an original integer key (and appended with renumbering). I think the latter behavior is probably correct...

@TysonAndre
Copy link
Contributor

I think the latter behavior is probably correct...

With the latter behavior, [...$a, ...$b] would behave similarly to [...iterator_to_array($a, true), ...iterator_to_array($b, true)]

With the former behavior, gaps in numbers added by ... or starting to allow overwriting existing numeric keys in ... would be unexpected

@TysonAndre
Copy link
Contributor

TysonAndre commented Jan 10, 2021

The remaining changes to opcache look straightforward

For ext/opcache/Optimizer/zend_inference.c , I think this should replace MAY_BE_ARRAY_KEY_LONG with MAY_BE_ARRAY_KEY_ANY or (t1&MAY_BE_ARRAY_KEY_ANY)
I believe the current code would lead to improper JIT optimizations and dead code elimination for key types, and is straightforward to fix - may be worth an extra unit test.

// original code
		case ZEND_ADD_ARRAY_UNPACK:
			tmp = ssa_var_info[ssa_op->result_use].type;
			ZEND_ASSERT(tmp & MAY_BE_ARRAY);
			/* Ignore string keys as they will throw. */
			if (t1 & MAY_BE_ARRAY_KEY_LONG) {
				tmp |= MAY_BE_ARRAY_KEY_LONG | (t1 & (MAY_BE_ARRAY_OF_ANY|MAY_BE_ARRAY_OF_REF));
			}
			if (t1 & MAY_BE_OBJECT) {
				tmp |= MAY_BE_ARRAY_KEY_LONG | MAY_BE_ARRAY_OF_ANY;
			}
			UPDATE_SSA_TYPE(tmp, ssa_op->result_def);
			break;

ext/opcache/Optimizer/sccp.c seems like the ct_eval_add_array_unpack can also be changed to permit string key (also with zend_symtable_update() to handle stringified integers) (using zend_hash_update to be consistent with Zend/zend_vm_def.h for array types)

// original
static inline int ct_eval_add_array_unpack(zval *result, zval *array) {
	zend_string *key;
	zval *value;
	if (Z_TYPE_P(array) != IS_ARRAY) {
		return FAILURE;
	}

	SEPARATE_ARRAY(result);
	ZEND_HASH_FOREACH_STR_KEY_VAL(Z_ARRVAL_P(array), key, value) {
		if (key) {
			return FAILURE;
		}
		value = zend_hash_next_index_insert(Z_ARR_P(result), value);
		if (!value) {
			return FAILURE;
		}
		Z_TRY_ADDREF_P(value);
	} ZEND_HASH_FOREACH_END();

@nikic nikic force-pushed the array-splat-string-keys branch from 77d30f4 to 7f60b3a Compare January 11, 2021 15:34
@nikic
Copy link
Member Author

nikic commented Jan 11, 2021

I've added handling for integral string keys and updated the RFC accordingly.

Possibly we should also use the same handling for argument unpacking (i.e. treat it as a positional argument rather than a certainly not existing argument name.)

@php-pulls php-pulls closed this in 27cd7a1 Feb 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants