Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recent patch to SplFixedArray causes array_walk + json encode to hang, circular var_export during gc to run out of memory #8079

Closed
TysonAndre opened this issue Feb 11, 2022 · 13 comments

Comments

@TysonAndre
Copy link
Contributor

TysonAndre commented Feb 11, 2022

Description

I believe that a lot of code in php-src/pecls relies on the fact that the pointer of Z_OBJPROP_P doesn't change, for non-empty properties tables. There's may be more examples than just array_walk but I'm still trying to create one for var_export

The following code:

<?php
call_user_func(function () {
    $x = new SplFixedArray(2);
    $x[0] = $x;
    $x[1] = strtoupper('foo');
    echo "Calling array_walk\n";
    array_walk($x, function ($value, $index) use($x) {
        printf("Value=%s index=%s\n", json_encode($value), json_encode($index));
        echo "Calling json_encode\n";
        echo json_encode($x, JSON_PARTIAL_OUTPUT_ON_ERROR), "\n";

    });
});

Resulted in this output:

// PHP hangs in an infinite loop.
// Adding debug statements to spl_fixedarray_object_get_properties indicates it is repeatedly called and cloning the array in 52ae6417ca3e62eff796bf3e3b5662fe6a9ee085

But I expected this output instead:

Calling array_walk
Value= index=0
Calling json_encode
[null,"FOO"]
Value="FOO" index=1
Calling json_encode
[null,"FOO"]

Relevant parts of the code - zend_hash_iterator_pos probably resets to the start if Z_OBJPROP is a different ht from when starting.

// Zend/zend_hash.c
ZEND_API HashPosition ZEND_FASTCALL zend_hash_iterator_pos(uint32_t idx, HashTable *ht)
{
	HashTableIterator *iter = EG(ht_iterators) + idx;

	ZEND_ASSERT(idx != (uint32_t)-1);
	if (UNEXPECTED(iter->ht != ht)) {
// ext/standard/array.c
static int php_array_walk(zval *array, zval *userdata, int recursive) /* {{{ */
// ...
		} else if (Z_TYPE_P(array) == IS_OBJECT) {
			target_hash = Z_OBJPROP_P(array);
			pos = zend_hash_iterator_pos(ht_iter, target_hash);

PHP Version

PHP 8.0.17-dev(52ae641 or newer)

Operating System

No response

@TysonAndre
Copy link
Contributor Author

TysonAndre commented Feb 11, 2022

As another example of Z_OBJPROP_P/get_properties being assumed to return the same pointer every time in json_encode:

The following example also has a stack overflow in the latest commit, repeatedly outputting In jsonSerialize. Debug statements indicate the hash table is repeatedly duplicated and spl_fixedarray_object_get_properties is repeatedly called.

<?php

class VarDumper implements JsonSerializable {
    public function __construct(private $parent) {
    }
    public function jsonSerialize() {
        echo "In jsonSerialize\n";
        var_dump($this->parent);
        return ['mutated'];
    }
}
call_user_func(function () {
    $x = new SplFixedArray(3);
    $x[0] = new VarDumper($x);
    $x[1] = $x;
    $x[2] = new stdClass();
    echo json_encode($x, JSON_PARTIAL_OUTPUT_ON_ERROR);
});

@dstogov
Copy link
Member

dstogov commented Feb 11, 2022

This works fine in PHP-8.1 and master. I don't know the reason :)

This didn't work in PHP-8.0 before the patch, because of assertion - Zend/zend_hash.c:987: _zend_hash_index_add_or_update_i: Assertion `(zend_gc_refcount(&(ht)->gc) == 1) || ((ht)->u.flags & (1<<6))' failed
Now PHP-8.0 goes into infinity recursion.

@TysonAndre
Copy link
Contributor Author

This didn't work in PHP-8.0 before the patch, because of assertion - Zend/zend_hash.c:987: _zend_hash_index_add_or_update_i: Assertion `(zend_gc_refcount(&(ht)->gc) == 1) || ((ht)->u.flags & (1<<6))' failed

That was usually setting the value of the array field to itself, unless you're trying really hard to deliberately do that, so the debug assertion error seemed less severe than infinite recursion.

I wonder if there were optimizations to avoid building properties in some cases, e.g. when there were no dynamic properties of VarDumper? I thought I saw some but am likely mistaken

@TysonAndre
Copy link
Contributor Author

TysonAndre commented Feb 11, 2022

It's in f9f8c1c for your change Optimized object encoding without rebulding properties HashTable which is in 8.1 but not 8.0. It mitigates it for my example but doesn't fix the root cause (I think? Still trying to reproduce it)

@dstogov
Copy link
Member

dstogov commented Feb 11, 2022

The debug assertion tells that something goes completely wrong. It's prohibited to update HashTables referenced from several places. Actually each invocation of get_properties() added (or updated) numeric elements in the standard properties table.

We have infinity recursion because of recursive data and incompatibility of recursion protection mechanism with SplFixedArray.

@TysonAndre
Copy link
Contributor Author

Thanks, I'll continue looking at this tomorrow to see if I have a better test case for the latest patches in 8.2.

We have infinity recursion because of recursive data and incompatibility of recursion protection mechanism with SplFixedArray.

I'd suggested updating the infinite recursion to first check for infinite recursion on the object before checking for infinite recursion on the properties table because of that.

I'm not very concerned about SplFixedArray and the original bug specifically due to use cases being rare, these are mostly to illustrate the point. but I do want to know how it'd be solved in the general case if var_export is to be readable for new or PECL data structures (instead of just returning an empty array making it look like an empty data structures, like SplDoublyLinkedList) - avoiding mutating obj->properties after creating it (except maybe for clearing) seems like it may be safer to me, and also overriding get_properties_for for debug/json/serialize/etc.


I'm still looking into whether this is only mitigated because performance improvements in 8.1 make all the examples I've tried avoid incrementing the hash table reference count to 2 and leaking/improperly reference counting the original.

  1. gc during iteration may cause problems if there are side effects while properties table reference count is > 1
  2. Maybe __sleep in SplFixedArray subclasses looks like zend_std_get_properties_for increments it, then php_var_serialize_get_sleep_props decrements it.

And in fa14eed Optimized object serialization without rebulding properties HashTable for mitigations.

			if (UNEXPECTED(ht)) {
				GC_ADDREF(ht);

Looking at what remains that would possibly add a reference to hash tables for SplFixedArray:

Zend/zend_gc.c has code such as this throughout for get_gc.

So I'm concerned that if json_encode and so on are called (e.g. due to __destruct, even in normal applications not deliberately trying to trigger this), the calls to Z_OBJPROP_P will now leak HashTable instances due to this change, and possibly prematurely free the original.

  1. var_export/other starts on SplFixedArray $splArray with refcount 1
  2. gc is triggered, whether by gc_collect_cycles or normal gc timing, temporarily incrementing refcount from 1 to 2.
  3. A destructor of some other class is called
  4. That destructor calls json_encode($splArray) directly or indirectly, while obj->properties has a refcount of 2
  5. json_encode gets (and leaks) a copy of the original array and decrements the reference count of the original array, which may later cause issues.
static int php_var_serialize_get_sleep_props(
		HashTable *ht, zval *struc, HashTable *sleep_retval) /* {{{ */
{
	zend_class_entry *ce = Z_OBJCE_P(struc);
	HashTable *props = zend_get_properties_for(struc, ZEND_PROP_PURPOSE_SERIALIZE);

Overrides of __sleep also increase reference counts in zend_get_properties_for

// ext/spl/spl_fixedarray.c
static HashTable* spl_fixedarray_object_get_gc(zend_object *obj, zval **table, int *n)
{
	spl_fixedarray_object *intern = spl_fixed_array_from_obj(obj);
	HashTable *ht = zend_std_get_properties(obj);

	*table = intern->array.elements;
	*n = (int)intern->array.size;

	return ht;
}

@dstogov
Copy link
Member

dstogov commented Feb 11, 2022

@TysonAndre hey, sorry. I already finished work for this week. (I ll review last comment later). BTW I think we should chat a bit on next week...
I real appreciative your involvement in php development and in case we have some misunderstanding (related to this issue or in general), it's better to discuss this. Please send me direct contact to dmitry at zend.com

@TysonAndre
Copy link
Contributor Author

  1. More context: The problems we'd see with SplFixedArray and var_export are the problems we'd see in general with cyclic data structures in future additions to php. Mostly, I want to either have var_export working the way users would expect, or have a good explanation to include for the lack of var_export support, for RFCs such as https://wiki.php.net/rfc/deque before I'd started voting. I unexpectedly ran into this when writing test cases for https://github.com/TysonAndre/pecl-teds/ . General-purpose data structures would face the same problem as SplFixedArray is having, and fixing it for SplFixedArray would allow for fixing it for PECLs or future additions to php.

    Changing var_export was the only way I could think of that was safe to do that - I'd assume any changes to var_export would target 8.2, though

  2. I managed to create a test case that's more realistic that still affects the latest 8.2.0 commit c77bbcd

  3. We have infinity recursion because of recursive data and incompatibility of recursion protection mechanism with SplFixedArray.

    This was why I'd initially asked about changing the recursion protection mechanism used (I'd believed it was impossible) - first checking for infinite recursion on the object before checking for infinite recursion on the get_properties_for result in var_export (and to better understand why it was done that way, if you'd remembered). You have much more experience than I do with internals, so I thought there may have been something I'd overlooked or an alternative approach

So I'm concerned that if json_encode and so on are called (e.g. due to __destruct, even in normal applications not deliberately trying to trigger this), the calls to Z_OBJPROP_P will now leak HashTable instances due to this change, and possibly prematurely free the original.

<?php
class VarExportInDestruct {
    public $self;
    public function __construct(public SplFixedArray $parent) {
        $this->self = $this;
    }
    public function __destruct() {
        echo "In __destruct during cyclic garbage collection, e.g. var_export in debugging output\n";
        var_export($this->parent);  // same for var_export($this);
        echo "\n";
    }
}

call_user_func(function () {
    $x = new SplFixedArray(2);
    $x[0] = $x;
    $x[1] = $x;
    new VarExportInDestruct($x);
    echo "Before gc_collect_cycles\n";
    gc_collect_cycles();
    echo "After gc_collect_cycles\n";
});

Actual output with latest patch set: This outputs thousands of var_export warnings and runs out of memory - I added printf statements to spl_fixedarray_object_get_properties to debug

Warning: var_export does not handle circular references in %s on line 9
spl_fixedarray_object_get_properties 0x7f9a702ba480
spl_fixedarray_object_get_properties 0x7f9a702ba480
Duplicating properties table

Warning: var_export does not handle circular references in %s on line 9
spl_fixedarray_object_get_properties 0x7f9a702ba4e0
spl_fixedarray_object_get_properties 0x7f9a702ba4e0
Duplicating properties table

Warning: var_export does not handle circular references in %s on line 9
spl_fixedarray_object_get_properties 0x7f9a702ba540
spl_fixedarray_object_get_properties 0x7f9a702ba540
Duplicating properties table

Warning: var_export does not handle circular references in %s on line 9
spl_fixedarray_object_get_properties 0x7f9a702ba5a0

Expected output (e.g. php 8.2 before the patch)

Before gc_collect_cycles
In __destruct during cyclic garbage collection, e.g. var_export in debugging output

Warning: var_export does not handle circular references in %s on line 9

Warning: var_export does not handle circular references in %s on line 9
SplFixedArray::__set_state(array(
   0 => NULL,
   1 => NULL,
))
After gc_collect_cycles

@TysonAndre TysonAndre changed the title Recent patch to SplFixedArray causes array_walk + json encode to hang Recent patch to SplFixedArray causes array_walk + json encode to hang, circular var_export during gc to run out of memory Feb 12, 2022
@dstogov
Copy link
Member

dstogov commented Feb 14, 2022

We think by a bit different categories. You see a bug and like to fix the engine, I see a buggy class that works against the engine expectations. It makes tricks to reuse properties for something else (does this improperly) and expects the engine to be fixed. I think, the problem should be fixed in the class itself. If it recreates properties table on each call to get_properties() let it protects from recursion itself.

@TysonAndre
Copy link
Contributor Author

If it recreates properties table on each call to get_properties() let it protects from recursion itself.

SplFixedArray doesn't have a way to tell if the properties table (whether from get_properties/get_properties_for) is still in use, though, so I don't know how you'd make it protect from recursion without changing the engine (with var_export/debug_zval_dump, infinite recursion detection only works if the returned table is the same instance every time)

What solution did you have in mind?


Also, if a solution can't be found, I think we should revert 52ae641 Fixed GH-8044 (var_export/debug_zval_dump HT_ASSERT_RC1 debug failure for SplFixedArray) in 8.0+ to avoid introducing new bugs in 8.0/8.1 such as out of memory errors during gc (before the next RC/patch releases) - I consider the original bug much, much less severe

@dstogov
Copy link
Member

dstogov commented Feb 14, 2022

Can SplFixedArray detect when it was changed and regenerate properties only in this case?

52ae641 fixes the incorrect usage of HashTable. We may revert it, but then we will get assertion in DEBUG build and potential inconsistency in RELEASE.

@TysonAndre
Copy link
Contributor Author

Can SplFixedArray detect when it was changed and regenerate properties only in this case?

That's probably possible, assuming that in practice, it won't change while var_export/debug_zval_dump is called, barring any PECL classes that misbehave during handlers.get_properties (not aware of any).

After modification calls such as offsetSet/offsetUnset/setSize, set a boolean or bit flag in SplFixedArray's implementation indicating it was modified

@TysonAndre
Copy link
Contributor Author

That's probably possible, assuming that in practice, it won't change while var_export/debug_zval_dump is called, barring any PECL classes that misbehave during handlers.get_properties (not aware of any).

After modification calls such as offsetSet/offsetUnset/setSize, set a boolean or bit flag in SplFixedArray's implementation indicating the array it stores was modified

TysonAndre added a commit to TysonAndre/php-src that referenced this issue Feb 16, 2022
Closes phpGH-8079

Under most circumstances, the zend_hash_index_update is setting the value to
itself when the hash table's reference count is already 2
TysonAndre added a commit to TysonAndre/php-src that referenced this issue Feb 16, 2022
Closes phpGH-8079

Under most circumstances, the zend_hash_index_update is setting the value to
itself when the hash table's reference count is already 2
TysonAndre added a commit to TysonAndre/php-src that referenced this issue Feb 17, 2022
Closes phpGH-8079

Track whether the spl_fixedarray was modified since the last call to
get_properties
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants