Skip to content

Add Spl\ForwardArrayIterator and Spl\ReverseArrayIterator #6535

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

morrisonlevi
Copy link
Contributor

@morrisonlevi morrisonlevi commented Dec 23, 2020

Spl\ReverseArrayIterator

Spl\ReverseArrayIterator iterates over the array in reverse order, preserving keys.

$array = [ 
    "one" => 1,
    "two" => 2,
    "three" => 3,
];

$iterator = new Spl\ReverseArrayIterator($array);
foreach ($iterator as $key => $current) {
    echo "[{$key} => {$current}]\n";
}
/* outputs:
[three => 3]
[two => 2]
[one => 1]
*/

Spl\ForwardArrayIterator

ForwardArrayIterator is a simple array iterator which does not dup the array (though if the user modifies it after making an iterator then normal copy-on-write will apply and make a copy then). It iterates over the array in order. The API is intentionally very small. In contrast, ArrayIterator has a large API which requires it to almost always duplicate the array.

Spl\ForwardArrayIterator is intended to replace many usages of new \ArrayIterator($array) where iterating over the array in an iterator form is the intended use case. It is not intended to replace other usages, such as ArrayIterator::sort.

$array = [ 
    "one" => 1,
    "two" => 2,
    "three" => 3,
];

$iterator = new Spl\ForwardArrayIterator($array);
foreach ($iterator as $key => $current) {
    echo "[{$key} => {$current}]\n";
}
/* outputs:
[one => 1]
[two => 2]
[three => 3]
*/

For such simple cases I recommend using the $array directly, but sometimes an iterator is needed for generic code, such as:

function reduce_values(iterable $of, callable $by)
{
    // Uses Iterator API to avoid re-implementing it for arrays 
    $iterator = to_iterator($of);
    $iterator->rewind();
    if (!$iterator->valid()) {
        throw new \ValueException(
            __FUNCTION__ . " requires an iterable that contains at least one value"
        );
    }
    $result = $iterator->current();
    for ($iterator->next(); $iterator->valid(); $iterator->next()) {
        $result = $by($result, $iterator->current());
    }
    return $result;
}

// to_iterator may look something like this:
function to_iterator(iterable $subject): \Iterator
{
    if (\is_array($subject)) {
        /* Use \Spl\ForwardArrayIterator over \ArrayIterator so
         * so that we don't duplicate the $subject just to
         * construct an iterator.
         */
        return new \Spl\ForwardArrayIterator($subject);
    } elseif ($subject instanceof \Iterator) {
        return $subject;
    } else /* if ($subject instanceof \IteratorAggregate) */ {
        /* Technically IteratorAggregate::getIterator doesn't
         * have to return an Iterator, just any Traversable.
         * Protect it with a recursive call.
         */
        return namespace\to_iterator($subject->getIterator());
    }
}

Status

  • Spl\ForwardArrayIterator
    • Countable
    • Iterator
    • Forbid unserialization and serialization
    • Tests
  • Spl\ReverseArrayIterator
    • Countable
    • Iterator
    • Forbid unserialization and serialization
    • Tests

@TysonAndre
Copy link
Contributor

TysonAndre commented Jan 28, 2021

Some notes on the implementation:

  • ReflectionProperty->setValue(), unserialize(), etc. may cause unexpected changes in the properties being iterated over, modify the array or fields on the array.

    My preference would be to override the property array getters like SplObjectStorage currently does, and provide getInternalOffset(): int or maybe just getArraySliceBefore(): array/getArraySliceAfter(): array or countElementsBefore() or possibly nothing. e.g. to deal with gaps such as <?php $x = ['a','b','c']; unset($x[1]); new Spl\ForwardArrayIterator($x).

    If the goal is only for var_dump to work, I think overriding the get_properties_handler would work, like ArrayObject does - spl_handler_ArrayObject.get_properties_for = spl_array_get_properties_for;

    Closure::bindTo() is also a concern for modifying $this->offset

  • Future scope: for some use cases, it may be useful to add a prev() method to go backwards in an array, or setInternalOffset(). The presence of gaps in arrays may make using it to implement algorithms such as binary search error-prone

  • This may be useful to use with ZEND_API zend_object_iterator *zend_user_it_get_new_iterator(zend_class_entry *ce, zval *object, int by_ref) to allow user code to return a regular array in getIterator() and have php internally convert that to a zend_object_iterator for foreach to work without changing the many internal/external uses of ce->get_iterator()

@morrisonlevi
Copy link
Contributor Author

morrisonlevi commented Jan 28, 2021

I think I should switch to get_properties_for, as you mention later. I don't like hiding the details (and it actually is more work, ha), but given reflection and other ways to manipulate PHP-level properties it can invalidate the state. I don't worry about such things generally speaking: if someone is breaking abstraction levels then they should accept the consequences if things go wrong. However, it might be possible to cause a segfault, such as if we replace the inner array with some other type, and sigsegvs are no-nos.

So yeah, I should hide the properties and use get_properties_for.


I think partly you've mis-understood the intended design. Think of these as Array -> Iterator adapters. It takes an Array and packages it into an Iterator API for use in places where you specifically need an iterator (as I showed in the example in the original post). If we add array-specific features then it's leaking through the API. There may be exceptions to this -- for instance, a method like toArray() can be implemented more efficiently by the implementation than by using its public API, and it is a reasonable operation to perform on abstract iterators (hence iterator_to_array).

By the way, \ArrayIterator always duplicates its array. It can theoretically be optimized, but Nikita actually tried doing this and hit issues. It just has soo many API requirements that eager duplicating the array is the easiest way to accomplish what is needed. I specifically have it as a goal to not duplicate the input array, which is why I started with this in the first place. Here are some timing numbers just to create the iterators (no other methods):

------------------------------------------------------------------------------
Class                      Scale                      Time (nanoseconds)      
------------------------------------------------------------------------------
ArrayIterator                                     1                        935
Spl\ForwardArrayIterator                          1                        187
ArrayIterator                                    10                        209
Spl\ForwardArrayIterator                         10                        120
ArrayIterator                                   100                        539
Spl\ForwardArrayIterator                        100                        123
ArrayIterator                                  1000                       7367
Spl\ForwardArrayIterator                       1000                        129
ArrayIterator                                 10000                      60709
Spl\ForwardArrayIterator                      10000                        141
ArrayIterator                                100000                    2110241
Spl\ForwardArrayIterator                     100000                        567
------------------------------------------------------------------------------

There aren't "holes" in the iterator, by the way. The array iterator reports what the array has:

$x = ['a','b','c']; unset($x[1]);

$iterator = new Spl\ForwardArrayIterator($x);
foreach ($iterator as $key => $current) {
    echo "[{$key} => {$current}]\n";
}
/* results in:
[0 => a]
[2 => c]
 */

@morrisonlevi morrisonlevi force-pushed the spl/ArrayIterators branch 2 times, most recently from e7d3942 to db8683c Compare January 31, 2021 19:17
@morrisonlevi
Copy link
Contributor Author

morrisonlevi commented Jan 31, 2021

I did the following in recent commits:

  • Rebased on latest master.
  • Switched to get_properties_for; currently only supports it for debug.
  • Added (un)serialization deny handlers (cannot serialize them).
  • Extracted some of the commonalities between Forward- and Reverse- ArrayIterators, though there is more that could be done.
  • Added tests for ReverseArrayIterator.

Thanks @TysonAndre for the review!

ForwardArrayIterator *iterator = ArrayIterator_from_obj(Z_OBJ_P(ZEND_THIS));
zval *current = ForwardArrayIterator_current(iterator);
if (EXPECTED(current)) {
ZVAL_COPY(return_value, current);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the implementation of ForwardArrayIterator_current and ReverseArrayIterator_current need a call to ZVAL_DEREF? What happens for code such as the following - will $v unintentionally be a reference?

$z = 2;
$y = [&$z];
$x = new SplForwardArrayIterator($y);
foreach ($x as $k => $v) {
    $v = 'new value';
}
var_dump($z);

E.g. elsewhere in spl_array.c, I see the following (The IS_INDIRECT check may have become unnecessary now that $GLOBALS can't be passed directly to methods in php 8.1, ask nikita - that snippet also seems to support object property tables (IS_OBJECT) which definitely have IS_INDIRECT)

	if ((entry = zend_hash_get_current_data_ex(aht, spl_array_get_pos_ptr(aht, intern))) == NULL) {
		RETURN_FALSE;
	}

	if (Z_TYPE_P(entry) == IS_INDIRECT) {
		entry = Z_INDIRECT_P(entry);
	}

	ZVAL_DEREF(entry);
	RETURN_BOOL(Z_TYPE_P(entry) == IS_ARRAY || (Z_TYPE_P(entry) == IS_OBJECT && (intern->ar_flags & SPL_ARRAY_CHILD_ARRAYS_ONLY) == 0));

Copy link
Contributor Author

@morrisonlevi morrisonlevi Jan 31, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am unsure on the deref. I'll think on it some more, but you might be right.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think a DEREF is needed -- at least as written I get int(2). In the back of my head by-value is taken care of by the iteration API somewhere; I think the VM will do zend_copy_to_variable. However, as Joe Watkins told me, it is a cheap operation to perform to be sure. Maybe extensions will call this API and not doing the DEREF would be a problem. I've convinced myself you are right about the DEREF, I think.

I don't know much about indirect. @nikic , can you chime in there?

ForwardArrayIterator is a simple array iterator which does not dup
the array (though if the user modifies it after making an iterator
then normal copy-on-write will apply and make a copy then).
The ForwardArrayIterator API is intentionally very small.
@ramsey ramsey added the Feature label Jun 9, 2021
@TysonAndre TysonAndre marked this pull request as ready for review January 22, 2022 19:55
@morrisonlevi morrisonlevi requested a review from Girgias as a code owner October 7, 2023 13:51
@Girgias
Copy link
Member

Girgias commented Oct 15, 2023

I'm going to close this (thanks GH for pinging me...) I'm not against this feature per se, but I don't think it should use SPL as namespace. Also needs a rebase.

Please reopen, or open a new PR if this is still something that is being considered.

@Girgias Girgias closed this Oct 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants