Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: Add iterable\any(iterable $input, ?callable $cb=null), all(...), none(...), find(...), reduce(...) #6053

Closed
wants to merge 9 commits into from

Conversation

TysonAndre
Copy link
Contributor

@TysonAndre TysonAndre commented Aug 29, 2020

EDIT (2021-06-01): The namespace changed from PHP\iterable\ to iterable\ and several functions were added
TODO: Change namespace to Iterable\

This is an alternative approach to https://externals.io/message/103357 and https://github.com/php/php-src/pull/3597/files by bugreportuser
with the following changes:

  • Having two different helpers for array/Traversable seemed like an unnecessary distinction since the return type was bool, unlike array_filter and array_map.
    Unify to any() and all() for arrays and Traversables
  • Fix memory leaks for reference-counted keys such as strings
  • Make the callback optional - check if the values are truthy if it is null.
  • Move from the global namespace to iterable\ after discussion of whether PHP should start using namespaces for new collections of internal functionality and collecting feedback on namespace choices - https://wiki.php.net/rfc/any_all_on_iterable_straw_poll_namespace#vote - After that vote was held, a different RFC https://wiki.php.net/rfc/namespaces_in_bundled_extensions passed forbidding vendored namespaces and Spl\
  • Add none(), find(), reduce() after feedback by a significant fraction of voters opposed to the change (https://wiki.php.net/rfc/any_all_on_iterable#straw_poll) that this was too small in scope

The following can be used as a polyfill (and also documents how this is meant to work)

<?php

namespace iterable;

/**
 * Determines whether any element of the iterable satisfies the predicate.
 *
 *
 * If the value returned by the callback is truthy
 * (e.g. true, non-zero number, non-empty array, truthy object, etc.),
 * this is treated as satisfying the predicate.
 *
 * @param iterable $input
 * @param null|callable(mixed):mixed $callback
 */
function any(iterable $input, ?callable $callback = null): bool {
    foreach ($input as $v) {
        if ($callback !== null ? $callback($v) : $v) {
            return true;
        }
    }
    return false;
}

/**
 * Determines whether all elements of the iterable satisfy the predicate.
 *
 * If the value returned by the callback is truthy
 * (e.g. true, non-zero number, non-empty array, truthy object, etc.),
 * this is treated as satisfying the predicate.
 *
 * @param iterable $input
 * @param null|callable(mixed):mixed $callback
 */
function all(iterable $input, ?callable $callback = null): bool {
    foreach ($input as $v) {
        if (!($callback !== null ? $callback($v) : $v)) {
            return false;
        }
    }
    return true;
}

/**
 * Determines whether no elements of the iterable satisfies the predicate.
 *
 *
 * If the value returned by the callback is truthy
 * (e.g. true, non-zero number, non-empty array, truthy object, etc.),
 * this is treated as satisfying the predicate.
 *
 * @param iterable $input
 * @param null|callable(mixed):mixed $callback
 */
function none(iterable $input, ?callable $callback = null): bool {
    return !\iterable\any($input, $callback);
}

/**
 * Returns the first element of $input satisfying $callback, or returns $default
 * 
 * If the value returned by the callback is truthy
 * (e.g. true, non-zero number, non-empty array, truthy object, etc.),
 * this is treated as satisfying the predicate.
 */
function find(iterable $input, callable $callback, mixed $default = null): mixed {
    foreach ($input as $v) {
         if ($callback($v)) {
             return $v;
         }
    }
    return $default;
}

/**
 * Similar to array_reduce but also acts on Traversables.
 */
function reduce(iterable $input, callable $callback, mixed $initial = null): mixed {
    foreach ($input as $v) {
         $initial = $callback($initial, $input);
    }
    return $initial;
}

For example, the following code could be shortened significantly:

// The old version
$satisifies_predicate = false;
foreach ($item_list as $item) {
    // Performs DB operations or external service requests, stops on first match by design.
    if (API::satisfiesCondition($item)) {
        $satisfies_predicate = true;
        break;
    }
}
if (!$satisfies_predicate) {
    throw new APIException("No matches found");
}
// more code....
use function Iterable\any;
// ...

// The new version is much shorter, readable, and easier to review,
// without creating temporary variables or helper functions that are used in only one place.
 
// Performs DB operations or external service requests, stops on first match by design.
if (!any($item_list, fn($item) => API::satisfiesCondition($item))) {
    throw new APIException("No matches found");
}

Potential future scope:

  • Iterable\apply(iterable $input, callable(mixed):unused $callback) (alternate names: walk/foreach_value) - Unlike array_walk, this does not convert array values to references and should avoid the resulting memory usage increase. Unlike https://www.php.net/iterator_apply - this passes the value - closure use variables are newer than iterator_apply.
  • Iterable\map(iterable $input, callable(mixed):mixed $callback): ImmutableKeyValueSequence
  • Iterable\filter_entries(iterable $input, callable(mixed $value, mixed $key):bool $callback): ImmutableKeyValueSequence (same arg order as ARRAY_FILTER_USE_BOTH)
  • Iterable\filter_values(iterable $input, callable(mixed $value):unused $callback = null): array (discards keys and returns a list of values that satisfy the predicate (or are truthy) (with keys 0,1,2,3...))
  • Iterable\filter_keys(iterable $input, callable(mixed $key):bool $callback = null): array
  • Iterable\is_empty(iterable $input): bool
  • Iterable\keys(iterable $input): array (like array_keys)
  • Iterable\values(iterable $input): array (like array_values/iterator_to_array but working on both. returns a list of values)
  • Iterable\sorted_values(iterable $input): array (like sort but (1) returns a sorted array instead of taking a reference, (2) also acts on Traversables)
  • Iterable\contains_value(iterable $input, mixed $value): bool (contains_key seems redundant with ArrayAccess/array_key_exist)
  • slice/chunk/flip
  • any_key/all_key/no_key?

@TysonAndre TysonAndre changed the title Proposal: Add any(...) and all(iterable $input, ?callable $cb, int $use_type=0) Proposal: Add any(...) and all(iterable $input, ?callable $cb=null, int $use_type=0) Aug 29, 2020
@TysonAndre TysonAndre changed the title Proposal: Add any(...) and all(iterable $input, ?callable $cb=null, int $use_type=0) Proposal: Add any(...) and all(iterable $input, ?callable $cb=null) Aug 31, 2020
@morrisonlevi
Copy link
Contributor

morrisonlevi commented Sep 1, 2020

I suggest slightly different signatures, assuming we stay value-oriented:

/**
 * @template Element
 * @psalm-param iterable<Element> $of
 * @psalm-param callable(Element): bool $satisfy
 */
function all_values(iterable $of, callable $satisfy): bool
{
    foreach ($of as $value) {
        if (!$satisfy($value)) {
            return false;
        }
    }
    return true;
}


/**
 * @template Element
 * @psalm-param iterable<Element> $of
 * @psalm-param callable(Element): bool $satisfies
 */
function any_value(iterable $of, callable $satisfies): bool
{
    foreach ($of as $value) {
        if ($satisfies($value)) {
            return true;
        }
    }
    return false;
}

// with named parameters
all_values(of: [1, 3, 5, 7], satisfy: 'is_odd');
any_value(of: [0, 2, 4, 6], satisfies: 'is_prime');

// without named parameters
all_values([1, 3, 5, 7], 'is_odd');
any_value([0, 2, 4, 6], 'is_prime');

The naming clarifies what any and all are about--the values--and leaves room for naming functions that are key or key/value oriented.

@@ -6389,3 +6389,155 @@ PHP_FUNCTION(array_combine)
} ZEND_HASH_FOREACH_END();
}
/* }}} */

static inline void php_array_until(zval *return_value, HashTable *htbl, zend_fcall_info fci, zend_fcall_info_cache fci_cache, int stop_value) /* {{{ */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In .c files the inline should either be removed, or should be changed into zend_always_inline depending on intent.

Suggested change
static inline void php_array_until(zval *return_value, HashTable *htbl, zend_fcall_info fci, zend_fcall_info_cache fci_cache, int stop_value) /* {{{ */
static void php_array_until(zval *return_value, HashTable *htbl, zend_fcall_info fci, zend_fcall_info_cache fci_cache, int stop_value) /* {{{ */

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was basing this on the surrounding code, such as php_search_array used for in_array and array_search

I'd assume there would be a place where you'd want "inline depending on compiler's best guess at optimization level" but I'm not familiar with the why of zend_always_inline vs inline

/* void php_search_array(INTERNAL_FUNCTION_PARAMETERS, int behavior)
 * 0 = return boolean
 * 1 = return key
 */
static inline void php_search_array(INTERNAL_FUNCTION_PARAMETERS, int behavior) /* {{{ */
{

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to specify inline; a static function in a .c file is enough. If you want to be sure it's inlined, that's when you throw on zend_always_inline.

zend_fcall_info_cache fci_cache = empty_fcall_info_cache;

ZEND_PARSE_PARAMETERS_START(1, 2)
Z_PARAM_ZVAL(input)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we not have Z_PARAM_* for iterable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked and didn't find anything - https://php.net/iterator_map takes an iterator but not an iterable. I didn't see any .stub.php files using iterable (see build/gen_stubs.php change)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that I opened PR #6064: "Add Z_PARAM_ITERABLE and co" to address this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should now be able to use Z_PARAM_ITERABLE as the PR merged.

@carusogabriel carusogabriel modified the milestone: PHP 8.1 Sep 1, 2020
@TysonAndre
Copy link
Contributor Author

I suggest slightly different signatures, assuming we stay value-oriented: (any_value() and all_values())

Quite a few alternate suggestions have been made, including https://externals.io/message/111756#111760

To be in line with naming conventions, I would suggest calling these
iter_any() and iter_all(), using iter_* as the prefix for our future
additions to the "functions that work on arbitrary iterables" space.
iterable_any() and iterable_all() would work as well, though are
unnecessarily verbose.

https://github.com/nikic/iter was a userland library with similar iteration primitives which might be worth adopting analogous functions for - though I doubt future proposals for php-src would use generators implemented in C due to performance.

It somewhat makes sense for iter_first() or iter_is_empty() for consistency if we expect a family of functions to be added, but iter_apply(), iter_all(), iter_any(), iter_keys(iterable): array seem unnecessarily verbose (that's not a strong preference for the short version, though)

}
/* fallthrough */
default:
/* Intentionally not converted to an exception */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment shouldn't be here :)

@enumag
Copy link

enumag commented Dec 18, 2020

Hi, what's the reason for only passing values? I see the $flags in future scope but I always found this unnecessary - why not just always pass both value and key? (with value first of course) It's simple because you need no flags but the callback can always do what's needed.

@TysonAndre
Copy link
Contributor Author

Hi, what's the reason for only passing values? I see the $flags in future scope but I always found this unnecessary - why not just always pass both value and key? (with value first of course) It's simple because you need no flags but the callback can always do what's needed.

PHP 8.0+ throws an ArgumentCountError when an internal function is called with too many parameters. A future PHP version may also throw when a user-defined function is called with too many parameters, but a lot of code may rely on that.

So passing extra parameters would make any() throw for type checks. Additionally, many programmers would expect function (...$args) use ($orig) { log($args); $orig(...$args); } to work the same as the original callback for non-references, so checking parameter counts would have unintuitive behavior.

php > echo PHP_VERSION;
8.0.1-dev
php > var_export(is_string('value', 'key'));

Warning: Uncaught ArgumentCountError: is_string() expects exactly 1 argument, 2 given in php shell code:1
Stack trace:
#0 php shell code(1): is_string('value', 'key')
#1 {main}
  thrown in php shell code on line 1
php > var_export(array_filter(['key' => 'value'], 'is_string', ARRAY_FILTER_USE_BOTH));

Warning: Uncaught ArgumentCountError: is_string() expects exactly 1 argument, 2 given in php shell code:1
Stack trace:
#0 [internal function]: is_string('value', 'key')
#1 php shell code(1): array_filter(Array, 'is_string', 1)
#2 {main}
  thrown in php shell code on line 1

@TysonAndre TysonAndre force-pushed the any-all-iterable-checks branch 2 times, most recently from 1b4e5c0 to e8a15fa Compare December 18, 2020 18:13
@morrisonlevi
Copy link
Contributor

morrisonlevi commented Dec 18, 2020

@TysonAndre Would you be willing to collaborate on another attempt, this time sticking it into Spl, which I am something of a maintainer for? If so, send me an email.

@TysonAndre
Copy link
Contributor Author

Would you be willing to collaborate on another attempt, this time sticking it into Spl, which I am something of a maintainer for? If so, send me an email.

I was rebasing this because I was planning on resuming the original attempt, which I never started the vote on.

I'd postponed it because

  1. At the time, PHP hadn't yet split out the PHP-8.0 branch from the main development branch and I was waiting for 8.0.0 to have a stable release before proposing changes for 8.1.
  2. Finishing up support for PHP 8.0 in other projects(phan, several PECLs) was time consuming
  3. I was on the fence on the name alternatives and hadn't sent out a straw poll - I like the name any(), but if more functions would be added in the future it would make sense to give them a consistent prefix such as iterable_any()/iterable_all/iterable_apply/iterable_reduce/iterable_keys, and avoid conflicts with existing functions

@TysonAndre
Copy link
Contributor Author

Would you be willing to collaborate on another attempt, this time sticking it into Spl, which I am something of a maintainer for?

You make a good point about moving it to Spl - all of the existing functions on Traversables I remember (iterator_to_array, iterator_apply, iterator_count) are in spl. So are classes (https://www.php.net/manual/en/spl.iterators.php). I wasn't thinking about that distinction since spl was always enabled and statically compiled since PHP 5.3, but it would make sense to clarify this would go in spl (for the manual) and update the implementation unless there's a reason not to put it in spl.


Aside: https://wiki.php.net/rfc/any_all_on_iterable_straw_poll#vote was created to poll internals on the naming choice

@morrisonlevi
Copy link
Contributor

morrisonlevi commented Dec 20, 2020

If we move it to the SPL there are other naming choices: Spl\any_value, Spl\all_values, etc. I still like that naming scheme, and would love to talk to you more about it. Of course, Spl\any and Spl\all could also work, but aren't my first choice. I think we can add a few more functions with it being in the SPL as well if interested (I am).

I am sure the namespacing here is adequate and does not require a specific vote for it. I'm sort of a maintainer, and other sort-of maintainers are also in favor of it, so that part isn't an issue. (I think all the official maintainers have gone inactive).

@TysonAndre TysonAndre changed the title Proposal: Add any(...) and all(iterable $input, ?callable $cb=null) Proposal: Add PHP\iterable\any(...) and all(iterable $input, ?callable $cb=null) Feb 6, 2021
bugreportuser and others added 5 commits May 31, 2021 16:23
Create iterator_every() and iterator_any()

This has merge conflicts which were resolved in the subsequent commit(s).
This commit was used as a basis for implementing an alternate approach to
checking if all()/any() elements in an iterable satisfy a predicate.

Cherry-picked from phpGH-3597 with modifications

Combine helpers into any() and all(), modernize for php 8

Properly free the result of the callback if it was a refcounted
type such as a string/array.

Fix the default value of the callback.

Co-Authored-By: bugreportuser
Co-Authored-By: Tyson Andre <tandre@php.net>
Semantics are similar to array_reduce
And add checks for edge cases of Traversables
@TysonAndre TysonAndre changed the title Proposal: Add PHP\iterable\any(...) and all(iterable $input, ?callable $cb=null) Proposal: Add iterable\any(iterable $input, ?callable $cb=null), all(...), none(...), find(...), reduce(...) Jun 2, 2021
Copy link
Contributor

@morrisonlevi morrisonlevi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In your initial comment, the return types for find and reduce are showing bool; can you fix that?


function none(iterable $iterable, ?callable $callback = null): bool {}

function reduce(iterable $iterable, callable $callback, mixed $initial = null): mixed {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I strongly prefer removing $initial and adding fold which always requires it:

function fold(iterable $iterable, callable $callback, mixed $initial): mixed {}
function reduce(iterable $iterable, callable $callback): mixed {}

The fold function doesn't throw on empty, and the reduce will. This pattern exists in other languages, such as Kotlin.

I'm happy to do this work if you'll agree.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like the inconsistency with array_reduce (which has optional $initial=null) would have more objectors for making it harder to learn the language or switch code from array_reduce to *reduce intuitively for beginners.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is an error condition in reduce where there is not an initial value and an empty iterable, and it should throw because there is no legal value we can return that isn't already possible in the reduction. We should not repeat the mistakes of the past. You argue in another comment that find is useful in other languages, and yet you don't buy that same argument here? What gives?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at this again, I'd agree my earlier proposal for reduce was a mistake and it's worth changing, adding fold and either removing reduce entirely or requiring a non-empty array.

The other argument was about including a function, not a change

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can work on fold tonight.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you okay with these signatures and semantics for fold and reduce?

I imagine there is some discussion to be had for naming the parameters to make sure named parameters is a good experience, so let me know what you think. The name $into I picked from Swift. I used $by because it's short but not an abbreviation like acc; my quick glance in other languages' docs didn't turn up anything better.

namespace Iterable;

/**
 * @template Element
 * @template Result
 * @param iterable<Element> $seq
 * @param Result $into
 * @param callable(Result, Element): Result $by
 * @return Result
 */
function fold(iterable $seq, mixed $into, callable $by): mixed {
    foreach ($seq as $value) {
        $into = $by($into, $value);
    }
    return $into;
}

/** Private helper, wouldn't actually be exposed.
 * @template Key
 * @template Value
 * @param iterable<Key, Value> $seq
 * @return \Iterator<Key, Value>
 */
function to_iterator(iterable $seq): \Iterator {
    if (\is_array($seq)) {
        return new \ArrayIterator($seq);
    } elseif ($seq instanceof \Iterator) {
        return $seq;
    } else {
        assert($seq instanceof \IteratorAggregate);
        return namespace\to_iterator($seq->getIterator());
    }
}


/**
 * @template Element
 * @param iterable<Element> $seq
 * @param callable(Element, Element): Element $by
 * @return Element
 * @throws \ValueError if $seq does not have at least 1 element
 */
function reduce(iterable $seq, callable $by): mixed {
    $iterator = namespace\to_iterator($seq);
    $iterator->rewind();
    if (!$iterator->valid()) {
        throw new \ValueError("parameter \$seq to reduce... was empty");
    }

    $into = $iterator->current();
    for ($iterator->next(); $iterator->valid(); $iterator->next()) {
        $into = $by($into, $iterator->current());
    }
    return $into;
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For fold, I think having the callback third would be hard to remember when it's second in other reduce() function

The inner implementation seem reasonable enough. I assume the ArrayObject is just for illustrating the behavior and not the internal implementation

$seq seems like an harder to remember naming choice compared to $array/$iterable used elsewhere - https://www.php.net/manual/en/function.iterator-apply.php and https://www.php.net/manual/en/function.array-walk.php - especially for non-english speakers

PHP's already using $initial for https://www.php.net/manual/en/function.array-reduce.php and I don't see a strong reason to switch to a different name - initial's been used elsewhere (e.g. https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/Reduce)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The inner implementation seem reasonable enough. I assume the ArrayObject is just for illustrating the behavior and not the internal implementation

The ArrayIterator is just for showing behavior, yes. Notably, this will not pass NULL as the first parameter to the callback on the very first time it is called, unlike array_reduce and what this PR currently does.

As an example with the data set [1, 3, 5, 7]:

$result = reduce([1, 3, 5, 7], function ($into, $value) {
    $retval = $into + $value;
    echo "fn ({$into}, {$value}) => {$retval}\n";
    return $retval;
});

This will print:

fn (1, 3) => 4
fn (4, 5) => 9
fn (9, 7) => 16

And not:

fn(, 1) => 1
fn (1, 3) => 4
fn (4, 5) => 9
fn (9, 7) => 16

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I implemented the changes to reduce on this branch: https://github.com/morrisonlevi/php-src/tree/levi/any-all-iterable-checks. For some reason it wouldn't let me select your fork as the merge-base, so I didn't open a PR, but you can look at the last two commits. I did not yet add fold.


function reduce(iterable $iterable, callable $callback, mixed $initial = null): mixed {}

function find(iterable $iterable, callable $callback, mixed $default = null): mixed {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove this one? I'd rather add filter, map, and flatmap if we need more functions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

find() is also useful and many other programming languages include both (js, haskell, etc.)

For example, if you have an array of a million elements and only want the first match, it is much more efficient to call find() if the iterable contains a matching value (and there would be less service calls and db calls) compared to reset(array_filter(...))

Additionally, filter() and map() would be waiting on the existence of CachedIterable when it starts, because Traversables can have repeated keys yield 'key' => 1; yield 'key' => 2;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally, filter() and map() would be waiting on the existence of CachedIterable when it starts, because Traversables can have repeated keys yield 'key' => 1; yield 'key' => 2;

This does not require a CachedIterable; they can return any iterator.

Edit: on second thought, they should not return a cached iterable. These routines are often chained together; if every piece of the chain caches their results, it will balloon memory usage.

Copy link
Contributor Author

@TysonAndre TysonAndre Jun 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean that there's nothing in the standard library yet that supports rewinding, counting, and especially arbitrary offset access

It seems much more difficult to use without the support for rewindability, random/repeated offset access, countability, etc.

But yes, I suppose you could hide the implementation entirely with InternalIterator and only support a single iteration

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Edit: on second thought, they should not return a cached iterable. These routines are often chained together; if every piece of the chain caches their results, it will balloon memory usage.

CachedIterable is basically the name chosen for a rewindable immutable key-value sequence. It isn't cached permanently, it has a regular object lifetime. I'm referring to https://wiki.php.net/rfc/cachediterable

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CachedIterable is basically the name chosen for a rewindable immutable key-value sequence. It isn't cached permanently, it has a regular object lifetime. I'm referring to https://wiki.php.net/rfc/cachediterable

Yes, I am specifically saying they should not return this object. It will ballon memory usage to do it that way. Think about it; you have a map + filter plus some terminator like first_n with n=100. Filter and map will each hold at least 100 values in memory that shouldn't be there while calculating the first_n.

Copy link
Contributor Author

@TysonAndre TysonAndre Jun 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I am specifically saying they should not return this object. It will ballon memory usage to do it that way. Think about it; you have a map + filter plus some terminator like first_n with n=100. Filter and map will each hold at least 100 values in memory that shouldn't be there while calculating the first_n.

  1. Many end users would want the standard library to return something that could be iterated over multiple times - this was a source of bugs in spl classes such as https://www.php.net/arrayobject prior to them being switched to iterables
$foo = map(...);
foreach ($foo as $i => $v1) {
    foreach ($foo as $i => $v2) {
        if (some_pair_predicate($v1, $v2)) {
            // do something
        }
    }
}
  1. Library/application authors that are interested in lazy generators could use or implement something such as https://github.com/nikic/iter instead - my opinion is that the standard library should provide something that is easy to understand, debug, serialize or represent, etc.
  2. It would be harder to understand why SomeFrameworkException is thrown in code unrelated to that framework when it is passed to some function that accepts a generic iterable, and harder to write correct exception handling for it if done in a lazy generation style
  3. It is possible to implement a lazy version of CachedIterable that only loads values as needed - but I hadn't proposed it due to doubts that 2/3 of voters would consider it widely useful enough to be included in php rather than as a userland or PECL library

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Iterables are made from arrays and generators. Since there is already a large body of functions which work with arrays, it seems reasonable to assume they would use an Iterable part of the standard library because they also use iterators/generators. One of the main points of generators is to reduce memory. We shouldn't add a corpus of iterable functions that will increase memory; that works directly against the goals of the iterable feature!

No, it's much better to compose something if you want it to be eager, e.g. $foo = CachedIterable::new(map(...$args)); We can include this in the examples in the docs for filter, map, etc.

Copy link
Contributor Author

@TysonAndre TysonAndre Jun 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Iterables are made from arrays and generators

Iterables are made from arrays and Traversable objects such as SplObjectStorage, ArrayObject, user-defined data structures, and Generators

We shouldn't add a corpus of iterable functions that will increase memory; that works directly against the goals of the iterable feature!

In common use cases, the memory increase may be small, especially if the lifetime of the variable or temporary expression holding the result is short.

Additionally, a lazy iterable would require keeping a reference to the previous (possibly lazy) iterable, and the iterables/arrays those reference - if the initial iterable is larger than the result then eagerly evaluating $var = map($cb, temporary()) would end up saving memory

No, it's much better to compose something if you want it to be eager, e.g. $foo = CachedIterable::new(map(...$args)); We can include this in the examples in the docs for filter, map, etc.

That seems error prone and I'm personally opposed to that.

PHP has typically been imperitive rather than functional, and focused on "cater[ing] to the skill-levels and platforms of a wide range of users" as RFC authors are repeatedly reminded in https://wiki.php.net/rfc/template and my interpretation of that is that imperitive would be much more acceptable (aside: The loosely typed language part seems less applicable nowadays)

PHP is and should remain:

  1. a pragmatic web-focused language
  2. a loosely typed language
  3. a language which caters to the skill-levels and platforms of a wide range of users

Lazy data structures would be easy to misuse (consume twice, attempt to serialize or encode, (or var_dump or inspect with Xdebug), easier to attempt to log the full iterable(consume twice) etc) without (or even with) linters and static analyzers, so this really doesn't seem like catering to a wide range of users.


Explicitly using a different family of functions to act on generators internally would probably make more sense than being the default, e.g. https://docs.oracle.com/javase/8/docs/api/java/util/stream/Stream.html#findFirst-- and https://docs.oracle.com/javase/8/docs/api/java/util/stream/Stream.html (Streams are separate from java.util.Collection in java, javascript eagerly evaluates https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/Map, etc)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it's much better to compose something if you want it to be eager, e.g. $foo = CachedIterable::new(map(...$args)); We can include this in the examples in the docs for filter, map, etc.

Having the eager version be longer than the lazy version (instead of shorter or the same length) would also encourage the use of the lazy version, which I'd objected to for being error prone and easy to misuse.

zend_always_inline should only be done if:
 - It's been proven to be a hotspot
 - It's in a header file
Iterable\reduce() no longer takes an initial value; it pulls it from
the iterable. If there isn't at least one value, it throws.

Expect another commit with Iterable\fold which has a required $initial
parameter, which does not throw if the iterable is empty.
@morrisonlevi
Copy link
Contributor

morrisonlevi commented Jun 8, 2021

Looks like the reduce_traversable.phpt test is failing on quite a few different builds. Will look at those tonight.

@morrisonlevi
Copy link
Contributor

I fixed one trivial issue with the wrong enum type; again pushed to my branch.

@TysonAndre
Copy link
Contributor Author

Looks like the reduce_traversable.phpt test is failing on quite a few different builds. Will look at those tonight.

Fixed, also changed back from ZVAL_COPY to ZVAL_COPY_VALUE to avoid a leak reported in --enable-debug.

@TysonAndre
Copy link
Contributor Author

TysonAndre commented Jun 9, 2021

It seems likely that I'll withdraw this - competing rfcs can be created to work on their own implementation, either based on this implementation or independently.

The scope has increased significantly from 2 functions https://wiki.php.net/rfc/any_all_on_iterable#changelog - and with the namespacing discussion it seems to me like the functionality would need closer to a 5/6 majority (or higher) to pass after accounting for voters strongly opposed to choosing a given namespacing pattern for future additions to core


Based on early results of https://wiki.php.net/rfc/cachediterable_straw_poll#namespace_choices , and https://wiki.php.net/rfc/any_all_on_iterable#straw_poll and https://wiki.php.net/rfc/any_all_on_iterable_straw_poll_namespace#vote I'm seriously doubting this would meet the 2/3rd majority required for an RFC to pass.

  • Around half of voters were opposed to it for being too small and I'm still unclear of the desired scope size - increasing it in size would also increase the time investment for implementation and discussion, disagreement with implementation choices such as previous discussion, and likeliness of voters voting against an RFC because it is difficult to understand (or disagreement with implementation choices)

  • Separately from that, over 1/3 total would object to this being added to php if "should be in userland", object to functionality, and "object to the choice of namespace" are summed up, so even if it was sufficiently increased in scope, it still seems likely to fail to pass.

  • I don't think it's as easy as it was historically to add a large group of standard library functions - when https://www.php.net/manual/en/function.array-change-key-case.php was added in php 4.2, I believe there was a much smaller group of maintainers who knew each other, and

    1. In php 5, the voting threshold was 50%+1 rather than 2/3 majority, and there weren't namespaces. (I'm not familiar with the history)
    2. https://en.wikipedia.org/wiki/PHP#PHP_3_and_4 I'm not even sure if php 4.2 required a formal vote for standard libraries
  • https://wiki.php.net/rfc/cachediterable_straw_poll#namespace_choices is almost evenly split on Iterable\any or iterable_any so I can't even say an option is clearly widely preferred and decided on

  • I'd prefer to wait for some other smaller group of global functions to be added to core/standard/spl to set a precedent for the namespacing debate, but php is a pretty mature language at this point, and many new groups of functions would be their in own extensions with clearer namespacing guidelines. No groups of functionality similar to an iterable library come to mind - recently proposed global functions in core/standard/spl were too similar to existing functions for it to be debated

@TysonAndre
Copy link
Contributor Author

https://wiki.php.net/todo/php81#timetable indicates the feature freeze is in July 20th, in 5 weeks in 8.1.0beta1 - I'd forgotten it was this soon.

https://wiki.php.net/rfc/cachediterable_straw_poll#namespace_choices looks as if an RFC may pass, but may fail again without a much larger group of standard library iterable functions to pass (given other objections to the functionality or choice of namespace)

With rfcs requiring 2 weeks of discussion and 2 weeks of voting it make more sense to target 8.2 - if any amendments (new arguments, changes to default behaviors, renaming, etc) are proposed, then there would be much more time to discuss them.

@TysonAndre TysonAndre closed this Jun 16, 2021
@jorgsowa
Copy link
Contributor

Hi @TysonAndre,
I found your PR and I see it would be really awesome to have in-built function like any() or all(), but I think even better would be function find_first() that you suggested, because it covers all cases and even more.

I see also that this function could save some memory and time when users incorrectly search through arrays with loops without early return.

Do you have any plans about implementing/coming back to this RFC?

@TysonAndre
Copy link
Contributor Author

TysonAndre commented Oct 10, 2022

Do you have any plans about implementing/coming back to this RFC?

No - https://wiki.php.net/rfc/any_all_on_iterable had just under 50% of a vote, and RFCs require a 2/3 majority to pass. The no voters objected to it both not having enough functionality and the choice of namespace (preferring global or other) - so even if it had enough functionality added, I'm still concerned it would fail from nothing really being different about namespaces (no new groups of global function functionality were added to php-src).

The iterable functionality can be found at https://github.com/TysonAndre/pecl-teds/#iterable-functions under a different namespace.

I see also that this function could save some memory and time when users incorrectly search through arrays with loops without early return.

This RFC was more about saving developer time, and having a common abstraction that can be used across all projects, rather than various choices of composer/PECL options that differ across projects.

For anything that calls callbacks on each element of an array, a userland solution is probably faster, because it avoids the need to switch from C back to the php vm repeatedly.

(If there's no callback, then the native implementation would be faster)

find_first() is find()?

@TysonAndre
Copy link
Contributor Author

TysonAndre commented Oct 12, 2022

https://wiki.php.net/rfc/iterator_xyz_accept_array passed for php 8.2, and iterator_to_array and iterator_count now accept iterables in general, so there is now an existing group of 2 functions with the same naming pattern for functionality acting on iterables

Though that would get confusing and the names are misleading - it might be nice to have some form of iterable* aliases for those functionality anyway to make it easy to tell that they also accept arrays

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants