Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: Add iterable\any(iterable $input, ?callable $cb=null), all(...), none(...), find(...), reduce(...) #6053

Closed
wants to merge 9 commits into from
420 changes: 394 additions & 26 deletions ext/standard/array.c

Large diffs are not rendered by default.

17 changes: 17 additions & 0 deletions ext/standard/basic_functions.stub.php
Expand Up @@ -2,6 +2,8 @@

/** @generate-class-entries */

namespace {

final class __PHP_Incomplete_Class
{
}
Expand Down Expand Up @@ -1510,3 +1512,18 @@ function sapi_windows_set_ctrl_handler(?callable $handler, bool $add = true): bo

function sapi_windows_generate_ctrl_event(int $event, int $pid = 0): bool {}
#endif
} // end global namespace

namespace iterable {

function any(iterable $iterable, ?callable $callback = null): bool {}

function all(iterable $iterable, ?callable $callback = null): bool {}

function none(iterable $iterable, ?callable $callback = null): bool {}

function reduce(iterable $iterable, callable $callback, mixed $initial = null): mixed {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I strongly prefer removing $initial and adding fold which always requires it:

function fold(iterable $iterable, callable $callback, mixed $initial): mixed {}
function reduce(iterable $iterable, callable $callback): mixed {}

The fold function doesn't throw on empty, and the reduce will. This pattern exists in other languages, such as Kotlin.

I'm happy to do this work if you'll agree.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like the inconsistency with array_reduce (which has optional $initial=null) would have more objectors for making it harder to learn the language or switch code from array_reduce to *reduce intuitively for beginners.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is an error condition in reduce where there is not an initial value and an empty iterable, and it should throw because there is no legal value we can return that isn't already possible in the reduction. We should not repeat the mistakes of the past. You argue in another comment that find is useful in other languages, and yet you don't buy that same argument here? What gives?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at this again, I'd agree my earlier proposal for reduce was a mistake and it's worth changing, adding fold and either removing reduce entirely or requiring a non-empty array.

The other argument was about including a function, not a change

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can work on fold tonight.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you okay with these signatures and semantics for fold and reduce?

I imagine there is some discussion to be had for naming the parameters to make sure named parameters is a good experience, so let me know what you think. The name $into I picked from Swift. I used $by because it's short but not an abbreviation like acc; my quick glance in other languages' docs didn't turn up anything better.

namespace Iterable;

/**
 * @template Element
 * @template Result
 * @param iterable<Element> $seq
 * @param Result $into
 * @param callable(Result, Element): Result $by
 * @return Result
 */
function fold(iterable $seq, mixed $into, callable $by): mixed {
    foreach ($seq as $value) {
        $into = $by($into, $value);
    }
    return $into;
}

/** Private helper, wouldn't actually be exposed.
 * @template Key
 * @template Value
 * @param iterable<Key, Value> $seq
 * @return \Iterator<Key, Value>
 */
function to_iterator(iterable $seq): \Iterator {
    if (\is_array($seq)) {
        return new \ArrayIterator($seq);
    } elseif ($seq instanceof \Iterator) {
        return $seq;
    } else {
        assert($seq instanceof \IteratorAggregate);
        return namespace\to_iterator($seq->getIterator());
    }
}


/**
 * @template Element
 * @param iterable<Element> $seq
 * @param callable(Element, Element): Element $by
 * @return Element
 * @throws \ValueError if $seq does not have at least 1 element
 */
function reduce(iterable $seq, callable $by): mixed {
    $iterator = namespace\to_iterator($seq);
    $iterator->rewind();
    if (!$iterator->valid()) {
        throw new \ValueError("parameter \$seq to reduce... was empty");
    }

    $into = $iterator->current();
    for ($iterator->next(); $iterator->valid(); $iterator->next()) {
        $into = $by($into, $iterator->current());
    }
    return $into;
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For fold, I think having the callback third would be hard to remember when it's second in other reduce() function

The inner implementation seem reasonable enough. I assume the ArrayObject is just for illustrating the behavior and not the internal implementation

$seq seems like an harder to remember naming choice compared to $array/$iterable used elsewhere - https://www.php.net/manual/en/function.iterator-apply.php and https://www.php.net/manual/en/function.array-walk.php - especially for non-english speakers

PHP's already using $initial for https://www.php.net/manual/en/function.array-reduce.php and I don't see a strong reason to switch to a different name - initial's been used elsewhere (e.g. https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/Reduce)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The inner implementation seem reasonable enough. I assume the ArrayObject is just for illustrating the behavior and not the internal implementation

The ArrayIterator is just for showing behavior, yes. Notably, this will not pass NULL as the first parameter to the callback on the very first time it is called, unlike array_reduce and what this PR currently does.

As an example with the data set [1, 3, 5, 7]:

$result = reduce([1, 3, 5, 7], function ($into, $value) {
    $retval = $into + $value;
    echo "fn ({$into}, {$value}) => {$retval}\n";
    return $retval;
});

This will print:

fn (1, 3) => 4
fn (4, 5) => 9
fn (9, 7) => 16

And not:

fn(, 1) => 1
fn (1, 3) => 4
fn (4, 5) => 9
fn (9, 7) => 16

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I implemented the changes to reduce on this branch: https://github.com/morrisonlevi/php-src/tree/levi/any-all-iterable-checks. For some reason it wouldn't let me select your fork as the merge-base, so I didn't open a PR, but you can look at the last two commits. I did not yet add fold.


function find(iterable $iterable, callable $callback, mixed $default = null): mixed {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove this one? I'd rather add filter, map, and flatmap if we need more functions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

find() is also useful and many other programming languages include both (js, haskell, etc.)

For example, if you have an array of a million elements and only want the first match, it is much more efficient to call find() if the iterable contains a matching value (and there would be less service calls and db calls) compared to reset(array_filter(...))

Additionally, filter() and map() would be waiting on the existence of CachedIterable when it starts, because Traversables can have repeated keys yield 'key' => 1; yield 'key' => 2;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally, filter() and map() would be waiting on the existence of CachedIterable when it starts, because Traversables can have repeated keys yield 'key' => 1; yield 'key' => 2;

This does not require a CachedIterable; they can return any iterator.

Edit: on second thought, they should not return a cached iterable. These routines are often chained together; if every piece of the chain caches their results, it will balloon memory usage.

Copy link
Contributor Author

@TysonAndre TysonAndre Jun 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean that there's nothing in the standard library yet that supports rewinding, counting, and especially arbitrary offset access

It seems much more difficult to use without the support for rewindability, random/repeated offset access, countability, etc.

But yes, I suppose you could hide the implementation entirely with InternalIterator and only support a single iteration

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Edit: on second thought, they should not return a cached iterable. These routines are often chained together; if every piece of the chain caches their results, it will balloon memory usage.

CachedIterable is basically the name chosen for a rewindable immutable key-value sequence. It isn't cached permanently, it has a regular object lifetime. I'm referring to https://wiki.php.net/rfc/cachediterable

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CachedIterable is basically the name chosen for a rewindable immutable key-value sequence. It isn't cached permanently, it has a regular object lifetime. I'm referring to https://wiki.php.net/rfc/cachediterable

Yes, I am specifically saying they should not return this object. It will ballon memory usage to do it that way. Think about it; you have a map + filter plus some terminator like first_n with n=100. Filter and map will each hold at least 100 values in memory that shouldn't be there while calculating the first_n.

Copy link
Contributor Author

@TysonAndre TysonAndre Jun 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I am specifically saying they should not return this object. It will ballon memory usage to do it that way. Think about it; you have a map + filter plus some terminator like first_n with n=100. Filter and map will each hold at least 100 values in memory that shouldn't be there while calculating the first_n.

  1. Many end users would want the standard library to return something that could be iterated over multiple times - this was a source of bugs in spl classes such as https://www.php.net/arrayobject prior to them being switched to iterables
$foo = map(...);
foreach ($foo as $i => $v1) {
    foreach ($foo as $i => $v2) {
        if (some_pair_predicate($v1, $v2)) {
            // do something
        }
    }
}
  1. Library/application authors that are interested in lazy generators could use or implement something such as https://github.com/nikic/iter instead - my opinion is that the standard library should provide something that is easy to understand, debug, serialize or represent, etc.
  2. It would be harder to understand why SomeFrameworkException is thrown in code unrelated to that framework when it is passed to some function that accepts a generic iterable, and harder to write correct exception handling for it if done in a lazy generation style
  3. It is possible to implement a lazy version of CachedIterable that only loads values as needed - but I hadn't proposed it due to doubts that 2/3 of voters would consider it widely useful enough to be included in php rather than as a userland or PECL library

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Iterables are made from arrays and generators. Since there is already a large body of functions which work with arrays, it seems reasonable to assume they would use an Iterable part of the standard library because they also use iterators/generators. One of the main points of generators is to reduce memory. We shouldn't add a corpus of iterable functions that will increase memory; that works directly against the goals of the iterable feature!

No, it's much better to compose something if you want it to be eager, e.g. $foo = CachedIterable::new(map(...$args)); We can include this in the examples in the docs for filter, map, etc.

Copy link
Contributor Author

@TysonAndre TysonAndre Jun 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Iterables are made from arrays and generators

Iterables are made from arrays and Traversable objects such as SplObjectStorage, ArrayObject, user-defined data structures, and Generators

We shouldn't add a corpus of iterable functions that will increase memory; that works directly against the goals of the iterable feature!

In common use cases, the memory increase may be small, especially if the lifetime of the variable or temporary expression holding the result is short.

Additionally, a lazy iterable would require keeping a reference to the previous (possibly lazy) iterable, and the iterables/arrays those reference - if the initial iterable is larger than the result then eagerly evaluating $var = map($cb, temporary()) would end up saving memory

No, it's much better to compose something if you want it to be eager, e.g. $foo = CachedIterable::new(map(...$args)); We can include this in the examples in the docs for filter, map, etc.

That seems error prone and I'm personally opposed to that.

PHP has typically been imperitive rather than functional, and focused on "cater[ing] to the skill-levels and platforms of a wide range of users" as RFC authors are repeatedly reminded in https://wiki.php.net/rfc/template and my interpretation of that is that imperitive would be much more acceptable (aside: The loosely typed language part seems less applicable nowadays)

PHP is and should remain:

  1. a pragmatic web-focused language
  2. a loosely typed language
  3. a language which caters to the skill-levels and platforms of a wide range of users

Lazy data structures would be easy to misuse (consume twice, attempt to serialize or encode, (or var_dump or inspect with Xdebug), easier to attempt to log the full iterable(consume twice) etc) without (or even with) linters and static analyzers, so this really doesn't seem like catering to a wide range of users.


Explicitly using a different family of functions to act on generators internally would probably make more sense than being the default, e.g. https://docs.oracle.com/javase/8/docs/api/java/util/stream/Stream.html#findFirst-- and https://docs.oracle.com/javase/8/docs/api/java/util/stream/Stream.html (Streams are separate from java.util.Collection in java, javascript eagerly evaluates https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/Map, etc)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it's much better to compose something if you want it to be eager, e.g. $foo = CachedIterable::new(map(...$args)); We can include this in the examples in the docs for filter, map, etc.

Having the eager version be longer than the lazy version (instead of shorter or the same length) would also encourage the use of the lazy version, which I'd objected to for being error prone and easy to misuse.


}
33 changes: 32 additions & 1 deletion ext/standard/basic_functions_arginfo.h
@@ -1,5 +1,5 @@
/* This is a generated file, edit the .stub.php file instead.
* Stub hash: 088e107c889f44d0a9762c47fce668f4d82f28ba */
* Stub hash: f57589852a699be9edd08f434073007b9fef4e34 */

ZEND_BEGIN_ARG_WITH_RETURN_TYPE_INFO_EX(arginfo_set_time_limit, 0, 1, _IS_BOOL, 0)
ZEND_ARG_TYPE_INFO(0, seconds, IS_LONG, 0)
Expand Down Expand Up @@ -2218,6 +2218,27 @@ ZEND_BEGIN_ARG_WITH_RETURN_TYPE_INFO_EX(arginfo_sapi_windows_generate_ctrl_event
ZEND_END_ARG_INFO()
#endif

ZEND_BEGIN_ARG_WITH_RETURN_TYPE_INFO_EX(arginfo_iterable_any, 0, 1, _IS_BOOL, 0)
ZEND_ARG_TYPE_INFO(0, iterable, IS_ITERABLE, 0)
ZEND_ARG_TYPE_INFO_WITH_DEFAULT_VALUE(0, callback, IS_CALLABLE, 1, "null")
ZEND_END_ARG_INFO()

#define arginfo_iterable_all arginfo_iterable_any

#define arginfo_iterable_none arginfo_iterable_any

ZEND_BEGIN_ARG_WITH_RETURN_TYPE_INFO_EX(arginfo_iterable_reduce, 0, 2, IS_MIXED, 0)
ZEND_ARG_TYPE_INFO(0, iterable, IS_ITERABLE, 0)
ZEND_ARG_TYPE_INFO(0, callback, IS_CALLABLE, 0)
ZEND_ARG_TYPE_INFO_WITH_DEFAULT_VALUE(0, initial, IS_MIXED, 0, "null")
ZEND_END_ARG_INFO()

ZEND_BEGIN_ARG_WITH_RETURN_TYPE_INFO_EX(arginfo_iterable_find, 0, 2, IS_MIXED, 0)
ZEND_ARG_TYPE_INFO(0, iterable, IS_ITERABLE, 0)
ZEND_ARG_TYPE_INFO(0, callback, IS_CALLABLE, 0)
ZEND_ARG_TYPE_INFO_WITH_DEFAULT_VALUE(0, default, IS_MIXED, 0, "null")
ZEND_END_ARG_INFO()


ZEND_FUNCTION(set_time_limit);
ZEND_FUNCTION(header_register_callback);
Expand Down Expand Up @@ -2839,6 +2860,11 @@ ZEND_FUNCTION(sapi_windows_set_ctrl_handler);
#if defined(PHP_WIN32)
ZEND_FUNCTION(sapi_windows_generate_ctrl_event);
#endif
ZEND_FUNCTION(any);
ZEND_FUNCTION(all);
ZEND_FUNCTION(none);
ZEND_FUNCTION(reduce);
ZEND_FUNCTION(find);


static const zend_function_entry ext_functions[] = {
Expand Down Expand Up @@ -3492,6 +3518,11 @@ static const zend_function_entry ext_functions[] = {
#if defined(PHP_WIN32)
ZEND_FE(sapi_windows_generate_ctrl_event, arginfo_sapi_windows_generate_ctrl_event)
#endif
ZEND_NS_FE("iterable", any, arginfo_iterable_any)
ZEND_NS_FE("iterable", all, arginfo_iterable_all)
ZEND_NS_FE("iterable", none, arginfo_iterable_none)
ZEND_NS_FE("iterable", reduce, arginfo_iterable_reduce)
ZEND_NS_FE("iterable", find, arginfo_iterable_find)
ZEND_FE_END
};

Expand Down
68 changes: 68 additions & 0 deletions ext/standard/tests/iterable/all_array.phpt
@@ -0,0 +1,68 @@
--TEST--
Test all() function
--FILE--
<?php

use function iterable\all;

/*
Prototype: bool all(array $array, mixed $callback);
Description: Iterate array and stop based on return value of callback
*/

function is_int_ex($item)
{
return is_int($item);
}

echo "\n*** Testing not enough or wrong arguments ***\n";

function dump_all(...$args) {
try {
var_dump(all(...$args));
} catch (Error $e) {
printf("Caught %s: %s\n", $e::class, $e->getMessage());
}
}

dump_all();
dump_all(true);
dump_all([]);
dump_all(true, function () {});
dump_all([], true);

echo "\n*** Testing basic functionality ***\n";

dump_all([1, 2, 3], 'is_int_ex');
dump_all(['hello', 1, 2, 3], 'is_int_ex');
$iterations = 0;
dump_all(['hello', 1, 2, 3], function($item) use (&$iterations) {
++$iterations;
return is_int($item);
});
var_dump($iterations);

echo "\n*** Testing edge cases ***\n";

dump_all([], 'is_int_ex');

echo "\nDone";
?>
--EXPECT--
*** Testing not enough or wrong arguments ***
Caught ArgumentCountError: iterable\all() expects at least 1 argument, 0 given
Caught TypeError: iterable\all(): Argument #1 ($iterable) must be of type iterable, bool given
bool(true)
Caught TypeError: iterable\all(): Argument #1 ($iterable) must be of type iterable, bool given
Caught TypeError: iterable\all(): Argument #2 ($callback) must be a valid callback or null, no array or string given

*** Testing basic functionality ***
bool(true)
bool(false)
bool(false)
int(1)

*** Testing edge cases ***
bool(true)

Done
63 changes: 63 additions & 0 deletions ext/standard/tests/iterable/all_traversable.phpt
@@ -0,0 +1,63 @@
--TEST--
Test all() function
--FILE--
<?php

use function iterable\all;

/*
Prototype: bool all(array $array, ?callable $callback = null, int $use_type = 0);
Description: Iterate array and stop based on return value of callback
*/

function is_int_ex($item)
{
return is_int($item);
}

function dump_all(...$args) {
try {
var_dump(all(...$args));
} catch (Error $e) {
printf("Caught %s: %s\n", $e::class, $e->getMessage());
}
}


echo "\n*** Testing not enough or wrong arguments ***\n";

dump_all(new ArrayIterator());
dump_all(new ArrayIterator(), true);

echo "\n*** Testing basic functionality ***\n";

dump_all(new ArrayIterator([1, 2, 3]), 'is_int_ex');
dump_all(new ArrayIterator(['hello', 1, 2, 3]), 'is_int_ex');
$iterations = 0;
dump_all(new ArrayIterator(['hello', 1, 2, 3]), function($item) use (&$iterations) {
++$iterations;
return is_int($item);
});
var_dump($iterations);

echo "\n*** Testing edge cases ***\n";

dump_all(new ArrayIterator(), 'is_int_ex');

echo "\nDone";
?>
--EXPECT--
*** Testing not enough or wrong arguments ***
bool(true)
Caught TypeError: iterable\all(): Argument #2 ($callback) must be a valid callback or null, no array or string given

*** Testing basic functionality ***
bool(true)
bool(false)
bool(false)
int(1)

*** Testing edge cases ***
bool(true)

Done
81 changes: 81 additions & 0 deletions ext/standard/tests/iterable/any_array.phpt
@@ -0,0 +1,81 @@
--TEST--
Test any() function
--FILE--
<?php

use function iterable\any;

/*
Prototype: bool any(array $iterable, mixed $callback);
Description: Iterate array and stop based on return value of callback
*/

function is_int_ex($nr)
{
return is_int($nr);
}

echo "\n*** Testing not enough or wrong arguments ***\n";

function dump_any(...$args) {
try {
var_dump(any(...$args));
} catch (Error $e) {
printf("Caught %s: %s\n", $e::class, $e->getMessage());
}
}

dump_any();
dump_any(true);
dump_any([]);
dump_any(true, function () {});
dump_any([], true);

echo "\n*** Testing basic functionality ***\n";

dump_any(['hello', 'world'], 'is_int_ex');
dump_any(['hello', 1, 2, 3], 'is_int_ex');
$iterations = 0;
dump_any(['hello', 1, 2, 3], function($item) use (&$iterations) {
++$iterations;
return is_int($item);
});
var_dump($iterations);

echo "\n*** Testing second argument to predicate ***\n";

dump_any([1, 2, 3], function($item, $key) {
var_dump($key);
return false;
});

echo "\n*** Testing edge cases ***\n";

dump_any([], 'is_int_ex');

dump_any(['key' => 'x'], null);

echo "\nDone";
?>
--EXPECT--
*** Testing not enough or wrong arguments ***
Caught ArgumentCountError: iterable\any() expects at least 1 argument, 0 given
Caught TypeError: iterable\any(): Argument #1 ($iterable) must be of type iterable, bool given
bool(false)
Caught TypeError: iterable\any(): Argument #1 ($iterable) must be of type iterable, bool given
Caught TypeError: iterable\any(): Argument #2 ($callback) must be a valid callback or null, no array or string given

*** Testing basic functionality ***
bool(false)
bool(true)
bool(true)
int(2)

*** Testing second argument to predicate ***
Caught ArgumentCountError: Too few arguments to function {closure}(), 1 passed and exactly 2 expected

*** Testing edge cases ***
bool(false)
bool(true)

Done