Implement new array function array_column() #257

Closed
wants to merge 14 commits into
from

10 participants

@ramsey

This pull request supersedes pull request #56. I have cleaned it up and have rebased branch PHP-5.3 onto my branch.

This pull request also includes new work as a result of feedback received on the original pull request and mailing list discussion.

References:

@sc0ttkclark

Ready for this!

@ramsey

Thanks to the push from @lstrojny, I've opened up voting for this:
http://news.php.net/php.internals/64870

@asgrim

I don't think there is a need for the alias, is there? Surely aliases are just for backwards compatibility? Apologies if that was already discussed on the other PR; I'm on my phone with a slow connection. Apart from that, this looks useful! 👍

@fititnt

👍

@lstrojny lstrojny commented on the diff Jan 19, 2013
ext/standard/array.c
+ Return the values from a single column in the input array, identified by the
+ value_key and optionally indexed by the index_key */
+PHP_FUNCTION(array_column)
+{
+ zval *zarray, *zcolumn, *zkey = NULL, **data, **zcolval, **zkeyval;
+ HashTable *arr_hash;
+ HashPosition pointer;
+ ulong column_idx = 0, key_idx = 0, keyval_idx = 0;
+ char *column = NULL, *key = NULL, *keyval = NULL;
+ int column_len = 0, key_len = 0;
+
+ if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "az|z", &zarray, &zcolumn, &zkey) == FAILURE) {
+ return;
+ }
+
+ switch (Z_TYPE_P(zcolumn)) {
@lstrojny
lstrojny added a line comment Jan 19, 2013

What about double and resource types? array_column() could handle them as well.

@lstrojny lstrojny commented on the diff Jan 19, 2013
ext/standard/array.c
+ break;
+ default:
+ php_error_docref(NULL TSRMLS_CC, E_WARNING, "The index key should be either a string or an integer");
+ RETURN_FALSE;
+ }
+ }
+
+ arr_hash = Z_ARRVAL_P(zarray);
+ array_init(return_value);
+
+ for (zend_hash_internal_pointer_reset_ex(arr_hash, &pointer);
+ zend_hash_get_current_data_ex(arr_hash, (void**)&data, &pointer) == SUCCESS;
+ zend_hash_move_forward_ex(arr_hash, &pointer)) {
+
+ if (Z_TYPE_PP(data) == IS_ARRAY) {
+ if (column && zend_hash_find(Z_ARRVAL_PP(data), column, column_len + 1, (void**)&zcolval) == FAILURE) {
@lstrojny
lstrojny added a line comment Jan 19, 2013

You could precompute the hash once and use zend_hash_quick_find() with the precomputed hash key. I'm not sure it makes that much of a difference, but it would make sense to check it out.

@oaass

Really looking forward to this! (Y)

@ccampbell

@ramsey I still think it would be useful if there was a way to get multiple columns back with a single call by passing an array like we had talked about. Is there any plan for this?

Example

$records = array(
    array(
        'id' => 2135,
        'first_name' => 'John',
        'last_name' => 'Doe'
    ),
    array(
        'id' => 3245,
        'first_name' => 'Sally',
        'last_name' => 'Smith'
    ),
    array(
        'id' => 5342,
        'first_name' => 'Jane',
        'last_name' => 'Jones'
    )
);

$results = array_column($records, ['id', 'first_name']);
print_r($results);

would output

Array
(
    [id] => Array
        (
            [0] => 2135
            [1] => 3245
            [2] => 5342
        )

    [first_name] => Array
        (
            [0] => John
            [1] => Sally
            [2] => Jane
        )

)

If the third argument was specified it would work like this:

$results = array_column($records, ['first_name', 'last_name'], 'id');
print_r($results);

would return

Array
(
    [first_name] => Array
        (
            [2135] => John
            [3245] => Sally
            [5342] => Jane
        )

    [last_name] => Array
        (
            [2135] => Doe
            [3245] => Smith
            [5342] => Jones
        )

)

It doesn't change any of the existing functionality, and it would save having to make multiple calls to array_column in order to extract multiple columns.
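
The column-major output described above can be approximated in userland. The following is a minimal sketch, assuming a hypothetical helper named `array_columns_by_name()` (not part of PHP or of this pull request). It still calls `array_column()` once per requested column internally, so it saves typing rather than passes over the data.

```php
<?php
// Hypothetical userland helper; the name and signature are illustrative only.
function array_columns_by_name(array $records, array $columns, $indexKey = null): array
{
    $result = [];
    foreach ($columns as $column) {
        // With a null index key, array_column() numbers the values from 0.
        $result[$column] = array_column($records, $column, $indexKey);
    }
    return $result;
}

$records = [
    ['id' => 2135, 'first_name' => 'John',  'last_name' => 'Doe'],
    ['id' => 3245, 'first_name' => 'Sally', 'last_name' => 'Smith'],
    ['id' => 5342, 'first_name' => 'Jane',  'last_name' => 'Jones'],
];

// One sub-array per requested column, each keyed by the 'id' column.
$byName = array_columns_by_name($records, ['first_name', 'last_name'], 'id');
print_r($byName['last_name']);
```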

@hakre

@ccampbell But how would you decide how the array keys are arranged in the result? Your example, for instance, swaps the two axes of the 2D array. From a "straightforward" point of view, I'd say this must not be part of the implementation, and the following kind of output looks more straightforward to me (no preference given):

$results = array_column($records, ['id', 'first_name']);
print_r($results);

Array
(
    [0] => Array
        (
            [id] => 2135
            [first_name] => John
        )

    [1] => Array
        (
            [id] => 3245
            [first_name] => Sally
        )

    [2] => Array
        (
            [id] => 5342
            [first_name] => Jane
        )

)

And the second example:

$results = array_column($records, ['id', 'first_name'], 'id');
print_r($results);

Array
(
    [2135] => Array
        (
            [id] => 2135
            [first_name] => John
        )

    [3245] => Array
        (
            [id] => 3245
            [first_name] => Sally
        )

    [5342] => Array
        (
            [id] => 5342
            [first_name] => Jane
        )

)

By the way, this output is compatible with the existing one, meaning you could apply array_column a second time.
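
The row-wise output shown above can be sketched in userland as well. This is only an illustration, assuming a hypothetical helper named `array_select_columns()`; it keeps the requested columns in each row and optionally re-keys the rows, and the last line shows `array_column()` being applied to its result.

```php
<?php
// Hypothetical userland helper; the name and signature are illustrative only.
function array_select_columns(array $records, array $columns, $indexKey = null): array
{
    $wanted = array_flip($columns); // column names become keys for the intersection
    $result = [];
    foreach ($records as $row) {
        $filtered = array_intersect_key($row, $wanted);
        if ($indexKey !== null && array_key_exists($indexKey, $row)) {
            $result[$row[$indexKey]] = $filtered; // re-key the row by the index column
        } else {
            $result[] = $filtered;
        }
    }
    return $result;
}

$records = [
    ['id' => 2135, 'first_name' => 'John',  'last_name' => 'Doe'],
    ['id' => 3245, 'first_name' => 'Sally', 'last_name' => 'Smith'],
    ['id' => 5342, 'first_name' => 'Jane',  'last_name' => 'Jones'],
];

$rows = array_select_columns($records, ['id', 'first_name'], 'id');

// The result is still a list of rows, so array_column() can be applied again.
$firstNames = array_column($rows, 'first_name', 'id');
```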

@ccampbell

@hakre your proposed output doesn't achieve the same thing as the purpose of this function though. It is just filtering out columns from the data set.

It doesn't make sense to me that if you pass in just 'id' you would get an array of ids that you can iterate over directly as ids (foreach ($ids as $id)), but if you pass in ['id', 'first_name'] you can't iterate over first names any differently than you could with the data set you started with. If you only wanted id and first_name like your first example, why not only select those columns from the database and save yourself the call completely?

Perhaps a use case would make this clearer. I think the primary use case for getting the data back the way I proposed is heavy-traffic applications where you want to make data more cacheable at the database and sort your dataset in PHP. We do this at @vimeo pretty heavily and I'm pretty sure other people do as well. For example, if you wanted to sort your user ids by last name you could do

$results = array_column($records, ['first_name', 'last_name'], 'id');
natsort($results['last_name']);
$ids = array_keys($results['last_name']);

Now you easily have all your user ids sorted alphabetically by last name.

What if you now wanted to sort by last name, but by first name when people have the same last name?

It would look something like

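$results = array_column($records, ['first_name', 'last_name'], 'id');
// array_multisort() re-indexes integer keys, so carry the ids along as a third array
$ids = array_keys($results['last_name']);
array_multisort($results['last_name'], SORT_ASC, $results['first_name'], SORT_ASC, $ids);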

In your format to do this you would be right back where you started and would have to build the arrays of first_names and last_names manually.

Check out the documentation for http://php.net/manual/en/function.array-multisort.php. The expected format of the data is basically the same way that I proposed because it is the most efficient way to do sorting.
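
For comparison, here is a sketch of the same two-level sort written with one `array_column()` call per column, which is the form that takes a single column key. The sample data is made up for this illustration so that two people share a last name.

```php
<?php
// Illustrative data only; two records share the last name 'Smith'.
$records = [
    ['id' => 2135, 'first_name' => 'John',  'last_name' => 'Smith'],
    ['id' => 3245, 'first_name' => 'Sally', 'last_name' => 'Doe'],
    ['id' => 5342, 'first_name' => 'Adam',  'last_name' => 'Smith'],
];

$lastNames  = array_column($records, 'last_name');
$firstNames = array_column($records, 'first_name');
$ids        = array_column($records, 'id');

// array_multisort() sorts by $lastNames, breaks ties with $firstNames,
// and reorders $ids in step with them.
array_multisort($lastNames, SORT_ASC, $firstNames, SORT_ASC, $ids);

print_r($ids);
```

The ids are passed as the last array because `array_multisort()` re-indexes integer keys, so ids stored as array keys would not survive the sort.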

@hakre

@ccampbell: I did not propose any output. I just wrote that one might equally expect something different. When there is more than one dimension in the output, there is more than one way to arrange it. That's all.

@Abeja

Why can't "index_key" be an array?
It can be useful to build tree indexes.

Examples:

$records = array(
    array( 'id' => 1 , 'parent-id' => 7 , 'name' => 'Doe'),
    array( 'id' => 2 , 'parent-id' => 3 , 'name' => 'Smith'),
    array( 'id' => 4 , 'parent-id' => 7 , 'name' => 'Jones'),
);

array_column( $records , 'name' , array( 'parent-id' , 'id' ) )
// => array( 7 => array( 1 => 'Doe' , 4 => 'Jones' ), 3 => array( 2 => 'Smith' ) )

The result is a collection of children grouped by the 'parent-id' key, where 'id' is the subkey and 'name' is the value.

More examples: #56 (comment)

P.S. It is also compatible with the current array_column and with the multi-column proposal above:

$records = array(
    array( 'id' => 1 , 'parent-id' => 7 , 'name' => 'Doe' , 'sex' => 'male' ),
    array( 'id' => 2 , 'parent-id' => 3 , 'name' => 'Smith' , 'sex' => 'female' ),
    array( 'id' => 4 , 'parent-id' => 7 , 'name' => 'Jones' , 'sex' => 'male' ),
);

array_column( $records , array( 'name' , 'sex' ) , array( 'parent-id' , 'id' ) )
/* => 
array(
    7 => array( 
        1 => array( 'Doe' , 'male' ) , 
        4 => array( 'Jones' , 'male' ) 
    ), 
    3 => array( 
        2 => array( 'Smith' , 'female' ) 
    ) 
)
*/

For myself, I call this array_collect
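
A minimal userland sketch of the nested-index idea follows, assuming a hypothetical `array_collect()` function with the signature shown (it is not a PHP built-in). It handles a single value key only; the array-valued second argument from the P.S. is left out.

```php
<?php
// Hypothetical userland helper; not a PHP built-in.
function array_collect(array $records, $valueKey, array $indexKeys): array
{
    $result = [];
    foreach ($records as $row) {
        // Descend one nesting level per index key; taking a reference to a
        // missing element creates it.
        $slot = &$result;
        foreach ($indexKeys as $key) {
            $slot = &$slot[$row[$key]];
        }
        $slot = $row[$valueKey];
        unset($slot); // drop the reference before handling the next row
    }
    return $result;
}

$records = [
    ['id' => 1, 'parent-id' => 7, 'name' => 'Doe'],
    ['id' => 2, 'parent-id' => 3, 'name' => 'Smith'],
    ['id' => 4, 'parent-id' => 7, 'name' => 'Jones'],
];

$tree = array_collect($records, 'name', ['parent-id', 'id']);
print_r($tree);
```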
@dsp
php.net member

merged. thank you.

@dsp dsp closed this Mar 20, 2013
@wilmoore

Nice...Would be great if this function would check whether the value is "callable" and if so, call it and use the returned value.

@ramsey

@wilmoore Which parameter are you interested in having check whether the value is callable? All of them?

@ccampbell I've been considering your request, but I find myself agreeing with @hakre on the expected output of array_column() if the second parameter is an array. It also lines up perfectly with this bug request: https://bugs.php.net/bug.php?id=64493. In this bug request, the request is to allow NULL to be passed as the second parameter, which would return an array that is nearly identical to the input array but could index each row by the third parameter.

Thoughts?
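
The behaviour requested in that bug report can be sketched as follows, assuming a PHP version in which `array_column()` accepts `null` as the second argument. Each row is returned whole and re-keyed by the third argument.

```php
<?php
$records = [
    ['id' => 2135, 'first_name' => 'John',  'last_name' => 'Doe'],
    ['id' => 3245, 'first_name' => 'Sally', 'last_name' => 'Smith'],
    ['id' => 5342, 'first_name' => 'Jane',  'last_name' => 'Jones'],
];

// A null column key keeps each row intact; 'id' becomes the key of each row.
$byId = array_column($records, null, 'id');

print_r($byId[3245]);
```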

@ccampbell

@ramsey I agree with you that @hakre's comment and that bug definitely fit together better. I do think most people would probably expect that output more than the one I proposed even though that functionality is already possible using array_intersect_key($data, array_flip($desired_columns)).

What I am trying to achieve is not having to make multiple calls to array_column, which that solution does not solve.

Starting with

$records = array(
    array(
        'id' => 2135,
        'first_name' => 'John',
        'last_name' => 'Doe'
    ),
    array(
        'id' => 3245,
        'first_name' => 'Sally',
        'last_name' => 'Smith'
    ),
    array(
        'id' => 5342,
        'first_name' => 'Jane',
        'last_name' => 'Jones'
    )
);

The only way to get a list of ids [2135, 3245, 5342] and first_names ['John', 'Sally', 'Jane'] would be to call

$ids = array_column($records, 'id');
$names = array_column($records, 'first_name');

Calling

$data = array_column($records, ['id', 'first_name']);

Would just filter out the last_name column and not actually pluck out the single columns I am looking for. It's not the end of the world. I bet more people are trying to filter out columns from the data set than trying to grab single column lists. Maybe array_multisort is the actual problem for expecting the data in that format.

@hakre

@ccampbell: Good point about array_intersect_key($data, array_flip($desired_columns)) and also the array_multisort reference earlier - I like array_multisort. My initial suggestion now looks a bit short-sighted.

@wilmoore

> Which parameter are you interested in having check whether the value is callable? All of them?

I was thinking, any callable returned values might be unwrapped regardless of which parameters were given. Further thought brings me to:

What to do when you want the raw function/callable (for whatever reason)? At that point, you'd have to add yet another optional parameter to satisfy that need. That could get out of hand quickly.
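
The ambiguity can be seen in a small userland sketch, assuming a hypothetical wrapper named `array_column_unwrap()`; `array_column()` itself does nothing of the kind. The sketch invokes only `Closure` instances, because `is_callable()` also accepts plain strings that happen to name a function.

```php
<?php
// Hypothetical wrapper illustrating the idea; not part of PHP.
function array_column_unwrap(array $records, $columnKey): array
{
    return array_map(
        function ($value) {
            // Only closures are invoked; a string such as 'date' passes
            // is_callable() but is most likely meant as data.
            return $value instanceof Closure ? $value() : $value;
        },
        array_column($records, $columnKey)
    );
}

$records = [
    ['id' => 1, 'label' => 'static'],
    ['id' => 2, 'label' => function () { return 'computed'; }],
    ['id' => 3, 'label' => 'date'],
];

$labels = array_column_unwrap($records, 'label');
print_r($labels);
```

A caller who wants the raw closure back would still need the plain `array_column()` call, which is the extra-parameter problem described above.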
