zip #77

Closed
jpivarski opened this issue Jan 14, 2020 · 4 comments
Labels
feature New feature or request good first issue Good for newcomers

Comments

@jpivarski
Member

This operation combines a list (std::vector) of arrays (std::shared_ptr<Content>) into a single array of records. If any member of the list is one-dimensional, the result is a plain RecordArray; otherwise, the RecordArray is wrapped in one ListArray or RegularArray for each dimension that all of the arrays share.

It existed in old Awkward, though the implementation will be easier now. Here's what that looked like:

>>> import awkward
>>> first = awkward.fromiter([[1, 2, 3], [], [4, 5]])
>>> second = awkward.fromiter([[1.1, 2.2, 3.3], [], [4.4, 5.5]])
>>> awkward.JaggedArray.zip(first, second)
<JaggedArray [[(1, 1.1) (2, 2.2) (3, 3.3)] [] [(4, 4.4) (5, 5.5)]] at 0x7fb20a91c590>

Those 2-tuples are a RecordArray in which the first field comes from first and the second field comes from second. Since first and second are both ListType with sublists of the same length at each position, the output is a ListType of RecordType (e.g. ListOffsetArray of RecordArray), in which the shared offsets are used in the single ListOffsetArray that wraps the RecordArray.
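To make the shared-offsets point concrete, here is a minimal pure-Python sketch. It is not the Awkward Array implementation: to_offsets, zip_one_level, and to_lists are hypothetical helpers, and plain Python lists and tuples stand in for ListOffsetArray and RecordArray. It assumes exactly one level of list nesting and raises an error when the list lengths disagree.

```python
# Hypothetical stand-ins: plain lists and tuples, not the Awkward Array classes.

def to_offsets(jagged):
    """Flatten a list of lists into (offsets, content)."""
    offsets, content = [0], []
    for sublist in jagged:
        content.extend(sublist)
        offsets.append(len(content))
    return offsets, content


def zip_one_level(first, second):
    """Zip two jagged lists whose offsets must be identical."""
    offsets1, content1 = to_offsets(first)
    offsets2, content2 = to_offsets(second)
    if offsets1 != offsets2:
        raise ValueError("arrays do not have the same list lengths")
    # The flat list of 2-tuples plays the role of the RecordArray.
    records = list(zip(content1, content2))
    # A single set of offsets wraps the records.
    return offsets1, records


def to_lists(offsets, records):
    """Rebuild nested lists from offsets, for display only."""
    return [records[offsets[i]:offsets[i + 1]] for i in range(len(offsets) - 1)]


first = [[1, 2, 3], [], [4, 5]]
second = [[1.1, 2.2, 3.3], [], [4.4, 5.5]]
offsets, records = zip_one_level(first, second)
print(offsets)
print(to_lists(offsets, records))
```

The records are built once from the flattened contents, and only one offsets list is kept, which mirrors the single ListOffsetArray wrapping the RecordArray described above.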

Perhaps this was too implicit in the past—perhaps there should be a zipdepth parameter that explicitly controls how many levels of list nesting to expect to be identical for all arguments (throwing an error if they're not).

zip is the way to make particle records from CMS NanoAOD, for instance. (It's useful.)

@jpivarski
Member Author

Note: it's not just tuples.

>>> awkward.JaggedArray.zip({"x": first, "y": second}).tolist()
[[{'x': 1, 'y': 1.1}, {'x': 2, 'y': 2.2}, {'x': 3, 'y': 3.3}],
 [],
 [{'x': 4, 'y': 4.4}, {'x': 5, 'y': 5.5}]]

@jpivarski
Member Author

The ak.zip([array1, array2, array3]) and ak.zip({"key1": array1, "key2": array2, "key3": array3}) functions should probably take a depthlimit parameter to indicate how many levels of jaggedness to try to combine.

depthlimit=0 would be a synonym for the RecordArray constructor, which creates top-level fields and leaves whatever jaggedness the fields have unreconciled. depthlimit=None should go all the way down, broadcasting if possible. depthlimit=1 would broadcast the first levels together, resulting in a ListArray(RecordArray(...)); depthlimit=2 would produce ListArray(ListArray(RecordArray(...))); etc.
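The depthlimit semantics proposed here can be sketched on plain nested Python lists. This is only an illustration under stated assumptions: zip_lists is a hypothetical function, not the ak.zip that was eventually implemented; tuples stand in for records; the outermost list plays the role of the array itself, so it is not counted as a level of jaggedness; and broadcasting is left out, so mismatched lengths raise an error.

```python
def zip_lists(arrays, depthlimit=None):
    """Zip nested lists down to `depthlimit` levels of jaggedness.

    Hypothetical sketch: depthlimit=0 only pairs up the top-level
    entries, depthlimit=None descends as far as all inputs are lists.
    """

    def recurse(items, depth):
        # `items` holds one entry per input array, all at the same position.
        all_lists = all(isinstance(x, list) for x in items)
        within_limit = depthlimit is None or depth <= depthlimit
        if all_lists and within_limit:
            if len({len(x) for x in items}) != 1:
                raise ValueError("list lengths differ at depth %d" % depth)
            return [recurse(group, depth + 1) for group in zip(*items)]
        # Stop descending: whatever remains becomes the fields of one record.
        return tuple(items)

    return recurse(tuple(arrays), 0)


first = [[1, 2, 3], [], [4, 5]]
second = [[1.1, 2.2, 3.3], [], [4.4, 5.5]]

shallow = zip_lists([first, second], depthlimit=0)  # records hold whole sublists
deep = zip_lists([first, second], depthlimit=1)     # sublists hold records
print(shallow)
print(deep)
```

With depthlimit=0 each record keeps its sublists as fields, unreconciled; with depthlimit=1 (or None, for this data) the sublists are merged so that each sublist contains records.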

@jpivarski
Member Author

I had intended to create a PR for this, but accidentally committed to master (too distracted). So instead of linking to a PR, here's the one commit that finished it: 3800552

@jpivarski
Member Author

The test passed, so I'm closing this.
