
argcross and cross #78

Closed
jpivarski opened this issue Jan 14, 2020 · 5 comments · Fixed by #159
Labels
feature New feature or request

Comments

@jpivarski
Member

jpivarski commented Jan 14, 2020

Implementing argcross in C++ is sufficient; we can do cross in Python.

This computes a Cartesian product per element. Given a list (std::vector) of arrays (std::shared_ptr<Content>), it makes a ListOffsetArray of RecordArray of combinations. For example,

>>> import awkward
>>> first = awkward.fromiter([[1, 2, 3], [], [4, 5]])
>>> second = awkward.fromiter([["a", "b"], ["c"], ["d", "e", "f"]])
>>> first.cross(second)
<JaggedArray [[(1, a) (1, b) (2, a) (2, b) (3, a) (3, b)]
              []
              [(4, d) (4, e) (4, f) (5, d) (5, e) (5, f)]]>
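A minimal pure-Python sketch of the semantics described above, assuming plain nested lists stand in for layouts (this is not awkward's implementation). The hypothetical `argcross` returns an offsets list, which plays the role of a ListOffsetArray's offsets, and one flat index column per input, which play the role of a RecordArray of local indices. The hypothetical `cross` is then just a gather through those indices, which is one way to "do cross in Python" on top of argcross.

```python
import itertools


def argcross(*lists):
    """Hypothetical sketch: per-entry Cartesian product of local indices.

    Returns (offsets, fields): offsets[j]:offsets[j + 1] delimits entry j's
    combinations, and fields[k][n] is the index into input k for combination n.
    Assumes at least one input and that all inputs have the same length.
    """
    offsets = [0]
    fields = [[] for _ in lists]
    for entries in zip(*lists):
        for combo in itertools.product(*(range(len(e)) for e in entries)):
            for k, index in enumerate(combo):
                fields[k].append(index)
        offsets.append(len(fields[0]))
    return offsets, fields


def cross(*lists):
    """Hypothetical sketch: gather the values that argcross's indices point to."""
    offsets, fields = argcross(*lists)
    out = []
    for j, entries in enumerate(zip(*lists)):
        out.append([
            tuple(entry[field[n]] for entry, field in zip(entries, fields))
            for n in range(offsets[j], offsets[j + 1])
        ])
    return out


first = [[1, 2, 3], [], [4, 5]]
second = [["a", "b"], ["c"], ["d", "e", "f"]]
print(argcross(first, second)[0])  # [0, 6, 6, 12]
print(cross(first, second))
```

The second print gives the same combinations, in the same order, as the awkward0 output above. Keeping only integer indices in argcross means the combinatorics never touch the (possibly complex) element types.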

This and choose (issue #79) are the two basic generators of particle combinatorics in HEP analyses.

@jpivarski jpivarski added the feature New feature or request label Jan 14, 2020
@nsmith-
Contributor

nsmith- commented Jan 17, 2020

If I recall, we were going to keep such operations as module-level functions like ak.cross(array, array) rather than as member functions, to minimize namespace clashes? Alternatively, we could spend some time finding a synonym for cross.

@jpivarski
Member Author

That's right: at the ak.Array level, these are not methods but functions in the global namespace, like ak.cross. But for the layout objects (not the data-analyst level), they're methods for convenience. In C++, it's convenient to make them virtual methods because the compiler ensures that every class has the method defined (though it might only be a runtime_error("FIXME") stub).
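Python has no compiler to enforce this, but `abc.abstractmethod` is a rough analog: a subclass that forgets to define the method cannot be instantiated. A minimal sketch of the split described above, assuming hypothetical class names (`Content`, `ListArray`, and `UnfinishedArray` here are stand-ins, not awkward's real layout classes): layout nodes carry `argcross` as a method, possibly only as a stub, while the user-facing `cross` is a module-level function.

```python
import abc
import itertools


class Content(abc.ABC):
    """Hypothetical layout base class, not awkward's real hierarchy."""

    @abc.abstractmethod
    def argcross(self, other):
        """Subclasses must define this, even if only as a stub."""


class ListArray(Content):
    """Hypothetical list layout backed by plain Python lists."""

    def __init__(self, lists):
        self.lists = lists

    def argcross(self, other):
        # Per-entry Cartesian product of local index pairs.
        return [list(itertools.product(range(len(x)), range(len(y))))
                for x, y in zip(self.lists, other.lists)]


class UnfinishedArray(Content):
    def argcross(self, other):
        # Stub, analogous to throwing runtime_error("FIXME") in C++.
        raise NotImplementedError("FIXME")


def cross(left, right):
    """User-facing, module-level function in the spirit of ak.cross."""
    pairs = left.argcross(right)
    return [[(x[i], y[j]) for i, j in idx]
            for x, y, idx in zip(left.lists, right.lists, pairs)]
```

Unlike the C++ check, the abstract-method check fires when the class is instantiated rather than when it is compiled: `class Forgotten(Content): pass` followed by `Forgotten()` raises `TypeError`.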

@nsmith-
Contributor

nsmith- commented Jan 29, 2020

Here's an interesting observation: cross is a composition of broadcasting and zip.
Consider the following numpy arrays:

import numpy
a = numpy.arange(24).reshape(6, 4)
b = numpy.array(['a', 'b'] * 6).reshape(6, 2)

Then the awkward0 operation out = a.cross(b) is equivalent to:

i0, i1 = numpy.broadcast_arrays(a[:, :, None], b[:, None, :])
rectype = [('i0', i0.dtype), ('i1', i1.dtype)]
out = numpy.fromiter(zip(i0.flatten(), i1.flatten()), dtype=rectype).reshape(6, 2*4)

ak.zip would simplify the last line.
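Even without ak.zip, the Python-level zip in that last line can be avoided in plain NumPy by filling the fields of a structured array directly. A minimal sketch, reusing the same a and b; the field names i0 and i1 simply mirror awkward0's tuple fields.

```python
import numpy

a = numpy.arange(24).reshape(6, 4)
b = numpy.array(["a", "b"] * 6).reshape(6, 2)

# (6, 4, 1) and (6, 1, 2) both broadcast to (6, 4, 2).
i0, i1 = numpy.broadcast_arrays(a[:, :, None], b[:, None, :])

# Fill each field of a structured array in one vectorized assignment,
# instead of zipping element by element in Python.
out = numpy.empty(i0.shape, dtype=[("i0", i0.dtype), ("i1", i1.dtype)])
out["i0"] = i0
out["i1"] = i1
out = out.reshape(6, 2 * 4)

print(out[0].tolist())
# [(0, 'a'), (0, 'b'), (1, 'a'), (1, 'b'), (2, 'a'), (2, 'b'), (3, 'a'), (3, 'b')]
```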
For nested cross, out = a.cross(b, nested=True) is even simpler:

i0 = numpy.broadcast_to(a[:, :, None], shape=(6, 4, 2))
i1 = numpy.broadcast_to(b[:, None, :], shape=(6, 4, 2))
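The nested and flat forms are related by a reshape: merging the last two axes of the nested (6, 4, 2) result gives the flat (6, 8) cross. A small NumPy sketch with the same a and b, broadcasting both sides to the full (6, 4, 2) shape; because the merge is row-major, each a value is followed by all of its b partners, as in awkward0's output.

```python
import numpy

a = numpy.arange(24).reshape(6, 4)
b = numpy.array(["a", "b"] * 6).reshape(6, 2)

# Nested cross: entry [r, i, j] pairs a[r, i] with b[r, j]; shape (6, 4, 2).
i0 = numpy.broadcast_to(a[:, :, None], (6, 4, 2))
i1 = numpy.broadcast_to(b[:, None, :], (6, 4, 2))

# Merging the two inner axes (row-major) gives the flat cross, shape (6, 8).
flat_i0 = i0.reshape(6, 2 * 4)
flat_i1 = i1.reshape(6, 2 * 4)

print(list(zip(flat_i0[0].tolist(), flat_i1[0].tolist())))
# [(0, 'a'), (0, 'b'), (1, 'a'), (1, 'b'), (2, 'a'), (2, 'b'), (3, 'a'), (3, 'b')]
```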

In fact, with the current awkward1 master, nested cross is effectively already implemented through broadcasting:

a = ak.Array([[0, 1, 2], [3, 4], [5], []])
b = ak.Array([[1], [2, 3], [4], [5, 6, 7]])
a[:, :, None] < b[:, None, :]

is the same as awkward0's

a = ak0.fromiter([[0, 1, 2], [3, 4], [5], []])
b = ak0.fromiter([[1], [2, 3], [4], [5, 6, 7]])
ab = a.cross(b, nested=True)
ab.i0 < ab.i1
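For reference, here is what both expressions compute, written out with plain Python lists. The helper name is hypothetical, not an awkward function: for each entry it builds one inner list per element of a, holding that element's comparison against every element of b in the same entry.

```python
def nested_cross_less(a, b):
    """Entry r, row i, column j holds a[r][i] < b[r][j] (nested cross, then compare)."""
    return [[[x < y for y in ys] for x in xs] for xs, ys in zip(a, b)]


a = [[0, 1, 2], [3, 4], [5], []]
b = [[1], [2, 3], [4], [5, 6, 7]]
print(nested_cross_less(a, b))
# [[[True], [False], [False]], [[False, False], [False, False]], [[False]], []]
```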

Non-nested cross will additionally need ak.flatten, to merge the two innermost list dimensions.
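Continuing with plain lists, a sketch of that extra step (with a hypothetical helper name): removing one level of nesting inside each entry, which is the role ak.flatten plays here, turns the nested comparison into the flat, non-nested one.

```python
def cross_less(a, b):
    """Flat cross: for each entry, compare every (a element, b element) pair."""
    nested = [[[x < y for y in ys] for x in xs] for xs, ys in zip(a, b)]
    # Drop one level of nesting inside each entry, i.e. merge the two inner lists.
    return [[v for row in entry for v in row] for entry in nested]


a = [[0, 1, 2], [3, 4], [5], []]
b = [[1], [2, 3], [4], [5, 6, 7]]
print(cross_less(a, b))
# [[True, False, False], [False, False, False, False], [False], []]
```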
Notice that left-broadcasting of var dimensions works nicely here; for example, the following are equivalent:

a[:, :, None] < b[:, None, :]
a < b[:, None]

and simply switching where we insert the new axis gives us b.cross(a, nested=True):

a[:, None] < b
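Written out with plain lists (again a hypothetical helper, not an awkward function), swapping the roles gives one inner list per element of b instead of per element of a. In the last entry, where a is empty but b has three elements, b.cross(a, nested=True) still has one inner list per element of b, each of them empty.

```python
def nested_cross_less_swapped(a, b):
    """Entry r, row j, column i holds a[r][i] < b[r][j], i.e. b.cross(a, nested=True)."""
    return [[[x < y for x in xs] for y in ys] for xs, ys in zip(a, b)]


a = [[0, 1, 2], [3, 4], [5], []]
b = [[1], [2, 3], [4], [5, 6, 7]]
print(nested_cross_less_swapped(a, b))
# [[[True, False, False]], [[False, False], [False, False]], [[False]], [[], [], []]]
```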

@jpivarski
Member Author

ak.zip is #77, which is related to #156 (so much so that I just labeled them duplicates). Just cross-referencing here because this uses zip.

@jpivarski
Member Author

@nsmith- I wasn't planning to, but in the end I used your method to implement cross, with minor modifications (lower-level functions rather than __getitem__).

Thanks! Your observation really helped!


3 participants