Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: support awkward v1 and v2 together #226

Merged
merged 16 commits into from
Sep 4, 2022

Conversation

Saransh-cpp
Copy link
Member

@Saransh-cpp Saransh-cpp commented Jul 20, 2022

No description provided.

@codecov-commenter
Copy link

codecov-commenter commented Jul 20, 2022

Codecov Report

Merging #226 (2f737b3) into main (cac88a2) will increase coverage by 0.00%.
The diff coverage is 88.88%.

@@           Coverage Diff           @@
##             main     #226   +/-   ##
=======================================
  Coverage   82.63%   82.64%           
=======================================
  Files          96       96           
  Lines       10524    10525    +1     
=======================================
+ Hits         8697     8698    +1     
  Misses       1827     1827           
Impacted Files Coverage Δ
src/vector/backends/awkward_constructors.py 47.59% <85.71%> (+0.25%) ⬆️
src/vector/backends/awkward.py 75.89% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cac88a2...2f737b3. Read the comment docs.

@Saransh-cpp Saransh-cpp marked this pull request as draft July 20, 2022 07:56
@Saransh-cpp Saransh-cpp marked this pull request as ready for review July 21, 2022 04:06
@henryiii
Copy link
Member

How hard would it be to make it work with both? Caps are highly problematic. If someone has the following file:

awkward>=2
vector

Modern pip and other modern solvers will just pick the last version of Vector without the cap (& custom error) and awkward 2 - that's a logically valid solution.

@jpivarski
Copy link
Member

Well, in a major version update, you're allowed to change the public API, and a few things get renamed that would be visible here. I've found a couple by searching with grep, but there's nothing like actually doing tests.

array_type = array_type.type

The .type of nestable Types (all but ArrayType) becomes .content for consistency with Contents and Forms.

field_type = field_type.type

There it is again.

This string, badly named .dtype in v1, is now .primitive (one of our list of primitive types).

if isinstance(awkwardtype, ak._connect._numba.layout.NumpyArrayType):
return awkwardtype.arraytype
elif isinstance(awkwardtype, ak._connect._numba.layout.IndexedArrayType):

In v2, _connect still has an underscore, but submodules within it, like numba, do not. The underscore is a gateway from public to private. And actually, these two types are technically private, though I don't see a better way of writing this code in Vector and these names will not be changing.

So, after reviewing the Vector code, it has been written with enough decoupling from Awkward that it's possible to make it work for both v1 and v2. Testing that will be tricky, since all references to "awkward" and names derived from it, like "ak", will need to be swapped for "awkward._v2" to do a test. That swap must be 100% consistent—nothing left unswitched, otherwise it won't be a complete test.

@Saransh-cpp
Copy link
Member Author

Saransh-cpp commented Jul 21, 2022

Thanks for the review, @henryiii, and @jpivarski!

Modern pip and other modern solvers will just pick the last version of Vector without the cap (& custom error) and awkward 2 - that's a logically valid solution.

Oh, I had no idea about this behavior.

I have made the required changes, and now vector should support both awkward v1 and v2.

Testing that will be tricky, since all references to "awkward" and names derived from it, like "ak", will need to be swapped for "awkward._v2" to do a test

Yes, the tests are tricky. Still searching for ways to test both awkward versions, but no luck so far.

@Saransh-cpp Saransh-cpp changed the title Upper cap awkward to v2 chore: support awkward v1 and v2 together Jul 21, 2022
@henryiii
Copy link
Member

Tricky part is loading it - will have to check with PyTest - otherwise we should be able to just monkeypatch it into sys.modules. At the worst, it could be a envvar setting when you run the tests.

@Saransh-cpp
Copy link
Member Author

Saransh-cpp commented Jul 22, 2022

I tried using monkeypatch in different ways, but none of them worked perfectly.

  1. I tried editing sys.modules but the edits are not affecting the imports at all. I am not sure how to tell Python to use the value associated with awkward._v2 in sys.modules instead of awkward.

  2. I also tried using it with pytest (https://docs.pytest.org/en/7.1.x/how-to/monkeypatch.html). The workaround I was thinking of was importing awkward by declaring a global variable first and then monkeypatching it from the tests, but the tests fail with multiple errors. These errors are not related to the v1 -> v2 switch, but I think because of ambiguous variable (and global variable), module, and dependency name (all awkward).-

backends/awkward.py (similar for the constructor file, I had to declare the variable globally there too for pytest monkeypatching to work)

global awkward
awkward = __import__('awkward', globals())

test_awkward.py (similar for the constructor tests)

def test_basic(monkeypatch):
    monkeypatch.setattr("vector.backends.awkward.awkward", awkward._v2, raising=True)
    monkeypatch.setattr("vector.backends.awkward_constructors.awkward", awkward._v2, raising=True)

...

__init__.py

def _is_awkward_v2(obj: typing.Any) -> bool:
    return packaging.version.Version(
        importlib_metadata.version("awkward")
    ) >= packaging.version.Version("2") or "awkward._v2" in inspect.getmodule(obj).__name__

Using environment variable seems to work, and with @pytest.fixture or @pytest.mark.parametrize we will be able to run the tests twice, once for v1 and once for v2.

@henryiii
Copy link
Member

I'll push what I was thinking about, can be run with:

VECTOR_USE_AWKWARDV2=1 pytest

You can revert and continue with your proposal above, or continue on this path. But I thought this was the easiest thing to show what I was thinking about.

@Saransh-cpp
Copy link
Member Author

This solution looks much simple! Thanks, @henryiii!

I made some fixes after your commit, but the tests are still failing due to some reasons. I was able to pinpoint a couple of them -

  1. awkward.fields() and array.fields now return the exact object, not a copy (?) This list is currently modified in awkward_constructors._check_names(), but passing a copy in _check_names() works!
@@ -314,7 +315,7 @@ def Array(*args: typing.Any, **kwargs: typing.Any) -> typing.Any:

     if not _is_type_safe(array_type):
         raise TypeError("a coordinate must be of the type int or float")
-    fields = awkward.fields(akarray)
+    fields = awkward.fields(akarray).copy()

     is_momentum, dimension, names, arrays = _check_names(akarray, fields)
  1. tolist() on VectorArrays behaves weirdly. To check further, I went ahead and changed all the awkward imports to awkward._v2 imports inside vector and discovered this -
>>> array = vector.Array(
...     [
...         [{"x": 1, "y": 2, "z": 3, "wow": 99}],
...         [],
...         [{"x": 4, "y": 5, "z": 6, "wow": 123}],
...     ]
... )
>>> out = array.to_Vector2D()
>>> assert isinstance(out, vector.backends.awkward.VectorArray2D)
>>> out.tolist()
[<Array [{x: 1, y: 2, wow: 99}] type='1 * Vector2D[x: int64, y: int64, wow: ...'>, <Array [] type='0 * Vector2D[x: int64, y: int64, wow: int64]'>, <Array [{x: 4, y: 5, wow: 123}] type='1 * Vector2D[x: int64, y: int64, wow:...'>]
>>> assert [arr.tolist() for arr in out.tolist()] == [
...     [{"x": 1, "y": 2, "wow": 99}],
...     [],
...     [{"x": 4, "y": 5, "wow": 123}],
... ]

Changing tolist() lines to a list comprehension works when the tests are run manually, without using pytest -

@@ -65,7 +65,7 @@ def test_rotateZ():
     array = vector.Array([[{"pt": 1, "phi": 0}], [], [{"pt": 2, "phi": 1}]])
     out = array.rotateZ(1)
     assert isinstance(out, vector.backends.awkward.MomentumArray2D)
-    assert out.tolist() == [[{"rho": 1, "phi": 1}], [], [{"rho": 2, "phi": 2}]]
+    assert [arr.tolist() for arr in out.tolist()] == [[{"rho": 1, "phi": 1}], [], [{"rho": 2, "phi": 2}]]

     array = vector.Array(
         [[{"x": 1, "y": 0, "wow": 99}], [], [{"x": 2, "y": 1, "wow": 123}]]

Running this test manually -

Python 3.9.0 (tags/v3.9.0:9cf6752, Oct  5 2020, 15:34:40) [MSC v.1927 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import vector
>>> array = vector.Array([[{"pt": 1, "phi": 0}], [], [{"pt": 2, "phi": 1}]])
>>> out = array.rotateZ(1)
>>> assert isinstance(out, vector.backends.awkward.MomentumArray2D)
>>> assert [arr.tolist() for arr in out.tolist()] == [[{"rho": 1, "phi": 1}], [], [{"rho": 2, "phi": 2}]]

Running tests using pytest -ra (specifically pasting the error from test_rotateZ below, but all of the ValueError: cannot broadcast recordsin equal errors are never thrown when the same functions are run manually, given that the tolist() output is fixed in these functions) -

=============================================== short test summary info ================================================
FAILED tests/test_issues.py::test_issue_99 - AttributeError: no field named 'to_xyz'
FAILED tests/test_issues.py::test_issue_161 - numba.core.errors.TypingError: Failed in nopython mode pipeline (step: ...
FAILED tests/backends/test_awkward.py::test_rotateZ - ValueError: cannot broadcast recordsin equal
FAILED tests/backends/test_awkward.py::test_projection - ValueError: cannot broadcast recordsin equal
FAILED tests/backends/test_awkward.py::test_add - ValueError: cannot broadcast recordsin equal
FAILED tests/backends/test_awkward.py::test_ufuncs - TypeError: <lambda>() takes 1 positional argument but 4 were given
FAILED tests/backends/test_awkward.py::test_zip - ValueError: cannot broadcast recordsin equal
FAILED tests/backends/test_awkward_numba.py::test - numba.core.errors.TypingError: Failed in nopython mode pipeline (...
====================================== 8 failed, 2781 passed in 75.56s (0:01:15) =======================================

Full stacktrace -

Trace:
_____________________________________________________ test_rotateZ _____________________________________________________

    def test_rotateZ():
        array = vector.Array([[{"pt": 1, "phi": 0}], [], [{"pt": 2, "phi": 1}]])
        out = array.rotateZ(1)
        assert isinstance(out, vector.backends.awkward.MomentumArray2D)
>       assert [arr.tolist() for arr in out.tolist()] == [[{"rho": 1, "phi": 1}], [], [{"rho": 2, "phi": 2}]]

tests/backends/test_awkward.py:68:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/home/saransh/.local/lib/python3.8/site-packages/numpy/lib/mixins.py:21: in func
    return ufunc(self, other)
/home/saransh/.local/lib/python3.8/site-packages/awkward/_v2/highlevel.py:2001: in __array_ufunc__
    return ak._v2._connect.numpy.array_ufunc(ufunc, method, inputs, kwargs)
/home/saransh/.local/lib/python3.8/site-packages/awkward/_v2/_connect/numpy.py:269: in array_ufunc
    out = ak._v2._broadcasting.broadcast_and_apply(
/home/saransh/.local/lib/python3.8/site-packages/awkward/_v2/_broadcasting.py:738: in broadcast_and_apply
    out = apply_step(
/home/saransh/.local/lib/python3.8/site-packages/awkward/_v2/_broadcasting.py:718: in apply_step
    return continuation()
/home/saransh/.local/lib/python3.8/site-packages/awkward/_v2/_broadcasting.py:483: in continuation
    outcontent = apply_step(
/home/saransh/.local/lib/python3.8/site-packages/awkward/_v2/_broadcasting.py:718: in apply_step
    return continuation()
/home/saransh/.local/lib/python3.8/site-packages/awkward/_v2/_broadcasting.py:235: in continuation
    return apply_step(
/home/saransh/.local/lib/python3.8/site-packages/awkward/_v2/_broadcasting.py:718: in apply_step
    return continuation()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

    def continuation():
        # Any EmptyArrays?
        if any(isinstance(x, EmptyArray) for x in inputs):
            nextinputs = [
                x.toNumpyArray(np.float64, nplike) if isinstance(x, EmptyArray) else x
                for x in inputs
            ]
            return apply_step(
                nplike,
                nextinputs,
                action,
                depth,
                copy.copy(depth_context),
                lateral_context,
                behavior,
                options,
            )

        # Any NumpyArrays with ndim != 1?
        elif any(isinstance(x, NumpyArray) and x.data.ndim != 1 for x in inputs):
            nextinputs = [
                x.toRegularArray() if isinstance(x, NumpyArray) else x for x in inputs
            ]
            return apply_step(
                nplike,
                nextinputs,
                action,
                depth,
                copy.copy(depth_context),
                lateral_context,
                behavior,
                options,
            )

        # Any IndexedArrays?
        elif any(isinstance(x, IndexedArray) for x in inputs):
            nextinputs = [
                x.project() if isinstance(x, IndexedArray) else x for x in inputs
            ]
            return apply_step(
                nplike,
                nextinputs,
                action,
                depth,
                copy.copy(depth_context),
                lateral_context,
                behavior,
                options,
            )

        # Any UnionArrays?
        elif any(isinstance(x, UnionArray) for x in inputs):
            if not nplike.known_data:
                numtags, length = [], None
                for x in inputs:
                    if isinstance(x, UnionArray):
                        numtags.append(len(x.contents))
                        if length is None:
                            length = x.tags.data.shape[0]
                assert length is not None

                all_combos = list(itertools.product(*[range(x) for x in numtags]))

                tags = nplike.empty(length, dtype=np.int8)
                index = nplike.empty(length, dtype=np.int64)
                numoutputs, outcontents = None, []
                for combo in all_combos:
                    nextinputs = []
                    i = 0
                    for x in inputs:
                        if isinstance(x, UnionArray):
                            nextinputs.append(x._contents[combo[i]])
                            i += 1
                        else:
                            nextinputs.append(x)

                    outcontents.append(
                        apply_step(
                            nplike,
                            nextinputs,
                            action,
                            depth,
                            copy.copy(depth_context),
                            lateral_context,
                            behavior,
                            options,
                        )
                    )
                    assert isinstance(outcontents[-1], tuple)
                    if numoutputs is None:
                        numoutputs = len(outcontents[-1])
                    else:
                        assert numoutputs == len(outcontents[-1])

                assert numoutputs is not None

            else:
                tagslist, numtags, length = [], [], None
                for x in inputs:
                    if isinstance(x, UnionArray):
                        tagslist.append(x.tags.raw(nplike))
                        numtags.append(len(x.contents))
                        if length is None:
                            length = tagslist[-1].shape[0]
                        elif length != tagslist[-1].shape[0]:
                            raise ValueError(
                                "cannot broadcast UnionArray of length {} "
                                "with UnionArray of length {}{}".format(
                                    length, tagslist[-1].shape[0], in_function(options)
                                )
                            )
                assert length is not None

                combos = nplike.stack(tagslist, axis=-1)

                all_combos = nplike.array(
                    list(itertools.product(*[range(x) for x in numtags])),
                    dtype=[(str(i), combos.dtype) for i in range(len(tagslist))],
                )

                combos = combos.view(
                    [(str(i), combos.dtype) for i in range(len(tagslist))]
                ).reshape(length)

                tags = nplike.empty(length, dtype=np.int8)
                index = nplike.empty(length, dtype=np.int64)
                numoutputs, outcontents = None, []
                for tag, combo in enumerate(all_combos):
                    mask = combos == combo
                    tags[mask] = tag
                    index[mask] = nplike.arange(
                        nplike.count_nonzero(mask), dtype=np.int64
                    )
                    nextinputs = []
                    i = 0
                    for x in inputs:
                        if isinstance(x, UnionArray):
                            nextinputs.append(x[mask].project(combo[str(i)]))
                            i += 1
                        elif isinstance(x, Content):
                            nextinputs.append(x[mask])
                        else:
                            nextinputs.append(x)
                    outcontents.append(
                        apply_step(
                            nplike,
                            nextinputs,
                            action,
                            depth,
                            copy.copy(depth_context),
                            lateral_context,
                            behavior,
                            options,
                        )
                    )
                    assert isinstance(outcontents[-1], tuple)
                    if numoutputs is None:
                        numoutputs = len(outcontents[-1])
                    else:
                        assert numoutputs == len(outcontents[-1])

                assert numoutputs is not None

            return tuple(
                UnionArray(
                    Index8(tags), Index64(index), [x[i] for x in outcontents]
                ).simplify_uniontype()
                for i in range(numoutputs)
            )

        # Any option-types?
        elif any(isinstance(x, optiontypes) for x in inputs):
            if nplike.known_data:
                mask = None
                for x in inputs:
                    if isinstance(x, optiontypes):
                        m = x.mask_as_bool(valid_when=False, nplike=nplike)
                        if mask is None:
                            mask = m
                        else:
                            mask = nplike.bitwise_or(mask, m, out=mask)

                nextmask = Index8(mask.view(np.int8))
                index = nplike.full(mask.shape[0], -1, dtype=np.int64)
                index[~mask] = nplike.arange(
                    mask.shape[0] - nplike.count_nonzero(mask), dtype=np.int64
                )
                index = Index64(index)
                if any(not isinstance(x, optiontypes) for x in inputs):
                    nextindex = nplike.arange(mask.shape[0], dtype=np.int64)
                    nextindex[mask] = -1
                    nextindex = Index64(nextindex)

                nextinputs = []
                for x in inputs:
                    if isinstance(x, optiontypes):
                        nextinputs.append(x.project(nextmask))
                    elif isinstance(x, Content):
                        nextinputs.append(
                            IndexedOptionArray(nextindex, x).project(nextmask)
                        )
                    else:
                        nextinputs.append(x)

            else:
                index = None
                nextinputs = []
                for x in inputs:
                    if isinstance(x, optiontypes):
                        index = Index64(nplike.empty((x.length,), np.int64))
                        nextinputs.append(x.content)
                    else:
                        nextinputs.append(x)
                assert index is not None

            outcontent = apply_step(
                nplike,
                nextinputs,
                action,
                depth,
                copy.copy(depth_context),
                lateral_context,
                behavior,
                options,
            )
            assert isinstance(outcontent, tuple)
            return tuple(
                IndexedOptionArray(index, x).simplify_optiontype() for x in outcontent
            )

        # Any list-types?
        elif any(isinstance(x, listtypes) for x in inputs):
            # All regular?
            if all(
                isinstance(x, RegularArray) or not isinstance(x, listtypes)
                for x in inputs
            ):
                maxsize = max(x.size for x in inputs if isinstance(x, RegularArray))

                if nplike.known_data:
                    for x in inputs:
                        if isinstance(x, RegularArray):
                            if maxsize > 1 and x.size == 1:
                                tmpindex = Index64(
                                    nplike.repeat(
                                        nplike.arange(x.length, dtype=np.int64), maxsize
                                    )
                                )

                    nextinputs = []
                    for x in inputs:
                        if isinstance(x, RegularArray):
                            if maxsize > 1 and x.size == 1:
                                nextinputs.append(
                                    IndexedArray(
                                        tmpindex, x.content[: x.length * x.size]
                                    ).project()
                                )
                            elif x.size == maxsize:
                                nextinputs.append(x.content[: x.length * x.size])
                            else:
                                raise ValueError(
                                    "cannot broadcast RegularArray of size "
                                    "{} with RegularArray of size {} {}".format(
                                        x.size, maxsize, in_function(options)
                                    )
                                )
                        else:
                            nextinputs.append(x)

                else:
                    nextinputs = []
                    for x in inputs:
                        if isinstance(x, RegularArray):
                            nextinputs.append(x.content)
                        else:
                            nextinputs.append(x)

                length = None
                for x in inputs:
                    if isinstance(x, Content):
                        if length is None:
                            length = x.length
                        elif nplike.known_shape:
                            assert length == x.length
                assert length is not None

                outcontent = apply_step(
                    nplike,
                    nextinputs,
                    action,
                    depth + 1,
                    copy.copy(depth_context),
                    lateral_context,
                    behavior,
                    options,
                )
                assert isinstance(outcontent, tuple)
                return tuple(RegularArray(x, maxsize, length) for x in outcontent)

            elif not nplike.known_data or not nplike.known_shape:
                offsets = None
                nextinputs = []
                for x in inputs:
                    if isinstance(x, ListOffsetArray):
                        offsets = Index64(
                            nplike.empty((x.offsets.data.shape[0],), np.int64)
                        )
                        nextinputs.append(x.content)
                    elif isinstance(x, ListArray):
                        offsets = Index64(
                            nplike.empty((x.starts.data.shape[0] + 1,), np.int64)
                        )
                        nextinputs.append(x.content)
                    elif isinstance(x, RegularArray):
                        nextinputs.append(x.content)
                    else:
                        nextinputs.append(x)
                assert offsets is not None

                outcontent = apply_step(
                    nplike,
                    nextinputs,
                    action,
                    depth + 1,
                    copy.copy(depth_context),
                    lateral_context,
                    behavior,
                    options,
                )
                assert isinstance(outcontent, tuple)

                return tuple(ListOffsetArray(offsets, x) for x in outcontent)

            # Not all regular, but all same offsets?
            # Optimization: https://github.com/scikit-hep/awkward-1.0/issues/442
            elif all_same_offsets(nplike, inputs):
                lencontent, offsets, starts, stops = None, None, None, None
                nextinputs = []

                for x in inputs:
                    if isinstance(x, ListOffsetArray):
                        offsets = x.offsets
                        lencontent = offsets[-1]
                        nextinputs.append(x.content[:lencontent])

                    elif isinstance(x, ListArray):
                        starts, stops = x.starts, x.stops
                        if starts.length == 0 or stops.length == 0:
                            nextinputs.append(x.content[:0])
                        else:
                            lencontent = nplike.max(stops)
                            nextinputs.append(x.content[:lencontent])

                    else:
                        nextinputs.append(x)

                outcontent = apply_step(
                    nplike,
                    nextinputs,
                    action,
                    depth + 1,
                    copy.copy(depth_context),
                    lateral_context,
                    behavior,
                    options,
                )
                assert isinstance(outcontent, tuple)

                if isinstance(offsets, Index):
                    return tuple(
                        ListOffsetArray(offsets, x).toListOffsetArray64(False)
                        for x in outcontent
                    )
                elif isinstance(starts, Index) and isinstance(stops, Index):
                    return tuple(
                        ListArray(starts, stops, x).toListOffsetArray64(False)
                        for x in outcontent
                    )
                else:
                    raise AssertionError(
                        "unexpected offsets, starts: {}, {}".format(
                            type(offsets), type(starts)
                        )
                    )

            # General list-handling case: the offsets of each list may be different.
            else:
                fcns = [
                    ak._v2._util.custom_broadcast(x, behavior)
                    if isinstance(x, Content)
                    else None
                    for x in inputs
                ]

                first, secondround = None, False
                for x, fcn in zip(inputs, fcns):
                    if (
                        isinstance(x, listtypes)
                        and not isinstance(x, RegularArray)
                        and fcn is None
                    ):
                        first = x
                        break

                if first is None:
                    secondround = True
                    for x in inputs:
                        if isinstance(x, listtypes) and not isinstance(x, RegularArray):
                            first = x
                            break

                offsets = first._compact_offsets64(True)

                nextinputs = []
                for x, fcn in zip(inputs, fcns):
                    if callable(fcn) and not secondround:
                        nextinputs.append(fcn(x, offsets))
                    elif isinstance(x, listtypes):
                        nextinputs.append(x._broadcast_tooffsets64(offsets).content)

                    # Handle implicit left-broadcasting (non-NumPy-like broadcasting).
                    elif options["left_broadcast"] and isinstance(x, Content):
                        nextinputs.append(
                            RegularArray(x, 1, x.length)
                            ._broadcast_tooffsets64(offsets)
                            .content
                        )
                    else:
                        nextinputs.append(x)

                outcontent = apply_step(
                    nplike,
                    nextinputs,
                    action,
                    depth + 1,
                    copy.copy(depth_context),
                    lateral_context,
                    behavior,
                    options,
                )
                assert isinstance(outcontent, tuple)

                return tuple(ListOffsetArray(offsets, x) for x in outcontent)

        # Any RecordArrays?
        elif any(isinstance(x, RecordArray) for x in inputs):
            if not options["allow_records"]:
>               raise ValueError(f"cannot broadcast records{in_function(options)}")
E               ValueError: cannot broadcast recordsin equal

/home/saransh/.local/lib/python3.8/site-packages/awkward/_v2/_broadcasting.py:644: ValueError

@Saransh-cpp Saransh-cpp force-pushed the saransh/upper-cap-awkward branch 2 times, most recently from 97c3093 to 97b5623 Compare July 29, 2022 15:52
@Saransh-cpp
Copy link
Member Author

Saransh-cpp commented Aug 6, 2022

Monkeypatching sys.modules also brings up a few unrelated errors. I think a user is not supposed to monkeypatch the imports in this manner; hence these errors cannot be classified as bugs in awkward._v2 or vector (?)

In [1]: import vector

In [2]: import sys

In [3]: import awkward._v2

In [4]: sys.modules["awkward"] = awkward._v2

In [5]: sys.modules["awkward"]
Out[5]: <module 'awkward._v2' from 'C:\\Users\\Saransh\\Saransh_softwares\\Python_3.9\\lib\\site-packages\\awkward\\_v2\\__init__.py'>

In [6]: array = vector.Array([[{"pt": 1, "phi": 0}], [], [{"pt": 2, "phi": 1}]])

In [7]: array
Out[7]: ---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
File ~\Saransh_softwares\Python_3.9\lib\site-packages\IPython\core\formatters.py:707, in PlainTextFormatter.__call__(self, obj)
    700 stream = StringIO()
    701 printer = pretty.RepresentationPrinter(stream, self.verbose,
    702     self.max_width, self.newline,
    703     max_seq_length=self.max_seq_length,
    704     singleton_pprinters=self.singleton_printers,
    705     type_pprinters=self.type_printers,
    706     deferred_pprinters=self.deferred_printers)
--> 707 printer.pretty(obj)
    708 printer.flush()
    709 return stream.getvalue()

File ~\Saransh_softwares\Python_3.9\lib\site-packages\IPython\lib\pretty.py:410, in RepresentationPrinter.pretty(self, obj)
    407                         return meth(obj, self, cycle)
    408                 if cls is not object \
    409                         and callable(cls.__dict__.get('__repr__')):
--> 410                     return _repr_pprint(obj, self, cycle)
    412     return _default_pprint(obj, self, cycle)
    413 finally:

File ~\Saransh_softwares\Python_3.9\lib\site-packages\IPython\lib\pretty.py:778, in _repr_pprint(obj, p, cycle)
    776 """A pprint that just redirects to the normal repr function."""
    777 # Find newlines and replace them with p.break_()
--> 778 output = repr(obj)
    779 lines = output.splitlines()
    780 with p.group():

File ~\Saransh_softwares\Python_3.9\lib\site-packages\awkward\_v2\highlevel.py:1252, in Array.__repr__(self)
   1250 pytype = type(self).__name__
   1251 if self._layout.nplike.known_shape and self._layout.nplike.known_data:
-> 1252     valuestr = " " + awkward._v2._prettyprint.valuestr(self, 1, 50)
   1253     typestr = repr(str(self.type))[1:-1]
   1254 else:

AttributeError: module 'awkward._v2' has no attribute '_v2'

In [8]: out = array.rotateZ(1)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Input In [8], in <cell line: 1>()
----> 1 out = array.rotateZ(1)

File ~\Saransh_softwares\Python_3.9\lib\site-packages\awkward\_v2\highlevel.py:1137, in Array.__getattr__(self, where)
   1132         raise AttributeError(
   1133             "while trying to get field {}, an exception "
   1134             "occurred:\n{}: {}".format(repr(where), type(err), str(err))
   1135         )
   1136 else:
-> 1137     raise AttributeError(f"no field named {where!r}")

AttributeError: no field named 'rotateZ'

@henryiii
Copy link
Member

henryiii commented Aug 6, 2022

I’d guess Awkward is not careful to do relative imports. But you could try adding _v2 to itself.

@Saransh-cpp
Copy link
Member Author

I’d guess Awkward is not careful to do relative imports. But you could try adding _v2 to itself.

Thanks! This seems to work! But flake8 throws an error -

tests/conftest.py:8:5: B010 Do not call setattr with a constant attribute value, it is not any safer than normal property access.

conftest.py -

import os
import sys

if os.environ.get("VECTOR_USE_AWKWARDV2", None):
    import awkward._v2

    sys.modules["awkward"] = awkward._v2
    setattr(sys.modules["awkward"], "_v2", awkward._v2)

Ignoring that error, the tests still fail as tolist() behaves weirdly after patching -

def test_rotateZ():
    array = vector.Array([[{"pt": 1, "phi": 0}], [], [{"pt": 2, "phi": 1}]])
    out = array.rotateZ(1)
    assert isinstance(out, vector.backends.awkward.MomentumArray2D)
    print(out)
    print(out.tolist())
    assert out.tolist() == [[{"rho": 1, "phi": 1}], [], [{"rho": 2, "phi": 2}]]

...

Running VECTOR_USE_AWKWARDV2=1 pytest -s prints the following -

[{rho: 1, phi: 1}], [], [{rho: 2, phi: 2}]]
[<MomentumArray2D [{rho: 1, phi: 1}] type='1 * Momentum2D[rho: int64, phi: f...'>, <MomentumArray2D [] type='0 * Momentum2D[rho: int64, phi: float64]'>, <MomentumArray2D [{rho: 2, phi: 2}] type='1 * Momentum2D[rho: int64, phi: f...'>]

and gives the same error as before (but at least I can print the awkward vectors now, thanks to setattr).

Pushing the commit for the full stacktrace.

@Saransh-cpp
Copy link
Member Author

If I don't patch but replace all the awkward imports with awkward._v2 imports, this happens -

In [1]: import vector

In [2]: array = vector.Array([[{"pt": 1, "phi": 0}], [], [{"pt": 2, "phi": 1}]])

In [3]: array
Out[3]: <MomentumArray2D [[{rho: 1, phi: 0}], [], [{rho: 2, ...}]] type='3 * var * ...'>

In [4]: out = array.rotateZ(1)

In [5]: out
Out[5]: <MomentumArray2D [[{rho: 1, phi: 1}], [], [{rho: 2, ...}]] type='3 * var * ...'>

In [6]: out.tolist()
Out[6]: 
[<Array [{rho: 1, phi: 1}] type='1 * Momentum2D[rho: int64, phi: float64]'>,
 <Array [] type='0 * Momentum2D[rho: int64, phi: float64]'>,
 <Array [{rho: 2, phi: 2}] type='1 * Momentum2D[rho: int64, phi: float64]'>]

The behaviors observed after patching and after replacing are definitely different.

Saransh-cpp and others added 2 commits August 17, 2022 01:30
Signed-off-by: Henry Schreiner <henryschreineriii@gmail.com>
Signed-off-by: Henry Schreiner <henryschreineriii@gmail.com>
@Saransh-cpp
Copy link
Member Author

@henryiii, @jpivarski, this should be finally ready for a review! I have marked some tests with xfail because their failures come from awkward._v2.

@Saransh-cpp Saransh-cpp added this to the v0.9.0 milestone Aug 29, 2022
@Saransh-cpp Saransh-cpp added tests More or better tests are needed dependencies Pull requests that update a dependency file labels Aug 29, 2022
Comment on lines 297 to 317
fields = awkward.fields(akarray)
fields = awkward.fields(akarray).copy()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, so I hadn't noticed that Awkward v2's fields are passed by reference, which exposes them to the danger that someone might modify them downstream:

v1:

>>> array = awkward.Array([{"x": 1, "y": 1.1}])
>>> fields = awkward.fields(array)
>>> array
<Array [{x: 1, y: 1.1}] type='1 * {"x": int64, "y": float64}'>
>>> fields
['x', 'y']
>>> fields[0] = "XXX"
>>> fields
['XXX', 'y']
>>> array
<Array [{x: 1, y: 1.1}] type='1 * {"x": int64, "y": float64}'>

v2:

>>> array = awkward._v2.Array([{"x": 1, "y": 1.1}])
>>> fields = awkward._v2.fields(array)
>>> array
<Array [{x: 1, y: 1.1}] type='1 * {x: int64, y: float64}'>
>>> fields
['x', 'y']
>>> fields[0] = "XXX"
>>> fields
['XXX', 'y']
>>> array
<Array [{XXX: 1, y: 1.1}] type='1 * {XXX: int64, y: float64}'>

It could be fixed here, in Awkward, or maybe here (to only suffer the list-copy when handing it off to a user, so that internal uses can still be by reference).

I'll use this comment to open an issue in Awkward. Once awkward.fields is guarded, your .copy() can be removed, but it can also not be removed with no consequences but a little performance.

Copy link
Member

@jpivarski jpivarski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks very good to me! I wouldn't want the switch between versions to be an environment variable in the long term, but it won't have to be in the long term.

I'll keep you in the loop on our splitting of the Awkward codebase into v2-only main and the status quo main-v1, and that would simplify the switch in your testing: it would just depend on Python package version number (because "awkward" in 2.0.0rc1 onward will simply be v2).

@Saransh-cpp
Copy link
Member Author

Thanks for the review, @jpivarski! Yes, all this env variable stuff will be removed once vector officially supports awkward v2. I can take up the awkward issue too (which looks like a quick fix)!

@Saransh-cpp
Copy link
Member Author

I'll wait for a review from @henryiii before merging this in (as he suggested the environment variable solution). .copy() can be removed in another PR when awkward makes a new release!

Copy link
Member Author

@Saransh-cpp Saransh-cpp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that scikit-hep/awkward#1652 and scikit-hep/awkward#1650 are merged, we can make the following changes -

(Should wait till a new release. IMO it should be okay to make these changes in a follow-up PR.)

Comment on lines 14 to 19
# this is a known issue in awkward._v2
# see https://github.com/scikit-hep/awkward/issues/1600
# TODO: ensure this passes once awkward v2 is out
@pytest.mark.xfail(
strict=True if os.environ.get("VECTOR_USE_AWKWARDV2") is not None else False
)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be removed.

Suggested change
# this is a known issue in awkward._v2
# see https://github.com/scikit-hep/awkward/issues/1600
# TODO: ensure this passes once awkward v2 is out
@pytest.mark.xfail(
strict=True if os.environ.get("VECTOR_USE_AWKWARDV2") is not None else False
)

@@ -294,7 +314,7 @@ def Array(*args: typing.Any, **kwargs: typing.Any) -> typing.Any:

if not _is_type_safe(array_type):
raise TypeError("a coordinate must be of the type int or float")
fields = awkward.fields(akarray)
fields = awkward.fields(akarray).copy()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.copy() can be removed.

Suggested change
fields = awkward.fields(akarray).copy()
fields = awkward.fields(akarray)

@Saransh-cpp Saransh-cpp merged commit 1f09bb2 into scikit-hep:main Sep 4, 2022
@Saransh-cpp Saransh-cpp deleted the saransh/upper-cap-awkward branch September 4, 2022 13:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dependencies Pull requests that update a dependency file tests More or better tests are needed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants