chore: support `awkward` `v1` and `v2` together #226

Saransh-cpp · 2022-07-20T06:11:09Z

No description provided.

codecov-commenter · 2022-07-20T06:14:04Z

Codecov Report

Merging #226 (2f737b3) into main (cac88a2) will increase coverage by 0.00%.
The diff coverage is 88.88%.

@@           Coverage Diff           @@
##             main     #226   +/-   ##
=======================================
  Coverage   82.63%   82.64%           
=======================================
  Files          96       96           
  Lines       10524    10525    +1     
=======================================
+ Hits         8697     8698    +1     
  Misses       1827     1827

Impacted Files	Coverage Δ
src/vector/backends/awkward_constructors.py	`47.59% <85.71%> (+0.25%)`	⬆️
src/vector/backends/awkward.py	`75.89% <100.00%> (ø)`

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cac88a2...2f737b3. Read the comment docs.

henryiii · 2022-07-21T05:40:32Z

How hard would it be to make it work with both? Caps are highly problematic. If someone has the following file:

awkward>=2
vector

Modern pip and other modern solvers will just pick the last version of Vector without the cap (& custom error) and awkward 2 - that's a logically valid solution.

jpivarski · 2022-07-21T13:10:42Z

Well, in a major version update, you're allowed to change the public API, and a few things get renamed that would be visible here. I've found a couple by searching with grep, but there's nothing like actually doing tests.

vector/src/vector/backends/awkward_constructors.py

Line 217 in cac88a2

array_type = array_type.type

The .type of nestable Types (all but ArrayType) becomes .content for consistency with Contents and Forms.

vector/src/vector/backends/awkward_constructors.py

Line 224 in cac88a2

field_type = field_type.type

There it is again.

vector/src/vector/backends/awkward_constructors.py

Line 227 in cac88a2

dt = field_type.dtype

This string, badly named .dtype in v1, is now .primitive (one of our list of primitive types).

vector/src/vector/backends/awkward.py

Lines 1601 to 1603 in cac88a2

    
           if isinstance(awkwardtype, ak._connect._numba.layout.NumpyArrayType): 
        
               return awkwardtype.arraytype 
        
           elif isinstance(awkwardtype, ak._connect._numba.layout.IndexedArrayType):

In v2, _connect still has an underscore, but submodules within it, like numba, do not. The underscore is a gateway from public to private. And actually, these two types are technically private, though I don't see a better way of writing this code in Vector and these names will not be changing.

So, after reviewing the Vector code, it has been written with enough decoupling from Awkward that it's possible to make it work for both v1 and v2. Testing that will be tricky, since all references to "awkward" and names derived from it, like "ak", will need to be swapped for "awkward._v2" to do a test. That swap must be 100% consistent—nothing left unswitched, otherwise it won't be a complete test.

Saransh-cpp · 2022-07-21T19:34:44Z

Thanks for the review, @henryiii, and @jpivarski!

Modern pip and other modern solvers will just pick the last version of Vector without the cap (& custom error) and awkward 2 - that's a logically valid solution.

Oh, I had no idea about this behavior.

I have made the required changes, and now vector should support both awkward v1 and v2.

Testing that will be tricky, since all references to "awkward" and names derived from it, like "ak", will need to be swapped for "awkward._v2" to do a test

Yes, the tests are tricky. Still searching for ways to test both awkward versions, but no luck so far.

henryiii · 2022-07-21T21:20:30Z

Tricky part is loading it - will have to check with PyTest - otherwise we should be able to just monkeypatch it into sys.modules. At the worst, it could be a envvar setting when you run the tests.

Saransh-cpp · 2022-07-22T18:49:32Z

I tried using monkeypatch in different ways, but none of them worked perfectly.

I tried editing sys.modules but the edits are not affecting the imports at all. I am not sure how to tell Python to use the value associated with awkward._v2 in sys.modules instead of awkward.
I also tried using it with pytest (https://docs.pytest.org/en/7.1.x/how-to/monkeypatch.html). The workaround I was thinking of was importing awkward by declaring a global variable first and then monkeypatching it from the tests, but the tests fail with multiple errors. These errors are not related to the v1 -> v2 switch, but I think because of ambiguous variable (and global variable), module, and dependency name (all awkward).-

backends/awkward.py (similar for the constructor file, I had to declare the variable globally there too for pytest monkeypatching to work)

global awkward
awkward = __import__('awkward', globals())

test_awkward.py (similar for the constructor tests)

def test_basic(monkeypatch):
    monkeypatch.setattr("vector.backends.awkward.awkward", awkward._v2, raising=True)
    monkeypatch.setattr("vector.backends.awkward_constructors.awkward", awkward._v2, raising=True)

...

__init__.py

def _is_awkward_v2(obj: typing.Any) -> bool:
    return packaging.version.Version(
        importlib_metadata.version("awkward")
    ) >= packaging.version.Version("2") or "awkward._v2" in inspect.getmodule(obj).__name__

Using environment variable seems to work, and with @pytest.fixture or @pytest.mark.parametrize we will be able to run the tests twice, once for v1 and once for v2.

henryiii · 2022-07-22T20:17:17Z

I'll push what I was thinking about, can be run with:

VECTOR_USE_AWKWARDV2=1 pytest

You can revert and continue with your proposal above, or continue on this path. But I thought this was the easiest thing to show what I was thinking about.

Saransh-cpp · 2022-07-23T15:06:58Z

This solution looks much simple! Thanks, @henryiii!

I made some fixes after your commit, but the tests are still failing due to some reasons. I was able to pinpoint a couple of them -

awkward.fields() and array.fields now return the exact object, not a copy (?) This list is currently modified in awkward_constructors._check_names(), but passing a copy in _check_names() works!

@@ -314,7 +315,7 @@ def Array(*args: typing.Any, **kwargs: typing.Any) -> typing.Any:

     if not _is_type_safe(array_type):
         raise TypeError("a coordinate must be of the type int or float")
-    fields = awkward.fields(akarray)
+    fields = awkward.fields(akarray).copy()

     is_momentum, dimension, names, arrays = _check_names(akarray, fields)

tolist() on VectorArrays behaves weirdly. To check further, I went ahead and changed all the awkward imports to awkward._v2 imports inside vector and discovered this -

>>> array = vector.Array(
...     [
...         [{"x": 1, "y": 2, "z": 3, "wow": 99}],
...         [],
...         [{"x": 4, "y": 5, "z": 6, "wow": 123}],
...     ]
... )
>>> out = array.to_Vector2D()
>>> assert isinstance(out, vector.backends.awkward.VectorArray2D)
>>> out.tolist()
[<Array [{x: 1, y: 2, wow: 99}] type='1 * Vector2D[x: int64, y: int64, wow: ...'>, <Array [] type='0 * Vector2D[x: int64, y: int64, wow: int64]'>, <Array [{x: 4, y: 5, wow: 123}] type='1 * Vector2D[x: int64, y: int64, wow:...'>]
>>> assert [arr.tolist() for arr in out.tolist()] == [
...     [{"x": 1, "y": 2, "wow": 99}],
...     [],
...     [{"x": 4, "y": 5, "wow": 123}],
... ]

Changing tolist() lines to a list comprehension works when the tests are run manually, without using pytest -

@@ -65,7 +65,7 @@ def test_rotateZ():
     array = vector.Array([[{"pt": 1, "phi": 0}], [], [{"pt": 2, "phi": 1}]])
     out = array.rotateZ(1)
     assert isinstance(out, vector.backends.awkward.MomentumArray2D)
-    assert out.tolist() == [[{"rho": 1, "phi": 1}], [], [{"rho": 2, "phi": 2}]]
+    assert [arr.tolist() for arr in out.tolist()] == [[{"rho": 1, "phi": 1}], [], [{"rho": 2, "phi": 2}]]

     array = vector.Array(
         [[{"x": 1, "y": 0, "wow": 99}], [], [{"x": 2, "y": 1, "wow": 123}]]

Running this test manually -

Python 3.9.0 (tags/v3.9.0:9cf6752, Oct  5 2020, 15:34:40) [MSC v.1927 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import vector
>>> array = vector.Array([[{"pt": 1, "phi": 0}], [], [{"pt": 2, "phi": 1}]])
>>> out = array.rotateZ(1)
>>> assert isinstance(out, vector.backends.awkward.MomentumArray2D)
>>> assert [arr.tolist() for arr in out.tolist()] == [[{"rho": 1, "phi": 1}], [], [{"rho": 2, "phi": 2}]]

Running tests using pytest -ra (specifically pasting the error from test_rotateZ below, but all of the ValueError: cannot broadcast recordsin equal errors are never thrown when the same functions are run manually, given that the tolist() output is fixed in these functions) -

=============================================== short test summary info ================================================
FAILED tests/test_issues.py::test_issue_99 - AttributeError: no field named 'to_xyz'
FAILED tests/test_issues.py::test_issue_161 - numba.core.errors.TypingError: Failed in nopython mode pipeline (step: ...
FAILED tests/backends/test_awkward.py::test_rotateZ - ValueError: cannot broadcast recordsin equal
FAILED tests/backends/test_awkward.py::test_projection - ValueError: cannot broadcast recordsin equal
FAILED tests/backends/test_awkward.py::test_add - ValueError: cannot broadcast recordsin equal
FAILED tests/backends/test_awkward.py::test_ufuncs - TypeError: <lambda>() takes 1 positional argument but 4 were given
FAILED tests/backends/test_awkward.py::test_zip - ValueError: cannot broadcast recordsin equal
FAILED tests/backends/test_awkward_numba.py::test - numba.core.errors.TypingError: Failed in nopython mode pipeline (...
====================================== 8 failed, 2781 passed in 75.56s (0:01:15) =======================================

Full stacktrace -

Trace:

_____________________________________________________ test_rotateZ _____________________________________________________

    def test_rotateZ():
        array = vector.Array([[{"pt": 1, "phi": 0}], [], [{"pt": 2, "phi": 1}]])
        out = array.rotateZ(1)
        assert isinstance(out, vector.backends.awkward.MomentumArray2D)
>       assert [arr.tolist() for arr in out.tolist()] == [[{"rho": 1, "phi": 1}], [], [{"rho": 2, "phi": 2}]]

tests/backends/test_awkward.py:68:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/home/saransh/.local/lib/python3.8/site-packages/numpy/lib/mixins.py:21: in func
    return ufunc(self, other)
/home/saransh/.local/lib/python3.8/site-packages/awkward/_v2/highlevel.py:2001: in __array_ufunc__
    return ak._v2._connect.numpy.array_ufunc(ufunc, method, inputs, kwargs)
/home/saransh/.local/lib/python3.8/site-packages/awkward/_v2/_connect/numpy.py:269: in array_ufunc
    out = ak._v2._broadcasting.broadcast_and_apply(
/home/saransh/.local/lib/python3.8/site-packages/awkward/_v2/_broadcasting.py:738: in broadcast_and_apply
    out = apply_step(
/home/saransh/.local/lib/python3.8/site-packages/awkward/_v2/_broadcasting.py:718: in apply_step
    return continuation()
/home/saransh/.local/lib/python3.8/site-packages/awkward/_v2/_broadcasting.py:483: in continuation
    outcontent = apply_step(
/home/saransh/.local/lib/python3.8/site-packages/awkward/_v2/_broadcasting.py:718: in apply_step
    return continuation()
/home/saransh/.local/lib/python3.8/site-packages/awkward/_v2/_broadcasting.py:235: in continuation
    return apply_step(
/home/saransh/.local/lib/python3.8/site-packages/awkward/_v2/_broadcasting.py:718: in apply_step
    return continuation()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

    def continuation():
        # Any EmptyArrays?
        if any(isinstance(x, EmptyArray) for x in inputs):
            nextinputs = [
                x.toNumpyArray(np.float64, nplike) if isinstance(x, EmptyArray) else x
                for x in inputs
            ]
            return apply_step(
                nplike,
                nextinputs,
                action,
                depth,
                copy.copy(depth_context),
                lateral_context,
                behavior,
                options,
            )

        # Any NumpyArrays with ndim != 1?
        elif any(isinstance(x, NumpyArray) and x.data.ndim != 1 for x in inputs):
            nextinputs = [
                x.toRegularArray() if isinstance(x, NumpyArray) else x for x in inputs
            ]
            return apply_step(
                nplike,
                nextinputs,
                action,
                depth,
                copy.copy(depth_context),
                lateral_context,
                behavior,
                options,
            )

        # Any IndexedArrays?
        elif any(isinstance(x, IndexedArray) for x in inputs):
            nextinputs = [
                x.project() if isinstance(x, IndexedArray) else x for x in inputs
            ]
            return apply_step(
                nplike,
                nextinputs,
                action,
                depth,
                copy.copy(depth_context),
                lateral_context,
                behavior,
                options,
            )

        # Any UnionArrays?
        elif any(isinstance(x, UnionArray) for x in inputs):
            if not nplike.known_data:
                numtags, length = [], None
                for x in inputs:
                    if isinstance(x, UnionArray):
                        numtags.append(len(x.contents))
                        if length is None:
                            length = x.tags.data.shape[0]
                assert length is not None

                all_combos = list(itertools.product(*[range(x) for x in numtags]))

                tags = nplike.empty(length, dtype=np.int8)
                index = nplike.empty(length, dtype=np.int64)
                numoutputs, outcontents = None, []
                for combo in all_combos:
                    nextinputs = []
                    i = 0
                    for x in inputs:
                        if isinstance(x, UnionArray):
                            nextinputs.append(x._contents[combo[i]])
                            i += 1
                        else:
                            nextinputs.append(x)

                    outcontents.append(
                        apply_step(
                            nplike,
                            nextinputs,
                            action,
                            depth,
                            copy.copy(depth_context),
                            lateral_context,
                            behavior,
                            options,
                        )
                    )
                    assert isinstance(outcontents[-1], tuple)
                    if numoutputs is None:
                        numoutputs = len(outcontents[-1])
                    else:
                        assert numoutputs == len(outcontents[-1])

                assert numoutputs is not None

            else:
                tagslist, numtags, length = [], [], None
                for x in inputs:
                    if isinstance(x, UnionArray):
                        tagslist.append(x.tags.raw(nplike))
                        numtags.append(len(x.contents))
                        if length is None:
                            length = tagslist[-1].shape[0]
                        elif length != tagslist[-1].shape[0]:
                            raise ValueError(
                                "cannot broadcast UnionArray of length {} "
                                "with UnionArray of length {}{}".format(
                                    length, tagslist[-1].shape[0], in_function(options)
                                )
                            )
                assert length is not None

                combos = nplike.stack(tagslist, axis=-1)

                all_combos = nplike.array(
                    list(itertools.product(*[range(x) for x in numtags])),
                    dtype=[(str(i), combos.dtype) for i in range(len(tagslist))],
                )

                combos = combos.view(
                    [(str(i), combos.dtype) for i in range(len(tagslist))]
                ).reshape(length)

                tags = nplike.empty(length, dtype=np.int8)
                index = nplike.empty(length, dtype=np.int64)
                numoutputs, outcontents = None, []
                for tag, combo in enumerate(all_combos):
                    mask = combos == combo
                    tags[mask] = tag
                    index[mask] = nplike.arange(
                        nplike.count_nonzero(mask), dtype=np.int64
                    )
                    nextinputs = []
                    i = 0
                    for x in inputs:
                        if isinstance(x, UnionArray):
                            nextinputs.append(x[mask].project(combo[str(i)]))
                            i += 1
                        elif isinstance(x, Content):
                            nextinputs.append(x[mask])
                        else:
                            nextinputs.append(x)
                    outcontents.append(
                        apply_step(
                            nplike,
                            nextinputs,
                            action,
                            depth,
                            copy.copy(depth_context),
                            lateral_context,
                            behavior,
                            options,
                        )
                    )
                    assert isinstance(outcontents[-1], tuple)
                    if numoutputs is None:
                        numoutputs = len(outcontents[-1])
                    else:
                        assert numoutputs == len(outcontents[-1])

                assert numoutputs is not None

            return tuple(
                UnionArray(
                    Index8(tags), Index64(index), [x[i] for x in outcontents]
                ).simplify_uniontype()
                for i in range(numoutputs)
            )

        # Any option-types?
        elif any(isinstance(x, optiontypes) for x in inputs):
            if nplike.known_data:
                mask = None
                for x in inputs:
                    if isinstance(x, optiontypes):
                        m = x.mask_as_bool(valid_when=False, nplike=nplike)
                        if mask is None:
                            mask = m
                        else:
                            mask = nplike.bitwise_or(mask, m, out=mask)

                nextmask = Index8(mask.view(np.int8))
                index = nplike.full(mask.shape[0], -1, dtype=np.int64)
                index[~mask] = nplike.arange(
                    mask.shape[0] - nplike.count_nonzero(mask), dtype=np.int64
                )
                index = Index64(index)
                if any(not isinstance(x, optiontypes) for x in inputs):
                    nextindex = nplike.arange(mask.shape[0], dtype=np.int64)
                    nextindex[mask] = -1
                    nextindex = Index64(nextindex)

                nextinputs = []
                for x in inputs:
                    if isinstance(x, optiontypes):
                        nextinputs.append(x.project(nextmask))
                    elif isinstance(x, Content):
                        nextinputs.append(
                            IndexedOptionArray(nextindex, x).project(nextmask)
                        )
                    else:
                        nextinputs.append(x)

            else:
                index = None
                nextinputs = []
                for x in inputs:
                    if isinstance(x, optiontypes):
                        index = Index64(nplike.empty((x.length,), np.int64))
                        nextinputs.append(x.content)
                    else:
                        nextinputs.append(x)
                assert index is not None

            outcontent = apply_step(
                nplike,
                nextinputs,
                action,
                depth,
                copy.copy(depth_context),
                lateral_context,
                behavior,
                options,
            )
            assert isinstance(outcontent, tuple)
            return tuple(
                IndexedOptionArray(index, x).simplify_optiontype() for x in outcontent
            )

        # Any list-types?
        elif any(isinstance(x, listtypes) for x in inputs):
            # All regular?
            if all(
                isinstance(x, RegularArray) or not isinstance(x, listtypes)
                for x in inputs
            ):
                maxsize = max(x.size for x in inputs if isinstance(x, RegularArray))

                if nplike.known_data:
                    for x in inputs:
                        if isinstance(x, RegularArray):
                            if maxsize > 1 and x.size == 1:
                                tmpindex = Index64(
                                    nplike.repeat(
                                        nplike.arange(x.length, dtype=np.int64), maxsize
                                    )
                                )

                    nextinputs = []
                    for x in inputs:
                        if isinstance(x, RegularArray):
                            if maxsize > 1 and x.size == 1:
                                nextinputs.append(
                                    IndexedArray(
                                        tmpindex, x.content[: x.length * x.size]
                                    ).project()
                                )
                            elif x.size == maxsize:
                                nextinputs.append(x.content[: x.length * x.size])
                            else:
                                raise ValueError(
                                    "cannot broadcast RegularArray of size "
                                    "{} with RegularArray of size {} {}".format(
                                        x.size, maxsize, in_function(options)
                                    )
                                )
                        else:
                            nextinputs.append(x)

                else:
                    nextinputs = []
                    for x in inputs:
                        if isinstance(x, RegularArray):
                            nextinputs.append(x.content)
                        else:
                            nextinputs.append(x)

                length = None
                for x in inputs:
                    if isinstance(x, Content):
                        if length is None:
                            length = x.length
                        elif nplike.known_shape:
                            assert length == x.length
                assert length is not None

                outcontent = apply_step(
                    nplike,
                    nextinputs,
                    action,
                    depth + 1,
                    copy.copy(depth_context),
                    lateral_context,
                    behavior,
                    options,
                )
                assert isinstance(outcontent, tuple)
                return tuple(RegularArray(x, maxsize, length) for x in outcontent)

            elif not nplike.known_data or not nplike.known_shape:
                offsets = None
                nextinputs = []
                for x in inputs:
                    if isinstance(x, ListOffsetArray):
                        offsets = Index64(
                            nplike.empty((x.offsets.data.shape[0],), np.int64)
                        )
                        nextinputs.append(x.content)
                    elif isinstance(x, ListArray):
                        offsets = Index64(
                            nplike.empty((x.starts.data.shape[0] + 1,), np.int64)
                        )
                        nextinputs.append(x.content)
                    elif isinstance(x, RegularArray):
                        nextinputs.append(x.content)
                    else:
                        nextinputs.append(x)
                assert offsets is not None

                outcontent = apply_step(
                    nplike,
                    nextinputs,
                    action,
                    depth + 1,
                    copy.copy(depth_context),
                    lateral_context,
                    behavior,
                    options,
                )
                assert isinstance(outcontent, tuple)

                return tuple(ListOffsetArray(offsets, x) for x in outcontent)

            # Not all regular, but all same offsets?
            # Optimization: https://github.com/scikit-hep/awkward-1.0/issues/442
            elif all_same_offsets(nplike, inputs):
                lencontent, offsets, starts, stops = None, None, None, None
                nextinputs = []

                for x in inputs:
                    if isinstance(x, ListOffsetArray):
                        offsets = x.offsets
                        lencontent = offsets[-1]
                        nextinputs.append(x.content[:lencontent])

                    elif isinstance(x, ListArray):
                        starts, stops = x.starts, x.stops
                        if starts.length == 0 or stops.length == 0:
                            nextinputs.append(x.content[:0])
                        else:
                            lencontent = nplike.max(stops)
                            nextinputs.append(x.content[:lencontent])

                    else:
                        nextinputs.append(x)

                outcontent = apply_step(
                    nplike,
                    nextinputs,
                    action,
                    depth + 1,
                    copy.copy(depth_context),
                    lateral_context,
                    behavior,
                    options,
                )
                assert isinstance(outcontent, tuple)

                if isinstance(offsets, Index):
                    return tuple(
                        ListOffsetArray(offsets, x).toListOffsetArray64(False)
                        for x in outcontent
                    )
                elif isinstance(starts, Index) and isinstance(stops, Index):
                    return tuple(
                        ListArray(starts, stops, x).toListOffsetArray64(False)
                        for x in outcontent
                    )
                else:
                    raise AssertionError(
                        "unexpected offsets, starts: {}, {}".format(
                            type(offsets), type(starts)
                        )
                    )

            # General list-handling case: the offsets of each list may be different.
            else:
                fcns = [
                    ak._v2._util.custom_broadcast(x, behavior)
                    if isinstance(x, Content)
                    else None
                    for x in inputs
                ]

                first, secondround = None, False
                for x, fcn in zip(inputs, fcns):
                    if (
                        isinstance(x, listtypes)
                        and not isinstance(x, RegularArray)
                        and fcn is None
                    ):
                        first = x
                        break

                if first is None:
                    secondround = True
                    for x in inputs:
                        if isinstance(x, listtypes) and not isinstance(x, RegularArray):
                            first = x
                            break

                offsets = first._compact_offsets64(True)

                nextinputs = []
                for x, fcn in zip(inputs, fcns):
                    if callable(fcn) and not secondround:
                        nextinputs.append(fcn(x, offsets))
                    elif isinstance(x, listtypes):
                        nextinputs.append(x._broadcast_tooffsets64(offsets).content)

                    # Handle implicit left-broadcasting (non-NumPy-like broadcasting).
                    elif options["left_broadcast"] and isinstance(x, Content):
                        nextinputs.append(
                            RegularArray(x, 1, x.length)
                            ._broadcast_tooffsets64(offsets)
                            .content
                        )
                    else:
                        nextinputs.append(x)

                outcontent = apply_step(
                    nplike,
                    nextinputs,
                    action,
                    depth + 1,
                    copy.copy(depth_context),
                    lateral_context,
                    behavior,
                    options,
                )
                assert isinstance(outcontent, tuple)

                return tuple(ListOffsetArray(offsets, x) for x in outcontent)

        # Any RecordArrays?
        elif any(isinstance(x, RecordArray) for x in inputs):
            if not options["allow_records"]:
>               raise ValueError(f"cannot broadcast records{in_function(options)}")
E               ValueError: cannot broadcast recordsin equal

/home/saransh/.local/lib/python3.8/site-packages/awkward/_v2/_broadcasting.py:644: ValueError

Signed-off-by: Henry Schreiner <henryschreineriii@gmail.com>

Saransh-cpp · 2022-08-06T14:17:14Z

Monkeypatching sys.modules also brings up a few unrelated errors. I think a user is not supposed to monkeypatch the imports in this manner; hence these errors cannot be classified as bugs in awkward._v2 or vector (?)

In [1]: import vector

In [2]: import sys

In [3]: import awkward._v2

In [4]: sys.modules["awkward"] = awkward._v2

In [5]: sys.modules["awkward"]
Out[5]: <module 'awkward._v2' from 'C:\\Users\\Saransh\\Saransh_softwares\\Python_3.9\\lib\\site-packages\\awkward\\_v2\\__init__.py'>

In [6]: array = vector.Array([[{"pt": 1, "phi": 0}], [], [{"pt": 2, "phi": 1}]])

In [7]: array
Out[7]: ---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
File ~\Saransh_softwares\Python_3.9\lib\site-packages\IPython\core\formatters.py:707, in PlainTextFormatter.__call__(self, obj)
    700 stream = StringIO()
    701 printer = pretty.RepresentationPrinter(stream, self.verbose,
    702     self.max_width, self.newline,
    703     max_seq_length=self.max_seq_length,
    704     singleton_pprinters=self.singleton_printers,
    705     type_pprinters=self.type_printers,
    706     deferred_pprinters=self.deferred_printers)
--> 707 printer.pretty(obj)
    708 printer.flush()
    709 return stream.getvalue()

File ~\Saransh_softwares\Python_3.9\lib\site-packages\IPython\lib\pretty.py:410, in RepresentationPrinter.pretty(self, obj)
    407                         return meth(obj, self, cycle)
    408                 if cls is not object \
    409                         and callable(cls.__dict__.get('__repr__')):
--> 410                     return _repr_pprint(obj, self, cycle)
    412     return _default_pprint(obj, self, cycle)
    413 finally:

File ~\Saransh_softwares\Python_3.9\lib\site-packages\IPython\lib\pretty.py:778, in _repr_pprint(obj, p, cycle)
    776 """A pprint that just redirects to the normal repr function."""
    777 # Find newlines and replace them with p.break_()
--> 778 output = repr(obj)
    779 lines = output.splitlines()
    780 with p.group():

File ~\Saransh_softwares\Python_3.9\lib\site-packages\awkward\_v2\highlevel.py:1252, in Array.__repr__(self)
   1250 pytype = type(self).__name__
   1251 if self._layout.nplike.known_shape and self._layout.nplike.known_data:
-> 1252     valuestr = " " + awkward._v2._prettyprint.valuestr(self, 1, 50)
   1253     typestr = repr(str(self.type))[1:-1]
   1254 else:

AttributeError: module 'awkward._v2' has no attribute '_v2'

In [8]: out = array.rotateZ(1)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Input In [8], in <cell line: 1>()
----> 1 out = array.rotateZ(1)

File ~\Saransh_softwares\Python_3.9\lib\site-packages\awkward\_v2\highlevel.py:1137, in Array.__getattr__(self, where)
   1132         raise AttributeError(
   1133             "while trying to get field {}, an exception "
   1134             "occurred:\n{}: {}".format(repr(where), type(err), str(err))
   1135         )
   1136 else:
-> 1137     raise AttributeError(f"no field named {where!r}")

AttributeError: no field named 'rotateZ'

henryiii · 2022-08-06T17:08:26Z

I’d guess Awkward is not careful to do relative imports. But you could try adding _v2 to itself.

Saransh-cpp · 2022-08-06T19:19:01Z

I’d guess Awkward is not careful to do relative imports. But you could try adding _v2 to itself.

Thanks! This seems to work! But flake8 throws an error -

tests/conftest.py:8:5: B010 Do not call setattr with a constant attribute value, it is not any safer than normal property access.

conftest.py -

import os
import sys

if os.environ.get("VECTOR_USE_AWKWARDV2", None):
    import awkward._v2

    sys.modules["awkward"] = awkward._v2
    setattr(sys.modules["awkward"], "_v2", awkward._v2)

Ignoring that error, the tests still fail as tolist() behaves weirdly after patching -

def test_rotateZ():
    array = vector.Array([[{"pt": 1, "phi": 0}], [], [{"pt": 2, "phi": 1}]])
    out = array.rotateZ(1)
    assert isinstance(out, vector.backends.awkward.MomentumArray2D)
    print(out)
    print(out.tolist())
    assert out.tolist() == [[{"rho": 1, "phi": 1}], [], [{"rho": 2, "phi": 2}]]

...

Running VECTOR_USE_AWKWARDV2=1 pytest -s prints the following -

[{rho: 1, phi: 1}], [], [{rho: 2, phi: 2}]]
[<MomentumArray2D [{rho: 1, phi: 1}] type='1 * Momentum2D[rho: int64, phi: f...'>, <MomentumArray2D [] type='0 * Momentum2D[rho: int64, phi: float64]'>, <MomentumArray2D [{rho: 2, phi: 2}] type='1 * Momentum2D[rho: int64, phi: f...'>]

and gives the same error as before (but at least I can print the awkward vectors now, thanks to setattr).

Pushing the commit for the full stacktrace.

Saransh-cpp · 2022-08-07T11:33:47Z

If I don't patch but replace all the awkward imports with awkward._v2 imports, this happens -

In [1]: import vector

In [2]: array = vector.Array([[{"pt": 1, "phi": 0}], [], [{"pt": 2, "phi": 1}]])

In [3]: array
Out[3]: <MomentumArray2D [[{rho: 1, phi: 0}], [], [{rho: 2, ...}]] type='3 * var * ...'>

In [4]: out = array.rotateZ(1)

In [5]: out
Out[5]: <MomentumArray2D [[{rho: 1, phi: 1}], [], [{rho: 2, ...}]] type='3 * var * ...'>

In [6]: out.tolist()
Out[6]: 
[<Array [{rho: 1, phi: 1}] type='1 * Momentum2D[rho: int64, phi: float64]'>,
 <Array [] type='0 * Momentum2D[rho: int64, phi: float64]'>,
 <Array [{rho: 2, phi: 2}] type='1 * Momentum2D[rho: int64, phi: float64]'>]

The behaviors observed after patching and after replacing are definitely different.

Signed-off-by: Henry Schreiner <henryschreineriii@gmail.com>

.github/workflows/ci.yml

Saransh-cpp · 2022-08-29T14:37:25Z

@henryiii, @jpivarski, this should be finally ready for a review! I have marked some tests with xfail because their failures come from awkward._v2.

jpivarski · 2022-08-30T15:53:41Z

src/vector/backends/awkward_constructors.py

-    fields = awkward.fields(akarray)
+    fields = awkward.fields(akarray).copy()


Okay, so I hadn't noticed that Awkward v2's fields are passed by reference, which exposes them to the danger that someone might modify them downstream:

v1:

>>> array = awkward.Array([{"x": 1, "y": 1.1}]) >>> fields = awkward.fields(array) >>> array <Array [{x: 1, y: 1.1}] type='1 * {"x": int64, "y": float64}'> >>> fields ['x', 'y'] >>> fields[0] = "XXX" >>> fields ['XXX', 'y'] >>> array <Array [{x: 1, y: 1.1}] type='1 * {"x": int64, "y": float64}'>

v2:

>>> array = awkward._v2.Array([{"x": 1, "y": 1.1}]) >>> fields = awkward._v2.fields(array) >>> array <Array [{x: 1, y: 1.1}] type='1 * {x: int64, y: float64}'> >>> fields ['x', 'y'] >>> fields[0] = "XXX" >>> fields ['XXX', 'y'] >>> array <Array [{XXX: 1, y: 1.1}] type='1 * {XXX: int64, y: float64}'>

It could be fixed here, in Awkward, or maybe here (to only suffer the list-copy when handing it off to a user, so that internal uses can still be by reference).

I'll use this comment to open an issue in Awkward. Once awkward.fields is guarded, your .copy() can be removed, but it can also not be removed with no consequences but a little performance.

jpivarski

This looks very good to me! I wouldn't want the switch between versions to be an environment variable in the long term, but it won't have to be in the long term.

I'll keep you in the loop on our splitting of the Awkward codebase into v2-only main and the status quo main-v1, and that would simplify the switch in your testing: it would just depend on Python package version number (because "awkward" in 2.0.0rc1 onward will simply be v2).

Saransh-cpp · 2022-08-30T16:41:02Z

Thanks for the review, @jpivarski! Yes, all this env variable stuff will be removed once vector officially supports awkward v2. I can take up the awkward issue too (which looks like a quick fix)!

Saransh-cpp · 2022-09-01T08:51:33Z

I'll wait for a review from @henryiii before merging this in (as he suggested the environment variable solution). .copy() can be removed in another PR when awkward makes a new release!

Saransh-cpp

Now that scikit-hep/awkward#1652 and scikit-hep/awkward#1650 are merged, we can make the following changes -

(Should wait till a new release. IMO it should be okay to make these changes in a follow-up PR.)

Saransh-cpp · 2022-09-01T17:16:49Z

tests/test_issues.py

+# this is a known issue in awkward._v2
+# see https://github.com/scikit-hep/awkward/issues/1600
+# TODO: ensure this passes once awkward v2 is out
+@pytest.mark.xfail(
+    strict=True if os.environ.get("VECTOR_USE_AWKWARDV2") is not None else False
+)


This can be removed.

Suggested change

# this is a known issue in awkward._v2

# see https://github.com/scikit-hep/awkward/issues/1600

# TODO: ensure this passes once awkward v2 is out

@pytest.mark.xfail(

strict=True if os.environ.get("VECTOR_USE_AWKWARDV2") is not None else False

)

Saransh-cpp · 2022-09-01T17:17:35Z

src/vector/backends/awkward_constructors.py

@@ -294,7 +314,7 @@ def Array(*args: typing.Any, **kwargs: typing.Any) -> typing.Any:

    if not _is_type_safe(array_type):
        raise TypeError("a coordinate must be of the type int or float")
-    fields = awkward.fields(akarray)
+    fields = awkward.fields(akarray).copy()


.copy() can be removed.

Suggested change

fields = awkward.fields(akarray).copy()

fields = awkward.fields(akarray)

Saransh-cpp marked this pull request as draft July 20, 2022 07:56

Saransh-cpp marked this pull request as ready for review July 21, 2022 04:06

Saransh-cpp requested a review from jpivarski July 21, 2022 04:06

Saransh-cpp changed the title ~~Upper cap awkward to v2~~ chore: support awkward v1 and v2 together Jul 21, 2022

henryiii force-pushed the saransh/upper-cap-awkward branch from 0351277 to 2f737b3 Compare July 22, 2022 20:18

Saransh-cpp force-pushed the saransh/upper-cap-awkward branch 2 times, most recently from 97c3093 to 97b5623 Compare July 29, 2022 15:52

Saransh-cpp and others added 6 commits August 5, 2022 13:14

Upper cap awkward to v2

9ef9392

Update environment.yml

5f986e2

Support both awkward v1 and v2

b75944a

refactor: run tests with awkward._v2

f63e203

Signed-off-by: Henry Schreiner <henryschreineriii@gmail.com>

Test awkward._v2 backend in workflows

ad4cc6c

Fix few tests

bb67283

Saransh-cpp force-pushed the saransh/upper-cap-awkward branch from 97b5623 to bb67283 Compare August 5, 2022 07:45

patch sys.modules['awkward']._v2 + consistent import names

af340d8

henryiii mentioned this pull request Aug 16, 2022

Consistency between __getitem__ and to_list when a Record is the top-level scikit-hep/awkward#1600

Closed

Saransh-cpp and others added 2 commits August 17, 2022 01:30

Revert some changes

a955f02

fix: move __getitem__ to Protocol

81a4793

Signed-off-by: Henry Schreiner <henryschreineriii@gmail.com>

chore: pre-commit autoupdate

8089636

Signed-off-by: Henry Schreiner <henryschreineriii@gmail.com>

henryiii reviewed Aug 26, 2022

View reviewed changes

.github/workflows/ci.yml Show resolved Hide resolved

henryiii and others added 5 commits August 26, 2022 14:31

ci: require Awkward RC for v2

7e702db

Update pickle file for v2

eaa8f56

Use quotes

5e381ce

Mark failing tests with xfail

f3382d9

Tidy up

56c83b8

Saransh-cpp requested a review from henryiii August 29, 2022 14:34

Saransh-cpp added this to the v0.9.0 milestone Aug 29, 2022

Saransh-cpp added tests More or better tests are needed dependencies Pull requests that update a dependency file labels Aug 29, 2022

Saransh-cpp mentioned this pull request Aug 29, 2022

Fix all tests marked with @pytest.mark.xfail once Awkward v2 is out #249

Closed

jpivarski reviewed Aug 30, 2022

View reviewed changes

jpivarski mentioned this pull request Aug 30, 2022

ak.fields (v2) passes a RecordArray's internal fields by reference scikit-hep/awkward#1648

Closed

jpivarski approved these changes Aug 30, 2022

View reviewed changes

Saransh-cpp commented Sep 1, 2022

View reviewed changes

Saransh-cpp force-pushed the saransh/upper-cap-awkward branch from 4d8e51f to c627111 Compare September 3, 2022 07:35

Sync with awkward v1.9.0

8dc18a6

Saransh-cpp force-pushed the saransh/upper-cap-awkward branch from c627111 to 8dc18a6 Compare September 3, 2022 07:36

henryiii approved these changes Sep 4, 2022

View reviewed changes

Saransh-cpp merged commit 1f09bb2 into scikit-hep:main Sep 4, 2022

Saransh-cpp deleted the saransh/upper-cap-awkward branch September 4, 2022 13:39

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

chore: support `awkward` `v1` and `v2` together #226

chore: support `awkward` `v1` and `v2` together #226

Saransh-cpp commented Jul 20, 2022 •

edited

Loading

codecov-commenter commented Jul 20, 2022 •

edited

Loading

henryiii commented Jul 21, 2022

jpivarski commented Jul 21, 2022

Saransh-cpp commented Jul 21, 2022 •

edited

Loading

henryiii commented Jul 21, 2022

Saransh-cpp commented Jul 22, 2022 •

edited

Loading

henryiii commented Jul 22, 2022

Saransh-cpp commented Jul 23, 2022

Saransh-cpp commented Aug 6, 2022 •

edited

Loading

henryiii commented Aug 6, 2022

Saransh-cpp commented Aug 6, 2022

Saransh-cpp commented Aug 7, 2022

Saransh-cpp commented Aug 29, 2022

jpivarski Aug 30, 2022

jpivarski left a comment

Saransh-cpp commented Aug 30, 2022

Saransh-cpp commented Sep 1, 2022

Saransh-cpp left a comment

Saransh-cpp Sep 1, 2022

Saransh-cpp Sep 1, 2022

		fields = awkward.fields(akarray)
		fields = awkward.fields(akarray).copy()

chore: support awkward v1 and v2 together #226

chore: support awkward v1 and v2 together #226

Conversation

Saransh-cpp commented Jul 20, 2022 • edited Loading

codecov-commenter commented Jul 20, 2022 • edited Loading

Codecov Report

henryiii commented Jul 21, 2022

jpivarski commented Jul 21, 2022

Saransh-cpp commented Jul 21, 2022 • edited Loading

henryiii commented Jul 21, 2022

Saransh-cpp commented Jul 22, 2022 • edited Loading

henryiii commented Jul 22, 2022

Saransh-cpp commented Jul 23, 2022

Saransh-cpp commented Aug 6, 2022 • edited Loading

henryiii commented Aug 6, 2022

Saransh-cpp commented Aug 6, 2022

Saransh-cpp commented Aug 7, 2022

Saransh-cpp commented Aug 29, 2022

jpivarski Aug 30, 2022

Choose a reason for hiding this comment

jpivarski left a comment

Choose a reason for hiding this comment

Saransh-cpp commented Aug 30, 2022

Saransh-cpp commented Sep 1, 2022

Saransh-cpp left a comment

Choose a reason for hiding this comment

Saransh-cpp Sep 1, 2022

Choose a reason for hiding this comment

Saransh-cpp Sep 1, 2022

Choose a reason for hiding this comment

chore: support `awkward` `v1` and `v2` together #226

chore: support `awkward` `v1` and `v2` together #226

Saransh-cpp commented Jul 20, 2022 •

edited

Loading

codecov-commenter commented Jul 20, 2022 •

edited

Loading

Saransh-cpp commented Jul 21, 2022 •

edited

Loading

Saransh-cpp commented Jul 22, 2022 •

edited

Loading

Saransh-cpp commented Aug 6, 2022 •

edited

Loading