chore: support awkward v1 and v2 together #226
Codecov Report
@@           Coverage Diff           @@
##             main    #226   +/-   ##
=======================================
  Coverage   82.63%   82.64%
=======================================
  Files          96       96
  Lines       10524    10525    +1
=======================================
+ Hits         8697     8698    +1
  Misses       1827     1827
Continue to review full report at Codecov.
How hard would it be to make it work with both? Caps are highly problematic. If someone has the following file:
Modern pip and other modern solvers will just pick the last version of Vector without the cap (and its custom error) together with awkward 2; that's a logically valid solution.
Well, in a major version update, you're allowed to change the public API, and a few things get renamed that would be visible here. I've found a couple by searching with grep, but there's nothing like actually doing tests.
The
There it is again.
This string, badly named
vector/src/vector/backends/awkward.py, lines 1601 to 1603 in cac88a2
In v2,
So, after reviewing the Vector code, I can see it has been written with enough decoupling from Awkward that it's possible to make it work for both v1 and v2. Testing that will be tricky, since all references to "
Thanks for the review, @henryiii and @jpivarski!
Oh, I had no idea about this behavior. I have made the required changes, and now
Yes, the tests are tricky. I'm still searching for ways to test both Awkward versions, but no luck so far.
The tricky part is loading it; I'll have to check with pytest. Otherwise we should be able to just monkeypatch it into sys.modules. At worst, it could be an environment-variable setting when you run the tests.
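A minimal sketch of the sys.modules approach, gated on an environment variable. The helper name alias_module, the stand-in module fakeawk, and the variable FAKE_USE_V2 are all hypothetical (the variable this PR eventually uses is VECTOR_USE_AWKWARDV2); the point is only that `import name` is served from sys.modules, so the alias must be installed before the code under test imports the package.

```python
import os
import sys
import types
from typing import Mapping, Optional


def alias_module(
    name: str,
    replacement: types.ModuleType,
    env_var: str,
    environ: Optional[Mapping[str, str]] = None,
) -> bool:
    """Make `import <name>` resolve to `replacement` when `env_var` is set.

    Hypothetical helper: must run before anything imports `name`
    (e.g. at the top of conftest.py). Returns True if the alias was installed.
    """
    env = os.environ if environ is None else environ
    if not env.get(env_var):
        return False
    sys.modules[name] = replacement
    return True


# Demo with stand-in modules (no real awkward needed).
fake_pkg = types.ModuleType("fakeawk")
fake_pkg.VERSION = 1
fake_pkg._v2 = types.ModuleType("fakeawk._v2")
fake_pkg._v2.VERSION = 2
sys.modules["fakeawk"] = fake_pkg

installed = alias_module("fakeawk", fake_pkg._v2, "FAKE_USE_V2", {"FAKE_USE_V2": "1"})
import fakeawk  # served straight from sys.modules, so this is the v2 stand-in
```

Passing an explicit mapping instead of reading os.environ directly is just a convenience for testing the helper; in a real conftest.py the default would be used.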
I tried using
import awkward


def test_basic(monkeypatch):
    # Point both Vector backend modules at the v2 namespace for this test only.
    monkeypatch.setattr("vector.backends.awkward.awkward", awkward._v2, raising=True)
    monkeypatch.setattr("vector.backends.awkward_constructors.awkward", awkward._v2, raising=True)
    ...

import inspect
import typing

import importlib_metadata
import packaging.version


def _is_awkward_v2(obj: typing.Any) -> bool:
    # v2 if the installed package is >= 2, or if obj comes from the awkward._v2 namespace.
    module = inspect.getmodule(obj)
    return packaging.version.Version(
        importlib_metadata.version("awkward")
    ) >= packaging.version.Version("2") or (
        module is not None and "awkward._v2" in module.__name__
    )
Using an environment variable seems to work, and with
I'll push what I was thinking about; it can be run with:
You can revert and continue with your proposal above, or continue on this path. But I thought this was the easiest way to show what I was thinking about.
This solution looks much simpler! Thanks, @henryiii! I made some fixes after your commit, but the tests are still failing for a few reasons. I was able to pinpoint a couple of them:
@@ -314,7 +315,7 @@ def Array(*args: typing.Any, **kwargs: typing.Any) -> typing.Any:
if not _is_type_safe(array_type):
raise TypeError("a coordinate must be of the type int or float")
- fields = awkward.fields(akarray)
+ fields = awkward.fields(akarray).copy()
is_momentum, dimension, names, arrays = _check_names(akarray, fields)
Changing
@@ -65,7 +65,7 @@ def test_rotateZ():
array = vector.Array([[{"pt": 1, "phi": 0}], [], [{"pt": 2, "phi": 1}]])
out = array.rotateZ(1)
assert isinstance(out, vector.backends.awkward.MomentumArray2D)
- assert out.tolist() == [[{"rho": 1, "phi": 1}], [], [{"rho": 2, "phi": 2}]]
+ assert [arr.tolist() for arr in out.tolist()] == [[{"rho": 1, "phi": 1}], [], [{"rho": 2, "phi": 2}]]
array = vector.Array(
[[{"x": 1, "y": 0, "wow": 99}], [], [{"x": 2, "y": 1, "wow": 123}]]
Running this test manually:
Running tests using
Full stacktrace:
_____________________________________________________ test_rotateZ _____________________________________________________
def test_rotateZ():
array = vector.Array([[{"pt": 1, "phi": 0}], [], [{"pt": 2, "phi": 1}]])
out = array.rotateZ(1)
assert isinstance(out, vector.backends.awkward.MomentumArray2D)
> assert [arr.tolist() for arr in out.tolist()] == [[{"rho": 1, "phi": 1}], [], [{"rho": 2, "phi": 2}]]
tests/backends/test_awkward.py:68:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/home/saransh/.local/lib/python3.8/site-packages/numpy/lib/mixins.py:21: in func
return ufunc(self, other)
/home/saransh/.local/lib/python3.8/site-packages/awkward/_v2/highlevel.py:2001: in __array_ufunc__
return ak._v2._connect.numpy.array_ufunc(ufunc, method, inputs, kwargs)
/home/saransh/.local/lib/python3.8/site-packages/awkward/_v2/_connect/numpy.py:269: in array_ufunc
out = ak._v2._broadcasting.broadcast_and_apply(
/home/saransh/.local/lib/python3.8/site-packages/awkward/_v2/_broadcasting.py:738: in broadcast_and_apply
out = apply_step(
/home/saransh/.local/lib/python3.8/site-packages/awkward/_v2/_broadcasting.py:718: in apply_step
return continuation()
/home/saransh/.local/lib/python3.8/site-packages/awkward/_v2/_broadcasting.py:483: in continuation
outcontent = apply_step(
/home/saransh/.local/lib/python3.8/site-packages/awkward/_v2/_broadcasting.py:718: in apply_step
return continuation()
/home/saransh/.local/lib/python3.8/site-packages/awkward/_v2/_broadcasting.py:235: in continuation
return apply_step(
/home/saransh/.local/lib/python3.8/site-packages/awkward/_v2/_broadcasting.py:718: in apply_step
return continuation()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
def continuation():
# Any EmptyArrays?
if any(isinstance(x, EmptyArray) for x in inputs):
nextinputs = [
x.toNumpyArray(np.float64, nplike) if isinstance(x, EmptyArray) else x
for x in inputs
]
return apply_step(
nplike,
nextinputs,
action,
depth,
copy.copy(depth_context),
lateral_context,
behavior,
options,
)
# Any NumpyArrays with ndim != 1?
elif any(isinstance(x, NumpyArray) and x.data.ndim != 1 for x in inputs):
nextinputs = [
x.toRegularArray() if isinstance(x, NumpyArray) else x for x in inputs
]
return apply_step(
nplike,
nextinputs,
action,
depth,
copy.copy(depth_context),
lateral_context,
behavior,
options,
)
# Any IndexedArrays?
elif any(isinstance(x, IndexedArray) for x in inputs):
nextinputs = [
x.project() if isinstance(x, IndexedArray) else x for x in inputs
]
return apply_step(
nplike,
nextinputs,
action,
depth,
copy.copy(depth_context),
lateral_context,
behavior,
options,
)
# Any UnionArrays?
elif any(isinstance(x, UnionArray) for x in inputs):
if not nplike.known_data:
numtags, length = [], None
for x in inputs:
if isinstance(x, UnionArray):
numtags.append(len(x.contents))
if length is None:
length = x.tags.data.shape[0]
assert length is not None
all_combos = list(itertools.product(*[range(x) for x in numtags]))
tags = nplike.empty(length, dtype=np.int8)
index = nplike.empty(length, dtype=np.int64)
numoutputs, outcontents = None, []
for combo in all_combos:
nextinputs = []
i = 0
for x in inputs:
if isinstance(x, UnionArray):
nextinputs.append(x._contents[combo[i]])
i += 1
else:
nextinputs.append(x)
outcontents.append(
apply_step(
nplike,
nextinputs,
action,
depth,
copy.copy(depth_context),
lateral_context,
behavior,
options,
)
)
assert isinstance(outcontents[-1], tuple)
if numoutputs is None:
numoutputs = len(outcontents[-1])
else:
assert numoutputs == len(outcontents[-1])
assert numoutputs is not None
else:
tagslist, numtags, length = [], [], None
for x in inputs:
if isinstance(x, UnionArray):
tagslist.append(x.tags.raw(nplike))
numtags.append(len(x.contents))
if length is None:
length = tagslist[-1].shape[0]
elif length != tagslist[-1].shape[0]:
raise ValueError(
"cannot broadcast UnionArray of length {} "
"with UnionArray of length {}{}".format(
length, tagslist[-1].shape[0], in_function(options)
)
)
assert length is not None
combos = nplike.stack(tagslist, axis=-1)
all_combos = nplike.array(
list(itertools.product(*[range(x) for x in numtags])),
dtype=[(str(i), combos.dtype) for i in range(len(tagslist))],
)
combos = combos.view(
[(str(i), combos.dtype) for i in range(len(tagslist))]
).reshape(length)
tags = nplike.empty(length, dtype=np.int8)
index = nplike.empty(length, dtype=np.int64)
numoutputs, outcontents = None, []
for tag, combo in enumerate(all_combos):
mask = combos == combo
tags[mask] = tag
index[mask] = nplike.arange(
nplike.count_nonzero(mask), dtype=np.int64
)
nextinputs = []
i = 0
for x in inputs:
if isinstance(x, UnionArray):
nextinputs.append(x[mask].project(combo[str(i)]))
i += 1
elif isinstance(x, Content):
nextinputs.append(x[mask])
else:
nextinputs.append(x)
outcontents.append(
apply_step(
nplike,
nextinputs,
action,
depth,
copy.copy(depth_context),
lateral_context,
behavior,
options,
)
)
assert isinstance(outcontents[-1], tuple)
if numoutputs is None:
numoutputs = len(outcontents[-1])
else:
assert numoutputs == len(outcontents[-1])
assert numoutputs is not None
return tuple(
UnionArray(
Index8(tags), Index64(index), [x[i] for x in outcontents]
).simplify_uniontype()
for i in range(numoutputs)
)
# Any option-types?
elif any(isinstance(x, optiontypes) for x in inputs):
if nplike.known_data:
mask = None
for x in inputs:
if isinstance(x, optiontypes):
m = x.mask_as_bool(valid_when=False, nplike=nplike)
if mask is None:
mask = m
else:
mask = nplike.bitwise_or(mask, m, out=mask)
nextmask = Index8(mask.view(np.int8))
index = nplike.full(mask.shape[0], -1, dtype=np.int64)
index[~mask] = nplike.arange(
mask.shape[0] - nplike.count_nonzero(mask), dtype=np.int64
)
index = Index64(index)
if any(not isinstance(x, optiontypes) for x in inputs):
nextindex = nplike.arange(mask.shape[0], dtype=np.int64)
nextindex[mask] = -1
nextindex = Index64(nextindex)
nextinputs = []
for x in inputs:
if isinstance(x, optiontypes):
nextinputs.append(x.project(nextmask))
elif isinstance(x, Content):
nextinputs.append(
IndexedOptionArray(nextindex, x).project(nextmask)
)
else:
nextinputs.append(x)
else:
index = None
nextinputs = []
for x in inputs:
if isinstance(x, optiontypes):
index = Index64(nplike.empty((x.length,), np.int64))
nextinputs.append(x.content)
else:
nextinputs.append(x)
assert index is not None
outcontent = apply_step(
nplike,
nextinputs,
action,
depth,
copy.copy(depth_context),
lateral_context,
behavior,
options,
)
assert isinstance(outcontent, tuple)
return tuple(
IndexedOptionArray(index, x).simplify_optiontype() for x in outcontent
)
# Any list-types?
elif any(isinstance(x, listtypes) for x in inputs):
# All regular?
if all(
isinstance(x, RegularArray) or not isinstance(x, listtypes)
for x in inputs
):
maxsize = max(x.size for x in inputs if isinstance(x, RegularArray))
if nplike.known_data:
for x in inputs:
if isinstance(x, RegularArray):
if maxsize > 1 and x.size == 1:
tmpindex = Index64(
nplike.repeat(
nplike.arange(x.length, dtype=np.int64), maxsize
)
)
nextinputs = []
for x in inputs:
if isinstance(x, RegularArray):
if maxsize > 1 and x.size == 1:
nextinputs.append(
IndexedArray(
tmpindex, x.content[: x.length * x.size]
).project()
)
elif x.size == maxsize:
nextinputs.append(x.content[: x.length * x.size])
else:
raise ValueError(
"cannot broadcast RegularArray of size "
"{} with RegularArray of size {} {}".format(
x.size, maxsize, in_function(options)
)
)
else:
nextinputs.append(x)
else:
nextinputs = []
for x in inputs:
if isinstance(x, RegularArray):
nextinputs.append(x.content)
else:
nextinputs.append(x)
length = None
for x in inputs:
if isinstance(x, Content):
if length is None:
length = x.length
elif nplike.known_shape:
assert length == x.length
assert length is not None
outcontent = apply_step(
nplike,
nextinputs,
action,
depth + 1,
copy.copy(depth_context),
lateral_context,
behavior,
options,
)
assert isinstance(outcontent, tuple)
return tuple(RegularArray(x, maxsize, length) for x in outcontent)
elif not nplike.known_data or not nplike.known_shape:
offsets = None
nextinputs = []
for x in inputs:
if isinstance(x, ListOffsetArray):
offsets = Index64(
nplike.empty((x.offsets.data.shape[0],), np.int64)
)
nextinputs.append(x.content)
elif isinstance(x, ListArray):
offsets = Index64(
nplike.empty((x.starts.data.shape[0] + 1,), np.int64)
)
nextinputs.append(x.content)
elif isinstance(x, RegularArray):
nextinputs.append(x.content)
else:
nextinputs.append(x)
assert offsets is not None
outcontent = apply_step(
nplike,
nextinputs,
action,
depth + 1,
copy.copy(depth_context),
lateral_context,
behavior,
options,
)
assert isinstance(outcontent, tuple)
return tuple(ListOffsetArray(offsets, x) for x in outcontent)
# Not all regular, but all same offsets?
# Optimization: https://github.com/scikit-hep/awkward-1.0/issues/442
elif all_same_offsets(nplike, inputs):
lencontent, offsets, starts, stops = None, None, None, None
nextinputs = []
for x in inputs:
if isinstance(x, ListOffsetArray):
offsets = x.offsets
lencontent = offsets[-1]
nextinputs.append(x.content[:lencontent])
elif isinstance(x, ListArray):
starts, stops = x.starts, x.stops
if starts.length == 0 or stops.length == 0:
nextinputs.append(x.content[:0])
else:
lencontent = nplike.max(stops)
nextinputs.append(x.content[:lencontent])
else:
nextinputs.append(x)
outcontent = apply_step(
nplike,
nextinputs,
action,
depth + 1,
copy.copy(depth_context),
lateral_context,
behavior,
options,
)
assert isinstance(outcontent, tuple)
if isinstance(offsets, Index):
return tuple(
ListOffsetArray(offsets, x).toListOffsetArray64(False)
for x in outcontent
)
elif isinstance(starts, Index) and isinstance(stops, Index):
return tuple(
ListArray(starts, stops, x).toListOffsetArray64(False)
for x in outcontent
)
else:
raise AssertionError(
"unexpected offsets, starts: {}, {}".format(
type(offsets), type(starts)
)
)
# General list-handling case: the offsets of each list may be different.
else:
fcns = [
ak._v2._util.custom_broadcast(x, behavior)
if isinstance(x, Content)
else None
for x in inputs
]
first, secondround = None, False
for x, fcn in zip(inputs, fcns):
if (
isinstance(x, listtypes)
and not isinstance(x, RegularArray)
and fcn is None
):
first = x
break
if first is None:
secondround = True
for x in inputs:
if isinstance(x, listtypes) and not isinstance(x, RegularArray):
first = x
break
offsets = first._compact_offsets64(True)
nextinputs = []
for x, fcn in zip(inputs, fcns):
if callable(fcn) and not secondround:
nextinputs.append(fcn(x, offsets))
elif isinstance(x, listtypes):
nextinputs.append(x._broadcast_tooffsets64(offsets).content)
# Handle implicit left-broadcasting (non-NumPy-like broadcasting).
elif options["left_broadcast"] and isinstance(x, Content):
nextinputs.append(
RegularArray(x, 1, x.length)
._broadcast_tooffsets64(offsets)
.content
)
else:
nextinputs.append(x)
outcontent = apply_step(
nplike,
nextinputs,
action,
depth + 1,
copy.copy(depth_context),
lateral_context,
behavior,
options,
)
assert isinstance(outcontent, tuple)
return tuple(ListOffsetArray(offsets, x) for x in outcontent)
# Any RecordArrays?
elif any(isinstance(x, RecordArray) for x in inputs):
if not options["allow_records"]:
> raise ValueError(f"cannot broadcast records{in_function(options)}")
E ValueError: cannot broadcast recordsin equal
/home/saransh/.local/lib/python3.8/site-packages/awkward/_v2/_broadcasting.py:644: ValueError
Monkeypatching
I'd guess Awkward is not careful to use relative imports. But you could try adding
Thanks! This seems to work! But
conftest.py:
import os
import sys

# Opt in to Awkward v2 by setting VECTOR_USE_AWKWARDV2 before running pytest.
if os.environ.get("VECTOR_USE_AWKWARDV2", None):
    import awkward._v2

    # Make `import awkward` resolve to the v2 namespace...
    sys.modules["awkward"] = awkward._v2
    # ...while keeping `awkward._v2` reachable through the alias.
    sys.modules["awkward"]._v2 = awkward._v2
Ignoring that error, the tests still fail, as
def test_rotateZ():
    array = vector.Array([[{"pt": 1, "phi": 0}], [], [{"pt": 2, "phi": 1}]])
    out = array.rotateZ(1)
    assert isinstance(out, vector.backends.awkward.MomentumArray2D)
    print(out)
    print(out.tolist())
    assert out.tolist() == [[{"rho": 1, "phi": 1}], [], [{"rho": 2, "phi": 2}]]
    ...
Running
and gives the same error as before (but at least I can print the awkward vectors now, thanks to
Pushing the commit for the full stacktrace.
If I don't patch but replace all the
The behaviors observed after patching and after replacing are definitely different.
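One plausible source of such a difference (an assumption, not confirmed in this thread): Python binds a module object into each importer's namespace when `import` runs, so replacing an entry in sys.modules only affects imports that happen afterwards, while patching a module's own global (as monkeypatch.setattr did earlier) redirects only that one module. The sketch below shows both effects with hypothetical stand-in modules fakelib and consumer.

```python
import sys
import types

# Two stand-in "versions" of a package (hypothetical, for illustration only).
v1 = types.ModuleType("fakelib")
v1.VERSION = 1
v2 = types.ModuleType("fakelib_v2")
v2.VERSION = 2
sys.modules["fakelib"] = v1

# A consumer module that ran `import fakelib` before any swapping happened;
# its global name `fakelib` now refers to the v1 module object.
consumer = types.ModuleType("consumer")
exec("import fakelib\n\ndef version():\n    return fakelib.VERSION\n", consumer.__dict__)

# Swapping sys.modules afterwards only affects *later* imports.
sys.modules["fakelib"] = v2
import fakelib

stale = consumer.version()  # consumer still sees v1
fresh = fakelib.VERSION     # a fresh import sees v2

# Patching the consumer's own global redirects it, but only for that module.
consumer.fakelib = v2
patched = consumer.version()
```

This is why an alias installed in conftest.py has to take effect before the package under test is first imported, and why per-module patching can miss modules nobody remembered to patch.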
@henryiii, @jpivarski, this should finally be ready for review! I have marked some tests with
- fields = awkward.fields(akarray)
+ fields = awkward.fields(akarray).copy()
Okay, so I hadn't noticed that Awkward v2's fields are passed by reference, which exposes them to the danger that someone might modify them downstream:
v1:
>>> array = awkward.Array([{"x": 1, "y": 1.1}])
>>> fields = awkward.fields(array)
>>> array
<Array [{x: 1, y: 1.1}] type='1 * {"x": int64, "y": float64}'>
>>> fields
['x', 'y']
>>> fields[0] = "XXX"
>>> fields
['XXX', 'y']
>>> array
<Array [{x: 1, y: 1.1}] type='1 * {"x": int64, "y": float64}'>
v2:
>>> array = awkward._v2.Array([{"x": 1, "y": 1.1}])
>>> fields = awkward._v2.fields(array)
>>> array
<Array [{x: 1, y: 1.1}] type='1 * {x: int64, y: float64}'>
>>> fields
['x', 'y']
>>> fields[0] = "XXX"
>>> fields
['XXX', 'y']
>>> array
<Array [{XXX: 1, y: 1.1}] type='1 * {XXX: int64, y: float64}'>
It could be fixed here, in Awkward, or maybe here (to only suffer the list-copy when handing it off to a user, so that internal uses can still be by reference).
I'll use this comment to open an issue in Awkward. Once awkward.fields is guarded, your .copy() can be removed, but leaving it in has no consequence beyond a little performance.
This looks very good to me! I wouldn't want the switch between versions to be an environment variable in the long term, but it won't have to be.
I'll keep you in the loop on our splitting of the Awkward codebase into a v2-only main and the status-quo main-v1, which would simplify the switch in your testing: it would just depend on the Python package version number (because "awkward" from 2.0.0rc1 onward will simply be v2).
Thanks for the review, @jpivarski! Yes, all this env variable stuff will be removed once
I'll wait for a review from @henryiii before merging this in (as he suggested the environment variable solution).
Now that scikit-hep/awkward#1652 and scikit-hep/awkward#1650 are merged, we can make the following changes:
(We should wait until a new release. IMO it should be okay to make these changes in a follow-up PR.)
tests/test_issues.py
# this is a known issue in awkward._v2
# see https://github.com/scikit-hep/awkward/issues/1600
# TODO: ensure this passes once awkward v2 is out
@pytest.mark.xfail(
    strict=True if os.environ.get("VECTOR_USE_AWKWARDV2") is not None else False
)
This can be removed.
- # this is a known issue in awkward._v2
- # see https://github.com/scikit-hep/awkward/issues/1600
- # TODO: ensure this passes once awkward v2 is out
- @pytest.mark.xfail(
-     strict=True if os.environ.get("VECTOR_USE_AWKWARDV2") is not None else False
- )
@@ -294,7 +314,7 @@ def Array(*args: typing.Any, **kwargs: typing.Any) -> typing.Any:
if not _is_type_safe(array_type):
raise TypeError("a coordinate must be of the type int or float")
- fields = awkward.fields(akarray)
+ fields = awkward.fields(akarray).copy()
.copy() can be removed.
- fields = awkward.fields(akarray).copy()
+ fields = awkward.fields(akarray)