Scalar values/shapeless arrays related issues #1055

Closed
bfis opened this issue Aug 10, 2021 · 3 comments · Fixed by #1056
Labels
bug The problem described is something that must be fixed

Comments

@bfis
Contributor

bfis commented Aug 10, 2021

Version of Awkward Array

1.4.0

Description and code to reproduce

The support for shapeless arrays (i.e. numpy shape ()) seems to be either (intentionally?) missing or broken.

In case it is intentionally missing:

Other code, in particular functions that take a scalar argument (e.g. fill_none), may fail to operate as expected. For example, it is not possible to pass fill_none a fill value with a specific scalar type, so the result is always converted to the default type:

import awkward as ak
import numpy as np

a = ak.values_astype(ak.Array([1, None]), np.float32)
# value: <Array [1, None] type='2 * ?float32'>

ak.fill_none(a, 0)
# output: <Array [1, 0] type='2 * float64'>
# type gets changed, expected - the same behavior as in numpy

ak.fill_none(a, np.float32(0))
# output: <Array [1, 0] type='2 * float64'>
# type "hint" ignored: unexpected, but tolerable since this is a rarely used type, i.e. not an (nd)array

ak.fill_none(a, np.array([0], dtype=np.float32))
# output: <Array [1, [0]] type='2 * union[float32, 1 * float32]'>
# type is correctly retained, working as expected

ak.fill_none(a, np.array(0, dtype=np.float32))
# output: <Array [1, [0]] type='2 * union[float32, 1 * float32]'>
# type retained, but superfluous dimension inserted

During the internal conversion to an awkward type/layout the superfluous dimension gets inserted.

In case that there is supposed to be support for shapeless arrays, this illustrates the inconsistency:

a = np.array(0) # value: array(0)
b = ak.Array(a) # value: <Array [0] type='1 * int64'>
c = ak.to_numpy(b) # value: array([0])
assert a.shape == c.shape # this fails
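
The shape mismatch above can be reproduced with NumPy alone. This is a minimal sketch that uses np.atleast_1d as a stand-in for the promotion observed in the round trip; it is an assumption for illustration, not Awkward Array's actual code path.

```python
import numpy as np

a = np.array(0)                # zero-dimensional, shape ()
c = np.atleast_1d(a)           # stand-in for the observed round trip, shape (1,)

# A caller who still knows the original shape can restore it explicitly.
restored = c.reshape(a.shape)  # back to shape ()
```

Restoring the shape this way only works because the promoted array holds exactly one element.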
@bfis bfis added the bug (unverified) The problem described would be a bug, but needs to be triaged label Aug 10, 2021
@jpivarski
Member

Zero-dimensional shapes were deliberately left out, since this is how we express a scalar result. That is, we don't make a distinction between what NumPy would call float(0), np.float64(0), and np.array(0, np.float64). In Awkward 1.x, scalars come out as Python numbers (e.g. float(0)), but in Awkward 2.x development, we've started returning scalars as NumPy scalars (e.g. np.float64(0)), to allow results to keep their type.

I don't think the distinction NumPy makes between np.float64(0) and np.array(0, np.float64) is a useful one to make: neither is an array in the sense of having a length or being able to access items with square brackets. It is useful to be able to describe a scalar with types of different precision, though.
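
A NumPy-only sketch of this point: a Python float, a NumPy scalar, and a zero-dimensional array all report shape (), and none of them supports len() or integer indexing. The helper name describe is hypothetical and exists only for this illustration.

```python
import numpy as np

def describe(value):
    """Summarize how a scalar-like value behaves (hypothetical helper)."""
    try:
        len(value)
        has_len = True
    except TypeError:
        has_len = False
    try:
        value[0]
        indexable = True
    except (IndexError, TypeError):
        indexable = False
    return {
        "shape": np.shape(value),
        "has_len": has_len,
        "indexable": indexable,
        "dtype": np.asarray(value).dtype,
    }

python_float = describe(float(0))
numpy_scalar = describe(np.float64(0))
zero_dim_array = describe(np.array(0, np.float64))

# The precision, on the other hand, is carried by the dtype.
single_precision = describe(np.float32(0))
```

Only the dtype entry differs between the float64 and float32 cases, which is the part of the distinction worth preserving in a scalar.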

ak.fill_none has changed since 1.4.0; it now uses the type hint. (I remember that being a long-standing wish-list item, but I can't find the closed issue.)

>>> ak.fill_none(a, np.float32(0))
<Array [1, 0] type='2 * float32'>
>>> ak.fill_none(a, np.array(0, np.float32))
<Array [1, 0] type='2 * float32'>

On the other hand, it looks like we broke filling with a NumPy array in doing so:

>>> ak.fill_none(a, np.array([0], np.float32), axis=-1)
<Array [1, 0] type='2 * float32'>
>>> ak.fill_none(a, np.array([[0]], np.float32), axis=-1)
<Array [1, [0]] type='2 * union[float32, 1 * float32]'>

That's off by one dimension. There's no such trouble when the fill value is a non-NumPy iterable.

>>> ak.fill_none(a, 0, axis=-1)
<Array [1, 0] type='2 * float64'>
>>> ak.fill_none(a, [0], axis=-1)
<Array [1, [0]] type='2 * union[float32, 1 * int64]'>
>>> ak.fill_none(a, [[0]], axis=-1)
<Array [1, [[0]]] type='2 * union[float32, 1 * var * int64]'>

I bet that will be an easy fix; I'll do it now.

@jpivarski jpivarski linked a pull request Aug 10, 2021 that will close this issue
@jpivarski jpivarski added bug The problem described is something that must be fixed and removed bug (unverified) The problem described would be a bug, but needs to be triaged labels Aug 10, 2021
@jpivarski
Member

This is what's supposed to happen:

https://github.com/scikit-hep/awkward-1.0/blob/24397d8360a6760ebc99e8122aa25e799b6c02f9/tests/test_1055-fill_none-numpy-dimension.py#L11-L38

In case that there is supposed to be support for shapeless arrays, this illustrates the inconsistency:

a = np.array(0) # value: array(0)
b = ak.Array(a) # value: <Array [0] type='1 * int64'>
c = ak.to_numpy(b) # value: array([0])
assert a.shape == c.shape # this fails

On this point, I suppose we could make high-level functions (e.g. ak.from_numpy, ak.Array constructor) refuse to convert zero-dimensional arrays. Then, at least, it wouldn't give the impression of recognizing zero-dimensional arrays?
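
A rough sketch of what such a refusal could look like, written against NumPy only. The function name require_at_least_1d and the error message are hypothetical; this is not how ak.from_numpy or the ak.Array constructor is implemented.

```python
import numpy as np

def require_at_least_1d(array):
    """Hypothetical guard: reject zero-dimensional input instead of promoting it."""
    array = np.asarray(array)
    if array.ndim == 0:
        raise ValueError(
            "zero-dimensional arrays are treated as scalars; "
            "pass an array with at least one dimension"
        )
    return array

ok = require_at_least_1d(np.array([0]))   # passes through unchanged

try:
    require_at_least_1d(np.array(0))      # shape () is refused
    refused = False
except ValueError:
    refused = True
```

Raising early makes it explicit that shape () is not recognized, rather than silently returning an array of shape (1,).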

@bfis
Contributor Author

bfis commented Aug 10, 2021

Thanks for the quick and helpful replies.

[...]
On this point, I suppose we could make high-level functions (e.g. ak.from_numpy, ak.Array constructor) refuse to convert zero-dimensional arrays. Then, at least, it wouldn't give the impression of recognizing zero-dimensional arrays?

This seems like a good idea.
