Allow for more expressive Array signatures #12

ramonhagenaars · 2020-02-06T21:01:56Z

See also issues #9, #10 and #11.

There have been several requests to extend the expressiveness of Array. I'm not keen on a sudden change to the signature of Array. Rather, I'd like to introduce a new type, NDArray (a name I like better than Array anyway), that will "slowly" replace Array.

I have the following signature in mind:

Signature design
NDArray any dimension of any size of any type
NDArray[...] 1 dimension of any size of any type
NDArray[3] 1 dimension of size 3 of any type
NDArray[(3, 3, 5)] 3 dimensions (3 x 3 x 5) of any type
NDArray[(3, ..., 5)] 3 dimensions (3 x ? x 5) of any type
NDArray[(D1, 3, D1)] 3 dimensions (D1 x 3 x D1 where D1 is an nptyping constant that can be imported to express a dimension variable, see #9 and #11) of any type

NDArray[int] any dimension of any size of type int
NDArray[..., int] 1 dimension of any size of type int
NDArray[(3, 3, 5), int] 3 dimensions (3 x 3 x 5) of type int
NDArray[(3, 3, 5), np.dtype('int16')] 3 dimensions (3 x 3 x 5) of type int16
NDArray[(3, 3), np.dtype([('f1', np.int16), ('f2', np.int16)])] 2 dimensions (3 x 3) with structured types
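To make the shape part of the table concrete, here is a minimal sketch of how a concrete shape could be compared against such a specification. The helper name shape_matches is hypothetical and is not part of nptyping; it only covers integer sizes and the ... wildcard as listed above, and it ignores dtypes and dimension variables such as D1.

```python
def shape_matches(shape, spec):
    """Return True if a concrete shape tuple satisfies a shape spec.

    Hypothetical helper for illustration only, not an nptyping function.
    """
    # A bare int or a bare ... describes a 1-dimensional array.
    if not isinstance(spec, tuple):
        spec = (spec,)
    # The length of the tuple fixes the number of dimensions.
    if len(shape) != len(spec):
        return False
    # ... accepts any size for that one dimension; an int must match exactly.
    return all(want is Ellipsis or want == got
               for want, got in zip(spec, shape))


print(shape_matches((3, 3, 5), (3, 3, 5)))    # exact sizes
print(shape_matches((3, 7, 5), (3, ..., 5)))  # middle dimension is free
print(shape_matches((10,), ...))              # 1 dimension of any size
```

Under this reading, the number of dimensions is always implied by the length of the tuple, which is why a bare ... stands for exactly one dimension.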

Process
The new NDArray is to replace the current Array. Once NDArray is introduced, the original Array will be deprecated and then removed in the next minor release.

Before I start investing time into this, I'd love to hear your opinion on this. Please leave any feedback, any comments, any suggestions.


nannau · 2020-02-07T00:26:44Z

Thank you!

I think the proposed signature design sounds great, and incorporates clearly how to embed the dimension/rank of arrays - my inquiry in #10. Renaming to NDArray is also more in the spirit of numpy arrays themselves. Having the sizes defined in a tuple (ndim > 1), and the dimensions/rank defined implicitly by the size of the tuple makes a lot of sense, too.

It would extend many of the efficiencies of type hinting into the science stack realm, which is an area that could greatly benefit from this!

I'd love to help, where I can.

jameshiebert · 2020-02-07T16:49:28Z

This seems like a solid proposal to me. It would cover the majority of our potential use cases, and I don't foresee many downsides.

The only thing that I see that could be missing is possibly declaring rank without specifying the exact dimension sizes. Not sure how hard it would be to implement or even how useful it is from a typing perspective. But I know that we occasionally have arrays of fixed rank and length that might have their dimensions resized. E.g. a basic transpose operation would fit under this use case.

Edit: Never mind paragraph two. A colleague pointed out to me that something like NDArray[(..., ...)] could cover exactly this use case.

alimanfoo · 2020-02-07T23:19:32Z

This sounds great. Would there be a way to name your own dimension variables? That could be very helpful for documenting the intent of each dimension, e.g. NDArray[(LAT, LON), float] or NDArray[(VARIANTS, SAMPLES, PLOIDY), int].

ramonhagenaars · 2020-02-08T11:21:02Z

@alimanfoo, I think you want to explicitly name the columns of a dimension, rather than the dimension itself. Correct me if I'm wrong.

This is what the signature of an array of coordinates would look like:
NDArray[(..., 2), float] indefinite number of rows, with 2 columns (lat, lon).

So in your case, you want to further elaborate on that 2. With the current design, you could declare the constants LAT = 1 and LON = 1. Then you could write:
NDArray[(..., LAT + LON), float]

We could take this one step further though, by introducing something that allows you to be more precise on what a column value should be:

from nptyping import NamedColumn

# NamedColumn takes a name and an optional predicate to validate a value.
lat = NamedColumn('latitude', lambda x: x >= 0)
lon = NamedColumn('longitude', lambda x: x >= 0)

NDArray[(..., (lat, lon)), float]

The optional predicate of a NamedColumn would allow the isinstance check of NDArray to validate the correctness of the values of those columns.

With this, you could also write:

from nptyping import NamedColumn

lat = NamedColumn('latitude', lambda x: isinstance(x, float) and x >= 0)
lon = NamedColumn('longitude', lambda x: isinstance(x, float) and x >= 0)

NDArray[(..., (lat, lon))]  # indefinite number of coordinates
NDArray[(5, (lat, lon))]    # 5 coordinates

Or even something like this:

somewhere_in_europe = NamedColumn('coordinate somewhere in Europe', lambda x: is_in_polygon(x, EU))
somewhere_in_usa = NamedColumn('coordinate somewhere in USA', lambda x: is_in_polygon(x, USA))

NDArray[((somewhere_in_europe, somewhere_in_usa), (lat, lon))]    # 2 coordinates

Keep in mind that instance checks get more expensive as the typings become more precise. I would recommend type checking only during development anyway, not in a production environment.

Does this extension with NamedColumn make sense? It may be introduced in a following stage after releasing the NDArray.
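A rough sketch of how such a predicate-based check might behave is shown here. NamedColumn is only proposed in this thread, so the class, its accepts method, and the rows_match helper are all assumptions made for illustration, and the range predicates are example values rather than the ones from the snippets above.

```python
from typing import Any, Callable, Optional, Sequence


class NamedColumn:
    """Hypothetical stand-in for the proposed NamedColumn (not an nptyping API)."""

    def __init__(self, name: str,
                 predicate: Optional[Callable[[Any], bool]] = None):
        self.name = name
        self.predicate = predicate

    def accepts(self, value: Any) -> bool:
        # A column without a predicate accepts every value.
        return self.predicate is None or bool(self.predicate(value))


def rows_match(rows: Sequence[Sequence[Any]],
               columns: Sequence[NamedColumn]) -> bool:
    """Check every row against the named columns, one value per column."""
    for row in rows:
        if len(row) != len(columns):
            return False
        if not all(col.accepts(v) for col, v in zip(columns, row)):
            return False
    return True


# Example predicates chosen for illustration only.
lat = NamedColumn('latitude', lambda x: -90 <= x <= 90)
lon = NamedColumn('longitude', lambda x: -180 <= x <= 180)

print(rows_match([(52.1, 4.3), (48.9, 2.4)], (lat, lon)))
print(rows_match([(152.1, 4.3)], (lat, lon)))
```

The check has to visit every value in every row, which illustrates why more precise typings make instance checks more expensive.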

ramonhagenaars · 2020-04-04T18:21:08Z

The major part of this issue has been addressed and is released in v1.0.0. Next in line are the dimension variables and the named columns.

alimanfoo · 2020-04-04T19:13:19Z

Great news!

nannau · 2020-04-08T18:24:02Z

Awesome news, and work!

petered · 2022-01-07T12:43:57Z

This is great and very useful. Since we use arrays everywhere, it would be really nice to have a less "brackety" syntax that allows you to name your dimensions to signify that you expect consistency, like:

def compute_image_mask(image: NDArray['H,W,3', np.uint8]) -> NDArray['H,W', bool]: ...
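One way to read "expect consistency" is that a name such as H must resolve to the same size everywhere it appears in a signature. The sketch below checks that idea on plain shape tuples; bind_shape and the bindings dictionary are hypothetical names for illustration, not something nptyping provides.

```python
def bind_shape(expr: str, shape: tuple, bindings: dict) -> bool:
    """Match a shape against an expression like 'H,W,3'.

    Hypothetical helper: numeric tokens must equal the size, and named
    tokens are recorded in `bindings` and must stay consistent.
    """
    tokens = [token.strip() for token in expr.split(',')]
    if len(tokens) != len(shape):
        return False
    for token, size in zip(tokens, shape):
        if token.isdigit():
            # A literal size must match exactly.
            if int(token) != size:
                return False
        elif bindings.setdefault(token, size) != size:
            # The name was bound earlier to a different size.
            return False
    return True


bindings = {}
print(bind_shape('H,W,3', (480, 640, 3), bindings))  # binds H and W
print(bind_shape('H,W', (480, 640), bindings))       # same H and W
print(bind_shape('H,W', (480, 639), bindings))       # W disagrees
```

Sharing one bindings dictionary between the argument and the return value is what ties the two annotations together.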

petered · 2022-01-07T12:47:39Z

Second thing:

NDArray[..., int] 1 dimension of any size of type int

I think this conflicts with the way Ellipsis (...) is used in NumPy and could cause confusion. In NumPy, ... means "all remaining axes", e.g.

>>> arr = np.random.randn(5, 4, 3)
>>> arr[..., 0].shape
(5, 4)
>>> arr[..., 0, 0].shape
(5,)
>>> arr[..., 0, 0, 0].shape
()

Whereas : means "an axis":

>>> arr[:, 0].shape
(5, 3)
>>> arr[:, 0, 0].shape
(5,)
>>> arr[:, 0, 0, 0].shape
IndexError: too many indices for array: array is 3-dimensional, but 4 were indexed

Since : cannot be used as an object in NumPy, could we just use typing.Any instead? E.g. NDArray[Any, int] to represent an arbitrarily-sized 1-dimensional array (or a named dimension like NDArray['N_points', int] as suggested above).

I hope it's not too late to change, but I would propose that ... be used to signify "zero or more axes", as this is also a very useful thing to be able to type. Trivial example:

def take_xy_locations(points: NDArray[(..., 3), float]) -> NDArray[(..., 2), float]:
    return points[..., :2]

assert take_xy_locations(np.random.randn(4, 3)).shape == (4, 2)
assert take_xy_locations(np.random.randn(5, 4, 3)).shape == (5, 4, 2)
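The matching rule behind this proposal could look roughly like the following sketch, which works on plain shape tuples. The function names are hypothetical, it assumes at most one ... per spec, and it uses typing.Any for "exactly one axis of any size" as suggested above.

```python
from typing import Any


def _dims_match(spec, shape):
    # Same number of axes; Any accepts any size, an int must match exactly.
    return len(spec) == len(shape) and all(
        want is Any or want == got for want, got in zip(spec, shape))


def matches_zero_or_more(shape, spec):
    """Hypothetical matcher where ... stands for zero or more axes."""
    if spec.count(Ellipsis) > 1:
        raise ValueError("at most one ... is supported in this sketch")
    if Ellipsis not in spec:
        return _dims_match(spec, shape)
    i = spec.index(Ellipsis)
    head, tail = spec[:i], spec[i + 1:]
    # The shape must have enough axes for the fixed parts on both sides.
    if len(shape) < len(head) + len(tail):
        return False
    return (_dims_match(head, shape[:len(head)])
            and _dims_match(tail, shape[len(shape) - len(tail):]))


print(matches_zero_or_more((4, 3), (..., 3)))     # one leading axis
print(matches_zero_or_more((5, 4, 3), (..., 3)))  # two leading axes
print(matches_zero_or_more((3,), (..., 3)))       # zero leading axes
```

With this rule a spec such as (..., 3) accepts any rank of at least one, while (Any, 3) would pin the rank to exactly two.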

kevinsuedmersen · 2023-03-20T20:14:05Z

Hi @ramonhagenaars

Thanks for this really cool repo.

I'm really looking forward to the NamedColumn feature that you described in this comment. Do you know when it can be released, or can you recommend a workaround in the meantime?

ramonhagenaars added the help wanted and feature labels Feb 6, 2020

This was referenced Feb 6, 2020

Rank of ndarray #10

Closed

Variables in shape #11

Closed

ramonhagenaars mentioned this issue Mar 30, 2020

Release/1.0.0 #15

Merged

ramonhagenaars added the WIP label and removed the help wanted label Apr 4, 2020

ramonhagenaars mentioned this issue Jun 25, 2020

Named dimensions in type-hint shapes #24

Closed

EricCousineau-TRI mentioned this issue Aug 10, 2020

ndarray: Add naive Dimension implementation #28

Closed

ramonhagenaars mentioned this issue Mar 31, 2022

Release/2.0.0 #60

Merged

ramonhagenaars closed this as completed in #60 Apr 7, 2022
