Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Add annotations to the last 8 functions in numpy.core.fromnumeric #16729

Merged
merged 6 commits into from
Aug 24, 2020

Conversation

BvB93
Copy link
Member

@BvB93 BvB93 commented Jul 2, 2020

This pull requests adds annotations to 8 functions numpy.core.fromnumeric,
thus adding annotations to the last untyped functions from aforementioned module.

The functions in question are:

  • prod()
  • cumprod()
  • ndim()
  • size()
  • around()
  • mean()
  • std()
  • var()

The 8 respective functions, all located in `np.core.fromnumeric`, consist of:
* `prod`
* `cumprod`
* `ndim`
* `size`
* `around`
* `mean`
* `std`
* `var`
np.prod(a, axis=1.0) # E: No overload variant of "prod" matches argument type
np.prod(a, out=False) # E: No overload variant of "prod" matches argument type
np.prod(a, keepdims=1.0) # E: No overload variant of "prod" matches argument type
np.prod(a, initial=int) # E: No overload variant of "prod" matches argument type
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is legal for suitable a:

>>> a = np.array([], dtype=object)
>>> np.prod(a, initial=int)
int

Or for an almost plausible use:

>>> from ctypes import c_uint8
>>> shape = np.array([1, 2, 3])
>>> np.prod(shape.astype(object), initial=c_uint8)
# a ctypes array type
numpy.core.fromnumeric.c_ubyte_Array_1_Array_2_Array_3

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incidentally, this also proves that the return type of np.prod can be anything, not just number.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incidentally, this also proves that the return type of np.prod can be anything, not just number.

If I'm not mistaken this is limited to, the notoriously difficult to type, object arrays.
If so I feel it'd be better to leave things as they are for now, especially since this appears to be a very specific (seemingly undocumented?) situation.

The alternative would be to set number and the return type to Any,
which is practically useless from a typing standpoint.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

especially since this appears to be a very specific (seemingly undocumented?) situation.

How so? initial is documented as a "scalar", not as a number.

Perhaps the compromise is to stick # TODO: This is actually legal or similar by each test case that currently expects and results in an error, but should not.

Copy link
Member Author

@BvB93 BvB93 Jul 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How so? initial is documented as a "scalar", not as a number.

That's actually a good point; I more or less conflated scalar with numerical scalar here.

Perhaps the compromise is to stick # TODO: This is actually legal or similar by each test case that currently expects and results in an error, but should not.

I agree, I'll push an update in a bit.

As for the future:

What I've gathered is, in the case of object arrays, that np.prod() simply falls back to the .__mult__() method of the passed arrays individual elements and hence all that's required of initial is that its type is compatible with aforementioned elements.
For object arrays we may or may not be able to easily express this with a __mult__ Protocol once ndarray is generic over its dtype, though this will depend on the exact implementation details.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps the compromise is to stick # TODO: This is actually legal or similar by each test case that currently expects and results in an error, but should not.

Added in 3bc51b8.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For object arrays we may or may not be able to easily express this with a __mult__ Protocol once ndarray is generic over its dtype, though this will depend on the exact implementation details.

I wouldn't personally bother. Simply having prod(ndarray[int], initial: int) -> int, prod(ndarray[object], initial: object) -> object etc is all that's really needed. The details of exactly what the object loops of ufuncs do and do not support is not really interesting.

numpy/__init__.pyi Outdated Show resolved Hide resolved
numpy/__init__.pyi Outdated Show resolved Hide resolved
Comment on lines 1326 to 1334
@overload
def var(
a: ArrayLike,
axis: None = ...,
dtype: DtypeLike = ...,
out: Optional[ndarray] = ...,
ddof: int = ...,
keepdims: Literal[False] = ...,
) -> number: ...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

number isn't correct when out is specified:

>>> np.var([1], out=np.zeros(()))
array(0.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This applies for most but not all functions in this patch with out.

Copy link
Member Author

@BvB93 BvB93 Jul 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This applies for most but not all functions in this patch with out.

Considering the widespread nature of this issue I'd propose to save it for a future maintenance pull request and fix it all at once (as there are already quite a few older annotated functions with out parameter).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the meantime then, it would probably be a good idea to annotate these with Union[number, ndarray] - weak annotations are better than incorrect ones.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As it turns out is fairly easy way to fix this: out: Optional[ndarray] must be changed into out: None when a number is returned.

I have noticed some inconsistent behavior when using out in combination with passing a number though,
as in such case another number is returned rather than a ndarray.
Builtin scalars, builtin sequences and ndarrays do all return an ndarray (see example below).

In [1]: import numpy as np                                                                                                  

In [2]: np.__version__                                                                                                      
Out[2]: '1.20.0.dev0+62f26cf'

In [3]: np.var(np.int64(1), out=np.zeros(()))                                                                               
Out[3]: 0.0

In [4]: np.var(1, out=np.zeros(()))                                                                                         
Out[4]: array(0.)

In [5]: np.var([1], out=np.zeros(()))                                                                                       
Out[5]: array(0.)

In [6]: np.var(np.ones(1), out=np.zeros(()))                                                                                
Out[6]: array(0.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is arguably a bug in np.var, when out is given, it must be the return value.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

number isn't correct when out is specified:

Fixed as of 5a42440.

Copy link
Member Author

@BvB93 BvB93 Jul 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is arguably a bug in np.var, when out is given, it must be the return value.

I'll double check which (currently annotated) functions are affected by this this and create an issue afterwards (see #16734).

Bas van Beek added 2 commits July 2, 2020 17:39
Clarified that the comment holds for all `np.ufunc.reduce()` wrappers
Return `out` if `out != None`
keepdims: Literal[False] = ...,
initial: _NumberLike = ...,
where: _ArrayLikeBool = ...,
) -> number: ...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am assuming we are allowing number subtypes for one but not the other why is that ?
In other words, why is there a difference in return types of the above two overloads ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The key difference is that the first one is a TypeVar while the latter is not.

To illustrate, in the first overload an arbitrary number subclass (denoted by _Number)
is provided as input and a value of the same type is returned.

In the second overload we know that an instance of np.int_ is returned,
but there is a catch: np.int_ is a platform-dependent alias for one of the number subclasses,
which is not something we've been able to easily express as of yet.

numpy/numpy/__init__.pyi

Lines 511 to 517 in 92665ab

# TODO(alan): Platform dependent types
# longcomplex, longdouble, longfloat
# bytes, short, intc, intp, longlong
# half, single, double, longdouble
# uint_, int_, float_, complex_
# float128, complex256
# float96

As a workaround the latter case is now simply annotated as returning a number instance and
while correct, this is obviously not as specific as it could be.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay got it, if my understanding is correct, the functionality is exactly the same if we use number of _Number here (second overload), but it is something we want to change in the future ?

Copy link
Member Author

@BvB93 BvB93 Jul 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

..., but it is something we want to change in the future ?

Correct, once we've dealt with these platform-dependent generic subclasses.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh hang on, I'm actually confusing the issues pertaining int_ and number here, sorry.

The problem with returningnumber is more straightforward.
As ArrayLike is not yet Generic with respect to the data type we don't have a way yet to correlate the input and output types.
In the future we should be able to use expressions such as the one below (or at least something similar):

>>> from typing import TypeVar
>>> import numpy as np
>>> from numpy.typing import ArrayLike

>>> T = TypeVar('T', bound=np.number)

>>> def func(a: ArrayLike[T]) -> T:
...     pass

If you're interested, see #16759 for more details.

numpy/__init__.pyi Show resolved Hide resolved
numpy/__init__.pyi Show resolved Hide resolved
Copy link
Member

@anirudh2290 anirudh2290 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM thanks ! IMO, it would be nice to have np.prod(...,dtype=None) in tests.

@BvB93
Copy link
Member Author

BvB93 commented Jul 16, 2020

LGTM thanks ! IMO, it would be nice to have np.prod(...,dtype=None) in tests.

Done in 5c2dd87.
FYI, #16622 also expands the current dtype tests a bit and includes a new dtype(None) test.

@BvB93
Copy link
Member Author

BvB93 commented Aug 6, 2020

Are there any more remarks or comments on this pull request?
If not, then let's merge.

numpy/__init__.pyi Outdated Show resolved Hide resolved
Co-authored-by: Eric Wieser <wieser.eric@gmail.com>
@BvB93
Copy link
Member Author

BvB93 commented Aug 6, 2020

circleci build failure seems to be unrelated.

@mattip
Copy link
Member

mattip commented Aug 6, 2020

Unfortunately, circleci does not merge-from-master before building. You can ignore the error, merge from master, or rebase off master.

@BvB93
Copy link
Member Author

BvB93 commented Aug 6, 2020

You can ignore the error, merge from master, or rebase off master.

I'd say let's leave it as it is for now in order to not trigger any more (arguably) redundant CI runs.

@mattip mattip merged commit 6989899 into numpy:master Aug 24, 2020
@mattip
Copy link
Member

mattip commented Aug 24, 2020

Thanks @BvB93

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants