ENH: Overhaul of NumPy main namespace [NEP 52] #24306

Closed
mtsokol opened this issue Jul 31, 2023 · 36 comments
Labels: 01 - Enhancement, 62 - Python API, Numpy 2.0 API Changes

Comments

@mtsokol
Member

mtsokol commented Jul 31, 2023

Hi @rgommers @seberg @ngoldbaum!

NEP 52 [1] (tracking issue #23999) describes cleaning up NumPy's Python API. I started rewiring it from the main namespace, taking a top-down approach: first clean up the top namespace, then step down to the submodules.

Below I share the top NumPy namespace (the items available as np.*) with the entries I propose to remove. I took an aggressive approach and cut as much as possible (of the 563 items, I propose to drop 221, about 40% of the main namespace), so I expect the list to be relaxed after review. UPDATE: The final list removes 18% of the main namespace.

It mostly covers moving duplicates, removing multiple aliases, moving dtype classes and aliases to np.dtypes submodule proposed in NEP 52, and considers some unused/deprecated methods mentioned in previous issues/PRs.

This list doesn't concern removing any function per se, only restructuring the main namespace.

Please share your feedback!

[UPDATE] Latest list can be found in comments below: #24306 (comment)

[UPDATE 31.08] Here's the final list with "remove" and "stay" columns:

remove 🔴 stay 🟢
ALLOW_THREADS matrix
AxisError (available in np.exceptions) ScalarType
BUFSIZE asfortranarray
CLIP abs
ComplexWarning (available in np.exceptions) add
ERR_CALL all
ERR_DEFAULT allclose
ERR_IGNORE alltrue
ERR_LOG amax
ERR_PRINT amin
ERR_RAISE angle
ERR_WARN any
FLOATING_POINT_SUPPORT append
FPE_DIVIDEBYZERO can_cast (present in Array API)
FPE_INVALID apply_along_axis
FPE_OVERFLOW apply_over_axes
FPE_UNDERFLOW arange
Inf (use np.inf) arccos
Infinity (use np.inf) arccosh
MAXDIMS (use axis=None) arcsin
MAY_SHARE_BOUNDS arcsinh
MAY_SHARE_EXACT arctan
ModuleDeprecationWarning (available in np.exceptions) arctan2
NAN (use np.nan) arctanh
NINF (use -np.inf) argmax
NZERO (use -0.0) argmin
NaN (use np.nan) argpartition
PINF (use np.inf) argsort
PZERO (use 0.0) argwhere
RAISE around
in1d (use np.isin(a, b).ravel()) array
SHIFT_DIVIDEBYZERO array_equal
SHIFT_INVALID array_equiv
SHIFT_OVERFLOW array_repr
SHIFT_UNDERFLOW asanyarray
Tester (originally deleted in 1.25) asarray
TooHardError (moved to np.exceptions) atleast_1d
UFUNC_BUFSIZE_DEFAULT atleast_2d
UFUNC_PYVALS_NAME atleast_3d
VisibleDeprecationWarning (available in np.exceptions) average
WRAP bartlett
issctype base_repr
add_docstring (available in np.lib) binary_repr
add_newdoc (available in np.lib) bincount
add_newdoc_ufunc (niche, no replacement) bitwise_and
cfloat (use np.cdouble) bitwise_not
mat (use np.asmatrix) bitwise_or
clongfloat (use np.clongdouble) bitwise_xor
compat (niche, no replacement) blackman
issubclass_ (use issubclass builtin) block
deprecate (use DeprecationWarning) bool_
deprecate_with_doc (use DeprecationWarning) broadcast
disp (use printing) broadcast_arrays
fastCopyAndTranspose broadcast_shapes
float_ (use np.float64) broadcast_to
infty (use inf) byte
longcomplex (use np.clongdouble) byte_bounds
longfloat (use np.longdouble) bytes_
lookfor cbrt
math (use math directly) cdouble
numarray (already gone) ceil
oldnumeric (already gone) char
recfromcsv (use np.genfromtxt) character
recfromtxt (use np.genfromtxt) choose
safe_eval (use ast.literal_eval) clip
set_numeric_ops (niche) clongdouble
set_string_function column_stack
singlecomplex (use np.csingle) complex128
string_ (use np.bytes_) complex256
unicode_ (use np.str_) complex64
use_hugepage (internal) complex_
who (niche, no replacement) complexfloating
w (internal) compress
asfarray (use np.asarray(..., dtype=...)) concatenate
issubsctype (use issubdtype) conj
maximum_sctype (niche, no replacement) conjugate
obj2sctype (use .dtype.type) convolve
sctype2char copy
sctypeDict copysign
sctypes copyto
row_stack (use vstack) corrcoef
trapz (use scipy.integrate.trapezoid) correlate
tracemalloc_domain (available in np.lib) cos
source cosh
cast (niche, no replacement) count_nonzero
nbytes cov
round_ (instead use np.round) csingle
find_common_type ctypeslib
RankWarning (moved to np.exceptions) cumprod (rename)
get_array_wrap cumproduct (rename)
seterrobj cumsum (rename)
geterrobj datetime64
msort (deprecated, removed) deg2rad
DataSource (still in np.lib.npyio) degrees
float96 delete
float128 diag
compare_chararrays (still in np.char) diag_indices
chararray (still in np.char) diag_indices_from
recarray (still in np.rec) diagflat
format_parser (still in np.rec) diagonal
diff
digitize
divide
divmod
dot
dsplit
double
dstack
dtype
e
ediff1d
einsum
einsum_path
emath
empty
empty_like
equal
error_message
errstate
euler_gamma
exp
exp2
expand_dims
expm1
extract
eye
finfo
fabs
fft
fill_diagonal
fix
flatiter
flatnonzero
flexible
flip
fliplr
flipud
float16
float32
float64
float_power
floating
floor
floor_divide
fmax
fmin
fmod
format_float_positional
format_float_scientific
frexp
from_dlpack
frombuffer
fromfile
fromfunction
fromiter
frompyfunc
fromregex
fromstring
full
full_like
gcd
generic
genfromtxt
geomspace
get_printoptions
getbufsize
gradient
greater
greater_equal
half
hamming
hanning
heaviside
hsplit
hstack
hypot
identity
imag
iinfo
indices
inexact
inf
info
insert
int16
int32
int64
int8
int_
intc
integer
interp
intersect1d
invert
isclose
isfinite
isin
isinf
isnan
isnat
isneginf
isposinf
iterable
kaiser
lcm
ldexp
left_shift
less
less_equal
lexsort
lib
linalg
linspace
little_endian
load
loadtxt
log
log10
log1p
log2
logaddexp
logaddexp2
logical_and
logical_not
logical_or
logical_xor
logspace
longdouble
longlong
ma
mask_indices
matmul
max
maximum
mean
median
memmap
meshgrid
mgrid
min
minimum
mod
modf
moveaxis
multiply
nan
nan_to_num
all nan_* functions (can be addressed another time)
ndarray
ndenumerate
ndim
ndindex
nditer
negative
nested_iters
newaxis
nextafter
nonzero
not_equal
number
object_
ogrid
ones
ones_like
packbits
pad
partition
percentile
pi
piecewise
place
positive
power
printoptions
prod
product
ptp
put
result_type
put_along_axis
quantile
rad2deg
radians
random
ravel
ravel_multi_index
real
real_if_close
rec
reciprocal
record
remainder
repeat
reshape
resize
right_shift
roll
rollaxis
rot90
round
save
savetxt
savez
savez_compressed
searchsorted
select
set_printoptions
setbufsize
setdiff1d
setxor1d
shape
shares_memory
short
show_config
show_runtime
sign
signbit
signedinteger
sin
sinc
single
sinh
size
sometrue
sort
sort_complex
spacing
split
sqrt
square
squeeze
stack
std
str_
subtract
sum
swapaxes
take
take_along_axis
tan
tensordot
tanh
test
testing
tile
timedelta64
trace
tri (tril and triu are in Array API)
tril
tril_indices
tril_indices_from
triu
triu_indices
triu_indices_from
trim_zeros
true_divide
trunc
ubyte
ufunc
uint
uint16
uint32
uint64
uint8
uintc
ulonglong
union1d
unique
unpackbits
unravel_index
unsignedinteger
ushort
unwrap
vectorize
version
void
vsplit
vstack
where
zeros
zeros_like
var
ascontiguousarray
may_share_memory
get_include
intp
uintp
i0
issubdtype
absolute
histogram
histogram2d
histogram_bin_edges
histogramdd
asmatrix
seterrcall
geterr
geterrcall
rint
ix_
outer
common_type
putmask
datetime_as_string
inner
vander
vdot
cross
typecodes
typename
transpose
kron
asarray_chkfinite
all poly* functions
roots
is_busday
busday_count
busday_offset
busdaycalendar
iscomplex
iscomplexobj
isfortran
isreal
isrealobj
isscalar
False_
True_
c_
r_
s_
index_exp
mintypecode
min_scalar_type
array2string
array_str
array_split
require
datetime_data
bmat
promote_types
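A few of the replacements suggested in the "remove" column can be sketched as follows. This is only an illustration: it uses the replacement names given in the table (np.isin, np.asarray, np.vstack, -np.inf, ast.literal_eval), while the sample arrays and the helper name `migrated_examples` are made up for this example.

```python
import ast

import numpy as np


def migrated_examples():
    """Return the results of a few 'remove' -> replacement rewrites."""
    a = np.array([1, 2, 3])
    b = np.array([2, 3, 4])
    return {
        # np.in1d(a, b) -> np.isin(a, b).ravel()
        "in1d": np.isin(a, b).ravel(),
        # np.asfarray(x) -> np.asarray(x, dtype=...)
        "asfarray": np.asarray([1, 2], dtype=np.float64),
        # np.row_stack -> np.vstack
        "row_stack": np.vstack([a, b]),
        # np.NINF -> -np.inf
        "NINF": -np.inf,
        # np.safe_eval -> ast.literal_eval
        "safe_eval": ast.literal_eval("[1, 2.5, (3, 4)]"),
    }
```

All of the replacement spellings used here are available in both NumPy 1.x and 2.x, so code can be migrated before upgrading.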

Footnotes

  1. NEP 52 link

@ngoldbaum
Member

We decided on the name np.dtypes instead of np.types. The dtype classes already live there with CamelCased names. Maybe it makes sense to create a np.dtypes.scalars namespace for the scalar classes, to make the distinction between dtype classes and scalar classes a little clearer? That wasn't proposed in the NEP, just noticing that there are a lot of scalars that are being moved into np.dtypes and they're... not dtypes! While we have a chance, let's make that a little clearer in the API.

For np.nansum and other nan ufunc variants, maybe it makes sense to add the option, deprecate the nan variants, and leave e.g. np.nansum around for a release or two? These are much more commonly used than a lot of the other things that are being removed in this list, I also don't see this particular deprecation coming up in previous discussion. That said, the fix is straightforward and only a little bit more complicated than changing imports, so perhaps it's worth doing this along with all the other numpy 2.0 changes.

Other more minor issues I saw reading the list over:

  • The NEP says that all of the scalar aliases under "other aliases" should be removed instead of moved into the new namespace.
  • You have the complex scalars getting removed, they should go to np.dtypes.
  • I don't think it's correct to remove show_config and show_runtime.
  • convolve is specifically for 1D arrays, I don't think it belongs in np.linalg.
  • Moving dot and cross to np.linalg certainly makes sense, that said, those are both very commonly used functions and moving them is going to be pretty disruptive to python API users.
  • You have np.genfromtxt getting deleted, is that correct? It's very commonly used.
  • Also recfromtxt and recfromcsv, I don't see those in previous discussions, but they're definitely a lot more niche and pandas is probably a better replacement anyway.

@ngoldbaum
Member

I wrapped the long table in the issue description in a details block to make this discussion thread a little easier to read.

@ksunden
Contributor

ksunden commented Jul 31, 2023

Just went through the matplotlib codebase looking for these. You can see a more full analysis in matplotlib/matplotlib#26422

From a matplotlib perspective, I don't think we are prepared to take on a scipy dependency (I think it may even cause a deadlock in some instances... if I remember correctly)

Thus I'd argue that np.interp and np.roots should stay. (interp being a higher priority for us, as it is used often in the matplotlib codebase. roots is only used once, but it is in pretty core functionality)

I'm less worried about the window functions, as those are mostly used in tests/examples anyway, and those functions are simple enough to just include inline if needed (and when they are used in library code, it is for functions that I'd argue we should deprecate ourselves..., mostly).

I was also surprised to see np.prod in the category of "move to linalg", that seems like something a lot more general/useful than linalg. Feels incongruent to me to keep sum and cumprod but not prod.

I would also argue in favor of keeping np.genfromtxt (though that is more for non-matplotlib related reasons). Carting around ascii text files of numbers is a super common use case (unfortunately so, in some cases) genfromtxt is much better than stdlib options for parsing them, and I don't really want to do pandas.read_csv if all I want is numpy array anyway. (or if I don't have headings at all... which is all too common...)

Those are the big ones from my POV, though wouldn't exactly mind not having to update some of the others on that list, but not that bad either.

@mtsokol mtsokol changed the title ENH: Overhaul of NumPy main namespace [NEP 52] ENH: [WIP] Overhaul of NumPy main namespace [NEP 52] Jul 31, 2023
@mtsokol
Member Author

mtsokol commented Jul 31, 2023

@ksunden Thank you for the feedback! Looking briefly once again I spotted a mistake in my list: for np.roots I meant move to np.polynomial, fixed it!

@seberg
Member

seberg commented Aug 1, 2023

I think the list above is considerably more aggressive than NEP 52. There are things I believe we can even just delete, like practically all of those strange upper-case enums (very few to no one should be using those). But things like genfromtxt are widely used, and even the windowing functions may be used often enough in tutorials/examples that I am not sure it is worth the trouble to deprecate them.
Also e.g. the scalar types shouldn't move (maybe them being in two places). Although it seems good to deprecate all aliases except the canonical name.

@mtsokol
Member Author

mtsokol commented Aug 1, 2023

@ngoldbaum I renamed the entries to np.dtypes. When it comes to scalar types: Sebastian mentioned in his comment that scalar types shouldn't move in the end.

  • I agree that first an option needs to be added to those np.nan* functions, and then deprecate them.
  • Thanks for pointing out! I removed "other aliases" explicitly.
  • I moved np.convolve back to main namespace - I think we need to agree what can be moved to np.linalg without being too disruptive (and if it makes sense to move anything at all).
  • I think I confused np.genfromtxt with something else - I moved it back.
  • recfromtxt and recfromcsv were deprecated in API: deprecate undocumented functions #24154

@ksunden I moved np.interp, np.prod, np.genfromtxt and windows functions back to main namespace.

@seberg I removed those upper-case enums from the main namespace.

@ngoldbaum
Member

The errors and warnings should probably also get deprecation warnings if they're explicitly imported from the main namespace and removed from __all__, they already have canonical locations in np.exceptions.

@timhoffm
Contributor

timhoffm commented Aug 2, 2023

Do you want to remove one of the np.min/np.amin aliases (and likewise for max)?

@mtsokol
Member Author

mtsokol commented Aug 2, 2023

Do you want to remove one of the np.min/np.amin aliases (and likewise for max)?

@timhoffm I think it would be more consistent to drop aliases and have only one function with a specific functionality. But I would assume that these core functions are heavily used, and a gain of reducing main namespace by two entries might not be worth breaking API here. But I don't have a strong opinion on that.

@timhoffm
Contributor

timhoffm commented Aug 2, 2023

@mtsokol one can definitely argue both ways. I just wanted to bring this to the table. I also don't have the usage insight or the knowledge of numpy policy priorities to decide what's reasonable here. - If you keep the aliases, I suggest to still bless one and discourage the other, so that at least new code will grow into a consistent direction.

@rgommers rgommers added 62 - Python API Changes or additions to the Python API. Mailing list should usually be notified. 01 - Enhancement labels Aug 2, 2023
@rgommers
Member

rgommers commented Aug 2, 2023

Thanks @mtsokol for getting the ball rolling here! I'm first commenting on all the feedback, then I'll have to go back and add my own comments on the individual per-function proposals.

I think the list above is considerably more aggressive than NEP 52. There are things I believe we can even just delete, like practically all of those strange upper-case enums (very few to no one should be using those).

I agree with this. Most of the feedback here (in the whole thread so far) is quite useful and on-point. I think it can be incorporated and a next version of the table/plan posted for discussion.

Also e.g. the scalar types shouldn't move (maybe them being in two places). Although it seems good to deprecate all aliases except the canonical name.

+1 to them not moving. They should not be in two places, only stay where they are now. And yes, definitely let's deprecate and then remove all the aliases.

For np.nansum and other nan ufunc variants, maybe it makes sense to add the option, deprecate the nan variants, and leave e.g. np.nansum around for a release or two?

This seems desirable, but a project in and of itself. data-apis/array-api#621 is relevant here.

Moving dot and cross to np.linalg certainly makes sense, that said, those are both very commonly used functions and moving them is going to be pretty disruptive to python API users.

I think np.dot is too disruptive, and we have to keep it. cross may be good to move to linalg though, since it's there in the array API standard. For things like that, if we think outright removals are too disruptive, we could keep aliases around in the old locations (but only document them in the new location).

If you keep the aliases, I suggest to still bless one and discourage the other, so that at least new code will grow into a consistent direction.

Agreed. And we know what the canonical names are min/max in this case.

@mtsokol
Member Author

mtsokol commented Aug 2, 2023

Hi @rgommers,
Here I share updated table after applying review comments. For scalars I removed aliases (sized and "other") and kept only canonical names. The count of removed entries from main namespace is now 197.

[UPDATE 31.08] The final list is present in the PRs description.

@rgommers
Member

rgommers commented Aug 2, 2023

A few more comments on the list below. I think if we can quickly clean up a lot of the obvious ones (stray variables, aliases, enums, etc.), then the list will get a lot shorter and easier to review. Another category is the ones that should definitely stay (e.g. everything in the array API standard), as well as numpy-specific heavily used functionality without a clear replacement (e.g. vectorize, set_printoptions). If we could put those in a "definitely staying" list, then we get to the "these are the ones left for discussion" list that will be pretty tractable to review in detail.

'uint16' | -- remove, sized alias

Note that these are the canonical preferred names. These are the ones we want to keep, and in addition the canonical C names. Every other alias should go. The exceptions here are the longdouble aliases (float96, float128) because those may or may not exist, and hence are better removed.

'nanmean' || --

can you undo those? I don't think this will make it into 2.0, and we are likely to keep these aliases around for a long time even if the keyword idea arrives in time.

'result_type' | -- move to np.dtypes

It would be useful to double check your list with the array API standard. Anything that's in the main namespace there , like result_type, clearly should stay.

rot90, unwrap

These are about angles, and go together with rad2deg & co. I think they don't have an obviously better place to go, and are best left alone.

@rkern
Member

rkern commented Aug 2, 2023

rot90 isn't really about angles in the way rad2deg and unwrap are. It goes with flip, transpose, etc. Agreed it should be left alone, though.

@mtsokol
Member Author

mtsokol commented Aug 2, 2023

@rgommers Sure, I will prepare a shorter list for further discussion.

'uint16' | -- remove, sized alias

Note that these are the canonical preferred names. These are the ones we want to keep, and in addition the canonical C names. Every other alias should go. The exceptions here are the longdouble aliases (float96, float128) because those may or may not exist, and hence are better removed.

Sure! I will revert it. To explain, for np.uint16?? I got :Canonical name: numpy.ushort therefore I considered it an alias. I will move back names with 16, 32, etc.

'nanmean' || --

can you undo those? I don't think this will make it into 2.0, and we are likely to keep these aliases around for a long time even if the keyword idea arrives in time.

Sure!

'result_type' | -- move to np.dtypes

It would be useful to double check your list with the array API standard. Anything that's in the main namespace there , like result_type, clearly should stay.

Sure! I will check it.

rot90, unwrap

These are about angles, and go together with rad2deg & co. I think they don't have an obviously better place to go, and are best left alone.

Will move these back.

@ngoldbaum
Member

ngoldbaum commented Aug 2, 2023

I got :Canonical name: numpy.ushort therefore I considered it an alias. I will move back names with 16, 32, etc.

The bitsized integer types being aliases of the C integer types and not the other way around is a known issue. Ultimately it comes down to these typedefs. This also means that, depending on the platform, the C integer types may or may not be aliased to a bitsize.

@seberg recently attempted to make the C type names aliases to the bitsized types but it's complicated. For now I would focus on the other aliases and come back to the integer aliases later.
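As a rough illustration of the platform dependence described above, the following sketch reports how wide each C-named signed integer type is on the current machine. The helper name `c_integer_widths` is hypothetical; it relies only on np.dtype and the C-named scalar types that appear in the "stay" column.

```python
import numpy as np


def c_integer_widths():
    """Map each C-named signed integer type to its width in bits on this platform."""
    names = ["byte", "short", "intc", "int_", "longlong"]
    return {name: np.dtype(getattr(np, name)).itemsize * 8 for name in names}


for name, bits in c_integer_widths().items():
    # The printed widths depend on the platform and the NumPy version.
    print(f"np.{name}: {bits} bits")
```

Which bit-sized name a given C name corresponds to can therefore differ between platforms, which is why the integer aliases are harder to untangle than the other aliases.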

@rgommers
Member

rgommers commented Aug 2, 2023

I usually think of this purely from an end user focused API perspective. It doesn't matter one way or the other what the name is of the actual implementation under the hood. It's more what the docs say or what common practice is (e.g. code uses np.float64 in >9x% of cases, not np.double or another alias).

@mtsokol
Member Author

mtsokol commented Aug 3, 2023

@rgommers here I share an updated list divided by three sections (remove, tentative and keep). I kept canonical and sized names for scalars.

[EDIT 22.08.2023] updated list to the latest version.
[EDIT 31.08.2023] final list is present in the PRs description.

@rgommers
Member

rgommers commented Aug 3, 2023

That looks quite nice and easier to review, thanks Mateusz!

The "remove" list looks pretty good; cross is the one that jumps out to me as needing to be in the "tentative" list instead.

The "keep" list looks pretty good too. The few objects that jump out are the ones with trailing underscores - in particular I think that False_/True_ are pretty pointless and can be removed. We also need the proper names back, so bool must be re-added and bool_ must be hidden (as an alias to np.bool_).

For the "tentative list", some more comments:

  • all the *sctype* things can be removed, that is explicitly covered in NEP 52,
  • np.matrix can't go, unfortunately. That one is so heavily used that we should deprecate it first. Which is still blocked by it being used in scipy.sparse,
  • asfarray has bad semantics and I think we should remove it (see MAINT: differentiable fns respect float width. Closes #15602 scipy/scipy#18481 (comment)), while ascontiguousarray and asfortranarray probably should stay,
  • issubclass_: remove it, I don't think that was ever meant to be public,
  • may_share_memory is an important function and should stay
  • get_include is important and there is no clear replacement, so it should stay

@mtsokol
Member Author

mtsokol commented Aug 3, 2023

@rgommers Sure! I applied all points to my list and updated the comment.

I've got one thing to confirm about bool/bool_: Currently np.bool gives AttributeError with a message np.bool was a deprecated alias for the builtin bool. (I see the code that does it was introduced only 7 months ago)

@rgommers
Member

rgommers commented Aug 3, 2023

I've got one thing to confirm about bool/bool_: Currently np.bool gives AttributeError with a message np.bool was a deprecated alias for the builtin bool. (I see the code that does it was introduced only 7 months ago)

Yeah, best not to touch it now - it's a little complicated. I think the plan is to reintroduce np.bool for NumPy 2.0 (EDIT: as a numpy dtype, not as an alias to builtins.bool); that was the plan anyway and I think we need it for array API compatibility too. But it's possible I am forgetting something (and if so, I'm pretty sure @seberg will know what that is).

@seberg
Member

seberg commented Aug 3, 2023

Might be a bit early to reintroduce it, but I am fine to do so. Also remember, at least it probably had a DeprecationWarning before that. Would lean to just not do anything about bool_ and similar ones for now, though. (unless we start hiding things from __dir__ only)

np.True_ and np.False_ need to stay though, IMO. They are clear and it makes it clear they are singletons. Also you would notice if you tried: they are used as the repr, which you would then have to adapt as well.

@rgommers
Member

rgommers commented Aug 3, 2023

Might be a bit early to reintroduce it, but I am fine to do so

I think it's pretty harmless, at least it's hard to imagine what would go wrong (typical usage was np.somefunc(..., dtype=np.bool) which will basically work unchanged).

np.True_ and np.False_ need to stay though, IMO. They are clear and it makes it clear they are singletons.

They're trivial to recover in the rare cases where you need them, right? Like so:

>>> np.bool_(False) is np.False_
True

If that is correct, they really have no business being in the top-level namespace.

@seberg
Member

seberg commented Aug 3, 2023

Sure you can, lets see what others think. You will have to change the repr to np.bool(True) from np.True_, although that is easy. I do have a tendency to think that it doesn't hurt if these singletons remain and somehow for singletons that seems right to me. They seem used, but not that much, so at least long-term it probably doesn't matter.

@rgommers
Member

rgommers commented Aug 3, 2023

repr to np.bool(True) from np.True_,

I think that's not the repr?

>>> np.False_.__repr__()
'False'
>>> print(np.False_)
False

I do have a tendency to think that it doesn't hurt if these singletons remain and somehow for singletons that seems right to me.

I think it being a singleton should be an implementation detail that no one should rely on. Nor does it really matter. From an API perspective this looks to me like a weird object that is trivially reconstructed.

Sure you can, lets see what others think.

Agreed. Maybe there is a concrete use case someone can share?

@seberg
Member

seberg commented Aug 3, 2023

The repr is np.True_ on main, although I started that process at a time when np.bool(True) would have been impossible! The choice remains, False is a bad repr, so we need to decide if we prefer np.True_ or np.bool(True).
I have a slight liking for exposing singletons, probably just for similarity to Python's bools. Code search finds some uses of `something is np.True_`, but it is probably not common enough to worry about beyond maybe not just yanking it out.

@ngoldbaum
Member

ngoldbaum commented Aug 23, 2023

@mtsokol we chatted at the triage meeting about the tentative list and we ended up deciding that most of the tentative column that isn't already deprecated should be deprecated, except iscomplex, iscomplexobj, isreal, isrealobj, isscalar, True_, and False_ which should definitely stay in the main namespace.

DataSource is kind of on the bubble. It's not used much but also it's small and stable and not adding much maintenance burden. I'd suggest moving to the "stay" column but could probably be deprecated because there's minimal downstream use, but it probably needs a migration path for users.

Other less-used array utility functions can move to np.lib.array_utils. Functions that have no clear replacement in the existing NumPy API should stay in NumPy but be moved to np.lib.array_utils (except for recarray which already has a canonical home). There should be aliases in the main numpy namespace that raise deprecation warnings when these are imported directly. For items in the tentative list that have clear replacements using the public NumPy API, suggest using those instead and mark the whole item deprecated and slated for removal, but don't remove it immediately unless it has very few downstream users.
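The "alias in the main namespace that raises a deprecation warning" idea could be sketched with a module-level `__getattr__` (PEP 562). This is a minimal sketch under stated assumptions, not NumPy's actual implementation: `fake_np`, its `array_utils` stand-in and the function `some_utility` are all hypothetical names used for illustration.

```python
import types
import warnings

# Hypothetical stand-in for the new home of a moved function
# (in the proposal this would be np.lib.array_utils).
array_utils = types.ModuleType("fake_np.lib.array_utils")
array_utils.some_utility = lambda values: len(values)  # placeholder implementation

# Hypothetical stand-in for the main namespace.
main = types.ModuleType("fake_np")

# Old top-level name -> (module that now owns it, dotted path shown to the user).
_MOVED = {"some_utility": (array_utils, "np.lib.array_utils.some_utility")}


def _module_getattr(name):
    """Forward moved names to their new home, warning on each access."""
    if name in _MOVED:
        module, new_path = _MOVED[name]
        warnings.warn(
            f"np.{name} is deprecated; use {new_path} instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return getattr(module, name)
    raise AttributeError(f"module {main.__name__!r} has no attribute {name!r}")


# Python consults a module's __getattr__ only when normal lookup fails.
main.__getattr__ = _module_getattr
```

Accessing `main.some_utility` still returns the function, so existing code keeps working while the warning points at the new location.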

I think that covers everything, let me know if there are any other corner cases!

@mattip
Member

mattip commented Aug 23, 2023

should be deprecated,

We also discussed perhaps creating a package to put on PyPI that would restore many of the removed functions from NumPy 2.0. By pip-installing this package, users could continue to use their favorite aliases and functions that were removed, perhaps with a DeprecationWarning when importing or using some of them. Does that sound like a reasonable compromise, or would it entail too much ongoing maintenance burden?

@andyfaff
Contributor

That sounds like it has the potential to be a maintenance burden that could suffer from bitrot.

@charris
Member

charris commented Aug 23, 2023

It is good to keep in mind that the needed downstream changes should be minimal if we want to avoid a Python 3 situation. Folks can, and do, ignore deprecation warnings, the warnings don't break code.

@mtsokol
Member Author

mtsokol commented Aug 23, 2023

We also discussed perhaps creating a package to put on PyPI that would restore many of the removed functions from NumPy 2.0. By pip-installing this package, users could continue to use their favorite aliases and functions that were removed, perhaps with a DeprecationWarning when importing or using some of them. Does that sound like a reasonable compromise, or would it entail too much ongoing maintenance burden?

If any of the functions/aliases that got removed in Part 1 or Part 2 should be still available and deprecated, I can restore them. If there should be a longer deprecation period I think it's straightforward to provide it.

So far removed items from the main namespace are internal enums, already deprecated functions and redundant aliases. Here's a complete list so far: https://github.com/numpy/numpy/blob/main/doc/source/release/2.0.0-notes.rst#numpy-20-python-api-removals

If all removed aliases and functions from the main namespace should be available even after NumPy 2.0 release, maybe it's better to keep them in a separate "1.x legacy" module and make them injected into main namespace after enabling a flag: np.enable_1_x_main_namespace()?
I would still argue that it's better to reduce scope of changes if it's too disruptive, and keep final design irreversible.

@ngoldbaum
Member

ngoldbaum commented Aug 24, 2023

If any of the functions/aliases that got removed in Part 1 or Part 2 should be still available and deprecated, I can restore them. If there should be a longer deprecation period I think it's straightforward to provide it.

So far removed items from the main namespace are internal enums, already deprecated functions and redundant aliases.

I think we want to do this in such a way that guides users to the correct way to fix their code. One alternative to deprecating things is to break them, but make sure the error users see gives the migration path.

For example, if we remove aliases, we can do so in such a way that if a user imports the name or accesses it as an attribute of the numpy module they get an ImportError or AttributeError that gives them the migration path:

>>> np.NINF
AttributeError: NINF was removed in NumPy 2.0, use -np.inf instead

This allows us to do these renames and clean up namespaces, not leave the old names behind with an ignorable deprecation that only delays user pain, and break user code in such a way they hopefully immediately see what broke and how to fix their code.
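One way to get this behaviour is a module-level `__getattr__` (PEP 562) that looks the name up in a table of removed entries. The sketch below is illustrative only: `fake_np` is a hypothetical stand-in for the numpy module, and the `_REMOVED` table simply copies a few hints from the "remove" column above.

```python
import types

# Removed name -> migration hint (taken from the "remove" column of the table).
_REMOVED = {
    "NINF": "use -np.inf instead",
    "Inf": "use np.inf instead",
    "NaN": "use np.nan instead",
    "float_": "use np.float64 instead",
}


def make_namespace():
    """Build a hypothetical module whose removed names fail with a helpful message."""
    mod = types.ModuleType("fake_np")
    mod.inf = float("inf")  # a name that stays

    def _module_getattr(name):
        if name in _REMOVED:
            raise AttributeError(
                f"np.{name} was removed in NumPy 2.0, {_REMOVED[name]}."
            )
        raise AttributeError(f"module {mod.__name__!r} has no attribute {name!r}")

    # Python consults a module's __getattr__ only when normal lookup fails,
    # so names that still exist (such as `inf`) are unaffected.
    mod.__getattr__ = _module_getattr
    return mod
```

With this in place, `make_namespace().NINF` raises an AttributeError whose message contains the migration path, while `make_namespace().inf` keeps working.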

@rgommers
Member

Thanks for the summary of that discussion @ngoldbaum. Overall that sounds quite good to me.

We also discussed perhaps creating a package to put on PyPI that would restore many of the removed functions from NumPy 2.0.

I agree with @andyfaff here that this is a maintenance burden. Also, there is no evidence that this is necessary or even desired at this point. We should not do this for now; if there is enough demand it can easily be done quickly around the RC period.

Also, it's good to keep in mind that we are already doing, planning to do, or at least discussing the following:

  • clear error messages from __getattr__ with replacements (see comment by Nathan right above)
  • good release notes
  • a migration guide in the docs
  • a sed script to automatically clean up code as much as possible

At some point we've got to stop - the above list is enough. Also a reminder that we've got, according to our original planning, a little over 4 months left and a ton of work to do. We haven't even started on some of the main topics on our wish list. So let's please avoid more new work here that we didn't plan for, like a compat package.
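As a rough illustration of the automated clean-up item in the list above, here is a small Python equivalent of a sed-style rewrite. The replacement table is an illustrative subset and `migrate_source` is a hypothetical helper, not the script discussed in this thread.

```python
import re

# Illustrative subset of renames; an assumption, not the project's actual list.
_REPLACEMENTS = {
    "np.NINF": "-np.inf",
    "np.PINF": "np.inf",
    "np.float_": "np.float64",
}

# Match only whole names: not preceded by a word character or dot,
# and not followed by a word character (so np.float_power is left alone).
_PATTERN = re.compile(
    r"(?<![\w.])("
    + "|".join(re.escape(name) for name in _REPLACEMENTS)
    + r")(?!\w)"
)


def migrate_source(text):
    """Rewrite removed `np.*` names in a source string to their replacements."""
    return _PATTERN.sub(lambda match: _REPLACEMENTS[match.group(1)], text)
```

Like a sed script, this works on plain text, so it also rewrites matches inside strings and comments and it misses code that imports NumPy under a different alias. A linter with syntax awareness avoids both problems.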

It is good to keep in mind that the needed downstream changes should be minimal if we want to avoid a Python 3 situation

I've now heard this one too many times, so let me write down my assessment here. First, the chance of turning the NumPy 1.26 to 2.0 transition into anything like the Python 2.7 to Python 3 one is extremely low. Second, if it does happen, the changes under discussion in this issue are quite unlikely to be the root cause. Instead, it'll be because either something went wrong with the C ABI change or our assessment of the impact of it. That is still the most impactful change we are planning for, and at this point it's still not completely clear what this will look like.1

Regarding Python 3, the main causes were (highest-impact one first):

  1. The major changes in string handling,
  2. Large changes in the C API for which there was very little guidance or tooling to upgrade,
  3. The low-quality early releases. Python 3.0-3.2 were worse than 2.7 and offered little of value,
  4. Trivial-but-annoying changes to widely used things, e.g. making print a function.

This issue is about niche APIs and changes that are not hard to adapt to. It's most similar to (4), but a lot less impactful since we're not touching anything that's idiomatic or heavily used.

Footnotes

  1. We do know pretty much for sure that it's going to cause a lot of breakage, since even packages like Pandas have failed to put <2.0 bounds on most of their releases. But we expect the disruptions to be short-lived, with package maintainers putting out new releases and users learning how to add constraints files. But still, pip install pandas==1.5 will remain broken.

@mtsokol
Member Author

mtsokol commented Aug 31, 2023

Hi all!
The final up-to-date list with only "remove" and "stay" columns is present in the PRs description.

@seberg
Member

seberg commented Sep 5, 2023

I agree we have no big risk of a full blown Python 2-3 situation.

Unfortunately, the transition may not be as smooth as we hope, mainly for two reasons:

  • C-API: how long will it take smaller libraries to recompile wheels against the newer NumPy (some might have to do more work, but that should be very uncommon)?
  • Python API: How many small libraries will just break mostly for silly reasons? How many of them will remain broken for a long time due to lack of active maintenance?

For Python 2-3, even the larger well maintained libs took a long time to transition I think. We will not have that problem for sure, I think (my main worry would be numba/cupy maybe due to promotion changes, but I think they may be OK with having a 95% fix for those; in practice they are only at 95% to begin with).

I don't think I believe in a compat package, downgrading NumPy seems OK as a hot-fix if we go that far.
I could see doing the normal deprecation cycle for anything that we see used downstream (e.g. in a github code search). Because that will remove almost all immediate disruption due to point 2 (yes, the downside is that it might just break later on, because the library is really unmaintained so giving them more time doesn't help).
(Maybe as a FutureWarning/VisibleDeprecationWarning, so that users cannot miss that the library they are using is broken.)

@ngoldbaum
Member

Discussion has died down in here and all of the PRs implementing the main namespace refactor have been merged. I'm going to close this.

I don't think there's much appetite for a compat package and instead I think we're going to point people to ruff to lint and update their code. If there are cases where it's not straightforward to write code that works the same in both NumPy 1 and 2 we should look closer and see if it's possible to add code in a NumPy 1 bugfix release or in NumPy 2 to ease writing code that works in both versions.
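For names that only moved, one pattern for writing code that works in both NumPy 1 and 2 is to try the new location first and fall back to the old one. The sketch below uses a hypothetical helper, `first_available`, which is not part of NumPy; the NumPy usage is shown only as a comment and assumes `AxisError` is reachable at one of the two paths.

```python
import importlib


def first_available(*dotted_paths):
    """Return the first object resolvable from paths like 'package.module.name'."""
    for path in dotted_paths:
        module_name, _, attr = path.rpartition(".")
        try:
            module = importlib.import_module(module_name)
            return getattr(module, attr)
        except (ImportError, AttributeError, ValueError):
            # Missing module, missing attribute, or a path without a dot.
            continue
    raise ImportError(f"none of {dotted_paths!r} could be resolved")


# Intended use (requires NumPy, so it is not executed here):
# AxisError = first_available("numpy.exceptions.AxisError", "numpy.AxisError")
```

A plain `try: ... except ImportError:` around two import statements does the same job for a single name; a helper like this is only worth having when many names need the same treatment.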
