ENH: Overhaul of NumPy main namespace [NEP 52] #24306

mtsokol · 2023-07-31T17:15:52Z

NEP 52 ¹ (tracking issue #23999) describes cleaning up NumPy's Python API. I have started rewiring it from the main namespace, taking a top-down approach: first clean up the top-level namespace, then step down to the submodules.

Below I share the top-level NumPy namespace (the items available as np.*) with the entries proposed for removal. I took an aggressive approach to cut as much as possible (from 563 items here I propose to drop ~~221, so 40% of the main namespace~~), so I expect to relax this list after review. UPDATE: The final list removes 18% of the main namespace.

It mostly covers moving duplicates, removing multiple aliases, moving dtype classes and aliases to np.dtypes submodule proposed in NEP 52, and considers some unused/deprecated methods mentioned in previous issues/PRs.

This list doesn't concern removing any function per se, only restructuring the main namespace.

Please share your feedback!

[UPDATE] Latest list can be found in comments below: #24306 (comment)

[UPDATE 31.08] Here's the final list with "remove" and "stay" columns:

remove 🔴	stay 🟢
ALLOW_THREADS	matrix
AxisError (available in np.exceptions)	ScalarType
BUFSIZE	asfortranarray
CLIP	abs
ComplexWarning (available in np.exceptions)	add
ERR_CALL	all
ERR_DEFAULT	allclose
ERR_IGNORE	alltrue
ERR_LOG	amax
ERR_PRINT	amin
ERR_RAISE	angle
ERR_WARN	any
FLOATING_POINT_SUPPORT	append
FPE_DIVIDEBYZERO	can_cast (present in Array API)
FPE_INVALID	apply_along_axis
FPE_OVERFLOW	apply_over_axes
FPE_UNDERFLOW	arange
Inf (use np.inf)	arccos
Infinity (use np.inf)	arccosh
MAXDIMS (use axis=None)	arcsin
MAY_SHARE_BOUNDS	arcsinh
MAY_SHARE_EXACT	arctan
ModuleDeprecationWarning (available in np.exceptions)	arctan2
NAN (use np.nan)	arctanh
NINF (use -np.inf)	argmax
NZERO (use -0.0)	argmin
NaN (use np.nan)	argpartition
PINF (use np.inf)	argsort
PZERO (use 0.0)	argwhere
RAISE	around
in1d (use np.isin(a, b).ravel())	array
SHIFT_DIVIDEBYZERO	array_equal
SHIFT_INVALID	array_equiv
SHIFT_OVERFLOW	array_repr
SHIFT_UNDERFLOW	asanyarray
Tester (originally deleted in 1.25)	asarray
TooHardError (moved to np.exceptions)	atleast_1d
UFUNC_BUFSIZE_DEFAULT	atleast_2d
UFUNC_PYVALS_NAME	atleast_3d
VisibleDeprecationWarning (available in np.exceptions)	average
WRAP	bartlett
issctype	base_repr
add_docstring (available in np.lib)	binary_repr
add_newdoc (available in np.lib)	bincount
add_newdoc_ufunc (niche, no replacement)	bitwise_and
cfloat (use np.cdouble)	bitwise_not
mat (use np.asmatrix)	bitwise_or
clongfloat (use np.clongdouble)	bitwise_xor
compat (niche, no replacement)	blackman
issubclass_ (use issubclass builtin)	block
deprecate (use DeprecationWarning)	bool_
deprecate_with_doc (use DeprecationWarning)	broadcast
disp (use printing)	broadcast_arrays
fastCopyAndTranspose	broadcast_shapes
float_ (use np.float64)	broadcast_to
infty (use inf)	byte
longcomplex (use np.clongdouble)	byte_bounds
longfloat (use np.longdouble)	bytes_
lookfor	cbrt
math (use math directly)	cdouble
numarray (already gone)	ceil
oldnumeric (already gone)	char
recfromcsv (use np.genfromtxt)	character
recfromtxt (use np.genfromtxt)	choose
safe_eval (use ast.literal_eval)	clip
set_numeric_ops (niche)	clongdouble
set_string_function	column_stack
singlecomplex (use np.csingle)	complex128
string_ (use np.bytes_)	complex256
unicode_ (use np.str_)	complex64
use_hugepage (internal)	complex_
who (niche, no replacement)	complexfloating
w (internal)	compress
asfarray (use np.asarray(..., dtype=...))	concatenate
issubsctype (use issubdtype)	conj
maximum_sctype (niche, no replacement)	conjugate
obj2sctype (use .dtype.type)	convolve
sctype2char	copy
sctypeDict	copysign
sctypes	copyto
row_stack (use vstack)	corrcoef
trapz (use scipy.integrate.trapezoid)	correlate
tracemalloc_domain (available in np.lib)	cos
source	cosh
cast (niche, no replacement)	count_nonzero
nbytes	cov
round_ (instead use np.round)	csingle
find_common_type	ctypeslib
RankWarning (moved to np.exceptions)	cumprod (rename)
get_array_wrap	cumproduct (rename)
seterrobj	cumsum (rename)
geterrobj	datetime64
msort (deprecated, removed)	deg2rad
DataSource (still in np.lib.npyio)	degrees
float96	delete
float128	diag
compare_chararrays (still in np.char)	diag_indices
chararray (still in np.char)	diag_indices_from
recarray (still in np.rec)	diagflat
format_parser (still in np.rec)	diagonal
	diff
	digitize
	divide
	divmod
	dot
	dsplit
	double
	dstack
	dtype
	e
	ediff1d
	einsum
	einsum_path
	emath
	empty
	empty_like
	equal
	error_message
	errstate
	euler_gamma
	exp
	exp2
	expand_dims
	expm1
	extract
	eye
	finfo
	fabs
	fft
	fill_diagonal
	fix
	flatiter
	flatnonzero
	flexible
	flip
	fliplr
	flipud
	float16
	float32
	float64
	float_power
	floating
	floor
	floor_divide
	fmax
	fmin
	fmod
	format_float_positional
	format_float_scientific
	frexp
	from_dlpack
	frombuffer
	fromfile
	fromfunction
	fromiter
	frompyfunc
	fromregex
	fromstring
	full
	full_like
	gcd
	generic
	genfromtxt
	geomspace
	get_printoptions
	getbufsize
	gradient
	greater
	greater_equal
	half
	hamming
	hanning
	heaviside
	hsplit
	hstack
	hypot
	identity
	imag
	iinfo
	indices
	inexact
	inf
	info
	insert
	int16
	int32
	int64
	int8
	int_
	intc
	integer
	interp
	intersect1d
	invert
	isclose
	isfinite
	isin
	isinf
	isnan
	isnat
	isneginf
	isposinf
	iterable
	kaiser
	lcm
	ldexp
	left_shift
	less
	less_equal
	lexsort
	lib
	linalg
	linspace
	little_endian
	load
	loadtxt
	log
	log10
	log1p
	log2
	logaddexp
	logaddexp2
	logical_and
	logical_not
	logical_or
	logical_xor
	logspace
	longdouble
	longlong
	ma
	mask_indices
	matmul
	max
	maximum
	mean
	median
	memmap
	meshgrid
	mgrid
	min
	minimum
	mod
	modf
	moveaxis
	multiply
	nan
	nan_to_num
	all nan_* functions (can be addressed another time)
	ndarray
	ndenumerate
	ndim
	ndindex
	nditer
	negative
	nested_iters
	newaxis
	nextafter
	nonzero
	not_equal
	number
	object_
	ogrid
	ones
	ones_like
	packbits
	pad
	partition
	percentile
	pi
	piecewise
	place
	positive
	power
	printoptions
	prod
	product
	ptp
	put
	result_type
	put_along_axis
	quantile
	rad2deg
	radians
	random
	ravel
	ravel_multi_index
	real
	real_if_close
	rec
	reciprocal
	record
	remainder
	repeat
	reshape
	resize
	right_shift
	roll
	rollaxis
	rot90
	round
	save
	savetxt
	savez
	savez_compressed
	searchsorted
	select
	set_printoptions
	setbufsize
	setdiff1d
	setxor1d
	shape
	shares_memory
	short
	show_config
	show_runtime
	sign
	signbit
	signedinteger
	sin
	sinc
	single
	sinh
	size
	sometrue
	sort
	sort_complex
	spacing
	split
	sqrt
	square
	squeeze
	stack
	std
	str_
	subtract
	sum
	swapaxes
	take
	take_along_axis
	tan
	tensordot
	tanh
	test
	testing
	tile
	timedelta64
	trace
	tri (tril and triu are in Array API)
	tril
	tril_indices
	tril_indices_from
	triu
	triu_indices
	triu_indices_from
	trim_zeros
	true_divide
	trunc
	ubyte
	ufunc
	uint
	uint16
	uint32
	uint64
	uint8
	uintc
	ulonglong
	union1d
	unique
	unpackbits
	unravel_index
	unsignedinteger
	ushort
	unwrap
	vectorize
	version
	void
	vsplit
	vstack
	where
	zeros
	zeros_like
	var
	ascontiguousarray
	may_share_memory
	get_include
	intp
	uintp
	i0
	issubdtype
	absolute
	histogram
	histogram2d
	histogram_bin_edges
	histogramdd
	asmatrix
	seterrcall
	geterr
	geterrcall
	rint
	ix_
	outer
	common_type
	putmask
	datetime_as_string
	inner
	vander
	vdot
	cross
	typecodes
	typename
	transpose
	kron
	asarray_chkfinite
	all poly* functions
	roots
	is_busday
	busday_count
	busday_offset
	busdaycalendar
	iscomplex
	iscomplexobj
	isfortran
	isreal
	isrealobj
	isscalar
	False_
	True_
	c_
	r_
	s_
	index_exp
	mintypecode
	min_scalar_type
	array2string
	array_str
	array_split
	require
	datetime_data
	bmat
	promote_types
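
For a few entries in the "remove" column, the replacement named in parentheses can be sketched as follows. This is only an illustration of the hints in the table, not a complete migration guide; the sample arrays and variable names are made up for the example.

```python
import ast
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([2, 4])

# Inf / Infinity / PINF / infty -> np.inf, NINF -> -np.inf, NaN / NAN -> np.nan
pos_inf, neg_inf, not_a_number = np.inf, -np.inf, np.nan

# in1d(a, b) -> np.isin(a, b).ravel()
mask = np.isin(a, b).ravel()

# row_stack -> np.vstack
stacked = np.vstack([a, a])

# asfarray(x) -> np.asarray(x, dtype=...)
as_float = np.asarray(a, dtype=np.float64)

# float_ -> np.float64
x = np.float64(1.5)

# round_ -> np.round
rounded = np.round(np.array([1.26, 2.74]), 1)

# safe_eval -> ast.literal_eval
parsed = ast.literal_eval("[1, 2, 3]")
```

All of the replacement spellings used here also exist in NumPy 1.x, so code written this way should not need a version check.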

NEP 52 link ↩


ngoldbaum · 2023-07-31T18:11:15Z

We decided on the name np.dtypes instead of np.types. The dtype classes already live there with CamelCased names. Maybe it makes sense to create a np.dtypes.scalars namespace for the scalar classes, to make the distinction between dtype classes and scalar classes a little clearer? That wasn't proposed in the NEP, just noticing that there are a lot of scalars that are being moved into np.dtypes and they're... not dtypes! While we have a chance, let's make that a little clearer in the API.

For np.nansum and other nan ufunc variants, maybe it makes sense to add the option, deprecate the nan variants, and leave e.g. np.nansum around for a release or two? These are much more commonly used than a lot of the other things that are being removed in this list, I also don't see this particular deprecation coming up in previous discussion. That said, the fix is straightforward and only a little bit more complicated than changing imports, so perhaps it's worth doing this along with all the other numpy 2.0 changes.

Other more minor issues I saw reading the list over:

The NEP says that all of the scalar aliases under "other aliases" should be removed instead of moved into the new namespace.
You have the complex scalars getting removed, they should go to np.dtypes.
I don't think it's correct to remove show_config and show_runtime.
convolve is specifically for 1D arrays, I don't think it belongs in np.linalg.
Moving dot and cross to np.linalg certainly makes sense, that said, those are both very commonly used functions and moving them is going to be pretty disruptive to python API users.
You have np.genfromtxt getting deleted, is that correct? It's very commonly used.
Also recfromtxt and recfromcsv, I don't see those in previous discussions, but they're definitely a lot more niche and pandas is probably a better replacement anyway.

ngoldbaum · 2023-07-31T19:57:05Z

I wrapped the long table in the issue description in a details block to make this discussion thread a little easier to read.

ksunden · 2023-07-31T22:24:48Z

Just went through the matplotlib codebase looking for these. You can see a more full analysis in matplotlib/matplotlib#26422

From a matplotlib perspective, I don't think we are prepared to take on a scipy dependency (I think it may even cause a deadlock in some instances... if I remember correctly)

Thus I'd argue that np.interp and np.roots should stay. (interp being a higher priority for us, as it is used often in the matplotlib codebase. roots is only used once, but it is in pretty core functionality)

I'm less worried about the window functions, as those are mostly used in tests/examples anyway, and those functions are simple enough to just include inline if needed (and when they are used in library code, it is for functions that I'd argue we should deprecate ourselves..., mostly).

I was also surprised to see np.prod in the category of "move to linalg", that seems like something a lot more general/useful than linalg. Feels incongruent to me to keep sum and cumprod but not prod.

I would also argue in favor of keeping np.genfromtxt (though that is more for non-matplotlib related reasons). Carting around ascii text files of numbers is a super common use case (unfortunately so, in some cases) genfromtxt is much better than stdlib options for parsing them, and I don't really want to do pandas.read_csv if all I want is numpy array anyway. (or if I don't have headings at all... which is all too common...)

Those are the big ones from my POV, though wouldn't exactly mind not having to update some of the others on that list, but not that bad either.

mtsokol · 2023-07-31T23:14:00Z

@ksunden Thank you for the feedback! Looking briefly once again I spotted a mistake in my list: for np.roots I meant move to np.polynomial, fixed it!

seberg · 2023-08-01T08:31:21Z

I think the list above is considerably more aggressive than NEP 52. There are things I believe we can even just delete, like practically all of those strange upper-case enums (very few to no one should be using those). But things like genfromtxt are widely used, and even the windowing functions may be used often enough in tutorials/examples that I am not sure it is worth the trouble to deprecate them.
Also e.g. the scalar types shouldn't move (maybe them being in two places). Although it seems good to deprecate all aliases except the canonical name.

mtsokol · 2023-08-01T10:55:11Z

@ngoldbaum I renamed the entries to np.dtypes. When it comes to scalar types: Sebastian mentioned in his comment that, in the end, scalar types shouldn't move.

I agree that first an option needs to be added to those np.nan* functions, and then deprecate them.
Thanks for pointing out! I removed "other aliases" explicitly.
I moved np.convolve back to main namespace - I think we need to agree what can be moved to np.linalg without being too disruptive (and if it makes sense to move anything at all).
I think I confused np.genfromtxt with something else - I moved it back.
recfromtxt and recfromcsv were deprecated in API: deprecate undocumented functions #24154

@ksunden I moved np.interp, np.prod, np.genfromtxt and windows functions back to main namespace.

@seberg I removed those upper-case enums from the main namespace.

ngoldbaum · 2023-08-01T14:46:06Z

The errors and warnings should probably also get deprecation warnings if they're explicitly imported from the main namespace and removed from __all__, they already have canonical locations in np.exceptions.
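
From the user's side, importing from the canonical location might look like the sketch below. It assumes np.exceptions is available (NumPy 1.25 or later) and falls back to the main namespace otherwise; the helper name `sum_along` is hypothetical and exists only for the example.

```python
import numpy as np

try:
    # Canonical location of the exception classes.
    from numpy.exceptions import AxisError
except ImportError:
    # Older NumPy releases only expose it in the main namespace.
    from numpy import AxisError


def sum_along(arr, axis):
    """Return the sum along `axis`, or None if the axis is out of bounds."""
    try:
        return np.sum(arr, axis=axis)
    except AxisError:
        return None
```

Code written this way keeps working whether or not the main-namespace name is eventually removed.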

timhoffm · 2023-08-02T13:50:36Z

Do you want to remove one of the np.min/np.amin aliases (and likewise for max)?

mtsokol · 2023-08-02T15:20:01Z

Do you want to remove one of the np.min/np.amin aliases (and likewise for max)?

@timhoffm I think it would be more consistent to drop aliases and have only one function with a specific functionality. But I would assume that these core functions are heavily used, and a gain of reducing main namespace by two entries might not be worth breaking API here. But I don't have a strong opinion on that.

timhoffm · 2023-08-02T16:02:47Z

@mtsokol one can definitely argue both ways. I just wanted to bring this to the table. I also don't have the usage insight and knowledge on numpy policy priorities to decide what's reasonable here. - If you keep the aliases, I suggest to still bless one and discourage the other, so that at least new code will grow into a consistent direction.

rgommers · 2023-08-02T16:58:49Z

Thanks @mtsokol for getting the ball rolling here! I'm first commenting on all the feedback, then I'll have to go back and add my own comments on the individual per-function proposals.

I think the list above is considerably more aggressive than NEP 52. There are things I believe we can even just delete, like practically all of those strange upper-case enums (very few to no one should be using those).

I agree with this. Most of the feedback here (in the whole thread so far) is quite useful and on-point. I think it can be incorporated and a next version of the table/plan posted for discussion.

Also e.g. the scalar types shouldn't move (maybe them being in two places). Although it seems good to deprecate all aliases except the canonical name.

+1 to them not moving. They should not be in two places, only stay where they are now. And yes, definitely let's deprecate and then remove all the aliases.

For np.nansum and other nan ufunc variants, maybe it makes sense to add the option, deprecate the nan variants, and leave e.g. np.nansum around for a release or two?

This seems desirable, but a project in and of itself. data-apis/array-api#621 is relevant here.

Moving dot and cross to np.linalg certainly makes sense, that said, those are both very commonly used functions and moving them is going to be pretty disruptive to python API users.

I think np.dot is too disruptive, and we have to keep it. cross may be good to move to linalg though, since it's there in the array API standard. For things like that, if we think outright removals are too disruptive, we could keep aliases around in the old locations (but only document them in the new location).

If you keep the aliases, I suggest to still bless one and discourage the other, so that at least new code will grow into a consistent direction.

Agreed. And we know what the canonical names are: min/max in this case.

mtsokol · 2023-08-02T19:21:53Z

Hi @rgommers,
Here I share updated table after applying review comments. For scalars I removed aliases (sized and "other") and kept only canonical names. ~~The count of removed entries from main namespace is now 197.~~

[UPDATE 31.08] The final list is present in the PRs description.

rgommers · 2023-08-02T19:47:50Z

A few more comments on the list below. I think if we can quickly clean up a lot of the obvious ones (stray variables, aliases, enums, etc.), then the list will get a lot shorter and easier to review. Another category is the ones that should definitely stay (e.g. everything in the array API standard), as well as numpy-specific heavily used functionality without a clear replacement (e.g. vectorize, set_printoptions). If we could put those in a "definitely staying" list, then we get to the "these are the ones left for discussion" list that will be pretty tractable to review in detail.

'uint16' | -- remove, sized alias

Note that these are the canonical preferred names. These are the ones we want to keep, and in addition the canonical C names. Every other alias should go. The exceptions here are the longdouble aliases (float96, float128) because those may or may not exist, and hence are better removed.
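
As a small illustration of why the sized longdouble aliases are awkward, the sketch below inspects what the current platform provides. It makes no assumption about which alias exists; it only reports what it finds.

```python
import numpy as np

# np.longdouble always exists; its width depends on the platform and compiler.
itemsize = np.dtype(np.longdouble).itemsize
print("longdouble itemsize (bytes):", itemsize)

# The sized aliases only exist when the platform's long double has that width,
# so code relying on them is not portable.
for alias in ("float96", "float128"):
    print(alias, "available:", hasattr(np, alias))
```

Portable code can use np.longdouble directly and query np.finfo(np.longdouble) when the actual precision matters.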

'nanmean' || --

can you undo those? I don't think this will make it into 2.0, and we are likely to keep these aliases around for a long time even if the keyword idea arrives in time.

'result_type' | -- move to np.dtypes

It would be useful to double check your list with the array API standard. Anything that's in the main namespace there, like result_type, clearly should stay.

rot90, unwrap

These are about angles, and go together with rad2deg & co. I think they don't have an obviously better place to go, and are best left alone.

rkern · 2023-08-02T19:59:08Z

rot90 isn't really about angles in the way rad2deg and unwrap are. It goes with flip, transpose, etc. Agreed it should be left alone, though.

mtsokol · 2023-08-02T20:08:28Z

@rgommers Sure, I will prepare a shorter list for further discussion.

'uint16' | -- remove, sized alias

Note that these are the canonical preferred names. These are the ones we want to keep, and in addition the canonical C names. Every other alias should go. The exceptions here are the longdouble aliases (float96, float128) because those may or may not exist, and hence are better removed.

Sure! I will revert it. To explain, for np.uint16?? I got :Canonical name: numpy.ushort therefore I considered it an alias. I will move back names with 16, 32, etc.

'nanmean' || --

can you undo those? I don't think this will make it into 2.0, and we are likely to keep these aliases around for a long time even if the keyword idea arrives in time.

Sure!

'result_type' | -- move to np.dtypes

It would be useful to double check your list with the array API standard. Anything that's in the main namespace there, like result_type, clearly should stay.

Sure! I will check it.

rot90, unwrap

These are about angles, and go together with rad2deg & co. I think they don't have an obviously better place to go, and are best left alone.

Will move these back.

ngoldbaum · 2023-08-02T20:17:11Z

I got :Canonical name: numpy.ushort therefore I considered it an alias. I will move back names with 16, 32, etc.

The bitsized integer types being aliases of the C integer types and not the other way around is a known issue. Ultimately it comes down to these typedefs. This also means that, depending on the platform, the C integer types may or may not be aliased to a bitsize.

@seberg recently attempted to make the C type names aliases to the bitsized types but it's complicated. For now I would focus on the other aliases and come back to the integer aliases later.
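
To see the aliasing on a given machine, something like the following sketch can be run. The chosen pairs are just examples, and apart from the float64/double pair the "same class" result is expected to vary by platform, so no particular output is assumed here.

```python
import numpy as np

# Each pair compares a bit-sized name with a C-style name.
pairs = [
    ("uint16", "ushort"),
    ("int32", "intc"),
    ("int64", "longlong"),
    ("float64", "double"),
]

results = {}
for sized, c_name in pairs:
    sized_type = getattr(np, sized)
    c_type = getattr(np, c_name)
    # Identity of the scalar classes vs. equivalence of the dtypes.
    same_class = sized_type is c_type
    same_dtype = np.dtype(sized_type) == np.dtype(c_type)
    results[(sized, c_name)] = (same_class, same_dtype)
    print(f"np.{sized} vs np.{c_name}: same class={same_class}, same dtype={same_dtype}")
```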

rgommers · 2023-08-02T20:48:47Z

I usually think of this purely from an end user focused API perspective. It doesn't matter one way or the other what the name is of the actual implementation under the hood. It's more what the docs say or what common practice is (e.g. code uses np.float64 in >9x% of cases, not np.double or another alias).

mtsokol · 2023-08-03T08:53:34Z

@rgommers here I share an updated list divided by three sections (remove, tentative and keep). I kept canonical and sized names for scalars.

[EDIT 22.08.2023] updated list to the latest version.
[EDIT 31.08.2023] final list is present in the PRs description.

rgommers · 2023-08-03T13:30:12Z

That looks quite nice and easier to review, thanks Mateusz!

The "remove" list looks pretty good; cross is the one that jumps out to me as needing to be in the "tentative" list instead.

The "keep" list looks pretty good too. The few objects that jump out are the ones with trailing underscores - in particular I think that False_/True_ are pretty pointless and can be removed. We also need the proper names back, so bool must be re-added and bool_ must be hidden (as an alias to np.bool_).

For the "tentative list", some more comments:

all the *sctype* things can be removed, that is explicitly covered in NEP 52,
np.matrix can't go, unfortunately. That one is so heavily used that we should deprecate it first. Which is still blocked by it being used in scipy.sparse,
asfarray has bad semantics and I think we should remove it (see MAINT: differentiable fns respect float width. Closes #15602 scipy/scipy#18481 (comment)), while ascontiguousarray and asfortranarray probably should stay,
issubclass_: remove it, I don't think that was ever meant to be public,
may_share_memory is an important function and should stay
get_include is important and there is no clear replacement, so it should stay

mtsokol · 2023-08-03T14:02:55Z

@rgommers Sure! I applied all points to my list and updated the comment.

I've got one thing to confirm about bool/bool_: Currently np.bool gives AttributeError with a message np.bool was a deprecated alias for the builtin bool. (I see the code that does it was introduced only 7 months ago)

rgommers · 2023-08-03T14:07:38Z

I've got one thing to confirm about bool/bool_: Currently np.bool gives AttributeError with a message np.bool was a deprecated alias for the builtin bool. (I see the code that does it was introduced only 7 months ago)

Yeah, best not to touch it now - it's a little complicated. I think the plan is to reintroduce np.bool for NumPy 2.0 (EDIT: as a numpy dtype, not as an alias to builtins.bool); that was the plan anyway and I think we need it for array API compatibility too. But it's possible I am forgetting something (and if so, I'm pretty sure @seberg will know what that is).

seberg · 2023-08-03T14:18:02Z

Might be a bit early to reintroduce it, but I am fine to do so. Also remember, at least it probably had a DeprecationWarning before that. Would lean to just not do anything about bool_ and similar ones for now, though. (unless we start hiding things from __dir__ only)

np.True_ and np.False_ need to stay though, IMO. They are clear and it makes it clear they are singletons. Also, you would notice if you tried: they are used as the repr, which you would then have to adapt as well.

rgommers · 2023-08-03T14:24:14Z

Might be a bit early to reintroduce it, but I am fine to do so

I think it's pretty harmless, at least it's hard to imagine what would go wrong (typical usage was np.somefunc(..., dtype=np.bool) which will basically work unchanged).

np.True_ and np.False_ need to stay though, IMO. They are clear and it makes it clear they are singletons.

They're trivial to recover in the rare cases where you need them, right? Like so:

>>> np.bool_(False) is np.False_
True

If that is correct, they really have no business being in the top-level namespace.

seberg · 2023-08-03T14:44:23Z

Sure you can, let's see what others think. You will have to change the repr to np.bool(True) from np.True_, although that is easy. I do have a tendency to think that it doesn't hurt if these singletons remain and somehow for singletons that seems right to me. They seem used, but not that much, so at least long-term it probably doesn't matter.

rgommers · 2023-08-03T15:14:23Z

repr to np.bool(True) from np.True_,

I think that's not the repr?

>>> np.False_.__repr__()
'False'
>>> print(np.False_)
False

I do have a tendency to think that it doesn't hurt if these singletons remain and somehow for singletons that seems right to me.

I think it being a singleton should be an implementation detail that no one should rely on. Nor does it really matter. From an API perspective this looks to me like a weird object that is trivially reconstructed.

Sure you can, let's see what others think.

Agreed. Maybe there is a concrete use case someone can share?

seberg · 2023-08-03T15:24:07Z

The repr is np.True_ on main, although I started that process at a time when np.bool(True) would have been impossible! The choice remains, False is a bad repr, so we need to decide if we prefer np.True_ or np.bool(True).
I have a slight liking for exposing singletons, probably just for similarity to Python's bools. Code search finds something is np.True_, but it is probably not common enough to worry about more than maybe not just yanking it out.

ngoldbaum · 2023-08-23T18:21:12Z

@mtsokol we chatted at the triage meeting about the tentative list and we ended up deciding that most of the tentative column that isn't already deprecated should be deprecated, except iscomplex, iscomplexobj, isreal, isrealobj, isscalar, True_, and False_ which should definitely stay in the main namespace.

DataSource is kind of on the bubble. It's not used much but also it's small and stable and not adding much maintenance burden. I'd suggest moving to the "stay" column but could probably be deprecated because there's minimal downstream use, but it probably needs a migration path for users.

Other less-used array utility functions can move to np.lib.array_utils. Functions that have no clear replacement in the existing NumPy API should stay in NumPy but be moved to np.lib.array_utils (except for recarray which already has a canonical home). There should be aliases in the main numpy namespace that raise deprecation warnings when these are imported directly. For items in the tentative list that have clear replacements using the public NumPy API, suggest using those instead and mark the whole item deprecated and slated for removal, but don't remove it immediately unless it has very few downstream users.
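
A deprecated alias of this kind can be sketched with a module-level `__getattr__` (PEP 562). Everything named here (`mypkg`, `mypkg.array_utils`, `old_helper`) is a hypothetical stand-in built in memory for illustration; it is not NumPy's actual implementation.

```python
import types
import warnings

# Hypothetical stand-in for a submodule such as np.lib.array_utils.
array_utils = types.ModuleType("mypkg.array_utils")


def old_helper(buf):
    """Toy helper; only here so the alias has something to forward to."""
    return len(buf)


array_utils.old_helper = old_helper

# Hypothetical stand-in for the main namespace.
mypkg = types.ModuleType("mypkg")
mypkg.array_utils = array_utils

# Old main-namespace name -> new canonical location.
_MOVED = {"old_helper": "mypkg.array_utils.old_helper"}


def _module_getattr(name):
    # Called only when normal attribute lookup on the module fails.
    if name in _MOVED:
        warnings.warn(
            f"mypkg.{name} is deprecated; use {_MOVED[name]} instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return getattr(array_utils, name)
    raise AttributeError(f"module 'mypkg' has no attribute {name!r}")


mypkg.__getattr__ = _module_getattr
```

With this in place, `mypkg.old_helper` still resolves but emits a DeprecationWarning, while `mypkg.array_utils.old_helper` is silent.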

I think that covers everything, let me know if there are any other corner cases!

mattip · 2023-08-23T23:07:27Z

should be deprecated,

We also discussed perhaps creating a package to put on PyPI that would restore many of the removed functions from NumPy2.0. By pip-installing this package, users could continue to use their favorite aliases and functions that were removed, perhaps with a DeprecationWarning when importing or using some of them. Does that sound like a reasonable compromise, or would it entail too much ongoing maintenance burden?

andyfaff · 2023-08-23T23:46:56Z

That sounds like it has the potential to be a maintenance burden that could suffer from bitrot.

charris · 2023-08-23T23:47:35Z

It is good to keep in mind that the needed downstream changes should be minimal if we want to avoid a Python 3 situation. Folks can, and do, ignore deprecation warnings, the warnings don't break code.

mtsokol · 2023-08-23T23:49:39Z

We also discussed perhaps creating a package to put on PyPI that would restore many of the removed functions from NumPy2.0. By pip-installing this package, users could continue to use their favorite aliases and functions that were removed, perhaps with a DeprecationWarning when importing or using some of them. Does that sound like a reasonable compromise, or would it entail too much ongoing maintenance burden?

If any of the functions/aliases that got removed in Part 1 or Part 2 should be still available and deprecated, I can restore them. If there should be a longer deprecation period I think it's straightforward to provide it.

So far removed items from the main namespace are internal enums, already deprecated functions and redundant aliases. Here's a complete list so far: https://github.com/numpy/numpy/blob/main/doc/source/release/2.0.0-notes.rst#numpy-20-python-api-removals

If all removed aliases and functions from the main namespace should be available even after NumPy 2.0 release, maybe it's better to keep them in a separate "1.x legacy" module and make them injected into main namespace after enabling a flag: np.enable_1_x_main_namespace()?
I would still argue that it's better to reduce scope of changes if it's too disruptive, and keep final design irreversible.

ngoldbaum · 2023-08-24T01:28:41Z

If any of the functions/aliases that got removed in Part 1 or Part 2 should be still available and deprecated, I can restore them. If there should be a longer deprecation period I think it's straightforward to provide it.

So far removed items from the main namespace are internal enums, already deprecated functions and redundant aliases.

I think we want to do this in such a way that guides users to the correct way to fix their code. One alternative to deprecating things is to break them, but make sure the error users see gives the migration path.

For example, if we remove aliases, we can do so in such a way that if a user imports the name or accesses it as an attribute of the numpy module they get an ImportError or AttributeError that gives them the migration path:

>>>np.NINF
AttributeError:NINF was removed in Numpy 2.0, use -np.inf instead

This allows us to do these renames and clean up namespaces, not leave the old names behind with an ignorable deprecation that only delays user pain, and break user code in such a way they hopefully immediately see what broke and how to fix their code.
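
A minimal sketch of that idea, again using a module-level `__getattr__`. The module `pkg` and the `_EXPIRED` table are hypothetical stand-ins created for the example, with hints taken from the list in this issue; this is not the code that is in NumPy.

```python
import types

# Hypothetical stand-in for the top-level package.
pkg = types.ModuleType("pkg")
pkg.inf = float("inf")

# Removed name -> migration hint.
_EXPIRED = {
    "NINF": "use -np.inf instead",
    "PINF": "use np.inf instead",
    "NaN": "use np.nan instead",
}


def _module_getattr(name):
    # Reached only for names that are not real attributes of the module.
    if name in _EXPIRED:
        raise AttributeError(
            f"`np.{name}` was removed in NumPy 2.0, {_EXPIRED[name]}."
        )
    raise AttributeError(f"module 'pkg' has no attribute {name!r}")


pkg.__getattr__ = _module_getattr

try:
    pkg.NINF
except AttributeError as exc:
    message = str(exc)
    print(message)
```

Existing attributes such as `pkg.inf` are unaffected, because `__getattr__` is consulted only after the normal lookup fails.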

rgommers · 2023-08-24T04:34:26Z

Thanks for the summary of that discussion @ngoldbaum. Overall that sounds quite good to me.

We also discussed perhaps creating a package to put on PyPI that would restore many of the removed functions from NumPy2.0.

I agree with @andyfaff here that this is a maintenance burden. Also, there is no evidence that this is necessary or even desired at this point. We should not do this for now; if there is enough demand it can easily be done quickly around the RC period.

Also, it's good to keep in mind that we already are doing, or are planning to do, or at least are discussing the following:

clear error messages from __getattr__ with replacements (see comment by Nathan right above)
good release notes
a migration guide in the docs
a sed script to automatically clean up code as much as possible
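
The automated clean-up idea could equally be sketched in Python. The following is a deliberately naive illustration: it assumes the code under rewrite uses the `np.` prefix, it also rewrites matches inside strings and comments, and the function name `rewrite` and the small mapping (taken from the list in this issue) are chosen only for the example.

```python
import re

# Removed main-namespace name -> replacement expression.
_REPLACEMENTS = {
    "Inf": "np.inf",
    "Infinity": "np.inf",
    "PINF": "np.inf",
    "infty": "np.inf",
    "NINF": "-np.inf",
    "NaN": "np.nan",
    "NAN": "np.nan",
    "float_": "np.float64",
    "cfloat": "np.cdouble",
    "row_stack": "np.vstack",
    "round_": "np.round",
}

# Word boundaries keep names such as np.float_power or np.inf untouched.
_names = sorted(_REPLACEMENTS, key=len, reverse=True)
_PATTERN = re.compile(r"\bnp\.(" + "|".join(map(re.escape, _names)) + r")\b")


def rewrite(source: str) -> str:
    """Replace removed `np.<name>` spellings with their suggested replacement."""
    return _PATTERN.sub(lambda m: _REPLACEMENTS[m.group(1)], source)
```

For example, `rewrite("x = np.NINF")` yields `"x = -np.inf"`. A real tool would more likely work on the syntax tree so that it can skip strings and handle other import aliases.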

At some point we've got to stop - the above list is enough. Also a reminder that we've got, according to our original planning, a little over 4 months left and a ton of work to do. We haven't even started on some of the main topics on our wish list. So let's please avoid more new work here that we didn't plan for, like a compat package.

It is good to keep in mind that the needed downstream changes should be minimal if we want to avoid a Python 3 situation

I've now heard this one too many times, so let me write down my assessment here. First, the chance of turning the NumPy 1.26 to 2.0 transition into anything like the Python 2.7 to Python 3 one is extremely low. Second, if it does happen, the changes under discussion in this issue are quite unlikely to be the root cause. Instead, it'll be because either something went wrong with the C ABI change or our assessment of the impact of it. That is still the most impactful change we are planning for, and at this point it's still not completely clear what this will look like.¹

Regarding Python 3, the main causes were (highest-impact one first):

1. The major changes in string handling,
2. Large changes in the C API for which there was very little guidance or tooling to upgrade,
3. The low-quality early releases. Python 3.0-3.2 were worse than 2.7 and offered little of value,
4. Trivial-but-annoying changes to widely used things, e.g. making print a function.

This issue is about niche APIs and changes that are not hard to adapt to. It's most similar to (4), but a lot less impactful since we're not touching anything that's idiomatic or heavily used.

¹ We do know pretty much for sure that it's going to cause a lot of breakage, since even packages like Pandas have failed to put <2.0 bounds on most of their releases. But we expect the disruptions to be short-lived, with maintainers of packages putting out new releases and users learning how to add constraints files. But still, pip install pandas==1.5 will remain broken.

mtsokol · 2023-08-31T08:54:44Z

Hi all!
The final up-to-date list with only "remove" and "stay" columns is present in the PR's description.

seberg · 2023-09-05T13:11:05Z

I agree we have no big risk of a full blown Python 2-3 situation.

Unfortunately, the transition may not be as smooth as we hope, mainly for two reasons:

1. C-API: how long will it take smaller libraries to recompile wheels against the newer NumPy (some might have to do more work, but that should be very uncommon)?
2. Python API: How many small libraries will just break, mostly for silly reasons? How many of them will remain broken for a long time due to lack of active maintenance?

For Python 2-3, I think even the larger, well-maintained libs took a long time to transition. I think we will not have that problem for sure (my main worry would be numba/cupy, maybe due to promotion changes, but I think they may be OK with having a 95% fix for those; in practice they are only at 95% to begin with).

I don't think I believe in a compat package; downgrading NumPy seems OK as a hot-fix if we go that far.
I could see doing the normal deprecation cycle for anything that we see used downstream (e.g. in a GitHub code search), because that will remove almost all immediate disruption due to point 2 (yes, the downside is that it might just break later on, because the library is really unmaintained, so giving them more time doesn't help).
(Maybe as a FutureWarning/VisibleDeprecationWarning, so that users cannot miss that the library they are using is broken.)
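To illustrate the deprecation-cycle option, here is a minimal sketch of an alias that keeps working for now but emits a `FutureWarning` on access. Unlike `DeprecationWarning`, Python shows `FutureWarning` by default, so end users of an unmaintained library would see it. The `fakenp` module and the `make_deprecating_getattr` helper are hypothetical names for illustration only, not NumPy code.

```python
import types
import warnings


def make_deprecating_getattr(module, deprecated):
    """Build a module __getattr__ that warns, then forwards to the new name."""

    def __getattr__(name):
        if name in deprecated:
            new_name = deprecated[name]
            warnings.warn(
                f"`{module.__name__}.{name}` is deprecated; "
                f"use `{module.__name__}.{new_name}` instead.",
                FutureWarning,
                stacklevel=2,  # attribute the warning to the calling code
            )
            return getattr(module, new_name)
        raise AttributeError(
            f"module {module.__name__!r} has no attribute {name!r}"
        )

    return __getattr__


# Stand-in module so the sketch runs without touching NumPy itself.
fakenp = types.ModuleType("fakenp")
fakenp.inf = float("inf")
fakenp.__getattr__ = make_deprecating_getattr(fakenp, {"Inf": "inf"})

# Record warnings so the effect can be inspected programmatically.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    value = fakenp.Inf
```

After the deprecation period, the forwarding branch would be replaced by a plain `AttributeError` carrying the same hint.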

ngoldbaum · 2023-10-16T16:13:48Z

Discussion has died down in here and all of the PRs implementing the main namespace refactor have been merged. I'm going to close this.

I don't think there's much appetite for a compat package and instead I think we're going to point people to ruff to lint and update their code. If there are cases where it's not straightforward to write code that works the same in both NumPy 1 and 2 we should look closer and see if it's possible to add code in a NumPy 1 bugfix release or in NumPy 2 to ease writing code that works in both versions.
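As a rough idea of what an automated update does for the simplest cases, here is a sketch of a regex-based rewrite of removed constant aliases. This is not how ruff works internally, and the `REPLACEMENTS` table and `rewrite_source` function are hypothetical; the mappings themselves come from the "remove" column at the top of this issue. The sketch assumes NumPy is imported under the alias `np` and only handles one-to-one renames (something like `NINF`, which becomes `-np.inf`, would need extra care).

```python
import re

# One-to-one renames copied from the "remove" column of this issue.
REPLACEMENTS = {
    "Inf": "inf",
    "Infinity": "inf",
    "PINF": "inf",
    "NaN": "nan",
    "NAN": "nan",
}


def rewrite_source(source, alias="np"):
    """Replace `<alias>.<old>` with `<alias>.<new>` for each known rename."""
    # Try longer names first so `Infinity` is never cut short at `Inf`.
    names = sorted(REPLACEMENTS, key=len, reverse=True)
    pattern = re.compile(
        rf"\b{re.escape(alias)}\.({'|'.join(map(re.escape, names))})\b"
    )
    return pattern.sub(
        lambda m: f"{alias}.{REPLACEMENTS[m.group(1)]}", source
    )


updated = rewrite_source("x = np.Inf + np.NaN")
```

A plain text substitution like this also rewrites matches inside strings and comments, which is one reason a real linter that parses the code is the better tool.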

ksunden mentioned this issue Jul 31, 2023

[MNT]: Tracking issue for Numpy 2.0 transition matplotlib/matplotlib#26422

Closed

5 tasks

mtsokol changed the title ~~ENH: Overhaul of NumPy main namespace [NEP 52]~~ ENH: [WIP] Overhaul of NumPy main namespace [NEP 52] Jul 31, 2023

mtsokol mentioned this issue Aug 2, 2023

API: Cleaning numpy/__init__.py and main namespace - Part 1 [NEP 52] #24316

Merged

rgommers added the 62 - Python API (Changes or additions to the Python API. Mailing list should usually be notified.) and 01 - Enhancement labels Aug 2, 2023

This was referenced Aug 17, 2023

deprecate jax.numpy.issubsctype google/jax#17160

Merged

API: Cleaning numpy/__init__.py and main namespace - Part 4 [NEP 52] #24445

Merged

mtsokol changed the title ~~ENH: [WIP] Overhaul of NumPy main namespace [NEP 52]~~ ENH: Overhaul of NumPy main namespace [NEP 52] Aug 23, 2023

rgommers mentioned this issue Aug 24, 2023

Tracking issue: Python API cleanup for NumPy 2.0 (NEP 52) #23999

Closed

18 tasks

ngoldbaum added the Numpy 2.0 API Changes label Aug 24, 2023

rgommers mentioned this issue Aug 28, 2023

Document, deprecate or remove everything exposed in the "numpy" namespace #12385

Closed

mtsokol mentioned this issue Aug 28, 2023

API: Readd add_docstring and add_newdoc to np.lib #24564

Merged

mtsokol self-assigned this Aug 29, 2023

rgommers mentioned this issue Aug 30, 2023

MAINT: Remove deprecated functions [NEP 52] #24477

Merged

mtsokol mentioned this issue Aug 30, 2023

API: Cleaning numpy/__init__.py and main namespace - Part 5 [NEP 52] #24587

Merged

ngoldbaum closed this as completed Oct 16, 2023

ngoldbaum mentioned this issue Nov 6, 2023

API: Add and redefine numpy.bool [Array API] #25080

Merged

lucascolley mentioned this issue Nov 12, 2023

Deprecate the aliased constants in the top-level namespaces #13705

Closed
