ENH: New Name for "numpy_nullable" dtype_backend #59032

WillAyd · 2024-06-17T14:14:45Z

Feature Type

Adding new functionality to pandas
Changing existing functionality in pandas
Removing existing functionality in pandas

Problem Description

Many I/O methods today accept a "numpy_nullable" argument for the dtype_backend= parameter. While historically our extension arrays exclusively used NumPy, this is no longer true with the string dtype so the name "numpy_nullable" is a misnomer.
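For readers unfamiliar with the current spelling, here is a minimal sketch of what the option does today, assuming pandas 2.0 or later. The CSV content is made up for illustration.

```python
import io

import pandas as pd

# Made-up sample data; the empty field in column "a" is a missing value.
csv_text = "a,b\n1,x\n,y\n"

df = pd.read_csv(io.StringIO(csv_text), dtype_backend="numpy_nullable")

# Column "a" comes back as the masked Int64 extension dtype instead of float64.
# Column "b" comes back as the pandas string dtype, whose storage is not
# necessarily NumPy, which is the mismatch described above.
print(df.dtypes)
```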

Feature Description

To make for a less confusing API, I would suggest adding "pandas_nullable" or maybe even just "pandas" as an argument. This can have the exact same behavior as "numpy_nullable" today but abstracts and corrects the semantics. "numpy_nullable" can then be slowly deprecated over time.
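One way such an alias with a slow deprecation could look is sketched below. This is purely illustrative: the helper `normalize_dtype_backend`, the two module-level constants, and the `"pandas_nullable"` value are all hypothetical and do not exist in pandas.

```python
import warnings

# Hypothetical names: neither this helper nor the "pandas_nullable" value
# exists in pandas today; they only illustrate the proposal.
_DEPRECATED_ALIASES = {"numpy_nullable": "pandas_nullable"}
_VALID_BACKENDS = {"pandas_nullable", "pyarrow"}


def normalize_dtype_backend(value: str) -> str:
    """Map a deprecated backend name to its replacement, emitting a warning."""
    if value in _DEPRECATED_ALIASES:
        new_name = _DEPRECATED_ALIASES[value]
        warnings.warn(
            f"dtype_backend={value!r} is deprecated; use {new_name!r} instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return new_name
    if value not in _VALID_BACKENDS:
        raise ValueError(f"Unknown dtype_backend: {value!r}")
    return value
```

With a shim like this, existing code passing "numpy_nullable" would keep working and only see a warning, while new code could use the new name directly.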

Alternative Solutions

n/a

Additional Context

dtype_backend="pandas" would also make for a smoother transition into the logical type system proposed as part of PDEP-13 #58455

...but even if that PDEP is not accepted, I still see value in changing the value "numpy_nullable" to something else

WillAyd · 2024-06-17T14:15:40Z

@jorisvandenbossche maybe a good follow up to the discussion we had as part of PDEP-14

WillAyd · 2024-07-11T18:39:13Z

@pandas-dev/pandas-core this wasn't major enough to include as part of PDEP-14, but I think it is a logical follow-up to clean up the semantics. Curious what others think.

Dr-Irv · 2024-07-11T18:46:38Z

I think it should be pandas_nullable. That keeps options open with respect to the whole pd.NA/np.nan discussion.

chaarvii · 2024-07-22T21:54:31Z

Hey! I’d like to work on this

chaarvii · 2024-07-22T21:54:35Z

Take

WillAyd · 2024-08-01T03:10:28Z

Any other team feedback on this? I think it would be good to use the new name starting with 3.0.

simonjayhawkins · 2024-08-01T14:21:26Z

We have pandas.api.types.pandas_dtype, which is documented as "Convert input into a pandas only dtype object ..." and returns either an np.dtype or a pandas dtype.

Given that the term “pandas dtype” already has a precedent, using dtype_backend="pandas" would indeed align well with existing conventions. It provides clarity and maintains consistency.
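A small sketch of that existing behavior, showing the two kinds of return value. The variable names are only for illustration.

```python
import numpy as np
import pandas as pd
from pandas.api.types import pandas_dtype

# A lowercase NumPy-style name resolves to a plain NumPy dtype.
numpy_backed = pandas_dtype("int64")

# The capitalized name resolves to the nullable (masked) extension dtype.
masked = pandas_dtype("Int64")

print(type(numpy_backed), type(masked))
```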

WillAyd · 2024-08-01T14:55:38Z

I also have a slight preference for pandas because it is shorter, and I don't see us ever introducing a non-nullable type system, so "_nullable" is superfluous.

jorisvandenbossche · 2024-08-01T14:56:23Z

On the other hand, when specifying dtype_backend="pyarrow", you also get back a "pandas dtype" in that sense (i.e. a pandas ExtensionDtype subclass). And at the same time, some of the non-nullable default dtypes we have are also pandas dtypes.

So I don't think dtype_backend="pandas" is an ideal naming, but I also don't have any better suggestion ..
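To make the overlap concrete, here is a quick check, assuming pandas 2.0 or later where `pd.ArrowDtype` is public. It only inspects class relationships, so it is not expected to need pyarrow at runtime.

```python
import pandas as pd
from pandas.api.extensions import ExtensionDtype

# The dtype class used by the "pyarrow" backend is a pandas ExtensionDtype.
arrow_is_pandas_dtype = issubclass(pd.ArrowDtype, ExtensionDtype)

# So is the masked dtype used by the "numpy_nullable" backend.
masked_is_pandas_dtype = issubclass(pd.Int64Dtype, ExtensionDtype)

# And so is a default, non-nullable-backend dtype such as categorical.
categorical = pd.Series(["a", "b"], dtype="category")
categorical_is_pandas_dtype = isinstance(categorical.dtype, ExtensionDtype)

print(arrow_is_pandas_dtype, masked_is_pandas_dtype, categorical_is_pandas_dtype)
```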

jbrockmendel · 2024-08-01T15:00:32Z

masked

WillAyd · 2024-08-01T15:06:58Z

> On the other hand, when specifying dtype_backend="pyarrow", you also get back a "pandas dtype" in that sense (i.e. a pandas ExtensionDtype subclass). And at the same time, some of the non-nullable default dtypes we have are also pandas dtypes.

That's true as a matter of implementation, but I don't think end users are going to know that.

Dr-Irv · 2024-08-01T15:14:57Z

> So I don't think dtype_backend="pandas" is an ideal naming, but I also don't have any better suggestion ..

I did suggest pandas_nullable above. I think I may have been the one to introduce the word "nullable" into our lexicon. So if we use pandas_nullable, it's clear that we are storing a pandas rep of missing values in the backend. I'm concerned that just using pandas could prevent some other usage that we don't see now, but want to introduce in the future.

WillAyd · 2024-08-01T15:52:31Z

That's a fair point, though I'm not sure that adding _nullable prevents that. I think that would only prevent an issue if we decided to offer non-nullable types.

Dr-Irv · 2024-08-01T17:22:09Z

> That's a fair point, though I'm not sure that adding _nullable prevents that. I think that would only prevent an issue if we decided to offer non-nullable types

Or offer something else that we can't foresee today.

simonjayhawkins · 2024-08-01T17:26:48Z

> On the other hand, when specifying dtype_backend="pyarrow", you also get back a "pandas dtype" in that sense (i.e. a pandas ExtensionDtype subclass). And at the same time, some of the non-nullable default dtypes we have are also pandas dtypes.
>
> So I don't think dtype_backend="pandas" is an ideal naming, but I also don't have any better suggestion ..

PyArrow types indeed are pandas extension types, enhancing the functionality of the base PyArrow library to suit our use case of backing DataFrames or Series.

We don't always rigidly adhere to the behavior of NumPy arrays for a Series with a NumPy dtype. We allow expansion, upcasting, and other conversions that may diverge from NumPy behavior, even though we return a NumPy type as the dtype.
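One example of that divergence, as a sketch: reindexing an integer Series to add a label upcasts it to float64 so the new slot can hold NaN, even though the reported dtype was a NumPy dtype both before and after.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2], dtype="int64")
before = s.dtype  # a NumPy int64 dtype

# Label 2 does not exist, so pandas fills it with NaN and upcasts the
# whole Series to float64 instead of raising.
expanded = s.reindex([0, 1, 2])
after = expanded.dtype

print(before, after)
```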

But I see no problems when we use the terms "pyarrow" or "numpy" when we talk about the backend. So it would seem reasonable to me to use the term "pandas" to describe the pandas nullable extension types.

> I did suggest pandas_nullable above. I think I may have been the one to introduce the word "nullable" into our lexicon. So if we use pandas_nullable, it's clear that we are storing a pandas rep of missing values in the backend. I'm concerned that just using pandas could prevent some other usage that we don't see now, but want to introduce in the future.

The dtype_backend argument is forward-thinking, enabling early adoption of experimental data types that aren't currently the default.

Presently, the available options for dtype_backend in I/O methods and .convert_dtypes are limited to 'numpy_nullable' and 'pyarrow'.
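For reference, a minimal sketch of the .convert_dtypes side of this, assuming pandas 2.0 or later. The frame contents are made up for illustration.

```python
import pandas as pd

# Missing values force both columns to float64 under the default dtypes.
df = pd.DataFrame({"a": [1, 2, None], "b": [1.5, None, 2.5]})

converted = df.convert_dtypes(dtype_backend="numpy_nullable")

# Column "a" holds only whole numbers, so it becomes Int64;
# column "b" becomes Float64. Both use pd.NA for the missing entries.
print(converted.dtypes)
```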

If we aim to allow users to continue using legacy types even when nullable types become the default, introducing an additional argument makes sense.

Considering package names, options like pyarrow, pandas, and numpy would be meaningful, clear, concise, and consistent choices?

WillAyd · 2024-08-03T19:24:01Z

I'm on board with what @simonjayhawkins is suggesting: pyarrow, pandas, and numpy as arguments reflect the core of the type system evolution, even if they may not be 100% technically accurate.

WillAyd · 2024-08-03T19:25:33Z

If we do decide on those terms, I also wonder if we should change the default value of None to "numpy"

WillAyd added Enhancement, Needs Triage and removed Needs Triage labels Jun 17, 2024

WillAyd added API Design and removed Enhancement labels Jun 17, 2024

github-actions bot assigned chaarvii Jul 22, 2024
