BUG: FloatingArray constructor with input of all missing values: np.nan vs. pd.NA #38751

Open
arw2019 opened this issue Dec 28, 2020 · 3 comments
Labels: Bug; Dtype Conversions (Unexpected or buggy dtype conversions); Ice Cream Agreement (Issues that would be addressed by the Ice Cream Agreement from the Aug 2023 sprint); Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate); NA - MaskedArrays (Related to pd.NA and nullable extension arrays); Needs Discussion (Requires discussion from core team before further action)

Comments

@arw2019
Member

arw2019 commented Dec 28, 2020

I'm not sure this is really an issue, but it may be a slight inconsistency.

The following raises a TypeError:

In [18]: import numpy as np
    ...: import pandas as pd
    ...: 
    ...: arr = pd.array([pd.NA, pd.NA], dtype="float")
    ...: ser = pd.Series(arr)
    ...: pd.to_numeric(ser, downcast="float")
    ...: 
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-18-c7fb0ba8f6eb> in <module>
      2 import pandas as pd
      3 
----> 4 arr = pd.array([pd.NA, pd.NA], dtype="float")
      5 ser = pd.Series(arr)
      6 pd.to_numeric(ser, downcast="float")

~/repos/pandas/pandas/core/construction.py in array(data, dtype, copy)
    344         return TimedeltaArray._from_sequence(data, dtype=dtype, copy=copy)
    345 
--> 346     result = PandasArray._from_sequence(data, dtype=dtype, copy=copy)
    347     return result
    348 

~/repos/pandas/pandas/core/arrays/numpy_.py in _from_sequence(cls, scalars, dtype, copy)
    178             dtype = dtype._dtype
    179 
--> 180         result = np.asarray(scalars, dtype=dtype)
    181         if copy and result is scalars:
    182             result = result.copy()

~/anaconda3/envs/pandas-dev/lib/python3.8/site-packages/numpy/core/_asarray.py in asarray(a, dtype, order)
     81 
     82     """
---> 83     return array(a, dtype, copy=False, order=order)
     84 
     85 

TypeError: float() argument must be a string or a number, not 'NAType'

but with np.nan it runs fine:

In [19]: import numpy as np
    ...: import pandas as pd
    ...: 
    ...: arr = pd.array([np.nan, np.nan], dtype="float")
    ...: ser = pd.Series(arr)
    ...: pd.to_numeric(ser, downcast="float")
    ...: 
Out[19]: 
0   NaN
1   NaN
dtype: float32
@arw2019 arw2019 added Bug Needs Triage Issue that has not been reviewed by a pandas team member Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Dtype Conversions Unexpected or buggy dtype conversions and removed Bug labels Dec 28, 2020
@jorisvandenbossche
Member

Note that this is not directly related to to_numeric, as it is the pd.array() construction that fails:

In [11]: pd.array([pd.NA, pd.NA], dtype="float")
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-11-73361050ec83> in <module>
----> 1 pd.array([pd.NA, pd.NA], dtype="float")

~/scipy/pandas/pandas/core/construction.py in array(data, dtype, copy)
    344         return TimedeltaArray._from_sequence(data, dtype=dtype, copy=copy)
    345 
--> 346     result = PandasArray._from_sequence(data, dtype=dtype, copy=copy)
    347     return result
    348 

~/scipy/pandas/pandas/core/arrays/numpy_.py in _from_sequence(cls, scalars, dtype, copy)
    178             dtype = dtype._dtype
    179 
--> 180         result = np.asarray(scalars, dtype=dtype)
    181         if copy and result is scalars:
    182             result = result.copy()

~/miniconda3/envs/dev/lib/python3.7/site-packages/numpy/core/_asarray.py in asarray(a, dtype, order)
     83 
     84     """
---> 85     return array(a, dtype, copy=False, order=order)
     86 
     87 

TypeError: float() argument must be a string or a number, not 'NAType'

This is because dtype="float" actually requests a NumPy-based float array, not a nullable pandas FloatingArray. pandas therefore tries to convert pd.NA to a float; under the hood the error comes from:

In [12]: float(pd.NA)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-12-7541b87f4222> in <module>
----> 1 float(pd.NA)

TypeError: float() argument must be a string or a number, not 'NAType'
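
To make the NumPy side of this concrete, here is a minimal sketch that bypasses pd.array() and calls np.asarray directly, which is the call shown in the traceback above. The helper name try_float_array is hypothetical and exists only for this illustration; the exact error message may differ between NumPy versions.

```python
import numpy as np
import pandas as pd


def try_float_array(values):
    """Hypothetical helper: build a NumPy float array, returning (array, error)."""
    try:
        return np.asarray(values, dtype=float), None
    except (TypeError, ValueError) as exc:
        # NumPy has to call float() on every element; pd.NA does not support that.
        return None, exc


# np.nan is itself a float, so the conversion succeeds.
nan_result, nan_error = try_float_array([np.nan, np.nan])

# pd.NA cannot be converted to a float, so the same call fails.
na_result, na_error = try_float_array([pd.NA, pd.NA])

print(nan_result, nan_error)
print(na_result, na_error)
```

This suggests the difference between the two inputs lies in whether each element can be turned into a Python float, not in to_numeric.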

@arw2019 arw2019 changed the title BUG: pd.to_numeric with exclusively missing value input: np.nan vs. pd.NA BUG: FloatingArray constructor with input of all missing values: np.nan vs. pd.NA Dec 30, 2020
@arw2019
Member Author

arw2019 commented Dec 30, 2020

Updated the title to reflect the actual issue.

Are we ok with this behavior or is it something we want to "fix"?

@arw2019 arw2019 added Needs Discussion Requires discussion from core team before further action and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Dec 30, 2020
@jorisvandenbossche
Member

I don't think it's something we plan to fix in the short term.
At some point in the future, we might want lower-case names like "float" to mean the nullable dtypes instead of the plain NumPy ones within a pandas context. At that point this issue would be resolved automatically, since converting an all-NA list to a nullable float array already works.
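
For reference, a short sketch of the path that already works: requesting the nullable dtype explicitly with the capitalized name "Float64". This assumes a pandas version that ships the nullable float dtypes (1.2 or later).

```python
import pandas as pd

# "Float64" (capital F) selects the nullable, mask-based extension dtype,
# so pd.NA is stored in the mask and never passed through float().
arr = pd.array([pd.NA, pd.NA], dtype="Float64")
ser = pd.Series(arr)

print(arr.dtype)         # the nullable dtype, not numpy float64
print(ser.isna().all())  # every element is missing
```

Until lower-case "float" maps to the nullable dtype, spelling the dtype as "Float64" (or passing pd.Float64Dtype()) is one way to build an all-NA float array from pd.NA values.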

@mroeschke mroeschke added the Bug label Aug 14, 2021
@jbrockmendel jbrockmendel added the NA - MaskedArrays Related to pd.NA and nullable extension arrays label Jan 8, 2022
@jbrockmendel jbrockmendel added the Ice Cream Agreement Issues that would be addressed by the Ice Cream Agreement from the Aug 2023 sprint label Oct 26, 2023