ENH: support casting Int arrays with nulls to np.float? #37460

Closed
arw2019 opened this issue Oct 28, 2020 · 2 comments · Fixed by #55058
Labels
Dtype Conversions (unexpected or buggy dtype conversions), Enhancement, NA - MaskedArrays (related to pd.NA and nullable extension arrays), Needs Discussion (requires discussion from core team before further action)

Comments

@arw2019
Member

arw2019 commented Oct 28, 2020

xref numpy/numpy#17659

I would expect numpy to automatically convert such arrays to a float dtype and fill the missing entries with np.nan. However, for some reason it converts them to object dtype:

pd.DataFrame({'col': [1, np.nan, 3]}).astype('UInt8').values.dtype

which leads to errors like this one:

np.nanmax(pd.DataFrame({'col': [1, np.nan, 3]}).astype('UInt8').values)

raises "TypeError: boolean value of NA is ambiguous".

Is that expected?

Personally I don't have a strong opinion re: whether it's a good idea to support this
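
For reference, a minimal sketch of the explicit conversion that sidesteps the error above, assuming a pandas version where `to_numpy` accepts `dtype` and `na_value`. The default result of `.values` may differ between pandas versions, so the sketch does not rely on it:

```python
import numpy as np
import pandas as pd

# Same data as in the report: a nullable UInt8 column with one missing value.
df = pd.DataFrame({"col": [1, np.nan, 3]}).astype("UInt8")

# Ask explicitly for float64 and state what pd.NA should become.
values = df["col"].to_numpy(dtype="float64", na_value=np.nan)

print(values.dtype)       # float64
print(np.nanmax(values))  # 3.0
```

With the missing value represented as np.nan in a float array, NumPy's nan-aware reductions work as usual.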

@arw2019 arw2019 added Enhancement Needs Triage Issue that has not been reviewed by a pandas team member labels Oct 28, 2020
@jorisvandenbossche
Member

We discussed this (somewhat) before when the nullable integer dtype was first introduced, and then also when pd.NA was introduced / in the nullable float dtype discussion.

It's a difficult topic. On the one hand, automatically using np.nan as the missing value indicator is not fully correct (it loses information, and np.nan behaves differently from pd.NA), and you could therefore argue that the user should have to ask for it explicitly.
On the other hand, most code written to work on numpy arrays can typically handle np.nan, but none of it can handle pd.NA, which makes the default of using object dtype when converting nullable arrays to numpy probably rather annoying.
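
A small sketch of both sides of that trade-off, as an illustration rather than a statement of pandas policy. The variable names are made up for the example, and the precision case assumes an Int64 value above 2**53:

```python
import numpy as np
import pandas as pd

# np.nan compares like an ordinary float; pd.NA propagates through comparisons.
nan_cmp = np.nan == 1   # False
na_cmp = pd.NA == 1     # pd.NA

# Truth-testing pd.NA raises, which is what breaks reductions on object arrays.
try:
    bool(pd.NA)
    na_bool_error = None
except TypeError as exc:
    na_bool_error = str(exc)

# Casting to float64 can lose information: integers above 2**53 are not
# all exactly representable.
big = pd.array([2**53 + 1, None], dtype="Int64")
as_float = big.to_numpy(dtype="float64", na_value=np.nan)
round_trip_ok = int(as_float[0]) == 2**53 + 1   # False
```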

@Kreol64

Kreol64 commented Oct 29, 2020

@jorisvandenbossche, I would vote for backward compatibility with existing users' codebases, which include thousands of np.nan*() functions applied to pandas columns and groupby statements.
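
As an illustration of the pattern described here, a sketch of a np.nan*() reduction applied per group to a nullable column. The helper `to_float_nan` is hypothetical (not a pandas function) and makes the float conversion explicit instead of relying on the default:

```python
import numpy as np
import pandas as pd


def to_float_nan(s):
    """Hypothetical helper: nullable Series -> float64 ndarray, pd.NA -> NaN."""
    return s.to_numpy(dtype="float64", na_value=np.nan)


df = pd.DataFrame({
    "key": ["a", "a", "b"],
    "val": pd.array([1, None, 3], dtype="UInt8"),
})

# Apply a nan-aware NumPy reduction to each group of the nullable column.
per_group_max = df.groupby("key")["val"].apply(
    lambda s: np.nanmax(to_float_nan(s))
)
print(per_group_max)
```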

@mzeitlin11 mzeitlin11 added Dtype Conversions Unexpected or buggy dtype conversions NA - MaskedArrays Related to pd.NA and nullable extension arrays and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Jul 1, 2021
@mroeschke mroeschke added the Needs Discussion Requires discussion from core team before further action label Aug 13, 2021

5 participants