BUG: DataFrame.fillna() doesn't work for 0/0 with Int64Dtype #46677

Closed
2 of 3 tasks
YunJD opened this issue Apr 7, 2022 · 3 comments
Labels
Bug; Duplicate Report (Duplicate issue or pull request); NA - MaskedArrays (Related to pd.NA and nullable extension arrays)

Comments

@YunJD

YunJD commented Apr 7, 2022

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
df = pd.DataFrame({'a': [0.0,0.0], 'b':0})
df.b = df.b.astype(pd.Int64Dtype())
df['c'] = df.a / df.b
df.fillna(0.0) # Doesn't fill
df.iloc[0].fillna(0.0) # Does fill

Issue Description

As shown above, creating a pd.NA value using pd.Int64Dtype as the dtype causes .fillna() to malfunction. .fillna(0.0) still works after using .iloc[0].

This behavior suddenly manifested when I upgraded to version 1.4.2. I am grabbing a table from Big Query and some of the columns are set to Int64Dtype.

Expected Behavior

import pandas as pd
df = pd.DataFrame({'a': [0.0,0.0], 'b':0})
df.b = df.b.astype(pd.Int64Dtype())
df['c'] = df.a / df.b
df.fillna(0.0)

# Should output:
"""
     a  b    c
0  0.0  0  0.0
1  0.0  0  0.0
"""

# But got:
"""
     a  b    c
0  0.0  0  NaN
1  0.0  0  NaN
"""
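One possible workaround, sketched here under the assumption that a plain NumPy float64 result column is acceptable: cast the nullable column to float64 before filling, so that any pd.NA becomes NaN and a single fillna call covers both. The helper name fill_missing_float is hypothetical and not part of pandas.

```python
import pandas as pd

# Rebuild the frame from the report.
df = pd.DataFrame({"a": [0.0, 0.0], "b": 0})
df["b"] = df["b"].astype(pd.Int64Dtype())
df["c"] = df["a"] / df["b"]  # 0.0 / 0 lands in a nullable Float64 column


def fill_missing_float(series: pd.Series, value: float) -> pd.Series:
    """Hypothetical helper: fill both NaN and pd.NA in a nullable float column.

    Casting to NumPy float64 turns pd.NA into NaN, so one fillna call then
    handles both kinds of missing value. The result is float64, not Float64.
    """
    return series.astype("float64").fillna(value)


filled = df.assign(c=fill_missing_float(df["c"], 0.0))
print(filled)
```

The trade-off is that column c loses its nullable Float64 dtype, which may or may not matter downstream.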

Installed Versions

INSTALLED VERSIONS

commit : 4bfe3d0
python : 3.8.9.final.0
python-bits : 64
OS : Darwin
OS-release : 21.4.0
Version : Darwin Kernel Version 21.4.0: Mon Feb 21 20:35:58 PST 2022; root:xnu-8020.101.4~2/RELEASE_ARM64_T6000
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.4.2
numpy : 1.22.3
pytz : 2022.1
dateutil : 2.8.2
pip : 21.3.1
setuptools : 62.0.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.1
IPython : 8.2.0
pandas_datareader: None
bs4 : 4.10.0
bottleneck : None
brotli : None
fastparquet : None

@YunJD YunJD added the Bug and Needs Triage (Issue that has not been reviewed by a pandas team member) labels Apr 7, 2022
@phofl
Member

phofl commented Apr 8, 2022

duplicate of #39926

@phofl phofl closed this as completed Apr 8, 2022
@phofl phofl added the NA - MaskedArrays (Related to pd.NA and nullable extension arrays) and Duplicate Report (Duplicate issue or pull request) labels and removed the Needs Triage (Issue that has not been reviewed by a pandas team member) label Apr 8, 2022
@simonjayhawkins
Member

> .fillna(0.0) still works after using .iloc[0]

There may be an indexing issue here that is not explicitly discussed in #39926, which may have created the confusion.

df.iloc[0] gives

a     0.0
b     0.0
c    <NA>
Name: 0, dtype: Float64

If we are preserving np.nan in FloatArray, I think this should be

a     0.0
b     0.0
c    NaN
Name: 0, dtype: Float64

as for instance, df["c"] gives

0    NaN
1    NaN
Name: c, dtype: Float64
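To see which kind of missing value a nullable float column actually holds, one rough check is to compare what isna() reports with what NumPy sees after conversion to float64. This is only a sketch using public pandas calls; the variable names are illustrative, and whether the 0.0 / 0 result counts as NaN or as pd.NA may differ between pandas versions.

```python
import numpy as np
import pandas as pd

a = pd.Series([0.0, 0.0])
b = pd.Series([0, 0], dtype="Int64")
c = a / b  # nullable Float64 result of 0.0 / 0

# Entries that pandas itself treats as missing.
masked = c.isna().to_numpy(dtype=bool)

# Entries that are NaN once pd.NA has been mapped to NaN as well.
nan_or_na = np.isnan(c.to_numpy(dtype="float64", na_value=np.nan))

# Entries that hold a raw NaN without being flagged as missing.
raw_nan = nan_or_na & ~masked

print("flagged by isna():", masked)
print("NaN or NA:        ", nan_or_na)
print("raw NaN only:     ", raw_nan)
```

Any True in the last line marks a value that fillna would skip if it relies only on the missing-value mask.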

> This behavior suddenly manifested when I upgraded to version 1.4.2. I am grabbing a table from Big Query and some of the columns are set to Int64Dtype.

@YunJD AFAICT, the code sample in the OP gave the same output in prior versions of pandas. Do you have a reproducible code sample (preferably without using Big Query) that demonstrates the behavioral change?

@YunJD
Author

YunJD commented Apr 10, 2022

@simonjayhawkins I can confirm the behaviour was consistent in 1.4.0.

Reverting back to our old configuration, the type was indeed using numpy.dtype('int64') rather than Int64Dtype(). I have some other details that I find curious, and I will jump on the other open issue to add them.
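For comparison, a minimal sketch of the same example with column b left as NumPy int64, which matches the old configuration described above. Here the division produces an ordinary float64 column, and fillna replaces its NaN values. The names df_np and filled_np are illustrative only.

```python
import pandas as pd

# "b" stays NumPy int64 because no nullable dtype is requested.
df_np = pd.DataFrame({"a": [0.0, 0.0], "b": 0})
df_np["c"] = df_np["a"] / df_np["b"]  # float64 column holding NaN

filled_np = df_np.fillna(0.0)
print(df_np.dtypes)
print(filled_np)
```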
