BUG: fillna(0) does not fill NaNs for Float64Arrays #39926

Open
2 tasks done
evyasonov opened this issue Feb 20, 2021 · 13 comments
Labels
Bug Ice Cream Agreement Issues that would be addressed by the Ice Cream Agreement from the Aug 2023 sprint NA - MaskedArrays Related to pd.NA and nullable extension arrays Needs Discussion Requires discussion from core team before further action

Comments

@evyasonov

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.


Code Sample, a copy-pastable example

import pandas as pd

f = pd.DataFrame(data=[[0, 0]], columns=['A', 'B']).astype('Int64')
f['C'] = f['A'] / f['B']  # 0 / 0 gives NaN in the resulting Float64 column
f.fillna(0)

Output:

	A	B	C
0	0	0	NaN

Problem description

fillna does not work as expected: it does not fill the NaN in column C.

Expected Output

	A	B	C
0	0	0	0
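A possible workaround while the NaN/NA semantics are still undecided, assuming you want NaN in the result treated as missing: round-trip the column through NumPy's float64, where both NaN and <NA> become np.nan, fill there, then convert back to Float64. This is a sketch, not an official pandas recommendation.

```python
import pandas as pd

f = pd.DataFrame(data=[[0, 0]], columns=["A", "B"]).astype("Int64")
f["C"] = f["A"] / f["B"]  # 0 / 0 -> NaN stored in a Float64 column

# Workaround sketch: in NumPy float64 both NaN and <NA> are np.nan,
# so fillna catches either; then go back to the nullable Float64 dtype.
f["C"] = f["C"].astype("float64").fillna(0).astype("Float64")
print(f)
```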

Output of pd.show_versions()

INSTALLED VERSIONS

commit : 7d32926
python : 3.7.9.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.18362
machine : AMD64
processor : Intel64 Family 6 Model 142 Stepping 10, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None

pandas : 1.2.2
numpy : 1.18.3
pytz : 2020.5
dateutil : 2.8.1
pip : 20.3.3
setuptools : 51.3.3.post20210118
Cython : 0.29.21
pytest : 6.2.1
hypothesis : None
sphinx : 3.4.3
blosc : None
feather : None
xlsxwriter : 1.3.7
lxml.etree : 4.6.2
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.19.0
pandas_datareader: None
bs4 : 4.9.3
bottleneck : 1.3.2
fsspec : 0.8.3
fastparquet : 0.3.3
gcsfs : None
matplotlib : 3.2.2
numexpr : 2.7.2
odfpy : None
openpyxl : 3.0.6
pandas_gbq : None
pyarrow : 0.16.0
pyxlsb : None
s3fs : None
scipy : 1.5.2
sqlalchemy : 1.3.21
tables : 3.6.1
tabulate : 0.8.7
xarray : None
xlrd : 2.0.1
xlwt : 1.3.0
numba : 0.51.2

@evyasonov evyasonov added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Feb 20, 2021
@phofl
Member

phofl commented Feb 20, 2021

I don't think that fillna is the issue here. The value in C should be pd.NA, not np.nan. I am not sure whether fillna is supposed to work for np.nan in a Float64 array. Maybe @jorisvandenbossche can help here?

@phofl phofl added the NA - MaskedArrays Related to pd.NA and nullable extension arrays label Feb 20, 2021
@evyasonov
Author

@phofl
My point is that, according to the documentation, I expect fillna to fill all NaNs regardless of their origin. It's really confusing when it does not do its job.

@phofl
Member

phofl commented Feb 21, 2021

I am not sure if this is the expected behavior here, hence I pinged joris.

If this is the expected behavior for np.nan in Float64 arrays, then we should adjust the documentation; if not, then we have two unrelated bugs here.

@simonjayhawkins
Member

The discussion on how to interpret np.nan in a nullable float array is ongoing in #32265.

So for now this is not yet a bug, but as part of #32265 the documentation may indeed need to be updated to clarify the behavior for nullable float arrays.

@simonjayhawkins simonjayhawkins added Closing Candidate May be closeable, needs more eyeballs and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Feb 21, 2021
@phofl
Member

phofl commented Feb 21, 2021

Thx, then let's repurpose this:

import pandas as pd

f = pd.DataFrame(data=[[0, 0]], columns=['A', 'B']).astype('Int64')
f['C'] = f['A'] / f['B']

This should return

   A  B     C
0  0  0  <NA>

and not

   A  B     C
0  0  0  NaN

@phofl phofl added Numeric Operations Arithmetic, Comparison, and Logical operations and removed Closing Candidate May be closeable, needs more eyeballs labels Feb 21, 2021
@phofl
Member

phofl commented Feb 21, 2021

Could someone rename the issue?

@simonjayhawkins
Member

I think the result of f['C'] = f['A'] / f['B'] is correct. A and B are not missing values, so C is not missing either; it is correctly np.nan.
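To make this reasoning concrete, here is a simplified NumPy model of division on a masked (nullable) array: values and a boolean "missing" mask are kept separately, and the output mask only records inputs that were already missing. The function masked_truediv is hypothetical and is not pandas' actual implementation; it only illustrates why 0/0 ends up as an unmasked NaN.

```python
import numpy as np


def masked_truediv(a_vals, a_mask, b_vals, b_mask):
    """Hypothetical, simplified model of masked-array division (not pandas code)."""
    with np.errstate(divide="ignore", invalid="ignore"):
        values = a_vals / b_vals  # 0 / 0 -> nan, 1 / 0 -> inf
    mask = a_mask | b_mask  # missing only if an input was already missing
    return values, mask


values, mask = masked_truediv(
    np.array([0, 1, 5]), np.array([False, False, True]),
    np.array([0, 0, 1]), np.array([False, False, False]),
)
print(values, mask)
```

In this model the first entry is NaN but not masked, which matches the "not missing, but NaN" interpretation above.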

@simonjayhawkins simonjayhawkins changed the title BUG: fillna(0) does not fill NaNs BUG: fillna(0) does not fill NaNs for Float64Arrays Feb 21, 2021
@phofl
Member

phofl commented Feb 21, 2021

In that case, forget what I said. I did not think about this.

@phofl phofl added Closing Candidate May be closeable, needs more eyeballs and removed Numeric Operations Arithmetic, Comparison, and Logical operations labels Feb 21, 2021
@jorisvandenbossche jorisvandenbossche added API Design Needs Discussion Requires discussion from core team before further action and removed Closing Candidate May be closeable, needs more eyeballs labels Feb 24, 2021
@jorisvandenbossche
Member

Indeed, as @simonjayhawkins mentioned, this is still being discussed in #32265, so for now this is a bit of a "grey area".

There are two different aspects being mentioned here (and I think both are still open for discussion):

  • Should 0/0 in a nullable integer column result in NaN or NA? (Strictly speaking it is NaN, following NumPy, but we could also decide to deviate and make it NA instead, to stay closer to existing behaviour.)
  • Should NaN (if present, in addition to NA) also be interpreted as a "missing value" (e.g. return True for isna(), get filled by fillna, etc.)?
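Both questions can be probed on whichever pandas version is installed. The sketch below compares what pandas reports as missing (isna(), i.e. the mask) with a check that catches NaN and <NA> alike, by converting to float64 with na_value=np.nan. Which check flags the 0/0 entry depends on how the open questions are resolved, so the isna() result may differ between versions.

```python
import numpy as np
import pandas as pd

a = pd.Series([0, 1], dtype="Int64")
b = pd.Series([0, 1], dtype="Int64")
s = a / b  # Float64 result; the 0/0 entry is the one under discussion

# What pandas treats as missing (the mask): "is NaN also NA?"
reported_missing = s.isna().tolist()

# Converting to NumPy float64 maps <NA> to np.nan, so np.isnan flags
# the 0/0 entry whether it is stored as NaN or as <NA>.
nan_or_na = np.isnan(s.to_numpy(dtype="float64", na_value=np.nan)).tolist()

print(s.dtype, reported_missing, nan_or_na)
```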

But that's a discussion for #32265, which is somewhat stalled at the moment but is something we need to revive.

(let's keep the issue open for now, to have a reference for this aspect if people search for it, until we make a final decision on the topic)

@jbrockmendel
Member

The OP is right that f['A'] / f['B'] should have a np.nan entry, not a pd.NA entry. If we take pd.NA seriously as meaning "missing", then it unequivocally is not missing when we do division by zero. If we don't take it seriously, then Float64Dtype becomes pointless.

@Me-williams

fillna(0) does seem to catch np.nan for me in pandas 1.3.3; however, inplace=True doesn't work when using it on a slice of a DataFrame.
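A likely explanation is that fillna(..., inplace=True) on a slice may modify a temporary copy rather than the original DataFrame. A sketch of the usual alternative, assigning the result back, using plain float64 columns named 'a' and 'b' purely for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan], "b": [np.nan, 2.0]})

# Instead of df[["a"]].fillna(0, inplace=True), which may only touch a copy,
# compute the filled slice and assign it back to the original columns.
df[["a"]] = df[["a"]].fillna(0)
print(df)
```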

@YunJD

YunJD commented Apr 10, 2022

I'm not sure what to make of this; I wonder if it's a symptom of something else. replace([pd.NA, np.NaN], 0) does not work after the division steps either. So I'm wondering how one is actually supposed to fill in these values other than by changing the dtype: with np.dtype('int64'), both .replace and .fillna work. Also, .iloc returns a different value than the one displayed. Full sample code:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({'a': [0,0], 'b': [0,0]}, dtype=pd.Int64Dtype())
>>> df
   a  b
0  0  0
1  0  0
>>> df['c'] = df.a / df.b
>>> df
   a  b    c
0  0  0  NaN
1  0  0  NaN
>>> df.replace([pd.NA, np.NaN], 0)
   a  b    c
0  0  0  NaN
1  0  0  NaN
>>> df.iloc[0].c # They print out different values?
<NA>

Using np.dtype('int64') instead:

>>> df = pd.DataFrame({'a': [0,0], 'b': [0,0]}, dtype=np.dtype('int64'))
>>> df['c'] = df.a / df.b
>>> df
   a  b   c
0  0  0 NaN
1  0  0 NaN
>>> df.replace(np.NaN, 0)
   a  b    c
0  0  0  0.0
1  0  0  0.0
>>> df.fillna(0)
   a  b    c
0  0  0  0.0
1  0  0  0.0
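One way to fill both <NA> and NaN while keeping the nullable dtype is to build the replacement from the float64 values of each Float64 column. The helper fill_nan_and_na below is hypothetical (it is not part of pandas) and is only a sketch, assuming NaN should be treated as missing; other columns just use ordinary fillna.

```python
import numpy as np
import pandas as pd


def fill_nan_and_na(df: pd.DataFrame, value) -> pd.DataFrame:
    """Hypothetical helper: fill both <NA> and NaN in Float64 columns."""
    out = df.copy()
    for col in out.columns:
        s = out[col]
        if isinstance(s.dtype, pd.Float64Dtype):
            # <NA> becomes np.nan here, so np.isnan catches both kinds.
            as_float = s.to_numpy(dtype="float64", na_value=np.nan)
            filled = np.where(np.isnan(as_float), value, as_float)
            out[col] = pd.Series(filled, index=s.index, dtype="Float64")
        else:
            out[col] = s.fillna(value)
    return out


df = pd.DataFrame({"a": [0, 0], "b": [0, 0]}, dtype=pd.Int64Dtype())
df["c"] = df.a / df.b
filled = fill_nan_and_na(df, 0)
print(filled)
```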

@jbrockmendel jbrockmendel added the Ice Cream Agreement Issues that would be addressed by the Ice Cream Agreement from the Aug 2023 sprint label Oct 26, 2023
@HenriqueAJNB

Can someone explain the difference between float64 and Float64? Float64 does not seem to be treated as a numeric type, and it breaks a lot of operations, including .fillna() and other missing-value methods like .replace(np.NaN, 0) and .interpolate()...

I saw this in this SO question and I'm having the same problems...
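A small sketch of the difference: float64 is a NumPy dtype that stores missing values as NaN, while Float64 is pandas' nullable extension dtype that tracks missing values in a separate mask and displays them as <NA>. pandas.api.types.is_numeric_dtype reports both as numeric; the surprises in this thread come from NaN values that are not in the mask, such as the result of 0/0.

```python
import numpy as np
import pandas as pd

s_np = pd.Series([1.0, None], dtype="float64")   # NumPy dtype: missing stored as NaN
s_ext = pd.Series([1.0, None], dtype="Float64")  # nullable dtype: missing stored as <NA>

print(s_np.isna().tolist(), s_ext.isna().tolist())
print(pd.api.types.is_numeric_dtype(s_np), pd.api.types.is_numeric_dtype(s_ext))

# fillna works on both when the missing value came from the constructor.
filled_ext = s_ext.fillna(0)
print(filled_ext)
```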

Projects
None yet
Development

No branches or pull requests

9 participants