Closed
Labels
- Arrow: pyarrow functionality
- Bug
- Missing-data: np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate
- Upstream issue: Issue related to pandas dependency
Description
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd


def test_show_double_pyarrow_has_issue_when_fillna():
    """
    Very strange behavior observed when using pyarrow 'double' dtype with pandas Series.
    When using `fillna(0.0)`, the second element, which should be left untouched, is also "filled".
    """
    for raw_ls in [
        [1, 2, 3, 4, 5, pd.NA],
        [1, 2, 3, 4, pd.NA, 6],
        [1, 2, pd.NA, 4, 5, 6],
        # [pd.NA, 2, 3, 4, 5, 6],  # this works fine when pd.NA is first
        # [1, 2, 3, 4, pd.NA],  # this works fine when there are five elements
        # [1, 2, 3, pd.NA, pd.NA, 6],  # this works fine when there are two pd.NA
    ]:

        def get_series(ls):
            return pd.Series(ls, dtype="double[pyarrow]")

        s = get_series(raw_ls)
        s_filled = s.fillna(0.0)
        zero_counts = (s_filled == 0.0).sum()
        # These assertions pass only when the bug is present:
        # two zeros appear although each input has a single pd.NA.
        assert zero_counts == 2
        # The element at index 1 was neither 0 nor missing, yet it is 0.0 after fillna.
        assert raw_ls[1] != 0 and not pd.isna(raw_ls[1]) and s_filled.iloc[1] == 0.0


def test_show_int_pyarrow_has_issue_when_fillna():
    """
    Similar to the double dtype, very strange behavior observed when using pyarrow 'int64' dtype.
    """
    for raw_ls in [
        [1, 2, 3, 4, 5, pd.NA],
        [1, 2, 3, 4, pd.NA, 6],
        [1, 2, pd.NA, 4, 5, 6],
        # [pd.NA, 2, 3, 4, 5, 6],  # this works fine when pd.NA is first
        # [1, 2, 3, 4, pd.NA],  # this works fine when there are five elements
        # [1, 2, 3, pd.NA, pd.NA, 6],  # this works fine when there are two pd.NA
    ]:

        def get_series(ls):
            return pd.Series(ls, dtype="int64[pyarrow]")

        s = get_series(raw_ls)
        s_filled = s.fillna(0)
        zero_counts = (s_filled == 0).sum()
        # These assertions pass only when the bug is present:
        # two zeros appear although each input has a single pd.NA.
        assert zero_counts == 2
        # The element at index 1 was neither 0 nor missing, yet it is 0 after fillna.
        assert raw_ls[1] != 0 and not pd.isna(raw_ls[1]) and s_filled.iloc[1] == 0


test_show_double_pyarrow_has_issue_when_fillna()
test_show_int_pyarrow_has_issue_when_fillna()
Issue Description
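fillna misbehaves on pyarrow-backed numeric Series with len >= 6. In addition to the pd.NA row, it also "fills" a row that is not missing: in the examples above, the element at index 1 is replaced with the fill value even though it held a valid number.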
Expected Behavior
Only the rows with pd.NA should be filled.
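The expected behavior can be written as a small check. The following is only a sketch in plain Python, not part of pandas: `unexpected_fills` is a hypothetical helper name, `None` stands in for pd.NA, and the `reported` list is the output described by the assertions in the reproducible example (two zeros, one of them at index 1).

```python
from typing import Any, Callable, Sequence


def unexpected_fills(
    before: Sequence[Any],
    after: Sequence[Any],
    is_missing: Callable[[Any], bool] = lambda v: v is None,
) -> list[int]:
    """Return positions that were not missing in `before` but differ in `after`."""
    if len(before) != len(after):
        raise ValueError("before and after must have the same length")
    return [
        i
        for i, (old, new) in enumerate(zip(before, after))
        if not is_missing(old) and old != new
    ]


# First case from the report, with None standing in for pd.NA.
before = [1, 2, 3, 4, 5, None]
# Output described in the report: index 1 and the trailing NA both become 0.
reported = [1, 0, 3, 4, 5, 0]
# Output the report expects: only the trailing NA is filled.
expected = [1, 2, 3, 4, 5, 0]

print(unexpected_fills(before, reported))  # [1]
print(unexpected_fills(before, expected))  # []
```

To apply the same check to a real Series, one option is to pass `s.tolist()` and `s.fillna(0).tolist()` together with `is_missing=pd.isna`; an empty result means only the pd.NA rows were touched.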
Installed Versions
INSTALLED VERSIONS
commit : 9c8bc3e
python : 3.13.3
python-bits : 64
OS : Windows
OS-release : 11
pandas : 2.3.3
numpy : 2.3.3
pyarrow : 21.0.0