REGR: replace casts columns to object #32988

Closed
dmh43 opened this issue Mar 24, 2020 · 5 comments · Fixed by #34048
Labels
Dtype Conversions (unexpected or buggy dtype conversions), Regression (functionality that used to work in a prior pandas version), replace (replace method)

@dmh43

dmh43 commented Mar 24, 2020

Calling df.replace casts float64 columns to object:

pd.DataFrame(np.eye(2)).replace(to_replace=[None, -np.inf, np.inf], value=pd.NA).dtypes
# 0    object
# 1    object
# dtype: object

Problem description

I'd expect the dtypes of the columns to remain the same after replacing with pd.NA, especially since no values are actually replaced in the call above. We do not get this issue if to_replace is any proper subset of [None, -np.inf, np.inf]. We get the same issue if value is instead np.nan.
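To check the subset observation systematically, a small sketch like the following runs the same call for every non-empty subset of the list and records the resulting dtypes. The helper name `dtypes_after_replace` and the `report` dict are made up for illustration, and it uses np.nan as the value. The result depends on the installed pandas version, so no output is shown.

```python
from itertools import combinations

import numpy as np
import pandas as pd

CANDIDATES = [None, -np.inf, np.inf]


def dtypes_after_replace(to_replace, value=np.nan):
    """Return the sorted, de-duplicated dtype names of np.eye(2) after replace.

    Hypothetical helper for this issue, not part of pandas.
    """
    df = pd.DataFrame(np.eye(2))
    result = df.replace(to_replace=to_replace, value=value)
    return sorted({str(dtype) for dtype in result.dtypes})


# Try every non-empty subset, including the full list from the report.
report = {}
for size in range(1, len(CANDIDATES) + 1):
    for subset in combinations(CANDIDATES, size):
        report[repr(list(subset))] = dtypes_after_replace(list(subset))

for to_replace, dtypes in report.items():
    print(f"{to_replace:>20} -> {dtypes}")
```

On an affected version, only the full three-element list should report object, going by the description above.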

Expected Output

# 0    float64
# 1    float64
# dtype: object

Output of pd.show_versions()

Installed with `pip install -e .` on current master. Was also able to reproduce on `1.0.1`.

INSTALLED VERSIONS

commit : bed9103
python : 3.7.0.final.0
python-bits : 64
OS : Darwin
OS-release : 18.7.0
Version : Darwin Kernel Version 18.7.0: Tue Aug 20 16:57:14 PDT 2019; root:xnu-4903.271.2~2/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 0.26.0.dev0+2701.gbed9103e5
numpy : 1.18.2
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 46.1.1.post20200323
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : 7.13.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None

@phofl
Member

phofl commented Mar 26, 2020

I debugged this a bit. The reason seems to be here:
https://github.com/pandas-dev/pandas/blob/master/pandas/core/internals/blocks.py#L736

            return block.replace(
                to_replace=to_replace,
                value=value,
                inplace=inplace,
                filter=filter,
                regex=regex,
                convert=convert,
            )

The function retries the replace after converting the FloatBlock to an ObjectBlock. We could try to convert the ObjectBlock back after the replace has run. I tried this for the example mentioned above and ran the relevant tests (I hope I ran all the relevant ones :)).

I committed the small change here. If this is a desired fix, I would add tests and create a pull request.
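Until a fix lands, the same "convert back afterwards" idea can be approximated from user code. The sketch below is not the change referred to above and does not touch pandas internals. `replace_keep_dtypes` is a hypothetical helper that calls replace and then casts each column back to its original dtype when that cast succeeds. It assumes unique column labels.

```python
import numpy as np
import pandas as pd


def replace_keep_dtypes(df, to_replace, value):
    """Replace values, then cast each column back to its original dtype.

    Hypothetical user-level helper, not the patch discussed in this issue.
    Assumes unique column labels.
    """
    result = df.replace(to_replace=to_replace, value=value)
    for col, original in df.dtypes.items():
        if result[col].dtype == original:
            continue
        try:
            result[col] = result[col].astype(original)
        except (TypeError, ValueError):
            # The replacement cannot be represented in the original dtype
            # (for example, a string written into a float64 column),
            # so keep whatever dtype replace produced.
            pass
    return result


df = pd.DataFrame(np.eye(2))
out = replace_keep_dtypes(df, [None, -np.inf, np.inf], np.nan)
print(out.dtypes)
```

Casting back only on success keeps the helper safe for replacements that genuinely need object dtype, at the cost of an extra pass over the columns.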

@dmh43
Author

dmh43 commented Mar 27, 2020

Thanks for looking into this! Unfortunately, I'm not familiar enough with pandas internals to review your changes.

@simonjayhawkins
Member

We get the same issue if value is instead np.nan

this was working in 0.25.3 so this is a regression

>>> import numpy as np
>>> import pandas as pd
>>> pd.__version__
'0.25.3'
>>> pd.DataFrame(np.eye(2)).replace(to_replace=[None, -np.inf, np.inf], value=np.nan).dtypes
0    float64
1    float64
dtype: object

>>> import numpy as np
>>> import pandas as pd
>>> pd.__version__
'1.1.0.dev0+1008.g60b0e9fbc'
>>> pd.DataFrame(np.eye(2)).replace(to_replace=[None, -np.inf, np.inf], value=np.nan).dtypes
0    object
1    object
dtype: object

@simonjayhawkins added the Regression (functionality that used to work in a prior pandas version) label Mar 28, 2020
@simonjayhawkins
Member

simonjayhawkins commented Mar 28, 2020

this was working in 0.25.3, so this is a regression introduced by #27768 (i.e. in 1.0.0)

01f90c1 is the first bad commit
commit 01f90c1
Author: jbrockmendel <jbrockmendel@gmail.com>
Date: Mon Aug 12 11:58:42 2019 -0700

CLN: short-circuit case in Block.replace (#27768)
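The thread does not say how the bisect was driven. One way to automate a search like this is a small predicate script for `git bisect run`, which treats exit status 0 as good and 1 as bad. The sketch below is an assumption about how that could look, with a made-up function name. A real script would pass `exit_code` to `sys.exit`, and pandas would likely need to be rebuilt at each step because of its compiled extensions.

```python
import numpy as np
import pandas as pd


def replace_keeps_float64():
    """Return True if the reported call leaves both columns as float64."""
    result = pd.DataFrame(np.eye(2)).replace(
        to_replace=[None, -np.inf, np.inf], value=np.nan
    )
    return all(dtype == np.float64 for dtype in result.dtypes)


# git bisect run treats exit status 0 as "good" and 1 as "bad".
exit_code = 0 if replace_keeps_float64() else 1
print(exit_code)
```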

@simonjayhawkins added the Dtype Conversions (unexpected or buggy dtype conversions) label Mar 29, 2020
@simonjayhawkins
Member

cc @jbrockmendel
