BUG: breaking change in df.replace() from 1.0.5 to 1.1.0 #35931

agnesbao · 2020-08-27T16:45:23Z

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

import pandas as pd

# Bin ages into left-closed intervals, then try to relabel them with replace()
df = pd.DataFrame({"age": range(1, 100)})
df["age"] = pd.cut(df["age"], [0, 50, 65, 99], right=False)
df.replace(to_replace={
    "age": {
        pd.Interval(0, 50, closed='left'): "Ages < 50",
        pd.Interval(50, 65, closed='left'): "Ages 50-64",
        pd.Interval(65, 99, closed='left'): "Ages 65+"
    }
})

Problem description

The sample code above works in 1.0.5 but raises the following error in 1.1.0 and above:

TypeError: Cannot compare types 'ndarray(dtype=object)' and 'Interval'

It only happens with the pd.Interval dtype; a nested dict of string-to-string mappings works fine.
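Until a fix is available, one possible way to sidestep `replace()` entirely is to attach the labels at binning time. This is only a sketch: it assumes the interval column is not needed afterwards, and the `bins`, `labels`, and `age_label` names are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"age": range(1, 100)})

# Same left-closed bins as in the report: [0, 50), [50, 65), [65, 99)
bins = [0, 50, 65, 99]
labels = ["Ages < 50", "Ages 50-64", "Ages 65+"]

# pd.cut can assign the labels directly, so no Interval keys are compared
df["age_label"] = pd.cut(df["age"], bins, right=False, labels=labels)

print(df["age_label"].value_counts(sort=False))
```

Note that with `right=False` the value 99 falls outside the last bin and becomes NaN, exactly as it does in the original sample.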

Expected Output

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit : e3bcf8d
python : 3.7.6.final.0
python-bits : 64
OS : Darwin
OS-release : 19.6.0
Version : Darwin Kernel Version 19.6.0: Thu Jun 18 20:49:00 PDT 2020; root:xnu-6153.141.1~1/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.2.0.dev0+165.ge3bcf8d19
numpy : 1.19.1
pytz : 2020.1
dateutil : 2.8.1
pip : 20.2.2
setuptools : 49.6.0.post20200814
Cython : 0.29.21
pytest : 6.0.1
hypothesis : 5.29.0
sphinx : 2.4.0
blosc : None
feather : None
xlsxwriter : 1.3.3
lxml.etree : 4.5.2
html5lib : 1.1
pymysql : None
psycopg2 : 2.8.4 (dt dec pq3 ext lo64)
jinja2 : 2.11.2
IPython : 7.17.0
pandas_datareader: None
bs4 : 4.9.1
bottleneck : 1.3.2
fsspec : 0.8.0
fastparquet : 0.4.1
gcsfs : None
matplotlib : 3.3.1
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.5
pandas_gbq : None
pyarrow : 0.15.1
pytables : None
pyxlsb : None
s3fs : 0.2.0
scipy : 1.5.2
sqlalchemy : 1.3.19
tables : 3.6.1
tabulate : 0.8.7
xarray : 0.16.0
xlrd : 1.2.0
xlwt : 1.3.0
numba : 0.50.1


asishm · 2020-08-27T20:44:13Z

#32107

c6e3e15 is the first bad commit
commit c6e3e15
Author: Daniel Saxton <2658661+dsaxton@users.noreply.github.com>
Date: Sat Apr 25 18:58:51 2020 -0500
BUG: Allow addition of Timedelta to Timestamp interval (#32107)

dsaxton · 2020-08-27T21:42:37Z

Hmm, this is apparently due to

pandas/pandas/_libs/interval.pyx

Line 302 in a1f6056

__array_priority__ = 1000

which was added to allow np.timedelta64 + pd.Interval of timestamps in the linked PR: #32107 (comment). Not sure exactly what's happening to be honest but will take a look.
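For readers unfamiliar with `__array_priority__`, here is a toy sketch of what the attribute does for arithmetic. The `Deferring` and `Plain` classes are hypothetical and are not how `pd.Interval` is implemented; they only illustrate that NumPy hands the whole operation to an operand with a higher priority instead of looping over the array itself.

```python
import numpy as np


class Plain:
    """No __array_priority__: NumPy applies the operation element by element."""

    def __radd__(self, other):
        return "per-element"


class Deferring:
    """High __array_priority__: ndarray.__add__ gives up and Python calls __radd__."""

    __array_priority__ = 1000

    def __radd__(self, other):
        return "handled once, with the whole array"


arr = np.array([1, 2])

without_priority = arr + Plain()      # object array, one __radd__ call per element
with_priority = arr + Deferring()     # a single __radd__ call receiving `arr`

print(without_priority)
print(with_priority)
```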

cc @jbrockmendel if any thoughts

dsaxton · 2020-08-28T00:51:03Z

So evidently __array_priority__ = 1000 has the not so nice side effect of no longer broadcasting things like equality comparisons (which is causing the bug in the OP):

In [1]: import numpy as np
   ...: import pandas as pd
   ...:
   ...: arr = np.array([pd.Interval(0, 1), pd.Interval(1, 2)])
   ...: arr == pd.Interval(0, 1)
   ...:
Out[1]: False

Probably this behavior is worse than not being able to add np.timedelta64 + pd.Interval(pd.Timestamp, pd.Timestamp).
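The same effect can be reproduced without pandas. The sketch below uses two hypothetical classes (`Point` and `PriorityPoint`, not pandas code) and assumes, as with `Interval`, that `__eq__` does not know how to handle an ndarray. With the high priority, NumPy defers the comparison, the reflected `__eq__` returns `NotImplemented`, and Python falls back to an identity check that yields a single `False`.

```python
import numpy as np


class Point:
    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        # Only knows how to compare with other Points, not with arrays
        if not isinstance(other, Point):
            return NotImplemented
        return self.value == other.value


class PriorityPoint(Point):
    # Same class, but asks NumPy to defer binary operations to it
    __array_priority__ = 1000


plain_arr = np.array([Point(0), Point(1)], dtype=object)
prio_arr = np.array([PriorityPoint(0), PriorityPoint(1)], dtype=object)

broadcast = plain_arr == Point(0)              # elementwise comparison
not_broadcast = prio_arr == PriorityPoint(0)   # collapses to a single bool

# An explicit loop still gives the elementwise answer in both cases
explicit = np.array([p == PriorityPoint(0) for p in prio_arr])

print(broadcast, not_broadcast, explicit)
```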

simonjayhawkins · 2020-09-01T14:56:21Z

fixed in #35938

agnesbao added the Bug and Needs Triage labels Aug 27, 2020

dsaxton added the Interval and Regression labels and removed the Bug and Needs Triage labels Aug 27, 2020

dsaxton mentioned this issue Aug 28, 2020

REGR: Fix comparison broadcasting over array of Intervals #35938

Merged

5 tasks

dsaxton added this to the 1.1.2 milestone Aug 28, 2020

simonjayhawkins closed this as completed Sep 1, 2020
