BUG: Series.combine() fails with ExtensionArray inside of Series #20825

Closed
Dr-Irv opened this issue Apr 25, 2018 · 0 comments

Dr-Irv (Contributor) commented Apr 25, 2018
Code Sample, a copy-pastable example if possible

In [1]: import pandas as pd

In [2]: from pandas.tests.extension.decimal.array import DecimalArray, make_data

In [3]: da1 = make_data()
   ...: da2 = make_data()
   ...:

In [4]: s1 = pd.Series(DecimalArray(da1))
   ...: s2 = pd.Series(DecimalArray(da2))
   ...:

In [5]: s1.head(), s2.head()
Out[5]:
(0    0.57581534881735985109685316274408251047134399...
 1    0.05647135567908745379384072293760254979133605...
 2    0.41049738961593973396446699553052894771099090...
 3    0.13724377491342376611527242857846431434154510...
 4    0.24154934068629707599740186196868307888507843...
 dtype: decimal, 0    0.40855027024154888515283801098121330142021179...
 1    0.21243084028671055385473209753399714827537536...
 2    0.15218065149055393092680787958670407533645629...
 3    0.87747422249812989658579454044229350984096527...
 4    0.53991488184898328572813852588296867907047271...
 dtype: decimal)

In [6]: s1.combine(s2, lambda x1, x2: x1 if x1 < x2 else x2)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-14abc20f0095> in <module>()
----> 1 s1.combine(s2, lambda x1, x2: x1 if x1 < x2 else x2)

C:\EclipseWorkspaces\LiClipseWorkspace\pandas-dev\pandas36\pandas\core\series.py in combine(self, other, func, fill_value)
   2220             new_index = self.index.union(other.index)
   2221             new_name = ops.get_op_result_name(self, other)
-> 2222             new_values = np.empty(len(new_index), dtype=self.dtype)
   2223             for i, idx in enumerate(new_index):
   2224                 lv = self.get(idx, fill_value)

TypeError: data type not understood

Problem description

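Series.combine() allocates its result with numpy.empty(len(new_index), dtype=self.dtype). When the Series holds an ExtensionArray, self.dtype is an extension dtype that NumPy cannot interpret, so the call raises TypeError: data type not understood.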
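
The failing allocation can be isolated from Series.combine() entirely. The snippet below is a minimal sketch, not code from pandas itself: `numpy_accepts` is a hypothetical helper introduced here for illustration, and a categorical dtype stands in for any extension dtype.

```python
import numpy as np
import pandas as pd

# The dtype carried by a categorical Series is a pandas extension dtype,
# not a NumPy dtype.
ext_dtype = pd.Series(pd.Categorical(["one", "two"])).dtype


def numpy_accepts(dtype):
    """Return True if np.empty can allocate an array with this dtype.

    Hypothetical helper for illustration; it mirrors the np.empty call
    shown in the traceback.
    """
    try:
        np.empty(2, dtype=dtype)
    except TypeError:
        return False
    return True


print(numpy_accepts(np.dtype("float64")))  # a plain NumPy dtype
print(numpy_accepts(ext_dtype))            # a pandas extension dtype
```

The second call is the one that corresponds to the error in the traceback: NumPy has no way to turn the pandas dtype object into one of its own dtypes.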

Note: This also happens with Categorical in v0.22 and in master:

In [3]: cat1 = pd.Categorical(values=["one","two","three","three","two","one"],
   ...:  categories=["one","two","three"], ordered=True)
   ...: cat2 = pd.Categorical(values=["three","two","one","one","two","three"],
   ...:  categories=["one","two","three"], ordered=True)
   ...: s1 = pd.Series(cat1)
   ...: s2 = pd.Series(cat2)
   ...: s1, s2
   ...:
Out[3]:
(0      one
 1      two
 2    three
 3    three
 4      two
 5      one
 dtype: category
 Categories (3, object): [one < two < three], 0    three
 1      two
 2      one
 3      one
 4      two
 5    three
 dtype: category
 Categories (3, object): [one < two < three])

In [4]: s1.combine(s2, lambda x1, x2: x1 <= x2)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-4-b597231c2d3c> in <module>()
----> 1 s1.combine(s2, lambda x1, x2: x1 <= x2)

C:\Anaconda3\lib\site-packages\pandas\core\series.py in combine(self, other, func, fill_value)
   1768             new_index = self.index.union(other.index)
   1769             new_name = _maybe_match_name(self, other)
-> 1770             new_values = np.empty(len(new_index), dtype=self.dtype)
   1771             for i, idx in enumerate(new_index):
   1772                 lv = self.get(idx, fill_value)

TypeError: data type not understood

NOTE: I will look into fixing this as part of my attempt to get ops() working for ExtensionArray

Expected Output

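A Series of True and False values for the Categorical example. For the DecimalArray example, the lambda returns the smaller of each pair, so the result should be a Series holding the element-wise minimum.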
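
One way to get the expected output for the Categorical example without preallocating with self.dtype is to collect the per-element results in a plain list and let the Series constructor infer the dtype. This is only a sketch under that assumption, not the fix that went into pandas; `combine_elementwise` is a hypothetical name, and its loop mirrors the index union and `get(idx, fill_value)` calls visible in the traceback.

```python
import pandas as pd


def combine_elementwise(left, right, func, fill_value=None):
    """Apply func pairwise over the union of both indexes.

    Hypothetical helper: results go into a Python list, so no NumPy
    array is ever allocated with an extension dtype.
    """
    new_index = left.index.union(right.index)
    results = []
    for idx in new_index:
        lv = left.get(idx, fill_value)
        rv = right.get(idx, fill_value)
        results.append(func(lv, rv))
    # The Series constructor infers the result dtype from the values.
    return pd.Series(results, index=new_index)


categories = ["one", "two", "three"]
s1 = pd.Series(pd.Categorical(
    ["one", "two", "three", "three", "two", "one"],
    categories=categories, ordered=True))
s2 = pd.Series(pd.Categorical(
    ["three", "two", "one", "one", "two", "three"],
    categories=categories, ordered=True))

result = combine_elementwise(s1, s2, lambda x1, x2: x1 <= x2)
print(result)
```

Note that `get` returns the underlying scalars, so the lambda compares plain strings rather than using the categorical ordering. For this particular data the two orderings happen to give the same answers.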

Output of pd.show_versions()

INSTALLED VERSIONS

commit: 60fe82c
python: 3.6.4.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.23.0.dev0+799.g60fe82c8a
pytest: 3.4.0
pip: 9.0.1
setuptools: 38.5.1
Cython: 0.25.1
numpy: 1.14.1
scipy: 1.0.0
pyarrow: 0.8.0
xarray: None
IPython: 6.2.1
sphinx: 1.7.1
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2018.3
blosc: 1.5.1
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.2.0
openpyxl: 2.5.0
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.5
pymysql: 0.8.0
psycopg2: None
jinja2: 2.10
s3fs: 0.1.3
fastparquet: None
pandas_gbq: None
pandas_datareader: None

TomAugspurger added this to the 0.23.1 milestone Apr 26, 2018

Dr-Irv referenced this issue Apr 30, 2018 in: ENH: Support operators for ExtensionArray #20889 (closed, 4 of 4 tasks complete)

jreback modified the milestones: 0.23.1, 0.24.0 Jun 5, 2018
