BUG: Series.combine() fails with ExtensionArray inside of Series #20825

Dr-Irv · 2018-04-25T21:24:18Z

Code Sample, a copy-pastable example if possible

In [1]: import pandas as pd

In [2]: from pandas.tests.extension.decimal.array import DecimalArray, make_data

In [3]: da1= make_data()
   ...: da2= make_data()
   ...:

In [4]: s1 = pd.Series(DecimalArray(da1))
   ...: s2 = pd.Series(DecimalArray(da2))
   ...:

In [5]: s1.head(), s2.head()
Out[5]:
(0    0.57581534881735985109685316274408251047134399...
 1    0.05647135567908745379384072293760254979133605...
 2    0.41049738961593973396446699553052894771099090...
 3    0.13724377491342376611527242857846431434154510...
 4    0.24154934068629707599740186196868307888507843...
 dtype: decimal, 0    0.40855027024154888515283801098121330142021179...
 1    0.21243084028671055385473209753399714827537536...
 2    0.15218065149055393092680787958670407533645629...
 3    0.87747422249812989658579454044229350984096527...
 4    0.53991488184898328572813852588296867907047271...
 dtype: decimal)

In [6]: s1.combine(s2, lambda x1, x2: x1 if x1 < x2 else x2)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-14abc20f0095> in <module>()
----> 1 s1.combine(s2, lambda x1, x2: x1 if x1 < x2 else x2)

C:\EclipseWorkspaces\LiClipseWorkspace\pandas-dev\pandas36\pandas\core\series.py in combine(self, other, func, fill_value)
   2220             new_index = self.index.union(other.index)
   2221             new_name = ops.get_op_result_name(self, other)
-> 2222             new_values = np.empty(len(new_index), dtype=self.dtype)
   2223             for i, idx in enumerate(new_index):
   2224                 lv = self.get(idx, fill_value)

TypeError: data type not understood

Problem description

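`Series.combine()` preallocates its result with `np.empty(len(new_index), dtype=self.dtype)`. For a Series backed by an ExtensionArray, `self.dtype` is an ExtensionDtype rather than a NumPy dtype, so NumPy rejects it with `TypeError: data type not understood`.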
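A minimal sketch of the failure and of one way around it, assuming a recent pandas and NumPy (the exact error message may differ across NumPy versions, but it is still a `TypeError`). It reuses the values from the Categorical example in this report. `combine_via_list` is a hypothetical helper, not a pandas API and not necessarily how the eventual fix works: it follows the same loop as `Series.combine` in the traceback (union of the indexes, then `func` per label) but collects the results in a Python list, so NumPy never has to allocate an array with an extension dtype.

```python
import numpy as np
import pandas as pd


def combine_via_list(s1, s2, func, fill_value=None):
    """Hypothetical helper: element-wise combine without np.empty.

    Walks the union of both indexes like Series.combine does, but
    accumulates func(left, right) in a list instead of a preallocated
    NumPy array, so an ExtensionDtype never reaches np.empty.
    """
    new_index = s1.index.union(s2.index)
    new_values = [
        func(s1.get(idx, fill_value), s2.get(idx, fill_value))
        for idx in new_index
    ]
    # Let pandas infer the result dtype (bool for a comparison function).
    return pd.Series(new_values, index=new_index)


cats = ["one", "two", "three"]
s1 = pd.Series(pd.Categorical(["one", "two", "three", "three", "two", "one"],
                              categories=cats, ordered=True))
s2 = pd.Series(pd.Categorical(["three", "two", "one", "one", "two", "three"],
                              categories=cats, ordered=True))

# This is the allocation that fails in the traceback: a CategoricalDtype
# is not something NumPy can interpret as a data type.
try:
    np.empty(len(s1), dtype=s1.dtype)
    empty_failed = False
except TypeError:
    empty_failed = True

# Same lambda as in the report; the scalars passed in are the category
# values themselves (Python str).
result = combine_via_list(s1, s2, lambda x1, x2: x1 <= x2)
```

Because the dtype is inferred from the collected values, a comparison function yields a boolean Series; a function that returns the original element type would need an explicit cast back to the extension dtype.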

Note: This also happens with Categorical in v0.22 and in master:

In [3]: cat1 = pd.Categorical(values=["one","two","three","three","two","one"],
   ...:  categories=["one","two","three"], ordered=True)
   ...: cat2 = pd.Categorical(values=["three","two","one","one","two","three"],
   ...:  categories=["one","two","three"], ordered=True)
   ...: s1 = pd.Series(cat1)
   ...: s2 = pd.Series(cat2)
   ...: s1, s2
   ...:
Out[3]:
(0      one
 1      two
 2    three
 3    three
 4      two
 5      one
 dtype: category
 Categories (3, object): [one < two < three], 0    three
 1      two
 2      one
 3      one
 4      two
 5    three
 dtype: category
 Categories (3, object): [one < two < three])

In [4]: s1.combine(s2, lambda x1, x2: x1 <= x2)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-4-b597231c2d3c> in <module>()
----> 1 s1.combine(s2, lambda x1, x2: x1 <= x2)

C:\Anaconda3\lib\site-packages\pandas\core\series.py in combine(self, other, func, fill_value)
   1768             new_index = self.index.union(other.index)
   1769             new_name = _maybe_match_name(self, other)
-> 1770             new_values = np.empty(len(new_index), dtype=self.dtype)
   1771             for i, idx in enumerate(new_index):
   1772                 lv = self.get(idx, fill_value)

TypeError: data type not understood

NOTE: I will look into fixing this as part of my attempt to get ops() working for ExtensionArray.

Expected Output

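For the Categorical example, a Series of True and False values. For the DecimalArray example, a Series holding the element-wise minimum of `s1` and `s2`.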
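As a sketch of the expected result for the Categorical example (assuming a recent pandas): two ordered Categoricals with identical categories can be compared directly, and casting both Series to `object` first is one way to let `combine` run even on affected versions, since `np.empty` accepts the NumPy object dtype. The cast is an illustrative workaround, not the upstream fix.

```python
import pandas as pd

cats = ["one", "two", "three"]
s1 = pd.Series(pd.Categorical(["one", "two", "three", "three", "two", "one"],
                              categories=cats, ordered=True))
s2 = pd.Series(pd.Categorical(["three", "two", "one", "one", "two", "three"],
                              categories=cats, ordered=True))

# Vectorized comparison of ordered Categoricals with the same categories
# uses the category order one < two < three.
expected = s1 <= s2

# Workaround sketch: with object dtype, combine() preallocates a plain
# NumPy object array. The lambda receives the raw values (Python str),
# so x1 <= x2 here is a lexicographic string comparison.
via_object = s1.astype(object).combine(s2.astype(object), lambda x1, x2: x1 <= x2)
```

For these particular labels the string ordering happens to agree with the category ordering. With labels such as "low" < "high" it would not ("low" <= "high" is False as strings), so the vectorized comparison is the safer way to get category-order semantics.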

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: 60fe82c
python: 3.6.4.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.23.0.dev0+799.g60fe82c8a
pytest: 3.4.0
pip: 9.0.1
setuptools: 38.5.1
Cython: 0.25.1
numpy: 1.14.1
scipy: 1.0.0
pyarrow: 0.8.0
xarray: None
IPython: 6.2.1
sphinx: 1.7.1
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2018.3
blosc: 1.5.1
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.2.0
openpyxl: 2.5.0
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.5
pymysql: 0.8.0
psycopg2: None
jinja2: 2.10
s3fs: 0.1.3
fastparquet: None
pandas_gbq: None
pandas_datareader: None


TomAugspurger added the ExtensionArray label (Extending pandas with custom dtypes or arrays) Apr 26, 2018

TomAugspurger added this to the 0.23.1 milestone Apr 26, 2018

Dr-Irv mentioned this issue Apr 30, 2018

ENH: Support operators for ExtensionArray #20889

Closed


Dr-Irv mentioned this issue May 23, 2018

BUG: Series.combine() fails with ExtensionArray inside of Series #21183

Merged


jreback modified the milestones: 0.23.1, 0.24.0 Jun 5, 2018

jreback closed this as completed in #21183 Jun 8, 2018
