BUG: Series.combine() fails with ExtensionArray inside of Series #20825

Closed
Dr-Irv opened this issue Apr 25, 2018 · 0 comments · Fixed by #21183
Labels
ExtensionArray Extending pandas with custom dtypes or arrays.

@Dr-Irv
Contributor

Dr-Irv commented Apr 25, 2018

Code Sample, a copy-pastable example if possible

In [1]: import pandas as pd

In [2]: from pandas.tests.extension.decimal.array import DecimalArray, make_data

In [3]: da1 = make_data()
   ...: da2 = make_data()
   ...:

In [4]: s1 = pd.Series(DecimalArray(da1))
   ...: s2 = pd.Series(DecimalArray(da2))
   ...:

In [5]: s1.head(), s2.head()
Out[5]:
(0    0.57581534881735985109685316274408251047134399...
 1    0.05647135567908745379384072293760254979133605...
 2    0.41049738961593973396446699553052894771099090...
 3    0.13724377491342376611527242857846431434154510...
 4    0.24154934068629707599740186196868307888507843...
 dtype: decimal, 0    0.40855027024154888515283801098121330142021179...
 1    0.21243084028671055385473209753399714827537536...
 2    0.15218065149055393092680787958670407533645629...
 3    0.87747422249812989658579454044229350984096527...
 4    0.53991488184898328572813852588296867907047271...
 dtype: decimal)

In [6]: s1.combine(s2, lambda x1, x2: x1 if x1 < x2 else x2)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-14abc20f0095> in <module>()
----> 1 s1.combine(s2, lambda x1, x2: x1 if x1 < x2 else x2)

C:\EclipseWorkspaces\LiClipseWorkspace\pandas-dev\pandas36\pandas\core\series.py in combine(self, other, func, fill_value)
   2220             new_index = self.index.union(other.index)
   2221             new_name = ops.get_op_result_name(self, other)
-> 2222             new_values = np.empty(len(new_index), dtype=self.dtype)
   2223             for i, idx in enumerate(new_index):
   2224                 lv = self.get(idx, fill_value)

TypeError: data type not understood

Problem description

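Series.combine() allocates its result with np.empty(len(new_index), dtype=self.dtype). When the Series holds an ExtensionArray, self.dtype is the extension dtype, which NumPy cannot interpret, so the call raises "TypeError: data type not understood".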
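
To illustrate the failure mode, here is a minimal sketch (not the actual fix). It first shows NumPy rejecting a pandas extension dtype, then uses a hypothetical helper, `combine_via_list`, which is not part of pandas. The helper collects the results in a plain Python list and lets the Series constructor infer the result dtype, instead of preallocating with `np.empty`. `CategoricalDtype` is used only as a stand-in for any extension dtype.

```python
import numpy as np
import pandas as pd

# An extension dtype is a pandas object; NumPy does not know how to allocate it.
cat_dtype = pd.CategoricalDtype(["one", "two", "three"], ordered=True)

try:
    np.empty(3, dtype=cat_dtype)
    numpy_accepted_dtype = True
except TypeError as exc:
    numpy_accepted_dtype = False
    print("np.empty failed:", exc)


def combine_via_list(left, right, func, fill_value=None):
    """Hypothetical helper (an assumption, not pandas API).

    Applies func element-wise over the union of both indexes and gathers
    the results in a list, so no NumPy buffer of the input dtype is needed.
    """
    new_index = left.index.union(right.index)
    results = [
        func(left.get(idx, fill_value), right.get(idx, fill_value))
        for idx in new_index
    ]
    # The Series constructor infers the dtype from the collected results.
    return pd.Series(results, index=new_index)


s1 = pd.Series(pd.Categorical(
    ["one", "two", "three", "three", "two", "one"],
    categories=["one", "two", "three"], ordered=True))
s2 = pd.Series(pd.Categorical(
    ["three", "two", "one", "one", "two", "three"],
    categories=["one", "two", "three"], ordered=True))

result = combine_via_list(s1, s2, lambda x1, x2: x1 <= x2)
print(result)
```

Note that `Series.get` returns plain scalars here, so the lambda compares strings rather than using the categorical ordering. The sketch also makes no attempt to restore the extension dtype when `func` returns values of the original type; that is a separate design decision.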

Note: This also happens with Categorical in v0.22 and in master:

In [3]: cat1 = pd.Categorical(values=["one","two","three","three","two","one"],
   ...:  categories=["one","two","three"], ordered=True)
   ...: cat2 = pd.Categorical(values=["three","two","one","one","two","three"],
   ...:  categories=["one","two","three"], ordered=True)
   ...: s1 = pd.Series(cat1)
   ...: s2 = pd.Series(cat2)
   ...: s1, s2
   ...:
Out[3]:
(0      one
 1      two
 2    three
 3    three
 4      two
 5      one
 dtype: category
 Categories (3, object): [one < two < three], 0    three
 1      two
 2      one
 3      one
 4      two
 5    three
 dtype: category
 Categories (3, object): [one < two < three])

In [4]: s1.combine(s2, lambda x1, x2: x1 <= x2)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-4-b597231c2d3c> in <module>()
----> 1 s1.combine(s2, lambda x1, x2: x1 <= x2)

C:\Anaconda3\lib\site-packages\pandas\core\series.py in combine(self, other, func, fill_value)
   1768             new_index = self.index.union(other.index)
   1769             new_name = _maybe_match_name(self, other)
-> 1770             new_values = np.empty(len(new_index), dtype=self.dtype)
   1771             for i, idx in enumerate(new_index):
   1772                 lv = self.get(idx, fill_value)

TypeError: data type not understood

NOTE: I will look into fixing this as part of my attempt to get ops() working for ExtensionArray

Expected Output

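For the Categorical example, a Series of True and False values. For the DecimalArray example, a Series containing the smaller of each pair of values.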

Output of pd.show_versions()

INSTALLED VERSIONS

commit: 60fe82c
python: 3.6.4.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.23.0.dev0+799.g60fe82c8a
pytest: 3.4.0
pip: 9.0.1
setuptools: 38.5.1
Cython: 0.25.1
numpy: 1.14.1
scipy: 1.0.0
pyarrow: 0.8.0
xarray: None
IPython: 6.2.1
sphinx: 1.7.1
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2018.3
blosc: 1.5.1
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.2.0
openpyxl: 2.5.0
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.5
pymysql: 0.8.0
psycopg2: None
jinja2: 2.10
s3fs: 0.1.3
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@TomAugspurger TomAugspurger added the ExtensionArray Extending pandas with custom dtypes or arrays. label Apr 26, 2018
@TomAugspurger TomAugspurger added this to the 0.23.1 milestone Apr 26, 2018
@jreback jreback modified the milestones: 0.23.1, 0.24.0 Jun 5, 2018