BUG: .unstack() with recarray column raises TypeError since 1.4.0 #49388

stefan-jansen · 2022-10-29T13:29:59Z

Pandas version checks

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import numpy as np
import pandas as pd

c = np.array([2] * 20, dtype='f8')
r = np.rec.fromarrays([c], names=['c'])

df = pd.DataFrame({'a': np.arange(20) // 5, 'b': list('ABCDE') * 4, 'c': r})
# df.info() => see GH48526
df.set_index(['a', 'b']).c.unstack()

Issue Description

Starting with 1.4.0, including a column of dtype np.record (related to #48526) as follows:

c = np.array([2] * 9, dtype='f8')
r = np.rec.fromarrays([c], names=['c'])
df = pd.DataFrame({'a': np.arange(9) // 3,
                   'b': list('ABC') * 3,
                   'c': r})
print(df.dtypes)

a                             int64
b                            object
c    (numpy.record, [('c', '<f8')])

raises a TypeError:

   df.set_index(['a', 'b']).unstack()
  File "/home/stefan/.pyenv/versions/pd_bug/lib/python3.9/site-packages/pandas/core/frame.py", line 9060, in unstack
    result = unstack(self, level, fill_value)
  File "/home/stefan/.pyenv/versions/pd_bug/lib/python3.9/site-packages/pandas/core/reshape/reshape.py", line 479, in unstack
    return _unstack_frame(obj, level, fill_value=fill_value)
  File "/home/stefan/.pyenv/versions/pd_bug/lib/python3.9/site-packages/pandas/core/reshape/reshape.py", line 508, in _unstack_frame
    return unstacker.get_result(
  File "/home/stefan/.pyenv/versions/pd_bug/lib/python3.9/site-packages/pandas/core/reshape/reshape.py", line 215, in get_result
    values, _ = self.get_new_values(values, fill_value)
  File "/home/stefan/.pyenv/versions/pd_bug/lib/python3.9/site-packages/pandas/core/reshape/reshape.py", line 228, in get_new_values
    sorted_values = self._make_sorted_values(values)
  File "/home/stefan/.pyenv/versions/pd_bug/lib/python3.9/site-packages/pandas/core/reshape/reshape.py", line 167, in _make_sorted_values
    sorted_values = algos.take_nd(values, indexer, axis=0)
  File "/home/stefan/.pyenv/versions/pd_bug/lib/python3.9/site-packages/pandas/core/array_algos/take.py", line 118, in take_nd
    return _take_nd_ndarray(arr, indexer, axis, fill_value, allow_fill)
  File "/home/stefan/.pyenv/versions/pd_bug/lib/python3.9/site-packages/pandas/core/array_algos/take.py", line 135, in _take_nd_ndarray
    dtype, fill_value, mask_info = _take_preprocess_indexer_and_fill_value(
  File "/home/stefan/.pyenv/versions/pd_bug/lib/python3.9/site-packages/pandas/core/array_algos/take.py", line 587, in _take_preprocess_indexer_and_fill_value
    dtype, fill_value = arr.dtype, arr.dtype.type()
TypeError: void() takes exactly 1 positional argument (0 given)
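
The last frame of the traceback can be reproduced without pandas. The following is a minimal sketch that assumes only NumPy: calling `arr.dtype.type()` with no arguments works for an ordinary float dtype, but the scalar type of a record dtype is a subclass of `np.void`, which cannot be called without an argument. The variable names are illustrative, and the exact wording of the error message may differ between NumPy versions.

```python
import numpy as np

c = np.array([2.0] * 3, dtype="f8")
r = np.rec.fromarrays([c], names=["c"])

# For a plain float dtype, calling the scalar type with no arguments works.
default_float = c.dtype.type()

# For a record dtype, the scalar type is np.record (a subclass of np.void),
# and calling it with no arguments raises TypeError.
record_type = r.dtype.type
try:
    record_type()
    raised = False
except TypeError as exc:
    raised = True
    print(f"{record_type.__name__}: {exc}")
```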

Expected Behavior

Before 1.4.0, the output of the same code shows that the recarray has dtype object and the unstack does not throw an error:

a     int64
b    object
c    object
dtype: object
        c                
b       A       B       C
a                        
0  (2.0,)  (2.0,)  (2.0,)
1  (2.0,)  (2.0,)  (2.0,)
2  (2.0,)  (2.0,)  (2.0,)
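
A possible workaround, sketched here under the assumption that an object-dtype column of tuples is an acceptable stand-in for the old behavior: convert the records to plain Python tuples before building the DataFrame. This is not a fix in pandas itself, only a way to sidestep the record dtype in user code.

```python
import numpy as np
import pandas as pd

c = np.array([2] * 9, dtype="f8")
r = np.rec.fromarrays([c], names=["c"])

# Assumption: converting the records to plain tuples up front gives an
# object-dtype column, similar to what the pre-1.4.0 output shows.
df = pd.DataFrame({
    "a": np.arange(9) // 3,
    "b": list("ABC") * 3,
    "c": r.tolist(),
})

# The record dtype never reaches pandas, so unstack operates on object values.
wide = df.set_index(["a", "b"])["c"].unstack()
print(wide)
```

If only the numeric field is needed, passing `r["c"]` (a plain float array) instead of `r.tolist()` would avoid tuples altogether.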

Installed Versions

INSTALLED VERSIONS

commit : 91111fd
python : 3.9.13.final.0
python-bits : 64
OS : Linux
OS-release : 5.15.0-52-generic
Version : #58-Ubuntu SMP Thu Oct 13 08:03:55 UTC 2022
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.5.1
numpy : 1.23.4
pytz : 2022.5
dateutil : 2.8.2
setuptools : 65.5.0
pip : 22.3
Cython : 0.29.32
pytest : 6.2.5
hypothesis : None
sphinx : 5.3.0
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.9.1
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.5.0
pandas_datareader: 0.10.0
bs4 : 4.11.1
bottleneck : 1.3.5
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.6.1
numba : None
numexpr : 2.8.4
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.9.3
snappy : None
sqlalchemy : 1.4.42
tables : 3.7.0
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None
tzdata : None


topper-123 · 2022-10-30T15:06:50Z

Yes. I can confirm this error.

Also, just printing your original DataFrame raises an error:

import numpy as np
import pandas as pd

c = np.array([2] * 20, dtype='f8')
r = np.rec.fromarrays([c], names=['c'])

df = pd.DataFrame({'a': np.arange(20) // 5, 'b': list('ABCDE') * 4, 'c': r})
print(df)

gives error:

TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

Want to take a stab at this, @stefan-jansen?

jbrockmendel · 2022-10-30T16:31:23Z

I don't think we support record dtypes. These should be cast somewhere along the way.

topper-123 · 2022-10-30T17:16:35Z

OK, thanks. It makes sense then that several errors pop up when using them :-).

I see that instantiating a Series from a recarray raises an error:

In [1]: r = np.rec.fromarrays([c], names=['c'])
In [2]: pd.Series(r)
ValueError: Cannot construct a Series from an ndarray with compound dtype.  Use DataFrame instead.

The case with pd.DataFrame({'c': r}) should logically give a similar error, so I'd say the OP's example should have raised during DataFrame construction.
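
The ValueError message above suggests using DataFrame instead. As a rough sketch of what that looks like: passing the recarray itself to the constructor, rather than wrapping it in a dict, expands its fields into ordinary columns. The helper `has_compound_dtype` is hypothetical and only illustrates one way user code could detect such arrays up front.

```python
import numpy as np
import pandas as pd

c = np.array([2.0] * 3, dtype="f8")
r = np.rec.fromarrays([c], names=["c"])


def has_compound_dtype(arr: np.ndarray) -> bool:
    """Hypothetical helper: True if the array's dtype has named fields."""
    return arr.dtype.names is not None


# Passing the recarray directly expands each field into its own column.
expanded = pd.DataFrame(r)
print(has_compound_dtype(r), has_compound_dtype(c))
print(expanded.dtypes)
```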

topper-123 · 2022-10-31T06:13:44Z

Looking further, we also can't use DataFrames (multidimensional objects) as single columns in a DataFrame:

In [1]: df = pd.DataFrame({"a": "a b a b".split(), "b": range(4)})
In [2]: pd.DataFrame({"a": df}, index=df.index)
ValueError: Data must be 1-dimensional

IMO we should disallow constructing single columns from multidimensional objects (like recarrays and DataFrames), in order to keep things consistent.

@jbrockmendel, do you agree?

stefan-jansen · 2022-10-31T14:10:13Z

It seems like the np.record case worked before 1.4.0 because it was cast to / treated as dtype 'object'.

I couldn't find which change in 1.4.0 made np.record keep its dtype through the constructor, but that change in behavior may be the source of the error. It appeared to relate to the BlockManager, but I didn't have the time to dig deeper. #48637 (the print / .info() error) also appears with np.record but has a different cause.

stefan-jansen added the Bug and Needs Triage labels Oct 29, 2022

topper-123 added the Reshaping label and removed the Needs Triage label Oct 30, 2022

topper-123 added the DataFrame and Constructors labels and removed the Reshaping label Oct 30, 2022

topper-123 mentioned this issue Oct 31, 2022

BUG: dataframe construction with dict of recarrays should raise #49405

Closed

5 tasks

topper-123 closed this as completed Oct 31, 2022

topper-123 reopened this Oct 31, 2022
