BUG: to_records() fails for empty MultiIndex #21064

Open
fersarr opened this issue May 15, 2018 · 2 comments

Comments

@fersarr
Contributor

fersarr commented May 15, 2018

Calling to_records() on a DataFrame that becomes empty after a groupby and filter raises a ValueError:

In [1]: import pandas as pd
In [2]: mydf = pd.DataFrame({'A':[10,-10], 'B':[400,400], 'C':["bla","bla"]})
In [3]: new_pd2 = pd.DataFrame(mydf.groupby(['B','C'])['A'].agg(lambda x: x.sum()))
In [4]: new_pd3 = new_pd2[new_pd2['A'] != 0 ]
In [5]: new_pd3.to_records()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-5-ce958ec44125> in <module>()
----> 1 new_pd3.to_records()

/.../lib/python2.7/site-packages/pandas-0.22.0+ahl1-py2.7-linux-x86_64.egg/pandas/core/frame.pyc in to_records(self, index, convert_datetime64)
   1233         return np.rec.fromarrays(
   1234             arrays,
-> 1235             dtype={'names': names, 'formats': formats}
   1236         )
   1237 

/.../.eggs/numpy-1.14.2-py2.7-linux-x86_64.egg/numpy/core/records.pyc in fromarrays(arrayList, dtype, shape, formats, names, titles, aligned, byteorder)
    605 
    606     if dtype is not None:
--> 607         descr = sb.dtype(dtype)
    608         _names = descr.names
    609     else:

ValueError: 'names', 'formats', 'offsets', and 'titles' dict entries must have the same length

Problem description

Calling to_records() on a DataFrame that becomes empty after a groupby and filter raises a ValueError. I would expect it not to raise and instead to return an empty record array.

pandas==0.22.0
numpy==1.14.2

The same code without column 'C' (a two-column frame grouped only by 'B', so the result has a single-level index) works fine.
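A plausible explanation for that contrast, offered as an assumption rather than something confirmed in this thread: the 0.22-era to_records() appears to build the MultiIndex arrays by transposing the index tuples (roughly zip(*self.index.values)), while the list of names always has one entry per level. Transposing zero tuples yields zero arrays, so for an empty MultiIndex the level arrays disappear; a single-level index is passed through as one (empty) array and avoids this. The pure-Python sketch below uses hypothetical helper names to contrast that transpose with building one array per level from the known level count, which is the idea behind calling index.get_level_values(i) for each level.

```python
# Assumption: this mimics how the 0.22-era to_records() converted a
# MultiIndex into per-level arrays; helper names are hypothetical.

def level_arrays_by_transpose(index_tuples):
    """Transpose index tuples into one list per level via zip(*...)."""
    return [list(level) for level in zip(*index_tuples)]


def level_arrays_by_level(index_tuples, nlevels):
    """Build one list per level using the known number of levels."""
    return [[t[i] for t in index_tuples] for i in range(nlevels)]


non_empty = [(400, "bla")]
empty = []

print(level_arrays_by_transpose(non_empty))  # [[400], ['bla']]
print(level_arrays_by_transpose(empty))      # [] : zero arrays for a 2-level index
print(level_arrays_by_level(empty, 2))       # [[], []] : one empty array per level
```

With a two-level index and one column, the transpose approach leaves three names but only one array, which is exactly the names/formats length mismatch NumPy complains about.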

The problem is here:
https://github.com/pandas-dev/pandas/blob/master/pandas/core/frame.py#L1419

return np.rec.fromarrays(
    arrays,
    dtype={'names': names, 'formats': formats}
)

where names and formats end up with different lengths, which NumPy rejects when it builds the record dtype.
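To see the NumPy side in isolation, here is a minimal standalone sketch. The helper build_records is hypothetical and only mirrors the quoted np.rec.fromarrays call (one format per array, names passed separately); the "three names, one array" input is an assumption about what to_records() hands NumPy in the empty MultiIndex case.

```python
import numpy as np


def build_records(arrays, names):
    """Hypothetical helper mirroring the quoted call: one format per array,
    field names supplied separately."""
    formats = [a.dtype for a in arrays]
    return np.rec.fromarrays(arrays, dtype={"names": names, "formats": formats})


names = ["B", "C", "A"]  # two index-level names plus one column name

# Assumed empty-MultiIndex case: only the column array survives, so there
# are three names but a single format, and NumPy raises ValueError.
try:
    build_records([np.array([], dtype=np.int64)], names)
except ValueError as exc:
    print("ValueError:", exc)

# One (empty) array per name: the dtype is valid and an empty record
# array is returned.
ok = build_records(
    [
        np.array([], dtype=np.int64),
        np.array([], dtype=object),
        np.array([], dtype=np.int64),
    ],
    names,
)
print(ok.dtype.names, len(ok))  # ('B', 'C', 'A') 0
```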

This appears to work in pandas==0.16.2, which returns the following:

In [5]: new_pd3.to_records()
Out[5]: 
rec.array([], 
      dtype=[('B', '<i8')])
@chris-b1
Contributor

In particular, the issue seems to be specific to an empty MultiIndex:

In [35]: df = pd.DataFrame(columns=['A'], index=pd.MultiIndex([['a'], ['b']], labels=[[], []]))

In [36]: df.to_records()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)

You can work around this with index=False (at the cost of dropping the index from the records), but a PR to fix it would also be welcome.

In [39]: df.to_records(index=False)
Out[39]: 
rec.array([],
          dtype=[('A', 'O')])
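A variant of that workaround that keeps the index data, sketched under the assumption that reset_index() handles an empty MultiIndex in your pandas version: move the index levels into ordinary columns first, then call to_records(index=False). The helper name to_records_with_index is hypothetical, and the empty index is built with pd.MultiIndex.from_arrays instead of the labels= constructor used above (later pandas versions renamed labels to codes).

```python
import pandas as pd


def to_records_with_index(df):
    """Hypothetical helper: turn index levels into ordinary columns first,
    so to_records() never has to convert the MultiIndex itself."""
    return df.reset_index().to_records(index=False)


# An empty frame with a two-level MultiIndex, mirroring the filtered
# groupby result from the original report.
empty_mi = pd.MultiIndex.from_arrays([[], []], names=["B", "C"])
df = pd.DataFrame(columns=["A"], index=empty_mi)

rec = to_records_with_index(df)
print(rec.dtype.names)  # ('B', 'C', 'A')
print(len(rec))  # 0
```

Unlike index=False on its own, the index level names survive as record fields, so downstream code that expects them keeps working.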

@chris-b1 chris-b1 changed the title to_records() fails for empty dataframes after groupby BUG: to_records() fails for empty MultiIndex May 15, 2018
fersarr pushed a commit to fersarr/pandas that referenced this issue May 16, 2018
@fersarr
Contributor Author

fersarr commented May 16, 2018

@chris-b1 giving it a go here, but I'm not very familiar with MultiIndex
#21082

@jreback jreback added this to the Next Major Release milestone May 17, 2018
fersarr pushed a commit to fersarr/pandas that referenced this issue May 22, 2018
fersarr pushed a commit to fersarr/pandas that referenced this issue May 23, 2018
@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022