Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: to_records() fails for MultiIndex DF (#21064) #21082

Closed
wants to merge 1 commit into from

Conversation

fersarr
Copy link
Contributor

@fersarr fersarr commented May 16, 2018

This fixes the bug that prevents using to_records on an empty dataframe that has a MultiIndex

self.index.values returns an empty array (no tuples) for an empty MultiIndex DF and that would be used for ix_values. Instead, `ix_values should have an array of empty arrays, each one with the correct dtype according to the MultiIndex level/column.

@@ -328,3 +328,26 @@ def test_to_dict_index_dtypes(self, into, expected):
result = DataFrame.from_dict(result, orient='index')[cols]
expected = DataFrame.from_dict(expected, orient='index')[cols]
tm.assert_frame_equal(result, expected)

def test_to_records_with_multiindex(self):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I could remove this test test_to_records_with_multiindex since test_to_records_index_name is already there on line 130 and deals with multi index?

@codecov
Copy link

codecov bot commented May 16, 2018

Codecov Report

Merging #21082 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #21082      +/-   ##
==========================================
+ Coverage   91.84%   91.84%   +<.01%     
==========================================
  Files         153      153              
  Lines       49505    49508       +3     
==========================================
+ Hits        45466    45469       +3     
  Misses       4039     4039
Flag Coverage Δ
#multiple 90.23% <100%> (ø) ⬆️
#single 41.88% <0%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/frame.py 97.23% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1abfd1b...f89e883. Read the comment docs.

@chris-b1
Copy link
Contributor

Seems reasonable to me, @toobaz, want to have a look?

df = DataFrame(np.random.randn(size, size), index=index)

records = df.to_records(index=True)
assert len(records) == size
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

construct a recarray and compare using assert_numpy_array_equal

multi = MultiIndex([['a'], ['b']], labels=[[], []])
df = DataFrame(columns=['A'], index=multi)

records = df.to_records(index=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

@@ -1392,7 +1392,13 @@ def to_records(self, index=True, convert_datetime64=None):
else:
if isinstance(self.index, MultiIndex):
# array of tuples to numpy cols. copy copy copy
ix_vals = lmap(np.array, zip(*self.index.values))
tuples = self.index.values
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so rather than doing this here, call .tolist() on this and fix the underlying code, like this:

In [30]: pd.MultiIndex.from_product([list('abc'), range(2)]).tolist()
Out[30]: [('a', 0), ('a', 1), ('b', 0), ('b', 1), ('c', 0), ('c', 1)]

In [31]: pd.MultiIndex.from_product([list('abc'), range(2)])[0:0].tolist()
Out[31]: []

so [31] should be a nested list arrays of the correct length/dtype

this might not work, but let's try

Copy link
Contributor Author

@fersarr fersarr May 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback I gave this a go, but it seems a bit of an uphill fight:

When you do tolist() on the index, it goes to

def tolist(self):

which seems to be used for many other classes. Changing this for the MultiIndex breaks the others, I guess you can check the type or define it in the MultiIndex class, but might break other usages of MultiIndex. Were you thinking of adding a new tolist() to the MultiIndex class?

Also, just so you know, I modified the tests as requested

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes you can override tolist() (its defined in pandas/core/base) in MultiIndex

@jreback jreback added Reshaping Concat, Merge/Join, Stack/Unstack, Explode MultiIndex labels May 17, 2018
@@ -1392,7 +1392,13 @@ def to_records(self, index=True, convert_datetime64=None):
else:
if isinstance(self.index, MultiIndex):
# array of tuples to numpy cols. copy copy copy
ix_vals = lmap(np.array, zip(*self.index.values))
tuples = self.index.values
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes you can override tolist() (its defined in pandas/core/base) in MultiIndex

@jreback
Copy link
Contributor

jreback commented Sep 25, 2018

can you rebase

1 similar comment
@jreback
Copy link
Contributor

jreback commented Nov 23, 2018

can you rebase

@jreback
Copy link
Contributor

jreback commented Dec 3, 2018

closing as stale. if you'd like to continue, pls ping.

@jreback jreback closed this Dec 3, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
MultiIndex Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: to_records() fails for empty MultiIndex
3 participants