BUG: to_records() fails for MultiIndex DF (#21064) #21082

fersarr · 2018-05-16T12:10:05Z

closes BUG: to_records() fails for empty MultiIndex #21064
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

This fixes the bug that prevents using to_records on an empty DataFrame that has a MultiIndex.

For an empty MultiIndex DataFrame, self.index.values returns an empty array (no tuples), and that array would be used to build `ix_vals`. Instead, `ix_vals` should be an array of empty arrays, each with the correct dtype for its MultiIndex level/column.
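
To illustrate the failure mode described above, here is a minimal sketch. `columns_from_tuples` is a hypothetical helper that mirrors the `zip(*self.index.values)` pattern from the old code, written as a list comprehension; it is an illustration, not the pandas implementation.

```python
import numpy as np
import pandas as pd


def columns_from_tuples(index):
    # Hypothetical helper: transpose the index's array of tuples into one
    # NumPy array per level, the way the pre-fix code path did.
    return [np.array(col) for col in zip(*index.values)]


full = pd.MultiIndex.from_product([list("abc"), range(2)])
empty = full[0:0]  # same two levels, zero rows

full_cols = columns_from_tuples(full)    # one array per level
empty_cols = columns_from_tuples(empty)  # nothing to unzip, so no arrays at all

print(len(full_cols), len(empty_cols))
```

With no tuples to unzip, the per-level structure is lost entirely, which is why the empty case needs separate handling.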

fersarr · 2018-05-16T12:36:35Z

pandas/tests/frame/test_convert_to.py

@@ -328,3 +328,26 @@ def test_to_dict_index_dtypes(self, into, expected):
        result = DataFrame.from_dict(result, orient='index')[cols]
        expected = DataFrame.from_dict(expected, orient='index')[cols]
        tm.assert_frame_equal(result, expected)
+
+    def test_to_records_with_multiindex(self):


I guess I could remove this test, test_to_records_with_multiindex, since test_to_records_index_name is already there on line 130 and deals with a MultiIndex?

codecov · 2018-05-16T13:35:37Z

Codecov Report

Merging #21082 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #21082      +/-   ##
==========================================
+ Coverage   91.84%   91.84%   +<.01%     
==========================================
  Files         153      153              
  Lines       49505    49508       +3     
==========================================
+ Hits        45466    45469       +3     
  Misses       4039     4039

Flag	Coverage Δ
#multiple	`90.23% <100%> (ø)`	⬆️
#single	`41.88% <0%> (-0.01%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/frame.py	`97.23% <100%> (ø)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 1abfd1b...f89e883.

chris-b1 · 2018-05-16T13:52:06Z

Seems reasonable to me, @toobaz, want to have a look?

jreback · 2018-05-17T00:05:52Z

pandas/tests/frame/test_convert_to.py

+        df = DataFrame(np.random.randn(size, size), index=index)
+
+        records = df.to_records(index=True)
+        assert len(records) == size


construct a recarray and compare using assert_numpy_array_equal
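
A possible shape for such a test, as a sketch only: it builds the expected record array by hand and compares field by field. The index names, column names, and fixed data are made up for illustration, and `np.testing.assert_array_equal` stands in here for the pandas test helper `assert_numpy_array_equal` mentioned above.

```python
import numpy as np
import pandas as pd

# Illustrative data: a 2x2 MultiIndex and deterministic values instead of randn.
index = pd.MultiIndex.from_product([["a", "b"], [0, 1]], names=["first", "second"])
data = np.arange(8, dtype="float64").reshape(4, 2)
df = pd.DataFrame(data, index=index, columns=["x", "y"])

records = df.to_records(index=True)

# Expected result: one field per index level, then one field per column.
expected = np.rec.fromarrays(
    [
        np.asarray(index.get_level_values("first")),
        np.asarray(index.get_level_values("second")),
        data[:, 0],
        data[:, 1],
    ],
    names=["first", "second", "x", "y"],
)

assert records.dtype.names == expected.dtype.names
for name in expected.dtype.names:
    np.testing.assert_array_equal(records[name], expected[name])
```

Comparing whole arrays rather than only `len(records)` also checks that each index level ended up in the right field.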

jreback · 2018-05-17T00:06:00Z

pandas/tests/frame/test_convert_to.py

+        multi = MultiIndex([['a'], ['b']], labels=[[], []])
+        df = DataFrame(columns=['A'], index=multi)
+
+        records = df.to_records(index=True)


jreback · 2018-05-17T00:09:27Z

pandas/core/frame.py

@@ -1392,7 +1392,13 @@ def to_records(self, index=True, convert_datetime64=None):
            else:
                if isinstance(self.index, MultiIndex):
                    # array of tuples to numpy cols. copy copy copy
-                    ix_vals = lmap(np.array, zip(*self.index.values))
+                    tuples = self.index.values


so rather than doing this here, call .tolist() on this and fix the underlying code, like this:

In [30]: pd.MultiIndex.from_product([list('abc'), range(2)]).tolist()
Out[30]: [('a', 0), ('a', 1), ('b', 0), ('b', 1), ('c', 0), ('c', 1)]

In [31]: pd.MultiIndex.from_product([list('abc'), range(2)])[0:0].tolist()
Out[31]: []

so [31] should be a nested list of arrays of the correct length/dtype

this might not work, but let's try
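
One way to picture the per-level result being asked for, as a sketch: `level_arrays` is a hypothetical helper (not part of pandas) that uses the public `get_level_values` method to build one array per level, so the level count and dtypes survive even with zero rows.

```python
import numpy as np
import pandas as pd


def level_arrays(index):
    # Hypothetical helper: one array per MultiIndex level, each keeping
    # that level's dtype, regardless of how many rows the index has.
    return [np.asarray(index.get_level_values(i)) for i in range(index.nlevels)]


empty = pd.MultiIndex.from_product([list("abc"), range(2)])[0:0]
arrays = level_arrays(empty)

for arr in arrays:
    print(arr.shape, arr.dtype)
```

Each array is empty, but there are still two of them, and the integer level keeps an integer dtype.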

fersarr · 2018-05-22

@jreback I gave this a go, but it seems to be a bit of an uphill fight:

When you do tolist() on the index, it goes to

pandas/pandas/core/base.py

Line 893 in cc8d33e

def tolist(self):

which seems to be used by many other classes. Changing it for the MultiIndex case breaks the others. I guess you could check the type there or define tolist() in the MultiIndex class, but that might break other usages of MultiIndex. Were you thinking of adding a new tolist() to the MultiIndex class?

Also, just so you know, I modified the tests as requested
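
The trade-off raised here can be sketched with toy classes. `BaseLike`, `PlainIndexLike`, and `MultiIndexLike` are hypothetical stand-ins, not pandas classes; the point is only that overriding `tolist()` in one subclass leaves the shared base method, and every other subclass, untouched.

```python
class BaseLike:
    """Stand-in for a shared base class that defines tolist()."""

    def __init__(self, values):
        self._values = list(values)

    def tolist(self):
        return list(self._values)


class PlainIndexLike(BaseLike):
    """Inherits tolist() unchanged."""


class MultiIndexLike(BaseLike):
    """Overrides tolist() only for the empty case."""

    def __init__(self, values, nlevels):
        super().__init__(values)
        self.nlevels = nlevels

    def tolist(self):
        if not self._values:
            # Empty: return one empty list per level instead of a flat [].
            return [[] for _ in range(self.nlevels)]
        return super().tolist()


print(PlainIndexLike([]).tolist())
print(MultiIndexLike([], nlevels=2).tolist())
print(MultiIndexLike([("a", 0)], nlevels=2).tolist())
```

The remaining risk is the one noted above: existing callers of the MultiIndex method may rely on the empty case returning a flat list.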

jreback · 2018-05-23T11:07:16Z

pandas/core/frame.py

@@ -1392,7 +1392,13 @@ def to_records(self, index=True, convert_datetime64=None):
            else:
                if isinstance(self.index, MultiIndex):
                    # array of tuples to numpy cols. copy copy copy
-                    ix_vals = lmap(np.array, zip(*self.index.values))
+                    tuples = self.index.values


yes, you can override tolist() (it's defined in pandas/core/base) in MultiIndex

jreback · 2018-09-25T14:14:12Z

can you rebase

jreback · 2018-11-23T03:26:05Z

can you rebase

jreback · 2018-12-03T01:43:49Z

closing as stale. if you'd like to continue, pls ping.

fersarr mentioned this pull request May 16, 2018

BUG: to_records() fails for empty MultiIndex #21064

Open

jreback requested changes May 17, 2018

jreback added Reshaping Concat, Merge/Join, Stack/Unstack, Explode MultiIndex labels May 17, 2018

fersarr force-pushed the master branch from 81d32b8 to 6dc5a87 Compare May 22, 2018 14:55

BUG: to_records() fails for MultiIndex DF (pandas-dev#21064)

f89e883

fersarr force-pushed the master branch from 6dc5a87 to f89e883 Compare May 23, 2018 11:02

jreback requested changes May 23, 2018

jreback closed this Dec 3, 2018
