ENH/API: Allow dicts with tuple keys in DataFrame constructor #3323

cpcloud · 2013-04-11T23:20:57Z

related #4805

It would be nice to allow automatic conversion to a MultiIndex when a dict with tuple keys is passed into the DataFrame constructor. Here's what currently happens:

import itertools as itools
from numpy.random import rand
from pandas import Index, DataFrame
d = {(i, j): rand(10) for i, j in itools.product(range(3), repeat=2)}
df = DataFrame(d)
# at the time of this report the columns are a flat Index of tuples
assert type(df.columns) == Index

The same issue shows up in pd.concat when you pass in a dict of sequences (lists, ndarrays, and friends) with axis=1. However, if you pass a dict of DataFrames, the column keys are converted to a MultiIndex. E.g.,

import pandas as pd
from pandas import MultiIndex
# reuses DataFrame, rand and itools from the snippet above
d = {(i, j): DataFrame(rand(10, 2), columns=['a', 'b']) for i, j in itools.product(range(3), repeat=2)}
df = pd.concat(d, axis=1)
assert type(df.columns) == MultiIndex
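
Until the constructor does this conversion itself, the dict-of-arrays case can be handled by building the column MultiIndex explicitly. The following is only a sketch of that workaround, not a proposed patch; the sorted key order and the variable names are choices made for the example.

```python
import itertools

import numpy as np
import pandas as pd

# Same shape of input as in the report: tuple keys mapping to 1-D arrays.
d = {(i, j): np.random.rand(10) for i, j in itertools.product(range(3), repeat=2)}

# Fix an explicit key order so that data columns and labels line up.
keys = sorted(d)

# Stack the arrays side by side and label the columns with a MultiIndex
# built from the tuple keys.
df = pd.DataFrame(
    np.column_stack([d[k] for k in keys]),
    columns=pd.MultiIndex.from_tuples(keys),
)

assert isinstance(df.columns, pd.MultiIndex)
assert df.columns.nlevels == 2
```

Selecting df[(0, 1)] then returns the array stored under that key, and df[0] returns the three columns whose first level is 0.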


ghost · 2013-04-12T08:28:08Z

This looks like a natural extension to me, marked for consideration in 0.12.

cpcloud · 2013-04-12T14:46:35Z

I haven't fully tested it yet, but it looks like changing the call to Index in the _init_dict method to MultiIndex should do the trick since MultiIndex seems like it will construct the appropriate 1D index when necessary.

cpcloud · 2013-04-12T14:49:54Z

Whoops that's not entirely precise: it should be a call to one of the class method helpers.

ghost · 2013-04-12T14:50:42Z

Just need to consider the case where users want tuple labels for some reason.
Don't think that's a common case, but someone might be doing it and this change
would make that behaviour impossible.

pd.cut()-style bin labels, like those in Categorical, are a related example though.

cpcloud · 2013-04-12T15:02:01Z

I'm not sure how to address that case without something annoying like a flag in the constructor. I guess my point in raising the issue was for cases like that. It seems like an Index of tuples and a MultiIndex are equivalent in the sense that all(index.values == multiindex.values) and thus the user-facing API should be as similar as possible. What can one do with an index of tuples that one can't do with a multiindex?
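
The equivalence claimed above can be checked directly. This is a minimal sketch assuming a recent pandas release, where the tupleize_cols=False argument of Index is needed to get a flat Index of tuples at all (without it, Index itself returns a MultiIndex); the example tuples are made up.

```python
import pandas as pd

tuples = [(1, 3), (2, 4)]

# A flat, one-level Index whose labels happen to be tuples.
flat = pd.Index(tuples, tupleize_cols=False)

# A two-level MultiIndex built from the same tuples.
multi = pd.MultiIndex.from_tuples(tuples)

# Both expose the same sequence of tuple labels...
assert flat.tolist() == multi.tolist()

# ...but only the MultiIndex knows about levels.
assert flat.nlevels == 1
assert multi.nlevels == 2
assert multi.get_level_values(0).tolist() == [1, 2]
```

Level-aware operations such as get_level_values are one place where the two differ from the user's point of view, even though the labels compare equal.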

cpcloud · 2013-04-12T15:10:59Z

One difference is that the size attribute of MultiIndex is broken. An Index of tuples returns the correct size while a MultiIndex always returns 0 when the size attribute is queried, but that's easy to fix.

ghost · 2013-04-12T15:29:03Z

It's mostly a question of backcompat: tuples are valid labels right now, and
interpreting tuples as levels would be a breaking change. Maybe worth it.

There's nothing inherent in tuples that makes them mean levels. It's
just the semantics pandas can adopt, or not. Obviously, MultiIndex representing
its level labels as label tuples in some cases makes it reasonable to do that.

cpcloud · 2013-04-12T16:26:53Z

To implement this behavior for both the DataFrame constructor and concat, the most parsimonious solution looks like having the __new__ method of Index call the MultiIndex.from_tuples class method for sequences of sequences of equal length, since concat uses merge under the hood and merge makes many calls to Index. This would require the fewest changes to the code, and (I think) only in one place.
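
The dispatch rule described above can be sketched as a standalone function. The helper name make_index is hypothetical and this is not the actual Index.__new__ code; it only illustrates the equal-length check, restricted here to tuples rather than arbitrary sequences, and it assumes a pandas release whose Index accepts tupleize_cols=False for the fallback.

```python
import pandas as pd


def make_index(labels):
    """Hypothetical helper: pick a MultiIndex or a flat Index for the labels."""
    labels = list(labels)
    all_tuples = bool(labels) and all(isinstance(x, tuple) for x in labels)
    lengths = {len(x) for x in labels} if all_tuples else set()
    # Only non-empty tuples of one common length map cleanly onto levels.
    if len(lengths) == 1 and 0 not in lengths:
        return pd.MultiIndex.from_tuples(labels)
    # Anything else (scalars, ragged tuples) stays a flat Index.
    return pd.Index(labels, tupleize_cols=False)


assert isinstance(make_index([(0, 0), (0, 1)]), pd.MultiIndex)
assert not isinstance(make_index(["a", "b"]), pd.MultiIndex)
assert not isinstance(make_index([(0, 0), (1,)]), pd.MultiIndex)
```

Keeping ragged tuples as plain labels is one possible answer to the back-compat concern raised earlier in the thread, but it does not help users who want equal-length tuples as flat labels.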

cpcloud · 2013-04-12T16:36:31Z

This would also solve the issue that when you call Index on a MultiIndex it returns an Index of tuples.

cpcloud · 2013-05-04T00:04:23Z

Is this worth keeping open? I'll hack on it if it is, but as @y-p said, there's no reason tuples must be multilevel indices; they just happen to be implemented that way. I'm just not sure if this is too big of a breaking change.

ghost · 2013-05-04T00:20:02Z

Making this possible would be good, but the constructor is overloaded to
the point of bursting. I can't think of a reasonable way to do this without breaking
back-compat. Leave it open, it's worth figuring out.

hayd · 2013-07-10T11:48:47Z

Same for MultiIndex to DataFrame (not working atm):

m = pd.MultiIndex.from_arrays([[1,2], [3,4]])

In [11]: pd.DataFrame(m)
Out[11]:
Empty DataFrame
Columns: [0]
Index: []

As @cpcloud mentions, m.values doesn't work "as expected" (especially shape):

In [20]: m.values
Out[20]: array([(1, 3), (2, 4)], dtype=object)

In [21]: pd.DataFrame(m.values)
Out[21]:
        0
0  (1, 3)
1  (2, 4)

In [22]: m.shape
Out[22]: (0,)

Possibly related #4187
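
As a workaround for the empty result above, the tuples can be expanded into rows before they reach the constructor. A minimal sketch, assuming only that MultiIndex.tolist() yields one tuple per entry:

```python
import pandas as pd

m = pd.MultiIndex.from_arrays([[1, 2], [3, 4]])

# A list of tuples is treated as rows, so each level becomes a column.
df = pd.DataFrame(m.tolist())

assert df.shape == (2, 2)
assert df.values.tolist() == [[1, 3], [2, 4]]
```

Later pandas releases also provide MultiIndex.to_frame for the same purpose.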

cpcloud · 2013-08-08T00:00:53Z

I think the NumPy attribute consistency can be added in a separate PR.

jreback · 2013-10-11T11:44:33Z

pushing to 0.14

jreback · 2014-04-09T23:32:24Z

closed via #4805

cpcloud mentioned this issue May 11, 2013

Improve Reading and Writing of Multi-Index Columns #3571

Closed

jreback mentioned this issue Jul 10, 2013

Create DataFrame from MultiIndex #4188

Closed

cpcloud mentioned this issue Sep 10, 2013

ENH: Series constructor converts dicts with tuples in keys to MultiIndex #4805

Closed

jreback closed this as completed Apr 9, 2014

armaganthis3 mentioned this issue Jul 1, 2014

read_csv in Pandas 0.14 loads NaNs when namedtuple is used for column names #7589

Closed
