Improve Reading and Writing of Multi-Index Columns #3571

Closed
pblelloch opened this Issue May 10, 2013 · 8 comments


link to #1651 (to_csv) and #3141 (read_csv)

Currently (0.11) the read_csv and to_csv methods handle multi-index row labels fine, but don't do as well with multi-index column labels. For the column index, to_csv writes each label out as a tuple in the 1st row of the CSV file, and this reads back in as a tuple. It would be better if it wrote each level of the multi-index as its own row of the CSV file, so that you could then specify a list of rows for the header in read_csv to reconstruct the multi-index column header. I'm thinking of something like "header=[0,1]" to read in the first two rows of the CSV file as a 2-level multi-index column header. What's not clear to me is where you read/write the names of the indices.
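For illustration, here is a minimal sketch of the requested round trip. It assumes a pandas release newer than the 0.11 discussed here, one where `read_csv` accepts a list for `header`; the frame contents and the level names `upper`, `lower` and `row` are made up for the example.

```python
import io

import pandas as pd

# Hypothetical data: two column levels and a named row index.
columns = pd.MultiIndex.from_tuples(
    [("a", "x"), ("a", "y"), ("b", "x")], names=["upper", "lower"]
)
df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6]],
    index=pd.Index(["r1", "r2"], name="row"),
    columns=columns,
)

# Write to an in-memory buffer; each column level becomes its own header row.
buf = io.StringIO()
df.to_csv(buf)
csv_text = buf.getvalue()

# Read back, treating the first two rows as the column MultiIndex
# and the first column as the row index.
roundtrip = pd.read_csv(io.StringIO(csv_text), header=[0, 1], index_col=0)

print(csv_text)
print(roundtrip.columns)
```

Printing `csv_text` shows how the column level names and the row index name are laid out in the file, which is the open question raised above.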

Contributor

jreback commented May 10, 2013

@pblelloch I linked the issues that r open about this
thanks for the commentary

it's really a matter of someone just having time to do it

Member

cpcloud commented May 11, 2013

possibly related #3323.

Member

cpcloud commented May 11, 2013

One issue is that it's not clear when you're reading a csv file whether you just want tuples or u want a MultiIndex (which is why I linked to #3323). (This is ignoring passing additional parameters like header=[0, 1], which is a solution).

I would like to be able to round trip, so that if I write a CSV file with multi-indexed columns and read that back in I get the same multi-indexed columns. In addition it would be nice if the indices (if that’s the correct word) were written to different rows, so that when I read this into something like Excel it looks good. What’s not clear to me is where to write the names of the index in the case where both your row and columns indices have names. Currently the row index names are written to the 1st row, but that doesn’t leave space to write the last column index name. I’m not sure what the answer is to that :(


Contributor

jreback commented May 11, 2013

yes this would be a problem for back compat

header=[0,1] is very clear

a single row of tuples is not, but I think should auto make a mi (or maybe an option for that)

Member

cpcloud commented May 11, 2013

I would prefer to clobber the tuple as not-a-multiindex and just make it one whenever there are tuples (across the board), but it's a back-compat killer like u said and i have no feeling for how common it is to use tuples without using mis. something like row_as_multi=True and col_as_multi=True then? and yes it would be ideal if read_csv and to_csv were inverses, but i'm not sure that's very practical.
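To make the tuples-vs-MultiIndex distinction concrete, here is a small sketch. It assumes a recent pandas where the `Index` constructor takes `tupleize_cols`; the labels are made up, and this is not the option added in the PR.

```python
import pandas as pd

labels = [("a", "x"), ("a", "y"), ("b", "x")]

# A flat index whose elements happen to be tuples (one level).
flat = pd.Index(labels, tupleize_cols=False)

# The same labels promoted to a two-level MultiIndex.
multi = pd.MultiIndex.from_tuples(list(flat))

print(type(flat).__name__, flat.nlevels)
print(type(multi).__name__, multi.nlevels)
```

Both objects hold the same labels; the difference is only whether pandas treats each tuple as a single opaque label or as one entry per level.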

Contributor

jreback commented May 13, 2013

@cpcloud you can look on the actual PR #3575, I added multi_index_column_compat to to_csv and read_csv to handle the tuple clobbering

Contributor

jreback commented May 19, 2013

closed via #3575

jreback closed this May 19, 2013
