Improve Reading and Writing of Multi-Index Columns #3571

Closed
pblelloch opened this issue May 10, 2013 · 8 comments
Labels
Enhancement IO Data IO issues that don't fit into a more specific label Output-Formatting __repr__ of pandas objects, to_string
Comments

@pblelloch

link to #1651 (to_csv) and #3141 (read_csv)

Currently (0.11) the read_csv and to_csv methods handle multi-index row labels fine, but don't do as well with multi-index column labels. For the column index, to_csv writes each label out as a tuple in the 1st row of the CSV file, and this reads back in as a tuple. It would be better if it wrote out each level of the multi-index as its own row of the CSV file, so that you could then specify a range of rows for the header in read_csv to reconstruct the multi-index column header. I'm thinking of something like "header=[0,1]" to read in the first two rows of the CSV file as a 2-level multi-index column header. What's not clear to me is where you read/write the names of the indices.
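
A minimal sketch of the round trip described above, assuming a recent pandas release in which `read_csv` accepts a list for `header` (when this issue was opened that was only a proposal). The frame, its labels, and the in-memory buffer are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical frame with a 2-level column index.
cols = pd.MultiIndex.from_tuples([("a", "x"), ("a", "y"), ("b", "x")])
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=cols)

# Write to an in-memory buffer; each column level goes on its own header row.
buf = io.StringIO()
df.to_csv(buf)
csv_text = buf.getvalue()
print(csv_text)

# Read the first two rows back as a 2-level column header;
# the first CSV column holds the row index.
result = pd.read_csv(io.StringIO(csv_text), header=[0, 1], index_col=0)
print(result.columns.tolist())
```

Passing `index_col=0` matters here because `to_csv` also writes the row index by default; without it the index would come back as an ordinary data column.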

@jreback
Contributor

jreback commented May 10, 2013

@pblelloch I linked the issues that are open about this
thanks for the commentary

it's really a matter of someone just having time to do it

@cpcloud
Member

cpcloud commented May 11, 2013

possibly related #3323.

@cpcloud
Member

cpcloud commented May 11, 2013

One issue is that when you're reading a csv file it's not clear whether you just want tuples or you want a MultiIndex (which is why I linked to #3323). (This is ignoring passing an additional parameter like header=[0, 1], which is a solution.)

@pblelloch
Author

I would like to be able to round trip, so that if I write a CSV file with multi-indexed columns and read that back in I get the same multi-indexed columns. In addition it would be nice if the levels of the index were written to different rows, so that when I read this into something like Excel it looks good. What’s not clear to me is where to write the names of the index in the case where both your row and column indices have names. Currently the row index names are written to the 1st row, but that doesn’t leave space to write the last column index name. I’m not sure what the answer is to that :(
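
To make the naming question concrete, here is a small sketch against a recent pandas release (not 0.11). As far as I can tell, current `to_csv` puts each column level name in the first cell of its header row and writes the row index name on an extra row after the header rows, but treat that layout as an observation rather than a guarantee. The names `outer`, `inner`, and `id` are hypothetical.

```python
import io

import pandas as pd

# Hypothetical frame where both the column levels and the row index are named.
cols = pd.MultiIndex.from_tuples(
    [("a", "x"), ("a", "y"), ("b", "x")], names=["outer", "inner"]
)
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=cols)
df.index.name = "id"

buf = io.StringIO()
df.to_csv(buf)
csv_text = buf.getvalue()
print(csv_text)  # inspect where the level names and the index name end up

# Read back with a 2-row header and the first column as the row index.
result = pd.read_csv(io.StringIO(csv_text), header=[0, 1], index_col=0)
print(list(result.columns.names), result.index.name)
```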


@jreback
Contributor

jreback commented May 11, 2013

yes this would be a problem for back compat

header=[0,1] is very clear

a single row of tuples is not, but I think it should automatically make a MultiIndex (or maybe there should be an option for that)

@cpcloud
Member

cpcloud commented May 11, 2013

I would prefer to clobber the tuple as not-a-multiindex and just make it one whenever there are tuples (across the board), but it's a back-compat killer like you said, and I have no feeling for how common it is to use tuples without using MultiIndexes. Something like row_as_multi=True and col_as_multi=True then? And yes, it would be ideal if read_csv and to_csv were inverses, but I'm not sure that's very practical.
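
The tuples-versus-MultiIndex distinction discussed here can be sketched in a few lines, assuming a recent pandas release. This only illustrates the two representations; `row_as_multi` and `col_as_multi` are names floated in this thread, not parameters used below.

```python
import pandas as pd

labels = [("a", "x"), ("a", "y"), ("b", "x")]

# A flat, single-level index whose elements happen to be tuples.
flat = pd.Index(labels, tupleize_cols=False)

# The same labels promoted to a 2-level MultiIndex.
mi = pd.MultiIndex.from_tuples(flat)

print(type(flat).__name__, flat.nlevels)
print(type(mi).__name__, mi.nlevels)
```

Both objects hold the same labels, which is why a reader cannot tell from a row of tuples alone which one the writer intended.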

@jreback
Contributor

jreback commented May 13, 2013

@cpcloud you can look at the actual PR #3575; I added multi_index_column_compat to to_csv and read_csv to handle the tuple clobbering

@jreback
Contributor

jreback commented May 19, 2013

closed via #3575

@jreback jreback closed this as completed May 19, 2013
3 participants