Regression: to_csv and multiindex columns with header kw #5539

Closed
jankatins opened this Issue Nov 17, 2013 · 5 comments

Contributor

jankatins commented Nov 17, 2013

This used to work (October 2012), but doesn't anymore:

from pandas import DataFrame
import numpy as np
import StringIO
a = ["a","b","a","b","a","b","a","b","a","b","a","b"]
b = ["c","d","e","c","d","e","c","d","e","c","d","e"]
c = [1,2,3,4,5,6,7,8,9,10,11,12]
d = list(reversed(c))
df = DataFrame({"a":a, "b":b, "c":c, "d":d})
_agg_funs = [np.mean, np.std, np.min, np.max]
groupby_variables = ["a","b"]
df_grouped = df.groupby(groupby_variables, as_index=True).agg(_agg_funs)
output = StringIO.StringIO()
df_grouped.to_csv(output, header=[var + "_" + agg for (var, agg) in df_grouped.columns])
index = output.getvalue().split("\n")[0].split(",")
expected_index = groupby_variables + [var + "_" + agg for (var, agg) in df_grouped.columns]
print(index == expected_index) # This was true in October 2012!
print(index)
print(expected_index) 

False
['', '', 'c', 'c', 'c', 'c', 'd', 'd', 'd', 'd']
['a', 'b', 'c_mean', 'c_std', 'c_amin', 'c_amax', 'd_mean', 'd_std', 'd_amin', 'd_amax']

Probably related to #3575

Contributor

jankatins commented Nov 17, 2013

It seems that "header" is simply ignored :-(

This also does not work:

df_grouped.to_csv(output, header=[var + "_" + agg for (var, agg) in df_grouped.columns], index_label=df_grouped.index.names, index=False)
Contributor

jankatins commented Nov 17, 2013

This finally worked:

[...]
cols = [var + "_" + agg for (var, agg) in df_grouped.columns]
df_grouped.columns = cols
output = StringIO.StringIO()
df_grouped.to_csv(output)
index = output.getvalue().split("\n")[0].split(",")
expected_index = groupby_variables + cols
print(index == expected_index)
print(index)
print(expected_index) # And worked in October 2012!

True
['a', 'b', 'c_mean', 'c_std', 'c_amin', 'c_amax', 'd_mean', 'd_std', 'd_amin', 'd_amax']
['a', 'b', 'c_mean', 'c_std', 'c_amin', 'c_amax', 'd_mean', 'd_std', 'd_amin', 'd_amax']
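
For readers on Python 3 and a recent pandas, the workaround above can be sketched roughly as follows. This is a minimal sketch, not the reporter's exact code: it assumes string aggregation names ("mean", "min", "max") instead of the NumPy functions, so the flattened labels read c_min rather than c_amin, and the variable names are illustrative.

```python
import pandas as pd

# Same toy data as in the report, built more compactly.
df = pd.DataFrame({
    "a": ["a", "b"] * 6,
    "b": ["c", "d", "e"] * 4,
    "c": list(range(1, 13)),
    "d": list(range(12, 0, -1)),
})
grouped = df.groupby(["a", "b"]).agg(["mean", "min", "max"])

# Flatten the two-level column index into single strings before writing.
grouped.columns = [var + "_" + agg for (var, agg) in grouped.columns]

# to_csv() without a path returns the CSV text; inspect its first line.
header_row = grouped.to_csv().splitlines()[0].split(",")
print(header_row)
```

Because the columns are flat by the time to_csv runs, this does not depend on how the header keyword treats MultiIndex columns.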
Contributor

jreback commented Nov 18, 2013

header only applies to read_csv
There is an option, tupleize_cols, which you can set to get the earlier 0.12 behavior of writing tuples for the column multi-index if you want (though as of 0.13 it's not necessary and is turned off)

This is your first example on master:

In [31]: output = StringIO.StringIO()

In [32]: df_grouped.to_csv(output)

In [33]: print output.getvalue()
,,c,c,c,c,d,d,d,d
,,mean,std,amin,amax,mean,std,amin,amax
a,b,,,,,,,,
a,c,4,4.242640687119285,1,7,9,4.242640687119285,6,12
a,d,8,4.242640687119285,5,11,5,4.242640687119285,2,8
a,e,6,4.242640687119285,3,9,7,4.242640687119285,4,10
b,c,7,4.242640687119285,4,10,6,4.242640687119285,3,9
b,d,5,4.242640687119285,2,8,8,4.242640687119285,5,11
b,e,9,4.242640687119285,6,12,4,4.242640687119285,1,7


In [34]: pd.read_csv(StringIO.StringIO(output.getvalue()),header=[0,1],index_col=[0,1])
Out[34]: 
        c                           d                      
     mean       std  amin  amax  mean       std  amin  amax
a b                                                        
a c     4  4.242641     1     7     9  4.242641     6    12
  d     8  4.242641     5    11     5  4.242641     2     8
  e     6  4.242641     3     9     7  4.242641     4    10
b c     7  4.242641     4    10     6  4.242641     3     9
  d     5  4.242641     2     8     8  4.242641     5    11
  e     9  4.242641     6    12     4  4.242641     1     7
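
The round trip shown in that session can also be sketched for Python 3. This is an illustrative sketch, assuming a recent pandas and string aggregation names; the names buf and restored are not from the thread.

```python
import io

import pandas as pd

df = pd.DataFrame({
    "a": ["a", "b"] * 6,
    "b": ["c", "d", "e"] * 4,
    "c": list(range(1, 13)),
    "d": list(range(12, 0, -1)),
})
grouped = df.groupby(["a", "b"]).agg(["mean", "min", "max"])

# Write the frame with its two-level column index: two header rows,
# followed by a row that carries the index names.
buf = io.StringIO()
grouped.to_csv(buf)
buf.seek(0)

# Tell read_csv which rows form the column levels and which columns
# form the row index.
restored = pd.read_csv(buf, header=[0, 1], index_col=[0, 1])
print(restored.columns.tolist())
```

This keeps the MultiIndex intact within pandas, but the resulting file has multiple header rows, which is what made it awkward for the reporter's R import.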
Contributor

jankatins commented Nov 18, 2013

I did use master (or something a few days old). I found that very surprising, as my "half a year old code" broke with the newer pandas due to this (I wanted to import that into R).

Also:

String Form:<unbound method DataFrame.to_csv>
[...]
    def to_csv(self, path_or_buf, sep=",", na_rep='', float_format=None,
               cols=None, header=True, index=True, index_label=None,
               mode='w', nanRep=None, encoding=None, quoting=None,
               line_terminator='\n', chunksize=None,
               tupleize_cols=False, date_format=None, **kwds):
        r"""Write DataFrame to a comma-separated values (csv) file

        Parameters
        ----------
[...]
        header : boolean or list of string, default True
            Write out column names. If a list of string is given it is assumed
            to be aliases for the column names

Collab Edit: #4797

Contributor

jreback commented Nov 18, 2013

might not be well tested with a column multi index - and is actually very odd in that case anyhow

marking as a bug/API issue for 0.14

@jreback jreback modified the milestones: 0.15.0, 0.14.0 Mar 28, 2014

@jreback jreback modified the milestones: 0.16.0, Next Major Release Mar 3, 2015

gfyoung added a commit to gfyoung/pandas that referenced this issue Nov 4, 2017

BUG: Override mi-columns in to_csv if requested
Previously, MultiIndex columns weren't being
overwritten when header was passed in for to_csv.

Closes gh-5539

@jreback jreback modified the milestones: Next Major Release, 0.21.1 Nov 4, 2017

gfyoung added a commit that referenced this issue Nov 5, 2017

BUG: Override mi-columns in to_csv if requested (#18110)
Previously, MultiIndex columns weren't being
overwritten when header was passed in for to_csv.

Closes gh-5539

1kastner added a commit to 1kastner/pandas that referenced this issue Nov 5, 2017

BUG: Override mi-columns in to_csv if requested (pandas-dev#18110)
Previously, MultiIndex columns weren't being
overwritten when header was passed in for to_csv.

Closes gh-5539

No-Stream added a commit to No-Stream/pandas that referenced this issue Nov 28, 2017

BUG: Override mi-columns in to_csv if requested (pandas-dev#18110)
Previously, MultiIndex columns weren't being
overwritten when header was passed in for to_csv.

Closes gh-5539

TomAugspurger added a commit to TomAugspurger/pandas that referenced this issue Dec 8, 2017

BUG: Override mi-columns in to_csv if requested (pandas-dev#18110)
Previously, MultiIndex columns weren't being
overwritten when header was passed in for to_csv.

Closes gh-5539

(cherry picked from commit e1f3a70)

TomAugspurger added a commit that referenced this issue Dec 11, 2017

BUG: Override mi-columns in to_csv if requested (#18110)
Previously, MultiIndex columns weren't being
overwritten when header was passed in for to_csv.

Closes gh-5539

(cherry picked from commit e1f3a70)
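
With the change referenced above (milestone 0.21.1), a list passed as header should override MultiIndex column labels in to_csv. A minimal Python 3 sketch under that assumption follows; it uses string aggregation names rather than the NumPy functions from the report, and aliases and lines are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    "a": ["a", "b"] * 6,
    "b": ["c", "d", "e"] * 4,
    "c": list(range(1, 13)),
    "d": list(range(12, 0, -1)),
})
grouped = df.groupby(["a", "b"]).agg(["mean", "min", "max"])

# One alias per column, built from the (variable, aggregation) tuples.
aliases = [var + "_" + agg for (var, agg) in grouped.columns]

# The columns stay a MultiIndex; only the written header is replaced.
lines = grouped.to_csv(header=aliases).splitlines()
print(lines[0])
```

On a pandas version with the fix, the first line should hold the index names followed by the aliases, with a single header row instead of one row per column level.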