ENH: DataFrame.to_csv support for "compression='gzip'" #7615

francescomalandrino · 2014-06-30T12:49:46Z

The DataFrame.to_csv method seems to accept a "compression" named parameter:

import numpy as np
import pandas as pd

data = np.arange(10).reshape(5, 2)
df = pd.DataFrame(data, columns=['a', 'b'])
df.to_csv('test.csv.gz', compression='gzip')

However, the file it creates is not compressed at all:
francesco@i3 ~/Desktop $ cat test.csv.gz
,a,b
0,0,1
1,2,3
2,4,5
3,6,7
4,8,9

How about either (i) actually implementing compression, or at least (ii) raising an error? The current behavior is confusing.


jreback · 2014-06-30T13:18:20Z

to_csv allows **kwds, so arbitrary additional arguments are 'accepted' but ignored (IIRC this is mainly for compatibility with some of the other to_* functions, which allow this). I suppose that could be removed (not sure why it was there in the first place). That said, only arguments in the doc-string are public.
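To illustrate the behavior described above, here is a minimal sketch of how a `**kwds` catch-all swallows unknown keywords. The functions `to_csv_lenient` and `to_csv_strict` are hypothetical stand-ins invented for this example; they are not pandas code and only mimic the two signature styles.

```python
# Hypothetical stand-ins (not pandas code): they only mimic the signature styles.
def to_csv_lenient(path, sep=",", **kwds):
    """Accept any extra keyword argument and silently ignore it."""
    return {"path": path, "sep": sep, "ignored": sorted(kwds)}


def to_csv_strict(path, sep=","):
    """No **kwds, so Python itself rejects unknown keyword arguments."""
    return {"path": path, "sep": sep}


# The unknown keyword is swallowed without any warning.
result = to_csv_lenient("test.csv.gz", compression="gzip")
print(result["ignored"])

# Without **kwds the same call fails loudly with a TypeError.
try:
    to_csv_strict("test.csv.gz", compression="gzip")
except TypeError as exc:
    print("rejected:", exc)
```

Limiting the accepted keywords, as proposed, would turn the silent case into the loud one.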

Would accept a pull-request to limit this.

mmautner · 2014-06-30T16:28:52Z

I think this is out of scope for pandas; just use this: https://docs.python.org/2/library/gzip.html

please close

francescomalandrino · 2014-06-30T16:35:43Z

But compression='gzip' is accepted (and enacted) in pd.read_csv, which is why I was assuming to_csv behaves the same.

mmautner · 2014-06-30T16:45:35Z

The way you initially phrased the issue suggested that you were just guessing at keyword arguments--'compression' isn't a documented argument so I don't think your confusion is shared by many. You're welcome to submit a pull-request, I don't feel religious about this at all

francescomalandrino · 2014-06-30T16:52:37Z

Sorry, what I meant is:

(1) compression is documented and working for read_csv:
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.io.parsers.read_csv.html
(2) compression is not documented and not working for to_csv.

There is no error in the documentation and both (1) and (2) make sense to me.
It's (1) and (2) together, i.e., the fact that to_csv behaves differently from read_csv without telling the user, that seemed a bit inconsistent to me.

Closing on the grounds that I won't be fixing it myself, and probably it's not a proper bug.

mmautner · 2014-06-30T17:32:15Z

Thanks! I definitely didn't mean to antagonize you--agreed that it's an unfortunate inconsistency

jorisvandenbossche · 2014-07-01T07:57:26Z

Would we want this feature if someone implemented it? If so, we could leave it open, marked as an enhancement proposal.

speakerjohnash · 2014-08-22T21:41:09Z

I would also like to_csv to have the same functionality as from_csv.

dhimmel · 2015-05-26T03:32:24Z

+1, a compression argument for DataFrame.to_csv would spare many user headaches.

In Python 3.4, I use the following workaround:

import gzip

with gzip.open('path_to_file', 'wt') as write_file:
    data_frame.to_csv(write_file)
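The same text-mode pattern can be tried without pandas. The sketch below is an assumption-laden illustration: it swaps `csv.writer` in for `DataFrame.to_csv`, writes to a temporary path chosen for the example, and reads the file back through `gzip.open` in `'rt'` mode to confirm the round trip.

```python
import csv
import gzip
import os
import tempfile

# Illustrative data and path; csv.writer stands in for DataFrame.to_csv here.
rows = [["", "a", "b"], [0, 0, 1], [1, 2, 3]]
path = os.path.join(tempfile.mkdtemp(), "test.csv.gz")

# 'wt' opens the gzip stream in text mode, so writers that emit str work.
with gzip.open(path, "wt", newline="") as write_file:
    csv.writer(write_file).writerows(rows)

# 'rt' decompresses transparently while reading text back.
with gzip.open(path, "rt", newline="") as read_file:
    round_tripped = list(csv.reader(read_file))

print(round_tripped)
```

Note that the csv module reads every field back as a string, so the integers come back as `'0'`, `'1'`, and so on.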

shoyer · 2015-05-26T03:49:37Z

@dhimmel If you're interested in putting in the work, I think we're still open to a PR to add this feature.

dhimmel · 2015-05-28T17:00:27Z

@shoyer, okay I will keep this in mind. I have a bit to learn first.

rbrito · 2015-10-20T06:38:50Z

Thank you so much for implementing this! Besides the aesthetics POV and fixing the asymmetry between read/write, this is a huge improvement to some people like me.

jsmedmar · 2016-06-30T16:03:07Z

This did not work for me: the output file isn't compressed. I'm using pandas 0.18.1.

TomAugspurger · 2016-06-30T19:49:21Z

@jsmedmar could you open a new issue demonstrating the problem? Thanks.

indera · 2016-10-18T14:46:48Z

@jsmedmar I see the "compression" argument is properly documented and it is working
http://pandas.pydata.org/pandas-docs/version/0.19.0/generated/pandas.DataFrame.to_csv.html

One confusing thing is that if you run the following code

import numpy as np
import pandas as pd

data = np.arange(10).reshape(5, 2)
df = pd.DataFrame(data, columns=['a', 'b'])
print(df)
df.to_csv('test.csv.gz', compression='gzip')
"""
   a  b
0  0  1
1  2  3
2  4  5
3  6  7
4  8  9
"""

you get a compressed file, but opening it in vim automatically decompresses it, so to verify that compression happened use the "head" command:

$ head test.csv.gz
D5X�test.csv�70
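As an alternative to eyeballing the output of `head`, the check can be done programmatically. This is a minimal sketch, assuming it is enough to look at the first two bytes of the file: gzip streams start with the magic bytes `0x1f 0x8b`. The helper name `is_gzip` and the temporary file names are made up for this example.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def is_gzip(path):
    """Return True if the file at `path` starts with the gzip magic bytes."""
    with open(path, "rb") as fh:
        return fh.read(2) == GZIP_MAGIC


# Two illustrative files: one plain text despite its name, one truly gzipped.
tmp_dir = tempfile.mkdtemp()
plain_path = os.path.join(tmp_dir, "plain.csv.gz")
gz_path = os.path.join(tmp_dir, "real.csv.gz")

with open(plain_path, "w") as fh:
    fh.write(",a,b\n0,0,1\n")

with gzip.open(gz_path, "wt") as fh:
    fh.write(",a,b\n0,0,1\n")

print(is_gzip(plain_path), is_gzip(gz_path))
```

The file extension plays no part in the check, so a mislabeled plain-text `.gz` file like the one in the original report is caught.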

jreback added CSV labels Jun 30, 2014

jreback added this to the Someday milestone Jun 30, 2014

francescomalandrino closed this as completed Jun 30, 2014

jorisvandenbossche changed the title ~~DataFrame.to_csv accepts but does not enact "compression='gzip'"~~ ENH: DataFrame.to_csv support for "compression='gzip'" Jul 1, 2014

jorisvandenbossche added Enhancement and removed API Design labels Jul 1, 2014

jorisvandenbossche reopened this Jul 1, 2014

jreback mentioned this issue Jul 3, 2014

read_csv/to_csv sep/delimiter inconsistency #7662

Closed

rmorgans mentioned this issue Mar 1, 2015

API: read_csv,from_csv/to_csv keyword consistency #9568

Closed

5 tasks

jreback mentioned this issue Oct 2, 2015

added a compression argument to to_csv to be sent to _get_handle #2636

Closed

yoavram mentioned this issue Oct 2, 2015

ENH: added compression kw to to_csv GH7615 #11219

Merged

jreback modified the milestones: 0.17.1, Someday Oct 5, 2015

jreback closed this as completed in #11219 Oct 12, 2015
