Why isn't zip compression included for read_csv? #11413

Closed
stoffprof opened this issue Oct 22, 2015 · 5 comments
Labels
Enhancement, IO Data (IO issues that don't fit into a more specific label)

Comments

@stoffprof

Is there a reason why the compression options for read_csv and other IO functions don't include zip?

@jreback
Contributor

jreback commented Oct 23, 2015

well you can simply use compression='gzip' to read a zip file, or see here.

I suppose in theory this could be implemented, but easy enough to wrap it.

I'll mark this as an enhancement issue if you'd like to work on it.

@jreback added the Enhancement and IO Data labels Oct 23, 2015
@jreback added this to the Someday milestone Oct 23, 2015
@stoffprof
Author

I thought compression='gzip' reads .gz files, but not .zip. If I understand your suggestion, you're saying just to incorporate the z.open('some_file_name.csv') into pd.read_csv to add a compression='zip' option. I'm fairly new to GitHub (and python), but I could give it a shot.

I suppose this approach would require there to be only one file in the .zip archive, unless a new parameter were to be added to pd.read_csv that allowed the user to specify which file to read from the archive.
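The approach described in this comment can be sketched as follows: open the archive with the standard-library zipfile module and hand the member's file object to pd.read_csv, with the caller naming which member to read. This is only a minimal sketch; the helper name read_csv_from_zip and its parameters are hypothetical and not part of pandas, and the archive is built in memory just to keep the example self-contained.

```python
import io
import zipfile

import pandas as pd


def read_csv_from_zip(archive, member, **read_csv_kwargs):
    """Read one named CSV member of a zip archive into a DataFrame.

    `archive` may be a path or a binary file-like object. This helper is
    illustrative only; it is not part of pandas.
    """
    with zipfile.ZipFile(archive) as z:
        # ZipFile.open returns a binary file-like object that read_csv accepts.
        with z.open(member) as fh:
            return pd.read_csv(fh, **read_csv_kwargs)


# Build a small archive in memory so the sketch is self-contained.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as z:
    z.writestr("some_file_name.csv", "a,b\n1,2\n3,4\n")
buf.seek(0)

df = read_csv_from_zip(buf, "some_file_name.csv")
print(df)
```

Requiring the member name sidesteps the question of what to do when the archive holds more than one file.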

@jreback
Contributor

jreback commented Oct 23, 2015

IIRC compression='gzip' should read a zip file

you can always do

with open(....) as fh:
    pd.read_csv(fh...)

to do it anyhow.

http://pandas.pydata.org/pandas-docs/stable/contributing.html are the contributing docs

@bashtage
Contributor

A zip file can contain many csv files, folders, and other files, and is closer to a file system with compression than a compressed file. IMO it isn't practical to add support for zip files in any generic way. The simplest way is for the user to pass a buffer-like object from the zip to read_csv.

One could add the special case where a zip contains a single csv file in the root of the zip, but this is far from zip "support".
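The special case mentioned here could be detected with the standard library alone. The sketch below assumes one particular reading of it: the archive must hold exactly one file, that file must sit in the archive root, and its name must end in ".csv". The names single_root_csv and make_zip are hypothetical helpers, not pandas or zipfile API.

```python
import io
import zipfile


def single_root_csv(archive):
    """Return the name of the lone CSV member of a zip archive.

    Raises ValueError unless the archive holds exactly one file, that file
    sits in the archive root, and its name ends in ".csv".
    """
    with zipfile.ZipFile(archive) as z:
        # Directory entries are not files, so leave them out of the count.
        files = [info.filename for info in z.infolist() if not info.is_dir()]
    if len(files) != 1:
        raise ValueError(f"expected exactly one file, found {len(files)}")
    name = files[0]
    if "/" in name or not name.lower().endswith(".csv"):
        raise ValueError(f"{name!r} is not a CSV file in the archive root")
    return name


def make_zip(members):
    """Build an in-memory zip from a {name: text} mapping (demo helper)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        for name, text in members.items():
            z.writestr(name, text)
    buf.seek(0)
    return buf


print(single_root_csv(make_zip({"data.csv": "a,b\n1,2\n"})))
try:
    single_root_csv(make_zip({"x.csv": "a\n1\n", "y.csv": "a\n2\n"}))
except ValueError as exc:
    print("rejected:", exc)
```

Anything outside that narrow case is rejected rather than guessed at, which keeps the behaviour predictable for archives that look more like a file system.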

@jzf2101

jzf2101 commented Aug 12, 2017

Right now we can read .zip but not write .zip files, any plans to include write to .zip?
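Whatever pandas itself supports, writing can be wrapped by hand in the same spirit as reading. The sketch below serialises a DataFrame to CSV text and stores it as one member of a zip archive; the helper name write_csv_to_zip and the member name out.csv are hypothetical, and this is not a pandas API.

```python
import io
import zipfile

import pandas as pd


def write_csv_to_zip(df, archive, member, **to_csv_kwargs):
    """Store `df` as a single CSV member of a new zip archive (illustrative)."""
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as z:
        # With no path argument, DataFrame.to_csv returns the CSV as a string.
        z.writestr(member, df.to_csv(**to_csv_kwargs))


df = pd.DataFrame({"a": [1, 3], "b": [2, 4]})
buf = io.BytesIO()
write_csv_to_zip(df, buf, "out.csv", index=False)
buf.seek(0)

# Read the member back to check the round trip.
with zipfile.ZipFile(buf) as z:
    with z.open("out.csv") as fh:
        round_trip = pd.read_csv(fh)
print(round_trip.equals(df))
```

This builds the whole CSV in memory before compressing it, so it suits small and medium frames rather than very large ones.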
