ENH Enable bzip2 streaming for Python 3 #11072

Merged
jreback merged 1 commit into pandas-dev:master from stephen-hoover:stream-bzip2-files on Sep 13, 2015

2 participants

This is the one modification related to issue #11070 which affects non-S3 interactions with read_csv. The Python 3 standard library has an improved capability for handling bz2 compression, so a simple change will let read_csv stream bz2-compressed files.
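For readers who want to see what streaming looks like from the caller's side, here is a minimal sketch. It does not reproduce the change in this PR; it only uses the public `chunksize` and `compression` arguments of `read_csv`. The helper name `read_bz2_csv_in_chunks`, the file name, and the sample data are made up for illustration, and the behaviour shown assumes a reasonably recent pandas on Python 3.

```python
import bz2
import os
import tempfile

import pandas as pd


def read_bz2_csv_in_chunks(path, chunksize):
    """Yield DataFrames of at most `chunksize` rows from a bz2-compressed CSV.

    Hypothetical helper, not part of pandas.
    """
    # With chunksize set, read_csv returns an iterator of DataFrames
    # instead of loading the whole file at once.
    for chunk in pd.read_csv(path, compression="bz2", chunksize=chunksize):
        yield chunk


# Build a small compressed CSV in a temporary directory (illustrative data only).
tmpdir = tempfile.mkdtemp()
csv_path = os.path.join(tmpdir, "example.csv.bz2")
with bz2.open(csv_path, "wt") as f:
    f.write("a,b\n1,2\n3,4\n5,6\n")

chunk_lengths = [len(chunk) for chunk in read_bz2_csv_in_chunks(csv_path, chunksize=2)]
total_rows = sum(chunk_lengths)
```

Each chunk can be processed and discarded before the next one is parsed, which is the practical benefit of reading the compressed file incrementally.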

Contributor

jreback commented Sep 12, 2015

tests!

I added a test for reading from an open file with the C parser. It fails on the master branch and passes here. How's that?

Contributor

jreback commented Sep 12, 2015

do you have exactly the same deps

Yes, exactly the same dependencies. This PR works because the standard library bz2 module was upgraded to accept file pointers in 3.3.
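The standard-library behaviour referred to here can be checked without pandas. The sketch below is an illustration under stated assumptions, not code from this PR: an in-memory `io.BytesIO` buffer stands in for any already-open binary file, and the sample payload is invented.

```python
import bz2
import io

payload = b"a,b\n1,2\n3,4\n"

# An in-memory buffer stands in for any already-open binary file object.
raw = io.BytesIO(bz2.compress(payload))

# Since Python 3.3, BZ2File accepts a file object as well as a filename.
with bz2.BZ2File(raw, mode="rb") as fh:
    # Iterating over the BZ2File yields decompressed lines as bytes.
    lines = [line for line in fh]
```

On Python 2, `bz2.BZ2File` only takes a filename, so the same call with a file object would not work there.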

Contributor

jreback commented Sep 12, 2015

ok, this looks good. pls add a note in whatsnew for 0.17.0 (just released the rc1 yesterday, but this is ok). reference both the original issue and this PR number I think.

squash & ping when green.

jreback added this to the 0.17.0 milestone Sep 12, 2015

Note added. It doesn't look like anything else references a PR; should I leave that reference in?

@jreback jreback commented on an outdated diff Sep 12, 2015

doc/source/whatsnew/v0.17.0.txt
@@ -465,6 +465,8 @@ Other enhancements
- Improved error message when concatenating an empty iterable of dataframes (:issue:`9157`)
+- ``pd.read_csv`` can now read bz2-compressed files incrementally, and the C parser can read bz2-compressed files from AWS S3 (:issue:`11070`, :pr:`11072`).

jreback Sep 12, 2015

Contributor

just reference it like an issue :issue:`11072`, we don't distinguish

stephen-hoover referenced this pull request Sep 12, 2015

Closed

Improvements for read_csv from AWS S3 #11070

4 of 4 tasks complete
@stephen-hoover stephen-hoover ENH Enable bzip2 streaming for Python 3
Python 2's bz2 module can't read bz2 data from an open file object, but Python 3 (3.3 and later) can. Python 3 can also read bz2 files one piece at a time.
636afbe
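Reading "one piece at a time" can be sketched with the standard library's `bz2.BZ2Decompressor`. This is an independent illustration rather than the code in this commit; the generator name `iter_decompressed`, the block size, and the sample data are assumptions, and the sketch handles only a single bz2 stream.

```python
import bz2
import io


def iter_decompressed(fileobj, block_size=16):
    """Yield decompressed pieces of a single-stream bz2 file object.

    Hypothetical helper for illustration; multi-stream files are not handled.
    """
    decompressor = bz2.BZ2Decompressor()
    while True:
        # Read a small compressed block rather than the whole file.
        block = fileobj.read(block_size)
        if not block:
            break
        data = decompressor.decompress(block)
        # The decompressor may buffer input and return nothing yet.
        if data:
            yield data


original = b"x,y\n" + b"1,2\n" * 1000
compressed = io.BytesIO(bz2.compress(original))
restored = b"".join(iter_decompressed(compressed))
```

Only one compressed block plus the decompressor's internal state is held in memory at any moment, which is what makes streaming from a remote or very large source practical.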

@jreback , tests are green!

@jreback jreback added a commit that referenced this pull request Sep 13, 2015

@jreback jreback Merge pull request #11072 from stephen-hoover/stream-bzip2-files
ENH Enable bzip2 streaming for Python 3
e8d4243

@jreback jreback merged commit e8d4243 into pandas-dev:master Sep 13, 2015

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Contributor

jreback commented Sep 13, 2015

thanks!

stephen-hoover deleted the stephen-hoover:stream-bzip2-files branch Sep 14, 2015
