ENH Add check for inferred compression before `get_filepath_or_buffer` #11074
pls change tests which are incorrect as well
stephen-hoover commented Sep 12, 2015
I made this PR so that it didn't break any tests. Are the parsers ever accessed outside of the
the infer param can be moved higher up in the stack (e.g. into `get_filepath_or_buffer`), which makes the readers simpler in that respect
stephen-hoover commented Sep 12, 2015
Found it. I actually didn't need to change any tests. Now the only check for file extensions happens in the
gr8
jreback added the Data IO label Sep 12, 2015
stephen-hoover commented Sep 12, 2015
Added a test using the new files in s3://pandas-test/.
stephen-hoover referenced this pull request Sep 12, 2015
Improvements for read_csv from AWS S3 #11070 (Closed)
stephen-hoover commented Sep 14, 2015
Should I do anything else for this PR?
jreback added this to the 0.17.0 milestone Sep 14, 2015
can you add a whatsnew note for this
pls rebase. ping when green.
stephen-hoover commented Sep 15, 2015
@jreback, green! I found I had to tweak
jreback added a commit that referenced this pull request Sep 15, 2015
da6ad3f
jreback merged commit da6ad3f into pandas-dev:master Sep 15, 2015
1 check passed
thanks!
stephen-hoover commented Sep 12, 2015

When reading CSVs, if `compression='infer'`, check the input before calling `get_filepath_or_buffer` in the `_read` function. This way we can catch compression extensions on S3 files. Partially resolves issue #11070.

Checking for the file extension in the `_read` function should make the checks inside the parsers redundant. When I tried to remove them, however, I discovered that there are tests which assume the parsers can take an "infer" compression, so I left their checks.

I also discovered that the URL-reading code has a test which reads a URL ending in "gz" but which appears not to be gzip encoded, so this PR attempts to preserve its verdict in that case.
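
For readers skimming this thread, the idea of resolving `compression='infer'` from the path before the file or URL is opened could be sketched roughly as follows. This is an illustration only, not the code merged in this PR: the helper name `infer_compression` is hypothetical, and the extension-to-codec mapping (`.gz` to gzip, `.bz2` to bz2) is an assumption about which codecs are inferred. It does not model the special case for URLs ending in "gz" mentioned above.

```python
def infer_compression(filepath_or_buffer, compression="infer"):
    """Hypothetical helper: resolve 'infer' to a concrete codec name.

    Assumes only '.gz' and '.bz2' extensions are inferred; anything else
    (including open buffers, which have no extension) yields None.
    """
    # An explicit choice by the caller (including None) is passed through.
    if compression != "infer":
        return compression

    # Only string paths and URLs carry an extension we can inspect.
    if isinstance(filepath_or_buffer, str):
        lowered = filepath_or_buffer.lower()
        if lowered.endswith(".gz"):
            return "gzip"
        if lowered.endswith(".bz2"):
            return "bz2"

    # Nothing recognizable: treat the input as uncompressed.
    return None
```

Running the check on the original string, before it is turned into a buffer, is what lets an S3 path such as `s3://pandas-test/tips.csv.gz` (file name assumed here for illustration) be recognized as gzip even though the object handed to the parser no longer has a file name.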