Infer compression if file extension is uppercase #35164

Merged

willbowditch
Contributor

Inferring compression fails for files with uppercase extensions (e.g. x.zip works but y.ZIP does not).

  • tests added / passed
  • passes black pandas
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

@WillAyd
Member

WillAyd commented Jul 7, 2020

Do we do this elsewhere? Outside of Windows most file systems I think are case sensitive so while relatively harmless I also am not sure this is worth doing

@willbowditch
Contributor Author

I think the path is retained with its original case, and I couldn't see any uses beyond identifying the compression method. I ran into this issue on OS X and tested locally with read_csv and capitalised zip files.
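
For illustration, a minimal sketch of the idea described above: a lower-cased copy of the path is used only for the extension comparison, and the path itself is left untouched. The mapping and the function name `infer_compression_sketch` are assumptions made for this example, not the pandas implementation.

```python
from typing import Optional

# Hypothetical extension table for illustration; not copied from pandas.
EXTENSION_TO_COMPRESSION = {
    ".gz": "gzip",
    ".bz2": "bz2",
    ".zip": "zip",
    ".xz": "xz",
}


def infer_compression_sketch(path: str) -> Optional[str]:
    """Return a compression name inferred from the file extension, or None."""
    # Lower-case a copy only for the comparison; `path` itself is unchanged,
    # so the file is still opened under its original name.
    lowered = path.lower()
    for extension, compression in EXTENSION_TO_COMPRESSION.items():
        if lowered.endswith(extension):
            return compression
    return None


for name in ("x.zip", "y.ZIP", "data.csv"):
    print(name, infer_compression_sketch(name))
```

With a case-sensitive `endswith` check on `path` directly, `y.ZIP` would not match `.zip`, which is the failure reported in this PR.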

@gfyoung added the IO Data, Enhancement, and Needs Discussion labels Jul 9, 2020
@gfyoung
Member

gfyoung commented Jul 9, 2020

Do we do this elsewhere?

Not that I'm aware of.

Outside of Windows most file systems I think are case sensitive so while relatively harmless I also am not sure this is worth doing

I think it would be okay to add this, but only after checking if we are on Windows.

cc @jreback
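
As a rough sketch of the Windows-only alternative suggested in this comment, the case-insensitive comparison could be gated on the platform. The helper name `extension_matches` and its `platform` parameter are hypothetical and exist only to make the example testable; this is not what the PR implements.

```python
import sys


def extension_matches(path: str, extension: str, platform: str = sys.platform) -> bool:
    """Check whether `path` ends with `extension`, ignoring case only on Windows."""
    # sys.platform is "win32" on Windows, "darwin" on macOS, "linux" on Linux.
    if platform.startswith("win"):
        return path.lower().endswith(extension.lower())
    # Everywhere else the comparison stays case-sensitive.
    return path.endswith(extension)
```

Under this variant the macOS case reported earlier in the thread would still not be handled, since `sys.platform` is `"darwin"` there.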

Contributor

@jreback jreback left a comment

I think this is OK to do everywhere.

@@ -1050,6 +1050,7 @@ I/O
- Bug in :meth:`~HDFStore.create_table` now raises an error when `column` argument was not specified in `data_columns` on input (:issue:`28156`)
- :meth:`read_json` now could read line-delimited json file from a file url while `lines` and `chunksize` are set.
- Bug in :meth:`DataFrame.to_sql` when reading DataFrames with ``-np.inf`` entries with MySQL now has a more explicit ``ValueError`` (:issue:`34431`)
- Bug in :meth:`io.common.infer_compression` where capitalised files extensions were not decompressed by read_* functions.
Contributor

@jreback jreback Jul 9, 2020

Can you add this PR number here (use the issue format)? Also, there is no need to reference this internal function at all.

Contributor Author

@jreback changed in a3558fd

@WillAyd
Member

WillAyd commented Jul 9, 2020 via email

@jreback
Contributor

jreback commented Jul 17, 2020

This is fine; we already have our own rules for when to decompress. This just expands them to capitalized extensions, which is not a big deal.

@jreback jreback added this to the 1.1 milestone Jul 17, 2020
@jreback jreback merged commit 4da8622 into pandas-dev:master Jul 17, 2020
@jreback
Contributor

jreback commented Jul 17, 2020

thanks @willbowditch
