New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ZIP file decompression and TestCompression. #12175

Closed
wants to merge 1 commit into
base: master
from

Conversation

Projects
None yet
5 participants
@lababidi

lababidi commented Jan 29, 2016

Closes #11413

@jreback

View changes

Show outdated Hide outdated pandas/io/parsers.py Outdated
@jreback

View changes

Show outdated Hide outdated pandas/io/tests/test_parsers.py Outdated
@jreback

View changes

Show outdated Hide outdated pandas/io/tests/test_parsers.py Outdated
@lababidi

This comment has been minimized.

Show comment
Hide comment
@lababidi

lababidi Jan 29, 2016

@jreback The idea of the Mixin makes sense. Took me a bit to wrap my head around it. I had to add self.engine to a couple of the tests because bzip decompression will not raise an exception and so the Compression Mixin needs to check what engine it is using to make sure the test runs correctly.

lababidi commented Jan 29, 2016

@jreback The idea of the Mixin makes sense. Took me a bit to wrap my head around it. I had to add self.engine to a couple of the tests because bzip decompression will not raise an exception and so the Compression Mixin needs to check what engine it is using to make sure the test runs correctly.

@lababidi

This comment has been minimized.

Show comment
Hide comment
@lababidi

lababidi Jan 29, 2016

@jreback I think this is ready for your review.

lababidi commented Jan 29, 2016

@jreback I think this is ready for your review.

@max-sixty

View changes

Show outdated Hide outdated pandas/io/common.py Outdated
@max-sixty

View changes

Show outdated Hide outdated pandas/io/tests/test_parsers.py Outdated
@max-sixty

This comment has been minimized.

Show comment
Hide comment
@max-sixty

max-sixty Jan 29, 2016

Contributor

Jeff is going to ask you to squash your commits into one, as per the contributing docs.
Nice job overall!

Contributor

max-sixty commented Jan 29, 2016

Jeff is going to ask you to squash your commits into one, as per the contributing docs.
Nice job overall!

@TomAugspurger

This comment has been minimized.

Show comment
Hide comment
@TomAugspurger

TomAugspurger Jan 29, 2016

Contributor

@MaximilianR no need for squashing anymore thanks to https://github.com/pydata/pandas/blob/master/scripts/merge-py.py ;)

EDIT: which reminds me, CONTRIBUTING.md needs to be updated. Will do this weekend unless someone beats me to it.

Contributor

TomAugspurger commented Jan 29, 2016

@MaximilianR no need for squashing anymore thanks to https://github.com/pydata/pandas/blob/master/scripts/merge-py.py ;)

EDIT: which reminds me, CONTRIBUTING.md needs to be updated. Will do this weekend unless someone beats me to it.

@lababidi

This comment has been minimized.

Show comment
Hide comment
@lababidi

lababidi Jan 29, 2016

@MaximilianR @TomAugspurger Thanks for the comments guys!

lababidi commented Jan 29, 2016

@MaximilianR @TomAugspurger Thanks for the comments guys!

@max-sixty

This comment has been minimized.

Show comment
Hide comment
@max-sixty
Contributor

max-sixty commented Jan 29, 2016

@jreback

This comment has been minimized.

Show comment
Hide comment
@jreback

jreback Jan 29, 2016

Contributor

pls squash

even though I can do it - it's much cleaner from a future reader perspective on a smaller change

Contributor

jreback commented Jan 29, 2016

pls squash

even though I can do it - it's much cleaner from a future reader perspective on a smaller change

@lababidi

This comment has been minimized.

Show comment
Hide comment
@lababidi

lababidi Jan 29, 2016

@jreback Squashed. Thanks.

lababidi commented Jan 29, 2016

@jreback Squashed. Thanks.

@jreback

View changes

Show outdated Hide outdated pandas/io/tests/test_parsers.py Outdated
@jreback

View changes

Show outdated Hide outdated pandas/io/tests/test_parsers.py Outdated
@jreback

View changes

Show outdated Hide outdated pandas/io/tests/test_parsers.py Outdated
@jreback

View changes

Show outdated Hide outdated pandas/io/tests/test_parsers.py Outdated
@jreback

View changes

Show outdated Hide outdated pandas/io/tests/test_parsers.py Outdated
@jreback

View changes

Show outdated Hide outdated pandas/io/tests/test_parsers.py Outdated
@jreback

This comment has been minimized.

Show comment
Hide comment
@jreback

jreback Jan 30, 2016

Contributor

looks pretty good. just some minor stylistic comments.

Contributor

jreback commented Jan 30, 2016

looks pretty good. just some minor stylistic comments.

@jreback

This comment has been minimized.

Show comment
Hide comment
@jreback

jreback Jan 30, 2016

Contributor
======================================================================
ERROR: test_to_csv_compression_value_error (pandas.tests.frame.test_to_csv.TestDataFrameToCSV)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/pydata/pandas/pandas/tests/frame/test_to_csv.py", line 998, in test_to_csv_compression_value_error
    filename, compression="zip")
  File "/home/travis/build/pydata/pandas/pandas/util/testing.py", line 1952, in assertRaises
    _callable(*args, **kwargs)
  File "/home/travis/build/pydata/pandas/pandas/core/frame.py", line 1338, in to_csv
    formatter.save()
  File "/home/travis/build/pydata/pandas/pandas/core/format.py", line 1524, in save
    compression=self.compression)
  File "/home/travis/build/pydata/pandas/pandas/io/common.py", line 346, in _get_handle
    zip_file = zipfile.ZipFile(path)
  File "/home/travis/miniconda/envs/pandas/lib/python2.7/zipfile.py", line 770, in __init__
    self._RealGetContents()
  File "/home/travis/miniconda/envs/pandas/lib/python2.7/zipfile.py", line 811, in _RealGetContents
    raise BadZipfile, "File is not a zip file"
BadZipfile: File is not a zip file

put a test in for this error as well (e.g. try to open a non-zipfile); I don't think you need any code changes though, you can just let it raise.

Contributor

jreback commented Jan 30, 2016

======================================================================
ERROR: test_to_csv_compression_value_error (pandas.tests.frame.test_to_csv.TestDataFrameToCSV)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/pydata/pandas/pandas/tests/frame/test_to_csv.py", line 998, in test_to_csv_compression_value_error
    filename, compression="zip")
  File "/home/travis/build/pydata/pandas/pandas/util/testing.py", line 1952, in assertRaises
    _callable(*args, **kwargs)
  File "/home/travis/build/pydata/pandas/pandas/core/frame.py", line 1338, in to_csv
    formatter.save()
  File "/home/travis/build/pydata/pandas/pandas/core/format.py", line 1524, in save
    compression=self.compression)
  File "/home/travis/build/pydata/pandas/pandas/io/common.py", line 346, in _get_handle
    zip_file = zipfile.ZipFile(path)
  File "/home/travis/miniconda/envs/pandas/lib/python2.7/zipfile.py", line 770, in __init__
    self._RealGetContents()
  File "/home/travis/miniconda/envs/pandas/lib/python2.7/zipfile.py", line 811, in _RealGetContents
    raise BadZipfile, "File is not a zip file"
BadZipfile: File is not a zip file

put a test in for this error as well (e.g. try to open a non-zipfile); I don't think you need any code changes though, you can just let it raise.

@lababidi

This comment has been minimized.

Show comment
Hide comment
@lababidi

lababidi Feb 1, 2016

@jreback Good ideas. I think I covered all your requests.

lababidi commented Feb 1, 2016

@jreback Good ideas. I think I covered all your requests.

@jreback

This comment has been minimized.

Show comment
Hide comment
@jreback

jreback Feb 1, 2016

Contributor

@lababidi ok, this lgtm.

just need a whatsnew note (you can put kind of what you did in the addition to the doc-string). pls add, ping when green.

Contributor

jreback commented Feb 1, 2016

@lababidi ok, this lgtm.

just need a whatsnew note (you can put kind of what you did in the addition to the doc-string). pls add, ping when green.

@jreback jreback added this to the 0.18.0 milestone Feb 1, 2016

@lababidi

This comment has been minimized.

Show comment
Hide comment
@lababidi

lababidi Feb 2, 2016

@jreback Thanks for all your help!

lababidi commented Feb 2, 2016

@jreback Thanks for all your help!

@lababidi

This comment has been minimized.

Show comment
Hide comment
@lababidi

lababidi Feb 2, 2016

@jreback This should be ready to go. I'm not sure why Travis is taking such a long time.

lababidi commented Feb 2, 2016

@jreback This should be ready to go. I'm not sure why Travis is taking such a long time.

@@ -1387,6 +1390,20 @@ def _wrap_compressed(f, compression, encoding=None):
data = bz2.decompress(f.read())
f = StringIO(data)
return f
elif compression == 'zip':

This comment has been minimized.

@jreback

jreback Feb 6, 2016

Contributor

this seems quite duplicative (I know it was there before). any way you can simply use _get_handle? (which already has this embeded)

@jreback

jreback Feb 6, 2016

Contributor

this seems quite duplicative (I know it was there before). any way you can simply use _get_handle? (which already has this embeded)

This comment has been minimized.

@lababidi

lababidi Feb 6, 2016

This is not that trivial. wrap_compressed() handles file objects. I'd have to refactor this code to handle for this. See the logic below

        if isinstance(f, compat.string_types):
            f = _get_handle(f, 'r', encoding=self.encoding,
                            compression=self.compression)
        elif self.compression:
            f = _wrap_compressed(f, self.compression, self.encoding)

I would then have to modify _get_handle to deal with file objects as well as strings

@lababidi

lababidi Feb 6, 2016

This is not that trivial. wrap_compressed() handles file objects. I'd have to refactor this code to handle for this. See the logic below

        if isinstance(f, compat.string_types):
            f = _get_handle(f, 'r', encoding=self.encoding,
                            compression=self.compression)
        elif self.compression:
            f = _wrap_compressed(f, self.compression, self.encoding)

I would then have to modify _get_handle to deal with file objects as well as strings

This comment has been minimized.

@jreback

jreback Feb 6, 2016

Contributor

ok then is possible to isolate this so each is just a function call to a common function? (just trying to avoid this code duplication)

@jreback

jreback Feb 6, 2016

Contributor

ok then is possible to isolate this so each is just a function call to a common function? (just trying to avoid this code duplication)

@jreback

View changes

Show outdated Hide outdated pandas/io/tests/test_parsers.py Outdated
@lababidi

This comment has been minimized.

Show comment
Hide comment
@lababidi

lababidi Feb 22, 2016

I've added the infer. Good idea @jreback let's check this out.

lababidi commented Feb 22, 2016

I've added the infer. Good idea @jreback let's check this out.

@jreback jreback referenced this pull request Mar 1, 2016

Open

Support zip files #429

@lababidi

This comment has been minimized.

Show comment
Hide comment
@lababidi

lababidi Mar 18, 2016

@jreback The parser tests will all pass. The complete test suite will not pass because master seems to have broken. Could you please merge this?

----------------------------------------------------------------------
Ran 480 tests in 34.545s

OK (SKIP=14)

lababidi commented Mar 18, 2016

@jreback The parser tests will all pass. The complete test suite will not pass because master seems to have broken. Could you please merge this?

----------------------------------------------------------------------
Ran 480 tests in 34.545s

OK (SKIP=14)
@jreback

View changes

Show outdated Hide outdated doc/source/whatsnew/v0.18.0.txt Outdated
with open(self.csv1, 'rb') as data_file:
data = data_file.read()
expected = self.read_csv(self.csv1)

This comment has been minimized.

@jreback

jreback Mar 18, 2016

Contributor

you haven't updated to my comments

@jreback

jreback Mar 18, 2016

Contributor

you haven't updated to my comments

@jreback

This comment has been minimized.

Show comment
Hide comment
@jreback

jreback Mar 18, 2016

Contributor

@lababidi I am happy to merge when you respond/update to my comments. I suspect you haven't rebased to master. Master passes just fine.

Contributor

jreback commented Mar 18, 2016

@lababidi I am happy to merge when you respond/update to my comments. I suspect you haven't rebased to master. Master passes just fine.

@lababidi

View changes

Show outdated Hide outdated pandas/io/tests/test_parsers.py Outdated
@jreback

View changes

Show outdated Hide outdated pandas/io/tests/test_parsers.py Outdated

@jreback jreback referenced this pull request Mar 18, 2016

Closed

ENH: xz compression in to_csv() resolves #11852 #12668

4 of 4 tasks complete
@jreback

View changes

Show outdated Hide outdated pandas/io/parsers.py Outdated
@lababidi

This comment has been minimized.

Show comment
Hide comment
@lababidi

lababidi Mar 18, 2016

@jreback previous tests passed. The most recent push only changed the version in the docstring
https://travis-ci.org/pydata/pandas/builds/117001384

lababidi commented Mar 18, 2016

@jreback previous tests passed. The most recent push only changed the version in the docstring
https://travis-ci.org/pydata/pandas/builds/117001384

compression : {'gzip', 'bz2', 'zip', 'infer', None}, default 'infer'
For on-the-fly decompression of on-disk data. If 'infer', then use gzip,
bz2 or zip if filepath_or_buffer is a string ending in '.gz', '.bz2' or
'.zip', respectively, and no decompression otherwise. New in 0.18.1: ZIP

This comment has been minimized.

@jreback

jreback Mar 18, 2016

Contributor

need a versionadded tag here

@jreback

jreback Mar 18, 2016

Contributor

need a versionadded tag here

result = self.read_csv(path, compression='infer')
tm.assert_frame_equal(result, expected)
if self.engine is not 'python':

This comment has been minimized.

@jreback

jreback Mar 18, 2016

Contributor

why is this check here?

@jreback

jreback Mar 18, 2016

Contributor

why is this check here?

@jreback

View changes

Show outdated Hide outdated pandas/io/tests/test_parsers.py Outdated
@jreback

View changes

Show outdated Hide outdated pandas/io/tests/test_parsers.py Outdated
@jreback

View changes

Show outdated Hide outdated pandas/io/tests/test_parsers.py Outdated
@jreback

This comment has been minimized.

Show comment
Hide comment
@jreback

jreback Mar 18, 2016

Contributor

@lababidi ok looks pretty good. only changes are to use assertRaisesRegexp when you assert for zip in order to make sure the correct messages are raised (as there are multiple possibilites).

pls make that and ping when green.

Contributor

jreback commented Mar 18, 2016

@lababidi ok looks pretty good. only changes are to use assertRaisesRegexp when you assert for zip in order to make sure the correct messages are raised (as there are multiple possibilites).

pls make that and ping when green.

@lababidi

This comment has been minimized.

Show comment
Hide comment
@lababidi

lababidi Mar 21, 2016

@jreback could you help me? the test only failed on the following:

--------------------------------------------------------------------------------------------------------------
#176 nose.failure.Failure.runTest: direct creation of extension dtype datetime64[ns, UTC] is not supported ATM

lababidi commented Mar 21, 2016

@jreback could you help me? the test only failed on the following:

--------------------------------------------------------------------------------------------------------------
#176 nose.failure.Failure.runTest: direct creation of extension dtype datetime64[ns, UTC] is not supported ATM
@jreback

This comment has been minimized.

Show comment
Hide comment
@jreback

jreback Mar 21, 2016

Contributor

not sure where you are seeing this. your 2.7 tests is failing because of a linting issue (line too long)

git diff master | flake8 --diff

Contributor

jreback commented Mar 21, 2016

not sure where you are seeing this. your 2.7 tests is failing because of a linting issue (line too long)

git diff master | flake8 --diff

@lababidi

This comment has been minimized.

Show comment
Hide comment
@lababidi

lababidi Mar 21, 2016

@jreback it's in the Travis results

lababidi commented Mar 21, 2016

@jreback it's in the Travis results

@jreback

This comment has been minimized.

Show comment
Hide comment
@jreback

jreback Mar 21, 2016

Contributor

I restarted that job, though you need to repush anyhow (lint error). Never saw that one before; I think its a crash in something else, so let's see if it recurs.

Contributor

jreback commented Mar 21, 2016

I restarted that job, though you need to repush anyhow (lint error). Never saw that one before; I think its a crash in something else, so let's see if it recurs.

@jreback

This comment has been minimized.

Show comment
Hide comment
@jreback

jreback Mar 22, 2016

Contributor

@lababidi ok this passed, except for the lint check. pls fix and repush, ping when green.

Contributor

jreback commented Mar 22, 2016

@lababidi ok this passed, except for the lint check. pls fix and repush, ping when green.

Mahmoud Lababidi
Add ZIP file decompression and TestCompression.
Fix PEP8 issues. Change Compression to be a Mixin. Add Compression Mixin correctly with current Tests. Add .format, Rename Compression, with-block, empty zip, bad-zip
@lababidi

This comment has been minimized.

Show comment
Hide comment
@lababidi

lababidi Mar 22, 2016

Thank you @jreback for your help and patience with this. I'll help out on the other issues soon.

lababidi commented Mar 22, 2016

Thank you @jreback for your help and patience with this. I'll help out on the other issues soon.

@jreback

This comment has been minimized.

Show comment
Hide comment
@jreback

jreback Mar 22, 2016

Contributor

@lababidi no thank you!

Contributor

jreback commented Mar 22, 2016

@lababidi no thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment