Add ZIP file decompression and TestCompression. #12175

Closed
wants to merge 1 commit into
from

@lababidi

Closes #11413

@jreback

pandas/io/parsers.py
klass = FixedWidthFieldParser
else: #default to engine == 'python':

@jreback

jreback Jan 29, 2016

Contributor

why did you modify this?

@lababidi

lababidi Jan 29, 2016

I modified this to default to the Python engine. If this is not the wanted functionality, I can remove it.

@jreback

jreback Jan 29, 2016

Contributor

just change code that is relevant.

@lababidi

lababidi Jan 29, 2016

Will do. Does this need to be a separate PR?

@jreback

jreback Jan 29, 2016

Contributor

not sure why you are changing this in the first place.

@lababidi

lababidi Jan 29, 2016

For the use case that engine is neither 'python' nor 'python-fwf'.

Is it possible for this to happen?

@jreback

jreback Jan 29, 2016

Contributor

no, that would be an exception and should raise a ValueError (do this in another issue/PR)
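
The behaviour described here could look roughly like the sketch below. The function name select_parser_class and the two placeholder classes are hypothetical stand-ins, not the actual code in pandas/io/parsers.py; the only point is that an unrecognised engine raises a ValueError instead of silently falling back to the Python engine.

```python
class PythonParser(object):
    """Placeholder standing in for the real Python parser class."""


class FixedWidthFieldParser(object):
    """Placeholder standing in for the real fixed-width parser class."""


def select_parser_class(engine):
    # Hypothetical helper: map an engine name to a parser class.
    if engine == 'python':
        return PythonParser
    elif engine == 'python-fwf':
        return FixedWidthFieldParser
    # Anything else is a caller error, so fail loudly instead of guessing.
    raise ValueError('Unknown engine: {0!r}'.format(engine))
```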

@lababidi

lababidi Jan 29, 2016

ok, thanks for the clarification.

@jreback

pandas/io/tests/test_parsers.py
from pandas.compat import parse_date
import pandas.lib as lib
from pandas import compat

@jreback

jreback Jan 29, 2016

Contributor

why did you change this?

@lababidi

lababidi Jan 29, 2016

I just reorganized the imports to make them more legible. With them grouped like this, it is easier to spot a few simplifications that could be made:

from pandas import DataFrame, Series, Index, MultiIndex, DatetimeIndex
from pandas import compat
from pandas.compat import(
    StringIO, BytesIO, PY3, range, long, lrange, lmap, u
)
from pandas.compat import parse_date
from pandas.io.common import DtypeWarning
from pandas.io.common import URLError

@jreback

jreback Jan 29, 2016

Contributor

ok then. pls run git diff master | flake8 --diff. linting failures do not fail the travis build yet, but will shortly.

@jreback

pandas/io/tests/test_parsers.py
def test_zip(self):
try:
import zipfile

@jreback

jreback Jan 29, 2016

Contributor

instead of making this a direct Testing class, make it class Compression(object), then add this as a mixin to TestCParserHighMemory/LowMemory, TestPythonParser, and TestFixedWidth, so these routines are run for each type of parser (and don't define read_csv/read_table). That way you don't have to repeat tests and they test all engines.
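
A minimal sketch of the mixin pattern suggested here, using plain unittest rather than the pandas test helpers. The test body and the toy read_csv are invented for illustration; only the shape matters: the mixin holds the shared tests and each engine-specific TestCase supplies its own read_csv.

```python
import unittest


class CompressionTests(object):
    # Mixin: not a TestCase itself, so unittest does not collect it on
    # its own. It relies on the host class providing read_csv().
    def test_roundtrip(self):
        result = self.read_csv('a,b\n1,2')
        self.assertEqual(result, [['a', 'b'], ['1', '2']])


class TestPythonParser(CompressionTests, unittest.TestCase):
    # Hypothetical engine-specific reader used by the shared tests.
    def read_csv(self, text):
        return [line.split(',') for line in text.splitlines()]
```

Listing the mixin before unittest.TestCase keeps the TestCase base last in the class list, which is the ordering asked for later in this review.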

@lababidi

lababidi Jan 29, 2016

@jreback The idea of the Mixin makes sense. Took me a bit to wrap my head around it. I had to add self.engine to a couple of the tests because bzip decompression will not raise an exception and so the Compression Mixin needs to check what engine it is using to make sure the test runs correctly.

@lababidi

lababidi Jan 29, 2016

@jreback I think this is ready for your review.

@max-sixty

pandas/io/common.py
f = zip_file.open(file_name)
else:
raise ValueError('ZIP file contains multiple files {}',
zip_file.filename)

@max-sixty

max-sixty Jan 29, 2016

Contributor

You need a .format here
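
For illustration, a sketch of what the corrected message construction might look like. The helper name multiple_files_error is hypothetical (the diff above raises inline); without .format() the exception receives the template and the filename as two separate arguments, and the placeholder is never filled in.

```python
def multiple_files_error(filename):
    # Hypothetical helper for illustration only.
    # '{0}' instead of '{}' so the template also works on Python 2.6.
    return ValueError(
        'ZIP file contains multiple files {0}'.format(filename))
```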

@max-sixty

pandas/io/tests/test_parsers.py
from pandas.lib import Timestamp
from pandas.tseries.index import date_range
import pandas.tseries.tools as tools
class Compression(object):

@max-sixty

max-sixty Jan 29, 2016

Contributor

Does this need to be called CompressionTest to get picked up? I know different test frameworks have different requirements.

@lababidi

lababidi Jan 29, 2016

It's not actually a Test. It's just a Mixin that gets pulled into other Tests. Those Tests will call these methods within Compression

@jreback

jreback Jan 30, 2016

Contributor

move this after ParserTests, maybe call this CompressionTests to be more informative (and still avoid nose actually running it, as it's a mixin)

@tollycoast

tollycoast Jan 30, 2016

Thanks sir

@max-sixty

max-sixty Jan 29, 2016

Contributor

Jeff is going to ask you to squash your commits into one, as per the contributing docs.
Nice job overall!

@TomAugspurger

TomAugspurger Jan 29, 2016

Contributor

@MaximilianR no need for squashing anymore thanks to https://github.com/pydata/pandas/blob/master/scripts/merge-py.py ;)

EDIT: which reminds me, CONTRIBUTING.md needs to be updated. Will do this weekend unless someone beats me to it.

@lababidi

lababidi Jan 29, 2016

@MaximilianR @TomAugspurger Thanks for the comments guys!

Contributor

max-sixty commented Jan 29, 2016

@jreback

jreback Jan 29, 2016

Contributor

pls squash

even though I can do it - it's much cleaner from a future reader perspective on a smaller change

@lababidi

lababidi Jan 29, 2016

@jreback Squashed. Thanks.

@jreback

pandas/io/tests/test_parsers.py
result = self.read_csv(path, compression='zip')
tm.assert_frame_equal(result, expected)
result = self.read_csv(open(path, 'rb'), compression='zip')

@jreback

jreback Jan 30, 2016

Contributor

do this in a with block to make sure the file is closed

@jreback

pandas/io/tests/test_parsers.py
with tm.ensure_clean() as path:
file_names = ['test_file', 'second_file']
tmp = zipfile.ZipFile(path, mode='w')
for file_name in file_names:

@jreback

jreback Jan 30, 2016

Contributor

test on an empty zipfile as well

@jreback

pandas/io/tests/test_parsers.py
result = self.read_csv(path, compression='gzip')
tm.assert_frame_equal(result, expected)
result = self.read_csv(open(path, 'rb'), compression='gzip')

@jreback

jreback Jan 30, 2016

Contributor

with block here

@jreback

pandas/io/tests/test_parsers.py
@@ -2623,7 +2732,9 @@ def test_eof_states(self):
StringIO(data), escapechar='\\')
-class TestPythonParser(ParserTests, tm.TestCase):
+class TestPythonParser(ParserTests, tm.TestCase, Compression):

@jreback

jreback Jan 30, 2016

Contributor

make tm.TestCase the last class

@jreback

pandas/io/tests/test_parsers.py
@@ -3442,17 +3553,19 @@ def test_buffer_rd_bytes(self):
except Exception as e:
pass
-class TestCParserHighMemory(CParserTests, tm.TestCase):
+class TestCParserHighMemory(CParserTests, tm.TestCase, Compression):

@jreback

jreback Jan 30, 2016

Contributor

same here

@jreback

pandas/io/tests/test_parsers.py
@@ -3753,18 +3866,20 @@ def test_single_char_leading_whitespace(self):
tm.assert_frame_equal(result, expected)
-class TestCParserLowMemory(CParserTests, tm.TestCase):
+class TestCParserLowMemory(CParserTests, tm.TestCase, Compression):

@jreback

jreback Jan 30, 2016

Contributor

same here

@jreback

jreback Jan 30, 2016

Contributor

looks pretty good. just some minor stylistic comments.

@jreback

jreback Jan 30, 2016

Contributor
======================================================================
ERROR: test_to_csv_compression_value_error (pandas.tests.frame.test_to_csv.TestDataFrameToCSV)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/pydata/pandas/pandas/tests/frame/test_to_csv.py", line 998, in test_to_csv_compression_value_error
    filename, compression="zip")
  File "/home/travis/build/pydata/pandas/pandas/util/testing.py", line 1952, in assertRaises
    _callable(*args, **kwargs)
  File "/home/travis/build/pydata/pandas/pandas/core/frame.py", line 1338, in to_csv
    formatter.save()
  File "/home/travis/build/pydata/pandas/pandas/core/format.py", line 1524, in save
    compression=self.compression)
  File "/home/travis/build/pydata/pandas/pandas/io/common.py", line 346, in _get_handle
    zip_file = zipfile.ZipFile(path)
  File "/home/travis/miniconda/envs/pandas/lib/python2.7/zipfile.py", line 770, in __init__
    self._RealGetContents()
  File "/home/travis/miniconda/envs/pandas/lib/python2.7/zipfile.py", line 811, in _RealGetContents
    raise BadZipfile, "File is not a zip file"
BadZipfile: File is not a zip file

put a test in for this error as well (e.g. try to open a non-zipfile); I don't think you need any code changes though, you can just let it raise.
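
A rough sketch of the kind of check being requested, using only the standard library. The helper name raises_bad_zip and the sample data are made up for illustration; a real test in the pandas suite would presumably go through read_csv with compression='zip' instead.

```python
import io
import zipfile


def raises_bad_zip(buf):
    # True if opening buf as a ZIP archive fails with BadZipFile.
    # (Python 2.7, as in the traceback above, spells it zipfile.BadZipfile.)
    try:
        zipfile.ZipFile(buf)
    except zipfile.BadZipFile:
        return True
    return False


# Plain CSV bytes, deliberately not a ZIP archive.
plain_csv = io.BytesIO(b'a,b\n1,2\n' * 10)
```

Calling raises_bad_zip(plain_csv) exercises the same error path as the traceback: the archive reader finds no end-of-central-directory record and gives up.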

@lababidi

lababidi Feb 1, 2016

@jreback Good ideas. I think I covered all your requests.

@jreback

jreback Feb 1, 2016

Contributor

@lababidi ok, this lgtm.

just need a whatsnew note (you can put kind of what you did in the addition to the doc-string). pls add, ping when green.

@jreback jreback added this to the 0.18.0 milestone Feb 1, 2016

@lababidi

lababidi Feb 2, 2016

@jreback Thanks for all your help!

@lababidi

lababidi Feb 2, 2016

@jreback This should be ready to go. I'm not sure why Travis is taking such a long time.

@@ -1387,6 +1390,20 @@ def _wrap_compressed(f, compression, encoding=None):
data = bz2.decompress(f.read())
f = StringIO(data)
return f
elif compression == 'zip':

@jreback

jreback Feb 6, 2016

Contributor

this seems quite duplicative (I know it was there before). any way you can simply use _get_handle? (which already has this embedded)

@lababidi

lababidi Feb 6, 2016

This is not that trivial. _wrap_compressed() handles file objects, so I'd have to refactor this code to handle that. See the logic below:

        if isinstance(f, compat.string_types):
            f = _get_handle(f, 'r', encoding=self.encoding,
                            compression=self.compression)
        elif self.compression:
            f = _wrap_compressed(f, self.compression, self.encoding)

I would then have to modify _get_handle to deal with file objects as well as strings

@jreback

jreback Feb 6, 2016

Contributor

ok, then is it possible to isolate this so each is just a function call to a common function? (just trying to avoid this code duplication)
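
One way the shared function could look, as a sketch only. The name open_single_zip_member and the exact error messages are assumptions, not what the PR ended up with. It leans on the fact that zipfile.ZipFile accepts either a filename or a seekable binary file object, so both the string path and the open-handle code paths could call it.

```python
import io
import zipfile


def open_single_zip_member(path_or_buf):
    # Hypothetical common helper: works for a path or an open binary file.
    zip_file = zipfile.ZipFile(path_or_buf)
    names = zip_file.namelist()
    if len(names) == 1:
        # Exactly one member: hand back a readable file-like object.
        return zip_file.open(names[0])
    zip_file.close()
    if len(names) == 0:
        raise ValueError(
            'Zero files found in ZIP file {0}'.format(zip_file.filename))
    raise ValueError(
        'Multiple files found in ZIP file {0}'.format(zip_file.filename))
```

When a file object is passed, zip_file.filename is None, so the error message is less helpful in that case; a real implementation might want to report something else there.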

@jreback

pandas/io/tests/test_parsers.py
except ImportError:
raise nose.SkipTest('need gzip to run')
data = open(self.csv1, 'rb').read()

@jreback

jreback Feb 6, 2016

Contributor

these filehandles are not getting closed, do it in a with block. (same below as well)

@jreback

pandas/io/tests/test_parsers.py
except ImportError:
raise nose.SkipTest('need gzip and bz2 to run')
data = open(self.csv1, 'rb').read()

@jreback

jreback Feb 6, 2016

Contributor

same here

@jreback

pandas/io/tests/test_parsers.py
except ImportError:
raise nose.SkipTest('need bz2 to run')
data = open(self.csv1, 'rb').read()

@jreback

jreback Feb 6, 2016

Contributor

here

@@ -563,6 +563,18 @@ cdef class TextReader:
else:
raise ValueError('Python 2 cannot read bz2 from open file '
'handle')
elif self.compression == 'zip':
import zipfile

@jreback

jreback Feb 6, 2016

Contributor

IIRC we are going to fix this in the other issue? #11666 & #11667

@lababidi

lababidi Feb 6, 2016

#11666 allows gz/bz in pickles. That's next on the pipeline.

#11667 I'm not sure how this is relevant

@jreback

jreback Feb 6, 2016

Contributor

typo: #11677 (maybe some helpful code)

@lababidi

lababidi Feb 6, 2016

@jreback Thanks. I'll tackle this next along with pickles. I don't think this will be an overnight fix, but it's worth doing.

@lababidi

lababidi Feb 8, 2016

@jreback I'm getting the following errors in the tests in Travis. I'm not sure how these relate to this work I've done. Any help would be appreciated:

======================================================================
FAIL: test_deprecated_levels (pandas.tests.test_categorical.TestCategorical)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/pydata/pandas/pandas/tests/test_categorical.py", line 1444, in test_deprecated_levels
    self.assertFalse(LooseVersion(pd.__version__) >= '0.18')
AssertionError: True is not false
======================================================================
ERROR: test_deprecated_levels (pandas.tests.test_categorical.TestCategorical)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/pydata/pandas/pandas/tests/test_categorical.py", line 1444, in test_deprecated_levels
    self.assertFalse(LooseVersion(pd.__version__) >= '0.18')
  File "/home/travis/miniconda/envs/pandas/lib/python3.4/distutils/version.py", line 76, in __ge__
    c = self._cmp(other)
  File "/home/travis/miniconda/envs/pandas/lib/python3.4/distutils/version.py", line 343, in _cmp
    if self.version < other.version:
TypeError: unorderable types: str() < int()

@jreback

jreback Feb 10, 2016

Contributor

ok, rebase on master, we had a temporary issue. ping when green.

raise nose.SkipTest('need zipfile to run')
with open(self.csv1, 'rb') as data_file:
data = data_file.read()

@jreback

jreback Feb 10, 2016

Contributor

you don't really need to make this with statement encompass the entire block

with open(self.csv1, 'rb') as data_file:
    data = data_file.read()
expected = .....

with tm.ensure_clean() as path:
     file_name = ....
     ....
     tmp.writestr(file_name, data)

as data is in scope

@jreback

jreback Feb 22, 2016

Contributor

you need to do what I indicated above.

@jreback jreback modified the milestones: 0.18.1, 0.18.0 Feb 12, 2016

@lababidi

lababidi Feb 18, 2016

@MaximilianR @jreback
Is this something in the tests I need to worry about?
Could someone please help me merge this?

Traceback (most recent call last):
  File "/home/travis/build/pydata/pandas/pandas/tests/test_categorical.py", line 1444, in test_deprecated_levels
    self.assertFalse(LooseVersion(pd.__version__) >= '0.18')
AssertionError: True is not false

@jreback

jreback Feb 18, 2016

Contributor

rebase on master. That was updated last week when I tagged 0.18.0.rc1

@lababidi

lababidi Feb 21, 2016

@jreback let's do this, double green checkmarks!

@jreback

jreback Feb 22, 2016

Contributor

you need to rebase on master.

@lababidi

lababidi Feb 22, 2016

@jreback I just rebased to master. I'm losing sleep on this. Please, I'm begging you, merge this. @MaximilianR @TomAugspurger

FAIL: test_deprecated_labels (pandas.tests.test_categorical.TestCategorical)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/pydata/pandas/pandas/tests/test_categorical.py", line 1430, in test_deprecated_labels
    self.assertFalse(LooseVersion(pd.__version__) >= '0.18')
AssertionError: True is not false
======================================================================
FAIL: test_deprecated_levels (pandas.tests.test_categorical.TestCategorical)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/pydata/pandas/pandas/tests/test_categorical.py", line 1444, in test_deprecated_levels
    self.assertFalse(LooseVersion(pd.__version__) >= '0.18')
AssertionError: True is not false

Contributor

jreback commented Feb 22, 2016

@lababidi

lababidi Feb 22, 2016

@jreback I don't understand? there's only one commit: https://github.com/pydata/pandas/pull/12175/commits

I have rebased to master. Please, merge this branch.

@jreback

pandas/io/parsers.py
bz2 or zip if filepath_or_buffer is a string ending in '.gz', '.bz2' or
'.zip', respectively, and no decompression otherwise. If using 'zip',
the ZIP file must contain only one data file to be read in. Set to None
for no decompression.

@jreback

jreback Feb 22, 2016

Contributor

say new in 0.18.0 for zip

with open(self.csv1, 'rb') as data_file:
data = data_file.read()
expected = self.read_csv(self.csv1)

@jreback

jreback Feb 22, 2016

Contributor

same here

with open(self.csv1, 'rb') as data_file:
data = data_file.read()
data = data.replace(b',', b'::')

@jreback

jreback Feb 22, 2016

Contributor

same here

@@ -2636,7 +2630,145 @@ def test_eof_states(self):
StringIO(data), escapechar='\\')
class TestPythonParser(ParserTests, tm.TestCase):
class CompressionTests(object):
def test_zip(self):

@jreback

jreback Feb 22, 2016

Contributor

Need tests that exercise infer (the default). You can do this in each individual zip test by naming a file with the appropriate extension and not passing the compression option.

It's possible tests like these are elsewhere, not sure.
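
For readers following along, the idea of a test that relies on the default can be sketched with the standard library alone. This is only an illustration under stated assumptions: `infer_compression`, `read_single_member` and `roundtrip_with_infer` are hypothetical names written from the docstring under review, not the pandas implementation, and the error message is made up.

```python
import os
import tempfile
import zipfile

# Hypothetical mapping, written from the docstring under review; this is
# not the pandas implementation.
_EXTENSIONS = (('.gz', 'gzip'), ('.bz2', 'bz2'), ('.zip', 'zip'))


def infer_compression(path):
    """Return a compression name based on the file extension, else None."""
    for extension, name in _EXTENSIONS:
        if path.endswith(extension):
            return name
    return None


def read_single_member(path):
    """Return the bytes of the only member of a ZIP archive."""
    with zipfile.ZipFile(path) as archive:
        names = archive.namelist()
        if len(names) != 1:
            raise ValueError('expected one file in ZIP, found %d' % len(names))
        return archive.read(names[0])


def roundtrip_with_infer(data):
    """Write data to a '.zip' path and read it back without naming the codec."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, 'test_file.zip')
        with zipfile.ZipFile(path, mode='w') as archive:
            archive.writestr('test_file.csv', data)
        # Rely on the extension alone, as the 'infer' default would.
        if infer_compression(path) != 'zip':
            raise AssertionError('extension was not recognised')
        return read_single_member(path)
```

The point of the sketch is that the file name carries the information: the test never states which compression is in use.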

@jreback

jreback Feb 22, 2016

Contributor

@lababidi Thanks. Some more comments; this should be the last round.

@lababidi

lababidi Feb 22, 2016

I've added the infer test. Good idea, @jreback. Let's check this out.

@jreback jreback referenced this pull request Mar 1, 2016

Open

Support zip files #429

@lababidi

lababidi Mar 18, 2016

@jreback The parser tests all pass. The complete test suite does not pass because master seems to be broken. Could you please merge this?

----------------------------------------------------------------------
Ran 480 tests in 34.545s

OK (SKIP=14)

@jreback

doc/source/whatsnew/v0.18.0.txt
@@ -522,6 +522,7 @@ Other enhancements
- ``HDFStore`` is now iterable: ``for k in store`` is equivalent to ``for k in store.keys()`` (:issue:`12221`).
- Add missing methods/fields to ``.dt`` for ``Period`` (:issue:`8848`)
- The entire codebase has been ``PEP``-ified (:issue:`12096`)
- ``read_csv`` now supports opening ZIP files that contains a single CSV (:issue:`12175`)

@jreback

jreback Mar 18, 2016

Contributor

needs to go in 0.18.1

with open(self.csv1, 'rb') as data_file:
data = data_file.read()
expected = self.read_csv(self.csv1)

@jreback

jreback Mar 18, 2016

Contributor

you haven't updated to my comments

@jreback

jreback Mar 18, 2016

Contributor

@lababidi I am happy to merge when you respond/update to my comments. I suspect you haven't rebased to master. Master passes just fine.

@lababidi

pandas/io/tests/test_parsers.py
self.assertRaises(ValueError, self.read_csv,
path, compression='infer')

@lababidi

lababidi Mar 18, 2016

@jreback here's an infer

@jreback

pandas/io/tests/test_parsers.py
file_name = 'test_file.zip'
with tm.ensure_clean(file_name) as path:
tmp = zipfile.ZipFile(path, mode='w')

@jreback

jreback Mar 18, 2016

Contributor

you didn't change anything here

@lababidi

lababidi Mar 18, 2016

@jreback what line are you referring to? zipfile.ZipFile?

@jreback

jreback Mar 18, 2016

Contributor

no, your context managers are nested, but don't need to be

with open(self.csv1, 'rb') as data_file:
    data = data_file.read()

data can be used here
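
A minimal sketch of the un-nested form suggested here, using only the standard library. The helper name `build_zip` and the default member name are hypothetical and are not taken from the PR; the sketch only shows that the bytes read in the first block stay available after it closes.

```python
import zipfile


def build_zip(csv_path, zip_path, member_name='test_file.csv'):
    """Hypothetical helper: copy the bytes of csv_path into a one-member ZIP."""
    # First block: read the raw bytes. The file is closed when the block ends.
    with open(csv_path, 'rb') as data_file:
        data = data_file.read()

    # Second block, not nested: `data` is still in scope here.
    with zipfile.ZipFile(zip_path, mode='w') as archive:
        archive.writestr(member_name, data)
    return data
```

Keeping the two blocks side by side means the source file is closed before the archive is opened, and each block has one job.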

@lababidi

lababidi Mar 18, 2016

@jreback thanks for clarifying. I was using the previous convention. I'll clean this up now.

@jreback jreback referenced this pull request Mar 18, 2016

Closed

ENH: xz compression in to_csv() resolves #11852 #12668

4 of 4 tasks complete
@jreback

pandas/io/parsers.py
compression : {'gzip', 'bz2', 'zip', 'infer', None}, default 'infer'
For on-the-fly decompression of on-disk data. If 'infer', then use gzip,
bz2 or zip if filepath_or_buffer is a string ending in '.gz', '.bz2' or
'.zip', respectively, and no decompression otherwise. New in 0.18.0: ZIP

@jreback

jreback Mar 18, 2016

Contributor

so add in a versionadded tag and put the 0.18.0 stuff there

@lababidi

lababidi Mar 18, 2016

@jreback The previous tests passed. The most recent push only changed the version in the docstring:
https://travis-ci.org/pydata/pandas/builds/117001384

compression : {'gzip', 'bz2', 'zip', 'infer', None}, default 'infer'
For on-the-fly decompression of on-disk data. If 'infer', then use gzip,
bz2 or zip if filepath_or_buffer is a string ending in '.gz', '.bz2' or
'.zip', respectively, and no decompression otherwise. New in 0.18.1: ZIP

@jreback

jreback Mar 18, 2016

Contributor

need a versionadded tag here
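
For context, a `versionadded` tag is a Sphinx directive placed inside the parameter description. The sketch below shows one possible layout; `read_csv_sketch` is a hypothetical stand-in, the docstring text is assembled from the excerpts quoted in this thread, and the exact wording that was eventually merged may differ.

```python
def read_csv_sketch(filepath_or_buffer, compression='infer'):
    """Hypothetical stand-in used only to show the docstring layout.

    Parameters
    ----------
    compression : {'gzip', 'bz2', 'zip', 'infer', None}, default 'infer'
        For on-the-fly decompression of on-disk data. If 'infer', then use
        gzip, bz2 or zip if filepath_or_buffer is a string ending in '.gz',
        '.bz2' or '.zip', respectively, and no decompression otherwise. If
        using 'zip', the ZIP file must contain only one data file to be
        read in. Set to None for no decompression.

        .. versionadded:: 0.18.1
           Support for 'zip' compression.
    """
    # No parsing happens here; only the docstring matters for this sketch.
    raise NotImplementedError('docstring layout sketch only')
```

Moving the version note into the directive lets Sphinx render it consistently instead of leaving "New in ..." as free text.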

result = self.read_csv(path, compression='infer')
tm.assert_frame_equal(result, expected)
if self.engine is not 'python':

@jreback

jreback Mar 18, 2016

Contributor

why is this check here?

@jreback

pandas/io/tests/test_parsers.py
expected = self.read_csv(self.csv1)
file_name = 'test_file.zip'
with tm.ensure_clean(file_name) as path:

@jreback

jreback Mar 18, 2016

Contributor

you can put the file_name directly in (like you do below)

@jreback

pandas/io/tests/test_parsers.py
tmp.writestr(file_name, data)
tmp.close()
self.assertRaises(ValueError, self.read_csv,

@jreback

jreback Mar 18, 2016

Contributor

Can you use assertRaisesRegex here (to check that the Multiple Files error is the one raised)?
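
To illustrate why matching on the message matters, here is a small standard-library sketch. It uses `unittest.TestCase.assertRaisesRegex` from Python 3 (spelled `assertRaisesRegexp` on Python 2.7). The function `read_single_member`, the helper `make_zip` and both error messages are hypothetical stand-ins, not the code or wording in pandas.

```python
import io
import unittest
import zipfile


def read_single_member(buffer):
    """Hypothetical stand-in for a reader that accepts one-member ZIPs only."""
    with zipfile.ZipFile(buffer) as archive:
        names = archive.namelist()
        if len(names) == 0:
            raise ValueError('Zero files found in ZIP archive')
        if len(names) > 1:
            raise ValueError('Multiple files found in ZIP archive')
        return archive.read(names[0])


def make_zip(members):
    """Build an in-memory ZIP from a {name: bytes} mapping."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, mode='w') as archive:
        for name, data in members.items():
            archive.writestr(name, data)
    buffer.seek(0)
    return buffer


class TestZipErrors(unittest.TestCase):
    def test_multiple_files(self):
        buffer = make_zip({'a.csv': b'x', 'b.csv': b'y'})
        # A bare assertRaises(ValueError) would also pass for the empty case.
        with self.assertRaisesRegex(ValueError, 'Multiple files'):
            read_single_member(buffer)

    def test_empty_archive(self):
        buffer = make_zip({})
        with self.assertRaisesRegex(ValueError, 'Zero files'):
            read_single_member(buffer)
```

Both failure modes raise `ValueError`, so only the message check tells them apart.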

@jreback

pandas/io/tests/test_parsers.py
tmp = zipfile.ZipFile(path, mode='w')
tmp.close()
self.assertRaises(ValueError, self.read_csv,

@jreback

jreback Mar 18, 2016

Contributor

Here, make sure the correct ValueError is raised (use assertRaisesRegex).

@jreback

jreback Mar 18, 2016

Contributor

@lababidi OK, looks pretty good. The only change left is to use assertRaisesRegexp when you assert for zip, to make sure the correct messages are raised (as there are multiple possibilities).

Please make that change and ping when green.

@lababidi

lababidi Mar 21, 2016

@jreback Could you help me? The tests only failed on the following:

--------------------------------------------------------------------------------------------------------------
#176 nose.failure.Failure.runTest: direct creation of extension dtype datetime64[ns, UTC] is not supported ATM

@jreback

jreback Mar 21, 2016

Contributor

Not sure where you are seeing this. Your 2.7 test run is failing because of a linting issue (line too long). You can check locally with:

git diff master | flake8 --diff

@lababidi

lababidi Mar 21, 2016

@jreback it's in the Travis results

@jreback

jreback Mar 21, 2016

Contributor

I restarted that job, though you need to repush anyhow (lint error). Never saw that one before; I think it's a crash in something else, so let's see if it recurs.

@jreback

jreback Mar 22, 2016

Contributor

@lababidi ok this passed, except for the lint check. pls fix and repush, ping when green.

Mahmoud Lababidi
Add ZIP file decompression and TestCompression.
Fix PEP8 issues. Change Compression to be a Mixin. Add Compression Mixin correctly with current Tests. Add .format, Rename Compression, with-block, empty zip, bad-zip
@lababidi

lababidi Mar 22, 2016

Thank you @jreback for your help and patience with this. I'll help out on the other issues soon.

@jreback

jreback Mar 22, 2016

Contributor

@lababidi no, thank you!
