pandas read_csv no longer supports file-like objects from tarfile (pandas 0.20.1) #16530

Closed
gjanvier opened this Issue May 29, 2017 · 17 comments

@gjanvier

Code Sample, a copy-pastable example if possible

import pandas as pd
import tarfile

tar = tarfile.open(name="xxx.tar.bz2", mode='r')
myfile = tar.extractfile('yyy.csv') # file-like object with a read() method
data = pd.read_csv(myfile, sep=r'\s+')

This code generates this error:

  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 655, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 392, in _read
    filepath_or_buffer, encoding, compression)
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/common.py", line 210, in get_filepath_or_buffer
    raise ValueError(msg.format(_type=type(filepath_or_buffer)))
ValueError: Invalid file path or buffer object type: <class 'tarfile.ExFileObject'>

Problem description

This code works with pandas 0.19.2 but fails with 0.20.1.

According to pandas doc for read_csv:

filepath_or_buffer : str, pathlib.Path, py._path.local.LocalPath or any object with a read() method (such as a file handle or StringIO)

I guess the new validation is too restrictive?

@bashtage


bashtage May 29, 2017

Contributor

Works as expected on windows with 0.20.1.

pd.read_csv(myfile, sep='\s+')
Out[25]: 
  Col1,COl2
0       a,1
1       b,2
2       c,3
3       d,4

Which Python? You should include the show_versions() output in the details area of the template.


@gjanvier


gjanvier May 29, 2017

>>> pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-78-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.20.1
pytest: None
pip: 8.1.1
setuptools: 20.7.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 5.3.0
sphinx: None
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
pandas_gbq: None
pandas_datareader: None

@gjanvier gjanvier closed this May 29, 2017

@bashtage


bashtage May 29, 2017

Contributor

Did you mean to close? If it isn't working for you, please reopen. Could be combination of Python version/OS.


@gjanvier gjanvier reopened this May 29, 2017

@gjanvier


gjanvier May 29, 2017

Oops, sorry. Reopened.

@TomAugspurger


TomAugspurger May 29, 2017

Contributor

Working for me as well with python3 and master.

@gjanvier can you make a fully reproducible example (including writing the csv and adding it to the tar)


@gjanvier


gjanvier May 29, 2017

Sure, here is a test case.

import tarfile
import pandas as pd

pd.show_versions()

data = pd.DataFrame(
    data=[[1,2], [3,4]],
    columns=['col1', 'col2']
)

print "data"
print data
print ""

data.to_csv('mydata.csv', sep="\t", index=False)

tar = tarfile.open('test.tar', 'w')
tar.add('mydata.csv')
tar.close()

tar = tarfile.open('test.tar', 'r')
myfile = tar.extractfile('mydata.csv')
data2 = pd.read_csv(myfile, sep=r'\s+')

print "data2"
print data2
print ""

FYI, I run my tests in a docker container...

Result with pandas 0.19.2

root@0a35b054b4da:xxxx# pip install pandas==0.19.2
Collecting pandas==0.19.2
  Downloading pandas-0.19.2-cp27-cp27mu-manylinux1_x86_64.whl (17.2MB)
    100% |################################| 17.2MB 25kB/s 
Requirement already satisfied (use --upgrade to upgrade): pytz>=2011k in /usr/local/lib/python2.7/dist-packages (from pandas==0.19.2)
Requirement already satisfied (use --upgrade to upgrade): python-dateutil in /usr/local/lib/python2.7/dist-packages (from pandas==0.19.2)
Requirement already satisfied (use --upgrade to upgrade): numpy>=1.7.0 in /usr/local/lib/python2.7/dist-packages (from pandas==0.19.2)
Requirement already satisfied (use --upgrade to upgrade): six>=1.5 in /usr/lib/python2.7/dist-packages (from python-dateutil->pandas==0.19.2)
Installing collected packages: pandas
  Found existing installation: pandas 0.20.1
    Uninstalling pandas-0.20.1:
      Successfully uninstalled pandas-0.20.1
Successfully installed pandas-0.19.2
You are using pip version 8.1.1, however version 9.0.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
root@0a35b054b4da:xxxx# python test_tar_pd.py 

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-78-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.19.2
nose: None
pip: 8.1.1
setuptools: 20.7.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
statsmodels: 0.8.0
xarray: None
IPython: 5.3.0
sphinx: None
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999
httplib2: 0.10.3
apiclient: 1.6.2
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
boto: None
pandas_datareader: None
data
   col1  col2
0     1     2
1     3     4

data2
   col1  col2
0     1     2
1     3     4

Result with pandas 0.20.1

root@0a35b054b4da:xxx# pip install pandas==0.20.1
Collecting pandas==0.20.1
  Downloading pandas-0.20.1-cp27-cp27mu-manylinux1_x86_64.whl (22.3MB)
    100% |################################| 22.3MB 19kB/s 
Requirement already satisfied (use --upgrade to upgrade): pytz>=2011k in /usr/local/lib/python2.7/dist-packages (from pandas==0.20.1)
Requirement already satisfied (use --upgrade to upgrade): python-dateutil in /usr/local/lib/python2.7/dist-packages (from pandas==0.20.1)
Requirement already satisfied (use --upgrade to upgrade): numpy>=1.7.0 in /usr/local/lib/python2.7/dist-packages (from pandas==0.20.1)
Requirement already satisfied (use --upgrade to upgrade): six>=1.5 in /usr/lib/python2.7/dist-packages (from python-dateutil->pandas==0.20.1)
Installing collected packages: pandas
  Found existing installation: pandas 0.19.2
    Uninstalling pandas-0.19.2:
      Successfully uninstalled pandas-0.19.2
Successfully installed pandas-0.20.1
You are using pip version 8.1.1, however version 9.0.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
root@0a35b054b4da:xxx# python test_tar_pd.py 

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-78-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.20.1
pytest: None
pip: 8.1.1
setuptools: 20.7.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 5.3.0
sphinx: None
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
pandas_gbq: None
pandas_datareader: None
data
   col1  col2
0     1     2
1     3     4

Traceback (most recent call last):
  File "test_tar_pd.py", line 23, in <module>
    data2 = pd.read_csv(myfile, sep=r'\s+')
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 655, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 392, in _read
    filepath_or_buffer, encoding, compression)
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/common.py", line 210, in get_filepath_or_buffer
    raise ValueError(msg.format(_type=type(filepath_or_buffer)))
ValueError: Invalid file path or buffer object type: <class 'tarfile.ExFileObject'>
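One possible workaround on an affected install, offered only as a sketch and not something proposed in this thread, is to copy the tar member into an `io.BytesIO` and hand that buffer to the parser instead of the tar member itself. `io.BytesIO` supports `read()`, `__iter__` and a next method, so it should satisfy the stricter check. The archive below is built in memory purely to keep the example self-contained; the member name and contents mirror the test case above.

```python
import io
import tarfile

# Build a small tar archive in memory so the sketch needs no files on disk.
csv_bytes = b"col1\tcol2\n1\t2\n3\t4\n"
archive = io.BytesIO()
with tarfile.open(fileobj=archive, mode="w") as tar:
    info = tarfile.TarInfo(name="mydata.csv")
    info.size = len(csv_bytes)
    tar.addfile(info, io.BytesIO(csv_bytes))
archive.seek(0)

# Copy the member's bytes into a BytesIO rather than passing the member object.
with tarfile.open(fileobj=archive, mode="r") as tar:
    member = tar.extractfile("mydata.csv")
    buffer = io.BytesIO(member.read())

# The buffer is a true iterator: next() works on it directly.
first_line = next(buffer)

# Rewind before handing the buffer to a parser, e.g. pd.read_csv(buffer, sep=r'\s+').
buffer.seek(0)
```

The trade-off is that the whole member is held in memory, which may not suit very large files.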

@jreback


jreback May 29, 2017

Contributor

IIRC @gfyoung fixed this by patching is_filelike, can't find the issue ATM


@gfyoung


gfyoung May 29, 2017

Member

@jreback : The relevant PR is #16150.


@jreback


jreback May 29, 2017

Contributor

hmm, so maybe this IS an issue on py2.7? maybe tarfile is not a proper iterator? (or doesn't have read?)


@gfyoung


gfyoung May 29, 2017

Member

This is indeed a compatibility issue. Turns out tarfile.ExFileObject is not a proper iterator object in our eyes under the Python 2.x implementation (it has no next or __next__ attribute, but Python 3.x tarfile.ExFileObject has the __next__ attribute).

It seems that is_file_like should check only for the __iter__ attribute?
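The difference between the two checks discussed here can be sketched as follows. Both functions are hypothetical illustrations written for this thread, not the pandas implementation, and `IterableOnlyFile` is a made-up stand-in for an object like the Python 2.x `tarfile.ExFileObject` that is iterable but has no next method.

```python
def is_file_like_strict(obj):
    """Hypothetical strict check: read/write plus being its own iterator."""
    has_io = hasattr(obj, "read") or hasattr(obj, "write")
    is_iterator = hasattr(obj, "__iter__") and (
        hasattr(obj, "__next__") or hasattr(obj, "next")
    )
    return has_io and is_iterator


def is_file_like_relaxed(obj):
    """Hypothetical relaxed check: read/write plus __iter__ only."""
    has_io = hasattr(obj, "read") or hasattr(obj, "write")
    return has_io and hasattr(obj, "__iter__")


class IterableOnlyFile(object):
    """Stand-in object: has read() and __iter__, but no next method."""

    def __init__(self, lines):
        self._lines = list(lines)

    def read(self):
        return "".join(self._lines)

    def __iter__(self):
        # Returns a separate iterator instead of iterating over itself.
        return iter(self._lines)


fp = IterableOnlyFile(["a\n", "b\n"])
strict_result = is_file_like_strict(fp)    # rejected: no next method
relaxed_result = is_file_like_relaxed(fp)  # accepted: __iter__ is enough
```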


@jreback


jreback May 29, 2017

Contributor

yeah maybe relax in the is_file_like only


gfyoung added a commit to gfyoung/pandas that referenced this issue May 29, 2017

COMPAT: Consider Python 2.x tarfiles file-like
Tarfile.ExFileObject has no "next" method in
Python 2.x, making it an invalid file-like
object in read_csv. However, they can be
read in just fine, meaning our check is too
strict for file-like. This commit relaxes
the check to just look for "__iter__".

Closes gh-16530.

@gfyoung


gfyoung May 29, 2017

Member

@jreback : So the C engine doesn't require that the file-like have a next method, but the Python engine does (we explicitly call next(self.data)). This presents a slight dilemma then: how do we check that a file-like has next and that the engine specified is Python? If possible, I would want to switch to the C engine.


@gfyoung


gfyoung May 29, 2017

Member

Also, I should add that reading tarfile objects isn't actually feasible in Python's csv library, so this applies only to the C engine in Python 2.x.


@jreback jreback added this to the 0.20.2 milestone May 30, 2017

@jtratner


jtratner May 31, 2017

Contributor

@gfyoung - I hit the same issue when trying to pass Luigi's ReadableS3File to read_csv in pandas 0.20.1.

My understanding is that a file-like object is not required to support both of these calls directly:

next(fp)
iter(fp)

What is required is that iter(fp) produce an iterator with a next method:

it = iter(fp)
next(it)

So if pandas wants to use next(self.data), perhaps it just needs to call iter on it first and work from there?
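The distinction between calling next on the object and calling it on the result of iter can be shown with a small sketch. `LineSource` is a hypothetical class invented for illustration; it is iterable but is not itself an iterator.

```python
class LineSource(object):
    """Hypothetical file-like object: iterable, but with no next method."""

    def __init__(self, text):
        self._text = text

    def read(self):
        return self._text

    def __iter__(self):
        # A generator function, so each iter() call returns a fresh iterator.
        for line in self._text.splitlines(True):
            yield line


fp = LineSource("col1\tcol2\n1\t2\n")

# Calling next() on the object itself fails, because it is not an iterator.
try:
    next(fp)
    direct_next_works = True
except TypeError:
    direct_next_works = False

# Calling iter() first yields an iterator on which next() works.
it = iter(fp)
header = next(it)
```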


@jtratner


jtratner May 31, 2017

Contributor

But I'm not sure this is explicitly defined anywhere, so much as it looks like an in-practice kind of thing.


@gfyoung


gfyoung May 31, 2017

Member

@jtratner : For an object to be file-like, I am proposing that the object just have an __iter__ method (it need not have next or __next__). What you propose might work but would require a little more beefing up, as iter objects are not file-like by themselves. If we combine attributes from iter and the object we are wrapping, we could get a valid file-like.

Interesting idea. Worth pursuing once this issue gets resolved.
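A rough sketch of what combining the two might look like: a wrapper that takes its next method from `iter(obj)` and delegates every other attribute to the wrapped object. `IteratorFileWrapper` and `IterableOnlyFile` are hypothetical names used only for illustration; nothing like this is confirmed to exist in pandas.

```python
class IteratorFileWrapper(object):
    """Hypothetical wrapper: iteration from iter(obj), the rest delegated."""

    def __init__(self, wrapped):
        self._wrapped = wrapped
        self._lines = iter(wrapped)

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._lines)

    next = __next__  # Python 2 spelling of the same method

    def __getattr__(self, name):
        # Only reached for attributes the wrapper does not define itself.
        return getattr(self._wrapped, name)


class IterableOnlyFile(object):
    """Stand-in object: has read() and __iter__, but no next method."""

    def __init__(self, lines):
        self._lines = list(lines)

    def read(self):
        return "".join(self._lines)

    def __iter__(self):
        return iter(self._lines)


wrapped = IteratorFileWrapper(IterableOnlyFile(["a\n", "b\n"]))
first = next(wrapped)     # served by the iterator obtained from iter()
content = wrapped.read()  # delegated to the wrapped object
```

With a real file object, iteration and `read()` share one file position, so a wrapper like this would need more care than the stand-in suggests.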


@jtratner


jtratner May 31, 2017

Contributor

@gfyoung - cool, I agree with your definition :) - I was just reinforcing that the definition of file-like as "has __iter__" is upheld in many places, whereas having a next() method is not.

