pandas read_csv no longer supports file-like objects from tarfile (pandas 0.20.1) #16530
Comments
|
Works as expected on windows with 0.20.1.
Which Python? You should include the |
gjanvier commented May 29, 2017
gjanvier closed this May 29, 2017
|
Did you mean to close? If it isn't working for you, please reopen. Could be combination of Python version/OS. |
gjanvier reopened this May 29, 2017
gjanvier commented May 29, 2017
Oops, sorry, reopened.
|
Working for me as well with python3 and master. @gjanvier can you make a fully reproducible example (including writing the csv and adding it to the tar)?
gjanvier commented May 29, 2017
Sure, here is a test case.

import tarfile
import pandas as pd
pd.show_versions()
data = pd.DataFrame(
data=[[1,2], [3,4]],
columns=['col1', 'col2']
)
print "data"
print data
print ""
data.to_csv('mydata.csv', sep="\t", index=False)
tar = tarfile.open('test.tar', 'w')
tar.add('mydata.csv')
tar.close()
tar = tarfile.open('test.tar', 'r')
myfile = tar.extractfile('mydata.csv')
data2 = pd.read_csv(myfile, sep=r'\s+')
print "data2"
print data2
print ""

FYI, I run my tests in a Docker container...

Result with pandas 0.19.2

root@0a35b054b4da:xxxx# pip install pandas==0.19.2
Collecting pandas==0.19.2
Downloading pandas-0.19.2-cp27-cp27mu-manylinux1_x86_64.whl (17.2MB)
100% |################################| 17.2MB 25kB/s
Requirement already satisfied (use --upgrade to upgrade): pytz>=2011k in /usr/local/lib/python2.7/dist-packages (from pandas==0.19.2)
Requirement already satisfied (use --upgrade to upgrade): python-dateutil in /usr/local/lib/python2.7/dist-packages (from pandas==0.19.2)
Requirement already satisfied (use --upgrade to upgrade): numpy>=1.7.0 in /usr/local/lib/python2.7/dist-packages (from pandas==0.19.2)
Requirement already satisfied (use --upgrade to upgrade): six>=1.5 in /usr/lib/python2.7/dist-packages (from python-dateutil->pandas==0.19.2)
Installing collected packages: pandas
Found existing installation: pandas 0.20.1
Uninstalling pandas-0.20.1:
Successfully uninstalled pandas-0.20.1
Successfully installed pandas-0.19.2
You are using pip version 8.1.1, however version 9.0.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
root@0a35b054b4da:xxxx# python test_tar_pd.py
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-78-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.19.2
nose: None
pip: 8.1.1
setuptools: 20.7.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
statsmodels: 0.8.0
xarray: None
IPython: 5.3.0
sphinx: None
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999
httplib2: 0.10.3
apiclient: 1.6.2
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
boto: None
pandas_datareader: None
data
   col1  col2
0     1     2
1     3     4

data2
   col1  col2
0     1     2
1     3     4

Result with pandas 0.20.1

root@0a35b054b4da:xxx# pip install pandas==0.20.1
Collecting pandas==0.20.1
Downloading pandas-0.20.1-cp27-cp27mu-manylinux1_x86_64.whl (22.3MB)
100% |################################| 22.3MB 19kB/s
Requirement already satisfied (use --upgrade to upgrade): pytz>=2011k in /usr/local/lib/python2.7/dist-packages (from pandas==0.20.1)
Requirement already satisfied (use --upgrade to upgrade): python-dateutil in /usr/local/lib/python2.7/dist-packages (from pandas==0.20.1)
Requirement already satisfied (use --upgrade to upgrade): numpy>=1.7.0 in /usr/local/lib/python2.7/dist-packages (from pandas==0.20.1)
Requirement already satisfied (use --upgrade to upgrade): six>=1.5 in /usr/lib/python2.7/dist-packages (from python-dateutil->pandas==0.20.1)
Installing collected packages: pandas
Found existing installation: pandas 0.19.2
Uninstalling pandas-0.19.2:
Successfully uninstalled pandas-0.19.2
Successfully installed pandas-0.20.1
You are using pip version 8.1.1, however version 9.0.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
root@0a35b054b4da:xxx# python test_tar_pd.py
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-78-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.20.1
pytest: None
pip: 8.1.1
setuptools: 20.7.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 5.3.0
sphinx: None
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
pandas_gbq: None
pandas_datareader: None
data
   col1  col2
0     1     2
1     3     4

Traceback (most recent call last):
  File "test_tar_pd.py", line 23, in <module>
    data2 = pd.read_csv(myfile, sep=r'\s+')
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 655, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 392, in _read
    filepath_or_buffer, encoding, compression)
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/common.py", line 210, in get_filepath_or_buffer
    raise ValueError(msg.format(_type=type(filepath_or_buffer)))
ValueError: Invalid file path or buffer object type: <class 'tarfile.ExFileObject'>
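One possible way to sidestep the type check in the traceback above is to copy the tar member into an in-memory buffer before handing it to pandas. The sketch below is only that, a sketch: `member_to_buffer` is a hypothetical helper name (not part of pandas or tarfile), and it assumes the member is small enough to hold in memory.

```python
import io
import tarfile


def member_to_buffer(tar_path, member_name):
    """Return one tar member's bytes wrapped in an io.BytesIO buffer.

    Hypothetical helper for illustration; not a pandas or tarfile API.
    """
    with tarfile.open(tar_path, "r") as tar:
        extracted = tar.extractfile(member_name)
        if extracted is None:
            # extractfile returns None for members that are not regular files
            raise ValueError("%r is not a regular file in the archive" % member_name)
        # Read everything now, because the archive is closed on leaving the block.
        return io.BytesIO(extracted.read())
```

With the test case above, the result would be passed on as `pd.read_csv(member_to_buffer('test.tar', 'mydata.csv'), sep=r'\s+')`. The trade-off is an extra copy of the data in memory.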
|
IIRC @gfyoung fixed this by patching |
|
hmm, so maybe this IS an issue on py2.7? maybe |
jreback added Compat IO CSV labels May 29, 2017
|
This is indeed a compatibility issue. Turns out I guess we just need to check for the |
|
yeah maybe relax in the is_file_like only |
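To make the idea of a relaxed check concrete, here is a rough sketch of what one might look like. The function name `looks_file_like` is made up for illustration, and this is not a copy of the pandas implementation; it only assumes, as discussed later in the thread, that "has `__iter__`" is enough and that a `next` method on the object itself should not be required.

```python
import io


def looks_file_like(obj):
    """Hypothetical relaxed check; an illustration, not pandas' actual code."""
    # The object must be readable or writable ...
    if not (hasattr(obj, "read") or hasattr(obj, "write")):
        return False
    # ... and iterable. No next/__next__ method is demanded on the object
    # itself, so objects that only define __iter__ are accepted.
    return hasattr(obj, "__iter__")


# An in-memory buffer passes; a plain path string does not (no read/write).
buffer_ok = looks_file_like(io.BytesIO(b"col1\tcol2\n"))
path_ok = looks_file_like("mydata.csv")
```

Under this definition, a string path is still routed to the path-handling branch, while anything with `read` and `__iter__` is treated as a buffer.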
gfyoung added a commit to gfyoung/pandas that referenced this issue May 29, 2017: 957c67e
gfyoung referenced this issue May 29, 2017: COMPAT: Consider Python 2.x tarfiles file-like #16533 (Merged)
gfyoung added a commit to gfyoung/pandas that referenced this issue May 29, 2017: ed929a7
|
@jreback : So the C engine doesn't require that the file-like have a |
gfyoung added a commit to gfyoung/pandas that referenced this issue May 29, 2017: 6113567
|
Also, I should add reading |
gfyoung added commits to gfyoung/pandas that referenced this issue May 29, 2017: 2e829f2, 7feeaf7, e05cf2a, f17a20c, a9c9365
jreback added this to the 0.20.2 milestone May 30, 2017
gfyoung added commits to gfyoung/pandas that referenced this issue May 30, 2017: 6dd7837, 5a6bad4, 0df7b2c, e236ba5
gfyoung added a commit to gfyoung/pandas that referenced this issue May 31, 2017: 8611b79
|
@gfyoung - I hit the same issue when trying to pass Luigi's
My understanding is that while this is not required to be true for a file-like:
What is required is that it produce an iterator with a next method.
So if pandas wants to use |
|
But I'm not sure this is explicitly defined anywhere, so much as it looks like an in-practice kind of thing. |
|
@jtratner : For an object to be file-like, I am proposing that the object just have an
Interesting idea. Worth pursuing once this issue gets resolved. |
|
@gfyoung - cool I agree with your definition :) - just was reinforcing that the definition of file-like as "has iter" is upheld many places but not having the |
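The distinction being discussed here, between an object that is merely iterable and one that is itself an iterator, can be shown with a toy class. `LineSource` below is invented for this illustration and does not correspond to any class in pandas, tarfile, or Luigi.

```python
class LineSource(object):
    """Iterable but not an iterator: defines __iter__, no next/__next__."""

    def __init__(self, lines):
        self._lines = list(lines)

    def __iter__(self):
        # Each call hands back a fresh generator, which does have a next method.
        for line in self._lines:
            yield line


source = LineSource(["col1\tcol2\n", "1\t2\n"])

# Asking the object for an iterator works, and that iterator supports next().
first = next(iter(source))

# Calling next() on the object directly fails, because the object itself
# has no next/__next__ method.
try:
    next(source)
except TypeError:
    direct_next_failed = True
else:
    direct_next_failed = False
```

A check that requires a `next` method on the object rejects `LineSource`, while a check that only requires `__iter__` accepts it.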
gfyoung added commits to gfyoung/pandas that referenced this issue May 31, 2017: 0526fd6, 7a5fcd3, 08efe9c
gfyoung added a commit to gfyoung/pandas that referenced this issue Jun 1, 2017: 7c59fc9
gjanvier commented May 29, 2017
Code Sample, a copy-pastable example if possible
This code generates this error:
Problem description
This code works with pandas 0.19.2 but fails with 0.20.1.
According to pandas doc for read_csv:
I guess the new validations are too restrictive?