read_msgpack returns garbage for non-existing files in python2 #15296

Closed
languitar opened this Issue Feb 3, 2017 · 4 comments

@languitar

languitar commented Feb 3, 2017

Code Sample, a copy-pastable example if possible

In [1]: import pandas as pd

In [2]: pd.__version__
Out[2]: u'0.19.2'

In [3]: pd.read_msgpack('/tmp/bla.txt')
Out[3]: [47, 116, 109, 112, 47, 98, 108, 97, 46, 116, 120, 116]

The file /tmp/bla.txt does not exist.

The same code with the same pandas version on Python 3 correctly raises an exception.

Problem description

Reading a non-existent file returns a list of integers instead of raising an exception when pandas is used on Python 2. This makes errors hard to detect.

Expected Output

An exception.

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.13.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.6-1-ARCH
machine: x86_64
processor: 
byteorder: little
LC_ALL: None
LANG: en_US.utf8
LOCALE: None.None

pandas: 0.19.2
nose: 1.3.7
pip: 9.0.1
setuptools: 34.1.0
Cython: None
numpy: 1.11.3
scipy: 0.18.1
statsmodels: None
xarray: None
IPython: 5.1.0
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: 1.2.0
tables: 3.3.0
numexpr: 2.6.1
matplotlib: 1.5.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.5
boto: None
pandas_datareader: None
@jreback

Contributor

jreback commented Feb 3, 2017

I recall this being fixed a little while back. Are you sure you are running with 0.19.2?

In [1]: pd.read_msgpack('/tmp/bla.txt')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-1-7a3c5705a157> in <module>()
----> 1 pd.read_msgpack('/tmp/bla.txt')

/Users/jreback/miniconda3/envs/pandas/lib/python3.5/site-packages/pandas/io/packers.py in read_msgpack(path_or_buf, encoding, iterator, **kwargs)
    216         return read(path_or_buf)
    217 
--> 218     raise ValueError('path_or_buf needs to be a string file path or file-like')
    219 
    220 dtype_dict = {21: np.dtype('M8[ns]'),

ValueError: path_or_buf needs to be a string file path or file-like

In [2]: pd.__version__
Out[2]: '0.19.2'
@languitar

languitar commented Feb 3, 2017

Yes, definitely 0.19.2. It seems to be fixed only on Python 3.

@jreback

Contributor

jreback commented Feb 3, 2017

Ahh, that is possible (this path must not be well tested).

Want to add a test and a fix?

I think we do a better job of this nowadays in things like read_json, IIRC. In other words, if it's an actual string, you ask the file system whether it's a valid path, then raise a better error message (e.g. IOError: file not found).

Yeah, I see this doesn't use pandas.io.common to actually open the file; instead it passes the argument straight to the msgpack routines. pandas.io.common is where this kind of checking happens for e.g. read_csv.
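The check described above could look roughly like the following. This is only a sketch, not the pandas implementation: `open_msgpack_source` is a hypothetical helper name, and it is written for Python 3, where `str` and `bytes` are distinct types (on Python 2 they are not, which is exactly what makes this case tricky there).

```python
import io
import os


def open_msgpack_source(path_or_buf):
    """Hypothetical helper: return a binary file-like object or raise.

    Assumes Python 3, where a text string can safely be treated as a path
    and a bytes object as already-serialized data.
    """
    if isinstance(path_or_buf, str):
        # A text string is always a path: ask the file system first.
        if not os.path.exists(path_or_buf):
            raise FileNotFoundError("File does not exist: %r" % path_or_buf)
        return open(path_or_buf, "rb")
    if isinstance(path_or_buf, bytes):
        # Raw bytes are treated as in-memory msgpack data.
        return io.BytesIO(path_or_buf)
    if hasattr(path_or_buf, "read"):
        # Already file-like: hand it back unchanged.
        return path_or_buf
    raise ValueError("path_or_buf needs to be a string file path or file-like")
```

With a guard like this, a missing path fails loudly before anything reaches the msgpack unpacker.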

@jreback jreback added this to the 0.20.0 milestone Feb 3, 2017

@chris-b1

Contributor

chris-b1 commented Feb 3, 2017

This is a dupe of #12225 (I closed that one). The issue is that in Python 2 the path string is interpreted as bytes, so it is treated as msgpack data rather than as a file path.
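This also explains the exact list in the original report. In MessagePack, every byte from 0x00 to 0x7f is a complete value on its own (a "positive fixint"), so the ASCII bytes of the path decode to one small integer each. A minimal stdlib-only sketch; `decode_positive_fixints` is a hypothetical helper for illustration, not part of msgpack or pandas:

```python
def decode_positive_fixints(data):
    """Decode bytes as a stream of MessagePack positive fixints (0x00-0x7f).

    Hypothetical helper for illustration only; a real unpacker handles
    many more type tags than this.
    """
    values = []
    for byte in bytearray(data):
        if byte > 0x7F:
            raise ValueError("not a positive fixint: 0x%02x" % byte)
        values.append(byte)
    return values


path_bytes = b"/tmp/bla.txt"
decoded = decode_positive_fixints(path_bytes)
print(decoded)
# [47, 116, 109, 112, 47, 98, 108, 97, 46, 116, 120, 116]
```

The result matches `Out[3]` in the report: the "garbage" is simply the path string itself, read as data.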

@jreback jreback modified the milestones: 0.20.0, Next Major Release Mar 23, 2017

chrisburr added a commit to chrisburr/pandas that referenced this issue May 27, 2017

chrisburr added a commit to chrisburr/pandas that referenced this issue May 28, 2017

chrisburr added a commit to chrisburr/pandas that referenced this issue Aug 20, 2017

chrisburr added a commit to chrisburr/pandas that referenced this issue Oct 28, 2017

@jreback jreback modified the milestones: Next Major Release, 0.22.0 Oct 28, 2017

TomAugspurger added a commit that referenced this issue Oct 30, 2017

BUG: read_msgpack raise an error when passed an non existent path in Python 2 (#16523)

* TST: Add tests for trying to read non-existent files #15296

* BUG: Fix passing non-existant file to read_msgpack #15296

* TST: Fix io.test_common.test_read_non_existant for external modules

* CLN: Import FileNotFoundError in tests/io/test_common.py

peterpanmj added a commit to peterpanmj/pandas that referenced this issue Oct 31, 2017

No-Stream added a commit to No-Stream/pandas that referenced this issue Nov 28, 2017
