Misleading error for pd.read_msgpack #27160

Closed
smspillaz opened this issue Jul 1, 2019 · 5 comments

Comments

@smspillaz
commented Jul 1, 2019

Code Sample, a copy-pastable example if possible

import pandas as pd
pd.read_msgpack('this/path/does/not/exist')

Problem description

The error is misleading: it suggests a problem with the type of the argument being passed, not that the path does not exist. The error raised is:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".local/anaconda3/lib/python3.7/site-packages/pandas/io/packers.py", line 226, in read_msgpack
    raise ValueError('path_or_buf needs to be a string file path or file-like')
ValueError: path_or_buf needs to be a string file path or file-like

Expected Output

Raise an error indicating that the path was not found.
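
Until the message improves, a caller can get the expected behavior with a small guard of their own. The sketch below is not pandas code: `require_existing_path` is a hypothetical helper name, and it assumes that any `str` argument is meant as a file path.

```python
import os


def require_existing_path(path_or_buf):
    """Hypothetical guard: fail early, with a clear error, on a missing path.

    Assumption: a str argument is always intended as a file path.
    Non-string arguments (bytes, file-like objects) pass through unchanged.
    """
    if isinstance(path_or_buf, str) and not os.path.exists(path_or_buf):
        raise FileNotFoundError(f"File {path_or_buf!r} does not exist")
    return path_or_buf


try:
    require_existing_path("this/path/does/not/exist")
except FileNotFoundError as exc:
    # The message now names the missing path instead of blaming the type.
    print(exc)
```

The checked value can then be handed on, e.g. `pd.read_msgpack(require_existing_path(path))`.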

Output of pd.show_versions()

>>> pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.7.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.18.0-24-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.24.2
pytest: 4.3.1
pip: 19.0.3
setuptools: 40.8.0
Cython: 0.29.6
numpy: 1.16.2
scipy: 1.2.1
pyarrow: None
xarray: None
IPython: 7.4.0
sphinx: 1.8.5
patsy: 0.5.1
dateutil: 2.8.0
pytz: 2018.9
blosc: None
bottleneck: 1.2.1
tables: 3.5.1
numexpr: 2.6.9
feather: None
matplotlib: 3.0.3
openpyxl: 2.6.1
xlrd: 1.2.0
xlwt: 1.3.0
xlsxwriter: 1.1.5
lxml.etree: 4.3.2
bs4: 4.7.1
html5lib: 1.0.1
sqlalchemy: 1.3.1
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datarea

@TomAugspurger

Contributor

commented Jul 1, 2019

@jreback

Contributor

commented Jul 1, 2019

Yeah, this is true of several routines (e.g. read_json); there is an issue about this somewhere. But since msgpack is deprecated, this is out of scope (we would take a reasonable patch, though).

@jreback jreback closed this Jul 1, 2019

@jreback jreback modified the milestones: No action, 0.25.0 Jul 1, 2019

@jreback jreback reopened this Jul 1, 2019

@jreback

Contributor

commented Jul 1, 2019

Reopened, as a PR has been submitted :->

@simonjayhawkins

Member

commented Jul 1, 2019

I think the first argument of read_msgpack can also be data.

I think that assuming a string passed to pd.read_msgpack is a filepath and then raising if not found is OK?

Passing the data as bytes works as intended.

The docs for pandas.DataFrame.to_msgpack (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_msgpack.html?highlight=to_msgpack#pandas-dataframe-to-msgpack) are misleading: they suggest that a string is returned, when in fact bytes are returned:

path : string File path, buffer-like, or None
if None, return generated string
>>> import numpy as np
>>> import pandas as pd
>>> from pandas import DataFrame
>>> df = DataFrame(np.random.randn(10, 2))
>>>
>>>
>>> df.to_msgpack(None)
b'\x84\xa3typ\xadblock_manager\xa5klass\xa9DataFrame\xa4axes\x92\x86\xa3typ\xabrange_index\xa5klass\xaaRangeIndex\xa4name\xc0\xa5start\x00\xa4s
top\x02\xa4step\x01\x86\xa3typ\xabrange_index\xa5klass\xaaRangeIndex\xa4name\xc0\xa5start\x00\xa4stop\n\xa4step\x01\xa6blocks\x91\x86\xa4locs\x
86\xa3typ\xa7ndarray\xa5shape\x91\x02\xa4ndim\x01\xa5dtype\xa5int64\xa4data\xd8\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00
\x00\xa8compress\xc0\xa6values\xc7\xa0\x00A\x10\x94Z\x0f|\xd0?F]>R\xc7\xfc\xf5\xbf\xa4\xeb\xe2:\x07X\xc5\xbf&\x1bAje\t\xbb?\x98w9\x17:"\xe1?#\x
e4\xc9\xda\x86\xdf\xaf\xbf\xec\xe63K2\x03\xee\xbf\xad0%v\x11$\xda\xbf\xa1\x02@\xff\xb7\xc8\xff?\xb0G\x11\x02\x80\x13\xe1?)\xf8l\xcb~/\xd2?\xb2\
x17I\xeb\x91k\x03@\xbf\xfaj\xb2\x89\x14\xc2\xbf\xbd5\xba\xb3j\x1c\xed?u\xe504\x17\xaf\xd0\xbf\xc7\xa5\xc3\xf3\x12\xf1\xf4?\xe6\xf0\x05\xf2\xef\
xd6\x05@\xec\xeb\xd1\x80w}\xf0\xbfx\x94\x82\x10"U\xeb?.\xbdZI\x89X\xea?\xa5shape\x92\x02\n\xa5dtype\xa7float64\xa5klass\xaaFloatBlock\xa8compre
ss\xc0'
>>>
>>> pd.read_msgpack(df.to_msgpack(None))
sys:1: FutureWarning: The read_msgpack is deprecated and will be removed in a future version.
It is recommended to use pyarrow for on-the-wire transmission of pandas objects.
          0         1
0  0.257572  0.284149
1 -1.374214  2.427524
2 -0.166749 -0.141252
3  0.105612  0.909719
4  0.535428 -0.260687
5 -0.062252  1.308856
6 -0.937890  2.729950
7 -0.408451 -1.030632
8  1.986504  0.854142
9  0.533630  0.823308
>>>
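
The behaviour discussed in this comment can be sketched as a dispatch on the argument's type. This is an illustration only, not the code in pandas/io/packers.py or the patch in #27201: `classify_input` is a hypothetical name, and the rule that every `str` is a file path is the assumption proposed above.

```python
import io
import os


def classify_input(path_or_buf):
    """Hypothetical dispatcher: report how the first argument would be treated."""
    if isinstance(path_or_buf, str):
        # Assumption from the comment above: a str is always a file path,
        # so a missing file is reported as such.
        if not os.path.exists(path_or_buf):
            raise FileNotFoundError(f"File {path_or_buf!r} does not exist")
        return "path"
    if isinstance(path_or_buf, bytes):
        # Serialized data, e.g. the bytes returned by df.to_msgpack(None).
        return "data"
    if callable(getattr(path_or_buf, "read", None)):
        return "file-like"
    # Only genuinely unsupported types reach the original error message.
    raise ValueError("path_or_buf needs to be a string file path or file-like")


print(classify_input(b"\x84\xa3typ"))
print(classify_input(io.BytesIO(b"\x84\xa3typ")))
```

With this split, the ValueError is reserved for arguments of the wrong type, and a string that names a missing file gets its own error.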

@jreback jreback modified the milestones: 0.25.0, Contributions Welcome Jul 2, 2019

U09Kane added a commit to U09Kane/pandas that referenced this issue Jul 3, 2019

@U09Kane U09Kane referenced this issue Jul 3, 2019

Merged

Misleading error for pd.read_msgpack #27201

4 of 4 tasks complete

@jreback jreback modified the milestones: Contributions Welcome, 0.25.0 Jul 6, 2019

@happyshows

commented Jul 19, 2019

Hi,

I'm confused by the deprecation message. Which function in pyarrow is the replacement for read_msgpack?
