ValueError: Big-endian buffer not supported on little-endian compiler in DataFrame repr #17396

Closed
DRMacIver opened this issue Aug 31, 2017 · 1 comment
Labels
Duplicate Report Duplicate issue or pull request

Comments

@DRMacIver

Code Sample, a copy-pastable example if possible

import pandas as pd
from collections import OrderedDict

data = OrderedDict((i, pd.Series([0], dtype=bool)) for i in range(20))  # 20 boolean columns
data[20] = pd.Series([0], dtype='>i2')  # one big-endian int16 column

x = pd.DataFrame(data)

repr(x)

Problem description

When run, this fails with:

Traceback (most recent call last):
  File "blah.py", line 9, in <module>
    repr(x)
  File "/home/david/.pyenv/versions/3.6.0/lib/python3.6/site-packages/pandas/core/base.py", line 72, in __repr__
    return str(self)
  File "/home/david/.pyenv/versions/3.6.0/lib/python3.6/site-packages/pandas/core/base.py", line 51, in __str__
    return self.__unicode__()
  File "/home/david/.pyenv/versions/3.6.0/lib/python3.6/site-packages/pandas/core/frame.py", line 586, in __unicode__
    line_width=width, show_dimensions=show_dimensions)
  File "/home/david/.pyenv/versions/3.6.0/lib/python3.6/site-packages/pandas/core/frame.py", line 1528, in to_string
    show_dimensions=show_dimensions)
  File "/home/david/.pyenv/versions/3.6.0/lib/python3.6/site-packages/pandas/io/formats/format.py", line 407, in __init__
    self._chk_truncate()
  File "/home/david/.pyenv/versions/3.6.0/lib/python3.6/site-packages/pandas/io/formats/format.py", line 468, in _chk_truncate
    frame.iloc[:, -col_num:]), axis=1)
  File "/home/david/.pyenv/versions/3.6.0/lib/python3.6/site-packages/pandas/core/indexing.py", line 1325, in __getitem__
    return self._getitem_tuple(key)
  File "/home/david/.pyenv/versions/3.6.0/lib/python3.6/site-packages/pandas/core/indexing.py", line 1678, in _getitem_tuple
    retval = getattr(retval, self.name)._getitem_axis(key, axis=axis)
  File "/home/david/.pyenv/versions/3.6.0/lib/python3.6/site-packages/pandas/core/indexing.py", line 1724, in _getitem_axis
    return self._get_slice_axis(key, axis=axis)
  File "/home/david/.pyenv/versions/3.6.0/lib/python3.6/site-packages/pandas/core/indexing.py", line 1697, in _get_slice_axis
    return self._slice(slice_obj, axis=axis, kind='iloc')
  File "/home/david/.pyenv/versions/3.6.0/lib/python3.6/site-packages/pandas/core/indexing.py", line 142, in _slice
    return self.obj._slice(obj, axis=axis, kind=kind)
  File "/home/david/.pyenv/versions/3.6.0/lib/python3.6/site-packages/pandas/core/generic.py", line 1749, in _slice
    result = self._constructor(self._data.get_slice(slobj, axis=axis))
  File "/home/david/.pyenv/versions/3.6.0/lib/python3.6/site-packages/pandas/core/internals.py", line 3389, in get_slice
    new_blocks = self._slice_take_blocks_ax0(slobj)
  File "/home/david/.pyenv/versions/3.6.0/lib/python3.6/site-packages/pandas/core/internals.py", line 3973, in _slice_take_blocks_ax0
    fill_tuple=None))
  File "/home/david/.pyenv/versions/3.6.0/lib/python3.6/site-packages/pandas/core/internals.py", line 1042, in take_nd
    allow_fill=False)
  File "/home/david/.pyenv/versions/3.6.0/lib/python3.6/site-packages/pandas/core/algorithms.py", line 1471, in take_nd
    func(arr, indexer, out, fill_value)
  File "pandas/_libs/algos_take_helper.pxi", line 1769, in pandas._libs.algos.take_2d_axis1_int16_int16 (pandas/_libs/algos.c:87069)
ValueError: Big-endian buffer not supported on little-endian compiler

The number of initial columns (20) seems to be significant: it doesn't fail with 19. The order of the columns does not appear to be significant.

The problem appears to be specific to repr. I can create a data frame with these column dtypes and it appears to work normally.

Expected Output

I would expect this to give no output and not crash. I would also find it reasonable to reject the DataFrame at point of construction if this is in some way an unsupported use case, but crashing only at the point of calculating the repr is very surprising.

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.0.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-43-Microsoft
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8

pandas: 0.20.3
pytest: 3.1.2
pip: 9.0.1
setuptools: 36.0.1
Cython: None
numpy: 1.13.1
scipy: 0.19.0
xarray: None
IPython: 5.1.0
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
pandas_gbq: None
pandas_datareader: None

@chris-b1
Contributor

See the docs here if you haven't seen them: we don't support a non-native byte order. The reason it only comes up with more columns is that the truncated representation requires an indexing operation, and that operation is what is failing.
https://pandas.pydata.org/pandas-docs/stable/gotchas.html#byte-ordering-issues
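
As a rough illustration of the workaround implied above, the data can be converted to native byte order before it reaches pandas. This is a minimal sketch, not code from the pandas docs: `to_native_byteorder` is a hypothetical helper name, and it relies only on NumPy's `dtype.isnative`, `dtype.newbyteorder` and `ndarray.astype`.

```python
import sys

import numpy as np


def to_native_byteorder(arr):
    """Return `arr` unchanged if its dtype is native, else a converted copy.

    Hypothetical helper for illustration; not part of pandas or NumPy.
    """
    if arr.dtype.isnative:
        return arr
    # newbyteorder("=") builds the native-order version of the dtype;
    # astype then converts the values (it does not reinterpret the bytes).
    return arr.astype(arr.dtype.newbyteorder("="))


# Build an array in the byte order opposite to this machine's,
# so the example behaves the same on little- and big-endian hosts.
non_native = ">" if sys.byteorder == "little" else "<"
raw = np.array([1, 2, 3], dtype=non_native + "i2")

fixed = to_native_byteorder(raw)
print(raw.dtype.isnative, fixed.dtype.isnative)
```

The converted array could then be passed to `pd.Series` or `pd.DataFrame` in place of the original one; whether a given pandas version still needs this is something to check against its own docs.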

I agree it would be much better to check on construction. There is an open issue about that, #4737. PR welcome!
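
Until such a check exists, something similar can be done in user code before building the frame. The following is only a sketch of the idea, not the approach proposed in #4737: `find_non_native_columns` is a hypothetical helper, and it assumes each column can be viewed as a NumPy array via `np.asarray`.

```python
import sys

import numpy as np


def find_non_native_columns(columns):
    """Return the names of columns whose dtype is not in native byte order.

    `columns` maps a column name to an array-like.
    Hypothetical helper for illustration; not a pandas API.
    """
    return [
        name
        for name, values in columns.items()
        if not np.asarray(values).dtype.isnative
    ]


# Pick the byte order opposite to this machine's so the check triggers anywhere.
non_native = ">" if sys.byteorder == "little" else "<"
columns = {
    "flag": np.array([False]),                       # bool has no byte order
    "value": np.array([0], dtype=non_native + "i2"),  # non-native int16
}

bad = find_non_native_columns(columns)
if bad:
    print("columns with non-native byte order:", bad)
```

Raising an error at this point, rather than printing, would surface the problem at construction time instead of at the first indexing operation.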

@chris-b1 chris-b1 added the Duplicate Report Duplicate issue or pull request label Aug 31, 2017
@chris-b1 chris-b1 added this to the No action milestone Aug 31, 2017
astrogreen pushed a commit to astrogreen/fidia that referenced this issue Mar 19, 2018
…in native byte order.

This may be present elsewhere, and should probably be searched for. This probably should be done when column data types are standardised (e.g. are we always going to return Pandas Series for these kinds of things?)

Interestingly, this isn't checked at construction, so the bug only appears when trying to do certain actions with the DataFrame. See pandas-dev/pandas#17396

branch: fidia-devel