to_msgpack fails for Dataframes with numpy bools #18390

Closed
PhyNerd opened this Issue Nov 20, 2017 · 3 comments

Contributor

PhyNerd commented Nov 20, 2017

Code Sample, a copy-pastable example if possible

import pandas as pd
import numpy as np
pd.DataFrame(pd.Series((np.bool_(1), 2, 3, 4, 5))).to_msgpack('test.msg')

Problem description

Exporting a DataFrame that contains numpy bools with DataFrame.to_msgpack() raises the following exception:

  File "pandas/io/msgpack/_packer.pyx", line 230, in pandas.io.msgpack._packer.Packer.pack
  File "pandas/io/msgpack/_packer.pyx", line 232, in pandas.io.msgpack._packer.Packer.pack
  File "pandas/io/msgpack/_packer.pyx", line 191, in pandas.io.msgpack._packer.Packer._pack
  File "pandas/io/msgpack/_packer.pyx", line 220, in pandas.io.msgpack._packer.Packer._pack
  File "pandas/io/msgpack/_packer.pyx", line 191, in pandas.io.msgpack._packer.Packer._pack
  File "pandas/io/msgpack/_packer.pyx", line 220, in pandas.io.msgpack._packer.Packer._pack
  File "pandas/io/msgpack/_packer.pyx", line 227, in pandas.io.msgpack._packer.Packer._pack
TypeError: can't serialize True

I'm guessing this is because the isinstance check on line 136 of pandas.io.msgpack._packer.pyx returns False for numpy.bool_.
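The guess above can be checked outside of pandas. The snippet below is only a minimal sketch of the type relationship, not the packer's actual code: numpy.bool_ is not a subclass of Python's bool, so an isinstance check against bool alone rejects it, while a check against both types accepts it. The variable names are made up for illustration.

```python
import numpy as np

value = np.bool_(1)

# np.bool_ is its own type and does not inherit from the built-in bool,
# so a check against bool alone does not match it.
is_python_bool = isinstance(value, bool)

# Checking against both types accepts NumPy and Python booleans alike.
is_any_bool = isinstance(value, (bool, np.bool_))

print(is_python_bool, is_any_bool)
```

This prints `False True`, which is consistent with the packer falling through to its "can't serialize" branch.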

Expected Output

I'd expect DataFrames with numpy bools in them to serialize just like ones with Python bools do.
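Until the packer handles numpy bools, one possible workaround is to convert them to Python bools before exporting. This is a sketch under assumptions, not the fix that went into pandas: `to_builtin_bool` is a hypothetical helper, and the column is built with `dtype=object` explicitly so the example does not depend on dtype inference.

```python
import numpy as np
import pandas as pd


def to_builtin_bool(value):
    """Hypothetical helper: turn numpy bools into Python bools, leave other values alone."""
    return bool(value) if isinstance(value, np.bool_) else value


# An object column mixing a numpy bool, a missing value and an integer
column = pd.Series([np.bool_(True), None, 2], dtype=object)

# Apply the helper element by element
cleaned = column.map(to_builtin_bool)
print(cleaned.tolist())
```

The helper only touches values that are numpy.bool_, so integers and None pass through unchanged.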

Output of pd.show_versions()

pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.14.final.0
python-bits: 64
OS: Darwin
OS-release: 17.3.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: de_DE.UTF-8
LOCALE: None.None

pandas: 0.21.0
pytest: 3.1.2
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.13.3
scipy: 0.19.1
pyarrow: 0.7.1
xarray: None
IPython: 5.3.0
sphinx: 1.6.2
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.2
feather: 0.4.0
matplotlib: 2.0.2
openpyxl: 2.4.7
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.8.0
bs4: 4.6.0
html5lib: 0.999
sqlalchemy: 1.1.11
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Member

gfyoung commented Nov 20, 2017

@PhyNerd : Your diagnosis looks correct to me. That should be pretty easy to patch in light of that, so a PR would be most welcome!

PhyNerd referenced this issue Nov 20, 2017

Merged

Numpy bool msgpack bugfix #18395

4 of 4 tasks complete

jreback added this to the 0.21.1 milestone Nov 21, 2017

Contributor

jreback commented Nov 21, 2017

You realize that co-mingling integers and non-integers in one column is very inefficient, since the column is then stored as object dtype?

Contributor

PhyNerd commented Nov 21, 2017

Yes, I do.
The DataFrames I want to export often have missing data: some fields are None in columns that otherwise contain only np.bool_. In other cases users add mixed datatypes. There isn't much I can do about that.
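To illustrate the missing-data case described here, a small sketch (the variable names are invented for the example): a single None in an otherwise boolean column makes pandas fall back to object dtype, whereas a column made up only of bools gets the compact bool dtype.

```python
import numpy as np
import pandas as pd

# A boolean column with one missing entry
with_missing = pd.Series([np.bool_(True), None, np.bool_(False)])
print(with_missing.dtype)  # object

# Without the missing value pandas can use the compact bool dtype
all_bools = pd.Series([np.bool_(True), np.bool_(False)])
print(all_bools.dtype)  # bool
```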
