ENH: allow categoricals in msgpack #12573

Closed
wants to merge 1 commit into base: master

3 participants
@pwaller
Contributor

pwaller commented Mar 9, 2016

Supersedes #12191.

I've made a best effort to rebase this against master.

It seems to work here.

@jreback

Contributor

jreback commented Mar 9, 2016

  • add a note in whatsnew/0.18.1
@chris-b1 reviewed pandas/io/packers.py (outdated)
@pwaller

Contributor

pwaller commented Mar 9, 2016

> add a note in whatsnew/0.18.1

Done. Please let me know if it is to taste.

@jreback

Contributor

jreback commented Mar 16, 2016

ok #12574 has been merged. pls rebase/update.

@pwaller

Contributor

pwaller commented Mar 16, 2016

Rebased.

pandas/io/packers.py (outdated)

@@ -247,6 +258,15 @@ def unconvert(values, dtype, compress=None):
     if as_is_ext:
         values = values.data
     if is_categorical_dtype(dtype):

@jreback

jreback Mar 16, 2016

Contributor

same here, should be in decode

@pwaller

pwaller Mar 16, 2016

Contributor

Done.

@pwaller

Contributor

pwaller commented Mar 16, 2016

Wow, I think I managed to clear all of your comments. Everything suddenly made sense in the end. Please let me know if I missed something and if I should squash or reword my commits.

@pwaller

Contributor

pwaller commented Mar 16, 2016

I broke the NDFrame tests. Investigating.

-    'G': [Timestamp('20130603', tz='CET')] * 5
+    'G': [Timestamp('20130603', tz='CET')] * 5,
+    'H': Categorical(['a', 'b', 'c', 'd', 'e']),
+    'I': Categorical(['a', 'b', 'c', 'd', 'e'], ordered=True),

@pwaller

pwaller Mar 16, 2016

Contributor

I'm in a strange position here. I added this test - it passes, but I didn't add the relevant code in convert/unconvert to pass the ordered parameter through. What gives?

@jreback

jreback Mar 16, 2016

Contributor

That check is not used at all! I was just going to tell you to take it out. A Categorical is fully serialized/deserialized via encode/decode; the dtype is NEVER category, except when it's a Series, but that is already handled.
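
A rough, pandas-free sketch of why the `ordered=True` test passes without touching convert/unconvert: the flag travels inside the encoded object itself. The helper names `encode_categorical` and `decode_categorical`, the `typ` key, and the use of JSON in place of msgpack are assumptions made for illustration only; in the diff under review the decode side calls `from_codes` with `codes`, `categories` and `ordered`.

```python
import json


def encode_categorical(values, ordered=False):
    """Encode a list of labels as integer codes plus a category table."""
    # Categories are taken in first-seen order here; this is a simplification
    # and not necessarily how pandas orders them.
    categories = list(dict.fromkeys(values))
    index = {cat: i for i, cat in enumerate(categories)}
    return {
        "typ": "category",  # hypothetical type tag
        "codes": [index[v] for v in values],
        "categories": categories,
        "ordered": ordered,
    }


def decode_categorical(obj):
    """Rebuild the labels and the ordered flag from the encoded mapping."""
    labels = [obj["categories"][code] for code in obj["codes"]]
    return labels, obj["ordered"]


# Round trip through a serialized payload (JSON stands in for msgpack).
encoded = encode_categorical(["a", "b", "a", "c"], ordered=True)
payload = json.dumps(encoded)
labels, ordered = decode_categorical(json.loads(payload))
```

Because `ordered` is stored next to `codes` and `categories`, nothing outside the encode/decode pair has to know about it.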

@jreback

Contributor

jreback commented Apr 12, 2016

@pwaller can you rebase/update

@jreback

Contributor

jreback commented Apr 17, 2016

this looks pretty good code wise. can you squash everything.

@pwaller

Contributor

pwaller commented Apr 17, 2016

Squashed.

pandas/io/packers.py (outdated)

@@ -298,6 +304,9 @@ def unconvert(values, dtype, compress=None):
     if as_is_ext:
         values = values.data
     if is_categorical_dtype(dtype):
         return values
     if dtype == np.object_:

@jreback

jreback Apr 17, 2016

Contributor

same here

@pwaller

pwaller Apr 24, 2016

Contributor

Done

pandas/io/packers.py (outdated)

        return from_codes(codes=obj[u'codes'],
                          categories=obj[u'categories'],
                          ordered=obj[u'ordered'],
                          name=obj[u'name'])

@jreback

jreback Apr 17, 2016

Contributor

remove name

@pwaller

pwaller Apr 24, 2016

Contributor

Done

@pwaller

Contributor

pwaller commented Apr 24, 2016

Rebased and comments addressed.

@pwaller

Contributor

pwaller commented Apr 24, 2016

I'm getting a test failure I'm not sure how to address and I don't have much time to look into it:

======================================================================
ERROR: test_basic_frame (pandas.io.tests.test_packers.TestNDFrame)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/pydata/pandas/pandas/io/tests/test_packers.py", line 410, in test_basic_frame
    i_rec = self.encode_decode(i)
  File "/home/travis/build/pydata/pandas/pandas/io/tests/test_packers.py", line 75, in encode_decode
    return read_msgpack(p, **kwargs)
  File "/home/travis/build/pydata/pandas/pandas/io/packers.py", line 203, in read_msgpack
    return read(fh)
  File "/home/travis/build/pydata/pandas/pandas/io/packers.py", line 188, in read
    l = list(unpack(fh, encoding=encoding, **kwargs))
  File "_unpacker.pyx", line 459, in pandas.msgpack._unpacker.Unpacker.__next__ (pandas/msgpack/_unpacker.cpp:4709)
  File "_unpacker.pyx", line 390, in pandas.msgpack._unpacker.Unpacker._unpack (pandas/msgpack/_unpacker.cpp:3843)
  File "/home/travis/build/pydata/pandas/pandas/io/packers.py", line 639, in decode
    blocks = [create_block(b) for b in obj[u'blocks']]
  File "/home/travis/build/pydata/pandas/pandas/io/packers.py", line 637, in create_block
    dtype=b[u'dtype'])
  File "/home/travis/build/pydata/pandas/pandas/core/internals.py", line 2518, in make_block
    return klass(values, ndim=ndim, fastpath=fastpath, placement=placement)
  File "/home/travis/build/pydata/pandas/pandas/core/internals.py", line 1903, in __init__
    placement=placement, **kwargs)
  File "/home/travis/build/pydata/pandas/pandas/core/internals.py", line 1314, in __init__
    raise TypeError("values must be {0}".format(self._holder.__name__))
TypeError: values must be Categorical
@pwaller

Contributor

pwaller commented Apr 25, 2016

Updated again. Thanks for your patience.

@jreback

Contributor

jreback commented Apr 25, 2016

To fix that error

diff --git a/pandas/io/packers.py b/pandas/io/packers.py
index ba38c8e..f009793 100644
--- a/pandas/io/packers.py
+++ b/pandas/io/packers.py
@@ -266,7 +266,7 @@ def convert(values):

     # convert object
     if is_categorical_dtype(values):
-        return v
+        return values

     if dtype == np.object_:
         return v.tolist()
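
For readers following along, a minimal sketch of the kind of bug this one-line change addresses, assuming `v` is a flattened copy of `values` that no longer has the categorical type. `Cat`, `convert_buggy`, `convert_fixed` and this `make_block` are hypothetical stand-ins rather than pandas code, and the write and read sides are collapsed into one process for brevity.

```python
class Cat:
    """Hypothetical stand-in for a categorical container."""

    def __init__(self, labels):
        self.labels = list(labels)

    def ravel(self):
        # Flattening yields a plain list, so the Cat wrapper is lost.
        return list(self.labels)


def convert_buggy(values):
    v = values.ravel()
    if isinstance(values, Cat):
        return v  # bug: returns the flattened copy, which is not a Cat
    return v


def convert_fixed(values):
    v = values.ravel()
    if isinstance(values, Cat):
        return values  # fix: hand back the original container untouched
    return v


def make_block(values):
    """Mimics the holder type check that raised in the traceback."""
    if not isinstance(values, Cat):
        raise TypeError("values must be Cat")
    return values


cat = Cat(["a", "b", "c"])

# The fixed version passes the type check and returns the same object.
fixed_ok = make_block(convert_fixed(cat)) is cat

# The buggy version trips the type check, as in the reported failure.
try:
    make_block(convert_buggy(cat))
    buggy_error = None
except TypeError as exc:
    buggy_error = str(exc)
```

The sketch only illustrates the failure mode (a type check downstream of a conversion that dropped the wrapper); the real code paths are the ones shown in the traceback above.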
ENH: allow categoricals in msgpack
DOC: support for categoricals in read_msgpack

Add TestCategorical test cases

Add Categorical ordered=True ndframe test
@pwaller

Contributor

pwaller commented Apr 25, 2016

Forced again.

@jreback

Contributor

jreback commented Apr 25, 2016

ok looks good. ping when green.

@jreback closed this in 2fd0a06 Apr 25, 2016

@jreback

Contributor

jreback commented Apr 25, 2016

thanks!

@jankatins referenced this pull request Apr 25, 2016

Closed

ENH: Categorical serialized #7621

3 of 4 tasks complete

nps added a commit to nps/pandas that referenced this pull request May 17, 2016
