Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DEPR: GH10623 remove items from msgpack.encode for blocks #12129

Closed
wants to merge 1 commit into from

Conversation

kawochen
Copy link
Contributor

closes #10623

@kawochen
Copy link
Contributor Author

hmm legacy file for 0.17.1 not passing. I think it's tz-related. Will check later.

@kawochen
Copy link
Contributor Author

passing now. had to make changes for DatetimeTZBlock (it wasn't tested?) and skip files packed in P2 with 0.17 :( due to #12142

'locs': b.mgr_locs.as_array,
'values': convert(b.values),
'blocks': [{'locs': b.mgr_locs.as_array,
'values': convert_if_not_dt_index(b.values),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is way hacky. you shouldn't need to specifically do this (test for DatetimeIndex. It should just be converted normally. The entire difference here is the dispatch on the dtype, which in this case would be a string.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if it gets converted to a numpy array now, wouldn't the whole encode/decode for DatetimeIndex be skipped? Or is there a way to construct DatetimeTZBlock from numpy arrays?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually I think you could trivially change core/internals.py/make_block to take a DatetimeTZDtype and correctly create it. Then you would have to make to serialize that properly (as a string is prob ok) I think (you would need to fix the reverse lookup as it only deals with numpy atm).

I think the same problem exists with categorical, prob not testing it.

@jreback jreback added Msgpack Deprecate Functionality to remove in pandas labels Feb 1, 2016
@kawochen kawochen force-pushed the DEPR-10623 branch 2 times, most recently from d1f9ed9 to 14c34c1 Compare February 10, 2016 10:48
@kawochen
Copy link
Contributor Author

updated

dtype = kwargs.pop('dtype')
if isinstance(dtype, compat.string_types):
dtype = DatetimeTZDtype.construct_from_string(dtype)
values = values.tz_localize('UTC').tz_convert(dtype.tz)

if not isinstance(values, self._holder):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you should do this first, then you can coerce if you have a dtype

@jreback
Copy link
Contributor

jreback commented Feb 10, 2016

fyi some of these fixes will prob address: #11455, maybe #11755

@kawochen
Copy link
Contributor Author

updated... but I forgot to squash :( will do that later

@@ -245,6 +246,10 @@ def unconvert(values, dtype, compress=None):
if dtype == np.object_:
return np.array(values, dtype=object)

if isinstance(dtype, string_types):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get that you have to do this, we prob need something like (in core/common.py)

def pandas_dtype(dtype):
     if isinstance(dtype, compat.string_types):
            try:
                return DatetimeTZDtype.construct_from_string(dtype)
            except TypeError:
                pass

            try:
               return CategoricalDtype.construct_from_string(dtype)
            except TypeError:
               pass

           dtype = np.dtype(dtype)

    return dtype

@jreback jreback added this to the 0.18.0 milestone Feb 11, 2016
@jreback
Copy link
Contributor

jreback commented Feb 11, 2016

pls add a whatsnew as well (put in API changes).
make a note in io.rst that the writing format is changing in 0.18.0 so we won't have forward compat (IOW older version of pandas cannot read newer version)

@jreback
Copy link
Contributor

jreback commented Feb 12, 2016

this would close #12142 as well? (should add a warning in the docs about this) (e.g. show your matrix)

@kawochen
Copy link
Contributor Author

I can change all the keys to unicode, and then it will. will update tonight

@kawochen
Copy link
Contributor Author

Ah half way through but have to go. I will get it done tonight but is that too late? :(

@jreback
Copy link
Contributor

jreback commented Feb 13, 2016

sure no prob

@kawochen
Copy link
Contributor Author

updated. manually tested with python 2.7 and 3.5 on linux (zlib and no compress).

@@ -265,28 +269,31 @@ def unconvert(values, dtype, compress=None):
return np.fromstring(values, dtype=dtype)


def _u(x):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just use u() from compat/__init__.py (and amend with this).

@kawochen
Copy link
Contributor Author

updated. So this is how I tested P2 & P3 compatibility: use the script to generate legacy data for 0.18.0 in P2 and P3, put them in 0.18.0 folder, and let the tests pick them up.

@jreback jreback closed this in 3358afc Feb 16, 2016
@jreback
Copy link
Contributor

jreback commented Feb 16, 2016

thanks!

I added some more test msgpacks for 0.17.1 to cover the bases

@jreback
Copy link
Contributor

jreback commented Feb 16, 2016

https://travis-ci.org/pydata/pandas/jobs/109620001

I think we need to skip if blosc is not installed on these tests.

can you do a PR for this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Deprecate Functionality to remove in pandas
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DEPR: msgpack encode 'items'
2 participants