Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tslibs offsets immutable #18224

Closed

Conversation

jbrockmendel
Copy link
Member

This is the big hurdle towards making DateOffsets immutable.

  • Make tslibs.offsets._BaseOffset a cython class, define n and normalize to be readonly
  • Update tests that try to alter offsets inplace
  • Poke at things until pickle tests stop failing <-- the part with the kludge

I'm not wild about the hack in pickle_compat, but haven't found any other way to avoid TestCommon.test_pickle_v0_15_2 failing with:

cls = <class 'pandas.tseries.offsets.YearBegin'>, base = <type 'object'>, state = None

    def _reconstructor(cls, base, state):
        if base is object:
>           obj = object.__new__(cls)
E           TypeError: object.__new__(YearBegin) is not safe, use pandas._libs.tslibs.offsets._BaseOffset.__new__()

After this there will then be a) a bunch of other attributes to make readonly and b) the optimizations that are available once DateOffsets are immutable. But first we need to either OK the pickle_compat hack or find another way around.

@@ -361,6 +370,58 @@ class _BaseOffset(object):
out = '<%s' % n_str + className + plural + self._repr_attrs() + '>'
return out

def __setstate__(self, state):
"""Reconstruct an instance from a pickled state"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no just define dunder reduce
you can’t use getstate/setstate in cython class

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try again with __reduce__. Don't hold your breath.

If we define __setstate__ in the non-cython class, trying to set self.n raises an AttributeError because it is readonly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't __reduce__ for pickling? The problem here is un-pickling legacy offsets.

@jreback
Copy link
Contributor

jreback commented Nov 11, 2017

this is not a feasible work around. you will have to look how I handled the ndarray subclassing case (this is similar).

@jbrockmendel
Copy link
Member Author

this is not a feasible work around. you will have to look how I handled the ndarray subclassing case (this is similar).

OK, I'll keep poking at it. Where did the ndarray subclassing occur? Best guess is IndexOpsMixin?

@jbrockmendel
Copy link
Member Author

jbrockmendel commented Nov 11, 2017

Once we get #18224 sorted out, the assert n == int(n) will be done only once (and probably as a ValueError instead of assertion) in _BaseOffset.__init__

Update Woops, this comment belongs in #18226.

@gfyoung gfyoung added Clean Internals Related to non-user accessible pandas implementation Timedelta Timedelta data type and removed Clean labels Nov 11, 2017
@jbrockmendel
Copy link
Member Author

Will wrk be m this locally, get it off reviewers plate for now. Closing.

@jbrockmendel
Copy link
Member Author

I'm not having much luck with this, have not given up. One idea: could we drop support for the 0.15.2 pickle? That would solve the current problem.

@jbrockmendel
Copy link
Member Author

Implementing __reduce__ just leads to more errors. Can you expand a bit on what the problem is with the approach in this PR via pickle_compat?

@jbrockmendel jbrockmendel deleted the tslibs-offsets-immutable branch December 8, 2017 19:38
@jbrockmendel jbrockmendel mentioned this pull request Dec 19, 2017
39 tasks
@jbrockmendel
Copy link
Member Author

@jreback suggestions on who I might be able to ask for help on this? Making offsets immutable is a pretty high priority for me (major PeriodIndex perf implications) and this is dead in the water.

@jreback
Copy link
Contributor

jreback commented Jan 2, 2018

@jbrockmendel not sure what is the specific issue

@jbrockmendel
Copy link
Member Author

not sure what is the specific issue

TLDR: cython class + pickle + back-compat

We need to make DateOffsets immutable. To get there we need to change class _BaseOffset(object) to cdef class _BaseOffset(object). But doing that leads to failure in test_pickle_v0_15_2. The only workaround I've come up with got shot down.

@jreback
Copy link
Contributor

jreback commented Jan 5, 2018

I actually don’t have a problem with breaking pickles for offsets; you wooild rarely pickle these anyways

but i think u can still do this by using our substitution machinery - why don’t i give this a try

@jbrockmendel
Copy link
Member Author

but i think u can still do this by using our substitution machinery - why don’t i give this a try

Awesome, go for it! My attempt to use the substitution machinery didn't work out, so with a little luck I'll learn something from seeing the results.

I actually don’t have a problem with breaking pickles for offsets

If the substitution approach doesn't work out, then after my remaining outstanding PRs are closed out I'll do it this way.

@jbrockmendel
Copy link
Member Author

why don’t i give this a try

@jreback gotten a chance to look at this? I've been trying periodically and still haven't come up with anything other than the hack in pickle_compat here.

@WillAyd
Copy link
Member

WillAyd commented Jun 19, 2018

@jbrockmendel I couldn't pull down your branch locally (deleted?) but maybe this article helps:

https://stackoverflow.com/questions/4028321/any-ideas-about-the-best-work-around-for-new-losing-its-arguments

Namely that "You can't use object as the terminating class for your inheritance if you want to pass arguments to __new__ or __init__." - perhaps explicitly defining __new__ for _BaseOffset would help?

@jbrockmendel
Copy link
Member Author

I couldn't pull down your branch locally (deleted?)

Yah, looks like I deleted the branch. I'll see if I can restore it without creating a new PR.

perhaps explicitly defining __new__ for _BaseOffset would help

Nope, cython raises a compile-time error telling you to use __cinit__ instead of __new__. If we do that, then in the tests we get errors in unpickling (see OP for description of the error and the kludged workaround). See @jreback's response disapproving and suggesting I look at [I never figured out exactly what]

@jbrockmendel
Copy link
Member Author

@WillAyd the branch should be ~restored (made a new branch with the same name, copy/pasted the diff).

@WillAyd
Copy link
Member

WillAyd commented Jun 19, 2018

Still trying to recreate from the branch provided and haven't seen through yet but wondering at the moment if even just making a pass-through cinit would work, ala:

    def __cinit__(self, *args):
        pass

My current thinking there is a result of the below link:

http://cython.readthedocs.io/en/latest/src/userguide/special_methods.html#initialisation-methods-cinit-and-init

Namely "If you anticipate subclassing your extension type in Python, you may find it useful to give the cinit() method * and ** arguments so that it can accept and ignore extra arguments. Otherwise, any Python subclass which has an init() with a different signature will have to override new() [1] as well as init(), which the writer of a Python class wouldn’t expect to have to do."

Doing such did resolve the TypeError you mentioned originally, though some other errors popped up I haven't had time to look at and may or may not have been resolved by subsequent work you did. Sharing for now in case this finding helps you at all

@jbrockmendel
Copy link
Member Author

Still trying to recreate from the branch provided and haven't seen through yet

Thanks for taking a look. Note that when I put this together yesterday I got a test error in something like test_offsets.test_pickle_v_0_15_2 that wasn't present at the time of the PR. That is somewhat secondary, as The Problem is what happens when you revert the edits made in pickle_compat.

@WillAyd
Copy link
Member

WillAyd commented Jun 19, 2018

I don't get any errors when reverting the changes made in pickle_compat from the branch you remade

@jbrockmendel
Copy link
Member Author

Not even in TestCommon.test_pickle_v0_15_2? I'm getting that error (py27, trying py36 shortly) regardless of whether the pickle_compat changes are reverted or not.

@WillAyd
Copy link
Member

WillAyd commented Jun 19, 2018

No errors at all running

pytest pandas/tests/io/test_pickle.py

This was using python3 so maybe that's the differentiator

@jbrockmendel
Copy link
Member Author

Hmm I've fixed the v0_15_2 error I was getting (had mixed and matched diffs), but I'm still getting failures when I revert the pickle_compat edits (py27 and py36, cython 0.28.3).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Internals Related to non-user accessible pandas implementation Timedelta Timedelta data type
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants