
make DateOffset immutable #21341

Merged
jreback merged 8 commits into pandas-dev:master from jbrockmendel:imm on Jun 21, 2018

6 participants
@jbrockmendel
Member

commented Jun 6, 2018

Returning to the long-standing goal of making DateOffsets immutable (recall: DateOffset.__eq__ calls DateOffset._params, which is very slow; _params can't be cached at the moment because DateOffset is mutable).

Earlier attempts in this direction tried to make the base class a cdef class, but that has run into pickle problems that I haven't been able to sort out so far. This PR goes the patch-__setattr__ route instead.
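A minimal sketch of the patch-`__setattr__` route, using a hypothetical stand-in class rather than the actual `_BaseOffset` code: attributes are written once in `__init__` via `object.__setattr__`, and every later assignment raises.

```python
class ImmutableOffset:
    """Hypothetical stand-in for an offset base class; not pandas' actual code."""

    def __init__(self, n=1, normalize=False):
        # Plain assignment would go through the overridden __setattr__ below
        # and raise, so write the attributes with object.__setattr__ instead.
        object.__setattr__(self, "n", n)
        object.__setattr__(self, "normalize", normalize)

    def __setattr__(self, name, value):
        # Every assignment after construction is rejected.
        raise AttributeError("DateOffset objects are immutable.")


off = ImmutableOffset(n=3)
print(off.n)  # 3
try:
    off.n = 5
except AttributeError as err:
    print(err)  # DateOffset objects are immutable.
```

Note that `del off.n` is not covered by `__setattr__`; blocking deletion as well would need a `__delattr__` override.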

Note: this PR does not implement the caching that is the underlying goal.
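As a rough illustration of why immutability is the prerequisite for that caching, here is a hypothetical sketch in which `_params` is memoized in a per-instance dict. The class name, the tuple layout of `_params`, and the `__hash__` definition are assumptions for illustration, not pandas' code.

```python
class CachedParamsOffset:
    """Hypothetical sketch: _params is computed once and reused by __eq__."""

    _attributes = ("n", "normalize")

    def __init__(self, n=1, normalize=False):
        object.__setattr__(self, "n", n)
        object.__setattr__(self, "normalize", normalize)
        object.__setattr__(self, "_cache", {})

    def __setattr__(self, name, value):
        raise AttributeError("DateOffset objects are immutable.")

    @property
    def _params(self):
        # Memoizing is only safe because n/normalize can no longer change.
        cache = self._cache
        if "_params" not in cache:
            cache["_params"] = (type(self),) + tuple(
                getattr(self, attr) for attr in self._attributes
            )
        return cache["_params"]

    def __eq__(self, other):
        if not isinstance(other, CachedParamsOffset):
            return NotImplemented
        return self._params == other._params

    def __hash__(self):
        return hash(self._params)


a, b = CachedParamsOffset(2), CachedParamsOffset(2)
print(a == b)  # True; later comparisons reuse the cached tuples
```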

I'm likely to make some other PRs in this area, will try to keep them orthogonal.

@gfyoung gfyoung requested a review from jreback Jun 6, 2018

@codecov


commented Jun 6, 2018

Codecov Report

❗️ No coverage uploaded for pull request base (master@e24da6c).
The diff coverage is 100%.


@@            Coverage Diff            @@
##             master   #21341   +/-   ##
=========================================
  Coverage          ?   91.91%           
=========================================
  Files             ?      153           
  Lines             ?    49550           
  Branches          ?        0           
=========================================
  Hits              ?    45546           
  Misses            ?     4004           
  Partials          ?        0
Flag Coverage Δ
#multiple 90.31% <100%> (?)
#single 41.79% <100%> (?)
Impacted Files Coverage Δ
pandas/tseries/offsets.py 97.16% <100%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update e24da6c...f931224.

        object.__setattr__(self, "_cache", {})

    def __setattr__(self, name, value):
        raise AttributeError("DateOffset objects are immutable.")

@shoyer

shoyer Jun 6, 2018

Member

The standard approach is usually to use a private attribute like _n and provide access via properties. Is there a reason why that wouldn't make sense here?
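A minimal sketch of the pattern suggested here, with hypothetical class and attribute names: values live in underscore-prefixed attributes and are exposed through read-only properties, so `__setattr__` does not need to be overridden.

```python
class PropertyOffset:
    """Hypothetical sketch of the private-attribute-plus-property pattern."""

    def __init__(self, n=1, normalize=False):
        # Plain assignment works because __setattr__ is not overridden.
        self._n = n
        self._normalize = normalize

    @property
    def n(self):
        return self._n

    @property
    def normalize(self):
        return self._normalize


off = PropertyOffset(n=2)
try:
    off.n = 4  # a property without a setter rejects assignment
except AttributeError:
    print("n is read-only")
```

The protection is by convention: `off._n = 4` still succeeds, as does adding an unrelated attribute, because nothing intercepts assignment in general.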

@jbrockmendel

jbrockmendel Jun 7, 2018

Author Member

That would absolutely make sense, but since micro-benchmarks are a concern I prefer to avoid the property overhead. The best case would be the implementation in #18224, but I can't figure out the pickle errors there.
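To get a feel for the overhead being weighed here, a rough `timeit` comparison between a plain instance attribute and a property can be run with hypothetical classes. Absolute and relative numbers vary by machine and CPython version, so no particular result is claimed.

```python
import timeit


class PlainAttr:
    def __init__(self, n):
        # Stored directly in the instance __dict__; reads are a dict lookup.
        object.__setattr__(self, "n", n)

    def __setattr__(self, name, value):
        raise AttributeError("immutable")


class ViaProperty:
    def __init__(self, n):
        self._n = n

    @property
    def n(self):
        # Each read calls this Python-level function.
        return self._n


plain, prop = PlainAttr(1), ViaProperty(1)

t_plain = timeit.timeit("obj.n", globals={"obj": plain}, number=200_000)
t_prop = timeit.timeit("obj.n", globals={"obj": prop}, number=200_000)
print(f"plain attribute: {t_plain:.4f}s  property: {t_prop:.4f}s")
```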

@shoyer

shoyer Jun 7, 2018

Member

OK, seems reasonable to me.

@soerendip

Contributor

commented Jun 7, 2018

OK, it's working again.

@jreback

Contributor

commented Jun 14, 2018

@jbrockmendel ok pls rebase, I think you have some residual comments about normalize (your PR was merged so maybe need changing)

@jbrockmendel

Member Author

commented Jun 14, 2018

This is better than nothing since it will allow us to move forward with some serious perf improvements to PeriodIndex ops, but it is still pretty ugly. If there's a decent chance of getting someone more competent than me to look at the pickle issue holding up #18224, that would be the best case.

@@ -304,6 +304,15 @@ class _BaseOffset(object):
    _day_opt = None
    _attributes = frozenset(['n', 'normalize'])

    def __init__(self, n=1, normalize=False):
        n = self._validate_n(n)
        object.__setattr__(self, "n", n)

@jreback

jreback Jun 19, 2018

Contributor

I would just set these as regular attributes, I doubt this actually makes a significant perf difference.

@jbrockmendel

jbrockmendel Jun 19, 2018

Author Member

Not sure I understand. These are regular attributes, but the PR overrides __setattr__ so we have to use object.__setattr__.
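A small self-contained sketch (hypothetical class names) of why plain assignment cannot be used in `__init__` once `__setattr__` is overridden, and how `object.__setattr__` sidesteps it:

```python
class NaiveImmutable:
    """Hypothetical: assigns in __init__ the usual way."""

    def __init__(self, n):
        self.n = n  # routed through the overridden __setattr__ below

    def __setattr__(self, name, value):
        raise AttributeError("DateOffset objects are immutable.")


class WorkingImmutable(NaiveImmutable):
    def __init__(self, n):
        # object.__setattr__ bypasses the override and stores the value
        # in the instance __dict__ directly.
        object.__setattr__(self, "n", n)


try:
    NaiveImmutable(1)
except AttributeError as err:
    print("construction failed:", err)

print(WorkingImmutable(1).n)  # 1
```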

@jreback

jreback Jun 19, 2018

Contributor

oh sorry you are right.

can you set these as cdef readonly? (and maybe type them) or not yet?

@jbrockmendel

jbrockmendel Jun 19, 2018

Author Member

I really really wish I could, but until someone smarter than me looks at #18224, cdef class causes pickle test failures.

@WillAyd

WillAyd Jun 19, 2018

Member

@jbrockmendel what pickle issue are you referring to in that change? That just looks like a master tracker, but I'm not sure which exact issue it contains.

@jbrockmendel

jbrockmendel Jun 19, 2018

Author Member

@WillAyd thanks for catching that, should be #18224. Will fix above.

        n = self._validate_n(n)
        object.__setattr__(self, "n", n)
        object.__setattr__(self, "normalize", normalize)
        object.__setattr__(self, "_cache", {})

@jreback

jreback Jun 19, 2018

Contributor

what is _cache for?

@jbrockmendel

jbrockmendel Jun 19, 2018

Author Member

If we don't create this explicitly in __init__ then @cache_readonly lookups will try to create it, which will raise.
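The failure mode can be reproduced with a simplified, hypothetical stand-in for `cache_readonly` (pandas' real decorator differs in detail): when the descriptor has to create `_cache` itself with a normal assignment, the overridden `__setattr__` rejects it.

```python
class cache_readonly_sketch:
    """Simplified, hypothetical stand-in for a cache_readonly-style descriptor."""

    def __init__(self, func):
        self.func = func
        self.name = func.__name__

    def __get__(self, obj, typ=None):
        if obj is None:
            return self
        cache = getattr(obj, "_cache", None)
        if cache is None:
            # Lazily create the cache with a normal assignment; on an object
            # whose __setattr__ always raises, this line fails.
            cache = obj._cache = {}
        if self.name not in cache:
            cache[self.name] = self.func(obj)
        return cache[self.name]


class NoCacheInInit:
    def __init__(self, n):
        object.__setattr__(self, "n", n)  # note: no _cache created here

    def __setattr__(self, name, value):
        raise AttributeError("DateOffset objects are immutable.")

    @cache_readonly_sketch
    def doubled(self):
        return 2 * self.n


off = NoCacheInInit(2)
try:
    off.doubled
except AttributeError as err:
    print("lookup failed:", err)
```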

@jreback

jreback Jun 19, 2018

Contributor

can you put this on the class? (or is that really weird)?

@jbrockmendel

jbrockmendel Jun 19, 2018

Author Member

>>> import pandas as pd
>>> pd.DateOffset._cache = {}
>>> off = pd.DateOffset(n=1)
>>> hour = pd.offsets.BusinessHour(n=2)
>>> off._cache is hour._cache
True
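Why a shared class-level dict is a problem, in a self-contained sketch with hypothetical names: every instance, including subclass instances, resolves `_cache` to the same dict, so a value memoized for one offset is returned for another.

```python
class Base:
    # Hypothetical: one dict on the class instead of one per instance.
    _cache = {}

    def __init__(self, n):
        object.__setattr__(self, "n", n)

    def __setattr__(self, name, value):
        raise AttributeError("DateOffset objects are immutable.")

    def doubled(self):
        # Memoize under a fixed key, as a per-instance cache would.
        if "doubled" not in self._cache:
            self._cache["doubled"] = 2 * self.n
        return self._cache["doubled"]


class Sub(Base):
    pass


a, b = Base(1), Sub(2)
print(a._cache is b._cache)      # True: one dict shared by all instances
print(a.doubled(), b.doubled())  # 2 2: b receives a's cached value, not 4
```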

@jreback

jreback Jun 19, 2018

Contributor

right so we can't use the @cache_readonly decorator here? (or is that what you are doing by pre-emptively setting the cache)?

@jbrockmendel

jbrockmendel Jun 19, 2018

Author Member

cache_readonly.__get__ checks for a _cache attribute and creates one if it does not exist. Creating it in __init__ ensures that one always exists, so __get__ never has to create a new one (attempting to do so would raise, since __setattr__ is overridden).
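With a simplified, hypothetical stand-in for the descriptor (not pandas' implementation), pre-creating `_cache` in `__init__` lets caching work on an immutable object: the value is computed once and served from the per-instance dict afterwards.

```python
calls = []  # records each real computation, for demonstration only


class cache_readonly_sketch:
    """Simplified, hypothetical stand-in for a cache_readonly-style descriptor."""

    def __init__(self, func):
        self.func = func
        self.name = func.__name__

    def __get__(self, obj, typ=None):
        if obj is None:
            return self
        cache = getattr(obj, "_cache", None)
        if cache is None:
            # Only reached when __init__ did not create the cache.
            cache = obj._cache = {}
        if self.name not in cache:
            cache[self.name] = self.func(obj)
        return cache[self.name]


class PreCachedOffset:
    def __init__(self, n):
        object.__setattr__(self, "n", n)
        # Creating the cache here means __get__ never needs to assign it.
        object.__setattr__(self, "_cache", {})

    def __setattr__(self, name, value):
        raise AttributeError("DateOffset objects are immutable.")

    @cache_readonly_sketch
    def doubled(self):
        calls.append(self.n)
        return 2 * self.n


off = PreCachedOffset(3)
print(off.doubled, off.doubled)  # 6 6
print(calls)  # [3]: computed once, then served from off._cache
```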

@jbrockmendel

jbrockmendel Jun 19, 2018

Author Member

i.e. this is necessary in order to retain the existing usages of cache_readonly on DateOffset subclasses.

@jreback

jreback Jun 19, 2018

Contributor

ok makes sense

@jreback

Contributor

commented Jun 19, 2018

@jbrockmendel ok this is ok. merge before your set_state/get_state change? does it matter?

@jreback jreback added this to the 0.24.0 milestone Jun 19, 2018

@jbrockmendel

Member Author

commented Jun 19, 2018

There will be a merge conflict I'll need to address. Since I'm holding out hope WillAyd can help make #18224 viable (and this unnecessary), I'd rather the getstate/setstate one get merged first.
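The getstate/setstate change itself is not shown in this thread. As a hedged illustration of the interaction it has to handle, here is a hypothetical immutable class whose `__setstate__` writes into `__dict__` directly, because going through attribute assignment would hit the overridden `__setattr__`. Dropping `_cache` from the pickled state is an assumption of this sketch.

```python
import pickle


class PicklableOffset:
    """Hypothetical immutable offset with explicit pickling hooks."""

    def __init__(self, n=1, normalize=False):
        object.__setattr__(self, "n", n)
        object.__setattr__(self, "normalize", normalize)
        object.__setattr__(self, "_cache", {})

    def __setattr__(self, name, value):
        raise AttributeError("DateOffset objects are immutable.")

    def __getstate__(self):
        # Pickle the parameters only; cached values are rebuilt on demand.
        state = self.__dict__.copy()
        state.pop("_cache", None)
        return state

    def __setstate__(self, state):
        # Mutating __dict__ does not go through __setattr__, so this works
        # even though `self.n = ...` would raise.
        self.__dict__.update(state)
        self.__dict__["_cache"] = {}


off = PicklableOffset(n=4, normalize=True)
restored = pickle.loads(pickle.dumps(off))
print(restored.n, restored.normalize)  # 4 True
```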

@jreback

Contributor

commented Jun 19, 2018

ok dokie

@jreback
Contributor

left a comment

I think this needs a whatsnew as now technically DateOffsets are immutable yes? do we have some direct tests for this?

@jbrockmendel

Member Author

commented Jun 20, 2018

I think this needs a whatsnew as now technically DateOffsets are immutable yes? do we have some direct tests for this?

added

@jreback jreback added the Frequency label Jun 21, 2018

@jreback jreback merged commit 1638331 into pandas-dev:master Jun 21, 2018

0 of 3 checks passed

ci/circleci Your tests are queued behind your running builds
continuous-integration/appveyor/pr Waiting for AppVeyor build to complete
continuous-integration/travis-ci/pr The Travis CI build is in progress
@jreback

Contributor

commented Jun 21, 2018

thanks @jbrockmendel

@jbrockmendel jbrockmendel referenced this pull request Jun 21, 2018

Merged

cache DateOffset attrs now that they are immutable #21582

1 of 2 tasks complete

@jbrockmendel jbrockmendel deleted the jbrockmendel:imm branch Jun 22, 2018

alimcmaster1 added a commit to alimcmaster1/pandas that referenced this pull request Jun 28, 2018

alimcmaster1 added a commit to alimcmaster1/pandas that referenced this pull request Jun 28, 2018

Sup3rGeo added a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018
