make DateOffset immutable #21341

Merged
merged 8 commits into from
Jun 21, 2018

Conversation

jbrockmendel
Member

Returning to the long-standing goal of making DateOffsets immutable (recall: DateOffset.__eq__ calls DateOffset._params, which is very slow. _params can't be cached ATM because DateOffset is mutable).

Earlier attempts in this direction tried to make the base class a cdef class, but that has run into pickle problems that I haven't been able to sort out so far. This PR goes the patch-__setattr__ route instead.

Note: this PR does not implement the caching that is the underlying goal.

I'm likely to make some other PRs in this area, will try to keep them orthogonal.
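
For readers unfamiliar with the pattern, here is a minimal sketch of the patch-__setattr__ route described above. It uses a hypothetical FrozenOffset class, not the real pandas code, and assumes _params is a tuple built from the instance attributes. The memoized _params illustrates the eventual caching goal only; as noted, this PR does not implement it.

```python
class FrozenOffset:
    """Hypothetical stand-in for DateOffset; not the pandas implementation."""

    def __init__(self, n=1, normalize=False):
        # Normal assignment is blocked below, so __init__ has to bypass
        # the override by calling object.__setattr__ directly.
        object.__setattr__(self, "n", n)
        object.__setattr__(self, "normalize", normalize)
        object.__setattr__(self, "_cache", {})

    def __setattr__(self, name, value):
        raise AttributeError("FrozenOffset objects are immutable.")

    @property
    def _params(self):
        # Memoizing is safe only because n and normalize cannot change
        # after __init__ has run.
        if "_params" not in self._cache:
            self._cache["_params"] = (type(self).__name__, self.n, self.normalize)
        return self._cache["_params"]

    def __eq__(self, other):
        if not isinstance(other, FrozenOffset):
            return NotImplemented
        return self._params == other._params

    def __hash__(self):
        return hash(self._params)


off = FrozenOffset(n=2)
try:
    off.n = 3
except AttributeError as err:
    print(err)  # FrozenOffset objects are immutable.
print(off == FrozenOffset(n=2))  # True
```

Writing into the _cache dict does not go through __setattr__, which is why the memoization step still works on an otherwise frozen object.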

@gfyoung added the Internals and Timedelta labels Jun 6, 2018
@gfyoung requested a review from jreback June 6, 2018 16:38
@codecov

codecov bot commented Jun 6, 2018

Codecov Report

❗ No coverage uploaded for pull request base (master@e24da6c).
The diff coverage is 100%.

@@            Coverage Diff            @@
##             master   #21341   +/-   ##
=========================================
  Coverage          ?   91.91%           
=========================================
  Files             ?      153           
  Lines             ?    49550           
  Branches          ?        0           
=========================================
  Hits              ?    45546           
  Misses            ?     4004           
  Partials          ?        0
Flag        Coverage Δ
#multiple   90.31% <100%> (?)
#single     41.79% <100%> (?)

Impacted Files              Coverage Δ
pandas/tseries/offsets.py   97.16% <100%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update e24da6c...f931224.

        object.__setattr__(self, "_cache", {})

    def __setattr__(self, name, value):
        raise AttributeError("DateOffset objects are immutable.")
Member

The standard approach is usually to use a private attribute like _n and provide access via properties. Is there a reason why that wouldn't make sense here?

Member Author

That would absolutely make sense, but since micro-benchmarks are a concern I prefer to avoid the property overhead. The best-case would be the implementation in #18224, but I can't figure out the pickle errors there.

Member

OK, seems reasonable to me.
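
The trade-off discussed in this thread can be measured with a rough micro-benchmark. The sketch below compares the two options on hypothetical stand-in classes (WithProperty and WithSetattrGuard are illustrative names, not pandas classes). Timings depend on the machine and Python version, so no numbers are claimed here.

```python
import timeit


class WithProperty:
    """Hypothetical class exposing n through a read-only property."""

    def __init__(self, n):
        self._n = n

    @property
    def n(self):
        return self._n


class WithSetattrGuard:
    """Hypothetical class storing n directly and blocking reassignment."""

    def __init__(self, n):
        object.__setattr__(self, "n", n)

    def __setattr__(self, name, value):
        raise AttributeError("objects are immutable.")


prop_obj = WithProperty(1)
plain_obj = WithSetattrGuard(1)

# Each read of prop_obj.n calls a Python function, while plain_obj.n is
# a plain instance-attribute lookup.
t_prop = timeit.timeit(lambda: prop_obj.n, number=200_000)
t_plain = timeit.timeit(lambda: plain_obj.n, number=200_000)
print(f"property: {t_prop:.4f}s  plain attribute: {t_plain:.4f}s")
```

Note that the property version protects n only by convention: prop_obj._n can still be reassigned, whereas the __setattr__ guard rejects every ordinary assignment.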

@sorenwacker

OK, it's working again.

@jreback
Contributor

jreback commented Jun 14, 2018

@jbrockmendel ok pls rebase, I think you have some residual comments about normalize (your PR was merged so maybe need changing)

@jbrockmendel
Member Author

jbrockmendel commented Jun 14, 2018

This is better than nothing since it will allow us to move forward with some serious perf improvements to PeriodIndex ops, but is still pretty ugly. If there's a decent chance of getting someone more competent than me to look at the pickle issue holding up #18224, that would be the best-case.

@@ -304,6 +304,15 @@ class _BaseOffset(object):
    _day_opt = None
    _attributes = frozenset(['n', 'normalize'])

    def __init__(self, n=1, normalize=False):
        n = self._validate_n(n)
        object.__setattr__(self, "n", n)
Contributor

I would just set these as regular attributes, I doubt this actually makes a significant perf difference.

Member Author

Not sure I understand. These are regular attributes, but the PR overrides __setattr__ so we have to use object.__setattr__.

Contributor

oh sorry you are right.

can you set these as cdef readonly? (and maybe type them) or not yet?

Member Author

I really really wish I could, but until someone smarter than me looks at #18224, cdef class causes pickle test failures.

Member

@jbrockmendel what pickle issue are you referring to in that change? That just looks like a master tracker but not sure the exact issue contained within

Member Author

@WillAyd thanks for catching that, should be #18224. Will fix above.

        n = self._validate_n(n)
        object.__setattr__(self, "n", n)
        object.__setattr__(self, "normalize", normalize)
        object.__setattr__(self, "_cache", {})
Contributor

what is _cache for?

Member Author

If we don't create this explicitly in __init__ then @cache_readonly lookups will try to create it, which will raise.

Contributor

can you put this on the class? (or is that really weird)?

Member Author

>>> import pandas as pd
>>> pd.DateOffset._cache = {}
>>> off = pd.DateOffset(n=1)
>>> hour = pd.offsets.BusinessHour(n=2)
>>> off._cache is hour._cache
True

Contributor

right so we can't use the @cache_readonly decorator here? (or is that what you are doing by pre-emptively setting the cache)?

Member Author

cache_readonly.__get__ checks for a _cache attribute and creates one if it does not exist. Creating it in __init__ ensures that one exists, so a new one does not have to be created (since attempting to do so would raise)

Member Author

i.e. this is necessary in order to retain the existing usages of cache_readonly on DateOffset subclasses.

Contributor

ok makes sense
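
The interaction described in this thread can be reproduced with a simplified stand-in for the decorator. The cached_readonly class below is hypothetical and only mimics the behaviour described above (look for _cache, create one if it is missing); it is not the pandas cache_readonly implementation. Immutable, NoCache and WithCache are likewise illustrative names.

```python
class cached_readonly:
    """Simplified, hypothetical stand-in for a cache_readonly-style descriptor."""

    def __init__(self, func):
        self.func = func
        self.name = func.__name__

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        cache = getattr(obj, "_cache", None)
        if cache is None:
            # On an immutable object this assignment goes through the
            # overridden __setattr__ and raises AttributeError.
            cache = obj._cache = {}
        if self.name not in cache:
            cache[self.name] = self.func(obj)
        return cache[self.name]


class Immutable:
    def __setattr__(self, name, value):
        raise AttributeError("objects are immutable.")


class NoCache(Immutable):
    def __init__(self, n):
        object.__setattr__(self, "n", n)

    @cached_readonly
    def doubled(self):
        return 2 * self.n


class WithCache(NoCache):
    def __init__(self, n):
        super().__init__(n)
        # Per-instance dict, created while __init__ can still bypass the guard.
        object.__setattr__(self, "_cache", {})


try:
    NoCache(3).doubled
except AttributeError as err:
    print("without _cache:", err)

a, b = WithCache(3), WithCache(5)
print(a.doubled, b.doubled)  # 6 10
print(a._cache is b._cache)  # False
```

Creating the dict per instance in __init__ also avoids the sharing problem shown earlier in the thread, where a class-level _cache would be one dict common to every offset.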

@jreback
Contributor

jreback commented Jun 19, 2018

@jbrockmendel ok this is ok. merge before your set_state/get_state change? does it matter?

@jreback added this to the 0.24.0 milestone Jun 19, 2018
@jbrockmendel
Member Author

There will be a merge conflict I'll need to address. Since I'm holding out hope WillAyd can help make #18224 viable (and this unnecessary), I'd rather the getstate/setstate one get merged first.

@jreback
Contributor

jreback commented Jun 19, 2018

ok dokie

Contributor

@jreback left a comment

I think this needs a whatsnew as now technically DateOffsets are immutable yes? do we have some direct tests for this?

@jbrockmendel
Member Author

I think this needs a whatsnew as now technically DateOffsets are immutable yes? do we have some direct tests for this?

added

@jreback added the Frequency label Jun 21, 2018
@jreback merged commit 1638331 into pandas-dev:master Jun 21, 2018
@jreback
Contributor

jreback commented Jun 21, 2018

thanks @jbrockmendel
