New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DateOffset.offset vs DateOffset._offset #17176

Closed
jbrockmendel opened this Issue Aug 5, 2017 · 5 comments

Comments

Projects
None yet
3 participants
@jbrockmendel
Member

jbrockmendel commented Aug 5, 2017

tseries.offsets.DateOffset has a _offset attribute. Many subclasses do not have this attribute, but do have a offset attribute. The main difference in the two appears to be that _offset is ignored by __eq__ (see #17137).

The question: why two separate attributes? Is this intentional or coincidence?

@gfyoung gfyoung added the Internals label Aug 5, 2017

@jbrockmendel

This comment has been minimized.

Member

jbrockmendel commented Aug 27, 2017

There are some inconsistencies in how offsets handles kwds that need to be cleaned up if we want offsets to be immutable. I have not found a way to clean them up that maintains backward compatibility in extremely weird corner cases. Take BusinessDay as an example:

class BusinessDay(SingleConstructorOffset):
    def __init__(self, n=1, normalize=False, **kwds):
        self.n = int(n)
        self.normalize = normalize
        self.kwds = kwds
        self.offset = kwds.get('offset', timedelta(0))

I'd like to get a green-light to change this behavior:

In [1]: from datetime import timedelta
In [2]: from pandas.tseries.offsets import *
In [3]: bd = BusinessDay()
In [4]: bd2 = BusinessDay(offset=timedelta(0))

In [5]: bd
Out[5]: <BusinessDay>

In [6]: bd == bd2
Out[6]: True

In [7]: bd.kwds
Out[7]: {}

In [8]: bd2.kwds
Out[8]: {'offset': datetime.timedelta(0)}

In [9]: bd - bd2
Out[9]: <0 * BusinessDays>

In [10]: bd2 - bd
Out[10]: <0 * BusinessDays>

In [11]: (bd - bd2).kwds
Out[11]: {}

In [12]: (bd2 - bd).kwds
Out[12]: {'offset': datetime.timedelta(0)}

Similar behavior shows up in the other BusinessFooOffsets, Week, and QuarterOffset. Many of the others will display unwanted behavior if users pass unsupported keyword args:

In [33]: sms = SemiMonthBegin(clark='kent')
In [34]: sms
Out[34]: <SemiMonthBegin: day_of_month=15, kwds={'clark': 'kent'}>

Best case I'd like to get rid of DateOffset.kwds altogether and explicitly specify the args/kwargs in the various __init__ methods (this will require patching a few other methods that check self.kwds, but that isn't a big deal). That will break backward compat for any users who-- for some reason-- explicitly use that attribute.

If that is too big a break, the next-best option would be to

  1. add validation in __init__ that only supported kwargs have been passed (this is done in a few existing code paths). This will take care of the SemiMonthBegin(clark='kent') cases.

  2. ensure that default values for kwds are handled consistently. e.g. in the BusinessDay.__init__ example above, kwds.get('offset', timedelta(0)) would be changed to kwds.setdefault('offset', timedelta(0)).

@jreback

This comment has been minimized.

Contributor

jreback commented Sep 7, 2017

I agree that the whole kwds bizness is fishy. So would be ok with having actual signatures.

@jreback jreback added Frequency and removed Internals labels Sep 7, 2017

@jbrockmendel

This comment has been minimized.

Member

jbrockmendel commented Sep 7, 2017

I agree that the whole kwds bizness is fishy. So would be ok with having actual signatures.

Great. I'll start small implementing these one or two at a time to make sure we're on the same page.

@jbrockmendel jbrockmendel referenced this issue Sep 7, 2017

Merged

Lock down kwargs in offsets signatures #17458

0 of 4 tasks complete
@jbrockmendel

This comment has been minimized.

Member

jbrockmendel commented Sep 9, 2017

There are two places in the tests with seemingly-incorrect behavior. I'd like to confirm that these are incorrect and fix the tests before trying to fix the code.

https://github.com/pandas-dev/pandas/blob/master/pandas/tests/tseries/test_offsets.py#L2632

dt = datetime(2007, 1, 1)
offset = MonthEnd(day=20)
result = dt + offset
assert result == Timestamp(2007, 1, 31)
result = result + offset
assert result == Timestamp(2007, 2, 28)

day is not a valid argument for MonthEnd; this offset behaves the same as one with no day arg.

https://github.com/pandas-dev/pandas/blob/master/pandas/tests/tseries/test_offsets.py#L3680

offset_n = FY5253(weekday=WeekDay.TUE, startingMonth=12,
                          variation="nearest", qtr_with_extra_week=4)

qtr_with_extra_week is a kwd for FY5253Quarter, not FY5253

@jreback

This comment has been minimized.

Contributor

jreback commented Sep 9, 2017

yep those look wrong

why don't do a PR just to fix those

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment