QST: How to write a custom DateOffset #35569

Open
MaxWinterstein opened this issue Aug 5, 2020 · 15 comments

@MaxWinterstein

We need multiple custom DateOffsets, e.g. a so-called MonthHalf.
Until release 1.1.0 we had a working solution:

import datetime as dt

import pandas as pd

class MonthHalf(pd.offsets.DateOffset):
    """Date offset repeating on the first date of the month and a day in the middle of the month.

    For February the middle day is the 15th, for all other months the 16th."""

    _attributes = frozenset(["n"])
    _prefix = "MH"

    __init__ = pd.offsets.BaseOffset.__init__

    @pd.offsets.apply_wraps
    def apply(self, other: dt.datetime):

        month_half_day = self._month_half_day(other.month)

        second_month_half = other.day >= month_half_day

        n = self.n
        new = other

        if n < 0:
            if not self.onOffset(other):
                # Move to the left onto an offset
                if second_month_half:
                    new = new.replace(day=month_half_day)
                else:
                    new = new.replace(day=1)

                # One step already happened
                n += 1

        odd_n = n % 2 == 1

        if n > 0:
            month_difference = n // 2 + second_month_half * odd_n
        elif n == 0:
            month_difference = 0
        else:
            month_difference = n // 2 - second_month_half * (not odd_n)

        if month_difference != 0:
            new = new + month_difference * pd.offsets.MonthBegin()

        month_half_day = self._month_half_day(new.month)

        if odd_n:
            new_day = 1 if second_month_half else month_half_day
        else:
            new_day = month_half_day if second_month_half else 1

        new = new.replace(day=new_day)

        return new

    def onOffset(self, date):
        return date.day in {1, self._month_half_day(date.month)}

    @staticmethod
    def _month_half_day(month: int):
        return 15 if month == 2 else 16

With release 1.1.0 (and, I guess, the merge of #34062) this code is broken.

For example, rollback no longer works:

def test_rollback():
    date = pd.to_datetime('2018-01-02')
    rollback = MonthHalf().rollback(date)
    assert rollback == pd.Timestamp("2018-01-01")

How can I implement this now?

MaxWinterstein added the Needs Triage and Usage Question labels on Aug 5, 2020
@simonjayhawkins
Member

@jbrockmendel

@jbrockmendel
Member

Instead of subclassing DateOffset, can you try subclassing pd._libs.tslibs.offsets.BaseOffset? You'll probably want to get apply_wraps from tslibs.offsets too.
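A rough sketch of that suggestion for MonthHalf. This is not a supported API: it assumes `BaseOffset` and `apply_wraps` can be imported from the private `pandas._libs.tslibs.offsets` module, and it replaces the original stepping arithmetic with a simpler scheme that numbers the anchors consecutively. It also assumes `__add__` calls a hook named `apply` around pandas 1.1 and `_apply` in later releases, so it defines both.

```python
import pandas as pd
from pandas._libs.tslibs.offsets import BaseOffset, apply_wraps


def _month_half_day(month: int) -> int:
    # The mid-month anchor is the 15th in February and the 16th otherwise.
    return 15 if month == 2 else 16


class MonthHalf(BaseOffset):
    """Anchors on the 1st and on the mid-month day of every month."""

    _prefix = "MH"  # illustrative only; not registered as a frequency alias

    @apply_wraps
    def _apply(self, other):
        # Number the anchors consecutively (two per month) and find the
        # index of the anchor on or before `other`.
        second_half = other.day >= _month_half_day(other.month)
        base = (other.year * 12 + other.month - 1) * 2 + int(second_half)
        on_anchor = self.is_on_offset(other)

        if self.n > 0:
            target = base + self.n
        elif self.n < 0:
            # Off an anchor, the first backward step lands on `base` itself.
            target = base + self.n + (0 if on_anchor else 1)
        else:
            # n == 0 rolls forward onto an anchor.
            target = base if on_anchor else base + 1

        year, rem = divmod(target, 24)
        month = rem // 2 + 1
        day = 1 if rem % 2 == 0 else _month_half_day(month)
        return other.replace(year=year, month=month, day=day)

    # Assumption: pandas ~1.1 calls `apply`, later releases call `_apply`.
    apply = _apply

    def is_on_offset(self, dt) -> bool:
        return dt.day in (1, _month_half_day(dt.month))
```

Numbering the anchors turns "move n anchors" into integer addition, so the forward, backward and year-boundary cases share one code path, and `rollback`/`rollforward` come from `BaseOffset` once `is_on_offset` is defined.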

jbrockmendel added the Frequency (DateOffsets) label and removed the Needs Triage label on Aug 5, 2020
@aberres
Contributor

aberres commented Aug 5, 2020

Great, this does the trick! Thanks.

Question: Was the offset API always considered private? Are we doing something uncommon by writing our own offsets? I never found our code pretty or elegant, but our offsets have done what we needed for years.

Is there a more common way to implement things?
Another offset we have written alternates between two dates per year representing a summer and a winter season.

@jbrockmendel
Member

Are we doing something uncommon by writing our own offsets?

It's not unheard of, but not super common. In some cases people upstream the functionality into pandas' built-in offsets.

Is there a more common way to implement things?

Not really. We semi-recently moved the implementation into Cython for performance reasons, which evidently made subclassing more difficult. If you'd like to make a PR documenting how to subclass, that would be welcome.

Be sure to alias is_on_offset = onOffset, since the latter is deprecated and will be removed in a future version.

@MaxWinterstein
Author

Thanks, that one problem is gone 👍

Another thing now broken for me is a YearOffset that starts on a specific date. Here is the code:

def year_with_date_offset(day, month):
    """Creates and instantiates an offset class incrementing yearly always starting
    at a specified day and month.

    The class is created on the fly as the original Pandas classes do not allow to
    parametrize the start day."""

    class YearWithDate(pd._libs.tslibs.offsets.YearOffset):  # noqa
        _default_month = month
        _day_opt = day
        _prefix = "ASD"

        @property
        def rule_code(self):
            return "{parent}-{day}".format(parent=super().rule_code, day=self._day_opt)

    return YearWithDate()

Any hints on how to achieve this now?

@jbrockmendel
Member

Does DateOffset(years=1, month=month, day=day) not work for this?

@MaxWinterstein
Author

Does DateOffset(years=1, month=month, day=day) not work for this?

>>> import pandas as pd
>>> o = pd._libs.tslibs.offsets.DateOffset(years=1, month=3, day=5)
>>> pd.date_range(start="2017", periods=3, freq=o)
DatetimeIndex(['2017-01-01', '2018-03-05', '2019-03-05'], dtype='datetime64[ns]', freq='<DateOffset: day=5, month=3, years=1>')

Kind of, but the first element, which is defined only by the year, is not on that specific date.

@jbrockmendel
Member

That looks like a problem in o.is_on_offset. It should have o.is_on_offset(pd.Timestamp("2017-01-01")) give False, but ATM it gives True. A PR to fix it would be welcome.

@MaxWinterstein
Author

MaxWinterstein commented Aug 14, 2020

That looks like a problem in o.is_on_offset. It should have o.is_on_offset(pd.Timestamp("2017-01-01")) give False, but ATM it gives True. A PR to fix it would be welcome.

I'm not sure this is the real problem. I tried it with this:

        def is_on_offset(self, dt: datetime) -> bool:
            if self.normalize and not _is_normalized(dt):  # noqa
                return False
            elif self.day == dt.day and self.month == dt.month:
                return True
            return False

which results in:

o.is_on_offset(pd.Timestamp("2017-01-01")) # False
o.is_on_offset(pd.Timestamp("2017-05-03")) # True

Now the date is correct, but the year is one too far in the future.

r = pd.date_range(start="2017", periods=5, freq=o) # DatetimeIndex(['2018-05-03', '2019-05-03', '2020-05-03', '2021-05-03', '2022-05-03'], dtype='datetime64[ns]', freq='ASD-5-3')

@aberres
Contributor

aberres commented Oct 14, 2020

That looks like a problem in o.is_on_offset. It should have o.is_on_offset(pd.Timestamp("2017-01-01")) give False, but ATM it gives True.

@jbrockmendel Are you suggesting to change/add DateOffset.is_on_offset? Right now this method does not exist, RelativeDeltaOffset.is_on_offset is used instead, which always returns True.

Something like o = pd._libs.tslibs.offsets.YearOffset(month=4) might already work for us (in the current use case the day is always 1), but this class is not supposed to be used directly anyway?

>>> o.rollforward(pd.Timestamp("2017-04-01"))
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-39-0dc8ef326884> in <module>
----> 1 o.rollforward(pd.Timestamp("2017-04-01"))
pandas/_libs/tslibs/offsets.pyx in pandas._libs.tslibs.offsets.BaseOffset.rollforward()
pandas/_libs/tslibs/offsets.pyx in pandas._libs.tslibs.offsets.BaseOffset.__add__()
pandas/_libs/tslibs/offsets.pyx in pandas._libs.tslibs.offsets.BaseOffset.__add__()
pandas/_libs/tslibs/offsets.pyx in pandas._libs.tslibs.offsets.apply_wraps.wrapper()
pandas/_libs/tslibs/offsets.pyx in pandas._libs.tslibs.offsets.YearOffset.apply()
pandas/_libs/tslibs/offsets.pyx in pandas._libs.tslibs.offsets.roll_qtrday()
ValueError: None

@jbrockmendel
Member

Are you suggesting to change/add DateOffset.is_on_offset? Right now this method does not exist, RelativeDeltaOffset.is_on_offset

I think RelativeDeltaOffset.is_on_offset should look like:

def is_on_offset(self, dt: datetime) -> bool:
    if self.normalize and not _is_normalized(dt):
        return False
    return (self + dt) - self == dt

When I do this, I get:

off = pd._libs.tslibs.offsets.DateOffset(years=1, month=3, day=5)

>>> off.is_on_offset(pd.Timestamp("2017-01-01"))
False

dti = pd.date_range(start="2017", periods=3, freq=off)

>>> dti
DatetimeIndex(['2018-03-05', '2019-03-05', '2020-03-05'], dtype='datetime64[ns]', freq='<DateOffset: day=5, month=3, years=1>')

which looks right to me.
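The proposed round-trip check can be reproduced with dateutil alone, since the RDO delegates to a relativedelta. This sketch assumes DateOffset(years=1, month=3, day=5) behaves like relativedelta(years=1, month=3, day=5); `round_trip_on_offset` is a hypothetical helper, not a pandas method. Subtracting a relativedelta negates only its relative field (years), while the absolute month and day are applied again, which is why 2017-01-01 fails the check.

```python
from datetime import datetime

from dateutil.relativedelta import relativedelta

# Assumed equivalent of DateOffset(years=1, month=3, day=5): "years" is a
# relative field (added), "month" and "day" are absolute fields (replaced).
rd = relativedelta(years=1, month=3, day=5)


def round_trip_on_offset(d: datetime) -> bool:
    """Proposed rule: d is on the offset iff (d + rd) - rd == d."""
    # Subtracting rd negates only the relative part (years=-1); the absolute
    # month=3 and day=5 are applied again on the way back.
    return (d + rd) - rd == d


print(round_trip_on_offset(datetime(2017, 1, 1)))  # False: 2018-03-05 -> 2017-03-05
print(round_trip_on_offset(datetime(2017, 3, 5)))  # True: 2018-03-05 -> 2017-03-05
```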

@aberres
Contributor

aberres commented Nov 5, 2020

I fear not, the first index should be 2017-03-05, shouldn't it? rollforward/rollback and apply also did not really behave in the expected ways in some quick tests of mine.

@aberres
Contributor

aberres commented Nov 6, 2020

In the end, it is about _day_opt no longer accepting an int.

As I am running out of time and the offset is not that performance critical for now, I created a quick hack for my needs:

import datetime as dt

import pandas as pd
from pandas._libs.tslibs.offsets import apply_wraps


def year_with_date_offset(*, day, month):
    class YearWithDate(pd._libs.tslibs.offsets.BaseOffset):  # noqa
        _attributes = frozenset(["n"])
        _prefix = "YWD"

        __init__ = pd.offsets.BaseOffset.__init__

        @apply_wraps
        def apply(self, other: dt.datetime):
            n = self.n
            if n > 0 and (
                (other.month < month) or (other.month == month and other.day < day)
            ):
                # This year's anchor is still ahead and counts as the first step.
                n -= 1
            elif n < 0 and (
                (other.month > month) or (other.month == month and other.day > day)
            ):
                # This year's anchor is already behind and counts as the first step back.
                n += 1

            new = other.replace(year=other.year + n, month=month, day=day)

            return new

        def onOffset(self, date):
            return (date.day == day) and (date.month == month)

        # pandas looks up is_on_offset; onOffset is the deprecated name.
        is_on_offset = onOffset

    return YearWithDate()

@jbrockmendel
Member

As I am running out of time and the offset is not that performance critical for now, I created a quick hack for my needs:

Glad to hear you've got a working fix. Hopefully we can still find a longer-term solution to implement within pandas.

I fear not, the first index should be 2017-03-05, shouldn't it? rollforward/rollback and apply also did not really behave in the expected ways in some quick tests of mine.

Given the existing semantics of RelativeDeltaOffset (RDO), I think it should be 2018-03-05. Internally, the RDO doesn't look at the years/month/day attributes, just at the self._offset dateutil.relativedelta object. If we want to support this, it probably means either a) augmenting YearOffset to allow for a self.day, or b) implementing a RelativeDeltaOffset-like class that doesn't actually use dateutil's relativedelta.
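Option b), or the behaviour expected above (first element 2017-03-05), amounts to rolling the start forward onto the anchor date and then stepping whole years, with no relativedelta involved. A pure-Python sketch of those semantics; `anchored_yearly_range` is a hypothetical helper, not a pandas API, and it assumes the anchor exists every year (so not 29 February).

```python
from datetime import datetime


def anchored_yearly_range(start: datetime, periods: int, month: int, day: int) -> list:
    """Yearly dates on (month, day), beginning at the first such date >= start."""
    first = start.replace(month=month, day=day)
    if first < start:
        # This year's anchor has already passed, so begin next year.
        first = first.replace(year=first.year + 1)
    return [first.replace(year=first.year + i) for i in range(periods)]
```

For start=2017-01-01 and month=3, day=5 this begins at 2017-03-05, whereas adding the relativedelta directly begins at 2018-03-05.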

@philippkraft

I found this question while looking for a solution to a similar problem: I need a frequency for three periods dividing a month (known as decades, or dekads, but of days rather than years): days 1-10, 11-20 and 21-28/29/30/31. These dekads are common in agrometeorology. Is there any documentation other than this question to guide me? What new date does the apply function return in the two examples above? The start of the next period? The start of the current period?
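Reading the two apply implementations above, a step with n=1 appears to move to the next anchor strictly after the input date (the start of the next period), whether or not the input is already on an anchor, while rollback gives the start of the current period. Under that reading, a pure-Python sketch of both operations for dekads; `dekad_start` and `next_dekad_start` are hypothetical helpers, not pandas APIs.

```python
from datetime import date, timedelta


def dekad_start(d: date) -> date:
    """Roll back to the start of d's dekad: the 1st, 11th or 21st."""
    # Days 1-10 -> 0, 11-20 -> 1, 21-31 -> 2 (the 31st would give 3, so cap it).
    index = min((d.day - 1) // 10, 2)
    return d.replace(day=index * 10 + 1)


def next_dekad_start(d: date) -> date:
    """Start of the following dekad, i.e. what a forward step with n=1 returns."""
    start = dekad_start(d)
    if start.day < 21:
        return start.replace(day=start.day + 10)
    # The third dekad runs to month end: jump into next month, then snap to the 1st.
    return (start.replace(day=1) + timedelta(days=32)).replace(day=1)
```

Handling the variable-length third dekad by jumping into the next month avoids needing a month-length table.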
