Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: non-standard frequency Period arithmetic #23878

Closed
ArtinSarraf opened this issue Nov 23, 2018 · 7 comments

Comments

Projects
None yet
4 participants
@ArtinSarraf
Copy link
Contributor

commented Nov 23, 2018

Code Sample

starting_date = pd.Period('20180101', freq='24H') 
base_date = pd.Period('19700101', freq='24H')
offset = starting_date - base_date
base_date + offset == starting_date  # -> False   should be True
base_date + offset == (pd.Period(str(base_date), freq='H') + (24*offset)).asfreq('24H')  # -> True  This is how it's actually being processed now
(pd.Period(str(base_date), freq='H') + offset).asfreq('24H') == starting_date  # -> True  This is what needs to be done now to get the desired result

Problem description

In short: (A - B) + B != A

The result of diffing two periods of a scaled frequency is represented in the number of periods of the base frequency, not the scaled frequency. Thus when adding back to the scaled frequency the resulting Period will be N (where N is the scaling factor) times more periods than should be added.

The 2 options that don't involve some sort of custom delta object don't really work well:

  1. offset divided by the scaling factor
  • This would not work for some scaling factors (e.g. '5H').
  1. adding an int to a Period should not scale the int by the scaling factor, but rather just add that number of base periods.
  • This breaks the logic that, for example, pd.Period('19700101', freq='3H') + 1 == pd.Period('19700101 03:00', freq='3H'), which I don't think should be compromised

Suggestion:

A simple PeriodDelta object that just contains (int, freq), where the int is the current resulting int. When added to a period, if the frequency matches the PeriodDelta freq, then the Period will be cast to it's base frequency the int added and then cast back. Ints will still be valid input, but the difference of Periods could return this object instead.

Output of pd.show_versions()

v0.23.4

@jreback

This comment has been minimized.

Copy link
Contributor

commented Nov 23, 2018

we already rejected PeriodDelta as pretty niche and not worth the supports costs

that said some of the above might be bugs

@ArtinSarraf

This comment has been minimized.

Copy link
Contributor Author

commented Nov 24, 2018

I think that ints are fine to represent period differences except it needs some modifications/restrictions (of course, I'm not aware of any of the original decisions on how this should work, so please let me know if there is any reason these wouldn't work):

  1. The int should represent the number of scaled periods (since this is how addition works), not base periods
  2. Periods must be in phase - required for (1) to work.
  • In phase: pd.Period('20180101 09:00', freq='5H') - pd.Period('20180101 04:00', freq='5H')
  • Out of phase: pd.Period('20180101 04:00', freq='5H') - pd.Period('20180101 00:00', freq='5H')

I would also consider allowing something like a 2-item tuple of ints, that add would also understand, to allow for out of phase operations. Which would just be (, ).

These would be easy to implement, add almost no support cost and also make the Period operations much more reliable.

Example Implementation (allowing for out of phase arithmetic):

def __sub__(self, other):
    ret_val = super().__sub__(other)  # not actually super, just placeholder for existing logic
    return divmod(ret_val, self.freq_scaling_factor)  # not sure how to actually access this right now

def __add__(self, other):
    if is_2_item_tuple_of_ints(other):
        other_val = other[0] * self. self.freq_scaling_factor + other[1]
        return (self.astype(self.base_freq) + other_val).astype(self.freq)  # not perfect, astype doesn't always behave how I would want for this to work off the bat
    else:
        return self + other

Example Implementation (for no out of phase operations):

def __sub__(self, other):
    ret_val = super().__sub__(other)
    periods, out_of_phase = divmod(ret_val, self.freq_scaling_factor)
    if out_of_phase:
        raise ValueError('out of phase')
    return periods
@jbrockmendel

This comment has been minimized.

Copy link
Member

commented Nov 24, 2018

Is this in master? We fixed a very similar bug recently (though that may have been for PeriodIndex and not Period)

@ArtinSarraf

This comment has been minimized.

Copy link
Contributor Author

commented Nov 24, 2018

@jbrockmendel - I was on 23.4, but looks like Master is still wrong.

So from what I can tell the pandas.tseries.offsets.Hour object when added to a Period object, only factors in the base frequency. So you'll get this:

>>> str(pd.Period('20180101', freq='H') + Hour(10)) == str(pd.Period('20180101', freq='5H') + Hour(10))  # note the freqs

This would not be a problem (in fact it behaves the way I proposed in option 2 in the original post), however, it seems when diffing scaled freqs the scaling factor is multiplied by the base hours, this is unnecessary (and wrong) because of what was deomstrated above. So all that would need to be done is remove the multiplication of the resulting offset.

>>> pd.Period('20180101 00:00', freq='12H') - pd.Period('19700101 00:00', freq='12H')
<5049216 * Hours>  # should be 5049216/12
@ArtinSarraf

This comment has been minimized.

Copy link
Contributor Author

commented Nov 25, 2018

@jbrockmendel - I believe the fix would just be to change the following line:

pandas/_libs/tslibs/period.pyx:1688

return (self.ordinal - other.ordinal) * self.freq

To

return (self.ordinal - other.ordinal) * type(self.freq)()  # couldn’t find a better way to get the base freq
@jbrockmendel

This comment has been minimized.

Copy link
Member

commented Nov 25, 2018

Great, a PR would be welcome.

@ArtinSarraf

This comment has been minimized.

Copy link
Contributor Author

commented Nov 26, 2018

@jbrockmendel - PR created #23915

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.