Join GitHub today
GitHub is home to over 36 million developers working together to host and review code, manage projects, and build software together.Sign up
BUG: non-standard frequency Period arithmetic #23878
starting_date = pd.Period('20180101', freq='24H') base_date = pd.Period('19700101', freq='24H') offset = starting_date - base_date base_date + offset == starting_date # -> False should be True base_date + offset == (pd.Period(str(base_date), freq='H') + (24*offset)).asfreq('24H') # -> True This is how it's actually being processed now (pd.Period(str(base_date), freq='H') + offset).asfreq('24H') == starting_date # -> True This is what needs to be done now to get the desired result
In short: (A - B) + B != A
The result of diffing two periods of a scaled frequency is represented in the number of periods of the base frequency, not the scaled frequency. Thus when adding back to the scaled frequency the resulting Period will be N (where N is the scaling factor) times more periods than should be added.
The 2 options that don't involve some sort of custom delta object don't really work well:
A simple PeriodDelta object that just contains (int, freq), where the int is the current resulting int. When added to a period, if the frequency matches the PeriodDelta freq, then the Period will be cast to it's base frequency the int added and then cast back. Ints will still be valid input, but the difference of Periods could return this object instead.
I think that ints are fine to represent period differences except it needs some modifications/restrictions (of course, I'm not aware of any of the original decisions on how this should work, so please let me know if there is any reason these wouldn't work):
I would also consider allowing something like a 2-item tuple of ints, that add would also understand, to allow for out of phase operations. Which would just be (, ).
These would be easy to implement, add almost no support cost and also make the Period operations much more reliable.
Example Implementation (allowing for out of phase arithmetic):
def __sub__(self, other): ret_val = super().__sub__(other) # not actually super, just placeholder for existing logic return divmod(ret_val, self.freq_scaling_factor) # not sure how to actually access this right now def __add__(self, other): if is_2_item_tuple_of_ints(other): other_val = other * self. self.freq_scaling_factor + other return (self.astype(self.base_freq) + other_val).astype(self.freq) # not perfect, astype doesn't always behave how I would want for this to work off the bat else: return self + other
Example Implementation (for no out of phase operations):
def __sub__(self, other): ret_val = super().__sub__(other) periods, out_of_phase = divmod(ret_val, self.freq_scaling_factor) if out_of_phase: raise ValueError('out of phase') return periods
@jbrockmendel - I was on 23.4, but looks like Master is still wrong.
So from what I can tell the
This would not be a problem (in fact it behaves the way I proposed in option 2 in the original post), however, it seems when diffing scaled freqs the scaling factor is multiplied by the base hours, this is unnecessary (and wrong) because of what was deomstrated above. So all that would need to be done is remove the multiplication of the resulting offset.
@jbrockmendel - I believe the fix would just be to change the following line: