# BUG: non-standard frequency Period arithmetic #23878

Closed
opened this issue Nov 23, 2018 · 7 comments

Projects
None yet
Contributor

### ArtinSarraf commented Nov 23, 2018

#### Code Sample

```starting_date = pd.Period('20180101', freq='24H')
base_date = pd.Period('19700101', freq='24H')
offset = starting_date - base_date
base_date + offset == starting_date  # -> False   should be True
base_date + offset == (pd.Period(str(base_date), freq='H') + (24*offset)).asfreq('24H')  # -> True  This is how it's actually being processed now
(pd.Period(str(base_date), freq='H') + offset).asfreq('24H') == starting_date  # -> True  This is what needs to be done now to get the desired result```

#### Problem description

In short: (A - B) + B != A

The result of diffing two periods of a scaled frequency is represented in the number of periods of the base frequency, not the scaled frequency. Thus when adding back to the scaled frequency the resulting Period will be N (where N is the scaling factor) times more periods than should be added.

The 2 options that don't involve some sort of custom delta object don't really work well:

1. offset divided by the scaling factor
• This would not work for some scaling factors (e.g. '5H').
1. adding an int to a Period should not scale the int by the scaling factor, but rather just add that number of base periods.
• This breaks the logic that, for example, pd.Period('19700101', freq='3H') + 1 == pd.Period('19700101 03:00', freq='3H'), which I don't think should be compromised

#### Suggestion:

A simple PeriodDelta object that just contains (int, freq), where the int is the current resulting int. When added to a period, if the frequency matches the PeriodDelta freq, then the Period will be cast to it's base frequency the int added and then cast back. Ints will still be valid input, but the difference of Periods could return this object instead.

v0.23.4

Contributor

### jreback commented Nov 23, 2018

 we already rejected PeriodDelta as pretty niche and not worth the supports costs that said some of the above might be bugs
Contributor Author

### ArtinSarraf commented Nov 24, 2018 • edited

 I think that ints are fine to represent period differences except it needs some modifications/restrictions (of course, I'm not aware of any of the original decisions on how this should work, so please let me know if there is any reason these wouldn't work): The int should represent the number of scaled periods (since this is how addition works), not base periods Periods must be in phase - required for (1) to work. In phase: `pd.Period('20180101 09:00', freq='5H') - pd.Period('20180101 04:00', freq='5H')` Out of phase: `pd.Period('20180101 04:00', freq='5H') - pd.Period('20180101 00:00', freq='5H')` I would also consider allowing something like a 2-item tuple of ints, that add would also understand, to allow for out of phase operations. Which would just be (, ). These would be easy to implement, add almost no support cost and also make the Period operations much more reliable. Example Implementation (allowing for out of phase arithmetic): ```def __sub__(self, other): ret_val = super().__sub__(other) # not actually super, just placeholder for existing logic return divmod(ret_val, self.freq_scaling_factor) # not sure how to actually access this right now def __add__(self, other): if is_2_item_tuple_of_ints(other): other_val = other * self. self.freq_scaling_factor + other return (self.astype(self.base_freq) + other_val).astype(self.freq) # not perfect, astype doesn't always behave how I would want for this to work off the bat else: return self + other``` Example Implementation (for no out of phase operations): ```def __sub__(self, other): ret_val = super().__sub__(other) periods, out_of_phase = divmod(ret_val, self.freq_scaling_factor) if out_of_phase: raise ValueError('out of phase') return periods```
Member

### jbrockmendel commented Nov 24, 2018

 Is this in master? We fixed a very similar bug recently (though that may have been for PeriodIndex and not Period)
Contributor Author

### ArtinSarraf commented Nov 24, 2018 • edited

 @jbrockmendel - I was on 23.4, but looks like Master is still wrong. So from what I can tell the `pandas.tseries.offsets.Hour` object when added to a Period object, only factors in the base frequency. So you'll get this: ``````>>> str(pd.Period('20180101', freq='H') + Hour(10)) == str(pd.Period('20180101', freq='5H') + Hour(10)) # note the freqs `````` This would not be a problem (in fact it behaves the way I proposed in option 2 in the original post), however, it seems when diffing scaled freqs the scaling factor is multiplied by the base hours, this is unnecessary (and wrong) because of what was deomstrated above. So all that would need to be done is remove the multiplication of the resulting offset. ``````>>> pd.Period('20180101 00:00', freq='12H') - pd.Period('19700101 00:00', freq='12H') <5049216 * Hours> # should be 5049216/12 ``````
Contributor Author

### ArtinSarraf commented Nov 25, 2018 • edited

 @jbrockmendel - I believe the fix would just be to change the following line: pandas/_libs/tslibs/period.pyx:1688 ``````return (self.ordinal - other.ordinal) * self.freq `````` To ``````return (self.ordinal - other.ordinal) * type(self.freq)() # couldn’t find a better way to get the base freq ``````
Member

### jbrockmendel commented Nov 25, 2018

 Great, a PR would be welcome.

### ArtinSarraf referenced this issue Nov 26, 2018

Merged

#### BUG - remove scaling multiplier from `Period` diff result #23915

Contributor Author

### ArtinSarraf commented Nov 26, 2018

 @jbrockmendel - PR created #23915