Resampling uses inconsistent labeling for sub-daily and super-daily frequencies #9586

shoyer · 2015-03-04T09:52:47Z

Resample appears to use an inconsistent label convention depending on whether the target frequency is sub-daily/daily or super-daily:

For sub-daily/daily frequencies, label='left' puts labels at the timestamp corresponding to the start of each frequency bin, and label='right' puts labels at that timestamp plus the frequency (the timestamp that exactly divides adjacent bins).
For super-daily frequencies, both labels appear to be shifted one day to the left, so the timestamps no longer cleanly divide the frequencies. Moreover, the default label shifts from 'left' to 'right'! My guess is that the default was changed here because users were confused by label='left' no longer falling inside the expected interval. (I guess I could check git blame for the details.)

I found this behavior quite surprising and confusing. Is it intentional? I would like to rationalize this if possible, because this strikes me as very poor design. The behavior also couples in a weird way with the closed argument (see the linked issues).

From my perspective (as someone who uses monthly and yearly data), the sub-daily/daily behavior makes sense and the super-daily behavior is a bug: there's no particular reason why it makes sense to use 1 day as an offset for frequencies with super-daily resolution.

CC @Cd48 @kdebrab

Here's my test script:

# Written for Python 2 and the pandas API of early 2015 (print statement, how= keyword).
import numpy as np
import pandas as pd

for orig_freq, target_freq in [('20s', '1min'), ('20min', '1H'), ('10H', '1D'),
                               ('3D', '10D'), ('10D', '1M'), ('1M', 'Q'), ('3M', 'A')]:
    print '%s -> %s:' % (orig_freq, target_freq)
    ind = pd.date_range('2000-01-01', freq=orig_freq, periods=10)
    s = pd.Series(np.arange(10), ind)
    # Print the first label of the resampled index for each label setting.
    print 'default', s.resample(target_freq, how='first').index[0]
    print 'left', s.resample(target_freq, label='left', how='first').index[0]
    print 'right', s.resample(target_freq, label='right', how='first').index[0]

20s -> 1min:
default 2000-01-01 00:00:00
left 2000-01-01 00:00:00
right 2000-01-01 00:01:00
20min -> 1H:
default 2000-01-01 00:00:00
left 2000-01-01 00:00:00
right 2000-01-01 01:00:00
10H -> 1D:
default 2000-01-01 00:00:00
left 2000-01-01 00:00:00
right 2000-01-02 00:00:00
3D -> 10D:
default 2000-01-01 00:00:00
left 2000-01-01 00:00:00
right 2000-01-11 00:00:00
10D -> 1M:
default 2000-01-31 00:00:00
left 1999-12-31 00:00:00
right 2000-01-31 00:00:00
1M -> Q:
default 2000-03-31 00:00:00
left 1999-12-31 00:00:00
right 2000-03-31 00:00:00
3M -> A:
default 2000-12-31 00:00:00
left 1999-12-31 00:00:00
right 2000-12-31 00:00:00

shoyer · 2015-03-04T11:10:45Z

OK, after digging more deeply into it, I discovered that M corresponds to the offset "month end", which apparently means the start of the last day of the month. To get "month start", I need to use MS (or likewise, QS or AS).

This is... deeply non-intuitive.

I wish there was a way to change this without breaking a bunch of existing code.

I suppose we could add ME, etc., for month end, but changing the offset M from month-end to month-start seems like a non-starter. Ugh. So I guess we're left with a documentation issue (#5023), unless we want to add a hack for resample.
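
For illustration, a minimal sketch of the month-end versus month-start difference described above. It uses offset objects (pd.offsets.MonthEnd and pd.offsets.MonthBegin) rather than string aliases, so it should not depend on which aliases a given pandas version accepts. The series mirrors the '10D' case from the test script; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Ten observations spaced 10 days apart, starting 2000-01-01.
ind = pd.date_range("2000-01-01", periods=10, freq="10D")
s = pd.Series(np.arange(10), index=ind)

# Month-end bins: labels (and bin closure) default to the right edge.
by_month_end = s.resample(pd.offsets.MonthEnd()).first()

# Same bins, labelled with the left edge: the last day of the previous month.
by_month_end_left = s.resample(pd.offsets.MonthEnd(), label="left").first()

# Month-start bins: labels default to the left edge, the first of the month.
by_month_start = s.resample(pd.offsets.MonthBegin()).first()

print(by_month_end.index[0])
print(by_month_end_left.index[0])
print(by_month_start.index[0])
```

The first two labels should match the '10D -> 1M' rows of the output above (2000-01-31 and 1999-12-31), while the month-start version should begin at 2000-01-01.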

jreback · 2015-03-04T11:16:39Z

IIRC M matches what scikit-timeseries established, a long established pattern. You could change it but that would prob require a long deprecation cycle (IMHO not worth it).

also xref to #9528 as the fill args to .resample are different from up/down resampling. Further I think the docs to resample need much more. my2c.

MarcoGorelli · 2023-03-01T09:12:19Z

> IIRC M matches what scikit-timeseries established, a long established pattern. You could change it but that would prob require a long deprecation cycle (IMHO not worth it).

Given how confusing 'M' is, I think it might actually be worth it

MarcoGorelli · 2023-03-02T10:00:29Z

Right, time for another @pandas-dev/pandas-core @pandas-dev/pandas-triage tag

Would anyone object to a deprecation cycle to remove 'M' and rename it to 'ME'? Likewise for 'Y' -> 'YE'

It's a really common source of confusion, nobody expects 'M' to mean "month end" (or 'Y' to mean "year end") the first time they see it

Concretely, I'd suggest:

pandas 2.1: passing freq='M' warns users that 'M' is deprecated and to use 'MS' (month start) or 'ME' (month end) instead
pandas 3.0: passing freq='M' errors, advising users to use 'MS' (month start) or 'ME' (month end) instead
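
To make the proposed warning stage concrete, here is a rough sketch of how a renamed alias could be handled. Everything in it (the RENAMED_ALIASES table and the resolve_alias helper) is hypothetical and for illustration only; it is not how pandas implements the deprecation.

```python
import warnings

# Hypothetical mapping from deprecated aliases to their "end" replacements.
RENAMED_ALIASES = {"M": "ME", "Y": "YE"}


def resolve_alias(alias: str) -> str:
    """Return the alias to use, warning if a deprecated one was passed."""
    new = RENAMED_ALIASES.get(alias)
    if new is None:
        # Not a renamed alias: pass it through unchanged.
        return alias
    start = alias + "S"
    warnings.warn(
        f"{alias!r} is deprecated; use {new!r} (end) or {start!r} (start) instead.",
        FutureWarning,
        stacklevel=2,
    )
    return new


print(resolve_alias("MS"))  # passed through without a warning
```

In the second stage of the proposal, the warnings.warn call would become a raised error carrying the same advice.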

attack68 · 2023-03-02T10:09:41Z

FWIW in finance the APIs typically use ME for "month end" and M for "monthly" when dealing with scheduling. They might have different contexts but M is never used for month end. I would support the change even if it's a long way to go.

datapythonista · 2023-03-02T22:59:54Z

Not sure if there is any convention followed by other apps, or anything, but this seems like an improvement. The current API seems quite confusing indeed. Changing the behavior seems easy as long as M won't be used by anything else after the change; if it were still used elsewhere, the change would be much trickier.

Just to be clear, we're talking about the rule (not freq) parameter of DataFrame.resample (and I guess Series.resample), right? Or am I missing something?

WillAyd · 2023-03-02T23:30:08Z

I'd be OK with this deprecation. I get bit by this all of the time

MarcoGorelli · 2023-03-03T10:16:10Z

> Just to be clear, we're talking about the rule (not freq)

Yes, that's right, thanks - in resample it's rule and in date_range it's freq, but that's what I meant

jorisvandenbossche · 2023-03-03T14:41:59Z

> Would anyone object to a deprecation cycle to remove 'M' and rename it to 'ME'? Likewise for 'Y' -> 'YE'

You only mention monthly and yearly, but the other frequencies that default to right might have the same issue? (although less used): quarterly (Q), weekly (W), and then the business version of monthly/quarterly/yearly.
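
A small sketch to check which offsets default to right labels, again using offset objects instead of string aliases. The choice of offsets and the dictionary keys are illustrative; a daily offset is included for contrast.

```python
import numpy as np
import pandas as pd

# Ten observations spaced 10 days apart, starting 2000-01-01 (a Saturday).
ind = pd.date_range("2000-01-01", periods=10, freq="10D")
s = pd.Series(np.arange(10), index=ind)

# Offsets expected to default to right labels, plus a daily offset for contrast.
offsets = {
    "day": pd.offsets.Day(),
    "week (ending Sunday)": pd.offsets.Week(weekday=6),
    "quarter end": pd.offsets.QuarterEnd(),
    "year end": pd.offsets.YearEnd(),
}

# First label of the resampled index, using the default label/closed settings.
first_labels = {
    name: s.resample(offset).first().index[0] for name, offset in offsets.items()
}

for name, label in first_labels.items():
    print(name, label)
```

Only the daily offset should give a first label equal to the first observation (2000-01-01); the others should be labelled with the end of the bin containing it.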

MarcoGorelli · 2023-03-03T15:05:38Z

Thanks - yes, those too

jbrockmendel · 2023-03-04T20:05:25Z

> if M won't be used by anything else after the change

"M" and "Y" would presumably still be used for Period/PeriodDtype.

There are occasional issues with people being surprised that freqs which are valid for date_range are not valid for Period, so it might be helpful to make a cleaner separation between the two concepts. e.g. #38859, #13871, #5091

natmokval · 2023-03-17T22:50:37Z

I would like to work on this issue.
Considering what @MarcoGorelli suggested:

> Concretely, I'd suggest:
>
> * pandas 2.1: passing `freq='M'` warns users that `'M'` is deprecated and to use `'MS'` (month start) or `'ME'` (month end) instead
>
> * pandas 3.0: passing `freq='M'` errors, advising users to use `'MS'` (month start) or `'ME'` (month end) instead

I'll start with the warnings when passing `freq='M'`.

AlexKirko · 2023-03-21T06:51:59Z

When I worked in finance, we would do present value valuations to calculate interest rate risk. These kinds of inconsistencies were present in some of the software, and they were a nightmare to detect and circumvent. Securities are quite standardized, so if bonds in a particular country mostly pay the coupon at the start of a period, they do so no matter what the period is.

Fixing this would save a lot of people somewhere a lot of hours.

MarcoGorelli · 2023-04-11T13:26:44Z

> "M" and "Y" would presumably still be used for Period/PeriodDtype.

Are we sure we want this?

There's an example in the docs which shows:

In[366]: p = pd.Period("2014-07", freq="M")

In[367]: p + pd.offsets.MonthEnd(3)
Out[367]: Period('2014-10', 'M')

In[368]: p + pd.offsets.MonthBegin(3)
Traceback
   ...
ValueError: Input has different freq from Period(freq=M)

@jbrockmendel If the prefix were to stay as "M" for Period but "ME" for the offsets, then wouldn't that cause confusion when people try to add an offset to a Period? Would people be expected to know that MonthEnd is the offset corresponding to the Period 'M'?

I think it might be simpler (and easier to teach) to just use 'MS' and 'ME', without special-casing Period

MarcoGorelli · 2023-04-12T19:19:47Z

From today's call: seeing as long-term the idea is to decouple Period from Offsets, both p + pd.offsets.MonthEnd(3) and p + pd.offsets.MonthBegin(3) would raise, and so it would probably be best to keep 'M' for Period

jbrockmendel · 2023-04-13T14:15:18Z

> From today's call: seeing as long-term the idea is to decouple Period from Offsets

To be clear, this is something I'd like to see, but I have no concrete plans to actually implement it. There hasn't been a targeted discussion of the idea, except for the mention of it yesterday and the lack of objection.

shoyer added the Datetime (Datetime data dtype), API Design, and Resample (resample method) labels Mar 4, 2015

larsrinn mentioned this issue May 3, 2019

Links on resample documentation page are broken #26275

Closed

mroeschke added the Bug label May 11, 2020

mroeschke added the Frequency (DateOffsets) label and removed the API Design and Datetime (Datetime data dtype) labels Apr 12, 2021

jorisvandenbossche mentioned this issue Mar 1, 2023

resampling closed='left' incorrect ? #5440

Closed

MarcoGorelli mentioned this issue Mar 1, 2023

WARN/ERR: raise on .resample('1M', closed='left')? #51710

Closed

natmokval mentioned this issue Mar 18, 2023

DEPR offsets: rename 'M' to 'ME' #52064

Merged

AlexKirko assigned natmokval Mar 20, 2023

yarnabrina mentioned this issue Aug 20, 2023

[BUG] fix breaking absolute/relative conversions of ForecastingHorizon when inferred frequency of a DatetimeIndex is "MS" sktime/sktime#5133

Draft

natmokval mentioned this issue Oct 19, 2023

DEPR offsets: rename 'Q' to 'QE' #55553

Merged

3 tasks

natmokval mentioned this issue Nov 1, 2023

DEPR offsets: rename ‘Y’ to ‘YE' #55792

Merged

1 task

natmokval mentioned this issue Dec 13, 2023

DEPR: raise ValueError if invalid period freq pass to asfreq #56489

Merged

natmokval mentioned this issue Jan 16, 2024

DEPR: lowercase freqs 'ye', 'qe', etc. raise a ValueError #56910

Merged

2 tasks

ChadFulton mentioned this issue Feb 27, 2024

QST: Roadmap for deprecations of Period types #56588

Open

2 tasks

martinvonk mentioned this issue Mar 7, 2024

Make Pastas compatible for Pandas 3.0 pastas/pastas#687

Open

4 tasks

shenyulu added a commit to shenyulu/easyclimate that referenced this issue Apr 7, 2024

fix: pandas renamed offsets

91c7ad5

pandas-dev/pandas#9586

femtotrader mentioned this issue Apr 18, 2024

pandas 2.2.0 causes warnings for future deprecations ig-python/trading-ig#329

Open
