'infer_freq' does not work with DST transition #8772

Closed
broessli opened this issue Nov 10, 2014 · 5 comments · Fixed by #8782
Labels
Frequency DateOffsets Timezones Timezone data dtype
Milestone

Comments

@broessli

Consider the following index crossing a DST transition:

In [1]: import pandas as pd;pd.__version__
Out[1]: '0.15.0'
In [2]: index = pd.date_range(pd.Timestamp("2014-10-25 03:00", tz="Europe/Paris"), periods=10, freq="3H")

In [3]: index
Out[3]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2014-10-25 03:00:00+02:00, ..., 2014-10-26 05:00:00+01:00]
Length: 10, Freq: 3H, Timezone: Europe/Paris

Pandas cannot infer the index frequency:

In [4]: pd.infer_freq(index) is None
Out[4]: True

If we convert the index to UTC, the frequency can be inferred:

In [5]: pd.infer_freq(index.tz_convert("UTC"))
Out[5]: '3H'

If the index does not cross a DST boundary, the frequency can be inferred as well:

In [6]: index = pd.date_range(pd.Timestamp("2014-10-25 03:00", tz="Europe/Paris"), periods=6, freq="3H")

In [7]: index
Out[7]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2014-10-25 03:00:00+02:00, ..., 2014-10-25 18:00:00+02:00]
Length: 6, Freq: 3H, Timezone: Europe/Paris

In [8]: pd.infer_freq(index)
Out[8]: '3H'
@jreback jreback added Bug Timezones Timezone data dtype labels Nov 11, 2014
@jreback jreback added this to the 0.15.2 milestone Nov 11, 2014
@jreback jreback removed the Bug label Nov 11, 2014
@jreback
Contributor

jreback commented Nov 11, 2014

cc @rockg
cc @ischwabacher

I am not sure this is a bug. The frequency inferer looks for a constant difference between consecutive timestamps (essentially diffing them), so by definition the difference changes around a DST transition. Returning None is therefore arguably correct, since the index is not regular in local time. That said, this behavior may be wrong (e.g. maybe it should infer using UTC?).

@jreback jreback added the Frequency DateOffsets label Nov 11, 2014
@rockg
Contributor

rockg commented Nov 11, 2014

I think this is a bug. In terms of hours, there is a constant difference. The code converts the underlying UTC values to the local timezone so that week/month frequencies are calculated in local time (for example, 2010-12-25 00:00 is a Saturday in Paris but a Friday in UTC), but when the index crosses a change in UTC offset, such as a DST transition, the converted values are no longer evenly spaced. The issue is in tseries.frequencies._FrequencyInferer.

    def __init__(self, index, warn=True):
        self.index = index
        self.values = np.asarray(index).view('i8')

        if hasattr(index,'tz'):
            if index.tz is not None:
                self.values = tslib.tz_convert(self.values, 'UTC', index.tz)

Using the above example, we have:

index.asi8
[1414198800000000000 1414209600000000000 1414220400000000000
 1414231200000000000 1414242000000000000 1414252800000000000
 1414263600000000000 1414274400000000000 1414285200000000000
 1414296000000000000]
np.diff(index.asi8)
[10800000000000 10800000000000 10800000000000 10800000000000 10800000000000
 10800000000000 10800000000000 10800000000000 10800000000000]
self.values
[1414206000000000000 1414216800000000000 1414227600000000000
 1414238400000000000 1414249200000000000 1414260000000000000
 1414270800000000000 1414281600000000000 1414288800000000000
 1414299600000000000]
np.diff(self.values)
[10800000000000 10800000000000 10800000000000 10800000000000 10800000000000
 10800000000000 10800000000000  7200000000000 10800000000000]
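
The two diffs above can be reproduced without pandas. This is a minimal sketch in plain Python, assuming the Europe/Paris offset drops from UTC+2 to UTC+1 at 2014-10-26 01:00 UTC (hardcoded here rather than read from a timezone database); the helper names `paris_offset_ns` and `diffs` are made up for illustration and are not pandas functions.

```python
# Illustrative only: plain integers stand in for the index values quoted above.
HOUR_NS = 3_600_000_000_000  # one hour in nanoseconds

# UTC epoch values of the ten timestamps (the same numbers as index.asi8).
START_UTC_NS = 1_414_198_800 * 10**9  # 2014-10-25 03:00+02:00
utc_ns = [START_UTC_NS + i * 3 * HOUR_NS for i in range(10)]

# Assumption: Europe/Paris goes from UTC+2 to UTC+1 at 2014-10-26 01:00 UTC.
TRANSITION_UTC_NS = 1_414_285_200 * 10**9


def paris_offset_ns(t_utc_ns):
    """Hypothetical helper: UTC offset in ns for this one weekend only."""
    return 2 * HOUR_NS if t_utc_ns < TRANSITION_UTC_NS else HOUR_NS


# Local "wall clock" values, analogous to self.values after the conversion.
wall_ns = [t + paris_offset_ns(t) for t in utc_ns]


def diffs(values):
    """Differences between consecutive values, like np.diff on a 1-D array."""
    return [b - a for a, b in zip(values, values[1:])]


utc_hours = [d // HOUR_NS for d in diffs(utc_ns)]
wall_hours = [d // HOUR_NS for d in diffs(wall_ns)]
print(utc_hours)   # [3, 3, 3, 3, 3, 3, 3, 3, 3]
print(wall_hours)  # [3, 3, 3, 3, 3, 3, 3, 2, 3]
```

The elapsed time between timestamps is always 3 hours; only the wall-clock values show the 2-hour step, because the offset added to the last two values is one hour smaller.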

Hopefully this is enough for somebody to go on. The fix is not immediately obvious, besides treating sub-daily and daily-or-longer offsets differently: not converting to local time for the former and converting for the latter.
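
One way to read the idea of treating sub-daily and daily-or-longer steps differently is sketched below in plain Python. It is only an illustration under stated assumptions, not the change that was merged into pandas: `constant_step` and `infer_step_ns` are hypothetical names, and the sketch only recovers a constant step in nanoseconds rather than a pandas frequency string.

```python
HOUR_NS = 3_600_000_000_000  # one hour in nanoseconds
DAY_NS = 24 * HOUR_NS


def constant_step(values):
    """Return the single step between consecutive values, or None if it varies."""
    steps = {b - a for a, b in zip(values, values[1:])}
    return steps.pop() if len(steps) == 1 else None


def infer_step_ns(utc_ns, wall_ns):
    """Hypothetical helper: prefer UTC spacing for sub-daily steps.

    Sub-daily steps are judged on elapsed (UTC) time, so a DST transition
    does not disturb them. Anything else falls back to local wall-clock
    values, where daily and longer frequencies are regular.
    """
    utc_step = constant_step(utc_ns)
    if utc_step is not None and 0 < utc_step < DAY_NS:
        return utc_step
    return constant_step(wall_ns)


# The ten values from this thread: UTC epochs, then wall-clock values with
# +2h for the first eight entries and +1h for the last two.
utc_ns = [1_414_198_800 * 10**9 + i * 3 * HOUR_NS for i in range(10)]
wall_ns = [t + (2 if i < 8 else 1) * HOUR_NS for i, t in enumerate(utc_ns)]

step = infer_step_ns(utc_ns, wall_ns)
print(step // HOUR_NS)  # 3
```

With this split, the 3-hour index is regular again, while an index of local midnights across the same transition (UTC steps of 24, 25, 24 hours) would still be judged on its wall-clock values.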

@ischwabacher
Contributor

I agree that infer_freq should be able to recover the frequency given to date_range since the elapsed time between successive times in the index is constant.

@MartinLinden

I would also consider this a bug as the user is looking for the regular elapsed time between two timestamps.
Could we reopen this issue?

@jreback
Contributor

jreback commented Aug 22, 2019

pls open a new issue with an example / this is 5 years old

sdementen added a commit to sdementen/pandas that referenced this issue Feb 7, 2021
…and pandas-dev#8772)

Fixes the issues pandas-dev#39556 and pandas-dev#8772 by ensuring that the check for delta being a multiple of a frequency also checks the delta is not 0 (which is a multiple of any number).
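
The zero-delta point in the commit message can be illustrated with a small sketch. The function names below are hypothetical and are not claimed to match the pandas source; the sketch only shows why a plain modulo test accepts a delta of 0 and how an extra check rules it out.

```python
HOUR_NS = 3_600_000_000_000  # one hour in nanoseconds


def is_multiple_naive(delta, unit):
    """Plain modulo test: 0 passes, because 0 % unit == 0 for any unit."""
    return delta % unit == 0


def is_nonzero_multiple(delta, unit):
    """Same test, but a delta of 0 is rejected explicitly."""
    return delta != 0 and delta % unit == 0


print(is_multiple_naive(0, HOUR_NS))              # True
print(is_nonzero_multiple(0, HOUR_NS))            # False
print(is_nonzero_multiple(3 * HOUR_NS, HOUR_NS))  # True
```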
5 participants