set locale to data frame with ambiguous local time leads to #29031

Open
phaabe opened this issue Oct 16, 2019 · 3 comments
Labels
Bug, Error Reporting (incorrect or improved errors from pandas), Timezones (timezone data dtype)

Comments

@phaabe

phaabe commented Oct 16, 2019

I read CSV files whose local timestamps are ambiguous because of the daylight saving time (DST) switch.

When I apply tz_localize(), I run into the following error:

AmbiguousTimeError: There are %i dst switches when there should only be 1.

Here is my code to reproduce the error. Note that every timestamp from 2 am up to (but not including) 3 am appears twice because of the switch from summer to winter time.

import pandas as pd

index = pd.DatetimeIndex(['2018-10-28 01:00:00', '2018-10-28 01:15:00',
                          '2018-10-28 01:30:00', '2018-10-28 01:45:00',
                          '2018-10-28 02:00:00', '2018-10-28 02:00:00',
                          '2018-10-28 02:15:00', '2018-10-28 02:15:00',
                          '2018-10-28 02:30:00', '2018-10-28 02:30:00',
                          '2018-10-28 02:45:00', '2018-10-28 02:45:00',
                          '2018-10-28 03:00:00', '2018-10-28 03:15:00',
                          '2018-10-28 03:30:00', '2018-10-28 03:45:00',
                          '2018-10-28 04:00:00'], freq='infer')

data = list(range(len(index)))

df = pd.DataFrame(data=data, index=index)
df.index = df.index.tz_localize('Europe/Berlin', ambiguous='infer')

I don't quite understand why the error occurs even with the parameter ambiguous='infer'.

Does my data frame have to be "sorted" in any way? I think data frames are ordered by default, since there is DataFrame.iloc.

Or is there a specific problem with pd.read_csv()?

I want ambiguous='infer' to respect the row order of the CSV file. More precisely: the order in which the ambiguous timestamps appear should help to distinguish between summer and winter time.

I would appreciate help with a solution, as well as a broader explanation. I couldn't find out more about how row order plays a role in data frames. Does it sometimes?

If order is not the problem here, then I believe this is a bug.

@mroeschke
Member

The infer logic essentially looks for subsequent dates that "fall behind" a previous date (see the example in https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.tz_localize.html). Your index is monotonically increasing.
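
To illustrate the ordering that ambiguous='infer' can handle, here is a minimal sketch. The timestamps are made up for illustration and are not the reporter's data. The ambiguous hour appears once in full, then the clock "falls behind" from 02:45 to 02:00 and the hour repeats, so there is exactly one backward jump to infer the switch from.

```python
import pandas as pd

# The ambiguous hour (02:00-02:45) is listed twice, in wall-clock order:
# first the summer-time pass, then the winter-time pass.
index = pd.DatetimeIndex([
    "2018-10-28 01:30:00", "2018-10-28 01:45:00",
    "2018-10-28 02:00:00", "2018-10-28 02:15:00",
    "2018-10-28 02:30:00", "2018-10-28 02:45:00",
    "2018-10-28 02:00:00", "2018-10-28 02:15:00",  # clock falls back here
    "2018-10-28 02:30:00", "2018-10-28 02:45:00",
    "2018-10-28 03:00:00",
])

# With a single backward jump, 'infer' can assign the UTC offsets.
localized = index.tz_localize("Europe/Berlin", ambiguous="infer")

# Converting to UTC shows that the result is strictly chronological.
utc = localized.tz_convert("UTC")
print(localized[2], localized[6])
print(utc.is_monotonic_increasing)
```

In the index from the original report, each ambiguous timestamp is immediately followed by its duplicate, so there is no single backward jump for the inference to find.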

Nonetheless, it looks like the error message is malformed (the %i placeholder is never filled in) and could be fixed. PRs to fix that are welcome!

@mroeschke mroeschke added the Error Reporting (incorrect or improved errors from pandas) and Timezones (timezone data dtype) labels Oct 16, 2019
@phaabe
Author

phaabe commented Oct 16, 2019

Thanks for the clear answer.

But then I see a feature request here. Would you agree?

@mroeschke
Member

Indeed. The infer logic could be improved to handle cases like this!
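
Until then, one possible workaround for data laid out like the report (each ambiguous timestamp directly followed by its duplicate) is to pass an explicit boolean array as ambiguous, where True marks a DST time. This is only a sketch, and it rests on an assumption that is not confirmed anywhere in this thread: that the first row of each duplicate pair is the summer-time (DST) reading and the second is the winter-time reading. The name is_dst is just an illustrative variable.

```python
import pandas as pd

# Shortened version of the layout from the report: duplicates are adjacent.
index = pd.DatetimeIndex([
    "2018-10-28 01:45:00",
    "2018-10-28 02:00:00", "2018-10-28 02:00:00",
    "2018-10-28 02:15:00", "2018-10-28 02:15:00",
    "2018-10-28 02:30:00", "2018-10-28 02:30:00",
    "2018-10-28 02:45:00", "2018-10-28 02:45:00",
    "2018-10-28 03:00:00",
])

# ASSUMPTION: the first occurrence of each timestamp is DST (True),
# the repeated occurrence is non-DST (False). The flag only matters
# for ambiguous times; it is ignored for the unambiguous rows.
is_dst = ~index.duplicated(keep="first")

localized = index.tz_localize("Europe/Berlin", ambiguous=is_dst)
print(localized[1], localized[2])
```

Under that assumption the localized index is no longer chronological in absolute time (02:00+01:00 is later than 02:15+02:00), so sorting it afterwards, for example with sort_index() on the data frame, may be needed.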
